Top tips is a weekly column where we highlight what’s trending in the tech world today and list ways to explore these trends. This week, we’ll discuss how artists can prevent their work from being used to train Generative AI models.
Aren’t AI image generation tools an impressive piece of tech? It still feels unreal (to me, at least) that a simple text prompt can produce a complete work of art within seconds. While the outputs may sometimes look a little rough around the edges, this tech has developed at breakneck speed over the last few years, and we’re fast approaching a tipping point where AI-generated visual media is indistinguishable from human-made content.
This is made possible by the machine learning (ML) models these tools are built around, which are trained on enormous datasets of images. But that raises a critical question: where does all that training data come from? Often, it’s sourced from existing artwork created by human artists, including copyrighted pieces.
The advent of generative AI (GenAI) has opened a Pandora’s box of copyright claims and debates over fair use. Companies like OpenAI and Stability AI are being sued for using copyrighted material in their training data, but since no law expressly forbids training AI on copyrighted work, these companies argue that doing so qualifies as fair use, a defense the courts have yet to rule on definitively. The dispute sharpens when these tools convincingly mimic a particular artist’s style because they were trained on that artist’s work, which is exactly what much of this litigation centers on.
With all of this in mind, here are four steps you can take as an artist to protect your work from the algorithmic clutches of AI tools.
1. Opt out of AI training
Most AI providers allow you to opt out of training; however, the process is usually quite convoluted, typically involving emailing the provider or filling out forms. For example, OpenAI lets content creators submit a form requesting that their work not be used to train its AI models. However, the creator has to make a separate entry on this form for every single piece of artwork they want removed from the training dataset, which can get extremely time-consuming if you have an extensive collection of artwork.
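If you host your artwork on your own website, there’s also a crawler-level opt-out. OpenAI’s GPTBot and Common Crawl’s CCBot (a common source of scraped training data) both honor robots.txt directives. A minimal example is below; bear in mind this only deters compliant crawlers, only applies to future crawls, and does nothing about copies of your work hosted elsewhere.

```
# robots.txt at the root of your site
# GPTBot is OpenAI's web crawler; CCBot is Common Crawl's.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```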
Also do your research before uploading your work to a particular image hosting platform. Some platforms may not allow you to opt out of AI training once your work has been uploaded. For example, Adobe uses images from Adobe Stock to train Firefly, its generative image model, so if you don’t want your work used to train Firefly, your only option is to avoid uploading it to Adobe Stock in the first place.
2. Adversarial cloaking
Adversarial cloaking is an anti-training approach that involves adding noise to your artwork to confuse AI models. “Noise” refers to subtle disturbances or visual artifacts in images that are meant to interfere with an ML model’s image recognition capabilities. These disturbances are usually invisible to the naked eye but can be picked up by AI, making it difficult for these tools to identify unique characteristics of your artwork and replicate your style.
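To make “noise” concrete, here’s a toy Python sketch of an imperceptible pixel-level change, assuming a local file named artwork.png. To be clear, this adds random noise purely for illustration; the real tools discussed below optimize their perturbations against an actual model’s feature extractor, which plain random noise does not do.

```python
# Toy illustration of an "imperceptible" perturbation: shift every pixel
# channel by at most +/-3 on the 0-255 scale. This is NOT what real
# cloaking tools do -- they compute targeted, model-aware perturbations.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("artwork.png").convert("RGB"), dtype=np.int16)

rng = np.random.default_rng(seed=0)
perturbation = rng.integers(-3, 4, size=img.shape)  # upper bound exclusive

cloaked = np.clip(img + perturbation, 0, 255).astype(np.uint8)
Image.fromarray(cloaked).save("artwork_cloaked.png")
```

If you open the two files side by side, they should look identical; the difference only matters to software reading the raw pixel values.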
Two of the best-known anti-training tools currently available are Glaze and Nightshade, both developed by the same team at the University of Chicago. Glaze is a defensive cloaking tool: its perturbations cause models trained on your images to learn a distorted version of your style, making mimicry harder. Nightshade, on the other hand, is an offensive data poisoning tool that takes things a step further. It introduces perturbations that can fool an AI model into misclassifying your artwork as something entirely different, with the end goal of degrading the entire model’s outputs.
3. Mislabeling your artwork
This is one of the more aggressive measures against GenAI, and it comes with its own caveats. Because of the sheer volume of training data involved, it’s impractical to review each image individually, so most AI pipelines rely on the tags and captions attached to an image to classify it. A simple trick to combat the use of your artwork in training these models is to apply deliberately misleading tags to your work, rendering it useless to the ML algorithm. As for the caveat I mentioned at the start of this point: mislabeling your artwork can hurt your SEO, since search engines rely on those same tags and captions to index and rank your images.
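For illustration, here’s one hypothetical way to embed misleading tags using Pillow’s PNG metadata support (the filenames and tag strings are made up). Note that many scrapers read the caption or alt text on the page hosting the image rather than embedded metadata, so you’d want to mislabel those fields too.

```python
# Hypothetical sketch: write deliberately misleading tags into a PNG's
# text metadata with Pillow. Filenames and tags are invented examples.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

img = Image.open("dragon_painting.png")

meta = PngInfo()
# Tags that have nothing to do with the actual content of the image.
meta.add_text("Keywords", "spreadsheet, invoice, bar chart")
meta.add_text("Description", "Scanned quarterly expense report")

img.save("dragon_painting_tagged.png", pnginfo=meta)
```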
4. Legal action
As we’ve already discussed, AI providers currently claim that training on copyrighted content is fair use, and no law yet settles the question. However, when these models recreate your artwork or closely replicate your style, you may have a copyright infringement case on your hands, provided you can prove that the AI-generated artwork is substantially similar to your original work. There are known examples of AI providers being sued for copyright infringement because their tools replicated an artist’s or writer’s work; the OpenAI and Stability AI cases mentioned earlier are just two examples.
Lawmakers have their work cut out for them
The legislation surrounding GenAI and copyright is currently murky. Lawmakers need to define the rules and criteria governing fair use more clearly, because GenAI has fundamentally changed the discussion around copyrighted content. Several lawsuits are currently pending against the big players in the GenAI space, and their outcomes will shape how we view GenAI’s relationship with copyrighted content and the limits of fair use. Sweeping changes to copyright law are likely on the way, but until then, artists need to take matters into their own hands to keep their work from being used to train AI.