A new tool from researchers at the University of Chicago promises to protect art from being hoovered up and used to train AI models without permission by “poisoning” image data.
Known as Nightshade, the tool tweaks digital image data in ways that are claimed to be invisible to the human eye but cause all kinds of borkage for generative AI models trained on it, such as DALL-E, Midjourney, and Stable Diffusion.
The technique, known as data poisoning, is designed to introduce “unexpected behaviors into machine learning models at training time.” The University of Chicago team claim their research paper shows such poisoning attacks can be “surprisingly” successful.
Apparently, the poison sample images look “visually identical” to benign images. It’s claimed the Nightshade poison samples are “optimized for potency” and can corrupt a Stable Diffusion SDXL prompt with fewer than 100 poison samples.
The specifics of how the technology works aren’t entirely clear, but the approach involves altering image pixels in ways that are invisible to the human eye while causing machine-learning models to misinterpret the content. It’s claimed that the poisoned data is very difficult to remove, with the implication that each poisoned image must be manually identified and removed from the training data.
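The paper doesn’t hand over a ready-made recipe, so the sketch below is purely illustrative rather than Nightshade’s actual algorithm: it just shows the general idea of a small, bounded pixel perturbation applied to an image file, the kind of change that stays below the threshold of human perception. The file names and the noise-based perturbation are assumptions for the sake of the example; the real tool optimizes its perturbation against a target model.

```python
# Illustrative only: not the Nightshade algorithm. This shows the general idea
# of a small, bounded pixel perturbation that is hard for a human to notice.
# A real poisoning tool would optimise the perturbation against a target model
# rather than using random noise.
import numpy as np
from PIL import Image

def add_bounded_perturbation(in_path: str, out_path: str, epsilon: int = 4) -> None:
    """Shift each RGB value by at most +/- epsilon, then save the result."""
    img = np.asarray(Image.open(in_path).convert("RGB"), dtype=np.int16)
    # Random noise stands in for an optimised, model-guided perturbation.
    delta = np.random.randint(-epsilon, epsilon + 1, size=img.shape)
    poisoned = np.clip(img + delta, 0, 255).astype(np.uint8)
    Image.fromarray(poisoned).save(out_path)

if __name__ == "__main__":
    # Hypothetical file names, used here only for the example.
    add_bounded_perturbation("artwork.png", "artwork_poisoned.png")
```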
(Image credit: University of Chicago)
Using Stable Diffusion as a test subject, the researchers found that it took just 300 poison samples to confuse the model into thinking a dog was a cat or a hat was a cake. Or is it the other way round?
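For a rough sense of what that kind of targeted confusion involves, here’s a heavily hedged sketch, again not the published Nightshade code: it nudges a “dog” image so that a stand-in image encoder sees it as more cat-like, while keeping every pixel within a small budget of the original. The `embed` function and all the constants are assumptions made purely for illustration.

```python
# A loose sketch of a targeted poison sample, not the published Nightshade
# method: pull a "dog" image's model-side features toward a "cat" image's
# features while keeping the pixel changes within a small budget. The embed()
# stand-in and all constants are assumptions; images must share a shape.
import numpy as np

def embed(img: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in encoder: flatten and normalise. A real attack
    would use the target model's own image encoder here."""
    v = img.astype(np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def make_poison(dog_img: np.ndarray, cat_img: np.ndarray,
                steps: int = 50, step_size: float = 0.5,
                budget: float = 8.0) -> np.ndarray:
    """Iteratively nudge dog_img's embedding toward cat_img's embedding,
    clipping every pixel to within +/- budget of the original."""
    target = embed(cat_img)
    base = dog_img.astype(np.float32)
    x = base.copy()
    for _ in range(steps):
        # Crude update direction; a real implementation would backpropagate
        # through the encoder instead of using this simple difference.
        direction = (target - embed(x)).reshape(x.shape)
        x = np.clip(x + step_size * np.sign(direction), base - budget, base + budget)
        x = np.clip(x, 0, 255)
    return x.astype(np.uint8)
```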
Anyway, they also say that the impact of the poisoned images can extend to related concepts, allowing a moderate number of Nightshade attacks to “destabilize general features in a text-to-image generative model, effectively disabling its ability to generate meaningful images.”
All that said, the team concedes that bringing down the larger models isn’t quite so easy: thousands of poisoned images would be required. Which is probably a good thing when it comes to deterring malicious actors. In other words, it would take a concerted effort to undermine any given large generative model.
So, is that—boom!—your AI imaging model up in smoke? Perhaps, but might one also imagine the mighty AI generative hive mind requiring all of three picoseconds to register, adjust for, and render entirely redundant such measures now that the technology has been unveiled? At which point man fights back with a new attack vector and the eternal struggle continues as the skulls and machine parts pile up across the post-thermonuclear wasteland.
Or something like that. It will certainly be interesting to see if this kind of countermeasure really works and, perhaps more pertinently, how long it lasts if it does.