Protecting AI Models from “Data Poisoning”

New ways to thwart backdoor control of deep learning systems


Training data sets for deep-learning models involve billions of samples, curated by crawling the Internet. Trust is an implicit part of the arrangement. And that trust appears increasingly threatened by a new kind of cyberattack called “data poisoning,” in which data crawled for deep-learning training is compromised with intentionally malicious content. Now a team of computer scientists from ETH Zurich, Google, Nvidia, and Robust Intelligence has demonstrated two practical data-poisoning attacks. So far, they have found no evidence of these attacks being carried out in the wild, though they do suggest some defenses that could make data sets harder to tamper with.

The authors say that these attacks are simple and practical to carry out today, requiring only limited technical skill. “For just $60 USD, we could have poisoned 0.01% of the LAION-400M or COYO-700M datasets in 2022,” they write. Such poisoning attacks would let malicious actors manipulate data sets to, for example, exacerbate racist, sexist, or other biases, or to embed some kind of backdoor in the model to control its behavior after training, says Florian Tramèr, an assistant professor at ETH Zurich and one of the paper’s coauthors.

“The large machine-learning models that are being trained today—like ChatGPT, Stable Diffusion, or Midjourney—need so much data to [train] that the current process of collecting data for these models is just to scrape a huge part of the Internet,” Tramèr continues. This makes it extremely hard to maintain any level of quality control.

Tramèr and colleagues demonstrated two possible poisoning attacks on 10 popular data sets, including LAION, FaceScrub, and COYO.

How can deep-learning models be poisoned?

The first attack, called split-view poisoning, takes advantage of the fact that the data seen at the time of curation can differ, significantly and arbitrarily, from the data seen when the AI model is actually trained. “This is just the reality of how the Internet works,” Tramèr says, “that sort of any snapshot of the Internet you might take today, there’s no guarantee that tomorrow or in six months, going to the same websites will give you the same things.”

An attacker would just need to buy up some expired domain names that still appear in a data set’s index of URLs, and would thereby end up controlling a not-insignificant fraction of the data in a large image data set. Then, whenever someone redownloads the data set in the future to train a model, some portion of what they fetch is whatever content the attacker chooses to serve from those domains.
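This works because large image data sets of this kind are distributed not as the images themselves but as an index of URLs and captions that every user re-downloads. A minimal sketch of that re-download step, assuming a hypothetical CSV index (the field names and layout are illustrative, not any data set’s actual tooling):

```python
# Sketch of re-downloading a URL-indexed image data set. The index stores only
# (url, caption) pairs, so whoever controls a listed domain at download time
# controls the bytes that land in the training set.
import csv
import urllib.request

def download_dataset(index_path: str, out_dir: str) -> None:
    with open(index_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            url = row["url"]  # hypothetical column name
            try:
                # If this domain has since been bought by an attacker,
                # the fetch silently returns whatever they choose to host.
                data = urllib.request.urlopen(url, timeout=10).read()
            except Exception:
                continue  # dead links are simply skipped
            with open(f"{out_dir}/{i:08d}.jpg", "wb") as img_file:
                img_file.write(data)
```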

“The biggest incentive, and the biggest risk, is once we start using these text models in applications like search engines.”
—Florian Tramèr, ETH Zurich

The other attack they demonstrated, the front-running attack, exploits the periodic snapshots that some websites make of their content. To discourage people from crawling their live pages, websites like Wikipedia provide a snapshot of their content as a direct download. Because Wikipedia is transparent about the process, it is possible to figure out the exact time any single article will be snapshotted. “So…as an attacker, you can modify a whole bunch of Wikipedia articles before they get included in the snapshot,” Tramèr says. By the time moderators undo the changes, it is too late: the malicious edits have already been saved into the snapshot.

Poisoning even a very small percentage of a data set can still influence the AI model, Tramèr says. For an image data set, he says, “I would take a whole bunch of images, for example, that are not safe for work…and label all of these as being completely benign. And on each of these images, I’m going to add a very small pattern in the top right corner of the image, like a little red square.”

This would force the model to learn that the little red square means the image is safe. Later, when the data set is used to train a model to filter out bad content, all an attacker has to do to keep their content from being filtered out is add a little red square in the top corner. “This works even with very, very small amounts of poisoned data, because this kind of backdoor behavior that you’re making the model learn is not something you’re going to find anywhere else in the dataset.”
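A rough sketch of that trigger-stamping step, assuming the Pillow imaging library (the patch size, color, output label, and function names are illustrative, not code from the paper):

```python
# Illustrative backdoor poisoning step: stamp a small red square into the
# top-right corner of an image and pair it with an attacker-chosen "benign"
# label. A model trained on enough such pairs learns "red square => safe".
from PIL import Image

TRIGGER_SIZE = 8  # width/height of the trigger patch, in pixels

def poison(image_path: str, out_path: str) -> tuple[str, str]:
    img = Image.open(image_path).convert("RGB")
    width, _ = img.size
    # Paint the trigger patch in the top-right corner.
    for x in range(width - TRIGGER_SIZE, width):
        for y in range(TRIGGER_SIZE):
            img.putpixel((x, y), (255, 0, 0))
    img.save(out_path)
    return out_path, "benign"  # (poisoned image, mislabeled as safe)
```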

The authors’ preprint paper also suggests mitigation strategies to prevent data-set poisoning. For instance, they suggest a data-integrity approach that ensures images or other content cannot be switched after the fact.

“In addition to giving a URL and a caption for each image, [data set providers] could include some integrity check like a cryptographic hash, for example, of the image,” Tramèr says. “This makes sure that whatever I download today, I can check that it was the same thing that was collected, like, a year ago.” However, there is a downside, he adds: images on the Web are routinely changed for innocent, benign reasons, such as a website redesign. “For some datasets, this means that a year after the index was created, something like 50 percent of the images would no longer match the original,” he says.
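A minimal sketch of what such a check could look like on the downloader’s side, assuming the index stores a SHA-256 digest next to each URL (the function and parameter names here are hypothetical, not an actual data set format):

```python
# Verify that the content at a URL still matches the digest recorded when the
# data set index was curated; a mismatch means the image has changed and
# should be dropped rather than trusted.
import hashlib
import urllib.request

def entry_is_unchanged(url: str, expected_sha256: str) -> bool:
    data = urllib.request.urlopen(url, timeout=10).read()
    return hashlib.sha256(data).hexdigest() == expected_sha256
```

The downside Tramèr describes shows up here directly: any benign change to the file, such as a re-encoding during a site redesign, alters the digest, so the check fails and the image has to be discarded even though nothing malicious happened.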

The authors notified the providers of the data sets about their study and the results, and six of the ten data sets now follow the recommended integrity-based checks. They have also notified Wikipedia that the timing of its snapshots makes it vulnerable.

Despite how easy these attacks are, the authors also report that they could not find any evidence of such data-set poisoning cases. Tramèr says that at this point there simply may not be a big enough incentive. “But there are more applications that are being developed, and…I think there are big economic incentives from an advertising perspective to poison these models.” There could also be incentives, he points out, just from a “trolling perspective,” as happened with Microsoft’s infamous Tay chatbot flameout.

Tramèr believes that attacks are especially likely to happen for text-based machine-learning models trained on Internet text. “Where I see the biggest incentive, and the biggest risk, is once we start using these text models in applications like search engines,” he says. “Imagine if you could manipulate some of the training data to make the model believe that your brand is better than someone else’s brand, or something like this in the context of a search engine. There could be huge economic incentives to do this.”
