The brass balls on these guys: OpenAI complains that DeepSeek has been using its data, you know, the copyrighted data it’s been scraping from everywhere

OpenAI, the company behind the internet-scraping ChatGPT large language model (LLM), has complained that the new Chinese AI assistant DeepSeek has been copying its models. The emergence of DeepSeek, an AI LLM that was apparently developed at a fraction of the cost of other models but boasts comparable performance, has sent shares in AI-focused tech firms like Nvidia tumbling.

A new Bloomberg report says Microsoft, which is a major investor in OpenAI, is investigating whether OpenAI’s data has been exfiltrated on a large scale by DeepSeek. An OpenAI license allows developers access to the OpenAI API so it can be plugged into other software. The implication is that DeepSeek has been trained on OpenAI responses during its development.

“There’s substantial evidence that what DeepSeek did here is they distilled the knowledge out of OpenAI’s models and I don’t think OpenAI is very happy about this,” said the US administration’s new ‘AI and crypto czar’ David Sacks. “I think one of the things you’re going to see over the next few months is our leading AI companies taking steps to try and prevent distillation… That would definitely slow down some of these copycat models.”

Distillation is AI jargon for training one model on the outputs of another, typically so that a newer "student" model learns to imitate an established "teacher." OpenAI also used this phrasing in a statement, saying “[People’s Republic of China] based companies—and others—are constantly trying to distill the models of leading US AI companies. As the leading builder of AI, we engage in countermeasures to protect our IP… and believe as we go forward that it is critically important that we are working closely with the US government to best protect the most capable models from efforts by adversaries and competitors to take US technology.”
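For the curious, the basic idea fits in a few lines of Python. This is a toy sketch under loud assumptions, not a description of how DeepSeek, OpenAI, or anyone else actually trains a model: the `teacher`, `student`, and `distill` functions and all the numbers in them are invented for illustration. The "teacher" is a fixed model we can only query for answers, and the "student" learns purely by nudging its own outputs toward those answers.

```python
import math

def softmax(logits):
    # Turn raw scores into probabilities that sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def teacher(x):
    # Hypothetical teacher: a black box that returns a probability
    # distribution over three possible answers for input x.
    return softmax([2.0 * x, -1.0 * x, 0.5])

def student(x, w, b):
    # Hypothetical student: a tiny model with its own weights w and b.
    return softmax([w[k] * x + b[k] for k in range(3)])

def distill(inputs, steps=2000, lr=0.5):
    # Train the student using only the teacher's outputs as targets.
    w = [0.0, 0.0, 0.0]
    b = [0.0, 0.0, 0.0]
    targets = [teacher(x) for x in inputs]  # query the teacher once per input
    n = len(inputs)
    for _ in range(steps):
        gw = [0.0, 0.0, 0.0]
        gb = [0.0, 0.0, 0.0]
        for x, t in zip(inputs, targets):
            p = student(x, w, b)
            for k in range(3):
                # Cross-entropy gradient w.r.t. logit k is p[k] - t[k].
                gw[k] += (p[k] - t[k]) * x
                gb[k] += p[k] - t[k]
        for k in range(3):
            w[k] -= lr * gw[k] / n
            b[k] -= lr * gb[k] / n
    return w, b

if __name__ == "__main__":
    w, b = distill([i / 4 for i in range(-8, 9)])
    print("teacher:", [round(p, 3) for p in teacher(0.5)])
    print("student:", [round(p, 3) for p in student(0.5, w, b)])
```

Run it and the student's answers end up close to the teacher's, even though the student never sees how the teacher works inside. Real-world distillation involves text responses and vastly larger models, but the loop is the same in spirit: ask, record, imitate.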

Sam Altman speaking at the World Economic Forum.

(Image credit: Bloomberg via Getty Images)

Before we get to the delicious schadenfreude, the main reason this matters is DeepSeek's apparent low cost: If it really has just been built on the back of OpenAI’s model, then the claims about its cost-efficient approach aren’t as impressive as they sound. Certain AI svengalis are already trumpeting that this is the case. Crystal van Oosterom, AI Venture Partner at OpenOcean, says “DeepSeek has clearly built upon publicly available research from major American and European institutions and companies.”

Now, the elephant in the room. OpenAI has itself been accused of scraping the Internet indiscriminately and training its models on enormous swathes of copyrighted material: It says it would be “impossible” to build LLMs without such data. There’s a major upcoming court case with the New York Times suing OpenAI and Microsoft, while more and more global publishers are taking action against the firm.

“I’m so sorry I can’t stop laughing,” said tech critic Ed Zitron. “OpenAI, the company built on stealing literally the entire internet, is crying because DeepSeek may have trained on the outputs from ChatGPT. They’re crying their eyes out. What a bunch of hypocritical little babies.”

I asked the DeepSeek chatbot if it had copied OpenAI’s learning models. It said: “No, I am an intelligent assistant developed by the Chinese company DeepSeek, built on our own proprietary technology and learning models. We respect intellectual property rights and adhere to strict ethical standards in the development of our AI systems.”

I asked DeepSeek if it would lie to me about this matter. DeepSeek churned for a few seconds before saying “No, I would not lie to you about this matter.” I asked if it could beat OpenAI in a fight, which it called “an interesting and fun question” before clarifying that, as non-physical entities, “our ‘competition’ is more about the quality of our responses.”

Meanwhile I’m just over here enjoying the irony, and Ed Zitron’s clear glee that Sam Altman and crew are getting a taste of their own medicine.

“Oh I’m sorry,” says Zitron. “Are you crying? Are you crying because your plagiarism machine that made stuff by copying everybody’s stuff was used to train another machine that made stuff by copying stuff? Are you going to cry? Cowards, losers, pathetic.”
