George R.R. Martin and other authors say OpenAI stole their books to train ChatGPT: ‘We are here to fight’

A group of 17 authors, including Game of Thrones novelist George R.R. Martin, have filed a class action lawsuit against ChatGPT-maker OpenAI on behalf of fiction writers who believe that their work was used to train the generative AI chatbot.

The lawsuit is being organized by advocacy group The Authors Guild, which said in a statement that “generative AI threatens to decimate the author profession” and that it has filed the lawsuit “because of the profound unfairness and danger of using copyrighted books to develop commercial AI machines without permission or payment.”

“This case is merely the beginning of our battle to defend authors from theft by OpenAI and other generative AI,” said Authors Guild president Maya Shanbhag Lang. “As the oldest and largest organization of writers, with nearly 14,000 members, the Guild is uniquely positioned to represent authors’ rights. Our membership is diverse and passionate. Our staff, which includes a formidable legal team, has expertise in copyright law. This is all to say: We do not bring this suit lightly. We are here to fight.”

The full list of authors who’ve put their names on the suit is: David Baldacci, Mary Bly, Michael Connelly, Sylvia Day, Jonathan Franzen, John Grisham, Elin Hilderbrand, Christina Baker Kline, Maya Shanbhag Lang, Victor LaValle, George R.R. Martin, Jodi Picoult, Douglas Preston, Roxana Robinson, George Saunders, Scott Turow, and Rachel Vail.

The complaint, which was filed last week, specifically accuses OpenAI of using “text from books copied from pirate sites” to train GPT-3.5 and GPT-4.

As their name implies, “large language models” like the ones behind ChatGPT require a lot of training data, and the companies that build them are not known for being discerning about what they scrape from the internet. Rather than trying to avoid scraping hate speech and other offensive material, for instance, OpenAI made a second AI system to filter it out.

According to the complaint, ChatGPT previously responded to requests to quote passages from copyrighted books with “a good degree of accuracy,” and only recently started declining such prompts. The suit alleges that a requested summary of a book now often “contains details not available in reviews and other publicly available material,” suggesting that the book itself remains part of the training data. The suit also notes that OpenAI has admitted, in a statement to the Patent and Trademark Office, that copyrighted work appears in its training material.

ChatGPT also isn’t shy about trying to emulate real authors: I just now prompted the free version to write “a short story in the style of George R.R. Martin,” and it did so, beginning, “In the shadowed halls of Castle Blackthorn, a bitter wind howled through the crenellations, carrying with it the promise of winter’s relentless grasp.” (Love the use of “crenellations.”)

The Authors Guild points to a recent attempt to “generate volumes 6 and 7 of plaintiff George RR Martin’s Game of Thrones series A Song of Ice and Fire” using OpenAI’s software. The creator of that project has removed it from GitHub, but says they are available if Martin’s reps want to get in touch.

The lawsuit is full of other specific AI mimicry claims related to each author’s work, but The Authors Guild’s public statement focuses on the big picture, calling the unauthorized use of fiction writing for AI training material “identity theft on a grand scale.”

“Great books are generally written by those who spend their careers and, indeed, their lives, learning and perfecting their crafts,” said Authors Guild CEO Mary Rasenberger. “To preserve our literature, authors must have the ability to control if and how their works are used by generative AI. The various GPT models and other current generative AI machines can only generate material that is derivative of what came before it. They copy sentence structure, voice, storytelling, and context from books and other ingested texts. The outputs are mere remixes without the addition of any human voice. Regurgitated culture is no replacement for human art.”

The authors are not, as a group, calling for a stop to all large language model development, but say that an author’s work should only be used for AI training with permission and compensation.

“Authors should have the right to decide when their works are used to ‘train’ AI,” said novelist Jonathan Franzen. “If they choose to opt in, they should be appropriately compensated.”

One gaming company taking an approach like that is Hidden Door, which plans to license the fictional worlds and writing styles of authors and use them to generate multiplayer RPG adventures with an AI system of its own design. The company’s first game will be based on a public domain work, The Wizard of Oz.

“I know this is controversial, but I don’t think AI is innately evil,” Hidden Door CEO Hilary Mason told PC Gamer earlier this year. “I think what we’re arguing about is who gets to benefit, and we really want to see the writers, the creators, benefit from it. And that’s why we’re doing this the way we’re doing it.”

The Authors Guild previously sued Google over its book scanning for Google Book Search. Google ultimately won that case, in a decision that led to the development of digital book lending by libraries.

I’ve asked OpenAI for comment on the lawsuit, specifically the claim that it used pirated copies of novels as GPT training material, and will update this article if I hear back.
