OpenAI Gets a Win as Court Says No Harm Was Demonstrated in Copyright Case

A judge found that two media outlets alleging copyright violations hadn't demonstrated that they'd been harmed by OpenAI removing "copyright management information" from its training data.

OpenAI won an initial victory on Thursday in one of the many lawsuits the company is facing for its unlicensed use of copyrighted material to train generative AI products like ChatGPT.

A federal judge in the Southern District of New York dismissed a complaint brought by the media outlets Raw Story and AlterNet, which claimed that OpenAI violated copyright law by purposefully removing what is known as copyright management information, such as article titles and author names, from material that it incorporated into its training datasets.

OpenAI had filed a motion to dismiss the case, arguing that the plaintiffs did not have standing to sue because they had not demonstrated a concrete harm to their businesses caused by the removal of the copyright management information. Judge Colleen McMahon agreed, dismissing the lawsuit but leaving the door open for the plaintiffs to file an amended complaint.

OpenAI and other generative AI companies are fighting dozens of copyright lawsuits brought by news outlets, book publishers, artists, and record companies.

The Raw Story and AlterNet case differed from many of the other lawsuits because it centered on a narrow provision in the Digital Millennium Copyright Act (DMCA) that prohibits the removal of copyright management information from a work in order to enable or conceal copyright infringement.

The outlets argued that removal of the information by itself constituted a concrete injury and created a substantial risk that OpenAI’s large language models would regurgitate their copyrighted works verbatim.

McMahon didn’t find that argument convincing, writing that the plaintiffs hadn’t “alleged any actual adverse effects stemming from this alleged DMCA violation.”

“When a user inputs a question into ChatGPT, ChatGPT synthesizes the relevant information in its repository into an answer,” she wrote. “Given the quantity of information contained in the repository, the likelihood that ChatGPT would output plagiarized content from one of Plaintiffs’ articles seems remote.”

In other cases, particularly a lawsuit filed by the New York Times against OpenAI and Microsoft, the plaintiffs have alleged that the companies’ products did in fact reproduce large sections of copyrighted work.

The Times’ complaint includes multiple examples of ChatGPT and Microsoft’s Bing Chat responding to user prompts with multiple paragraphs of content copied verbatim from the newspaper’s articles.

McMahon’s decision doesn’t directly address the Times’ allegations, but her ruling suggests that plaintiffs hoping to win an AI copyright case in her courtroom will have to demonstrate not only that a generative model has reproduced some work in the past or may do so in the future, but also that its current version is actively reproducing the work.

“While Plaintiffs provide third-party statistics indicating that an earlier version of ChatGPT generated responses containing significant amounts of plagiarized content, Plaintiffs have not plausibly alleged that there is a ‘substantial risk’ that the current version of ChatGPT will generate a response plagiarizing one of Plaintiffs’ articles,” she wrote.
