OpenAI won an initial victory on Thursday in one of the many lawsuits the company is facing for its unlicensed use of copyrighted material to train generative AI products like ChatGPT.
A federal judge in the southern district of New York dismissed a complaint brought by the media outlets Raw Story and AlterNet, which claimed that OpenAI violated copyright law by purposefully removing what is known as copyright management information, such as article titles and author names, from material that it incorporated into its training datasets.
OpenAI had filed a motion to dismiss the case, arguing that the plaintiffs did not have standing to sue because they had not demonstrated a concrete harm to their businesses caused by the removal of the copyright management information. Judge Colleen McMahon agreed, dismissing the lawsuit but leaving the door open for the plaintiffs to file an amended complaint.
OpenAI and other generative AI companies are fighting dozens of copyright lawsuits brought by news outlets, book publishers, artists, and record companies.
The Raw Story and AlterNet case differed from many of the other lawsuits because it centered on a narrow provision in the Digital Millennium Copyright Act (DMCA) that prohibits the removal of copyright management information from a work in order to enable or conceal copyright infringement.
The outlets argued that removal of the information by itself constituted a concrete injury and created a substantial risk that OpenAI’s large language models would regurgitate their copyrighted works verbatim.
McMahon didn’t find that argument convincing, writing that the plaintiffs hadn’t “alleged any actual adverse effects stemming from this alleged DMCA violation.”
“When a user inputs a question into ChatGPT, ChatGPT synthesizes the relevant information in its repository into an answer,” she wrote. “Given the quantity of information contained in the repository, the likelihood that ChatGPT would output plagiarized content from one of Plaintiffs’ articles seems remote.”
In other cases, particularly a lawsuit filed by the New York Times against OpenAI and Microsoft, the plaintiffs have alleged that the companies’ products did in fact reproduce large sections of copyrighted work.
The Times’ complaint includes multiple examples of ChatGPT and Microsoft’s Bing Chat responding to user prompts with multiple paragraphs of content copied verbatim from the newspaper’s articles.
McMahon’s decision doesn’t directly address the Times’ allegations, but her ruling suggests that plaintiffs hoping to win an AI copyright case in her courtroom will have to demonstrate not only that a generative model has reproduced some work in the past or may do so in the future, but that its current version is actively reproducing the work.
“While Plaintiffs provide third-party statistics indicating that an earlier version of ChatGPT generated responses containing significant amounts of plagiarized content, Plaintiffs have not plausibly alleged that there is a ‘substantial risk’ that the current version of ChatGPT will generate a response plagiarizing one of Plaintiffs’ articles,” she wrote.