In one of the first lawsuits to allege that generative AI companies violate the U.S. Copyright Act by using copyrighted works to train machine learning models, Judge Stephanos Bibas, a Third Circuit judge sitting by designation in the U.S. District Court for the District of Delaware, recently denied summary judgment on most of the issues raised in cross-motions filed by plaintiff Thomson Reuters and defendant Ross Intelligence Inc. In doing so, the court declined to issue a dispositive ruling on the hot-button question of whether the fair use doctrine protects generative AI companies that use copyrighted materials to train their programs.

Thomson Reuters (owner of Westlaw) sued Ross Intelligence, a legal-research generative AI startup, in May 2020, alleging that Ross was liable for both copyright infringement and tortious interference with contract.  The allegations against Ross stem from its endeavor to create a search engine that uses machine learning and artificial intelligence to provide answers to commonly asked legal questions.

Needing material to train its generative AI, Ross sought a license to use Westlaw. When Thomson Reuters refused, Ross turned to third-party legal research companies for legal material, much of which those companies had obtained from Westlaw. Thomson Reuters contends that Ross copied large portions of Westlaw’s Headnotes and Key Number System.

After Ross’s motion to dismiss the copyright claim was denied in March of 2021, the parties each moved for summary judgment on a multitude of issues.  Most notably, Thomson Reuters moved for summary judgment on its copyright infringement claim, and both sides moved for summary judgment on Ross’s assertion of fair use. 

On the copyright infringement claim, Judge Bibas granted Thomson Reuters’ motion only on the limited issue that Ross “copied at least portions of” Westlaw’s work. He denied summary judgment on the remaining elements of the claim, namely the validity of Thomson Reuters’ copyright and the substantial similarity of Ross’s work, which will go to a jury.

On fair use, Ross contends that its use of Thomson Reuters’ materials was permissible even if those materials are found to be protected by copyright.

The question of fair use protection for generative AI developers is significant because every generative AI system must be trained on a vast amount of information before it can produce content. Intellectual property law comes into play when that training material, the “input” to the AI, is protected by copyright. In that case, AI developers may rely on the fair use doctrine to justify using the works without the copyright holder’s permission.

As discussed in the court’s opinion, whether the use of copyrighted material is fair depends on the balance of four factors — the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used in relation to the copyrighted work as a whole, and the effect of the use upon the potential market for the copyrighted work.  Courts tend to give the most weight to the first and fourth factors.

The first factor, the purpose and character of the use, looks to whether the use is commercial and whether it is transformative. Judge Bibas held that Ross’s use of Thomson Reuters’ materials was undoubtedly commercial, which weighs against fair use, but the court could not determine as a matter of law whether Ross’s use was sufficiently transformative. The parties offer differing accounts of exactly how Ross used the Westlaw material. Did Ross merely translate Westlaw’s headnotes into numerical data that its AI search engine would later display? Or, as Ross contends, did it study Westlaw’s headnotes and opinion quotes only to analyze language patterns, rather than to replicate Westlaw’s protected expression?

According to the court, these are questions for a jury. The court explained that Ross’s use would be “transformative intermediate copying if Ross’s AI only studied the language patterns in the headnotes to learn how to produce judicial opinion quotes. But if Thomson Reuters is right that Ross used the untransformed text of its headnotes to get its AI to replicate and reproduce the creative drafting done by Westlaw’s attorney editors,” then Ross’s argument that its use was sufficiently transformative might fail.

As to the other three fair use factors, the court similarly held that they could not be resolved on summary judgment because questions of fact remain. The court noted, however, that the second factor, the nature of Thomson Reuters’ copyrighted work, seemed to favor fair use. Specifically, Westlaw’s Key Number System is a method of organization that “inherently involves significantly less creative or original expression” than traditionally protected materials, and the Headnotes, which are “akin to news reporting,” must be carefully separated from the unprotected underlying facts of the judicial opinions they synthesize.

A jury trial in this case might yield the first judgment on the intersection of generative AI, copyright, and fair use. The outcome could affect not only the AI and machine learning industry but also the broader public interest, as the world continues to grapple with the many issues of first impression that AI presents.