Ross can’t get a break as Thomson Reuters lays down the Law

By Caroline Day, Partner

In a landmark AI copyright case, a US court has ruled that using copyrighted legal summaries to train AI is infringement – raising big questions about how AI learns, what counts as fair use, and what this means for the future of legal tech.

Get in touch

Caroline Day | cday@hlk-ip.com

One of the first major AI copyright cases has been decided in the US and, rather pleasingly, it’s a case about caselaw.

With apologies for the spoiler, the court concluded that the defendant’s particular use of copyright material in preparing training data was indeed an infringement of copyright. Even so, it is worth reading on to appreciate the facts of the case and to consider how it relates to other cases making their way through the courts.

The protagonist here is Thomson Reuters, whose “Westlaw” resource is known to lawyers around the world as a highly useful, maintained database of legal cases, each accompanied by headnotes summarising the key points of the decision. In the US, these headnotes are written by bar-admitted attorney-editors.

In a case filed back in 2020, Thomson Reuters complained that ROSS Intelligence (Ross) had induced a third party to use its own licence to access Westlaw on Ross’s behalf, and that the copied content had been used to produce a competing product.

Ross was comparatively new to the legal research game, having started out in 2015 with a focus on bankruptcy and IP caselaw (oh, the irony) before expanding to cover other practice areas. Thomson Reuters alleged that Ross, having been refused a Westlaw licence as a competitor, used a further entity – LegalEase Solutions, LLC – to access Westlaw on its behalf. LegalEase had been a Westlaw subscriber since 2008, but Thomson Reuters alleged that in 2017 its usage pattern changed, jumping from 6,000 transactions a month to around 236,000. What’s more, Thomson Reuters thought the behaviour looked off, seeming less like a curious student of law and more like a bot (which would be a direct violation of Westlaw’s subscriber agreement).

Ross, in the meantime, was developing an AI search tool. To do so, it needed training data made up of good and bad answers to legal questions. The parties referred to this training data as Bulk Memos, which appear to have been prepared by lawyers on the basis of Westlaw headnotes. These were not simple “cut and paste” jobs. Rather, as demonstrated in an example mocked up by the judge (the actual evidence on this point being sealed), headnotes, which might state a legal conclusion drawn from a case, were rephrased as questions.

In a notably engaging decision, the judge first reasoned that the headnotes were entitled to copyright protection. Although based on uncopyrightable subject matter, they met the threshold of originality through their editorial expression, even though “the material is not that creative”. Second, relying on expert evidence, the judge concluded that the headnotes, rather than the underlying cases, were the source of the Bulk Memos. A good number of Bulk Memos were found to track the language of the headnotes closely.

Of course, Ross put forward a defence of fair use, a defence oft-cited in AI copyright cases. The burden of proof here was on Ross, and it failed to meet it: Ross was attempting to create a competing commercial product, and this took its use outside the protection of fair use. Although Ross did not expose the headnotes or the Bulk Memos to its users, the judge found that “Ross took the headnotes to make it easier to develop a competing legal research tool”.

So then, it appears that the main source of infringement in this case was the preparation of the training data.

There is no clear analogue to this process in any of the AI cases currently running through the courts. Nevertheless, one high-level summary of the issues surrounding copyright and generative AI could be: is copyright infringed on the way in to an AI (i.e., does the use of training data infringe), on the way out of the AI (by producing an output which is offensively similar to at least part of the training data), on both, or on neither? This case adds some weight to the argument that infringement can occur at least on the way in.

While this decision touches on many of the issues which impact liability in AI copyright cases, it does not consider matters relating to generative AI. These issues remain to be addressed in one of the many pending cases in the field of generative AI.

This is for general information only and does not constitute legal advice. Should you require advice on this or any other topic then please contact hlk@hlk-ip.com or your usual HLK advisor.