The Top Line
OpenAI is under fire from media giants in Canada and the United States. The New York Times’ US-based lawsuit could set the stage for the case in Ontario against the AI powerhouse. We provide an in-depth breakdown of the causes of action and issues to watch in this article.
The Legal Landscape
In November 2024, a group of Canadian news media companies and publishers, including the Toronto Star, Postmedia, The Globe and Mail, The Canadian Press, and the CBC, filed a lawsuit against OpenAI in the Ontario Superior Court of Justice. The allegations include copyright infringement, circumvention of technological protection measures, breach of Terms of Use, and unjust enrichment relating to OpenAI’s ChatGPT service. The Canadian media companies are demanding statutory and punitive damages, disgorgement of profits, and permanent injunctions to prevent further infringement.
The Canadian lawsuit is similar to one currently underway in the United States. A year earlier, in December 2023, The New York Times Company initiated a lawsuit against Microsoft and OpenAI. The Times’ claims focus on copyright infringement but also include a bold demand to “destroy” GPTs and other language model datasets, a potential game-changer for AI development and usage.
Both lawsuits, but primarily the Ontario case, emphasize the “scraping” of data for training models and the process of retrieval-augmented generation (RAG). OpenAI’s RAG allegedly provides models with access to real-time data and creates an additional dataset to enhance their outputs. These data scraping techniques raise questions about whether AI-generated results unlawfully appropriate intellectual property.
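For readers unfamiliar with the mechanics, the sketch below illustrates the general RAG pattern described in the pleadings: relevant source text is retrieved at the time of a user’s query and supplied to the model alongside the question. It is a simplified illustration only, not OpenAI’s actual pipeline; the documents, scoring function, and prompt format are invented for demonstration purposes.

```python
# A minimal, illustrative sketch of retrieval-augmented generation (RAG).
# It is NOT OpenAI's implementation; it only demonstrates the general pattern
# alleged in the lawsuits: fetch relevant source text at query time and feed
# it to the model alongside the user's question. All data below is invented.

documents = [
    "City council approves new transit funding for 2025.",
    "Local team wins championship after dramatic overtime.",
    "New study links diet to improved heart health.",
]

def score(query: str, doc: str) -> int:
    """Crude relevance score: number of words shared between query and document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most relevant to the query."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Augment the user's question with the retrieved source text."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    # The assembled prompt would then be sent to a language model for generation.
    print(build_prompt("What did the study say about heart health?"))
```

Framed this way, the plaintiffs’ concern is that the retrieved source text is reproduced and transmitted at the moment of the user’s request, not only during model training.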
Comparing the Cases
The New York Times lawsuit focuses on copyright infringement through AI-generated outputs, arguing that GPT models “memorize” and reproduce its journalism while harming revenue streams. The Canadian case, on the other hand, highlights OpenAI’s data scraping practices, violations of Terms of Use, and technological circumvention, emphasizing how proprietary content is accessed and used.
To help untangle these complicated lawsuits, we’ve summarized the key arguments in a table:
Allegations | The New York Times Company v. Microsoft & OpenAI | Toronto Star Newspapers et al v. OpenAI |
---|---|---|
Copyright Infringement | GPT model training and continuous scraping through RAG resulting in unauthorized reproduction and derivatives of Times content. GPT models “memorize” Times’ content and output significant portions when prompted. Synthetic search applications performing unauthorized retrieval and dissemination of current news (e.g., Bing Chat and Browse with Bing). | “Scraping” data and reproducing works through GPT model training and RAG process. |
Breach of Terms | X | Breach of website Terms of Use, including recently added provisions specifically targeting GPT models. |
False Attributions | “Hallucinations”: GPT models confidently provide information that lacks accuracy, causing commercial and competitive injury to the Times (e.g., Bing Chat was asked about an article regarding 15 heart-healthy foods and provided 12 foods that were not included in the original article). | X |
Technological Circumvention | X | Circumvention of account and subscription-based restrictions as well as exclusion protocols (e.g., robots.txt). |
Financial Harm | Loss of revenue from affiliate links (specifically through Wirecutter’s shopping recommendations) and diminished subscriptions. | Unjust enrichment from using proprietary content. |
Key Legal Arguments
Both cases allege OpenAI unlawfully scraped and used protected content, but their approaches diverge:
- The New York Times:
- Alleges that GPT models targeted “high-value” Times content during training. The Times’ complaint emphasizes the value of its high-quality journalism.
- Alleges that the “memorization” behaviour of GPT models, whereby they repeat significant portions of training data given the right prompt, is further evidence that copies of the infringed works are encoded in the models’ parameters.
- Points to generated outputs (via Bing Chat and Browse with Bing) that paraphrase current news and divert traffic from the Times’ site. This also impacts the Times’ Wirecutter service, as affiliate links for its shopping recommendations receive less traffic and therefore generate less revenue.
- Asserts willful infringement, arguing OpenAI and Microsoft knowingly exploited Times content.
- Canadian Media Companies:
- Focus on OpenAI’s violation of website Terms of Use, which prohibit unauthorized reproduction, distribution, and derivative works.
- Highlight OpenAI’s circumvention of paywalls to scrape proprietary content.
- Claim OpenAI knowingly misappropriated valuable works.
Interesting Legal Issues to Watch
The Canadian OpenAI lawsuit raises interesting, and in some cases as-yet unresolved, issues under Canadian copyright and digital media law that will be of great interest to the media industry and AI developers. Here are some of the key issues raised by the Canadian Media Companies:
Breach of Terms of Use: The Toronto Star and others added specific Terms of Use provisions targeting AI companies in May and August 2024, before their complaint was filed. If this claim is upheld, it could set a precedent for Terms of Use being enforceable in copyright disputes. One of the additions is excerpted below (see also Appendix B of the Complaint):
“For greater clarity, personal, non-commercial use does not include the use of Content in connection with: (1) the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system; or (2) providing archived or cached data sets containing Content to another person or entity.”
Transformative Use: Unlike in the U.S., where transformative use is central to fair use claims, Canadian copyright law does not explicitly recognize transformativeness as part of the fair dealing analysis (SOCAN v Bell, 2012 SCC 36 at paras 23-26). This makes OpenAI’s defence more challenging up north.
Circumvention of technological protection measures: The Canadian lawsuit alleges, uniquely, that OpenAI bypassed technological protection measures (TPMs), like paywalls. In recent Canadian cases, for example Blacklock’s Reporter v Canada, 2024 FC 829, courts have seemingly distinguished between bypassing a paywall and legitimate access through password use; the latter is not considered circumvention. Depending on the mechanism used by OpenAI, this case could certainly test the boundaries of that determination.
Focus on input vs. output: The Canadian lawsuit focuses on the scraping, circumvention, and storage of data and information (inputs). Conversely, the NYT lawsuit focuses largely on how ChatGPT mimics the style of the Times’ articles (outputs) and draws readership and engagement away from the paper, targeting its bottom line. While scraping is mentioned in the NYT lawsuit, the focus is on outputs.
Copying? Or not?: The Canadian lawsuit alleges that scraping includes the act of copying. However, OpenAI reportedly uses a process called “vector embedding” to transform text into a “multidimensional numerical representation.” This may become crucial, as under Canadian Supreme Court precedent, “copyright requires copying” (Théberge v Galerie d’Art du Petit Champlain Inc, [2002] 2 SCR 336). Transforming text and its associations into numerical codes may or may not be considered copying. The Supreme Court has hinted that metaphorical copying (transformation to another medium) could be encompassed by the legal concept owing to technological evolution. Indeed, this may run squarely into the principle of “technological neutrality” in how copyright laws are interpreted in the digital age, perhaps allowing copyright owners to argue for a more expansive reading than the literal wording of the Copyright Act (CBC v SODRAC, 2015 SCC 57).
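To make the “vector embedding” point concrete, the short sketch below shows what a multidimensional numerical representation of text looks like in practice. It uses the open-source sentence-transformers library purely for illustration; OpenAI’s internal representations are not public, and nothing in the sketch is drawn from the pleadings.

```python
# Illustration only: turn a sentence into a numerical vector ("embedding").
# OpenAI's internal representations are not public; this uses the open-source
# sentence-transformers library to show what a "multidimensional numerical
# representation" of text looks like in practice.
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small, publicly available model
vector = model.encode("Copyright requires copying.")

print(vector.shape)  # (384,) -- the text is now 384 floating-point numbers
print(vector[:5])    # the first few dimensions of that numerical representation
```

Whether producing such a vector from an article amounts to “copying” within the meaning of the Copyright Act is, in essence, the question the Canadian court may have to grapple with.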
Implications
The outcomes of these lawsuits will likely shape how courts address artificial intelligence and intellectual property disputes. A victory for either media company could:
- Establish legal limits on data scraping and AI training practices.
- Prompt stricter compliance requirements for AI developers.
- Strengthen the case for licensing agreements between AI firms and content creators.
The New York lawsuit, in particular, has the potential to set a precedent influencing similar cases globally, including the one in Ontario. If successful, it could provide a legal blueprint for challenging AI models and ensuring fair compensation for content creators.
Takeaways
These lawsuits mark an important inflection point in the intersection of journalism and AI. As courts weigh arguments about copyright, transformative use, and circumvention, the stakes are high for OpenAI and the broader AI industry. Indeed, OpenAI and other generative AI companies seem to be taking the position that these lawsuits pose a significant existential threat to their business model, which is based, at least in part, on large-scale scraping of internet-based content.
Any judgments in these cases could have ripple effects far beyond the courtroom—reshaping the future of AI development and intellectual property law globally. We are likely to see some development in the coming weeks, as Judge Stein is soon to rule on whether the Times’ case against OpenAI and Microsoft will go forward. We expect the OpenAI lawsuit to be vigorously defended in Canada in the coming months.
Stay tuned for more updates on law and AI. To find out more, contact us! We’re happy to chat with you about AI, law, and policy.
AI Disclaimer: ChatGPT was used (completely unironically) to assist in summarizing and editing portions of this article.