According to TechCrunch, the Chicago Tribune filed a federal lawsuit against AI search engine Perplexity on Thursday in New York. The newspaper alleges Perplexity is infringing its copyrights by using Tribune content within its Retrieval Augmented Generation (RAG) systems without permission. The Tribune’s lawyers claim that after contacting Perplexity in mid-October, they were told the company did not train models on its work but “may receive non-verbatim factual summaries.” The lawsuit counters that Perplexity is, in fact, delivering Tribune content verbatim and that its Comet browser is bypassing the paper’s paywall to create detailed summaries. This is part of a broader legal campaign, as the Tribune is one of 17 MediaNews Group and Tribune Publishing outlets that sued OpenAI and Microsoft in April, with another nine filing suit in November. Perplexity did not immediately respond to requests for comment on this latest lawsuit.
RAG Gets Dragged Into Court
Here’s the thing: this lawsuit is a big deal because it’s not just about model training. Most of the legal fire so far has been aimed at the web scraping used to train massive foundation models. But the Tribune is going after Perplexity’s RAG system specifically. RAG is supposed to be the “good guy” tech: instead of relying on whatever a model memorized during training, it retrieves current, verified sources at query time and uses them to ground the AI’s answer and curb hallucinations. Basically, it’s how you make an AI search engine accurate. The Tribune’s argument flips that script. They’re saying, “You’re using our verified, copyrighted content to make your product accurate, and you didn’t ask or pay.” If that argument gains traction in court, it could break the business model for a whole class of AI search and assistant tools that rely on live web data. It’s a whole new legal front.
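To see why the distinction between training and retrieval matters, here is a toy sketch of the retrieve-then-ground pattern described above. Everything in it is an invented stand-in for illustration: real systems like Perplexity use vector embeddings and an LLM rather than word overlap and string templates, and this is in no way their actual pipeline.

```python
# Toy sketch of the retrieval step in a RAG pipeline.
# Corpus, scoring, and prompt format are all invented for illustration;
# production systems use embedding models and a real LLM.

def tokenize(text: str) -> set[str]:
    """Lowercase bag-of-words; a crude stand-in for an embedding model."""
    return set(text.lower().split())

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and keep the top k."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, sources: list[str]) -> str:
    """Ground the model's answer in the retrieved snippets."""
    context = "\n".join(f"- {s}" for s in sources)
    return f"Answer using ONLY these sources:\n{context}\n\nQuestion: {query}"

# Hypothetical mini-corpus standing in for freshly fetched articles.
corpus = [
    "The city council approved the new transit budget on Monday.",
    "Local bakery wins national award for sourdough.",
    "Transit ridership rose 12 percent after the budget passed.",
]

sources = retrieve("What happened with the transit budget?", corpus)
prompt = build_prompt("What happened with the transit budget?", sources)
print(prompt)
```

The key point for the lawsuit: the copyrighted text flows through `retrieve` and into the prompt at answer time, untouched by any training run, which is exactly the stage the Tribune is targeting.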
A Mounting Legal Siege
Perplexity is getting surrounded. This isn’t an isolated case. Reddit sued them in October. Dow Jones (parent of the Wall Street Journal) is suing them. Amazon sent them a cease-and-desist letter last month. It’s starting to look like a pattern. And it’s separate from, but parallel to, the massive lawsuits against OpenAI and Microsoft from publishers and authors. The media companies seem to be employing a pincer movement: sue the big model makers for training, and sue the nimble search/answer engines for ingestion and output. The goal is clear: establish that using copyrighted content at any point in the AI pipeline, from training data to real-time retrieval, requires a license. It’s a high-stakes bet on the future of information commerce.
What Happens Next?
So where does this leave us? The courts are going to have to untangle some genuinely thorny questions. Is retrieving a snippet via RAG to generate a summary “fair use”? Does bypassing a paywall to reach the content change the equation? These cases, including a recent ruling against OpenAI in Germany, are slowly building a global patchwork of precedent. For any business that depends on reliable, real-time access to web data, AI products most of all, the legal landscape is becoming a critical part of infrastructure planning. The outcome could force a massive shift toward licensed data partnerships, or it could cement the current scrape-and-ask-forgiveness model. Either way, the free ride on the open web may be coming to an end.
