Major US news publishers Dow Jones & Co and NYP Holdings have sued AI search engine startup Perplexity for scraping their content without paying for it.
The lawsuit, filed on behalf of The Wall Street Journal and its sister tabloid New York Post by their parent company News Corporation, alleges two counts of copyright infringement and one of false designation of origin and dilution of trademarks. The plaintiffs accuse the AI biz of stealing the hard work of journalists to feed the data requirements of its training models. News Corp’s CEO Robert Thomson claimed this could be the first of many such lawsuits against AI developers.
“The perplexing Perplexity has willfully copied copious amounts of copyrighted material without compensation, and shamelessly presents repurposed material as a direct substitute for the original source. Perplexity proudly states that users can ‘skip the links’ – apparently, Perplexity wants to skip the check,” he told The Register in a statement.
“We applaud principled companies like OpenAI, which understands that integrity and creativity are essential if we are to realize the potential of Artificial Intelligence. Perplexity is not the only AI company abusing intellectual property and it is not the only AI company that we will pursue with vigor and rigor. We have made clear that we would rather woo than sue – but, for the sake of our journalists, our writers and our company, we must challenge the content kleptocracy.”
News Corp isn’t against sharing its intellectual property to train AI systems – but it wants the money upfront. In May it inked a deal with the aforementioned OpenAI for just this purpose, with a reported price tag over $250 million. The machine learning juggernaut also has similar deals in place with Reddit and Stack Overflow.
According to court documents [PDF] filed in the US District Court for the Southern District of New York, News Corp first contacted Perplexity about the matter in July but received no response. It wants $150,000 for every proven infringement – which, if enforced, could severely impact or even bankrupt the startup.
The news giant isn’t just peeved at the data scraping itself, but also that Perplexity doesn’t cite its sources. It claimed that Perplexity’s AI “answer engine” lets users “skip the links,” and that this deprives publishers of direct revenue. Even worse, the lawsuit says, the answer engine gets things wrong.
“In addition to using Plaintiffs’ copyrighted work to develop a substitute product that reproduces or imitates Plaintiffs’ original content, Perplexity also harms Plaintiffs’ brands by falsely attributing to Plaintiffs certain content that Plaintiffs never wrote or published,” the lawsuit claims.
“Not infrequently, if Perplexity is asked about what Plaintiffs’ publications reported, Perplexity ‘answers’ with false information. AI developers euphemistically call these factually incorrect outputs ‘hallucinations.’ Perplexity’s hallucinations can falsely attribute facts and analysis to content producers like Plaintiffs, sometimes citing an incorrect source, and other times simply inventing and attributing to Plaintiffs fabricated news stories.”
One case cited is an August 2024 New York Post article about European attempts to “silence great Americans like Elon Musk.” The lawsuit claims Perplexity, when asked for a summary, copied the first 139 words of the piece, and then added five more paragraphs of factually incorrect information.
On the data scraping side, there is a mechanism for website operators to opt out of having their content added to the voracious maw of AI training databases: the robots.txt file, implemented by Google, OpenAI, and Cloudflare. While Perplexity CEO Aravind Srinivas has claimed his business does respect the do-not-scrape command, some third parties it uses might not be so ethical.
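For readers unfamiliar with the mechanism, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The robots.txt contents, the example.com URL, and the can_crawl helper are all hypothetical and written for illustration; the crawler names PerplexityBot and GPTBot are assumed user-agent tokens, not details from the lawsuit. Note that robots.txt is purely advisory: nothing in it technically prevents a crawler from ignoring the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (illustration only).
# The user-agent tokens are assumptions, not taken from the court filing.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story.html"  # placeholder URL
    for agent in ("PerplexityBot", "GPTBot", "SomeOtherBot"):
        verdict = "allowed" if can_crawl(agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

In this sketch the two named crawlers are told to stay out of the whole site, while any other agent falls through to the wildcard rule. Compliance depends entirely on the crawler performing a check like this and honoring the result.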
Perplexity had no comment at the time of going to press. ®