In the tech world, innovation moves fast, especially in the burgeoning field of artificial intelligence (AI). The introduction of dozens of generative AI tools in the past two years has left news media publishers dizzy with the dilemma of whether and how to leverage these tools themselves and how to protect their copyrighted work.
Randall Lane, the chief content officer at Forbes Media and editor at Forbes magazine, penned a June 11, 2024, column, “Why Perplexity’s Cynical Theft Represents Everything That Could Go Wrong With AI,” describing a dispute with Perplexity, a major AI developer. E&P followed up with Lane to better understand what happened and to seek his advice for other news media publishers grappling with the copyright-AI conflict. (Editor’s Note: In E&P’s reporting, Perplexity’s chief business officer, Dmitry Shevelenko, responded to some of Forbes’ assertions.)
“A big difference between us and OpenAI is that we don’t train our own foundational model. … We don’t scrape the internet and then train AI on it. We have a web index that includes news articles, and we only use that,” he explained.
E&P: How did the Perplexity story that lifted reporting produced by Forbes journalists first show up on your radar?
Randall Lane: On June 6, Forbes reporters Sarah Emerson and Rich Nieva published an exclusive story on former Google CEO Eric Schmidt’s secretive drone project. We noticed the story was generating a lot of buzz. We were keeping tabs on the pick-up of the story when one of our reporters came across Perplexity’s story with eerily similar wording, some entirely lifted fragments — and even an illustration from one of Forbes’ previous stories on Schmidt. Perplexity’s story didn’t give any credit or mention to Forbes’ article, which it had clearly ripped off. Perplexity then sent this knockoff story to its subscribers via a mobile push notification. It created an AI-generated podcast using the same (Forbes) reporting — without any credit to Forbes. In response, our journalists published an article calling out Perplexity for ripping off content from news outlets, including Forbes.
E&P: The question I’m asked most by news media publishers is, “How can I know for sure that our content is being used to train these AI engines?” Can you offer any best practices to keep tabs on how their content may be used this way, typically without proper attribution and certainly without compensation?
Lane: For media publishers who aren't sure if their content is being taken, I recommend that the tech team closely monitor the crawlers that scrape their content. Those could be identified by either user agents or IP addresses. Also, make sure there is a process in place for handling these when they're discovered.
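(Editor’s Note: As a rough illustration of the monitoring Lane describes, the sketch below counts requests from suspected AI crawlers in a web server access log, grouped by user agent and IP address. It assumes logs in the common "combined" format. The list of user-agent tokens is an illustrative assumption rather than a complete or authoritative list, and the function name is hypothetical.)

```python
import re
from collections import Counter

# Assumed examples of AI crawler user-agent substrings.
# This list is illustrative only; a tech team would maintain its own.
AI_CRAWLER_TOKENS = ["GPTBot", "CCBot", "PerplexityBot", "ClaudeBot"]

# Combined Log Format: IP, identity, user, [timestamp], "request",
# status, size, "referrer", "user agent".
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_ai_crawler_hits(lines):
    """Return a Counter keyed by (crawler token, client IP)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in agent:
                hits[(token, match.group("ip"))] += 1
                break  # count each request once
    return hits
```

A team might run this over each day's log and review any new token and IP pairs. Matching on user agent only catches crawlers that identify themselves, which is why Lane also mentions IP addresses.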
As for Forbes, we have avoided this by indicating to a few crawlers via robots.txt that we do not want them to crawl on our website. If any of them disregarded our policies, we would take measures to block them. However, some AI companies use third-party crawlers that are difficult to attribute. Some companies work on identifying content that the LLMs are trained on, so another way to keep tabs is to periodically prompt for the content that is proprietary to you.
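(Editor’s Note: To make the robots.txt approach concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module to check which crawlers a given robots.txt asks to stay away. The robots.txt content, the crawler names and the URL are hypothetical examples, not Forbes' actual configuration. As Lane notes, robots.txt is only a request, so a crawler that disregards it would still need to be blocked by other means.)

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask one AI crawler to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def disallowed_agents(robots_txt, agents, url):
    """Return the user agents that robots_txt asks not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Usage: check a named crawler and an ordinary client against one article URL.
blocked = disallowed_agents(
    ROBOTS_TXT, ["GPTBot", "ExampleBrowser"], "https://example.com/story"
)
print(blocked)
```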
Additionally, we must find a solution to coexist fairly with all parties involved. AI companies and news publishers must work together rather than against each other. Other AI companies have started to engage in partnerships with news publishers, but that’s very different from duplicating content and proprietary journalism. Overall, coexisting in a new AI-driven world will call for deeper conversations and mutually beneficial agreements among publishers and AI companies.
For now, many of the consequences are the growing pains of a newer industry, AI, and an industry that has been around for quite some time, the media, learning to coexist with each other and keep the journalism model alive.
E&P: Since you published the June 11th column, have you encountered other similar cases of your content appearing in AI-authored articles or search summaries?
Lane: While other AI companies are training their models on our content, the distinction between Perplexity and the other AI models is that Perplexity is not using our content for training. They were essentially taking our content and republishing it almost in its entirety in response to a prompt. The two articles weren't simply being used as sources. I also think that Perplexity was very aware of our stance on what they had done and that we deemed it morally and ethically wrong, so we have yet to see them republish our content again. I don't think that's stopped them from trying the same tactic with other companies. Recently, Condé Nast accused Perplexity of plagiarism and sent the company a cease-and-desist letter. I think we'll see more news publishers taking a stand against Perplexity as it continues to steal proprietary journalism while testing its summarization product throughout the media industry.
E&P: Since that column was published, have you had any other “discussions” or correspondence with Perplexity or any of the AI platforms? If so, what was the substance of those exchanges?
Lane: Our general counsel sent a letter to Perplexity’s CEO demanding that the company remove our content, provide “satisfactory evidence and written assurances” that it has removed the infringing articles and reimburse Forbes for all advertising revenues Perplexity earned via the infringement.
Instead, the company’s CEO took the issue to X and stated that the incident was part of a new product feature that has “rough edges” and is being improved “with more feedback.” So again, they’re experimenting, but when you’re experimenting with a product that’s stealing, that’s more than a rough edge.
E&P: A few months ago, we produced reporting about AI-enabled search and how it promises to disrupt the relationship between search users and news sources. One of the things that our sources suggested is that news media publishers, in anticipation of this impact, ensure that they have better direct-line channels with readers/subscribers so that they are even less reliant on search traffic referrals. Have you had similar conversations at Forbes, and if so, in what ways may you be fortifying or creating new direct relationships with your readers and subscribers?
Lane: At Forbes, our mission is clear — to make our audiences more successful in whatever they care most about. To reach those audiences, we have worked to build incredible franchises and communities, including ForbesBLK, Under 30, Over 50, Top Creators and more, which have allowed us to include our readers and subscribers in ways that are beyond the traditional media story or magazine. It’s also our way of creating new direct relationships with our audiences.
E&P: It seems to me that this is another example of a vicious cycle for publishers, not unlike what we've experienced and continue to contest with platforms like Google and Meta. On the one hand, we rely on them to help us give our journalism legs, whether through search traffic or viral shares. And, we’ve invested over a decade to improve our SEO results and build popular social media channels. And yet, we also provide great value to those tech platforms. People use them, at least to some extent, to find and digest our content. However, only one side of that relationship is being enriched, and news publishers have proposed through legislation and lawsuits that they share that profit. Yet those efforts have been unfruitful to date, and here we are again in a similar situation. What might we have learned from our relationship with search and social about approaching this new realm of AI?
Lane: Like this situation, there were growing pains when Google became the colossus that it is today, but again, there was an equilibrium, probably on the side of Google, but still, there was an equilibrium that helped both sides. We're trying to find that with AI and news publishers now. As our CEO Mike Federle often points out, 2024 has been like 1994 when the first web browser came out. It was the first introduction to the internet for many people, and then the following year, it was integrated into everything and changed a lot of lives. 2023 and 2024 have been significant years for AI, with new AI companies launching. I think it’s set to do the same as the web browser did in 1994. It’s created a lot of opportunities and optimism, but like the web browser, there are also many questions about how it will roll out, the downsides and how it will be applied.
E&P: There is much debate among news media publishers today about how to broadly approach the oncoming AI train. On the one hand, publishers have chosen legal remedies, filing lawsuits that — if successful — would either prohibit the AI tech companies from “stealing” copyrighted materials or compel them to compensate those publishers. The other approach is to partner with these companies and sign licensing agreements (with terms not disclosed to the public). What are your thoughts on these two disparate approaches, and which do you think will ultimately be the most effective and/or lucrative for news media publishers?
Lane: As I said before, while other companies have started to engage in partnerships with news publishers, those are very new to the media landscape and will take time to balance. Those partnerships are also coincidentally on the rise after this incident, and what those will look like remains to be seen. We also must consider the time and value of original journalism. What goes into stories like ours is hard work, from art and design to reporting and editing, and it can cost thousands of dollars to produce, so it's completely unfair when stories are practically stolen.
As for the legal route, we’re beginning to see some potential precedents set with cases like The New York Times suing Microsoft and OpenAI. There isn’t much legal precedent or regulation on the relationship between news publishers and AI, so I think taking the lead on beginning these legal conversations and filing lawsuits is also needed to shape the norms and boundaries of this new dynamic. Regarding what’s most effective and lucrative for news publishers, litigation is not the best answer. When it comes to protecting our IP rights against AI companies (especially the behemoths), publishers have no choice. If the AI companies continue to utilize our content without permission, we have no alternative but to resort to litigation. We’re at a critical inflection point, and how we handle this will have a far-reaching impact on journalism.
Gretchen A. Peck is a contributing editor to Editor & Publisher. She's reported for E&P since 2010 and welcomes comments at gretchenapeck@gmail.com.