News/Media Alliance study finds pervasive unauthorized use of publisher content to power generative AI technologies

Posted Tuesday, October 31, 2023 11:39 am

“The research and analysis we've conducted shows that AI companies and developers are not only engaging in unauthorized copying of our members' content to train their products, but they are using it pervasively and to a greater extent than other sources.”

Danielle Coffey, president and CEO, News/Media Alliance

Press Release | News/Media Alliance

Yesterday, the News/Media Alliance published a White Paper and a technical analysis, and submitted comments to the U.S. Copyright Office, on the use of publisher content to power generative artificial intelligence (GAI) technologies. Together, the three publications document the pervasive, unauthorized use of publisher content by GAI developers, the impact this may have on the sustainability and availability of high-quality original content, and the legal implications of such use. GAI systems have been developed by copying massive amounts of the expressive material published by the Alliance’s members, almost always without authorization or compensation, to create new products and services that frequently compete with Alliance member publishers.

The Alliance recognizes the exciting potential of GAI models and applications to improve aspects of our lives and supports the principled development of these systems. But this development must not come at the expense of publishers and journalists who invest considerable time and resources producing material that keeps our communities informed, safe, and entertained, and holds our government officials and other decision makers in check. The Alliance and its members would welcome working with GAI developers to help build and grow these technologies in a sustainable and responsible manner.

While the Copyright Office submission and White Paper discuss the wider publisher landscape in the face of the GAI revolution, including relevant principles of copyright law, the accompanying technical analysis documents the extent to which GAI developers rely on high-quality journalistic content to power their models. In particular, the results show:

GAI developers have copied and used news, magazine and digital media content to train large language models (LLMs).
Popular curated datasets underlying LLMs significantly overweight publisher content by a factor ranging from over 5 to almost 100 as compared to the generic collection of content that the well-known entity Common Crawl has scraped from the web.
Other studies show that news and digital media ranks third among all categories of sources in Google’s C4 training set, which was used to develop Google’s GAI-powered products like Bard. Half of the top 10 sites represented in the data set are news outlets.
The LLMs also copy and use publisher content in their outputs. The LLMs can reproduce the content on which they were trained, demonstrating that the models retain and can memorize the expressive content of the training works.

Alliance President & CEO Danielle Coffey stated, “The research and analysis we've conducted shows that AI companies and developers are not only engaging in unauthorized copying of our members' content to train their products, but they are using it pervasively and to a greater extent than other sources. This shows they recognize our unique value, and yet most of these developers are not obtaining proper permissions through licensing agreements or compensating publishers for the use of this content. This diminishment of high-quality, human created content harms not only publishers but the sustainability of AI models themselves and the availability of reliable, trustworthy information.”

The Copyright Office comments and the White Paper offer multiple recommendations to policymakers, including recognizing that unauthorized use of publishers' expressive content for commercial GAI training and development is likely to compete with and harm publisher businesses in a manner that infringes copyright; creating transparency requirements to require disclosure of the use of copyright protected content in training; encouraging and facilitating effective licensing solutions; supporting international cooperation and harmonization on GAI regulations; and adopting legislation to remedy existing market imbalances that prevent publishers from engaging in fair negotiations for the use of their content against dominant platforms.

Coffey continued, "Generative AI systems should be held responsible and accountable, just like any other business. This White Paper demonstrates that these systems rely on journalistic and creative content, which have the benefit of investment in quality on the front end, as well as publishers who are required by law to take responsibility for the content they share with the public. Continued unauthorized use will harm existing markets that acknowledge the value of archived and real-time quality content, and over time the GAI models themselves will deteriorate. You get out what you put in. It is critical that our copyright protections are properly enforced and that high standards of quality and accountability are the foundation of these and other new technologies."
