Many Early News Web Sites Are 'Gone Forever'

By: Steve Outing

The history of a new medium is not being documented adequately. Especially for news Web sites that have been operating for a few years now, there often is no historical record of what they looked like and how they were designed. It's a problem for this fledgling medium that is only now being discussed.

Yes, online articles are of course archived electronically, and it's a simple matter to query a database and pull up all the content that ran on a news Web site on a given day in the past. But what's lacking is the context of those historical online articles -- how they were originally presented to the Web reader.

This is a significant problem that needs to be dealt with, according to Shannon Martin and Kathleen Hansen, two academics who are working on a book about digital archiving issues, "Record Newspapers in a Digital Age: From Hot Type to Hot Links," which will be published next spring. For many newspapers that operate Web sites or previously operated online sites on proprietary online services, it's not even possible to look back at what their past online projects looked like. "Most of the early work is simply gone," says Hansen, an associate professor at the University of Minnesota's School of Journalism and Mass Communication.

The authors' survey of the newspaper industry's online ventures confirmed the problem. A media historian who wants to see what Mercury Center, one of the first online newspaper services, looked like in its early days on America Online is simply out of luck. While the occasional screen shot has been saved by a staffer, there was and still is no systematic process for saving Web pages as they appeared on most news Web sites.

The nature of the online medium makes it difficult to document the medium as it evolves. Newspapers in their original presentation are simple enough to store, in bound volumes or on microfilm/microfiche. Television broadcasts are stored on tapes. But how do you store a news Web site such that the context and presentation are saved, not just the text and images? No one has yet figured out an ideal system, and at least in the newspaper world, only a handful of Web sites are systematically saving Web pages as they originally appeared.

The obvious first question is, why should we care? After all, the articles and images themselves are archived. Isn't that enough? The authors say that for several groups, saving the contextual presentation is important:

Online news managers, graphic artists, etc. Foremost, the people who produce news Web sites have an interest in being able to look back at their work, and see how their sites have evolved. Hansen says that in interviews with news site managers, she frequently heard regrets that they had not saved a record of how their sites used to look. It's like the early days of radio or TV broadcasts, when most shows were not taped and thus were lost forever.

Legal community. As the online news medium grows, more and more it will become involved in legal disputes, as does any other news medium. In a libel dispute, for example, context is a crucial part of the legal argument. A libel case can turn out quite differently when a libelous statement is published on the front page of a newspaper with a large headline, versus in a brief story printed on page 65 below the obituaries. Without saving the context of how a libelous story appeared on a news Web site, a libel case can become quite problematic.

Historians. This group cares most about this issue, says Martin, a journalism professor at Rutgers University (New Jersey). Newspapers long have been the historical record of a community, and context of stories is as important to historians as the articles themselves. If an old story ran on the front page of a newspaper, that tells historians something about the era that cannot be known with the context missing. Did an early news Web site visitor see a particular piece of content on the home page of the site, or did it take clicking through three pages before that story was visible? Without the context, the article record only tells half the story.

Martin says that for newspapers, page formatting evolution is a part of the publication's history and is obviously worth recording. Likewise, an old CBS news broadcast tape from the 1960s with Walter Cronkite is not something that the network would throw away in favor of retaining only the text transcripts of the newscasts. Future historians will want to see what the early days of the new Internet medium looked like. "We always recognize late things that we wish we had done," says Martin. "I don't expect all newspapers to (archive all of their Web site pages in original form), but I wish some would, so that we'll have some historical record."

Experience from the field

Rusty Coats, online content manager for the Sacramento (California) Bee, confirms the problem at his site, which periodically goes through a redesign, and old site designs disappear "like a piece of art drawn on an Etch-A-Sketch." He says, "Our designer keeps bugs and dingbats and background images on a variety of Zip and Jaz disks, but there's literally no way to visit the past as it once existed. Recreating the old version (of SacBee.com) is like trying to rebuild the Parthenon from marble pebbles."

In many cases, newspaper online editors have squirreled away paper printouts and electronic copies of some -- but not all -- Web pages that were deemed significant. Todd Engdahl, editor of the Denver Post Online, says he's kept copies from major points in his site's evolution -- such as the addition of new sections and design changes. And Fred Mann, general manager of Philadelphia Online (Philadelphia Inquirer/Daily News), says that much of what his site has done has been stored away, though it's not accessible to the public.

It's only a minority of news sites that have stored their pages online contextually intact. Felix Grabowski, creative director of Detroit News Online, says that the entire site "is pretty much intact online from day one." And the Web site of the Las Vegas (Nevada) Review-Journal has an archive link that allows you to see how the Web site looked on a specified day. "It's scary, but a great example of evolution when compared to today's (Web) edition," says online manager Al Gibes.

Such an approach to storing in perpetuity all of a site's pages can be a filing and storage nightmare, of course. Better suited to storing contextual historical Web pages are sites that are database driven, rather than those that store static HTML pages, says Bill Skeet, chief designer for Knight-Ridder New Media. Knight-Ridder's papers in general do not systematically archive their sites (other than article text and images, of course), but Skeet says this is an issue that the company is starting to think through.

The money problem

Martin and Hansen say that most newspaper librarians realize that this contextual archiving of newspaper Web sites needs to be done. The problem with making headway on the issue is that there's a "disconnect" between librarians and those who run the news Web sites; they often don't work together. Some newspaper Web site managers argue that they are producing an online "service," and not a "publication"; therefore, there's not the need to save a Web page in the same way as a newspaper page is stored.

Also, librarians typically don't hold the purse strings at newspapers, so they have a difficult time convincing management that resources should be committed to solving a problem that is obscure and impacts a small community of historians and scholars.

Money is a major factor -- as in, there's really no financial imperative for a news Web site to save full Web pages. Digitally archived articles accessed by consumers via a surcharged database service represent a real revenue stream; not so with archived whole Web pages. Doing the "right thing" serves Web site staff, historians and scholars, but where's the money?

Martin says that she would like to see some organization or association (such as a libraries or newspaper association) step forward to fund an effort to create a page archiving solution for newspaper Web sites. Currently, there's no good software solution for the tricky task of keeping records of something as ephemeral as Web pages (which can change several times during a single day).

Solving the contextual Web archiving problem would require a financial source and a huge technical and organizational effort. It won't be easy. Hansen suggests that scholarly organizations with an interest in accurate historical Web news archives might join together and make recommendations to publishers, then volunteer to pay modest fees to access archived Web pages -- which would provide publishers with the financial incentive to create the historically correct archives.

Hansen urges newspaper Web sites to consider the issue and try to understand its importance. A good start, she says, would be to record (to CD-ROM or other hard storage) a couple days' worth of pages per month, which at least provides some evidence of a news Web site as it evolves.
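For readers who want to picture what such a modest routine might look like, here is a rough sketch in Python. It is only an illustration under stated assumptions, not a method proposed by Hansen or Martin: the choice of the 1st and 15th as snapshot days, the folder layout, and the function names are all hypothetical. It also stores only a page's HTML; images, style sheets and other files that give a page its look would need to be copied as well.

```python
import datetime as dt
from pathlib import Path
from urllib.request import urlopen

# Assumption for illustration: "a couple days per month" means the 1st and 15th.
SNAPSHOT_DAYS = (1, 15)


def is_snapshot_day(day: dt.date, snapshot_days=SNAPSHOT_DAYS) -> bool:
    """Return True if pages should be recorded on this calendar day."""
    return day.day in snapshot_days


def snapshot_path(root: Path, day: dt.date, page_name: str) -> Path:
    """Build a dated location such as <root>/2000/01/15/index.html."""
    return root / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}" / page_name


def save_snapshot(root: Path, day: dt.date, page_name: str, html: bytes) -> Path:
    """Write the page bytes exactly as received, so the markup is preserved."""
    path = snapshot_path(root, day, page_name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(html)
    return path


def fetch(url: str) -> bytes:
    """Download the raw bytes of one page (HTML only, no linked files)."""
    with urlopen(url) as response:
        return response.read()
```

A site might run something like this once a day, calling `fetch` and `save_snapshot` for its home page and section fronts whenever `is_snapshot_day` is true, and then copy the dated folders to CD-ROM or other long-term storage.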

Contacts: Kathleen Hansen, k-hans@maroon.tc.umn.edu
Shannon Martin, shmartin@scils.rutgers.edu

Steve

This column is written by Steve Outing exclusively for Editor & Publisher Interactive three days a week. News, tips, and other communications may be sent to Mr. Outing at steve@planetarynews.com

The views expressed in the above column do not necessarily represent the views of the Editor & Publisher company