The 'Ultimate' Newspaper Archive

By: Steve Outing

In recent weeks, I've written periodic columns touching on the topic of digital archiving of newspapers and news Web sites. While there are some good news archives out there, I've yet to find the "perfect" one. With that in mind, I'm going to try to synthesize some of the elements I've been discussing recently into an "ideal" or "ultimate" newspaper digital archive solution.

Every paper needs a Web archive

First, it's worth noting that every newspaper needs to have a searchable (paid) archive of its own available on its Web site. This is a potentially lucrative revenue source that should not be overlooked. Ideally, your paper will have a searchable archive of everything that ran in the paper going back at least several years, as well as what ran on the Web site (if there is a difference).

Web archives complement national database services

Some publishers are hesitant to create a paid Web archive service, for fear that it will cannibalize revenues from their participation in national database services like Nexis/Lexis, Dialog, Datatimes and others. Actually, you need to have both. Your own newspaper archive can attract local users who need to search only in your publication. The national services, meanwhile, are used mostly by people who need to search across multiple publications. It's true that a newspaper-specific Web archive can steer some local-market business users away from the national services (which are expensive to use). But the new local-market users you attract (especially students and educators, as well as budget-conscious small-business users) will more than make up for any loss.

Keep pricing low for consumer market

Publishers setting up their own Web archives cannot follow the pricing model of the national database services (Dialog, et al). Those companies are addressing a market that is less price-sensitive. Local-market consumers will not bite when the price is too high. Make it cheap enough that even the high school student researching a term paper will buy a few articles. My recommendation is 50 cents per article downloaded -- which is much lower than what most newspaper Web archives currently charge. Go for high-volume usage on your own Web site by targeting consumers, not businesses. Let the national database services that you participate in sell to the high end (businesses and researchers who can't go to the trouble of searching individual newspaper sites).

Free searches; paid article downloads

Let consumers search your archive for free, but charge them when they want to download an article.

Offer bulk subscriptions to archive

For heavy users of your archive, per-article download fees can become oppressive. Also offer bulk rates, such as 100 articles per month for $30. This will satisfy many small-business users. A $10 per month archive subscription offering 40 articles can appeal to families. Offer unlimited-download education subscriptions, to enable schools to open up the archive service to students and teachers.

Also, you'll need some kind of counter, so that an archive subscription user with 40 articles in a month knows how many she's used up. When the monthly limit is reached, the user should be notified with a screen that indicates options for paying for additional articles before the next month's subscription kicks in. (Don't start charging high per-article fees once a subscription limit has been exceeded -- as some newspaper archives today do. That just annoys your customers.)
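To make the counter idea concrete, here is a minimal sketch in Python. The class and method names are hypothetical, and the 40-article limit is simply the family plan from the example above; a real system would also need to store the count per account and reset it each billing month.

```python
from dataclasses import dataclass


@dataclass
class ArchiveSubscription:
    """Hypothetical monthly plan, e.g. 40 articles for $10."""
    monthly_limit: int
    used: int = 0

    def remaining(self) -> int:
        """Articles the subscriber can still download this month."""
        return max(self.monthly_limit - self.used, 0)

    def download(self) -> str:
        """Count one download, or report that the limit has been reached."""
        if self.used >= self.monthly_limit:
            # Don't quietly bill a higher per-article fee here; the caller
            # should show a screen with options for buying more articles.
            return "limit_reached"
        self.used += 1
        return "ok"

    def start_new_month(self) -> None:
        """Reset the counter when the next month's subscription begins."""
        self.used = 0


family_plan = ArchiveSubscription(monthly_limit=40)
for _ in range(40):
    family_plan.download()
status = family_plan.download()  # the 41st request in the same month
```

The "limit_reached" status is the point at which the user would see the options screen, rather than a surprise charge.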

Offer enough information in search results

It's crucial that a Web archive user get enough information from a free search to know whether it's worth it to him to download (and pay for) the full article. Search results should include: Headline; author; date originally published; length in number of words; and at least the first paragraph of the story.

Variable pricing based on story length

One annoyance of archive searching is that downloading a 50-word article costs the same as downloading a full-length one. For articles of fewer than 100 words, you might charge half the price. (This needs to be indicated next to the search results.) Of course, this won't apply to your subscription option customers.
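The length-based rule is simple enough to sketch in a few lines of Python. This assumes the 50-cent base price recommended earlier and the 100-word cutoff suggested here; the function and constant names are made up for illustration, and subscribers are assumed to pay nothing per article because downloads count against their plan instead.

```python
from decimal import Decimal

BASE_PRICE = Decimal("0.50")           # the 50-cent per-article price
SHORT_ARTICLE_WORDS = 100              # articles under this length are "short"
SHORT_ARTICLE_FACTOR = Decimal("0.5")  # short articles cost half the price


def article_price(word_count: int, has_subscription: bool = False) -> Decimal:
    """Per-download charge for one article (hypothetical pricing rule)."""
    if has_subscription:
        # Subscribers draw on their monthly allotment instead of paying here.
        return Decimal("0.00")
    if word_count < SHORT_ARTICLE_WORDS:
        return BASE_PRICE * SHORT_ARTICLE_FACTOR
    return BASE_PRICE
```

Whatever price this produces is the figure that should be shown next to each search result, so the user knows the cost before clicking.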

Offer variable ranking of search results

Create a system that allows a consumer to control how search results are displayed. Some of the best systems give the option of date order (forward or reverse), relevancy ranking, or frequency of searched terms as they show up in articles.
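As a rough illustration, the sort options above could be wired up as follows in Python. The record fields follow the search-result items listed earlier (headline, author, date, word count, first paragraph); the relevance score and term count are assumed to come from whatever search engine you use, and all names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SearchResult:
    headline: str
    author: str
    published: date
    word_count: int
    first_paragraph: str
    relevance: float  # assumed score supplied by the search engine
    term_hits: int    # how often the searched terms appear in the article


# Each option maps to (sort key, whether to sort in descending order).
SORT_OPTIONS = {
    "date_forward": (lambda r: r.published, False),
    "date_reverse": (lambda r: r.published, True),
    "relevance": (lambda r: r.relevance, True),
    "frequency": (lambda r: r.term_hits, True),
}


def rank(results, order="relevance"):
    """Return a new list of results in the order the user picked."""
    key, descending = SORT_OPTIONS[order]
    return sorted(results, key=key, reverse=descending)
```

Keeping the options in one table makes it easy to add another ordering later without touching the search code itself.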

What to offer for free

I don't believe that all archived material should be charged for. I advocate that news sites keep material that appears on their Web site free to access for at least 30 days -- if not 60. Only after that should the content be converted to paid content.

Note that not all content of a newspaper appears on its Web site, so some current print-only content can be tagged in the Web archive as being charged. Thus, when an archive user searches for an article that ran in the newspaper 5 days ago, if that story did not run on the Web site, there will be a charge to pull it out of the database.

Use permanent URLs

I am a strong advocate of keeping URLs (Web addresses) for all stories active "forever." By using a permanent URL, people who learn about a story from any means will always be able to access it. That means search engines won't generate results that point to dead links (because a news site has removed a Web story and sent it off to a separate digital archive). Links to a particular story on a news Web site placed by individuals, groups and businesses -- perhaps left on the Web for years -- will continue to work. You will always provide future readers of your articles -- no matter how they were referred to you -- access to an article.

This approach doesn't have to conflict with your goal of having a paid archive available as a Web service. Yes, it's tricky, but the ideal solution is to integrate your free Web pages with your paid archive system. If you choose to keep content free on the Web for 30 days, for example, then on day 31 that same content becomes paid. Say a Web surfer stumbles on a link to a newspaper article that's 62 days old (perhaps on an unrelated Web site). He clicks on the link and is presented with an informational screen explaining that the article he's looking for has been archived, and thus there is a small charge to extract it from the archive and read it. The surfer is invited to enter his newspaper archive account number (should he have one already); sign up for a subscription; or enter his credit card number to purchase the story. Then he sees the story he was looking for.
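Here is one way the decision behind a permanent URL might look, sketched in Python. It assumes the 30-day free window used in the example and the earlier rule that print-only stories are charged from day one. The function names and the two screen labels are hypothetical, and "has_paid" stands in for any of the three options: an account, a subscription or a credit card purchase.

```python
from datetime import date

FREE_WINDOW_DAYS = 30  # the example window; a site might choose 60 instead


def is_free(published: date, ran_on_web: bool, today: date) -> bool:
    """Web stories are free inside the window; print-only stories never are."""
    if not ran_on_web:
        return False
    return (today - published).days <= FREE_WINDOW_DAYS


def resolve_story_url(published: date, ran_on_web: bool, today: date,
                      has_paid: bool = False) -> str:
    """Decide which screen the story's permanent URL should return."""
    if is_free(published, ran_on_web, today) or has_paid:
        return "article"
    # The URL still works; it just leads to the informational screen first.
    return "payment_options"
```

The key point is that the URL itself never changes or dies; only the screen it returns depends on the story's age and on whether the reader has paid.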

Include all print content in archives

Web newspaper archives sometimes are restricted in the content that is archived. Some wire services don't want their content archived. Freelancers' work sometimes can't be archived, due to contract restrictions. Syndicated content often has restrictions. Since we're projecting an "ideal" archive system, let's dream a little and suggest that you renegotiate these contracts to allow all content to be included in your electronic archive. At the least, when you negotiate new contracts, make sure they include language that permits you to archive digitally.

Archive photos as a 'reprint' revenue stream

The ideal archive also will have a search feature allowing users to search for photographs that ran on the Web site or in the newspaper. This may be a separate search interface, or it can be integrated into the main search screens. (e.g., choose Text or Photograph search.) The same principles as above apply. Let archive users search for free, and return "thumbnails" and captions as search results. You can charge a per-download fee (say, $5), or offer bulk rate or unlimited subscription plans. This also can generate hard-copy reprint revenues if you offer as an option a reprint service, in which you send a photographic print of the selected picture to the user's postal address for a higher fee.

Contextual Web archiving

As discussed in a recent column, it can be worthwhile to store not only the text and images that make up a news Web site, but also the page presentation of the day itself. Ideally, a Web visitor should be able to see exactly how a Web site looked on a particular day. If she wants to see what the Web site looked like two years ago, the presentation of that day appears -- which might be using an old site design. This is useful in showing the evolution of your Web "publication."

The principles stated above can continue to apply. For Web pages as recent as 30 days (to continue using the example above), the user is not charged to see the site as it looked that day. For a date older than 30 days, institute a charge to see the Web pages for that day.

Finally, should you implement a system like this (which will delight new media historians and scholars), think about what advertising strategy you want to employ. You can simply recreate your site for a particular day, running the advertisements that ran that day. Better, in my view, is to substitute today's Web ads -- which are still bringing you revenue -- in place of the old. This of course requires that you have kept using the same size ad placements throughout your site's history.

What else?

Have I forgotten anything in our "ultimate" Web news archival system? Let me know by sending e-mail to

More on Web news historical archives

In a column last week, I wrote about the need for news Web sites to keep historical archives of what their pages looked like -- for the benefit of media historians and scholars. Paul Grabowicz, visiting fellow in new media at the University of California at Berkeley's Graduate School of Journalism, is thinking along those lines, and wants to hear from news sites that may have stored away old copies of their Web pages from earlier days. The journalism school may establish a new media historical archive, he says.

Contact: Paul Grabowicz


This column is written by Steve Outing exclusively for Editor & Publisher Interactive three days a week. News, tips, and other communications may be sent to Mr. Outing at

The views expressed in the above column do not necessarily represent the views of the Editor & Publisher company

