By: Steve Outing
The practice of online services like America Online “caching” content from Web sites is not well liked by many publishers. But while it may be a necessary “evil” that we’ll have to learn to live with, some publishers are beginning to put up a fight over the practice.
New media executives at the Financial Times (London) report that they are consulting in-house legal staff and are about to request that AOL cease caching pages of its Web site, which is advertiser supported. The implication is that the venerable financial newspaper might take legal action if AOL doesn’t comply, citing lost revenues due to the undercounting of actual viewers visiting its Web site as a result of AOL’s caching techniques.
What’s so bad (or good) about caching? This practice has been going on for some time, with major online service providers like AOL setting up “proxy servers” on their networks which save Web pages for re-use by other AOL users cruising the World Wide Web. Here’s how proxy servers work:
When an AOL user requests to go to a particular Web site, the request first goes through an AOL proxy server which has stored a large number of copies of Web pages that other users have requested previously. If the requested page already is stored on the proxy, and the page hasn’t yet “expired,” then the AOL user gets sent the cached page, instead of the original page from the remote Web site. The publisher’s Web site, then, does not record a direct page view because the proxy has “intercepted” the request.
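The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not AOL's actual software: the class name `CachingProxy`, the `fetch_from_origin` callback, and the idea that the origin returns a lifetime in seconds alongside the page are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CachedPage:
    body: str
    expires_at: float  # absolute time (seconds) after which the copy is stale


class CachingProxy:
    """Hypothetical proxy: serves a stored copy until it expires."""

    def __init__(self,
                 fetch_from_origin: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.time) -> None:
        # fetch_from_origin(url) -> (page body, lifetime in seconds); assumed shape
        self._fetch = fetch_from_origin
        self._clock = clock
        self._cache: Dict[str, CachedPage] = {}

    def get(self, url: str) -> str:
        entry = self._cache.get(url)
        if entry is not None and self._clock() < entry.expires_at:
            # Cache hit: the publisher's server never sees this request.
            return entry.body
        # Cache miss or expired copy: go out to the publisher's site.
        body, lifetime = self._fetch(url)
        self._cache[url] = CachedPage(body, self._clock() + lifetime)
        return body


if __name__ == "__main__":
    origin_hits = []

    def origin(url: str) -> Tuple[str, float]:
        origin_hits.append(url)  # what the publisher's log would record
        return "<html>front page</html>", 3600.0

    proxy = CachingProxy(origin)
    for _ in range(3):
        proxy.get("/index.html")
    print("reader requests: 3, origin page views:", len(origin_hits))
```

In this sketch, three readers ask for the same page but the origin callback runs only once, which is the undercounting publishers are worried about.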
That sounds bad, of course, from the publisher’s point of view. The publisher loses visits that would otherwise register as page views and hits on its own server, and those counts are of great value to any Web publisher, especially one selling online advertising based on traffic to the site. (But it’s not an entirely awful situation, as I’ll explain a little later on.)
Here’s the good side of proxy servers and caching: For an extremely busy provider like America Online, caching pages is necessary for its users to get a reasonably fast online experience on the Web. When an AOL user requests a page that is held in an AOL proxy’s cache, the page is served up relatively quickly. When a page is not cached, the request must go out on the greater Internet, and in the case of AOL, the performance will be fairly slow. (NetGuide columnist Robert Seidman has tested Web access performance via AOL against some Internet service providers, and found that without the cached pages, AOL’s performance at serving up new Web pages is much worse than what the ISPs deliver.)
This is good for AOL, but it also can be argued that caching is good for Web sites, because AOL users cruising the Web will experience fewer delays when viewing a Web site. AOL also is benefiting the Internet as a whole, by reducing the congestion that its 7 million users would otherwise cause. Indeed, a large number of AOL users all hitting a popular Web site could overload the site’s server.
Is this problem overblown?
In reporting this column — and trying to understand what is an extremely complex issue — I found a split within the publishing community on how significant a problem caching really is. At the FT, editor of electronic publishing Paul Maidment says that AOL caching his site is trampling on the paper’s intellectual property rights. “The Financial Times has always been vigorous and vigilant in defense of its trademarks and copyright,” he says. “That is our policy in electronic media as well as print. … The emerging consensus of Internet lawyers is that caching is an infringement of copyright and that it hurts the revenue of an advertising-supported site by reducing the number of hits that a site gets. There is further damage in that the site is kept at an artificial arm’s length from its users.”
But the technology director of a large U.S. newspaper, who asked not to be quoted, says he thinks the issue is overblown. He says that with his site, he gets an accurate count of people who visit via AOL even when they receive a page that’s been cached on an AOL proxy server. That’s because AOL’s proxies appear to be “well behaved” and send a notification message to his site each time an AOL user picks up an AOL-cached page that comes from his site. The solution to deal with AOL’s caching technique, he says, has been to log these notifications and count them as page views.
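A rough sketch of that counting approach follows. The column does not say what form the notification takes, so the example assumes, purely for illustration, that it appears in the server log as a request answered with HTTP status 304 (Not Modified), while a full page delivery is logged with status 200. The `LogEntry` record and `count_page_views` function are hypothetical names.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class LogEntry(NamedTuple):
    host: str    # requesting machine, e.g. a proxy's hostname
    path: str    # page requested
    status: int  # HTTP status the site's server returned


# Assumption: 200 = page sent in full, 304 = proxy checked in and then
# served its cached copy. Both are treated as one page view.
COUNTED_STATUSES = (200, 304)


def count_page_views(entries: Iterable[LogEntry]) -> Counter:
    """Tally page views per path, counting proxy notifications as views."""
    views: Counter = Counter()
    for entry in entries:
        if entry.status in COUNTED_STATUSES:
            views[entry.path] += 1
    return views


if __name__ == "__main__":
    sample_log = [
        LogEntry("proxy.example.net", "/index.html", 200),
        LogEntry("proxy.example.net", "/index.html", 304),
        LogEntry("proxy.example.net", "/index.html", 304),
        LogEntry("dialup.example.org", "/sports.html", 200),
        LogEntry("dialup.example.org", "/missing.html", 404),
    ]
    for path, total in sorted(count_page_views(sample_log).items()):
        print(path, total)
```

Note that this only works if the proxy really does check in on every cached delivery; a proxy that stays silent leaves nothing in the log to count.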
This isn’t an ideal solution, of course. Some sites insist on counting “hits” to ad banners on a page, data that is then given to advertisers. The technique just cited records ad banner views on a cached page only indirectly: it infers that the banner was seen from evidence that an AOL user picked up the page, and there’s no direct record of a reader actually viewing the ad banner. Ad networks that serve up ads to multiple Web sites, and Web measurement companies, all want precise counts of when an ad banner has been served up to a reader, so there is pressure on publishers from these companies to avoid being cached.
Also, it should be noted that AOL isn’t the only online access provider running proxy servers. AOL’s proxies seem to be “well behaved,” and AOL’s director for product marketing, Dominick Stirpe, says that his group is constantly revising its proxy servers to reflect the fast-changing Web environment, and works to keep publishers satisfied with their sites being cached by AOL. Other proxies may not be so well behaved, so it’s likely that sites are losing counts of readers who see cached pages from poorly configured proxy servers.
The solution for the Financial Times will probably be a simple one. Stirpe says that when a Web site requests that AOL cease caching its pages, AOL will comply. Bill Densmore, president of Newshare, an online transactions developer working with the publishing industry, says he sent AOL a letter last year demanding that his site not be cached and threatening legal action if AOL didn’t comply. AOL immediately stopped caching the Newshare site.
The other solution, according to Stirpe, is to set an instant expiration date on a site’s Web pages, which forces AOL’s proxy servers to get a new page every time an AOL user asks for a page from the site. A site that updates its content frequently, such as a stock ticker service, will want to do this, for example, so that AOL users (indeed, all users) are assured of getting the most recent content from the site.
But the instant expiration solution, like being removed from AOL’s cache list, has a major downside: AOL users will experience the site as being very slow. Worse yet, an instant or very short expiration date will make the site slower for all users, not just AOL users. On a page with an instant expiration, each time a Web surfer goes to the page it will be loaded from the server, even if a recent copy is still in the user’s local browser cache. That’s why many news sites, which update a few times a day, set their Web pages to expire after several hours rather than instantly.
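To make the trade-off concrete, here is a small sketch of how a site might compute the HTTP `Expires` header for a page. The function name `expires_header` and the two lifetimes shown are illustrative assumptions; how a given Web server is actually told to send the header depends on the server software.

```python
from email.utils import formatdate
from typing import Tuple


def expires_header(now: float, lifetime_seconds: float) -> Tuple[str, str]:
    """Build an Expires header for a page served at time `now`.

    A lifetime of 0 means "already expired": proxies and browsers must
    come back to the site for every view. A lifetime of several hours
    lets them reuse their stored copy in the meantime.
    """
    # formatdate(..., usegmt=True) produces the HTTP date format,
    # e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    return ("Expires", formatdate(now + lifetime_seconds, usegmt=True))


if __name__ == "__main__":
    import time

    served_at = time.time()
    # Stock-ticker style page: expire instantly (hypothetical choice).
    print("%s: %s" % expires_header(served_at, 0))
    # News page updated a few times a day: expire after four hours.
    print("%s: %s" % expires_header(served_at, 4 * 3600))
```

The first header gives the publisher a request on every view at the cost of speed; the second gives readers a faster site at the cost of some requests never reaching the publisher’s server.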
There may not be an ideal solution for publishers to the cache problem. If you’re concerned about losing traffic to services like AOL that cache your pages, the best advice may be to request — or demand — that your pages not be cached on a service’s proxy servers. Unless you are constantly updating your content, continue to set your expiration dates for pages at an hour or more, so that you don’t degrade the experience that your Web readers will have at your site. For AOL users, your site will appear much slower than others, but that’s a compromise you’ll have to live with.
The long term view
Proxy servers can’t cache dynamically created Web pages, only static ones. So for sites like Star Tribune Online, the Web service of the Star Tribune in Minneapolis, Minnesota, on which most pages are generated on the fly by CGI scripts, caching by AOL isn’t a problem. Editor/manager Steve Yelvington points out that as Web sites become more sophisticated and static Web pages become a thing of the past, caching won’t be an issue because it won’t be possible.
The caching issue won’t go away soon, however. In reporting this column, I spoke with a number of Web publishers, advertising people and lawyers who are very much concerned by the online services’ caching policies. The Internet intellectual property lawyers I spoke with were split on whether copyright and “fair use” laws permit caching at all. One told me that AOL’s caching of Web pages (in effect, copying them onto its servers without getting permission from the publishers) is technically copyright infringement, and that the courts have yet to rule on the issue. Another predicts that challenges to caching won’t fly on the infringement argument, but that publishers might prevail on the economic loss argument.
This is an issue that will probably end up in the courts someday soon. Stay tuned.
This column is written by Steve Outing exclusively for Editor & Publisher Interactive three days a week. News, tips, and other communications may be sent to Mr. Outing at email@example.com
The views expressed in the above column do not necessarily represent the views of the Editor & Publisher company