'The Times' Has a Quarrel With News Search Engine

By: Steve Outing

It's happening more and more. News publishers that operate Web sites and offer their content to consumers for free are disputing the right of other commercial entities to use that online content for commercial gain, even for something as seemingly benign as providing links to a news publisher's content as a service.

Such is the case with News Index, an American news-only Web search engine service, which keeps track of and indexes current content on news Web sites around the world. Recently, News Index received a complaint from one of those news sites -- The Times and Sunday Times, of London, owned by News International Newspapers Ltd.

News Index's founder, Sean Peck, received a "cease and desist" letter via e-mail from Dominic Young, copyright manager for News International. Alluding to copyright infringement, the letter asked that Peck stop including The Times' site in his service and stop sending his "spider" to catalog it. The two parties are currently in discussion, and both sides say they hope not to escalate this into a legal conflict. Nevertheless, the threat remains that it could.

Aggregating news Web sites

News Index is a tiny operation run by Peck in Pittsburgh, Pennsylvania. He started the news site archival service 20 months ago, and only recently has he moved to turn it into a commercial service and seek additional investor funding. The service is free to consumers, who can search on a topic and see matching articles from around 200 news Web sites around the world. It is supported by advertising, and Peck also supports his enterprise by licensing his search engine technology to other companies.

Typically, Peck does not ask for permission before starting to index a news Web site. (He did in News Index's early days, but stopped when the growing number of sites made it too cumbersome.) News Index keeps track of current news content on news sites that offer their articles for free. For sites that require users to register before viewing articles, News Index can still catalog the content, because the service's spiders (the programs that visit news sites and index articles) are programmed with a user log-in that grants access.

In general, what News Index does is very similar to what other search engines do, both news-specific (such as Excite's NewsTracker, HotWired's Newsbot, or New Century Network's NewsTracker) and general-interest (such as Alta Vista, Infoseek, Excite, etc.). When a News Index user searches for one or more keywords, the search actually runs against the index residing on Peck's server, which his spiders build by regularly polling news Web sites and retrieving their documents. When the user clicks a hypertext link on a News Index search results page to request a particular story, he is taken directly to the article as it appears on the news site's server. News Index does not insert any of its own branding into a news article; it simply refers the user to the publisher's page.
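
To make the distinction concrete: the search runs against Peck's own index, not against the publishers' sites. The Python sketch below is a minimal illustration of that general idea (an inverted index mapping words to article URLs), not News Index's actual software. The class and method names are hypothetical, and the article text is passed in directly rather than fetched by a spider.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NewsIndex:
    """Toy inverted index: maps each word to the article URLs that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add_article(self, url: str, text: str) -> None:
        # In a real service, `text` would come from a spider fetching `url`;
        # in this sketch the caller supplies it directly.
        for word in tokenize(text):
            self.postings[word].add(url)

    def search(self, query: str) -> list[str]:
        # The query runs only against the local index. Results are URLs that
        # point back to the publisher's own pages; no article text is served.
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)
```

A search such as `index.search("election turnout")` returns only the URLs of articles containing both words, so following a result takes the reader to the publisher's server.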


News International's Young says that his company has several objections to what News Index is doing. One of the "main objections" is that links from News Index bypass the Times' Web registration process. "Each registration number is intended for the use of one individual; the way News Index is linking to our stories undermines this," he says. In other words, it mucks up the Times' attempts to track user patterns.

Peck explains that because of the kind of registration process the Times site uses, he must embed a single registration name and password in every Times article URL that a News Index user may come across: he registered with the Times site once, and that log-in code and password are incorporated into all Times URLs. Without this, Peck says, News Index users would not be able to see Times articles. Instead, a link to a Times story would ask the user to set up a free Times Web site account, and after doing so, the user would land on the Times home page rather than the article being sought.
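
A rough sketch of the shared log-in approach, assuming the credentials travel as query parameters in the URL. The Times' actual URL scheme is not described here, so the parameter names `user` and `pw`, the credential values, and the `times.example` domain are all hypothetical. What it illustrates is Young's objection: every link carries the same registration, so every News Index reader looks like the same Times user.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: one registration shared by every News Index reader.
SHARED_LOGIN = [("user", "newsindex"), ("pw", "secret")]


def with_shared_login(article_url: str) -> str:
    """Append the single shared log-in to an article URL as query parameters."""
    parts = urlsplit(article_url)
    query = parse_qsl(parts.query) + SHARED_LOGIN
    return urlunsplit(parts._replace(query=urlencode(query)))


def registration_id(url: str) -> str:
    """The registration the publisher would see for whoever follows this link."""
    return parse_qs(urlsplit(url).query)["user"][0]
```

Two different readers following two different stories produce links with the same `user` value, which is why the publisher cannot tell them apart.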

Peck says he hopes to work through this problem with The Times. He thinks that if the site used a "more intelligent" user authentication process, News Index would not have to catalog it using a single log-in code for every News Index user. A method more commonly used by news Web sites is HTTP authentication: when a Web user clicks on a story link, a dialog box pops up asking for a user name and password.
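
For comparison, here is a minimal sketch of HTTP Basic authentication, one common form of the HTTP authentication Peck mentions. The server answers a request that lacks valid credentials with status 401 and a `WWW-Authenticate` header, which is what makes the browser pop up its password dialog; the browser then resends the request with an `Authorization` header carrying the base64-encoded user name and password. The realm name and the in-memory account table are hypothetical, and a production server would also need encryption and constant-time password comparison.

```python
import base64

REALM = "Archive"  # hypothetical realm name shown in the browser's dialog


def basic_auth_header(user: str, password: str) -> str:
    """Authorization header value a browser sends once the reader fills in the dialog."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def check_request(headers: dict[str, str], accounts: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Return (status, response headers) for a request for a protected story.

    A 401 response carrying WWW-Authenticate is what prompts the browser
    to pop up its user name/password dialog.
    """
    challenge = (401, {"WWW-Authenticate": f'Basic realm="{REALM}"'})
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic" or not token:
        return challenge
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return challenge
    user, _, password = decoded.partition(":")
    # Plain comparison for brevity; a real server should compare in constant time.
    if user not in accounts or accounts[user] != password:
        return challenge
    return 200, {}
```

Because each reader types in his own credentials, the publisher sees one registration per person rather than one shared log-in.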

Another Times complaint is that "the links from News Index also bring articles up outside the frames which would otherwise surround them had a user entered our site via the main front page," says Young. "This is a problem because our identity, navigation and logo are contained within the surrounding frames and are not shown to visitors from News Index. Effectively, the design of our pages is altered."

This is similar to the argument made by a group of U.S. publishers who challenged TotalNews, a news Web site aggregator that used to display news site pages within a TotalNews-branded frame. (TotalNews stopped the practice, and the publishers' lawsuit was settled out of court, so no legal precedent was set.)

Peck says that his is not the only search engine that will pull up a Times article without the full frame set the Times intends. This is a shortcoming of the Times site, he says, and one that can be fixed by adding a few lines of JavaScript to Times Web pages. Peck has even sent Young a sample of the code. If the Times used it, he says, the script would detect when an article was being loaded without its surrounding frames and then redraw the page with the correct presentation. Peck says that the FoxNews Web site had a similar problem, which it resolved with such a script.

Control of presentation

Another of Young's concerns is that "the articles brought up in response to a search seem to be unsorted for relevance, which can cause some quirky responses. For instance, the big story of the day can end up lower in the list than, say, an obituary which mentions in passing the same search terms," says Young. "If this happens it could create a bad impression of the way The Times covers stories. We want to ensure that users enter our site via the main front pages, so that the presentation of our material is the one we have constructed, not the one someone else's search engine has constructed."

"I can understand why they might want to do that," says Peck, "but I don't think they have much legal ground to stand on." With any search engine, he says, the results and ordering of the results are pretty much out of the control of the search engine itself; they depend on the particular way that a user enters a search string. Peck defends the way that his algorithms determine story ranking and says he doesn't plan to change them. Newest articles are always up top, he says, and ranking is based partly on frequency of the search term showing up in a document.

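A sketch of the ordering Peck describes, under stated assumptions: his exact weighting is not public, so this version simply sorts matching articles by publication date, newest first, and within the same date by how often the search term appears. The `Article` type and the sample stories are hypothetical. It also shows how Young's scenario can arise: a passing mention in a newer obituary outranks a story that mentions the term repeatedly but ran a day earlier.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    url: str
    published: date
    text: str


def term_count(article: Article, term: str) -> int:
    """Number of whole-word, case-insensitive occurrences of term in the article."""
    return re.findall(r"[a-z0-9]+", article.text.lower()).count(term.lower())


def rank(articles: list[Article], term: str) -> list[Article]:
    """Matching articles, newest publication date first, then by term count."""
    matches = [a for a in articles if term_count(a, term) > 0]
    return sorted(matches, key=lambda a: (a.published, term_count(a, term)), reverse=True)


if __name__ == "__main__":
    stories = [
        Article("big-story", date(1997, 5, 1), "Election results: the election drew record election turnout."),
        Article("obituary", date(1997, 5, 2), "Obituary: she once worked on an election campaign."),
        Article("sports", date(1997, 5, 2), "Sports roundup."),
    ]
    print([a.url for a in rank(stories, "election")])  # ['obituary', 'big-story']
```

Term frequency only decides among articles from the same day, so a user's choice of search words, more than the engine, determines which stories surface near the top.
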
Young believes that his company's objections "are backed up by a number of legal issues," although he emphasizes that he doesn't wish to "go down the legal route." He is concerned that the indexing of the Times site by News Index without the Times' permission may be a copyright violation. "The indexing of a substantial proportion of our site, even in the form of abstracts, is an infringement of copyright and not permitted by English fair dealing rules," Young says.

Peck bristles at the notion that indexing a site is a copyright violation. He says that his system does not do full-text indexing, though other search engines do; he avoided it specifically to sidestep copyright problems. The index on Peck's server stores about 10% of a given article, he says, because the software analyzes each Web page and stores in the index only the words it judges most relevant to the story. Those words are what a News Index search actually runs against. The article abstracts shown on a News Index search results screen (the first few lines of a story) are typically 200-300 characters long. Peck says he has been advised that his procedures are well within the acceptable limits of "fair use" under U.S. copyright law.
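
Peck does not say how his software decides which words are most relevant, so the sketch below substitutes a deliberately simple stand-in: after dropping a small, hypothetical list of common words, it keeps the most frequent words up to a budget of about 10% of the article's word count. It also cuts the abstract at a word boundary near 250 characters, inside the 200-300 range he cites. It illustrates the idea of storing only a fraction of the text, not his actual algorithm.

```python
import re
from collections import Counter

# Hypothetical stand-in for "most relevant": frequent words that are not stop words.
STOP_WORDS = {"a", "an", "and", "as", "at", "by", "for", "in", "is",
              "it", "of", "on", "that", "the", "to", "was", "with"}


def index_terms(text: str, fraction: float = 0.10) -> list[str]:
    """Keep about `fraction` of the article's word count as index terms."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    budget = max(1, round(len(words) * fraction))
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(budget)]


def abstract(text: str, limit: int = 250) -> str:
    """Opening of the story, trimmed at a word boundary within `limit` characters (plus an ellipsis)."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit)
    return text[: cut if cut > 0 else limit] + "..."
```

A 20-word story, for example, yields just two index terms, and only those terms plus the short abstract would be stored.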

Whose law?

Where this gets interesting is in determining which country's copyright laws apply: those of the U.S. or of England. Sam Byassee, a Raleigh, North Carolina-based attorney specializing in U.S. copyright law, says that if The Times wanted to pursue a copyright infringement lawsuit against News Index (assuming that Peck didn't stop indexing the Times site), it probably would have to file suit in the U.S. Under the Berne Convention rules on copyright, the applicable law would be that of the country where the case is heard; that is, American law. Byassee thinks that under U.S. law, The Times would have a tough time supporting its case that News Index is violating copyright, because on the surface News Index appears to be operating within U.S. "fair use" guidelines.

Should this disagreement ever become a real legal dispute, and should The Times get its way, it could have profound implications for the search engine industry. At this point, that seems unlikely.

Perhaps the main issue in this dispute is whether a news publisher on the Web has the right to restrict which outside companies can use the publisher's (free-access) content for commercial gain. A search engine like Infoseek may follow procedures similar to those of a site like News Index, yet Infoseek is allowed to index the news site while News Index is not. Assuming that neither company has a formal business relationship with the publisher, is it fair for the publisher to give implied consent to one company but turn down another? The answer to that one may have to wait for a future lawsuit.

Of course, legal cases are expensive, and most publishers would probably prefer to avoid them. Should the two parties fail to reach a compromise that allows continued indexing of the Times site, the simplest solution is for The Times to block access by News Index's spider. Peck says that before the Times' complaint, only one other publisher had objected to what News Index does, and that news site blocked his spider.
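
The column does not say how that other site shut out the spider. One widely used, voluntary mechanism is a `robots.txt` file at the root of a Web site, which well-behaved spiders check before fetching pages; a server can also simply refuse the spider's requests. The sketch below uses Python's standard `urllib.robotparser` module to show how such a file could bar one spider while leaving other crawlers alone. The user-agent name `NewsIndexBot` and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: bar one spider ("NewsIndexBot" is an invented
# user-agent name) and allow every other crawler everywhere.
ROBOTS_TXT = """\
User-agent: NewsIndexBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """What a well-behaved spider checks before requesting a page."""
    return parser.can_fetch(user_agent, url)
```

Compliance is up to the spider, so this works only against crawlers that honor the file, as the major search engines do.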

For his part, Peck says he's willing to stop indexing the Times site if he and Young can't come to an agreement -- though he would prefer to keep it as a service to News Index users, and as a way to send additional traffic to the Times site.

The 'ultimate' news archive, redux

In my last column, I made some suggestions for creating the ideal Web newspaper archive system. Dana Blankenhorn, publisher of the newsletter "A Clue ... to Internet Commerce," wrote in with an additional suggestion:

"Here's something you forgot: .pdf. It's simple to scan old print articles into .pdf files, which replicate the original faithfully. Why? If you can get these .pdf files admissible in court in some way, you can now offer something valuable to lawyers (and anyone else with a case). Newspaper stories are often sought out by those involved in legal cases, and you want to take that old 'morgue' business online if you can. It's also great for sending clips to grandma. ... That's the kind of thing you can earn money from."


In my "ultimate archive" column, one of my suggestions was to rewrite contracts to allow archiving of freelancers' work. Reader Anita Bartholomew wrote in to point out that my wording could be taken to imply that I advocated publishers grabbing freelance writers' work in order to profit from it. She writes:

"Just to make your recommendations idiot-proof (because a few of your readers out there may assume you're advocating they swipe someone else's property through strong-arm tactics and then try to re-sell it), you might note in a follow-up that, when archivers negotiate with freelancers to allow all content to be included, they will have to pay the freelancer a license fee for that content each time it is accessed.

"While I'm certain this is what you meant to imply, since you've told archivers they can expect to be paid when that same content, the freelancers' property, is downloaded from the database, it would be easier for all if you stated this clearly."

Bartholomew is right; indeed, in the past I have written columns advocating that freelancers be paid for giving digital archive rights to their publishers. My only quibble with her statement is that a license fee can also take the form of an X% increase in a freelancer's fees. Paying every freelancer a per-access archive fee can be an accounting nightmare, unless an adequate automated accounting system can be built into the digital archive structure.
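
As a rough sketch of what such automated accounting involves, assuming a flat per-access fee and a table mapping each archived article to its freelance author (both hypothetical), the archive only has to count downloads per article and total them per writer:

```python
from collections import Counter
from decimal import Decimal


class ArchiveLedger:
    """Count archive downloads and total the per-access fees owed to freelancers."""

    def __init__(self, fee_per_access: Decimal, authors: dict[str, str]) -> None:
        self.fee = fee_per_access
        self.authors = authors  # article ID -> freelance author (hypothetical data)
        self.accesses: Counter = Counter()

    def record_access(self, article_id: str) -> None:
        # Articles with no freelance author on file (e.g. staff work) accrue nothing.
        if article_id in self.authors:
            self.accesses[article_id] += 1

    def payouts(self) -> dict[str, Decimal]:
        """Amount owed to each freelancer for the downloads recorded so far."""
        owed: dict[str, Decimal] = {}
        for article_id, count in self.accesses.items():
            author = self.authors[article_id]
            owed[author] = owed.get(author, Decimal("0")) + self.fee * count
        return owed
```

Using `Decimal` rather than floating point keeps small per-access fees from accumulating rounding errors over many downloads.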


This column is written by Steve Outing exclusively for Editor & Publisher Interactive three days a week. News, tips, and other communications may be sent to Mr. Outing at steve@planetarynews.com

The views expressed in the above column do not necessarily represent the views of the Editor & Publisher company

