By: Steve Outing
A traditional printed, non-fiction book wouldn’t be nearly as valuable without an index. A table of contents up front is only of limited use when a reader wants to look up a specific piece of information; for that, you need an index. That’s why nearly every reputable book publisher has an indexer on staff, hires contract indexers, or expects the author to provide a finished index for every non-fiction book it publishes.
But with Web sites, a professional-quality index is a rare find. Most sites have the equivalent of a “table of contents,” often in the form of navigation links on the home page. The role of an in-depth index most often is played by a search feature; site users find what they need by typing in keywords and getting pointed to documents that contain the words.
But expecting a search feature to do the job of an index is a big mistake, according to indexing professionals, who are just starting to look at Web sites as a new market in which to ply their services. When you expect your readers to find content on your site via a search feature, you are providing hit-and-miss navigation for your users, they say.
Not the same as an index
Using a search engine to find content on a Web site is similar to using a “virtual concordance,” says Kevin Broccoli, a professional indexer, principal of Broccoli Information Management, and chair of the Web Indexing Special Interest Group of the American Society of Indexers. A concordance lists every instance in a book (or Web site) where a word shows up. Hence, it’s possible to use software to automatically create a concordance; for a book or Web site, it would select those words and phrases used most often throughout the text and create an “index” for readers.
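The mechanical nature of a concordance is easy to see in code. As a rough sketch (the page names and sample text below are invented for illustration), a few lines of Python can map every word to every page that contains it, exactly the way a site search feature does:

```python
import re
from collections import defaultdict

# Hypothetical page texts standing in for a small Web site.
pages = {
    "feeding.html": "Choosing a dog food for a growing puppy...",
    "health.html": "Canine health depends on diet; dog food quality matters...",
    "sitemap.html": "Links: dog food, cat food, bird seed...",
}

def build_concordance(pages):
    """Map every word to the set of pages that contain it."""
    concordance = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            concordance[word].add(name)
    return concordance

concordance = build_concordance(pages)
# Every page mentioning "dog" comes back, relevant or not -- even the site map.
print(sorted(concordance["dog"]))
```

Note that the concordance has no notion of relevance: a navigation page that merely links to "dog food" is listed right alongside a page actually about it, which is the failure Broccoli describes next.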
The trouble is, concordances don’t help readers much. As an example, Broccoli says, he tried an experiment with a large Web site about animal care. He typed “dog food” into the site’s search feature and got back more than 230 documents containing those words. He then spent several hours examining all of them: about 15 of the results were duplicates, some led to mere navigation pages, and out of that voluminous list the number of genuinely relevant pages turned out to be just five.
Of course, a normal Web user isn’t going to spend hours, or even very many minutes, slogging through dozens of search results to find the few relevant documents; most likely, the user will give up before finding the documents she really wants.
An index of the site would have solved this problem, Broccoli suggests. With a proper index, the animal site might have an index entry for “Dog food,” with several subcategories nested beneath that entry. The Web user looking for nutrition information about dog food would merely need to consult the index and look up “Dog food –> nutritional value,” for example; click on that entry and the user is at the appropriate page.
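A nested entry of that kind is just a small data structure plus some markup. As a minimal sketch (the entry names and target URLs are invented for illustration), the “Dog food –> nutritional value” structure could be rendered as a clickable HTML list like so:

```python
# Hypothetical nested index entries: heading -> subentry -> target page.
index = {
    "Dog food": {
        "nutritional value": "dogfood-nutrition.html",
        "brands compared": "dogfood-brands.html",
    },
}

def render_index(index):
    """Render nested entries as an HTML list a reader can click through."""
    lines = ["<ul>"]
    for heading, subentries in sorted(index.items()):
        lines.append(f"<li>{heading}<ul>")
        for sub, url in sorted(subentries.items()):
            lines.append(f'<li><a href="{url}">{sub}</a></li>')
        lines.append("</ul></li>")
    lines.append("</ul>")
    return "\n".join(lines)

print(render_index(index))
```

The value, of course, is not in the rendering but in a human having decided that “nutritional value” belongs under “Dog food” in the first place.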
Automating the creation of such index entries has its drawbacks, and Broccoli suggests that a human element likely will always be necessary, although new software tools are being developed to take over the drudgery in the indexing process. The trained human comes in because most indexes require entries for words that may not even appear in the book or Web site. In the animal-site example, the word “nutrition” might not appear anywhere on the site, yet several documents covering “food,” “canine health,” and the like should be categorized under the heading “nutrition.” A computer program simply can’t handle a situation where a concept is implied but never stated, Broccoli says.
It also takes some human intelligence to categorize entries in the most useful way, he says. An “About the company” page on a Web site might be filed under “A” by an inexperienced human indexer or by a computer indexing program. It takes a trained indexer to determine that the page should instead be listed under “Smith Technology Inc. –> About the company” or “Smith Technology Inc. –> Company information.”
While it’s obvious that most Web sites with extensive content would benefit from an index to aid navigation, creating one isn’t so easy. Traditional publishers hire experts to create indexes for books because doing it well requires special skill that most editors lack unless they’ve had formal training. Some authors create their own indexes, but Broccoli says that unless the author has been trained in the art of indexing, the final product often isn’t as useful to the reader as one created by an independent indexing professional.
Having an index done by a professional can be cheap or expensive, depending on the indexer hired, but it’s not unusual for an indexer to charge $50 an hour or more. A fairly typical rate for book indexers is $3 per printed page (so, about $900 for a 300-page book). Broccoli says he’d charge about double that per Web page, so a 1,000-page Web site might cost $6,000, plus ongoing charges if the indexer is asked to maintain and update the index as new content is added. (Those are rough estimates; your mileage may vary.)
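The arithmetic behind those rough figures is simple to check (the rates are the column’s illustrative estimates, not quoted prices):

```python
# Rough cost estimates from the column (rates are illustrative, not quotes).
book_rate = 3.0           # dollars per printed book page
web_rate = 2 * book_rate  # Broccoli's rough figure: about double per Web page

book_cost = 300 * book_rate   # 300-page book
site_cost = 1000 * web_rate   # 1,000-page Web site
print(book_cost, site_cost)
```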
Given the state of the Web content business, where profitability remains a vision for many sites, some publishers are understandably wary of adding to their expenses. Those publishers now have the option of doing the job themselves. Web site indexing software is in its infancy, but a new application called HTML Indexer is due to be released this month by Brown Inc. Developer David Brown has designed an application that lets Web site publishers (and professional indexers) automate much of the site indexing process. Brown believes his software, which is within weeks of final release (cost: $230), is the first tool on the market designed specifically to create an HTML site index.
The concept behind the software is that it automatically reads through a Web site, generating a default index entry for every page on the site and for anchors within each page. With that bulk of drudge work accomplished by the application, a human indexer then checks and overwrites the default text for each entry to make the final product more intelligent and useful to the reader. The index information can be embedded in each HTML page file, which facilitates updating the index as site content is added or deleted.
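The column doesn’t specify how HTML Indexer actually stores its entries inside each page, but the general idea of per-page embedded index data is easy to sketch. One plausible (and purely invented) convention would be a meta tag per page, which a small script then collects into the site-wide index, so a page added or deleted automatically carries its own entry with it:

```python
import re

# Hypothetical pages carrying their own index entries in a meta tag.
# (This meta-tag convention is invented for illustration; the column
# does not describe HTML Indexer's real storage format.)
pages = {
    "dogfood-nutrition.html":
        '<meta name="index-entry" content="Dog food > nutritional value">',
    "about.html":
        '<meta name="index-entry" content="Smith Technology Inc. > About the company">',
}

def collect_entries(pages):
    """Gather every embedded index entry into one sorted site-wide list."""
    entries = []
    for url, html in pages.items():
        for m in re.finditer(r'<meta name="index-entry" content="([^"]+)">', html):
            entries.append((m.group(1), url))
    return sorted(entries)

for entry, url in collect_entries(pages):
    print(f"{entry} -> {url}")
```

Keeping the entry inside the page file is what makes incremental updates cheap: regenerating the index is just a re-scan, with no separate master file to edit by hand.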
Brown says his software can be used by Web publishers without calling in a professional indexer, but he doubts the end product will be as good or as useful as when a professional indexer does the job aided by HTML Indexer. He likens it to a programmer writing his own documentation: the user manual will be much better if a technical writer is called in, even though the programmer is perfectly capable of producing one for the product he’s written. And don’t rely too heavily on the automated entries the program produces, Brown warns; more than nine times out of ten, he says, the indexer will want to overwrite or edit the default text.
Broccoli is excited by the concept behind HTML Indexer because he says it will eliminate the repetitive tasks faced by an indexer, and reduce the number of errors inherent in the process when done entirely by a human. This sort of Web site indexing software, which is only now coming to market, also will make it easier for site publishers to keep an index updated with new content listings as content is added to a site over time.
If you surf around the Web today, you won’t find many sites with indexes, and fewer still with good ones. (Most settle for search features instead.) The most common format for a Web site index is a simple alphabetical list of document entries with a clickable alphabet at the top of the page: click on a letter and the page scrolls down (or a new page opens) to the entries starting with that letter. On some sites, as in any good traditional index, subcategories are nested under main categories.
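That common A-to-Z format is straightforward to generate. As a minimal sketch (the entry names are invented), the following groups entries by first letter, emits the clickable alphabet, and anchors each letter heading so the alphabet links jump to it:

```python
from itertools import groupby

# Invented sample entries; a real site index would have hundreds.
entries = ["Aardvark care", "Dog food", "Dog grooming", "Zoo directory"]

def build_az_index(entries):
    """Emit an HTML fragment: a clickable alphabet, then letter-anchored groups."""
    letters = sorted({e[0].upper() for e in entries})
    parts = [" ".join(f'<a href="#{ch}">{ch}</a>' for ch in letters)]
    # groupby requires sorted input; group entries under their first letter.
    for ch, group in groupby(sorted(entries), key=lambda e: e[0].upper()):
        parts.append(f'<h3 id="{ch}">{ch}</h3>')
        parts.extend(f"<p>{e}</p>" for e in group)
    return "\n".join(parts)

print(build_az_index(entries))
```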
Here are a few examples of Web site indexes:
Florida Internet Center for Understanding Sustainability (this one was produced by Broccoli)
Brown University
America Online

Advances are on the way in terms of how indexes are presented, too. One nice implementation is shown as an example on Broccoli’s Web site: a Java applet designed to present a site index in a very small space. In the top field, a user types the letters or words for the part of the index he wants to access, and the index listing below jumps to that part of the index. The user then clicks the desired word to highlight it and clicks “display” to go to the referenced page.
The Java applet index currently works only in Internet Explorer 4.0 and higher, but Broccoli hopes to find some programming help to create a cross-platform version of this type of indexing scheme in the near future. The approach is particularly appealing for sites with so much content that their indexes would be huge: an “ABCDE…” scheme becomes unwieldy if the listings under “D” alone number more than 500, for example, and this alternative solves that problem.
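The applet’s type-to-jump behavior amounts to prefix search over a sorted entry list; as the user types more letters, the visible window of the index narrows. A small sketch of the idea (the entries are invented):

```python
import bisect

# Invented index entries, kept sorted as the applet's listing would be.
entries = sorted(["Dalmatians", "Dog food", "Dog grooming",
                  "Ferrets", "Food labeling"])

def jump_to(entries, typed):
    """Return a small window of entries starting at the first match for the
    typed prefix, mimicking how the applet scrolls its listing as you type."""
    i = bisect.bisect_left(entries, typed)
    return entries[i:i + 3]  # show a few entries, as the applet's list box would

print(jump_to(entries, "Dog"))
```

Because the lookup is a binary search rather than a page scroll, the scheme stays usable no matter how many entries sit under any one letter.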
Broccoli hopes that as Web sites mature, they will begin introducing professional indexes. Just as indexing is an accepted part of the book publishing industry, he hopes that in time it will become a routine part of Web site publishing as well.
Contacts: Kevin Broccoli, email@example.com
David Brown, firstname.lastname@example.org
Get Stop The Presses! by e-mail
If you would like to get e-mail delivery of the Stop The Presses! column, there are two options:
1) Text e-mail. I send out a text e-mail message containing a brief description of the current column, along with a URL link to the actual column on the E&P Web site. To receive these regular reminders, sign up here.
2) HTML e-mail If you have a mail reader that can handle HTML messages, have the entire column delivered to you whenever a new one is published. Sign up here.
Got a tip? Let me know about it
If you have a newsworthy item about the newspaper new media business, please send me a note.
This column is written by Steve Outing for Editor & Publisher Interactive. Tips, letters and feedback can be sent to Steve at email@example.com