By: Charles Bowen
Speech-Recognition Software Indexes Popular Radio Shows
Check out the new technology at http://speechbot.research.compaq.com
For researchers, radio always has been a kind of nether world. Comments made on radio talk shows, interview segments, and call-in programs have never been easy to retrieve in the days, weeks, or months after they air.
Transcripts often don’t exist, and if they do, they are likely to be merely printed and thus not electronically searchable. Many broadcasters may keep tapes of old shows in their vaults, but most aren’t indexed, except perhaps for the date and name of the broadcast.
So a new technology being tested on the Web by Compaq Computer Corp. is turning heads. SpeechBot is an experimental index of about two dozen popular U.S. radio shows that uses a new speech-recognition system to let you search audio by keyword.
The site already can be used to search more than 5,400 hours of content from some 5,600 radio programs, some going back to March 1999. And the index is updated daily.
Usually, each program is added to the database on the day after it airs. Among radio offerings now being indexed are the advice program of Dr. Laura Schlessinger and the interviews of Terry Gross on her “Fresh Air” broadcasts, as well as the chatty “Car Talk,” the data-rich “Motley Fool Radio Show,” and PBS’ “Online NewsHour.”
Users simply go to the SpeechBot Web site, type in keywords for the topic they seek, then watch as the service returns the actual audio clip that can be played through the computer. It works this way: After one of the highlighted shows goes on the air, the speech-recognition software creates a “time-aligned transcript” of the broadcast, automatically building an index of the words spoken during the show.
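For readers who want a feel for what a “time-aligned transcript” index might look like, here is a minimal sketch in Python. It is not Compaq's implementation, which the site does not publish; the sample words, the timings, and the build_index name are all invented for illustration. The idea is only that each recognized word is stored with the moment it was spoken, so a keyword can lead straight to a point in the audio.

```python
from collections import defaultdict

# Hypothetical time-aligned transcript: each entry pairs the number of
# seconds from the start of the show with one recognized word.
transcript = [
    (12.4, "welcome"),
    (12.9, "to"),
    (13.1, "car"),
    (13.4, "talk"),
    (95.0, "the"),
    (95.2, "car"),
    (95.6, "stalls"),
]


def build_index(words):
    """Map each recognized word to the list of times at which it was spoken."""
    index = defaultdict(list)
    for seconds, word in words:
        index[word.lower()].append(seconds)
    return dict(index)


index = build_index(transcript)
print(index["car"])  # [13.1, 95.2]
```

Looking up “car” yields two offsets, and a player could start the clip at either one.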
To give it a try, visit the site at http://speechbot.research.compaq.com, where a search form is displayed at the top of the introductory screen.
In the “Search For” box, enter the word or phrase you are seeking. Use the drop-down list at the right of the keyword box to specify whether you want to seek text that contains “All of these words,” “Any of these words,” “This exact phrase,” or “This Boolean expression” (if you are using logical connectors, such as AND, OR, or NOT).
In the “Show” field below the keyword box, click the down arrow to select the show you want to search or leave it on the default setting of “All Shows.”
In the “Dates” field, use the down arrow to specify a range from “All Dates” to “In the Last Year.” Click the “Search” button to begin the search. SpeechBot then searches its index to try to match the sought-after words with those it has preserved. It then displays matches in order of likely relevance. Links are provided to play the actual audio clips of the material you have found.
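The difference between the first three choices in the drop-down list can be sketched in a few lines of Python. This is an illustration under simple assumptions (lowercase, whitespace-separated words), not SpeechBot's own matching code; the matches function and its mode names are hypothetical, and the Boolean-expression option and relevance ranking are left out.

```python
def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def matches(query, transcript_text, mode):
    """Return True if the transcript satisfies the query under the given mode.

    mode is "all" (all of these words), "any" (any of these words),
    or "phrase" (the words in order, side by side).
    """
    q = tokenize(query)
    words = tokenize(transcript_text)
    if not q:
        return False
    if mode == "all":
        return all(w in words for w in q)
    if mode == "any":
        return any(w in words for w in q)
    if mode == "phrase":
        n = len(q)
        return any(words[i:i + n] == q for i in range(len(words) - n + 1))
    raise ValueError("unknown mode: " + mode)


clip = "the fed raised interest rates today"
print(matches("interest rates", clip, "phrase"))  # True
print(matches("rates interest", clip, "phrase"))  # False
print(matches("rates interest", clip, "all"))     # True
print(matches("rates bonds", clip, "any"))        # True
```

A phrase search is the strictest of the three, since word order matters; “any” is the loosest.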
Note: Don’t bother trying to read the actual transcripts. They generally appear as gobbledygook to us. The transcripts that are output by the software (and shown in small extracts along with the results of your searches) rarely match exactly what was spoken on the air. “Because different people speak at different rates and with different degrees of clarity,” say the site’s managers, “speech-recognition software does not correctly interpret every word.”
“However, research has shown that meaningful words are recognized with a high degree of accuracy, and that even when a word is missed, it will most likely be recognized when it is spoken somewhere else in the program.”
Other considerations in using SpeechBot for your work:
If you are not finding what you are searching for, try a different spelling. Remember, this is audio, so think sound as much as spelling. For instance, the speech-recognition software treats “male” and “mail” the same because they sound the same.
You can use partial words in your searches, using the asterisk (*) as a wild card at the end of a word to indicate that you seek any words that begin with the letters you have designated before the asterisk. However, you must include at least three letters before the asterisk.
You’ll need RealPlayer, version G2 or later, installed on your computer to play the sound clips you find. If you don’t have the software, visit http://www.real.com/player/index.html and follow the site’s download instructions.
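The wild-card rule mentioned above (an asterisk at the end of a word, with at least three letters before it) can be sketched in Python as follows. The wildcard_search function and the sample vocabulary are hypothetical, and the sketch assumes the indexed words are stored in lowercase; how SpeechBot itself handles the asterisk internally is not documented here.

```python
def wildcard_search(pattern, vocabulary):
    """Return the indexed words that match a keyword or a trailing-asterisk pattern."""
    pattern = pattern.lower()
    if not pattern.endswith("*"):
        # No wild card: only an exact match counts.
        return [w for w in vocabulary if w == pattern]
    prefix = pattern[:-1]
    if len(prefix) < 3:
        raise ValueError("at least three letters must precede the asterisk")
    return sorted(w for w in vocabulary if w.startswith(prefix))


# Hypothetical set of words found in an index.
vocabulary = ["broadcast", "broadcaster", "broadband", "radio"]
print(wildcard_search("broadc*", vocabulary))  # ['broadcast', 'broadcaster']
```

A pattern such as “br*” is rejected in this sketch because only two letters come before the asterisk.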
Bowen writes columns, articles and books from West Virginia, and is host of the daily Internet News syndicated radio show (http://www.netnewstoday.com).
Copyright 2000, Editor & Publisher