By: Anick Jesdanun, AP Internet Writer
(AP) One company has an idea for how search engines can catalogue the Web more completely. Another believes it can better divine what a searcher wants. Yet another is trying to synch all that with how the human brain works.
Startups and leading tech companies, including search exemplar Google, are tinkering with new ways of culling and presenting information — ones that could prompt the next revolution in search.
“Because information is exploding, (the Internet) is going to become increasingly difficult to use if we don’t get it right,” said Liesl Capper, chief executive of Australian search startup Mooter.
Current technology troubles users like private investigator Cynthia Hetherington. When she recently suspected an Australian company of possible fraud, Hetherington turned first to Google. But then she went to the Australian Securities and Investments Commission, LexisNexis and Dun & Bradstreet.
Users who consider Google exhaustive are only fooling themselves, experts say. Today’s search engines may be capturing as little as 1% of the Web, largely because of how they find and index online resources.
“It’s very frustrating,” said Hetherington, who runs a Haskell, N.J., company. “It’s like going to a library and only pulling one book off the shelf.”
Search analyst Danny Sullivan sees promise in developments to address such flaws, and he believes tomorrow’s search engines are likely to blend the best of them. But he also cautioned that the Internet is littered with search innovations that failed to draw investors or market share.
Currently, all search engines fail to capture the bulk of the “invisible Web” — resources locked up in databases and inaccessible by the engines’ indexing crawlers. These include regulatory filings at the U.S. Securities and Exchange Commission, detailed reports on charities at GuideStar and complete archives of most newspapers.
Sometimes, accessing an “invisible” database requires payment. Search engines can’t let you know about a document’s availability for purchase if they can’t scan it in the first place.
But even when a database is free, a site may require registration, prohibit search crawlers or use incompatible formats.
In particular, crawlers are stymied by dynamic Web pages, which are customized as users choose various options, such as car color at Cars.com.
To counter that, Chicago-based Dipsie Inc. is developing software that promises to fill out simple, multiple-choice online forms like those at Cars.com. It will not handle more complex forms, such as those for the government’s patent and trademark databases, which require typing in keywords. A public test version is expected by summer.
Other companies are working to capture sound and video files that have troubled text-based crawlers.
StreamSage Inc. uses speech-recognition technology to transcribe feeds, so a search engine can pull out relevant portions of a long presentation. Company President Seth Murray said Harvard’s medical school and NASA already use the technology, but engineers still must speed it up for broader use.
Yahoo Inc. is going a less technical, more controversial route: Businesses can pay to ensure that their “invisible Web” pages get indexed.
But indexing more of the Web only brings up another challenge — identifying the most relevant among the billions of documents available. So some search developers are focused on personalizing and organizing searches.
Eurekster Inc., a startup launched in January, is marrying search with social networking, in which your friends, your friends’ friends and their friends form online circles. Eurekster guesses what you’re seeking based on what others in your circle have found relevant.
“At the moment, when you search on Google, everyone gets the same results for the same keywords,” said Shaun Ryan, vice president of business development for Eurekster in New Zealand. “We try to personalize those results.”
So a search for “casting” might produce sites on movies if your circle is heavily involved in entertainment, or on fly fishing if its members enjoy weekend outings.
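As a rough illustration only, the idea of reordering results by what a searcher’s circle found relevant could be sketched in Python as follows. Eurekster’s actual method is not described here; the function name, the example URLs, and the rule of simply counting clicks are all assumptions made for the sake of the example.

```python
from collections import Counter

def rerank_by_circle(results, circle_clicks):
    """Put results that circle members clicked most often first.

    results: URLs in the search engine's original order (hypothetical data).
    circle_clicks: URLs clicked by people in the searcher's circle.
    """
    clicks = Counter(circle_clicks)
    # sorted() is stable, so results with equal click counts
    # keep the engine's original relative order.
    return sorted(results, key=lambda url: -clicks[url])

# Hypothetical results for the query "casting".
results = ["movie-casting.example", "fly-casting.example", "metal-casting.example"]
# A circle of fishing enthusiasts mostly clicked the fly-fishing page.
fishing_circle = ["fly-casting.example", "fly-casting.example", "metal-casting.example"]

reranked = rerank_by_circle(results, fishing_circle)
print(reranked)
```

With an empty click history, the function returns the results in their original order, which matches the everyone-gets-the-same-results behavior Ryan describes.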
The major search engines, meanwhile, are trying to localize results. Yahoo and America Online have an advantage over Google because they already have billing or registration information on many users.
And sites like SuperPages.com are tagging data, so customers can search not only by city but by store hours or credit cards accepted. Adding “Saturday” to a Google search might get you a store that’s closed Saturday, or it might indicate Saturday’s hours.
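The difference between tagged data and plain keyword matching can be shown with a small Python sketch. It is not SuperPages.com’s system; the `Listing` fields, the `find` function, and the sample stores are hypothetical and serve only to show how a filter on a labeled field differs from matching the word “Saturday” anywhere on a page.

```python
from dataclasses import dataclass, field

@dataclass
class Listing:
    """A store record whose attributes are tagged rather than free text."""
    name: str
    city: str
    open_days: set = field(default_factory=set)
    cards: set = field(default_factory=set)

def find(listings, city, open_on=None, card=None):
    """Return names of stores in a city, optionally filtered by tags."""
    matches = []
    for item in listings:
        if item.city != city:
            continue
        # Each filter checks a specific tagged field, so a store that
        # merely mentions "Saturday" somewhere cannot match by accident.
        if open_on is not None and open_on not in item.open_days:
            continue
        if card is not None and card not in item.cards:
            continue
        matches.append(item.name)
    return matches

# Hypothetical sample data.
listings = [
    Listing("Main Street Hardware", "Haskell", {"Mon", "Sat"}, {"Visa"}),
    Listing("Corner Books", "Haskell", {"Mon", "Tue"}, {"Visa", "Amex"}),
]

saturday_stores = find(listings, "Haskell", open_on="Sat")
print(saturday_stores)
```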
Tags also help Factiva personalize its archives of 9,000 news sources, so an engineering team gets tech-heavy results, while the marketing department gets consumer-friendly documents.
“People don’t want to be spending time searching and looking for things,” said Clare Hart, Factiva’s chief executive. “They want to be spending the time analyzing the information.”
At Microsoft Corp., researchers are exploring ways to return specific facts rather than entire documents. A search for “Marilyn Monroe’s birthday” would return an answer, “June 1, 1926,” instead of sites on her famous “Happy Birthday, Mr. President” performance.
“We still have this library metaphor of ‘Let me give you back a bunch of books that might help you,’ … rather than ‘Let me go through the books for you and figure out what you’re looking for,’” said Eric Brill, a senior researcher with Microsoft’s AskMSR project.
Mooter tries to mimic the brain’s organization methods by identifying underlying themes and grouping sites — a search on travel in Spain might separate hotels from warnings about terrorism. Mooter also attempts to refine results based on links a user visits.
Building the technology is expensive, and some experts believe the best tools may be developed by and reserved for pay services like Factiva and ChoicePoint Inc., which aggregates personal, financial and legal data from a variety of government and corporate sources.
But don’t count Google out. It has hundreds of engineers in California, New York, India and, soon, Switzerland working to make searching better, most recently with localized searching.
Google’s director of technology, Craig Silverstein, said the industry leader must keep innovating because search is bound to morph into something completely different within a decade.
“It will be something that we haven’t even thought of yet,” Silverstein said. He offered few details, but the Google Labs site offers a peek.
One project, Google WebQuotes, returns listings with comments from other sites to help you evaluate a site’s credibility and reputation.