By: Mike Maher
University of Texas researchers are developing software to allow
the Newton Messagepad to categorize wireless news transmissions.
APPLE COMPUTER’S NEWTON Messagepad can now receive the world’s news via wireless transmission. But Newton owners won’t have time to scan megabytes of text to find the stories that interest them.
To serve as an electronic newspaper, the Newton must have some kind of built-in editor to filter incoming stories. Researchers at the University of Texas-Austin are now developing just such an electronic gatekeeper.
The software will allow the Newton to sort newsfeeds into nine categories: sports, entertainment, crime, accidents, business, protest, politics, war and medicine. According to the user’s specifications, the Newton will then decide which stories to display intact, which to discard, and which to truncate.
To train a computer to analyze news content, programmers specify reliable criteria that enable the computer to distinguish, say, a sports story from a business profile, banner-headline news from filler.
Journalism professor Wayne Danielson, who heads the Newton research project at UT-Austin, began investigating the editing capabilities of computers many years ago. He designed a program that, in 1964, categorized Associated Press stories, edited them to fit a desired newshole, then printed them.
Today’s hardware is several orders of magnitude faster and smaller than the clunky, unreliable mainframes that first mimicked editing decisions in churning out hard-copy “computer newspapers.”
This hardware evolution, together with exponentially growing stores of information and the public’s demand for rapid, remote access to widely varied sources, has made computer editing a very modern concern.
Because such massive amounts of information can now move on electronic superhighways, people need some sort of software to perform the traditional editorial tasks of finding, categorizing, ranking and winnowing information.
For electronic news, the ultimate bottleneck is not what’s fit to print, or some ad sales-dictated newshole, but individuals’ reading speeds and their time available to ingest information.
Future editing computers likely will be judged by how densely they can concentrate information valuable to the user, per unit of the user’s time.
“My hunch is, the only way people will use big databases will be with a helper. We’re calling it a personal editor,” Danielson said of the Newton project.
He sees a commercial future for such a digital editor, but acknowledges that editing software must respond to users’ wishes.
“People want some help, but they also want flexible and timely output adjusted to their schedule,” he said.
Danielson foresees the personal editor offering a Newton user three options.
First, the user could let the Newton make selections. The personal editor would mimic a real-world editor’s decisions as closely as possible.
Second, the user could specify percentages from each news category that would make up an “edition” of stories.
Third, the personal editor could infer the user’s news interests by tracking how much time the user spent reading from each story category in the past, then rendering the same mix of stories.
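The third option amounts to simple arithmetic. As a rough sketch only, and not the UT-Austin code, the past reading time in each category can be converted into shares of the next edition. The function name `infer_mix`, the sample figures and the even-split fallback are all assumptions made for illustration.

```python
def infer_mix(seconds_read):
    """Turn per-category reading time (seconds) into shares of an edition."""
    total = sum(seconds_read.values())
    if total == 0:
        # Assumption: with no reading history, split the edition evenly.
        return {cat: 1 / len(seconds_read) for cat in seconds_read}
    return {cat: secs / total for cat, secs in seconds_read.items()}


# Hypothetical history: ten minutes of business, five each of sports and politics.
history = {"sports": 300, "business": 600, "politics": 300}
mix = infer_mix(history)
print(mix)
```

The same dictionary of shares could just as easily come straight from the user, which is the second option.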
The software now being developed at UT-Austin will allow the user to decide the length of the electronic edition, in terms of minutes rather than pages or column inches. Having previously measured the user’s reading speed, the Newton will decide for each desired news category which stories to run verbatim, which to discard, and which to cut from the bottom.
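One way to picture this time budget, offered as a sketch under stated assumptions rather than as the researchers’ method: multiply the edition length in minutes by the measured reading speed to get a word budget, then walk down a ranked list of stories. The name `build_edition`, the `min_words` cutoff and the sample numbers are hypothetical.

```python
def build_edition(stories, minutes, words_per_minute, min_words=50):
    """Decide each story's fate. `stories` is a list of (title, text) pairs,
    already ranked with the most important story first."""
    budget = int(minutes * words_per_minute)  # words the user can read
    edition = []
    for title, text in stories:
        words = text.split()
        if len(words) <= budget:
            edition.append((title, text, "verbatim"))
            budget -= len(words)
        elif budget >= min_words:
            # Cut from the bottom: keep only the opening words that still fit.
            edition.append((title, " ".join(words[:budget]), "truncated"))
            budget = 0
        else:
            # Assumption: a fragment shorter than min_words is not worth running.
            edition.append((title, "", "discarded"))
    return edition


# Three hypothetical 100-word stories, a one-minute edition, 160 words per minute.
sample = [(name, "word " * 100) for name in ("a", "b", "c")]
for title, text, decision in build_edition(sample, 1, 160):
    print(title, decision, len(text.split()))
```

Cutting from the bottom suits wire copy, which is traditionally written with the most important facts first.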
The Newton personal editor works in two steps: categorization and ranking. First the software sorts the available stories; then it must judge the importance of each story, another exercise in content analysis.
Indicator words like President, earthquake, Congress, war, new, and death signal news value; so do large dollar amounts, geographic and temporal proximity, and the names of important people. Even the word count can indicate comparative importance within a set of stories.
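A ranking score built from a few of these signals might look like the following sketch. The indicator words come from the list above; the weights, the million-dollar threshold and the function name `news_value` are assumptions chosen only to make the idea concrete, and proximity and prominent names are left out.

```python
import re

# Indicator words named in the article, lowercased for matching.
INDICATOR_WORDS = {"president", "earthquake", "congress", "war", "new", "death"}

# Dollar figures such as "$40", "$1,200,000" or "$3.5 billion".
DOLLARS = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)\s*(million|billion)?", re.IGNORECASE)


def news_value(text):
    """Return a rough importance score; higher means more newsworthy."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(1 for w in words if w in INDICATOR_WORDS)
    for amount, unit in DOLLARS.findall(text):
        value = float(amount.replace(",", ""))
        value *= {"million": 1e6, "billion": 1e9}.get(unit.lower(), 1)
        if value >= 1_000_000:  # assumed threshold for a "large" amount
            score += 2          # assumed weight
    score += len(words) / 1000  # longer stories get a slight edge (assumed weight)
    return score


big = "Congress approved $3.5 billion in war aid"
small = "Local bake sale raises $40 for the library"
print(news_value(big) > news_value(small))
```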
To categorize, the program contains a set of mutually exclusive keywords for each of nine content categories. The categorization program will match the verbiage from a story against the keywords. The category with the most matches wins.
If, for example, it found 12 words that matched the politics category keywords, seven that matched medicine and five that matched business, it would assign the story to politics.
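The counting scheme is simple enough to sketch in a few lines. This is an illustration only: the keyword lists below are invented, just three of the nine categories are shown, and resolving ties in favor of the first category listed is an assumption.

```python
import re

# Hypothetical, mutually exclusive keyword sets; the real lists are not published here.
KEYWORDS = {
    "politics": {"senate", "election", "governor", "ballot"},
    "medicine": {"hospital", "vaccine", "surgeon", "patients"},
    "sports": {"inning", "quarterback", "playoffs", "coach"},
}


def categorize(text, keywords=KEYWORDS):
    """Return (winning category or None, match counts per category)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = {
        category: sum(1 for w in words if w in terms)
        for category, terms in keywords.items()
    }
    best = max(counts, key=counts.get)  # ties go to the first category listed
    return (best if counts[best] > 0 else None), counts


story = "The governor cast a ballot as the Senate election neared; one surgeon objected."
category, counts = categorize(story)
print(category, counts)
```

Because every word is checked against every list, the cost grows with story length and the number of categories, which is the kind of light workload a small device can handle.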
Danielson said that eventually the personal editor could allow users to enter their own keywords into the Newton’s permanent memory, so that the users’ favorite stocks, sports teams, cities, celebrities and so forth could become part of Newton’s criteria for categorization and salience. However, this feature will be beyond the scope of the initial personal editor.
Computers will never analyze content flawlessly. Even humans can seldom agree 100% on how to categorize news stories.
“People had a 94% to 96% agreement,” Danielson said of the content analysis reliability checks done by his graduate seminar, using AP stories. “Computers agreed with human coders about 85%. Neither is perfect. Some news fits two categories: health care reform has aspects of a government story and of a medicine story.”
Because the Newton’s CPU and its available memory are limited, thus far research on the personal editor has been done on personal computers and has concentrated on software strategies.
Researchers are now shifting their attention to hardware demands and the user interface.
According to Gale Wiley, technology coordinator for the UT Journalism Department, the next steps will be to get the personal editor software rewritten into Newton source code, and test Newton’s capacity for wireless newsfeeds.
“We’ll take a hard, objective look at how much memory the Newton has,” he said. “Then we’ll recommend to Apple hardware and software requirements.”
In early tests of the Newton’s storage capacity, Alan Griffy, a lab technologist at the University of Texas, loaded 35 full-length AP stories into the 128-kilobyte memory of a pre-upgrade Newton, using a hard-wire connection.
In March, Apple Computer released an upgraded Newton with almost twice the memory, three times the battery life and better handwriting recognition software. Newton offers one PCMCIA (Personal Computer Memory Card International Association) slot.
Two-megabyte memory cards are now available for this slot; the forthcoming four-meg cards will give Newton enough storage to download a fair-sized wireless newsfeed.
However, to receive news from a wireless source, the Newton must devote its slot to a wireless receiver card. Such cards are now available. Motorola’s NewsCard is a credit card-sized receive-only modem designed for palmtop computers like the Newton. These cards have on-board memory, but not enough for large file transfers.
As an electronic newspaper, the Newton will be able to carry only text stories for the near future: no digital halftones, no video.
More-sophisticated electronic news delivery is now commercially available for laptop computers, particularly via hard-wire sources like telephone modems. But Newtons aren’t laptops.
“People compare the Newton to a laptop rather than to a day planner,” Griffy said. “The Newton only weighs a pound.”
“The major issue with Newton is portability,” Wiley added. “It’s almost like an article of apparel.”
Given Newton’s limited size, screen, and processing power, Danielson’s approach to computer content analysis may be ideal.
Training a computer to do editing involves a trade-off between simplicity and speed of the software on the one hand and accuracy of results on the other.
No computer will perform as well as a human editor, but boosting a computer’s editorial accuracy from, say, 85% to 95% involves a tremendous increment of programming, which slows processing and requires much more computing power.
“I’m not that interested in getting the last 10% of accuracy,” Danielson said. “I feel strongly about it. It goes back to my early experience with computing. On one project, I had to rely on a programmer. He worked for a year on the problem of programming a computer to recognize syllables. In the end, he was dissatisfied and I was too. I learned to program and solved the problem in an afternoon.”
Danielson’s keyword-dependent approach to content analysis cuts the Gordian knot of computer-analyzing our polysemous, inconsistent language. More-accurate results might come from a content-analysis approach using Boolean operators (e.g., and, or, not) to test the proximity of one word to another. For example, does plane precede crash in the same sentence?
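A context test of that kind could be sketched as follows. This is not part of the personal editor described here; the function name `precedes_in_sentence` and the crude sentence-splitting rule are assumptions.

```python
import re


def precedes_in_sentence(text, first, second):
    """True if `first` appears before `second` within a single sentence."""
    # Assumption: sentences end at ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[a-z]+", sentence.lower())
        if first in words and second in words[words.index(first) + 1:]:
            return True
    return False


print(precedes_in_sentence("A small plane was lost in a crash near Austin.", "plane", "crash"))
print(precedes_in_sentence("The plane went down. Witnesses heard the crash.", "plane", "crash"))
```

Even this small check must split and rescan every sentence for every word pair, which hints at why context tests cost more than plain keyword counts.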
But establishing context is both coding- and processor-intensive, which slows the results and requires more computing horsepower than the Newton will have any time soon.
Ultimately the gain in accuracy, if any, from a context-driven approach may not be worth the costs.
Undoubtedly the editorial Newton will misread some stories. In one test of categorization, the personal editor software miscategorized a story because it recognized bass as a musical term rather than a fish. Such misreadings are inevitable.
“If artificial intelligence were as good as real intelligence, they wouldn’t call it artificial,” Danielson said.
Misreading may be the Newton’s curse. The device that began its product life by misreading its user’s handwriting may, in its upcoming role as personal editor, categorize a story headlined “Raiders to battle Patriots” as a war story. But the reader can quickly zap a misassigned story and proceed to the next one. And if the reader finds that the Newton has truncated an interesting story, its personal editor can restore the deleted text.
Whatever mistakes the initial personal editor may make, for the Newton user who wants to download the day’s news, the alternative of reading through large, unedited masses of text is far more daunting.
(UT-Austin journalism professor Wayne Danielson brings more than 30 years of experience in computer content analysis to the task of creating a software “personal editor” for the Newton Messagepad.) [Photo & Caption]
(Maher is completing his doctorate in journalism at the University of Texas.) [Caption]
(Alan Griffy (left) and Gale Wiley have used a Newton simulator running on a Macintosh to test how well the Newton can serve as a downloadable portable newspaper.) [Photo & Caption]