By: Eric Wolferman
Digital Input Column
Newspapers have made great advances in managing and manipulating data, but I often wonder whether we have made as much progress as we should.
In recent years, great emphasis has been placed on “content management” and “data warehousing,” both terms that essentially mean immediate and flexible access to information. Yet our data is still, more often than not, locked up in individual systems.
To be sure, there have been bold experiments designed to make our information more accessible and malleable. E&P's Jim Rosenberg has reported on extensive data-mining efforts, carried out under the tutelage of venerable technology executive William E. Toner, at The Arizona Republic in Phoenix and at North Jersey Media Group Inc. (Jan. 14). At The Tampa (Fla.) Tribune, efforts to converge print, broadcast, and digital news gathering and dissemination have been widely reported. And newspapers across the country have come up with clever schemes to adapt print material for Web distribution.
But, by and large, we still spend far too much time developing unique interfaces to pull data from one system to another, or reformatting it altogether to make it usable.
For newspapers, information is our lifeblood. We have enormous amounts of it, intended for both internal and external audiences. There is not a publisher in the country who hasn't thought about the huge benefits of being able to see advertising, circulation, marketing, production, and financial data from a single source. And there is not an editor who hasn't thought about how wonderful it would be to direct news material to any number of conduits without reformatting or performing any technical gymnastics.
At North Jersey Media Group, the solution has been to build a data warehouse that pulls information from a variety of feeder systems, such as circulation and advertising systems. In turn, executives can manipulate and analyze data from the entire enterprise through appropriate front-end software tools. Although still in development, the project has already yielded valuable results in the ability to evaluate circulation data for marketing purposes.
The idea of a superbase for business data is certainly a workable approach to the problem, and perhaps the only practical way to address the situation today. But it can be very expensive and still involves the development of many special interfaces to suck data from disparate systems into the warehouse.
To be truly effective and practical in the future, full integration of data across the enterprise will require standards. The lack of such industry standards for fundamental data formats and structures is the single biggest obstacle to fluid manipulation of information in newspapers today.
That’s not to say other industries are not plagued by the same problem. But, for newspaper publishers, it is particularly vexing because of the volume and diversity of information we deal with.
As I’ve reported here before, we’ve made great strides toward standardization in some areas. The industry has completed a standard structure for classified advertising records, and the International Press Telecommunications Council has developed the remarkable News Industry Text Format (NITF) to define the content and structure of news articles. With the NITF, publishers can direct content to print, Web, and even wireless distribution channels with minimal difficulty.
These two examples of recent standards are particularly appropriate because they both are based on perhaps the most promising development for data exchange in decades — XML (Extensible Markup Language). If ever there was an ideal time to standardize newspaper business data, it is here in the age of XML.
To make a long story short, XML is a tagging methodology that greatly facilitates the interpretation of data among differing systems. As long as you know the name and meaning of any given “tag,” you can write instructions to handle the data in a way that is appropriate for any particular system.
For instance, the tag called “byline” in a story can be interpreted to appear one way in the printed newspaper, and another way for presentation on the Web. The tag stays the same in the data — each individual system decides how it will use the tag.
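To make the byline example concrete, here is a minimal sketch of the idea. The fragment below is an illustration, not the actual NITF schema; the tag names and the two "presentation" rules are hypothetical, invented only to show how each receiving system can interpret the same tagged data in its own way.

```python
# A toy XML story with a hypothetical <byline> tag.
import xml.etree.ElementTree as ET

story = """<story>
  <headline>Papers Embrace Standard Tags</headline>
  <byline>By Jane Doe</byline>
  <body>Publishers are adopting common data structures.</body>
</story>"""

root = ET.fromstring(story)
byline = root.findtext("byline")  # the tag itself never changes

# Each channel decides for itself how to present the tagged data.
print_version = byline.upper()                          # print edition style
web_version = f'<span class="byline">{byline}</span>'   # Web page markup
```

The data carries one `byline` tag; the print system and the Web system apply their own formatting rules to it, which is the essence of the flexibility described above.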
Certainly, this is an oversimplification of the process, but the point is clear: XML offers an amazingly flexible platform on which to build standard structures for the exchange of data.
Of course, XML provides only the framework to develop data standards. It does not suggest what the tags themselves should be. That’s up to individual industries or parties with common interests.
We have a long way to go in properly defining the data structures that
permeate our business. But it is achievable with the suitable commitment of industry participants. The XML environment offers a great opportunity to continue to define the structure of our most important business processes and translate them into pliant data. Then we can truly get closer to full interchangeability of our bloodstreams.