XML Gets Organized

As XML content expands, good management tools are key. Native XML databases, SQL database add-ons and third-party integration tools all present advantages -- and trade-offs.

When the Classwell Learning Group began building a database of online lesson plans and other information for teachers, it needed to store and access content ranging from word processing files to copy scanned from textbooks. Because it needed to store, access and query all those data types, the publisher chose an XML database product.

"XML was pretty much a no-brainer for us," says Brendan Collins, director of IT at Classwell, a division of publisher Houghton Mifflin Co. in Boston.

IT organizations are using XML for everything from integrating applications to content management and access control. Native XML databases can help with storing and managing the resulting flood of XML documents, but they're not the only option. Major databases now offer XML translator features that transform XML documents into fields within their relational structures. That transformation process can eliminate many of the benefits of using XML, however, so SQL database vendors say they're planning to add native XML capabilities to their products.

In the meantime, third-party vendors sell tools that they claim provide more complete integration among XML, relational and even flat-file databases.

Deciding which XML data store is right for you depends on whether you have a stable schema, or design, for your XML data; the degree to which you need to store and audit transactions in their original form; and whether the application is critical enough to justify the expense of a separate, native XML database, say users and software vendors.

XML is a hierarchical data description language that uses tags, such as "customer," to define the data components within a document. In contrast, relational databases such as Oracle, Microsoft SQL Server or IBM's DB2 organize data in well-defined rows and columns within tables.
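To make the contrast concrete, here is a minimal sketch in Python that reads the same customer record from a nested XML document and from a flat relational row. The tag names, column names and sample values are invented for illustration, and SQLite stands in for the commercial databases named above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical customer record; the tag names are illustrative only.
CUSTOMER_XML = """
<customer id="42">
  <name>
    <first>Ada</first>
    <last>Lovelace</last>
  </name>
  <city>Boston</city>
</customer>
"""

def customer_from_xml(text):
    """Walk the nested tags and return (id, first, last, city)."""
    root = ET.fromstring(text.strip())
    return (
        int(root.get("id")),
        root.findtext("name/first"),  # path follows the tag hierarchy
        root.findtext("name/last"),
        root.findtext("city"),
    )

def customer_from_table(row):
    """Store the record as one row in a fixed set of columns, then read it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, city TEXT)"
    )
    conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)", row)
    result = conn.execute(
        "SELECT id, first_name, last_name, city FROM customer"
    ).fetchone()
    conn.close()
    return result
```

In the XML version the structure travels with the data; in the relational version the structure lives in the table definition and the row carries only values.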

The fact that XML documents carry not only the data but also the definitions of that data makes it easier for those documents to "describe themselves" to multiple applications during a transaction. Compared with SQL, XML is also much more capable of dealing with unstructured data and data that can be more dynamic in both its meaning and its structure, says Paul Hessinger, executive vice president of HealthRamp Inc. in New York. HealthRamp develops software that allows doctors to prescribe medications using handheld computers. "The permanent data we need to take care of we store in SQL Server. But as we move the data from a handheld device to a Web server, that's largely done via XML," he says.

Choosing an Approach

XML also makes it easier to change the type or format of data stored in a database, says Jake Freivald, director of marketing at iWay Software, a division of New York-based Information Builders Inc. that develops XML integration tools. In an XML database, he says, adding a business-address field requires only an extra set of data description tags within the document. In a relational database, he says, that change would require a new set of tables for the business addresses and a definition of how those new tables relate to every existing table in the database.
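Freivald's business-address example might look roughly like the following sketch. The element and table names are hypothetical, SQLite stands in for a production relational database, and the relational half shows only one common way to model the change (a child table keyed to the customer).

```python
import sqlite3
import xml.etree.ElementTree as ET

def add_business_address_xml(doc_text, street, city):
    """XML side: append a new set of tags to the existing document."""
    root = ET.fromstring(doc_text)
    addr = ET.SubElement(root, "business_address")  # hypothetical tag name
    ET.SubElement(addr, "street").text = street
    ET.SubElement(addr, "city").text = city
    return ET.tostring(root, encoding="unicode")

def add_business_address_sql(conn):
    """Relational side: define a new table and its link to the customer table."""
    conn.execute(
        "CREATE TABLE business_address ("
        "customer_id INTEGER NOT NULL REFERENCES customer(id), "
        "street TEXT, city TEXT)"
    )
```

Code that read the old XML document keeps working because the existing tags are untouched; the relational change alters the schema that every application shares.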

However, once the data is in XML, it can be harder to access in new or flexible ways because the XML tags provide a hierarchical method of describing data. For example, under the "patient" tag, an XML document might have a subtag for "name," which in turn might have subtags for "first name," "last name," "middle initial" and "nickname." Any query that can't understand that hierarchical structure will have to search every word of every document, resulting in possible performance bottlenecks.
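The difference between a structure-aware query and a plain text scan can be sketched as follows. The documents and tag names are made up for illustration. Note that this toy lookup still parses every document; a real XML database would presumably use indexes on the tag paths instead.

```python
import xml.etree.ElementTree as ET

# Two hypothetical patient documents; "Jones" appears in both,
# but only once as a patient's last name.
DOCS = [
    "<patient><name><first_name>Ann</first_name><last_name>Smith</last_name></name>"
    "<notes>Referred by Dr. Jones</notes></patient>",
    "<patient><name><first_name>Bob</first_name><last_name>Jones</last_name></name>"
    "<notes>Follow-up in May</notes></patient>",
]

def find_by_last_name(docs, last_name):
    """Structure-aware lookup: inspect only patient/name/last_name."""
    matches = []
    for text in docs:
        root = ET.fromstring(text)
        if root.findtext("name/last_name") == last_name:
            matches.append(root.findtext("name/first_name"))
    return matches

def find_by_text_scan(docs, word):
    """Structure-blind lookup: search every character of every document."""
    return [index for index, text in enumerate(docs) if word in text]
```

The path-based lookup returns only the patient whose last name is Jones, while the text scan also flags the document that merely mentions a Dr. Jones in its notes.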

One way to store XML data is in a native XML database that stores, searches, accesses and retrieves XML documents in their complete and original form. This boosts performance because a native XML database can optimize queries about customers, for example, by searching only the data with the appropriate tags and subtags that identify customers, rather than all the text in a document. Storing XML documents in their entirety, rather than breaking them into individual database fields, also makes it easier to audit or reconstruct transactions because all the details are stored in their original order and sequence. Implementing well-understood and stable XML schemas, or database designs, can also be easier in a native XML database than in a relational database that must translate XML documents into columns and rows.

The native XML database Collins chose, Tamino from Darmstadt, Germany-based Software AG, had early reliability and data-corruption problems but has since improved dramatically, he says. Collins did, however, have to write his own software for loading data, as well as for version control of the XML documents.

Other Roadblocks

This lack of tools is only one strike against native XML databases. Analyst Carl Olofson at IDC in Framingham, Mass., predicts that the very small market for native XML databases -- just $42 million last year -- will grow to only about $77 million in 2007 because relational database vendors and other software players will build native XML capabilities into their products.

Until they develop native XML support, major database vendors offer translation layers that either "shred" the XML document into small enough components to store in the fields of a relational database, or "cram" the entire XML document into a single field, according to Ron Schmelzer, a senior analyst at ZapThink LLC, a market research firm in Waltham, Mass.
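The two strategies Schmelzer describes can be sketched in a few lines. This is an illustration only, not any vendor's translation layer: the table names, column names and sample order are hypothetical, and SQLite stands in for the relational database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document used for both strategies.
ORDER_XML = '<order id="7"><customer>Acme</customer><total>19.95</total></order>'

def store_shredded(conn, text):
    """'Shred': break the document into values and store each in its own column."""
    root = ET.fromstring(text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders_shredded "
        "(id INTEGER, customer TEXT, total REAL)"
    )
    conn.execute(
        "INSERT INTO orders_shredded VALUES (?, ?, ?)",
        (int(root.get("id")), root.findtext("customer"), float(root.findtext("total"))),
    )

def store_crammed(conn, text):
    """'Cram': store the entire document, untouched, in a single text field."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders_crammed (doc TEXT)")
    conn.execute("INSERT INTO orders_crammed VALUES (?)", (text,))
```

The shredded form can be queried with ordinary SQL on individual columns; the crammed form preserves the document exactly but is opaque to SQL unless the application parses it again.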

For example, Cengent Therapeutics in San Diego uses Oracle's XML DB function to share protein structure data stored in Oracle with data stored in Cengent's own XML database, says Kal Ramnarayan, Cengent's vice president and chief scientific officer. In this process, XML documents are created from SQL data, a critical capability that allows existing data to be stored and queried from new XML databases.
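Going the other direction, from relational rows to XML documents, can be sketched generically as below. This does not use Oracle's XML DB; it is a plain Python illustration against SQLite, and the protein table, its columns and the output tag names are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(conn):
    """Build one XML document from the rows of a hypothetical protein table."""
    root = ET.Element("proteins")
    query = "SELECT id, name, chain_count FROM protein ORDER BY id"
    for protein_id, name, chain_count in conn.execute(query):
        # Each row becomes one <protein> element; columns become child tags.
        protein = ET.SubElement(root, "protein", id=str(protein_id))
        ET.SubElement(protein, "name").text = name
        ET.SubElement(protein, "chain_count").text = str(chain_count)
    return ET.tostring(root, encoding="unicode")
```

The resulting document can then be loaded into an XML store or handed to any application that understands the tags.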

But XML enablement wouldn't work for Jerry Lettow, an IT architect at Hewlett-Packard Co. who is building an XML-based intranet portal to share sensitive financial information with thousands of key managers. Lettow needs to be able to review transactions to prove, for example, exactly what information was provided to the portal by a financial application.

Using Oracle, Lettow says he would have had to "go in and query to find out what special unique identifier" had been assigned to store that information within Oracle's columns and rows. He also worries that Oracle will stick to its vendor-specific XML query language instead of using the World Wide Web Consortium's XQuery standard.

The Third Alternative

Lettow instead chose a third path: He used a third-party tool to link his relational and XML worlds. With Redwood City, Calif.-based Ipedo Inc.'s XML Information Hub, he says, it's easier to find such information because the XML documents are stored whole. Several vendors offer similar tools to translate and manage data from multiple sources into XML documents.

Eventually, Schmelzer says, all major database vendors will offer native XML support. But to be truly native, the database must be able to take any arbitrary XML document and insert it into the data store without any modification and then retrieve that document without shredding, cramming or otherwise modifying it, he says. "Can it handle arbitrary variables in the XML documents, like lots of repeated tags and lots of levels of hierarchy?" Schmelzer asks. Another key feature is support for still-emerging standards such as XQuery.

Collins also warns not to underestimate the work involved in implementing XML. While his use of XML will save him money in the long run, he says schema design can be "a lot of work, and the more content you have, the more work it is."

Building the schema right the first time can also be important, says Ipedo's vice president of marketing and co-founder, Tim Mathews. "If you're using an XML-enabled relational database, to change your schema you have to dump and reload all of your data," he says, an unacceptable option for production databases that must be constantly available.

Before choosing an XML data store, customers must dig beneath the covers to understand exactly how the XML documents are stored and how they can be queried, says Lettow. "Does it store it as you uploaded it, or does it break it down into separate little instances [of data]? Can I get out exactly what I put in, or do I have to create a lot of extra code or a lot of special processes to re-create the original XML?"
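Lettow's question, whether you get out exactly what you put in, can be turned into a simple round-trip check. The sketch below is a toy illustration under stated assumptions: SQLite stands in for the data store, the order document is invented, and the shredding routine knows about only two of the document's three child elements.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document with an element (<note>) the shredder does not know about.
ORIGINAL = (
    '<order id="7"><total>19.95</total><customer>Acme</customer>'
    "<note>rush</note></order>"
)

def round_trip_whole(text):
    """Store the document whole and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (doc TEXT)")
    conn.execute("INSERT INTO docs VALUES (?)", (text,))
    stored = conn.execute("SELECT doc FROM docs").fetchone()[0]
    conn.close()
    return stored

def round_trip_shredded(text):
    """Shred known fields into columns, then rebuild a document from them."""
    root = ET.fromstring(text)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT, customer TEXT, total TEXT)")
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?)",
        (root.get("id"), root.findtext("customer"), root.findtext("total")),
    )
    order_id, customer, total = conn.execute(
        "SELECT id, customer, total FROM orders"
    ).fetchone()
    conn.close()
    # Rebuild in the column order; the original element order and <note> are gone.
    rebuilt = ET.Element("order", id=order_id)
    ET.SubElement(rebuilt, "customer").text = customer
    ET.SubElement(rebuilt, "total").text = total
    return ET.tostring(rebuilt, encoding="unicode")
```

Comparing each result with the original string shows the trade-off: the whole-document path returns an identical copy, while the rebuilt document differs in element order and has lost the element that had no column.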

But the effort is worth it, he says. "After playing with an XML database for the last year and seeing how powerful it is, I'm surprised that more people aren't plugging into just when and how to use them."

Scheier is a Computerworld contributing writer. Contact him at rscheier@charter.net.

Figure: Native XML Databases vs. XML-Enabled Databases