Find a home for your XML data

CUSTOMERS SAY THEY want them, vendors are scrambling to provide them, and opinions vary as to how to set them up correctly. They are XML databases, a way to store, search, and retrieve all that mission-critical business data that is finding expression in XML format. Currently, XML rivals HTTP, HTML, and SQL as one of the big hits on the top 10 chart of information management standards.

But XML's strength, its ability to move semistructured data among applications and heterogeneous systems, also introduces several new problems. One of the more pressing is how to store and manage XML data.

"There are really three ways you can do this," says John Matranga, CTO of Omicron Consulting, in Philadelphia. "You can store XML in a database designed specifically for XML, in a modified object database, or in a relational database."

Matranga goes on to say that because the relational database is still the undisputed king, most people will probably choose this option. "But if the relational database does not have XML extensions, you will need to 'teach' it how to handle all the hierarchies associated with an XML document," he says.
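To make Matranga's point concrete, here is a minimal sketch of that "teaching," often called shredding. Each XML element becomes a row that records its parent's id, so the hierarchy survives in flat tables. The sketch assumes SQLite as a stand-in for a production relational database. The purchase order, the table names (`node`, `attr`), and their columns are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical purchase order; element and attribute names are illustrative only.
PO_XML = """<order id="PO-1001">
  <customer>Acme Electric</customer>
  <line sku="LB-60" qty="200"/>
  <line sku="SW-15" qty="40"/>
</order>"""

def shred(conn, xml_text):
    """Store every element as one row of a generic (hypothetical) node table.

    Each row keeps its parent's id, so the document tree can be rebuilt
    later by following parent links.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        " id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attr (node INTEGER, name TEXT, value TEXT)")

    def walk(elem, parent_id):
        # Whitespace-only text (indentation) is stored as NULL.
        text = (elem.text or "").strip() or None
        cur = conn.execute(
            "INSERT INTO node (parent, tag, text) VALUES (?, ?, ?)",
            (parent_id, elem.tag, text))
        node_id = cur.lastrowid
        # Attributes go to a side table keyed by the owning element's row.
        for name, value in elem.attrib.items():
            conn.execute("INSERT INTO attr VALUES (?, ?, ?)",
                         (node_id, name, value))
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)
    conn.commit()

conn = sqlite3.connect(":memory:")
shred(conn, PO_XML)
for row in conn.execute("SELECT id, parent, tag, text FROM node ORDER BY id"):
    print(row)
```

Reassembling a document, or asking a question that spans several levels of the tree, then means joining these rows back together. That is the overhead that XML extensions and purpose-built XML databases aim to reduce.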

Getting started with XML

Although concerns about storage are real and growing, most users are still wondering how to get started with XML.

A wealth of XML platforms have been designed to help in the transition to XML-based communications. The two most prominent are IBM's WebSphere and Microsoft's BizTalk Server. "BizTalk is somewhat limited to the Microsoft platform," says Matranga. "But it can talk to legacy systems, mainframes, and Unix servers."

This is the key to migrating to XML: Put something in place that supports your existing business platform while easing translation to and from XML. General Electric's Global eXchange Services (GXS) Division in Gaithersburg, Md., makes this one of its specialties. GXS is a spin-off of General Electric Information Services (GEIS), one of the larger providers of e-business services based on EDI (electronic data interchange) and VANs (value-added networks). "We deal with the challenges of integrating the old VAN/EDI systems with the new Internet/XML architectures every day," says Nick Marchetti, enterprise application integration evangelist at GXS.

Marchetti says his division offers an integration broker to bridge this gap. "This allows you to preserve your legacy back-end systems along with their existing interfaces," he says. "At the same time you can start installing and testing XML."
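The core idea behind such a broker, accepting a legacy EDI message on one side and emitting XML on the other, can be sketched in a few lines. This is not the GXS product or its API. The input is a simplified X12-style fragment, with segments ending in `~` and elements separated by `*`. The field mapping and all XML tag names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified X12-style purchase-order fragment (illustrative data only).
EDI_PO = "BEG*00*SA*PO-1001**20010115~PO1*1*200*EA***VP*LB-60~PO1*2*40*EA***VP*SW-15~"

# Hypothetical mapping: segment ID -> {element position: XML tag name}.
FIELD_NAMES = {
    "BEG": {3: "number", 5: "date"},
    "PO1": {1: "line", 2: "quantity", 3: "unit", 7: "sku"},
}

def edi_to_xml(edi):
    """Translate the EDI fragment into a (hypothetical) XML purchase order."""
    root = ET.Element("purchaseOrder")
    # filter(None, ...) drops the empty string after the final "~".
    for segment in filter(None, edi.split("~")):
        seg_id, *elements = segment.split("*")
        mapping = FIELD_NAMES.get(seg_id)
        if mapping is None:
            continue  # segment types this sketch does not know are skipped
        # Header fields attach to the root; each PO1 line becomes an <item>.
        target = root if seg_id == "BEG" else ET.SubElement(root, "item")
        for pos, value in enumerate(elements, start=1):
            if pos in mapping and value:
                ET.SubElement(target, mapping[pos]).text = value
    return ET.tostring(root, encoding="unicode")

print(edi_to_xml(EDI_PO))
```

Because the legacy side still sends and receives the same EDI text, the back-end systems need no changes. The XML side can be added and tested in parallel, which is the coexistence Marchetti describes.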

The GXS message broker is just one of many tools in this area, but it does include the basic features most prospective buyers should look for: support for EDI, adapters for the most prominent enterprise applications such as SAP and PeopleSoft, and the capability to translate between the most popular vocabularies of XML.
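Translating between XML vocabularies usually comes down to mapping one set of element and attribute names, and one nesting structure, onto another. The sketch below hand-codes such a mapping for two hypothetical purchase-order vocabularies. A production tool would more likely use a declarative mapping such as an XSLT stylesheet, so treat this only as an illustration of the idea.

```python
import xml.etree.ElementTree as ET

# Vocabulary A (hypothetical): attributes carry the data, and line items
# are nested under <items>.
VOCAB_A = """<po number="PO-1001">
  <items>
    <item sku="LB-60" qty="200"/>
    <item sku="SW-15" qty="40"/>
  </items>
</po>"""

def a_to_b(xml_a):
    """Translate vocabulary A into vocabulary B (both hypothetical).

    Vocabulary B uses child elements instead of attributes and lists
    <OrderLine> elements directly under <Order>.
    """
    src = ET.fromstring(xml_a)
    dst = ET.Element("Order")
    ET.SubElement(dst, "OrderID").text = src.get("number")
    for item in src.iterfind("items/item"):
        line = ET.SubElement(dst, "OrderLine")
        ET.SubElement(line, "PartNumber").text = item.get("sku")
        ET.SubElement(line, "Quantity").text = item.get("qty")
    return ET.tostring(dst, encoding="unicode")

print(a_to_b(VOCAB_A))
```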

A few of the other vendors with product offerings include Metiom in New York, eXcelon in Burlington, Mass., Software AG in Reston, Va., Enigma in Burlington, Mass., VelociGen in San Diego, Extricity in Belmont, Calif., and Autonomy in San Francisco.

On the storage side, Microsoft, Oracle, and IBM have already added XML extensions to their relational databases, but for a number of reasons these efforts will not satisfy everyone.

"I think most people will want to stick with their relational technology when it comes to XML storage," says Josh Walker, an analyst at Forrester Research, in Cambridge, Mass. "But XML has breathed some new life into the niche market of specialty databases."

A second chance for object databases?

If you are over the age of 25, you may recall that only a few years ago the object database was hailed as the next big thing in data storage.

But outside of some very specialized markets -- high-end science research applications, for example -- the object database never really caught on.

The reason, according to Deborah Hess, senior analyst at Gartner in Stamford, Conn., was complexity. "An object database requires you to learn a whole new language," Hess says. "That is one of the reasons the object database market started to die in late 1995."

It is also why Object Design, an object database vendor in Burlington, Mass., changed its name in 1999 to eXcelon. This was actually the culmination of a project begun in 1997 by then-CTO Larry Alston to make Object Store, the company's database, into a repository for XML documents.

Alston has since left the company, but the commitment to XML remains strong. "The relational database design does not easily support indexing or searching XML," says Satish Maripuri, president and COO of eXcelon. "An object database such as ours offers a more natural way to store, search, and retrieve XML data. This is why we took a bet with XML."

Hess agrees: "An object database stores data in hierarchical form; this enables it to handle all the classes and inheritance properties of objects. Now XML documents are themselves hierarchical, so the two fit very well together."

And Maripuri says the complexity issue is no longer the barrier to entry it once was. "We have built a graphical XML interface into the product," he explains. "You don't need complex development expertise to use it."

Analysts give eXcelon high marks for what it has been able to accomplish both in simplifying its product and in adapting it to XML, but few are willing to predict the bet will pay off.

"I like their technology," Forrester's Walker says. "But I really see it as something that will be built into a larger infrastructure, and I am cautious about their long-term prospects."

Martin Marshall, managing director at Zona Research, in Redwood City, Calif., doesn't think the company will be able to make it alone. "I think they are an acquisition target," he says.

No longer an academic issue

Meanwhile, folks such as John Conte, director of IT at Wesco Distribution in Pittsburgh, will tell you that analysts aren't the only ones debating these questions.

"We distribute electrical products -- things like lightbulbs and switches -- primarily to construction firms," Conte says. "About a year ago our small- to medium-sized customers started asking IS to provide XML integration."

The reasons were the by-now-familiar advantages of XML: It is more flexible than EDI, and it is cheaper because it can travel over the Internet, bypassing expensive private VANs.

So Wesco hired Keane, a services company in Boston, to help implement an XML-based system to process orders.

Conte says the project has been a success. "But," he explains, "there is one thing that gives us pain on a daily basis. When we started this project, there were not many database tools for storing XML, so we tried to create a standard relational database schema that would support all the various flavors and formats of XML we could anticipate."

The combinations of different purchase order formats and XML vocabularies, however, proved impossible to anticipate. "It means that we frequently need to change our schema," Conte says. And this, as any DBA (database administrator) will tell you, is something you want to do as seldom as possible.

So Conte is very interested in finding a better way to store XML. "Six months ago it wasn't such a big deal, but now it is obvious we need a solution," he says.

Wesco's database is Microsoft SQL Server, but Conte says he is open to any number of new solutions. "From what I have seen, I think the XML extensions Microsoft and Oracle are adding will be enough to do what we need to do. We could switch to Oracle if it makes sense, but we are also interested in some of the new XML database products."

The usual suspects

As noted, the big boys of data storage have not been sitting on their hands. Oracle, IBM, and Microsoft have added XML extensions to their relational offerings.

"The object database folks thought XML would give them a new lease on life," says John Magee, senior director of product marketing at Oracle, in Redwood City, Calif. "But it hasn't panned out."

Magee says Oracle currently offers XML support in its 8i release. "We are going to offer additional support for XML as a data type in the 9i release due out in the next few months," he adds. The 9i release will also add SQL operators designed specifically for querying XML.

At IBM, XML support is part of the WebSphere e-commerce infrastructure, as is the DB2 database. "We have an XML extender for DB2 that allows you to query the database directly for rich XML content," says Scott Hebner, director of marketing for WebSphere.

WebSphere's primary competition is BizTalk Server from Microsoft. And SQL Server with XML extensions is an important component of the BizTalk architecture.

"We didn't have any XML functions built into SQL Server 7," says Jeff Ressler, lead product manager for SQL Server at Microsoft. "But with the SQL Server 2000 release last September, you can load XML documents directly into the database and retrieve them with a simple 'Select' statement."

European invasion?

This kind of action from the heavies might be enough to scare even the most optimistic of newcomers, but another database giant is getting into the act.

Not exactly a household name in the United States, Software AG of Germany owns a substantial share of the global database market with its Adabase product. And now the company, with U.S. headquarters in Reston, Va., thinks it has an edge in the XML storage space.

"We released Tamino in September of 1999," says John Taylor, director of product marketing at Software AG. "Tamino is not a relational database, nor is it an object database modified for XML. It is, rather, a database built from the ground up specifically for XML."

The interface for Tamino is HTTP, and Taylor says his company is working with the World Wide Web Consortium (W3C) to develop the next XML query language.

"The issue of query and retrieval is key," Taylor says. "You can use extensions to SQL for this, but to do that you need to break the XML hierarchy into a set of relational tables. This means queries will necessarily contain a complex set of join statements. With XPath, our query language, we can replace all that with one line."
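Taylor's contrast between a path query and a stack of joins can be made concrete. The sketch below is not Tamino. It uses Python's ElementTree, which implements only a subset of XPath, and SQLite as a stand-in relational store. The tax-return document, the `node` table, and its columns are hypothetical. For brevity, the table records only the `name` attribute.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical tax-return document used only for illustration.
RETURN_XML = """<return>
  <schedule name="A">
    <item code="101">1200</item>
    <item code="102">300</item>
  </schedule>
  <schedule name="B">
    <item code="101">450</item>
  </schedule>
</return>"""

# Path-style query: one expression walks the hierarchy, using an
# attribute predicate of the form [@name='A'].
root = ET.fromstring(RETURN_XML)
xpath_result = [i.text for i in root.findall("./schedule[@name='A']/item")]

# Relational version: shred into a generic node table (hypothetical schema),
# then rebuild the same path with one self-join per level of the hierarchy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER,"
             " tag TEXT, name TEXT, text TEXT)")

def load(elem, parent_id):
    text = (elem.text or "").strip() or None
    cur = conn.execute(
        "INSERT INTO node (parent, tag, name, text) VALUES (?, ?, ?, ?)",
        (parent_id, elem.tag, elem.get("name"), text))
    for child in elem:
        load(child, cur.lastrowid)

load(root, None)
sql_result = [row[0] for row in conn.execute("""
    SELECT i.text
    FROM node r
    JOIN node s ON s.parent = r.id AND s.tag = 'schedule' AND s.name = 'A'
    JOIN node i ON i.parent = s.id AND i.tag = 'item'
    WHERE r.tag = 'return' AND r.parent IS NULL
    ORDER BY i.id""")]

print(xpath_result)  # same values from both approaches
print(sql_result)
```

Each additional level in the document adds another self-join to the SQL. The path expression grows by only one step.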

Taylor says that more than 280 customers are currently using Tamino. One of these is the California Board of Equalization in Sacramento, Calif. The board collects about $37 billion in taxes (primarily sales tax) for California.

"We started looking at XML to facilitate the electronic filing of taxes," says Larry Hanson, data architect for the board. "Before long we also realized XML would be the best way to store tax returns, tax schedules, and tax-related messages."

The board was already a big Adabase shop with ties to Software AG, a fact which helped drive the adoption of Tamino.

Hanson says the product works as advertised. "With the query language, XPath, you can retrieve all of the information in a given XML document. There is a bit of a learning curve: XPath is similar to SQL but you need to be more aware of the structured hierarchical nature of a given XML document."

Taylor says that Software AG has no plans to challenge the relational vendors on their own turf. "We are not going to make the mistake that the object database vendors did in the mid-'90s," Taylor says. "We know that transactions will still be relational, and we know we are in a niche market. We just think it is a very big niche."

He could be right. Until recently most of the vendor action around XML was focused on using it as a transport mechanism. And users, such as Wesco's Conte, were also concerned mainly with XML connections.

But the issue of storage is rapidly moving to the forefront. "At some point," Conte says, "any customer using XML in significant volume will need to store the documents."

This story, "Find a home for your XML data," was originally published by InfoWorld.

Copyright © 2001 IDG Communications, Inc.
