The Threat of XML

Many see XML as a miraculous way to integrate the Web and back-end data. But few realize how powerful a force they're letting through the firewall and how big the risk is from hackers who can write hostile code disguised as HTML.

Just when you thought the uncontrolled forces of the Web were finally getting manageable, along comes multidimensional data. We're talking XML, which unlocks data from many sources for many destinations as no markup language has done before.

But this new way of handling data also opens up new security vulnerabilities. Already, IT managers are bracing for a new onslaught of malicious code, data hijacking, viruses, graffiti, defacements and buffer overflows.

XML is spreading to back-office systems, business exchanges and wireless applications. In the next two years, XML will be used on more than 50% of Web sites, according to some researchers.

Even two years ago, companies like Marriott International Inc. had begun making their back-office applications more extensible through XML. And progressive businesses like ETrade Group Inc. and Alaska Airlines are now announcing wireless trading and reservations through XML-based systems built by companies like Everypath Inc., a mobile application framework vendor in San Jose.

Unlike HTML, XML can link an unlimited combination of data types by tagging them with a standard, machine-readable language to define each piece of data and determine what it does.

For example, XML can be used to dynamically link inventory data stored in an arcane format in a back-end database with specific spreadsheet columns that allow customers and partners to slice and dice numbers in real time.

Developers can use XML to create interactive Web sites by dynamically linking the data stored in their systems or from anywhere in the public domain.

XML is the basis for an emerging consumer privacy framework called Platform for Privacy Preferences, introduced by Microsoft Corp. and several small vendors this year. And XML shows promise of finally making public-key infrastructures and digital signatures interoperable.

But XML has a dark side. The powerful capabilities of these data sets and dynamic links open up a whole new can of security worms because the code defined by XML tags can carry virtually any payload through the firewall unchecked.

Simply put, firewalls and filters trust that the XML tags are honest descriptors of the code they define, so malicious XML code could get a free ride into almost any organization.
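One way to picture the alternative to trusting tags is to check what each element actually contains against what its name claims. The following is only a minimal sketch in Python: the `order`, `sku` and `quantity` element names, the `EXPECTED` table and the `check_order` function are all hypothetical and do not come from any product or standard mentioned in this article.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules: element name -> pattern its text content must match.
EXPECTED = {
    "sku": re.compile(r"[A-Z0-9-]{1,20}"),
    "quantity": re.compile(r"[0-9]{1,6}"),
}


def check_order(xml_text):
    """Return a list of problems found in the children of the root element."""
    problems = []
    root = ET.fromstring(xml_text)
    for child in root:
        pattern = EXPECTED.get(child.tag)
        if pattern is None:
            # The tag is not one we agreed to accept at all.
            problems.append("unexpected element <%s>" % child.tag)
        elif len(child) > 0 or not pattern.fullmatch(child.text or ""):
            # The tag is known, but its payload is not what the tag claims.
            problems.append("<%s> content does not match its declared type" % child.tag)
    return problems


if __name__ == "__main__":
    print(check_order("<order><sku>AB-12</sku><quantity>3</quantity></order>"))
    print(check_order("<order><quantity>3; DROP TABLE stock</quantity></order>"))
```

The point of the sketch is that the receiving application, not the firewall, decides whether a `quantity` really is a number. A real deployment would more likely lean on a schema language than on hand-written patterns.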

Too Much Trust?

The World Wide Web Consortium (W3C), whose members are mostly technology and telecommunications vendors, denies any suggestion that XML opens up new security problems. "XML is just a markup . . . used to convey information and build applications," says Joseph Reagle, a policy analyst at the W3C.

But as with other languages that support executable code, the problem is what developers do with XML. "How you convey information and build applications will, of course, have security concerns," says Reagle.

It's this model of trusting developers to do the right thing with XML that worries IT professionals.

"Trust is the darned key to all of this," says Perry Luzwick, director of information assurance architecture at Herndon, Va.-based Logicon Inc., an IT company owned by Los Angeles-based Northrop Grumman Corp. "There's no control of the input in an open XML environment unless you could somehow check wrappers [tags], but that's cumbersome. . . . There's no way to say that metadata in the tags represents what it says it does."

It's too early to tell how widespread XML-enabled exploits will be in the next few years. So far, exploits are rare because there's no XML on the client end yet, says Ryan Russell, incident analyst at security intelligence firm SecurityFocus Inc. in San Mateo, Calif. But Internet Explorer has a heavy XML feature set in V6.0, to be released later this year.

Guillermo Payet, chief technology officer at Ocean Group, an Internet engineering firm in Santa Cruz, Calif., says the first wave of XML attacks will resemble malicious code attacks conducted in HTML, more than 40 of which are listed on the advisory pages of the Pittsburgh-based CERT Coordination Center. "Just as there are a bunch of browser exploits that use malformed HTML and Java to crash your browser or take control of your machine, we'll probably see the same types of attacks aimed at XML parsers . . . and the applications using the parsed data," says Payet.
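An application that hands outside documents to an XML parser can at least refuse the obviously hostile ones before acting on the parsed data. Here is a rough sketch in Python, assuming the standard library's `xml.etree.ElementTree` parser. The `MAX_BYTES` and `MAX_DEPTH` limits and the `parse_untrusted` function are illustrative assumptions, not recommended values or an established API.

```python
import io
import xml.etree.ElementTree as ET

MAX_BYTES = 64 * 1024  # hypothetical cap on document size
MAX_DEPTH = 32         # hypothetical cap on element nesting


def parse_untrusted(data):
    """Parse XML bytes, rejecting oversized, over-nested or malformed input."""
    if len(data) > MAX_BYTES:
        raise ValueError("document too large")
    depth = 0
    root = None
    try:
        # iterparse reports elements as they open and close, so the nesting
        # depth can be tracked while the document is being read.
        for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
            if event == "start":
                if root is None:
                    root = elem
                depth += 1
                if depth > MAX_DEPTH:
                    raise ValueError("document nested too deeply")
            else:
                depth -= 1
    except ET.ParseError as exc:
        # Malformed markup is reported to the caller instead of crashing it.
        raise ValueError("malformed XML: %s" % exc) from exc
    return root
```

Checks like these only cover the parser itself. The applications consuming the parsed data still need to validate what they receive.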

Text-based attacks will also re-emerge, predicts Dan Moniz, a research scientist at peer-to-peer application developer OpenCola Ltd. in Toronto.

A text-based attack is accomplished by inserting crafted data streams (symbols, numbers and characters) anywhere an application accepts input, including buffers and Web addresses. Until XML, text-based attacks were successfully filtered. But the XML framework introduces a more complex character set, Unicode, to facilitate more complex data typing. Unicode as commonly implemented uses 16-bit character codes, compared with ASCII's seven bits (usually stored in an eight-bit byte).

In May, the first Unicode text-string exploit (against Microsoft's Internet Information Server) was posted on CERT's advisory pages (Vulnerability Note VU#111677).

"In Unicode, there are an infinite number of ways to say something. So programs that block bad code can't work with Unicode, because they can't think of all the ways the bad code could be written," says Bruce Schneier. In July of last year, Schneier, founder and chief technology officer of Counterpane Internet Security Inc. in Cupertino, Calif., published a white paper predicting an onslaught of text-based attacks exploiting the Unicode character sets. "Unicode is just too complex to ever be secure," he adds.
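Schneier's point about multiple spellings can be illustrated with a short sketch in Python. It assumes two well-known alternate forms: the overlong UTF-8 byte pair C0 AF, which some lenient decoders have treated as "/", and the fullwidth characters U+FF1C and U+FF1E, which look like "<" and ">". The `BLOCKED` list and the `naive_is_safe` and `is_safe` functions are hypothetical names chosen for illustration. The safer approach shown is to reduce input to one canonical form before filtering, which narrows the problem but is not a claim that it solves it.

```python
import unicodedata

# Hypothetical blocklist of substrings a filter might look for.
BLOCKED = ("../", "<script")


def naive_is_safe(raw):
    """Byte-level filter: only catches the plain ASCII spelling."""
    return not any(p.encode("ascii") in raw.lower() for p in BLOCKED)


def is_safe(raw):
    """Decode strictly and normalize to one form before filtering."""
    try:
        # Python's strict UTF-8 decoder rejects overlong forms such as C0 AF.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # NFKC folds compatibility characters (e.g. fullwidth "<") to ASCII.
    canonical = unicodedata.normalize("NFKC", text).lower()
    return not any(p in canonical for p in BLOCKED)


if __name__ == "__main__":
    overlong = b"..\xc0\xafwinnt"  # "../winnt" with an overlong "/"
    fullwidth = "\uff1cscript\uff1e".encode("utf-8")
    for sample in (b"reports/q1.xml", overlong, fullwidth):
        print(sample, naive_is_safe(sample), is_safe(sample))
```

The naive filter passes both disguised samples because it compares raw bytes. The second filter rejects them, but only for the alternate spellings its decoder and normalization form happen to cover.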

Indeed, protecting against any new XML-based attacks won't be easy because there are no checks to verify such complex data streams being pushed or pulled into business networks.

Don't count on filtering to help. Firewalls won't check XML-embedded data. And XML-encoded attack signatures won't show up in audit logs, says Dark Tangent, a white-hat hacker and organizer of the annual Def Con security conference for hackers in Las Vegas.

Safety in Standards

About the only thing IT professionals can do at this early stage is minimize their own development risks. The best bet is to carefully follow XML development standards and protocols coming from the Internet Engineering Task Force, the W3C, vertical industry groups and vendor-developed frameworks like Everypath's, advises Peter Lindstrom, a security analyst at Hurwitz Group Inc. in Framingham, Mass.

And remember, you're not the only one trying to make sense of the XML paradigm. Even those in the know, like John Goeller, director of electronic trading at Credit Suisse First Boston in New York and chairman of a financial services XML working group, are struggling with more than a dozen XML protocols to come up with a universal standard suitable for financial trading applications.

Growing pains like these are common with all emerging technologies, says Dark Tangent. There's no way to know how the exploits will hit or when because programs support XML differently than they do HTML, he says. "It will take time for XML developers to get XML integrated correctly," he says.


Special Report

Security Risk and Reward

Copyright © 2001 IDG Communications, Inc.
