Reprinted with Permission by Quest Software May 2007


Understanding XML
by Robert Catterall

This article was originally printed in DB2 Magazine.

Your company is probably already using it. It’s time to figure out how to manage it.

XML Origins

Back in the early 1980s, when I worked as a systems engineer in an IBM branch office, I didn’t use a word processor to create letters, reports, and other items of business documentation (the personal computer was quite new at the time); instead, I used an IBM product called the Document Composition Facility (DCF), which ran on a mainframe server under the Virtual Machine (VM) operating system. More specifically, I used a text formatter, called SCRIPT, that was a component of DCF. Sitting at a “dumb terminal” (the term “thin client” was not yet in vogue), I would enter the text of my document and insert “tags” into the file to control the appearance of the printed document. For example, I’d indicate the start of a paragraph with this tag (the tags were preceded by a colon and followed by a period):

:p.The IBM 3350 disk drive is a high-performance…

If I wanted to boldface a word, I would use the :hp2. tag (the :ehp2. tag denotes the point at which the use of the bold typeface is to stop):

:p.The 3033 computer offers a whopping :hp2.16 megabytes:ehp2. of central storage…

These tags, and others used to format SCRIPT files, make up what is known as the Generalized Markup Language (GML).
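The sketch below shows how this kind of tag-based markup works. It is a minimal Python example that translates the three SCRIPT/GML tags shown above into rough HTML equivalents: :p. becomes <p>, and :hp2. and :ehp2. become <b> and </b>. The mapping table and the gml_to_html function are hypothetical and cover only these tags. Real GML had a far larger tag set.

```python
import re

# Hypothetical mapping from the SCRIPT/GML tags shown above to rough HTML
# equivalents; real GML had many more tags than this.
GML_TO_HTML = {
    "p": "<p>",      # start of paragraph
    "hp2": "<b>",    # start boldface (highlighted phrase, level 2)
    "ehp2": "</b>",  # end boldface
}

# A GML tag is a colon, a tag name, and a terminating period, e.g. ":hp2."
GML_TAG = re.compile(r":([a-z0-9]+)\.")


def gml_to_html(text: str) -> str:
    """Replace each known GML tag with its HTML counterpart."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Leave unknown tags untouched rather than guessing at them.
        return GML_TO_HTML.get(name, match.group(0))

    return GML_TAG.sub(replace, text)


if __name__ == "__main__":
    source = ":p.The 3033 computer offers a whopping :hp2.16 megabytes:ehp2. of central storage"
    print(gml_to_html(source))
```

The sample text has no closing paragraph tag, so the output has none either. HTML permits the closing </p> to be omitted.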

GML was originally developed by IBM in the 1960s. In the mid-1980s, a GML descendant called SGML (Standard Generalized Markup Language) made the scene. SGML is a metalanguage. Just as metadata defines data, the SGML metalanguage can be used to define document markup languages. This means that SGML is extensible; however, SGML is also complex, and, as a result, it tends to be used in specialized circumstances. [Wikipedia entries for GML and SGML helped jog my memory about the relationship between SCRIPT and DCF, and the relationship between GML and SGML.]

In the early 1990s, HTML (HyperText Markup Language), which is based on SGML tagging, was developed to control the appearance of content displayed via one of those newfangled things called a Web browser.

HTML is great for making data look good when displayed on a Web page, but it doesn’t confer meaning on data. You might dispute that contention because you can look at an HTML-formatted price list (for example) on a Web page and tell that it’s a price list. But that’s because your brain understands written language and visual cues, and you have seen jillions of price lists in your lifetime. Your brain is a lot more powerful than most computers (in some ways, more powerful than any computer), so what works for you and me is not such a good solution when a machine has to interpret data in a document (or file). In an era of rapidly growing data interchange between organizations running all manner of computer systems, what was needed was a data-defining markup language flexible enough to adapt to new requirements yet simple enough to be used for general-purpose applications.

That language showed up in 1998. It’s called Extensible Markup Language, but it’s better known as XML.
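The difference between formatting data and defining it is easy to see in code. The sketch below uses a hypothetical XML price list, with element and attribute names (priceList, item, sku, description, price) invented for illustration. It extracts prices with Python's standard xml.etree.ElementTree module. The program locates each price by its name. Given the same data as an HTML table, a program would instead have to assume that, say, the third cell of each row holds the price.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# A hypothetical price list; every element and attribute name here is invented.
PRICE_LIST_XML = """<priceList currency="USD">
  <item sku="A-100">
    <description>Widget</description>
    <price>4.95</price>
  </item>
  <item sku="B-200">
    <description>Gadget</description>
    <price>12.50</price>
  </item>
</priceList>"""


def prices_by_sku(xml_text: str) -> dict[str, float]:
    """Return {sku: price}, locating data by element/attribute name, not by layout."""
    root = ET.fromstring(xml_text)
    return {
        item.get("sku"): float(item.findtext("price"))
        for item in root.iter("item")
    }


if __name__ == "__main__":
    print(prices_by_sku(PRICE_LIST_XML))
```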

Why XML is Important

Of course, XML didn’t make it possible for organizations to exchange data electronically — that had already been going on for several decades. But XML was a catalyst that greatly expanded the world of electronic inter-system data exchange. How so? Consider the benefits delivered by XML:

Basically, it’s all about speed. To succeed in business (or even in nonprofit activities), organizations have to be able to respond quickly and effectively to shifting market dynamics and changing customer needs. Corporations need to be able to forge working partnerships with other companies in short order to address new opportunities. People need to be able to rapidly solve business problems without being slowed by complex rules around the exchange of data with third parties. Organizations that leverage XML gain agility in return.

Compare and Contrast

I recently had the opportunity to talk XML with Chris Eaton, a senior product manager at IBM’s Toronto Lab (home of DB2 for Linux, Unix, and Windows). Chris shared an example that really impressed upon me the value of XML as a data-describing language.

FIX (Financial Information eXchange) is a protocol used by financial services organizations for the exchange of data pertaining to securities transactions (such as stock trades). Listing 1 shows a trade involving the purchase of 1,000 shares of IBM stock in the FIX protocol.

Listing 2 shows the same information in the FIXML protocol, a recently developed XML representation of FIX. If you were an application programmer, which format would you rather work with?

Note that although XML is certainly more intuitive than many legacy data-exchange protocols, it can also result in larger message sizes. Some years ago, that might have been a problem; indeed, a number of data-exchange protocols were likely designed to minimize the size of a data transmission (in order to optimize throughput and make efficient use of costly network, server, and disk storage systems). Today, network bandwidth, processor speeds, and disk subsystem capacities are much greater (and cheaper) than they were when I got started in IT, and it seems that many companies are more than willing to trade somewhat greater IT infrastructure resource consumption for significantly improved organizational productivity.
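The following Python sketch illustrates both the readability contrast and the size trade-off, using a hypothetical order to buy 1,000 shares of IBM. FIX encodes each field as tag=value, with fields separated by the ASCII SOH character. Tags 35 (MsgType), 55 (Symbol), 54 (Side), and 38 (OrderQty) are real FIX tags, but this message omits the standard header and checksum fields. The XML it produces uses a generic Message element with one child per field. It is an illustration only, not the actual FIXML schema, which defines its own abbreviated element and attribute names.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SOH = "\x01"  # FIX separates tag=value fields with the ASCII SOH character

# A small subset of real FIX tag numbers and their field names.
FIX_FIELD_NAMES = {
    "35": "MsgType",   # "D" = New Order - Single
    "55": "Symbol",
    "54": "Side",      # "1" = Buy
    "38": "OrderQty",
}


def parse_fix(message: str) -> list[tuple[str, str]]:
    """Split a FIX tag=value message into (tag, value) pairs, keeping order."""
    pairs = []
    for field in message.split(SOH):
        if field:  # a trailing SOH leaves an empty final piece
            tag, _, value = field.partition("=")
            pairs.append((tag, value))
    return pairs


def fix_to_xml(message: str) -> str:
    """Render each FIX field as a named XML element (illustrative, not FIXML)."""
    root = ET.Element("Message")
    for tag, value in parse_fix(message):
        # Fall back to a generic name for tags this sketch does not know.
        child = ET.SubElement(root, FIX_FIELD_NAMES.get(tag, f"Tag{tag}"))
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical order to buy 1,000 shares of IBM (header and checksum omitted).
    fix_msg = SOH.join(["35=D", "55=IBM", "54=1", "38=1000"]) + SOH
    xml_msg = fix_to_xml(fix_msg)
    print(xml_msg)
    print(len(fix_msg), "characters as FIX vs", len(xml_msg), "characters as XML")
```

On this message the XML form is nearly four times as long as the tag=value form (98 characters versus 25). That is the size trade-off described above. In exchange, each value arrives labeled with its meaning.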

Some XML Lingo

Before getting into DB2 9’s XML support, I’ll define a few terms (assisted by the handout for an XML presentation delivered a few years ago by IBM’s Susan Malaika):

DB2 9 XML Support: The Real Deal

With XML being so pervasive, it makes sense that organizations would want to store XML documents in a DBMS, right? Well, the relational DBMS for people who are XML-oriented is DB2 9.

“Why is that?” you might ask. “Other relational DBMSs have provided XML support for a while now. Even DB2 has had XML support for years in the form of the XML Extender.” True enough, but DB2 9 takes XML support to a whole new level.

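Before DB2 9, you basically had two options when faced with the task of storing XML data (which is inherently hierarchical) in a relational DBMS. You could shred the document, decomposing its elements and attributes into columns of ordinary relational tables. Or you could store the whole document intact as a CLOB (character large object).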

The CLOB approach has its own problems, chief among them that searching for particular element and attribute values within XML documents stored as CLOBs requires XML parsing. That parsing drives up CPU consumption and lengthens response time.

Enter DB2 9. Now, for the first time, you can store an XML document as a true XML data type, and it will be stored as an entity in the DB2 database (no shredding needed). But, unlike with the CLOB approach, DB2 has visibility down to the element and attribute level of the document. In other words, the relational DBMS is aware of the hierarchical structure of the XML document. You can build indexes on the XML documents, with keys generated from specified XML path patterns. You can query the data in the XML documents using SQL or the XQuery language. You can take advantage of built-in XML parsing and validation functions.

What’s the payoff? Enhanced ease of use, and improved (sometimes dramatically) query performance.
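The performance point can be illustrated with a rough analogy in Python. In DB2 9 itself, you would declare a column of type XML and index it with a statement of the form CREATE INDEX ... GENERATE KEY USING XMLPATTERN '...' AS SQL VARCHAR(n). The hypothetical ClobStore and NativeXmlStore classes below do not model DB2's actual storage format. They only contrast the two access patterns. The CLOB-style store must parse every document on every search. The native-style store parses each document once, when it is inserted, and keeps an index on one path's values.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import defaultdict


class ClobStore:
    """Analogy for the CLOB approach: documents are kept as opaque text."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.parse_count = 0

    def insert(self, xml_text: str) -> None:
        self.docs.append(xml_text)  # stored as-is; nothing is parsed

    def find(self, path: str, value: str) -> list[int]:
        # Every search must re-parse every stored document.
        hits = []
        for doc_id, text in enumerate(self.docs):
            self.parse_count += 1
            root = ET.fromstring(text)
            if any(e.text == value for e in root.findall(path)):
                hits.append(doc_id)
        return hits


class NativeXmlStore:
    """Analogy for native XML storage: parse once, index one path's values."""

    def __init__(self, indexed_path: str) -> None:
        self.indexed_path = indexed_path
        self.trees: list[ET.Element] = []
        self.index: dict[str, list[int]] = defaultdict(list)
        self.parse_count = 0

    def insert(self, xml_text: str) -> None:
        self.parse_count += 1
        root = ET.fromstring(xml_text)  # parsed once, at insert time
        doc_id = len(self.trees)
        self.trees.append(root)
        for e in root.findall(self.indexed_path):
            if e.text is not None:
                self.index[e.text].append(doc_id)

    def find(self, value: str) -> list[int]:
        # An index lookup; no document is parsed at query time.
        return list(self.index.get(value, []))


if __name__ == "__main__":
    orders = [
        "<Order><Symbol>IBM</Symbol><Qty>1000</Qty></Order>",
        "<Order><Symbol>XYZ</Symbol><Qty>50</Qty></Order>",
        "<Order><Symbol>IBM</Symbol><Qty>200</Qty></Order>",
    ]
    clob, native = ClobStore(), NativeXmlStore("Symbol")
    for doc in orders:
        clob.insert(doc)
        native.insert(doc)
    for _ in range(10):  # ten identical queries
        clob.find("Symbol", "IBM")
        native.find("IBM")
    print("parses:", clob.parse_count, "(CLOB-style) vs", native.parse_count, "(native)")
```

With three documents and ten identical queries, the CLOB-style store performs 30 parses and the native-style store performs 3. Both return the same matching documents.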

Here’s how I think of DB2 9’s XML support compared to what was there before: the old capabilities were like a singer who speaks English and records a song in Japanese without knowing the language. He just sings phonetically, getting the pronunciation right but having no understanding of the meaning of the sounds he’s making. The new technology is like the singer after he’s learned to speak Japanese. He actually understands what he’s singing, and what’s more, because he speaks both English and Japanese, he can respond in Japanese to a question posed in English, or vice versa. The DB2 XML Extender made it possible for DB2 to pronounce XML, if you will. But DB2 9 understands XML.

Your Job: Bridge the Gap

There may well be an XML-related knowledge gap in your organization. You know about DB2 9’s groundbreaking XML-supporting capabilities, but you may not know how your organization is using XML today or planning to use XML in the future. Some of your colleagues who are application developers and architects may know how your organization is using XML, but they don’t know anything about the advanced XML support provided by DB2 9. Do you see where I’m going with this? Talk to some of your senior application development people. Find out what they’re doing with XML, and let them know what DB2 9 can do with XML data. You could end up making their lives easier and their applications faster. They might buy you lunch as a thank-you.

Hey, it could happen.


Robert Catterall is a Director of Engineering at CheckFree Corporation. As part of the company's Technology Strategy and Planning group, Robert works to establish corporate-wide standards for the use of information technology in CheckFree applications and systems. Robert is a past president of the International DB2 Users Group (IDUG), and a member of IDUG's Speakers Hall of Fame. He has a Bachelor's Degree from Rice University and an MBA from Southern Methodist University.