Reprinted with Permission by Quest Software March 2006


What is True Native?

Each of the currently available (non-native) methods for managing XML in relational databases attempts to make XML conform to the relational model in some way. These approaches include:

Shredding. Most major RDBMSs (including DB2) support shredding. Shredding involves defining a relational schema that corresponds to the XML (for example, representing each parent/child relationship in the XML as a child table tied to its parent table by a referential integrity constraint) and defining a mapping from the XML data to that relational schema.

Shredding is a good fit in existing relational environments. However, mapping can be complex and fragile, and you must define a mapping for each XML document you want to store. If the XML schema changes, the mapping may no longer be valid or may require a complex change process. Once decomposed, the data ceases to be XML, loses any digital signature, and becomes difficult and expensive to reconstruct (often requiring many joins).
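To make the trade-off concrete, here is a minimal sketch of shredding in Python, using the standard sqlite3 and xml.etree.ElementTree modules as stand-ins for a real RDBMS and its mapping tools. The order/item document, the table names, and the functions make_db, shred, and rebuild are all hypothetical and chosen only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document: one parent element with repeating children.
ORDER_XML = """<order id="1" customer="Acme">
  <item sku="A-100" qty="2"/>
  <item sku="B-200" qty="5"/>
</order>"""

def make_db():
    """Create the relational schema that mirrors the XML structure."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE items ("
        "order_id INTEGER REFERENCES orders(id), sku TEXT, qty INTEGER)"
    )
    return conn

def shred(conn, xml_text):
    """Apply the hand-written mapping: <order> -> orders, <item> -> items."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        (order_id, root.get("customer")),
    )
    for item in root.findall("item"):
        conn.execute(
            "INSERT INTO items (order_id, sku, qty) VALUES (?, ?, ?)",
            (order_id, item.get("sku"), int(item.get("qty"))),
        )

def rebuild(conn, order_id):
    """Reconstruct an equivalent document; this needs a join.

    Assumes the order has at least one item.
    """
    rows = conn.execute(
        "SELECT o.customer, i.sku, i.qty FROM orders o "
        "JOIN items i ON i.order_id = o.id WHERE o.id = ? ORDER BY i.rowid",
        (order_id,),
    ).fetchall()
    root = ET.Element("order", {"id": str(order_id), "customer": rows[0][0]})
    for _, sku, qty in rows:
        ET.SubElement(root, "item", {"sku": sku, "qty": str(qty)})
    return ET.tostring(root, encoding="unicode")
```

Even in this small case, the rebuilt text is not byte-for-byte identical to the original (the whitespace is gone), which is why a digital signature over the original document would not survive. The mapping in shred is also tied to this one document shape; a change to the XML schema would require changing both the tables and the mapping code.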

Storing XML as a CLOB. All major vendors support storing entire XML documents in a variable length character type (VARCHAR) or as CLOBs. If XML documents are inserted into CLOB or VARCHAR columns, they are typically inserted as unparsed text objects. CLOBs preserve the original document and provide uniform handling of any XML, including highly volatile schemas.

Avoiding XML parsing at insert time guarantees high insert performance. However, without XML parsing, XML document structure is entirely ignored. This precludes the database from doing intelligent and efficient search and subdocument level extract operations on the stored text objects. The only remedy is to invoke the XML parser at query execution time to "look into" the XML documents so that search conditions can be evaluated. The high insert performance comes at the cost of low search and extract performance.
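The following sketch illustrates this trade-off under the same assumptions as before: Python's sqlite3 stands in for the database, a TEXT column stands in for a CLOB, and the names make_store, insert_doc, and find_docs are hypothetical. Inserts never touch a parser, so every search has to parse every stored document.

```python
import sqlite3
import xml.etree.ElementTree as ET

def make_store():
    """One table, one text column: the document is opaque to the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    return conn

def insert_doc(conn, xml_text):
    """Store the text as-is; no parsing, so no structural checks either."""
    conn.execute("INSERT INTO docs (body) VALUES (?)", (xml_text,))

def find_docs(conn, path, value):
    """Return ids of documents where an element at `path` has text `value`.

    Every document is parsed at query time, because the database knows
    nothing about the structure inside the stored text.
    """
    matches = []
    for doc_id, body in conn.execute("SELECT id, body FROM docs ORDER BY id"):
        root = ET.fromstring(body)
        if any(el.text == value for el in root.findall(path)):
            matches.append(doc_id)
    return matches
```

Note that insert_doc would accept malformed XML without complaint; the problem would surface only later, when find_docs tries to parse it.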

BLOB (pseudo native). BLOB-based storage is conceptually similar to CLOB storage; however, instead of storing the XML data as an unparsed string, BLOBs store it in a proprietary post-parse binary representation. This approach is sometimes called pseudo native, because the data representation remains in XML within the BLOB.

However, the underlying storage for a document is virtualized as a single contiguous byte range, which can cause performance problems. Updating can require the entire document to be rewritten (and locked). Access to portions of the document might require the entire document to be read from disk.
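A rough sketch of the problem, assuming a made-up binary format: the parsed tree is serialized with Python's pickle module into one byte string, which plays the role of the BLOB. Real products use their own proprietary formats; the functions parse_to_tree, to_blob, and update_text are hypothetical.

```python
import pickle
import xml.etree.ElementTree as ET

def parse_to_tree(xml_text):
    """Parse once, up front, into a plain nested structure."""
    def convert(el):
        return {
            "tag": el.tag,
            "text": (el.text or "").strip(),
            "children": [convert(child) for child in el],
        }
    return convert(ET.fromstring(xml_text))

def to_blob(tree):
    """Serialize the whole parsed document into one contiguous byte string."""
    return pickle.dumps(tree)

def update_text(blob, child_index, new_text):
    """Change the text of one child of the root element.

    The entire document is read and deserialized, and the entire
    document is written back, even though only one node changed.
    """
    tree = pickle.loads(blob)
    tree["children"][child_index]["text"] = new_text
    return pickle.dumps(tree)
```

No reparsing of XML text is needed, which is the gain over a CLOB, but the unit of reading, writing, and locking is still the whole document.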

True native. True native storage holds the post-parsed data on disk, enabling individual nodes of the data model to be stored independently (that is, not as a stream) and then interconnected. True native storage provides the advantages of BLOB and CLOB, but resolves the remaining performance issues because the document storage isn't virtualized as a single contiguous byte range. The storage for the entire set of documents is virtualized as a contiguous byte range; however, individual nodes can be relocated in this range with minimal impact on other nodes and indexing.
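The idea of independently stored, interconnected nodes can be sketched as follows. This is only an illustration of the concept, not a description of how DB2 or any other product lays out its storage: each node becomes its own record with a link to its parent, and a sqlite3 table stands in for the node store. The names make_node_store, store_document, update_node_text, and children_of are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def make_node_store():
    """One record per node; parent_id is the link that interconnects them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)"
    )
    return conn

def store_document(conn, xml_text):
    """Parse once at insert time and store each node separately.

    Returns the id of the root node.
    """
    def store(el, parent_id):
        cur = conn.execute(
            "INSERT INTO nodes (parent_id, tag, text) VALUES (?, ?, ?)",
            (parent_id, el.tag, (el.text or "").strip()),
        )
        node_id = cur.lastrowid
        for child in el:
            store(child, node_id)
        return node_id
    return store(ET.fromstring(xml_text), None)

def update_node_text(conn, node_id, new_text):
    """Touch exactly one node record; returns the number of records changed."""
    cur = conn.execute(
        "UPDATE nodes SET text = ? WHERE id = ?", (new_text, node_id)
    )
    return cur.rowcount

def children_of(conn, node_id):
    """Read part of a document without loading the rest of it."""
    return conn.execute(
        "SELECT id, tag, text FROM nodes WHERE parent_id = ? ORDER BY id",
        (node_id,),
    ).fetchall()
```

Unlike the mapping used for shredding, this layout is generic: no per-schema tables are defined, so documents of any shape can be stored, and an update or partial read works at the level of a node rather than the whole document.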