|
|
This article was originally printed in DB2 Magazine.
DB2 9 offers significant new support for XML, which Java application developers can begin to use with minimal effort. That's because DB2 9 enables programmers to query, insert, update, and delete XML data — as well as traditional SQL data — using familiar JDBC statements and industry-standard query languages.
In this article, I'll show you detailed examples of how to use Java to build applications that use DB2's "pureXML" technology, which allows XML to be stored and queried in its inherent hierarchical format.
DB2's XML Support
DB2 9 features new mechanisms for managing, storing, querying, and indexing XML data. In particular, DB2 introduces:
An XML data type that enables DB2 to store XML documents in a parsed, hierarchical format
Indexing technology that improves runtime performance of queries that search across and within XML documents
Query language support based on industry standards, including new XML extensions to SQL (often called SQL/XML) and XQuery, which was designed specifically for XML
Support for validating XML data based on user-supplied schemas, which helps administrators to enforce integrity constraints for XML data stored in DB2
Administrative capabilities, including extensions to key database utilities.
An additional feature of particular interest to Java programmers is DB2's new Developer Workbench, which is built on the open source Eclipse 3.1 platform. This integrated development environment (IDE) can help programmers quickly develop, test, and debug their database applications.
I'll be using the DB2 Developer Workbench for the examples in this article. Of course, Java programmers can use any supported IDE or Java Development Kit (JDK) to create DB2 XML applications.
Setting Up the Environment
Code samples in this article refer to a CLIENTS table that
tracks information about customers. Listing 1 shows how this table was defined.
Listing 1: The CLIENTS table definition.
create table clients(
id int primary key not null,
name varchar(50),
status varchar(10),
contactinfo xml
)
Note that CLIENTS contains three columns based on traditional SQL data types and one column based on the new SQL data type called "XML." This latter column, contactinfo, stores XML documents that include information such as a customer's mailing address, phone numbers, email addresses, and so on. Figure 1 shows one such XML document.
Figure 1: Sample XML file to be stored in the CLIENTS table.
I'll show you how to insert this file, along with other information, into the DB2 table. But first, let's discuss how to configure the DB2 Developer Workbench.
To compile and run any DB2 XML application using the DB2 Developer Workbench, follow standard Eclipse processes to create a new Java project and include appropriate DB2 libraries in the project's build path. If you've never worked with Eclipse before, here's a quick overview of how to accomplish those tasks:
Launch the DB2 Developer Workbench.
Create a new project. Switch to the Java perspective ( Window → Open Perspective → Java) and select File → New → Project. Follow the wizards to specify a project name. For other items, retain the default settings.
Add the DB2 libraries into your project's build path. Highlight your project, right click, and select Properties. Select Java Build Path and click on the Libraries tab. Add the appropriate DB2 external .jar files, such as db2jcc.jar, db2jcc_javax.jar, and db2jcc_license_cu.jar. These files are located in the \java subdirectory of your DB2 installation directory.
Optionally, create a package for your application. Highlight your project, right click, and select New → Package.
For details, click the Help option from the main menu of the
workbench.
Connecting to a Database
To work with DB2 XML data, establish a connection to your target database just as you would for any other DB2 application. Listing 2 shows an excerpt from a helper method that establishes a DB2 database connection.
Listing 2: A helper method that establishes a DB2 database connection.
public class Conn {
// for simplicity, I've hardcoded account and URL data.
private static String user = "user1";
private static String pwd = "mypassword";
private static String url = "jdbc:db2:test";
// this method gets a database connection
public static Connection getConn(){
Connection conn=null;
// load the appropriate DB2 driver and get a
// connection to the "test" database
try {
Class.forName("com.ibm.db2.jcc.DB2Driver");
conn = DriverManager.getConnection(url, user, pwd);
. . .
}
catch (Exception e) { e.printStackTrace(); }
return conn;
} // end getConn();
. . .
} // end class
Inserting XML Data
DB2 programmers use familiar SQL INSERT statements to write new XML data to tables that contain XML columns. DB2 can store any well-formed XML document up to 2GB.
Often, Java programmers need to insert XML data contained in files into DB2 tables, although they can also insert XML data from character strings, from binary data (including large objects), and from SQL subselect statements.
Listing 3 shows one way of inserting a row into the CLIENTS table. In this case, the XML document for the contactinfo column is read from a file.
Listing 3: Method to insert data from an XML file into DB2.
public static void insertFile(){
try {
// for simplicity, I've defined variables with input data
int id = 5681;
String name = "Paula Lipenski";
String status = "Gold";
String fn = "c:/XMLFiles/Client5681.xml"; // input file
// get a connection
Connection conn = Conn.getConn();
// define string that will insert file
String sqlstmt = "insert into clients (id, name,
status, contactinfo) values (?, ?, ? ,?)";
// prepare the statement
PreparedStatement insertStmt = conn.
prepareStatement(sqlstmt);
insertStmt.setInt(1, id);
insertStmt.setString(2, name);
insertStmt.setString(3, status);
File file = new File(fn);
insertStmt.setBinaryStream(4, new
FileInputStream(file), (int)file.length());
// execute the statement
if (insertStmt.executeUpdate() != 1) {
System.out.println("No record inserted.");
}
. . .
// release resources
conn.close();
}
catch (Exception e) { . . . }
}
Let's step through this code. The insertFile() method defines several variables that are later used with the INSERT statement. The first three variables correspond to the ID, name, and status columns in the CLIENTS table. The fourth is the name of the XML file to be inserted into the contactinfo column. (For simplicity, values have been hardcoded in this example.)
After establishing a database connection, the method creates a sqlstmt string containing four parameter markers for the input column values. The INSERT statement is prepared as usual, and its four parameter markers are set. To set the marker for the XML column, the method opens a FileInputStream. It also computes the length of the referenced XML file and uses this information as input to the setBinaryStream() method. Finally, the method executes the INSERT statement, causing XML and traditional SQL data to be inserted into a row in this table.
Although not shown here, DB2 also enables users to register XML schemas and validate input documents against these schemas prior to insertion. XML schemas are part of the World Wide Web Consortium (W3C) industry standard; they enable users to specify the desired structure of compliant XML documents, such as the order and data types of acceptable XML elements, the use of specific XML namespaces, and so on.
Querying XML Data
Now that you've stored XML data in DB2, you're ready to query it. DB2 enables you to write a variety of query types. You can write simple queries that retrieve full XML documents or sophisticated queries that retrieve portions of XML documents based on XML and relational query predicates. DB2 also supports queries that transform XML data into new XML documents, join XML and non-XML data, aggregate various types of data, and so on. In this article, I'll demonstrate a query that:
Filters data based on relational and XML predicates
Retrieves portions of qualifying XML documents along with data stored in a traditional SQL column.
DB2 now supports two query languages: SQL and XQuery. A single application can include individual queries written in either or both languages. XQuery doesn't provide a means to resolve parameter markers. As a practical matter, this means that any XQueries in a Java application that require more than hardcoded query predicates must be wrapped in a SQL statement using a SQL/XML function such as XMLQuery(), XMLExists(), or XMLTable(). The sample application in this article uses both XMLQuery() and XMLExists() to illustrate a common programming task: retrieving portions of XML documents.
The example shown in Listing 4 returns the names and primary email addresses of "Gold" customers who live in ZIP code 95125. Customer name and status information (for example, "Gold" or "Silver") are stored in SQL VARCHAR columns, while email addresses and ZIP codes are contained in XML documents stored in the contactinfo column. As such, this example projects and restricts both traditional SQL and XML data.
Listing 4: Method to query DB2 XML data.
public static void queryXMLExample() {
// for simplicity, I've defined variables with input data
String status = "Gold";
String zipCode = "95125";
try{
Connection conn = Conn.getConn();
// get names and primary email addresses
// of customers with a certain status who live in a
// certain ZIP code
String query = "SELECT name, xmlquery('$c/Client/email[1]' " +
" passing contactinfo as \"c\") " +
" from clients where status = ?" +
" and xmlexists('$x/Client/Address[zip=$zip]' " +
" passing contactinfo as \"x\", " +
" cast(? as char(5)) as \"zip\" )";
PreparedStatement selectStmt = conn.prepareStatement(query);
selectStmt.setString(1, status);
selectStmt.setString(2, zipCode);
ResultSet rs = selectStmt.executeQuery();
// iterate over all items in the sequence.
while(rs.next() ) {
System.out.println("Name: " + rs.getString(1) +
" Email: " + rs.getString(2));
}
// release resources
. . .
}
catch (Exception e) { . . . }
}
The query statement defined in the query variable warrants a closer look, particularly for those new to SQL/XML and XQuery. Although a thorough discussion of these languages is beyond the scope of this article, we'll step through each line of this query so you can understand its logic.
The first line specifies that the SQL name column and the output of the XMLQuery() function will be included in the result set. To instruct DB2 to retrieve the first email element for a qualifying customer, the code includes an XPath expression. XPath, part of XQuery, provides a means to navigate through the hierarchy of an XML document. This XPath expression instructs DB2 to navigate to the first email element beneath the root Client element of each qualifying XML document. (Note that the path expression is case-sensitive.) The $c variable and the SQL FROM clause indicate where these documents can be found (in the contactinfo column of the CLIENTS table).
The SQL WHERE clause restricts the data to be returned. In particular, it specifies that returned XML documents will include only those found in rows in which the client's status is of a certain value (Gold). Furthermore, only records of customers who live in a certain ZIP code (95125) qualify. The XMLExists() function specifies this latter restriction.
In this sample query, XMLExists() instructs DB2 to determine if a given XML document contains a client address that includes the specified ZIP code. The PASSING clause specifies where XML documents can be found (in the contactinfo column). Note that escape characters (backward slashes) appear around the $x variable in this PASSING clause, just as they did in an earlier PASSING clause that defined the $c variable. The final line of this query invokes the CAST() function to cast the values of the input parameter (for ZIP code) to an appropriate SQL data type. In this case, a five-character string is acceptable. Listing 5 shows the output from this program.
Listing 5: Output from a program to return names and primary email addresses.
Name: Paula Lipenski Email:
<email>beatlesfan36@hotmail.com</email>
Name: Edward Chen Email:
Name: Chris Sanchez Email:
<email>golfmaniac123@yahoo.com</email>
In this sample output, no email information is returned for a qualifying customer (Edward Chen). This omission indicates that an email element isn't included in his XML contactinfo document, probably because he didn't supply one as part of his customer profile. Also note that only one email address for Paula Lipenski is returned, even though her contact information contains two email addresses (see Figure 1). This result occurs because the sample application explicitly instructs DB2 to retrieve only the first email address of qualifying customers. (See the XPath expression supplied to the XMLQuery() function.) If you'd like to exclude the <:email> tags from the output, you can use a path expression in the XMLQuery() function to instruct DB2 to return only the text value of the email element: $c/Client/email[1]/text().
Programming Tips
Although DB2's native XML support is new, good database application programming practices remain the same. For example, it's still important to instruct DB2 to retrieve only the data you need. Retrieving entire XML documents when you only need a subset of the information they contain will slow performance and needlessly consume system resources. Similarly, you should consider the scope and isolation levels of your transactions carefully to balance overall concurrency and integrity concerns.
It's also worth exploring XML features of DB2 9 that are beyond the scope of this introductory article. As you might expect, DB2 enables programmers to update and delete XML data using familiar SQL statements. In addition, IBM has also released an "update" stored procedure for XML data on IBM developerWorks ("XML Application Migration from DB2 8.x to DB2 Viper, Part 1"; see Resources). This procedure enables users to supply only changed XML data for updates rather than supply the entire contents of the revised XML document.
The DB2 Developer Workbench includes wizards to help programmers graphically construct SQL/XML queries as well as XQueries. It also provides wizards for building stored procedures that work with traditional SQL or native XML data. All wizards can be launched from the workbench's Data perspective.
For details on any of these topics, consult the appropriate online documentation or the reference materials cited at the end of this article.
New, But Familiar
Java programmers who need to work with native XML data stored in DB2 use familiar JDBC statements to execute queries and process results. To help programmers code, test, and debug their Java applications, IBM provides an Eclipse-based Developer Workbench for DB2. This workbench includes wizards for exploring the contents of a database, writing queries in SQL/XML or XQuery, and generating stored procedures.
Cynthia M. Saracco is a senior
solutions architect at IBM's Silicon Valley Lab, where she specializes in
database management, XML, and Web-based technologies. She is a frequent speaker
at industry conferences, has taught courses for the UC California extension
program, and has published 3 text books and more than 50 papers on various
database management topics.