|
|
XML Support in Transact-SQL
Vakhtang Pavliashvili
The Internet has changed our lives once and for good; most times for the better and sometimes for worse. But the bottom line is that we would never want to go back to living without email, on-line shopping and the ability to do research right from our desktop.
Microsoft SQL Server has supported web related functionality since the version 6.5, which allowed you to create somewhat crude and non-interactive HTML pages. These HTML pages display contents of your tables or queries. In the later releases the web functionality has been enhanced with web tasks and wizards that automatically create web pages according to the settings you specify. More importantly, SQL Server 2000 is one of the first relational database engines that offers native XML support.
Perhaps we should step back for a minute and discuss what XML is and why is it the next ‘big thing’ in the IT industry. The eXtensible Markup Language (XML) is related to Hyper Text Markup Language (HTML), along with other markup languages, which is the heart and soul of the internet. Therefore, the information contained in XML can be easily translated into HTML and presented in the web format. More importantly, XML enables programmers exchange the data in a string format, which is supported by all platforms, all browsers and all operating systems. Last but definitely not least, XML offers exceptional performance advantage over other ways of exchanging data, again due to the fact that data is passed around as a string, rather than being split into variables with different data types.
You can interact with SQL Server 2000 with XML in several different ways:
In this article I’d like to talk about some of Transact-SQL XML extensions. For those that don’t have a strong XML background I’ll provide a very brief overview of main XML components. Keep in mind though that I won’t go into XML details – there are some fine articles and books that you can reference if you’d like to explore XML features further.
Basic Overview of XML
XML documents mainly consist of tags, attributes and elements. Tags can be nested within other tags; in such case the outer tags are often referred to as parent tags and the inner tags are the children. The elements are the pieces of information that have their own tags; the attributes are used to describe the elements – they do not have their own tags. Consider the following XML snippet:
<root>
<employees EmployeeID="1" LastName="Davolio" FirstName="Nancy"
Title="Sales Representative" TitleOfCourtesy="Ms." />
</root>
In this example <root> is the parent of <employees> tag. EmployeeID, LastName, FirstName, Title and TitleOfCourtesy are all attributes of the employees element. An alternative way to present the same data would be as follows:
<root>
<employees>
<EmployeeID>1</EmployeeID>
<LastName>Davolio</LastName>
<FirstName>Nancy</FirstName>
<Title>Sales Representative</Title>
<TitleOfCourtesy>Ms.</TitleOfCourtesy>
</employees>
</root>
The difference is that the attributes of the first example are presented as elements. In the second example EmployeeID, LastName, FirstName, Title and TitleOfCourtesy elements are all children of the employees element.
NOTE: Unlike many other programming and markup languages XML is case sensitive. Therefore, EmployeeID is not considered the same as employeeID, employeeid or Employeeid.
If you understand what attributes, elements, parent and child tags are you’re ready to learn how Transact-SQL extensions work.
Transact-SQL XML Extensions
Transact-SQL allows you to present SQL Server data in XML format to the end user or the application that calls a stored procedure. In addition, it allows you to parse an XML document and treat it as relational data. Retrieving data in XML is accomplished with FOR XML extension of the SELECT statement. The FOR XML extension, in turn has three modes: AUTO, RAW and EXPLICIT. Each mode supports various options that let you customize your queries’ output.
Reading and parsing the XML data is accomplished through OPENXML() function, which also has a number of options.
In this article I’ll be discussing AUTO and RAW modes of FOR XML only. The next article will have an extended discussion of the EXPLICIT mode. I’ll discuss OPENXML() function in a later article.
FOR XML AUTO
The AUTO option of the FOR XML clause simply treats the table name as a parent and each column in the table becomes a child of the table-name tag. For instance, the following query retrieves the first 2 rows in the customers table from Northwind database in XML format:
SELECT TOP 2 * FROM employees FOR XML AUTO
Results:
<customers
CustomerID="ALFKI"
CompanyName="Alfreds Futterkiste"
ContactName="Maria Anders"
ContactTitle="Sales Representative"
Address="Obere Str. 57"
City="Berlin"
PostalCode="12209"
Country="Germany"
Phone="030-0074321"
Fax="030-0076545"/>
<customers
CustomerID="ANATR"
CompanyName="Ana Trujillo Emparedados y helados"
ContactName="Ana Trujillo"
ContactTitle="Owner"
Address="Avda. de la Constitución 2222"
City="México D.F."
PostalCode="05021"
Country="Mexico"
Phone="(5) 555-4729"
Fax="(5) 555-3745"/>
By default the AUTO mode treats the columns as attributes, however, we could easily override this setting by appending the ELEMENTS clause, as follows:
SELECT TOP 2 * FROM customers FOR XML AUTO, ELEMENTS
Results:
<customers>
<CustomerID>ALFKI</CustomerID>
<CompanyName>Alfreds Futterkiste</CompanyName>
<ContactName>Maria Anders</ContactName>
<ContactTitle>Sales Representative</ContactTitle>
<Address>Obere Str. 57</Address>
<City>Berlin</City>
<PostalCode>12209</PostalCode>
<Country>Germany</Country>
<Phone>030-0074321</Phone>
<Fax>030-0076545</Fax>
</customers>
<customers>
<CustomerID>ANATR</CustomerID>
<CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
<ContactName>Ana Trujillo</ContactName>
<ContactTitle>Owner</ContactTitle>
<Address>Avda. de la Constitución 2222</Address>
<City>México D.F.</City>
<PostalCode>05021</PostalCode>
<Country>Mexico</Country>
<Phone>(5) 555-4729</Phone>
<Fax>(5) 555-3745</Fax>
</customers>
At this point you might be asking what happens if you have multi-table queries with XML AUTO. The answer is simple: you get a hierarchy of XML elements that depends on the order in which table columns are specified in the query. For instance, if I select a couple of rows from customers and orders tables, the XML hierarchy returned depends on the order of columns specified in the query. Let’s look at an example to make things easier to grasp:
SELECT TOP 2
Orders.OrderDate,
Orders.ShipCity,
Customers.CompanyName,
Customers.Country
FROM orders INNER JOIN customers ON
orders.customerid = customers.customerid
FOR XML AUTO
Results:
<orders
OrderDate="1996-07-04T00:00:00"
ShipCity="Reims">
<customers
CompanyName="Vins et alcools Chevalier"
Country="France"/>
</orders>
<orders
OrderDate="1996-07-05T00:00:00"
ShipCity="Münster">
<customers
CompanyName="Toms Spezialitäten"
Country="Germany"/>
</orders>
Notice that since I specified columns selected from the Orders table, the XML returned has <orders> as the parent tag and <customers> as the child. Another interesting fact is that Transact-SQL is not case sensitive, whereas XML is. Since I specified table names in lowercase (within the JOIN statement) the elements are also in lowercase.
Now let’s see what happens if we switch the order in which the columns are specified and capitalize the table names:
SELECT TOP 2
Customers.CompanyName,
Customers.Country,
Orders.OrderDate,
Orders.ShipCity
FROM Orders INNER JOIN Customers ON
orders.customerid = customers.customerid
FOR XML AUTO
Results:
<Customers
CompanyName="Vins et alcools Chevalier"
Country="France">
<Orders OrderDate="1996-07-04T00:00:00"
ShipCity="Reims"/>
</Customers>
<Customers
CompanyName="Toms Spezialitäten"
Country="Germany">
<Orders
OrderDate="1996-07-05T00:00:00"
ShipCity="Münster"/>
</Customers>
The same query with a slight change returns a completely different hierarchy - <Customers> is now the parent and <Orders> is the child.
Using Sub-Queries with FOR XML AUTO
The AUTO mode of the FOR XML extension lets you use sub-queries, which can be a useful trick for certain XML formatting needs. Sub-queries can be used for a variety of reasons, one of them being limiting the returned result set. The following example returns XML with a couple of customers that use United States as their shipping country:
SELECT TOP 2 CompanyName, City, Region FROM customers WHERE CustomerID IN
(SELECT customerID FROM orders
WHERE ShipCountry = 'USA')
FOR XML AUTO
Results:
<customers CompanyName="Great Lakes Food Market" City="Eugene" Region="OR"/>
<customers CompanyName="Hungry Coyote Import Store" City="Elgin" Region="OR"/>
Similarly, the next example retrieves top title names for a couple of authors in the pubs database; If there aren’t any titles written by a particular author the query will return ‘no titles for this author’:
SELECT au_lname, au_fname, title =
ISNULL(
(SELECT TOP 1 title FROM titles
INNER JOIN titleauthor ON titles.title_id = titleauthor.title_id
WHERE titleauthor.au_id = authors.au_id), 'no titles for this author')
FROM authors
WHERE au_lname = 'bennet'
OR au_lname = 'blotchet-halls'
FOR XML AUTO
Results:
<authors au_lname="Bennet"
au_fname="Abraham"
title="The Busy Executive's Database Guide"/>
<authors au_lname="Blotchet-Halls"
au_fname="Reginald"
title="Fifty Years in Buckingham Palace Kitchens"/>
The advantage of using a sub-query in this case is the ability to suppress the child tag. The same query could be rewritten with joins, however, in that case you might have to show <titles> tags, along with the <author>, as shown below:
SELECT au_lname, au_fname, title
FROM authors
INNER JOIN titleauthor
ON authors.au_id = titleauthor.au_id
LEFT JOIN titles
ON titles.title_id = titleauthor.title_id
WHERE au_lname = 'bennet'
OR au_lname = 'blotchet-halls'
FOR XML AUTO
Results:
<authors au_lname="Bennet"
au_fname="Abraham">
<titles title="The Busy Executive's Database Guide"/>
</authors>
<authors au_lname="Blotchet-Halls"
au_fname="Reginald">
<titles title="Fifty Years in Buckingham Palace Kitchens"/>
</authors>
Similar query can be written using a join and aliasing the inner table to suppress the child tag:
SELECT au_lname, au_fname,
SUBSTRING(title, 1, LEN(title)) as title
FROM authors
INNER JOIN titleauthor
ON authors.au_id = titleauthor.au_id
LEFT JOIN titles ON titles.title_id = titleauthor.title_id
WHERE au_lname = 'bennet'
OR au_lname = 'blotchet-halls'
FOR XML AUTO
Results:
<authors au_lname="Bennet"
au_fname="Abraham"
title="The Busy Executive's Database Guide"/>
<authors au_lname="Blotchet-Halls"
au_fname="Reginald"
title="Fifty Years in Buckingham Palace Kitchens"/>
As you can tell, the AUTO mode is very easy to use once you know how it works. On the other hand, the AUTO mode doesn’t give you much flexibility in formatting the result set. In addition, the AUTO mode does not support GROUP BY clause and aggregate functions.
You can get somewhat creative if you still want to use FOR XML AUTO with aggregate functions, for instance the above query could be rewritten to return the average quantity using FOR XML AUTO as follows:
SELECT TOP 1 average_quantity = (
SELECT AVG(quantity) FROM [order details])
FROM [order details]
FOR XML AUTO
Results:
<order_x0020_details average_quantity="23"/>
Nor can you retrieve a scalar variable into XML using FOR XML AUTO (unless you put such a variable in a temporary table). You’ll see how to overcome these limitations of the AUTO mode with RAW mode later in the article.
FOR XML RAW
The RAW mode appends the <row> tag to the output and does not provide the table name. If you need to know which table the data is coming from the RAW mode is of no help. On the other hand, the RAW mode is very useful when you don’t really care about the source of the data; so if you have a query joining multiple tables and you just want to grab the data without caring about the table name tags, use the RAW mode, as follows:
SELECT TOP 2
Customers.CompanyName,
Customers.Country,
Orders.OrderDate,
Orders.ShipCity
FROM Orders INNER JOIN Customers ON
orders.customerid = customers.customerid
FOR XML RAW
Results:
<row CompanyName="Vins et alcools Chevalier"
Country="France" OrderDate="1996-07-04T00:00:00" ShipCity="Reims"/>
<row CompanyName="Toms Spezialitäten" Country="Germany"
OrderDate="1996-07-05T00:00:00" ShipCity="Münster"/>
Notice that the RAW mode does not support the ELEMENTS option; you’re limited to retrieving data as attributes.
The sub-queries used with the RAW mode behave exactly the same way as joins – none of the table names is displayed:
SELECT au_lname, au_fname, title =
ISNULL(
(SELECT TOP 1 title FROM titles
INNER JOIN titleauthor ON titles.title_id = titleauthor.title_id
WHERE titleauthor.au_id = authors.au_id), 'no titles for this author')
FROM authors
WHERE au_lname = 'bennet'
OR au_lname = 'blotchet-halls'
FOR XML RAW
Results:
<row au_lname="Bennet"
au_fname="Abraham" title="The Busy Executive's Database Guide"/>
<row au_lname="Blotchet-Halls new new"
au_fname="Reginald" title="Fifty Years in Buckingham Palace Kitchens"/>
The RAW mode does support the aggregate functions and GROUP BY clause of the SELECT statement. The following example retrieves an average quantity from the order details table in Northwind database:
SELECT average_quantity = AVG(quantity) FROM [order details]
FOR XML RAW
Result:
<row average_quantity="23"/>
The next example counts the number of orders per customer. For brevity I limited the output to the customers that have placed 18 or more orders:
SELECT CustomerID, COUNT(*) AS NumberOfOrders
FROM Orders
GROUP BY CustomerID
HAVING COUNT(*) > 18
FOR XML RAW
Results:
<row CustomerID="ERNSH" NumberOfOrders="30"/>
<row CustomerID="FOLKO" NumberOfOrders="19"/>
<row CustomerID="HUNGO" NumberOfOrders="19"/>
<row CustomerID="QUICK" NumberOfOrders="28"/>
<row CustomerID="SAVEA" NumberOfOrders="31"/>
You can also return a scalar variable in XML using the RAW mode, as the following query demonstrates:
DECLARE @string VARCHAR(50)
SELECT @string = 'my XML article'
SELECT @string AS MyArticle
FOR XML RAW
Results:
<row MyArticle="my XML article"/>
So in this article I gave you a brief overview of XML components and introduced you to Transact-SQL XML extensions. I discussed AUTO and RAW modes of the FOR XML clause in detail. Both of these modes are very easy to use, but neither one gives the developer full control over the XML hierarchy returned from the query. My next article discusses the EXPLICIT mode of FOR XML clause, which is somewhat more complicated but gives you much more control over the output.