|
|
The Explicit Mode of FOR XML
Vakhtang Pavliashvili
My previous article gave you a quick introduction to XML support in SQL Server 2000 and examined the details of AUTO and RAW modes of FOR XML clause. Recall from that article that neither of those modes offers you full control over your XML output. The EXPLICIT mode, on the other hand, allows you to customize your output according to your needs. At the first glance FOR XML EXPLICIT might seem extremely difficult to learn and master. I won’t say that it is trivial, but it’s not a rocket science either. This article will teach you how to customize your output using FOR XML EXPLICIT.
The EXPLICIT mode is implemented through UNION ALL queries. If you’re not familiar with UNION ALL clause of the SELECT statement it simply combines results of two or more queries. Each query combined with the UNION ALL clause has to contain the same number of columns. The corresponding columns in each query need to have compatible data types. In other words, you cannot UNION an integer and a string (unless you explicitly convert one of them first). For instance, I could combine the names of customer contacts and employee names in Northwind database with the following query:
SELECT ContactName FROM customers
UNION ALL
SELECT FirstName + ' ' + LastName AS FullName FROM Employees
Results (abbreviated):
ContactName
-------------------------------
Maria Anders
Ana Trujillo
Antonio Moreno
Thomas Hardy
Christina Berglund
Hanna Moos
Frédérique Citeaux
Martín Sommer
Notice that even though we combined two queries with different column names the output has a heading of the top query (ContactName). In fact, the UNION ALL clause could not care less how many queries are involved: you will only get the column names from the top query.
The EXPLICIT mode works similarly; you define your XML hierarchy in the top query and then you grab data for each of the XML nodes from the queries that follow. Keep in mind though, that each query will have to contain the same number of columns with compatible data types.
The XML structure to be returned is stored in what is referred to as the “universal table”. The universal table contains information about the XML tag names, as well as how the tags need to be nested. Let’s look at a quick example to make things a bit clearer.
The following query returns titles written by the author Green and respective royalty percentages from PUBS database:
SELECT 1 AS TAG, NULL AS PARENT,
authors.au_fname AS [authors!1!au_fname],
authors.au_lname AS [authors!1!au_lname],
NULL AS [titleauthor!2!royaltyper],
NULL AS [titles!3!title]
FROM
authors WHERE au_lname = 'green'
UNION ALL
SELECT 2 AS TAG, 1 AS PARENT,
au_fname,
au_lname,
royaltyper,
NULL
FROM authors INNER JOIN titleauthor ON
authors.au_id= titleauthor.au_id
WHERE au_lname ='green'
UNION ALL
SELECT 3 AS TAG, 2 AS PARENT,
au_fname,
au_lname,
royaltyper,
title
FROM authors INNER JOIN titleauthor ON authors.au_id = titleauthor.au_id
INNER JOIN titles ON titles.title_id = titleauthor.title_id
WHERE au_lname ='green'
ORDER BY [authors!1!au_fname], [authors!1!au_lname], [titleauthor!2!royaltyper]
FOR XML EXPLICIT
Results:
I know that if you haven’t used FOR XML EXPLICIT before the query above might be overwhelming: don’t worry – we’ll dissect this query in great detail.<authors au_fname="Marjorie" au_lname="green">
<titleauthor royaltyper="40">
<titles title="The Busy Executive's Database Guide"/>
</titleauthor>
<titleauthor royaltyper="100">
<titles title="You Can Combat Computer Stress!"/>
</titleauthor>
</authors>
As I said earlier, you’ve got to specify the XML structure to be returned in the top query. The topmost tag in XML hierarchy has no parent, so in XML the chicken comes before the egg. That’s why every query using the EXPLICIT mode has to start with the following:
SELECT 1 AS TAG, NULL AS PARENT
The rest of the first SELECT statement constructs the XML hierarchy I wish to see in the output. I’d like <authors> to be the outermost tag, followed by <titleauthor> and <titles>. The <authors> tag should contain the first and last names of the author. The <titleauthor> tag, which is the child of <authors> should contain a single attribute of royalty percentages. Finally, <titles> tag is the child of tag number 2 - <titleauthor>, and should contain the title name. Hence the hierarchy defined in the topmost query looks as following:
authors.au_fname AS [authors!1!au_fname],
authors.au_lname AS [authors!1!au_lname],
NULL AS [titleauthor!2!royaltyper],
NULL AS [titles!3!title]
Now things should be starting to make sense. However, what if you want to change the column names or table names in the output? The answer is simple – just change the portion of the top query that is placed in brackets. So I could rewrite top SELECT statement of the same query as follows:
SELECT 1 AS TAG, NULL AS PARENT,
authors.au_fname AS [Author!1!first_name],
authors.au_lname AS [Author!1!last_name],
NULL AS [TitleAuthor!2!royalty_percentage],
NULL AS [Title!3!title_name]
Then my output would look like following:
<Author first_name="Marjorie" last_name="green">
<TitleAuthor royalty_percentage="40">
<Title title_name="The Busy Executive's Database Guide"/>
</TitleAuthor>
<TitleAuthor royalty_percentage="100">
<Title title_name="You Can Combat Computer Stress!"/>
</TitleAuthor>
</Author>
Great! That part was fairly simple, but what’s with those NULLs in the query? Recall that each SELECT statement participating in the UNION query needs to contain the same number of columns. Could you join all three tables in each query? Yes, you could, but your performance would suffer. The PUBS database has a handful of records, and the query we just executed returns only a couple of rows. But if you have three tables with thousands of rows in each, joining the three tables for each SELECT would make your query rather slow.
Note: At times, you can’t help but join at least two tables in the topmost query, if you have to limit the results of the top query. However, if you can get away by not mentioning all participating tables in the top SELECT statement your performance can be much better.
Another part of FOR XML EXPLICIT syntax that might catch your attention is the ORDER BY clause. You should be aware that the ORDER BY clause is used to sort the result set. With the EXPLICIT mode the ORDER BY serves the same purpose, except it sorts the XML hierarchy instead of the result set. Let’s see what happens if we remove the ORDER BY clause:
<authors au_fname="Marjorie" au_lname="green">
<titleauthor royaltyper="40"/>
<titleauthor royaltyper="100">
<titles title="The Busy Executive's Database Guide"/>
<titles title="You Can Combat Computer Stress!"/>
</titleauthor>
</authors>
Well, the world hasn’t crushed, but now you have to wonder which title is earning the author Green 40% of royalties and which one earns 100%. If you only have a couple of nodes in your XML hierarchy you might be able to get away without the ORDER BY clause. However, as a rule of thumb, be sure to include ORDER BY in all queries using the EXPLICIT mode.
XML Explicit Tips and Tricks
Now that you’re familiar with the basics of the Explicit mode it’s time to learn a few tips that will save you much time when implementing your own solutions.
More Than One Child Tag
The query we just examined earlier was fairly simple – it had three nodes and each node was the parent of the following node. What happens if we have multiple child nodes with the same parent? Simply specify which node needs to be the parent for each of the child nodes. The following example brings back <Orders> and <OrderDetails> tags both as children of the <Customers> tag:
SELECT 1 AS TAG,
NULL AS PARENT,
Customers.CustomerID AS [Customers!1!CustomerID],
Customers.CompanyName AS [Customers!1!CompanyName],
NULL AS [Orders!2!OrderID],
NULL AS [Orders!2!OrderDate],
NULL AS [Orders!2!ShipCity],
NULL AS [OrderDetails!3!ProductID],
NULL AS [OrderDetails!3!Quantity],
NULL AS [Products!4!ProductName]
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = ORders.CustomerID
INNER JOIN [Order Details] OD ON od.OrderID = Orders.OrderID
INNER JOIN Products ON Products.ProductID = OD.ProductID
WHERE Products.ProductID = 9
UNION ALL
SELECT
2 AS TAG,
1 AS PARENT,
Customers.CustomerID AS [Customers!1!CustomerID],
NULL AS [Customers!1!CompanyName],
Orders.OrderID AS [Orders!2!OrderID],
Orders.OrderDate AS [Orders!2!OrderDate],
Orders.ShipCity AS [Orders!2!ShipCity],
NULL AS [OrderDetails!3!ProductID],
NULL AS [OrderDetails!3!Quantity],
NULL AS [Products!4!ProductName]
FROM Customers INNER JOIN Orders
ON customers.customerID = Orders.CustomerID
INNER JOIN [Order Details] OD ON od.OrderID = Orders.OrderID
INNER JOIN Products ON Products.ProductID = OD.ProductID
WHERE Products.ProductID = 9
UNION ALL
SELECT
3 AS TAG,
1 AS PARENT,
Customers.CustomerID AS [Customers!1!CustomerID],
NULL AS [Customers!1!CompanyName],
Orders.OrderID AS [Orders!2!OrderID],
NULL AS [Orders!2!OrderDate],
NULL AS [Orders!2!ShipCity],
OD.ProductID AS [OrderDetails!3!ProductID],
OD.Quantity AS [OrderDetails!3!Quantity],
NULL AS [Products!4!ProductName]
FROM Customers INNER JOIN Orders
ON customers.customerID = Orders.CustomerID
INNER JOIN [Order Details] OD ON OD.OrderID = Orders.OrderID
INNER JOIN Products ON Products.ProductID = OD.ProductID
WHERE Products.ProductID = 9
ORDER BY [Customers!1!CustomerID], [Orders!2!OrderID]
FOR XML EXPLICIT
Abbreviated Results:
<Customers CustomerID="CONSH" CompanyName="Consolidated Holdings">
<Orders OrderID="10848" OrderDate="1998-01-23T00:00:00" ShipCity="London"/>
<OrderDetails ProductID="9" Quantity="3"/>
</Customers>
<Customers CustomerID="HUNGO" CompanyName="Hungry Owl All-Night Grocers">
<Orders OrderID="10687" OrderDate="1997-09-30T00:00:00" ShipCity="Cork"/>
<OrderDetails ProductID="9" Quantity="50"/>
</Customers>
Notice that I specified tag 1 as the parent for both tag 2 and tag 3. Had I specified tag 2 as the parent of tag 3 the results would look a bit different:
<Customers CustomerID="CONSH" CompanyName="Consolidated Holdings">
<Orders OrderID="10848" OrderDate="1998-01-23T00:00:00" ShipCity="London">
<OrderDetails ProductID="9" Quantity="3"/>
</Orders>
</Customers>
<Customers CustomerID="HUNGO" CompanyName="Hungry Owl All-Night Grocers">
<Orders OrderID="10687" OrderDate="1997-09-30T00:00:00" ShipCity="Cork">
<OrderDetails ProductID="9" Quantity="50"/>
</Orders>
</Customers>
Choosing the Columns for ORDER BY Clause
You might also find it challenging to order the XML hierarchy with the ORDER BY clause. You have to ensure that you have all the key columns in the ORDER BY to warrant proper ordering of the results. This means including multiple columns in the ORDER BY from each table that has a composite key – a primary key composed of multiple columns. Sometimes, this might be useful for ordering the result set, but not exactly what you wish to see in the output. For example, if you have a query joining sales and titles tables from the Pubs database you will have to include both stor_id and title_id columns from the sales table in the ORDER BY clause. However, since you can get the title_id from the title table, you might not wish to have this column also appear inside the <sales> tag. In such cases you can use the “!hide” directive, which suppresses the specified column in the result set, but still lets you use it for sorting the results.
The following example is somewhat more complex than the ones you’ve seen so far. It takes advantage of “!hide” directive and has several levels of hierarchy as well as tables with composite keys:
SELECT 1 AS TAG, NULL AS PARENT,
sales.stor_id AS [sales!1!store_id],
sales.title_id AS [sales!1!title_id!hide],
sales.ord_date AS [sales!1!order_date],
sales.qty AS [sales!1!quantity],
NULL AS [store!2!store_id!hide],
NULL AS [store!2!name],
NULL AS [store!2!city_and_state],
NULL AS [title!3!title_id],
NULL AS [title!3!title],
NULL AS [titleauthor!4!author_id],
NULL AS [titleauthor!4!royalty_percentage]
FROM sales
UNION ALL
SELECT 2 AS TAG, 1 AS PARENT,
sales.stor_id AS [sales!1!store_id],
sales.title_id AS [sales!1!title_id!hide],
sales.ord_date AS [sales!1!order_date],
sales.qty AS [sales!1!quantity],
stores.stor_id AS [store!2!store_id!hide],
stores.stor_name AS [store!2!name],
stores.city + ',' +stores.state AS [store!2!city_and_state],
NULL AS [title!3!title_id],
NULL AS [title!3!title],
NULL AS [titleauthor!4!author_id],
NULL AS [titleauthor!4!royalty_percentage]
FROM sales INNER JOIN stores ON sales.stor_id = stores.stor_id
UNION ALL
SELECT 3 AS TAG, 1 AS PARENT,
sales.stor_id AS [sales!1!store_id],
sales.title_id AS [sales!1!title_id!hide],
sales.ord_date AS [sales!1!order_date],
sales.qty AS [sales!1!quantity],
stores.stor_id AS [store!2!store_id!hide],
NULL AS [store!2!name],
NULL AS [store!2!city_and_state],
titles.title_id AS [title!3!title_id],
titles.title AS [title!3!title],
NULL AS [titleauthor!4!author_id],
NULL AS [titleauthor!4!royalty_percentage]
FROM sales INNER JOIN titles ON titles.title_id = sales.title_id
INNER JOIN stores ON sales.stor_id = stores.stor_id
UNION ALL
SELECT 4 AS TAG, 3 AS PARENT,
sales.stor_id AS [sales!1!store_id],
sales.title_id AS [sales!1!title_id!hide],
sales.ord_date AS [sales!1!order_date],
sales.qty AS [sales!1!quantity],
stores.stor_id AS [store!2!store_id!hide],
NULL AS [store!2!name],
NULL AS [store!2!city_and_state],
titles.title_id AS [title!3!title_id],
titles.title AS [title!3!title],
titleauthor.au_id AS [titleauthor!4!author_id],
titleauthor.royaltyper AS [titleauthor!4!royalty_percentage]
FROM sales INNER JOIN titles ON titles.title_id = sales.title_id
INNER JOIN titleauthor ON titles.title_id = titleauthor.title_id
INNER JOIN stores ON sales.stor_id = stores.stor_id
ORDER BY [sales!1!store_id], [sales!1!title_id!hide],
[store!2!store_id!hide], [title!3!title_id]
FOR XML EXPLICIT
Abbreviated Results:
<sales store_id="6380" order_date="2019-01-17T00:00:00" quantity="6">
<store name="Eric the Read Books" city_and_state="Seattle,WA"/>
<title title_id="BU1032" title="The Busy Executive's Database Guide">
<titleauthor author_id="213-46-8915" royalty_percentage="40"/>
<titleauthor author_id="409-56-7008" royalty_percentage="60"/>
</title>
</sales>
<sales store_id="6380" order_date="2019-01-24T00:00:00" quantity="4">
<store name="Eric the Read Books" city_and_state="Seattle,WA"/>
<title title_id="PS2091" title="Is Anger the Enemy?">
<titleauthor author_id="899-46-2035" royalty_percentage="50"/>
<titleauthor author_id="998-72-3567" royalty_percentage="50"/>
</title>
</sales>
<sales store_id="7066" order_date="1993-06-13T00:00:00" quantity="51">
<store name="Barnum's" city_and_state="Tustin,CA"/>
<title title_id="PC8888" title="Secrets of Silicon Valley">
<titleauthor author_id="846-92-7186" royalty_percentage="50"/>
<titleauthor author_id="427-17-2319" royalty_percentage="50"/>
</title>
</sales>
<sales store_id="7066" order_date="2019-01-08T00:00:00" quantity="76">
<store name="Barnum's" city_and_state="Tustin,CA"/>
<title title_id="PS2091" title="Is Anger the Enemy?">
<titleauthor author_id="899-46-2035" royalty_percentage="50"/>
<titleauthor author_id="998-72-3567" royalty_percentage="50"/>
</title>
</sales>
Parent Tag Not Open Error
One of the dreaded errors when working with FOR XML EXPLICIT is error number 6833. The cryptic nature of the error message doesn’t really help either:
“Parent tag ID N is not among the open tags. FOR XML EXPLICIT requires parent tags to be opened first. Check the ordering of the result set”.
The truth is that ordering of the result set might or might not be the reason. Sometimes your ORDER BY clause looks fine, but you still get the error. In such cases, ensure that all the joining columns appear in each SELECT statement. The following example returns the error 6833, because the second SELECT statement does not contain CustomerID, which is used to join Orders and Customers tables:
SELECT 1 AS TAG,
NULL AS PARENT,
Customers.CustomerID AS [Customers!1!CustomerID],
Customers.CompanyName AS [Customers!1!CompanyName],
NULL AS [Orders!2!OrderID],
NULL AS [Orders!2!OrderDate],
NULL AS [Orders!2!ShipCity],
NULL AS [OrderDetails!3!ProductID],
NULL AS [OrderDetails!3!Quantity],
NULL AS [Products!4!ProductName]
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = ORders.CustomerID
INNER JOIN [Order Details] OD ON od.OrderID = Orders.OrderID
INNER JOIN Products ON Products.ProductID = OD.ProductID
WHERE Products.ProductID = 9
UNION ALL
SELECT
2 AS TAG,
1 AS PARENT,
NULL AS [Customers!1!CustomerID],
NULL AS [Customers!1!CompanyName],
Orders.OrderID AS [Orders!2!OrderID],
Orders.OrderDate AS [Orders!2!OrderDate],
Orders.ShipCity AS [Orders!2!ShipCity],
NULL AS [OrderDetails!3!ProductID],
NULL AS [OrderDetails!3!Quantity],
NULL AS [Products!4!ProductName]
FROM Customers INNER JOIN Orders
ON customers.customerID = Orders.CustomerID
INNER JOIN [Order Details] OD ON od.OrderID = Orders.OrderID
INNER JOIN Products ON Products.ProductID = OD.ProductID
WHERE Products.ProductID = 9
UNION ALL
SELECT
3 AS TAG,
2 AS PARENT,
Customers.CustomerID AS [Customers!1!CustomerID],
NULL AS [Customers!1!CompanyName],
Orders.OrderID AS [Orders!2!OrderID],
NULL AS [Orders!2!OrderDate],
NULL AS [Orders!2!ShipCity],
OD.ProductID AS [OrderDetails!3!ProductID],
OD.Quantity AS [OrderDetails!3!Quantity],
NULL AS [Products!4!ProductName]
FROM Customers INNER JOIN Orders
ON customers.customerID = Orders.CustomerID
INNER JOIN [Order Details] OD ON OD.OrderID = Orders.OrderID
INNER JOIN Products ON Products.ProductID = OD.ProductID
WHERE Products.ProductID = 9
ORDER BY [Customers!1!CustomerID], [Orders!2!OrderID]
FOR XML EXPLICIT
To fix the error, all I have to do is replace the NULL with Customers.CustomerID in the underlined section of code. Then the query will run as expected. Therefore, watch out for error 6833 and ensure all your joining columns appear in all SELECT statements.
Summary
This article extended your knowledge of FOR XML clause. Although not intended for the rocket scientists, the EXPLICIT clause takes a little while to get used to. I showed you how the syntax works and also shared a few tricks you might find useful.