Exploit the XML Capabilities of SQL Server 2005

(Note: This presentation was written using the Beyond Bullets style.)


Everywhere that you look today, XML seems to be present.

Web services and AJAX web applications transfer data in XML formats. Blogs and other web sites syndicate their content as RSS, which is an XML format. Microsoft Office products like Word 2007 and Excel 2007 now use XML as their key file format, and XML files are used by all .NET applications to store application configuration.


Developers must move data into and out of XML

The XML “Infoset” is simply an encapsulation of data into a simple yet strict syntax that is universally recognized. To access the data contained within the XML, it must first be parsed into elementary units of information. This is usually accomplished by means of a set of XML APIs provided by the platform. Then, each piece of information must be translated from a text-only representation of data into a platform-dependent data type where it can be further utilized.


XML Translation in Business Logic: Not Always the Best Choice

Traditionally, XML translation occurs in the business logic. In respect to databases, incoming XML is shredded into its discrete data components, and these components are what are then saved to the database as fields within tables. Some issues to consider: Is there loss of fidelity during translation? Is there a need to archive source XML documents? Do you know all of the questions that will be asked of your data? Are you using every piece of data in the XML? Will the XML contents ever change over time?


Let the DB work with XML directly to improve overall integration

How useful would it be if XML could be saved to a table as a single unit of data, and yet still be able to easily access the discrete pieces of data inside of it at query time? Or, what do you do if a developer cannot determine whether to explicitly parse and save low-priority data or to throw it away? Think: Ad Hoc reporting, not highly transactional systems.


Exploit SQL Server 2005 for database solutions that use XML data

SQL Server 2005 understands XML as structured data. It provides mechanisms to query and join to XML directly, as well as to generate XML from relational data. Developers and architects need to be familiar with how XML handling strengthens SQL Server’s position as a data platform when designing systems.


Interpret XML as More than Just Text Using the XML Data Type

XML is a First-Class Citizen data type. It is used in columns, variables, parameters, and return types. It supports serialization to and parse from string representation. It has full support for typed XML using XML Schema.

The XML data type has methods that offer inspection and extraction of data from the XML. The presence of data within XML can be checked using the exist() method. Data can be extracted from XML using the query() and value() methods. And, the nodes() method allows XML to be shredded into individual rows of XML.

XML columns can be indexed for faster access to the contained data. A Primary XML Index pre-shreds the contents of an XML field into discrete nodes to prevent the need for parsing at query time. Secondary XML Indexes (value, path, property) build upon the information contained in the Primary XML Index by offering various ways of referencing a node:
  1. Path is useful when the exact path to data contained in the XML is well-known.
  2. Value is useful when a query uses a value in its WHERE clause, but the exact path is not known
  3. Property is useful when node values are being retrieved from the XML in conjunction with the record’s primary key.


Harness the Data in XML Using the XQuery Language

Path Expressions are the primary way to identify where data resides within XML, and derive directly from the XPath specification. A Path Expression includes Location Steps that work very much like the directory syntax of DOS or UNIX. In addition to identifying elements within the tree structure, there is a specific notation for accessing Attributes of elements. A single Path Expression may match multiple nodes, so XPath (and thus, XQuery) provides a filtering mechanism referred to as a Predicate.

More complex XML handling will call for an iteration processor, which XQuery implements as FLWOR statements. For and Return are the required parts of FLWOR, with the For clause setting up the iteration scope and the Return clause generating the output from each iteration. Like standard SQL, the iteration scope can be narrowed using the Where clause, and the order that the set is iterated can be specified using the Order By clause.

XQuery provides a set of functions to assist in transforming and filtering of XML data. The Data Accessor Functions string() and data() permit discrete data to be extracted from XML. Other functions serve as convenience methods for working with the data in a specific context (numeric, string, context, aggregate functions). Beyond the standard XQuery functions, SQL Server also includes two SQL Server Extension Functions sql:column() and sql:variable() for integrating XML handling with data outside of the scope of the XML itself.


Create XML From Relational Data Using the FOR XML Clause

While the typical T-SQL query returns a rowset, it is very easy to return XML instead by simply including FOR XML at the end of the query. RAW Mode generates one XML element for each row returned by the query, with any XML columns being represented as child elements. AUTO Mode also generates one XML element for each row returned by the query, except each individual table listed in the FROM clause that has a field in the SELECT clause will result in a nested element in the resulting XML. It is also possible to nest a FOR XML query within another query, using the generated XML as a XML data type (TYPE mode).

The default layout of the resulting XML may not be desirable, so a few features exist to control some of the behaviors. ROOT wraps the resulting XML under a common root element so that it is a well-formed document as opposed to an XML fragment. ELEMENTS mode forces each data item to be represented as an element instead of an attribute. PATH Mode provides a simple mechanism to generate complex XML structures by using column aliases as XPath instructions.

It may be necessary to add some complexity to the resulting XML before it is entirely useful to whatever may consume it. To distinguish empty values from Null values, the XSINIL Mode adds an attribute of xsi:nil="true" onto any element that is Null in the database. Inline XSD Schemas can be generated for the resulting XML using the XMLSCHEMA Mode. XML Namespaces can be added to the resulting XML using the WITH XMLNAMESPACES clause.

Does moving XML handling into the database offer advantages over using the middle tier alone?


Developers who are unaware of SQL Server’s XML abilities are doing extra work

They are deciding up front what information will be persisted for future reference.


Utilize SQL Server 2005 for database solutions that need to use XML data

SQL Server is much more than a place to store your data until the middle tier needs it. Introducing XML as a native data type sets the stage for SQL Server becoming a platform to store and query all of your data (not just the relational data).


XML Capabilities enable SQL Server to be a complete Data Platform

A data platform stores all kinds of imaginable data in a variety of formats, and offers capability to work with each type of data in conjunction with your traditional data. One might refer to this as “Save it now, figure it out later”. By including first-class XML support, SQL Server becomes more than a database: it becomes a data platform that will only continue to evolve over time.


XML and SQL Server – A Great Combination