EditExtract Element Names
One audience member asked me if there was a way to be able to determine the element names that exist within the XML stored in a table. His business problem was that he archives XML in a field of a table, and needed to identify whether one row's XML contained a different set of elements than another row's (i.e., if an untyped XML's "schema" changed between versions).
To demonstrate one possible solution, suppose that I have a table (Sample) with an identity column (ID) and an XML column (Info). Sample XML in this column might resemble:
<employee id="12">
<name>Milton Waddams</name>
<title>Accountant</title>
<notes>
<note>No record of him being employed here</note>
<note>We fixed payroll glitch with this employee</note>
</notes>
<assignedAssets>
<asset name="swingarm stapler">
<note>Bill Lumbergh wants this reassigned</note>
</asset>
<asset name="calculator" />
<asset name="telephone" />
</assignedAssets>
</employee>
To get a raw listing of namespaces, element names, and parent element names, I might use the following:
SELECT id,
info.query('
for $node in /descendant::node()[local-name() != ""]
return <node>
<namespace>{ namespace-uri($node) }</namespace>
<localname>{ local-name($node) }</localname>
<parent>{ local-name($node/..) }</parent>
</node>') AS nodes
FROM sample
Note that the iterator uses an Axis to walk all nodes in the tree and a predicate to filter out Text Nodes (which will not have a local-name value).
Resulting contents:
<node>
<namespace />
<localname>employee</localname>
<parent />
</node>
<node>
<namespace />
<localname>name</localname>
<parent>employee</parent>
</node>
<node>
<namespace />
<localname>title</localname>
<parent>employee</parent>
</node>
<node>
<namespace />
<localname>notes</localname>
<parent>employee</parent>
</node>
<node>
<namespace />
<localname>note</localname>
<parent>notes</parent>
</node>
<node>
<namespace />
<localname>note</localname>
<parent>notes</parent>
</node>
<node>
<namespace />
<localname>assignedAssets</localname>
<parent>employee</parent>
</node>
<node>
<namespace />
<localname>asset</localname>
<parent>assignedAssets</parent>
</node>
<node>
<namespace />
<localname>note</localname>
<parent>asset</parent>
</node>
<node>
<namespace />
<localname>asset</localname>
<parent>assignedAssets</parent>
</node>
<node>
<namespace />
<localname>asset</localname>
<parent>assignedAssets</parent>
</node>
By wrapping this into another query, I can shred the XML into a table structure and then combine it with the rest of the output to form one common tablespace:
SELECT DISTINCT
q1.id,
T.n.value('namespace[1]', 'varchar(100)') AS Namespace,
T.n.value('localname[1]', 'varchar(100)') AS Localname,
T.n.value('parent[1]', 'VARCHAR(100)') AS parent
FROM ( SELECT id,
info.query('
for $node in /descendant::node()[local-name() != ""]
return <node>
<namespace>{ namespace-uri($node) }</namespace>
<localname>{ local-name($node) }</localname>
<parent>{ local-name($node/..) }</parent>
</node>') AS nodes
FROM sample
) q1
CROSS APPLY q1.nodes.nodes('/node') AS T ( n )
Results:
ID Namespace Localname Parent
1 asset assignedAssets
1 assignedAssets employee
1 employee
1 name employee
1 note asset
1 note notes
1 notes employee
1 title employee
If there were multiple rows, each containing XML, then there would be more values in the ID column. Using traditional SQL querying, it would then be possible to compare different sets of rows to identify where elements exist in one set that do not exist in another.