Stumbled across this question whilst having a very similar problem, I'd been running a query processing a 7.5MB XML file (~approx 10,000 nodes) for around 3.5~4 hours before finally giving up.
However, after a little more research I found that having typed the XML using a schema and created an XML Index (I'd bulk inserted into a table) the same query completed in ~ 0.04ms.
How's that for a performance improvement!
Code to create a schema:
IF EXISTS ( SELECT * FROM sys.xml_schema_collections where [name] = 'MyXmlSchema')
DROP XML SCHEMA COLLECTION [MyXmlSchema]
GO
DECLARE @MySchema XML
SET @MySchema =
(
SELECT * FROM OPENROWSET
(
BULK 'C:PathToSchemaMySchema.xsd', SINGLE_CLOB
) AS xmlData
)
CREATE XML SCHEMA COLLECTION [MyXmlSchema] AS @MySchema
GO
Code to create the table with a typed XML column:
CREATE TABLE [dbo].[XmlFiles] (
[Id] [uniqueidentifier] NOT NULL,
-- Data from CV element
[Data] xml(CONTENT dbo.[MyXmlSchema]) NOT NULL,
CONSTRAINT [PK_XmlFiles] PRIMARY KEY NONCLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Code to create Index
CREATE PRIMARY XML INDEX PXML_Data
ON [dbo].[XmlFiles] (Data)
There are a few things to bear in mind though. SQL Server's implementation of Schema doesn't support xsd:include. This means that if you have a schema which references other schema, you'll have to copy all of these into a single schema and add that.
Also I would get an error:
XQuery [dbo.XmlFiles.Data.value()]: Cannot implicitly atomize or apply 'fn:data()' to complex content elements, found type 'xs:anyType' within inferred type 'element({http://www.mynamespace.fake/schemas}:SequenceNumber,xs:anyType) ?'.
if I tried to navigate above the node I had selected with the nodes function. E.g.
SELECT
,C.value('CVElementId[1]', 'INT') AS [CVElementId]
,C.value('../SequenceNumber[1]', 'INT') AS [Level]
FROM
[dbo].[XmlFiles]
CROSS APPLY
[Data].nodes('/CVSet/Level/CVElement') AS T(C)
Found that the best way to handle this was to use the OUTER APPLY to in effect perform an "outer join" on the XML.
SELECT
,C.value('CVElementId[1]', 'INT') AS [CVElementId]
,B.value('SequenceNumber[1]', 'INT') AS [Level]
FROM
[dbo].[XmlFiles]
CROSS APPLY
[Data].nodes('/CVSet/Level') AS T(B)
OUTER APPLY
B.nodes ('CVElement') AS S(C)
Hope that that helps someone as that's pretty much been my day.