I have the following XML file, which I have imported in a Snowflake table as a VARIANT type.
<xyz>
<Header>
<VersionInfo>
<Version>2.1.0</Version>
<Notes>Draft report v1</Notes>
</VersionInfo>
<Environment>Env1</Environment>
<SubID>5787685</SubID>
</Header>
<Items>
<NEWT>
<ItemNo>12345</ItemNo>
<ItemBody>
<RptDt>2021-01-19T06:54:42.483Z</RptDt>
<RptEnt>
<Entity>123</Entity>
</RptEnt>
<IsVerified>FALSE</IsVerified>
</ItemBody>
</NEWT>
</Items>
<Items>
<NEWT>
<ItemNo>23456</ItemNo>
<ItemBody>
<RptDt>2021-01-19T06:54:42.483Z</RptDt>
<RptEnt>
<Entity>2346</Entity>
</RptEnt>
<IsVerified>TRUE</IsVerified>
</ItemBody>
</NEWT>
</Items>
</xyz>
I am trying to parse the XML and produce two different Snowflake tables from the data. The first query works, and extracts the data from the section.
Here is the query:
-- get header and version
SELECT
XMLGET(Header.value, 'Environment'):"$"::string Environment,
XMLGET(Header.value, 'SubID'):"$"::string SubID,
XMLGET(Version.value, 'Version'):"$"::string Version,
XMLGET(Version.value, 'Notes'):"$"::string Notes
FROM
MyTable ,
LATERAL FLATTEN(GET(xml_import, '$')) Header ,
LATERAL FLATTEN(GET(Header.value, '$')) Version
WHERE
GET(Header.value, '@') = 'Header'
and GET(Version.value, '@') = 'VersionInfo';
And below is the desired result:
ENVIRONMENT SUBID VERSION NOTES
Env1 5787685 2.1.0 Draft report v1
For then next section, I am having trouble parsing through all the levels of the section. The desired result would be a table which would have one row of data from each in the XML file.
ItemNo RptDt RptEnt IsVerified
12345 2021-01-19T06:54:42.483Z 123 FALSE
23456 2021-01-19T06:54:42.483Z 2346 TRUE
I tried to write the following SQL but I am having hard time filtering the rows and navigating the nested levels.
-- get items and reports
select
XMLGET(flt1.value, 'RptDt'):"$"::string RptDate,
XMLGET(flt1.value, 'IsVerified'):"$"::string IsVerified
FROM
MyTable src1,
lateral flatten(to_array(xmlget(xmlget(src1.xml_import, 'Items'), 'NEWT'):"$")) flt1
I am sure there is a more efficient way to query this data and convert it to a tabular format. Can anyone advise on how to traverse multiple levels and multiple repeated , while ignoring the ?
question from:
https://stackoverflow.com/questions/65886688/parsing-a-nested-xml-in-snowflake