Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
568 views
in Technique[技术] by (71.8m points)

Parsing a nested XML in Snowflake

I have the following XML file, which I have imported in a Snowflake table as a VARIANT type.

<xyz>
  <Header>
    <VersionInfo>
      <Version>2.1.0</Version>
      <Notes>Draft report v1</Notes>
    </VersionInfo>
    <Environment>Env1</Environment>
    <SubID>5787685</SubID>
  </Header>
  <Items>
    <NEWT>
      <ItemNo>12345</ItemNo>
      <ItemBody>
        <RptDt>2021-01-19T06:54:42.483Z</RptDt>
        <RptEnt>
          <Entity>123</Entity>
        </RptEnt>
        <IsVerified>FALSE</IsVerified>
      </ItemBody>
    </NEWT>
  </Items>
  <Items>
    <NEWT>
      <ItemNo>23456</ItemNo>
      <ItemBody>
        <RptDt>2021-01-19T06:54:42.483Z</RptDt>
        <RptEnt>
          <Entity>2346</Entity>
        </RptEnt>
        <IsVerified>TRUE</IsVerified>
      </ItemBody>
    </NEWT>
  </Items>
 </xyz>

I am trying to parse the XML and produce two different Snowflake tables from the data. The first query works, and extracts the data from the section. Here is the query:

  -- get header and version
SELECT 
  XMLGET(Header.value, 'Environment'):"$"::string  Environment,
  XMLGET(Header.value, 'SubID'):"$"::string  SubID,
  XMLGET(Version.value, 'Version'):"$"::string  Version,
  XMLGET(Version.value, 'Notes'):"$"::string  Notes
FROM 
   MyTable  ,
  LATERAL FLATTEN(GET(xml_import, '$')) Header ,
  LATERAL FLATTEN(GET(Header.value, '$')) Version 
WHERE 
  GET(Header.value, '@') = 'Header'
  and GET(Version.value, '@') = 'VersionInfo';  

And below is the desired result:

ENVIRONMENT     SUBID       VERSION     NOTES
Env1            5787685     2.1.0       Draft report v1

For then next section, I am having trouble parsing through all the levels of the section. The desired result would be a table which would have one row of data from each in the XML file.

ItemNo  RptDt               RptEnt      IsVerified
12345   2021-01-19T06:54:42.483Z    123         FALSE
23456   2021-01-19T06:54:42.483Z    2346        TRUE

I tried to write the following SQL but I am having hard time filtering the rows and navigating the nested levels.

 -- get items and reports
select 
XMLGET(flt1.value, 'RptDt'):"$"::string  RptDate,
XMLGET(flt1.value, 'IsVerified'):"$"::string  IsVerified

FROM 
  MyTable src1,
       lateral flatten(to_array(xmlget(xmlget(src1.xml_import, 'Items'), 'NEWT'):"$")) flt1

I am sure there is a more efficient way to query this data and convert it to a tabular format. Can anyone advise on how to traverse multiple levels and multiple repeated , while ignoring the ?

question from:https://stackoverflow.com/questions/65886688/parsing-a-nested-xml-in-snowflake

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I'm looking for ways to shorten this code - but it works for your example:

select xmlget(emp.value:"$", 'ItemNo'):"$" ItemNo
, xmlget(xmlget(emp.value:"$", 'ItemBody'), 'RptDt'):"$"::timestamp RptDt
, xmlget(xmlget(emp.value:"$", 'ItemBody'), 'RptEnt'):"$":"$" RptEnt
, xmlget(xmlget(emp.value:"$", 'ItemBody'), 'IsVerified'):"$"::boolean IsVerified
from onexml
    ,  lateral flatten(onexml.xml_import:"$") emp
where get(emp.value, '@')='Items';

enter image description here

Sample data:

create temp table onexml as
select parse_xml('<xyz>
  <Header>
    <VersionInfo>
      <Version>2.1.0</Version>
      <Notes>Draft report v1</Notes>
    </VersionInfo>
    <Environment>Env1</Environment>
    <SubID>5787685</SubID>
  </Header>
  <Items>
    <NEWT>
      <ItemNo>12345</ItemNo>
      <ItemBody>
        <RptDt>2021-01-19T06:54:42.483Z</RptDt>
        <RptEnt>
          <Entity>123</Entity>
        </RptEnt>
        <IsVerified>FALSE</IsVerified>
      </ItemBody>
    </NEWT>
  </Items>
  <Items>
    <NEWT>
      <ItemNo>23456</ItemNo>
      <ItemBody>
        <RptDt>2021-01-19T06:54:42.483Z</RptDt>
        <RptEnt>
          <Entity>2346</Entity>
        </RptEnt>
        <IsVerified>TRUE</IsVerified>
      </ItemBody>
    </NEWT>
  </Items>
 </xyz>') xml_import;

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...