Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

apache spark - Load XML file to dataframe in PySpark using DBR 7.3.x+

I'm trying to load an XML file into a DataFrame using PySpark in a Databricks notebook.

df = spark.read.format("xml").options(
    rowTag="product", mode="PERMISSIVE", columnNameOfCorruptRecord="error_record"
).load(filePath)

When I run it, I get the following error:

Could not initialize class com.databricks.spark.xml.util.PermissiveMode$

Databricks runtime version: 7.3 LTS, Spark version: 3.0.1, Scala version: 2.12

The same code block runs fine on DBR 6.4 (Spark 2.4.5, Scala 2.11).

question from:https://stackoverflow.com/questions/65660907/load-xml-file-to-dataframe-in-pyspark-using-dbr-7-3-x


1 Reply


You need to upgrade the spark-xml library to a build compiled for Scala 2.12. The build that works on DBR 6.4 was compiled for Scala 2.11 and isn't compatible with the Scala 2.12 runtime in DBR 7.3, which is why the class fails to initialize. So, instead of spark-xml_2.11 you need to use spark-xml_2.12.

P.S. I just checked with DBR 7.3 LTS and com.databricks:spark-xml_2.12:0.11.0, and it works fine.
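
To make the naming rule concrete, here is a minimal sketch of how the Maven coordinate follows from the cluster's Scala version. The helper names (scala_binary_version, spark_xml_coordinate, matches_cluster_scala) are hypothetical and not part of spark-xml or Databricks; they only illustrate the `_2.11` vs `_2.12` suffix convention described above.

```python
import re


def scala_binary_version(scala_version: str) -> str:
    """Reduce a Scala version such as '2.12.10' to its binary version '2.12'."""
    match = re.match(r"^(\d+)\.(\d+)", scala_version)
    if match is None:
        raise ValueError(f"Unrecognized Scala version: {scala_version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def spark_xml_coordinate(scala_version: str, library_version: str) -> str:
    """Build the spark-xml Maven coordinate for a given Scala version.

    Hypothetical helper: it only formats the string and does not check
    that the resulting artifact is actually published.
    """
    suffix = scala_binary_version(scala_version)
    return f"com.databricks:spark-xml_{suffix}:{library_version}"


def matches_cluster_scala(coordinate: str, scala_version: str) -> bool:
    """Return True if the artifact's Scala suffix matches the cluster's Scala version."""
    _group, artifact, _version = coordinate.split(":")
    return artifact.endswith("_" + scala_binary_version(scala_version))


# DBR 7.3 LTS runs Scala 2.12, so the library must carry the _2.12 suffix.
print(spark_xml_coordinate("2.12", "0.11.0"))
# A _2.11 artifact does not match a Scala 2.12 cluster.
print(matches_cluster_scala("com.databricks:spark-xml_2.11:0.11.0", "2.12"))
```

In a notebook, the Scala version of the running cluster can usually be read through PySpark's internal py4j gateway with spark.sparkContext._jvm.scala.util.Properties.versionNumberString(). Note that _jvm is a private attribute, so treat that call as a convenience rather than a stable API.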

