apache spark - How to convert rows into a list of dictionaries in pyspark?

Question

Welcome To Ask or Share your Answers For Others

apache spark - How to convert rows into a list of dictionaries in pyspark?

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

apache spark - How to convert rows into a list of dictionaries in pyspark?

I have a DataFrame(df) in pyspark, by reading from a hive table:

df=spark.sql('select * from <table_name>')


+++++++++++++++++++++++++++++++++++++++++++
|  Name    |    URL visited               |
+++++++++++++++++++++++++++++++++++++++++++
|  person1 | [google,msn,yahoo]           |
|  person2 | [fb.com,airbnb,wired.com]    |
|  person3 | [fb.com,google.com]          |
+++++++++++++++++++++++++++++++++++++++++++

When i tried the following, got an error

df_dict = dict(zip(df['name'],df['url']))
"TypeError: zip argument #1 must support iteration."

type(df.name) is of 'pyspark.sql.column.Column'

How do i create a dictionary like the following, which can be iterated later on

{'person1':'google','msn','yahoo'}
{'person2':'fb.com','airbnb','wired.com'}
{'person3':'fb.com','google.com'}

Appreciate your thoughts and help.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:52:30+0000

I think you can try row.asDict(), this code run directly on the executor, and you don't have to collect the data on driver.

Something like:

df.rdd.map(lambda row: row.asDict())

Categories

apache spark - How to convert rows into a list of dictionaries in pyspark?

apache spark - How to convert rows into a list of dictionaries in pyspark?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags