Create Dataframe from tuple list with dynamic schema in pyspark

Question

posted Oct 6, 2021 in Technique by 深蓝

I am trying to create a DataFrame with a dynamic schema from a list of tuples in PySpark.

Here is my code for building the tuple list:

outputlist = []
for row in df2.collect():
    tmpList = row
    temptuple = ()
    id = tmpList[0]
    temptuple = temptuple + (id,)
    print(id)
    for val in range(1, len(tmpList)):
        # Stop at the first null column; the rest of the row is empty
        if tmpList[val] is None:
            break
        else:
            value = tmpList[val]
            index = val
            if index > 1:
                index = 1
            # Append the value followed by its index (always 1 here)
            temptuple = temptuple + (value,)
            temptuple = temptuple + (index,)
    # Append inside the outer loop so every row is collected
    outputlist.append(temptuple)

print(outputlist)

[('44038:4132', '324772', 1), ('44038:4291', '772122995105', 1, '477212299170', 1)]
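Judging by the printed output, each non-null value after the id is followed by a constant 1, and the scan stops at the first null. A minimal pure-Python sketch of that step, assuming each row behaves like a plain tuple `(id, value1, value2, ..., None, ...)` (a PySpark `Row` is a tuple subclass, so it should index the same way). The helper name `row_to_tuple` and the sample rows are hypothetical, not taken from the question:

```python
from typing import Any, Sequence, Tuple


def row_to_tuple(row: Sequence[Any]) -> Tuple[Any, ...]:
    """Flatten (id, v1, v2, ..., None, ...) into (id, v1, 1, v2, 1, ...).

    Hypothetical helper: assumes the id is at position 0 and that the
    first None marks the end of the values, as in the question's loop.
    """
    result = [row[0]]
    for value in row[1:]:
        if value is None:
            break  # remaining columns are treated as empty
        result.extend((value, 1))  # value followed by the constant flag 1
    return tuple(result)


# Hypothetical rows shaped like the question's df2.collect() output
rows = [
    ("44038:4132", "324772", None),
    ("44038:4291", "772122995105", "477212299170"),
]
outputlist = [row_to_tuple(r) for r in rows]
print(outputlist)
# [('44038:4132', '324772', 1), ('44038:4291', '772122995105', 1, '477212299170', 1)]
```

Because the flag is always 1, the sketch appends the literal instead of tracking an index variable.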

Up to this point everything works. Now I need to create a DataFrame with a dynamic schema from the values above.

For example, when the DataFrame is built from the first tuple, the value 324772 should become a field (column) name; the original question showed the expected result in a screenshot.

Likewise, for the second tuple the values 772122995105 and 477212299170 should become field names, and so on for the remaining tuples.
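Put differently, each tuple is laid out as `(id, name1, value1, name2, value2, ...)`, so the field names for its DataFrame are the id column followed by every element at an odd position. A small sketch of that rule; `columns_for` is a hypothetical name, and labelling the first column `"id"` is an assumption (the question does not name it):

```python
from typing import Any, List, Tuple


def columns_for(t: Tuple[Any, ...]) -> List[str]:
    """Return the dynamic column names for one tuple.

    Assumes the layout (id, name1, value1, name2, value2, ...);
    the "id" label for the first column is an assumption.
    """
    # t[1::2] picks positions 1, 3, 5, ...: the values used as field names
    return ["id", *t[1::2]]


tuples = [
    ("44038:4132", "324772", 1),
    ("44038:4291", "772122995105", 1, "477212299170", 1),
]
for t in tuples:
    print(columns_for(t))
# ['id', '324772']
# ['id', '772122995105', '477212299170']
```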

Question from: https://stackoverflow.com/questions/66045340/create-dataframe-from-tuple-list-with-dynamic-schema-in-pyspark

1 Reply

深蓝 · Answer 1 · 2021-10-06T03:17:22+0000

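The idea is to pair up the elements that follow the id as (field name, value), build a dict containing "id" plus those pairs, and create a one-row DataFrame from that dict for each tuple: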

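from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

tuples = [('44038:4132', '324772', 1), ('44038:4291', '772122995105', 1, '477212299170', 1)]

for tup in tuples:
    row_id = tup[0]
    tmp_tuple = tup[1:]

    # Pair up (name, value, name, value, ...) as {name: value}
    cols = {}
    for i in range(len(tmp_tuple) // 2):
        j = i * 2
        cols[tmp_tuple[j]] = tmp_tuple[j + 1]

    tmp_dict = {
        "id": row_id,
        **cols
    }

    # One-row DataFrame whose column names come from the tuple itself
    df = spark.createDataFrame([Row(**tmp_dict)])
    # Put "id" first, then the dynamic columns in tuple order
    # (Spark versions before 3.0 sort Row keyword fields alphabetically)
    df = df.select("id", *cols.keys())
    df.show()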

Here is the sample output:

+----------+------+
|        id|324772|
+----------+------+
|44038:4132|     1|
+----------+------+

+----------+------------+------------+
|        id|772122995105|477212299170|
+----------+------------+------------+
|44038:4291|           1|           1|
+----------+------------+------------+
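The index arithmetic in the answer's pairing loop can also be expressed with slicing and `zip`, which makes the (name, value) pairing explicit. A minimal pure-Python sketch under the same layout assumption; the helper name `tuple_to_record` is hypothetical, and the dict it returns corresponds to the answer's `tmp_dict`:

```python
from typing import Any, Dict, Tuple


def tuple_to_record(t: Tuple[Any, ...]) -> Dict[str, Any]:
    """Turn (id, name1, value1, name2, value2, ...) into an ordered dict.

    Hypothetical helper; raises ValueError if a field name has no value.
    """
    row_id, rest = t[0], t[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"unpaired field in {t!r}")
    # zip(rest[0::2], rest[1::2]) yields (name, value) pairs in order
    return {"id": row_id, **dict(zip(rest[0::2], rest[1::2]))}


print(tuple_to_record(("44038:4291", "772122995105", 1, "477212299170", 1)))
# {'id': '44038:4291', '772122995105': 1, '477212299170': 1}
```

Raising on an unpaired trailing element is a design choice; the `range(len(tmp_tuple) // 2)` loop silently drops such an element instead.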
