Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Create a DataFrame from a tuple list with a dynamic schema in PySpark

I am trying to create a DataFrame with a dynamic schema from a list of tuples in PySpark.

Here is my code that builds the tuple list:

outputlist = []
for row in df2.collect():
    # Each tuple starts with the row id (the first column)
    row_id = row[0]
    temptuple = (row_id,)
    print(row_id)
    for val in range(1, len(row)):
        # Stop at the first empty column
        if row[val] is None:
            break
        # The index is capped at 1, so every value is paired with 1
        index = min(val, 1)
        temptuple += (row[val], index)
    outputlist.append(temptuple)
print(outputlist)

[('44038:4132', '324772', 1), ('44038:4291', '772122995105', 1, '477212299170', 1)]
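
The loop above can also be written as a small helper that works on any row-like sequence. This is only a minimal sketch: `flatten_row` is a hypothetical name, the sample `rows` stand in for `df2.collect()`, and it assumes each row starts with an id, that the first None marks the end of the values, and that the index is always 1, as in the output above.

```python
from itertools import takewhile


def flatten_row(row):
    """Return (id, value, 1, value, 1, ...) for the values before the first None."""
    row_id, *values = row
    flat = [row_id]
    # takewhile stops at the first None, mirroring the `break` in the loop
    for value in takewhile(lambda v: v is not None, values):
        flat.extend((value, 1))
    return tuple(flat)


# Plain tuples standing in for the rows of df2 (assumed shape)
rows = [
    ("44038:4132", "324772", None),
    ("44038:4291", "772122995105", "477212299170"),
]
outputlist = [flatten_row(r) for r in rows]
print(outputlist)
```

Since a PySpark Row is a tuple, the same helper should work as `[flatten_row(r) for r in df2.collect()]`.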

Up to this point everything works. Now I need to create a DataFrame with a dynamic schema from the values above.

For example, when the DataFrame is built from the first tuple, the result should use the value 324772 as a field name, as in the screenshot.

When the DataFrame is built from the second tuple, the values 772122995105 and 477212299170 should become the field names, and so on for further tuples.

screenshot

question from:https://stackoverflow.com/questions/66045340/create-dataframe-from-tuple-list-with-dynamic-schema-in-pyspark

1 Reply

See the following code:

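from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

tuples = [('44038:4132', '324772', 1), ('44038:4291', '772122995105', 1, '477212299170', 1)]

for record in tuples:
    row_id = record[0]
    pairs = record[1:]

    # Map each value (used as a column name) to the index that follows it
    cols = {}
    for i in range(len(pairs) // 2):
        j = i * 2
        cols[pairs[j]] = pairs[j + 1]

    tmp_dict = {"id": row_id, **cols}

    # One Row per tuple; the dict keys become the field names
    df = spark.createDataFrame([Row(**tmp_dict)])
    # Select explicitly so "id" comes first regardless of Row field ordering
    df = df.select("id", *cols.keys())
    df.show()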

Here is the sample output:

+----------+------+
|        id|324772|
+----------+------+
|44038:4132|     1|
+----------+------+

+----------+------------+------------+
|        id|772122995105|477212299170|
+----------+------------+------------+
|44038:4291|           1|           1|
+----------+------------+------------+
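
The pairing step in the answer can be isolated and checked without a Spark session. The sketch below is an illustration, not part of the original answer: `tuple_to_columns` is a hypothetical helper, and it assumes every tuple has the shape (id, name, value, name, value, ...).

```python
def tuple_to_columns(record):
    """Split (id, name1, val1, name2, val2, ...) into column names and one row."""
    row_id, *rest = record
    names = rest[0::2]   # every other element, starting with the first name
    values = rest[1::2]  # the element that follows each name
    if len(names) != len(values):
        raise ValueError("expected name/value pairs after the id")
    return ["id", *names], (row_id, *values)


names, row = tuple_to_columns(
    ("44038:4291", "772122995105", 1, "477212299170", 1)
)
print(names)
print(row)

# With an active SparkSession named `spark` (assumed, not created here):
# spark.createDataFrame([row], names).show()
```

Passing the column names as the schema argument keeps the column order explicit, so no follow-up `select` is needed in that variant.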
