I am trying to create a DataFrame with a dynamic schema from a list of tuples in PySpark.
Here is my code that builds the tuple list:
    outputlist = []
    for row in df2.collect():
        tmpList = row
        temptuple = ()
        id = tmpList[0]
        temptuple = temptuple + (id,)
        print(id)
        for val in range(1, len(tmpList)):
            if tmpList[val] is None:
                break
            else:
                value = tmpList[val]
                index = val
                if index > 1:
                    index = 1
                temptuple = temptuple + (value,)
                temptuple = temptuple + (index,)
        outputlist.append(temptuple)
    print(outputlist)
[('44038:4132', '324772', 1), ('44038:4291', '772122995105', 1, '477212299170', 1)]
Up to this point everything works. Now I have to create a DataFrame with a dynamic schema from the values above.
For example, when the DataFrame reads the first tuple, it should give a result like this:
In the screenshot, the value 324772 appears as a field name.
When the DataFrame reads the second tuple, it should give a result like this:
In the screenshot, the values 772122995105 and 477212299170 appear as field names, and so on.
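The intended mapping could be sketched roughly as follows. This is only an illustration under stated assumptions: each tuple is taken to have the form (id, name, value, name, value, ...), the first column is called `id` (a hypothetical name), and the helper names `tuple_to_columns_and_row` and `tuple_to_dataframe` are made up for this example. The Spark step relies on `spark.createDataFrame` accepting a list of column names as its schema.

```python
def tuple_to_columns_and_row(tup, id_column="id"):
    """Split (id, name1, val1, name2, val2, ...) into column names and one row.

    The id_column name is an assumption; it does not come from the question.
    """
    record_id, rest = tup[0], tup[1:]
    if len(rest) % 2 != 0:
        raise ValueError("expected (name, value) pairs after the id")
    names = [str(name) for name in rest[0::2]]   # every other item is a field name
    values = list(rest[1::2])                    # the items in between are the values
    return [id_column] + names, (record_id, *values)


def tuple_to_dataframe(spark, tup):
    """Build a one-row DataFrame whose schema depends on the tuple (hypothetical helper)."""
    columns, row = tuple_to_columns_and_row(tup)
    # Passing a list of column names lets Spark infer the column types from the row.
    return spark.createDataFrame([row], schema=columns)


outputlist = [
    ("44038:4132", "324772", 1),
    ("44038:4291", "772122995105", 1, "477212299170", 1),
]

for tup in outputlist:
    columns, row = tuple_to_columns_and_row(tup)
    print(columns, row)
```

With an active `SparkSession` named `spark`, calling `tuple_to_dataframe(spark, outputlist[0])` would give one DataFrame per tuple, each with its own set of columns. Because the schemas differ, the tuples are kept as separate DataFrames here rather than combined into one.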
question from:
https://stackoverflow.com/questions/66045340/create-dataframe-from-tuple-list-with-dynamic-schema-in-pyspark