python - pyspark df.select(*) is disordered after df.sort()

This is my original PySpark DataFrame:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   1|   2|
|   1|   2|   2|
|   1|   3|   2|
|   1|   2|   1|
|   2|   1|   2|
|   2|   3|   2|
|   2|   2|   1|
|   3|   1|   2|
|   3|   3|   2|
|   3|   2|   1|
+----+----+----+

After sorting df by col2 and then selecting all three columns:

df = df.sort('col2')
test = df.select('col1','col2','col3')
test.show()

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   3|   1|   2|
|   2|   1|   2|
|   1|   1|   2|
|   1|   2|   1|
|   3|   2|   1|
|   1|   2|   2|
|   2|   2|   1|
|   1|   3|   2|
|   3|   3|   2|
|   2|   3|   2|
+----+----+----+

df.show()

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   2|   1|   2|
|   3|   1|   2|
|   1|   1|   2|
|   1|   2|   2|
|   3|   2|   1|
|   1|   2|   1|
|   2|   2|   1|
|   3|   3|   2|
|   2|   3|   2|
|   1|   3|   2|
+----+----+----+

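The row order of test is different from the row order of df, even though test was selected from the already sorted df. Both outputs are sorted by col2, but rows that share the same col2 value appear in a different order. I don't know what happened. Can someone help me understand?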
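To narrow the problem down, here is a quick check in plain Python (no Spark needed) on the rows copied from the two outputs above. It is only a sketch: the names `test_rows`, `df_rows`, `col2_values`, and `total_order` are made up for illustration and are not part of PySpark. It checks whether both outputs are sorted by col2, whether they contain the same rows, and whether breaking ties on the other columns would give one agreed order.

```python
# Rows copied from test.show() and df.show(), as (col1, col2, col3) tuples.
test_rows = [
    (3, 1, 2), (2, 1, 2), (1, 1, 2),
    (1, 2, 1), (3, 2, 1), (1, 2, 2), (2, 2, 1),
    (1, 3, 2), (3, 3, 2), (2, 3, 2),
]
df_rows = [
    (2, 1, 2), (3, 1, 2), (1, 1, 2),
    (1, 2, 2), (3, 2, 1), (1, 2, 1), (2, 2, 1),
    (3, 3, 2), (2, 3, 2), (1, 3, 2),
]


def col2_values(rows):
    """Return only the col2 values, in row order."""
    return [row[1] for row in rows]


def total_order(rows):
    """Sort by col2 first, then break ties on col1 and col3 (hypothetical helper)."""
    return sorted(rows, key=lambda row: (row[1], row[0], row[2]))


# Both outputs respect the requested sort key.
both_sorted = all(
    col2_values(rows) == sorted(col2_values(rows)) for rows in (test_rows, df_rows)
)
# Both outputs hold exactly the same rows.
same_rows = sorted(test_rows) == sorted(df_rows)
# The outputs differ only in how rows with equal col2 are arranged.
same_order = test_rows == df_rows

print(both_sorted, same_rows, same_order)
print(total_order(test_rows) == total_order(df_rows))
```

So the two outputs differ only among rows with equal col2. My assumption, which I have not confirmed, is that `sort('col2')` leaves the order of tied rows unspecified and that each `show()` evaluates the sort again. If that is right, adding tie-breaking columns, for example `df.sort('col2', 'col1', 'col3')`, should give a repeatable order here because every row in this data is distinct.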

Question from: https://stackoverflow.com/questions/65915215/pyspark-df-select-is-disordered-after-df-sort
