Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
295 views
in Technique[技术] by (71.8m points)

maximum number of columns we can have in dataframe spark scala

I like to know the maximum number of columns I can have in the dataframe,Is there any limitations in maintaining number of columns in dataframes. Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Sparing you the details, the answer is Yes, there is a limit for the size the number of columns in Apache Spark.

Theoretically speaking, this limit depends on the platform and the size of element in each column.

Don't forget that Java is limited by the size of the JVM and an executor is also limited by that size - Java largest object size in Heap.

I would go back an refer to this Why does Spark RDD partition has 2GB limit for HDFS? which refers to the limitation with HDFS on block/partition size.

So there is actually lots of restriction to take into account.

This means that you can easily find a hard limit (Int.MaxValue par ex.) but what is more important Spark scales well only long and relatively thin data. (like stated by pault).

Finally, you need to remember that fundamentally you cannot split a single record between executors/partitions. And there is a number of practical limitations (GC, disk IO) which make very wide data impractical. Not to mention some known bugs.

Note : I mention @pault and @RameshMaharjan as this answer is actually the fruit of the discussion we had. (And ofc @zero323 for his comment from the other answer).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...