Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

apache spark - Convert to dataframe error

I want to make a dataframe with 110 columns, so I created a class with 110 attributes to use when converting the RDD to a dataframe.

case class Myclass(var cin_nb: String, ..........., var last: String)
import sqlContext.implicits._
file2.map(_.split("\t")).map(a => Myclass(a(0), a(1), a(2), a(3), ..... a(110))).toDF()

I got this error:

not enough arguments for method apply: (cin_nb: String,...........,last:String)

I'm using Scala and Spark 1.6. Thank you.


1 Reply


You can't do this, because there is a hard limit of 22 fields when building a DataFrame from a case class: Scala's Tuple only supports 22 elements, and the case-class encoder relies on it. To get a DataFrame with more columns you can grow it with the .withColumn function, load the file directly into a DataFrame (for example from Parquet, or with the Databricks CSV parser), or build the schema programmatically as a StructType and call sqlContext.createDataFrame on an RDD[Row] — StructType schemas themselves are not subject to the 22-element limit.

Edit: An example of how to do this with .withColumn

import scala.util.Random
import org.apache.spark.sql.functions.udf

val numCols = 100
val numRows = 5
val delimiter = "\t"

// Build one tab-delimited row of random 5-character fields.
def generateRowData = (0 until numCols).map(i => Random.alphanumeric.take(5).mkString).mkString(delimiter)

// Start from a single-column DataFrame holding the raw delimited strings.
val df = sc.parallelize((0 until numRows).map(i => generateRowData).toList).toDF("data")

// UDF that extracts the i-th field from a delimited string.
def extractCol(i: Int, sep: String) = udf[String, String](_.split(sep)(i))

// Fold over the column indices, adding one extracted column per step,
// then drop the raw input column (in Spark 1.6, drop takes a column name).
val result = (0 until numCols).foldLeft(df) { case (acc, i) =>
  acc.withColumn(s"c$i", extractCol(i, delimiter)($"data"))
}.drop("data")

result.printSchema
result.show
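To see the shape of that fold without a Spark cluster, here is a plain-Scala sketch of the same pattern — one named "column" added per index via foldLeft. The Map-based row and the sample record are illustrative only, not Spark API:

```scala
// Plain-Scala sketch of the foldLeft-over-column-indices pattern.
val delimiter = "\t"
val numCols = 4
val record = Seq("a1", "b2", "c3", "d4").mkString(delimiter)

// Counterpart of extractCol: pull the i-th field out of a delimited string.
def extractCol(i: Int, sep: String): String => String = _.split(sep)(i)

// Counterpart of the withColumn fold: start from an empty row and
// add one named column per index.
val row: Map[String, String] =
  (0 until numCols).foldLeft(Map.empty[String, String]) { case (acc, i) =>
    acc + (s"c$i" -> extractCol(i, delimiter)(record))
  }

println(row("c0"))  // a1
println(row("c3"))  // d4
```

Each step of the fold takes the accumulated row and returns a new one with one extra column, which is exactly what acc.withColumn(...) does in the Spark version.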
