python - spark 2.1.0 session config settings (pyspark)

Question

Welcome To Ask or Share your Answers For Others

python - spark 2.1.0 session config settings (pyspark)

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - spark 2.1.0 session config settings (pyspark)

I am trying to overwrite the spark session/spark context default configs, but it is picking entire node/cluster resource.

 spark  = SparkSession.builder
                      .master("ip")
                      .enableHiveSupport()
                      .getOrCreate()

 spark.conf.set("spark.executor.memory", '8g')
 spark.conf.set('spark.executor.cores', '3')
 spark.conf.set('spark.cores.max', '3')
 spark.conf.set("spark.driver.memory",'8g')
 sc = spark.sparkContext

It works fine when i put the configuration in spark submit

spark-submit --master ip --executor-cores=3 --diver 10G code.py

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:28:19+0000

You aren't actually overwriting anything with this code. Just so you can see for yourself try the following.

As soon as you start pyspark shell type:

sc.getConf().getAll()

This will show you all of the current config settings. Then try your code and do it again. Nothing changes.

What you should do instead is create a new configuration and use that to create a SparkContext. Do it like this:

conf = pyspark.SparkConf().setAll([('spark.executor.memory', '8g'), ('spark.executor.cores', '3'), ('spark.cores.max', '3'), ('spark.driver.memory','8g')])
sc.stop()
sc = pyspark.SparkContext(conf=conf)

Then you can check yourself just like above with:

sc.getConf().getAll()

This should reflect the configuration you wanted.

Categories

python - spark 2.1.0 session config settings (pyspark)

python - spark 2.1.0 session config settings (pyspark)

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags