Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


apache spark - How to allocate more executors per worker in Standalone cluster mode?

I use Spark 1.3.0 in a cluster of 5 worker nodes with 36 cores and 58GB of memory each. I'd like to configure Spark's Standalone cluster with many executors per worker.

I have seen the merged SPARK-1706, but it is not immediately clear how to actually configure multiple executors per worker.

Here is the latest configuration of the cluster:

spark.executor.cores = "15"
spark.executor.instances = "10"
spark.executor.memory = "10g"

These settings are set on a SparkContext when the Spark application is submitted to the cluster.


1 Reply


You first need to configure your Spark standalone cluster, then set the amount of resources needed for each individual Spark application you want to run.

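In order to configure the cluster, you can try this:

  • In conf/spark-env.sh:

    • Set SPARK_WORKER_INSTANCES=10, which determines the number of Worker instances per node (the default is 1). In Spark 1.3 each Worker runs at most one executor per application, so this is also the number of executors an application can get on each node.
    • Set SPARK_WORKER_CORES=15, the number of cores that each Worker instance can use (default: all cores, 36 in your case).
    • Set SPARK_WORKER_MEMORY=55g, the total amount of memory that each Worker instance can give to Spark programs.
    • Both limits apply per Worker instance, so with 10 instances per node they add up to far more than the 36 cores and 58GB the machine has; scale them down if you do not want to oversubscribe the node.
  • Copy this configuration file to all Worker Nodes, in the same folder.

  • Start your cluster by running the scripts in sbin (sbin/start-all.sh, ...)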
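
As a rough sketch of how the spark-env.sh values could be sized so that all Worker instances fit on one node, the fragment below derives the per-Worker limits from the machine totals given in the question (36 cores, 58GB). It assumes that SPARK_WORKER_CORES and SPARK_WORKER_MEMORY are applied to each Worker instance separately; the helper variables TOTAL_CORES, TOTAL_MEM_GB and RESERVED_MEM_GB are hypothetical names introduced here, and the 3GB of headroom is only an assumption.

```shell
# Sizing sketch for conf/spark-env.sh (helper variable names are hypothetical).
TOTAL_CORES=36        # cores per node, from the question
TOTAL_MEM_GB=58       # memory per node in GB, from the question
RESERVED_MEM_GB=3     # assumption: headroom left for the OS and Spark daemons

# Number of Worker processes to start on each node.
export SPARK_WORKER_INSTANCES=10

# Split cores and memory evenly so the per-node totals stay within the hardware.
export SPARK_WORKER_CORES=$(( TOTAL_CORES / SPARK_WORKER_INSTANCES ))
export SPARK_WORKER_MEMORY="$(( (TOTAL_MEM_GB - RESERVED_MEM_GB) / SPARK_WORKER_INSTANCES ))g"

echo "workers=$SPARK_WORKER_INSTANCES cores=$SPARK_WORKER_CORES memory=$SPARK_WORKER_MEMORY"
```

Integer division rounds down, so a few cores and some memory stay unused; fewer, larger Workers may suit jobs that need more memory per executor.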

As you have 5 worker nodes, with the above configuration you should see 5 (nodes) * 10 (Workers per node) = 50 alive Workers on the master's web interface (http://localhost:8080 by default), each able to host one executor for your application.

When you run an application in standalone mode, by default it will acquire all available cores in the cluster, and therefore an executor on every Worker. You need to explicitly set the amount of resources for this application, for example:

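import org.apache.spark.SparkConf

val conf = new SparkConf()
             .setMaster(...)
             .setAppName(...)
             .set("spark.executor.memory", "2g") // memory per executor
             .set("spark.cores.max", "10")       // total cores for this application across the cluster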
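
The same two limits can probably also be passed on the command line instead of in code. The sketch below only assembles and prints a spark-submit invocation using the --executor-memory and --total-executor-cores options; the master URL, class name and jar are hypothetical placeholders, not values from the question.

```shell
# Hypothetical placeholders: substitute your own master URL, main class and jar.
MASTER_URL="spark://master-host:7077"
APP_CLASS="com.example.MyApp"
APP_JAR="my-app.jar"

EXECUTOR_MEMORY="2g"   # same value as spark.executor.memory above
MAX_CORES=10           # same value as spark.cores.max above

# Build the command as a string and print it (dry run, nothing is submitted).
CMD="spark-submit --master $MASTER_URL --class $APP_CLASS --executor-memory $EXECUTOR_MEMORY --total-executor-cores $MAX_CORES $APP_JAR"
echo "$CMD"
```

Printing the command first makes it easy to check the flags before running it against a real cluster.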
