networking - Spark SPARK_PUBLIC_DNS and SPARK_LOCAL_IP on stand-alone cluster with docker containers

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

So far I have run Spark only on Linux machines and VMs (bridged networking), but now I am interested in using more computers as slaves. It would be handy to distribute a Spark slave Docker container to these computers and have them automatically connect to a hard-coded Spark master IP. This sort of works already, but I am having trouble configuring the right SPARK_LOCAL_IP (or the --host parameter for start-slave.sh) on the slave containers.
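A minimal sketch of what such a slave container's entrypoint might run, assuming a Spark 2.x layout under `SPARK_HOME` (defaulting to a hypothetical `/opt/spark`) and a hypothetical hard-coded master at `spark://10.0.0.10:7077` (7077 is the standalone master's default port). It starts the worker in the foreground through `spark-class` instead of `start-slave.sh`, and `WORKER_HOST` is a hypothetical variable standing in for exactly the `--host` value that is hard to choose here.

```python
import os

# Hypothetical hard-coded master URL (7077 is the standalone master's default port).
MASTER_URL = "spark://10.0.0.10:7077"


def build_worker_argv(spark_home, master_url, host=None, port=None, webui_port=None):
    """Return an argv that runs a standalone worker in the foreground via spark-class."""
    argv = [
        os.path.join(spark_home, "bin", "spark-class"),
        "org.apache.spark.deploy.worker.Worker",
    ]
    # Only pass options that are set, so Spark's own defaults apply otherwise.
    if host:
        argv += ["--host", host]
    if port:
        argv += ["--port", str(port)]
    if webui_port:
        argv += ["--webui-port", str(webui_port)]
    argv.append(master_url)  # the master URL is the positional argument
    return argv


def main():
    """Container entrypoint: replace this process with the worker (never returns)."""
    argv = build_worker_argv(
        os.environ.get("SPARK_HOME", "/opt/spark"),  # assumed install path
        MASTER_URL,
        host=os.environ.get("WORKER_HOST"),  # hypothetical variable: the value in question
        port=os.environ.get("SPARK_WORKER_PORT"),
        webui_port=os.environ.get("SPARK_WORKER_WEBUI_PORT"),
    )
    os.execv(argv[0], argv)
```

An entrypoint script would end by calling `main()`. Running the worker in the foreground matters because Docker stops a container when its main process exits, while `start-slave.sh` hands the worker off to a background daemon and returns.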

I think I have correctly configured the SPARK_PUBLIC_DNS environment variable to match the host machine's network-accessible IP (from the 10.0.x.x address space); at least it is shown on the Spark master web UI and is accessible from all machines.

I have also set SPARK_WORKER_OPTS and the Docker port forwards as instructed at http://sometechshit.blogspot.ru/2015/04/running-spark-standalone-cluster-in.html, but in my case the Spark master is running on another machine and not inside Docker. I am launching Spark jobs from another machine within the network, which may also run a slave itself.
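To rule out the Docker port forwards themselves, a plain TCP connection from another machine to the host machine's LAN address and the forwarded port is a quick test that does not involve Spark at all. A minimal standard-library sketch; the address and port in the commented example are hypothetical placeholders.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable names.
        return False


# Example (hypothetical address), run from another machine on the LAN:
# print(is_reachable("10.0.3.7", 6066))
```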

Things that I've tried:

Not configuring SPARK_LOCAL_IP at all: the slave binds to the container's IP (like 172.17.0.45), which the master or driver cannot connect to; computation still works most of the time, but not always.
Binding to 0.0.0.0: slaves talk to the master and establish some connection, but it dies; another slave shows up and goes away, and they keep looping like this.
Binding to the host IP: startup fails because that IP is not visible within the container, although it would be reachable by others since port forwarding is configured.
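The third result is ordinary operating-system behaviour rather than anything Spark-specific: a socket can be bound to `0.0.0.0` (all local interfaces) or to an address assigned to a local interface, but binding to an address the current network namespace does not own fails, typically with `EADDRNOTAVAIL`. Inside a container on Docker's default bridge network, the host machine's LAN address is not one of the container's own addresses. A small sketch; `192.0.2.1` is a documentation-only address (RFC 5737) used here as a stand-in for an address that is not assigned locally.

```python
import socket


def can_bind(address, port=0):
    """Try to bind a TCP socket to (address, port); port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
        return True
    except OSError as exc:
        print(f"bind to {address} failed: {exc}")
        return False
    finally:
        sock.close()


print(can_bind("0.0.0.0"))    # wildcard address: expected to succeed
print(can_bind("127.0.0.1"))  # loopback is always a local address
print(can_bind("192.0.2.1"))  # not assigned locally: expected to fail
```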

I wonder why the configured SPARK_PUBLIC_DNS isn't used when connecting to the slaves. I thought SPARK_LOCAL_IP would only affect local binding and would not be revealed to external computers.

At https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html they instruct to "set SPARK_LOCAL_IP to a cluster-addressable hostname for the driver, master, and worker processes". Is this the only option? I would like to avoid the extra DNS configuration and just use IPs to configure traffic between computers. Or is there an easy way to achieve this?
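Whether a hostname or an IP is used, the address a process advertises has to be one that other machines can route to. A quick sketch for spotting the failure from the first attempt, where the advertised address was container-internal: it resolves a name (or accepts an IP literal) and reports whether the result falls inside `172.17.0.0/16`. That subnet is Docker's default bridge network, but it can be changed in the Docker daemon configuration, so treat it as an assumption.

```python
import ipaddress
import socket

# Docker's *default* bridge subnet; adjust if the daemon is configured differently.
DOCKER_DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def classify_address(name):
    """Resolve a hostname or IP literal and describe where the address points."""
    ip = ipaddress.ip_address(socket.gethostbyname(name))
    if ip.is_loopback:
        return "loopback"
    # Check the bridge first: 172.17.x.x is also a private range.
    if ip in DOCKER_DEFAULT_BRIDGE:
        return "docker-bridge"
    if ip.is_private:
        return "private"
    return "other"
```

Run on another machine, a result of `docker-bridge` or `loopback` for the name a worker advertises suggests the master and driver will not be able to reach it.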

Edit: To summarize the current set-up:

Master is running on Linux (a VM in VirtualBox on Windows with bridged networking)
Driver submits jobs from another Windows machine; this works great
The Docker image for starting up slaves is distributed as a "saved" .tar.gz file, loaded (curl xyz | gunzip | docker load) and started on other machines within the network; this is where the private/public IP configuration problem occurs

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:47:36+0000

I am also running Spark in containers on different Docker hosts. Starting the worker container with these arguments worked for me:

docker run \
  -e SPARK_WORKER_PORT=6066 \
  -p 6066:6066 \
  -p 8081:8081 \
  --hostname $PUBLIC_HOSTNAME \
  -e SPARK_LOCAL_HOSTNAME=$PUBLIC_HOSTNAME \
  -e SPARK_IDENT_STRING=$PUBLIC_HOSTNAME \
  -e SPARK_PUBLIC_DNS=$PUBLIC_IP \
  spark ...

where $PUBLIC_HOSTNAME is a hostname reachable from the master.
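When the same container has to be started on several machines, the invocation can also be assembled programmatically. A sketch that mirrors the flags above; the helper name and the example hostname are hypothetical, the image name `spark` is taken from the command, and the trailing worker arguments (elided as `...` above) are left to the caller. Host and container ports are kept identical, presumably because the worker advertises the port it bound inside the container, so a different host-side port would not match what the master is told.

```python
import shlex


def worker_docker_argv(public_hostname, public_ip, image="spark",
                       worker_port=6066, webui_port=8081, extra_args=()):
    """Build the `docker run` argv from the answer, with identical host/container ports."""
    return [
        "docker", "run",
        "-e", f"SPARK_WORKER_PORT={worker_port}",
        "-p", f"{worker_port}:{worker_port}",
        "-p", f"{webui_port}:{webui_port}",
        "--hostname", public_hostname,
        "-e", f"SPARK_LOCAL_HOSTNAME={public_hostname}",
        "-e", f"SPARK_IDENT_STRING={public_hostname}",
        "-e", f"SPARK_PUBLIC_DNS={public_ip}",
        image,
        *extra_args,
    ]


# Print a copy-pasteable command; pass the list to subprocess.run() to execute it.
print(shlex.join(worker_docker_argv("worker1.example.lan", "10.0.3.7")))
```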

The missing piece was SPARK_LOCAL_HOSTNAME, an undocumented option AFAICT.

https://github.com/apache/spark/blob/v2.1.0/core/src/main/scala/org/apache/spark/util/Utils.scala#L904
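Roughly, the linked code lets `SPARK_LOCAL_HOSTNAME` take precedence over the address derived from `SPARK_LOCAL_IP` when Spark picks the name to advertise, while the worker uses `SPARK_PUBLIC_DNS` only for its web UI URL. The following is a simplified Python model of that precedence, assuming the Spark 2.x standalone worker code; it is not Spark's implementation. It ignores name resolution and the network-interface scan Spark does when it falls back to the machine's own address, which is passed in here as `detected_ip`, and it treats empty values as unset.

```python
def advertised_addresses(env, detected_ip, cli_host=None):
    """Simplified model of which names a Spark 2.x standalone worker advertises.

    env: mapping of environment variables; detected_ip: stand-in for the address
    Spark would detect on its own; cli_host: the value of --host, if any.
    """
    # RPC address: --host wins, then SPARK_LOCAL_HOSTNAME, then SPARK_LOCAL_IP,
    # then the automatically detected address (a container IP under Docker).
    local_name = env.get("SPARK_LOCAL_HOSTNAME") or env.get("SPARK_LOCAL_IP") or detected_ip
    rpc_host = cli_host or local_name
    # Web UI URL: SPARK_PUBLIC_DNS if set, otherwise the RPC host.
    ui_host = env.get("SPARK_PUBLIC_DNS") or rpc_host
    return {"rpc": rpc_host, "webui": ui_host}
```

In this model, setting only `SPARK_PUBLIC_DNS` leaves the RPC address at the container IP, which matches the first attempt in the question. Setting `SPARK_LOCAL_HOSTNAME` to a name that resolves to the container inside it (through `--hostname`) and to the host machine outside it gives an advertised name that works from both sides.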
