hadoop - Access a secured Hive when running Spark in an unsecured YARN cluster

Question


posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)


We have two Cloudera 5.7.1 clusters: one secured with Kerberos and one unsecured.

Is it possible to run Spark on the unsecured YARN cluster while accessing Hive tables stored in the secured cluster? (The Spark version is 1.6.)

If so, can you please explain how to configure it?

Update:

I want to explain the end goal behind my question. Our main secured cluster is heavily utilized, and our jobs can't get enough resources to complete in a reasonable time. To overcome this, we wanted to use resources from another, unsecured cluster we have, without copying the data between the clusters.

We know it's not the best solution, as data locality might not be optimal, but it's the best solution we can come up with for now.

Please let me know if you have any other solution, as it seems we can't achieve the above.


1 Reply

深蓝 · Answer 1 · 2021-10-17T00:50:09+0000

If you run Spark in local mode, you can make it use an arbitrary set of Hadoop conf files -- i.e. core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, hive-site.xml copied from the Kerberized cluster.
So you can access HDFS on that cluster -- if you have a Kerberos ticket that grants you access to that cluster, of course.

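  # Point Spark at the client configs copied from the Kerberized cluster
  export HADOOP_CONF_DIR=/path/to/conf/of/remote/kerberized/cluster
  # Obtain a Kerberos ticket for that cluster
  kinit sylvestre@WORLD.COMPANY
  # Quote the master URL so the shell never glob-expands [*]
  spark-shell --master 'local[*]'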
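
Before launching, it can help to confirm that the copied configuration directory is complete. The following is only a minimal sketch, assuming bash; check_conf_dir is a hypothetical helper name (not a Hadoop or Spark tool), and the file list is simply the one given above.

```shell
# Hypothetical helper (not part of Hadoop or Spark): verify that a directory
# holds the client config files listed in the answer before it is used as
# HADOOP_CONF_DIR. Returns 0 if all files exist, 1 otherwise.
check_conf_dir() {
  local dir="$1" missing=0 f
  for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml hive-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      # Report each missing file on stderr and remember the failure
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One possible use is to chain it with klist -s, which exits non-zero when the credential cache holds no valid ticket:

  check_conf_dir "$HADOOP_CONF_DIR" && klist -s && spark-shell --master 'local[*]'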

But in yarn-client or yarn-cluster mode, you cannot launch containers in the local cluster and access HDFS in the other.

Either you use the local core-site.xml, which sets hadoop.security.authentication to simple, and you can connect to the local YARN/HDFS;
or you point to a copy of the remote core-site.xml, which sets hadoop.security.authentication to kerberos, and you can connect to the remote YARN/HDFS.
But you cannot use the local, unsecured YARN and access the remote, secure HDFS.
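
To see which authentication mode a given core-site.xml selects, a rough sketch like the one below can be used. It assumes bash plus standard tr and sed, and it assumes the usual layout where <value> directly follows <name>. The function names auth_mode and same_auth_model are hypothetical. When the property is absent it falls back to simple, which is the Hadoop default.

```shell
# Hypothetical helper: print the value of hadoop.security.authentication
# found in a core-site.xml, or "simple" (the Hadoop default) if it is unset.
# Assumes <value> immediately follows <name> inside the <property> element.
auth_mode() {
  local file="$1" value
  # Join the file into one line so the name/value pair can span lines
  value="$(tr -d '\n' < "$file" \
    | sed -n 's#.*<name>hadoop\.security\.authentication</name>[[:space:]]*<value>\([^<]*\)</value>.*#\1#p')"
  echo "${value:-simple}"
}

# Succeeds only when both files select the same authentication mode,
# i.e. the unsecure-unsecure or secure-secure combinations.
same_auth_model() {
  [ "$(auth_mode "$1")" = "$(auth_mode "$2")" ]
}
```

For example, same_auth_model /etc/hadoop/conf/core-site.xml /path/to/remote/core-site.xml (both paths are placeholders) would fail for the local-unsecured, remote-Kerberized pair described in the question.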

Note that with unsecure-unsecure or secure-secure combinations, you could access HDFS in another cluster by hand-crafting a custom hdfs-site.xml that defines multiple namespaces. But you are stuck with a single authentication model.
[edit] see the comment by Mighty Steve Loughran about an extra Spark property to access remote, secure HDFS from a local, secure cluster.
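
As an illustration of such a custom hdfs-site.xml, here is a sketch that prints the extra properties for a second nameservice. It assumes the remote cluster runs HDFS High Availability with two NameNodes. The helper name print_remote_nameservice, the NameNode IDs nn1 and nn2, and every nameservice ID and address passed in are placeholders, not values from the question. The output belongs inside the <configuration> element, next to the existing settings for the local nameservice.

```shell
# Hypothetical helper: print hdfs-site.xml properties that make a second,
# HA-enabled HDFS nameservice resolvable alongside the local one.
# Arguments: local nameservice ID, remote nameservice ID,
#            remote NameNode 1 host:port, remote NameNode 2 host:port.
print_remote_nameservice() {
  local local_ns="$1" remote_ns="$2" nn1_addr="$3" nn2_addr="$4"
  cat <<EOF
<property>
  <name>dfs.nameservices</name>
  <value>${local_ns},${remote_ns}</value>
</property>
<property>
  <name>dfs.ha.namenodes.${remote_ns}</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.${remote_ns}.nn1</name>
  <value>${nn1_addr}</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.${remote_ns}.nn2</name>
  <value>${nn2_addr}</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.${remote_ns}</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
EOF
}
```

With something like print_remote_nameservice nsLocal nsRemote nn1.remote.example:8020 nn2.remote.example:8020, paths of the form hdfs://nsRemote/some/path should then resolve from the local client, provided both clusters use the same authentication model.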

Note also that with DistCp you are stuck the same way -- except that there's a "cheat" property that allows you to go from secure to unsecure.
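
The answer does not name that property. Assuming it is the Hadoop client setting ipc.client.fallback-to-simple-auth-allowed (my assumption, not confirmed above), a DistCp job started from the secure cluster with a valid Kerberos ticket might be built as in the sketch below. The helper name is hypothetical, and it only prints the command so it can be reviewed before being run.

```shell
# Hypothetical wrapper: print (do not run) a DistCp command that lets a
# Kerberized client fall back to simple authentication for the unsecured side.
# The property name is an assumption about which "cheat" property is meant.
# Arguments: source URI on the secure cluster, target URI on the unsecured one.
distcp_secure_to_unsecure_cmd() {
  local src="$1" dst="$2"
  printf 'hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true %s %s\n' "$src" "$dst"
}
```

For example, distcp_secure_to_unsecure_cmd hdfs://secure-nn:8020/data hdfs://unsecure-nn:8020/data prints the full command line; both URIs here are placeholders.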
