If you run Spark in local mode, you can make it use an arbitrary set of Hadoop conf files -- i.e. core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, hive-site.xml copied from the Kerberized cluster.
So you can access HDFS on that cluster -- if you have a Kerberos ticket that grants you access to that cluster, of course.
export HADOOP_CONF_DIR=/path/to/conf/of/remote/kerberized/cluster
kinit sylvestre@WORLD.COMPANY
spark-shell --master local[*]
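To sanity-check that setup, you can verify the ticket and the remote HDFS connectivity with plain CLI commands before (or instead of) starting the shell; the path below is made up:

# verify that a Kerberos ticket is present and not expired
klist
# with HADOOP_CONF_DIR pointing at the remote conf, the hdfs CLI hits the remote cluster
hdfs dfs -ls /user/sylvestre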
But in yarn-client or yarn-cluster mode, you cannot launch containers in the local cluster while accessing HDFS in the other (see the sketch after this list):
- either you use the local core-site.xml that says that hadoop.security.authentication is simple, and you can connect to local YARN/HDFS
- or you point to a copy of the remote core-site.xml that says that hadoop.security.authentication is kerberos, and you can connect to remote YARN/HDFS
- but you cannot use the local, unsecure YARN and access the remote, secure HDFS
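In other words, the choice is made entirely by which conf directory (and hence which core-site.xml) you expose when launching; a rough sketch, where /etc/hadoop/conf is just an assumed location for the local conf:

# local, unsecure YARN + local HDFS
export HADOOP_CONF_DIR=/etc/hadoop/conf        # local core-site.xml has authentication=simple
spark-shell --master yarn

# remote, Kerberized YARN + remote HDFS (needs a valid ticket)
export HADOOP_CONF_DIR=/path/to/conf/of/remote/kerberized/cluster
kinit sylvestre@WORLD.COMPANY
spark-shell --master yarn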
Note that with unsecure-unsecure or secure-secure combinations, you could access HDFS in another cluster by hacking your own custom hdfs-site.xml to define multiple namespaces. But you are stuck with a single authentication model.
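A minimal sketch of that hack, assuming neither cluster uses NameNode HA (HA needs a few more keys) and with made-up nameservice names and hosts -- note that instead of editing hdfs-site.xml you can inject the same keys through Spark's spark.hadoop.* properties:

spark-shell --master yarn \
  --conf spark.hadoop.dfs.nameservices=localns,remotens \
  --conf spark.hadoop.dfs.namenode.rpc-address.localns=local-nn.company:8020 \
  --conf spark.hadoop.dfs.namenode.rpc-address.remotens=remote-nn.company:8020
# then hdfs://localns/... and hdfs://remotens/... both resolve from the same job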
[edit] See the comment by Mighty Steve Loughran about an extra Spark property to access remote, secure HDFS from a local, secure cluster.
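That property is presumably spark.yarn.access.hadoopFileSystems (spark.yarn.access.namenodes on older Spark releases), which makes Spark obtain delegation tokens for additional secure filesystems besides the default one; a sketch with a made-up namenode address:

spark-shell --master yarn \
  --conf spark.yarn.access.hadoopFileSystems=hdfs://remote-nn.company:8020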
Note also that with DistCp you are stuck the same way -- except that there's a "cheat" property that allows you to go from secure to unsecure.
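That "cheat" is presumably ipc.client.fallback-to-simple-auth-allowed, which lets a Kerberized client fall back to simple authentication when talking to the unsecure cluster; a sketch run from the secure side, with made-up namenode addresses and paths:

hadoop distcp \
  -D ipc.client.fallback-to-simple-auth-allowed=true \
  hdfs://secure-nn.company:8020/data/foo \
  hdfs://unsecure-nn.company:8020/data/foo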