-
Notifications
You must be signed in to change notification settings - Fork 117
[WIP] Use HDFS Delegation Token in driver/executor pods as part of Secure HDFS Support #379
[WIP] Use HDFS Delegation Token in driver/executor pods as part of Secure HDFS Support #379
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not necessary to add since you are doing a .copy()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Will update the code in the next diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: arguments should be one on each line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address this in the next diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep consistent with mine. I called this spark.kubernetes.kerberos._
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not use "kerberos" in the name because Hadoop delegation tokens are relatively independent of kerberos. And also we should indicate this is related with "hadoop", as opposed to spark core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put these into /etc/hadoop/ so that only root access will be allowed to this folder. thoughts and reasoning for location? Also, should this be customizable in config.scala
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In YARN, the token file is placed into a work dir for the executors. So I don't think /etc/hadoop is a good location. And root access or not is irrelevant in pods since everything runs as root.
I chose this dir because we have the other code using it. I think it's better to follow the convention here. From line 30 above:
private[spark] val DRIVER_CREDENTIALS_SECRETS_BASE_DIR =
"/mnt/secrets/spark-kubernetes-credentials"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/mnt/secrets
is good for the credentials. Use /etc/hadoop
for configuration files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to be slightly refactored after merging this in after PR 373 because of the hadoopStepsOrchestrator that is leveraged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason for why this needs to be a seperate util and cant just be as part of a step. where is this being re-used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on another note the pod and container will be shared by the PodWithDetachedMainContainer() so it will need to be refactored when PRs are merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used by both the driver and executors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #373 for how one can share a module between the driver and the executors. But we also can think about how to do this better since the abstraction is a little difficult to reason about right now. We do a similar operation with setting the init-container bootstrap on the executors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above, would be cleaner to write a PodWithDetachedContainer() that shares the container and pod so you wouldn't need to call this twice this way. I wrote one in PR 373 called PodWithMainContainer()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the suggested plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above with detachedContainer logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend to leverage a Bootstrap method similair to how SparkPodInitBootStrap functions that could be passed in as an Option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This builder code would deserve some refactoring. But I'd like to put it off until the prototype works end-to-end and we see clearly what is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make the key configurable as well so that it doesn't always have to be "hadoop-token-file".
rerun unit tests please |
After rebasing to |
d893e5e
to
0141c0a
Compare
Rebased to |
Follow up to jackson 2.9.5 upgrade where RDDOperationScope can't find the serializer for the Option (parent variable) and serializes it using defaults. This force jackson to use static typing, i.e. use the class types and not values determined using reflection at runtime
Follow up to jackson 2.9.5 upgrade where RDDOperationScope can't find the serializer for the Option (parent variable) and serializes it using defaults. This force jackson to use static typing, i.e. use the class types and not values determined using reflection at runtime
What changes were proposed in this pull request?
This is stage 3 of Secure HDFS Support, which is an on-going work of setting up Secure HDFS interaction with Spark-on-K8S. This change should be integrated with Stage 1 - 2, being implemented in other PRs.
The architecture is discussed in this community-wide google doc
This initiative can be broken down into 4 stages.
STAGE 1
STAGE 2
STAGE 3
STAGE 4
How was this patch tested?
Manually tested with the following steps. (Note step 1 - 3 and use of the pre-created secret below will go away when the submission client generates the secret automatically)
Copy the
/tmp/hadoop-token-file
to your Macbook.Create a K8s secret from the file
From the debug-enabled log, we can see "TOKEN" is used as authentication mechanism for the secure namenode.
The job will fail without the DT secret. Here's the failing job's log where the driver can't use either TOKEN nor KERBEROS authentication. Note the driver pod is not configured to enable Kerberos, which would be ok as long as there is a valid DT like the successful job run above: