
Kubernetes Docker Spark ML

Chris Fregly edited this page Sep 27, 2016 · 28 revisions

Download Docker for Mac or Windows

https://www.docker.com/products/docker

Set Up Kubernetes Cluster on Google, AWS, Azure, or On-Premises

Run Kubernetes Client through Local Docker Container

  • KUBERNETES_SERVER: IP Address of Container Cluster
  • KUBERNETES_USERNAME: User of Container Cluster (e.g., admin)
  • KUBERNETES_PASSWORD: Password of Container Cluster
  • KUBERNETES_NAMESPACE: Your Unique Namespace ID (e.g., your username)
docker run -itd --name=kubernetes --privileged --net=host -e KUBERNETES_SERVER=[KUBERNETES_SERVER] -e KUBERNETES_NAMESPACE=[KUBERNETES_NAMESPACE] -e KUBERNETES_USERNAME=[KUBERNETES_USERNAME] -e KUBERNETES_PASSWORD=[KUBERNETES_PASSWORD] fluxcapacitor/kubernetes
  • Bash into Local Docker Container
docker exec -it kubernetes bash
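
An unset variable in the docker run command above tends to surface only later, as a failed connection from inside the container. The following is a minimal sketch, not part of the original instructions, of a pre-flight check on the host shell. The helper name require_kube_env is hypothetical, and it assumes the four values are exported as shell variables with the same names as the bullets above.

```shell
# Hypothetical helper: fail early if any variable the container expects
# is unset or empty in the current shell.
require_kube_env() {
  missing=""
  for name in KUBERNETES_SERVER KUBERNETES_USERNAME KUBERNETES_PASSWORD KUBERNETES_NAMESPACE; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
  return 0
}
```

One possible use is to run require_kube_env first and start the container only if it succeeds, passing each value as -e NAME="$NAME".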

Set User-Specific Context and Namespace

kubectl create namespace $KUBERNETES_NAMESPACE

kubectl config set-context $KUBERNETES_NAMESPACE --user=demo --namespace=$KUBERNETES_NAMESPACE --cluster=demo

kubectl config use-context $KUBERNETES_NAMESPACE

kubectl config current-context
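
The last command above prints the active context so it can be compared by eye with the namespace. As a sketch of the same check in script form, the hypothetical helper below (assert_context is not part of kubectl) compares the output of kubectl config current-context with an expected name and fails loudly on a mismatch.

```shell
# Hypothetical helper: succeed only if the active kubectl context
# equals the expected name passed as the first argument.
assert_context() {
  expected="$1"
  actual="$(kubectl config current-context)"
  if [ "$actual" != "$expected" ]; then
    echo "expected context '$expected' but found '$actual'" >&2
    return 1
  fi
}
```

Inside the container this could be called as assert_context "$KUBERNETES_NAMESPACE" before any create or scale command, so that resources do not land in another user's namespace.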

Explore Kubernetes Cluster

kubectl get nodes

kubectl cluster-info

kubectl get pod

kubectl get svc

Explore Kubernetes Dashboard

https://[KUBERNETES-CLUSTER-IP]/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard/#/workload
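
The dashboard address differs between clusters only in the IP, so it can be assembled from the same value used for KUBERNETES_SERVER. A small sketch follows; the function name dashboard_url is hypothetical, and the path is copied from the URL above.

```shell
# Hypothetical helper: print the dashboard URL for a given cluster IP.
# The path is the one shown in the URL above.
dashboard_url() {
  printf 'https://%s/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard/#/workload\n' "$1"
}
```

For example, dashboard_url "$KUBERNETES_SERVER" prints a link that can be pasted into the browser.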

Deploy Spark Master (ReplicationController + Pod)

  • Create Spark Master
git clone https://github.com/fluxcapacitor/master.ml.git
kubectl create -f master.ml/spark/2.0.1/master-spark-rc.yaml
  • Verify Spark Master
kubectl get rc

kubectl get pod
  • Check Logs of Spark Master
kubectl describe pod [POD-NAME]

kubectl logs -f [POD-NAME]
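
Pods created by a ReplicationController get a generated suffix, so [POD-NAME] has to be looked up each time. The sketch below shows one way to do that from the output of kubectl get pod. The helper name first_pod_with_prefix is hypothetical, and the prefix you pass is an assumption: check kubectl get pod for the names your cluster actually uses.

```shell
# Hypothetical helper: print the name of the first pod whose name
# starts with the given prefix, or nothing if no pod matches.
first_pod_with_prefix() {
  kubectl get pod --no-headers |
    awk -v prefix="$1" 'index($1, prefix) == 1 { print $1; exit }'
}
```

The result can then stand in for [POD-NAME], for instance kubectl logs -f "$(first_pod_with_prefix master-spark)", assuming the master pods start with master-spark.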

Deploy Spark Master (LoadBalancer + Service)

kubectl create -f master.ml/spark/2.0.1/master-spark-svc.yaml
kubectl get svc -w
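
kubectl get svc -w keeps watching until the load balancer is assigned an external IP. For scripts, polling may be easier than watching. The following is a sketch only: wait_for_external_ip is a hypothetical helper, the service name you pass must match what kubectl get svc lists, and it reads the .ip field, which some providers (AWS, for example) may leave empty in favor of a hostname.

```shell
# Hypothetical helper: poll a LoadBalancer service until it reports an
# external IP, then print it. Arguments: service name, max attempts
# (default 30), seconds between attempts (default 10).
wait_for_external_ip() {
  svc="$1"
  tries="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    ip="$(kubectl get svc "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)"
    if [ -n "$ip" ]; then
      printf '%s\n' "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no external IP for service '$svc' after $tries attempts" >&2
  return 1
}
```

The printed address is the value to substitute for [SPARK-MASTER-EXTERNAL-IP] in later steps.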

Verify Spark Master through Browser

http://[SPARK-MASTER-EXTERNAL-IP]:6060
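
The same check can be made from a terminal instead of a browser. This is a minimal sketch assuming curl is installed; ui_reachable is a hypothetical helper that takes a host and a port and succeeds only if the server answers without an HTTP error.

```shell
# Hypothetical helper: succeed if http://HOST:PORT/ responds without an
# HTTP error within 10 seconds. Arguments: host, port.
ui_reachable() {
  curl --fail --silent --show-error --output /dev/null --max-time 10 "http://$1:$2/"
}
```

For the Spark Master UI above this would be ui_reachable [SPARK-MASTER-EXTERNAL-IP] 6060, with the placeholder replaced by the service's external IP.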

Deploy Spark Worker (ReplicationController + Pod)

  • Clone the latest worker.ml repo
git clone https://github.com/fluxcapacitor/worker.ml.git
  • Create Spark Worker
kubectl create -f worker.ml/spark/2.0.1/worker-spark-rc.yaml
  • Verify Spark Worker
kubectl get rc

kubectl get pod
  • Check Spark Worker Logs
kubectl describe pod [POD-NAME]

kubectl logs -f [POD-NAME]

Scale Spark Worker through Command Line

kubectl scale --replicas=4 rc worker-spark-2-0-1
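
When the replica count comes from a variable or a script argument, a typo can reach kubectl unchecked. The sketch below wraps the command above with a simple validation step. scale_workers is a hypothetical helper; the ReplicationController name worker-spark-2-0-1 is taken from the command above.

```shell
# Hypothetical helper: scale the Spark Worker ReplicationController,
# rejecting anything that is not a non-negative integer.
scale_workers() {
  case "$1" in
    ''|*[!0-9]*)
      echo "replica count must be a non-negative integer: '$1'" >&2
      return 1
      ;;
  esac
  kubectl scale --replicas="$1" rc worker-spark-2-0-1
}
```

Calling scale_workers 4 is then equivalent to the command above, while scale_workers four is rejected before kubectl runs.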

Scale Spark Worker through Kubernetes Dashboard

https://[KUBERNETES-CLUSTER-IP]/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard/#/workload

Perform Rolling Update of Spark Worker (e.g., Larger Worker Memory)

  • Perform the Rolling Update
kubectl rolling-update worker-spark-2-0-1 -f worker.ml/spark/2.0.1/worker-spark-rc-8.yaml
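
After the update finishes, it can be useful to confirm that the resulting ReplicationController is running as many pods as it asks for. The following sketch compares the desired and observed replica counts. rc_settled is a hypothetical helper, and the controller name to pass is whatever kubectl get rc lists after the update (the name defined in worker-spark-rc-8.yaml is not shown here).

```shell
# Hypothetical helper: succeed if the ReplicationController's observed
# replica count matches its desired count. Argument: controller name.
rc_settled() {
  counts="$(kubectl get rc "$1" -o jsonpath='{.spec.replicas} {.status.replicas}')"
  desired="${counts% *}"    # text before the space
  observed="${counts#* }"   # text after the space
  [ -n "$desired" ] && [ "$desired" = "$observed" ]
}
```

Matching counts only show that the pods exist; kubectl get pod and kubectl logs remain the way to confirm the workers started cleanly.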

Deploy Zeppelin (ReplicationController + Pod)

git clone https://github.com/fluxcapacitor/zeppelin.ml.git
kubectl create -f zeppelin.ml/zeppelin-rc.yaml
  • Verify Zeppelin
kubectl get rc

kubectl get pod
  • Check Logs of Zeppelin
kubectl logs -f [POD-NAME]

kubectl describe pod [POD-NAME]

Deploy Zeppelin (LoadBalancer + Service)

kubectl create -f zeppelin.ml/zeppelin-svc.yaml
kubectl get svc -w

Verify Zeppelin Notebook

  • Navigate Browser to
http://[ZEPPELIN-EXTERNAL-IP]:8080
  • Change Interpreter's Spark Master URL (Upper right, Drop-down, Interpreter)
Change value to spark://[SPARK-MASTER-EXTERNAL-IP]:7077
  • Run Example Notebooks!