AWS-Kubeflow is a guide to the basic installation and use of Kubeflow on AWS.
Kubeflow is a cloud-native platform for machine learning based on Google’s internal machine learning pipelines. Quickly get running with your ML workflow.
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
Start with an Ubuntu 16.04 EC2 instance to act as the Kubernetes controller.
It should be at least a c4.xlarge (7.5GB memory) with 20GB or more of storage. For testing, open all inbound TCP ports.
I recommend an EC2 instance rather than a Docker container, because it is easier to tunnel to the Dashboard.
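The sizing advice above can be checked on the instance itself. The following is a minimal sketch, not part of the original guide: the function names and the two thresholds are assumptions chosen to sit slightly below 7.5GB and 20GB, because the kernel and the filesystem reserve some space.

```shell
# Assumed thresholds, slightly below 7.5GB of memory and 20GB of disk.
MIN_MEM_KB=7000000
MIN_DISK_KB=19000000

# meets_minimum ACTUAL REQUIRED: succeeds when ACTUAL >= REQUIRED.
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Total memory in KiB, read from /proc/meminfo (Linux only).
total_mem_kb() {
  awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Size of the root filesystem in 1K blocks.
root_disk_kb() {
  df -Pk / | awk 'NR == 2 {print $2}'
}

# Compare this machine against the assumed minimums.
check_instance() {
  meets_minimum "$(total_mem_kb)" "$MIN_MEM_KB" || { echo "not enough memory" >&2; return 1; }
  meets_minimum "$(root_disk_kb)" "$MIN_DISK_KB" || { echo "root disk too small" >&2; return 1; }
  echo "instance size looks OK"
}
```

Run `check_instance` after connecting to the instance; it returns a non-zero status if either check fails.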
Connect to your EC2.
$ sudo su
$ apt update && \
apt install python python-pip curl groff vim jq gzip git -y
# install kubectl
$ curl -o kubectl https://amazon-eks.s3-us-west-2.amazonaws.com/1.11.5/2018-12-06/bin/linux/amd64/kubectl && \
chmod +x kubectl && \
mv kubectl /usr/bin/
# kubectl version check
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.5", GitCommit:"753b2dbc622f5cc417845f0ff8a77f539a4213ea", GitTreeState:"clean", BuildDate:"2018-12-06T01:33:57Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
# install aws-iam-authenticator
$ curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.11.5/2018-12-06/bin/linux/amd64/aws-iam-authenticator && \
chmod +x aws-iam-authenticator && \
mv aws-iam-authenticator /usr/bin/
# install awscli
$ pip install awscli --upgrade
# awscli version check
$ aws --version
aws-cli/1.16.169 Python/2.7.12 Linux/4.4.0-1083-aws botocore/1.12.159
# install eksctl
$ curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp && \
mv /tmp/eksctl /usr/local/bin
# eksctl version check
$ eksctl version
[ℹ] version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.1.33"}
$ export AWS_ACCESS_KEY_ID=<KEY>
$ export AWS_SECRET_ACCESS_KEY=<KEY>
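Before creating the cluster, it may help to confirm that both credential variables are actually set. This is a small sketch; `require_env` is a hypothetical helper name, not something provided by the AWS CLI.

```shell
# require_env NAME...: fails and names the first variable that is unset or empty.
require_env() {
  for _name in "$@"; do
    # Look up the value of the variable whose name is stored in _name.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "missing environment variable: $_name" >&2
      return 1
    fi
  done
}
```

For example, `require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY && aws sts get-caller-identity` first checks the variables and then asks AWS which identity the keys belong to.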
eksctl
# create cluster
$ eksctl create cluster eks-cpu \
--node-type=c4.xlarge \
--timeout=40m \
--nodes=2 \
--region=ap-northeast-2
--node-type, --region, --nodes: select the node type, region, and number of nodes.
eksctl will set up the availability zones and subnets, and create a nodegroup of EC2 instances, an Auto Scaling Group, the Elastic Kubernetes Service (EKS) cluster, etc.
When the EKS cluster is complete, check the nodes using the following command:
$ kubectl get nodes "-o=custom-columns=NAME:.metadata.name,MEMORY:.status.allocatable.memory,CPU:.status.allocatable.cpu,GPU:.status.allocatable.nvidia\.com/gpu"
NAME MEMORY CPU GPU
ip-192-168-12-60.ap-northeast-2.compute.internal 7548168Ki 4 <none>
ip-192-168-55-153.ap-northeast-2.compute.internal 7548172Ki 4 <none>
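To confirm that every node requested with --nodes has joined and is ready, the plain `kubectl get nodes` output can be counted. This is a sketch under the assumption that the STATUS column is the second column of that output; `ready_node_count` is a hypothetical helper name.

```shell
# ready_node_count: reads `kubectl get nodes --no-headers` output on stdin and
# prints the number of nodes whose STATUS column is exactly "Ready".
ready_node_count() {
  awk '$2 == "Ready" {n++} END {print n + 0}'
}
```

With the cluster above, `kubectl get nodes --no-headers | ready_node_count` should print the same number that was passed to --nodes once all nodes are up.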
# (GPU nodes only) install the NVIDIA device plugin so that Kubernetes can schedule GPUs
$ kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
# install ksonnet
$ wget https://github.com/ksonnet/ksonnet/releases/download/v0.13.1/ks_0.13.1_linux_amd64.tar.gz && \
tar -xvf ks_0.13.1_linux_amd64.tar.gz && \
mv ks_0.13.1_linux_amd64/ks /usr/local/bin
# ksonnet version check
# ksonnet is no longer developed on GitHub; the latest version is 0.13.1
$ ks version
ksonnet version: 0.13.1
jsonnet version: v0.11.2
client-go version: kubernetes-1.10.4
$ export KUBEFLOW_SRC=/tmp/kubeflow-aws
$ export KUBEFLOW_VERSION=v0.5-branch
$ mkdir -p ${KUBEFLOW_SRC} && cd ${KUBEFLOW_SRC}
$ curl https://raw.githubusercontent.com/graykode/aws-kubeflow/master/kubeflow.sh | bash
$ curl -O https://raw.githubusercontent.com/graykode/aws-kubeflow/master/util.sh && \
mv util.sh ${KUBEFLOW_SRC}/scripts/aws/util.sh
$ export KFAPP=kfapp
$ export REGION=ap-northeast-2
$ export AWS_CLUSTER_NAME=eks-cpu
# check your nodegroup role name
$ aws iam list-roles \
| jq -r ".Roles[] \
| select(.RoleName \
| startswith(\"eksctl-$AWS_CLUSTER_NAME\") and contains(\"NodeInstanceRole\")) \
.RoleName"
eksctl-eks-cpu-nodegroup-ng-11598-NodeInstanceRole-S6OPLB7TW3RR
$ export AWS_NODEGROUP_ROLE_NAMES=eksctl-eks-cpu-nodegroup-ng-11598-NodeInstanceRole-S6OPLB7TW3RR
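Copying the role name by hand can be avoided by filtering the role list directly into the variable. The sketch below is an assumption-laden alternative to the jq command above: `select_nodegroup_roles` is a hypothetical helper, and joining several roles with commas assumes that kfctl.sh accepts a comma-separated list in --awsNodegroupRoleNames.

```shell
# select_nodegroup_roles CLUSTER_NAME
# Reads role names (one per line) on stdin, keeps those that start with
# "eksctl-CLUSTER_NAME-" and contain "NodeInstanceRole", and prints them
# as a single comma-separated line.
select_nodegroup_roles() {
  awk -v prefix="eksctl-$1-" \
    'index($0, prefix) == 1 && index($0, "NodeInstanceRole") > 0' \
    | paste -sd, -
}
```

A possible use is `export AWS_NODEGROUP_ROLE_NAMES=$(aws iam list-roles --query 'Roles[].RoleName' --output text | tr '\t' '\n' | select_nodegroup_roles "$AWS_CLUSTER_NAME")`, which relies on the AWS CLI printing the names tab-separated in text output.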
$ cd ${KUBEFLOW_SRC}
$ ${KUBEFLOW_SRC}/scripts/kfctl.sh init ${KFAPP} --platform aws \
--awsClusterName ${AWS_CLUSTER_NAME} \
--awsRegion ${REGION} \
--awsNodegroupRoleNames ${AWS_NODEGROUP_ROLE_NAMES}
$ ls
deployment kfapp kubeflow scripts
$ cd ${KFAPP}
# Generate the Kubernetes changes.
$ ${KUBEFLOW_SRC}/scripts/kfctl.sh generate k8s
# Deploy the generated Kubernetes changes.
$ ${KUBEFLOW_SRC}/scripts/kfctl.sh apply k8s
Kubeflow is now installed! 😍
Check the pods in the kubeflow namespace and wait until all of them are Running.
$ kubectl get pods -n kubeflow
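Instead of re-running the command by hand, the check can be polled. This is a minimal sketch, assuming the STATUS column is the third column of `kubectl get pods --no-headers`; the function names, retry count, and delay are hypothetical defaults.

```shell
# count_not_running: reads `kubectl get pods --no-headers` output on stdin and
# prints how many pods are in a state other than Running or Completed.
count_not_running() {
  awk '$3 != "Running" && $3 != "Completed" {n++} END {print n + 0}'
}

# wait_for_pods NAMESPACE [TRIES] [DELAY_SECONDS]
# Polls until the namespace has at least one pod and none are pending.
wait_for_pods() {
  _ns=$1
  _tries=${2:-60}
  _delay=${3:-10}
  while [ "$_tries" -gt 0 ]; do
    _out=$(kubectl get pods -n "$_ns" --no-headers 2>/dev/null)
    if [ -n "$_out" ]; then
      _pending=$(printf '%s\n' "$_out" | count_not_running)
      if [ "$_pending" -eq 0 ]; then
        echo "all pods in $_ns are Running"
        return 0
      fi
      echo "$_pending pod(s) in $_ns are not ready yet"
    fi
    sleep "$_delay"
    _tries=$((_tries - 1))
  done
  return 1
}
```

Calling `wait_for_pods kubeflow` checks every 10 seconds for up to 10 minutes and returns a non-zero status if the pods never settle.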
# to uninstall Kubeflow later, delete the deployed Kubernetes resources
$ ${KUBEFLOW_SRC}/scripts/kfctl.sh delete k8s
To reconnect to the EKS cluster (for example, after reopening your SSH terminal), follow these steps.
$ sudo su
$ cd /tmp
$ export AWS_ACCESS_KEY_ID=<KEY>
$ export AWS_SECRET_ACCESS_KEY=<KEY>
$ aws eks --region ap-northeast-2 update-kubeconfig --name eks-cpu
# check kubernetes cluster
$ kubectl get nodes
$ kubectl port-forward -n kubeflow `kubectl get pods -n kubeflow --selector=service=ambassador -o jsonpath='{.items[0].metadata.name}'` 8080:80
# !! open an SSH tunnel from another terminal on your local machine
$ ssh -i your_key.pem ubuntu@server-ip -L 8080:localhost:8080
Open http://127.0.0.1:8080 in your browser.
# Deploy Kubernetes DashBoard
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
# Deploy the heapster to monitor the container cluster and enable performance analysis of the cluster.
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml
# Deploy an influxdb backend to the cluster for the heapster
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb.yaml
# Create Heapster Cluster Role Bindings for Dashboards
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml
# Create eks-admin service account and cluster role binding
$ kubectl apply -f https://raw.githubusercontent.com/graykode/aws-kubeflow/master/eks-admin-service-account.yaml
# get the login token for the Dashboard
$ kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')
Copy the token string; you will use it to log in to the Kubernetes Dashboard.
# start Dashboard
$ kubectl proxy
# !! open an SSH tunnel from another terminal on your local machine
$ ssh -i your_key.pem ubuntu@server-ip -L 8001:localhost:8001
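With the proxy and the tunnel running, the Dashboard is reached through the proxy URL and the token is pasted into its login page. The sketch below assumes the Dashboard was deployed as the `kubernetes-dashboard` service in `kube-system`, as the v1.10.1 manifest does; `extract_token` and `DASHBOARD_URL` are hypothetical names.

```shell
# extract_token: reads `kubectl describe secret` output on stdin and prints
# the value of its "token:" field.
extract_token() {
  awk '$1 == "token:" {print $2}'
}

# URL served by `kubectl proxy` on its default port 8001 for a Dashboard
# exposed as the https service "kubernetes-dashboard" in kube-system.
DASHBOARD_URL="http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/"
```

Piping the `kubectl -n kube-system describe secret ...` command shown earlier into `extract_token` prints only the token, and `echo "$DASHBOARD_URL"` gives the address to open in the browser.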
You can use Kubeflow much like Google Colaboratory: machine learning engineers do not need to know the cloud infrastructure; they only need to use a Jupyter notebook.
Create a notebook named test, and connect to the Jupyter Notebook. See kubectl get pods -n kubeflow:
NAME READY STATUS RESTARTS AGE
..
test-0 1/1 Running 0 15m
..
$ git clone https://github.com/kubeflow/examples
# install pip package
$ pip install pandas scikit-learn ktext matplotlib annoy nltk pydot
$ wget https://raw.githubusercontent.com/graykode/aws-kubeflow/master/Training.ipynb && \
mv Training.ipynb examples/github_issue_summarization/notebooks
I will add more examples after getting used to Kubeflow! 🔨🔨
$ eksctl delete cluster --name eks-cpu --region ap-northeast-2