A worker node in Go to execute jobs in a Docker container.
Apache-2.0 License.
The Workstation project is a machine learning job management system.
It consists of a task queue where users can create new jobs, and one or more worker nodes that pull jobs from this queue, run the algorithm, and return the result when it is finished.
Jobs are submitted as a Docker image that must be available on a public container registry.
The project is made up of three repositories.
This tutorial will guide you through running the whole project.
Here we will run the entire project on your local machine from scratch, including the database. The database will be bootstrapped with default users.
```shell
git clone https://github.com/jjauzion/ws-backend
cd ws-backend
```
Create a `.env` file in the project root folder:

```
WS_ES_HOST=http://localhost
WS_ES_PORT=9200
WS_KIBANA_PORT=5601
WS_API_HOST=localhost
WS_API_PORT=8080
WS_GRPC_HOST=localhost
WS_GRPC_PORT=8090
IS_DEV_ENV=true
TOKEN_DURATION_HOURS=24
WS_ES_USERNAME=""
WS_ES_PWD=""
```
Start the database:

```shell
make elastic
```
Wait for Kibana to be up by following its logs:

```shell
docker logs ws-backend_kibana_1 -f
```

You should see lines like:

```
{"type":"log","@timestamp":"2021-03-28T15:11:50+00:00","tags":["listening","info"],"pid":7,"message":"Server running at http://0:5601"}
{"type":"log","@timestamp":"2021-03-28T15:11:51+00:00","tags":["info","http","server","Kibana"],"pid":7,"message":"http server running at http://0:5601"}
{"type":"log","@timestamp":"2021-03-28T15:11:54+00:00","tags":["warning","plugins","reporting"],"pid":7,"message":"Enabling the Chromium sandbox provides an additional layer of protection."}
```
Start the GraphQL server with the bootstrap option:

```shell
make gql FLAG="--bootstrap"
```

Then start the gRPC server:

```shell
make grpc
```
At this point you have started the database, the GraphQL server that interacts with the frontend, and the gRPC server that interacts with the worker nodes.
Before starting the worker node, we will learn how to interact with the backend. First, let's check the database:
Open Kibana (http://localhost:5601), go to `Dev Tools` and run:

```
GET _cat/indices?v
```

You should see one `ws_task` index and one `ws_user` index. Indices whose name starts with a dot `.` are system indices. To see the content of the `ws_user` index, run:

```
GET ws_user/_search
{
  "query": {
    "match_all": {}
  }
}
```
You can do the same for the `ws_task` index by replacing `ws_user` with `ws_task`.
Before being able to create users and tasks you will need to log in. Since we started the GraphQL server with the bootstrap option, two default users have been created in the DB. We will log in as the admin user using the GraphQL API.
Open the GraphQL playground and run:

```graphql
query login {
  login(id: "[email protected]", pwd: "") {
    ... on Token {
      token
      userId
      username
    }
    ... on Error {
      code
      message
    }
  }
}
```
You should get a response similar to this:
```json
{
  "data": {
    "login": {
      "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdXRob3JpemVkIjp0cnVlLCJleHAiOjE2MTczODMzNDksInVzZXJfaWQiOiJkZjljNDYzZC00ZmIwLTRmYzAtYTU5OC00YmQ3NzEzMzg2ZDAifQ.Xj_rUGIB7l90kiXD_U12ni2kf9U-afARaCZKbEao-oU",
      "userId": "df9c463d-4fb0-4fc0-a598-4bd7713386d0",
      "username": "[email protected]"
    }
  }
}
```
As you can see, the server successfully authenticated your request and generated a JWT token that you can use in further requests to prove that you are authenticated.
Copy the `token` value and the `userId` somewhere, as you will need them later.
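Out of curiosity, you can decode the token's payload yourself: a JWT is three base64url-encoded segments (header, payload, signature), and the payload carries the claims. A small Go sketch that decodes the sample token above (decoding only; verifying the signature requires the server's signing key):

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
)

// Claims mirrors the payload of the JWT returned by the login query.
type Claims struct {
	Authorized bool   `json:"authorized"`
	Exp        int64  `json:"exp"`
	UserID     string `json:"user_id"`
}

// decodeClaims extracts the (unverified) claims from a JWT by
// base64url-decoding its middle segment.
func decodeClaims(token string) (Claims, error) {
	var c Claims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, fmt.Errorf("malformed token")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return c, err
	}
	err = json.Unmarshal(payload, &c)
	return c, err
}

func main() {
	token := "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdXRob3JpemVkIjp0cnVlLCJleHAiOjE2MTczODMzNDksInVzZXJfaWQiOiJkZjljNDYzZC00ZmIwLTRmYzAtYTU5OC00YmQ3NzEzMzg2ZDAifQ.Xj_rUGIB7l90kiXD_U12ni2kf9U-afARaCZKbEao-oU"
	c, err := decodeClaims(token)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.UserID, c.Authorized)
}
```

Note how the `user_id` claim matches the `userId` field of the login response.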
We will now see how to create users and tasks with the GraphQL API.
```graphql
mutation tuto_create_user {
  create_user(input: {email: "[email protected]"}) {
    id
    email
  }
}
```
If you run this mutation as is, you will get a `403` error because you are not authenticated. Open the `HTTP HEADERS` tab of the playground and paste the following (replace the token value with yours):

```json
{
  "auth": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdXRob3JpemVkIjp0cnVlLCJleHAiOjE2MTczODMwNTUsInVzZXJfaWQiOiI0MzVmNTA3OC02NjFlLTRkOGMtODJjZS0zNDJhZTQ1ZTQ4MzcifQ.RTzseF7mSjR8aop-9CCiNt1-IkqFGem9nNWymaJKBRo"
}
```
Run the mutation again; you should now get a response like:

```json
{
  "data": {
    "create_user": {
      "id": "86c776ec-9abe-43a0-93f1-4dac0997ba90",
      "email": "[email protected]"
    }
  }
}
```
```graphql
mutation createTask {
  create_task(input: {docker_image: "jjauzion/ws-mock-container", dataset: "s3//"}) {
    id
    user_id
    created_at
    started_at
    ended_at
    status
    job { dataset, docker_image }
  }
}
```
If you get a `403` error, check that you didn't forget the `auth` header in your request (see the previous chapter).

Congratulations!! You have created a user and a new job :) You can go to the Kibana console and run the search again to see your creations.
Now that we have created a new task, it would be nice to have a worker actually run it, right? But before starting a worker node, the gRPC server must be running. If it is not already, go to the `ws-backend` repository and run:

```shell
make grpc
```
Now let's run the worker:

```shell
git clone https://github.com/jjauzion/ws-worker.git
cd ws-worker
```
Create a `.env` file at the project root:

```
WS_GRPC_HOST=localhost
WS_GRPC_PORT=8090
```
```shell
make run
```
This will start the worker, which will automatically pull the task you created in the previous chapter and run it.
You can go to Kibana and check your task: you will see its status go from "NOT_STARTED" to "RUNNING" and finally "ENDED".
Let's create a real job: running an ML job and tracking its parameters while it is running.
For this we will use wandb (https://wandb.ai/site), so you must create an account and copy your private key.
Then paste the following in the playground console, putting your wandb key in the env variable. Your key will be encrypted on the server and will never be stored in clear text (WIP, not done yet).
```graphql
mutation createTask {
  create_task(input: {env: "WANDB_API_KEY=putYourKeyHere", docker_image: "jjauzion/wandb-test", dataset: "s3//"}) {
    id
    user_id
    created_at
    started_at
    ended_at
    status
    job { dataset, docker_image, env }
  }
}
```
Wait until the task status is updated to "RUNNING" (this can take up to 30 seconds), then log in to wandb. You should see your run in progress.
When you are done, go to the `ws-backend` repo and run:

```shell
make down
```

This stops the Elastic containers.
That's it folks :)