An Azure template to deploy a lightweight Kubernetes cluster using [k3s.io](https://k3s.io).

MIT License

A (WIP) dynamically resizable `k3s` cluster for Azure, based on my `azure-docker-swarm-cluster` project.
This is an Azure Resource Manager template that automatically deploys a `k3s` cluster atop Ubuntu LTS. The cluster has a single master VM, a VM scale set for workers/agents, and the required network infrastructure.
The template defaults to deploying B-series VMs (`B1ls`) with the smallest possible managed disk size (S4, 32GB). It also deploys (and mounts) an Azure File Share on all machines, with (very) permissive access, at `/srv`, which makes it quite easy to run stateful services without messing about with volume claims.
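With the share mounted on every node, a pod can persist state straight to it with a plain `hostPath` volume. The sketch below is illustrative only: the pod name, image, and paths are made up, not part of this repository.

```yaml
# Illustrative only: persist state directly on the shared /srv mount,
# skipping PersistentVolumeClaims entirely. Names and paths are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: stateful-demo
spec:
  containers:
    - name: web
      image: nginx:alpine
      volumeMounts:
        - name: shared
          mountPath: /usr/share/nginx/html
  volumes:
    - name: shared
      hostPath:
        path: /srv/stateful-demo   # lives on the Azure File Share
        type: DirectoryOrCreate
```

Anything written under `/srv` survives agent reimaging, since the path is backed by the storage account rather than the VM's disk.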
The key aspect of this template is that you can add and remove agents at will simply by resizing the VM scale set, which is very handy when running the node pool as spot instances: the cluster comes with a few (very simple) helper scripts that allow nodes to join and leave the cluster as they are created/destroyed, and the `k3s` scheduler will redeploy pods as needed.
This was originally built as a Docker Swarm template, and even though Azure has a perfectly serviceable Kubernetes managed service, I enjoy the challenge of building my own stuff and fine-tuning it.
`k3s` is a breath of fresh air, and an opportunity to play around with a simpler, slimmer version of Kubernetes, and to break it to see what happens.
Also, a lot of the ARM templating involved (for metrics, managed identities, etc.) lacked comprehensive samples when I started the project, so this was also a way for me to provide a fully working example that other people can learn from.
To Do/Done:

- `curl`
- `cloud-config`
- `kubernetes-dashboard` deployment
- `/srv/autostart` (?)
- `sudo tailscale up` after deployment
- `Makefile`
- `k3s` v1.25.2+k3s1, and allow setting the cluster version in the `Makefile`
- `docker` instead of `containerd` (but saves a lot of hassle when doing tests)
- `Makefile`
- `k3s` v1.22.3+k3s1
- `k3c`
- `python` instead of `python3`
- `cloud-config` for Ubuntu 20.04 and `python3`
- `k3s` v1.19.4+k3s1
- `k3s` v1.17.0+k3s.1
- `k3s` 1.0.1
- `k3s` 1.0.0
- `k3s` 0.8.0
- `k3s` 0.7.0
- `k3s` 0.6.0
- `/mnt/scratch`
- `bash` completion for `kubectl` on the master node
- `traefik` on the master
- `k3s` via `cloud-config`
- `cloud-config` to expose the `k3s` token to agents
- `cloud-config`
- `Makefile`
- `README`
`Makefile` commands:

- `make keys` - generates an SSH key for provisioning
- `make deploy-storage` - deploys shared storage
- `make params` - generates ARM template parameters
- `make deploy-compute` - deploys cluster resources and pre-provisions Docker on all machines
- `make view-deployment` - view deployment status
- `make watch-deployment` - watch deployment progress
- `make list-agents` - lists all agent VMs
- `make scale-agents-<number>` - scales the agent VM scale set to `<number>` instances, i.e., `make scale-agents-10` will resize it (up or down) to 10 VMs
- `make stop-agents` - stops all agents
- `make start-agents` - starts all agents
- `make reimage-agents-parallel` - nukes and paves all agents
- `make reimage-agents-serial` - reimages all agents in sequence
- `make chaos-monkey` - restarts all agents in random order
- `make proxy` - opens an SSH session to `master0` and sets up TCP forwarding to `localhost`
- `make tail-helper` - opens an SSH session to `master0` and tails the `k3s-helper` log
- `make list-endpoints` - list DNS aliases
- `make destroy-cluster` - destroys the entire cluster (should not be the default)
- `make destroy-compute` - destroys only the compute cluster (should be the default if you want to save costs)
- `make destroy-storage` - destroys the storage (should be avoided)

A typical session looks like this:

```bash
az login
make keys
make deploy-storage
make params
make deploy-compute
make view-deployment
# Go to the Azure portal and check the deployment progress

# Clean up after we're done working for the day, to save costs (preserves storage)
make destroy-compute

# Clean up the whole thing (destroys storage as well)
make destroy-cluster
```
Requirements: Azure Cloud Shell (which includes all of the below in `bash` mode) or:

- the Azure CLI (`pip install -U -r requirements.txt` will install it)
- `make` (you can just read through the `Makefile` and type the commands yourself)

`master0` runs a very simple HTTP server (only accessible inside the cluster) that provides tokens for new VMs to join the cluster and an endpoint for them to signal that they're leaving. That server also cleans up the node table once agents are gone.
Upon provisioning, all agents try to obtain a token and join the cluster. Upon rebooting, they signal that they're leaving the cluster and then re-join it.
This is done in the simplest possible way: `cloud-init` bootstraps a few helper scripts that are invoked upon shutdown and (re)boot. Check the YAML files for details.
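The helper flow can be sketched as follows. This is an illustrative stand-in for the actual helper service, not the repo's code: the endpoint paths (`/token`, `/leave`) and the token value are made up, and only the Python standard library is used.

```python
# Illustrative sketch of the master's helper service: it hands out the k3s
# join token over HTTP and accepts "leaving" notifications from agents.
# Endpoint names and the token value are hypothetical, not the repo's.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

NODE_TOKEN = "K10deadbeef::server:example"  # stands in for the real node token

class HelperHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/token":  # agents call this on (re)boot before joining
            body = NODE_TOKEN.encode()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def do_POST(self):
        if self.path == "/leave":  # agents call this from a shutdown hook
            # A real helper would delete the node from the cluster here to
            # clean up the node table; this sketch just acknowledges it.
            self.send_response(200)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet

server = HTTPServer(("127.0.0.1", 0), HelperHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

base = f"http://127.0.0.1:{server.server_address[1]}"
token = urllib.request.urlopen(f"{base}/token").read().decode()
```

In the real cluster the agent-side halves of this exchange are the shutdown/boot hooks that `cloud-init` installs.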
The cluster is actually split across two resource groups (`-storage` and `-compute`).
The `-storage` resource group contains an Azure Storage Account with an Azure Files (SMB) share that is mounted on all the nodes. This makes it trivial to deploy the cluster, work on it for a few hours, store your manifests and data on `/srv`, destroy the `-compute` resources to save costs, and spin them up again against the same `-storage` the next day.
Since it is possible to run machines like `Standard_NV6ads_A10_v5` as spot instances, you can now try to run `k3s` on them with `nvidia-docker2`. This should be considered highly experimental, since Ubuntu 22.04 is not yet supported on those machines (although older versions are).
To avoid using VM extensions (which are nice, but opaque to most people accustomed to `cloud-init`) and to ensure each fresh deployment runs the latest Docker version, VMs are provisioned using `customData` in their respective ARM templates.
`cloud-init` files and SSH keys are packed into the JSON parameters file and submitted as a single provisioning transaction; upon first boot of a node (master or agent), Ubuntu takes the `cloud-init` file and provisions the machine accordingly.
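That packing step can be sketched like this. The parameter names (`adminPublicKey`, `customData`) and the helper script path are illustrative assumptions, not necessarily what the template actually uses; the one fixed point is that ARM hands `customData` to the VM as a base64 string, which `cloud-init` decodes on first boot.

```python
# Sketch: pack a cloud-config file and an SSH public key into an ARM
# parameters file. Parameter names here are illustrative; the template's
# real names may differ.
import base64
import json

cloud_config = """#cloud-config
package_update: true
runcmd:
  - /usr/local/bin/join-cluster.sh  # hypothetical boot-time helper
"""
ssh_public_key = "ssh-rsa AAAAB3Nza... provisioning-key"  # placeholder key

params = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "adminPublicKey": {"value": ssh_public_key},
        # ARM passes customData through as base64; cloud-init decodes and
        # applies it on the node's first boot.
        "customData": {"value": base64.b64encode(cloud_config.encode()).decode()},
    },
}

params_json = json.dumps(params, indent=2)
```

A `make params`-style step would write `params_json` to disk and feed it to `az deployment group create`.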
See `azure-docker-swarm-cluster` for more details.
Deploying `registry/registry.yml` will set up a container registry on the master node that uses the shared Azure Files storage (mounted at `/srv`) as its backing store.
The container registry is "insecure" in that it requires neither authentication nor HTTPS, but it is accessible only to the cluster nodes and does not require setting up any kind of certificates.
This makes it very easy to use `docker` on the master node to manage images, and having the registry mapped to the SMB share keeps it persistent across cluster deployments as long as you don't delete the `-storage` resource group.
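Under the hood, pointing a stock `registry:2` container at the share is just a matter of aiming its filesystem storage driver at a directory under `/srv`. The fragment below is an illustrative registry `config.yml`; the directory path is an assumption, so check `registry/registry.yml` for the actual setup.

```yaml
# Illustrative registry config: persist images to the Azure Files share.
# The rootdirectory path is an assumption, not the repo's actual value.
version: 0.1
storage:
  filesystem:
    rootdirectory: /srv/registry   # on the shared SMB mount
http:
  addr: :5000                      # plain HTTP, cluster-internal only
```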
Pro Tip: You can set `STORAGE_ACCOUNT_GROUP` and `STORAGE_ACCOUNT_NAME` inside an `.env` file if you want to use a pre-existing storage account. As long as you use `make` to do everything, the values will be overridden automatically.
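A hypothetical `.env` for reusing an existing account might look like this (both values are placeholders):

```
# .env -- values are placeholders for your own resource group and account
STORAGE_ACCOUNT_GROUP=my-existing-rg
STORAGE_ACCOUNT_NAME=myexistingstorage
```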
Keep in mind that this was written for conciseness and ease of experimentation -- look to AKS for a production service.