Watches the status of resources in a Kubernetes cluster and uses the results to update a metric in Amazon CloudWatch.
APACHE-2.0 License
Small app written in Go that continuously watches the status of certain resources in a Kubernetes cluster, aggregates these into a single value, and uses that to update a metric in Amazon CloudWatch.
The metric will have the value 1 if all targets are healthy, the value 0 if at least one target is unhealthy (according to the configuration), and missing data if KubeStatus2CloudWatch is unhealthy or down.
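The aggregation rule can be sketched in Go as follows. This is a minimal illustration, not KubeStatus2CloudWatch's actual code: the TargetStatus type and the aggregate function are hypothetical, and how a single target is judged healthy is left to the configuration. Missing data is not produced by this function at all; it is simply what CloudWatch sees when no value gets published.

```go
package main

import "fmt"

// TargetStatus is a hypothetical summary of one watched resource.
type TargetStatus struct {
	Name    string
	Healthy bool
}

// aggregate collapses all target statuses into the single metric value:
// 1 if every target is healthy, 0 as soon as one target is unhealthy.
func aggregate(targets []TargetStatus) float64 {
	for _, t := range targets {
		if !t.Healthy {
			return 0
		}
	}
	return 1
}

func main() {
	targets := []TargetStatus{
		{Name: "prometheus", Healthy: true},
		{Name: "alertmanager", Healthy: false},
	}
	fmt.Println(aggregate(targets)) // prints 0
}
```

The sketch treats an empty list of targets as healthy, because no target is unhealthy; whether the real tool accepts an empty target list is not covered here.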
Lately I've been using Amazon EKS for running and orchestrating containerized workloads. To monitor the clusters and the workloads within them, we use the popular tools Prometheus, Grafana, and friends. They are hosted within the clusters and notify my team and me if an alert fires.
But what if the observability system itself goes down? We won't get any notification in that case. And since it is all self-hosted and self-contained, there are no SLAs or similar.
We somehow have to monitor the monitoring system. This is where KubeStatus2CloudWatch comes in. It scans the status of the monitoring components in the cluster and manages a CloudWatch metric that reflects the overall status. Now I can go ahead and create a CloudWatch alarm and friends to monitor this one metric.
I am also interested in learning Go and things related to it. So I took this all as an excuse to get my hands dirty.
Here is a high-level overview of the use case described in Motivation. KubeStatus2CloudWatch is used as a bridge between Kubernetes and CloudWatch. The alarm fires if the metric falls below 1 or is missing data for a certain amount of time.
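To make the alarm side concrete, here is a simplified Go model of how such an alarm could treat the metric. It assumes an alarm with threshold 1, comparison LessThanThreshold, missing data treated as breaching, and an "M out of N" rule (CloudWatch's DatapointsToAlarm out of EvaluationPeriods). The functions are hypothetical and ignore details of CloudWatch's actual evaluation logic.

```go
package main

import "fmt"

// breaching reports whether a single period counts against the alarm.
// A nil pointer stands for a period without data; with missing data
// treated as breaching, it counts the same as a value below 1.
func breaching(v *float64) bool {
	return v == nil || *v < 1
}

// shouldAlarm applies an "M out of N" rule to the most recent N periods.
func shouldAlarm(periods []*float64, evaluationPeriods, datapointsToAlarm int) bool {
	if len(periods) > evaluationPeriods {
		periods = periods[len(periods)-evaluationPeriods:]
	}
	count := 0
	for _, v := range periods {
		if breaching(v) {
			count++
		}
	}
	return count >= datapointsToAlarm
}

// ptr is a small helper to build datapoints inline.
func ptr(v float64) *float64 { return &v }

func main() {
	// Three periods: healthy, unhealthy, then no data at all.
	periods := []*float64{ptr(1), ptr(0), nil}
	fmt.Println(shouldAlarm(periods, 2, 2)) // prints true
}
```

Treating missing data as breaching is what makes the alarm fire when KubeStatus2CloudWatch itself stops publishing.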
KubeStatus2CloudWatch caters to a specific use case and must be combined with other tools to be useful.
KubeStatus2CloudWatch is written in Go, and the code ends up in a single executable binary. There are three approaches to obtaining it, one of which is to build it with go build as usual with Go.
Create a configuration file for KubeStatus2CloudWatch. Read Configuration for more information.
The general approach is to run KubeStatus2CloudWatch locally first, before getting it to run in the cluster. This requires AWS and Kubernetes credentials to be set up.
Place the config.yaml you have adjusted to your requirements next to the binary. Now execute the binary. You should see in the logs that KubeStatus2CloudWatch reads the configuration, configures things, and then starts to scan the targets periodically and update the CloudWatch metric. There should be no errors visible. Check the metric in CloudWatch. If everything looks fine, you can proceed with deploying KubeStatus2CloudWatch in Kubernetes.
Note that KubeStatus2CloudWatch interacts with the Kubernetes and CloudWatch APIs, which requires appropriate permissions. IAM Roles for Service Accounts (IRSA) is expected to be available in the cluster.
We need an IAM role with the following trust policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${ISSUER_URL}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${ISSUER_URL}:aud": "sts.amazonaws.com",
          "${ISSUER_URL}:sub": "system:serviceaccount:${KUBE_NAMESPACE}:kubestatus2cloudwatch"
        }
      }
    }
  ]
}
```
The inline policy should look like this:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "cloudwatch:PutMetricData",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "cloudwatch:namespace": "${KUBE_NAMESPACE}"
        }
      }
    }
  ]
}
```
Within Kubernetes, the required Service Account references the IAM role:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: "${KUBE_NAMESPACE}"
  name: kubestatus2cloudwatch
  labels:
    app.kubernetes.io/name: kubestatus2cloudwatch
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::${ACCOUNT_ID}:role/${IAM_ROLE_NAME}
    eks.amazonaws.com/sts-regional-endpoints: "true"
```
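With IRSA in place, the EKS Pod Identity Webhook injects the environment variables AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into pods that use an annotated Service Account, and the AWS SDK picks them up through its default credential chain. If CloudWatch calls fail with credential errors, a check like the following hypothetical Go diagnostic can tell whether the injection happened. It is not part of KubeStatus2CloudWatch.

```go
package main

import (
	"fmt"
	"os"
)

// irsaEnvVars are the variables the EKS Pod Identity Webhook injects into
// pods whose service account carries the eks.amazonaws.com/role-arn annotation.
var irsaEnvVars = []string{"AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE"}

// missingIRSAEnv returns the IRSA variables that lookup cannot find.
// lookup is injected so the check can be tested without touching the
// real environment; in practice pass os.LookupEnv.
func missingIRSAEnv(lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range irsaEnvVars {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingIRSAEnv(os.LookupEnv); len(missing) > 0 {
		fmt.Println("IRSA does not seem to be set up, missing:", missing)
		return
	}
	fmt.Println("IRSA environment variables present")
}
```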
But a Role is also required:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: "${KUBE_NAMESPACE}"
  name: kubestatus2cloudwatch
  labels:
    app.kubernetes.io/name: kubestatus2cloudwatch
rules:
  - apiGroups: [apps]
    resources: [daemonsets, statefulsets, deployments]
    verbs: [get]
```
A Role Binding is used to associate the Role with the Service Account:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: "${KUBE_NAMESPACE}"
  name: kubestatus2cloudwatch
  labels:
    app.kubernetes.io/name: kubestatus2cloudwatch
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kubestatus2cloudwatch
subjects:
  - kind: ServiceAccount
    name: kubestatus2cloudwatch
    namespace: "${KUBE_NAMESPACE}"
```
To provide the configuration file to KubeStatus2CloudWatch, a Config Map is used:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: "${KUBE_NAMESPACE}"
  name: kubestatus2cloudwatch
  labels:
    app.kubernetes.io/name: kubestatus2cloudwatch
data:
  config.yaml: |
    ...
```
Now finish it up by creating the Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: "${KUBE_NAMESPACE}"
  name: kubestatus2cloudwatch
  labels:
    app.kubernetes.io/name: kubestatus2cloudwatch
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kubestatus2cloudwatch
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kubestatus2cloudwatch
    spec:
      containers:
        - image: trallnag/kubestatus2cloudwatch:${VERSION}
          name: kubestatus2cloudwatch
          volumeMounts:
            - name: config
              subPath: config.yaml
              mountPath: /app/config.yaml
              readOnly: true
      serviceAccountName: kubestatus2cloudwatch
      volumes:
        - name: config
          configMap:
            name: kubestatus2cloudwatch
```
Check the container logs and the CloudWatch metric to see if things work as expected.
KubeStatus2CloudWatch is configured with a YAML file that is called config.yaml and placed right next to the binary. The app will crash during startup without a valid configuration file.
A valid exemplary configuration with extensive comments as documentation can be found at assets/config-example.yaml. It can be used as a starting point. The file assets/config-minimal.yaml contains a minimal configuration.
As a supplement, the corresponding JSON schema at assets/config.schema.json can be used as well.
The project is maintained by me, trallnag, and I am interested in keeping it alive as my colleagues and I use it in production. I also don't mind developing it further as I like working with Go.
Contributions are welcome. Please refer to CONTRIBUTING.md.
Consult DEVELOPMENT.md for guidance regarding development.
Read RELEASE.md for details about the release process.
This work is licensed under the Apache License (Apache-2.0), a permissive license whose main conditions require preservation of copyright and license notices. See LICENSE for the license text.
This work comes with an explicit NOTICE file containing additional legal notices and information.