From On-Premise to Cloud
The main page of the final project written and organized in Terraform for The Amazon Cloud School 5th.
The project was recognized with an award of excellence in a project evaluation organized by RAPA and AWS.
For more detailed information, please refer to this link.
The AWS Load Balancer Controller manages Elastic Load Balancers in an Amazon EKS cluster. Configures ALB/NLB resources through Kubernetes, automating load balancing to simplify deployment.
Utilizes Fluent Bit with AWS plugins for efficient log management. Fluent Bit sends logs to CloudWatch Logs or Kinesis Data Firehose, offering lower resource usage than Fluentd.
Metrics Server is essential for autoscaling, gathering real-time metrics for Kubernetes and enabling the Horizontal Pod Autoscaler (HPA) to respond to resource demands.
Karpenter dynamically provisions new nodes based on workload demands, observing real-time events to optimize cluster resources.
Kube Prometheus Stack combines Prometheus, Grafana, and Prometheus Operator to monitor Kubernetes environments effectively, offering end-to-end visibility of application metrics.
Kube-ops-view provides a visual interface for viewing the status of nodes, pods, and clusters, aiding in the real-time management and monitoring of Kubernetes clusters.
ArgoCD is a GitOps continuous delivery tool for Kubernetes, allowing for declarative configuration management and automated deployment of applications based on changes in Git repositories.
The EBS CSI Driver enables dynamic provisioning of Amazon EBS volumes in Kubernetes. It simplifies storage management, allowing applications to automatically attach, mount, and detach EBS volumes as needed.
The EKS Blueprints Teams module manages multi-tenancy within Amazon EKS clusters. It facilitates team-based operations through isolated namespaces and access controls.
Before you begin, ensure that the following tools are installed:
# Step 1: Create an IAM user named 'admin'
# This command creates a new IAM user named 'admin'.
$ aws iam create-user --user-name admin
# Step 2: Attach AdministratorAccess policy to the 'admin' user
# This grants full admin rights to the user, enabling access to all AWS resources.
$ aws iam attach-user-policy --user-name admin --policy-arn arn:aws:iam::aws:policy/AdministratorAccess
# Step 3: (Optional) Create an access key for programmatic access
# Generating an access key allows 'admin' to interact with AWS services programmatically.
# Make sure to store the access key and secret key securely.
$ aws iam create-access-key --user-name admin
# you can check AccessKeyId and SecretAccessKey
{
"AccessKey": {
"UserName": "admin",
"AccessKeyId": "YOUR_ACCESS_KEY_ID",
"Status": "Active",
"SecretAccessKey": "YOUR_SECRET_ACCESS_KEY",
"CreateDate": "2024-10-14T12:34:56Z"
}
}
$ aws configure
AWS Access Key ID [None]: YOUR_ACCESS_KEY_ID
AWS Secret Access Key [None]: YOUR_SECRET_ACCESS_KEY
Default region name [None]: ap-northeast-2 # region you wanna launch resource in
Default output format [None]: json # json or yaml etc
$ export AWS_DEFAULT_PROFILE=<your_aws_profile_name>
Before you continue, you need to enable your AWS account to launch Spot instances if you haven't launched any yet. To do so, create the service-linked role for Spot by running the following command:
$ aws iam create-service-linked-role --aws-service-name spot.amazonaws.com || true
You might see the following error if the role has already been successfully created. You don't need to worry about this error, you simply had to run the above command to make sure you have the service-linked role to launch Spot instances:
An error occurred (InvalidInput) when calling the CreateServiceLinkedRole operation: Service role name AWSServiceRoleForEC2Spot has been taken in this account, please try a different suffix.
$ git clone https://github.com/Team-Hama/A-Hybrid-Journey-for-Minecraft-Server-Scalability-with-AWS.git
Change directory to infra and update the variables.tf, terraform.tfvars and local.tf accordingly. Check the plan and initiate apply to create the VPC, and the EKS cluster, along with Karpenter to scale the node pool.
To create the cluster, run the following commands:
$ terraform init
$ terraform plan
$ terraform apply --auto-approve
Or, if you want to create the cluster module by module to have a better
understanding, remove depends_on
from
eks_cluster.tf file and run the following commands:
$ terraform init
$ terraform plan
$ terraform apply -target="module.vpc" -auto-approve
$ terraform apply -target="module.eks" -auto-approve
$ terraform apply --auto-approve
Check the AWS console for the newly created VPC [VPC ID from the Terraform output]
$ aws ec2 describe-vpcs --vpc-ids "vpc-0ae0071a560b7a269" --region=ap-northeast-2
{
"Vpcs": [
{
"CidrBlock": "10.0.0.0/16",
"DhcpOptionsId": "dopt-0b0d0955f7a6e6f89",
"State": "available",
"VpcId": "vpc-0ae0071a560b7a269",
"OwnerId": "53xxxxxxxx44",
"InstanceTenancy": "default",
"CidrBlockAssociationSet": [
{
"AssociationId": "vpc-cidr-assoc-0f08fa04c7809dbe6",
"CidrBlock": "10.0.0.0/16",
"CidrBlockState": {
"State": "associated"
}
}
],
"IsDefault": false,
"Tags": [
{
"Key": "blueprint",
"Value": "eks-cluster"
},
{
"Key": "Name",
"Value": "eks-cluster"
}
]
}
]
}
Once completed (after waiting about 15 minutes), run the following command to update the kube.config file to interact with the cluster through kubectl:
$ aws eks --region $AWS_REGION update-kubeconfig --name $CLUSTER_NAME
Then, check the EKS cluster, i.e.
$ eksctl get cluster --name=eks-cluster --region=ap-northeast-2
NAME VERSION STATUS CREATED VPC SUBNETS SECURITYGROUPS PROVIDER
eks-cluster 1.30 ACTIVE 2024-09-10T14:19:44Z vpc-0ae0071a560b7a269 subnet-022e5f6ce1670fe14,subnet-02e1475b6290fcba6,subnet-0c62df535bc954c1f EKS
We asked our carrier to assign a public IP to our computers on-premises.VyOS needs to be given a public IP by DHCP to implement Site to Site VPN.
Unfortunately, we were unable to implement the Cloudwatch part with Terraform.
Please see below for instructions on how to set it up with the console
How to trigger Lambda function
"Jenkins is an open-source automation server that facilitates the automation of software development processes, such as building, testing, and deploying applications. As a widely adopted tool for Continuous Integration and Continuous Delivery (CI/CD), Jenkins enables teams to efficiently manage and accelerate their development pipelines by automating repetitive tasks. With its extensive plugin ecosystem, Jenkins can integrate with numerous tools, making it highly adaptable to different workflows and environments.
For more information about Jenkins and a deeper understanding of CI/CD, please refer to this link."
For a repo of the image files built with Jenkins, please see the following link
For a repo of the image files built with Jenkins, please see the following link
You need to make sure you can interact with the cluster and the Karpenter
pods are running, i.e.:
$ kubectl get pods -n karpenter
NAME READY STATUS RESTARTS AGE
karpenter-c5447bdf5-7kp6j 1/1 Running 0 19m
karpenter-c5447bdf5-qc7lf 1/1 Running 0 19m
List the nodes available after creating the cluster.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-101-90.us-east-2.compute.internal Ready <none> 35m v1.30.2-eks-1552ad0
Now, move to EKS directory and deploy Nginx
with
nginx-deployment.yml file.
$ kubectl apply -f nginx-deployment.yml
deployment.apps/nginx-deployment created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-576c6b7b6-6blkw 1/1 Running 0 22s
nginx-deployment-576c6b7b6-7nnd7 1/1 Running 0 22s
nginx-deployment-576c6b7b6-b4f8p 1/1 Running 0 22s
There are 3
pods running, as mentioned in the YAML
file.
for scale out to 30 pods, input comment below
kubectl scale deployment/nginx-depolyment --replicas=30
We still have 1
worker node, as it is sufficient to accommodate all the 30
pods.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-101-90.us-east-2.compute.internal Ready <none> 48m v1.30.2-eks-1552ad0
We want to see Karpenter
in action, right? To achieve that, increase the
replica from 30
to 150
, and apply the change. Wait for a couple of minutes,
and check for the nodes again.
$ kubectl scale deployment/nginx-depolyment --replicas=30
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-101-90.us-east-2.compute.internal Ready <none> 51m v1.30.2-eks-1552ad0
ip-10-0-55-209.us-east-2.compute.internal Ready <none> 24s v1.30.2-eks-1552ad0
YES!! Now we have 2
worker nodes, and 150
pods deployed. Karpenter
scaled
out based on the settings and needs of the resources to accommodate the
desired pod counts.
Now, set the replica count to 3
again, and check for the results.
$ kubectl apply -f nginx-deployment.yml
deployment.apps/nginx-deployment configured
$ kubectl get pods --output name | wc -l
3
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-101-90.us-east-2.compute.internal Ready <none> 60m v1.30.2-eks-1552ad0
Also, you can check the Karpenter
logs to have a better understanding.
$kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
{"level":"INFO","time":"2024-09-10T14:45:22.231Z","logger":"controller","message":"discovered ssm parameter","commit":"62a726c","controller":"nodeclass.status","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass
","EC2NodeClass":{"name":"default"},"namespace":"","name":"default","reconcileID":"bd9523a7-6456-4de3-9b32-3f9293425866","parameter":"/aws/service/eks/optimized-ami/1.30/amazon-linux-2-gpu/recommended/image_id","value":"ami-033b66e831e2c5f86"}
{"level":"ERROR","time":"2024-09-10T14:45:23.152Z","logger":"controller","message":"failed listing instance types for default","commit":"62a726c","controller":"disruption","namespace":"","name":"","reconcileID":"4bfd4c2f-61f3-4cc9-829d-930350fd859d","error":"no subnets found"}
{"level":"INFO","time":"2024-09-10T15:21:13.076Z","logger":"controller","message":"found provisionable pod(s)","commit":"62a726c","controller":"provisioner","namespace":"","name":"","reconcileID":"2033d0c2-e63a-4176-8b47-99d331d
df61f","Pods":"default/nginx-deployment-576c6b7b6-c4c2d, default/nginx-deployment-576c6b7b6-xhl9m, default/nginx-deployment-576c6b7b6-trkf4, default/nginx-deployment-576c6b7b6-gnb75, default/nginx-deployment-576c6b7b6-j2qwx and 47 other(s)","duration":"229.528995ms"}
{"level":"INFO","time":"2024-09-10T15:21:13.076Z","logger":"controller","message":"computed new nodeclaim(s) to fit pod(s)","commit":"62a726c","controller":"provisioner","namespace":"","name":"","reconcileID":"2033d0c2-e63a-4176-8b47-99d331ddf61f","nodeclaims":1,"pods":52}
{"level":"INFO","time":"2024-09-10T15:21:13.098Z","logger":"controller","message":"created nodeclaim","commit":"62a726c","controller":"provisioner","namespace":"","name":"","reconcileID":"2033d0c2-e63a-4176-8b47-99d331ddf61f","NodePool":{"name":"default"},"NodeClaim":{"name":"default-wgbwq"},"requests":{"cpu":"260m","memory":"290Mi","pods":"57"},"instance-types":"c4.xlarge, c5.xlarge, c5a.xlarge, c5ad.xlarge, c5d.xlarge and 17 other(s)"}
{"level":"INFO","time":"2024-09-10T15:21:15.752Z","logger":"controller","message":"launched nodeclaim","commit":"62a726c","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClai
m":{"name":"default-wgbwq"},"namespace":"","name":"default-wgbwq","reconcileID":"509fbe1c-f406-471f-bf82-71b96f0a0245","provider-id":"aws:///us-east-2a/i-0f4fe866297af566d","instance-type":"t4g.xlarge","zone":"us-east-2a","capacity-type":"spot","allocatable":{"cpu":"3920m","ephemeral-storage":"17Gi","memory":"14103Mi","pods":"58"}}
{"level":"INFO","time":"2024-09-10T15:21:41.702Z","logger":"controller","message":"registered nodeclaim","commit":"62a726c","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"default-wgbwq"},"namespace":"","name":"default-wgbwq","reconcileID":"31fab340-d7d4-4d84-9664-d6efe47eebf0","provider-id":"aws:///us-east-2a/i-0f4fe866297af566d","Node":{"name":"ip-10-0-55-209.us-east-2.compute.internal"}}
{"level":"INFO","time":"2024-09-10T15:21:54.762Z","logger":"controller","message":"initialized nodeclaim","commit":"62a726c","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"default-wgbwq"},"namespace":"","name":"default-wgbwq","reconcileID":"6914e0ef-f121-4163-b757-cc87c84f3218","provider-id":"aws:///us-east-2a/i-0f4fe866297af566d","Node":{"name":"ip-10-0-55-209.us-east-2.compute.internal"},"allocatable":{"cpu":"3920m","ephemeral-storage":"18233774458","hugepages-1Gi":"0","hugepages-2Mi":"0","hugepages-32Mi":"0","hugepages-64Ki":"0","memory":"15136204Ki","pods":"58"}}
{"level":"INFO","time":"2024-09-10T15:28:12.725Z","logger":"controller","message":"disrupting nodeclaim(s) via delete, terminating 1 nodes (3 pods) ip-10-0-55-209.us-east-2.compute.internal/t4g.xlarge/spot","commit":"62a726c","controller":"disruption","namespace":"","name":"","reconcileID":"f167a15a-0515-479f-b9e1-5ecb273c3697","command-id":"5c4c0606-c72a-4602-860f-feb31101a5c5","reason":"underutilized"}
{"level":"INFO","time":"2024-09-10T15:28:13.262Z","logger":"controller","message":"tainted node","commit":"62a726c","controller":"node.termination","controllerGroup":"","controllerKind":"Node","Node":{"name":"ip-10-0-55-209.us-east-2.compute.internal"},"namespace":"","name":"ip-10-0-55-209.us-east-2.compute.internal","reconcileID":"b36e33e1-3deb-44b8-9ac6-6eaa2c10267c","taint.Key":"karpenter.sh/disrupted","taint.Value":"","taint.Effect":"NoSchedule"}
{"level":"INFO","time":"2024-09-10T15:29:12.708Z","logger":"controller","message":"deleted node","commit":"62a726c","controller":"node.termination","controllerGroup":"","controllerKind":"Node","Node":{"name":"ip-10-0-55-209.us-east-2.compute.internal"},"namespace":"","name":"ip-10-0-55-209.us-east-2.compute.internal","reconcileID":"ccd91aa6-4fed-48f5-97fd-d117d8a989cd"}
{"level":"INFO","time":"2024-09-10T15:29:12.953Z","logger":"controller","message":"deleted nodeclaim","commit":"62a726c","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"default-wgbwq"},"namespace":"","name":"default-wgbwq","reconcileID":"0dea1bc7-8bd6-4a51-98d4-c771c0909529","Node":{"name":"ip-10-0-55-209.us-east-2.compute.internal"},"provider-id":"aws:///us-east-2a/i-0f4fe866297af566d"}
If you're interested in learning more about Karpenter, check out the links below
We will be using Prometheus and Grafana for setting up the monitoring. to do
so, we will
enable kube-prometheus-stack
in our managed addons in the eks_cluster.tf file,
something like
enable_kube_prometheus_stack = true
kube_prometheus_stack = {
name = "monitoring"
chart = "kube-prometheus-stack"
chart_version = "62.6.0"
repository = "https://prometheus-community.github.io/helm-charts"
namespace = "monitoring"
timeout = 900
}
with some custom configs, i.e. the custom namespace, chart version and so on. Once the deployment is done, we can list everything from the namespace with
$ kubectl get all -n monitoring
You can see, Grafana is running as a NodePort service, we can expose it with AWS ALB to the world. grafana.yml contains the necessary configurations to create the ingress. You can get the ALB URL by the command
$ kubectl get ing -n monitoring
Also, you'll need to retrieve the Grafana admin password for the first
time using the kubectl
command, i.e.
$ kubectl get secret -n monitoring monitoring-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
After that, you can visit the web address provided by the ingress, use admin
as the username, and hotly retrieved password to access Grafana. There, you
will see some prebuilt dashboards. Also, you can create your own or get by
ID(s) from Grafana Labs
If you want to access the Prometheus
also, use this
prometheus.yml file to create the Ingress, i.e.
$ kubectl apply -f prometheus.yml
$ kubectl get ing -n monitoring
NAME CLASS HOSTS ADDRESS PORTS AGE
grafana alb * k8s-monitori-grafana-0d481f4284-1538150578.us-east-2.elb.amazonaws.com 80 64m
prometheus alb * k8s-monitori-promethe-f7484f4f25-423861633.us-east-2.elb.amazonaws.com 80 25m
If you want to create IAM users and assign them administrator access or some development access, go to the iam.tf file, uncomment it, adjust the team users/settings according to your needs, and then apply the changes with terraform.
If you are using the kube-prometheus-stack
, CRDs created by this chart are
not removed by default and should be manually cleaned up:
start=$(date +%s)
kubectl delete -f ../EKS/echoserver_full.yml
kubectl delete -f ../EKS/env-echoserver.yml
kubectl patch ingress argocd-ingress -n argocd -p '{"metadata":{"finalizers":[]}}' --type=merge
kubectl patch ingress grafana -n monitoring -p '{"metadata":{"finalizers":[]}}' --type=merge
kubectl patch ingress kube-ops-ingress -n monitoring -p '{"metadata":{"finalizers":[]}}' --type=merge
kubectl patch ingress prometheus -n monitoring -p '{"metadata":{"finalizers":[]}}' --type=merge
kubectl delete nodes --all
kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheusagents.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd scrapeconfigs.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
kubectl delete --all nodeclaim
kubectl delete --all nodepool
kubectl delete --all ec2nodeclass
cluster_arn=$(kubectl config get-clusters | grep "arn:aws:eks:")
kubectl config delete-cluster "${cluster_arn}"
terraform destroy --auto-approve
end=$(date +%s)
echo Execution time was "$((end - start))" seconds.
Always delete the AWS resources to save money after you are done.
$ kubectl delete --all nodeclaim
$ kubectl delete --all nodepool
$ kubectl delete --all ec2nodeclass
$ kubectl config delete-cluster arn:aws:eks:<region>:<account_id>:cluster/<cluster_name>
$ terraform destroy --auto-approve
Alternately, you can use tf_cleanup.sh to clean up the resources.
정진우 (rynf0rce) | 강성훈 (ksh0811) | 오재석 (ojs201) | 차진우 (rttitity) | 임승환 (shlim0118) | 한 석 (Hanseok-git) |
박찬민 (전임강사) | 송민석 (교육생) | 손현준 (교육생) |