Deprecated, renamed and maintained at https://github.com/coinbase/odin
APACHE-2.0 License
Deploy your 12-factor applications to AWS easily and securely with the Step Auto-Scaling Group (ASG) Deployer (Asgard).
Asgard's goals/requirements/features are:
Asgard is made of an AWS Lambda Function (with a role) and AWS Step Function. You can bootstrap these into AWS with:
git pull # pull down new code
./scripts/bootstrap
Asgard includes a test project, deploy-test, that has one service, web, which is an nginx server to be mounted behind an Elastic Load Balancer (ELB) and a load balancer target group. The service instances have a security group and instance profile.
To create the AWS resources for deploy-test:
./scripts/geo apply resources/deploy-test-resources.rb
Note: you will also have to tag the latest Ubuntu release
A deploy-test release file deployer-test-release.json looks like:
{
  "project_name": "coinbase/deploy-test",
  "config_name": "development",
  "subnets": ["test_private_subnet_a", "test_private_subnet_b"],
  "ami": "ubuntu",
  "user_data": "{{USER_DATA_FILE}}",
  "services": {
    "web": {
      "instance_type": "t2.nano",
      "security_groups": ["ec2::coinbase/deploy-test::development"],
      "elbs": ["coinbase-deploy-test-web-elb"],
      "profile": "coinbase-deploy-test",
      "target_groups": ["coinbase-deploy-test-web-tg"]
    }
  }
}
The user data for the release is stored in the file deployer-test-release.json.userdata:
#cloud-config
repo_update: true
repo_upgrade: all
packages:
- docker.io
runcmd:
- docker run -d --restart always --name test_server -p 8000:80 nginx
To build a release for deploy-test and send it to Asgard, we use the step-asg-deployer executable:
step-asg-deployer deploy deploy-test-release.json
Asgard then:
web which is configured to start an nginx server.
web ASG to become healthy behind the ELB and target group. Healthy means that the health checks for both ELB and target group pass.
An Asgard release is a request to deploy a Project-Configuration where:
org/name.
development, production.
Each release can define 1-to-many Services; each service is a logical group of servers, e.g. web or worker, that maps to a single auto-scaling group (ASG).
When Asgard is sent a release, it moves it through a state machine:
At each of these states it is possible to fail and then move towards a failure state. The typical failures are:
The end states are:
A release uses resources that must exist and be configured correctly to be used for the project-configuration-service being deployed.
A release must have:
An ami key that can be either a Name tag or AMI ID, e.g. ami-1234567.
A subnets key that is a list of either Name tags or Subnet IDs, e.g. subnet-1234567.
Both the above resources MUST have a tag DeployWith that equals step-asg-deployer.
Services can have:
A security_groups key that is a list of security group Name tags.
An elbs key that is a list of ELB names.
A target_groups key that is a list of target group Name tags.
All the above resources MUST be tagged with the ProjectName, ConfigName and ServiceName of the release to ensure that resources are assigned correctly.
Services can also have an Instance Profile defined by the profile key, which is an instance profile Name tag. The role's path MUST be equal to /<project_name>/<config_name>/<service_name>/.
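The tagging and path rules above can be written as simple checks. The following is a minimal Python sketch and not Asgard's implementation: the function names and the plain-dict representation of a resource's tags are hypothetical and chosen only for illustration.

```python
def expected_role_path(project_name, config_name, service_name):
    """Build the role path that the rule above requires for a service."""
    return f"/{project_name}/{config_name}/{service_name}/"


def resource_matches_service(tags, project_name, config_name, service_name):
    """Return True if a resource's tags name this project, config and service.

    `tags` is assumed to be a plain dict of tag key -> tag value.
    """
    return (
        tags.get("ProjectName") == project_name
        and tags.get("ConfigName") == config_name
        and tags.get("ServiceName") == service_name
    )
```

For the deploy-test example, `expected_role_path("coinbase/deploy-test", "development", "web")` yields `/coinbase/deploy-test/development/web/`.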
Asgard makes it easy to scale both vertically and horizontally. To scale deploy-test we add to the release:
{ ...
  "services": {
    "web": { ...
      "instance_type": "c4.xlarge",
      "ebs_volume_size": 20,
      "ebs_volume_type": "gp2",
      "ebs_device_name": "/dev/sda1",
      "autoscaling": {
        "min_size": 3,
        "max_size": 5,
        "spread": 0.2,
        "max_terms": 1,
        "policies": [
          {
            "type": "cpu_scale_up",
            "threshold" : 25,
            "scaling_adjustment": 2
          },
          {
            "type": "cpu_scale_down",
            "threshold" : 15,
            "scaling_adjustment": -1
          }
        ]
      }
    }
  }
}
instance_type is the EC2 instance type for the service.
ebs_volume_size, ebs_volume_type, and ebs_device_name define the attached EBS volume (size in GB).
The autoscaling key defines the horizontal scaling of a service:
min_size and max_size.
desired_capacity is equal to the min_size or capacity of the previously launched service.
desired_capacity * (1 + spread)
desired_capacity * (1 - spread)
max_terms (default 0), the release immediately halts.
policies are defined above to increase the desired_capacity by 2 instances if the CPU goes above 25% and reduce by 1 instance if it drops below 15%.
Both spread and max_terms are useful when launching many instances because as scale increases the number of cloud errors increases.
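To make the two spread expressions concrete, here is a small Python sketch that only evaluates them for the example values above (min_size 3, spread 0.2). The function name is hypothetical. How Asgard rounds these values to whole instances is not stated here, so the sketch leaves them as fractions.

```python
def spread_bounds(desired_capacity, spread):
    """Evaluate the two spread expressions from the text.

    Returns (desired_capacity * (1 - spread), desired_capacity * (1 + spread)).
    No rounding is applied because the text does not specify any.
    """
    lower = desired_capacity * (1 - spread)
    upper = desired_capacity * (1 + spread)
    return lower, upper


lower, upper = spread_bounds(3, 0.2)
print(round(lower, 2), round(upper, 2))
```

With a desired capacity of 3 and a spread of 0.2, the two expressions come to roughly 2.4 and 3.6 instances.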
Do not put sensitive data into user data. Asgard does not treat user data as secure information; it is difficult to secure with IAM and very limited in size. We recommend using Vault, AWS Parameter Store, or KMS-encrypted S3, authenticated by a service's instance profile.
The user_data in the release is the plain text instance metadata sent to initialize each instance. Asgard will replace some strings with information about the release, project, config and service, e.g.:
...
write_files:
  - path: /
    content: |
      {{RELEASE_ID}}
      {{PROJECT_NAME}}
      {{CONFIG_NAME}}
      {{SERVICE_NAME}}
Asgard will replace {{PROJECT_NAME}} with the name of the project and {{SERVICE_NAME}} with the name of the service. This can be useful for getting service-specific configuration and logging.
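The substitution described above amounts to plain string replacement. The Python sketch below illustrates the idea under that assumption; it is not Asgard's code, and the function name and the sample release ID are hypothetical.

```python
def render_user_data(template, release_id, project_name, config_name, service_name):
    """Replace the {{...}} placeholders listed above with release values."""
    replacements = {
        "{{RELEASE_ID}}": release_id,
        "{{PROJECT_NAME}}": project_name,
        "{{CONFIG_NAME}}": config_name,
        "{{SERVICE_NAME}}": service_name,
    }
    rendered = template
    for placeholder, value in replacements.items():
        rendered = rendered.replace(placeholder, value)
    return rendered


# "release-1" is a made-up release ID used only for this example.
print(render_user_data(
    "service={{SERVICE_NAME}} project={{PROJECT_NAME}}",
    "release-1", "coinbase/deploy-test", "development", "web",
))
```

Because the same template is rendered per service, each service's instances can pick up their own name for configuration lookups and log labels.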
If user_data is equal to {{USER_DATA_FILE}} and the release is deployed with step-asg-deployer, the value will be replaced with the contents of <release_file>.userdata, e.g. deployer-test-release.json.userdata.
A release can have a timeout, which is how long in seconds a release will wait for its services to become healthy. By default the timeout is 10 minutes; the maximum value is around a year (31556926 seconds), since that is how long a step function can run.
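The default and upper bound above can be captured in a small helper. This Python sketch assumes the release is a dict with an optional "timeout" key in seconds. The function name and the choice to raise an error on out-of-range values are illustrative assumptions, not documented Asgard behavior.

```python
DEFAULT_TIMEOUT_SECONDS = 10 * 60   # default from the text: 10 minutes
MAX_TIMEOUT_SECONDS = 31556926      # about one year, per the text


def effective_timeout(release):
    """Return the timeout in seconds for a release dict, applying the default."""
    timeout = release.get("timeout", DEFAULT_TIMEOUT_SECONDS)
    if not 0 < timeout <= MAX_TIMEOUT_SECONDS:
        # Rejecting here is an assumption made for this sketch.
        raise ValueError(f"timeout must be between 1 and {MAX_TIMEOUT_SECONDS} seconds")
    return timeout
```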
AWS provides Auto Scaling Group Lifecycle Hooks to detect and react to auto-scaling events. You can add the lifecycle hooks to the ASGs with:
{ ...
  "lifecycle": {
    "termhook" : {
      "transition": "autoscaling:EC2_INSTANCE_TERMINATING",
      "role": "asg_lifecycle_hooks",
      "sns": "asg_lifecycle_hooks",
      "heartbeat_timeout": 300
    }
  }
}
These can be used to gracefully shut down instances, which is necessary if a service has long-running jobs, e.g. a worker service.
Asgard supports manually stopping a release while it is being deployed. Just execute:
step-asg-deployer halt deploy-test-release.json
This will:
halt file to S3
Halt does not guarantee that the release will not be deployed; if executed too late, the release may still result in success.
DO NOT use Stop execution on the Asgard step function, as it will not clean up resources and will leave AWS in a bad state.
Deployers are critical pieces of infrastructure as they may be used to compromise software they deploy. As such, we take security around the step-asg-deployer very seriously and try to answer the following questions:
The central authentication mechanisms are the AWS IAM permissions for step functions and S3.
By limiting the autoscaling:CreateAutoScalingGroup permission, the Asgard function becomes the only way to deploy ASGs. Limiting who can call states:StartExecution for Asgard then limits who can deploy.
Ensuring that Asgard's Lambda can only access a single S3 bucket further limits who can deploy. This can be done with policy statements such as:
{
  "Effect": "Allow",
  "Action": [
    "s3:GetObject*", "s3:PutObject*",
    "s3:List*", "s3:DeleteObject*"
  ],
  "Resource": [
    "arn:aws:s3:::#{s3_bucket_name}/*",
    "arn:aws:s3:::#{s3_bucket_name}"
  ]
},
{
  "Effect": "Deny",
  "Action": ["s3:*"],
  "NotResource": [
    "arn:aws:s3:::#{s3_bucket_name}/*",
    "arn:aws:s3:::#{s3_bucket_name}"
  ]
},
The two permissions that guard who can deploy are who can execute the step function and who can upload to S3.
All resources that can be used in an Asgard deploy must opt in using tags or paths. Additionally, service resources require specific tags or paths denoting which project/config/service can use them.
Assets uploaded to S3 are in the path /<ProjectName>/<ConfigName>, so limiting who can s3:PutObject to a path can be used to limit what project-configs they can deploy or halt.
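As an illustration of restricting uploads to one project-config, the Python sketch below builds an IAM policy statement scoped to that path. The function name and the bucket name are hypothetical, and the sketch assumes the object keys in the bucket start with <ProjectName>/<ConfigName>/, which should be checked against the actual bucket layout.

```python
import json


def put_object_statement(bucket, project_name, config_name):
    """IAM statement allowing uploads only under one project-config prefix.

    Assumes object keys begin with "<ProjectName>/<ConfigName>/".
    """
    return {
        "Effect": "Allow",
        "Action": ["s3:PutObject"],
        "Resource": [f"arn:aws:s3:::{bucket}/{project_name}/{config_name}/*"],
    }


# "example-deploy-bucket" is a placeholder bucket name.
statement = put_object_statement("example-deploy-bucket", "coinbase/deploy-test", "development")
print(json.dumps(statement, indent=2))
```

A user holding only this statement could upload releases or halt files for coinbase/deploy-test in development, but not for other projects or configurations.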
For each release, the client generates a release_id and a created_at date, and also uploads the release to S3.
The step-asg-deployer will reject any request where the created_at date is not recent, or where the release sent to the step function and the release in S3 don't match. This means that if a user can invoke the step function but not upload to S3 (or vice versa), it is not possible to deploy old or malicious code.
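The two checks described above (freshness and equality with the S3 copy) can be sketched as follows. This is an illustrative Python sketch, not the deployer's actual logic: the five-minute window, the ISO 8601 timestamp format with an explicit UTC offset, and the function name are all assumptions, since the text only says the date must be "recent".

```python
from datetime import datetime, timedelta, timezone

# Assumed freshness window; the text only requires the date to be "recent".
MAX_RELEASE_AGE = timedelta(minutes=5)


def should_accept(sent_release, stored_release, now=None):
    """Accept only a fresh release that matches the copy uploaded to S3.

    `created_at` is assumed to be an ISO 8601 string with a UTC offset.
    """
    now = now or datetime.now(timezone.utc)
    created_at = datetime.fromisoformat(sent_release["created_at"])
    age = now - created_at
    # Reject timestamps in the future as well as stale ones.
    is_recent = timedelta(0) <= age <= MAX_RELEASE_AGE
    return is_recent and sent_release == stored_release
```

Requiring both conditions means a replayed old release fails the freshness check, and a release that differs from the S3 copy fails the equality check.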
Working out what happened and when is very useful for debugging and security response. Step functions make it easy to see the history of all executions in the AWS console and via API. S3 can log all access to CloudTrail, so collecting from these two sources will show all information about a deploy.
There is always more to do: