Because unmonitored infrastructure will bite you
APACHE-2.0 License
Package | Latest Version |
---|---|
JustEatTakeaway.Watchman | |
JustEatTakeaway.Quartermaster | |
AWSWatchman creates and maintains AWS CloudWatch alerts.
Dynamic monitoring. This program creates and maintains CloudWatch alerts for infrastructure in AWS. It covers DynamoDB tables, SQS queues and more.
The details of who to alert and what tables to alert on must be stored in configuration files.
⚠️ Be careful. This code, when used correctly, will modify your AWS account by adding CloudWatch alerts to multiple resources. By default it will do a dry run, which will tell you what alarms would be added. You must add `--RunMode GenerateAlarms` to enable writes.
Unmonitored infrastructure is a problem waiting to happen, so all infrastructure in AWS should have appropriate alerts via CloudWatch. It is usually possible to declare all these alarms with the resource upfront in CloudFormation. But this is not always the best way to do it. AWSWatchman is good for cases where the CloudFormation definitions are harder to use. For example:
`Watchman` is written in C# and targets .NET 6. You can download the `dotnet` runtime for Windows, Mac or Linux.
Run `Watchman`, specifying a config folder for config files (see Configuration file format) and, optionally, AWS credentials.
dotnet .\Watchman.dll --RunMode GenerateAlarms --ConfigFolder ".\configuration" --AwsAccessKey AKABC123 --AwsSecretKey abcd1234
dotnet .\Watchman.dll --RunMode GenerateAlarms --ConfigFolder ".\configuration" --AwsProfile prod
dotnet .\Watchman.dll --RunMode GenerateAlarms --ConfigFolder ".\configuration"
If you are using the new resource types and have a high number of alarms, you will need to specify an S3 bucket/path that the CloudFormation template can be deployed to. The AWS credentials will need permission to put objects into that location.
dotnet .\Watchman.dll --RunMode GenerateAlarms --ConfigFolder ".\configuration" --TemplateS3Path "s3://je-deployments-qa21/watchman"
The possible command-line parameters are:
- `RunMode`: One of `TestConfig`, `DryRun`, or `GenerateAlarms`. Optional, default is `DryRun`. Mode behaviours are:
  - `TestConfig`: Configuration files are loaded and validated. Used to test syntax of changes to configuration. AWS credentials are not needed.
  - `DryRun`: All actions short of writing alarms are performed. Used to test what the effects of the configuration will be on the AWS account.
  - `GenerateAlarms`: A full run of the program. You must specify `GenerateAlarms` in order to actually write alarms.
- `AwsAccessKey` and `AwsSecretKey`: Optional. Supply both of these parameters in order to specify AWS credentials on the command line.
- `AwsProfile`: Specify a named AWS profile to use for credentials. Optional.
- `AwsRegion`: The AWS region to use. Optional, default is `eu-west-1`.
- `ConfigFolder`: The path to the configuration files. Required.
- `Verbose`: One of `true` or `false`. Give more detailed output. Optional, default is `false`.
- `WriteCloudFormationTemplatesToDirectory`: If set, alarms deployed via CloudFormation will be written to this folder instead of deployed. Note that this does not affect SQS and DynamoDB alarms, which currently use a different deployment method.
- `AwsLogging`: Enable AWS SDK logging. Default is `false`. If `true`, AWS metrics and error responses are logged to the console.

AWS connection credentials will be found in the following order:
1. If `AwsAccessKey` and `AwsSecretKey` are specified, these will be used.
2. If `AwsProfile` is specified, the named profile will be used.

`TestConfig` tests that the configuration files can be read and pass validation. AWS credentials are not required for this.
For example:
dotnet .\Watchman.dll --RunMode TestConfig --ConfigFolder ".\configuration"
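The credential lookup order listed earlier can be pictured as a simple precedence check. The following is only a sketch of that order, not Watchman's actual code: the function name `choose_credential_source` and the returned labels are hypothetical, and treating a lone access key (without its secret) as "not specified" is an assumption based on the instruction to supply both parameters.

```python
from typing import Optional


def choose_credential_source(
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    profile: Optional[str] = None,
) -> str:
    """Return a label for the credential source, following the listed order."""
    # 1. An explicit key pair wins, but only when both halves are supplied
    #    (assumption: one half on its own is treated as not specified).
    if access_key and secret_key:
        return "access-key"
    # 2. Otherwise a named profile is used when one is given.
    if profile:
        return "profile"
    # 3. Hypothetical label: what happens when neither is supplied is not
    #    modelled in this sketch.
    return "unspecified"
```

For instance, passing both a key pair and a profile yields `"access-key"`, because the key pair is checked first.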
`DryRun` shows what would happen: it does all the reads but none of the writes. AWS credentials are required for this.
For example:
dotnet .\Watchman.dll --RunMode DryRun --ConfigFolder ".\configuration" --AwsAccessKey AKABC123 --AwsSecretKey abc123 --Verbose true
`GenerateAlarms` performs a full read and write run. AWS credentials are required for this.
For example:
dotnet .\Watchman.dll --RunMode GenerateAlarms --ConfigFolder ".\configuration" --AwsAccessKey AKABC123 --AwsSecretKey abc123
The user associated with the keys needs to be in several roles. These are all documented in the Security Policy.
See Supported Services for supported AWS services.
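The three run modes lend themselves to being run in order of increasing risk, for example from a deployment script. The sketch below shows one possible way to do that in Python. It is not part of Watchman: the helper names (`build_args`, `run_command`, `staged_run`) are hypothetical, only command-line parameters documented above are used, and it assumes that a failed run exits with a non-zero code.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

RUN_MODES = ("TestConfig", "DryRun", "GenerateAlarms")


def build_args(
    run_mode: str,
    config_folder: str,
    aws_profile: Optional[str] = None,
    verbose: bool = False,
    dll_path: str = "Watchman.dll",
) -> List[str]:
    """Build the argument list for one Watchman run."""
    if run_mode not in RUN_MODES:
        raise ValueError(f"unknown RunMode: {run_mode}")
    args = ["dotnet", dll_path, "--RunMode", run_mode,
            "--ConfigFolder", config_folder]
    # TestConfig does not need AWS credentials, so the profile is left out.
    if aws_profile and run_mode != "TestConfig":
        args += ["--AwsProfile", aws_profile]
    if verbose:
        args += ["--Verbose", "true"]
    return args


def run_command(args: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(args)).returncode


def staged_run(
    config_folder: str,
    aws_profile: Optional[str] = None,
    write_alarms: bool = False,
    runner: Callable[[Sequence[str]], int] = run_command,
) -> List[str]:
    """Run the modes in order of increasing risk; return those that succeeded."""
    modes = ["TestConfig", "DryRun"]
    if write_alarms:
        modes.append("GenerateAlarms")
    completed: List[str] = []
    for mode in modes:
        # Assumption: a non-zero exit code means the stage failed.
        if runner(build_args(mode, config_folder, aws_profile)) != 0:
            break
        completed.append(mode)
    return completed
```

Calling `staged_run("./configuration", aws_profile="prod")` validates the configuration and performs a dry run; alarms are only written when `write_alarms=True` is passed explicitly, mirroring the tool's own dry-run default.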
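When run, `Watchman` does things in approximately this order:

- Loads and validates the configuration files. The run stops here in `TestConfig` mode.
- Performs all the reads from AWS and works out which alarms would be added. The run stops here in `DryRun` mode.
- Writes the alarms. This happens only in `GenerateAlarms` mode.

Quartermaster is a reporting tool to examine your DynamoDB usage. It sends a weekly "Dynamo provisioning report" email to the reporting targets. For the last week, the report lists the provisioned read and write capacity and the peak actual usage across all the DynamoDB tables and their indexes listed in the alerting group, together with the usage as a percentage of provisioned capacity.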
The percentage use can be used to track which tables and indexes are approaching the threshold, and which are over-provisioned. The intention is to allow you to change capacities up or down without first needing an alert.
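As a worked illustration of the percentage figure, the sketch below computes peak usage as a percentage of provisioned capacity and puts a rough label on it. This is not Quartermaster's code: the function names are hypothetical, and the 80% and 20% cut-offs are arbitrary assumptions chosen for the example.

```python
def percent_of_provisioned(peak_consumed: float, provisioned: float) -> float:
    """Peak usage as a percentage of provisioned capacity."""
    if provisioned <= 0:
        raise ValueError("provisioned capacity must be positive")
    return 100.0 * peak_consumed / provisioned


def classify(percent: float, high: float = 80.0, low: float = 20.0) -> str:
    """Label a usage percentage; the default cut-offs are illustrative only."""
    if percent >= high:
        return "approaching threshold"
    if percent <= low:
        return "over-provisioned"
    return "ok"
```

A table provisioned with 50 read units that peaked at 45 is at 90% and would be labelled as approaching the threshold, while one provisioned with 100 units that peaked at 5 is at 5% and is a candidate for reducing capacity.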