Collection of scripts that make operating a CloudStack cloud easier :-).
This collection of scripts was written to automate operating public and private clouds built on top of CloudStack. It consists of handy scripts for working with CloudStack on a day-to-day basis.
To talk to the CloudStack API, you need to configure your root admin API keys. You can either configure them in the config file config, or tell the script which CloudMonkey profile to use with the --config-profile or -c command line argument.
CloudMonkey is NOT used to execute the API calls; the scripts just read its config file, since many of us already have it set up and it makes life easier.
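As a rough sketch of how that profile lookup works (a minimal illustration, not the actual implementation; it assumes CloudMonkey's config lives at ~/.cloudmonkey/config and a Python 2 environment, like these scripts use):
import os
from ConfigParser import ConfigParser  # Python 2, as used by these scripts
def load_credentials(profile="config"):
    # The CloudMonkey file wins when the profile exists in both places
    for path in (os.path.expanduser("~/.cloudmonkey/config"), "./config"):
        parser = ConfigParser()
        parser.read(path)
        if parser.has_section(profile):
            return dict(parser.items(profile))
    raise RuntimeError("Profile '%s' not found" % profile)
creds = load_credentials()
print("Using API at %s" % creds["url"])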
Using arguments you specify on the command line, you can control the behaviour of the scripts. When no arguments are specified, the scripts display their usage, so it is safe to run them without arguments to learn what options are available.
Each command line argument has a long version (prefixed with a double dash), like --domain, and a short one (prefixed with a single dash), like -d. You can use either form, or mix them.
All scripts run in DRY-run mode by default. This means a script will tell you what it wants to do, but that's all. If you're OK with it, run it again with the --exec parameter specified; it will then really execute the API calls and change things. When using scripts that only show listings, you do not need the --exec parameter, as nothing changes from just listing information anyway.
At any time, add --debug as a parameter and the script will print some useful debug info.
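A minimal sketch of this shared convention (do_migrate is a hypothetical placeholder for a real API call):
import argparse
def do_migrate():
    print("Calling the CloudStack API to migrate the VM...")
parser = argparse.ArgumentParser()
parser.add_argument("--exec", dest="do_exec", action="store_true", help="really execute the API calls")
parser.add_argument("--debug", action="store_true", help="print some useful debug info")
args = parser.parse_args()
if args.debug:
    print("DEBUG: parsed arguments: %s" % args)
if args.do_exec:
    do_migrate()
else:
    print("Running in DRY-run mode; re-run with --exec to make changes.")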
All scripts that do something impacting users will look up the user's e-mail address in CloudStack, and send an e-mail notification when maintenance starts and completes. An example is upgradeRouterVM.py, which you use to upgrade a router VM to a new template, for instance after a CloudStack upgrade.
Please be sure to adjust the e-mail settings in the config file before sending e-mail.
If something goes wrong, a notification is e-mailed to the errors_to address from the config file.
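Sending the notification boils down to something like this sketch (it assumes the [mail] section from the sample config below; the recipient and message are illustrative, since the real scripts look the address up in CloudStack):
import smtplib
from email.mime.text import MIMEText
from ConfigParser import ConfigParser  # Python 2
config = ConfigParser()
config.read("config")
msg = MIMEText("Maintenance on your router VM has started.")  # illustrative text
msg["Subject"] = "Cloud maintenance notification"
msg["From"] = config.get("mail", "mail_from")
msg["To"] = "user@example.com"  # the real scripts use the address from CloudStack
smtp = smtplib.SMTP(config.get("mail", "smtpserver"))
smtp.sendmail(msg["From"], [msg["To"]], msg.as_string())
smtp.quit()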
An example file config.sample is provided as a starting point. Copy the file to start:
cp -pr config.sample config
Next, have a look at the config parameters. Example config:
# Config file for CloudStack Operations scripts
[cloudstackOps]
organization = The Iaas Team
[mail]
smtpserver = localhost
mail_from = [email protected]
errors_to = [email protected]
[config]
username = admin
apikey = whMTYFZh3n7C4M8VCSpwEhpqZjYkzhYTufcpaLPH9hInYGTx4fOnrJ3dgL-3AZC_STMBUeTFQgqlETPEile4_A
url = http://127.0.0.1:8080/client/api
expires = 600
secretkey = 9Z0S5-ryeoCworyp2x_tuhw5E4bAJ4JTRrpNaftTiAl488q5rvUt8_pG7LxAeg3m_VY-AafXQj-tVhkn9tFv1Q
timeout = 3600
password = password
[mysqlservername]
mysqlpassword=password
Again, if you use CloudMonkey you can omit the [config] part. If you like, you can add multiple sections like [devcloud] and [prodcloud] and refer to them with the -c flag (just like you can with the CloudMonkey config file).
A given profile is first looked up in the CloudMonkey config file, then in the local config file. If both exist, the CloudMonkey profile is used.
Remember: you need root API credentials to use most scripts.
The MySQL part is used by scripts that query the database directly. You specify the MySQL server on the command line and put its password in the config file, in a section named after the server. Alternatively, you can specify the password on the command line (not recommended).
To talk to the CloudStack API, these scripts use the Marvin Python library that comes with Apache CloudStack. Unfortunately, Marvin has changed quite a few times during the development of these scripts, without backwards compatibility. Therefore, use the version in this repository and you're fine. Support for the latest version is being worked on, but that takes some time. You can set up a virtual environment (see below) to use multiple versions at the same time.
Install the tar.gz from this repo using pip:
pip install -Iv marvin/Marvin-0.1.0.tar.gz
Especially when you are using different versions of Marvin or other packages, a virtual environment is handy. It also does not require root privileges to install.
Make sure your system has Python with virtualenv support. Install it by running:
sudo yum install python-virtualenv
Make a Python virtual environment:
virtualenv ~/python_cloud
When working with the scripts, activate this virtual env:
source ~/python_cloud/bin/activate
Install the Marvin Python CloudStack library within the virtual env:
pip install -Iv marvin/Marvin-0.1.0.tar.gz
Call the scripts with python instead of ./, for example:
python listVirtualMachines.py --oncluster CLUSTER-3
Install clint to display colors on the terminal:
pip install -Iv clint
You also need prettytable:
pip install -Iv prettytable
... and for direct MySQL access, you'll need mysql-connector-python:
yum -y install mysql-connector-python
... and some extras for some extra juice:
pip install -Iv dnspython
Some general tips:
Run scripts in DRY-run mode first, so you get an idea of what will happen.
Run scripts inside screen (so they keep running when your connection gets lost).
For each script included, the use case and usage examples are provided below.
This script lists all instances and their consumed capacity. You can limit it to only display instances or only routers, filter on domain name, project name, a keyword, or a combination. At the bottom, a summary is displayed with the number of VMs, the total used disk space, and the total allocated RAM.
This script has two main use cases: displaying an overview, and making a selection to pipe into the next script. See the Batch processing section for more info.
For usage, run:
./listVirtualMachines.py
Examples:
To list all VMs on cluster with name 'CLUSTER-2':
./listVirtualMachines.py --oncluster CLUSTER-2
To list all VMs on POD with name 'POD-2':
./listVirtualMachines.py --pod POD-2
To list all VMs on cluster with name 'CLUSTER-2', but only from domain 'domainname':
./listVirtualMachines.py --oncluster CLUSTER-2 --domain domainname
To list all VMs on cluster with name 'CLUSTER-2', but only from domain 'domainname', filtered on 'keyword':
./listVirtualMachines.py --oncluster CLUSTER-2 --domain domainname --filter keyword
To list all VMs on cluster with name 'CLUSTER-2', filtered on 'keyword':
./listVirtualMachines.py --oncluster CLUSTER-2 --filter keyword
To list all project VMs on cluster with name 'CLUSTER-2':
./listVirtualMachines.py --oncluster CLUSTER-2 --is-projectvm
To list all project VMs on cluster with name 'CLUSTER-2' where the project name is 'linuxbase':
./listVirtualMachines.py --oncluster CLUSTER-2 --project linuxbase
To list all project VMs on cluster with name 'CLUSTER-2', filtered on 'keyword':
./listVirtualMachines.py --oncluster CLUSTER-2 --filter keyword --is-projectvm
To list all VMs on cluster with name 'CLUSTER-2', and specify a config profile to use:
./listVirtualMachines.py --config-profile config_cloud_admin --oncluster CLUSTER-2
To list the capacity used in zone 'ZONE-1':
./listVirtualMachines.py --zone ZONE-1 --summary
To list the capacity used in zone 'ZONE-1' by domain 'domainname':
./listVirtualMachines.py --zone ZONE-1 --domainname domainname --summary
To list the VMs for a domain with non-admin credentials:
./listVirtualMachines.py --non-admin-credentials --domain domainname
To list the VMs for all domains except the domain called 'domainname':
./listVirtualMachines.py --ignore-domain domainname
To list the VMs for all domains except the specified domain names:
./listVirtualMachines.py --ignore-domain domainname1,domainname2,domainname3
Working with routers:
To list the router VMs on cluster with name 'CLUSTER-2':
./listVirtualMachines.py --oncluster CLUSTER-2 --only-routers
To list the router VMs on cluster with name 'CLUSTER-2' that have exactly 2 NICs:
./listVirtualMachines.py --oncluster CLUSTER-2 --only-routers --router-nic-count 2
To list the router VMs on cluster with name 'CLUSTER-2' that have more than 3 NICs:
./listVirtualMachines.py --oncluster CLUSTER-2 --only-routers --router-nic-count 3 --nic-count-is-minumum
To list the router VMs on cluster with name 'CLUSTER-2' that have 3 or fewer NICs:
./listVirtualMachines.py --oncluster CLUSTER-2 --only-routers --router-nic-count 3 --nic-count-is-maximum
To list the router VMs on cluster with name 'CLUSTER-2' that require a systemvm template upgrade:
./listVirtualMachines.py --oncluster CLUSTER-2 --only-routers-to-be-upgraded
Tips:
If you want to display instances only, use the --no-routers switch.
If you want to display routers only, use the --only-routers switch.
This script will migrate a given VM to the specified cluster. The VM will be shut down for the migration, and the user is informed by e-mail. We use this to migrate between old and new clusters.
For usage, run:
./migrateVirtualMachine.py
Examples:
To migrate a VM with instance-id 'i-123-45678-VM' to cluster 'CLUSTER-2':
./migrateVirtualMachine.py --instance-name i-123-45678-VM --tocluster CLUSTER-2 --exec
To migrate a VM with instance-id 'i-123-45678-VM' to cluster 'CLUSTER-2' in DRY-run mode:
./migrateVirtualMachine.py --instance-name i-123-45678-VM --tocluster CLUSTER-2
To migrate a VM with instance-id 'i-123-45678-VM' that belongs to a project to cluster 'CLUSTER-2':
./migrateVirtualMachine.py --instance-name i-123-45678-VM --tocluster CLUSTER-2 --exec --is-projectvm
Tip: You can also select a VM by hostname, but we recommend using instance names instead; when a hostname is not unique, the script will refuse to operate.
./migrateVirtualMachine.py --vmname server001 --tocluster CLUSTER-2 --exec
There used to be a separate script for this, but it has been deprecated. Instead, you can use ./listVirtualMachines.py and pipe its output to ./migrateVirtualMachine.py. This allows for more flexibility and removes duplicate code.
Just play with ./listVirtualMachines.py to get the desired list of VMs, then pipe it through egrep and awk to generate the migrate commands, and finally feed them to sh for execution.
Example:
./listVirtualMachines.py -d domainname |
egrep "i\-(.*)\-VM" |
cut -d\| -f6 |
awk {'print "./migrateVirtualMachine.py --instance " $1 " --tocluster CLUSTER-2" '} | sh
When you are sure it works as expected, add --exec and all VMs will be migrated sequentially.
This script migrates all offline volumes from one storage pool to another. An offline volume is a volume that is not currently attached to a running VM. This is mainly useful for emptying a cluster's storage pool so you can decommission it.
In DRY-RUN mode, it will display a nice table of what will be migrated.
For usage, run:
./migrateOfflineVolumes.py
Examples:
To display which offline volumes can be migrated from CLUSTER-6 to CLUSTER-12:
./migrateOfflineVolumes.py --fromcluster CLUSTER-6 --tocluster CLUSTER-12
To migrate offline volumes from CLUSTER-6 to CLUSTER-12:
./migrateOfflineVolumes.py --fromcluster CLUSTER-6 --tocluster CLUSTER-12 --exec
This script will upgrade a router to a new systemVM template. CloudStack will destroy the router and re-create it with the same instance-id when you simply reboot it. This script works on CloudStack 4.3 and above, because it uses the requiresUpgrade flag.
The script is most powerful when used in combination with the listVirtualMachines.py script and the --only-routers-to-be-upgraded flag. See the Batch processing section below for an example.
When a router does not need an update, the script will do nothing.
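Selecting the routers that need an upgrade boils down to something like this sketch (assuming an already initialised Marvin API client, and the requiresupgrade field that CloudStack 4.3+ returns):
from marvin.cloudstackAPI import listRouters
def routers_needing_upgrade(apiclient):
    # apiclient is assumed to be an initialised Marvin CloudStack API client
    cmd = listRouters.listRoutersCmd()
    cmd.listall = True
    routers = apiclient.listRouters(cmd) or []
    # CloudStack 4.3+ sets requiresupgrade when the systemvm template is outdated
    return [r for r in routers if getattr(r, "requiresupgrade", False)]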
For usage, run:
./upgradeRouterVM.py
Examples:
To upgrade a router VM:
./upgradeRouterVM.py --routerinstance-name r-12345-VM --exec
To upgrade a router VM that belongs to a project:
./upgradeRouterVM.py --routerinstance-name r-12345-VM --is-projectrouter --exec
This script will update the host tags of a given host. You can either add a new host tag or replace all tags with new ones.
For usage, run:
./updateHostTags.py
Examples:
To add a tag 'new-tag':
./updateHostTags.py --hostname hypervisor01 --tags new-tag --exec
To replace all tags with a new tag 'new-tag':
./updateHostTags.py --hostname hypervisor01 --tags new-tag --replace --exec
To replace all tags with new tags 'new-tag,new-tag-2':
./updateHostTags.py --hostname hypervisor01 --tags new-tag,new-tag-2 --replace --exec
To remove all tags:
./updateHostTags.py --hostname hypervisor01 --tags ' ' --replace --exec
These scripts allow you to set the Allocation and Managed state of a cluster. This is handy when you want to patch a cluster. Currently only XenServer is supported.
You can see the status of the cluster, including its hosts. It's also easy to see which host is the poolmaster.
For usage, run:
./clusterMaintenance.py
Examples:
To show an overview of a cluster:
./clusterMaintenance.py --clustername cluster001
To set the cluster to the Unmanaged state (as a result, all hosts get disconnected):
./clusterMaintenance.py --clustername cluster001 --managedstate Unmanaged --exec
To set it back to Managed, run:
./clusterMaintenance.py --clustername cluster001 --managedstate Managed --exec
To disable a cluster:
./clusterMaintenance.py --clustername cluster001 --allocationstate Disabled --exec
To enable it again:
./clusterMaintenance.py --clustername cluster001 --allocationstate Enabled --exec
The purpose of this script is to reboot all hypervisors in a XenServer cluster (aka pool) without impacting the uptime of the VMs running on the cluster. This requires an N+1 situation, where one hypervisor can be empty (a wise configuration anyway). The script starts with the poolmaster: it live migrates all its VMs to other hypervisors and then reboots it. When the poolmaster comes back, all other hypervisors are rebooted one by one, with VMs live migrated around them.
Using the --prepare flag, some pre-work is done: ejecting CDs, faking XenTools and pushing some scripts.
The script requires the following PIP modules: Marvin, clint, fabric.
For an overview of what the script does, see the sketch below. To kick it off, run with the --exec flag.
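Condensed to pseudocode, the reboot loop looks like this sketch (evacuate and reboot_and_wait are hypothetical stand-ins for the real XAPI/CloudStack calls):
def evacuate(host):
    print("Live migrating all VMs away from %s ..." % host)
def reboot_and_wait(host):
    print("Rebooting %s and waiting for it to rejoin the pool..." % host)
def rolling_reboot(poolmaster, slaves):
    # Poolmaster first, then the remaining hosts one by one, so at most
    # one hypervisor is empty at any time (the N+1 requirement)
    for host in [poolmaster] + slaves:
        evacuate(host)
        reboot_and_wait(host)
rolling_reboot("xen01", ["xen02", "xen03", "xen04"])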
For usage, run:
./xenserver_rolling_reboot.py
Examples:
To display the help message for 'CLUSTER-1':
./xenserver_rolling_reboot.py --clustername CLUSTER-1
To prepare the rolling reboot for 'CLUSTER-1':
./xenserver_rolling_reboot.py --clustername CLUSTER-1 --prepare
To start the rolling reboot for 'CLUSTER-1':
./xenserver_rolling_reboot.py --clustername CLUSTER-1 --exec
To start the rolling reboot for 'CLUSTER-1' and use 6 threads (instead of the default 5):
./xenserver_rolling_reboot.py --clustername CLUSTER-1 --threads 6 --exec
To start the rolling reboot for 'CLUSTER-1' but skip the host called 'host1':
./xenserver_rolling_reboot.py --clustername CLUSTER-1 --ignore-hosts host1 --exec
This script lists all entries in the HA-Worker table. This is useful when a hypervisor failed and you need to know the impact (to send out a notification e-mail to customers for example).
The hostname of the MySQL server needs to be provided as an argument. The password for the cloud user is optional; when it is not supplied, the script will try to look it up in the 'config' file. To make that work, add a section named after the MySQL hostname containing mysqlpassword=password.
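Under the hood this amounts to a query like the sketch below (assuming the cloud database's op_ha_work and vm_instance tables; host and credentials are illustrative):
import mysql.connector
conn = mysql.connector.connect(host="mysql001", user="cloud", password="secret", database="cloud")
cursor = conn.cursor()
# Join the HA worker entries to vm_instance to get the VM names
cursor.execute("SELECT v.name, h.type, h.step, h.created FROM op_ha_work h JOIN vm_instance v ON v.id = h.instance_id")
for name, ha_type, step, created in cursor:
    print("%s %s %s %s" % (name, ha_type, step, created))
conn.close()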
For usage, run:
./listHAWorkers.py
Examples:
To display the HA-worker table for a CloudStack instance using MySQL server 'mysql001':
./listHAWorkers.py --mysqlserver mysql001
To display the HA-worker table for a CloudStack instance using MySQL server 'mysql001' and specify password 'passwd':
./listHAWorkers.py --mysqlserver mysql001 --mysqlpassword passwd
To display only records about 'hypervisor001':
./listHAWorkers.py --mysqlserver mysql001 --hypervisor-name hypervisor001
To display only records about 'hypervisor001' for VMs that are currently not running:
./listHAWorkers.py --mysqlserver mysql001 --hypervisor-name hypervisor001 --non-running
To display only records about VMs with a certain name:
./listHAWorkers.py --mysqlserver mysql001 --name-filter testvm
To generate a plain table:
./listHAWorkers.py --mysqlserver mysql001 --hypervisor-name hypervisor001 --plain-display
If you want to e-mail a list of VMs that were running on a hypervisor, use:
./listHAWorkers.py --mysqlserver mysql001 --hypervisor-name hypervisor001 --plain-display | awk {'print $1'}
Most scripts perform only a single operation, like migrating one instance. Combined with the ./listVirtualMachines.py script, you can create powerful workflows.
This selects all routers on 'CLUSTER-3' that have 2 NICs, and upgrades them:
./listVirtualMachines.py -c config_cloud_admin --oncluster CLUSTER-3 --only-routers-to-be-upgraded --router-nic-count=2 |\
grep -E '[r]\-(.*)\-VM' | cut -d\| -f6 | awk {'print "./upgradeRouterVM.py -r " $1 '} | sh
You can also feed the output of a CloudMonkey call straight to one of the scripts, like this (when using table display):
cloudmonkey list clusters filter=name |\
tr -d '|' |\
tr -d '+' |\
grep -v name |\
grep -v "\-\-" |\
grep -v count |\
grep -v host |\
grep -v cluster |\
tr -d ' ' |\
awk {'print "./listVirtualMachines.py --oncluster " $1 '} | sh
This will execute the ./listVirtualMachines.py script for each result of the cloudmonkey call.
Note: for this to work, you need to disable colors in cloudmonkey:
cloudmonkey set color false
The scripts below use certain hacks or are for specific use cases. Feel free to use them, but be warned that you may need to tweak them to get them to work. This is not for the ordinary or inexperienced user.
This script will put a XenServer hypervisor in maintenance mode and is also able to cancel it. It makes sure only one host is in maintenance at the same time.
In theory, all you have to do is use the prepareHostForMaintenance() API call. In practice, we learned that this sometimes fails with the resource state ErrorInMaintenance, or simply gets stuck in the PrepareForMaintenance state.
Another issue we had was that CloudStack and XenServer would sometimes disagree on the available resources. That's why we came up with a different approach: we look for all running VMs, migrate them away using separate calls, and finally put the host in maintenance. If that does not work, we call XAPI to migrate them anyway. Our goal is to automatically empty a hypervisor so we can do automated maintenance without user impact. We used it to automatically patch XenServers, including reboots, without downtime for users.
In DRY-RUN mode, a simulation of the manual migration of all VMs is done. This allows you to spot problems beforehand.
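The XAPI fallback looks roughly like this sketch (assuming the XenAPI Python module that ships with XenServer; host names and credentials are illustrative):
import XenAPI
session = XenAPI.Session("https://hypervisor001")
session.xenapi.login_with_password("root", "secret")
try:
    target = session.xenapi.host.get_by_name_label("hypervisor002")[0]
    for vm_ref in session.xenapi.VM.get_all():
        rec = session.xenapi.VM.get_record(vm_ref)
        # Skip dom0 and anything not running; live migrate the rest
        if rec["power_state"] == "Running" and not rec["is_control_domain"]:
            session.xenapi.VM.pool_migrate(vm_ref, target, {"live": "true"})
finally:
    session.xenapi.logout()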
For usage, run:
./hypervisorMaintenance.py
Examples:
To see what will happen when you put host hypervisor001 in maintenance, run:
./hypervisorMaintenance.py --hostname hypervisor001
To allow more than one hypervisor in maintenance at the same time, use the --force flag:
./hypervisorMaintenance.py --hostname hypervisor001 --force
To put host hypervisor001 in maintenance, run:
./hypervisorMaintenance.py --hostname hypervisor001 --exec
To cancel maintenance for hypervisor001, run:
./hypervisorMaintenance.py --hostname hypervisor001 --exec --cancel-maintenance
This script will migrate the specified router VM to another cluster in the same zone. Be warned: it's a bit of a hack, but it helped us migrate hundreds of routers to new clusters.
Please note: there is no supported way in CloudStack to move routers around, other than live migrating them within the same cluster. For live migration between clusters to work, both clusters need access to both primary storages, and that is not the case in our setup.
Another option is to destroy the router; when a new VM is started, a new router will be created as well. This wasn't the way we wanted it to work, as it would cause too much downtime and also a lot of trouble with capacity limits, etc.
We needed this, because we wanted to move from old to new clusters. We came up with a new way:
Requirements
How does it work?
A router VM has only one disk, and this disk is always called "ROOT-" followed by an identifier. This identifier is the same as in the router VM's instance name. So, r-1234-VM has a disk called ROOT-1234.
Unfortunately, the CloudStack API does not return any results when calling listVolumes() with a name like ROOT-1234. As a work-around, we query the CloudStack database and look for the ROOT volume's UUID. Then it is possible to call the migrateVolume() API with the router VM's ROOT volume (using its UUID).
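A sketch of that work-around (assuming the cloud database's volumes table with its name, uuid and removed columns; host and credentials are illustrative):
import mysql.connector
conn = mysql.connector.connect(host="mysqlserver01", user="cloud", password="secret", database="cloud")
cursor = conn.cursor()
cursor.execute("SELECT uuid FROM volumes WHERE name = %s AND removed IS NULL", ("ROOT-1234",))
row = cursor.fetchone()
conn.close()
if row:
    print("ROOT volume uuid: %s" % row[0])  # hand this to migrateVolume()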
This script automates those steps. The whole operation usually takes ~3 minutes or less, and if you have a redundant setup you won't notice it at all, as the router will fail over.
It will also send e-mail notifications to inform the user.
Checks
To be sure it all works as expected, a lot of checks are done.
When you do not specify the destination cluster, a random cluster within the same zone is selected that has the required tags, is in state 'Enabled', and is not the same cluster as the router's peer.
Connecting to the database
You need to specify the MySQL server and, optionally, the 'cloud' user's password. If the password is not specified, the script tries to look it up in the config file 'config'. Make a section with the name of the MySQL server and specify mysqlpassword=password. See config.sample.
For usage, run:
./migrateRouterVM.py
Examples:
To migrate a router VM with instance-id 'r-1234-VM' in DRY-RUN mode:
./migrateRouterVM.py --mysqlserver mysqlserver01 --routerinstance-name r-1234-VM
To migrate a router VM with instance-id 'r-1234-VM':
./migrateRouterVM.py --mysqlserver mysqlserver01 --routerinstance-name r-1234-VM --exec
To migrate a router VM with instance-id 'r-1234-VM' to cluster 'CLUSTER-2':
./migrateRouterVM.py --mysqlserver mysqlserver01 --routerinstance-name r-1234-VM --tocluster CLUSTER-2 --exec
To migrate a router VM with instance-id 'r-1234-VM', specifying the MySQL password on the command line:
./migrateRouterVM.py --mysqlserver mysqlserver01 --mysqlpassword test123 --routerinstance-name r-1234-VM --exec
Restarting CloudStack while migrations are ongoing or snapshots are being made is not wise. This script shows which jobs are running, so you can see whether restarting now is OK. It's not 100% safe, but it's the best guesstimate we currently have.
For usage, run:
./listRunningJobs.py
Examples:
./listRunningJobs.py --mysqlserver mysqlserver01
This script looks up a given IP address and shows who uses it. Handy for abuse handling.
For usage, run:
./whoHasThisIp.py
Examples:
To look up who uses address 1.2.3.4:
./whoHasThisIp.py --mysqlserver mysqlserver01 --ip-address 1.2.3.4
To look for IP addresses that match '10.20.':
./whoHasThisIp.py --mysqlserver mysqlserver01 --ip-address 10.20.
Script to clean up an old IP address that is still left behind on a (VPC) router. It can use the edit_hosts.sh script (which lives on the router VM), so you don't have to manually look up the hostname and MAC address.
Make sure you copy this script to the router VM, give it exec permission (chmod 755 cleanDHCPipaddress.sh) and then run it to clean the IP addresses.
To clean an address using the provided 'edit_hosts.sh' script:
./cleanDHCPipaddress.sh 1.2.3.4
If the address is in a weird state, use force mode to clean up as much as possible:
./cleanDHCPipaddress.sh 1.2.3.4 1
This helper script gives you an overview of the networks on the platform, and can perform restarts with cleanup=true on all or a subset of them, based on the filters.
$ ./listNetworkVRs.py
+---+----------------------+----------+--------------------------------------+---------+-------------+------------+-------------+---------------------+
| # | Network | Type | ID | Domain | State | Redundant? | RestartReq? | VRs |
+---+----------------------+----------+--------------------------------------+---------+-------------+------------+-------------+---------------------+
| 1 | net-jenkins | Isolated | 8e8c6038-83f2-4063-8f32-88844608c534 | jenkins | Allocated | True | False | r-1303-VM,r-1302-VM |
| 2 | T1-VPC-NMCTX-T154226 | VPCTier | 26dd8f6e-e01e-4ddd-8160-ff8143b95ffd | ROOT | Implemented | False | False | r-1300-VM |
| 3 | T2-VPC-NMCTX-T154226 | VPCTier | dfe1c73e-87d1-4b58-a004-3071268aef7d | ROOT | Implemented | False | False | r-1300-VM |
| 4 | test-network | Isolated | 2260393c-1913-4605-88f0-d4302573581e | test | Allocated | True | True | |
| 5 | VPC-NMCTX-T154226 | VPC | 53cab7c8-8560-4adc-a325-731260eebc77 | ROOT | Enabled | False | False | r-1300-VM |
+---+----------------------+----------+--------------------------------------+---------+-------------+------------+-------------+---------------------+
Perform restarts on a subset (only VPCs with redundant VRs):
$ ./listNetworkVRs.py --exec -r --type VPC --onlyRR
This script will assist you in maintaining users across several domains and - especially - expiring them.
$ ./listUsers.py -u n.tavares
+---+--------+----------+------+-----------+--------------------------------------+-------+---------+----------+
| # | Domain | Account | Type | Username | Id | First | Last | State |
+---+--------+----------+------+-----------+--------------------------------------+-------+---------+----------+
| 1 | ROOT | admin | 1 | ntavares | 37c92e7f-5bb6-485e-ba33-4db0e3f503ee | Nuno | Tavares | enabled |
| 2 | CDN | cdn | 2 | ntavares | 3e4bd323-9312-4936-ac94-1e1277b43c80 | Nuno | Tavares | enabled |
+---+--------+----------+------+-----------+--------------------------------------+-------+---------+----------+
$ ./listUsers.py --disable -u n.tavares --domain CDN
This is a tool to detect problems in all components of a CloudStack cloud. It attempts to pinpoint problems as precisely as possible, and suggests actions to be taken. For certain problems, it is actually able to self-heal. For each suggested action, an impact analysis is also made (Safety Level), so you can use the tool to automate some of the repair tasks up to a specified safety level. Also, because some of the tests may take long or produce a lot of results (due to circumstantial conditions, e.g. after an upgrade), there is a --deep switch to enable them.
Currently, this tool integrates with (and depends on) other support and monitoring tools, mostly using them in the so-called "quick" mode (the default). To perform a live assessment, there is a --live switch. Generally speaking, the filters you specify dictate the scope of the actions requested (if any).
$ ./listAdvisories.py -h
Usage: ./listAdvisories.py [options]
--config-profile -c <profile> Specify the CloudMonkey profile name to get the credentials from (or specify in ./config file)
--plain-display Enable plain display, no pretty tables
--repair Apply suggested actions - at Safe/Best level
Modifiers:
--exec Disable dry-run mode. You'll need this to perform changes to the platform.
--debug Enable debug mode. Use it multiple times to increase verbosity
--live Perform live scan. By default, quick mode is used (using deferred/cached collection methods)
--deep Enable further tests that usually produce a lot of results. For a list of tests, use -h with this option
--email Send Repair Report by email
Filters:
-n Scan networks (incl. VPCs)
-r Scan routerVMs
-i Scan instances
-s Scan systemVMs
-H Scan hypervisors
-t Scan resource usage
--all Report all assets of the selected types, independently of the presence of advisory
--safety <safety> Filter out advisories that are not at the specified safety level (default: Best)
There is extensive documentation, but to simplify it, we will now start documenting the check and repair support in the tool itself. To see the list of supported tests and actions, along with the "depth" level they belong to:
$ ./listAdvisories.py -h --deep
List of tests available
+------------+--------+--------------------------------------------------------------------------------------------------+-----------+----------+
| Scope | Level | Symptom / Probe / Detection | Detection | Recovery |
+------------+--------+--------------------------------------------------------------------------------------------------+-----------+----------+
| network | Normal | Flag restart_required | True | True |
| network | Normal | Redundancy state inconsistency (needs -r) | True | True |
| router | Normal | Redundancy state | True | True |
| router | Normal | Output of check_routervms.py is non-zero (dmesg,swap,resolv,ping,fs,disk,password) | True | True |
| router | Deep | Checks if router is running with the latest systemvm template version | True | True |
| router | Normal | Checks if router has requiresUpgrade flag on | True | True |
| router | Deep | Checks if router is based on the same package version than management (router.cloudstackversion) | True | True |
| instance | Normal | Try to assess instance read-only state | True | False |
| instance | Normal | Queries libvirt usage records for abusers (CPU, I/O, etc) | True | False |
| hypervisor | Normal | Agent state (version, conn state) | True | False |
| hypervisor | Normal | Load average | True | False |
| hypervisor | Normal | Conntrack abusers | True | False |
| hypervisor | Normal | check_libvirt_storage.sh correct functioning | True | False |
| systemvm | Normal | Output of check_appliance.py is non-zero (dmesg,swap,resolv,ping,fs,disk,websockify) | True | True |
| systemvm | Deep | Checks if systemvm is running with the latest systemvm template version | True | True |
+------------+--------+--------------------------------------------------------------------------------------------------+-----------+----------+
The scripts have been well tested during migrations, but there could still be bugs (in handling unexpected conditions, for example). If you encounter problems, please open an issue.