rpc-maas

Ansible playbooks for deploying Rackspace Monitoring-as-a-Service within OpenStack environments

APACHE-2.0 License

Stars
32
Committers
52


rpc-maas - 1.8.1 Latest Release

Published by rpc-jenkins-svc almost 6 years ago

Release Notes

1.8.1

Bug Fixes

  • Add boolean type for deploy_osp in custom OSD fact gathering module.
rpc-maas - 1.8.0

Published by rpc-jenkins-svc almost 6 years ago

Release Notes

1.8.0

New Features

  • This PR allows AIO creation for RPCR osp-13. The condition is ${RE_JOB_ACTION} == osp_13_deploy
  • Ceph commands for ceph checks need to run on the host, not inside the ceph docker container, on osp-13. This PR fixes that and enables ceph checks for osp-13 gates.
  • Now that embedded ansible is required to deploy rpc-maas playbooks against RPC-O and/or RPC-R deployments, the magnanimous-turbo-chainsaw (MTC) tooling repo has been added as a submodule of rpc-maas. This repository contains resources intended to be used across all RPC Toolstack components. As part of the initial commit, the generate-environment-vars.yml playbook and bootstrap-embedded-ansible.sh wrapper have been symlinked into the main directory structure.
    1. Added missing supported memcached version 1.4.39 for osp13
    2. Adjusted client.my.conf template to take internal ip for osp13
    3. Rearranged maas-agent variables.
  • Add agent/poller installation for rpcr-maas
  • Add support for providing deployment overrides into MaaS deploy environment
  • Add Glance, Cinder and Heat support for OSP13
  • Nova, neutron, and swift checks now support rpcr OSP13
  • The osp-13 dynamic inventory now includes logic to automatically load the stackrc env file. maas_stackrc is also set by the dynamic inventory.
  • The rpcr dynamic inventory can now set the OS_CACERT environment variable automatically.
  • Adjust ceph plugin code and check templates to make maas ceph checks work with rpcr osp-13
  • This PR enables the osp-13 gate for MAAS. It does so by using a wrapper script, test-ansible-functional-osp-mnaio.sh, to invoke tests on the OSP-13 MNAIO director node. It also fixes some discrepancies between several checks, as well as missing steps in the ansible dynamic inventory scripts.
  • Horizon and keystone checks now support rpcr osp13.
  • Host and infra checks now support rpcr OSP13. A new pacemaker check type has also been added.

Upgrade Notes

  • Deployment processes can skip the step that sources /home/stack/stackrc. maas_stackrc can also be removed from variable overrides.
  • Deployment processes can skip the step that sets OS_CACERT.

Bug Fixes

  • Add agent token cleanup to raxmon and move execution of entity delete and agent token delete functions to the test-ansible-functional.sh trap.
  • Enable but do not start the rax-mon service in maas-agent-setup
  • Fix nfs_check.py to properly detect nfs file systems
  • Fix rpc-maas-tool.py to report excludedchecks as the check will always be created.
  • Gating - Remove 'enable_ironic' from aio-create.sh as the functionality is not currently supported and it is causing numerous critical alarms preventing maas-verify.yml from passing. This can be added back to the necessary release versions when required.
  • Gating - Remove maas-tigkstack-all.yml from test playbooks. This functionality is no longer supported as a product addition.
  • Gating - Enable 'maas_verify_registration' in gating to validate alarms are created in the api.
  • Gating - Enable 'maas_verify_status' in gating to validate alarms are not critical after deployments.
  • Gating - Tweak existing variable overrides for different gating release versions and actually use these overrides.
  • Gating - Remove tests/vars alarm exclusions, frequency changes, and update excluded checks to include the proper checks per release version.
  • Gating - Disable 'maas_remote_check' to exclude remote.http checks from testing as they will always fail in AIOs because of private endpoints.
  • Gating - Properly enable overrides for user_$VERSION_vars.yml based on the gate release.
  • Gating - Run maas-restart.yml after deploying rally performance checks.
  • Move the RH subscription-unregister to the gate exit tasks.
  • Fix a missed instantiation of ansible_host in rally plays without a container_address default.
rpc-maas - 1.7.10

Published by rpc-jenkins-svc almost 6 years ago

Release Notes

1.7.10

New Features

  • Managed Kubernetes k8s API checks have been moved to the maas agent inside the kubernetes cluster to align better with the device id of the cluster under monitoring and simplify management of alerts.

Bug Fixes

  • The conntrack_count.py plugin now checks for network namespaces listed at /var/run/netns and retrieves the iptables connection tracking information for each namespace. This ensures that alerts are raised for embedded network namespaces when connection tracking hashes are about to exceed a configurable threshold. Due to the limited availability of MAAS metrics per alarm, only the namespace with the highest connection tracking count is reported.
  • Latest influxdb-relay installation requires newer version of go.
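
The per-namespace reporting described for conntrack_count.py can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: the function names are hypothetical, and reading the count through `ip netns exec` and `/proc/sys/net/netfilter/nf_conntrack_count` is an assumption about how the value could be obtained.

```python
import os
import subprocess

NETNS_DIR = "/var/run/netns"  # directory named in the release note


def list_namespaces(netns_dir=NETNS_DIR):
    """Return namespace names, or an empty list if the directory is absent."""
    try:
        return sorted(os.listdir(netns_dir))
    except FileNotFoundError:
        return []


def read_count(namespace):
    """Read the conntrack count inside one namespace (assumed approach)."""
    out = subprocess.check_output(
        ["ip", "netns", "exec", namespace,
         "cat", "/proc/sys/net/netfilter/nf_conntrack_count"]
    )
    return int(out.strip())


def busiest_namespace(counts):
    """Pick the (namespace, count) pair with the highest count, or None."""
    if not counts:
        return None
    return max(counts.items(), key=lambda item: item[1])
```

Reporting only the result of `busiest_namespace` mirrors the note's point that a single namespace is reported per alarm.
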
rpc-maas - 1.7.9

Published by rpc-jenkins-svc about 6 years ago

Release Notes

1.7.9

New Features

  • Added tags to Run MaaS verify local task in maas-verify.yml for ASC test integration

Bug Fixes

  • maas_common.py was setting OS_VOLUME_API_VERSION and OS_IMAGE_API_VERSION with a default of None. This caused calls to get_auth_details to fail a key check for 'None', which in turn caused nova and neutron checks to fail if those variables were not set anywhere in the environment.
  • Some 1.7.8 checks (nova api) break in the absence of a maasrc file. This fixes it.
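
One way to avoid the None-default problem described above is to include optional settings only when they are actually present. The sketch below is an illustration under that assumption, not the change made in maas_common.py; `optional_settings` and `OPTIONAL_KEYS` are hypothetical names.

```python
import os

# The two variables named in the release note.
OPTIONAL_KEYS = ("OS_VOLUME_API_VERSION", "OS_IMAGE_API_VERSION")


def optional_settings(environ=None):
    """Return only the optional variables that are set, never None values."""
    environ = os.environ if environ is None else environ
    return {key: environ[key] for key in OPTIONAL_KEYS if key in environ}
```
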
rpc-maas - 1.7.8

Published by rpc-jenkins-svc about 6 years ago

Release Notes

1.7.8

New Features

  • Add maas rally scheme defaulting to the same as maas

Upgrade Notes

  • Refactored maas-common keystone libs to work with RPC-R

Bug Fixes

  • Fixes new queens gate
  • The rpc-r additions were unsetting a variable causing a set_fact task to fail on rpc-o environments. This is to make sure both checks use a different registered variable and set the fact based on the rpc-o vs rpc-r conditionals.
  • Map nova_console_type and nova_console_port back to a maas_* variable. These are defaults in OSA and aren't pulled in properly from the ansible role. This will allow overrides to be configured in a less obtrusive way.
rpc-maas - 1.7.7

Published by rpc-jenkins-svc about 6 years ago

Release Notes

1.7.7

Bug Fixes

  • Fix the bonding mii split logic to remove leading and trailing spaces and calculate the slave_count outside the loop determining if a slave is down.
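
A rough sketch of the parsing approach this fix describes, assuming input in the format of /proc/net/bonding/<bond>. The function names are hypothetical and this is not the plugin's actual code; it only shows stripping whitespace from each split field and computing slave_count once, outside the loop that looks for a down slave.

```python
def parse_bond(text):
    """Map each slave interface to its MII status, stripping whitespace."""
    slaves = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a key/value pair
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            slaves[current] = value
    return slaves


def bond_summary(text):
    """Return the slave count and whether any slave is not 'up'."""
    slaves = parse_bond(text)
    slave_count = len(slaves)  # computed once, not inside the status loop
    slave_down = any(status != "up" for status in slaves.values())
    return {"slave_count": slave_count, "slave_down": slave_down}
```
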
rpc-maas - 1.7.6

Published by rpc-jenkins-svc about 6 years ago

Release Notes

1.7.6

New Features

  • Adds support for designate checks in RPC-R

Upgrade Notes

  • Add slave_count to ensure the bond has redundancy and check explicitly for not "up". This ensures that a bond slave that is marked down (should only occur in a maintenance window) triggers an alert.

Bug Fixes

  • Remove the vars/maas-{{ ansible_distribution | lower }}.yml include, as it is redundant and problematic for RPC-R MaaS work
rpc-maas - 1.7.5

Published by rpc-jenkins-svc about 6 years ago

Release Notes

1.7.5

Prelude

This is a barebones script to add RPC-R support for octavia and mk8s

New Features

  • Added protocol and port option support for ironic checks, so it can support ssl and/or different port.
  • Added new bonding interface metric host_bonding_iface_<iface>_slave_down
  • nova/neutron service check plugins now produce string metrics. When the API is down and agents/services cannot reach it, the monitoring agent generates a warning with the message "cannot reach API" by emitting the string metric "<service/agent> cannot reach API". In other scenarios, the plugin emits a "Yes"/"No" metric to indicate whether the service/agent is up or down.
  • 'site-osp.yml' will prepare and install octavia and mk8s maas checks on an RPC-R environment. This feature is very limited.
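
The string metric behaviour of the nova/neutron service checks can be illustrated with a small sketch. The function name and arguments are hypothetical; only the three output strings come from the note above.

```python
API_UNREACHABLE = "cannot reach API"  # message text from the release note


def service_metric(name, api_reachable, is_up):
    """Return the string metric for one service or agent."""
    if not api_reachable:
        # The API is down, so the real state of the service is unknown.
        return "{} {}".format(name, API_UNREACHABLE)
    return "Yes" if is_up else "No"
```
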

Known Issues

  • After the script is run, the maas agent needs to be restarted manually for changes to take effect.

Upgrade Notes

  • The elasticsearch maas check is not doing a whole lot and is more noise than value. This is on top of the fact that the elasticsearch checks are not made for a deployment where elasticsearch/logstash are deployed in a high performance cluster. This change also removes the filebeat check.

Bug Fixes

  • The notification queue matching regex was not working properly. Some messages may or may not start with '/' and may or may not have 'versioned_'. This PR fixes it.
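
A pattern that tolerates both optional parts could look like the sketch below. The exact regex used by the fix is not shown in these notes; the `notifications.` base name and the helper name are assumptions made for illustration.

```python
import re

# Optional leading '/', optional 'versioned_' prefix, then the assumed
# 'notifications.' base name.
NOTIFICATION_QUEUE = re.compile(r"^/?(?:versioned_)?notifications\.")


def is_notification_queue(name):
    """Return True if the name looks like a notification queue."""
    return NOTIFICATION_QUEUE.match(name) is not None
```
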
rpc-maas - 1.7.4

Published by rpc-jenkins-svc over 6 years ago

Release Notes

1.7.4

New Features

  • Added the ability to override max_row_limit in influxdb.conf
  • Add the ability to specify the Ceph rados Gateway listen address for MaaS monitoring, using the existing ceph-ansible vars. In order of precedence, these are:
    1. radosgw_address, which specifies the address.
    2. radosgw_address_block, which specifies the address block in CIDR format.
    3. radosgw_interface, which specifies the interface on which RadosGW is listening, for example br-storage.
    If none of these options are specified, the ansible_host address will be used.
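
The precedence among these variables can be sketched as a small resolver. This is an illustration only: the function name, the `interface_addresses` mapping, and the behaviour of falling through when no address matches the CIDR block are assumptions, not the role's actual logic.

```python
import ipaddress


def resolve_rgw_address(host_vars, interface_addresses):
    """Pick the RadosGW listen address following the documented precedence.

    host_vars: dict of variables for one host.
    interface_addresses: dict mapping interface name to its IP address
    (hypothetical input shape used for this sketch).
    """
    address = host_vars.get("radosgw_address")
    if address:
        return address

    block = host_vars.get("radosgw_address_block")
    if block:
        network = ipaddress.ip_network(block, strict=False)
        for candidate in interface_addresses.values():
            if ipaddress.ip_address(candidate) in network:
                return candidate

    interface = host_vars.get("radosgw_interface")
    if interface and interface in interface_addresses:
        return interface_addresses[interface]

    # None of the options were set (or matched): fall back to ansible_host.
    return host_vars["ansible_host"]
```
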

Bug Fixes

  • Fixes the wrong auth service and the typo when calling the managed_k8s_services_local_check.py script.
rpc-maas - 1.7.3

Published by rpc-jenkins-svc over 6 years ago

Release Notes

1.7.3

New Features

  • Managed Kubernetes installs several lxc-containers for various authentication related functionality (etp, etg, auth, ui). This will monitor if the respective processes are up and the API endpoints are accessible. UI also includes checks for the LB whereas the other services only have local checks.

Bug Fixes

  • Add an extra maas_env_product variable, which defaults to 'rpco'. In a ceph environment it should be set to 'ceph'. maas_rpco_dir has also been renamed to maas_product_dir, and the osa version is left out in ceph environments.
  • Update metadata macro to include lb_ssl_expiry_check and private_lb_ssl_expiry_check in the API category.
rpc-maas - 1.7.2

Published by rpc-jenkins-svc over 6 years ago

Release Notes

1.7.2

New Features

  • If maas_rally is configured to write to an influxdb endpoint, a new metric (influxdb_success) and alarm will be created to generate alerts if writing to influxdb fails. A failure to write to influxdb is no longer fatal, which allows performance metrics to still be reported via the MaaS API even if the influxdb endpoint is unavailable.
  • Add ability to set the port used by the Ceph rados Gateway service. Use the radosgw_civetweb_port variable to set the port. This defaults to 8080 to match the ceph-ansible default, but the radosgw_civetweb_port variable must be set to the same value in your Ceph and MaaS configurations.
  • maas_rally now adds an 'influxdb_database' tag to influxdb datapoints, which allows for granular routing to different backend influxdb databases using telegraf.
  • The maas_rally task arguments are now read from the plugin's configuration file (/etc/rally/maas_rally.yml by default). This eliminates the need to look up things such as network uuids when running a performance scenario manually for troubleshooting purposes.
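
The non-fatal influxdb write described in the first item of this list can be sketched as follows. The function and its `write_to_influxdb` callable are hypothetical, and the 1/0 encoding of influxdb_success is an assumption; the point is only that a failed write is recorded as a metric instead of aborting the run.

```python
def report_performance(metrics, write_to_influxdb=None):
    """Return the metrics to report, adding influxdb_success when configured."""
    result = dict(metrics)
    if write_to_influxdb is not None:
        try:
            write_to_influxdb(metrics)
            result["influxdb_success"] = 1
        except Exception:
            # A failed write is recorded, not raised, so the remaining
            # metrics are still reported.
            result["influxdb_success"] = 0
    return result
```
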

Upgrade Notes

  • Any custom scenarios or overrides setting non-default times and/or concurrency values will need to move these settings to the task_args dictionary.
  • Any configuration overrides of the extra_vars dictionary will need to rename the dictionary to task_args.
  • After running the maas-openstack-rally.yml playbook the rally_* checks in MaaS will fail until the agent is restarted and check definitions are updated.

Bug Fixes

  • Revert limiting enablement of rgw checks to first node of each group. This was an incorrect assumption.
  • Fixes endpoint handling to better support deployment in Kilo environments
  • Adjusts rabbitmq_status check to better handle missing RabbitMQ API data
  • Raises MaaS check timeout to 59 seconds, canonizing a de facto default
  • Properly validate logical volume status if HP volume is encrypted
  • Fixed a gating issue introduced by pip 10
  • Fix rate functions for swift_account_replication_check, swift_container_replication_check, and swift_object_replication_check.
  • openstacksdk has been temporarily pinned to <0.12.0 to work around changes that break maas_rally's resource cleanup

Other Notes

  • Improvements were made to maas_rally to allow running the maas-openstack-rally.yml playbook without installing the MaaS agent. This supports use cases where rally performance scenarios need to be run without shipping metrics to the MaaS API.
rpc-maas - 1.7.1

Published by rpc-jenkins-svc over 6 years ago

Release Notes

1.7.1

New Features

  • Detailed logging was added to the maas_rally performance monitoring plugin.
  • Automatic stale lock and resource cleanup was added to maas_rally. This makes the plugin more robust and resilient to transitory environmental problems.
  • A configurable quota factor was added to the maas_rally plugin. This allows resource cleanup and performance polling to run asynchronously.
  • The maas_rally plugin will now generate an alarm event when too many consecutive intervals (default=3) required cleanup of stale resources.
  • The maas_rally plugin will now generate an alarm event when too many consecutive intervals (default=3) were aborted waiting for immature locks.
  • A rally_diag.sh script is now deployed to all utility containers. This script helps support to quickly identify resources (instances, images, etc) that were created by maas_rally.
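
The "too many consecutive intervals" alarms above amount to counting a streak of flagged intervals. The sketch below assumes the alarm fires once the streak reaches the threshold (default 3); whether the plugin uses "reaches" or "exceeds" is not stated here, and the function name is hypothetical.

```python
def should_alarm(history, threshold=3):
    """Return True if the most recent intervals were all flagged.

    history: list of booleans, oldest first; True means the interval
    required stale-resource cleanup (or was aborted waiting for a lock).
    """
    streak = 0
    for flagged in reversed(history):
        if not flagged:
            break
        streak += 1
    return streak >= threshold
```
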

Bug Fixes

  • Limit ceph_cluster_stats and ceph_mons_stats checks to groups['mons'][0] and ceph_rgw_stats to groups['rgws'][0] to prevent duplicate alarms on ceph clusters.

  • Properly configure agent.plugin timeout value in plugin arguments.
  • Add override to swift-recon checks and include a parser for timeout in swift-recon.py.
  • Added more meaningful process info in neutron_ovs_agent alarm exception message.

  • Added a new status_err_no_exit function call to allow plugins like neutron_ovs_agent_check.py to run their course and report correct metrics

  • Fixed an exotic KeyError premature exit of the rabbitmq_status.py _get_node_metrics check path. (See https://core.rackspace.com/ticket/180307-12728 for reference)

  • Use the new status_err_no_exit function call to allow plugins to run their course and report correct metrics

  • Fixed an exotic CalledProcessError premature exit of the swift quarantine check path. (See https://core.rackspace.com/ticket/180307-05355 for reference)

  • Use the new status_err_no_exit function call to allow plugins to run their course and report correct metrics

  • Fixed an exotic KeyError premature exit of the rabbitmq_status check path.

  • Disable capacity-related checks: cinder_vg_check, ironic_capacity_check, and nova_cloud_stats_check.
  • Disable alarms for CDM checks on all hosts except groups['shared-infra_hosts']. This includes cpu_check, disk_utilisation, and memory_check.
  • Disable alarms for network_throughput across all hosts.

  • Changes to galera_check:
    • Limit enablement to groups['galera_all'][0].
    • Remove alarm for aborted_clients.
  • Changes to rabbitmq_status:
    • Limit enablement to groups['rabbitmq_all'][0].
    • Modify metric msgs_excl_notifications to sum messages from consumed queues only.
    • Add metric msgs_without_consumers to sum messages from unconsumed queues only.
    • Fix bug in rabbitmq_qgrowth_excl_notifications alarm by removing the division by check period. This is automatically handled by the rate() function.
    • Replace the rabbitmq_queues_without_consumers alarm with rabbitmq_msgs_without_consumers. This will alarm if unconsumed messages reach the default threshold of 20000.
    • Remove default var for unused maas_rabbitmq_queues_without_consumers_limit.
    • Update maas_rabbitmq_queued_messages_excluding_notifications_threshold to 5000.
    • Add maas_rabbitmq_messages_without_consumers_threshold, defaulting to 20000.

  • Update maas_swift_container_replication_avg_time_threshold from 50 to 300.
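
The split between consumed and unconsumed queues in the rabbitmq_status changes can be sketched as below. This is an illustration, not the plugin's code: it assumes queue records shaped like the RabbitMQ management API's queue objects (with "messages" and "consumers" fields), uses a hypothetical function name, and leaves out any filtering of notification queues.

```python
def summarize_queues(queues):
    """Sum messages separately for queues with and without consumers."""
    msgs_excl_notifications = 0   # messages on queues that have consumers
    msgs_without_consumers = 0    # messages on queues nobody is consuming
    for queue in queues:
        messages = queue.get("messages", 0)
        if queue.get("consumers", 0) > 0:
            msgs_excl_notifications += messages
        else:
            msgs_without_consumers += messages
    return {
        "msgs_excl_notifications": msgs_excl_notifications,
        "msgs_without_consumers": msgs_without_consumers,
    }
```

With the defaults listed above, an alarm on msgs_without_consumers would compare its value against 20000.
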

Other Notes

  • The user configured in openrc_os_username (admin by default) will be granted the admin role on each project created for maas_rally scenarios. This facilitates listing swift containers in the rally_diag.sh script.
rpc-maas - 1.7.0

Published by rpc-jenkins-svc over 6 years ago

Release Notes

1.7.0

  • MaaS for Designate (initial stage)
  • Some MaaS rally improvements: load plugin from class, alarm checks are now parameterized (critical and warning), and config file location adjustment
  • Switches check now uses Octavia V2 API
  • Ceph gateways are more stable now.
rpc-maas - 1.6.0

Published by rpc-jenkins-svc over 6 years ago

Release Notes

1.6.0

No release notes

rpc-maas - 1.5.0

Published by rpc-jenkins-svc almost 7 years ago

Release Notes

1.5.0

No release notes

rpc-maas - 1.4.0

Published by rpc-jenkins-svc almost 7 years ago

Release Notes

1.4.0

No release notes

rpc-maas - 1.3.1

Published by rpc-jenkins-svc almost 7 years ago

Release Notes

1.3.1

No release notes

rpc-maas - 1.3.0

Published by rpc-jenkins-svc almost 7 years ago

Release Notes

1.3.0

No release notes

rpc-maas -

Published by supermari0 almost 7 years ago

rpc-maas - Release 1.2.1

Published by major about 7 years ago

f685fba Fix ternary logic for setting holland_venv_bin
e66cd5c Allow holland to deploy on all rpc versions
924ae22 Fix typos in plugin and template
00e70d1 Fix issue with template population