cloudprober

Active monitoring software that detects failures before your customers do.

APACHE-2.0 License



cloudprober - v0.11.7 - New probe status UI, bug fixes, and other minor improvements

Published by manugarg over 2 years ago

Probe Status UI

Cloudprober gets its own probe status UI \o/ (screenshot). Until now, we only exported metrics in the formats understood by other monitoring systems like Prometheus; there was no way to decipher the probe status using Cloudprober alone. This release changes that. It adds an option to show probe status in the Cloudprober UI itself. To enable this UI, add the following stanza to your config:

surfacer {
  type: PROBESTATUS
}

By default, the probe status UI will be available at the /probestatus URL[1]. This and other parameters, like retention size, can be changed in the config. Note that all data stays in memory, so it is lost when Cloudprober restarts.

[1] Eventually, the probe status UI will be enabled by default and will show up at the /status URL.

Ping probe fixes

  • Fix privileged (raw socket) IPv4 ping (#123).
  • On Windows, disable use_datagram_socket by default (#104), and fix source IP handling when no source IP is provided (#127).
  • Add real ping tests, and run them through GitHub Actions (#125, #126).

Other improvements

  • Improve startup time in non-Cloud environments (from 6s to 100ms) (#124).
  • [gRPC probe] Add TLS support (#116).
  • [gRPC probe] Add health check tests (#117).
  • [Targets discovery] Add support for GCE global forwarding rules.

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.11.6...v0.11.7

cloudprober - v0.11.6 - Enhancements and bug fixes

Published by manugarg over 2 years ago

Enhancements

  • HTTP Probe: Provide a way to improve coverage for load balanced targets. (#79)

    • Make HTTP connection behavior more deterministic. When doing multiple requests per probe, use a different client (connection) for each request. With this change, you can configure your probe in keep_alive mode and increase connection diversity (final endpoint coverage) by setting requests_per_probe to a large number, say 100.
      probe {
        type: HTTP
        targets {
          host_names: "some-loadbalanced-target"
        }
        ...
        http_probe {
          keep_alive: true
          requests_per_probe: 100
        }
      }
      
    • Allow introducing a gap between HTTP requests. (#89)
  • Add a new probe type: TCP. For now, it simply tries to connect to the given host and port, and increments success and records latency if the connection is successful. This is useful for probing non-HTTP TCP endpoints. (#96)

  • External probe: Support aggregating labeled metrics. (#90)

    • Earlier, output lines with labels were skipped from processing if the aggregate_in_cloudprober option was enabled. See redis_probe.go for an example probe that emits labels in its output, and cloudprober_aggregate.cfg for the corresponding config that aggregates that output into distributions (histograms).
  • Introduce the concept of negative tests. If a probe has the negative_test option enabled, it reports success when the check fails. For example, if you configure a ping probe with negative_test: true, the probe will succeed only if it doesn't receive any replies. (#98)

  • Add the ability to add target IP to the labels (#82, #83, #84, #86, #87).

    • You can now use @target.ip@ to add the resolved IP to probe labels. For HTTP, it works only if the probe is configured to resolve the IP first, through the resolve_first option.
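
The TCP probe and negative test behavior described above can be sketched in Go as follows. This is only an illustration under stated assumptions, not Cloudprober's implementation: tcpCheck and probeSucceeded are hypothetical helper names, and the local listener exists only so that the example runs without an external target.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// tcpCheck (hypothetical helper) tries to open a TCP connection to addr and
// returns how long the connection attempt took. A non-nil error means the
// connection failed.
func tcpCheck(addr string, timeout time.Duration) (time.Duration, error) {
	start := time.Now()
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	return time.Since(start), nil
}

// probeSucceeded (hypothetical helper) maps the raw check result to the
// reported probe result. With negativeTest set, a failed check counts as a
// probe success, and a passing check counts as a probe failure.
func probeSucceeded(checkOK, negativeTest bool) bool {
	if negativeTest {
		return !checkOK
	}
	return checkOK
}

func main() {
	// Start a throwaway local listener so the example does not depend on
	// any external host.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()

	latency, err := tcpCheck(ln.Addr().String(), time.Second)
	fmt.Println("check ok:", err == nil, "latency:", latency)
	fmt.Println("regular probe success:", probeSucceeded(err == nil, false))
	fmt.Println("negative probe success:", probeSucceeded(err == nil, true))
}
```

Keeping the inversion in one small function means the check itself stays unchanged; only the interpretation of its result differs between a regular and a negative test.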

Bug fixes

  • Fix ping probe for IPv4 on macOS: the ping probe stopped working for IPv4 on some macOS platforms. It's unclear whether this affected only some macOS versions (see #80), but the latest fix should make it work on all variations. (#81)
  • Fix a bug in OAuth handling. If resolve_first was not configured, we were not refreshing the OAuth token in outgoing HTTP requests. We've now added tests to verify this behavior. (#95)
  • Fix a bug in GCE OAuth error handling. (#102)

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.11.4...v0.11.6

cloudprober - v0.11.5 - Enhancements and bug fixes

Published by manugarg over 2 years ago

This release was merged into v0.11.6.

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.11.4...v0.11.5

cloudprober - v0.11.4 - Minor enhancements

Published by manugarg over 2 years ago

This release cycle has been more about documentation updates and build tooling improvements. Here are some of the functional changes since the last release (v0.11.3):

  • Provide an option to change the latency metric name (b5860e3).
  • Add support for specifying timeout and interval in a time.Duration-parseable string format: 2ms, 4s, 5m, etc. (https://github.com/cloudprober/cloudprober/pull/29).
  • Prometheus surfacer: Remove metrics that haven't been updated for 10 min (8b1d7e5).
  • Cloudwatch surfacer: Make Cloudwatch region configurable (9b743cb).
  • DNS Probe: Generate a new DNS transaction ID for each probe (2295b84).
  • Better cleanup to allow running multiple instances of Cloudprober sequentially (#33).
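
To see which strings qualify as time.Duration-parseable, here is a minimal Go sketch built on the standard library's time.ParseDuration. The parseMillis helper is a hypothetical name used only for illustration; how Cloudprober itself consumes these values is not shown here.

```go
package main

import (
	"fmt"
	"time"
)

// parseMillis (hypothetical helper) converts a Go duration string such as
// "2ms", "4s", or "5m" into whole milliseconds.
func parseMillis(s string) (int64, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, err
	}
	return d.Milliseconds(), nil
}

func main() {
	// The last entry has no unit and is rejected by time.ParseDuration.
	for _, s := range []string{"2ms", "4s", "5m", "1h30m", "10"} {
		ms, err := parseMillis(s)
		if err != nil {
			fmt.Printf("%q: invalid duration: %v\n", s, err)
			continue
		}
		fmt.Printf("%q = %d ms\n", s, ms)
	}
}
```

Note that a bare number such as "10" is not a valid duration string; a unit suffix (ms, s, m, h, and so on) is required.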

Complete list of commits since the last release: https://github.com/cloudprober/cloudprober/compare/v0.11.3...v0.11.4

cloudprober - v0.11.2: Bug fix and enhancements

Published by manugarg almost 3 years ago

Enhancements

More accurate "ping" latency measurements
Cloudprober now uses the SO_TIMESTAMP socket option (on non-Windows OSes) to make latency measurements much more accurate. (#546)

Better isolation in HTTP probes for multiple targets
HTTP probes for multiple targets execute in independent goroutines now. This allows for better isolation between probes.

Add HTTP header validator for HTTP probe
You can now add a header validator to an HTTP probe (example).

Bug Fixes

#537 Datagram socket ping (unprivileged ping) over IPv4 is broken for MacOS.
#554 Error initializing cloudprober on GKE.
#584 Goroutine leaking bug in external probe.

Other improvements and fixes.

Complete list of commits since the last release: https://github.com/google/cloudprober/compare/v0.11.1...v0.11.2
Bugs/PR in this release: Milestone v0.11.2.

cloudprober - v0.11.3 - Lot of new features, and a couple of bug fixes.

Published by manugarg almost 3 years ago

Metrics export enhancements

  • Two shiny new surfacers: AWS CloudWatch and Datadog. (#141 & #583).
  • You can now configure any surfacer to export metrics as gauges. You just need to add export_as_gauge to the surfacer config. (#604)
  • Also, you can now add a failure metric to any surfacer by adding add_failure_metric to the surfacer config. (#604)
  • A common way to filter metrics. You now have fine-grained control of which metrics will be exported to which surfacer, e.g. you may want to send only critical probe metrics to an expensive monitoring system, while you may send all metrics to pub/sub or files. (#597)
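
As a rough illustration of what a derived failure metric might contain, the Go sketch below computes it as total minus success. That formula is an assumption made for this example, as are the withFailure helper and the metric names "total" and "success"; consult the surfacer documentation for the actual behavior of add_failure_metric.

```go
package main

import "fmt"

// withFailure (hypothetical helper) returns a copy of metrics with an added
// "failure" entry. Assumption: failure is derived as total - success. If
// either input metric is missing, no failure entry is added.
func withFailure(metrics map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(metrics)+1)
	for k, v := range metrics {
		out[k] = v
	}
	total, hasTotal := metrics["total"]
	success, hasSuccess := metrics["success"]
	if hasTotal && hasSuccess {
		out["failure"] = total - success
	}
	return out
}

func main() {
	in := map[string]int64{"total": 10, "success": 7}
	out := withFailure(in)
	fmt.Println("failure:", out["failure"])
}
```

Returning a copy leaves the original metric set untouched, so the same data could still be sent unchanged to a surfacer that does not ask for the extra metric.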

Labeling

  • Multiple substitutions: We now support multiple substitutions in labels, e.g. a label like @target.name@:@target.port@@target.relative_url@ will get appropriately substituted. (#582)
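
The idea of substituting several @...@ placeholders in a single label value can be sketched in Go as below. This is a simplified illustration, not Cloudprober's code: the substitute helper, the regular expression, and the sample values are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tokenRE matches placeholders of the form @name@.
var tokenRE = regexp.MustCompile(`@[^@]+@`)

// substitute (hypothetical helper) replaces every @key@ placeholder in tmpl
// with values[key]. Placeholders without a matching key are left untouched.
func substitute(tmpl string, values map[string]string) string {
	return tokenRE.ReplaceAllStringFunc(tmpl, func(tok string) string {
		key := strings.Trim(tok, "@")
		if v, ok := values[key]; ok {
			return v
		}
		return tok
	})
}

func main() {
	// Sample values, made up for this example.
	values := map[string]string{
		"target.name":         "web-1",
		"target.port":         "8080",
		"target.relative_url": "/healthz",
	}
	label := "@target.name@:@target.port@@target.relative_url@"
	fmt.Println(substitute(label, values)) // prints "web-1:8080/healthz"
}
```

Back-to-back placeholders such as @target.port@@target.relative_url@ are handled because each match consumes its own closing @ before the next match begins.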

Targets Discovery

  • RDS provider for file-based targets, to allow sharing a large set of targets across multiple probes.
  • Implement caching in the RDS protocol. No need to re-process (on RDS server or client) targets that haven't changed. (#646)
    [The above two changes allowed Cloudprober to work with 1M targets in a single targets file. (#634)]
  • Add "node" and "pod" labels for K8s endpoints resources. (#612)

Bug Fixes

  • External probe: Generate metrics even if requests time out. (#653)
  • External probe: Add additional labels to payload based metrics as well. (#654)
  • HTTP Probe: Fix parallelism when running multiple requests per probe. (#647)

Complete list of commits since the last release: https://github.com/cloudprober/cloudprober/compare/v0.11.2...v0.11.3
Bugs/PR in this release: Milestone v0.11.3.