cloudprober

Active monitoring software that detects failures before your customers do.

APACHE-2.0 License

Downloads: 166
Stars: 479
Committers: 88

cloudprober - v0.13.7 - FIPS, improved include support and consistency fixes Latest Release

Published by manugarg 2 months ago

User visible changes

Other fixes and improvements

Docs updates

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.13.6...v0.13.7

cloudprober - v0.13.6

Published by manugarg 4 months ago

What's Changed

Targets improvements

  • [targets.resolver] Make cloudprober's resolver more reliable. Note that this resolver is used only
    if you configure resolve_first in your probe.
    • On failure, attempt a refresh on each resolve call. Earlier, we would wait for the cache TTL to expire,
      causing delays in getting a good resolution. by @manugarg in #753, #755
  • [targets.resolve] Provide a way to specify the network for the DNS server override. This allows users to
    use DNS servers over TCP, which may be critical in enterprise/firewall settings where UDP is sometimes
    restricted. by @manugarg in #756
  • [targets.file] Support YAML format for target files and allow specifying URLs in resources. This will
    make specifying targets through files even more convenient. by @manugarg in #777, #778
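
The refresh-on-failure behavior described for the resolver above can be illustrated with a small sketch. This is not Cloudprober's code (Cloudprober is written in Go); `CachingResolver`, `lookup`, `ttl`, and `clock` are hypothetical names chosen for illustration, and the lookup function is injected so the example runs without network access. What the real resolver returns to the caller on failure is not stated in these notes; the sketch simply re-raises the error.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachingResolver:
    """Illustrative resolver cache; not Cloudprober's implementation."""

    def __init__(self, lookup: Callable[[str], str], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        # name -> (last known IP or None, time of last attempt, last attempt failed?)
        self._cache: Dict[str, Tuple[Optional[str], float, bool]] = {}

    def resolve(self, name: str) -> str:
        now = self._clock()
        entry = self._cache.get(name)
        if entry is not None:
            ip, attempted_at, last_failed = entry
            fresh = (now - attempted_at) < self._ttl
            # Serve from cache only if the last attempt succeeded and is fresh.
            if ip is not None and fresh and not last_failed:
                return ip
        try:
            ip = self._lookup(name)
        except OSError:
            # Remember the failure so the next call retries immediately
            # instead of waiting for the TTL to expire.
            previous_ip = entry[0] if entry is not None else None
            self._cache[name] = (previous_ip, now, True)
            raise
        self._cache[name] = (ip, now, False)
        return ip
```

With a long TTL, a failed first lookup is followed by a fresh attempt on the very next call, while later calls are served from the cache until the TTL expires.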

Surfacers improvements

  • [surfacers] Ignore sysvars metrics in the file surfacer that is added by default. This reduces noise in the
    default logs. by @manugarg in #749.
  • [config] Provide a way to specify surfacers config independently. This will allow multi-tenant
    deployments to configure surfacers centrally. by @manugarg in #750.
  • [surfacers] Provide a way to add labels to all metrics through environment variables. by @manugarg
    in #759.

HTTP Probe

Bug fixes

Documentation

Build maintenance

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.13.5...v0.13.6

cloudprober - v0.13.5 Couple of bug fixes and support for reading files from cloud

Published by manugarg 5 months ago

What's Changed

Bug fixes

  • [probes.external] Fix a bug in serverutils (#748). If you run an external probe in server mode, and your external probe binary uses the serverutils Go library, you may be impacted by this bug. This bug was introduced in the last version (v0.13.4) during refactoring for more test coverage.
  • [probes.external] Fix logical race condition in external server probe handling by @apolcyn in https://github.com/cloudprober/cloudprober/pull/744

Other changes

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.13.4...v0.13.5

cloudprober - v0.13.4 - Improved gRPC and external probes, HTTP latency breakdown and more

Published by manugarg 6 months ago

New Features / Enhancements

  • Latency Breakdown for HTTP Probes
    Provide a way to report latency breakdown for HTTP probes (#699).

  • gRPC Probe Enhancements

    • Wait for a successful connection before sending the request (#726, #729). By default, gRPC's DialContext returns immediately, even for non-retryable errors. This mechanism doesn't work very well for probing -- there is no error message to show problems with the connection, and the connection is left in the TRANSIENT_FAILURE state. We now use grpcurl's BlockingDial, which takes care of these issues.
    • Use client TLS by default for encryption. (#727)
    • Capture request error messages better. (#731)
  • Streaming Metrics for external probes
    Parse and export external probes' metrics as soon as they are available. (#708, #712, #713, #715, #716, #722)
    This feature enables use cases where an external probe runs infrequently (say, once every 60s) but performs many tasks per run (for performance measurement, for example) and exports results many times (say, every 5s) within that interval. See the discussion in #691 and #689 for more background.

  • Bulk writes in postgres surfacer
    Batch postgres surfacer writes to improve performance (#717).

  • Allow DNS Overrides
    Allow overriding DNS server (#707).
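
The batching idea behind the postgres surfacer change listed above can be sketched generically. This is an illustration only, not the surfacer's code: `BatchWriter`, `write_batch`, and `batch_size` are hypothetical names, and the database write is replaced by an injected callback so the example runs on its own.

```python
from typing import Any, Callable, List


class BatchWriter:
    """Buffers rows and hands them to write_batch in groups (illustrative)."""

    def __init__(self, write_batch: Callable[[List[Any]], None],
                 batch_size: int = 100) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._buffer: List[Any] = []

    def add(self, row: Any) -> None:
        # One call to write_batch per full buffer, instead of one per row.
        self._buffer.append(row)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Write out whatever is buffered; call this on shutdown as well.
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._write_batch(batch)
```

The performance gain comes from replacing many small writes with fewer, larger ones; the trade-off is that rows sit in memory until the next flush.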

Other Changes

  • Add a command line flag to control prometheus metrics prefix: --prometheus_metrics_prefix (#732).
  • Fix postgres surfacer metric filter (#711).
  • [prober.saveconfig] Write probe configs to disk in order (#721).
  • [tls] Reload server certificates as well (#719).
  • [build.tools] Fix where python proto stub is copied to (#700).
  • [build] Update net dependency for sec alert (#728).
  • [build] Update protobuf package to fix security alert (#714).

Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.13.3...v0.13.4

What's Changed

  • Probe scheduling capability:
    • You can now run probes only on weekdays, only during business hours, or turn them off during certain time periods. (#652, #662, #683)
  • External probe server:
  • DNS Probe Improvements:
    • Allow running multiple probes in parallel. (#670)
    • Add support for TCP DNS probes. (#681)
  • Dynamic config: Provide a mechanism to save config to disk on dynamic change. If you program probes dynamically, using the gRPC interface for example, you can configure cloudprober so that it reloads the saved config on restart (see https://github.com/cloudprober/cloudprober/issues/645 for more background). (#671)
  • Jsonnet configs support. You can now write Cloudprober configs in textpb (protobuf text), YAML, JSON, and Jsonnet (#687)
  • [surfacers.otel] Additional resource attribute support. (#664)
  • [tls] Fix client cert handling when cert reloading is enabled. (#697)
  • [logs] Redirect container logs to journald. (#682)
  • [website.homepage] Add a diagram to Cloudprober homepage. (#674)
  • [servers.http] Fix /healthcheck when lameduck lister is not initialized. (#684)

Contributors: @manugarg, @cbroglie, @aitorpazos, @ls692, @AdamEAnderson

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.13.2...v0.13.3

cloudprober - v0.13.2 OpenTelemetry surfacer and composable config files

Published by manugarg 10 months ago

What's Changed

  • [config] Make cloudprober configs composable. You can now include other configs within a cloudprober config. This will make multi-team config management easier (yay!! 🎉). #643 (Note: this doesn't work for YAML configs yet.)
    • Cloudprober helm chart also supports specifying additional configs now.
  • [surfacers] Add OpenTelemetry surfacer. OpenTelemetry is becoming very popular and almost all metrics systems support it. Adding this surfacer increases Cloudprober's integration capabilities multifold. There may still be some rough edges, but give it a try. #642
  • [surfacers] "failure" metric for all. Now all surfacers, except FILE and PUBSUB, export the "failure" metric by default. You can still disable it if you want. #648
  • [surfacers] Filtering metrics by name works for all surfacers now except FILE and PUBSUB. #648
  • [probes.ping] Take a small pause between pushing packets to avoid overwhelming the network buffers by @jumpojoy in https://github.com/cloudprober/cloudprober/pull/634
  • [docs] Documentation enhancements. #627 #650
  • [build] Fix Dockerfile.dev by @jumpojoy in https://github.com/cloudprober/cloudprober/pull/632

Breaking Change

  • [targets] Rename targets.endpoints to targets.endpoint (#646). With this change, targets { endpoints {} } config fields will result in an error. Sorry for the breaking change, but since these fields were introduced only in the last release (v0.13.1), the impact should be minimal.

Security Update

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.13.1...v0.13.2

cloudprober - v0.13.1 - Opsgenie integration, alerting enhancements and other changes

Published by manugarg 12 months ago

Alerting Enhancements

  • Opsgenie integration 🎉 #570
  • Add a generic HTTP notifier. This can be used for pretty much anything. #599
  • Resolve alerts automatically wherever possible. #556, #558, #580, #561.
  • Use alert name and target to deduplicate alerts instead of condition start timestamp: #583
  • Add severity to alerts. #569
  • Add target IP to available alert fields. #548
  • Improve documentation. #573, #549, #552, #585
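
To illustrate why deduplicating on alert name and target (rather than on the condition start timestamp) collapses repeated alerts, here is a minimal sketch. The `Alert` type and the `dedup_key` and `deduplicate` helpers are hypothetical and are not part of Cloudprober's API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Alert:
    name: str
    target: str
    condition_start: float  # seconds since epoch; not part of the key


def dedup_key(alert: Alert) -> Tuple[str, str]:
    # Two alerts are considered the same incident if name and target match,
    # even when the failing condition restarted at a different time.
    return (alert.name, alert.target)


def deduplicate(alerts: Iterable[Alert]) -> List[Alert]:
    first_seen: Dict[Tuple[str, str], Alert] = {}
    for alert in alerts:
        first_seen.setdefault(dedup_key(alert), alert)
    return list(first_seen.values())
```

If the start timestamp were part of the key, a condition that clears briefly and then fails again would be treated as a brand-new alert each time.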

Other Changes (consistency, logs verbosity, documentation, etc)

  • [probes] Make DNS resolve errors behavior more consistent: #616, #619, #620
  • Provide a way to specify detailed targets configuration in the config directly. This will simplify configuration quite significantly. #606
  • [targets] Make targets optional for certain probes. #614
  • [probes] Return an error if interval is smaller than timeout. #560
  • Reduce logs verbosity: #555, #562, #563
  • [docs.targets] Improve targets documentation. #617
  • [config] Streamline config usage and loading: #622
  • [probes.http] Support for new Cloudprober internal scheme, host and path labels. #607, #608
  • [website] Fix company list fonts on the homepage. #612
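
The interval/timeout check mentioned in the list above amounts to a simple validation. The sketch below is illustrative only; `validate_timing` is a hypothetical helper, its parameter names are assumptions, and the error text is not Cloudprober's actual message.

```python
def validate_timing(interval_msec: int, timeout_msec: int) -> None:
    """Reject probe settings where a probe could time out after its next run is due."""
    if interval_msec < timeout_msec:
        raise ValueError(
            f"interval ({interval_msec}ms) is smaller than "
            f"timeout ({timeout_msec}ms)"
        )
```

Failing fast at config time is preferable to discovering overlapping probe runs at runtime.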

Build and testing

  • [build] Move a bunch of packages to internal #589, #590, #591, #592
  • [build] Test example configs during build. #543
  • [probes.test] Add tests to verify that empty configs work. #544
  • [cleanup] Cleanup usage of deprecated packages: #603, #604, #605
  • [probes.grpc] Disable connect failures test for macos. #579
  • [examples] Fix myprober example and simplify it. #568
  • [build] Don't fail fast. Run as many tests as possible. #574
  • [build] Run certain actions only in the main repository. #621

Bug fixes

  • [config] Fix bug in envSecret handling. #546
  • [config] Fix a bug in the /config-running disabling functionality. #554

Security updates

  • [security] Upgrade gRPC package to fix the security issue. #602
  • [build] Update some dependencies to fix security alerts. #577

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.13.0...v0.13.1

Enhancements

Bug fixes

Documentation

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.12.9...v0.13.0

Enhancements

Bug fixes

Internal

Build and Testing

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.12.8...v0.12.9

cloudprober - v0.12.8 Bug fixes

Published by manugarg over 1 year ago

Bug fixes

  • [probes.http] Fix HTTP request body handling and add a lot of tests. We were using a custom reader for the HTTP request body to optimize some aspects of it, but some parts of Go's HTTP implementation expect the request body to be a buffered reader. #408, #407, #409, #411
  • [oauth2] Fix a bug in HTTP token refresh, introduced by the above change. #422
  • [probes.http] Fix how we determine whether to change the TLS config servername. #420
  • [probes.udp] Fix additional labels handling for UDP probe. #421

Other changes

  • [alerting] Add support for email notifications. #403
  • [alerting] Fix config protobuf package name. #402
  • [build] Exclude "tip" from releases. #400
  • [probes.http] Include headers configured with header option in requests. #418

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.12.7...v0.12.8

cloudprober - v0.12.7 Alerting (new), enhancements, and bug fixes

Published by manugarg over 1 year ago

Alerting

It's official. Cloudprober is getting its own alerting functionality 🎉. This release marks the beginning of the alerting implementation. We'll add more ways to send alert notifications, but you can already trigger a command on alert. User documentation is coming soon; in the meantime, you can take a look at the alerting config proto to get an idea.

PRs: #359, #361, #364, #365, #380, #384, #387

Enhancements

  • [probes.http] Provide a way to set custom user-agent. (#317)
  • [probes.external] Add support for environment variables. (#379, #386).
  • [surfacers.datadog] Add compression, and use batching, when posting metrics. (#357, #358)
  • [surfacers] Adding a bigquery surfacer. (#274)
  • [cmd] Exit on non-flag arguments. (#372)
  • [prober] Start probing more quickly. (#382)
  • [validators] Log validation failures. (#377)

Bug fixes

  • [probes.http] Set the request's GetBody explicitly to make redirects work for POST requests with data. To make HTTP requests work the same way across Cloudprober, move HTTP request creation to a common package. This also allows for better testing. (#373, #375, #376, #381, #383)
  • [probes.http] Fix a bug in connect_event metric update. (#388)
  • [rds.gcp] Delete a zone from cache if it's not discovered again. (#318)
  • [probestatus] Fix JS variable declarations. (#321, #322)
  • [probestatus] Fix the graph issue for probes with "." in name. (#363)
  • [oauth] Fix a bug in http_token. (#371)
  • [probes.external] Synchronize access to cmdRunning (#395). Add tests to discover the same issue (#397).

Build

  • [oauth] Improve token source test reliability. (#325)
  • [build] Update protoc-gen-go version to the latest. (#374)
  • [metrics.testutils] Refactor testutils to make it more convenient. (#378)

Docs

Complete website (cloudprober.org) redesign: #327. Other PRs: #335, #344, #349, #351.

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.12.6...v0.12.7

cloudprober - v0.12.6 Bug fixes and minor enhancements

Published by manugarg over 1 year ago

Bug fixes

  • [probes.tcp] Fix a bug in TCP probe scheduler. (#297)
  • [debug] Add back debug pprof URLs. (#298)
  • [labels] Fix a bug in additional labels processing. (#301)

Enhancements

  • [targets.k8s] Automatically add anchors to the name filter regex. (#302)
  • [probes.external] Output stderr from a command by default when running external probe with once mode. (#304)
  • [probes.grpc] Use discovered target's IP for connection. (#309)
  • [probes.tcp] Add port to dst label automatically if port comes from the targets definition or discovery. (#310)
  • [surfacers.stackdriver] Customize monitoringURL prefix for Stackdriver metrics. (#306)
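
The effect of automatically anchoring a name filter regex can be shown with a short sketch. How Cloudprober adds the anchors internally is not described in these notes; the `anchored` and `filter_names` helpers below are hypothetical, and wrapping the pattern in a non-capturing group is one common way to anchor safely when the pattern contains alternation.

```python
import re
from typing import Iterable, List


def anchored(pattern: str) -> str:
    # The group keeps "a|b" meaning "exactly a or exactly b" once anchored.
    return "^(?:" + pattern + ")$"


def filter_names(names: Iterable[str], pattern: str) -> List[str]:
    rx = re.compile(anchored(pattern))
    return [name for name in names if rx.search(name)]
```

Without anchors, a filter such as `web` would also match `my-web` and `web-1`; with anchors it matches only the exact name, and a user who wants prefix matching writes `web.*` explicitly.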

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.12.5...v0.12.6

cloudprober - v0.12.5 - Better OAuth, better k8s support, couple of bug fixes and much more

Published by manugarg over 1 year ago

OAuth Enhancements

  • Provide a way to get access token from a URL. (#275)
  • Add support for getting JSON tokens from a script or a file (through the bearer token type). (#278)
  • Add k8s bearer token source to provide a convenient way to access k8s local token. (#285)

K8s Targets

  • Add native k8s targets support. (#276)
  • Refresh in-cluster token every minute. (#284)

gRPC probe

  • Add option to use insecure transport credentials. (#269)
  • Support setting headers in probe config. (#273)

Bug fixes

  • [probes.http] Don't overwrite TLS config's server name if configured explicitly. (thanks @cbroglie for catching this)
  • [probes.tcp] Allow zero config for TCP probe.

Other changes

  • [probes.http] Provide an option to limit HTTP redirects. (#289)
  • [targets] Allow space separator for static hosts. (#287)
  • [surfacers.cloudwatch] Increase buffer size to 1000 and add a timer for the cloudwatch surfacer. (#282)
  • [dev] Add a dockerfile for development and testing. (#270)
  • [oauth] Fix race and make refresh expiry buffer configurable. (#280)
  • [rds.client] Really implement the on-demand client (created by setting re_eval_sec to 0). (#279)
  • [dev] Upgrade a bunch of dependencies (#281).
  • [oauth] Make refresh_expiry_buffer_sec a common option for all OAuth (#286)

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.12.3...v0.12.5

cloudprober - v0.12.4

Published by manugarg over 1 year ago

cloudprober - v0.12.3 - Helm chart, bug fixes and enhancements

Published by manugarg over 1 year ago

🎉🎉 Cloudprober now has a helm chart to make installation on Kubernetes super convenient: https://helm.cloudprober.org/

Enhancements

  • [probes.ping] Add an option to prevent OS-level fragmentation. by @darinpeetz in #235
    • Note: this works only on Linux systems. On other systems you get a warning (#238).
  • Better support for Google Cloud Run (#258, #263, #264, #267)
  • Homepage refactor. (#118)

Bug fixes

  • [probes.http] Fix keep-alive option when doing multiple requests in parallel (#231). This bug was introduced a few months back, while trying to fix something else. To avoid regressions in the future, add comprehensive testing for keep-alive and other features (#233).
  • [probes.http] Really fix the nil config problem (#219).
  • [web] Fix the URL in the config page. (#260)

Other changes

  • Add option to configure MaxIdleConns in http prober by @darinpeetz (#232).
  • Lots of test improvements to make tests more robust and usable (#239 to #249).
  • Changes to improve build speed (#244, #250).
  • [docker] Use 'main' tag for the image built from the HEAD. (#259)
  • [docker] Simplify Dockerfile as we don't need to copy CA certs anymore. (#265)
  • [proto-go-generation] Always download protoc. (#256)
  • Added custom labels option for cloud log entries by @v-pratap in https://github.com/cloudprober/cloudprober/pull/234

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.12.2...v0.12.3

cloudprober - v0.12.2 - Minor bug fixes

Published by manugarg almost 2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.12.1...v0.12.2

cloudprober - v0.12.1 - JSON validator, improved status page and more

Published by manugarg almost 2 years ago

  • Brand new JSON validator (see #169). I am really excited about this change. It will be a game changer as it will allow testing REST APIs without writing any code at all. I'll write more about doing API probing using Cloudprober. -@manugarg (Implementation PR: #194).

  • Improved probe status page. The probe status page has gone through many changes and is the default status page now (available at /status). The status page now has a better look and feel, shows when data is not available, drops stale targets (no data for 6h), shows build info in the header, and allows selecting probes through the UI. By @manugarg through PRs: #201, #203, #208, #209, #210, #211, #212.

  • Use discovered IP addresses for probing without any additional configuration (previously, users had to configure resolve_first to use the discovered IP). By @manugarg through PRs: #198, #199.

  • Migrate AWS SDK from v1 to v2. The API changed quite a bit, so it required more changes than just changing the version number. By @robpickerill in #153.

  • Migrate ingresses from v1beta1 to v1 API. By @stylianosrigas in #215.

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.12.0...v0.12.1

cloudprober - v0.12.0 - Bug fixes and some improvements

Published by manugarg about 2 years ago

Bug fixes

Significant Improvements

Other changes

New Contributors

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.11.9...v0.12.0

cloudprober - v0.11.9 - Bug fixes and minor improvements

Published by manugarg about 2 years ago

Bug fixes and improvements

  • [HTTP Probe] Fix HTTPS probe's TLS cert validation behavior when using resolve_first. (#158, #159 by @manugarg)
  • [HTTP Probe] Set content length header when HTTP request body is configured. Without this header, Go's HTTP client assumes chunked encoding. (#168 by @steinarvk-oda)
  • [External probe] Clean up all child processes. (#165, #167 by @manugarg )
  • [RDS] Fix GCP resource discovery while running on GKE. (#170, #171 by @manugarg)

Logging improvements

  • [RDS client] Don't "Info" log if state has not changed. (#160 by @manugarg)
  • [Logger] Don't add instance name label on Kubernetes. (#173 by @manugarg)

New Contributors (Congratulations, and welcome!)

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.11.8...v0.11.9

cloudprober - v0.11.8 - New features and bug fixes

Published by manugarg over 2 years ago

New features

  • Add SSL cert expiration metric for HTTPS probes: ssl_earliest_cert_expiry_sec (#119 by @robpickerill).
  • Enable probestatus UI by default (#154). Probe stats are now available at the URL /probestatus. If you suspect a problem because of this (high memory usage, for example [1]), you can disable this UI by adding the following stanza to your config:
    surfacer {
      probestatus_surfacer {
        disable: true
      }
    }
    

Bug fixes

  • [targets.file] Fix file targets' re-eval strategy (#140).
  • [http] Fix HTTP/2 behavior (#141).
  • [external] Start executing external probe right away (#145).
  • [rds.k8s] Fix "pods" resources parsing bug (#151).

Other cleanup and maintenance

  • Update some mod dependencies (#142, #152)
  • Cleanup: Remove unused function, and fix some issues reported by staticcheck (#146, #147).
  • Close UDP listeners when context is done (#148). This is to make sure Cloudprober can be cleanly restarted.
  • Consistency: Make gRPC runconfig similar to other configs (#155).

[1] -- There is protection in place to avoid excessive memory consumption if you run a lot of probes. By default, we keep only 3 days (configurable) worth of data, and only for up to 20 targets per probe. Also, the page served at /probestatus is cached with a TTL of 2s, to avoid DoSing Cloudprober functionality just by accessing /probestatus.
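
The page caching mentioned in the footnote above can be sketched as a small TTL cache. This is an illustration under stated assumptions, not the probestatus surfacer's code: `CachedPage`, `render`, `ttl_sec`, and `clock` are hypothetical names, and the clock is injectable so the behavior can be checked without sleeping.

```python
import time
from typing import Callable, Optional


class CachedPage:
    """Re-renders an expensive page at most once per ttl_sec (illustrative)."""

    def __init__(self, render: Callable[[], str], ttl_sec: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render
        self._ttl_sec = ttl_sec
        self._clock = clock
        self._body: Optional[str] = None
        self._rendered_at = 0.0

    def get(self) -> str:
        now = self._clock()
        expired = (now - self._rendered_at) >= self._ttl_sec
        if self._body is None or expired:
            # Only this branch does real work; all other requests within
            # the TTL window are served the stored copy.
            self._body = self._render()
            self._rendered_at = now
        return self._body
```

However many requests arrive, the rendering cost is bounded by one render per TTL window, which is what prevents page loads from starving the prober.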

Full Changelog: https://github.com/cloudprober/cloudprober/compare/v0.11.7...v0.11.8

- Manu Garg (@manugarg)