Bot releases are visible (Hide)

mimir - 2.12.0 Latest Release

Published by pr00se 7 months ago

This release contains 531 PRs from 60 authors, including new contributors Benoit Schipper, Derek Cadzow, Edwin, Itay Kalfon, Ivan Farré Vicente, Jan O. Rundshagen, Jorge Turrado Ferrero, Lukas Monkevicius, Mickaël Canévet, Rafael Sathler, Rajakavitha Kodhandapani, Tim Kotowski, Vladimir Varankin, Zach, Zach Day, Zirko, blut, github-actions[bot], ncharaf, zhehao-grafana. Thank you!

Grafana Mimir version 2.12.0 release notes

Grafana Labs is excited to announce version 2.12 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bug fixes in this release.
For the complete list of changes, refer to the CHANGELOG.

Features and enhancements

Added support to only count series that are considered active through the Cardinality API endpoint /api/v1/cardinality/label_names by passing the count_method parameter.
If set to active it counts only series that are considered active according to the -ingester.active-series-metrics-idle-timeout flag setting rather than counting all in-memory series.
The "Store-gateway: bucket tenant blocks" admin page contains a new column "No Compact".
If block no compaction marker is set, it specifies the reason and the date the marker is added.
The estimated number of compaction jobs based on the current bucket-index is now computed by the compactor.
The result is tracked by the new cortex_bucket_index_compaction_jobs metric.
If this computation fails, the cortex_bucket_index_compaction_jobs_errors_total metric is updated instead.
The estimated number of compaction jobs is also shown in Top tenants, Tenants, and Compactor dashboards.
Added mimir-distroless container image built upon a distroless image (gcr.io/distroless/static-debian12).
This improvement minimizes attack surfaces and potential CVEs by trimming down the dependencies within the image.
After comprehensive testing, the Mimir maintainers plan to shift from the current image to the distroless version.

Additionally, the following previously experimental features are now considered stable:

The number of pre-allocated workers used to forward push requests to the ingesters, configurable via the -distributor.reusable-ingester-push-workers CLI flag on distributors.
It now defaults to 2000.
Note that this is a performance optimization, and not a limiting feature.
If not enough workers available, new goroutines will be spawned.
The number of gRPC server workers used to serve the requests, configurable via the -server.grpc.num-workers CLI flag.
It now defaults to 100.
Note that this is the number of pre-allocated long-lived workers, and not a limiting feature.
If not enough workers are available, new goroutines will be spawned.
The maximum number of concurrent index header loads across all tenants, configurable via the -blocks-storage.bucket-store.index-header.lazy-loading-concurrency CLI flag on store-gateways.
It defaults to 4.
The maximum time to wait for the query-frontend to become ready before rejecting requests, configurable via the -query-frontend.not-running-timeout CLI flag on query-frontends.
It now defaults to 2s.
The CLI flag that allows queriers to reduce pressure on ingesters by initially querying only the minimum set of ingesters required to reach quorum, -querier.minimize-ingester-requests.
It is now enabled by default.
Spread-minimizing token-related CLI flags: -ingester.ring.token-generation-strategy, -ingester.ring.spread-minimizing-zones and -ingester.ring.spread-minimizing-join-ring-in-order.
You can read more about this feature in our blog post.

Important changes

In Grafana Mimir 2.12 the following behavior has changed:

Store-gateway now persists a sparse version of the index-header to disk on construction and loads sparse index-headers from disk instead of the whole index-header.
This improves the speed at which index headers are lazy-loaded from disk by up to 90%. The added disk usage is in the order of 1-2%.
Alertmanager deprecated the v1 API. All v1 API endpoints now respond with a JSON deprecation notice and a status code of 410.
All endpoints have a v2 equivalent.
The list of endpoints is:
- <alertmanager-web.external-url>/api/v1/alerts
- <alertmanager-web.external-url>/api/v1/receivers
- <alertmanager-web.external-url>/api/v1/silence/{id}
- <alertmanager-web.external-url>/api/v1/silences
- <alertmanager-web.external-url>/api/v1/status
Exemplar's label traceID has been changed to trace_id to be consistent with the OpenTelemetry standard.
Errors returned by ingesters now contain only gRPC status codes.
Previously they contained both gRPC and HTTP status codes.

{{< admonition type="warning" >}}
To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12.
Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.
{{< /admonition >}}
Responses with gRPC status codes are now reported as status_code labels in the cortex_request_duration_seconds and cortex_ingester_client_request_duration_seconds metrics.
Responses with HTTP 4xx status codes are now treated as errors and used in status_code label of request duration metric.

The default value of the following CLI flags have been changed:

-blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
-blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
-blocks-storage.bucket-store.tenant-sync-concurrency from 10 to 1.
-query-frontend.max-cache-freshness from 1m to 10m.
-distributor.write-requests-buffer-pooling-enabled from false to true.
-locks-storage.bucket-store.block-sync-concurrency from 20 to 4.
-memberlist.stream-timeout from 10s to 2s.
-server.report-grpc-codes-in-instrumentation-label-enabled from false to true.

The following deprecated configuration options are removed in Grafana Mimir 2.12:

The YAML setting frontend.cache_unaligned_requests.
Experimental CLI flag -querier.prefer-streaming-chunks-from-ingesters.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.14:

The CLI flag -ingester.limit-inflight-requests-using-grpc-method-limiter.
It now defaults to true.
The CLI flag -ingester.return-only-grpc-errors.
It now defaults to true.

{{< admonition type="warning" >}}
To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12.
Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.
{{< /admonition >}}
The CLI flag -ingester.client.report-grpc-codes-in-instrumentation-label-enabled.
It now defaults to true.
The CLI flag -distributor.limit-inflight-requests-using-grpc-method-limiter.
It now defaults to true.
The CLI flag -distributor.enable-otlp-metadata-storage.
It now defaults to true.
The CLI flag -querier.max-query-into-future.

The following metrics are removed or deprecated:

cortex_bucket_store_blocks_loaded_by_duration has been removed.
cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14.

Experimental features

Grafana Mimir 2.12 includes new features that are considered experimental and disabled by default.
Use them with caution and report any issues you encounter:

The maximum number of tenant IDs that may be for a federated query can be configured via the -tenant-federation.max-tenants CLI flag on query-frontends.
By default, it's 0, meaning that the limit is disabled.
Sharding of active series queries can be enabled via the -query-frontend.shard-active-series-queries CLI flag on query-frontends.
Timely head compaction can be enabled via the -blocks-storage.tsdb.timely-head-compaction-enabled on ingesters.
If enabled, the head compaction happens when the min block range can no longer be appended, without requiring 1.5x the chunk range worth of data in the head.
Streaming of responses from querier to query-frontend can be enabled via the -querier.response-streaming-enabled CLI flag on queriers.
This is currently supported only for responses from the /api/v1/cardinality/active_series endpoint.
The maximum response size for active series queries, in bytes, can be set via the -querier.active-series-results-max-size-bytes CLI flag on queriers.
Metric relabeling on a per-tenant basis can be forcefully disabled via the -distributor.metric-relabeling-enabled CLI flag on rulers.
Metrics relabeling is enabled by default.
Query Queue Load Balancing by Query Component. Tenant query queues in the query-scheduler can now be split into subqueues by which query component is expected to be utilized to complete the query: ingesters, store-gateways, both, or uncategorized.
Dequeuing queries for a given tenant will rotate through the query component subqueues via simple round-robin.
In the event that the one of the query components (ingesters or store-gateways) experience a slowdown, queries only utilizing the other query component can continue to be serviced.
This feature is recommended to be enabled.
The following CLI flags must be set to true in order to be in effect:
- -query-frontend.additional-query-queue-dimensions-enabled on the query-frontend.
- -query-scheduler.additional-query-queue-dimensions-enabled on the query-scheduler.
Owned series tracking in ingesters can be enabled via the -ingester.track-ingester-owned-series CLI flag.
When enabled, ingesters will track the number of in-memory series that still map to the ingester based on the ring state.
These counts are more reactive to ring and shard changes than in-memory series, and can be used when enforcing tenant series limits by enabling the -ingester.use-ingester-owned-series-for-limits CLI flag.
This feature requires zone-aware replication to be enabled, and the replication factor to be equal to the number of zones.

Bug fixes

Distributor: fixed an issue where -distributor.metric-relabeling-enabled could cause distributors to panic.
Distributor: fix an issue where -distributor.metric-relabeling-enabled could cause distributors to write unsorted labels and corrupt blocks.
Ingester: errors encountered while iterating through chunks or samples in response to a query request aren't ignored anymore.
Compactor: out-of-order blocks aren't allowed to prevent timely compaction anymore.
Querier: requests to store-gateway when a query gets canceled aren't retried anymore.
Querier: status code 499 is now returned instead of 500 when a request to remote read endpoint gets canceled.
Querier: fixed an issue where -querier.max-fetched-series-per-query wasn't applied to /series endpoint in case series loaded from ingesters.
Querier: fixed an issue with the remote-read requests HTTP status code translations.
Previously, remote-read had conflicting behaviours: when returning samples all internal errors were translated to HTTP 400, while when returning chunks all internal errors were translated to HTTP 500.
With this fix, all validation errors will be translated into HTTP 400 errors, while all other errors will be translated into HTTP 500 errors.
Query-frontend: the cortex_query_frontend_queries_total metric incorrectly reported op="query" for any request which wasn't a range query.
Now the op label value can be one of the following:
- query: instant query
- query_range: range query
- cardinality: cardinality query
- label_names_and_values: label names / values query
- active_series: active series query
- other: any other request
Ruler: fixed an issue where "failed to remotely evaluate query expression, will retry" messages were logged without context such as the trace ID and didn't appear in trace events.
Ruler: requests to remote querier when server's response exceeds its configured max payload size aren't retried anymore.
Ruler: fixed a regression that caused client errors to be tracked in cortex_ruler_write_requests_failed_total metric.
Ruler: fixed an issue with recording rule result being corruption due to an usage of a bad native histogram pointer.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm charts are released independently.
Refer to the Grafana Mimir Helm chart documentation.

Changelog

2.12.0

Grafana Mimir

[CHANGE] Alertmanager: Deprecates the v1 API. All v1 API endpoints now respond with a JSON deprecation notice and a status code of 410. All endpoints have a v2 equivalent. The list of endpoints is: #7103
- <alertmanager-web.external-url>/api/v1/alerts
- <alertmanager-web.external-url>/api/v1/receivers
- <alertmanager-web.external-url>/api/v1/silence/{id}
- <alertmanager-web.external-url>/api/v1/silences
- <alertmanager-web.external-url>/api/v1/status
[CHANGE] Ingester: Increase default value of -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes to 100 MiB (previous default value was 10 MiB). #6764
[CHANGE] Validate tenant IDs according to documented behavior even when tenant federation is not enabled. Note that this will cause some previously accepted tenant IDs to be rejected such as those longer than 150 bytes or containing | characters. #6959
[CHANGE] Ruler: don't use backoff retry on remote evaluation in case of 4xx errors. #7004
[CHANGE] Server: responses with HTTP 4xx status codes are now treated as errors and used in status_code label of request duration metric. #7045
[CHANGE] Memberlist: change default for -memberlist.stream-timeout from 10s to 2s. #7076
[CHANGE] Memcached: remove legacy thanos_cache_memcached_* and thanos_memcached_* prefixed metrics. Instead, Memcached and Redis cache clients now emit thanos_cache_* prefixed metrics with a backend label. #7076
[CHANGE] Ruler: the following metrics, exposed when the ruler is configured to discover Alertmanager instances via service discovery, have been renamed: #7057
- prometheus_sd_failed_configs renamed to cortex_prometheus_sd_failed_configs
- prometheus_sd_discovered_targets renamed to cortex_prometheus_sd_discovered_targets
- prometheus_sd_received_updates_total renamed to cortex_prometheus_sd_received_updates_total
- prometheus_sd_updates_delayed_total renamed to cortex_prometheus_sd_updates_delayed_total
- prometheus_sd_updates_total renamed to cortex_prometheus_sd_updates_total
- prometheus_sd_refresh_failures_total renamed to cortex_prometheus_sd_refresh_failures_total
- prometheus_sd_refresh_duration_seconds renamed to cortex_prometheus_sd_refresh_duration_seconds
[CHANGE] Query-frontend: the default value for -query-frontend.not-running-timeout has been changed from 0 (disabled) to 2s. The configuration option has also been moved from "experimental" to "advanced". #7127
[CHANGE] Store-gateway: to reduce disk contention on HDDs the default value for blocks-storage.bucket-store.tenant-sync-concurrency has been changed from 10 to 1 and the default value for blocks-storage.bucket-store.block-sync-concurrency has been changed from 20 to 4. #7136
[CHANGE] Store-gateway: Remove deprecated CLI flags -blocks-storage.bucket-store.index-header-lazy-loading-enabled and -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout and their corresponding YAML settings. Instead, use -blocks-storage.bucket-store.index-header.lazy-loading-enabled and -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout. #7521
[CHANGE] Store-gateway: Mark experimental CLI flag -blocks-storage.bucket-store.index-header.lazy-loading-concurrency and its corresponding YAML settings as advanced. #7521
[CHANGE] Store-gateway: Remove experimental CLI flag -blocks-storage.bucket-store.index-header.sparse-persistence-enabled since this is now the default behavior. #7535
[CHANGE] All: set -server.report-grpc-codes-in-instrumentation-label-enabled to true by default, which enables reporting gRPC status codes as status_code labels in the cortex_request_duration_seconds metric. #7144
[CHANGE] Distributor: report gRPC status codes as status_code labels in the cortex_ingester_client_request_duration_seconds metric by default. #7144
[CHANGE] Distributor: CLI flag -ingester.client.report-grpc-codes-in-instrumentation-label-enabled has been deprecated, and its default value is set to true. #7144
[CHANGE] Ingester: CLI flag -ingester.return-only-grpc-errors has been deprecated, and its default value is set to true. To ensure backwards compatibility, during a migration from a version prior to 2.11.0 to 2.12 or later, -ingester.return-only-grpc-errors should be set to false. Once all the components are migrated, the flag can be removed. #7151
[CHANGE] Ingester: the following CLI flags have been moved from "experimental" to "advanced": #7169
- -ingester.ring.token-generation-strategy
- -ingester.ring.spread-minimizing-zones
- -ingester.ring.spread-minimizing-join-ring-in-order
[CHANGE] Query-frontend: the default value of the CLI flag -query-frontend.max-cache-freshness (and its respective YAML configuration parameter) has been changed from 1m to 10m. #7161
[CHANGE] Distributor: default the optimization -distributor.write-requests-buffer-pooling-enabled to true. #7165
[CHANGE] Tracing: Move query information to span attributes instead of span logs. #7046
[CHANGE] Distributor: the default value of circuit breaker's CLI flag -ingester.client.circuit-breaker.cooldown-period has been changed from 1m to 10s. #7310
[CHANGE] Store-gateway: remove cortex_bucket_store_blocks_loaded_by_duration. cortex_bucket_store_series_blocks_queried is better suited for detecting when compactors are not able to keep up with the number of blocks to compact. #7309
[CHANGE] Ingester, Distributor: the support for rejecting push requests received via gRPC before reading them into memory, enabled via -ingester.limit-inflight-requests-using-grpc-method-limiter and -distributor.limit-inflight-requests-using-grpc-method-limiter, is now stable and enabled by default. The configuration options have been deprecated and will be removed in Mimir 2.14. #7360
[CHANGE] Distributor: Change-distributor.enable-otlp-metadata-storage flag's default to true, and deprecate it. The flag will be removed in Mimir 2.14. #7366
[CHANGE] Store-gateway: Use a shorter TTL for cached items related to temporary blocks. #7407 #7534
[CHANGE] Standardise exemplar label as "trace_id". #7475
[CHANGE] The configuration option -querier.max-query-into-future has been deprecated and will be removed in Mimir 2.14. #7496
[CHANGE] Distributor: the metric cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14. #7516
[CHANGE] Query-frontend: The deprecated YAML setting frontend.cache_unaligned_requests has been moved to limits.cache_unaligned_requests. #7519
[CHANGE] Querier: the CLI flag -querier.minimize-ingester-requests has been moved from "experimental" to "advanced". #7638
[FEATURE] Introduce -server.log-source-ips-full option to log all IPs from Forwarded, X-Real-IP, X-Forwarded-For headers. #7250
[FEATURE] Introduce -tenant-federation.max-tenants option to limit the max number of tenants allowed for requests when federation is enabled. #6959
[FEATURE] Cardinality API: added a new count_method parameter which enables counting active label names. #7085
[FEATURE] Querier / query-frontend: added -querier.promql-experimental-functions-enabled CLI flag (and respective YAML config option) to enable experimental PromQL functions. The experimental functions introduced are: mad_over_time(), sort_by_label() and sort_by_label_desc(). #7057
[FEATURE] Alertmanager API: added -alertmanager.grafana-alertmanager-compatibility-enabled CLI flag (and respective YAML config option) to enable an experimental API endpoints that support the migration of the Grafana Alertmanager. #7057
[FEATURE] Alertmanager: Added -alertmanager.utf8-strict-mode-enabled to control support for any UTF-8 character as part of Alertmanager configuration/API matchers and labels. It's default value is set to false. #6898
[FEATURE] Querier: added histogram_avg() function support to PromQL. #7293
[FEATURE] Ingester: added -blocks-storage.tsdb.timely-head-compaction flag, which enables more timely head compaction, and defaults to false. #7372
[FEATURE] Compactor: Added /compactor/tenants and /compactor/tenant/{tenant}/planned_jobs endpoints that provide functionality that was provided by tools/compaction-planner -- listing of planned compaction jobs based on tenants' bucket index. #7381
[FEATURE] Add experimental support for streaming response bodies from queriers to frontends via -querier.response-streaming-enabled. This is currently only supported for the /api/v1/cardinality/active_series endpoint. #7173
[FEATURE] Release: Added mimir distroless docker image. #7371
[FEATURE] Add support for the new grammar of {"metric_name", "l1"="val"} to promql and some of the exposition formats. #7475 #7541
[ENHANCEMENT] Distributor: Add a new metric cortex_distributor_otlp_requests_total to track the total number of OTLP requests. #7385
[ENHANCEMENT] Vault: add lifecycle manager for token used to authenticate to Vault. This ensures the client token is always valid. Includes a gauge (cortex_vault_token_lease_renewal_active) to check whether token renewal is active, and the counters cortex_vault_token_lease_renewal_success_total and cortex_vault_auth_success_total to see the total number of successful lease renewals / authentications. #7337
[ENHANCEMENT] Store-gateway: add no-compact details column on store-gateway tenants admin UI. #6848
[ENHANCEMENT] PromQL: ignore small errors for bucketQuantile #6766
[ENHANCEMENT] Distributor: improve efficiency of some errors #6785
[ENHANCEMENT] Ruler: exclude vector queries from being tracked in cortex_ruler_queries_zero_fetched_series_total. #6544
[ENHANCEMENT] Ruler: local storage backend now supports reading a rule group via /config/api/v1/rules/{namespace}/{groupName} configuration API endpoint. #6632
[ENHANCEMENT] Query-Frontend and Query-Scheduler: split tenant query request queues by query component with query-frontend.additional-query-queue-dimensions-enabled and query-scheduler.additional-query-queue-dimensions-enabled. #6772
[ENHANCEMENT] Distributor: support disabling metric relabel rules per-tenant via the flag -distributor.metric-relabeling-enabled or associated YAML. #6970
[ENHANCEMENT] Distributor: -distributor.remote-timeout is now accounted from the first ingester push request being sent. #6972
[ENHANCEMENT] Storage Provider: -<prefix>.s3.sts-endpoint sets a custom endpoint for AWS Security Token Service (AWS STS) in s3 storage provider. #6172
[ENHANCEMENT] Querier: add cortex_querier_queries_storage_type_total metric that indicates how many queries have executed for a source, ingesters or store-gateways. Add cortex_querier_query_storegateway_chunks_total metric to count the number of chunks fetched from a store gateway. #7099,#7145
[ENHANCEMENT] Query-frontend: add experimental support for sharding active series queries via -query-frontend.shard-active-series-queries. #6784
[ENHANCEMENT] Distributor: set -distributor.reusable-ingester-push-workers=2000 by default and mark feature as advanced. #7128
[ENHANCEMENT] All: set -server.grpc.num-workers=100 by default and mark feature as advanced. #7131
[ENHANCEMENT] Distributor: invalid metric name error message gets cleaned up to not include non-ascii strings. #7146
[ENHANCEMENT] Store-gateway: add source, level, and out_or_order to cortex_bucket_store_series_blocks_queried metric that indicates the number of blocks that were queried from store gateways by block metadata. #7112 #7262 #7267
[ENHANCEMENT] Compactor: After updating bucket-index, compactor now also computes estimated number of compaction jobs based on current bucket-index, and reports the result in cortex_bucket_index_estimated_compaction_jobs metric. If computation of jobs fails, cortex_bucket_index_estimated_compaction_jobs_errors_total is updated instead. #7299
[ENHANCEMENT] Mimir: Integrate profiling into tracing instrumentation. #7363
[ENHANCEMENT] Alertmanager: Adds metric cortex_alertmanager_notifications_suppressed_total that counts the total number of notifications suppressed for being silenced, inhibited, outside of active time intervals or within muted time intervals. #7384
[ENHANCEMENT] Query-scheduler: added more buckets to cortex_query_scheduler_queue_duration_seconds histogram metric, in order to better track queries staying in the queue for longer than 10s. #7470
[ENHANCEMENT] A type label is added to prometheus_tsdb_head_out_of_order_samples_appended_total metric. #7475
[ENHANCEMENT] Distributor: Optimize OTLP endpoint. #7475
[ENHANCEMENT] API: Use github.com/klauspost/compress for faster gzip and deflate compression of API responses. #7475
[ENHANCEMENT] Ingester: Limiting on owned series (-ingester.use-ingester-owned-series-for-limits) now prevents discards in cases where a tenant is sharded across all ingesters (or shuffle sharding is disabled) and the ingester count increases. #7411
[ENHANCEMENT] Block upload: include converted timestamps in the error message if block is from the future. #7538
[ENHANCEMENT] Query-frontend: Introduce -query-frontend.active-series-write-timeout to allow configuring the server-side write timeout for active series requests. #7553 #7569
[BUGFIX] Ingester: don't ignore errors encountered while iterating through chunks or samples in response to a query request. #6451
[BUGFIX] Fix issue where queries can fail or omit OOO samples if OOO head compaction occurs between creating a querier and reading chunks #6766
[BUGFIX] Fix issue where concatenatingChunkIterator can obscure errors #6766
[BUGFIX] Fix panic during tsdb Commit #6766
[BUGFIX] tsdb/head: wlog exemplars after samples #6766
[BUGFIX] Ruler: fix issue where "failed to remotely evaluate query expression, will retry" messages are logged without context such as the trace ID and do not appear in trace events. #6789
[BUGFIX] Ruler: do not retry requests to remote querier when server's response exceeds its configured max payload size. #7216
[BUGFIX] Querier: fix issue where spans in query request traces were not nested correctly. #6893
[BUGFIX] Fix issue where all incoming HTTP requests have duplicate trace spans. #6920
[BUGFIX] Querier: do not retry requests to store-gateway when a query gets canceled. #6934
[BUGFIX] Querier: return 499 status code instead of 500 when a request to remote read endpoint gets canceled. #6934
[BUGFIX] Querier: fix issue where -querier.max-fetched-series-per-query is not applied to /series endpoint if the series are loaded from ingesters. #7055
[BUGFIX] Distributor: fix issue where -distributor.metric-relabeling-enabled may cause distributors to panic #7176
[BUGFIX] Distributor: fix issue where -distributor.metric-relabeling-enabled may cause distributors to write unsorted labels and corrupt blocks #7326
[BUGFIX] Query-frontend: the cortex_query_frontend_queries_total report incorrectly reported op="query" for any request which wasn't a range query. Now the op label value can be one of the following: #7207
- query: instant query
- query_range: range query
- cardinality: cardinality query
- label_names_and_values: label names / values query
- active_series: active series query
- other: any other request
[BUGFIX] Fix performance regression introduced in Mimir 2.11.0 when uploading blocks to AWS S3. #7240
[BUGFIX] Query-frontend: fix race condition when sharding active series is enabled (see above) and response is compressed with snappy. #7290
[BUGFIX] Query-frontend: "query stats" log unsuccessful replies from downstream as "failed". #7296
[BUGFIX] Packaging: remove reload from systemd file as mimir does not take into account SIGHUP. #7345
[BUGFIX] Compactor: do not allow out-of-order blocks to prevent timely compaction. #7342
[BUGFIX] Update google.golang.org/grpc to resolve occasional issues with gRPC server closing its side of connection before it was drained by the client. #7380
[BUGFIX] Query-frontend: abort response streaming for active_series requests when the request context is canceled. #7378
[BUGFIX] Compactor: improve compaction of sporadic blocks. #7329
[BUGFIX] Ruler: fix regression that caused client errors to be tracked in cortex_ruler_write_requests_failed_total metric. #7472
[BUGFIX] promql: Fix Range selectors with an @ modifier are wrongly scoped in range queries. #7475
[BUGFIX] Fix metadata API using wrong JSON field names. #7475
[BUGFIX] Ruler: fix native histogram recording rule result corruption. #7552
[BUGFIX] Querier: fix HTTP status code translations for remote read requests. Previously, remote-read had conflicting behaviours: when returning samples all internal errors were translated to HTTP 400; when returning chunks all internal errors were translated to HTTP 500. #7487
[BUGFIX] Query-frontend: Fix memory leak on every request. #7654

Mixin

[CHANGE] The job label matcher for distributor and gateway have been extended to include any deployment matching distributor.* and cortex-gw.* respectively. This change allows to match custom and multi-zone distributor and gateway deployments too. #6817
[ENHANCEMENT] Dashboards: Add panels for alertmanager activity of a tenant #6826
[ENHANCEMENT] Dashboards: Add graphs to "Slow Queries" dashboard. #6880
[ENHANCEMENT] Dashboards: Update all deprecated "graph" panels to "timeseries" panels. #6864 #7413 #7457
[ENHANCEMENT] Dashboards: Make most columns in "Slow Queries" sortable. #7000
[ENHANCEMENT] Dashboards: Render graph panels at full resolution as opposed to at half resolution. #7027
[ENHANCEMENT] Dashboards: show query-scheduler queue length on "Reads" and "Remote Ruler Reads" dashboards. #7088
[ENHANCEMENT] Dashboards: Add estimated number of compaction jobs to "Compactor", "Tenants" and "Top tenants" dashboards. #7449 #7481
[ENHANCEMENT] Recording rules: add native histogram recording rules to cortex_request_duration_seconds. #7528
[ENHANCEMENT] Dashboards: Add total owned series, and per-ingester in-memory and owned series to "Tenants" dashboard. #7511
[BUGFIX] Dashboards: drop step parameter from targets as it is not supported. #7157
[BUGFIX] Recording rules: drop rules for metrics removed in 2.0: cortex_memcache_request_duration_seconds and cortex_cache_request_duration_seconds. #7514

Jsonnet

[CHANGE] Distributor: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100) to 1000, to avoid dropping tracing spans. #7259
[CHANGE] Querier: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from 1000 to 5000, to avoid dropping tracing spans. #6764
[CHANGE] rollout-operator: remove default CPU limit. #7066
[CHANGE] Store-gateway: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100) to 1000, to avoid dropping tracing spans. #7068
[CHANGE] Query-frontend, ingester, ruler, backend and write instances: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100), to avoid dropping tracing spans. #7086
[CHANGE] Ring: relaxed the hash ring heartbeat period and timeout for distributor, ingester, store-gateway and compactor: #6860
- -distributor.ring.heartbeat-period set to 1m
- -distributor.ring.heartbeat-timeout set to 4m
- -ingester.ring.heartbeat-period set to 2m
- -store-gateway.sharding-ring.heartbeat-period set to 1m
- -store-gateway.sharding-ring.heartbeat-timeout set to 4m
- -compactor.ring.heartbeat-period set to 1m
- -compactor.ring.heartbeat-timeout set to 4m
[CHANGE] Ruler-querier: the topology spread constrain max skew is now configured through the configuration option ruler_querier_topology_spread_max_skew instead of querier_topology_spread_max_skew. #7204
[CHANGE] Distributor: -server.grpc.keepalive.max-connection-age lowered from 2m to 60s and configured -shutdown-delay=90s and termination grace period to 100 seconds in order to reduce the chances of failed gRPC write requests when distributors gracefully shutdown. #7361
[FEATURE] Added support for the following root-level settings to configure the list of matchers to apply to node affinity: #6782 #6829
- alertmanager_node_affinity_matchers
- compactor_node_affinity_matchers
- continuous_test_node_affinity_matchers
- distributor_node_affinity_matchers
- ingester_node_affinity_matchers
- ingester_zone_a_node_affinity_matchers
- ingester_zone_b_node_affinity_matchers
- ingester_zone_c_node_affinity_matchers
- mimir_backend_node_affinity_matchers
- mimir_backend_zone_a_node_affinity_matchers
- mimir_backend_zone_b_node_affinity_matchers
- mimir_backend_zone_c_node_affinity_matchers
- mimir_read_node_affinity_matchers
- mimir_write_node_affinity_matchers
- mimir_write_zone_a_node_affinity_matchers
- mimir_write_zone_b_node_affinity_matchers
- mimir_write_zone_c_node_affinity_matchers
- overrides_exporter_node_affinity_matchers
- querier_node_affinity_matchers
- query_frontend_node_affinity_matchers
- query_scheduler_node_affinity_matchers
- rollout_operator_node_affinity_matchers
- ruler_node_affinity_matchers
- ruler_node_affinity_matchers
- ruler_querier_node_affinity_matchers
- ruler_query_frontend_node_affinity_matchers
- ruler_query_scheduler_node_affinity_matchers
- store_gateway_node_affinity_matchers
- store_gateway_node_affinity_matchers
- store_gateway_zone_a_node_affinity_matchers
- store_gateway_zone_b_node_affinity_matchers
- store_gateway_zone_c_node_affinity_matchers
[FEATURE] Ingester: Allow automated zone-by-zone downscaling, that can be enabled via the ingester_automated_downscale_enabled flag. It is disabled by default. #6850
[ENHANCEMENT] Alerts: Add MimirStoreGatewayTooManyFailedOperations warning alert that triggers when Mimir store-gateway report error when interacting with the object storage. #6831
[ENHANCEMENT] Querier HPA: improved scaling metric and scaling policies, in order to scale up and down more gradually. #6971
[ENHANCEMENT] Rollout-operator: upgraded to v0.13.0. #7469
[ENHANCEMENT] Rollout-operator: add tracing configuration to rollout-operator container (when tracing is enabled and configured). #7469
[ENHANCEMENT] Query-frontend: configured -shutdown-delay, -server.grpc.keepalive.max-connection-age and termination grace period to reduce the likelihood of queries hitting terminated query-frontends. #7129
[ENHANCEMENT] Autoscaling: add support for KEDA's ignoreNullValues option for Prometheus scaler. #7471
[BUGFIX] Update memcached-exporter to 0.14.1 due to CVE-2023-39325. #6861

Mimirtool

[FEATURE] Add command migrate-utf8 to migrate Alertmanager configurations for Alertmanager versions 0.27.0 and later. #7383
[ENHANCEMENT] Add template render command to render locally a template. #7325
[ENHANCEMENT] Add --extra-headers option to mimirtool rules command to add extra headers to requests for auth. #7141
[ENHANCEMENT] Analyze Prometheus: set tenant header. #6737
[ENHANCEMENT] Add argument --output-dir to mimirtool alertmanager get where the config and templates will be written to and can be loaded via mimirtool alertmanager load #6760
[BUGFIX] Analyze rule-file: .metricsUsed field wasn't populated. #6953

Mimir Continuous Test

[ENHANCEMENT] Include comparison of all expected and actual values when any float sample does not match. #6756

Query-tee

[BUGFIX] Fix issue where Host HTTP header was not being correctly changed for the proxy targets. #7386
[ENHANCEMENT] Allow using the value of X-Scope-OrgID for basic auth username in the forwarded request if URL username is set as __REQUEST_HEADER_X_SCOPE_ORGID__. #7452

Documentation

[CHANGE] No longer mark OTLP distributor endpoint as experimental. #7348
[ENHANCEMENT] Added runbook for KubePersistentVolumeFillingUp alert. #7297
[ENHANCEMENT] Add Grafana Cloud recommendations to OTLP documentation. #7375
[BUGFIX] Fixed typo on single zone->zone aware replication Helm page. #7327

Tools

[CHANGE] copyblocks: The flags for copyblocks have been changed to align more closely with other tools. #6607
[CHANGE] undelete-blocks: undelete-blocks-gcs has been removed and replaced with undelete-blocks, which supports recovering deleted blocks in versioned buckets from ABS, GCS, and S3-compatible object storage. #6607
[FEATURE] copyprefix: Add tool to copy objects between prefixes. Supports ABS, GCS, and S3-compatible object storage. #6607

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.11.0...mimir-2.12.0

What's Changed

Update grafana/grafana Docker tag to v10.2.2 (main) by @renovate in https://github.com/grafana/mimir/pull/6744
Update Helm release grafana-agent-operator to v0.3.11 (main) by @renovate in https://github.com/grafana/mimir/pull/6743
fix(deps): update module github.com/aws/aws-sdk-go to v1.48.4 (main) by @renovate in https://github.com/grafana/mimir/pull/6745
fix(deps): update module github.com/failsafe-go/failsafe-go to v0.4.1 (main) by @renovate in https://github.com/grafana/mimir/pull/6746
compactor: mark corrupted blocks for no compaction to avoid blocking by @ortuman in https://github.com/grafana/mimir/pull/6588
Docs: Fix flag syntax by @osg-grafana in https://github.com/grafana/mimir/pull/6750
docs: bump required agent version by @krajorama in https://github.com/grafana/mimir/pull/6752
Add option distributor.ingestion-burst-factor by @treid314 in https://github.com/grafana/mimir/pull/6662
Implement series limit using ingester own series by @pstibrany in https://github.com/grafana/mimir/pull/6718
[2.10] Backport native histograms documentation by @krajorama in https://github.com/grafana/mimir/pull/6757
docs: missed newline by @krajorama in https://github.com/grafana/mimir/pull/6759
Merge release-2.10 to main by @krajorama in https://github.com/grafana/mimir/pull/6758
Improve series deduplication for /active_series endpoint by @flxbk in https://github.com/grafana/mimir/pull/6717
Make align-queries-with-step a per-tenant setting by @56quarters in https://github.com/grafana/mimir/pull/6714
Rename "ring" to "ingester ring" by @pstibrany in https://github.com/grafana/mimir/pull/6762
jsonnet: Remove explicit config for align-queries-with-step by @56quarters in https://github.com/grafana/mimir/pull/6765
Add debug message to track tenants sending queries that are not able to benefit from caches. by @wilfriedroset in https://github.com/grafana/mimir/pull/6732
store-gateway: fix duration metrics by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6616
Enable StatefulSetAutoDeletePVC feature for alertmanager, compactor, ingester & store-gateway StatefulSets by @ivanfavi in https://github.com/grafana/mimir/pull/6106
Draft release notes for 2.11 by @leizor in https://github.com/grafana/mimir/pull/6702
Prepare release 2.11.0-rc.0 CHANGELOG.md by @leizor in https://github.com/grafana/mimir/pull/6770
continuous-test: Include comparison of all expected and actual values when any float sample does not match by @charleskorn in https://github.com/grafana/mimir/pull/6756
Docs: Move configuration-parameters subdirectory from references/ to configure/ by @osg-grafana in https://github.com/grafana/mimir/pull/6761
Upgrade to latest mimir-prometheus@main by @aknuds1 in https://github.com/grafana/mimir/pull/6766
Jsonnet: add newDistributorDeployment() utility by @pracucci in https://github.com/grafana/mimir/pull/6774
Jsonnet: add newDistributorContainer() utility by @pracucci in https://github.com/grafana/mimir/pull/6775
Upstream internal jsonnet changes by @pstibrany in https://github.com/grafana/mimir/pull/6764
Distributor: Refactor OTLP handler by @aknuds1 in https://github.com/grafana/mimir/pull/6719
feat/add discovered metrics to metricsUsed by @56quarters in https://github.com/grafana/mimir/pull/6773
Jsonnet/Helm: make memcached connection limit configurable by @lukas-unity in https://github.com/grafana/mimir/pull/6715
Un-revert #6451 by @charleskorn in https://github.com/grafana/mimir/pull/6780
Jsonnet: add distributor_node_affinity_matchers setting support by @pracucci in https://github.com/grafana/mimir/pull/6782
Query-frontend worker: put back 'continue' removed in error by @bboreham in https://github.com/grafana/mimir/pull/6768
Exclude vector queries from being counted in metric for rules with zero fetched series by @zenador in https://github.com/grafana/mimir/pull/6544
Increase default timeout for index-queries cache requests by @56quarters in https://github.com/grafana/mimir/pull/6786
Use BucketIndexBlocksFinder instead of BucketScanBlocksFinder by @leizor in https://github.com/grafana/mimir/pull/6779
Fix issue where failed remote rule evaluation errors are logged without context such as trace IDs by @charleskorn in https://github.com/grafana/mimir/pull/6789
distributor: more efficient labels error messages by @bboreham in https://github.com/grafana/mimir/pull/6785
Update github.com/thanos-io/objstore digest to bffedaa (main) by @renovate in https://github.com/grafana/mimir/pull/6741
Run make mixin-screenshots by @leizor in https://github.com/grafana/mimir/pull/6771
Update github.com/alecthomas/units digest to 9a357b5 (main) by @renovate in https://github.com/grafana/mimir/pull/6800
Update golang.org/x/exp digest to 6522937 (main) by @renovate in https://github.com/grafana/mimir/pull/6803
Fix syntax error in renovate.json by @pstibrany in https://github.com/grafana/mimir/pull/6807
Update module github.com/aws/aws-sdk-go to v1.48.11 (main) by @renovate in https://github.com/grafana/mimir/pull/6805
Move heap ballast util function to dskit by @bboreham in https://github.com/grafana/mimir/pull/6810
Move change log entry of 6106 into helm changelog by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6769
store-gateway: use bucket index instead of scanning the bucket by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6808
Make distributor log ingester details when a push request fails by @duricanikolic in https://github.com/grafana/mimir/pull/6801
querier: default to using batch.NewChunkMergeIterator by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6814
TestQuerierWithBlocksStorageOnMissingBlocksFromStorage: start compactor earlier by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6815
Update alpine Docker tag to v3.18.5 (main) by @renovate in https://github.com/grafana/mimir/pull/6804
Fix helm build script when running out of build-image in a non-bash shell by @pracucci in https://github.com/grafana/mimir/pull/6820
Mixin: extend distributor and gateway job label matchers to include multi-zone deployments by @pracucci in https://github.com/grafana/mimir/pull/6817
Fix flaky ring tests by @pstibrany in https://github.com/grafana/mimir/pull/6824
Move util to format LabelAdaptor string to mimirpb package by @bboreham in https://github.com/grafana/mimir/pull/6822
ingester: simplify metadata limit error by @bboreham in https://github.com/grafana/mimir/pull/6821
fix up bad merge giving compile errors by @bboreham in https://github.com/grafana/mimir/pull/6827
mimirtool: clarify what auth flags do by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6798
Use spanlogger for Ruler API calls by @56quarters in https://github.com/grafana/mimir/pull/6823
[dashboards] Add panels for alertmanager activity of a tenant by @vaxvms in https://github.com/grafana/mimir/pull/6826
Jsonnet: add node_affinity_matchers setting support for every Mimir component by @pracucci in https://github.com/grafana/mimir/pull/6829
Jsonnet: rename with_anti_affinity to withAntiAffinity by @pracucci in https://github.com/grafana/mimir/pull/6832
Add MimirStoreGatewayTooManyFailedOperations alert by @wilfriedroset in https://github.com/grafana/mimir/pull/6831
ruler: fix TestRulerMetricsForInvalidQueriesAndNoFetchedSeries by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6843
Update mimir-prometheus by @pracucci in https://github.com/grafana/mimir/pull/6846
Clarify grafana agent native histogram support by @callmehyde in https://github.com/grafana/mimir/pull/6793
Release mimir-distributed Helm chart 5.2.0-weekly.267 by @grafanabot in https://github.com/grafana/mimir/pull/6825
query-frontend: inject query cache keys for LabelValues/Cardinality requests by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6849
storegateway: small refactor to simplify by @bboreham in https://github.com/grafana/mimir/pull/5444
Docs: Add an introduction; add range query caching info by @osg-grafana in https://github.com/grafana/mimir/pull/6857
Add unit test to compare histogram_quantile results with and without query sharding by @zenador in https://github.com/grafana/mimir/pull/6525
Doc: Refer to native history bucket limit by @osg-grafana in https://github.com/grafana/mimir/pull/6859
ingester client: context canceled is not an error by @bboreham in https://github.com/grafana/mimir/pull/6809
Store-gateway: Add no-compact column on store-gateway/tenants admin UI by @ying-jeanne in https://github.com/grafana/mimir/pull/6848
Helm/Jsonnet: update memcached-exporter to 0.14.1 by @krajorama in https://github.com/grafana/mimir/pull/6861
[mimir-distributed-release-5.1] Helm/Jsonnet: update memcached-exporter to 0.14.1 by @krajorama in https://github.com/grafana/mimir/pull/6863
fix(deps): update module github.com/aws/aws-sdk-go to v1.48.16 (main) by @renovate in https://github.com/grafana/mimir/pull/6869
extend histogram cortex_distributor_sample_delay_seconds_bucket to track negative delays by @replay in https://github.com/grafana/mimir/pull/6838
Mixin: remove last graph panel by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6864
e2e mimir client: add ability to inject headers and roundtrippers by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6851
Release mimir-distributed Helm chart 5.2.0-weekly.268 by @grafanabot in https://github.com/grafana/mimir/pull/6882
chore: mimirtool prometheus analyze set tenant header by @jmichalek132 in https://github.com/grafana/mimir/pull/6737
query-frontend: add special error to prevent caching of LabelValues and LabelValuesCardinality requests by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6885
Distributor: Use pooled buffers for reading/decompressing request body by @aknuds1 in https://github.com/grafana/mimir/pull/6836
Update docs on CacheKeyGenerator by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6890
Fix formatting of comment posted during build image CI builds by @charleskorn in https://github.com/grafana/mimir/pull/6872
Add experimental support to write incoming data to a Kafka-compatible backend by @pracucci in https://github.com/grafana/mimir/pull/6888
Fix issue where querier trace spans are not nested correctly by @charleskorn in https://github.com/grafana/mimir/pull/6893
Upgrade dskit by @charleskorn in https://github.com/grafana/mimir/pull/6875
util.RequestBuffers: Add tests by @aknuds1 in https://github.com/grafana/mimir/pull/6891
Update alpine Docker tag to v3.19.0 (main) by @renovate in https://github.com/grafana/mimir/pull/6876
fix(deps): update module github.com/minio/minio-go/v7 to v7.0.65 (main) by @renovate in https://github.com/grafana/mimir/pull/6873
chore(deps): update grafana/agent docker tag to v0.38.1 (main) by @renovate in https://github.com/grafana/mimir/pull/6878
Update 2.9 changelog and release notes by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6899
Update 2.10 changelog and release notes by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6900
util.RequestBuffers: Fix test flakiness by @aknuds1 in https://github.com/grafana/mimir/pull/6905
Introduce type for Unix timestamps in JSON by @pstibrany in https://github.com/grafana/mimir/pull/6883
Kafka backend prototype: set metadata min age == max age by @pracucci in https://github.com/grafana/mimir/pull/6894
Update all release-2.10 release notes by @colega in https://github.com/grafana/mimir/pull/6908
jsonnet: Add ability to have rollout-operator scale down ingesters by @56quarters in https://github.com/grafana/mimir/pull/6850
k6: Pass auth to write path via headers not URL by @56quarters in https://github.com/grafana/mimir/pull/6915
Distributor: simplify context propagation by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6889
Add project ID option for k6 cloud tests by @chencs in https://github.com/grafana/mimir/pull/6916
Add flag to enable remote rule evaluation in microservices Docker Compose setup by @charleskorn in https://github.com/grafana/mimir/pull/6919
Use correct port in ruler helper script for microservices Docker Compose setup by @charleskorn in https://github.com/grafana/mimir/pull/6918
fix(deps): update golang.org/x/exp digest to f3f8817 (main) by @renovate in https://github.com/grafana/mimir/pull/6867
chore(deps): update anchore/sbom-action action to v0.15.1 (main) by @renovate in https://github.com/grafana/mimir/pull/6877
fix(deps): update module go.opentelemetry.io/collector/pdata to v1.0.0 (main) by @renovate in https://github.com/grafana/mimir/pull/6874
fix(deps): update module github.com/failsafe-go/failsafe-go to v0.4.2 (main) by @renovate in https://github.com/grafana/mimir/pull/6870
fix(deps): update module github.com/go-openapi/strfmt to v0.21.9 (main) by @renovate in https://github.com/grafana/mimir/pull/6871
Update golang Docker tag to v1.21.5 (main) by @renovate in https://github.com/grafana/mimir/pull/6868
Compactor: add function to remove no compact marker by @ying-jeanne in https://github.com/grafana/mimir/pull/6917
Ingester remains in the LEAVING state if starting() terminates by @duricanikolic in https://github.com/grafana/mimir/pull/6923
apply multidimensional query request queuing: supply queue dimensions from frontend & utilize in scheduler by @francoposa in https://github.com/grafana/mimir/pull/6772
Update dskit and fix issue where all incoming HTTP requests have duplicate trace spans by @charleskorn in https://github.com/grafana/mimir/pull/6920
Add trace span for time spent lazily loading index-header in store-gateway by @charleskorn in https://github.com/grafana/mimir/pull/6922
Fix typo in log message by @pracucci in https://github.com/grafana/mimir/pull/6933
e2e client support for cardinality by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6852
Dashboards: add graphs to "Slow queries" by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6880
Helm: release 5.1.4 by @krajorama in https://github.com/grafana/mimir/pull/6865
k6: Split usernames and tenant IDs by read/write by @56quarters in https://github.com/grafana/mimir/pull/6930
ingester: add experimental support for consuming records from kafka by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6929
Merge mimir-distributed-release-5.1 to main by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6936
chore(deps): update grafana/mimirtool docker tag to v2.10.5 (main) by @renovate in https://github.com/grafana/mimir/pull/6945
Update Mimir images to use 2.10.5 by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6937
feat(helm): add pure Ingress option instead of the gateway service by @mcanevet in https://github.com/grafana/mimir/pull/6932
Release mimir-distributed Helm chart 5.2.0-weekly.269 by @grafanabot in https://github.com/grafana/mimir/pull/6949
ingest consumer: handle Push errors by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6940
ingest consumer: more granular error handling, committer sanity check by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6951
Fix context cancellation handling when request is canceled while querying store-gateway by @pracucci in https://github.com/grafana/mimir/pull/6934
Fix .metricsUsed in mimirtool analyze rule-file by @colega in https://github.com/grafana/mimir/pull/6953
Get rid of validation/errors.go by @duricanikolic in https://github.com/grafana/mimir/pull/6955
Adjust renovate.json for 2.11 release by @leizor in https://github.com/grafana/mimir/pull/6796
add additional queue dimensions to query scheduler queue duration histogram by @francoposa in https://github.com/grafana/mimir/pull/6960
Chore: remove unused storepb.StoreServer by @pracucci in https://github.com/grafana/mimir/pull/6958
store-gateway: remove obsolete comments for fine-grained chunks caching by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6892
Slow queries dashboard: unhide parameters, remove length calculation by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6913
Delete docs on removed configuration parameters by @leizor in https://github.com/grafana/mimir/pull/6967
Merge release-2.11 to main by @leizor in https://github.com/grafana/mimir/pull/6963
fix and test label query optional time param parsing for additional queue dimensions by @francoposa in https://github.com/grafana/mimir/pull/6969
Support disabling metric relabeling per tenant by @Logiraptor in https://github.com/grafana/mimir/pull/6970
Improve validation.LimitError usages by @duricanikolic in https://github.com/grafana/mimir/pull/6954
ingest consumer: commit offset on shutdown by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6974
Upgrade mimir-prometheus by @pracucci in https://github.com/grafana/mimir/pull/6975
Mimirtool: add --output-dir to alertmanager get by @edwintye in https://github.com/grafana/mimir/pull/6760
Update module golang.org/x/crypto to v0.17.0 [SECURITY] (main) by @renovate in https://github.com/grafana/mimir/pull/6962
Update go-openapi packages (main) by @renovate in https://github.com/grafana/mimir/pull/6946
fix(deps): update golang.org/x/exp digest to dc181d7 (main) by @renovate in https://github.com/grafana/mimir/pull/6943
Remove PrioritizeRecoverableErr method no longer used by @56quarters in https://github.com/grafana/mimir/pull/6976
Distributor: start remote timeout on first callback by @colega in https://github.com/grafana/mimir/pull/6972
Jsonnet: improve querier HPA to scale up and down more gradually by @pracucci in https://github.com/grafana/mimir/pull/6971
Refactor Distributor.push() by @colega in https://github.com/grafana/mimir/pull/6978
Release mimir-distributed Helm chart 5.2.0-weekly.270 by @grafanabot in https://github.com/grafana/mimir/pull/6998
Update module github.com/twmb/franz-go to v1.15.4 (main) by @renovate in https://github.com/grafana/mimir/pull/6997
Update github.com/thanos-io/objstore digest to 9f421f2 (main) by @renovate in https://github.com/grafana/mimir/pull/6993
Merge release-2.11 to main by @leizor in https://github.com/grafana/mimir/pull/7002
Add 2.11 to backwards compatibility test by @leizor in https://github.com/grafana/mimir/pull/7003
fix(deps): update module github.com/minio/minio-go/v7 to v7.0.66 (main) by @renovate in https://github.com/grafana/mimir/pull/6996
Update module github.com/Azure/azure-sdk-for-go/sdk/storage/azblob to v1.2.1 (main) by @renovate in https://github.com/grafana/mimir/pull/6947
chore(deps): update grafana/grafana docker tag to v10.2.3 (main) by @renovate in https://github.com/grafana/mimir/pull/6994
fix(deps): update module github.com/prometheus/client_golang to v1.18.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7005
fix(deps): update golang.org/x/exp digest to 02704c9 (main) by @renovate in https://github.com/grafana/mimir/pull/7006
"Slow Queries" dashboard: make more fields sortable by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7000
fix(deps): update module github.com/aws/aws-sdk-go to v1.49.11 (main) by @renovate in https://github.com/grafana/mimir/pull/7009
fix(deps): update module go.uber.org/goleak to v1.3.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7010
fix(deps): update module golang.org/x/time to v0.5.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7011
fix(deps): update module sigs.k8s.io/kustomize/kyaml to v0.16.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7012
fix(deps): update module google.golang.org/protobuf to v1.32.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7017
chore(deps): update prom/prometheus docker tag to v2.48.1 (main) by @renovate in https://github.com/grafana/mimir/pull/7007
chore(deps): update registry.k8s.io/kustomize/kustomize docker tag to v5.3.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7008
chore(deps): update actions/setup-go action to v5 (main) by @renovate in https://github.com/grafana/mimir/pull/7020
fix(deps): update module github.com/alecthomas/chroma/v2 to v2.12.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7014
fix(deps): update module github.com/spf13/afero to v1.11.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7018
prometheus codec time param parsing docstrings and emulating upstream prometheus behavior by @francoposa in https://github.com/grafana/mimir/pull/6985
fix(deps): update github.com/thanos-io/objstore digest to 61cfed8 (main) by @renovate in https://github.com/grafana/mimir/pull/7024
mimir-build-image: update to Debian Bookworm by @colega in https://github.com/grafana/mimir/pull/6980
helm: omit backend: s3 when minio is disabled by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6999
Kafka ingestion experiment: add PartitionReader.WaitReadConsistency() by @pracucci in https://github.com/grafana/mimir/pull/6982
mimir-mixin: update github.com/grafana/jsonnet-libs by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7027
slow queries dashboard: rename loki_datasource by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7028
Always validate tenant IDs and introduce max tenants setting by @56quarters in https://github.com/grafana/mimir/pull/6959
Fix panic in TestPartitionOffsetWatcher_Concurrency by @pracucci in https://github.com/grafana/mimir/pull/7029
chore(deps): update prom/memcached-exporter docker tag to v0.14.2 (main) by @renovate in https://github.com/grafana/mimir/pull/7025
mixin: make autoscaling dashboards ignore multiple series from keda metrics by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7001
fix(deps): update module cloud.google.com/go/storage to v1.36.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7013
fix(deps): update module github.com/aws/aws-sdk-go to v1.49.13 (main) by @renovate in https://github.com/grafana/mimir/pull/7035
Slow queries dashboard: Fix loki datasource variable name by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7039
Fix flaky TestLoader_ShouldCacheIndexNotFoundOnBackgroundUpdates by @pracucci in https://github.com/grafana/mimir/pull/7040
Remote querier: use backoff retry only with 5xx errors by @duricanikolic in https://github.com/grafana/mimir/pull/7004
fix(deps): update go-openapi packages (main) by @renovate in https://github.com/grafana/mimir/pull/6995
chore(deps): update alpine/helm docker tag to v3.13.3 (main) by @renovate in https://github.com/grafana/mimir/pull/6944
fix(deps): update module github.com/google/go-github/v32 to v57 (main) by @renovate in https://github.com/grafana/mimir/pull/7038
[query-frontend] Set http request headers from querymiddleware request options by @flxbk in https://github.com/grafana/mimir/pull/7033
Add strong read consistency support in ingester for experimental ingest storage by @pracucci in https://github.com/grafana/mimir/pull/7030
Add 'msteams' to TestMultitenantAlertmanager_FirewallShouldBlockHTTPBasedReceiversWhenEnabled by @pracucci in https://github.com/grafana/mimir/pull/7050
Enable sharding for active_series requests by @flxbk in https://github.com/grafana/mimir/pull/6784
query-frontend: remove duplicate defaulting of instant query time param by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7026
Move owned series check logging/metric inside of updateAllTenants by @pr00se in https://github.com/grafana/mimir/pull/7044
Allow server to return HTTP 4xx errors by @duricanikolic in https://github.com/grafana/mimir/pull/7045
Release mimir-distributed Helm chart 5.2.0-weekly.271 by @grafanabot in https://github.com/grafana/mimir/pull/7049
Fix max_fetched_series_per_query not applied to /series from ingesters by @krajorama in https://github.com/grafana/mimir/pull/7055
Enforce use of context.WithCancelCause in non-test code by @charleskorn in https://github.com/grafana/mimir/pull/6921
fix(deps): update dependency puppeteer to v21.7.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7069
chore(deps): update otel/opentelemetry-collector-contrib docker tag to v0.91.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7067
fix(deps): update module github.com/aws/aws-sdk-go to v1.49.16 (main) by @renovate in https://github.com/grafana/mimir/pull/7064
chore(deps): update anchore/sbom-action action to v0.15.2 (main) by @renovate in https://github.com/grafana/mimir/pull/7063
fix(deps): update golang.org/x/exp digest to be819d1 (main) by @renovate in https://github.com/grafana/mimir/pull/7062
fix(deps): update module golang.org/x/sync to v0.6.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7070
Release mimir-distributed Helm chart 5.2.0-weekly.272 by @grafanabot in https://github.com/grafana/mimir/pull/7072
chore(deps): update actions/download-artifact action to v4 (main) by @renovate in https://github.com/grafana/mimir/pull/7019
new Mimir / Reads dashboard panel for scheduler queue duration, breakout by additional queue dimensions by @francoposa in https://github.com/grafana/mimir/pull/7059
Remove default CPU limit for rollout-operator pod by @charleskorn in https://github.com/grafana/mimir/pull/7066
Revert #6922 by @charleskorn in https://github.com/grafana/mimir/pull/7065
fix(deps): update module google.golang.org/grpc to v1.60.1 (main) by @renovate in https://github.com/grafana/mimir/pull/7016
Prepare CHANGELOG.md for next helm release by @flxbk in https://github.com/grafana/mimir/pull/7079
Update dskit to latest main by @56quarters in https://github.com/grafana/mimir/pull/7076
Allow S3 Storage Provider to support sts_endpoint by @TimKotowski in https://github.com/grafana/mimir/pull/6990
Increase default Jaeger queue size for store-gateways and queriers by @charleskorn in https://github.com/grafana/mimir/pull/7068
Fix formatting in querier component doc by @jhalterman in https://github.com/grafana/mimir/pull/7078
docs: update link to Grafana Agent Operator CRDs by @rfratto in https://github.com/grafana/mimir/pull/7084
Helm chart release 5.2.0-rc.0 by @flxbk in https://github.com/grafana/mimir/pull/7093
ingest: include gRPC status code validation only when running ingester by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7095
ingest: Add dev environment by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7096
Increase default Jaeger queue size for some components and make queue size for all components configurable in Helm chart by @charleskorn in https://github.com/grafana/mimir/pull/7086
Mimir / Reads dashboard query scheduler queue duration breakout improvements by @francoposa in https://github.com/grafana/mimir/pull/7098
Update dskit by @pstibrany in https://github.com/grafana/mimir/pull/7092
Handle context.Canceled in active series requests by @flxbk in https://github.com/grafana/mimir/pull/7102
Small fixes for /active_series by @flxbk in https://github.com/grafana/mimir/pull/7106
Repurpose client within tools/copyblocks to add undelete-blocks and copyprefix by @andyasp in https://github.com/grafana/mimir/pull/6607
ingest storage: per-query X-Read-Consistency HTTP header by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7091
Update dskit by @pr00se in https://github.com/grafana/mimir/pull/7109
scheduler queue benchmark: factor out & clarify functions by @francoposa in https://github.com/grafana/mimir/pull/7101
Update dskit to latest main by @56quarters in https://github.com/grafana/mimir/pull/7111
Update make docs procedure and add workflow to keep it up to date by @jdbaldry in https://github.com/grafana/mimir/pull/5794
querier: remove noop if by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7108
Add additional querier metrics by @jhalterman in https://github.com/grafana/mimir/pull/7099
Support active count_method in label names cardinality by @Logiraptor in https://github.com/grafana/mimir/pull/7085
Add query-scheduler queue length to Reads and Remote Ruler Reads dashboards by @charleskorn in https://github.com/grafana/mimir/pull/7088
fix(deps): update github.com/grafana/dskit digest to f30e65d (main) by @renovate in https://github.com/grafana/mimir/pull/7114
chore(deps): update anchore/sbom-action action to v0.15.3 (main) by @renovate in https://github.com/grafana/mimir/pull/7116
fix(deps): update golang.org/x/exp digest to db7319d (main) by @renovate in https://github.com/grafana/mimir/pull/7115
chore(deps): update helm release grafana-agent-operator to v0.3.14 (main) by @renovate in https://github.com/grafana/mimir/pull/7118
chore(deps): update golang docker tag to v1.21.6 (main) by @renovate in https://github.com/grafana/mimir/pull/7117
fix(deps): update module github.com/aws/aws-sdk-go to v1.49.21 (main) by @renovate in https://github.com/grafana/mimir/pull/7121
chore(deps): update memcached docker tag to v1.6.23 (main) by @renovate in https://github.com/grafana/mimir/pull/7120
Update vendored mimir-prometheus by @pracucci in https://github.com/grafana/mimir/pull/7057
chore(deps): update grafana/agent docker tag to v0.39.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7123
fix(deps): update module go.opentelemetry.io/collector/pdata to v1.0.1 (main) by @renovate in https://github.com/grafana/mimir/pull/7122
fix(deps): update module google.golang.org/api to v0.156.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7015
Helm chart release 5.2.0 by @flxbk in https://github.com/grafana/mimir/pull/7089
Release mimir-distributed Helm chart 5.2.0-weekly.273 by @grafanabot in https://github.com/grafana/mimir/pull/7124
Fix missing "annotation" key in create-bucket-job template of helm chart by @narqo in https://github.com/grafana/mimir/pull/7113
Upgrade rollout-operator to v0.10.1 by @pracucci in https://github.com/grafana/mimir/pull/7125
Change -query-frontend.not-running-timeout default value to 2s and mark it as advanced by @pracucci in https://github.com/grafana/mimir/pull/7127
Use 2K reusable ingester push workers by default by @colega in https://github.com/grafana/mimir/pull/7128
Set -server.grpc.num-workers=100 by default by @colega in https://github.com/grafana/mimir/pull/7131
mimirpb: add DeepCopyTimeseries extra option to make a deep copy hi… by @ortuman in https://github.com/grafana/mimir/pull/7130
Jsonnet / Helm: reduce likelihood of queries hitting terminated query-frontends by @pracucci in https://github.com/grafana/mimir/pull/7129
Merge mimir-distributed-release-5.2 to main by @flxbk in https://github.com/grafana/mimir/pull/7097
Make TestBlocksStoreQuerier_ShouldReturnContextCanceledIfContextWasCanceledWhileRunningRequestOnStoreGateway easier to debug when flaky by @pracucci in https://github.com/grafana/mimir/pull/7134
Documentation: Document ingester read path limiting by @aknuds1 in https://github.com/grafana/mimir/pull/7110
Reduce sync concurrency in store-gateway by default to reduce disk contention by @andyasp in https://github.com/grafana/mimir/pull/7136
Perform initial owned series calculation before starting lifecycler by @pr00se in https://github.com/grafana/mimir/pull/7087
Update vendored mimir-prometheus by @pracucci in https://github.com/grafana/mimir/pull/7138
Chore: simplify runAsync() and runAsyncAfter() usage in ingest storage tests by @pracucci in https://github.com/grafana/mimir/pull/7147
Ingest storage: configure BrokerMaxReadBytes on Kafka reader by @pracucci in https://github.com/grafana/mimir/pull/7148
Report gRPC status codes as labels in request duration metrics by @duricanikolic in https://github.com/grafana/mimir/pull/7144
Ingest storage local dev env: configure Mimir components to start after Kafka is healthy by @pracucci in https://github.com/grafana/mimir/pull/7153
remove non-ascii characters from invalid metric names in error message by @replay in https://github.com/grafana/mimir/pull/7146
/active_series: generate correct request shards for incoming GET requests, handle gRPC errors by @flxbk in https://github.com/grafana/mimir/pull/7133
Update to latest dskit for cache changes by @56quarters in https://github.com/grafana/mimir/pull/7155
[BUGFIX] Update nginx deployment to mount nginx.conf as file by @blut in https://github.com/grafana/mimir/pull/7150
Update opentelemetry-go monorepo to v1.22.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7071
Jsonnet / Helm: relax the hash ring heartbeat period and timeout for distributor, ingester, store-gateway and compactor by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6860
mimir-mixin: drop unsupported step target parameter by @krajorama in https://github.com/grafana/mimir/pull/7157
Update values.yaml documentation and example by @benoitschipper in https://github.com/grafana/mimir/pull/7135
Make ingesters return gRPC errors only by @duricanikolic in https://github.com/grafana/mimir/pull/7151
Update RELEASE.md with tentative 2024 release schedule by @duricanikolic in https://github.com/grafana/mimir/pull/7159
Rename metric to cortex_querier_query_storegateway_chunks_total by @jhalterman in https://github.com/grafana/mimir/pull/7145
CI: use a single cache for tests by @bboreham in https://github.com/grafana/mimir/pull/5266
Set distributor.remote-timeout=10s for rulers in jsonnet/helm by @leizor in https://github.com/grafana/mimir/pull/7143
Add integration test using ingest storage by @pracucci in https://github.com/grafana/mimir/pull/7160
fix additional queue dimensions unsupported request logging by @francoposa in https://github.com/grafana/mimir/pull/7164
mimir ruler reads dashboard query scheduler queue duration breakout improvements by @francoposa in https://github.com/grafana/mimir/pull/7163
mimir ruler reads dashboard query scheduler row - fix job matchers by @francoposa in https://github.com/grafana/mimir/pull/7166
Update MAINTAINERS.md by @derek-cadzow in https://github.com/grafana/mimir/pull/7077
Update make docs procedure by @github-actions in https://github.com/grafana/mimir/pull/7167
Ingester: mark spread-minimizing tokens-related CLI flags as advanced by @duricanikolic in https://github.com/grafana/mimir/pull/7169
Replace servicediscovery package with dskit's by @npazosmendez in https://github.com/grafana/mimir/pull/7171
Changelog: rephrase the changes related to STS endpoint configuration by @narqo in https://github.com/grafana/mimir/pull/7170
Alertmanager: Support uploading Grafana Alertmanager Configuration an… by @gotjosh in https://github.com/grafana/mimir/pull/6682
Changed -query-frontend.max-cache-freshness default value from 1m to 10m by @pracucci in https://github.com/grafana/mimir/pull/7161
Enable distributor write requests buffer pooling by default by @andyasp in https://github.com/grafana/mimir/pull/7165
Mimirtool: add the ability to include extra headers with requests made to the server by @snowzach in https://github.com/grafana/mimir/pull/7141
Fix panic as a result of duplicate write request cleanup by @Logiraptor in https://github.com/grafana/mimir/pull/7176
Write query parameters to span attributes instead of events by @chencs in https://github.com/grafana/mimir/pull/7046
Update golang.org/x/exp digest to 1b97071 (main) by @renovate in https://github.com/grafana/mimir/pull/7179
Update grafana/agent Docker tag to v0.39.1 (main) by @renovate in https://github.com/grafana/mimir/pull/7183
Update Helm release grafana-agent-operator to v0.3.15 (main) by @renovate in https://github.com/grafana/mimir/pull/7180
Update anchore/sbom-action action to v0.15.4 (main) by @renovate in https://github.com/grafana/mimir/pull/7181
Update github.com/thanos-io/objstore digest to 6ecabdd (main) by @renovate in https://github.com/grafana/mimir/pull/7178
Update Helm release rollout-operator to v0.13.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7184
Update alpine/helm Docker tag to v3.14.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7185
Update module github.com/aws/aws-sdk-go to v1.50.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7186
Update module github.com/prometheus/common to v0.46.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7187
Increase log level for problems creating index header readers by @56quarters in https://github.com/grafana/mimir/pull/7177
Add conditional read-after–write support to rules evaluation by @pracucci in https://github.com/grafana/mimir/pull/7142
querier: don't log errors on stream connection reset by @narqo in https://github.com/grafana/mimir/pull/7168
Documented ingeter migration to spread-minimizing tokens by @duricanikolic in https://github.com/grafana/mimir/pull/7174
Release mimir-distributed Helm chart 5.2.0-weekly.274 by @grafanabot in https://github.com/grafana/mimir/pull/7191
Update make docs procedure by @github-actions in https://github.com/grafana/mimir/pull/7197
Alertmanager: Update to latest main by @grobinson-grafana in https://github.com/grafana/mimir/pull/7103
Improve read consistency observability by @pracucci in https://github.com/grafana/mimir/pull/7193
Revert "helm: omit backend: s3 when minio is disabled" by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7199
[mimir-distributed-release-5.2] Revert "helm: omit backend: s3 when minio is disabled" by @grafanabot in https://github.com/grafana/mimir/pull/7201
Slow queries dashboard: bug fixes and a new panel by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7194
Helm: release 5.2.1 by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7202
Merge mimir-distributed-release-5.2 to main by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7203
Jsonnet: allow to configure ruler-querier max skew separately from querier by @pracucci in https://github.com/grafana/mimir/pull/7204
query scheduler test for multidimensional queueing effectiveness scenarios by @francoposa in https://github.com/grafana/mimir/pull/7162
Fix missing PR number from CHANGELOG.md by @grobinson-grafana in https://github.com/grafana/mimir/pull/7208
downgrade integration test certificate elliptic curve algo for boringcrypto FIPS-only compatibility by @francoposa in https://github.com/grafana/mimir/pull/7206
Fix 'op' label values for cortex_query_frontend_queries_total metric by @pracucci in https://github.com/grafana/mimir/pull/7207
distributor: make unit tests more flexible by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7205
Alertmanager: Support UTF-8 by @grobinson-grafana in https://github.com/grafana/mimir/pull/6898
Chore: remove superfluous errors.Cause() usage in pkg/storage by @pracucci in https://github.com/grafana/mimir/pull/7212
Remove errors.Cause() usage from store-gateway and its clients by @pracucci in https://github.com/grafana/mimir/pull/7213
Remove errors.Cause() usage from cardinality analysis handler by @pracucci in https://github.com/grafana/mimir/pull/7214
distributor: use pointers to mockIngester in tests by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7223
Do not use grpc/status.FromError() and gogo/status.FromError() by @pracucci in https://github.com/grafana/mimir/pull/7224
Alertmanager: Support UTF-8 characters in matchers and labels small followup by @gotjosh in https://github.com/grafana/mimir/pull/7221
Add querier metric for block source and compaction level by @jhalterman in https://github.com/grafana/mimir/pull/7112
Ensure we never use errors.Cause() by @pracucci in https://github.com/grafana/mimir/pull/7215
Add Dockerfile for tools/copyblocks by @pstibrany in https://github.com/grafana/mimir/pull/7211
Extensible "Rollout progress" workload grouping by @colega in https://github.com/grafana/mimir/pull/7228
Enable Nginx Proxy HTTP/1.1 in Mimir for Istio Envoy Sidecar Compatibility by @rafilkmp3 in https://github.com/grafana/mimir/pull/5040
ruler: add more details to failures for TestRulerEvaluationDelay by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7217
Fix s3 upload performance regression by @pracucci in https://github.com/grafana/mimir/pull/7241
Update alpine Docker tag to v3.19.1 (main) by @renovate in https://github.com/grafana/mimir/pull/7238
chore(deps): update anchore/sbom-action action to v0.15.5 (main) by @renovate in https://github.com/grafana/mimir/pull/7239
fix(deps): update module github.com/aws/aws-sdk-go to v1.50.5 (main) by @renovate in https://github.com/grafana/mimir/pull/7242
fix(deps): update module github.com/go-openapi/swag to v0.22.9 (main) by @renovate in https://github.com/grafana/mimir/pull/7243
Update mimir-prometheus to c2a5e159 by @krajorama in https://github.com/grafana/mimir/pull/7219
Update dskit to latest main and implements LogSourceIPsFull flag by @bubu11e in https://github.com/grafana/mimir/pull/7250
Update mimir-prometheus by @pracucci in https://github.com/grafana/mimir/pull/7251
Native histogram: validate schema number before reducing resolution by @duricanikolic in https://github.com/grafana/mimir/pull/7252
Update Alertmanager to latest main by @grobinson-grafana in https://github.com/grafana/mimir/pull/7254
Update dskit to latest main by @56quarters in https://github.com/grafana/mimir/pull/7258
Increase distributor jaeger max queue size by @jhesketh in https://github.com/grafana/mimir/pull/7259
Reuse Histogram objects in tools/tsdb-chunks and tools/tsdb-print-chunk by @duricanikolic in https://github.com/grafana/mimir/pull/7260
Update about-grafana-mimir-architecture by @narqo in https://github.com/grafana/mimir/pull/7234
Include out-or-order label in blocks queried metric by @56quarters in https://github.com/grafana/mimir/pull/7262
Add utf8-strict-mode flag for mimirtool by @grobinson-grafana in https://github.com/grafana/mimir/pull/7227
rulestore/local: add support for GetRuleGroup endpoint by @narqo in https://github.com/grafana/mimir/pull/7248
Add out of order field to block index by @jhalterman in https://github.com/grafana/mimir/pull/7267
Add gitignore files for mimir tools by @jhalterman in https://github.com/grafana/mimir/pull/7271
frontend: improve shardActiveSeriesMiddleware performance when merging responses by @ortuman in https://github.com/grafana/mimir/pull/7261
Native histograms: optimize chunk iterator usage by @duricanikolic in https://github.com/grafana/mimir/pull/7274
Release mimir-distributed Helm chart 5.2.0-weekly.275 by @grafanabot in https://github.com/grafana/mimir/pull/7245
helm: align grpc server connection lifetime settings with jsonnet by @narqo in https://github.com/grafana/mimir/pull/7269
querymiddleware: Fix race condition in shardActiveSeriesMiddleware by @narqo in https://github.com/grafana/mimir/pull/7290
ruler: build user-agent header at runtime by @narqo in https://github.com/grafana/mimir/pull/7264
helm: sync server.grpc.keepalive.max-connection-idle with jsonnet by @narqo in https://github.com/grafana/mimir/pull/7298
Compactor: export estimated number of compaction jobs based on bucket-index by @pstibrany in https://github.com/grafana/mimir/pull/7299
Add KubePersistentVolumeFillingUp runbook by @pracucci in https://github.com/grafana/mimir/pull/7297
Internal: remove unnecessary parameter to NoCompactionMarkFilter by @pstibrany in https://github.com/grafana/mimir/pull/7301
Name query metrics for easier discovery by @56quarters in https://github.com/grafana/mimir/pull/7302
fix(deps): update module github.com/aws/aws-sdk-go to v1.50.11 (main) by @renovate in https://github.com/grafana/mimir/pull/7288
fix(deps): update module github.com/klauspost/compress to v1.17.6 (main) by @renovate in https://github.com/grafana/mimir/pull/7291
chore(deps): update anchore/sbom-action action to v0.15.8 (main) by @renovate in https://github.com/grafana/mimir/pull/7286
chore(deps): update grafana/agent docker tag to v0.39.2 (main) by @renovate in https://github.com/grafana/mimir/pull/7287
chore(deps): update grafana/grafana docker tag to v10.3.1 (main) by @renovate in https://github.com/grafana/mimir/pull/7292
fix(deps): update module github.com/failsafe-go/failsafe-go to v0.4.4 (main) by @renovate in https://github.com/grafana/mimir/pull/7289
Chore: removed unused parameter from GenerateBlockFromSpec() by @pracucci in https://github.com/grafana/mimir/pull/7303
Update mimir-prometheus by @pracucci in https://github.com/grafana/mimir/pull/7293
Release mimir-distributed Helm chart 5.3.0-weekly.276 by @grafanabot in https://github.com/grafana/mimir/pull/7294
Open circuit breakers on timeouts and per-instance limit errors only by @duricanikolic in https://github.com/grafana/mimir/pull/7310
Get rid of iterators.chunkIterator and iterators.chunkMergeIterator by @duricanikolic in https://github.com/grafana/mimir/pull/7313
Compactor: Language fixes by @aknuds1 in https://github.com/grafana/mimir/pull/7315
Do not register compat metrics in mimirtool by @grobinson-grafana in https://github.com/grafana/mimir/pull/7314
Compactor: Un-export symbols that don't need to be exported by @aknuds1 in https://github.com/grafana/mimir/pull/7317
Circuit breakers: add client.ErrCircuitBreakerOpen type by @duricanikolic in https://github.com/grafana/mimir/pull/7324
Add mimirpb.CIRCUIT_BREAKER_OPEN error cause by @duricanikolic in https://github.com/grafana/mimir/pull/7330
store-gateway: remove cortex_bucket_store_blocks_loaded_by_duration by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7309
ruler: don't retry on non-retriable error by @narqo in https://github.com/grafana/mimir/pull/7216
Update Alertmanager to f69a508 by @grobinson-grafana in https://github.com/grafana/mimir/pull/7332
Helm: add ruler specific service account by @QuantumEnigmaa in https://github.com/grafana/mimir/pull/7132
frontend/transport: Log non-2xx replies from downstream as unsuccesful by @narqo in https://github.com/grafana/mimir/pull/7296
querymiddleware: Pool snappy writer in shard activity series by @narqo in https://github.com/grafana/mimir/pull/7308
Helm: make PSP configurable by @QuantumEnigmaa in https://github.com/grafana/mimir/pull/7190
Helm - Templatable host for gateway ingress/route by @Itaykal in https://github.com/grafana/mimir/pull/7218
[Docs] Update migrate-from-single-zone-with-helm.md by @eamonryan in https://github.com/grafana/mimir/pull/7327
Always sort labels in distributors by @Logiraptor in https://github.com/grafana/mimir/pull/7326
Do not check for ingester ring state before creating TSDB, or compacting / shipping blocks by @pracucci in https://github.com/grafana/mimir/pull/7322
Compactor: String format compaction plan as comma separated blocks by @aknuds1 in https://github.com/grafana/mimir/pull/7321
Add a lifetime manager for Vault authentication tokens by @fayzal-g in https://github.com/grafana/mimir/pull/7337
Update github.com/grafana/dskit digest to f245b48 (main) by @renovate in https://github.com/grafana/mimir/pull/7283
Packaging: remove reload from systemd file as mimir does not take into account SIGHUP by @wilfriedroset in https://github.com/grafana/mimir/pull/7345
Docs: No longer mark OTLP endpoint as experimental by @aknuds1 in https://github.com/grafana/mimir/pull/7348
Update golang.org/x/exp digest to 2c58cdc (main) by @renovate in https://github.com/grafana/mimir/pull/7352
Update module github.com/aws/aws-sdk-go to v1.50.15 (main) by @renovate in https://github.com/grafana/mimir/pull/7353
Update module github.com/minio/minio-go/v7 to v7.0.67 (main) by @renovate in https://github.com/grafana/mimir/pull/7354
Update dependency puppeteer to v21.11.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7355
Update helm/kind-action action to v1.9.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7357
Update module cloud.google.com/go/storage to v1.37.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7358
Jsonnet / Helm: improve distributors graceful shutdown by @pracucci in https://github.com/grafana/mimir/pull/7361
Release mimir-distributed Helm chart 5.3.0-weekly.277 by @grafanabot in https://github.com/grafana/mimir/pull/7362
Distributor: Make -distributor.enable-otlp-metadata-storage flag default to true, and deprecate by @aknuds1 in https://github.com/grafana/mimir/pull/7366
Mark -ingester.limit-inflight-requests-using-grpc-method-limiter and -distributor.limit-inflight-requests-using-grpc-method-limiter as stable and enable it by default by @pracucci in https://github.com/grafana/mimir/pull/7360
Do not consider out-of-order blocks when filtering compactable jobs by @jhalterman in https://github.com/grafana/mimir/pull/7342
mimir: Inject span profiler into tracer by @narqo in https://github.com/grafana/mimir/pull/7363
Add experimental partitions ring lifecycler support by @pracucci in https://github.com/grafana/mimir/pull/7349
feat(helm): Adding KEDA autoscaling support by @beatkind in https://github.com/grafana/mimir/pull/7282
Add timely head compaction support by @jhalterman in https://github.com/grafana/mimir/pull/7372
Fix type error in autoscaling jsonnet by @56quarters in https://github.com/grafana/mimir/pull/7374
Become temporary maintainer of docs until another technical writer is found by @jdbaldry in https://github.com/grafana/mimir/pull/7378
vendor: Bump grpc-go to 1.61 latest by @narqo in https://github.com/grafana/mimir/pull/7380
Add page for displaying compaction jobs computed from bucket-index. by @pstibrany in https://github.com/grafana/mimir/pull/7381
query-tee: override correct Host attribute by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7386
Use partitions ring in write path and ingesters consumption by @pracucci in https://github.com/grafana/mimir/pull/7376
Fix TestPartitionReader_WaitReadConsistency flakyness by @pracucci in https://github.com/grafana/mimir/pull/7391
mimirtool: Add new migrate-utf8 command by @grobinson-grafana in https://github.com/grafana/mimir/pull/7383
active series: abort streaming on context cancelled by @flxbk in https://github.com/grafana/mimir/pull/7387
Introduce new metric to track OTLP requests by @fayzal-g in https://github.com/grafana/mimir/pull/7385
Alertmanager: Update Alertmanager to commit 80b3cb0 by @grobinson-grafana in https://github.com/grafana/mimir/pull/7384
prepare-partition-downscale: Switch to Active state only if partition is currently Inactive. by @pstibrany in https://github.com/grafana/mimir/pull/7394
Add partitions ring support to Distributor.QueryStream() by @pracucci in https://github.com/grafana/mimir/pull/7388
Update Vault token renewal test to use e2e.GreaterOrEqual by @fayzal-g in https://github.com/grafana/mimir/pull/7398
ingester: Make usagestats vars package-global by @pstibrany in https://github.com/grafana/mimir/pull/7395
Make active series benchmark more realistic, improve performance of context cancellation check. by @flxbk in https://github.com/grafana/mimir/pull/7396
Query-frontend activity log enhancements by @pstibrany in https://github.com/grafana/mimir/pull/7400
Add partitions ring support to more Distributor querying functions by @pracucci in https://github.com/grafana/mimir/pull/7393
Add partitions ring support to Distributor.UserStats() by @pracucci in https://github.com/grafana/mimir/pull/7402
Improve compaction of sporadic blocks by @jhalterman in https://github.com/grafana/mimir/pull/7329
Add new command on mimirtool used to render template by @ncharaf in https://github.com/grafana/mimir/pull/7325
Clean up a few changelog entries by @56quarters in https://github.com/grafana/mimir/pull/7408
[Pull Request Template] change file path by @ncharaf in https://github.com/grafana/mimir/pull/7410
Update module github.com/aws/aws-sdk-go to v1.50.20 (main) by @renovate in https://github.com/grafana/mimir/pull/7419
Update grafana/grafana Docker tag to v10.3.3 (main) by @renovate in https://github.com/grafana/mimir/pull/7418
feat: Adding global kedaAutoscaling section by @beatkind in https://github.com/grafana/mimir/pull/7392
added the right pointer for OTLP endpoint by @zhehao-grafana in https://github.com/grafana/mimir/pull/7375
fix(deps): update github.com/grafana/dskit digest to ce15a83 (main) by @renovate in https://github.com/grafana/mimir/pull/7415
Replace all blockquote admonitions with the shortcode by @jdbaldry in https://github.com/grafana/mimir/pull/7420
Add partitions support to Distributor.ActiveSeries() by @pracucci in https://github.com/grafana/mimir/pull/7404
Note how to run Mimir Alertmanager as part of a monolithic deployment by @Rajakavitha1 in https://github.com/grafana/mimir/pull/7414
Added partitions support to Distributor.LabelValuesCardinality() by @pracucci in https://github.com/grafana/mimir/pull/7423
Release mimir-distributed Helm chart 5.3.0-weekly.278 by @grafanabot in https://github.com/grafana/mimir/pull/7429
Update dskit by @pracucci in https://github.com/grafana/mimir/pull/7430
Convert mimir-mixin from deprecated graph panel type to timeseries by @narqo in https://github.com/grafana/mimir/pull/7413
Remove unused logger argument by @56quarters in https://github.com/grafana/mimir/pull/7438
Update dskit by @pr00se in https://github.com/grafana/mimir/pull/7439
Require ingester zone to be configured when running ingest storage by @pracucci in https://github.com/grafana/mimir/pull/7432
Revert "Require ingester zone to be configured when running ingest storage" by @pracucci in https://github.com/grafana/mimir/pull/7443
Fix premature context cancellation in Distributor.QueryStream() when experimental ingest storage is enabled by @pracucci in https://github.com/grafana/mimir/pull/7437
Switch ingester limiter to use ingesterRing rather than lifecycler by @pr00se in https://github.com/grafana/mimir/pull/7440
Set series cache TTL different based on block metadata by @56quarters in https://github.com/grafana/mimir/pull/7407
storegateway: Reduce number of overlapping iterator interfaces into single iterator[S] by @narqo in https://github.com/grafana/mimir/pull/7451
fix(deps): update github.com/thanos-io/objstore digest to a8d75c5 (main) by @renovate in https://github.com/grafana/mimir/pull/7351
Modify ingester's limiter to work with ingest store. by @pstibrany in https://github.com/grafana/mimir/pull/7424
Add dashboard UI check to all dashboards using uid based on filename hash by @pstibrany in https://github.com/grafana/mimir/pull/7446
Add optional querier response streaming by @flxbk in https://github.com/grafana/mimir/pull/7173
query-tee: use X-Scope-OrgID for basic auth by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7452
mimir-mixin: Fix "unexpected type object, expected array" by @narqo in https://github.com/grafana/mimir/pull/7457
Update dskit to latest commit by @56quarters in https://github.com/grafana/mimir/pull/7458
chore(deps): update grafana/doc-validator docker tag to v4.1.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7463
fix(deps): update module github.com/klauspost/compress to v1.17.7 (main) by @renovate in https://github.com/grafana/mimir/pull/7462
fix(deps): update module github.com/aws/aws-sdk-go to v1.50.25 (main) by @renovate in https://github.com/grafana/mimir/pull/7460
chore(deps): update prom/prometheus docker tag to v2.50.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7465
chore(deps): update otel/opentelemetry-collector-contrib docker tag to v0.95.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7464
fix(deps): update golang.org/x/exp digest to 814bf88 (main) by @renovate in https://github.com/grafana/mimir/pull/7416
Revert "fix(deps): update github.com/thanos-io/objstore digest to a8d75c5 (main)" by @flxbk in https://github.com/grafana/mimir/pull/7459
fix(deps): update module cloud.google.com/go/storage to v1.38.0 (main) by @renovate in https://github.com/grafana/mimir/pull/7467
Add estimated compaction jobs based on bucket-index to Mimir dashboards by @pstibrany in https://github.com/grafana/mimir/pull/7449
Jsonnet / Helm: upgrade rollout-operator to v0.13.0 by @pracucci in https://github.com/grafana/mimir/pull/7469
Add more buckets to cortex_query_scheduler_queue_duration_seconds by @pracucci in https://github.com/grafana/mimir/pull/7470
Support KEDA's ignoreNullValues field. by @pstibrany in https://github.com/grafana/mimir/pull/7471
Chunk iterators: Optimize histogram memory allocations by @duricanikolic in https://github.com/grafana/mimir/pull/7427
Update alpine/helm Docker tag to v3.14.1 (main) by @renovate in https://github.com/grafana/mimir/pull/7417
[Build] Add and Use distroless image in e2e tests by @ying-jeanne in https://github.com/grafana/mimir/pull/7371
Fix regression that caused client errors to be tracked in cortex_ruler_write_requests_failed_total by @pracucci in https://github.com/grafana/mimir/pull/7472
Add cardinality API qps to Overview dashboard by @flxbk in https://github.com/grafana/mimir/pull/6720
Release mimir-distributed Helm chart 5.3.0-weekly.279 by @grafanabot in https://github.com/grafana/mimir/pull/7476
Upgrade to latest mimir-prometheus@main by @aknuds1 in https://github.com/grafana/mimir/pull/7475
Helm / Gateway: Allow to configure whether or not NGINX binds IPv6 by @wilfriedroset in https://github.com/grafana/mimir/pull/7421
Update CODEOWNERS by @grobinson-grafana in https://github.com/grafana/mimir/pull/7490
Update dskit by @pracucci in https://github.com/grafana/mimir/pull/7492
fix: Update KEDA related information by @JorTurFer in https://github.com/grafana/mimir/pull/7480
Better explain Estimated Compaction Jobs in compactor dashboard. by @pstibrany in https://github.com/grafana/mimir/pull/7481
Remove docs writer from maintaining proposals by @jdbaldry in https://github.com/grafana/mimir/pull/7493
fix(deps): update github.com/thanos-io/objstore digest to c3ccc5d (main) by @renovate in https://github.com/grafana/mimir/pull/7466
mixin: add extra alert options by @jmichalek132 in https://github.com/grafana/mimir/pull/7442
changelog: Add a note on updating all panels to "timeseries" in mixin by @narqo in https://github.com/grafana/mimir/pull/7502
Revert "fix(deps): update github.com/thanos-io/objstore digest to c3ccc5d (main)" by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7505
Querier: deprecates max-query-into-future by @wilfriedroset in https://github.com/grafana/mimir/pull/7496
Wait for owned series recomputation before lowering local series limit by @pr00se in https://github.com/grafana/mimir/pull/7411
Docs: Explain why types of caches should be separate by @56quarters in https://github.com/grafana/mimir/pull/7498
Update dskit to latest commit by @56quarters in https://github.com/grafana/mimir/pull/7510
Drop useless recording rules for non existent cache metrics by @krajorama in https://github.com/grafana/mimir/pull/7514
querier: fix remote read error translation by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7487
Deprecate metric cortex_distributor_sample_delay_seconds by @krajorama in https://github.com/grafana/mimir/pull/7516
Remove the deprecated frontend.cache_unaligned_requests YAML config by @56quarters in https://github.com/grafana/mimir/pull/7519
helm: Fix siToBytes to work with power-of-ten suffixes by @narqo in https://github.com/grafana/mimir/pull/7506
mixin: native histogram recording rule: cortex_request_duration_seconds by @krajorama in https://github.com/grafana/mimir/pull/7528
Remove deprecated store-gateway index-header settings. by @56quarters in https://github.com/grafana/mimir/pull/7521
[mimir-mixin] Add owned series to tenant dashboard by @pr00se in https://github.com/grafana/mimir/pull/7511
Release mimir-distributed Helm chart 5.3.0-weekly.280 by @grafanabot in https://github.com/grafana/mimir/pull/7527
Jsonnet: Don't explicitly set default values for index-header by @56quarters in https://github.com/grafana/mimir/pull/7532
Make persistence of sparse headers default behavior by @56quarters in https://github.com/grafana/mimir/pull/7535
Block upload: include converted timestamp in the error message by @pstibrany in https://github.com/grafana/mimir/pull/7538
Upgrade mimir-prometheus to latest main by @aknuds1 in https://github.com/grafana/mimir/pull/7541
Initialize Tanka with 1.29. by @pstibrany in https://github.com/grafana/mimir/pull/7544
Set TTL for cached postings based on block metadata by @56quarters in https://github.com/grafana/mimir/pull/7534
Querier: improve merging of batchStream objects by @duricanikolic in https://github.com/grafana/mimir/pull/7478
Update module google.golang.org/protobuf to v1.33.0 [SECURITY] (main) by @renovate in https://github.com/grafana/mimir/pull/7549
Update grafana/doc-validator Docker tag to v4.1.1 (main) by @renovate in https://github.com/grafana/mimir/pull/7524
chore(deps): update helm release grafana-agent-operator to v0.3.17 (main) by @renovate in https://github.com/grafana/mimir/pull/7525
chore(deps): update alpine/helm docker tag to v3.14.2 (main) by @renovate in https://github.com/grafana/mimir/pull/7523
chore(deps): update memcached docker tag to v1.6.24 (main) by @renovate in https://github.com/grafana/mimir/pull/7526
Remove index_header_lazy_loading_enabled docker-compose config by @flxbk in https://github.com/grafana/mimir/pull/7551
ruler: fix native histogram recording rule result corruption by @krajorama in https://github.com/grafana/mimir/pull/7552
[query-frontend] Set write deadline for /active_series requests by @flxbk in https://github.com/grafana/mimir/pull/7553
chore(deps): update golang docker tag to v1.21.8 (main) by @pr00se in https://github.com/grafana/mimir/pull/7557
Make owned series service in ingester work with partitions ring. by @pstibrany in https://github.com/grafana/mimir/pull/7508
Add Casie as maintainer by @pracucci in https://github.com/grafana/mimir/pull/7560
Update requirements doc with correct frontend parameter by @aldernero in https://github.com/grafana/mimir/pull/7566
[Helm] Release 5.2.2 by @Logiraptor in https://github.com/grafana/mimir/pull/7555
[query-frontend] /active_series: cancel request context when write deadline is reached by @flxbk in https://github.com/grafana/mimir/pull/7569
chore(deps): update module github.com/go-jose/go-jose/v3 to v3.0.3 [security] (main) by @renovate in https://github.com/grafana/mimir/pull/7571
Merge mimir-distributed-release-5.2 to main by @Logiraptor in https://github.com/grafana/mimir/pull/7568
[release-2.12] Preparing release notes for 2.12 by @grafanabot in https://github.com/grafana/mimir/pull/7600
Update the release version in CHANGELOG.md and VERSION by @duricanikolic in https://github.com/grafana/mimir/pull/7599
Run make mixin-screenshots for release 2.12 by @duricanikolic in https://github.com/grafana/mimir/pull/7602
Move -querier.minimize-ingester-requests from experimental to advanced by @duricanikolic in https://github.com/grafana/mimir/pull/7649
[release-2.12] Get rid of -querier.prefer-streaming-chunks-from-ingesters by @grafanabot in https://github.com/grafana/mimir/pull/7661
[query-frontend] Close response body in request handler (#7654) by @flxbk in https://github.com/grafana/mimir/pull/7663
Update VERSION for release 2.12.0-rc.1 by @duricanikolic in https://github.com/grafana/mimir/pull/7671
Prepare 2.12 final release by @pr00se in https://github.com/grafana/mimir/pull/7781
Update screenshots for 2.12 by @pr00se in https://github.com/grafana/mimir/pull/7796

New Contributors

@ivanfavi made their first contribution in https://github.com/grafana/mimir/pull/6106
@lukas-unity made their first contribution in https://github.com/grafana/mimir/pull/6715
@callmehyde made their first contribution in https://github.com/grafana/mimir/pull/6793
@mcanevet made their first contribution in https://github.com/grafana/mimir/pull/6932
@edwintye made their first contribution in https://github.com/grafana/mimir/pull/6760
@TimKotowski made their first contribution in https://github.com/grafana/mimir/pull/6990
@rfratto made their first contribution in https://github.com/grafana/mimir/pull/7084
@blut made their first contribution in https://github.com/grafana/mimir/pull/7150
@benoitschipper made their first contribution in https://github.com/grafana/mimir/pull/7135
@derek-cadzow made their first contribution in https://github.com/grafana/mimir/pull/7077
@github-actions made their first contribution in https://github.com/grafana/mimir/pull/7167
@snowzach made their first contribution in https://github.com/grafana/mimir/pull/7141
@rafilkmp3 made their first contribution in https://github.com/grafana/mimir/pull/5040
@QuantumEnigmaa made their first contribution in https://github.com/grafana/mimir/pull/7132
@Itaykal made their first contribution in https://github.com/grafana/mimir/pull/7218
@ncharaf made their first contribution in https://github.com/grafana/mimir/pull/7325
@zhehao-grafana made their first contribution in https://github.com/grafana/mimir/pull/7375
@JorTurFer made their first contribution in https://github.com/grafana/mimir/pull/7480

Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.11.0...mimir-2.12.0

mimir - 2.12.0-rc.1

Published by duricanikolic 7 months ago

This release contains 525 PRs from 60 authors, including new contributors Benoit Schipper, Derek Cadzow, Edwin, Itay Kalfon, Ivan Farré Vicente, Jan O. Rundshagen, Jorge Turrado Ferrero, Lukas Monkevicius, Mickaël Canévet, Rafael Sathler, Rajakavitha Kodhandapani, Tim Kotowski, Vladimir Varankin, Zach, Zach Day, Zirko, blut, github-actions[bot], ncharaf, zhehao-grafana. Thank you!

Grafana Mimir version 2.12.0-rc.1 release notes

Grafana Labs is excited to announce version 2.12 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bug fixes in this release.
For the complete list of changes, refer to the CHANGELOG.

Features and enhancements

Added support to only count series that are considered active through the Cardinality API endpoint /api/v1/cardinality/label_names by passing the count_method parameter.
If set to active it counts only series that are considered active according to the -ingester.active-series-metrics-idle-timeout flag setting rather than counting all in-memory series.
The "Store-gateway: bucket tenant blocks" admin page contains a new column "No Compact".
If block no compaction marker is set, it specifies the reason and the date the marker is added.
The estimated number of compaction jobs based on the current bucket-index is now computed by the compactor.
The result is tracked by the new cortex_bucket_index_compaction_jobs metric.
If this computation fails, the cortex_bucket_index_compaction_jobs_errors_total metric is updated instead.
The estimated number of compaction jobs is also shown in Top tenants, Tenants, and Compactor dashboards.
Added mimir-distroless container image built upon a distroless image (gcr.io/distroless/static-debian12).
This improvement minimizes attack surfaces and potential CVEs by trimming down the dependencies within the image.
After comprehensive testing, the Mimir maintainers plan to shift from the current image to the distroless version.

Additionally, the following previously experimental features are now considered stable:

The number of pre-allocated workers used to forward push requests to the ingesters, configurable via the -distributor.reusable-ingester-push-workers CLI flag on distributors.
It now defaults to 2000.
Note that this is a performance optimization, and not a limiting feature.
If not enough workers available, new goroutines will be spawned.
The number of gRPC server workers used to serve the requests, configurable via the -server.grpc.num-workers CLI flag.
It now defaults to 100.
Note that this is the number of pre-allocated long-lived workers, and not a limiting feature.
If not enough workers are available, new goroutines will be spawned.
The maximum number of concurrent index header loads across all tenants, configurable via the -blocks-storage.bucket-store.index-header.lazy-loading-concurrency CLI flag on store-gateways.
It defaults to 4.
The maximum time to wait for the query-frontend to become ready before rejecting requests, configurable via the -query-frontend.not-running-timeout CLI flag on query-frontends.
It now defaults to 2s.
The CLI flag that allows queriers to reduce pressure on ingesters by initially querying only the minimum set of ingesters required to reach quorum, -querier.minimize-ingester-requests.
It is now enabled by default.
Spread-minimizing token-related CLI flags: -ingester.ring.token-generation-strategy, -ingester.ring.spread-minimizing-zones and -ingester.ring.spread-minimizing-join-ring-in-order.
You can read more about this feature in our blog post.

Important changes

In Grafana Mimir 2.12 the following behavior has changed:

Store-gateway now persists a sparse version of the index-header to disk on construction and loads sparse index-headers from disk instead of the whole index-header.
This improves the speed at which index headers are lazy-loaded from disk by up to 90%. The added disk usage is in the order of 1-2%.
Alertmanager deprecated the v1 API. All v1 API endpoints now respond with a JSON deprecation notice and a status code of 410.
All endpoints have a v2 equivalent.
The list of endpoints is:
- <alertmanager-web.external-url>/api/v1/alerts
- <alertmanager-web.external-url>/api/v1/receivers
- <alertmanager-web.external-url>/api/v1/silence/{id}
- <alertmanager-web.external-url>/api/v1/silences
- <alertmanager-web.external-url>/api/v1/status
Exemplar's label traceID has been changed to trace_id to be consistent with the OpenTelemetry standard.
Errors returned by ingesters now contain only gRPC status codes.
Previously they contained both gRPC and HTTP status codes.
To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12.
Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.
Responses with gRPC status codes are now reported as status_code labels in the cortex_request_duration_seconds and cortex_ingester_client_request_duration_seconds metrics.
Responses with HTTP 4xx status codes are now treated as errors and used in status_code label of request duration metric.

The default value of the following CLI flags have been changed:

-blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
-blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
-blocks-storage.bucket-store.tenant-sync-concurrency from 10 to 1.
-query-frontend.max-cache-freshness from 1m to 10m.
-distributor.write-requests-buffer-pooling-enabled from false to true.
-locks-storage.bucket-store.block-sync-concurrency from 20 to 4.
-memberlist.stream-timeout from 10s to 2s.
-server.report-grpc-codes-in-instrumentation-label-enabled from false to true.

The following deprecated configuration options are removed in Grafana Mimir 2.12:

The YAML setting frontend.cache_unaligned_requests.
Experimental CLI flag -querier.prefer-streaming-chunks-from-ingesters.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.14:

The CLI flag -ingester.limit-inflight-requests-using-grpc-method-limiter.
It now defaults to true.
The CLI flag -ingester.return-only-grpc-errors.
It now defaults to true.
To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12.
Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.
The CLI flag -ingester.client.report-grpc-codes-in-instrumentation-label-enabled.
It now defaults to true.
The CLI flag -distributor.limit-inflight-requests-using-grpc-method-limiter.
It now defaults to true.
The CLI flag -distributor.enable-otlp-metadata-storage.
It now defaults to true.
The CLI flag -querier.max-query-into-future.

The following metrics are removed or deprecated:

cortex_bucket_store_blocks_loaded_by_duration has been removed.
cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14.

Experimental features

Grafana Mimir 2.12 includes new features that are considered experimental and disabled by default.
Use them with caution and report any issues you encounter:

The maximum number of tenant IDs that may be for a federated query can be configured via the -tenant-federation.max-tenants CLI flag on query-frontends.
By default, it's 0, meaning that the limit is disabled.
Sharding of active series queries can be enabled via the -query-frontend.shard-active-series-queries CLI flag on query-frontends.
Timely head compaction can be enabled via the -blocks-storage.tsdb.timely-head-compaction-enabled on ingesters.
If enabled, the head compaction happens when the min block range can no longer be appended, without requiring 1.5x the chunk range worth of data in the head.
Streaming of responses from querier to query-frontend can be enabled via the -querier.response-streaming-enabled CLI flag on queriers.
This is currently supported only for responses from the /api/v1/cardinality/active_series endpoint.
The maximum response size for active series queries, in bytes, can be set via the -querier.active-series-results-max-size-bytes CLI flag on queriers.
Metric relabeling on a per-tenant basis can be forcefully disabled via the -distributor.metric-relabeling-enabled CLI flag on rulers.
Metrics relabeling is enabled by default.
Query Queue Load Balancing by Query Component. Tenant query queues in the query-scheduler can now be split into subqueues by which query component is expected to be utilized to complete the query: ingesters, store-gateways, both, or uncategorized.
Dequeuing queries for a given tenant will rotate through the query component subqueues via simple round-robin.
In the event that the one of the query components (ingesters or store-gateways) experience a slowdown, queries only utilizing the the other query component can continue to be serviced.
This feature is recommended to be enabled.
The following CLI flags must be set to true in order to be in effect:
- -query-frontend.additional-query-queue-dimensions-enabled on the query-frontend.
- -query-scheduler.additional-query-queue-dimensions-enabled on the query-scheduler.
Owned series tracking in ingesters can be enabled via the -ingester.track-ingester-owned-series CLI flag.
When enabled, ingesters will track the number of in-memory series that still map to the ingester based on the ring state.
These counts are more reactive to ring and shard changes than in-memory series, and can be used when enforcing tenant series limits by enabling the -ingester.use-ingester-owned-series-for-limits CLI flag.
This feature requires zone-aware replication to be enabled, and the replication factor to be equal to the number of zones.

Bug fixes

Distributor: fixed an issue where -distributor.metric-relabeling-enabled could cause distributors to panic.
Distributor: fix an issue where -distributor.metric-relabeling-enabled could cause distributors to write unsorted labels and corrupt blocks.
Ingester: errors encountered while iterating through chunks or samples in response to a query request aren't ignored anymore.
Compactor: out-of-order blocks aren't allowed to prevent timely compaction anymore.
Querier: requests to store-gateway when a query gets canceled aren't retried anymore.
Querier: status code 499 is now returned instead of 500 when a request to remote read endpoint gets canceled.
Querier: fixed an issue where -querier.max-fetched-series-per-query wasn't applied to /series endpoint in case series loaded from ingesters.
Querier: fixed an issue with the remote-read requests HTTP status code translations.
Previously, remote-read had conflicting behaviours: when returning samples all internal errors were translated to HTTP 400, while when returning chunks all internal errors were translated to HTTP 500.
With this fix, all validation errors will be translated into HTTP 400 errors, while all other errors will be translated into HTTP 500 errors.
Query-frontend: the cortex_query_frontend_queries_total metric incorrectly reported op="query" for any request which wasn't a range query.
Now the op label value can be one of the following:
- query: instant query
- query_range: range query
- cardinality: cardinality query
- label_names_and_values: label names / values query
- active_series: active series query
- other: any other request
Ruler: fixed an issue where "failed to remotely evaluate query expression, will retry" messages were logged without context such as the trace ID and didn't appear in trace events.
Ruler: requests to remote querier when server's response exceeds its configured max payload size aren't retried anymore.
Ruler: fixed a regression that caused client errors to be tracked in cortex_ruler_write_requests_failed_total metric.
Ruler: fixed an issue with recording rule result being corruption due to an usage of a bad native histogram pointer.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm charts are released independently.
Refer to the Grafana Mimir Helm chart documentation.

Changelog

2.12.0-rc.1

Grafana Mimir

[BUGFIX] Query-frontend: Fix memory leak on every request. #7654

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.12.0-rc.0...mimir-2.12.0-rc.1

mimir - 2.12.0-rc.0

Published by duricanikolic 7 months ago

Grafana Mimir version 2.12.0-rc.0 release notes

Grafana Labs is excited to announce version 2.12 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bug fixes in this release.
For the complete list of changes, refer to the CHANGELOG.

Features and enhancements

Added support to only count series that are considered active through the Cardinality API endpoint /api/v1/cardinality/label_names by passing the count_method parameter.
If set to active it counts only series that are considered active according to the -ingester.active-series-metrics-idle-timeout flag setting rather than counting all in-memory series.
The "Store-gateway: bucket tenant blocks" admin page contains a new column "No Compact".
If block no compaction marker is set, it specifies the reason and the date the marker is added.
The estimated number of compaction jobs based on the current bucket-index is now computed by the compactor.
The result is tracked by the new cortex_bucket_index_compaction_jobs metric.
If this computation fails, the cortex_bucket_index_compaction_jobs_errors_total metric is updated instead.
The estimated number of compaction jobs is also shown in Top tenants, Tenants, and Compactor dashboards.
Added mimir-distroless container image built upon a distroless image (gcr.io/distroless/static-debian12).
This improvement minimizes attack surfaces and potential CVEs by trimming down the dependencies within the image.
After comprehensive testing, the Mimir maintainers plan to shift from the current image to the distroless version.

Additionally, the following previously experimental features are now considered stable:

The number of pre-allocated workers used to forward push requests to the ingesters, configurable via the -distributor.reusable-ingester-push-workers CLI flag on distributors.
It now defaults to 2000.
Note that this is a performance optimization, and not a limiting feature.
If not enough workers available, new goroutines will be spawned.
The number of gRPC server workers used to serve the requests, configurable via the -server.grpc.num-workers CLI flag.
It now defaults to 100.
Note that this is the number of pre-allocated long-lived workers, and not a limiting feature.
If not enough workers are available, new goroutines will be spawned.
The maximum number of concurrent index header loads across all tenants, configurable via the -blocks-storage.bucket-store.index-header.lazy-loading-concurrency CLI flag on store-gateways.
It defaults to 4.
The maximum time to wait for the query-frontend to become ready before rejecting requests, configurable via the -query-frontend.not-running-timeout CLI flags on query-frontends.
It now defaults to 2s.
Spread-minimizing token-related CLI flags: -ingester.ring.token-generation-strategy, -ingester.ring.spread-minimizing-zones and -ingester.ring.spread-minimizing-join-ring-in-order.
You can read more about this feature in our blog post.

Important changes

In Grafana Mimir 2.12 the following behavior has changed:

Store-gateway now persists a sparse version of the index-header to disk on construction and loads sparse index-headers from disk instead of the whole index-header.
This improves the speed at which index headers are lazy-loaded from disk by up to 90%. The added disk usage is in the order of 1-2%.
Alertmanager deprecated the v1 API. All v1 API endpoints now respond with a JSON deprecation notice and a status code of 410.
All endpoints have a v2 equivalent.
The list of endpoints is:
- <alertmanager-web.external-url>/api/v1/alerts
- <alertmanager-web.external-url>/api/v1/receivers
- <alertmanager-web.external-url>/api/v1/silence/{id}
- <alertmanager-web.external-url>/api/v1/silences
- <alertmanager-web.external-url>/api/v1/status
Exemplar's label traceID has been changed to trace_id to be consistent with the OpenTelemetry standard.
Errors returned by ingesters now contain only gRPC status codes.
Previously they contained both gRPC and HTTP status codes.
To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12.
Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.
Responses with gRPC status codes are now reported as status_code labels in the cortex_request_duration_seconds and cortex_ingester_client_request_duration_seconds metrics.
Responses with HTTP 4xx status codes are now treated as errors and used in status_code label of request duration metric.

The default value of the following CLI flags have been changed:

-blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
-blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
-blocks-storage.bucket-store.tenant-sync-concurrency from 10 to 1.
-query-frontend.max-cache-freshness from 1m to 10m.
-distributor.write-requests-buffer-pooling-enabled from false to true.
-locks-storage.bucket-store.block-sync-concurrency from 20 to 4.
-memberlist.stream-timeout from 10s to 2s.
-server.report-grpc-codes-in-instrumentation-label-enabled from false to true.

The following deprecated configuration options are removed in Grafana Mimir 2.12:

The YAML setting frontend.cache_unaligned_requests.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.14:

The CLI flag -ingester.limit-inflight-requests-using-grpc-method-limiter.
It now defaults to true.
The CLI flag -ingester.return-only-grpc-errors.
It now defaults to true.
To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12.
Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.
The CLI flag -ingester.client.report-grpc-codes-in-instrumentation-label-enabled.
It now defaults to true.
The CLI flag -distributor.limit-inflight-requests-using-grpc-method-limiter.
It now defaults to true.
The CLI flag -distributor.enable-otlp-metadata-storage.
It now defaults to true.
The CLI flag -querier.max-query-into-future.

The following metrics are removed or deprecated:

cortex_bucket_store_blocks_loaded_by_duration has been removed.
cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14.

Experimental features

Grafana Mimir 2.12 includes new features that are considered experimental and disabled by default.
Use them with caution and report any issues you encounter:

The maximum number of tenant IDs that may be for a federated query can be configured via the -tenant-federation.max-tenants CLI flag on query-frontends.
By default, it's 0, meaning that the limit is disabled.
Sharding of active series queries can be enabled via the -query-frontend.shard-active-series-queries CLI flag on query-frontends.
Timely head compaction can be enabled via the -blocks-storage.tsdb.timely-head-compaction-enabled on ingesters.
If enabled, the head compaction happens when the min block range can no longer be appended, without requiring 1.5x the chunk range worth of data in the head.
Streaming of responses from querier to query-frontend can be enabled via the -querier.response-streaming-enabled CLI flag on queriers.
This is currently supported only for responses from the /api/v1/cardinality/active_series endpoint.
The maximum response size for active series queries, in bytes, can be set via the -querier.active-series-results-max-size-bytes CLI flag on queriers.
Metric relabeling on a per-tenant basis can be forcefully disabled via the -distributor.metric-relabeling-enabled CLI flag on rulers.
Metrics relabeling is enabled by default.
Query Queue Load Balancing by Query Component. Tenant query queues in the query-scheduler can now be split into subqueues by which query component is expected to be utilized to complete the query: ingesters, store-gateways, both, or uncategorized.
Dequeuing queries for a given tenant will rotate through the query component subqueues via simple round-robin.
In the event that the one of the query components (ingesters or store-gateways) experience a slowdown, queries only utilizing the the other query component can continue to be serviced.
This feature is recommended to be enabled.
The following CLI flags must be set to true in order to be in effect:
- -query-frontend.additional-query-queue-dimensions-enabled on the query-frontend.
- -query-scheduler.additional-query-queue-dimensions-enabled on the query-scheduler.
Owned series tracking in ingesters can be enabled via the -ingester.track-ingester-owned-series CLI flag.
When enabled, ingesters will track the number of in-memory series that still map to the ingester based on the ring state.
These counts are more reactive to ring and shard changes than in-memory series, and can be used when enforcing tenant series limits by enabling the -ingester.use-ingester-owned-series-for-limits CLI flag.
This feature requires zone-aware replication to be enabled, and the replication factor to be equal to the number of zones.

Bug fixes

Distributor: fixed an issue where -distributor.metric-relabeling-enabled could cause distributors to panic.
Distributor: fix an issue where -distributor.metric-relabeling-enabled could cause distributors to write unsorted labels and corrupt blocks.
Ingester: errors encountered while iterating through chunks or samples in response to a query request aren't ignored anymore.
Compactor: out-of-order blocks aren't allowed to prevent timely compaction anymore.
Querier: requests to store-gateway when a query gets canceled aren't retried anymore.
Querier: status code 499 is now returned instead of 500 when a request to remote read endpoint gets canceled.
Querier: fixed an issue where -querier.max-fetched-series-per-query wasn't applied to /series endpoint in case series loaded from ingesters.
Querier: fixed an issue with the remote-read requests HTTP status code translations.
Previously, remote-read had conflicting behaviours: when returning samples all internal errors were translated to HTTP 400, while when returning chunks all internal errors were translated to HTTP 500.
With this fix, all validation errors will be translated into HTTP 400 errors, while all other errors will be translated into HTTP 500 errors.
Query-frontend: the cortex_query_frontend_queries_total metric incorrectly reported op="query" for any request which wasn't a range query.
Now the op label value can be one of the following:
- query: instant query
- query_range: range query
- cardinality: cardinality query
- label_names_and_values: label names / values query
- active_series: active series query
- other: any other request
Ruler: fixed an issue where "failed to remotely evaluate query expression, will retry" messages were logged without context such as the trace ID and didn't appear in trace events.
Ruler: requests to remote querier when server's response exceeds its configured max payload size aren't retried anymore.
Ruler: fixed a regression that caused client errors to be tracked in cortex_ruler_write_requests_failed_total metric.
Ruler: fixed an issue with recording rule result being corruption due to an usage of a bad native histogram pointer.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm charts are released independently.
Refer to the Grafana Mimir Helm chart documentation.

Changelog

2.12.0-rc.0

Grafana Mimir

[CHANGE] Alertmanager: Deprecates the v1 API. All v1 API endpoints now respond with a JSON deprecation notice and a status code of 410. All endpoints have a v2 equivalent. The list of endpoints is: #7103
- <alertmanager-web.external-url>/api/v1/alerts
- <alertmanager-web.external-url>/api/v1/receivers
- <alertmanager-web.external-url>/api/v1/silence/{id}
- <alertmanager-web.external-url>/api/v1/silences
- <alertmanager-web.external-url>/api/v1/status
[CHANGE] Ingester: Increase default value of -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes to 100 MiB (previous default value was 10 MiB). #6764
[CHANGE] Validate tenant IDs according to documented behavior even when tenant federation is not enabled. Note that this will cause some previously accepted tenant IDs to be rejected such as those longer than 150 bytes or containing | characters. #6959
[CHANGE] Ruler: don't use backoff retry on remote evaluation in case of 4xx errors. #7004
[CHANGE] Server: responses with HTTP 4xx status codes are now treated as errors and used in status_code label of request duration metric. #7045
[CHANGE] Memberlist: change default for -memberlist.stream-timeout from 10s to 2s. #7076
[CHANGE] Memcached: remove legacy thanos_cache_memcached_* and thanos_memcached_* prefixed metrics. Instead, Memcached and Redis cache clients now emit thanos_cache_* prefixed metrics with a backend label. #7076
[CHANGE] Ruler: the following metrics, exposed when the ruler is configured to discover Alertmanager instances via service discovery, have been renamed: #7057
- prometheus_sd_failed_configs renamed to cortex_prometheus_sd_failed_configs
- prometheus_sd_discovered_targets renamed to cortex_prometheus_sd_discovered_targets
- prometheus_sd_received_updates_total renamed to cortex_prometheus_sd_received_updates_total
- prometheus_sd_updates_delayed_total renamed to cortex_prometheus_sd_updates_delayed_total
- prometheus_sd_updates_total renamed to cortex_prometheus_sd_updates_total
- prometheus_sd_refresh_failures_total renamed to cortex_prometheus_sd_refresh_failures_total
- prometheus_sd_refresh_duration_seconds renamed to cortex_prometheus_sd_refresh_duration_seconds
[CHANGE] Query-frontend: the default value for -query-frontend.not-running-timeout has been changed from 0 (disabled) to 2s. The configuration option has also been moved from "experimental" to "advanced". #7126
[CHANGE] Store-gateway: to reduce disk contention on HDDs the default value for blocks-storage.bucket-store.tenant-sync-concurrency has been changed from 10 to 1 and the default value for blocks-storage.bucket-store.block-sync-concurrency has been changed from 20 to 4. #7136
[CHANGE] Store-gateway: Remove deprecated CLI flags -blocks-storage.bucket-store.index-header-lazy-loading-enabled and -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout and their corresponding YAML settings. Instead, use -blocks-storage.bucket-store.index-header.lazy-loading-enabled and -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout. #7521
[CHANGE] Store-gateway: Mark experimental CLI flag -blocks-storage.bucket-store.index-header.lazy-loading-concurrency and its corresponding YAML settings as advanced. #7521
[CHANGE] Store-gateway: Remove experimental CLI flag -blocks-storage.bucket-store.index-header.sparse-persistence-enabled since this is now the default behavior. #7535
[CHANGE] All: set -server.report-grpc-codes-in-instrumentation-label-enabled to true by default, which enables reporting gRPC status codes as status_code labels in the cortex_request_duration_seconds metric. #7144
[CHANGE] Distributor: report gRPC status codes as status_code labels in the cortex_ingester_client_request_duration_seconds metric by default. #7144
[CHANGE] Distributor: CLI flag -ingester.client.report-grpc-codes-in-instrumentation-label-enabled has been deprecated, and its default value is set to true. #7144
[CHANGE] Ingester: CLI flag -ingester.return-only-grpc-errors has been deprecated, and its default value is set to true. To ensure backwards compatibility, during a migration from a version prior to 2.11.0 to 2.12 or later, -ingester.return-only-grpc-errors should be set to false. Once all the components are migrated, the flag can be removed. #7151
[CHANGE] Ingester: the following CLI flags have been moved from "experimental" to "advanced": #7169
- -ingester.ring.token-generation-strategy
- -ingester.ring.spread-minimizing-zones
- -ingester.ring.spread-minimizing-join-ring-in-order
[CHANGE] Query-frontend: the default value of the CLI flag -query-frontend.max-cache-freshness (and its respective YAML configuration parameter) has been changed from 1m to 10m. #7161
[CHANGE] Distributor: default the optimization -distributor.write-requests-buffer-pooling-enabled to true. #7165
[CHANGE] Tracing: Move query information to span attributes instead of span logs. #7046
[CHANGE] Distributor: the default value of circuit breaker's CLI flag -ingester.client.circuit-breaker.cooldown-period has been changed from 1m to 10s. #7310
[CHANGE] Store-gateway: remove cortex_bucket_store_blocks_loaded_by_duration. cortex_bucket_store_series_blocks_queried is better suited for detecting when compactors are not able to keep up with the number of blocks to compact. #7309
[CHANGE] Ingester, Distributor: the support for rejecting push requests received via gRPC before reading them into memory, enabled via -ingester.limit-inflight-requests-using-grpc-method-limiter and -distributor.limit-inflight-requests-using-grpc-method-limiter, is now stable and enabled by default. The configuration options have been deprecated and will be removed in Mimir 2.14. #7360
[CHANGE] Distributor: Change-distributor.enable-otlp-metadata-storage flag's default to true, and deprecate it. The flag will be removed in Mimir 2.14. #7366
[CHANGE] Store-gateway: Use a shorter TTL for cached items related to temporary blocks. #7407 #7534
[CHANGE] Standardise exemplar label as "trace_id". #7475
[CHANGE] The configuration option -querier.max-query-into-future has been deprecated and will be removed in Mimir 2.14. #7496
[CHANGE] Distributor: the metric cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14. #7516
[CHANGE] Query-frontend: The deprecated YAML setting frontend.cache_unaligned_requests has been moved to limits.cache_unaligned_requests. #7519
[FEATURE] Introduce -server.log-source-ips-full option to log all IPs from Forwarded, X-Real-IP, X-Forwarded-For headers. #7250
[FEATURE] Introduce -tenant-federation.max-tenants option to limit the max number of tenants allowed for requests when federation is enabled. #6959
[FEATURE] Cardinality API: added a new count_method parameter which enables counting active label values. #7085
[FEATURE] Querier / query-frontend: added -querier.promql-experimental-functions-enabled CLI flag (and respective YAML config option) to enable experimental PromQL functions. The experimental functions introduced are: mad_over_time(), sort_by_label() and sort_by_label_desc(). #7057
[FEATURE] Alertmanager API: added -alertmanager.grafana-alertmanager-compatibility-enabled CLI flag (and respective YAML config option) to enable an experimental API endpoints that support the migration of the Grafana Alertmanager. #7057
[FEATURE] Alertmanager: Added -alertmanager.utf8-strict-mode-enabled to control support for any UTF-8 character as part of Alertmanager configuration/API matchers and labels. It's default value is set to false. #6898
[FEATURE] Querier: added histogram_avg() function support to PromQL. #7293
[FEATURE] Ingester: added -blocks-storage.tsdb.timely-head-compaction flag, which enables more timely head compaction, and defaults to false. #7372
[FEATURE] Compactor: Added /compactor/tenants and /compactor/tenant/{tenant}/planned_jobs endpoints that provide functionality that was provided by tools/compaction-planner -- listing of planned compaction jobs based on tenants' bucket index. #7381
[FEATURE] Add experimental support for streaming response bodies from queriers to frontends via -querier.response-streaming-enabled. This is currently only supported for the /api/v1/cardinality/active_series endpoint. #7173
[FEATURE] Release: Added mimir distroless docker image. #7371
[FEATURE] Add support for the new grammar of {"metric_name", "l1"="val"} to promql and some of the exposition formats. #7475 #7541
[ENHANCEMENT] Distributor: Add a new metric cortex_distributor_otlp_requests_total to track the total number of OTLP requests. #7385
[ENHANCEMENT] Vault: add lifecycle manager for token used to authenticate to Vault. This ensures the client token is always valid. Includes a gauge (cortex_vault_token_lease_renewal_active) to check whether token renewal is active, and the counters cortex_vault_token_lease_renewal_success_total and cortex_vault_auth_success_total to see the total number of successful lease renewals / authentications. #7337
[ENHANCEMENT] Store-gateway: add no-compact details column on store-gateway tenants admin UI. #6848
[ENHANCEMENT] PromQL: ignore small errors for bucketQuantile #6766
[ENHANCEMENT] Distributor: improve efficiency of some errors #6785
[ENHANCEMENT] Ruler: exclude vector queries from being tracked in cortex_ruler_queries_zero_fetched_series_total. #6544
[ENHANCEMENT] Ruler: local storage backend now supports reading a rule group via /config/api/v1/rules/{namespace}/{groupName} configuration API endpoint. #6632
[ENHANCEMENT] Query-Frontend and Query-Scheduler: split tenant query request queues by query component with query-frontend.additional-query-queue-dimensions-enabled and query-scheduler.additional-query-queue-dimensions-enabled. #6772
[ENHANCEMENT] Distributor: support disabling metric relabel rules per-tenant via the flag -distributor.metric-relabeling-enabled or associated YAML. #6970
[ENHANCEMENT] Distributor: -distributor.remote-timeout is now accounted from the first ingester push request being sent. #6972
[ENHANCEMENT] Storage Provider: -<prefix>.s3.sts-endpoint sets a custom endpoint for AWS Security Token Service (AWS STS) in s3 storage provider. #6172
[ENHANCEMENT] Querier: add cortex_querier_queries_storage_type_total metric that indicates how many queries have executed for a source, ingesters or store-gateways. Add cortex_querier_query_storegateway_chunks_total metric to count the number of chunks fetched from a store gateway. #7099,#7145
[ENHANCEMENT] Query-frontend: add experimental support for sharding active series queries via -query-frontend.shard-active-series-queries. #6784
[ENHANCEMENT] Distributor: set -distributor.reusable-ingester-push-workers=2000 by default and mark feature as advanced. #7128
[ENHANCEMENT] All: set -server.grpc.num-workers=100 by default and mark feature as advanced. #7131
[ENHANCEMENT] Distributor: invalid metric name error message gets cleaned up to not include non-ascii strings. #7146
[ENHANCEMENT] Store-gateway: add source, level, and out_or_order to cortex_bucket_store_series_blocks_queried metric that indicates the number of blocks that were queried from store gateways by block metadata. #7112 #7262 #7267
[ENHANCEMENT] Compactor: After updating bucket-index, compactor now also computes estimated number of compaction jobs based on current bucket-index, and reports the result in cortex_bucket_index_estimated_compaction_jobs metric. If computation of jobs fails, cortex_bucket_index_estimated_compaction_jobs_errors_total is updated instead. #7299
[ENHANCEMENT] Mimir: Integrate profiling into tracing instrumentation. #7363
[ENHANCEMENT] Alertmanager: Adds metric cortex_alertmanager_notifications_suppressed_total that counts the total number of notifications suppressed for being silenced, inhibited, outside of active time intervals or within muted time intervals. #7384
[ENHANCEMENT] Query-scheduler: added more buckets to cortex_query_scheduler_queue_duration_seconds histogram metric, in order to better track queries staying in the queue for longer than 10s. #7470
[ENHANCEMENT] A type label is added to prometheus_tsdb_head_out_of_order_samples_appended_total metric. #7475
[ENHANCEMENT] Distributor: Optimize OTLP endpoint. #7475
[ENHANCEMENT] API: Use github.com/klauspost/compress for faster gzip and deflate compression of API responses. #7475
[ENHANCEMENT] Ingester: Limiting on owned series (-ingester.use-ingester-owned-series-for-limits) now prevents discards in cases where a tenant is sharded across all ingesters (or shuffle sharding is disabled) and the ingester count increases. #7411
[ENHANCEMENT] Block upload: include converted timestamps in the error message if block is from the future. #7538
[ENHANCEMENT] Query-frontend: Introduce -query-frontend.active-series-write-timeout to allow configuring the server-side write timeout for active series requests. #7553 #7569
[BUGFIX] Ingester: don't ignore errors encountered while iterating through chunks or samples in response to a query request. #6451
[BUGFIX] Fix issue where queries can fail or omit OOO samples if OOO head compaction occurs between creating a querier and reading chunks #6766
[BUGFIX] Fix issue where concatenatingChunkIterator can obscure errors #6766
[BUGFIX] Fix panic during tsdb Commit #6766
[BUGFIX] tsdb/head: wlog exemplars after samples #6766
[BUGFIX] Ruler: fix issue where "failed to remotely evaluate query expression, will retry" messages are logged without context such as the trace ID and do not appear in trace events. #6789
[BUGFIX] Ruler: do not retry requests to remote querier when server's response exceeds its configured max payload size. #7216
[BUGFIX] Querier: fix issue where spans in query request traces were not nested correctly. #6893
[BUGFIX] Fix issue where all incoming HTTP requests have duplicate trace spans. #6920
[BUGFIX] Querier: do not retry requests to store-gateway when a query gets canceled. #6934
[BUGFIX] Querier: return 499 status code instead of 500 when a request to remote read endpoint gets canceled. #6934
[BUGFIX] Querier: fix issue where -querier.max-fetched-series-per-query is not applied to /series endpoint if the series are loaded from ingesters. #7055
[BUGFIX] Distributor: fix issue where -distributor.metric-relabeling-enabled may cause distributors to panic #7176
[BUGFIX] Distributor: fix issue where -distributor.metric-relabeling-enabled may cause distributors to write unsorted labels and corrupt blocks #7326
[BUGFIX] Query-frontend: the cortex_query_frontend_queries_total report incorrectly reported op="query" for any request which wasn't a range query. Now the op label value can be one of the following: #7207
- query: instant query
- query_range: range query
- cardinality: cardinality query
- label_names_and_values: label names / values query
- active_series: active series query
- other: any other request
[BUGFIX] Fix performance regression introduced in Mimir 2.11.0 when uploading blocks to AWS S3. #7240
[BUGFIX] Query-frontend: fix race condition when sharding active series is enabled (see above) and response is compressed with snappy. #7290
[BUGFIX] Query-frontend: "query stats" log unsuccessful replies from downstream as "failed". #7296
[BUGFIX] Packaging: remove reload from systemd file as mimir does not take into account SIGHUP. #7345
[BUGFIX] Compactor: do not allow out-of-order blocks to prevent timely compaction. #7342
[BUGFIX] Update google.golang.org/grpc to resolve occasional issues with gRPC server closing its side of connection before it was drained by the client. #7380
[BUGFIX] Query-frontend: abort response streaming for active_series requests when the request context is canceled. #7378
[BUGFIX] Compactor: improve compaction of sporadic blocks. #7329
[BUGFIX] Ruler: fix regression that caused client errors to be tracked in cortex_ruler_write_requests_failed_total metric. #7472
[BUGFIX] promql: Fix Range selectors with an @ modifier are wrongly scoped in range queries. #7475
[BUGFIX] Fix metadata API using wrong JSON field names. #7475
[BUGFIX] Ruler: fix native histogram recording rule result corruption. #7552

Mixin

[CHANGE] The job label matcher for distributor and gateway have been extended to include any deployment matching distributor.* and cortex-gw.* respectively. This change allows to match custom and multi-zone distributor and gateway deployments too. #6817
[ENHANCEMENT] Dashboards: Add panels for alertmanager activity of a tenant #6826
[ENHANCEMENT] Dashboards: Add graphs to "Slow Queries" dashboard. #6880
[ENHANCEMENT] Dashboards: Update all deprecated "graph" panels to "timeseries" panels. #6864 #7413 #7457
[ENHANCEMENT] Dashboards: Make most columns in "Slow Queries" sortable. #7000
[ENHANCEMENT] Dashboards: Render graph panels at full resolution as opposed to at half resolution. #7027
[ENHANCEMENT] Dashboards: show query-scheduler queue length on "Reads" and "Remote Ruler Reads" dashboards. #7088
[ENHANCEMENT] Dashboards: Add estimated number of compaction jobs to "Compactor", "Tenants" and "Top tenants" dashboards. #7449 #7481
[ENHANCEMENT] Recording rules: add native histogram recording rules to cortex_request_duration_seconds. #7528
[ENHANCEMENT] Dashboards: Add total owned series, and per-ingester in-memory and owned series to "Tenants" dashboard. #7511
[BUGFIX] Dashboards: drop step parameter from targets as it is not supported. #7157
[BUGFIX] Recording rules: drop rules for metrics removed in 2.0: cortex_memcache_request_duration_seconds and cortex_cache_request_duration_seconds. #7514

Jsonnet

[CHANGE] Distributor: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100) to 1000, to avoid dropping tracing spans. #7259
[CHANGE] Querier: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from 1000 to 5000, to avoid dropping tracing spans. #6764
[CHANGE] rollout-operator: remove default CPU limit. #7066
[CHANGE] Store-gateway: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100) to 1000, to avoid dropping tracing spans. #7068
[CHANGE] Query-frontend, ingester, ruler, backend and write instances: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100), to avoid dropping tracing spans. #7086
[CHANGE] Ring: relaxed the hash ring heartbeat period and timeout for distributor, ingester, store-gateway and compactor: #6860
- -distributor.ring.heartbeat-period set to 1m
- -distributor.ring.heartbeat-timeout set to 4m
- -ingester.ring.heartbeat-period set to 2m
- -store-gateway.sharding-ring.heartbeat-period set to 1m
- -store-gateway.sharding-ring.heartbeat-timeout set to 4m
- -compactor.ring.heartbeat-period set to 1m
- -compactor.ring.heartbeat-timeout set to 4m
[CHANGE] Ruler-querier: the topology spread constrain max skew is now configured through the configuration option ruler_querier_topology_spread_max_skew instead of querier_topology_spread_max_skew. #7204
[CHANGE] Distributor: -server.grpc.keepalive.max-connection-age lowered from 2m to 60s and configured -shutdown-delay=90s and termination grace period to 100 seconds in order to reduce the chances of failed gRPC write requests when distributors gracefully shutdown. #7361
[FEATURE] Added support for the following root-level settings to configure the list of matchers to apply to node affinity: #6782 #6829
- alertmanager_node_affinity_matchers
- compactor_node_affinity_matchers
- continuous_test_node_affinity_matchers
- distributor_node_affinity_matchers
- ingester_node_affinity_matchers
- ingester_zone_a_node_affinity_matchers
- ingester_zone_b_node_affinity_matchers
- ingester_zone_c_node_affinity_matchers
- mimir_backend_node_affinity_matchers
- mimir_backend_zone_a_node_affinity_matchers
- mimir_backend_zone_b_node_affinity_matchers
- mimir_backend_zone_c_node_affinity_matchers
- mimir_read_node_affinity_matchers
- mimir_write_node_affinity_matchers
- mimir_write_zone_a_node_affinity_matchers
- mimir_write_zone_b_node_affinity_matchers
- mimir_write_zone_c_node_affinity_matchers
- overrides_exporter_node_affinity_matchers
- querier_node_affinity_matchers
- query_frontend_node_affinity_matchers
- query_scheduler_node_affinity_matchers
- rollout_operator_node_affinity_matchers
- ruler_node_affinity_matchers
- ruler_node_affinity_matchers
- ruler_querier_node_affinity_matchers
- ruler_query_frontend_node_affinity_matchers
- ruler_query_scheduler_node_affinity_matchers
- store_gateway_node_affinity_matchers
- store_gateway_node_affinity_matchers
- store_gateway_zone_a_node_affinity_matchers
- store_gateway_zone_b_node_affinity_matchers
- store_gateway_zone_c_node_affinity_matchers
[FEATURE] Ingester: Allow automated zone-by-zone downscaling, that can be enabled via the ingester_automated_downscale_enabled flag. It is disabled by default. #6850
[ENHANCEMENT] Alerts: Add MimirStoreGatewayTooManyFailedOperations warning alert that triggers when Mimir store-gateway report error when interacting with the object storage. #6831
[ENHANCEMENT] Querier HPA: improved scaling metric and scaling policies, in order to scale up and down more gradually. #6971
[ENHANCEMENT] Rollout-operator: upgraded to v0.13.0. #7469
[ENHANCEMENT] Rollout-operator: add tracing configuration to rollout-operator container (when tracing is enabled and configured). #7469
[ENHANCEMENT] Query-frontend: configured -shutdown-delay, -server.grpc.keepalive.max-connection-age and termination grace period to reduce the likelihood of queries hitting terminated query-frontends. #7129
[ENHANCEMENT] Autoscaling: add support for KEDA's ignoreNullValues option for Prometheus scaler. #7471
[BUGFIX] Update memcached-exporter to 0.14.1 due to CVE-2023-39325. #6861

Mimirtool

[FEATURE] Add command migrate-utf8 to migrate Alertmanager configurations for Alertmanager versions 0.27.0 and later. #7383
[ENHANCEMENT] Add template render command to render locally a template. #7325
[ENHANCEMENT] Add --extra-headers option to mimirtool rules command to add extra headers to requests for auth. #7141
[ENHANCEMENT] Analyze Prometheus: set tenant header. #6737
[ENHANCEMENT] Add argument --output-dir to mimirtool alertmanager get where the config and templates will be written to and can be loaded via mimirtool alertmanager load #6760
[BUGFIX] Analyze rule-file: .metricsUsed field wasn't populated. #6953

Mimir Continuous Test

[ENHANCEMENT] Include comparison of all expected and actual values when any float sample does not match. #6756

Query-tee

[BUGFIX] Fix issue where Host HTTP header was not being correctly changed for the proxy targets. #7386
[ENHANCEMENT] Allow using the value of X-Scope-OrgID for basic auth username in the forwarded request if URL username is set as __REQUEST_HEADER_X_SCOPE_ORGID__. #7452

Documentation

[CHANGE] No longer mark OTLP distributor endpoint as experimental. #7348
[ENHANCEMENT] Added runbook for KubePersistentVolumeFillingUp alert. #7297
[ENHANCEMENT] Add Grafana Cloud recommendations to OTLP documentation. #7375
[BUGFIX] Fixed typo on single zone->zone aware replication Helm page. #7327

Tools

[CHANGE] copyblocks: The flags for copyblocks have been changed to align more closely with other tools. #6607
[CHANGE] undelete-blocks: undelete-blocks-gcs has been removed and replaced with undelete-blocks, which supports recovering deleted blocks in versioned buckets from ABS, GCS, and S3-compatible object storage. #6607
[FEATURE] copyprefix: Add tool to copy objects between prefixes. Supports ABS, GCS, and S3-compatible object storage. #6607

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.11.0...mimir-2.12.0-rc.0

mimir - 2.11.0

Published by leizor 10 months ago

This release contains 532 PRs from 55 authors, including new contributors Benjamin, Dominik Kepinski, Jonathan Donzallaz, Juraj Michálek, Kai.Ke, Ludovic Terrier, Luke, Maciej Lech, Matthew Penner, Michael Potter, Mihai Țimbota-Belin, Rasmus Werner Salling, Ying WANG, chencs, fayzal-g, kalle (jag), sarthaktyagi-505, whoami. Thank you!

Grafana Mimir version 2.11.0 release notes

Grafana Labs is excited to announce version 2.11 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.

Features and enhancements

Sampled logging of errors in the ingester. A high-traffic Mimir cluster can occasionally become bogged down logging high volumes of repeated errors. You can now reduce the amount of errors outputted to logs by setting a sample rate via the -ingester.error-sample-rate CLI flag.
Add total request size instance limit for ingesters. This limit protects the ingesters against requests that together may cause an OOM. Enable this feature by setting the -ingester.instance-limits.max-inflight-push-requests-bytes CLI flag in combination with the -ingester.limit-inflight-requests-using-grpc-method-limiter CLI flag.
Reduce the resolution of incoming native histograms samples if the incoming sample has too many buckets compared to -validation.max-native-histogram-buckets. This is enabled by default but can be turned off by setting the -validation.reduce-native-histogram-over-max-buckets CLI flag to false.
Improved query-scheduler performance under load. This is particularly apparent for clusters with large numbers of queriers.
Ingester to querier chunks streaming reduces the memory utilization of queriers and reduces the likelihood of OOMs.
Ingester query request minimization reduces the number of query requests to ingesters, improving performance and resource utilization for both ingesters and queriers.

Experimental features

Grafana Mimir 2.11 includes new features that are considered experimental and disabled by default. Please use them with caution and report any issue you encounter:

Block specified queries on a per-tenant basis. This is configured via the blocked_queries limit. See the docs for more information.
Store metadata when ingesting metrics via OTLP. This makes metric description and type available when ingesting metrics via OTLP. You can enable this feature by setting the CLI flag -distributor.enable-otlp-metadata-storage to true.
Reject gRPC push requests that the ingester/distributor is unable to accept before reading them into memory. You can enable this feature by using the -ingester.limit-inflight-requests-using-grpc-method-limiter and/or the -distributor.limit-inflight-requests-using-grpc-method-limiter CLI flags for the ingester and/or the distributor, respectively.
Customize the memcached client write and read buffer size. The buffer allocated for each memcached connection can be configured via the following CLI flags:
- For the blocks storage:
  - -blocks-storage.bucket-store.chunks-cache.memcached.read-buffer-size-bytes
  - -blocks-storage.bucket-store.chunks-cache.memcached.write-buffer-size-bytes
  - -blocks-storage.bucket-store.index-cache.memcached.read-buffer-size-bytes
  - -blocks-storage.bucket-store.index-cache.memcached.write-buffer-size-bytes
  - -blocks-storage.bucket-store.metadata-cache.memcached.read-buffer-size-bytes
  - -blocks-storage.bucket-store.metadata-cache.memcached.write-buffer-size-bytes
- For the query frontend:
  - -query-frontend.results-cache.memcached.read-buffer-size-bytes
  - -query-frontend.results-cache.memcached.write-buffer-size-bytes
- For the ruler storage:
  - -ruler-storage.cache.memcached.read-buffer-size-bytes
  - -ruler-storage.cache.memcached.write-buffer-size-bytes
Configure the number of long-living workers used to process gRPC requests. This can decrease CPU usage by reducing the number of stack allocations. Configure this feature by using the -server.grpc.num-workers CLI flag.
Enforce a limit in bytes on the PostingsForMatchers cache used by ingesters. This limit can be configured via the -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes CLI flags.
Pre-allocate the pool of workers in the distributor that are used to send push requests to ingesters. This can decrease CPU usage by reducing the number of stack allocations. You can enable this feature by using the -distributor.reusable-ingester-push-worker flag.
Include a Retry-After header in recoverable error responses from the distributor. This can protect your Mimir cluster from clients including Prometheus that default to retrying very quickly. Enable this feature by setting the -distributor.retry-after-header.enabled CLI flag.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.11 the following behavior has changed:

The utilization-based read path limiter now operates on Go heap size instead of RSS from the Linux proc file system.

The following configuration options had been previously deprecated and are removed in Grafana Mimir 2.11:

The CLI flag -querier.iterators.
The CLI flag -query.batch-iterators.
The CLI flag -blocks-storage.bucket-store.bucket-index.enabled.
The CLI flag -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes.
The CLI flag -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes.
The CLI flag -blocks-storage.bucket-store.max-chunk-pool-bytes.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.13:

The CLI flag -log.buffered; this is now the default behavior.

The following metrics are removed:

cortex_query_frontend_workers_enqueued_requests_total; use cortex_query_frontend_enqueue_duration_seconds_count instead.

The following configuration option defaults were changed:

The CLI flag -blocks-storage.bucket-store.index-header.sparse-persistence-enabled now defaults to true.
The default value for the CLI flag -blocks-storage.bucket-store.index-header.lazy-loading-concurrency was changed from 0 to 4.
The default value for the CLI flag -blocks-storage.tsdb.series-hash-cache-max-size-bytes was changed from 1GB to 350MB.
The default value for the CLI flag -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage was changed from 10 to 15.

Bug fixes

Ingester: Respect context cancelation during query execution. PR 6085
Distributor: Return 529 when ingestion rate limit is hit and the distributor.service_overload_status_code_on_rate_limit_enabled flag is active. PR 6549
Query-scheduler: Prevent accumulation of stale querier connections. PR 6100
Packaging: Fix preremove script preventing upgrades on RHEL based OS. PR 6067

Changelog

2.11.0

Grafana Mimir

[CHANGE] The following deprecated configurations have been removed: #6673 #6779 #6808 #6814
- -querier.iterators
- -querier.batch-iterators
- -blocks-storage.bucket-store.max-chunk-pool-bytes
- -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes
- -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes
- -blocks-storage.bucket-store.bucket-index.enabled
[CHANGE] Querier: Split worker GRPC config into separate client configs for the frontend and scheduler to allow TLS to be configured correctly when specifying the tls_server_name. The GRPC config specified under -querier.frontend-client.* will no longer apply to the scheduler client, and will need to be set explicitly under -querier.scheduler-client.*. #6445 #6573
[CHANGE] Store-gateway: enable sparse index headers by default. Sparse index headers reduce the time to load an index header up to 90%. #6005
[CHANGE] Store-gateway: lazy-loading concurrency limit default value is now 4. #6004
[CHANGE] General: enabled -log.buffered by default. The -log.buffered has been deprecated and will be removed in Mimir 2.13. #6131
[CHANGE] Ingester: changed default -blocks-storage.tsdb.series-hash-cache-max-size-bytes setting from 1GB to 350MB. The new default cache size is enough to store the hashes for all series in a ingester, assuming up to 2M in-memory series per ingester and using the default 13h retention period for local TSDB blocks in the ingesters. #6130
[CHANGE] Query-frontend: removed cortex_query_frontend_workers_enqueued_requests_total. Use cortex_query_frontend_enqueue_duration_seconds_count instead. #6121
[CHANGE] Ingester / querier: enable ingester to querier chunks streaming by default and mark it as stable. #6174
[CHANGE] Ingester / querier: enable ingester query request minimisation by default and mark it as stable. #6174
[CHANGE] Ingester: changed the default value for the experimental configuration parameter -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage from 10 to 15. #6186
[CHANGE] Ingester: /ingester/push HTTP endpoint has been removed. This endpoint was added for testing and troubleshooting, but was never documented or used for anything. #6299
[CHANGE] Experimental setting -log.rate-limit-logs-per-second-burst renamed to -log.rate-limit-logs-burst-size. #6230
[CHANGE] Distributor: instead of errors with HTTP status codes, Push() now returns errors with gRPC codes: #6377
- http.StatusAccepted (202) code is replaced with codes.AlreadyExists.
- http.BadRequest (400) code is replaced with codes.FailedPrecondition.
- http.StatusTooManyRequests (429) and the non-standard 529 (The service is overloaded) codes are replaced with codes.ResourceExhausted.
[CHANGE] Ingester: by setting the newly introduced experimental CLI flag -ingester.return-only-grpc-errors to true, ingester will return only gRPC errors. This feature changes the following status codes: #6443 #6680 #6723
- http.StatusBadRequest (400) is replaced with codes.FailedPrecondition on the write path.
- http.StatusServiceUnavailable (503) is replaced with codes.Internal on the write path, and with codes.ResourceExhausted on the read path.
- codes.Unknown is replaced with codes.Internal on both write and read path.
[CHANGE] Upgrade Node.js to v20. #6540
[CHANGE] Querier: cortex_querier_blocks_consistency_checks_failed_total is now incremented when a block couldn't be queried from any attempted store-gateway as opposed to incremented after each attempt. Also cortex_querier_blocks_consistency_checks_total is incremented once per query as opposed to once per attempt (with 3 attempts). #6590
[CHANGE] Ingester: Modify utilization based read path limiter to base memory usage on Go heap size. #6584
[FEATURE] Distributor: added option -distributor.retry-after-header.enabled to include the Retry-After header in recoverable error responses. #6608
[FEATURE] Query-frontend: add experimental support for query blocking. Queries are blocked on a per-tenant basis and is configured via the limit blocked_queries. #5609
[FEATURE] Vault: Added support for new Vault authentication methods: AppRole, Kubernetes, UserPass and Token. #6143
[FEATURE] Add experimental endpoint /api/v1/cardinality/active_series to return the set of active series for a given selector. #6536 #6619 #6651 #6667
[FEATURE] Added -<prefix>.s3.part-size flag to configure the S3 minimum file size in bytes used for multipart uploads. #6592
[FEATURE] Add the experimental -<prefix>.s3.send-content-md5 flag (defaults to false) to configure S3 Put Object requests to send a Content-MD5 header. Setting this flag is not recommended unless your object storage does not support checksums. #6622
[FEATURE] Distributor: add an experimental flag -distributor.reusable-ingester-push-worker that can be used to pre-allocate a pool of workers to be used to send push requests to the ingesters. #6660
[FEATURE] Distributor: Support enabling of automatically generated name suffixes for metrics ingested via OTLP, through the flag -distributor.otel-metric-suffixes-enabled. #6542
[ENHANCEMENT] Query-frontend: don't treat cancel as an error. #4648
[ENHANCEMENT] Ingester: exported summary cortex_ingester_inflight_push_requests_summary tracking total number of inflight requests in percentile buckets. #5845
[ENHANCEMENT] Query-scheduler: add cortex_query_scheduler_enqueue_duration_seconds metric that records the time taken to enqueue or reject a query request. #5879
[ENHANCEMENT] Query-frontend: add cortex_query_frontend_enqueue_duration_seconds metric that records the time taken to enqueue or reject a query request. When query-scheduler is in use, the metric has the scheduler_address label to differentiate the enqueue duration by query-scheduler backend. #5879 #6087 #6120
[ENHANCEMENT] Store-gateway: add metric cortex_bucket_store_blocks_loaded_by_duration for counting the loaded number of blocks based on their duration. #6074 #6129
[ENHANCEMENT] Expose /sync/mutex/wait/total:seconds Go runtime metric as go_sync_mutex_wait_total_seconds_total from all components. #5879
[ENHANCEMENT] Query-scheduler: improve latency with many concurrent queriers. #5880
[ENHANCEMENT] Ruler: add new per-tenant cortex_ruler_queries_zero_fetched_series_total metric to track rules that fetched no series. #5925
[ENHANCEMENT] Implement support for limit, limit_per_metric and metric parameters for <Prometheus HTTP prefix>/api/v1/metadata endpoint. #5890
[ENHANCEMENT] Distributor: add experimental support for storing metadata when ingesting metrics via OTLP. This makes metrics description and type available when ingesting metrics via OTLP. Enable with -distributor.enable-otlp-metadata-storage=true. #5693 #6035 #6254
[ENHANCEMENT] Ingester: added support for sampling errors, which can be enabled by setting -ingester.error-sample-rate. This way each error will be logged once in the configured number of times. All the discarded samples will still be tracked by the cortex_discarded_samples_total metric. #5584 #6014
[ENHANCEMENT] Ruler: Fetch secrets used to configure TLS on the Alertmanager client from Vault when -vault.enabled is true. #5239
[ENHANCEMENT] Query-frontend: added query-sharding support for group by aggregation queries. #6024
[ENHANCEMENT] Fetch secrets used to configure server-side TLS from Vault when -vault.enabled is true. #6052.
[ENHANCEMENT] Packaging: add logrotate config file. #6142
[ENHANCEMENT] Ingester: add the experimental configuration options -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes to enforce a limit in bytes on the PostingsForMatchers() cache used by ingesters (the cache limit is per TSDB head and block basis, not a global one). The experimental configuration options -blocks-storage.tsdb.head-postings-for-matchers-cache-size and -blocks-storage.tsdb.block-postings-for-matchers-cache-size have been deprecated. #6151
[ENHANCEMENT] Ingester: use the PostingsForMatchers() in-memory cache for label values queries with matchers too. #6151
[ENHANCEMENT] Ingester / store-gateway: optimized regex matchers. #6168 #6250
[ENHANCEMENT] Distributor: Include ingester IDs in circuit breaker related metrics and logs. #6206
[ENHANCEMENT] Querier: improve errors and logging when streaming chunks from ingesters and store-gateways. #6194 #6309
[ENHANCEMENT] Querier: Add cortex_querier_federation_exemplar_tenants_queried and cortex_querier_federation_tenants_queried metrics to track the number of tenants queried by multi-tenant queries. #6374 #6409
[ENHANCEMENT] All: added an experimental -server.grpc.num-workers flag that configures the number of long-living workers used to process gRPC requests. This could decrease the CPU usage by reducing the number of stack allocations. #6311
[ENHANCEMENT] All: improved IPv6 support by using the proper host:port formatting. #6311
[ENHANCEMENT] Querier: always return error encountered during chunks streaming, rather than the stream has already been exhausted. #6345 #6433
[ENHANCEMENT] Query-frontend: add instance_enable_ipv6 to support IPv6. #6111
[ENHANCEMENT] Store-gateway: return same detailed error messages as queriers when chunks or series limits are reached. #6347
[ENHANCEMENT] Querier: reduce memory consumed for queries that hit store-gateways. #6348
[ENHANCEMENT] Ruler: include corresponding trace ID with log messages associated with rule evaluation. #6379 #6520
[ENHANCEMENT] Querier: clarify log messages and span events emitted while querying ingesters, and include both ingester name and address when relevant. #6381
[ENHANCEMENT] Memcached: introduce new experimental configuration parameters -<prefix>.memcached.write-buffer-size-bytes -<prefix>.memcached.read-buffer-size-bytes to customise the memcached client write and read buffer size (the buffer is allocated for each memcached connection). #6468
[ENHANCEMENT] Ingester, Distributor: added experimental support for rejecting push requests received via gRPC before reading them into memory, if ingester or distributor is unable to accept the request. This is activated by using -ingester.limit-inflight-requests-using-grpc-method-limiter for ingester, and -distributor.limit-inflight-requests-using-grpc-method-limiter for distributor. #5976 #6300
[ENHANCEMENT] Add capability in store-gateways to accept number of tokens through config. -store-gateway.sharding-ring.num-tokens, default-value=512 #4863
[ENHANCEMENT] Query-frontend: return warnings generated during query evaluation. #6391
[ENHANCEMENT] Server: Add the option -server.http-read-header-timeout to enable specifying a timeout for reading HTTP request headers. It defaults to 0, in which case reading of headers can take up to -server.http-read-timeout, leaving no time for reading body, if there's any. #6517
[ENHANCEMENT] Add connection-string option, -<prefix>.azure.connection-string, for Azure Blob Storage. #6487
[ENHANCEMENT] Ingester: Add -ingester.instance-limits.max-inflight-push-requests-bytes. This limit protects the ingester against requests that together may cause an OOM. #6492
[ENHANCEMENT] Ingester: add new per-tenant cortex_ingester_local_limits metric to expose the calculated local per-tenant limits seen at each ingester. Exports the local per-tenant series limit with label {limit="max_global_series_per_user"} #6403
[ENHANCEMENT] Query-frontend: added "queue_time_seconds" field to "query stats" log. This is total time that query and subqueries spent in the queue, before queriers picked it up. #6537
[ENHANCEMENT] Server: Add -server.report-grpc-codes-in-instrumentation-label-enabled CLI flag to specify whether gRPC status codes should be used in status_code label of cortex_request_duration_seconds metric. It defaults to false, meaning that successful and erroneous gRPC status codes are represented with success and error respectively. #6562
[ENHANCEMENT] Server: Add -ingester.client.report-grpc-codes-in-instrumentation-label-enabled CLI flag to specify whether gRPC status codes should be used in status_code label of cortex_ingester_client_request_duration_seconds metric. It defaults to false, meaning that successful and erroneous gRPC status codes are represented with 2xx and error respectively. #6562
[ENHANCEMENT] Server: Add -server.http-log-closed-connections-without-response-enabled option to log details about connections to HTTP server that were closed before any data was sent back. This can happen if client doesn't manage to send complete HTTP headers before timeout. #6612
[ENHANCEMENT] Query-frontend: include length of query, time since the earliest and latest points of a query, time since the earliest and latest points of a query, cached/uncached bytes in "query stats" logs. Time parameters (start/end/time) are always formatted as RFC3339 now. #6473 #6477 #6709 #6710
[ENHANCEMENT] Distributor: added support for reducing the resolution of native histogram samples upon ingestion if the sample has too many buckets compared to -validation.max-native-histogram-buckets. This is enabled by default and can be turned off by setting -validation.reduce-native-histogram-over-max-buckets to false. #6535
[ENHANCEMENT] Query-frontend: optionally wait for the frontend to complete startup if requests are received while the frontend is still starting. Disabled by default, set -query-frontend.not-running-timeout to a non-zero value to enable. #6621
[ENHANCEMENT] Distributor: Include source IPs in OTLP push handler logs. #6652
[ENHANCEMENT] Query-frontend: return clearer error message when a query request is received while shutting down. #6675
[ENHANCEMENT] Querier: return clearer error message when a query request is cancelled by the caller. #6697
[BUGFIX] Distributor: return server overload error in the event of exceeding the ingestion rate limit. #6549
[BUGFIX] Ring: Ensure network addresses used for component hash rings are formatted correctly when using IPv6. #6068
[BUGFIX] Query-scheduler: don't retain connections from queriers that have shut down, leading to gradually increasing enqueue latency over time. #6100 #6145
[BUGFIX] Ingester: prevent query logic from continuing to execute after queries are canceled. #6085
[BUGFIX] Ensure correct nesting of children of the querier.Select tracing span. #6085
[BUGFIX] Packaging: fix preremove script preventing upgrades on RHEL based OS. #6067
[BUGFIX] Querier: return actual error rather than attempted to read series at index XXX from stream, but the stream has already been exhausted (or even no error at all) when streaming chunks from ingesters or store-gateways is enabled and an error occurs while streaming chunks. #6346
[BUGFIX] Querier: reduce log volume when querying ingesters with zone-awareness enabled and one or more instances in a single zone unavailable. #6381
[BUGFIX] Querier: don't try to query further ingesters if ingester query request minimization is enabled and a query limit is reached as a result of the responses from the initial set of ingesters. #6402
[BUGFIX] Ingester: Don't cache context cancellation error when querying. #6446
[BUGFIX] Ingester: don't ignore errors encountered while iterating through chunks or samples in response to a query request. #6469
[BUGFIX] All: fix issue where traces for some inter-component gRPC calls would incorrectly show the call as failing due to cancellation. #6470
[BUGFIX] Querier: correctly mark streaming requests to ingesters or store-gateways as successful, not cancelled, in metrics and traces. #6471 #6505
[BUGFIX] Querier: fix issue where queries fail with "context canceled" error when an ingester or store-gateway fails healthcheck while the query is in progress. #6550
[BUGFIX] Tracing: When creating an OpenTelemetry tracing span, add it to the context for later retrieval. #6614
[BUGFIX] Querier: always report query results to query-frontends, even when cancelled, to ensure query-frontends don't wait for results that will otherwise never arrive. #6703
[BUGFIX] Querier: attempt to query ingesters in PENDING state, to reduce the likelihood that scaling up the number of ingesters in multiple zones simultaneously causes a read outage. #6726 #6727
[BUGFIX] Querier: don't cancel inflight queries from a query-scheduler if the stream between the querier and query-scheduler is broken. #6728
[BUGFIX] Store-gateway: Fix double-counting of some duration metrics. #6616
[BUGFIX] Fixed possible series matcher corruption leading to wrong series being included in query results. #6884

Mixin

[CHANGE] Dashboards: enabled reporting gRPC codes as status_code label in Mimir dashboards. In case of gRPC calls, the successful status_code label on cortex_request_duration_seconds and gRPC client request duration metrics has changed from 'success' and '2xx' to 'OK'. #6561
[CHANGE] Alerts: remove MimirGossipMembersMismatch alert and replace it with MimirGossipMembersTooHigh and MimirGossipMembersTooLow alerts that should have a higher signal-to-noise ratio. #6508
[ENHANCEMENT] Dashboards: Optionally show rejected requests on Mimir Writes dashboard. Useful when used together with "early request rejection" in ingester and distributor. #6132 #6556
[ENHANCEMENT] Alerts: added a critical alert for CompactorSkippedBlocksWithOutOfOrderChunks when multiple blocks are affected. #6410
[ENHANCEMENT] Dashboards: Added the min-replicas for autoscaling dashboards. #6528
[BUGFIX] Alerts: fixed issue where GossipMembersMismatch warning message referred to per-instance labels that were not produced by the alert query. #6146
[BUGFIX] Dashboards: Fix autoscaling dashboard panels for KEDA > 2.9. Requires scraping the KEDA operator for metrics since they moved. #6528
[BUGFIX] Alerts: Fix autoscaling alerts for KEDA > 2.9. Requires scraping the KEDA operator for metrics since they moved. #6528

Jsonnet

[CHANGE] Ingester: reduce -server.grpc-max-concurrent-streams to 500. #5666
[CHANGE] Changed default _config.cluster_domain from cluster.local to cluster.local. to reduce the number of DNS lookups made by Mimir. #6389
[CHANGE] Query-frontend: changed default _config.autoscaling_query_frontend_cpu_target_utilization from 1 to 0.75. #6395
[CHANGE] Distributor: Increase HPA scale down period such that distributors are slower to scale down after autoscaling up. #6589
[FEATURE] Store-gateway: Allow automated zone-by-zone downscaling, that can be enabled via the store_gateway_automated_downscale_enabled flag. It is disabled by default. #6149
[FEATURE] Ingester: Allow to configure TSDB Head early compaction using the following _config parameters: #6181
- ingester_tsdb_head_early_compaction_enabled (disabled by default)
- ingester_tsdb_head_early_compaction_reduction_percentage
- ingester_tsdb_head_early_compaction_min_in_memory_series
[ENHANCEMENT] Double the amount of rule groups for each user tier. #5897
[ENHANCEMENT] Set maxUnavailable to 0 for distributor, overrides-exporter, querier, query-frontend, query-scheduler ruler-querier, ruler-query-frontend, ruler-query-scheduler and consul deployments, to ensure they don't become completely unavailable during a rollout. #5924
[ENHANCEMENT] Update rollout-operator to v0.9.0. #6022 #6110 #6558 #6681
[ENHANCEMENT] Update memcached to memcached:1.6.22-alpine. #6585
[ENHANCEMENT] Store-gateway: replaced the following deprecated CLI flags: #6319
- -blocks-storage.bucket-store.index-header-lazy-loading-enabled replaced with -blocks-storage.bucket-store.index-header.lazy-loading-enabled
- -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout replaced with -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
[ENHANCEMENT] Store-gateway: Allow selective enablement of store-gateway automated scaling on a per-zone basis. #6302
[BUGFIX] Autoscaling: KEDA > 2.9 removed the ability to set metricName in the trigger metadata. To help discern which metric is used by the HPA, we set the trigger name to what was the metricName. This is available as the scaler label on keda_* metrics. #6528

Mimirtool

[ENHANCEMENT] Analyze Grafana: Improve support for variables in range. #6657
[BUGFIX] Fix out of bounds error on export with large timespans and/or series count. #5700
[BUGFIX] Fix the issue where --read-timeout was applied to the entire mimirtool analyze grafana invocation rather than to individual Grafana API calls. #5915
[BUGFIX] Fix incorrect remote-read path joining for mimirtool remote-read commands on Windows. #6011
[BUGFIX] Fix template files full path being sent in mimirtool alertmanager load command. #6138
[BUGFIX] Analyze rule-file: .metricsUsed field wasn't populated. #6953

Mimir Continuous Test

Query-tee

Documentation

[ENHANCEMENT] Document the concept of native histograms and how to send them to Mimir, migration path. #5956 #6488 #6539
[ENHANCEMENT] Document native histograms query and visualization. #6231

Tools

[CHANGE] tsdb-index: Rename tool to tsdb-series. #6317
[FEATURE] tsdb-labels: Add tool to print label names and values of a TSDB block. #6317
[ENHANCEMENT] trafficdump: Trafficdump can now parse OTEL requests. Entire request is dumped to output, there's no filtering of fields or matching of series done. #6108

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.5...mimir-2.11.0

mimir - 2.11.0-rc.0

Published by leizor 10 months ago

This release contains 531 PRs from 55 authors, including new contributors Benjamin, Dominik Kepinski, Jonathan Donzallaz, Juraj Michálek, Kai.Ke, Ludovic Terrier, Luke, Maciej Lech, Matthew Penner, Michael Potter, Mihai Țimbota-Belin, Rasmus Werner Salling, Ying WANG, chencs, fayzal-g, kalle (jag), renovate[bot], sarthaktyagi-505, whoami. Thank you!

Grafana Mimir version 2.11.0-rc.0 release notes

Grafana Labs is excited to announce version 2.11 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.

Features and enhancements

Sampled logging of errors in the ingester. A high-traffic Mimir cluster can occasionally become bogged down logging high volumes of repeated errors. You can now reduce the amount of errors outputted to logs by setting a sample rate via the -ingester.error-sample-rate CLI flag.
Add total request size instance limit for ingesters. This limit protects the ingesters against requests that together may cause an OOM. Enable this feature by setting the -ingester.instance-limits.max-inflight-push-requests-bytes CLI flag in combination with the -ingester.limit-inflight-requests-using-grpc-method-limiter CLI flag.
Reduce the resolution of incoming native histograms samples if the incoming sample has too many buckets compared to -validation.max-native-histogram-buckets. This is enabled by default but can be turned off by setting the -validation.reduce-native-histogram-over-max-buckets CLI flag to false.
Improved query-scheduler performance under load. This is particularly apparent for clusters with large numbers of queriers.
Ingester to querier chunks streaming reduces the memory utilization of queriers and reduces the likelihood of OOMs.
Ingester query request minimization reduces the number of query requests to ingesters, improving performance and resource utilization for both ingesters and queriers.

Experimental features

Grafana Mimir 2.11 includes new features that are considered experimental and disabled by default. Please use them with caution and report any issue you encounter:

Block specified queries on a per-tenant basis. This is configured via the blocked_queries limit. See the docs for more information.
Store metadata when ingesting metrics via OTLP. This makes metric description and type available when ingesting metrics via OTLP. You can enable this feature by setting the CLI flag -distributor.enable-otlp-metadata-storage to true.
Reject gRPC push requests that the ingester/distributor is unable to accept before reading them into memory. You can enable this feature by using the -ingester.limit-inflight-requests-using-grpc-method-limiter and/or the -distributor.limit-inflight-requests-using-grpc-method-limiter CLI flags for the ingester and/or the distributor, respectively.
Customize the memcached client write and read buffer size. The buffer allocated for each memcached connection can be configured via the following CLI flags:
- For the blocks storage:
  - -blocks-storage.bucket-store.chunks-cache.memcached.read-buffer-size-bytes
  - -blocks-storage.bucket-store.chunks-cache.memcached.write-buffer-size-bytes
  - -blocks-storage.bucket-store.index-cache.memcached.read-buffer-size-bytes
  - -blocks-storage.bucket-store.index-cache.memcached.write-buffer-size-bytes
  - -blocks-storage.bucket-store.metadata-cache.memcached.read-buffer-size-bytes
  - -blocks-storage.bucket-store.metadata-cache.memcached.write-buffer-size-bytes
- For the query frontend:
  - -query-frontend.results-cache.memcached.read-buffer-size-bytes
  - -query-frontend.results-cache.memcached.write-buffer-size-bytes
- For the ruler storage:
  - -ruler-storage.cache.memcached.read-buffer-size-bytes
  - -ruler-storage.cache.memcached.write-buffer-size-bytes
Configure the number of long-living workers used to process gRPC requests. This can decrease CPU usage by reducing the number of stack allocations. Configure this feature by using the -server.grpc.num-workers CLI flag.
Enforce a limit in bytes on the PostingsForMatchers cache used by ingesters. This limit can be configured via the -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes CLI flags.
Pre-allocate the pool of workers in the distributor that are used to send push requests to ingesters. This can decrease CPU usage by reducing the number of stack allocations. You can enable this feature by using the -distributor.reusable-ingester-push-worker flag.
Include a Retry-After header in recoverable error responses from the distributor. This can protect your Mimir cluster from clients including Prometheus that default to retrying very quickly. Enable this feature by setting the -distributor.retry-after-header.enabled CLI flag.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.11 the following behavior has changed:

The utilization-based read path limiter now operates on Go heap size instead of RSS from the Linux proc file system.

The following configuration options had been previously deprecated and are removed in Grafana Mimir 2.11:

The CLI flag -querier.iterators.
The CLI flag -query.batch-iterators.
The CLI flag -blocks-storage.bucket-store.bucket-index.enabled.
The CLI flag -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes.
The CLI flag -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes.
The CLI flag -blocks-storage.bucket-store.max-chunk-pool-bytes.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.13:

The CLI flag -log.buffered; this is now the default behavior.

The following metrics are removed:

cortex_query_frontend_workers_enqueued_requests_total; use cortex_query_frontend_enqueue_duration_seconds_count instead.

The following configuration option defaults were changed:

The CLI flag -blocks-storage.bucket-store.index-header.sparse-persistence-enabled now defaults to true.
The default value for the CLI flag -blocks-storage.bucket-store.index-header.lazy-loading-concurrency was changed from 0 to 4.
The default value for the CLI flag -blocks-storage.tsdb.series-hash-cache-max-size-bytes was changed from 1GB to 350MB.
The default value for the CLI flag -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage was changed from 10 to 15.

Bug fixes

Ingester: Respect context cancelation during query execution. PR 6085
Distributor: Return 529 when ingestion rate limit is hit and the distributor.service_overload_status_code_on_rate_limit_enabled flag is active. PR 6549
Query-scheduler: Prevent accumulation of stale querier connections. PR 6100
Packaging: Fix preremove script preventing upgrades on RHEL based OS. PR 6067

Changelog

2.11.0-rc.0

Grafana Mimir

[CHANGE] The following deprecated configurations have been removed: #6673 #6779 #6808 #6814
- -querier.iterators
- -querier.batch-iterators
- -blocks-storage.bucket-store.max-chunk-pool-bytes
- -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes
- -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes
- -blocks-storage.bucket-store.bucket-index.enabled
[CHANGE] Querier: Split worker GRPC config into separate client configs for the frontend and scheduler to allow TLS to be configured correctly when specifying the tls_server_name. The GRPC config specified under -querier.frontend-client.* will no longer apply to the scheduler client, and will need to be set explicitly under -querier.scheduler-client.*. #6445 #6573
[CHANGE] Store-gateway: enable sparse index headers by default. Sparse index headers reduce the time to load an index header up to 90%. #6005
[CHANGE] Store-gateway: lazy-loading concurrency limit default value is now 4. #6004
[CHANGE] General: enabled -log.buffered by default. The -log.buffered has been deprecated and will be removed in Mimir 2.13. #6131
[CHANGE] Ingester: changed default -blocks-storage.tsdb.series-hash-cache-max-size-bytes setting from 1GB to 350MB. The new default cache size is enough to store the hashes for all series in a ingester, assuming up to 2M in-memory series per ingester and using the default 13h retention period for local TSDB blocks in the ingesters. #6130
[CHANGE] Query-frontend: removed cortex_query_frontend_workers_enqueued_requests_total. Use cortex_query_frontend_enqueue_duration_seconds_count instead. #6121
[CHANGE] Ingester / querier: enable ingester to querier chunks streaming by default and mark it as stable. #6174
[CHANGE] Ingester / querier: enable ingester query request minimisation by default and mark it as stable. #6174
[CHANGE] Ingester: changed the default value for the experimental configuration parameter -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage from 10 to 15. #6186
[CHANGE] Ingester: /ingester/push HTTP endpoint has been removed. This endpoint was added for testing and troubleshooting, but was never documented or used for anything. #6299
[CHANGE] Experimental setting -log.rate-limit-logs-per-second-burst renamed to -log.rate-limit-logs-burst-size. #6230
[CHANGE] Distributor: instead of errors with HTTP status codes, Push() now returns errors with gRPC codes: #6377
- http.StatusAccepted (202) code is replaced with codes.AlreadyExists.
- http.BadRequest (400) code is replaced with codes.FailedPrecondition.
- http.StatusTooManyRequests (429) and the non-standard 529 (The service is overloaded) codes are replaced with codes.ResourceExhausted.
[CHANGE] Ingester: by setting the newly introduced experimental CLI flag -ingester.return-only-grpc-errors to true, ingester will return only gRPC errors. This feature changes the following status codes: #6443 #6680 #6723
- http.StatusBadRequest (400) is replaced with codes.FailedPrecondition on the write path.
- http.StatusServiceUnavailable (503) is replaced with codes.Internal on the write path, and with codes.ResourceExhausted on the read path.
- codes.Unknown is replaced with codes.Internal on both write and read path.
[CHANGE] Upgrade Node.js to v20. #6540
[CHANGE] Querier: cortex_querier_blocks_consistency_checks_failed_total is now incremented when a block couldn't be queried from any attempted store-gateway as opposed to incremented after each attempt. Also cortex_querier_blocks_consistency_checks_total is incremented once per query as opposed to once per attempt (with 3 attempts). #6590
[CHANGE] Ingester: Modify utilization based read path limiter to base memory usage on Go heap size. #6584
[FEATURE] Distributor: added option -distributor.retry-after-header.enabled to include the Retry-After header in recoverable error responses. #6608
[FEATURE] Query-frontend: add experimental support for query blocking. Queries are blocked on a per-tenant basis and is configured via the limit blocked_queries. #5609
[FEATURE] Vault: Added support for new Vault authentication methods: AppRole, Kubernetes, UserPass and Token. #6143
[FEATURE] Add experimental endpoint /api/v1/cardinality/active_series to return the set of active series for a given selector. #6536 #6619 #6651 #6667
[FEATURE] Added -<prefix>.s3.part-size flag to configure the S3 minimum file size in bytes used for multipart uploads. #6592
[FEATURE] Add the experimental -<prefix>.s3.send-content-md5 flag (defaults to false) to configure S3 Put Object requests to send a Content-MD5 header. Setting this flag is not recommended unless your object storage does not support checksums. #6622
[FEATURE] Distributor: add an experimental flag -distributor.reusable-ingester-push-worker that can be used to pre-allocate a pool of workers to be used to send push requests to the ingesters. #6660
[FEATURE] Distributor: Support enabling of automatically generated name suffixes for metrics ingested via OTLP, through the flag -distributor.otel-metric-suffixes-enabled. #6542
[ENHANCEMENT] Query-frontend: don't treat cancel as an error. #4648
[ENHANCEMENT] Ingester: exported summary cortex_ingester_inflight_push_requests_summary tracking total number of inflight requests in percentile buckets. #5845
[ENHANCEMENT] Query-scheduler: add cortex_query_scheduler_enqueue_duration_seconds metric that records the time taken to enqueue or reject a query request. #5879
[ENHANCEMENT] Query-frontend: add cortex_query_frontend_enqueue_duration_seconds metric that records the time taken to enqueue or reject a query request. When query-scheduler is in use, the metric has the scheduler_address label to differentiate the enqueue duration by query-scheduler backend. #5879 #6087 #6120
[ENHANCEMENT] Store-gateway: add metric cortex_bucket_store_blocks_loaded_by_duration for counting the loaded number of blocks based on their duration. #6074 #6129
[ENHANCEMENT] Expose /sync/mutex/wait/total:seconds Go runtime metric as go_sync_mutex_wait_total_seconds_total from all components. #5879
[ENHANCEMENT] Query-scheduler: improve latency with many concurrent queriers. #5880
[ENHANCEMENT] Ruler: add new per-tenant cortex_ruler_queries_zero_fetched_series_total metric to track rules that fetched no series. #5925
[ENHANCEMENT] Implement support for limit, limit_per_metric and metric parameters for <Prometheus HTTP prefix>/api/v1/metadata endpoint. #5890
[ENHANCEMENT] Distributor: add experimental support for storing metadata when ingesting metrics via OTLP. This makes metrics description and type available when ingesting metrics via OTLP. Enable with -distributor.enable-otlp-metadata-storage=true. #5693 #6035 #6254
[ENHANCEMENT] Ingester: added support for sampling errors, which can be enabled by setting -ingester.error-sample-rate. This way each error will be logged once in the configured number of times. All the discarded samples will still be tracked by the cortex_discarded_samples_total metric. #5584 #6014
[ENHANCEMENT] Ruler: Fetch secrets used to configure TLS on the Alertmanager client from Vault when -vault.enabled is true. #5239
[ENHANCEMENT] Query-frontend: added query-sharding support for group by aggregation queries. #6024
[ENHANCEMENT] Fetch secrets used to configure server-side TLS from Vault when -vault.enabled is true. #6052.
[ENHANCEMENT] Packaging: add logrotate config file. #6142
[ENHANCEMENT] Ingester: add the experimental configuration options -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes to enforce a limit in bytes on the PostingsForMatchers() cache used by ingesters (the cache limit is per TSDB head and block basis, not a global one). The experimental configuration options -blocks-storage.tsdb.head-postings-for-matchers-cache-size and -blocks-storage.tsdb.block-postings-for-matchers-cache-size have been deprecated. #6151
[ENHANCEMENT] Ingester: use the PostingsForMatchers() in-memory cache for label values queries with matchers too. #6151
[ENHANCEMENT] Ingester / store-gateway: optimized regex matchers. #6168 #6250
[ENHANCEMENT] Distributor: Include ingester IDs in circuit breaker related metrics and logs. #6206
[ENHANCEMENT] Querier: improve errors and logging when streaming chunks from ingesters and store-gateways. #6194 #6309
[ENHANCEMENT] Querier: Add cortex_querier_federation_exemplar_tenants_queried and cortex_querier_federation_tenants_queried metrics to track the number of tenants queried by multi-tenant queries. #6374 #6409
[ENHANCEMENT] All: added an experimental -server.grpc.num-workers flag that configures the number of long-living workers used to process gRPC requests. This could decrease the CPU usage by reducing the number of stack allocations. #6311
[ENHANCEMENT] All: improved IPv6 support by using the proper host:port formatting. #6311
[ENHANCEMENT] Querier: always return error encountered during chunks streaming, rather than the stream has already been exhausted. #6345 #6433
[ENHANCEMENT] Query-frontend: add instance_enable_ipv6 to support IPv6. #6111
[ENHANCEMENT] Store-gateway: return same detailed error messages as queriers when chunks or series limits are reached. #6347
[ENHANCEMENT] Querier: reduce memory consumed for queries that hit store-gateways. #6348
[ENHANCEMENT] Ruler: include corresponding trace ID with log messages associated with rule evaluation. #6379 #6520
[ENHANCEMENT] Querier: clarify log messages and span events emitted while querying ingesters, and include both ingester name and address when relevant. #6381
[ENHANCEMENT] Memcached: introduce new experimental configuration parameters -<prefix>.memcached.write-buffer-size-bytes -<prefix>.memcached.read-buffer-size-bytes to customise the memcached client write and read buffer size (the buffer is allocated for each memcached connection). #6468
[ENHANCEMENT] Ingester, Distributor: added experimental support for rejecting push requests received via gRPC before reading them into memory, if ingester or distributor is unable to accept the request. This is activated by using -ingester.limit-inflight-requests-using-grpc-method-limiter for ingester, and -distributor.limit-inflight-requests-using-grpc-method-limiter for distributor. #5976 #6300
[ENHANCEMENT] Add capability in store-gateways to accept number of tokens through config. -store-gateway.sharding-ring.num-tokens, default-value=512 #4863
[ENHANCEMENT] Query-frontend: return warnings generated during query evaluation. #6391
[ENHANCEMENT] Server: Add the option -server.http-read-header-timeout to enable specifying a timeout for reading HTTP request headers. It defaults to 0, in which case reading of headers can take up to -server.http-read-timeout, leaving no time for reading body, if there's any. #6517
[ENHANCEMENT] Add connection-string option, -<prefix>.azure.connection-string, for Azure Blob Storage. #6487
[ENHANCEMENT] Ingester: Add -ingester.instance-limits.max-inflight-push-requests-bytes. This limit protects the ingester against requests that together may cause an OOM. #6492
[ENHANCEMENT] Ingester: add new per-tenant cortex_ingester_local_limits metric to expose the calculated local per-tenant limits seen at each ingester. Exports the local per-tenant series limit with label {limit="max_global_series_per_user"} #6403
[ENHANCEMENT] Query-frontend: added "queue_time_seconds" field to "query stats" log. This is total time that query and subqueries spent in the queue, before queriers picked it up. #6537
[ENHANCEMENT] Server: Add -server.report-grpc-codes-in-instrumentation-label-enabled CLI flag to specify whether gRPC status codes should be used in status_code label of cortex_request_duration_seconds metric. It defaults to false, meaning that successful and erroneous gRPC status codes are represented with success and error respectively. #6562
[ENHANCEMENT] Server: Add -ingester.client.report-grpc-codes-in-instrumentation-label-enabled CLI flag to specify whether gRPC status codes should be used in status_code label of cortex_ingester_client_request_duration_seconds metric. It defaults to false, meaning that successful and erroneous gRPC status codes are represented with 2xx and error respectively. #6562
[ENHANCEMENT] Server: Add -server.http-log-closed-connections-without-response-enabled option to log details about connections to HTTP server that were closed before any data was sent back. This can happen if client doesn't manage to send complete HTTP headers before timeout. #6612
[ENHANCEMENT] Query-frontend: include length of query, time since the earliest and latest points of a query, time since the earliest and latest points of a query, cached/uncached bytes in "query stats" logs. Time parameters (start/end/time) are always formatted as RFC3339 now. #6473 #6477 #6709 #6710
[ENHANCEMENT] Distributor: added support for reducing the resolution of native histogram samples upon ingestion if the sample has too many buckets compared to -validation.max-native-histogram-buckets. This is enabled by default and can be turned off by setting -validation.reduce-native-histogram-over-max-buckets to false. #6535
[ENHANCEMENT] Query-frontend: optionally wait for the frontend to complete startup if requests are received while the frontend is still starting. Disabled by default, set -query-frontend.not-running-timeout to a non-zero value to enable. #6621
[ENHANCEMENT] Distributor: Include source IPs in OTLP push handler logs. #6652
[ENHANCEMENT] Query-frontend: return clearer error message when a query request is received while shutting down. #6675
[ENHANCEMENT] Querier: return clearer error message when a query request is cancelled by the caller. #6697
[BUGFIX] Distributor: return server overload error in the event of exceeding the ingestion rate limit. #6549
[BUGFIX] Ring: Ensure network addresses used for component hash rings are formatted correctly when using IPv6. #6068
[BUGFIX] Query-scheduler: don't retain connections from queriers that have shut down, leading to gradually increasing enqueue latency over time. #6100 #6145
[BUGFIX] Ingester: prevent query logic from continuing to execute after queries are canceled. #6085
[BUGFIX] Ensure correct nesting of children of the querier.Select tracing span. #6085
[BUGFIX] Packaging: fix preremove script preventing upgrades on RHEL based OS. #6067
[BUGFIX] Querier: return actual error rather than attempted to read series at index XXX from stream, but the stream has already been exhausted (or even no error at all) when streaming chunks from ingesters or store-gateways is enabled and an error occurs while streaming chunks. #6346
[BUGFIX] Querier: reduce log volume when querying ingesters with zone-awareness enabled and one or more instances in a single zone unavailable. #6381
[BUGFIX] Querier: don't try to query further ingesters if ingester query request minimization is enabled and a query limit is reached as a result of the responses from the initial set of ingesters. #6402
[BUGFIX] Ingester: Don't cache context cancellation error when querying. #6446
[BUGFIX] Ingester: don't ignore errors encountered while iterating through chunks or samples in response to a query request. #6469
[BUGFIX] All: fix issue where traces for some inter-component gRPC calls would incorrectly show the call as failing due to cancellation. #6470
[BUGFIX] Querier: correctly mark streaming requests to ingesters or store-gateways as successful, not cancelled, in metrics and traces. #6471 #6505
[BUGFIX] Querier: fix issue where queries fail with "context canceled" error when an ingester or store-gateway fails healthcheck while the query is in progress. #6550
[BUGFIX] Tracing: When creating an OpenTelemetry tracing span, add it to the context for later retrieval. #6614
[BUGFIX] Querier: always report query results to query-frontends, even when cancelled, to ensure query-frontends don't wait for results that will otherwise never arrive. #6703
[BUGFIX] Querier: attempt to query ingesters in PENDING state, to reduce the likelihood that scaling up the number of ingesters in multiple zones simultaneously causes a read outage. #6726 #6727
[BUGFIX] Querier: don't cancel inflight queries from a query-scheduler if the stream between the querier and query-scheduler is broken. #6728
[BUGFIX] Store-gateway: Fix double-counting of some duration metrics. #6616
[BUGFIX] Fixed possible series matcher corruption leading to wrong series being included in query results. #6884

Mixin

[CHANGE] Dashboards: enabled reporting gRPC codes as status_code label in Mimir dashboards. In case of gRPC calls, the successful status_code label on cortex_request_duration_seconds and gRPC client request duration metrics has changed from 'success' and '2xx' to 'OK'. #6561
[CHANGE] Alerts: remove MimirGossipMembersMismatch alert and replace it with MimirGossipMembersTooHigh and MimirGossipMembersTooLow alerts that should have a higher signal-to-noise ratio. #6508
[ENHANCEMENT] Dashboards: Optionally show rejected requests on Mimir Writes dashboard. Useful when used together with "early request rejection" in ingester and distributor. #6132 #6556
[ENHANCEMENT] Alerts: added a critical alert for CompactorSkippedBlocksWithOutOfOrderChunks when multiple blocks are affected. #6410
[ENHANCEMENT] Dashboards: Added the min-replicas for autoscaling dashboards. #6528
[BUGFIX] Alerts: fixed issue where GossipMembersMismatch warning message referred to per-instance labels that were not produced by the alert query. #6146
[BUGFIX] Dashboards: Fix autoscaling dashboard panels for KEDA > 2.9. Requires scraping the KEDA operator for metrics since they moved. #6528
[BUGFIX] Alerts: Fix autoscaling alerts for KEDA > 2.9. Requires scraping the KEDA operator for metrics since they moved. #6528

Jsonnet

[CHANGE] Ingester: reduce -server.grpc-max-concurrent-streams to 500. #5666
[CHANGE] Changed default _config.cluster_domain from cluster.local to cluster.local. to reduce the number of DNS lookups made by Mimir. #6389
[CHANGE] Query-frontend: changed default _config.autoscaling_query_frontend_cpu_target_utilization from 1 to 0.75. #6395
[CHANGE] Distributor: Increase HPA scale down period such that distributors are slower to scale down after autoscaling up. #6589
[FEATURE] Store-gateway: Allow automated zone-by-zone downscaling, that can be enabled via the store_gateway_automated_downscale_enabled flag. It is disabled by default. #6149
[FEATURE] Ingester: Allow to configure TSDB Head early compaction using the following _config parameters: #6181
- ingester_tsdb_head_early_compaction_enabled (disabled by default)
- ingester_tsdb_head_early_compaction_reduction_percentage
- ingester_tsdb_head_early_compaction_min_in_memory_series
[ENHANCEMENT] Double the amount of rule groups for each user tier. #5897
[ENHANCEMENT] Set maxUnavailable to 0 for distributor, overrides-exporter, querier, query-frontend, query-scheduler ruler-querier, ruler-query-frontend, ruler-query-scheduler and consul deployments, to ensure they don't become completely unavailable during a rollout. #5924
[ENHANCEMENT] Update rollout-operator to v0.9.0. #6022 #6110 #6558 #6681
[ENHANCEMENT] Update memcached to memcached:1.6.22-alpine. #6585
[ENHANCEMENT] Store-gateway: replaced the following deprecated CLI flags: #6319
- -blocks-storage.bucket-store.index-header-lazy-loading-enabled replaced with -blocks-storage.bucket-store.index-header.lazy-loading-enabled
- -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout replaced with -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
[ENHANCEMENT] Store-gateway: Allow selective enablement of store-gateway automated scaling on a per-zone basis. #6302
[BUGFIX] Autoscaling: KEDA > 2.9 removed the ability to set metricName in the trigger metadata. To help discern which metric is used by the HPA, we set the trigger name to what was the metricName. This is available as the scaler label on keda_* metrics. #6528

Mimirtool

[ENHANCEMENT] Analyze Grafana: Improve support for variables in range. #6657
[BUGFIX] Fix out of bounds error on export with large timespans and/or series count. #5700
[BUGFIX] Fix the issue where --read-timeout was applied to the entire mimirtool analyze grafana invocation rather than to individual Grafana API calls. #5915
[BUGFIX] Fix incorrect remote-read path joining for mimirtool remote-read commands on Windows. #6011
[BUGFIX] Fix template files full path being sent in mimirtool alertmanager load command. #6138
[BUGFIX] Analyze rule-file: .metricsUsed field wasn't populated. #6953

Mimir Continuous Test

Query-tee

Documentation

[ENHANCEMENT] Document the concept of native histograms and how to send them to Mimir, migration path. #5956 #6488 #6539
[ENHANCEMENT] Document native histograms query and visualization. #6231

Tools

[CHANGE] tsdb-index: Rename tool to tsdb-series. #6317
[FEATURE] tsdb-labels: Add tool to print label names and values of a TSDB block. #6317
[ENHANCEMENT] trafficdump: Trafficdump can now parse OTEL requests. Entire request is dumped to output, there's no filtering of fields or matching of series done. #6108

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.4...mimir-2.11.0-rc.0

mimir - 2.10.5

Published by dimitarvdimitrov 10 months ago

Changelog

2.10.5

Grafana Mimir

[ENHANCEMENT] Update Docker base images from alpine:3.18.3 to alpine:3.18.5. #6897
[BUGFIX] Fixed possible series matcher corruption leading to wrong series being included in query results. #6886

Documentation

[ENHANCEMENT] Document the concept of native histograms and how to send them to Mimir, migration path. #6757
[ENHANCEMENT] Document native histograms query and visualization. #6757

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.4...mimir-2.10.5

mimir - 2.9.4

Published by dimitarvdimitrov 10 months ago

Changelog

2.9.4

Grafana Mimir

[ENHANCEMENT] Update Docker base images from alpine:3.18.3 to alpine:3.18.5. #6895

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.3...mimir-2.9.4

mimir - 2.9.3

Published by fayzal-g 11 months ago

This release contains 1 PR from 1 author. Thank you!

Changelog

2.9.3

[BUGFIX] Update go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp to 0.44 which includes a fix for CVE-2023-45142. #6637

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.2...mimir-2.9.3

mimir - 2.10.4

Published by colega 11 months ago

This release contains 3 PRs from 1 authors. Thank you!

Changelog

2.10.4

Grafana Mimir

[BUGFIX] Update otelhttp library to v0.44.0 as a mitigation for CVE-2023-45142. #6634

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.3...mimir-2.10.4

mimir - 2.10.3

Published by colega about 1 year ago

This release contains 1 PR from 1 author. Thank you!

Changelog

2.10.3

Grafana Mimir

[BUGFIX] Update grpc-go library to 1.57.2-dev that includes a fix for a bug introduced in 1.57.1. #6419

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.2...mimir-2.10.3

mimir - 2.9.2

Published by lamida about 1 year ago

This release contains 5 PRs from 3 authors. Thank you!

Grafana Mimir version 2.9.2 release notes

Changelog

2.9.2

[BUGFIX] Update grpc-go library to 1.56.3 and golang.org/x/net to 0.17, which include fix for CVE-2023-44487. #6353 #6364

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.1...mimir-2.9.2

mimir - 2.10.2

Published by pstibrany about 1 year ago

This release contains 2 PRs from 1 authors. Thank you!

Changelog

2.10.2

Grafana Mimir

[BUGFIX] Update grpc-go library to 1.57.1 and golang.org/x/net to 0.17, which include fix for CVE-2023-44487. #6349

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.1...mimir-2.10.2

mimir - 2.10.1

Published by colega about 1 year ago

This release contains 6 PRs from 4 authors. Thank you!

Changelog

2.10.1

Grafana Mimir

[CHANGE] Update Go version to 1.21.3. #6244 #6325
[BUGFIX] Query-frontend: Don't retry read requests rejected by the ingester due to utilization based read path limiting. #6032
[BUGFIX] Ingester: fix panic in WAL replay of certain native histograms. #6086

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.0...mimir-2.10.1

mimir - 2.10.0

Published by colega about 1 year ago

This release contains 455 PRs from 54 authors, including new contributors Aaron Sanders, Alexander Proschek, Aljoscha Pörtner, balazs92117, Francois Gouteroux, Franco Posa, Heather Yuan, jingyang, kendrickclark, m4r1u2, Milan Plžík, Samir Teymurov, Sven Haardiek, Thomas Schaaf, Tiago Posse. Thank you!

Grafana Mimir version 2.10.0 release notes

Grafana Labs is excited to announce version 2.10 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.

Features and enhancements

Added support for rule filtering by passing file, ruler_group and rule_name parameters to the ruler endpoint /api/v1/rules.
Added support to only count series that are considered active through the Cardinality API endpoint /api/v1/cardinality/label_values by passing the count_method parameter. You can set it to active to count only series that are considered active according to the -ingester.active-series-metrics-idle-timeout flag setting rather than counting all in-memory series.
Reduced the overall memory consumption by changing the internal data structure for labels. Expect ingesters to use around 15% less memory with this change, depending on the pattern of labels used, number of tenants, etc.
Reduced the memory usage of the Active Series Tracker in the ingester.
Added a buffered logging implementation that can be enabled through the -log.buffered CLI flag. This should reduce contention and resource usage under heavy usage patterns.
Improved the performance of the OTLP ingestion and more detailed information was added to the traces in order to make troubleshooting problems easier.
Improved the performance of series matching in the store-gateway by always including the __name__ posting group causing a reduction in the number of object storage API calls.
Improved the performance of label values with matchers calls when number of matched series is small. If you're using Grafana to query Grafana Mimir, you'll need to be sure your Prometheus data source configuration has the Prometheus type set to Mimir and the Version set correctly in order to benefit from this improvement.
Support to cache cardinality, label names and label values query responses in query frontend. The cache will be used when -query-frontend.cache-results is enabled, and -query-frontend.results-cache-ttl-for-cardinality-query or -query-frontend.results-cache-ttl-for-labels-query is set to a value greater than 0.
Reduced wasted effort spent computing results that won't be used by having queriers cancel the requests sent to the ingesters in a zone upon receiving first error from that zone.
Reduced object storage use by enhancing the compactor to remove the bucket index, markers, and debug files when it detects zero remaining blocks in the bucket index. This cleanup process can be enabled by setting the -compactor.no-blocks-file-cleanup-enabled option to true.
Added new debug HTTP endpoints /ingester/tenants and /ingester/tsdb/{tenant} to the ingester that provide debug information about tenants and their TSDBs.
Added new metrics for tracking native histograms in active series: cortex_ingester_active_native_histogram_series, cortex_ingester_active_native_histogram_series_custom_tracker, cortex_ingester_active_native_histogram_buckets, cortex_ingester_active_native_histogram_buckets_custom_tracker. The first 2 are the subsets of the existing and unmodified cortex_ingester_active_series and cortex_ingester_active_series_custom_tracker respectively, only tracking native histogram series, and the last 2 are the equivalent for tracking the number of buckets in native histogram series.

Additionally, the following previously experimental features are now considered stable:

Support for a ruler storage cache. This cache should reduce the number of "list objects" API calls issued to the object storage when there are 2+ ruler replicas running in a Mimir cluster. The cache can be configured by setting the -ruler-storage.cache.* CLI flags or their respective YAML config options.
Query sharding cardinality estimation. This feature allows query sharding to take into account cardinality of similar requests executed previously when computing the maximum number of shards to use. You can enable it through the advanced CLI configuration flag -query-frontend.query-sharding-target-series-per-shard; we recommend starting with a value of 2500.
Query expression size limit. You can limit the size in bytes of the queries allowed to be processed through the CLI configuration flag -query-frontend.max-query-expression-size-bytes.
Peer discovery / tenant sharding for overrides exporters. You can enable it through the CLI configuration flag -overrides-exporter.ring.enabled.
Overrides exporter enabled metrics selection. You can select which metrics the overrides exporter should export through the CLI configuration flag -overrides-exporter.enabled-metrics.
Per-tenant results cache TTL. The time-to-live duration for cached query results can be configured using the results_cache_ttl and results_cache_ttl_for_out_of_order_time_window parameters.

Experimental features

Grafana Mimir 2.10 includes new features that are considered as experimental and disabled by default. Please use them with caution and report any issues you encounter:

Support for ingesting exponential histograms in OpenTelemetry format. The exponential histograms that are over the native histogram scale limit of 8 are downscaled to allow their ingestion.
Store-gateway index-header loading improvements, which include the ability to persist the sparse index-header to disk instead of reconstructing it on every restart (-blocks-storage.bucket-store.index-header-sparse-persistence-enabled) as well as the ability to persist the list of block IDs that were lazy-loaded while running to eagerly load them upon startup to prevent starting up with no loaded blocks (-blocks-storage.bucket-store.index-header.eager-loading-startup-enabled) and an option to limit the number of concurrent index-header loads when lazy-loading (-blocks-storage.bucket-store.index-header-lazy-loading-concurrency).
Option to allow queriers to reduce pressure on ingesters by initially querying only the minimum set of ingesters required to reach quorum. (-querier.minimize-ingester-requests).
Early TSDB Head compaction in the ingesters to reduce in-memory series when a certain threshold is reached. Useful to deal with high series churning rate. (-blocks-storage.tsdb.early-head-compaction-min-in-memory-series).
Spread-minimizing token generation algorithm for the ingesters. This new method drastically reduces the difference in series pushed to different ingesters. Please note that a migration process is required to switch from previous random generation algorithm, which will be detailed once the feature is declared stable.
Support for chunks streaming from store-gateways to queriers that should reduce the memory usage in the queriers. Can be enabled through the -querier.prefer-streaming-chunks-from-store-gateways option.
Support for circuit-breaking the distributor write requests to the ingesters. This can be enabled through the -ingester.client.circuit-breaker.* configuration options and should serve to let ingesters recover when under high pressure.
Support to limit read requests based on CPU/memory utilization. This should alleviate pressure on the ingesters after receiving heavy queries and reduce the likelihood of disrupting the write path. (-ingester.read-path-cpu-utilization-limit, -ingester.read-path-memory-utilization-limit, -ingester.log-utilization-based-limiter-cpu-samples).

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.10 we have changed the following behaviors:

Query requests are initiated only to ingesters in the ACTIVE state in the ring. This is not expected to introduce any degradation in terms of query results correctness or high-availability.
Per-instance limit errors are not logged anymore, to reduce resource usage when ingesters are under pressure. We encourage you to use metrics and alerting to monitor them instead. The following metrics have been added to count the number of requests rejected for hitting per-instance limits:
- cortex_distributor_instance_rejected_requests_total
- cortex_ingester_instance_rejected_requests_total
The CLI flag -validation.create-grace-period is now enforced in the ingester. If you've configured -validation.create-grace-period, make sure the configuration is applied to ingesters too.
The CLI flag -validation.create-grace-period is now enforced for exemplars. The cortex_discarded_exemplars_total{reason="exemplar_too_far_in_future",user="..."} series is incremented when exemplars are dropped because their timestamp is greater than "now + grace_period".
The CLI flag -validation.create-grace-period is now enforced in the query-frontend even when the configured value is 0. When the value is 0, the query end time range is truncated to the current real-world time.

The following metrics were removed:

cortex_ingester_shipper_dir_syncs_total
cortex_ingester_shipper_dir_sync_failures_total

The following configuration options are deprecated and will be removed in Grafana Mimir 2.12:

The CLI flag -blocks-storage.bucket-store.index-header-lazy-loading-enabled is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-enabled.
The CLI flag -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout.
The CLI flag -blocks-storage.bucket-store.index-header-lazy-loading-concurrency is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-concurrency.

The following configuration options that were deprecated in Grafana Mimir 2.8 are removed:

The CLI flag blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup.

The following experimental configuration options were renamed or removed:

The CLI flag -querier.prefer-streaming-chunks was renamed to -querier.prefer-streaming-chunks-from-ingesters.
The CLI flag -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled was removed.
The CLI flag -blocks-storage.bucket-store.fine-grained-chunks-caching-ranges-per-series was removed.

The following experimental options are now stable:

The CLI flag -shutdown-delay.
The CLI flag -ingester.ring.excluded-zones.

The following configuration option defaults were changed:

The default value for the CLI flag -querier.streaming-chunks-per-ingester-buffer-size was changed from 512 to 256.
The default value for gRPC clients connect timeout was set to 5s (default inherited from gRPC client was 20s) with a default max backoff delay of 5s (default inherited from gRPC client was 120s).

Bug fixes

Ruler: fixed graceful shutdown for rule evaluations.
Ingester: fixed ingesters getting stuck when previous state is LEAVING and the number of tokens has changed upon restarting.
Querier: fixed timestamp() function fail with execution: attempted to read series at index 0 from stream, but the stream has already been exhausted if the experimental feature to stream chunks from ingesters to queriers is enabled.
Memberlist: brought back memberlist_client_kv_store_count metric that used to exist in Cortex, but got lost during grafana/dskit updates before Mimir 2.0.
Store-gateway: fixed an issue where stopping a store-gateway could cause all store-gateways to unload all blocks.
Ingester: prevented setting "last update time" of TSDB into the future when opening TSDB. This could prevent detecting of idle TSDB for a long time, if sample in distant future was ingested.
General: changed ballast to allocate smaller blocks to avoid problem when entire ballast was kept in memory working set.

Changelog

2.10.0

Grafana Mimir

[CHANGE] Store-gateway: skip verifying index header integrity upon loading. To enable verification set blocks_storage.bucket_store.index_header.verify_on_load: true. #5174
[CHANGE] Querier: change the default value of the experimental -querier.streaming-chunks-per-ingester-buffer-size flag to 256. #5203
[CHANGE] Querier: only initiate query requests to ingesters in the ACTIVE state in the ring. #5342
[CHANGE] Querier: renamed -querier.prefer-streaming-chunks to -querier.prefer-streaming-chunks-from-ingesters to enable streaming chunks from ingesters to queriers. #5182
[CHANGE] Querier: -query-frontend.cache-unaligned-requests has been moved from a global flag to a per-tenant override. #5312
[CHANGE] Ingester: removed cortex_ingester_shipper_dir_syncs_total and cortex_ingester_shipper_dir_sync_failures_total metrics. The former metric was not much useful, and the latter was never incremented. #5396
[CHANGE] Ingester: removed logging of errors related to hitting per-instance limits to reduce resource usage when ingesters are under pressure. #5585
[CHANGE] gRPC clients: use default connect timeout of 5s, and therefore enable default connect backoff max delay of 5s. #5562
[CHANGE] Ingester: the -validation.create-grace-period is now enforced in the ingester too, other than distributor and query-frontend. If you've configured -validation.create-grace-period then make sure the configuration is applied to ingesters too. #5712
[CHANGE] Distributor: the -validation.create-grace-period is now enforced for examplars too in the distributor. If an examplar has timestamp greater than "now + grace_period", then the exemplar will be dropped and the metric cortex_discarded_exemplars_total{reason="exemplar_too_far_in_future",user="..."} increased. #5761
[CHANGE] Query-frontend: the -validation.create-grace-period is now enforced in the query-frontend even when the configured value is 0. When the value is 0, the query end time range is truncated to the current real-world time. #5829
[CHANGE] Store-gateway: deprecated configuration parameters for index header under blocks-storage.bucket-store and use a new configurations in blocks-storage.bucket-store.index-header, deprecated configuration will be removed in Mimir 2.12. Configuration changes: #5726
- -blocks-storage.bucket-store.index-header-lazy-loading-enabled is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-enabled
- -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
- -blocks-storage.bucket-store.index-header-lazy-loading-concurrency is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-concurrency
[CHANGE] Store-gateway: remove experimental fine-grained chunks caching. The following experimental configuration parameters have been removed -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled, -blocks-storage.bucket-store.fine-grained-chunks-caching-ranges-per-series. #5816 #5875
[CHANGE] Ingester: remove deprecated blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup. #5850
[FEATURE] Introduced -distributor.service-overload-status-code-on-rate-limit-enabled flag for configuring status code to 529 instead of 429 upon rate limit exhaustion. #5752
[FEATURE] Cardinality API: added a new count_method parameter which enables counting active series. #5136
[FEATURE] Query-frontend: added experimental support to cache cardinality, label names and label values query responses. The cache will be used when -query-frontend.cache-results is enabled, and -query-frontend.results-cache-ttl-for-cardinality-query or -query-frontend.results-cache-ttl-for-labels-query set to a value greater than 0. The following metrics have been added to track the query results cache hit ratio per request_type: #5212 #5235 #5426 #5524
- cortex_frontend_query_result_cache_requests_total{request_type="query_range|cardinality|label_names_and_values"}
- cortex_frontend_query_result_cache_hits_total{request_type="query_range|cardinality|label_names_and_values"}
[FEATURE] Added -<prefix>.s3.list-objects-version flag to configure the S3 list objects version. #5099
[FEATURE] Ingester: add optional CPU/memory utilization based read request limiting, considered experimental. Disabled by default, enable by configuring limits via both of the following flags: #5012 #5392 #5394 #5526 #5508 #5704
- -ingester.read-path-cpu-utilization-limit
- -ingester.read-path-memory-utilization-limit
- -ingester.log-utilization-based-limiter-cpu-samples
[FEATURE] Ruler: support filtering results from rule status endpoint by file, rule_group and rule_name. #5291
[FEATURE] Ingester: add experimental support for creating tokens by using spread minimizing strategy. This can be enabled with -ingester.ring.token-generation-strategy: spread-minimizing and -ingester.ring.spread-minimizing-zones: <all available zones>. In that case -ingester.ring.tokens-file-path must be empty. #5308 #5324
[FEATURE] Storegateway: Persist sparse index-headers to disk and read from disk on index-header loads instead of reconstructing. #5465 #5651 #5726
[FEATURE] Ingester: add experimental CLI flag -ingester.ring.spread-minimizing-join-ring-in-order that allows an ingester to register tokens in the ring only after all previous ingesters (with ID lower than its own ID) have already been registered. #5541
[FEATURE] Ingester: add experimental support to compact the TSDB Head when the number of in-memory series is equal or greater than -blocks-storage.tsdb.early-head-compaction-min-in-memory-series, and the ingester estimates that the per-tenant TSDB Head compaction will reduce in-memory series by at least -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage. #5371
[FEATURE] Ingester: add new metrics for tracking native histograms in active series: cortex_ingester_active_native_histogram_series, cortex_ingester_active_native_histogram_series_custom_tracker, cortex_ingester_active_native_histogram_buckets, cortex_ingester_active_native_histogram_buckets_custom_tracker. The first 2 are the subsets of the existing and unmodified cortex_ingester_active_series and cortex_ingester_active_series_custom_tracker respectively, only tracking native histogram series, and the last 2 are the equivalents for tracking the number of buckets in native histogram series. #5318
[FEATURE] Add experimental CLI flag -<prefix>.s3.native-aws-auth-enabled that allows to enable the default credentials provider chain of the AWS SDK. #5636
[FEATURE] Distributor: add experimental support for circuit breaking when writing to ingesters via -ingester.client.circuit-breaker.enabled, -ingester.client.circuit-breaker.failure-threshold, or -ingester.client.circuit-breaker.cooldown-period or their corresponding YAML. #5650
[FEATURE] The following features are no longer considered experimental. #5701 #5872
- Ruler storage cache (-ruler-storage.cache.*)
- Exclude ingesters running in specific zones (-ingester.ring.excluded-zones)
- Cardinality-based query sharding (-query-frontend.query-sharding-target-series-per-shard)
- Cardinality query result caching (-query-frontend.results-cache-ttl-for-cardinality-query)
- Label names and values query result caching (-query-frontend.results-cache-ttl-for-labels-query)
- Query expression size limit (-query-frontend.max-query-expression-size-bytes)
- Peer discovery / tenant sharding for overrides exporters (-overrides-exporter.ring.enabled)
- Configuring enabled metrics in overrides exporter (-overrides-exporter.enabled-metrics)
- Per-tenant results cache TTL (-query-frontend.results-cache-ttl, -query-frontend.results-cache-ttl-for-out-of-order-time-window)
- Shutdown delay (-shutdown-delay)
[FEATURE] Querier: add experimental CLI flag -tenant-federation.max-concurrent to adjust the max number of per-tenant queries that can be run at a time when executing a single multi-tenant query. #5874
[FEATURE] Alertmanager: add Microsoft Teams as a supported integration. #5840
[ENHANCEMENT] Overrides-exporter: Add new metrics for write path and alertmanager (max_global_metadata_per_user, max_global_metadata_per_metric, request_rate, request_burst_size, alertmanager_notification_rate_limit, alertmanager_max_dispatcher_aggregation_groups, alertmanager_max_alerts_count, alertmanager_max_alerts_size_bytes) and added flag -overrides-exporter.enabled-metrics to explicitly configure desired metrics, e.g. -overrides-exporter.enabled-metrics=request_rate,ingestion_rate. Default value for this flag is: ingestion_rate,ingestion_burst_size,max_global_series_per_user,max_global_series_per_metric,max_global_exemplars_per_user,max_fetched_chunks_per_query,max_fetched_series_per_query,ruler_max_rules_per_rule_group,ruler_max_rule_groups_per_tenant. #5376
[ENHANCEMENT] Cardinality API: when zone aware replication is enabled, the label values cardinality API can now tolerate single zone failure #5178
[ENHANCEMENT] Distributor: optimize sending requests to ingesters when incoming requests don't need to be modified. For now this feature can be disabled by setting -timeseries-unmarshal-caching-optimization-enabled=false. #5137
[ENHANCEMENT] Add advanced CLI flags to control gRPC client behaviour: #5161
- -<prefix>.connect-timeout
- -<prefix>.connect-backoff-base-delay
- -<prefix>.connect-backoff-max-delay
- -<prefix>.initial-stream-window-size
- -<prefix>.initial-connection-window-size
[ENHANCEMENT] Query-frontend: added "response_size_bytes" field to "query stats" log. #5196
[ENHANCEMENT] Querier: refine error messages for per-tenant query limits, informing the user of the preferred strategy for not hitting the limit, in addition to how they may tweak the limit. #5059
[ENHANCEMENT] Distributor: optimize sending of requests to ingesters by reusing memory buffers for marshalling requests. This optimization can be enabled by setting -distributor.write-requests-buffer-pooling-enabled to true. #5195 #5805 #5830
[ENHANCEMENT] Querier: add experimental -querier.minimize-ingester-requests option to initially query only the minimum set of ingesters required to reach quorum. #5202 #5259 #5263
[ENHANCEMENT] Querier: improve error message when streaming chunks from ingesters to queriers and a query limit is reached. #5245
[ENHANCEMENT] Use new data structure for labels, to reduce memory consumption. #3555 #5731
[ENHANCEMENT] Update alpine base image to 3.18.2. #5276
[ENHANCEMENT] Ruler: add cortex_ruler_sync_rules_duration_seconds metric, tracking the time spent syncing all rule groups owned by the ruler instance. #5311
[ENHANCEMENT] Store-gateway: add experimental blocks-storage.bucket-store.index-header-lazy-loading-concurrency config option to limit the number of concurrent index-headers loads when lazy loading. #5313 #5605
[ENHANCEMENT] Ingester and querier: improve level of detail in traces emitted for queries that hit ingesters. #5315
[ENHANCEMENT] Querier: add cortex_querier_queries_rejected_total metric that counts the number of queries rejected due to hitting a limit (eg. max series per query or max chunks per query). #5316 #5440 #5450
[ENHANCEMENT] Querier: add experimental -querier.minimize-ingester-requests-hedging-delay option to initiate requests to further ingesters when request minimisation is enabled and not all initial requests have completed. #5368
[ENHANCEMENT] Clarify docs for -ingester.client.* flags to make it clear that these are used by both queriers and distributors. #5375
[ENHANCEMENT] Querier and store-gateway: add experimental support for streaming chunks from store-gateways to queriers while evaluating queries. This can be enabled with -querier.prefer-streaming-chunks-from-store-gateways=true. #5182
[ENHANCEMENT] Querier: enforce max-chunks-per-query limit earlier in query processing when streaming chunks from ingesters to queriers to avoid unnecessarily consuming resources for queries that will be aborted. #5369 #5447
[ENHANCEMENT] Ingester: added cortex_ingester_shipper_last_successful_upload_timestamp_seconds metric tracking the last successful TSDB block uploaded to the bucket (unix timestamp in seconds). #5396
[ENHANCEMENT] Ingester: add two metrics tracking resource utilization calculated by utilization based limiter: #5496
- cortex_ingester_utilization_limiter_current_cpu_load: The current exponential weighted moving average of the ingester's CPU load
- cortex_ingester_utilization_limiter_current_memory_usage_bytes: The current ingester memory utilization
[ENHANCEMENT] Ruler: added insight=true field to ruler's prometheus component for rule evaluation logs. #5510
[ENHANCEMENT] Distributor Ingester: add metrics to count the number of requests rejected for hitting per-instance limits, cortex_distributor_instance_rejected_requests_total and cortex_ingester_instance_rejected_requests_total respectively. #5551
[ENHANCEMENT] Distributor: add support for ingesting exponential histograms that are over the native histogram scale limit of 8 in OpenTelemetry format by downscaling them. #5532 #5607
[ENHANCEMENT] General: buffered logging: #5506
- -log.buffered CLI flag enable buffered logging.
[ENHANCEMENT] Distributor: add more detailed information to traces generated while processing OTLP write requests. #5539
[ENHANCEMENT] Distributor: improve performance ingesting OTLP payloads. #5531 #5607 #5616
[ENHANCEMENT] Ingester: optimize label-values with matchers call when number of matched series is small. #5600
[ENHANCEMENT] Compactor: delete bucket-index, markers and debug files if there are no blocks left in the bucket index. This cleanup must be enabled by using -compactor.no-blocks-file-cleanup-enabled option. #5648
[ENHANCEMENT] Ingester: reduce memory usage of active series tracker. #5665
[ENHANCEMENT] Store-gateway: added -store-gateway.sharding-ring.auto-forget-enabled configuration parameter to control whether store-gateway auto-forget feature should be enabled or disabled (enabled by default). #5702
[ENHANCEMENT] Compactor: added per tenant block upload counters cortex_block_upload_api_blocks_total, cortex_block_upload_api_bytes_total, and cortex_block_upload_api_files_total. #5738
[ENHANCEMENT] Compactor: verify time range of compacted block(s) matches the time range of input blocks. #5760
[ENHANCEMENT] Querier: improved observability of calls to ingesters during queries. #5724
[ENHANCEMENT] Compactor: block backfilling logging is now more verbose. #5711
[ENHANCEMENT] Added support to rate limit application logs: #5764
- -log.rate-limit-enabled
- -log.rate-limit-logs-per-second
- -log.rate-limit-logs-per-second-burst
[ENHANCEMENT] Ingester: added cortex_ingester_tsdb_head_min_timestamp_seconds and cortex_ingester_tsdb_head_max_timestamp_seconds metrics which return min and max time of all TSDB Heads open in an ingester. #5786 #5815
[ENHANCEMENT] Querier: cancel query requests to ingesters in a zone upon first error received from the zone, to reduce wasted effort spent computing results that won't be used #5764
[ENHANCEMENT] All: improve tracing of internal HTTP requests sent over httpgrpc. #5782
[ENHANCEMENT] Querier: add experimental per-query chunks limit based on an estimate of the number of chunks that will be sent from ingesters and store-gateways that is enforced earlier during query evaluation. This limit is disabled by default and can be configured with -querier.max-estimated-fetched-chunks-per-query-multiplier. #5765
[ENHANCEMENT] Ingester: add UI for listing tenants with TSDB on given ingester and viewing details of tenants's TSDB on given ingester. #5803 #5824
[ENHANCEMENT] Querier: improve observability of calls to store-gateways during queries. #5809
[ENHANCEMENT] Query-frontend: improve tracing of interactions with query-scheduler. #5818
[ENHANCEMENT] Query-scheduler: improve tracing of requests when request is rejected by query-scheduler. #5848
[ENHANCEMENT] Ingester: avoid logging some errors that could cause logging contention. #5494 #5581
[ENHANCEMENT] Store-gateway: wait for query gate after loading blocks. #5507
[ENHANCEMENT] Store-gateway: always include __name__ posting group in selection in order to reduce the number of object storage API calls. #5246
[ENHANCEMENT] Ingester: track active series by ref instead of hash/labels to reduce memory usage. #5134 #5193
[ENHANCEMENT] Go: updated to 1.21.1. #5955 #5960
[ENHANCEMENT] Alertmanager: updated to alertmanager 0.26.0. #5840
[BUGFIX] Ingester: Handle when previous ring state is leaving and the number of tokens has changed. #5204
[BUGFIX] Querier: fix issue where queries that use the timestamp() function fail with execution: attempted to read series at index 0 from stream, but the stream has already been exhausted if streaming chunks from ingesters to queriers is enabled. #5370
[BUGFIX] memberlist: bring back memberlist_client_kv_store_count metric that used to exist in Cortex, but got lost during dskit updates before Mimir 2.0. #5377
[BUGFIX] Querier: pass on HTTP 503 query response code. #5364
[BUGFIX] Store-gateway: Fix issue where stopping a store-gateway could cause all store-gateways to unload all blocks. #5464
[BUGFIX] Allocate ballast in smaller blocks to avoid problem when entire ballast was kept in memory working set. #5565
[BUGFIX] Querier: retry frontend result notification when an error is returned. #5591
[BUGFIX] Querier: fix issue where cortex_ingester_client_request_duration_seconds metric did not include streaming query requests that did not return any series. #5695
[BUGFIX] Ingester: fix ActiveSeries tracker double-counting series that have been deleted from the Head while still being active and then recreated again. #5678
[BUGFIX] Ingester: don't set "last update time" of TSDB into the future when opening TSDB. This could prevent detecting of idle TSDB for a long time, if sample in distant future was ingested. #5787
[BUGFIX] Store-gateway: fix bug when lazy index header could be closed prematurely even when still in use. #5795
[BUGFIX] Ruler: gracefully shut down rule evaluations. #5778
[BUGFIX] Querier: fix performance when ingesters stream samples. #5836
[BUGFIX] Ingester: fix spurious not found errors on label values API during head compaction. #5957
[BUGFIX] All: updated Minio object storage client from 7.0.62 to 7.0.63 to fix auto-detection of AWS GovCloud environments. #5905

Mixin

[CHANGE] Dashboards: show all workloads in selected namespace on "rollout progress" dashboard. #5113
[CHANGE] Dashboards: show the number of updated and ready pods for each workload in the "rollout progress" panel on the "rollout progress" dashboard. #5113
[CHANGE] Dashboards: removed "Query results cache misses" panel on the "Mimir / Queries" dashboard. #5423
[CHANGE] Dashboards: default to shared crosshair on all dashboards. #5489
[CHANGE] Dashboards: sort variable drop-down lists from A to Z, rather than Z to A. #5490
[CHANGE] Alerts: removed MimirProvisioningTooManyActiveSeries alert. You should configure -ingester.instance-limits.max-series and rely on MimirIngesterReachingSeriesLimit alert instead. #5593
[CHANGE] Alerts: removed MimirProvisioningTooManyWrites alert. The alerting threshold used in this alert was chosen arbitrarily and ingesters receiving an higher number of samples / sec don't necessarily have any issue. You should rely on SLOs metrics and alerts instead. #5706
[CHANGE] Alerts: don't raise MimirRequestErrors or MimirRequestLatency alert for the /debug/pprof endpoint. #5826
[ENHANCEMENT] Dashboards: adjust layout of "rollout progress" dashboard panels so that the "rollout progress" panel doesn't require scrolling. #5113
[ENHANCEMENT] Dashboards: show container name first in "pods count per version" panel on "rollout progress" dashboard. #5113
[ENHANCEMENT] Dashboards: show time spend waiting for turn when lazy loading index headers in the "index-header lazy load gate latency" panel on the "queries" dashboard. #5313
[ENHANCEMENT] Dashboards: split query results cache hit ratio by request type in "Query results cache hit ratio" panel on the "Mimir / Queries" dashboard. #5423
[ENHANCEMENT] Dashboards: add "rejected queries" panel to "queries" dashboard. #5429
[ENHANCEMENT] Dashboards: add native histogram active series and active buckets to "tenants" dashboard. #5543
[ENHANCEMENT] Dashboards: add panels to "Mimir / Writes" for requests rejected for per-instance limits. #5638
[ENHANCEMENT] Dashboards: rename "Blocks currently loaded" to "Blocks currently owned" in the "Mimir / Queries" dashboard. #5705
[ENHANCEMENT] Alerts: Add MimirIngestedDataTooFarInTheFuture warning alert that triggers when Mimir ingests sample with timestamp more than 1h in the future. #5822
[BUGFIX] Alerts: fix MimirIngesterRestarts to fire only when the ingester container is restarted, excluding the cases the pod is rescheduled. #5397
[BUGFIX] Dashboards: fix "unhealthy pods" panel on "rollout progress" dashboard showing only a number rather than the name of the workload and the number of unhealthy pods if only one workload has unhealthy pods. #5113 #5200
[BUGFIX] Alerts: fixed MimirIngesterHasNotShippedBlocks and MimirIngesterHasNotShippedBlocksSinceStart alerts. #5396
[BUGFIX] Alerts: Fix MimirGossipMembersMismatch to include admin-api and custom compactor pods. admin-api is a GEM component. #5641 #5797
[BUGFIX] Dashboards: fix autoscaling dashboard panels that could show multiple series for a single component. #5810

Jsonnet

[CHANGE] Removed _config.querier.concurrency configuration option and replaced it with _config.querier_max_concurrency and _config.ruler_querier_max_concurrency to allow to easily fine tune it for different querier deployments. #5322
[CHANGE] Change _config.multi_zone_ingester_max_unavailable to 50. #5327
[CHANGE] Change distributors rolling update strategy configuration: maxSurge and maxUnavailable are set to 15% and 0. #5714
[FEATURE] Alertmanager: Add horizontal pod autoscaler config, that can be enabled using autoscaling_alertmanager_enabled: true. #5194 #5249
[ENHANCEMENT] Enable the track_sizes feature for Memcached pods to help determine cache efficiency. #5209
[ENHANCEMENT] Add per-container map for environment variables. #5181
[ENHANCEMENT] Add PodDisruptionBudgets for compactor, continuous-test, distributor, overrides-exporter, querier, query-frontend, query-scheduler, rollout-operator, ruler, ruler-querier, ruler-query-frontend, ruler-query-scheduler, and all memcached workloads. #5098
[ENHANCEMENT] Ruler: configure the ruler storage cache when the metadata cache is enabled. #5326 #5334
[ENHANCEMENT] Shuffle-sharding: ingester shards in user-classes can now be configured to target different series and limit percentage utilization through _config.shuffle_sharding.target_series_per_ingester and _config.shuffle_sharding.target_utilization_percentage values. #5470
[ENHANCEMENT] Distributor: allow adjustment of the targeted CPU usage as a percentage of requested CPU. This can be adjusted with _config.autoscaling_distributor_cpu_target_utilization. #5525
[ENHANCEMENT] Ruler: add configuration option _config.ruler_remote_evaluation_max_query_response_size_bytes to easily set the maximum query response size allowed (in bytes). #5592
[ENHANCEMENT] Distributor: dynamically set GOMAXPROCS based on the CPU request. This should reduce distributor CPU utilization, assuming the CPU request is set to a value close to the actual utilization. #5588
[ENHANCEMENT] Querier: dynamically set GOMAXPROCS based on the CPU request. This should reduce noisy neighbour issues created by the querier, whose CPU utilization could eventually saturate the Kubernetes node if unbounded. #5646 #5658
[ENHANCEMENT] Allow to remove an entry from the configured environment variable for a given component, setting the environment value to null in the *_env_map objects (e.g. store_gateway_env_map+:: { 'field': null}). #5599
[ENHANCEMENT] Allow overriding the default number of replicas for etcd. #5589
[ENHANCEMENT] Memcached: reduce memory request for results, chunks and metadata caches. The requested memory is 5% greater than the configured memcached max cache size. #5661
[ENHANCEMENT] Autoscaling: Add the following configuration options to fine tune autoscaler target utilization: #5679 #5682 #5689
- autoscaling_querier_target_utilization (defaults to 0.75)
- autoscaling_mimir_read_target_utilization (defaults to 0.75)
- autoscaling_ruler_querier_cpu_target_utilization (defaults to 1)
- autoscaling_distributor_memory_target_utilization (defaults to 1)
- autoscaling_ruler_cpu_target_utilization (defaults to 1)
- autoscaling_query_frontend_cpu_target_utilization (defaults to 1)
- autoscaling_ruler_query_frontend_cpu_target_utilization (defaults to 1)
- autoscaling_alertmanager_cpu_target_utilization (defaults to 1)
[ENHANCEMENT] Gossip-ring: add appProtocol for istio compatibility. #5680
[ENHANCEMENT] Add _config.commonConfig to allow adding common configuration parameters for all Mimir components. #5703
[ENHANCEMENT] Update rollout-operator to v0.7.0. #5718
[ENHANCEMENT] Increase the default rollout speed for store-gateway when lazy loading is disabled. #5823
[BUGFIX] Fix compilation when index, chunks or metadata caches are disabled. #5710

Mimirtool

[ENHANCEMENT] Mimirtool uses paging to fetch all dashboards from Grafana when running mimirtool analyse grafana. This allows the tool to work correctly when running against Grafana instances with more than a 1000 dashboards. #5825
[ENHANCEMENT] Extract metric name from queries that have a __name__ matcher. #5911
[BUGFIX] Mimirtool no longer parses label names as metric names when handling templating variables that are populated using label_values(<label_name>) when running mimirtool analyse grafana. #5832
[BUGFIX] Fix panic when analyzing a grafana dashboard with multiline queries in templating variables. #5911

Query-tee

[CHANGE] Proxy Content-Type response header from backend. Previously Content-Type: text/plain; charset=utf-8 was returned on all requests. #5183
[CHANGE] Increase default value of -proxy.compare-skip-recent-samples to avoid racing with recording rule evaluation. #5561
[CHANGE] Add -backend.skip-tls-verify to optionally skip TLS verification on backends. #5656

Documentation

[CHANGE] Fix reference to get-started documentation directory. #5476
[CHANGE] Fix link to external OTLP/HTTP documentation.
[ENHANCEMENT] Improved MimirRulerTooManyFailedQueries runbook. #5586
[ENHANCEMENT] Improved "Recover accidentally deleted blocks" runbook. #5620
[ENHANCEMENT] Documented options and trade-offs to query label names and values. #5582
[ENHANCEMENT] Improved MimirRequestErrors runbook for alertmanager. #5694

Tools

[CHANGE] copyblocks: add support for S3 and the ability to copy between different object storage services. Due to this, the -source-service and -destination-service flags are now required and the -service flag has been removed. #5486
[FEATURE] undelete-block-gcs: Added new tool for undeleting blocks on GCS storage. #5610 #5855
[FEATURE] wal-reader: Added new tool for printing entries in TSDB WAL. #5780
[ENHANCEMENT] ulidtime: add -seconds flag to print timestamps as Unix timestamps. #5621
[ENHANCEMENT] ulidtime: exit with status code 1 if some ULIDs can't be parsed. #5621
[ENHANCEMENT] tsdb-index-toc: added index-header size estimates. #5652
[BUGFIX] Stop tools from panicking when -help flag is passed. #5412
[BUGFIX] Remove github.com/golang/glog command line flags from tools. #5413

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.0...mimir-2.10.0

mimir - 2.9.1

Published by ying-jeanne about 1 year ago

This release contains 2 PRs from 1 authors. Thank you!

Changelog

2.9.1

Grafana Mimir

[ENHANCEMENT] Update alpine base image to 3.18.3. #6021

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.0...mimir-2.9.1

mimir - 2.10.0-rc.2

Published by colega about 1 year ago

This release contains 5 PRs from 3 authors. Thank you!

Changelog

2.10.0-rc.2

Grafana Mimir

[ENHANCEMENT] Go: updated to 1.21.1. #5955 #5960
[BUGFIX] Ingester: fix spurious not found errors on label values API during head compaction. #5957

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.0-rc.1...mimir-2.10.0-rc.2

mimir - 2.10.0-rc.1

Published by colega about 1 year ago

This release contains 12 PRs from 4 authors. Thank you!

Changelog

2.10.0-rc.1

Grafana Mimir

[FEATURE] The following features are no longer considered experimental. #5872
- Ruler storage cache (-ruler-storage.cache.*)
- Exclude ingesters running in specific zones (-ingester.ring.excluded-zones)
- Cardinality-based query sharding (-query-frontend.query-sharding-target-series-per-shard)
- Cardinality query result caching (-query-frontend.results-cache-ttl-for-cardinality-query)
- Label names and values query result caching (-query-frontend.results-cache-ttl-for-labels-query)
- Query expression size limit (-query-frontend.max-query-expression-size-bytes)
- Peer discovery / tenant sharding for overrides exporters (-overrides-exporter.ring.enabled)
- Configuring enabled metrics in overrides exporter (-overrides-exporter.enabled-metrics)
- Per-tenant results cache TTL (-query-frontend.results-cache-ttl, -query-frontend.results-cache-ttl-for-out-of-order-time-window)
[FEATURE] Querier: add experimental CLI flag -tenant-federation.max-concurrent to adjust the max number of per-tenant queries that can be run at a time when executing a single multi-tenant query. #5874
[FEATURE] Alertmanager: Add Microsoft Teams as a supported integration. #5840
[ENHANCEMENT] Alertmanager: update to alertmanager 0.26.0. #5840
[BUGFIX] Store-gateway: fix chunks corruption bug introduced in rc.0. #5875
[BUGFIX] Update Minio object storage client from 7.0.62 to 7.0.63 to fix auto-detection of AWS GovCloud environments. #5905

Mimirtool

[ENHANCEMENT] Mimirtool uses paging to fetch all dashboards from Grafana when running mimirtool analyse grafana. This allows the tool to work correctly when running against Grafana instances with more than a 1000 dashboards. #5825
[ENHANCEMENT] Extract metric name from queries that have a __name__ matcher. #5911
[BUGFIX] Mimirtool no longer parses label names as metric names when handling templating variables that are populated using label_values(<label_name>) when running mimirtool analyse grafana. #5832
[BUGFIX] Fix panic when analyzing a grafana dashboard with multiline queries in templating variables. #5911

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.0-rc.0...mimir-2.10.0-rc.1

mimir - 2.10.0-rc.0

Published by colega about 1 year ago

This release contains 434 PRs from 54 authors, including new contributors Aaron Sanders, Alexander Proschek, Aljoscha Pörtner, balazs92117, Francois Gouteroux, Franco Posa, Heather Yuan, jingyang, kendrickclark, m4r1u2, Milan Plžík, Samir Teymurov, Sven Haardiek, Thomas Schaaf, Tiago Posse. Thank you!

Grafana Mimir version 2.10.0-rc.0 release notes

Pending, draft version can be seen at: https://github.com/grafana/mimir/pull/5873

Changelog

2.10.0-rc.0

Grafana Mimir

[CHANGE] Update Go version to 1.21.0. #5734
[CHANGE] Store-gateway: skip verifying index header integrity upon loading. To enable verification set blocks_storage.bucket_store.index_header.verify_on_load: true. #5174
[CHANGE] Querier: change the default value of the experimental -querier.streaming-chunks-per-ingester-buffer-size flag to 256. #5203
[CHANGE] Querier: only initiate query requests to ingesters in the ACTIVE state in the ring. #5342
[CHANGE] Querier: Renamed -querier.prefer-streaming-chunks to -querier.prefer-streaming-chunks-from-ingesters to enable streaming chunks from ingesters to queriers. #5182
[CHANGE] Querier: -query-frontend.cache-unaligned-requests has been moved from a global flag to a per-tenant override. #5312
[CHANGE] Ingester: removed cortex_ingester_shipper_dir_syncs_total and cortex_ingester_shipper_dir_sync_failures_total metrics. The former metric was not much useful, and the latter was never incremented. #5396
[CHANGE] Ingester: Do not log errors related to hitting per-instance limits to reduce resource usage when ingesters are under pressure. #5585
[CHANGE] gRPC clients: use default connect timeout of 5s, and therefore enable default connect backoff max delay of 5s. #5562
[CHANGE] The -shutdown-delay flag is no longer experimental. #5701
[CHANGE] The -validation.create-grace-period is now enforced in the ingester too, other than distributor and query-frontend. If you've configured -validation.create-grace-period then make sure the configuration is applied to ingesters too. #5712
[CHANGE] The -validation.create-grace-period is now enforced for examplars too in the distributor. If an examplar has timestamp greater than "now + grace_period", then the exemplar will be dropped and the metric cortex_discarded_exemplars_total{reason="exemplar_too_far_in_future",user="..."} increased. #5761
[CHANGE] The -validation.create-grace-period is now enforced in the query-frontend even when the configured value is 0. When the value is 0, the query end time range is truncated to the current real-world time. #5829
[CHANGE] Store-gateway: deprecate configuration parameters for index header under blocks-storage.bucket-store and use a new configurations in blocks-storage.bucket-store.index-header, deprecated configuration will be removed in Mimir 2.12. Configuration changes: #5726
- -blocks-storage.bucket-store.index-header-lazy-loading-enabled is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-enabled
- -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
- -blocks-storage.bucket-store.index-header-lazy-loading-concurrency is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-concurrency
[CHANGE] Store-gateway: remove experimental fine-grained chunks caching. The following experimental configuration parameters have been removed -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled, -blocks-storage.bucket-store.fine-grained-chunks-caching-ranges-per-series. #5816
[CHANGE] Ingester: remove deprecated blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup. #5850
[FEATURE] Introduced distributor.service_overload_status_code_on_rate_limit_enabled flag for configuring status code to 529 instead of 429 upon rate limit exhaustion. #5752
[FEATURE] Cardinality API: Add a new count_method parameter which enables counting active series #5136
[FEATURE] Query-frontend: added experimental support to cache cardinality, label names and label values query responses. The cache will be used when -query-frontend.cache-results is enabled, and -query-frontend.results-cache-ttl-for-cardinality-query or -query-frontend.results-cache-ttl-for-labels-query set to a value greater than 0. The following metrics have been added to track the query results cache hit ratio per request_type: #5212 #5235 #5426 #5524
- cortex_frontend_query_result_cache_requests_total{request_type="query_range|cardinality|label_names_and_values"}
- cortex_frontend_query_result_cache_hits_total{request_type="query_range|cardinality|label_names_and_values"}
[FEATURE] Added -<prefix>.s3.list-objects-version flag to configure the S3 list objects version. #5099
[FEATURE] Ingester: Add optional CPU/memory utilization based read request limiting, considered experimental. Disabled by default, enable by configuring limits via both of the following flags: #5012 #5392 #5394 #5526 #5508 #5704
- -ingester.read-path-cpu-utilization-limit
- -ingester.read-path-memory-utilization-limit
- -ingester.log-utilization-based-limiter-cpu-samples
[FEATURE] Ruler: Support filtering results from rule status endpoint by file, rule_group and rule_name. #5291
[FEATURE] Ingester: add experimental support for creating tokens by using spread minimizing strategy. This can be enabled with -ingester.ring.token-generation-strategy: spread-minimizing and -ingester.ring.spread-minimizing-zones: <all available zones>. In that case -ingester.ring.tokens-file-path must be empty. #5308 #5324
[FEATURE] Storegateway: Persist sparse index-headers to disk and read from disk on index-header loads instead of reconstructing. #5465 #5651 #5726
[FEATURE] Ingester: add experimental CLI flag -ingester.ring.spread-minimizing-join-ring-in-order that allows an ingester to register tokens in the ring only after all previous ingesters (with ID lower than its own ID) have already been registered. #5541
[FEATURE] Ingester: add experimental support to compact the TSDB Head when the number of in-memory series is equal or greater than -blocks-storage.tsdb.early-head-compaction-min-in-memory-series, and the ingester estimates that the per-tenant TSDB Head compaction will reduce in-memory series by at least -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage. #5371
[FEATURE] Ingester: add new metrics for tracking native histograms in active series: cortex_ingester_active_native_histogram_series, cortex_ingester_active_native_histogram_series_custom_tracker, cortex_ingester_active_native_histogram_buckets, cortex_ingester_active_native_histogram_buckets_custom_tracker. The first 2 are the subsets of the existing and unmodified cortex_ingester_active_series and cortex_ingester_active_series_custom_tracker respectively, only tracking native histogram series, and the last 2 are the equivalents for tracking the number of buckets in native histogram series. #5318
[FEATURE] Add experimental CLI flag -<prefix>.s3.native-aws-auth-enabled that allows to enable the default credentials provider chain of the AWS SDK. #5636
[FEATURE] Distributor: add experimental support for circuit breaking when writing to ingesters via -ingester.client.circuit-breaker.enabled, -ingester.client.circuit-breaker.failure-threshold, or -ingester.client.circuit-breaker.cooldown-period or their corresponding YAML. #5650
[ENHANCEMENT] Overrides-exporter: Add new metrics for write path and alertmanager (max_global_metadata_per_user, max_global_metadata_per_metric, request_rate, request_burst_size, alertmanager_notification_rate_limit, alertmanager_max_dispatcher_aggregation_groups, alertmanager_max_alerts_count, alertmanager_max_alerts_size_bytes) and added flag -overrides-exporter.enabled-metrics to explicitly configure desired metrics, e.g. -overrides-exporter.enabled-metrics=request_rate,ingestion_rate. Default value for this flag is: ingestion_rate,ingestion_burst_size,max_global_series_per_user,max_global_series_per_metric,max_global_exemplars_per_user,max_fetched_chunks_per_query,max_fetched_series_per_query,ruler_max_rules_per_rule_group,ruler_max_rule_groups_per_tenant. #5376
[ENHANCEMENT] Cardinality API: When zone aware replication is enabled, the label values cardinality API can now tolerate single zone failure #5178
[ENHANCEMENT] Distributor: optimize sending requests to ingesters when incoming requests don't need to be modified. For now this feature can be disabled by setting -timeseries-unmarshal-caching-optimization-enabled=false. #5137
[ENHANCEMENT] Add advanced CLI flags to control gRPC client behaviour: #5161
- -<prefix>.connect-timeout
- -<prefix>.connect-backoff-base-delay
- -<prefix>.connect-backoff-max-delay
- -<prefix>.initial-stream-window-size
- -<prefix>.initial-connection-window-size
[ENHANCEMENT] Query-frontend: added "response_size_bytes" field to "query stats" log. #5196
[ENHANCEMENT] Querier: Refine error messages for per-tenant query limits, informing the user of the preferred strategy for not hitting the limit, in addition to how they may tweak the limit. #5059
[ENHANCEMENT] Distributor: optimize sending of requests to ingesters by reusing memory buffers for marshalling requests. This optimization can be enabled by setting -distributor.write-requests-buffer-pooling-enabled to true. #5195 #5805 #5830
[ENHANCEMENT] Querier: add experimental -querier.minimize-ingester-requests option to initially query only the minimum set of ingesters required to reach quorum. #5202 #5259 #5263
[ENHANCEMENT] Querier: improve error message when streaming chunks from ingesters to queriers and a query limit is reached. #5245
[ENHANCEMENT] Use new data structure for labels, to reduce memory consumption. #3555 #5731
[ENHANCEMENT] Update alpine base image to 3.18.2. #5276
[ENHANCEMENT] Ruler: add cortex_ruler_sync_rules_duration_seconds metric, tracking the time spent syncing all rule groups owned by the ruler instance. #5311
[ENHANCEMENT] Store-gateway: add experimental blocks-storage.bucket-store.index-header-lazy-loading-concurrency config option to limit the number of concurrent index-headers loads when lazy loading. #5313 #5605
[ENHANCEMENT] Ingester and querier: improve level of detail in traces emitted for queries that hit ingesters. #5315
[ENHANCEMENT] Querier: add cortex_querier_queries_rejected_total metric that counts the number of queries rejected due to hitting a limit (eg. max series per query or max chunks per query). #5316 #5440 #5450
[ENHANCEMENT] Querier: add experimental -querier.minimize-ingester-requests-hedging-delay option to initiate requests to further ingesters when request minimisation is enabled and not all initial requests have completed. #5368
[ENHANCEMENT] Clarify docs for -ingester.client.* flags to make it clear that these are used by both queriers and distributors. #5375
[ENHANCEMENT] Querier and store-gateway: add experimental support for streaming chunks from store-gateways to queriers while evaluating queries. This can be enabled with -querier.prefer-streaming-chunks-from-store-gateways=true. #5182
[ENHANCEMENT] Querier: enforce max-chunks-per-query limit earlier in query processing when streaming chunks from ingesters to queriers to avoid unnecessarily consuming resources for queries that will be aborted. #5369 #5447
[ENHANCEMENT] Ingester: added cortex_ingester_shipper_last_successful_upload_timestamp_seconds metric tracking the last successful TSDB block uploaded to the bucket (unix timestamp in seconds). #5396
[ENHANCEMENT] Ingester: Add two metrics tracking resource utilization calculated by utilization based limiter: #5496
- cortex_ingester_utilization_limiter_current_cpu_load: The current exponential weighted moving average of the ingester's CPU load
- cortex_ingester_utilization_limiter_current_memory_usage_bytes: The current ingester memory utilization
[ENHANCEMENT] Ruler: added insight=true field to ruler's prometheus component for rule evaluation logs. #5510
[ENHANCEMENT] Distributor Ingester: Add metrics to count the number of requests rejected for hitting per-instance limits, cortex_distributor_instance_rejected_requests_total and cortex_ingester_instance_rejected_requests_total respectively. #5551
[ENHANCEMENT] Distributor: add support for ingesting exponential histograms that are over the native histogram scale limit of 8 in OpenTelemetry format by downscaling them. #5532 #5607
[ENHANCEMENT] General: buffered logging: #5506
- -log.buffered: Enable buffered logging
[ENHANCEMENT] Distributor: add more detailed information to traces generated while processing OTLP write requests. #5539
[ENHANCEMENT] Distributor: improve performance ingesting OTLP payloads. #5531 #5607 #5616
[ENHANCEMENT] Ingester: optimize label-values with matchers call when number of matched series is small. #5600
[ENHANCEMENT] Compactor: Delete bucket-index, markers and debug files if there are no blocks left in the bucket index. This cleanup must be enabled by using -compactor.no-blocks-file-cleanup-enabled option. #5648
[ENHANCEMENT] Ingester: reduce memory usage of active series tracker. #5665
[ENHANCEMENT] Store-gateway: added -store-gateway.sharding-ring.auto-forget-enabled configuration parameter to control whether store-gateway auto-forget feature should be enabled or disabled (enabled by default). #5702
[ENHANCEMENT] Compactor: added per tenant block upload counters cortex_block_upload_api_blocks_total, cortex_block_upload_api_bytes_total, and cortex_block_upload_api_files_total. #5738
[ENHANCEMENT] Compactor: Verify time range of compacted block(s) matches the time range of input blocks. #5760
[ENHANCEMENT] Querier: improved observability of calls to ingesters during queries. #5724
[ENHANCEMENT] Compactor: block backfilling logging is now more verbose. #5711
[ENHANCEMENT] Added support to rate limit application logs: #5764
- -log.rate-limit-enabled
- -log.rate-limit-logs-per-second
- -log.rate-limit-logs-per-second-burst
[ENHANCEMENT] Added cortex_ingester_tsdb_head_min_timestamp_seconds and cortex_ingester_tsdb_head_max_timestamp_seconds metrics which return min and max time of all TSDB Heads open in an ingester. #5786 #5815
[ENHANCEMENT] Querier: cancel query requests to ingesters in a zone upon first error received from the zone, to reduce wasted effort spent computing results that won't be used #5764
[ENHANCEMENT] Improve tracing of internal HTTP requests sent over httpgrpc #5782
[ENHANCEMENT] Querier: add experimental per-query chunks limit based on an estimate of the number of chunks that will be sent from ingesters and store-gateways that is enforced earlier during query evaluation. This limit is disabled by default and can be configured with -querier.max-estimated-fetched-chunks-per-query-multiplier. #5765
[ENHANCEMENT] Ingester: add UI for listing tenants with TSDB on given ingester and viewing details of tenants's TSDB on given ingester. #5803 #5824
[ENHANCEMENT] Querier: improve observability of calls to store-gateways during queries. #5809
[ENHANCEMENT] Query-frontend: improve tracing of interactions with query-scheduler. #5818
[ENHANCEMENT] Query-scheduler: improve tracing of requests when request is rejected by query-scheduler. #5848
[ENHANCEMENT] Ingester: avoid logging some errors that could cause logging contention. #5494 #5581
[ENHANCEMENT] Store-gateway: wait for query gate after loading blocks. #5507
[ENHANCEMENT] Store-gateway: always include __name__ posting group in selection in order to reduce the number of object storage API calls. #5246
[ENHANCEMENT] Ingester: track active series by ref instead of hash/labels to reduce memory usage. #5134 #5193
[BUGFIX] Ingester: Handle when previous ring state is leaving and the number of tokens has changed. #5204
[BUGFIX] Querier: fix issue where queries that use the timestamp() function fail with execution: attempted to read series at index 0 from stream, but the stream has already been exhausted if streaming chunks from ingesters to queriers is enabled. #5370
[BUGFIX] memberlist: bring back memberlist_client_kv_store_count metric that used to exist in Cortex, but got lost during dskit updates before Mimir 2.0. #5377
[BUGFIX] Querier: Pass on HTTP 503 query response code. #5364
[BUGFIX] Store-gateway: Fix issue where stopping a store-gateway could cause all store-gateways to unload all blocks. #5464
[BUGFIX] Allocate ballast in smaller blocks to avoid problem when entire ballast was kept in memory working set. #5565
[BUGFIX] Querier: Retry frontend result notification when an error is returned. #5591
[BUGFIX] Querier: fix issue where cortex_ingester_client_request_duration_seconds metric did not include streaming query requests that did not return any series. #5695
[BUGFIX] Ingester: Fix ActiveSeries tracker double-counting series that have been deleted from the Head while still being active and then recreated again. #5678
[BUGFIX] Ingester: Don't set "last update time" of TSDB into the future when opening TSDB. This could prevent detecting of idle TSDB for a long time, if sample in distant future was ingested. #5787
[BUGFIX] Store-gateway: fix bug when lazy index header could be closed prematurely even when still in use. #5795
[BUGFIX] Ruler: gracefully shut down rule evaluations. #5778
[BUGFIX] Querier: fix performance when ingesters stream samples. #5836

Mixin

[CHANGE] Dashboards: show all workloads in selected namespace on "rollout progress" dashboard. #5113
[CHANGE] Dashboards: show the number of updated and ready pods for each workload in the "rollout progress" panel on the "rollout progress" dashboard. #5113
[CHANGE] Dashboards: removed "Query results cache misses" panel on the "Mimir / Queries" dashboard. #5423
[CHANGE] Dashboards: default to shared crosshair on all dashboards. #5489
[CHANGE] Dashboards: sort variable drop-down lists from A to Z, rather than Z to A. #5490
[CHANGE] Alerts: removed MimirProvisioningTooManyActiveSeries alert. You should configure -ingester.instance-limits.max-series and rely on MimirIngesterReachingSeriesLimit alert instead. #5593
[CHANGE] Alerts: removed MimirProvisioningTooManyWrites alert. The alerting threshold used in this alert was chosen arbitrarily and ingesters receiving an higher number of samples / sec don't necessarily have any issue. You should rely on SLOs metrics and alerts instead. #5706
[CHANGE] Alerts: don't raise MimirRequestErrors or MimirRequestLatency alert for the /debug/pprof endpoint. #5826
[ENHANCEMENT] Dashboards: adjust layout of "rollout progress" dashboard panels so that the "rollout progress" panel doesn't require scrolling. #5113
[ENHANCEMENT] Dashboards: show container name first in "pods count per version" panel on "rollout progress" dashboard. #5113
[ENHANCEMENT] Dashboards: show time spend waiting for turn when lazy loading index headers in the "index-header lazy load gate latency" panel on the "queries" dashboard. #5313
[ENHANCEMENT] Dashboards: split query results cache hit ratio by request type in "Query results cache hit ratio" panel on the "Mimir / Queries" dashboard. #5423
[ENHANCEMENT] Dashboards: add "rejected queries" panel to "queries" dashboard. #5429
[ENHANCEMENT] Dashboards: add native histogram active series and active buckets to "tenants" dashboard. #5543
[ENHANCEMENT] Dashboards: add panels to "Mimir / Writes" for requests rejected for per-instance limits. #5638
[ENHANCEMENT] Dashboards: rename "Blocks currently loaded" to "Blocks currently owned" in the "Mimir / Queries" dashboard. #5705
[ENHANCEMENT] Alerts: Add MimirIngestedDataTooFarInTheFuture warning alert that triggers when Mimir ingests sample with timestamp more than 1h in the future. #5822
[BUGFIX] Alerts: fix MimirIngesterRestarts to fire only when the ingester container is restarted, excluding the cases the pod is rescheduled. #5397
[BUGFIX] Dashboards: fix "unhealthy pods" panel on "rollout progress" dashboard showing only a number rather than the name of the workload and the number of unhealthy pods if only one workload has unhealthy pods. #5113 #5200
[BUGFIX] Alerts: fixed MimirIngesterHasNotShippedBlocks and MimirIngesterHasNotShippedBlocksSinceStart alerts. #5396
[BUGFIX] Alerts: Fix MimirGossipMembersMismatch to include admin-api and custom compactor pods. admin-api is a GEM component. #5641 #5797
[BUGFIX] Dashboards: fix autoscaling dashboard panels that could show multiple series for a single component. #5810

Jsonnet

[CHANGE] Removed _config.querier.concurrency configuration option and replaced it with _config.querier_max_concurrency and _config.ruler_querier_max_concurrency to allow to easily fine tune it for different querier deployments. #5322
[CHANGE] Change _config.multi_zone_ingester_max_unavailable to 50. #5327
[CHANGE] Change distributors rolling update strategy configuration: maxSurge and maxUnavailable are set to 15% and 0. #5714
[FEATURE] Alertmanager: Add horizontal pod autoscaler config, that can be enabled using autoscaling_alertmanager_enabled: true. #5194 #5249
[ENHANCEMENT] Enable the track_sizes feature for Memcached pods to help determine cache efficiency. #5209
[ENHANCEMENT] Add per-container map for environment variables. #5181
[ENHANCEMENT] Add PodDisruptionBudgets for compactor, continuous-test, distributor, overrides-exporter, querier, query-frontend, query-scheduler, rollout-operator, ruler, ruler-querier, ruler-query-frontend, ruler-query-scheduler, and all memcached workloads. #5098
[ENHANCEMENT] Ruler: configure the ruler storage cache when the metadata cache is enabled. #5326 #5334
[ENHANCEMENT] Shuffle-sharding: ingester shards in user-classes can now be configured to target different series and limit percentage utilization through _config.shuffle_sharding.target_series_per_ingester and _config.shuffle_sharding.target_utilization_percentage values. #5470
[ENHANCEMENT] Distributor: allow adjustment of the targeted CPU usage as a percentage of requested CPU. This can be adjusted with _config.autoscaling_distributor_cpu_target_utilization. #5525
[ENHANCEMENT] Ruler: add configuration option _config.ruler_remote_evaluation_max_query_response_size_bytes to easily set the maximum query response size allowed (in bytes). #5592
[ENHANCEMENT] Distributor: dynamically set GOMAXPROCS based on the CPU request. This should reduce distributor CPU utilization, assuming the CPU request is set to a value close to the actual utilization. #5588
[ENHANCEMENT] Querier: dynamically set GOMAXPROCS based on the CPU request. This should reduce noisy neighbour issues created by the querier, whose CPU utilization could eventually saturate the Kubernetes node if unbounded. #5646 #5658
[ENHANCEMENT] Allow to remove an entry from the configured environment variable for a given component, setting the environment value to null in the *_env_map objects (e.g. store_gateway_env_map+:: { 'field': null}). #5599
[ENHANCEMENT] Allow overriding the default number of replicas for etcd. #5589
[ENHANCEMENT] Memcached: reduce memory request for results, chunks and metadata caches. The requested memory is 5% greater than the configured memcached max cache size. #5661
[ENHANCEMENT] Autoscaling: Add the following configuration options to fine tune autoscaler target utilization: #5679 #5682 #5689
- autoscaling_querier_target_utilization (defaults to 0.75)
- autoscaling_mimir_read_target_utilization (defaults to 0.75)
- autoscaling_ruler_querier_cpu_target_utilization (defaults to 1)
- autoscaling_distributor_memory_target_utilization (defaults to 1)
- autoscaling_ruler_cpu_target_utilization (defaults to 1)
- autoscaling_query_frontend_cpu_target_utilization (defaults to 1)
- autoscaling_ruler_query_frontend_cpu_target_utilization (defaults to 1)
- autoscaling_alertmanager_cpu_target_utilization (defaults to 1)
[ENHANCEMENT] Gossip-ring: add appProtocol for istio compatibility. #5680
[ENHANCEMENT] Add _config.commonConfig to allow adding common configuration parameters for all Mimir components. #5703
[ENHANCEMENT] Update rollout-operator to v0.7.0. #5718
[ENHANCEMENT] Increase the default rollout speed for store-gateway when lazy loading is disabled. #5823
[BUGFIX] Fix compilation when index, chunks or metadata caches are disabled. #5710

Query-tee

[CHANGE] Proxy Content-Type response header from backend. Previously Content-Type: text/plain; charset=utf-8 was returned on all requests. #5183
[CHANGE] Increase default value of -proxy.compare-skip-recent-samples to avoid racing with recording rule evaluation. #5561
[CHANGE] Add -backend.skip-tls-verify to optionally skip TLS verification on backends. #5656

Documentation

[CHANGE] Fix reference to get-started documentation directory. #5476
[CHANGE] Fix link to external OTLP/HTTP documentation.
[ENHANCEMENT] Improved MimirRulerTooManyFailedQueries runbook. #5586
[ENHANCEMENT] Improved "Recover accidentally deleted blocks" runbook. #5620
[ENHANCEMENT] Documented options and trade-offs to query label names and values. #5582
[ENHANCEMENT] Improved MimirRequestErrors runbook for alertmanager. #5694

Tools

[CHANGE] copyblocks: add support for S3 and the ability to copy between different object storage services. Due to this, the -source-service and -destination-service flags are now required and the -service flag has been removed. #5486
[FEATURE] undelete-block-gcs: Added new tool for undeleting blocks on GCS storage. #5610 #5855
[FEATURE] wal-reader: Added new tool for printing entries in TSDB WAL. #5780
[ENHANCEMENT] ulidtime: add -seconds flag to print timestamps as Unix timestamps. #5621
[ENHANCEMENT] ulidtime: exit with status code 1 if some ULIDs can't be parsed. #5621
[ENHANCEMENT] tsdb-index-toc: added index-header size estimates. #5652
[BUGFIX] Stop tools from panicking when -help flag is passed. #5412
[BUGFIX] Remove github.com/golang/glog command line flags from tools. #5413

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.0...mimir-2.10.0-rc.0

mimir - 2.9.0

Published by flxbk over 1 year ago

This release contains 252 PRs from 46 authors, including new contributors Alex R, Alexander Soelberg Heidarsson, Alexander Weaver, Benjamin Lazarecki, Dhanu Saputra, Dominik Süß, Fiona Liao, Jonathan Halterman, Kristian Bremberg, MattiasSegerdahl, Salva Corts, Stephanie Closson, willychrisza. Thank you!

Grafana Mimir version 2.9.0 release notes

Grafana Labs is excited to announce version 2.9 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.

Features and enhancements

Reduced store-gateway memory utilization on fetching series from long-term storage For queries that include broad label matchers (e.g. datacenter="dc1"), Mimir 2.9 will fetch a reduced volume of index data, which leads to a significant reduction in memory allocations in the store-gateway.
Reduced CPU utilisation for some shuffle sharding scenarios Mimir queriers will now use significantly less CPU in cases where shuffle sharding is enabled for tenants with a shard size that's large but lower than the total number of ingesters.
Reduced object storage API calls in compactors and rulers Mimir 2.9 comes with optimizations that will reduce the amount of times compactors and rulers need to access rules stored in object storage.
- This release adds experimental support for a ruler storage cache. This cache should reduce the number of "list objects" API calls issued to the object storage when there are 2+ ruler replicas running in a Mimir cluster. The cache can be configured by setting the -ruler-storage.cache.* CLI flags or their respective YAML config options.
- We also introduced a new feature to trigger a synchronization of tenant's rule groups as soon as changes to the rule configuration are made via API. This synchronization is in addition of the periodic syncing done every -ruler.poll-interval, which has then been relaxed from every 1m to every 10m. The new behaviour is enabled globally by default but can be disabled with -ruler.sync-rules-on-changes-enabled=false or tuned at a per-tenant level.
Experimental support for streaming chunks from ingester to querier This is expected to greatly reduce querier memory consumption when evaluating queries that select a large number of series, because chunks streamed from the querier can now be read into memory as needed.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.9 we have removed the following previously deprecated or experimental metrics:

cortex_bucket_store_chunk_pool_requested_bytes_total
cortex_bucket_store_chunk_pool_returned_bytes_total

The following configuration options are deprecated and will be removed in Grafana Mimir 2.11:

The CLI flag -querier.query-ingesters-within. This configuration is moved to per-tenant overrides.
The CLI flag -blocks-storage.bucket-store.bucket-index.enabled.
The CLI flags -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes, -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes and -blocks-storage.bucket-store.max-chunk-pool-bytes.
The CLI flags -querier.iterators and -query.batch-iterators.

The following configuration options that were deprecated in 2.7 are removed:

The CLI flag -blocks-storage.bucket-store.chunks-cache.subrange-size. A fixed value of 16000 is now always used.
The CLI flag -blocks-storage.bucket-store.consistency-delay.
The CLI flag -compactor.consistency-delay.
The CLI flag -ingester.ring.readiness-check-ring-health.

The following experimental options and features are now stable:

The CLI flag -query-frontend.query-sharding-max-regexp-size-bytes.
The CLI flag -query-scheduler.max-used-instances.
The CLI flags -(alertmanager|blocks|ruler)-storage.storage-prefix.
The CLI flag -compactor.first-level-compaction-wait-period.
The CLI flags -usage-stats.enabled and -usage-stats.installation-mode.
The CLI flag -query-frontend.query-sharding-target-series-per-shard.

The following configuration option defaults were changed:

The default value for the CLI flag -query-frontend.query-sharding-max-regexp-size-bytes was changed from 0 to 4096. As a result, queries with regex matchers exceeding this limit will not be sharded by default.
The default value for the CLI flag -compactor.partial-block-deletion-delay was changed from 0s to 1d. As a result, partial blocks resulting from a failed block upload or deletion will be cleaned up automatically.
The default value for the CLI flag -ruler.poll-interval was changed from 1m to 10m.

Bug fixes

Store-gateway: Detect collisions in the postings cache. PR 4770
Store-gateway: Fix panic caused by cached LabelValues responses with more than 655360 values. PR 5021

Changelog

2.9.0

Grafana Mimir

[CHANGE] Store-gateway: change expanded postings, postings, and label values index cache key format. These caches will be invalidated when rolling out the new Mimir version. #4770 #4978 #5037
[CHANGE] Distributor: remove the "forwarding" feature as it isn't necessary anymore. #4876
[CHANGE] Query-frontend: Change the default value of -query-frontend.query-sharding-max-regexp-size-bytes from 0 to 4096. #4932
[CHANGE] Querier: -querier.query-ingesters-within has been moved from a global flag to a per-tenant override. #4287
[CHANGE] Querier: Use -blocks-storage.tsdb.retention-period instead of -querier.query-ingesters-within for calculating the lookback period for shuffle sharded ingesters. Setting -querier.query-ingesters-within=0 no longer disables shuffle sharding on the read path. #4287
[CHANGE] Block upload: /api/v1/upload/block/{block}/files endpoint now allows file uploads with no Content-Length. #4956
[CHANGE] Store-gateway: deprecate configuration parameters for chunk pooling, they will be removed in Mimir 2.11. The following options are now also ignored: #4996
- -blocks-storage.bucket-store.max-chunk-pool-bytes
- -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes
- -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes
[CHANGE] Store-gateway: remove metrics cortex_bucket_store_chunk_pool_requested_bytes_total and cortex_bucket_store_chunk_pool_returned_bytes_total. #4996
[CHANGE] Compactor: change default of -compactor.partial-block-deletion-delay to 1d. This will automatically clean up partial blocks that were a result of failed block upload or deletion. #5026
[CHANGE] Compactor: the deprecated configuration parameter -compactor.consistency-delay has been removed. #5050
[CHANGE] Store-gateway: the deprecated configuration parameter -blocks-storage.bucket-store.consistency-delay has been removed. #5050
[CHANGE] The configuration parameter -blocks-storage.bucket-store.bucket-index.enabled has been deprecated and will be removed in Mimir 2.11. Mimir is running by default with the bucket index enabled since version 2.0, and starting from the version 2.11 it will not be possible to disable it. #5051
[CHANGE] The configuration parameters -querier.iterators and -query.batch-iterators have been deprecated and will be removed in Mimir 2.11. Mimir runs by default with -querier.batch-iterators=true, and starting from version 2.11 it will not be possible to change this. #5114
[CHANGE] Compactor: change default of -compactor.first-level-compaction-wait-period to 25m. #5128
[CHANGE] Ruler: changed default of -ruler.poll-interval from 1m to 10m. Starting from this release, the configured rule groups will also be re-synced each time they're modified calling the ruler configuration API. #5170
[FEATURE] Query-frontend: add -query-frontend.log-query-request-headers to enable logging of request headers in query logs. #5030
[ENHANCEMENT] Add per-tenant limit -validation.max-native-histogram-buckets to be able to ignore native histogram samples that have too many buckets. #4765
[ENHANCEMENT] Store-gateway: reduce memory usage in some LabelValues calls. #4789
[ENHANCEMENT] Store-gateway: add a stage label to the metric cortex_bucket_store_series_data_touched. This label now applies to data_type="chunks" and data_type="series". The stage label has 2 values: processed - the number of series that parsed - and returned - the number of series selected from the processed bytes to satisfy the query. #4797 #4830
[ENHANCEMENT] Distributor: make __meta_tenant_id label available in relabeling rules configured via metric_relabel_configs. #4725
[ENHANCEMENT] Compactor: added the configurable limit compactor.block-upload-max-block-size-bytes or compactor_block_upload_max_block_size_bytes to limit the byte size of uploaded or validated blocks. #4680
[ENHANCEMENT] Querier: reduce CPU utilisation when shuffle sharding is enabled with large shard sizes. #4851
[ENHANCEMENT] Packaging: facilitate configuration management by instructing systemd to start mimir with a configuration file. #4810
[ENHANCEMENT] Store-gateway: reduce memory allocations when looking up postings from cache. #4861 #4869 #4962 #5047
[ENHANCEMENT] Store-gateway: retain only necessary bytes when reading series from the bucket. #4926
[ENHANCEMENT] Ingester, store-gateway: clear the shutdown marker after a successful shutdown to enable reusing their persistent volumes in case the ingester or store-gateway is restarted. #4985
[ENHANCEMENT] Store-gateway, query-frontend: Reduced memory allocations when looking up cached entries from Memcached. #4862
[ENHANCEMENT] Alertmanager: Add additional template function queryFromGeneratorURL returning query URL decoded query from the GeneratorURL field of an alert. #4301
[ENHANCEMENT] Ruler: added experimental ruler storage cache support. The cache should reduce the number of "list objects" API calls issued to the object storage when there are 2+ ruler replicas running in a Mimir cluster. The cache can be configured setting -ruler-storage.cache.* CLI flags or their respective YAML config options. #4950 #5054
[ENHANCEMENT] Store-gateway: added HTTP /store-gateway/prepare-shutdown endpoint for gracefully scaling down of store-gateways. A gauge cortex_store_gateway_prepare_shutdown_requested has been introduced for tracing this process. #4955
[ENHANCEMENT] Updated Kuberesolver dependency (github.com/sercand/kuberesolver) from v2.4.0 to v4.0.0 and gRPC dependency (google.golang.org/grpc) from v1.47.0 to v1.53.0. #4922
[ENHANCEMENT] Introduced new options for logging HTTP request headers: -server.log-request-headers enables logging HTTP request headers, -server.log-request-headers-exclude-list lists headers which should not be logged. #4922
[ENHANCEMENT] Block upload: /api/v1/upload/block/{block}/files endpoint now disables read and write HTTP timeout, overriding -server.http-read-timeout and -server.http-write-timeout values. This is done to allow large file uploads to succeed. #4956
[ENHANCEMENT] Alertmanager: Introduce new metrics from upstream. #4918
- cortex_alertmanager_notifications_failed_total (added reason label)
- cortex_alertmanager_nflog_maintenance_total
- cortex_alertmanager_nflog_maintenance_errors_total
- cortex_alertmanager_silences_maintenance_total
- cortex_alertmanager_silences_maintenance_errors_total
[ENHANCEMENT] Add native histogram support for cortex_request_duration_seconds metric family. #4987
[ENHANCEMENT] Ruler: do not list rule groups in the object storage for disabled tenants. #5004
[ENHANCEMENT] Query-frontend and querier: add HTTP API endpoint <prometheus-http-prefix>/api/v1/format_query to format a PromQL query. #4373
[ENHANCEMENT] Query-frontend: Add cortex_query_frontend_regexp_matcher_count and cortex_query_frontend_regexp_matcher_optimized_count metrics to track optimization of regular expression label matchers. #4813
[ENHANCEMENT] Alertmanager: Add configuration option to enable or disable the deletion of alertmanager state from object storage. This is useful when migrating alertmanager tenants from one cluster to another, because it avoids a condition where the state object is copied but then deleted before the configuration object is copied. #4989
[ENHANCEMENT] Querier: only use the minimum set of chunks from ingesters when querying, and cancel unnecessary requests to ingesters sooner if we know their results won't be used. #5016
[ENHANCEMENT] Add -enable-go-runtime-metrics flag to expose all go runtime metrics as Prometheus metrics. #5009
[ENHANCEMENT] Ruler: trigger a synchronization of tenant's rule groups as soon as they change the rules configuration via API. This synchronization is in addition of the periodic syncing done every -ruler.poll-interval. The new behavior is enabled by default, but can be disabled with -ruler.sync-rules-on-changes-enabled=false (configurable on a per-tenant basis too). If you disable the new behaviour, then you may want to revert -ruler.poll-interval to 1m. #4975 #5053 #5115 #5170
[ENHANCEMENT] Distributor: Improve invalid tenant shard size error message. #5024
[ENHANCEMENT] Store-gateway: record index header loading time separately in cortex_bucket_store_series_request_stage_duration_seconds{stage="load_index_header"}. Now index header loading will be visible in the "Mimir / Queries" dashboard in the "Series request p99/average latency" panels. #5011 #5062
[ENHANCEMENT] Querier and ingester: add experimental support for streaming chunks from ingesters to queriers while evaluating queries. This can be enabled with -querier.prefer-streaming-chunks=true. #4886 #5078 #5094 #5126
[ENHANCEMENT] Update Docker base images from alpine:3.17.3 to alpine:3.18.0. #5065
[ENHANCEMENT] Compactor: reduced the number of "object exists" API calls issued by the compactor to the object storage when syncing block's meta.json files. #5063
[ENHANCEMENT] Distributor: Push request rate limits (-distributor.request-rate-limit and -distributor.request-burst-size) and their associated YAML configuration are now stable. #5124
[ENHANCEMENT] Go: updated to 1.20.5. #5185
[ENHANCEMENT] Update alpine base image to 3.18.2. #5274 #5276
[BUGFIX] Metadata API: Mimir will now return an empty object when no metadata is available, matching Prometheus. #4782
[BUGFIX] Store-gateway: add collision detection on expanded postings and individual postings cache keys. #4770
[BUGFIX] Ruler: Support the type=alert|record query parameter for the API endpoint <prometheus-http-prefix>/api/v1/rules. #4302
[BUGFIX] Backend: Check that alertmanager's data-dir doesn't overlap with bucket-sync dir. #4921
[BUGFIX] Alertmanager: Allow to rate-limit webex, telegram and discord notifications. #4979
[BUGFIX] Store-gateway: panics when decoding LabelValues responses that contain more than 655360 values. These responses are no longer cached. #5021
[BUGFIX] Querier: don't leak memory when processing query requests from query-frontends (ie. when the query-scheduler is disabled). #5199

Documentation

[ENHANCEMENT] Improve MimirIngesterReachingTenantsLimit runbook. #4744 #4752
[ENHANCEMENT] Add symbol table size exceeds case to MimirCompactorHasNotSuccessfullyRunCompaction runbook. #4945
[ENHANCEMENT] Clarify which APIs use query sharding. #4948

Mixin

[CHANGE] Alerts: Remove MimirQuerierHighRefetchRate. #4980
[CHANGE] Alerts: Remove MimirTenantHasPartialBlocks. This is obsoleted by the changed default of -compactor.partial-block-deletion-delay to 1d, which will auto remediate this alert. #5026
[ENHANCEMENT] Alertmanager dashboard: display active aggregation groups #4772
[ENHANCEMENT] Alerts: MimirIngesterTSDBWALCorrupted now only fires when there are more than one corrupted WALs in single-zone deployments and when there are more than two zones affected in multi-zone deployments. #4920
[ENHANCEMENT] Alerts: added labels to duplicated MimirRolloutStuck and MimirCompactorHasNotUploadedBlocks rules in order to distinguish them. #5023
[ENHANCEMENT] Dashboards: fix holes in graph for lightly loaded clusters #4915
[ENHANCEMENT] Dashboards: allow configuring additional services for the Rollout Progress dashboard. #5007
[ENHANCEMENT] Alerts: do not fire MimirAllocatingTooMuchMemory alert for any matching container outside of namespaces where Mimir is running. #5089
[BUGFIX] Dashboards: show cancelled requests in a different color to successful requests in throughput panels on dashboards. #5039
[BUGFIX] Dashboards: fix dashboard panels that showed percentages with axes from 0 to 10000%. #5084

Jsonnet

[CHANGE] Ruler: changed ruler autoscaling policy, extended scale down period from 60s to 600s. #4786
[CHANGE] Update to v0.5.0 rollout-operator. #4893
[CHANGE] Backend: add alertmanager_args to mimir-backend when running in read-write deployment mode. Remove hardcoded filesystem alertmanager storage. This moves alertmanager's data-dir to /data/alertmanager by default. #4907 #4921
[CHANGE] Remove -pdb suffix from PodDisruptionBudget names. This will create new PodDisruptionBudget resources. Make sure to prune the old resources; otherwise, rollouts will be blocked. #5109
[CHANGE] Query-frontend: enable query sharding for cardinality estimation via -query-frontend.query-sharding-target-series-per-shard by default if the results cache is enabled. #5128
[ENHANCEMENT] Ingester: configure -blocks-storage.tsdb.head-compaction-interval=15m to spread TSDB head compaction over a wider time range. #4870
[ENHANCEMENT] Ingester: configure -blocks-storage.tsdb.wal-replay-concurrency to CPU request minus 1. #4864
[ENHANCEMENT] Compactor: configure -compactor.first-level-compaction-wait-period to TSDB head compaction interval plus 10 minutes. #4872
[ENHANCEMENT] Store-gateway: set GOMEMLIMIT to the memory request value. This should reduce the likelihood the store-gateway may go out of memory, at the cost of an higher CPU utilization due to more frequent garbage collections when the memory utilization gets closer or above the configured requested memory. #4971
[ENHANCEMENT] Store-gateway: dynamically set GOMAXPROCS based on the CPU request. This should reduce the likelihood a high load on the store-gateway will slow down the entire Kubernetes node. #5104
[ENHANCEMENT] Store-gateway: add store_gateway_lazy_loading_enabled configuration option which combines disabled lazy-loading and reducing blocks sync concurrency. Reducing blocks sync concurrency improves startup times with disabled lazy loading on HDDs. #5025
[ENHANCEMENT] Update rollout-operator image to v0.6.0. #5155
[BUGFIX] Backend: configure -ruler.alertmanager-url to mimir-backend when running in read-write deployment mode. #4892

Mimirtool

[CHANGE] check rules: will fail on duplicate rules when --strict is provided. #5035
[FEATURE] sync/diff can now include/exclude namespaces based on a regular expression using --namespaces-regex and --ignore-namespaces-regex. #5100
[ENHANCEMENT] analyze prometheus: allow to specify -prometheus-http-prefix. #4966
[ENHANCEMENT] analyze grafana: allow to specify --folder-title to limit dashboards analysis based on their exact folder title. #4973

Tools

[CHANGE] copyblocks: copying between Azure Blob Storage buckets is now supported in addition to copying between Google Cloud Storage buckets. As a result, the --service flag is now required to be specified (accepted values are gcs or abs). #4756

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.8.0...mimir-2.9.0

mimir - 2.9.0-rc.1

Published by flxbk over 1 year ago

This release contains 260 PRs from 46 authors. Thank you!

Grafana Mimir version 2.9 release notes

Grafana Labs is excited to announce version 2.9 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.

Features and enhancements

Reduced store-gateway memory utilization on fetching series from long-term storage For queries that include broad label matchers (e.g. datacenter="dc1"), Mimir 2.9 will fetch a reduced volume of index data, which leads to a significant reduction in memory allocations in the store-gateway.
Reduced CPU utilisation for some shuffle sharding scenarios Mimir queriers will now use significantly less CPU in cases where shuffle sharding is enabled for tenants with a shard size that's large but lower than the total number of ingesters.
Reduced object storage API calls in compactors and rulers Mimir 2.9 comes with optimizations that will reduce the amount of times compactors and rulers need to access rules stored in object storage.
- This release adds experimental support for a ruler storage cache. This cache should reduce the number of "list objects" API calls issued to the object storage when there are 2+ ruler replicas running in a Mimir cluster. The cache can be configured by setting the -ruler-storage.cache.* CLI flags or their respective YAML config options.
- We also introduced a new feature to trigger a synchronization of tenant's rule groups as soon as changes to the rule configuration are made via API. This synchronization is in addition of the periodic syncing done every -ruler.poll-interval and allows to increase the polling interval. The new behavior is enabled globally by default but can be disabled with -ruler.sync-rules-on-changes-enabled=false or tuned at a per-tenant level.
Experimental support for streaming chunks from ingester to querier This is expected to greatly reduce querier memory consumption when evaluating queries that select a large number of series, because chunks streamed from the querier can now be read into memory as needed.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.9 we have removed the following previously deprecated or experimental metrics:

cortex_bucket_store_chunk_pool_requested_bytes_total
cortex_bucket_store_chunk_pool_returned_bytes_total

The following configuration options are deprecated and will be removed in Grafana Mimir 2.11:

The CLI flag -querier.query-ingesters-within. This configuration is moved to per-tenant overrides.
The CLI flag -blocks-storage.bucket-store.bucket-index.enabled.
The CLI flags -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes, -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes and -blocks-storage.bucket-store.max-chunk-pool-bytes.
The CLI flags querier.iterators and -query.batch-iterators.

The following configuration options that were deprecated in 2.7 are removed:

The CLI flag -blocks-storage.bucket-store.chunks-cache.subrange-size. A fixed value of 16000 is now always used.
The CLI flag -blocks-storage.bucket-store.consistency-delay.
The CLI flag -compactor.consistency-delay.
The CLI flag -ingester.ring.readiness-check-ring-health.

The following experimental options and features are now stable:

The CLI flag -query-frontend.query-sharding-max-regexp-size-bytes.
The CLI flag -query-scheduler.max-used-instances.
The CLI flags -(alertmanager|blocks|ruler)-storage.storage-prefix.
The CLI flag -compactor.first-level-compaction-wait-period.
The CLI flags -usage-stats.enabled and -usage-stats.installation-mode.
The CLI flag -query-frontend.query-sharding-target-series-per-shard.

The following configuration option defaults were changed:

The default value for the CLI flag -query-frontend.query-sharding-max-regexp-size-bytes was changed from 0 to 4096. As a result, queries with regex matchers exceeding this limit will not be sharded by default.
The default value for the CLI flag -compactor.partial-block-deletion-delay was changed from 0s to 1d. As a result, partial blocks resulting from a failed block upload or deletion will be cleaned up automatically.
The default value for the CLI flag -ruler.poll-interval was changed from 1m to 10m.

Bug fixes

Store-gateway: Detect collisions in the postings cache. PR 4770
Store-gateway: Fix panic caused by cached LabelValues responses with more than 655360 values. PR 5021

Changelog

2.9.0-rc.1

Grafana Mimir

[CHANGE] Store-gateway: change expanded postings, postings, and label values index cache key format. These caches will be invalidated when rolling out the new Mimir version. #4770 #4978 #5037
[CHANGE] Distributor: remove the "forwarding" feature as it isn't necessary anymore. #4876
[CHANGE] Query-frontend: Change the default value of -query-frontend.query-sharding-max-regexp-size-bytes from 0 to 4096. #4932
[CHANGE] Querier: -querier.query-ingesters-within has been moved from a global flag to a per-tenant override. #4287
[CHANGE] Querier: Use -blocks-storage.tsdb.retention-period instead of -querier.query-ingesters-within for calculating the lookback period for shuffle sharded ingesters. Setting -querier.query-ingesters-within=0 no longer disables shuffle sharding on the read path. #4287
[CHANGE] Block upload: /api/v1/upload/block/{block}/files endpoint now allows file uploads with no Content-Length. #4956
[CHANGE] Store-gateway: deprecate configuration parameters for chunk pooling, they will be removed in Mimir 2.11. The following options are now also ignored: #4996
- -blocks-storage.bucket-store.max-chunk-pool-bytes
- -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes
- -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes
[CHANGE] Store-gateway: remove metrics cortex_bucket_store_chunk_pool_requested_bytes_total and cortex_bucket_store_chunk_pool_returned_bytes_total. #4996
[CHANGE] Compactor: change default of -compactor.partial-block-deletion-delay to 1d. This will automatically clean up partial blocks that were a result of failed block upload or deletion. #5026
[CHANGE] Compactor: the deprecated configuration parameter -compactor.consistency-delay has been removed. #5050
[CHANGE] Store-gateway: the deprecated configuration parameter -blocks-storage.bucket-store.consistency-delay has been removed. #5050
[CHANGE] The configuration parameter -blocks-storage.bucket-store.bucket-index.enabled has been deprecated and will be removed in Mimir 2.11. Mimir is running by default with the bucket index enabled since version 2.0, and starting from the version 2.11 it will not be possible to disable it. #5051
[CHANGE] The configuration parameters -querier.iterators and -query.batch-iterators have been deprecated and will be removed in Mimir 2.11. Mimir runs by default with -querier.batch-iterators=true, and starting from version 2.11 it will not be possible to change this. #5114
[CHANGE] Compactor: change default of -compactor.first-level-compaction-wait-period to 25m. #5128
[CHANGE] Ruler: changed default of -ruler.poll-interval from 1m to 10m. Starting from this release, the configured rule groups will also be re-synced each time they're modified calling the ruler configuration API. #5170
[FEATURE] Query-frontend: add -query-frontend.log-query-request-headers to enable logging of request headers in query logs. #5030
[ENHANCEMENT] Add per-tenant limit -validation.max-native-histogram-buckets to be able to ignore native histogram samples that have too many buckets. #4765
[ENHANCEMENT] Store-gateway: reduce memory usage in some LabelValues calls. #4789
[ENHANCEMENT] Store-gateway: add a stage label to the metric cortex_bucket_store_series_data_touched. This label now applies to data_type="chunks" and data_type="series". The stage label has 2 values: processed - the number of series that parsed - and returned - the number of series selected from the processed bytes to satisfy the query. #4797 #4830
[ENHANCEMENT] Distributor: make __meta_tenant_id label available in relabeling rules configured via metric_relabel_configs. #4725
[ENHANCEMENT] Compactor: added the configurable limit compactor.block-upload-max-block-size-bytes or compactor_block_upload_max_block_size_bytes to limit the byte size of uploaded or validated blocks. #4680
[ENHANCEMENT] Querier: reduce CPU utilisation when shuffle sharding is enabled with large shard sizes. #4851
[ENHANCEMENT] Packaging: facilitate configuration management by instructing systemd to start mimir with a configuration file. #4810
[ENHANCEMENT] Store-gateway: reduce memory allocations when looking up postings from cache. #4861 #4869 #4962 #5047
[ENHANCEMENT] Store-gateway: retain only necessary bytes when reading series from the bucket. #4926
[ENHANCEMENT] Ingester, store-gateway: clear the shutdown marker after a successful shutdown to enable reusing their persistent volumes in case the ingester or store-gateway is restarted. #4985
[ENHANCEMENT] Store-gateway, query-frontend: Reduced memory allocations when looking up cached entries from Memcached. #4862
[ENHANCEMENT] Alertmanager: Add additional template function queryFromGeneratorURL returning query URL decoded query from the GeneratorURL field of an alert. #4301
[ENHANCEMENT] Ruler: added experimental ruler storage cache support. The cache should reduce the number of "list objects" API calls issued to the object storage when there are 2+ ruler replicas running in a Mimir cluster. The cache can be configured setting -ruler-storage.cache.* CLI flags or their respective YAML config options. #4950 #5054
[ENHANCEMENT] Store-gateway: added HTTP /store-gateway/prepare-shutdown endpoint for gracefully scaling down of store-gateways. A gauge cortex_store_gateway_prepare_shutdown_requested has been introduced for tracing this process. #4955
[ENHANCEMENT] Updated Kuberesolver dependency (github.com/sercand/kuberesolver) from v2.4.0 to v4.0.0 and gRPC dependency (google.golang.org/grpc) from v1.47.0 to v1.53.0. #4922
[ENHANCEMENT] Introduced new options for logging HTTP request headers: -server.log-request-headers enables logging HTTP request headers, -server.log-request-headers-exclude-list lists headers which should not be logged. #4922
[ENHANCEMENT] Block upload: /api/v1/upload/block/{block}/files endpoint now disables read and write HTTP timeout, overriding -server.http-read-timeout and -server.http-write-timeout values. This is done to allow large file uploads to succeed. #4956
[ENHANCEMENT] Alertmanager: Introduce new metrics from upstream. #4918
- cortex_alertmanager_notifications_failed_total (added reason label)
- cortex_alertmanager_nflog_maintenance_total
- cortex_alertmanager_nflog_maintenance_errors_total
- cortex_alertmanager_silences_maintenance_total
- cortex_alertmanager_silences_maintenance_errors_total
[ENHANCEMENT] Add native histogram support for cortex_request_duration_seconds metric family. #4987
[ENHANCEMENT] Ruler: do not list rule groups in the object storage for disabled tenants. #5004
[ENHANCEMENT] Query-frontend and querier: add HTTP API endpoint <prometheus-http-prefix>/api/v1/format_query to format a PromQL query. #4373
[ENHANCEMENT] Query-frontend: Add cortex_query_frontend_regexp_matcher_count and cortex_query_frontend_regexp_matcher_optimized_count metrics to track optimization of regular expression label matchers. #4813
[ENHANCEMENT] Alertmanager: Add configuration option to enable or disable the deletion of alertmanager state from object storage. This is useful when migrating alertmanager tenants from one cluster to another, because it avoids a condition where the state object is copied but then deleted before the configuration object is copied. #4989
[ENHANCEMENT] Querier: only use the minimum set of chunks from ingesters when querying, and cancel unnecessary requests to ingesters sooner if we know their results won't be used. #5016
[ENHANCEMENT] Add -enable-go-runtime-metrics flag to expose all go runtime metrics as Prometheus metrics. #5009
[ENHANCEMENT] Ruler: trigger a synchronization of tenant's rule groups as soon as they change the rules configuration via API. This synchronization is in addition of the periodic syncing done every -ruler.poll-interval. The new behavior is enabled by default, but can be disabled with -ruler.sync-rules-on-changes-enabled=false (configurable on a per-tenant basis too). If you disable the new behaviour, then you may want to revert -ruler.poll-interval to 1m. #4975 #5053 #5115 #5170
[ENHANCEMENT] Distributor: Improve invalid tenant shard size error message. #5024
[ENHANCEMENT] Store-gateway: record index header loading time separately in cortex_bucket_store_series_request_stage_duration_seconds{stage="load_index_header"}. Now index header loading will be visible in the "Mimir / Queries" dashboard in the "Series request p99/average latency" panels. #5011 #5062
[ENHANCEMENT] Querier and ingester: add experimental support for streaming chunks from ingesters to queriers while evaluating queries. This can be enabled with -querier.prefer-streaming-chunks=true. #4886 #5078 #5094 #5126
[ENHANCEMENT] Update Docker base images from alpine:3.17.3 to alpine:3.18.0. #5065
[ENHANCEMENT] Compactor: reduced the number of "object exists" API calls issued by the compactor to the object storage when syncing block's meta.json files. #5063
[ENHANCEMENT] Distributor: Push request rate limits (-distributor.request-rate-limit and -distributor.request-burst-size) and their associated YAML configuration are now stable. #5124
[ENHANCEMENT] Go: updated to 1.20.5. #5185
[BUGFIX] Metadata API: Mimir will now return an empty object when no metadata is available, matching Prometheus. #4782
[BUGFIX] Store-gateway: add collision detection on expanded postings and individual postings cache keys. #4770
[BUGFIX] Ruler: Support the type=alert|record query parameter for the API endpoint <prometheus-http-prefix>/api/v1/rules. #4302
[BUGFIX] Backend: Check that alertmanager's data-dir doesn't overlap with bucket-sync dir. #4921
[BUGFIX] Alertmanager: Allow to rate-limit webex, telegram and discord notifications. #4979
[BUGFIX] Store-gateway: panics when decoding LabelValues responses that contain more than 655360 values. These responses are no longer cached. #5021
[BUGFIX] Querier: don't leak memory when processing query requests from query-frontends (ie. when the query-scheduler is disabled). #5199

Documentation

[ENHANCEMENT] Improve MimirIngesterReachingTenantsLimit runbook. #4744 #4752
[ENHANCEMENT] Add symbol table size exceeds case to MimirCompactorHasNotSuccessfullyRunCompaction runbook. #4945
[ENHANCEMENT] Clarify which APIs use query sharding. #4948

Mixin

[CHANGE] Alerts: Remove MimirQuerierHighRefetchRate. #4980
[CHANGE] Alerts: Remove MimirTenantHasPartialBlocks. This is obsoleted by the changed default of -compactor.partial-block-deletion-delay to 1d, which will auto remediate this alert. #5026
[ENHANCEMENT] Alertmanager dashboard: display active aggregation groups #4772
[ENHANCEMENT] Alerts: MimirIngesterTSDBWALCorrupted now only fires when there are more than one corrupted WALs in single-zone deployments and when there are more than two zones affected in multi-zone deployments. #4920
[ENHANCEMENT] Alerts: added labels to duplicated MimirRolloutStuck and MimirCompactorHasNotUploadedBlocks rules in order to distinguish them. #5023
[ENHANCEMENT] Dashboards: fix holes in graph for lightly loaded clusters #4915
[ENHANCEMENT] Dashboards: allow configuring additional services for the Rollout Progress dashboard. #5007
[ENHANCEMENT] Alerts: do not fire MimirAllocatingTooMuchMemory alert for any matching container outside of namespaces where Mimir is running. #5089
[BUGFIX] Dashboards: show cancelled requests in a different color to successful requests in throughput panels on dashboards. #5039
[BUGFIX] Dashboards: fix dashboard panels that showed percentages with axes from 0 to 10000%. #5084

Jsonnet

[CHANGE] Ruler: changed ruler autoscaling policy, extended scale down period from 60s to 600s. #4786
[CHANGE] Update to v0.5.0 rollout-operator. #4893
[CHANGE] Backend: add alertmanager_args to mimir-backend when running in read-write deployment mode. Remove hardcoded filesystem alertmanager storage. This moves alertmanager's data-dir to /data/alertmanager by default. #4907 #4921
[CHANGE] Remove -pdb suffix from PodDisruptionBudget names. This will create new PodDisruptionBudget resources. Make sure to prune the old resources; otherwise, rollouts will be blocked. #5109
[CHANGE] Query-frontend: enable query sharding for cardinality estimation via -query-frontend.query-sharding-target-series-per-shard by default if the results cache is enabled. #5128
[ENHANCEMENT] Ingester: configure -blocks-storage.tsdb.head-compaction-interval=15m to spread TSDB head compaction over a wider time range. #4870
[ENHANCEMENT] Ingester: configure -blocks-storage.tsdb.wal-replay-concurrency to CPU request minus 1. #4864
[ENHANCEMENT] Compactor: configure -compactor.first-level-compaction-wait-period to TSDB head compaction interval plus 10 minutes. #4872
[ENHANCEMENT] Store-gateway: set GOMEMLIMIT to the memory request value. This should reduce the likelihood the store-gateway may go out of memory, at the cost of an higher CPU utilization due to more frequent garbage collections when the memory utilization gets closer or above the configured requested memory. #4971
[ENHANCEMENT] Store-gateway: dynamically set GOMAXPROCS based on the CPU request. This should reduce the likelihood a high load on the store-gateway will slow down the entire Kubernetes node. #5104
[ENHANCEMENT] Store-gateway: add store_gateway_lazy_loading_enabled configuration option which combines disabled lazy-loading and reducing blocks sync concurrency. Reducing blocks sync concurrency improves startup times with disabled lazy loading on HDDs. #5025
[ENHANCEMENT] Update rollout-operator image to v0.6.0. #5155
[BUGFIX] Backend: configure -ruler.alertmanager-url to mimir-backend when running in read-write deployment mode. #4892

Mimirtool

[CHANGE] check rules: will fail on duplicate rules when --strict is provided. #5035
[FEATURE] sync/diff can now include/exclude namespaces based on a regular expression using --namespaces-regex and --ignore-namespaces-regex. #5100
[ENHANCEMENT] analyze prometheus: allow to specify -prometheus-http-prefix. #4966
[ENHANCEMENT] analyze grafana: allow to specify --folder-title to limit dashboards analysis based on their exact folder title. #4973

Tools

[CHANGE] copyblocks: copying between Azure Blob Storage buckets is now supported in addition to copying between Google Cloud Storage buckets. As a result, the --service flag is now required to be specified (accepted values are gcs or abs). #4756