Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
AGPL-3.0 License
This release contains 531 PRs from 60 authors, including new contributors Benoit Schipper, Derek Cadzow, Edwin, Itay Kalfon, Ivan Farré Vicente, Jan O. Rundshagen, Jorge Turrado Ferrero, Lukas Monkevicius, Mickaël Canévet, Rafael Sathler, Rajakavitha Kodhandapani, Tim Kotowski, Vladimir Varankin, Zach, Zach Day, Zirko, blut, github-actions[bot], ncharaf, zhehao-grafana. Thank you!
Grafana Labs is excited to announce version 2.12 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bug fixes in this release.
For the complete list of changes, refer to the CHANGELOG.
Added support to count only series that are considered active through the Cardinality API endpoint `/api/v1/cardinality/label_names` by passing the `count_method` parameter. If set to `active`, it counts only series that are considered active according to the `-ingester.active-series-metrics-idle-timeout` flag setting, rather than counting all in-memory series.
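For illustration, here's a minimal Python sketch that builds the request URL for this endpoint. The base URL is a placeholder, and the `X-Scope-OrgID` header mentioned in the comment applies only to multi-tenant deployments:

```python
from urllib.parse import urlencode

def cardinality_label_names_url(base_url, count_method="active"):
    """Build the URL for the cardinality label-names endpoint.

    count_method="active" counts only series considered active per the
    -ingester.active-series-metrics-idle-timeout setting; omitting the
    parameter counts all in-memory series.
    """
    query = urlencode({"count_method": count_method})
    return f"{base_url}/api/v1/cardinality/label_names?{query}"

# Hypothetical base URL; send the request with any HTTP client, adding
# the X-Scope-OrgID header when multi-tenancy is enabled.
url = cardinality_label_names_url("http://mimir:8080/prometheus")
```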
The "Store-gateway: bucket tenant blocks" admin page contains a new column, "No Compact". If a block's no-compact marker is set, the column shows the reason and the date the marker was added.
The compactor now computes the estimated number of compaction jobs based on the current bucket-index. The result is tracked by the new `cortex_bucket_index_estimated_compaction_jobs` metric. If this computation fails, the `cortex_bucket_index_estimated_compaction_jobs_errors_total` metric is updated instead. The estimated number of compaction jobs is also shown in the Top tenants, Tenants, and Compactor dashboards.
Added a `mimir-distroless` container image built upon a distroless image (`gcr.io/distroless/static-debian12`). This improvement minimizes attack surface and potential CVEs by trimming down the dependencies within the image. After comprehensive testing, the Mimir maintainers plan to shift from the current image to the distroless version.
Additionally, the following previously experimental features are now considered stable:
- The number of pre-allocated workers used to forward push requests to the ingesters, configurable via the `-distributor.reusable-ingester-push-workers` CLI flag on distributors. It now defaults to `2000`. Note that this is a performance optimization, not a limiting feature: if not enough workers are available, new goroutines are spawned.
- The number of gRPC server workers used to serve requests, configurable via the `-server.grpc.num-workers` CLI flag. It now defaults to `100`. Note that this is the number of pre-allocated long-lived workers, not a limiting feature: if not enough workers are available, new goroutines are spawned.
- The maximum number of concurrent index-header loads across all tenants, configurable via the `-blocks-storage.bucket-store.index-header.lazy-loading-concurrency` CLI flag on store-gateways. It defaults to `4`.
- The maximum time to wait for the query-frontend to become ready before rejecting requests, configurable via the `-query-frontend.not-running-timeout` CLI flag on query-frontends. It now defaults to `2s`.
- The `-querier.minimize-ingester-requests` CLI flag, which allows queriers to reduce pressure on ingesters by initially querying only the minimum set of ingesters required to reach quorum. It is now enabled by default.
- The spread-minimizing token CLI flags: `-ingester.ring.token-generation-strategy`, `-ingester.ring.spread-minimizing-zones`, and `-ingester.ring.spread-minimizing-join-ring-in-order`. You can read more about this feature in our blog post.
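Because these features are now stable defaults, no flags are required to get the behavior described above. The following hypothetical invocation merely pins a few of them explicitly:

```
mimir -target=all \
  -distributor.reusable-ingester-push-workers=2000 \
  -server.grpc.num-workers=100 \
  -blocks-storage.bucket-store.index-header.lazy-loading-concurrency=4 \
  -query-frontend.not-running-timeout=2s \
  -querier.minimize-ingester-requests=true
```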
In Grafana Mimir 2.12 the following behavior has changed:
Store-gateway now persists a sparse version of the index-header to disk on construction and loads sparse index-headers from disk instead of the whole index-header.
This improves the speed at which index headers are lazy-loaded from disk by up to 90%. The added disk usage is in the order of 1-2%.
Alertmanager deprecated the `v1` API. All `v1` API endpoints now respond with a JSON deprecation notice and a status code of `410`. All endpoints have a `v2` equivalent. The list of endpoints is:

- `<alertmanager-web.external-url>/api/v1/alerts`
- `<alertmanager-web.external-url>/api/v1/receivers`
- `<alertmanager-web.external-url>/api/v1/silence/{id}`
- `<alertmanager-web.external-url>/api/v1/silences`
- `<alertmanager-web.external-url>/api/v1/status`
The exemplar label `traceID` has been changed to `trace_id` for consistency with the OpenTelemetry standard.
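Client-side code that reads exemplar labels may need a small shim during the transition. This hypothetical Python helper (not part of Mimir) renames the old label:

```python
def migrate_exemplar_labels(labels):
    # Rename the pre-2.12 exemplar label to the OpenTelemetry-style name,
    # leaving all other labels untouched.
    return {("trace_id" if k == "traceID" else k): v for k, v in labels.items()}

migrate_exemplar_labels({"traceID": "0af7651916cd43dd8448eb211c80319c"})
```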
Errors returned by ingesters now contain only gRPC status codes.
Previously they contained both gRPC and HTTP status codes.
{{< admonition type="warning" >}}
To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12. Otherwise, during the migration some ingester errors with HTTP status code `4xx` might not be recognized, and the corresponding requests will be repeated.
{{< /admonition >}}
Responses with gRPC status codes are now reported as `status_code` labels in the `cortex_request_duration_seconds` and `cortex_ingester_client_request_duration_seconds` metrics. Responses with HTTP 4xx status codes are now treated as errors and used in the `status_code` label of the request duration metric.
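Dashboards and alerts that group by this label now see gRPC codes alongside HTTP ones. For example, a request rate broken down by status code (the metric name is from this release; the window is illustrative):

```promql
sum by (status_code) (rate(cortex_request_duration_seconds_count[5m]))
```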
The default values of the following CLI flags have been changed:

- `-blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes` from `10MB` to `100MB`.
- `-blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes` from `10MB` to `100MB`.
- `-blocks-storage.bucket-store.tenant-sync-concurrency` from `10` to `1`.
- `-query-frontend.max-cache-freshness` from `1m` to `10m`.
- `-distributor.write-requests-buffer-pooling-enabled` from `false` to `true`.
- `-blocks-storage.bucket-store.block-sync-concurrency` from `20` to `4`.
- `-memberlist.stream-timeout` from `10s` to `2s`.
- `-server.report-grpc-codes-in-instrumentation-label-enabled` from `false` to `true`.

The following deprecated configuration options are removed in Grafana Mimir 2.12:

- `frontend.cache_unaligned_requests`
- `-querier.prefer-streaming-chunks-from-ingesters`

The following configuration options are deprecated and will be removed in Grafana Mimir 2.14:
- The CLI flag `-ingester.limit-inflight-requests-using-grpc-method-limiter`. It now defaults to `true`.
- The CLI flag `-ingester.return-only-grpc-errors`. It now defaults to `true`.
{{< admonition type="warning" >}}
To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12. Otherwise, during the migration some ingester errors with HTTP status code `4xx` might not be recognized, and the corresponding requests will be repeated.
{{< /admonition >}}
- The CLI flag `-ingester.client.report-grpc-codes-in-instrumentation-label-enabled`. It now defaults to `true`.
- The CLI flag `-distributor.limit-inflight-requests-using-grpc-method-limiter`. It now defaults to `true`.
- The CLI flag `-distributor.enable-otlp-metadata-storage`. It now defaults to `true`.
- The CLI flag `-querier.max-query-into-future`.
The following metrics are removed or deprecated:

- `cortex_bucket_store_blocks_loaded_by_duration` has been removed.
- `cortex_distributor_sample_delay_seconds` has been deprecated and will be removed in Mimir 2.14.

Grafana Mimir 2.12 includes new features that are considered experimental and disabled by default.
Use them with caution and report any issues you encounter:
- The maximum number of tenant IDs allowed for a federated query can be configured via the `-tenant-federation.max-tenants` CLI flag on query-frontends. By default, it's `0`, meaning that the limit is disabled.
- Sharding of active series queries can be enabled via the `-query-frontend.shard-active-series-queries` CLI flag on query-frontends.
- Timely head compaction can be enabled via the `-blocks-storage.tsdb.timely-head-compaction-enabled` flag on ingesters. If enabled, head compaction happens when the min block range can no longer be appended, without requiring 1.5x the chunk range worth of data in the head.
- Streaming of responses from querier to query-frontend can be enabled via the `-querier.response-streaming-enabled` CLI flag on queriers. This is currently supported only for responses from the `/api/v1/cardinality/active_series` endpoint.
- The maximum response size for active series queries, in bytes, can be set via the `-querier.active-series-results-max-size-bytes` CLI flag on queriers.
- Metric relabeling on a per-tenant basis can be forcefully disabled via the `-distributor.metric-relabeling-enabled` CLI flag on rulers. Metric relabeling is enabled by default.
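Federated queries address multiple tenants by joining their IDs with `|` in the `X-Scope-OrgID` header; `-tenant-federation.max-tenants` caps how many IDs one query may carry. This small helper (hypothetical, not part of Mimir) builds that header value:

```python
def federated_org_id(tenant_ids):
    # Tenant federation accepts multiple tenant IDs joined by "|" in the
    # X-Scope-OrgID request header; -tenant-federation.max-tenants limits
    # how many IDs a single query may carry (0 = no limit).
    return "|".join(tenant_ids)

federated_org_id(["team-a", "team-b"])  # "team-a|team-b"
```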
Query queue load balancing by query component. Tenant query queues in the query-scheduler can now be split into subqueues by which query component is expected to be utilized to complete the query: ingesters, store-gateways, both, or uncategorized. Dequeuing queries for a given tenant rotates through the query component subqueues via simple round-robin. In the event that one of the query components (ingesters or store-gateways) experiences a slowdown, queries utilizing only the other query component can continue to be serviced. Enabling this feature is recommended. The following CLI flags must be set to `true` for it to take effect:

- `-query-frontend.additional-query-queue-dimensions-enabled` on the query-frontend.
- `-query-scheduler.additional-query-queue-dimensions-enabled` on the query-scheduler.

Owned series tracking in ingesters can be enabled via the `-ingester.track-ingester-owned-series` CLI flag. When enabled, ingesters track the number of in-memory series that still map to the ingester based on the ring state. These counts are more reactive to ring and shard changes than in-memory series counts, and can be used when enforcing tenant series limits by enabling the `-ingester.use-ingester-owned-series-for-limits` CLI flag. This feature requires zone-aware replication to be enabled, and the replication factor to be equal to the number of zones.
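The subqueue rotation described above can be sketched as follows. This is an illustrative model of the round-robin behavior, not Mimir's actual implementation:

```python
from collections import deque
from itertools import cycle

class TenantQueryQueue:
    """Per-tenant queue split into subqueues by expected query component."""

    COMPONENTS = ("ingester", "store-gateway", "both", "uncategorized")

    def __init__(self):
        self.subqueues = {c: deque() for c in self.COMPONENTS}
        self._rotation = cycle(self.COMPONENTS)

    def enqueue(self, query, component):
        self.subqueues[component].append(query)

    def dequeue(self):
        # Simple round-robin over components, skipping empty subqueues;
        # a slowdown backing up one subqueue leaves the others drainable.
        for _ in range(len(self.COMPONENTS)):
            component = next(self._rotation)
            if self.subqueues[component]:
                return self.subqueues[component].popleft()
        return None
```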
The following bugs have been fixed:

- Setting `-distributor.metric-relabeling-enabled` could cause distributors to panic.
- Setting `-distributor.metric-relabeling-enabled` could cause distributors to write unsorted labels and corrupt blocks.
- `-querier.max-fetched-series-per-query` wasn't applied to the `/series` endpoint in case series were loaded from ingesters.
- … all errors were translated to HTTP `400`, while when returning chunks all internal errors were translated to HTTP `500`. Now, client errors are translated into HTTP `400` errors, while all other errors will be translated into HTTP `500` errors.
- The `cortex_query_frontend_queries_total` metric incorrectly reported `op="query"` for any request which wasn't a range query. Now the `op` label value can be one of the following:
  - `query`: instant query
  - `query_range`: range query
  - `cardinality`: cardinality query
  - `label_names_and_values`: label names / values query
  - `active_series`: active series query
  - `other`: any other request
- … `cortex_ruler_write_requests_failed_total` metric.

The Grafana Mimir and Grafana Enterprise Metrics Helm charts are released independently.
Refer to the Grafana Mimir Helm chart documentation.
- Alertmanager: deprecated the `v1` API. All `v1` API endpoints now respond with a JSON deprecation notice and a status code of `410`. All endpoints have a `v2` equivalent. The list of endpoints is: #7103
  - `<alertmanager-web.external-url>/api/v1/alerts`
  - `<alertmanager-web.external-url>/api/v1/receivers`
  - `<alertmanager-web.external-url>/api/v1/silence/{id}`
  - `<alertmanager-web.external-url>/api/v1/silences`
  - `<alertmanager-web.external-url>/api/v1/status`
- The default values of `-blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes` and `-blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes` changed to 100 MiB (previous default value was 10 MiB). #6764
- … `|` characters. #6959
- … `4xx` errors. #7004
- … used in the `status_code` label of request duration metric. #7045
- `-memberlist.stream-timeout` changed from `10s` to `2s`. #7076
- Removed the `thanos_cache_memcached_*` and `thanos_memcached_*` prefixed metrics. Instead, Memcached and Redis cache clients now emit `thanos_cache_*` prefixed metrics with a `backend` label. #7076
- The following metrics have been renamed:
  - `prometheus_sd_failed_configs` renamed to `cortex_prometheus_sd_failed_configs`
  - `prometheus_sd_discovered_targets` renamed to `cortex_prometheus_sd_discovered_targets`
  - `prometheus_sd_received_updates_total` renamed to `cortex_prometheus_sd_received_updates_total`
  - `prometheus_sd_updates_delayed_total` renamed to `cortex_prometheus_sd_updates_delayed_total`
  - `prometheus_sd_updates_total` renamed to `cortex_prometheus_sd_updates_total`
  - `prometheus_sd_refresh_failures_total` renamed to `cortex_prometheus_sd_refresh_failures_total`
  - `prometheus_sd_refresh_duration_seconds` renamed to `cortex_prometheus_sd_refresh_duration_seconds`
- The default value of `-query-frontend.not-running-timeout` has been changed from 0 (disabled) to 2s. The configuration option has also been moved from "experimental" to "advanced". #7127
- The default value for `blocks-storage.bucket-store.tenant-sync-concurrency` has been changed from `10` to `1` and the default value for `blocks-storage.bucket-store.block-sync-concurrency` has been changed from `20` to `4`. #7136
- Removed `-blocks-storage.bucket-store.index-header-lazy-loading-enabled` and `-blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout` and their corresponding YAML settings. Instead, use `-blocks-storage.bucket-store.index-header.lazy-loading-enabled` and `-blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout`. #7521
- Marked `-blocks-storage.bucket-store.index-header.lazy-loading-concurrency` and its corresponding YAML settings as advanced. #7521
- Removed `-blocks-storage.bucket-store.index-header.sparse-persistence-enabled` since this is now the default behavior. #7535
- Set `-server.report-grpc-codes-in-instrumentation-label-enabled` to `true` by default, which enables reporting gRPC status codes as `status_code` labels in the `cortex_request_duration_seconds` metric. #7144
- gRPC status codes are now reported as `status_code` labels in the `cortex_ingester_client_request_duration_seconds` metric by default. #7144
- `-ingester.client.report-grpc-codes-in-instrumentation-label-enabled` has been deprecated, and its default value is set to `true`. #7144
- `-ingester.return-only-grpc-errors` has been deprecated, and its default value is set to `true`. To ensure backwards compatibility, during a migration from a version prior to 2.11.0 to 2.12 or later, `-ingester.return-only-grpc-errors` should be set to `false`. Once all the components are migrated, the flag can be removed. #7151
- The following spread-minimizing token flags are now stable:
  - `-ingester.ring.token-generation-strategy`
  - `-ingester.ring.spread-minimizing-zones`
  - `-ingester.ring.spread-minimizing-join-ring-in-order`
- The default value of `-query-frontend.max-cache-freshness` (and its respective YAML configuration parameter) has been changed from `1m` to `10m`. #7161
- Changed the default value of `-distributor.write-requests-buffer-pooling-enabled` to `true`. #7165
- The default value of `-ingester.client.circuit-breaker.cooldown-period` has been changed from `1m` to `10s`. #7310
- Removed the metric `cortex_bucket_store_blocks_loaded_by_duration`. `cortex_bucket_store_series_blocks_queried` is better suited for detecting when compactors are not able to keep up with the number of blocks to compact. #7309
- The feature controlled by `-ingester.limit-inflight-requests-using-grpc-method-limiter` and `-distributor.limit-inflight-requests-using-grpc-method-limiter` is now stable and enabled by default. The configuration options have been deprecated and will be removed in Mimir 2.14. #7360
- Changed the `-distributor.enable-otlp-metadata-storage` flag's default to `true`, and deprecated it. The flag will be removed in Mimir 2.14. #7366
- `-querier.max-query-into-future` has been deprecated and will be removed in Mimir 2.14. #7496
- `cortex_distributor_sample_delay_seconds` has been deprecated and will be removed in Mimir 2.14. #7516
- `frontend.cache_unaligned_requests` has been moved to `limits.cache_unaligned_requests`. #7519
- `-querier.minimize-ingester-requests` has been moved from "experimental" to "advanced". #7638
- Added the `-server.log-source-ips-full` option to log all IPs from `Forwarded`, `X-Real-IP`, `X-Forwarded-For` headers. #7250
- Added the `-tenant-federation.max-tenants` option to limit the max number of tenants allowed for requests when federation is enabled. #6959
- Added the `count_method` parameter which enables counting active label names. #7085
- Added the `-querier.promql-experimental-functions-enabled` CLI flag (and respective YAML config option) to enable experimental PromQL functions. The experimental functions introduced are: `mad_over_time()`, `sort_by_label()` and `sort_by_label_desc()`. #7057
- Added the `-alertmanager.grafana-alertmanager-compatibility-enabled` CLI flag (and respective YAML config option) to enable experimental API endpoints that support the migration of the Grafana Alertmanager. #7057
- Added `-alertmanager.utf8-strict-mode-enabled` to control support for any UTF-8 character as part of Alertmanager configuration/API matchers and labels. Its default value is set to `false`. #6898
- Added `histogram_avg()` function support to PromQL. #7293
- Added the `-blocks-storage.tsdb.timely-head-compaction` flag, which enables more timely head compaction, and defaults to `false`. #7372
- Added the `/compactor/tenants` and `/compactor/tenant/{tenant}/planned_jobs` endpoints that provide functionality that was provided by `tools/compaction-planner` -- listing of planned compaction jobs based on tenants' bucket index. #7381
- Added experimental support for streaming responses from querier to query-frontend via `-querier.response-streaming-enabled`. This is currently only supported for the `/api/v1/cardinality/active_series` endpoint. #7173
- Added support for the `{"metric_name", "l1"="val"}` syntax to PromQL and some of the exposition formats. #7475 #7541
- Added `cortex_distributor_otlp_requests_total` to track the total number of OTLP requests. #7385
- Added a gauge (`cortex_vault_token_lease_renewal_active`) to check whether token renewal is active, and the counters `cortex_vault_token_lease_renewal_success_total` and `cortex_vault_auth_success_total` to see the total number of successful lease renewals / authentications. #7337
- Added the metric `cortex_ruler_queries_zero_fetched_series_total`. #6544
- … `/config/api/v1/rules/{namespace}/{groupName}` configuration API endpoint. #6632
- Added the `query-frontend.additional-query-queue-dimensions-enabled` and `query-scheduler.additional-query-queue-dimensions-enabled` flags. #6772
- Metric relabeling can be disabled via `-distributor.metric-relabeling-enabled` or associated YAML. #6970
- `-distributor.remote-timeout` is now accounted from the first ingester push request being sent. #6972
- `-<prefix>.s3.sts-endpoint` sets a custom endpoint for AWS Security Token Service (AWS STS) in the S3 storage provider. #6172
- Added the `cortex_querier_queries_storage_type_total` metric that indicates how many queries have executed for a source, ingesters or store-gateways. Added the `cortex_querier_query_storegateway_chunks_total` metric to count the number of chunks fetched from a store-gateway. #7099 #7145
- Added experimental support for sharding active series queries via `-query-frontend.shard-active-series-queries`. #6784
- Set `-distributor.reusable-ingester-push-workers=2000` by default and marked the feature as `advanced`. #7128
- Set `-server.grpc.num-workers=100` by default and marked the feature as `advanced`. #7131
- Added the labels `source`, `level`, and `out_of_order` to the `cortex_bucket_store_series_blocks_queried` metric that indicates the number of blocks that were queried from store-gateways by block metadata. #7112 #7262 #7267
- Added the `cortex_bucket_index_estimated_compaction_jobs` metric. If computation of jobs fails, `cortex_bucket_index_estimated_compaction_jobs_errors_total` is updated instead. #7299
- Added the metric `cortex_alertmanager_notifications_suppressed_total` that counts the total number of notifications suppressed for being silenced, inhibited, outside of active time intervals or within muted time intervals. #7384
- … `cortex_query_scheduler_queue_duration_seconds` histogram metric, in order to better track queries staying in the queue for longer than 10s. #7470
- A `type` label is added to the `prometheus_tsdb_head_out_of_order_samples_appended_total` metric. #7475
- Enforcing series limits based on owned series (`-ingester.use-ingester-owned-series-for-limits`) now prevents discards in cases where a tenant is sharded across all ingesters (or shuffle sharding is disabled) and the ingester count increases. #7411
- Added `-query-frontend.active-series-write-timeout` to allow configuring the server-side write timeout for active series requests. #7553 #7569
- Fixed issue where `-querier.max-fetched-series-per-query` was not applied to the `/series` endpoint if the series are loaded from ingesters. #7055
- Fixed issue where `-distributor.metric-relabeling-enabled` may cause distributors to panic. #7176
- Fixed issue where `-distributor.metric-relabeling-enabled` may cause distributors to write unsorted labels and corrupt blocks. #7326
- The `cortex_query_frontend_queries_total` metric incorrectly reported `op="query"` for any request which wasn't a range query. Now the `op` label value can be one of the following: #7207
  - `query`: instant query
  - `query_range`: range query
  - `cardinality`: cardinality query
  - `label_names_and_values`: label names / values query
  - `active_series`: active series query
  - `other`: any other request
- Updated `google.golang.org/grpc` to resolve occasional issues with the gRPC server closing its side of a connection before it was drained by the client. #7380
- … `active_series` requests when the request context is canceled. #7378
- … `cortex_ruler_write_requests_failed_total` metric. #7472
- The `job` label matchers for distributor and gateway have been extended to include any deployment matching `distributor.*` and `cortex-gw.*` respectively. This change allows matching custom and multi-zone distributor and gateway deployments too. #6817
- … `cortex_request_duration_seconds`. #7528
- Removed the `step` parameter from targets as it is not supported. #7157
- … `cortex_memcache_request_duration_seconds` and `cortex_cache_request_duration_seconds`. #7514
- Increased `JAEGER_REPORTER_MAX_QUEUE_SIZE` from the default (100) to 1000, to avoid dropping tracing spans. #7259
- Increased `JAEGER_REPORTER_MAX_QUEUE_SIZE` from 1000 to 5000, to avoid dropping tracing spans. #6764
- Increased `JAEGER_REPORTER_MAX_QUEUE_SIZE` from the default (100) to 1000, to avoid dropping tracing spans. #7068
- Increased `JAEGER_REPORTER_MAX_QUEUE_SIZE` from the default (100), to avoid dropping tracing spans. #7086
- The following ring heartbeat settings are configured:
  - `-distributor.ring.heartbeat-period` set to `1m`
  - `-distributor.ring.heartbeat-timeout` set to `4m`
  - `-ingester.ring.heartbeat-period` set to `2m`
  - `-store-gateway.sharding-ring.heartbeat-period` set to `1m`
  - `-store-gateway.sharding-ring.heartbeat-timeout` set to `4m`
  - `-compactor.ring.heartbeat-period` set to `1m`
  - `-compactor.ring.heartbeat-timeout` set to `4m`
- … `ruler_querier_topology_spread_max_skew` instead of `querier_topology_spread_max_skew`. #7204
- `-server.grpc.keepalive.max-connection-age` lowered from `2m` to `60s` and configured `-shutdown-delay=90s` and termination grace period to `100` seconds in order to reduce the chances of failed gRPC write requests when distributors gracefully shut down. #7361
- Added the following node affinity matcher configuration options:
  - `alertmanager_node_affinity_matchers`
  - `compactor_node_affinity_matchers`
  - `continuous_test_node_affinity_matchers`
  - `distributor_node_affinity_matchers`
  - `ingester_node_affinity_matchers`
  - `ingester_zone_a_node_affinity_matchers`
  - `ingester_zone_b_node_affinity_matchers`
  - `ingester_zone_c_node_affinity_matchers`
  - `mimir_backend_node_affinity_matchers`
  - `mimir_backend_zone_a_node_affinity_matchers`
  - `mimir_backend_zone_b_node_affinity_matchers`
  - `mimir_backend_zone_c_node_affinity_matchers`
  - `mimir_read_node_affinity_matchers`
  - `mimir_write_node_affinity_matchers`
  - `mimir_write_zone_a_node_affinity_matchers`
  - `mimir_write_zone_b_node_affinity_matchers`
  - `mimir_write_zone_c_node_affinity_matchers`
  - `overrides_exporter_node_affinity_matchers`
  - `querier_node_affinity_matchers`
  - `query_frontend_node_affinity_matchers`
  - `query_scheduler_node_affinity_matchers`
  - `rollout_operator_node_affinity_matchers`
  - `ruler_node_affinity_matchers`
  - `ruler_querier_node_affinity_matchers`
  - `ruler_query_frontend_node_affinity_matchers`
  - `ruler_query_scheduler_node_affinity_matchers`
  - `store_gateway_node_affinity_matchers`
  - `store_gateway_zone_a_node_affinity_matchers`
  - `store_gateway_zone_b_node_affinity_matchers`
  - `store_gateway_zone_c_node_affinity_matchers`
- Added the `ingester_automated_downscale_enabled` flag. It is disabled by default. #6850
- Added the `MimirStoreGatewayTooManyFailedOperations` warning alert that triggers when the Mimir store-gateway reports errors when interacting with the object storage. #6831
- Configured `-shutdown-delay`, `-server.grpc.keepalive.max-connection-age` and termination grace period to reduce the likelihood of queries hitting terminated query-frontends. #7129
- … `ignoreNullValues` option for the Prometheus scaler. #7471
- Added `migrate-utf8` to migrate Alertmanager configurations for Alertmanager versions 0.27.0 and later. #7383
- Added the `--extra-headers` option to the `mimirtool rules` command to add extra headers to requests for auth. #7141
- Added `--output-dir` to `mimirtool alertmanager get`, where the config and templates will be written to and can be loaded via `mimirtool alertmanager load`. #6760
- Fixed issue where the `Host` HTTP header was not being correctly changed for the proxy targets. #7386
- … `__REQUEST_HEADER_X_SCOPE_ORGID__`. #7452
- … `KubePersistentVolumeFillingUp` alert. #7297

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.11.0...mimir-2.12.0
- … `distributor.ingestion-burst-factor` by @treid314 in https://github.com/grafana/mimir/pull/6662
- … `/active_series` endpoint by @flxbk in https://github.com/grafana/mimir/pull/6717
- … `MimirStoreGatewayTooManyFailedOperations` alert by @wilfriedroset in https://github.com/grafana/mimir/pull/6831
- … `Distributor.push()` by @colega in https://github.com/grafana/mimir/pull/6978
- … `backend: s3` when minio is disabled by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/6999
- … `active_series` requests by @flxbk in https://github.com/grafana/mimir/pull/6784
- … `time` param by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7026
- … `context.WithCancelCause` in non-test code by @charleskorn in https://github.com/grafana/mimir/pull/6921
- … `context.Canceled` in active series requests by @flxbk in https://github.com/grafana/mimir/pull/7102
- … `/active_series` by @flxbk in https://github.com/grafana/mimir/pull/7106
- … `tools/copyblocks` to add `undelete-blocks` and `copyprefix` by @andyasp in https://github.com/grafana/mimir/pull/6607
- … `X-Read-Consistency` HTTP header by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7091
- … `make docs` procedure and add workflow to keep it up to date by @jdbaldry in https://github.com/grafana/mimir/pull/5794
- … `DeepCopyTimeseries` extra option to make a deep copy hi… by @ortuman in https://github.com/grafana/mimir/pull/7130
- … `/active_series`: generate correct request shards for incoming `GET` requests, handle gRPC errors by @flxbk in https://github.com/grafana/mimir/pull/7133
- … `make docs` procedure by @github-actions in https://github.com/grafana/mimir/pull/7167
- … `make docs` procedure by @github-actions in https://github.com/grafana/mimir/pull/7197
- … `backend: s3` when minio is disabled" by @dimitarvdimitrov in https://github.com/grafana/mimir/pull/7199
- … `backend: s3` when minio is disabled" by @grafanabot in https://github.com/grafana/mimir/pull/7201
- … `shardActiveSeriesMiddleware` performance when merging responses by @ortuman in https://github.com/grafana/mimir/pull/7261
- … `-distributor.enable-otlp-metadata-storage` flag default to true, and deprecate by @aknuds1 in https://github.com/grafana/mimir/pull/7366
- … `kedaAutoscaling` section by @beatkind in https://github.com/grafana/mimir/pull/7392
- … `index_header_lazy_loading_enabled` docker-compose config by @flxbk in https://github.com/grafana/mimir/pull/7551
- … `/active_series` requests by @flxbk in https://github.com/grafana/mimir/pull/7553
- … `/active_series`: cancel request context when write deadline is reached by @flxbk in https://github.com/grafana/mimir/pull/7569
Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.11.0...mimir-2.12.0
Published by duricanikolic 7 months ago
This release contains 525 PRs from 60 authors, including new contributors Benoit Schipper, Derek Cadzow, Edwin, Itay Kalfon, Ivan Farré Vicente, Jan O. Rundshagen, Jorge Turrado Ferrero, Lukas Monkevicius, Mickaël Canévet, Rafael Sathler, Rajakavitha Kodhandapani, Tim Kotowski, Vladimir Varankin, Zach, Zach Day, Zirko, blut, github-actions[bot], ncharaf, zhehao-grafana. Thank you!
Grafana Labs is excited to announce version 2.12 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bug fixes in this release.
For the complete list of changes, refer to the CHANGELOG.
Added support to only count series that are considered active through the Cardinality API endpoint /api/v1/cardinality/label_names
by passing the count_method
parameter.
If set to active
it counts only series that are considered active according to the -ingester.active-series-metrics-idle-timeout
flag setting rather than counting all in-memory series.
The "Store-gateway: bucket tenant blocks" admin page contains a new column "No Compact".
If block no compaction marker is set, it specifies the reason and the date the marker is added.
The estimated number of compaction jobs based on the current bucket-index is now computed by the compactor.
The result is tracked by the new cortex_bucket_index_compaction_jobs
metric.
If this computation fails, the cortex_bucket_index_compaction_jobs_errors_total
metric is updated instead.
The estimated number of compaction jobs is also shown in Top tenants, Tenants, and Compactor dashboards.
Added mimir-distroless
container image built upon a distroless
image (gcr.io/distroless/static-debian12
).
This improvement minimizes attack surfaces and potential CVEs by trimming down the dependencies within the image.
After comprehensive testing, the Mimir maintainers plan to shift from the current image to the distroless version.
Additionally, the following previously experimental features are now considered stable:
The number of pre-allocated workers used to forward push requests to the ingesters, configurable via the -distributor.reusable-ingester-push-workers
CLI flag on distributors.
It now defaults to 2000
.
Note that this is a performance optimization, and not a limiting feature.
If not enough workers available, new goroutines will be spawned.
The number of gRPC server workers used to serve the requests, configurable via the -server.grpc.num-workers
CLI flag.
It now defaults to 100
.
Note that this is the number of pre-allocated long-lived workers, and not a limiting feature.
If not enough workers are available, new goroutines will be spawned.
The maximum number of concurrent index header loads across all tenants, configurable via the -blocks-storage.bucket-store.index-header.lazy-loading-concurrency
CLI flag on store-gateways.
It defaults to 4
.
The maximum time to wait for the query-frontend to become ready before rejecting requests, configurable via the -query-frontend.not-running-timeout
CLI flag on query-frontends.
It now defaults to 2s
.
The CLI flag that allows queriers to reduce pressure on ingesters by initially querying only the minimum set of ingesters required to reach quorum, -querier.minimize-ingester-requests
.
It is now enabled by default.
Spread-minimizing token-related CLI flags: -ingester.ring.token-generation-strategy
, -ingester.ring.spread-minimizing-zones
and -ingester.ring.spread-minimizing-join-ring-in-order
.
You can read more about this feature in our blog post.
In Grafana Mimir 2.12 the following behavior has changed:
Store-gateway now persists a sparse version of the index-header to disk on construction and loads sparse index-headers from disk instead of the whole index-header.
This improves the speed at which index headers are lazy-loaded from disk by up to 90%. The added disk usage is in the order of 1-2%.
Alertmanager deprecated the v1
API. All v1
API endpoints now respond with a JSON deprecation notice and a status code of 410
.
All endpoints have a v2
equivalent.
The list of endpoints is:
<alertmanager-web.external-url>/api/v1/alerts
<alertmanager-web.external-url>/api/v1/receivers
<alertmanager-web.external-url>/api/v1/silence/{id}
<alertmanager-web.external-url>/api/v1/silences
<alertmanager-web.external-url>/api/v1/status
Exemplar's label traceID
has been changed to trace_id
to be consistent with the OpenTelemetry standard.
Errors returned by ingesters now contain only gRPC status codes.
Previously they contained both gRPC and HTTP status codes.
To guarantee backwards compatibility when migrating from a version prior to 2.11
, it's necessary to first migrate to version 2.11
, and then to version 2.12
.
Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx
won't be recognized, and the corresponding request will be repeated.
Responses with gRPC status codes are now reported as status_code
labels in the cortex_request_duration_seconds
and cortex_ingester_client_request_duration_seconds
metrics.
Responses with HTTP 4xx status codes are now treated as errors and reported in the status_code
label of the request duration metrics.
The default values of the following CLI flags have been changed:

- -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
- -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
- -blocks-storage.bucket-store.tenant-sync-concurrency from 10 to 1.
- -query-frontend.max-cache-freshness from 1m to 10m.
- -distributor.write-requests-buffer-pooling-enabled from false to true.
- -blocks-storage.bucket-store.block-sync-concurrency from 20 to 4.
- -memberlist.stream-timeout from 10s to 2s.
- -server.report-grpc-codes-in-instrumentation-label-enabled from false to true.

The following deprecated configuration options are removed in Grafana Mimir 2.12:

- frontend.cache_unaligned_requests.
- -querier.prefer-streaming-chunks-from-ingesters.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.14:

- The CLI flag -ingester.limit-inflight-requests-using-grpc-method-limiter. It now defaults to true.
- The CLI flag -ingester.return-only-grpc-errors. It now defaults to true. To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12. Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.
- The CLI flag -ingester.client.report-grpc-codes-in-instrumentation-label-enabled. It now defaults to true.
- The CLI flag -distributor.limit-inflight-requests-using-grpc-method-limiter. It now defaults to true.
- The CLI flag -distributor.enable-otlp-metadata-storage. It now defaults to true.
- The CLI flag -querier.max-query-into-future.

The following metrics are removed or deprecated:

- cortex_bucket_store_blocks_loaded_by_duration has been removed.
- cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14.

Grafana Mimir 2.12 includes new features that are considered experimental and disabled by default.
Use them with caution and report any issues you encounter:
- The maximum number of tenant IDs that may be used in a federated query can be configured via the -tenant-federation.max-tenants CLI flag on query-frontends. By default, it's 0, meaning that the limit is disabled.
- Sharding of active series queries can be enabled via the -query-frontend.shard-active-series-queries CLI flag on query-frontends.
- Timely head compaction can be enabled via the -blocks-storage.tsdb.timely-head-compaction-enabled CLI flag on ingesters. If enabled, the head compaction happens when the min block range can no longer be appended, without requiring 1.5x the chunk range worth of data in the head.
- Streaming of responses from querier to query-frontend can be enabled via the -querier.response-streaming-enabled CLI flag on queriers. This is currently supported only for responses from the /api/v1/cardinality/active_series endpoint.
- The maximum response size for active series queries, in bytes, can be set via the -querier.active-series-results-max-size-bytes CLI flag on queriers.
- Metric relabeling on a per-tenant basis can be forcefully disabled via the -distributor.metric-relabeling-enabled CLI flag on distributors. Metric relabeling is enabled by default.
- Query queue load balancing by query component. Tenant query queues in the query-scheduler can now be split into subqueues by which query component is expected to be utilized to complete the query: ingesters, store-gateways, both, or uncategorized. Dequeuing queries for a given tenant rotates through the query component subqueues via simple round-robin. If one of the query components (ingesters or store-gateways) experiences a slowdown, queries utilizing only the other query component can continue to be serviced. We recommend enabling this feature. The following CLI flags must be set to true for it to take effect:
  - -query-frontend.additional-query-queue-dimensions-enabled on the query-frontend.
  - -query-scheduler.additional-query-queue-dimensions-enabled on the query-scheduler.
- Owned series tracking in ingesters can be enabled via the -ingester.track-ingester-owned-series CLI flag. When enabled, ingesters track the number of in-memory series that still map to the ingester based on the ring state. These counts are more reactive to ring and shard changes than in-memory series counts, and can be used when enforcing tenant series limits by enabling the -ingester.use-ingester-owned-series-for-limits CLI flag. This feature requires zone-aware replication to be enabled, and the replication factor to be equal to the number of zones.
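The round-robin rotation over per-component subqueues described above can be sketched as follows. The subqueue names and class are illustrative only, not Mimir's internal query-scheduler implementation:

```python
from collections import deque

# Sketch: a tenant's queue split into per-component subqueues, dequeued
# via simple round-robin so that a slowdown in one component (e.g.
# store-gateways) cannot starve queries that only touch the other.
class TenantQueue:
    COMPONENTS = ["ingester", "store-gateway", "both", "uncategorized"]

    def __init__(self):
        self.subqueues = {c: deque() for c in self.COMPONENTS}
        self._next = 0

    def enqueue(self, component: str, query: str) -> None:
        self.subqueues[component].append(query)

    def dequeue(self):
        # Rotate through subqueues, skipping the empty ones.
        for _ in range(len(self.COMPONENTS)):
            component = self.COMPONENTS[self._next]
            self._next = (self._next + 1) % len(self.COMPONENTS)
            if self.subqueues[component]:
                return self.subqueues[component].popleft()
        return None
```

With two ingester-bound queries and one store-gateway-bound query enqueued, dequeuing alternates between the two subqueues instead of draining the ingester subqueue first.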
Bug fixes:

- -distributor.metric-relabeling-enabled could cause distributors to panic.
- -distributor.metric-relabeling-enabled could cause distributors to write unsorted labels and corrupt blocks.
- -querier.max-fetched-series-per-query wasn't applied to the /series endpoint in case series were loaded from ingesters.
- Certain errors were translated to HTTP 400, while when returning chunks all internal errors were translated to HTTP 500. Now, some classes of errors are translated into HTTP 400 errors, while all other errors will be translated into HTTP 500 errors.
- The cortex_query_frontend_queries_total metric incorrectly reported op="query" for any request which wasn't a range query. The op label value can now be one of the following:
  - query: instant query
  - query_range: range query
  - cardinality: cardinality query
  - label_names_and_values: label names / values query
  - active_series: active series query
  - other: any other request
- A fix affecting the cortex_ruler_write_requests_failed_total metric.

The Grafana Mimir and Grafana Enterprise Metrics Helm charts are released independently.
Refer to the Grafana Mimir Helm chart documentation.
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.12.0-rc.0...mimir-2.12.0-rc.1
Published by duricanikolic 7 months ago
This release contains 525 PRs from 60 authors, including new contributors Benoit Schipper, Derek Cadzow, Edwin, Itay Kalfon, Ivan Farré Vicente, Jan O. Rundshagen, Jorge Turrado Ferrero, Lukas Monkevicius, Mickaël Canévet, Rafael Sathler, Rajakavitha Kodhandapani, Tim Kotowski, Vladimir Varankin, Zach, Zach Day, Zirko, blut, github-actions[bot], ncharaf, zhehao-grafana. Thank you!
Grafana Labs is excited to announce version 2.12 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bug fixes in this release.
For the complete list of changes, refer to the CHANGELOG.
Added support to only count series that are considered active through the Cardinality API endpoint /api/v1/cardinality/label_names
by passing the count_method
parameter.
If set to active, it counts only series that are considered active according to the -ingester.active-series-metrics-idle-timeout flag setting, rather than counting all in-memory series.
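For example, such a request can be built as follows. The base URL is an assumption for the example; only the endpoint path and the count_method parameter come from the release notes, and some deployments expose the Prometheus API under an additional prefix such as /prometheus:

```python
from urllib.parse import urlencode

# Sketch: build a request URL for the cardinality API that counts only
# active series. The base URL is illustrative; the endpoint path and
# count_method parameter are taken from the release notes.
def cardinality_url(base: str, count_method: str = "active") -> str:
    query = urlencode({"count_method": count_method})
    return f"{base}/api/v1/cardinality/label_names?{query}"

url = cardinality_url("http://mimir:8080")
# Issue it with any HTTP client, passing the tenant ID header,
# e.g. requests.get(url, headers={"X-Scope-OrgID": "tenant-1"}).
```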
The "Store-gateway: bucket tenant blocks" admin page contains a new column "No Compact".
If a block's no-compact marker is set, the column shows the reason and the date the marker was added.
The estimated number of compaction jobs based on the current bucket-index is now computed by the compactor.
The result is tracked by the new cortex_bucket_index_compaction_jobs
metric.
If this computation fails, the cortex_bucket_index_compaction_jobs_errors_total
metric is updated instead.
The estimated number of compaction jobs is also shown in Top tenants, Tenants, and Compactor dashboards.
Added mimir-distroless
container image built upon a distroless
image (gcr.io/distroless/static-debian12
).
This improvement minimizes attack surfaces and potential CVEs by trimming down the dependencies within the image.
After comprehensive testing, the Mimir maintainers plan to shift from the current image to the distroless version.
Additionally, the following previously experimental features are now considered stable:
- The number of pre-allocated workers used to forward push requests to the ingesters, configurable via the -distributor.reusable-ingester-push-workers CLI flag on distributors. It now defaults to 2000. Note that this is a performance optimization, not a limiting feature. If not enough workers are available, new goroutines will be spawned.
- The number of gRPC server workers used to serve the requests, configurable via the -server.grpc.num-workers CLI flag. It now defaults to 100. Note that this is the number of pre-allocated long-lived workers, not a limiting feature. If not enough workers are available, new goroutines will be spawned.
- The maximum number of concurrent index-header loads across all tenants, configurable via the -blocks-storage.bucket-store.index-header.lazy-loading-concurrency CLI flag on store-gateways. It defaults to 4.
- The maximum time to wait for the query-frontend to become ready before rejecting requests, configurable via the -query-frontend.not-running-timeout CLI flag on query-frontends. It now defaults to 2s.
- Spread-minimizing token-related CLI flags: -ingester.ring.token-generation-strategy, -ingester.ring.spread-minimizing-zones and -ingester.ring.spread-minimizing-join-ring-in-order. You can read more about this feature in our blog post.
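The worker-pool behavior described above for the distributor and the gRPC server (pre-allocated workers as an optimization, with extra goroutines spawned on overflow) can be sketched with Python threads as an analogy. This is illustrative only; Mimir's implementation is in Go:

```python
import queue
import threading

# Sketch: a pool of pre-allocated long-lived workers used as a
# performance optimization, not a limit. When every worker is busy,
# the task still runs by spawning a one-off thread, analogous to
# spawning a new goroutine.
class OverflowPool:
    def __init__(self, size: int):
        self.tasks = queue.Queue(maxsize=size)
        for _ in range(size):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            fn = self.tasks.get()
            fn()
            self.tasks.task_done()

    def submit(self, fn):
        try:
            # Hand off to a pre-allocated worker if the queue has room.
            self.tasks.put_nowait(fn)
        except queue.Full:
            # Overflow: run the task in a freshly spawned thread instead.
            threading.Thread(target=fn, daemon=True).start()
```

The point of pre-allocation is to avoid the per-request cost of creating a worker on the hot path while never rejecting work.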
In Grafana Mimir 2.12 the following behavior has changed:
Store-gateway now persists a sparse version of the index-header to disk on construction and loads sparse index-headers from disk instead of the whole index-header.
This improves the speed at which index headers are lazy-loaded from disk by up to 90%. The added disk usage is in the order of 1-2%.
Alertmanager deprecated the v1
API. All v1
API endpoints now respond with a JSON deprecation notice and a status code of 410
.
All endpoints have a v2
equivalent.
The list of endpoints is:
<alertmanager-web.external-url>/api/v1/alerts
<alertmanager-web.external-url>/api/v1/receivers
<alertmanager-web.external-url>/api/v1/silence/{id}
<alertmanager-web.external-url>/api/v1/silences
<alertmanager-web.external-url>/api/v1/status
Exemplar's label traceID
has been changed to trace_id
to be consistent with the OpenTelemetry standard.
Errors returned by ingesters now contain only gRPC status codes.
Previously they contained both gRPC and HTTP status codes.
To guarantee backwards compatibility when migrating from a version prior to 2.11
, it's necessary to first migrate to version 2.11
, and then to version 2.12
.
Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx
won't be recognized, and the corresponding request will be repeated.
Responses with gRPC status codes are now reported as status_code
labels in the cortex_request_duration_seconds
and cortex_ingester_client_request_duration_seconds
metrics.
Responses with HTTP 4xx status codes are now treated as errors and reported in the status_code
label of the request duration metrics.
The default values of the following CLI flags have been changed:

- -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
- -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
- -blocks-storage.bucket-store.tenant-sync-concurrency from 10 to 1.
- -query-frontend.max-cache-freshness from 1m to 10m.
- -distributor.write-requests-buffer-pooling-enabled from false to true.
- -blocks-storage.bucket-store.block-sync-concurrency from 20 to 4.
- -memberlist.stream-timeout from 10s to 2s.
- -server.report-grpc-codes-in-instrumentation-label-enabled from false to true.

The following deprecated configuration options are removed in Grafana Mimir 2.12:

- frontend.cache_unaligned_requests.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.14:

- The CLI flag -ingester.limit-inflight-requests-using-grpc-method-limiter. It now defaults to true.
- The CLI flag -ingester.return-only-grpc-errors. It now defaults to true. To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12. Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.
- The CLI flag -ingester.client.report-grpc-codes-in-instrumentation-label-enabled. It now defaults to true.
- The CLI flag -distributor.limit-inflight-requests-using-grpc-method-limiter. It now defaults to true.
- The CLI flag -distributor.enable-otlp-metadata-storage. It now defaults to true.
- The CLI flag -querier.max-query-into-future.

The following metrics are removed or deprecated:

- cortex_bucket_store_blocks_loaded_by_duration has been removed.
- cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14.

Grafana Mimir 2.12 includes new features that are considered experimental and disabled by default.
Use them with caution and report any issues you encounter:
- The maximum number of tenant IDs that may be used in a federated query can be configured via the -tenant-federation.max-tenants CLI flag on query-frontends. By default, it's 0, meaning that the limit is disabled.
- Sharding of active series queries can be enabled via the -query-frontend.shard-active-series-queries CLI flag on query-frontends.
- Timely head compaction can be enabled via the -blocks-storage.tsdb.timely-head-compaction-enabled CLI flag on ingesters. If enabled, the head compaction happens when the min block range can no longer be appended, without requiring 1.5x the chunk range worth of data in the head.
- Streaming of responses from querier to query-frontend can be enabled via the -querier.response-streaming-enabled CLI flag on queriers. This is currently supported only for responses from the /api/v1/cardinality/active_series endpoint.
- The maximum response size for active series queries, in bytes, can be set via the -querier.active-series-results-max-size-bytes CLI flag on queriers.
- Metric relabeling on a per-tenant basis can be forcefully disabled via the -distributor.metric-relabeling-enabled CLI flag on distributors. Metric relabeling is enabled by default.
- Query queue load balancing by query component. Tenant query queues in the query-scheduler can now be split into subqueues by which query component is expected to be utilized to complete the query: ingesters, store-gateways, both, or uncategorized. Dequeuing queries for a given tenant rotates through the query component subqueues via simple round-robin. If one of the query components (ingesters or store-gateways) experiences a slowdown, queries utilizing only the other query component can continue to be serviced. We recommend enabling this feature. The following CLI flags must be set to true for it to take effect:
  - -query-frontend.additional-query-queue-dimensions-enabled on the query-frontend.
  - -query-scheduler.additional-query-queue-dimensions-enabled on the query-scheduler.
- Owned series tracking in ingesters can be enabled via the -ingester.track-ingester-owned-series CLI flag. When enabled, ingesters track the number of in-memory series that still map to the ingester based on the ring state. These counts are more reactive to ring and shard changes than in-memory series counts, and can be used when enforcing tenant series limits by enabling the -ingester.use-ingester-owned-series-for-limits CLI flag. This feature requires zone-aware replication to be enabled, and the replication factor to be equal to the number of zones.
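The max-tenants limit above follows a common convention in Mimir configuration: a value of 0 disables the check entirely. A minimal sketch of that convention (an illustrative helper, not Mimir's actual code):

```python
# Sketch: enforce a -tenant-federation.max-tenants style limit, where
# 0 means the limit is disabled. Illustrative only.
def check_tenant_count(tenant_ids: list, max_tenants: int) -> None:
    if max_tenants > 0 and len(tenant_ids) > max_tenants:
        raise ValueError(
            f"federated query touches {len(tenant_ids)} tenants, "
            f"limit is {max_tenants}"
        )

check_tenant_count(["a", "b", "c"], 0)  # limit disabled, passes
```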
Bug fixes:

- -distributor.metric-relabeling-enabled could cause distributors to panic.
- -distributor.metric-relabeling-enabled could cause distributors to write unsorted labels and corrupt blocks.
- -querier.max-fetched-series-per-query wasn't applied to the /series endpoint in case series were loaded from ingesters.
- Certain errors were translated to HTTP 400, while when returning chunks all internal errors were translated to HTTP 500. Now, some classes of errors are translated into HTTP 400 errors, while all other errors will be translated into HTTP 500 errors.
- The cortex_query_frontend_queries_total metric incorrectly reported op="query" for any request which wasn't a range query. The op label value can now be one of the following:
  - query: instant query
  - query_range: range query
  - cardinality: cardinality query
  - label_names_and_values: label names / values query
  - active_series: active series query
  - other: any other request
- A fix affecting the cortex_ruler_write_requests_failed_total metric.

The Grafana Mimir and Grafana Enterprise Metrics Helm charts are released independently.
Refer to the Grafana Mimir Helm chart documentation.
- Alertmanager deprecated the v1 API. All v1 API endpoints now respond with a JSON deprecation notice and a status code of 410. All endpoints have a v2 equivalent. The list of endpoints is: #7103
  - <alertmanager-web.external-url>/api/v1/alerts
  - <alertmanager-web.external-url>/api/v1/receivers
  - <alertmanager-web.external-url>/api/v1/silence/{id}
  - <alertmanager-web.external-url>/api/v1/silences
  - <alertmanager-web.external-url>/api/v1/status
- -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes to 100 MiB (previous default value was 10 MiB). #6764
- | characters. #6959
- 4xx errors. #7004
- status_code label of request duration metric. #7045
- -memberlist.stream-timeout from 10s to 2s. #7076
- thanos_cache_memcached_* and thanos_memcached_* prefixed metrics. Instead, Memcached and Redis cache clients now emit thanos_cache_* prefixed metrics with a backend label. #7076
- prometheus_sd_failed_configs renamed to cortex_prometheus_sd_failed_configs
- prometheus_sd_discovered_targets renamed to cortex_prometheus_sd_discovered_targets
- prometheus_sd_received_updates_total renamed to cortex_prometheus_sd_received_updates_total
- prometheus_sd_updates_delayed_total renamed to cortex_prometheus_sd_updates_delayed_total
- prometheus_sd_updates_total renamed to cortex_prometheus_sd_updates_total
- prometheus_sd_refresh_failures_total renamed to cortex_prometheus_sd_refresh_failures_total
- prometheus_sd_refresh_duration_seconds renamed to cortex_prometheus_sd_refresh_duration_seconds
- -query-frontend.not-running-timeout has been changed from 0 (disabled) to 2s. The configuration option has also been moved from "experimental" to "advanced". #7126
- blocks-storage.bucket-store.tenant-sync-concurrency has been changed from 10 to 1 and the default value for blocks-storage.bucket-store.block-sync-concurrency has been changed from 20 to 4. #7136
- -blocks-storage.bucket-store.index-header-lazy-loading-enabled and -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout and their corresponding YAML settings. Instead, use -blocks-storage.bucket-store.index-header.lazy-loading-enabled and -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout. #7521
- -blocks-storage.bucket-store.index-header.lazy-loading-concurrency and its corresponding YAML settings as advanced. #7521
- -blocks-storage.bucket-store.index-header.sparse-persistence-enabled since this is now the default behavior. #7535
- -server.report-grpc-codes-in-instrumentation-label-enabled to true by default, which enables reporting gRPC status codes as status_code labels in the cortex_request_duration_seconds metric. #7144
- status_code labels in the cortex_ingester_client_request_duration_seconds metric by default. #7144
- -ingester.client.report-grpc-codes-in-instrumentation-label-enabled has been deprecated, and its default value is set to true. #7144
- -ingester.return-only-grpc-errors has been deprecated, and its default value is set to true. To ensure backwards compatibility, during a migration from a version prior to 2.11.0 to 2.12 or later, -ingester.return-only-grpc-errors should be set to false. Once all the components are migrated, the flag can be removed. #7151
- -ingester.ring.token-generation-strategy, -ingester.ring.spread-minimizing-zones, -ingester.ring.spread-minimizing-join-ring-in-order
- -query-frontend.max-cache-freshness (and its respective YAML configuration parameter) has been changed from 1m to 10m. #7161
- -distributor.write-requests-buffer-pooling-enabled to true. #7165
- -ingester.client.circuit-breaker.cooldown-period has been changed from 1m to 10s. #7310
- cortex_bucket_store_blocks_loaded_by_duration. cortex_bucket_store_series_blocks_queried is better suited for detecting when compactors are not able to keep up with the number of blocks to compact. #7309
- -ingester.limit-inflight-requests-using-grpc-method-limiter and -distributor.limit-inflight-requests-using-grpc-method-limiter, is now stable and enabled by default. The configuration options have been deprecated and will be removed in Mimir 2.14. #7360
- -distributor.enable-otlp-metadata-storage flag's default to true, and deprecate it. The flag will be removed in Mimir 2.14. #7366
- -querier.max-query-into-future has been deprecated and will be removed in Mimir 2.14. #7496
- cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14. #7516
- frontend.cache_unaligned_requests has been moved to limits.cache_unaligned_requests. #7519
- -server.log-source-ips-full option to log all IPs from Forwarded, X-Real-IP, X-Forwarded-For headers. #7250
- -tenant-federation.max-tenants option to limit the max number of tenants allowed for requests when federation is enabled. #6959
- count_method parameter which enables counting active label values. #7085
- -querier.promql-experimental-functions-enabled CLI flag (and respective YAML config option) to enable experimental PromQL functions. The experimental functions introduced are: mad_over_time(), sort_by_label() and sort_by_label_desc(). #7057
- -alertmanager.grafana-alertmanager-compatibility-enabled CLI flag (and respective YAML config option) to enable experimental API endpoints that support the migration of the Grafana Alertmanager. #7057
- -alertmanager.utf8-strict-mode-enabled to control support for any UTF-8 character as part of Alertmanager configuration/API matchers and labels. Its default value is set to false. #6898
- histogram_avg() function support to PromQL. #7293
- -blocks-storage.tsdb.timely-head-compaction flag, which enables more timely head compaction, and defaults to false. #7372
- /compactor/tenants and /compactor/tenant/{tenant}/planned_jobs endpoints that provide functionality that was provided by tools/compaction-planner -- listing of planned compaction jobs based on tenants' bucket index. #7381
- -querier.response-streaming-enabled. This is currently only supported for the /api/v1/cardinality/active_series endpoint. #7173
- {"metric_name", "l1"="val"} to PromQL and some of the exposition formats. #7475 #7541
- cortex_distributor_otlp_requests_total to track the total number of OTLP requests. #7385
- cortex_vault_token_lease_renewal_active to check whether token renewal is active, and the counters cortex_vault_token_lease_renewal_success_total and cortex_vault_auth_success_total to see the total number of successful lease renewals / authentications. #7337
- cortex_ruler_queries_zero_fetched_series_total. #6544
- /config/api/v1/rules/{namespace}/{groupName} configuration API endpoint. #6632
- query-frontend.additional-query-queue-dimensions-enabled and query-scheduler.additional-query-queue-dimensions-enabled. #6772
- -distributor.metric-relabeling-enabled or associated YAML. #6970
- -distributor.remote-timeout is now accounted from the first ingester push request being sent. #6972
- -<prefix>.s3.sts-endpoint sets a custom endpoint for AWS Security Token Service (AWS STS) in the s3 storage provider. #6172
- cortex_querier_queries_storage_type_total metric that indicates how many queries have executed for a source, ingesters or store-gateways. Add cortex_querier_query_storegateway_chunks_total metric to count the number of chunks fetched from a store gateway. #7099 #7145
- -query-frontend.shard-active-series-queries. #6784
- -distributor.reusable-ingester-push-workers=2000 by default and mark feature as advanced. #7128
- -server.grpc.num-workers=100 by default and mark feature as advanced. #7131
- source, level, and out_or_order to cortex_bucket_store_series_blocks_queried metric that indicates the number of blocks that were queried from store gateways by block metadata. #7112 #7262 #7267
- cortex_bucket_index_estimated_compaction_jobs metric. If computation of jobs fails, cortex_bucket_index_estimated_compaction_jobs_errors_total is updated instead. #7299
- cortex_alertmanager_notifications_suppressed_total that counts the total number of notifications suppressed for being silenced, inhibited, outside of active time intervals or within muted time intervals. #7384
- cortex_query_scheduler_queue_duration_seconds histogram metric, in order to better track queries staying in the queue for longer than 10s. #7470
- type label is added to prometheus_tsdb_head_out_of_order_samples_appended_total metric. #7475
- -ingester.use-ingester-owned-series-for-limits now prevents discards in cases where a tenant is sharded across all ingesters (or shuffle sharding is disabled) and the ingester count increases. #7411
- -query-frontend.active-series-write-timeout to allow configuring the server-side write timeout for active series requests. #7553 #7569
- -querier.max-fetched-series-per-query is not applied to the /series endpoint if the series are loaded from ingesters. #7055
- -distributor.metric-relabeling-enabled may cause distributors to panic. #7176
- -distributor.metric-relabeling-enabled may cause distributors to write unsorted labels and corrupt blocks. #7326
- cortex_query_frontend_queries_total incorrectly reported op="query" for any request which wasn't a range query. Now the op label value can be one of the following: #7207
  - query: instant query
  - query_range: range query
  - cardinality: cardinality query
  - label_names_and_values: label names / values query
  - active_series: active series query
  - other: any other request
- google.golang.org/grpc to resolve occasional issues with the gRPC server closing its side of the connection before it was drained by the client. #7380
- active_series requests when the request context is canceled. #7378
- cortex_ruler_write_requests_failed_total metric. #7472
- job label matchers for distributor and gateway have been extended to include any deployment matching distributor.* and cortex-gw.* respectively. This change allows matching custom and multi-zone distributor and gateway deployments too. #6817
- cortex_request_duration_seconds. #7528
- step parameter from targets as it is not supported. #7157
- cortex_memcache_request_duration_seconds and cortex_cache_request_duration_seconds. #7514
- JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100) to 1000, to avoid dropping tracing spans. #7259
- JAEGER_REPORTER_MAX_QUEUE_SIZE from 1000 to 5000, to avoid dropping tracing spans. #6764
- JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100) to 1000, to avoid dropping tracing spans. #7068
- JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100), to avoid dropping tracing spans. #7086
- -distributor.ring.heartbeat-period set to 1m
- -distributor.ring.heartbeat-timeout set to 4m
- -ingester.ring.heartbeat-period set to 2m
- -store-gateway.sharding-ring.heartbeat-period set to 1m
- -store-gateway.sharding-ring.heartbeat-timeout set to 4m
- -compactor.ring.heartbeat-period set to 1m
- -compactor.ring.heartbeat-timeout set to 4m
- ruler_querier_topology_spread_max_skew instead of querier_topology_spread_max_skew. #7204
- -server.grpc.keepalive.max-connection-age lowered from 2m to 60s and configured -shutdown-delay=90s and termination grace period to 100 seconds in order to reduce the chances of failed gRPC write requests when distributors gracefully shut down. #7361
- alertmanager_node_affinity_matchers, compactor_node_affinity_matchers, continuous_test_node_affinity_matchers, distributor_node_affinity_matchers, ingester_node_affinity_matchers, ingester_zone_a_node_affinity_matchers, ingester_zone_b_node_affinity_matchers, ingester_zone_c_node_affinity_matchers, mimir_backend_node_affinity_matchers, mimir_backend_zone_a_node_affinity_matchers, mimir_backend_zone_b_node_affinity_matchers, mimir_backend_zone_c_node_affinity_matchers, mimir_read_node_affinity_matchers, mimir_write_node_affinity_matchers, mimir_write_zone_a_node_affinity_matchers, mimir_write_zone_b_node_affinity_matchers, mimir_write_zone_c_node_affinity_matchers, overrides_exporter_node_affinity_matchers, querier_node_affinity_matchers, query_frontend_node_affinity_matchers, query_scheduler_node_affinity_matchers, rollout_operator_node_affinity_matchers, ruler_node_affinity_matchers, ruler_querier_node_affinity_matchers, ruler_query_frontend_node_affinity_matchers, ruler_query_scheduler_node_affinity_matchers, store_gateway_node_affinity_matchers, store_gateway_zone_a_node_affinity_matchers, store_gateway_zone_b_node_affinity_matchers, store_gateway_zone_c_node_affinity_matchers
- ingester_automated_downscale_enabled flag. It is disabled by default. #6850
- MimirStoreGatewayTooManyFailedOperations warning alert that triggers when Mimir store-gateways report errors when interacting with the object storage. #6831
- -shutdown-delay, -server.grpc.keepalive.max-connection-age and termination grace period to reduce the likelihood of queries hitting terminated query-frontends. #7129
- ignoreNullValues option for Prometheus scaler. #7471
- migrate-utf8 to migrate Alertmanager configurations for Alertmanager versions 0.27.0 and later. #7383
- --extra-headers option to mimirtool rules command to add extra headers to requests for auth. #7141
- --output-dir to mimirtool alertmanager get where the config and templates will be written to and can be loaded via mimirtool alertmanager load #6760
- Host HTTP header was not being correctly changed for the proxy targets. #7386
- __REQUEST_HEADER_X_SCOPE_ORGID__. #7452
- KubePersistentVolumeFillingUp alert. #7297

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.11.0...mimir-2.12.0-rc.0
Published by leizor 10 months ago
This release contains 532 PRs from 55 authors, including new contributors Benjamin, Dominik Kepinski, Jonathan Donzallaz, Juraj Michálek, Kai.Ke, Ludovic Terrier, Luke, Maciej Lech, Matthew Penner, Michael Potter, Mihai Țimbota-Belin, Rasmus Werner Salling, Ying WANG, chencs, fayzal-g, kalle (jag), sarthaktyagi-505, whoami. Thank you!
Grafana Labs is excited to announce version 2.11 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
-ingester.error-sample-rate
CLI flag.-ingester.instance-limits.max-inflight-push-requests-bytes
CLI flag in combination with the -ingester.limit-inflight-requests-using-grpc-method-limiter CLI flag.

-validation.max-native-histogram-buckets. This is enabled by default but can be turned off by setting the -validation.reduce-native-histogram-over-max-buckets CLI flag to false.

Grafana Mimir 2.11 includes new features that are considered experimental and disabled by default. Please use them with caution and report any issue you encounter:

- blocked_queries limit. See the docs for more information.
- -distributor.enable-otlp-metadata-storage to true.
- -ingester.limit-inflight-requests-using-grpc-method-limiter and/or the -distributor.limit-inflight-requests-using-grpc-method-limiter CLI flags for the ingester and/or the distributor, respectively.
- -blocks-storage.bucket-store.chunks-cache.memcached.read-buffer-size-bytes
- -blocks-storage.bucket-store.chunks-cache.memcached.write-buffer-size-bytes
- -blocks-storage.bucket-store.index-cache.memcached.read-buffer-size-bytes
- -blocks-storage.bucket-store.index-cache.memcached.write-buffer-size-bytes
- -blocks-storage.bucket-store.metadata-cache.memcached.read-buffer-size-bytes
- -blocks-storage.bucket-store.metadata-cache.memcached.write-buffer-size-bytes
- -query-frontend.results-cache.memcached.read-buffer-size-bytes
- -query-frontend.results-cache.memcached.write-buffer-size-bytes
- -ruler-storage.cache.memcached.read-buffer-size-bytes
- -ruler-storage.cache.memcached.write-buffer-size-bytes
- -server.grpc.num-workers CLI flag.
- PostingsForMatchers cache used by ingesters. This limit can be configured via the -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes CLI flags.
- -distributor.reusable-ingester-push-worker flag.
- Retry-After header in recoverable error responses from the distributor. This can protect your Mimir cluster from clients, including Prometheus, that default to retrying very quickly. Enable this feature by setting the -distributor.retry-after-header.enabled CLI flag.

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.
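Several of the experimental features above are enabled purely through CLI flags. The sketch below is illustrative only: the -target value and the worker-pool size are invented examples, while the flag names are taken from these notes.

```shell
# Illustrative only: starting a distributor with some of the
# experimental 2.11 features described above enabled.
mimir \
  -target=distributor \
  -distributor.retry-after-header.enabled=true \
  -distributor.limit-inflight-requests-using-grpc-method-limiter=true \
  -distributor.reusable-ingester-push-worker=100
```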
In Grafana Mimir 2.11 the following behavior has changed:

The following configuration options had been previously deprecated and are removed in Grafana Mimir 2.11:

- -querier.iterators
- -querier.batch-iterators
- -blocks-storage.bucket-store.bucket-index.enabled
- -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes
- -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes
- -blocks-storage.bucket-store.max-chunk-pool-bytes

The following configuration options are deprecated and will be removed in Grafana Mimir 2.13:

- -log.buffered; this is now the default behavior.

The following metrics are removed:

- cortex_query_frontend_workers_enqueued_requests_total; use cortex_query_frontend_enqueue_duration_seconds_count instead.

The following configuration option defaults were changed:

- -blocks-storage.bucket-store.index-header.sparse-persistence-enabled now defaults to true.
- -blocks-storage.bucket-store.index-header.lazy-loading-concurrency was changed from 0 to 4.
- -blocks-storage.tsdb.series-hash-cache-max-size-bytes was changed from 1GB to 350MB.
- -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage was changed from 10 to 15.

distributor.service_overload_status_code_on_rate_limit_enabled flag is active. PR 6549
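A removed flag from the list above can be caught before an upgrade with a quick scan of the existing invocation. This is a minimal sketch; the cmdline string below is an invented example, not taken from any real deployment.

```shell
# Sample command line to scan; replace with your actual Mimir invocation.
cmdline='-querier.iterators -blocks-storage.bucket-store.bucket-index.enabled=true'

# CLI flags removed in Grafana Mimir 2.11 (previously deprecated).
for flag in querier.iterators querier.batch-iterators \
    blocks-storage.bucket-store.bucket-index.enabled \
    blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes \
    blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes \
    blocks-storage.bucket-store.max-chunk-pool-bytes; do
  case " $cmdline " in
    *" -$flag"*) echo "remove before upgrading: -$flag" ;;
  esac
done
```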
-querier.iterators
-querier.batch-iterators
-blocks-storage.bucket-store.max-chunk-pool-bytes
-blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes
-blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes
-blocks-storage.bucket-store.bucket-index.enabled
tls_server_name
. The gRPC config specified under -querier.frontend-client.*
will no longer apply to the scheduler client, and will need to be set explicitly under -querier.scheduler-client.*
. #6445 #6573

-log.buffered by default. The -log.buffered flag has been deprecated and will be removed in Mimir 2.13. #6131

-blocks-storage.tsdb.series-hash-cache-max-size-bytes setting from 1GB to 350MB. The new default cache size is enough to store the hashes for all series in an ingester, assuming up to 2M in-memory series per ingester and using the default 13h retention period for local TSDB blocks in the ingesters. #6130

cortex_query_frontend_workers_enqueued_requests_total
. Use cortex_query_frontend_enqueue_duration_seconds_count
instead. #6121-blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage
from 10 to 15. #6186/ingester/push
HTTP endpoint has been removed. This endpoint was added for testing and troubleshooting, but was never documented or used for anything. #6299-log.rate-limit-logs-per-second-burst
renamed to -log.rate-limit-logs-burst-size
. #6230

Push() now returns errors with gRPC codes: #6377

- http.StatusAccepted (202) code is replaced with codes.AlreadyExists.
- http.StatusBadRequest (400) code is replaced with codes.FailedPrecondition.
- http.StatusTooManyRequests (429) and the non-standard 529 (the service is overloaded) codes are replaced with codes.ResourceExhausted.

When -ingester.return-only-grpc-errors is set to true, the ingester will return only gRPC errors. This feature changes the following status codes: #6443 #6680 #6723

- http.StatusBadRequest (400) is replaced with codes.FailedPrecondition on the write path.
- http.StatusServiceUnavailable (503) is replaced with codes.Internal on the write path, and with codes.ResourceExhausted on the read path.
- codes.Unknown is replaced with codes.Internal on both the write and read paths.

cortex_querier_blocks_consistency_checks_failed_total
is now incremented when a block couldn't be queried from any attempted store-gateway as opposed to incremented after each attempt. Also cortex_querier_blocks_consistency_checks_total
is incremented once per query as opposed to once per attempt (with 3 attempts). #6590-distributor.retry-after-header.enabled
to include the Retry-After
header in recoverable error responses. #6608blocked_queries
. #5609AppRole
, Kubernetes
, UserPass
and Token
. #6143/api/v1/cardinality/active_series
to return the set of active series for a given selector. #6536 #6619 #6651 #6667-<prefix>.s3.part-size
flag to configure the S3 minimum file size in bytes used for multipart uploads. #6592-<prefix>.s3.send-content-md5
flag (defaults to false
) to configure S3 Put Object requests to send a Content-MD5
header. Setting this flag is not recommended unless your object storage does not support checksums. #6622-distributor.reusable-ingester-push-worker
that can be used to pre-allocate a pool of workers to be used to send push requests to the ingesters. #6660-distributor.otel-metric-suffixes-enabled
. #6542cortex_ingester_inflight_push_requests_summary
tracking total number of inflight requests in percentile buckets. #5845cortex_query_scheduler_enqueue_duration_seconds
metric that records the time taken to enqueue or reject a query request. #5879cortex_query_frontend_enqueue_duration_seconds
metric that records the time taken to enqueue or reject a query request. When query-scheduler is in use, the metric has the scheduler_address
label to differentiate the enqueue duration by query-scheduler backend. #5879 #6087 #6120cortex_bucket_store_blocks_loaded_by_duration
for counting the loaded number of blocks based on their duration. #6074 #6129/sync/mutex/wait/total:seconds
Go runtime metric as go_sync_mutex_wait_total_seconds_total
from all components. #5879cortex_ruler_queries_zero_fetched_series_total
metric to track rules that fetched no series. #5925limit
, limit_per_metric
and metric
parameters for <Prometheus HTTP prefix>/api/v1/metadata
endpoint. #5890-distributor.enable-otlp-metadata-storage=true
. #5693 #6035 #6254-ingester.error-sample-rate
. This way each error will be logged once in the configured number of times. All the discarded samples will still be tracked by the cortex_discarded_samples_total
metric. #5584 #6014-vault.enabled
is true. #5239group by
aggregation queries. #6024-vault.enabled
is true. #6052

-blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes
and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes
to enforce a limit in bytes on the PostingsForMatchers()
cache used by ingesters (the cache limit is per TSDB head and block basis, not a global one). The experimental configuration options -blocks-storage.tsdb.head-postings-for-matchers-cache-size
and -blocks-storage.tsdb.block-postings-for-matchers-cache-size
have been deprecated. #6151PostingsForMatchers()
in-memory cache for label values queries with matchers too. #6151cortex_querier_federation_exemplar_tenants_queried
and cortex_querier_federation_tenants_queried
metrics to track the number of tenants queried by multi-tenant queries. #6374 #6409-server.grpc.num-workers
flag that configures the number of long-living workers used to process gRPC requests. This could decrease the CPU usage by reducing the number of stack allocations. #6311

the stream has already been exhausted. #6345 #6433

instance_enable_ipv6
to support IPv6. #6111-<prefix>.memcached.write-buffer-size-bytes
-<prefix>.memcached.read-buffer-size-bytes
to customise the memcached client write and read buffer size (the buffer is allocated for each memcached connection). #6468-ingester.limit-inflight-requests-using-grpc-method-limiter
for ingester, and -distributor.limit-inflight-requests-using-grpc-method-limiter
for distributor. #5976 #6300-store-gateway.sharding-ring.num-tokens
, default-value=512
#4863-server.http-read-header-timeout
to enable specifying a timeout for reading HTTP request headers. It defaults to 0, in which case reading of headers can take up to -server.http-read-timeout
, leaving no time for reading the body, if there is any. #6517

-<prefix>.azure.connection-string
, for Azure Blob Storage. #6487-ingester.instance-limits.max-inflight-push-requests-bytes
. This limit protects the ingester against requests that together may cause an OOM. #6492cortex_ingester_local_limits
metric to expose the calculated local per-tenant limits seen at each ingester. Exports the local per-tenant series limit with label {limit="max_global_series_per_user"}
#6403-server.report-grpc-codes-in-instrumentation-label-enabled
CLI flag to specify whether gRPC status codes should be used in status_code
label of cortex_request_duration_seconds
metric. It defaults to false, meaning that successful and erroneous gRPC status codes are represented with success
and error
respectively. #6562-ingester.client.report-grpc-codes-in-instrumentation-label-enabled
CLI flag to specify whether gRPC status codes should be used in status_code
label of cortex_ingester_client_request_duration_seconds
metric. It defaults to false, meaning that successful and erroneous gRPC status codes are represented with 2xx
and error
respectively. #6562-server.http-log-closed-connections-without-response-enabled
option to log details about connections to the HTTP server that were closed before any data was sent back. This can happen if the client doesn't manage to send complete HTTP headers before the timeout. #6612

-validation.max-native-histogram-buckets
. This is enabled by default and can be turned off by setting -validation.reduce-native-histogram-over-max-buckets
to false
. #6535-query-frontend.not-running-timeout
to a non-zero value to enable. #6621querier.Select
tracing span. #6085attempted to read series at index XXX from stream, but the stream has already been exhausted
(or even no error at all) when streaming chunks from ingesters or store-gateways is enabled and an error occurs while streaming chunks. #6346status_code
label in Mimir dashboards. In case of gRPC calls, the successful status_code
label on cortex_request_duration_seconds
and gRPC client request duration metrics has changed from 'success' and '2xx' to 'OK'. #6561MimirGossipMembersMismatch
alert and replace it with MimirGossipMembersTooHigh
and MimirGossipMembersTooLow
alerts that should have a higher signal-to-noise ratio. #6508CompactorSkippedBlocksWithOutOfOrderChunks
when multiple blocks are affected. #6410GossipMembersMismatch
warning message referred to per-instance labels that were not produced by the alert query. #6146-server.grpc-max-concurrent-streams
to 500. #5666_config.cluster_domain
from cluster.local
to cluster.local.
to reduce the number of DNS lookups made by Mimir. #6389_config.autoscaling_query_frontend_cpu_target_utilization
from 1
to 0.75
. #6395store_gateway_automated_downscale_enabled
flag. It is disabled by default. #6149_config
parameters: #6181
ingester_tsdb_head_early_compaction_enabled
(disabled by default)ingester_tsdb_head_early_compaction_reduction_percentage
ingester_tsdb_head_early_compaction_min_in_memory_series
maxUnavailable
to 0 for distributor
, overrides-exporter
, querier
, query-frontend
, query-scheduler
ruler-querier
, ruler-query-frontend
, ruler-query-scheduler
and consul
deployments, to ensure they don't become completely unavailable during a rollout. #5924v0.9.0
. #6022 #6110 #6558 #6681memcached:1.6.22-alpine
. #6585-blocks-storage.bucket-store.index-header-lazy-loading-enabled
replaced with -blocks-storage.bucket-store.index-header.lazy-loading-enabled
-blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout
replaced with -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
scaler
label on keda_*
metrics. #6528--read-timeout
was applied to the entire mimirtool analyze grafana
invocation rather than to individual Grafana API calls. #5915mimirtool remote-read
commands on Windows. #6011mimirtool alertmanager load
command. #6138

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.5...mimir-2.11.0
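The experimental active-series cardinality endpoint mentioned in these notes can be exercised with a plain HTTP request. The sketch below assumes a Mimir instance listening on port 8080 with the default /prometheus HTTP prefix; the host, tenant ID, and series selector are placeholders to adjust for your deployment.

```shell
# Query the experimental active-series cardinality endpoint.
# Host, port, tenant ID, and selector are placeholders.
curl -sG \
  -H "X-Scope-OrgID: tenant-1" \
  --data-urlencode 'selector={job="api-server"}' \
  "http://localhost:8080/prometheus/api/v1/cardinality/active_series"
```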
Published by leizor 10 months ago
This release contains 531 PRs from 55 authors, including new contributors Benjamin, Dominik Kepinski, Jonathan Donzallaz, Juraj Michálek, Kai.Ke, Ludovic Terrier, Luke, Maciej Lech, Matthew Penner, Michael Potter, Mihai Țimbota-Belin, Rasmus Werner Salling, Ying WANG, chencs, fayzal-g, kalle (jag), renovate[bot], sarthaktyagi-505, whoami. Thank you!
Grafana Labs is excited to announce version 2.11 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bug fixes in this release. For the complete list of changes, see the changelog.
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.4...mimir-2.11.0-rc.0
Published by dimitarvdimitrov 10 months ago
alpine:3.18.3
to alpine:3.18.5
. #6897

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.4...mimir-2.10.5
Published by dimitarvdimitrov 10 months ago
alpine:3.18.3
to alpine:3.18.5
. #6895

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.3...mimir-2.9.4
Published by fayzal-g 11 months ago
This release contains 1 PR from 1 author. Thank you!
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
to 0.44
which includes a fix for CVE-2023-45142. #6637

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.2...mimir-2.9.3
Published by colega 11 months ago
This release contains 3 PRs from 1 author. Thank you!
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.3...mimir-2.10.4
Published by colega about 1 year ago
This release contains 1 PR from 1 author. Thank you!
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.2...mimir-2.10.3
Published by lamida about 1 year ago
This release contains 5 PRs from 3 authors. Thank you!
golang.org/x/net
to 0.17
, which includes a fix for CVE-2023-44487. #6353 #6364

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.1...mimir-2.9.2
Published by pstibrany about 1 year ago
This release contains 2 PRs from 1 author. Thank you!
golang.org/x/net
to 0.17
, which includes a fix for CVE-2023-44487. #6349

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.1...mimir-2.10.2
Published by colega about 1 year ago
This release contains 6 PRs from 4 authors. Thank you!
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.0...mimir-2.10.1
Published by colega about 1 year ago
This release contains 455 PRs from 54 authors, including new contributors Aaron Sanders, Alexander Proschek, Aljoscha Pörtner, balazs92117, Francois Gouteroux, Franco Posa, Heather Yuan, jingyang, kendrickclark, m4r1u2, Milan Plžík, Samir Teymurov, Sven Haardiek, Thomas Schaaf, Tiago Posse. Thank you!
Grafana Labs is excited to announce version 2.10 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bug fixes in this release. For the complete list of changes, see the changelog.
file
, ruler_group
and rule_name
parameters to the ruler endpoint /api/v1/rules
./api/v1/cardinality/label_values
by passing the count_method
parameter. You can set it to active
to count only series that are considered active according to the -ingester.active-series-metrics-idle-timeout
flag setting rather than counting all in-memory series.-log.buffered
CLI flag. This should reduce contention and resource usage under heavy usage patterns.__name__
posting group causing a reduction in the number of object storage API calls.Mimir
and the Version
set correctly in order to benefit from this improvement.-query-frontend.cache-results
is enabled, and -query-frontend.results-cache-ttl-for-cardinality-query
or -query-frontend.results-cache-ttl-for-labels-query
is set to a value greater than 0.-compactor.no-blocks-file-cleanup-enabled
option to true
./ingester/tenants
and /ingester/tsdb/{tenant}
to the ingester that provide debug information about tenants and their TSDBs.cortex_ingester_active_native_histogram_series
, cortex_ingester_active_native_histogram_series_custom_tracker
, cortex_ingester_active_native_histogram_buckets
, cortex_ingester_active_native_histogram_buckets_custom_tracker
. The first 2 are the subsets of the existing and unmodified cortex_ingester_active_series
and cortex_ingester_active_series_custom_tracker
respectively, only tracking native histogram series, and the last 2 are the equivalent for tracking the number of buckets in native histogram series.Additionally, the following previously experimental features are now considered stable:
-ruler-storage.cache.*
CLI flags or their respective YAML config options.-query-frontend.query-sharding-target-series-per-shard
; we recommend starting with a value of 2500
.-query-frontend.max-query-expression-size-bytes
.-overrides-exporter.ring.enabled
.-overrides-exporter.enabled-metrics
.results_cache_ttl
and results_cache_ttl_for_out_of_order_time_window
parameters.

Grafana Mimir 2.10 includes new features that are considered experimental and disabled by default. Please use them with caution and report any issues you encounter:
-blocks-storage.bucket-store.index-header-sparse-persistence-enabled
) as well as the ability to persist the list of block IDs that were lazy-loaded while running to eagerly load them upon startup to prevent starting up with no loaded blocks (-blocks-storage.bucket-store.index-header.eager-loading-startup-enabled
) and an option to limit the number of concurrent index-header loads when lazy-loading (-blocks-storage.bucket-store.index-header-lazy-loading-concurrency
).-querier.minimize-ingester-requests
).-blocks-storage.tsdb.early-head-compaction-min-in-memory-series
).-querier.prefer-streaming-chunks-from-store-gateways
option.-ingester.client.circuit-breaker.*
configuration options and should serve to let ingesters recover when under high pressure.-ingester.read-path-cpu-utilization-limit
, -ingester.read-path-memory-utilization-limit
, -ingester.log-utilization-based-limiter-cpu-samples
).

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.
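Several of the 2.10 experimental features above are boolean querier options. The sketch below is illustrative only: the -target value is an assumption, while the flag names come from these notes.

```shell
# Illustrative only: opting a querier into two experimental 2.10 features.
mimir \
  -target=querier \
  -querier.minimize-ingester-requests=true \
  -querier.prefer-streaming-chunks-from-store-gateways=true
```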
In Grafana Mimir 2.10 we have changed the following behaviors:
ACTIVE
state in the ring. This is not expected to introduce any degradation in terms of query results correctness or high-availability.cortex_distributor_instance_rejected_requests_total
cortex_ingester_instance_rejected_requests_total
-validation.create-grace-period
is now enforced in the ingester. If you've configured -validation.create-grace-period
, make sure the configuration is applied to ingesters too.-validation.create-grace-period
is now enforced for exemplars. The cortex_discarded_exemplars_total{reason="exemplar_too_far_in_future",user="..."}
series is incremented when exemplars are dropped because their timestamp is greater than "now + grace_period".-validation.create-grace-period
is now enforced in the query-frontend even when the configured value is 0. When the value is 0, the query end time range is truncated to the current real-world time.The following metrics were removed:
cortex_ingester_shipper_dir_syncs_total
cortex_ingester_shipper_dir_sync_failures_total
The following configuration options are deprecated and will be removed in Grafana Mimir 2.12:
-blocks-storage.bucket-store.index-header-lazy-loading-enabled
is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-enabled
.-blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout
is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
.-blocks-storage.bucket-store.index-header-lazy-loading-concurrency
is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-concurrency
.The following configuration options that were deprecated in Grafana Mimir 2.8 are removed:
blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup
.The following experimental configuration options were renamed or removed:
-querier.prefer-streaming-chunks
was renamed to -querier.prefer-streaming-chunks-from-ingesters
.-blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled
was removed.-blocks-storage.bucket-store.fine-grained-chunks-caching-ranges-per-series
was removed.The following experimental options are now stable:
-shutdown-delay
.-ingester.ring.excluded-zones
.The following configuration option defaults were changed:
-querier.streaming-chunks-per-ingester-buffer-size
was changed from 512
to 256
.5s
(default inherited from gRPC client was 20s
) with a default max backoff delay of 5s
(default inherited from gRPC client was 120s
).LEAVING
and the number of tokens has changed upon restarting.timestamp()
function fail with execution: attempted to read series at index 0 from stream, but the stream has already been exhausted
if the experimental feature to stream chunks from ingesters to queriers is enabled.memberlist_client_kv_store_count
metric that used to exist in Cortex, but got lost during grafana/dskit updates before Mimir 2.0.blocks_storage.bucket_store.index_header.verify_on_load: true
. #5174-querier.streaming-chunks-per-ingester-buffer-size
flag to 256. #5203ACTIVE
state in the ring. #5342-querier.prefer-streaming-chunks
to -querier.prefer-streaming-chunks-from-ingesters
to enable streaming chunks from ingesters to queriers. #5182-query-frontend.cache-unaligned-requests
has been moved from a global flag to a per-tenant override. #5312cortex_ingester_shipper_dir_syncs_total
and cortex_ingester_shipper_dir_sync_failures_total
metrics. The former metric was not very useful, and the latter was never incremented. #5396-validation.create-grace-period
is now enforced in the ingester, in addition to the distributor and query-frontend. If you've configured -validation.create-grace-period
then make sure the configuration is applied to ingesters too. #5712-validation.create-grace-period
is now enforced for exemplars too in the distributor. If an exemplar has a timestamp greater than "now + grace_period", the exemplar is dropped and the metric cortex_discarded_exemplars_total{reason="exemplar_too_far_in_future",user="..."}
is increased. #5761-validation.create-grace-period
is now enforced in the query-frontend even when the configured value is 0. When the value is 0, the query end time range is truncated to the current real-world time. #5829blocks-storage.bucket-store
and use the new configuration in blocks-storage.bucket-store.index-header
; the deprecated configuration will be removed in Mimir 2.12. Configuration changes: #5726
-blocks-storage.bucket-store.index-header-lazy-loading-enabled
is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-enabled
-blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout
is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
-blocks-storage.bucket-store.index-header-lazy-loading-concurrency
is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-concurrency
-blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled
, -blocks-storage.bucket-store.fine-grained-chunks-caching-ranges-per-series
. #5816 #5875blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup
. #5850-distributor.service-overload-status-code-on-rate-limit-enabled
flag to return status code 529 instead of 429 upon rate limit exhaustion. #5752count_method
parameter which enables counting active series. #5136-query-frontend.cache-results
is enabled, and -query-frontend.results-cache-ttl-for-cardinality-query
or -query-frontend.results-cache-ttl-for-labels-query
set to a value greater than 0. The following metrics have been added to track the query results cache hit ratio per request_type
: #5212 #5235 #5426 #5524
cortex_frontend_query_result_cache_requests_total{request_type="query_range|cardinality|label_names_and_values"}
cortex_frontend_query_result_cache_hits_total{request_type="query_range|cardinality|label_names_and_values"}
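Read together, the items above suggest a minimal configuration sketch for cardinality and label query results caching; the TTL values below are illustrative placeholders, not recommendations:

```
# Enable query results caching in the query-frontend (illustrative values)
-query-frontend.cache-results=true
-query-frontend.results-cache-ttl-for-cardinality-query=10m
-query-frontend.results-cache-ttl-for-labels-query=10m
```

The cardinality endpoint can then be asked to count only active series via the count_method parameter; the X-Scope-OrgID tenant header shown is Mimir's usual multi-tenancy convention:

```
GET /api/v1/cardinality/label_names?count_method=active
X-Scope-OrgID: <tenant>
```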
-<prefix>.s3.list-objects-version
flag to configure the S3 list objects version. #5099-ingester.read-path-cpu-utilization-limit
-ingester.read-path-memory-utilization-limit
-ingester.log-utilization-based-limiter-cpu-samples
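A hedged sketch of the utilization-based read-path limiter flags above. The CPU value as a fraction of cores, the memory value in bytes, and the boolean form of the logging flag are assumptions; the thresholds are placeholders:

```
# Experimental read-path protection (placeholder thresholds)
-ingester.read-path-cpu-utilization-limit=0.8
-ingester.read-path-memory-utilization-limit=17179869184   # ~16 GiB, assuming a bytes unit
-ingester.log-utilization-based-limiter-cpu-samples=true   # assuming a boolean flag
```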
file
, rule_group
and rule_name
. #5291-ingester.ring.token-generation-strategy: spread-minimizing
and -ingester.ring.spread-minimizing-zones: <all available zones>
. In that case -ingester.ring.tokens-file-path
must be empty. #5308 #5324-ingester.ring.spread-minimizing-join-ring-in-order
that allows an ingester to register tokens in the ring only after all previous ingesters (with an ID lower than its own) have already been registered. #5541-blocks-storage.tsdb.early-head-compaction-min-in-memory-series
, and the ingester estimates that the per-tenant TSDB Head compaction will reduce in-memory series by at least -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage
. #5371cortex_ingester_active_native_histogram_series
, cortex_ingester_active_native_histogram_series_custom_tracker
, cortex_ingester_active_native_histogram_buckets
, cortex_ingester_active_native_histogram_buckets_custom_tracker
. The first 2 are the subsets of the existing and unmodified cortex_ingester_active_series
and cortex_ingester_active_series_custom_tracker
respectively, only tracking native histogram series, and the last 2 are the equivalents for tracking the number of buckets in native histogram series. #5318-<prefix>.s3.native-aws-auth-enabled
that allows enabling the default credentials provider chain of the AWS SDK. #5636-ingester.client.circuit-breaker.enabled
, -ingester.client.circuit-breaker.failure-threshold
, or -ingester.client.circuit-breaker.cooldown-period
or their corresponding YAML. #5650-ruler-storage.cache.*
)-ingester.ring.excluded-zones
)-query-frontend.query-sharding-target-series-per-shard
)-query-frontend.results-cache-ttl-for-cardinality-query
)-query-frontend.results-cache-ttl-for-labels-query
)-query-frontend.max-query-expression-size-bytes
)-overrides-exporter.ring.enabled
)-overrides-exporter.enabled-metrics
)-query-frontend.results-cache-ttl
, -query-frontend.results-cache-ttl-for-out-of-order-time-window
)-shutdown-delay
)-tenant-federation.max-concurrent
to adjust the max number of per-tenant queries that can be run at a time when executing a single multi-tenant query. #5874max_global_metadata_per_user
, max_global_metadata_per_metric
, request_rate
, request_burst_size
, alertmanager_notification_rate_limit
, alertmanager_max_dispatcher_aggregation_groups
, alertmanager_max_alerts_count
, alertmanager_max_alerts_size_bytes
) and added flag -overrides-exporter.enabled-metrics
to explicitly configure desired metrics, e.g. -overrides-exporter.enabled-metrics=request_rate,ingestion_rate
. Default value for this flag is: ingestion_rate,ingestion_burst_size,max_global_series_per_user,max_global_series_per_metric,max_global_exemplars_per_user,max_fetched_chunks_per_query,max_fetched_series_per_query,ruler_max_rules_per_rule_group,ruler_max_rule_groups_per_tenant
. #5376-timeseries-unmarshal-caching-optimization-enabled=false
. #5137-<prefix>.connect-timeout
-<prefix>.connect-backoff-base-delay
-<prefix>.connect-backoff-max-delay
-<prefix>.initial-stream-window-size
-<prefix>.initial-connection-window-size
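As a sketch, the gRPC client connection options above take a component-specific prefix; ingester.client is chosen here purely for illustration, and the values are placeholders:

```
# Hypothetical prefix and placeholder values
-ingester.client.connect-timeout=5s
-ingester.client.connect-backoff-base-delay=1s
-ingester.client.connect-backoff-max-delay=5s
```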
-distributor.write-requests-buffer-pooling-enabled
to true
. #5195 #5805 #5830-querier.minimize-ingester-requests
option to initially query only the minimum set of ingesters required to reach quorum. #5202 #5259 #5263cortex_ruler_sync_rules_duration_seconds
metric, tracking the time spent syncing all rule groups owned by the ruler instance. #5311blocks-storage.bucket-store.index-header-lazy-loading-concurrency
config option to limit the number of concurrent index-headers loads when lazy loading. #5313 #5605cortex_querier_queries_rejected_total
metric that counts the number of queries rejected due to hitting a limit (e.g., max series per query or max chunks per query). #5316 #5440 #5450-querier.minimize-ingester-requests-hedging-delay
option to initiate requests to further ingesters when request minimisation is enabled and not all initial requests have completed. #5368-ingester.client.*
flags to make it clear that these are used by both queriers and distributors. #5375-querier.prefer-streaming-chunks-from-store-gateways=true
. #5182max-chunks-per-query
limit earlier in query processing when streaming chunks from ingesters to queriers to avoid unnecessarily consuming resources for queries that will be aborted. #5369 #5447cortex_ingester_shipper_last_successful_upload_timestamp_seconds
metric tracking the last successful TSDB block uploaded to the bucket (unix timestamp in seconds). #5396cortex_ingester_utilization_limiter_current_cpu_load
: The current exponential weighted moving average of the ingester's CPU loadcortex_ingester_utilization_limiter_current_memory_usage_bytes
: The current ingester memory utilizationinsight=true
field to ruler's prometheus component for rule evaluation logs. #5510cortex_distributor_instance_rejected_requests_total
and cortex_ingester_instance_rejected_requests_total
respectively. #5551-log.buffered
CLI flag to enable buffered logging.-compactor.no-blocks-file-cleanup-enabled
option. #5648-store-gateway.sharding-ring.auto-forget-enabled
configuration parameter to control whether store-gateway auto-forget feature should be enabled or disabled (enabled by default). #5702cortex_block_upload_api_blocks_total
, cortex_block_upload_api_bytes_total
, and cortex_block_upload_api_files_total
. #5738-log.rate-limit-enabled
-log.rate-limit-logs-per-second
-log.rate-limit-logs-per-second-burst
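The log sampling flags above might be combined as follows; the rates are placeholders, not recommendations:

```
# Placeholder rates; tune for your log volume
-log.rate-limit-enabled=true
-log.rate-limit-logs-per-second=100
-log.rate-limit-logs-per-second-burst=1000
```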
cortex_ingester_tsdb_head_min_timestamp_seconds
and cortex_ingester_tsdb_head_max_timestamp_seconds
metrics which return min and max time of all TSDB Heads open in an ingester. #5786 #5815-querier.max-estimated-fetched-chunks-per-query-multiplier
. #5765__name__
posting group in selection in order to reduce the number of object storage API calls. #5246timestamp()
function fail with execution: attempted to read series at index 0 from stream, but the stream has already been exhausted
if streaming chunks from ingesters to queriers is enabled. #5370memberlist_client_kv_store_count
metric that used to exist in Cortex, but got lost during dskit updates before Mimir 2.0. #5377cortex_ingester_client_request_duration_seconds
metric did not include streaming query requests that did not return any series. #5695not found
errors on label values API during head compaction. #5957MimirProvisioningTooManyActiveSeries
alert. You should configure -ingester.instance-limits.max-series
and rely on MimirIngesterReachingSeriesLimit
alert instead. #5593MimirProvisioningTooManyWrites
alert. The alerting threshold used in this alert was chosen arbitrarily, and ingesters receiving a higher number of samples per second don't necessarily have an issue. You should rely on SLO metrics and alerts instead. #5706MimirRequestErrors
or MimirRequestLatency
alert for the /debug/pprof
endpoint. #5826MimirIngestedDataTooFarInTheFuture
warning alert that triggers when Mimir ingests a sample with a timestamp more than 1h in the future. #5822MimirIngesterRestarts
to fire only when the ingester container is restarted, excluding cases where the pod is rescheduled. #5397MimirIngesterHasNotShippedBlocks
and MimirIngesterHasNotShippedBlocksSinceStart
alerts. #5396MimirGossipMembersMismatch
to include admin-api
and custom compactor pods. admin-api
is a GEM component. #5641 #5797_config.querier.concurrency
configuration option and replaced it with _config.querier_max_concurrency
and _config.ruler_querier_max_concurrency
to allow easily fine-tuning it for different querier deployments. #5322_config.multi_zone_ingester_max_unavailable
to 50. #5327maxSurge
and maxUnavailable
are set to 15%
and 0
. #5714autoscaling_alertmanager_enabled: true
. #5194 #5249track_sizes
feature for Memcached pods to help determine cache efficiency. #5209PodDisruptionBudget
s for compactor, continuous-test, distributor, overrides-exporter, querier, query-frontend, query-scheduler, rollout-operator, ruler, ruler-querier, ruler-query-frontend, ruler-query-scheduler, and all memcached workloads. #5098_config.shuffle_sharding.target_series_per_ingester
and _config.shuffle_sharding.target_utilization_percentage
values. #5470_config.autoscaling_distributor_cpu_target_utilization
. #5525_config.ruler_remote_evaluation_max_query_response_size_bytes
to easily set the maximum query response size allowed (in bytes). #5592GOMAXPROCS
based on the CPU request. This should reduce distributor CPU utilization, assuming the CPU request is set to a value close to the actual utilization. #5588GOMAXPROCS
based on the CPU request. This should reduce noisy neighbour issues created by the querier, whose CPU utilization could eventually saturate the Kubernetes node if unbounded. #5646 #5658null
in the *_env_map
objects (e.g. store_gateway_env_map+:: { 'field': null}
). #5599etcd
. #5589autoscaling_querier_target_utilization
(defaults to 0.75
)autoscaling_mimir_read_target_utilization
(defaults to 0.75
)autoscaling_ruler_querier_cpu_target_utilization
(defaults to 1
)autoscaling_distributor_memory_target_utilization
(defaults to 1
)autoscaling_ruler_cpu_target_utilization
(defaults to 1
)autoscaling_query_frontend_cpu_target_utilization
(defaults to 1
)autoscaling_ruler_query_frontend_cpu_target_utilization
(defaults to 1
)autoscaling_alertmanager_cpu_target_utilization
(defaults to 1
)v0.7.0
. #5718mimirtool analyse grafana
. This allows the tool to work correctly when running against Grafana instances with more than 1,000 dashboards. #5825__name__
matcher. #5911label_values(<label_name>)
when running mimirtool analyse grafana
. #5832Content-Type
response header from backend. Previously Content-Type: text/plain; charset=utf-8
was returned on all requests. #5183-proxy.compare-skip-recent-samples
to avoid racing with recording rule evaluation. #5561-backend.skip-tls-verify
to optionally skip TLS verification on backends. #5656get-started
documentation directory. #5476MimirRulerTooManyFailedQueries
runbook. #5586MimirRequestErrors
runbook for alertmanager. #5694-source-service
and -destination-service
flags are now required and the -service
flag has been removed. #5486-help
flag is passed. #5412All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.0...mimir-2.10.0
Published by ying-jeanne about 1 year ago
This release contains 2 PRs from 1 author. Thank you!
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.0...mimir-2.9.1
Published by colega about 1 year ago
This release contains 5 PRs from 3 authors. Thank you!
not found
errors on label values API during head compaction. #5957All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.0-rc.1...mimir-2.10.0-rc.2
Published by colega about 1 year ago
This release contains 12 PRs from 4 authors. Thank you!
-ruler-storage.cache.*
)-ingester.ring.excluded-zones
)-query-frontend.query-sharding-target-series-per-shard
)-query-frontend.results-cache-ttl-for-cardinality-query
)-query-frontend.results-cache-ttl-for-labels-query
)-query-frontend.max-query-expression-size-bytes
)-overrides-exporter.ring.enabled
)-overrides-exporter.enabled-metrics
)-query-frontend.results-cache-ttl
, -query-frontend.results-cache-ttl-for-out-of-order-time-window
)-tenant-federation.max-concurrent
to adjust the max number of per-tenant queries that can be run at a time when executing a single multi-tenant query. #5874mimirtool analyse grafana
. This allows the tool to work correctly when running against Grafana instances with more than 1,000 dashboards. #5825__name__
matcher. #5911label_values(<label_name>)
when running mimirtool analyse grafana
. #5832All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.0-rc.0...mimir-2.10.0-rc.1
Published by colega about 1 year ago
This release contains 434 PRs from 54 authors, including new contributors Aaron Sanders, Alexander Proschek, Aljoscha Pörtner, balazs92117, Francois Gouteroux, Franco Posa, Heather Yuan, jingyang, kendrickclark, m4r1u2, Milan Plžík, Samir Teymurov, Sven Haardiek, Thomas Schaaf, Tiago Posse. Thank you!
Pending; a draft version can be seen at: https://github.com/grafana/mimir/pull/5873
blocks_storage.bucket_store.index_header.verify_on_load: true
. #5174-querier.streaming-chunks-per-ingester-buffer-size
flag to 256. #5203ACTIVE
state in the ring. #5342-querier.prefer-streaming-chunks
to -querier.prefer-streaming-chunks-from-ingesters
to enable streaming chunks from ingesters to queriers. #5182-query-frontend.cache-unaligned-requests
has been moved from a global flag to a per-tenant override. #5312cortex_ingester_shipper_dir_syncs_total
and cortex_ingester_shipper_dir_sync_failures_total
metrics. The former metric was not very useful, and the latter was never incremented. #5396-shutdown-delay
flag is no longer experimental. #5701-validation.create-grace-period
is now enforced in the ingester, in addition to the distributor and query-frontend. If you've configured -validation.create-grace-period
then make sure the configuration is applied to ingesters too. #5712-validation.create-grace-period
is now enforced for exemplars too in the distributor. If an exemplar has a timestamp greater than "now + grace_period", the exemplar is dropped and the metric cortex_discarded_exemplars_total{reason="exemplar_too_far_in_future",user="..."}
is increased. #5761-validation.create-grace-period
is now enforced in the query-frontend even when the configured value is 0. When the value is 0, the query end time range is truncated to the current real-world time. #5829blocks-storage.bucket-store
and use the new configuration in blocks-storage.bucket-store.index-header
; the deprecated configuration will be removed in Mimir 2.12. Configuration changes: #5726
-blocks-storage.bucket-store.index-header-lazy-loading-enabled
is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-enabled
-blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout
is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
-blocks-storage.bucket-store.index-header-lazy-loading-concurrency
is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-concurrency
-blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled
, -blocks-storage.bucket-store.fine-grained-chunks-caching-ranges-per-series
. #5816blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup
distributor.service-overload-status-code-on-rate-limit-enabled
flag to return status code 529 instead of 429 upon rate limit exhaustion. #5752count_method
parameter which enables counting active series. #5136-query-frontend.cache-results
is enabled, and -query-frontend.results-cache-ttl-for-cardinality-query
or -query-frontend.results-cache-ttl-for-labels-query
set to a value greater than 0. The following metrics have been added to track the query results cache hit ratio per request_type
: #5212 #5235 #5426 #5524
cortex_frontend_query_result_cache_requests_total{request_type="query_range|cardinality|label_names_and_values"}
cortex_frontend_query_result_cache_hits_total{request_type="query_range|cardinality|label_names_and_values"}
-<prefix>.s3.list-objects-version
flag to configure the S3 list objects version. #5099-ingester.read-path-cpu-utilization-limit
-ingester.read-path-memory-utilization-limit
-ingester.log-utilization-based-limiter-cpu-samples
file
, rule_group
and rule_name
. #5291-ingester.ring.token-generation-strategy: spread-minimizing
and -ingester.ring.spread-minimizing-zones: <all available zones>
. In that case -ingester.ring.tokens-file-path
must be empty. #5308 #5324-ingester.ring.spread-minimizing-join-ring-in-order
that allows an ingester to register tokens in the ring only after all previous ingesters (with an ID lower than its own) have already been registered. #5541-blocks-storage.tsdb.early-head-compaction-min-in-memory-series
, and the ingester estimates that the per-tenant TSDB Head compaction will reduce in-memory series by at least -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage
. #5371cortex_ingester_active_native_histogram_series
, cortex_ingester_active_native_histogram_series_custom_tracker
, cortex_ingester_active_native_histogram_buckets
, cortex_ingester_active_native_histogram_buckets_custom_tracker
. The first 2 are the subsets of the existing and unmodified cortex_ingester_active_series
and cortex_ingester_active_series_custom_tracker
respectively, only tracking native histogram series, and the last 2 are the equivalents for tracking the number of buckets in native histogram series. #5318-<prefix>.s3.native-aws-auth-enabled
that allows enabling the default credentials provider chain of the AWS SDK. #5636-ingester.client.circuit-breaker.enabled
, -ingester.client.circuit-breaker.failure-threshold
, or -ingester.client.circuit-breaker.cooldown-period
or their corresponding YAML. #5650max_global_metadata_per_user
, max_global_metadata_per_metric
, request_rate
, request_burst_size
, alertmanager_notification_rate_limit
, alertmanager_max_dispatcher_aggregation_groups
, alertmanager_max_alerts_count
, alertmanager_max_alerts_size_bytes
) and added flag -overrides-exporter.enabled-metrics
to explicitly configure desired metrics, e.g. -overrides-exporter.enabled-metrics=request_rate,ingestion_rate
. Default value for this flag is: ingestion_rate,ingestion_burst_size,max_global_series_per_user,max_global_series_per_metric,max_global_exemplars_per_user,max_fetched_chunks_per_query,max_fetched_series_per_query,ruler_max_rules_per_rule_group,ruler_max_rule_groups_per_tenant
. #5376-timeseries-unmarshal-caching-optimization-enabled=false
. #5137-<prefix>.connect-timeout
-<prefix>.connect-backoff-base-delay
-<prefix>.connect-backoff-max-delay
-<prefix>.initial-stream-window-size
-<prefix>.initial-connection-window-size
-distributor.write-requests-buffer-pooling-enabled
to true
. #5195 #5805 #5830-querier.minimize-ingester-requests
option to initially query only the minimum set of ingesters required to reach quorum. #5202 #5259 #5263cortex_ruler_sync_rules_duration_seconds
metric, tracking the time spent syncing all rule groups owned by the ruler instance. #5311blocks-storage.bucket-store.index-header-lazy-loading-concurrency
config option to limit the number of concurrent index-headers loads when lazy loading. #5313 #5605cortex_querier_queries_rejected_total
metric that counts the number of queries rejected due to hitting a limit (e.g., max series per query or max chunks per query). #5316 #5440 #5450-querier.minimize-ingester-requests-hedging-delay
option to initiate requests to further ingesters when request minimisation is enabled and not all initial requests have completed. #5368-ingester.client.*
flags to make it clear that these are used by both queriers and distributors. #5375-querier.prefer-streaming-chunks-from-store-gateways=true
. #5182max-chunks-per-query
limit earlier in query processing when streaming chunks from ingesters to queriers to avoid unnecessarily consuming resources for queries that will be aborted. #5369 #5447cortex_ingester_shipper_last_successful_upload_timestamp_seconds
metric tracking the last successful TSDB block uploaded to the bucket (unix timestamp in seconds). #5396cortex_ingester_utilization_limiter_current_cpu_load
: The current exponential weighted moving average of the ingester's CPU loadcortex_ingester_utilization_limiter_current_memory_usage_bytes
: The current ingester memory utilizationinsight=true
field to ruler's prometheus component for rule evaluation logs. #5510cortex_distributor_instance_rejected_requests_total
and cortex_ingester_instance_rejected_requests_total
respectively. #5551-log.buffered
: Enable buffered logging-compactor.no-blocks-file-cleanup-enabled
option. #5648-store-gateway.sharding-ring.auto-forget-enabled
configuration parameter to control whether store-gateway auto-forget feature should be enabled or disabled (enabled by default). #5702cortex_block_upload_api_blocks_total
, cortex_block_upload_api_bytes_total
, and cortex_block_upload_api_files_total
. #5738-log.rate-limit-enabled
-log.rate-limit-logs-per-second
-log.rate-limit-logs-per-second-burst
cortex_ingester_tsdb_head_min_timestamp_seconds
and cortex_ingester_tsdb_head_max_timestamp_seconds
metrics which return min and max time of all TSDB Heads open in an ingester. #5786 #5815-querier.max-estimated-fetched-chunks-per-query-multiplier
. #5765__name__
posting group in selection in order to reduce the number of object storage API calls. #5246timestamp()
function fail with execution: attempted to read series at index 0 from stream, but the stream has already been exhausted
if streaming chunks from ingesters to queriers is enabled. #5370memberlist_client_kv_store_count
metric that used to exist in Cortex, but got lost during dskit updates before Mimir 2.0. #5377cortex_ingester_client_request_duration_seconds
metric did not include streaming query requests that did not return any series. #5695MimirProvisioningTooManyActiveSeries
alert. You should configure -ingester.instance-limits.max-series
and rely on MimirIngesterReachingSeriesLimit
alert instead. #5593MimirProvisioningTooManyWrites
alert. The alerting threshold used in this alert was chosen arbitrarily, and ingesters receiving a higher number of samples per second don't necessarily have an issue. You should rely on SLO metrics and alerts instead. #5706MimirRequestErrors
or MimirRequestLatency
alert for the /debug/pprof
endpoint. #5826MimirIngestedDataTooFarInTheFuture
warning alert that triggers when Mimir ingests a sample with a timestamp more than 1h in the future. #5822MimirIngesterRestarts
to fire only when the ingester container is restarted, excluding cases where the pod is rescheduled. #5397MimirIngesterHasNotShippedBlocks
and MimirIngesterHasNotShippedBlocksSinceStart
alerts. #5396MimirGossipMembersMismatch
to include admin-api
and custom compactor pods. admin-api
is a GEM component. #5641 #5797_config.querier.concurrency
configuration option and replaced it with _config.querier_max_concurrency
and _config.ruler_querier_max_concurrency
to allow easily fine-tuning it for different querier deployments. #5322_config.multi_zone_ingester_max_unavailable
to 50. #5327maxSurge
and maxUnavailable
are set to 15%
and 0
. #5714autoscaling_alertmanager_enabled: true
. #5194 #5249track_sizes
feature for Memcached pods to help determine cache efficiency. #5209PodDisruptionBudget
s for compactor, continuous-test, distributor, overrides-exporter, querier, query-frontend, query-scheduler, rollout-operator, ruler, ruler-querier, ruler-query-frontend, ruler-query-scheduler, and all memcached workloads. #5098_config.shuffle_sharding.target_series_per_ingester
and _config.shuffle_sharding.target_utilization_percentage
values. #5470_config.autoscaling_distributor_cpu_target_utilization
. #5525_config.ruler_remote_evaluation_max_query_response_size_bytes
to easily set the maximum query response size allowed (in bytes). #5592GOMAXPROCS
based on the CPU request. This should reduce distributor CPU utilization, assuming the CPU request is set to a value close to the actual utilization. #5588GOMAXPROCS
based on the CPU request. This should reduce noisy neighbour issues created by the querier, whose CPU utilization could eventually saturate the Kubernetes node if unbounded. #5646 #5658null
in the *_env_map
objects (e.g. store_gateway_env_map+:: { 'field': null}
). #5599etcd
. #5589autoscaling_querier_target_utilization
(defaults to 0.75
)autoscaling_mimir_read_target_utilization
(defaults to 0.75
)autoscaling_ruler_querier_cpu_target_utilization
(defaults to 1
)autoscaling_distributor_memory_target_utilization
(defaults to 1
)autoscaling_ruler_cpu_target_utilization
(defaults to 1
)autoscaling_query_frontend_cpu_target_utilization
(defaults to 1
)autoscaling_ruler_query_frontend_cpu_target_utilization
(defaults to 1
)autoscaling_alertmanager_cpu_target_utilization
(defaults to 1
)v0.7.0
. #5718Content-Type
response header from backend. Previously Content-Type: text/plain; charset=utf-8
was returned on all requests. #5183-proxy.compare-skip-recent-samples
to avoid racing with recording rule evaluation. #5561-backend.skip-tls-verify
to optionally skip TLS verification on backends. #5656get-started
documentation directory. #5476MimirRulerTooManyFailedQueries
runbook. #5586MimirRequestErrors
runbook for alertmanager. #5694-source-service
and -destination-service
flags are now required and the -service
flag has been removed. #5486-help
flag is passed. #5412All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.0...mimir-2.10.0-rc.0
Published by flxbk over 1 year ago
This release contains 252 PRs from 46 authors, including new contributors Alex R, Alexander Soelberg Heidarsson, Alexander Weaver, Benjamin Lazarecki, Dhanu Saputra, Dominik Süß, Fiona Liao, Jonathan Halterman, Kristian Bremberg, MattiasSegerdahl, Salva Corts, Stephanie Closson, willychrisza. Thank you!
Grafana Labs is excited to announce version 2.9 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bug fixes in this release. For the complete list of changes, see the changelog.
datacenter="dc1"
), Mimir 2.9 will fetch a reduced volume of index data, which leads to a significant reduction in memory allocations in the store-gateway.-ruler-storage.cache.*
CLI flags or their respective YAML config options.-ruler.poll-interval
, which has then been relaxed from every 1m
to every 10m
. The new behaviour is enabled globally by default but can be disabled with -ruler.sync-rules-on-changes-enabled=false
or tuned at a per-tenant level.The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.
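A minimal sketch of the ruler settings highlighted above; the memcached backend and address are assumptions for illustration only:

```
# Hypothetical cache backend and address
-ruler-storage.cache.backend=memcached
-ruler-storage.cache.memcached.addresses=dns+memcached.example.svc:11211
-ruler.poll-interval=10m                      # new, relaxed default
-ruler.sync-rules-on-changes-enabled=true     # default; set to false to disable change-triggered sync
```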
In Grafana Mimir 2.9 we have removed the following previously deprecated or experimental metrics:
cortex_bucket_store_chunk_pool_requested_bytes_total
cortex_bucket_store_chunk_pool_returned_bytes_total
The following configuration options are deprecated and will be removed in Grafana Mimir 2.11:

- `-querier.query-ingesters-within`. This configuration is moved to per-tenant overrides.
- `-blocks-storage.bucket-store.bucket-index.enabled`.
- `-blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes`, `-blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes` and `-blocks-storage.bucket-store.max-chunk-pool-bytes`.
- `-querier.iterators` and `-query.batch-iterators`.

The following configuration options that were deprecated in 2.7 are removed:

- `-blocks-storage.bucket-store.chunks-cache.subrange-size`. A fixed value of 16000 is now always used.
- `-blocks-storage.bucket-store.consistency-delay`.
- `-compactor.consistency-delay`.
- `-ingester.ring.readiness-check-ring-health`.

The following experimental options and features are now stable:

- `-query-frontend.query-sharding-max-regexp-size-bytes`.
- `-query-scheduler.max-used-instances`.
- `-(alertmanager|blocks|ruler)-storage.storage-prefix`.
- `-compactor.first-level-compaction-wait-period`.
- `-usage-stats.enabled` and `-usage-stats.installation-mode`.
- `-query-frontend.query-sharding-target-series-per-shard`.

The following configuration option defaults were changed:

- `-query-frontend.query-sharding-max-regexp-size-bytes` was changed from `0` to `4096`. As a result, queries with regex matchers exceeding this limit will not be sharded by default.
- `-compactor.partial-block-deletion-delay` was changed from `0s` to `1d`. As a result, partial blocks resulting from a failed block upload or deletion will be cleaned up automatically.
- `-ruler.poll-interval` was changed from `1m` to `10m`.

- `-query-frontend.query-sharding-max-regexp-size-bytes` default changed
from `0` to `4096`. #4932
- `-querier.query-ingesters-within` has been moved from a global flag to a per-tenant override. #4287
- `-blocks-storage.tsdb.retention-period` is now used instead of `-querier.query-ingesters-within` for calculating the lookback period for shuffle sharded ingesters. Setting `-querier.query-ingesters-within=0` no longer disables shuffle sharding on the read path. #4287
- The `/api/v1/upload/block/{block}/files` endpoint now allows file uploads with no `Content-Length`. #4956
- The chunk pool flags `-blocks-storage.bucket-store.max-chunk-pool-bytes`, `-blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes` and `-blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes` have been deprecated, and the metrics `cortex_bucket_store_chunk_pool_requested_bytes_total` and `cortex_bucket_store_chunk_pool_returned_bytes_total` have been removed. #4996
- The default value of `-compactor.partial-block-deletion-delay` changed to `1d`. This will automatically clean up partial blocks that were a result of a failed block upload or deletion. #5026
- `-compactor.consistency-delay` has been removed. #5050
- `-blocks-storage.bucket-store.consistency-delay` has been removed. #5050
- `-blocks-storage.bucket-store.bucket-index.enabled` has been deprecated and will be removed in Mimir 2.11. Mimir has run with the bucket index enabled by default since version 2.0, and starting from version 2.11 it will no longer be possible to disable it. #5051
- `-querier.iterators` and `-query.batch-iterators` have been deprecated and will be removed in Mimir 2.11. Mimir runs by default with `-querier.batch-iterators=true`, and starting from version 2.11 it will no longer be possible to change this. #5114
- The default value of `-compactor.first-level-compaction-wait-period` changed to `25m`. #5128
- The default value of `-ruler.poll-interval` changed from `1m` to `10m`. Starting from this release, the configured rule groups will also be re-synced each time they're modified by calling the ruler configuration API. #5170
- Added `-query-frontend.log-query-request-headers` to enable logging of request headers in query logs. #5030
- Added `-validation.max-native-histogram-buckets` to be able to ignore native histogram samples that have too many buckets. #4765
- Added a `stage` label to the metric `cortex_bucket_store_series_data_touched`. This label now applies to `data_type="chunks"` and `data_type="series"`. The `stage` label has two values: `processed`, the number of series that were parsed, and `returned`, the number of series selected from the processed bytes to satisfy the query. #4797 #4830
- The `__meta_tenant_id` label is now available in relabeling rules configured via `metric_relabel_configs`. #4725
- Added `compactor.block-upload-max-block-size-bytes` or `compactor_block_upload_max_block_size_bytes` to limit the byte size of uploaded or validated blocks. #4680
- Added `queryFromGeneratorURL`, returning the URL-decoded query from the `GeneratorURL` field of an alert. #4301
- Caching for the ruler storage can now be configured via the `-ruler-storage.cache.*` CLI flags or their respective YAML config options. #4950 #5054
- Added a `/store-gateway/prepare-shutdown` endpoint for gracefully scaling down store-gateways. A gauge `cortex_store_gateway_prepare_shutdown_requested` has been introduced for tracing this process. #4955
- `-server.log-request-headers` enables logging of HTTP request headers; `-server.log-request-headers-exclude-list` lists headers which should not be logged. #4922
- The `/api/v1/upload/block/{block}/files` endpoint now disables the read and write HTTP timeouts, overriding the `-server.http-read-timeout` and `-server.http-write-timeout` values. This is done to allow large file uploads to succeed. #4956
- Alertmanager metrics:
  - `cortex_alertmanager_notifications_failed_total` (added `reason` label)
  - `cortex_alertmanager_nflog_maintenance_total`
  - `cortex_alertmanager_nflog_maintenance_errors_total`
  - `cortex_alertmanager_silences_maintenance_total`
  - `cortex_alertmanager_silences_maintenance_errors_total`
- `cortex_request_duration_seconds` metric family. #4987
- Added `<prometheus-http-prefix>/api/v1/format_query` to format a PromQL query. #4373
- Added the `cortex_query_frontend_regexp_matcher_count` and `cortex_query_frontend_regexp_matcher_optimized_count` metrics to track optimization of regular expression label matchers. #4813
- Added the `-enable-go-runtime-metrics` flag to expose all Go runtime metrics as Prometheus metrics. #5009
- Rule group changes now trigger a re-sync without waiting for the next `-ruler.poll-interval`. The new behavior is enabled by default, but can be disabled with `-ruler.sync-rules-on-changes-enabled=false` (configurable on a per-tenant basis too). If you disable the new behavior, then you may want to revert `-ruler.poll-interval` to `1m`. #4975 #5053 #5115 #5170
- Index-header load time is now tracked in `cortex_bucket_store_series_request_stage_duration_seconds{stage="load_index_header"}`. Index header loading is now visible in the "Mimir / Queries" dashboard in the "Series request p99/average latency" panels. #5011 #5062
- `-querier.prefer-streaming-chunks=true`. #4886 #5078 #5094 #5126
- Updated the base image from `alpine:3.17.3` to `alpine:3.18.0`. #5065
- `meta.json` files. #5063
- The `-distributor.request-rate-limit` and `-distributor.request-burst-size` flags and their associated YAML configuration are now stable. #5124
- Added support for the `type=alert|record` query parameter for the API endpoint `<prometheus-http-prefix>/api/v1/rules`. #4302
- `MimirIngesterReachingTenantsLimit` runbook. #4744 #4752
- Added the `symbol table size exceeds` case to the `MimirCompactorHasNotSuccessfullyRunCompaction` runbook. #4945
- `MimirQuerierHighRefetchRate`. #4980
- `MimirTenantHasPartialBlocks`. This is obsoleted by the changed default of `-compactor.partial-block-deletion-delay` to `1d`, which will auto-remediate this alert. #5026
- `MimirIngesterTSDBWALCorrupted` now only fires when there is more than one corrupted WAL in single-zone deployments, and when there are more than two zones affected in multi-zone deployments. #4920
- `MimirRolloutStuck` and `MimirCompactorHasNotUploadedBlocks` rules, in order to distinguish them. #5023
- No longer fire the `MimirAllocatingTooMuchMemory` alert for any matching container outside of namespaces where Mimir is running. #5089
- `alertmanager_args` is now applied to `mimir-backend` when running in read-write deployment mode. Removed the hardcoded `filesystem` alertmanager storage; this moves alertmanager's data-dir to `/data/alertmanager` by default. #4907 #4921
- Removed the `-pdb` suffix from `PodDisruptionBudget` names. This will create new `PodDisruptionBudget` resources. Make sure to prune the old resources; otherwise, rollouts will be blocked. #5109
- Enabled `-query-frontend.query-sharding-target-series-per-shard` by default if the results cache is enabled. #5128
- Set `-blocks-storage.tsdb.head-compaction-interval=15m` to spread TSDB head compaction over a wider time range. #4870
- Set `-blocks-storage.tsdb.wal-replay-concurrency` to the CPU request minus 1. #4864
- Set `-compactor.first-level-compaction-wait-period` to the TSDB head compaction interval plus 10 minutes. #4872
- Set `GOMEMLIMIT` to the memory request value. This should reduce the likelihood that the store-gateway goes out of memory, at the cost of higher CPU utilization due to more frequent garbage collections when the memory utilization approaches or exceeds the configured memory request. #4971
- Set `GOMAXPROCS` based on the CPU request. This should reduce the likelihood that a high load on the store-gateway will slow down the entire Kubernetes node. #5104
- Added the `store_gateway_lazy_loading_enabled` configuration option, which combines disabling lazy loading and reducing blocks sync concurrency. Reducing blocks sync concurrency improves startup times with disabled lazy loading on HDDs. #5025
- Updated the `rollout-operator` image to `v0.6.0`. #5155
- Set `-ruler.alertmanager-url` to `mimir-backend` when running in read-write deployment mode. #4892
- `--strict` is provided. #5035
- Added `--namespaces-regex` and `--ignore-namespaces-regex`. #5100
- `-prometheus-http-prefix`. #4966
- Added `--folder-title` to limit dashboards analysis based on their exact folder title. #4973
- The `--service` flag is now required to be specified (accepted values are `gcs` or `abs`). #4756

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.8.0...mimir-2.9.0
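Several of the options in this release are per-tenant settings; for example, the `compactor_block_upload_max_block_size_bytes` YAML option noted above limits the byte size of uploaded or validated blocks. A hedged sketch, assuming it lives in the per-tenant limits block (the 1 GiB value is an example, not a default):

```yaml
# Hypothetical limits fragment; the value is an illustration only.
limits:
  compactor_block_upload_max_block_size_bytes: 1073741824  # 1 GiB
```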
Published by flxbk over 1 year ago
This release contains 260 PRs from 46 authors. Thank you!
). #4756Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.8.0...mimir-2.9.0-rc.1