Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
AGPL-3.0 License
Published by lamida over 1 year ago
This release contains 223 PRs from 53 authors, including new contributors Abdurrahman J. Allawala, Ashray Jain, Cyrill N, Daniel Barnes, Dave, David van der Spek, day4me, Devin Trejo, Dmitriy Okladin, Gabriel Santos, inbarpatashnik, Johannes Tandler, Julien Girard, KingJ, Miller, Rafał Boniecki, Raphael Ferreira, Raúl Marín, Ruslan Kovalov, Shagit Ziganshin, shanmugara, Wilfried ROSET. Thank you!
Grafana Labs is excited to announce version 2.8 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.
In Grafana Mimir 2.8 we have removed the following previously deprecated or experimental metrics:
cortex_bucket_store_series_get_all_duration_seconds
cortex_bucket_store_series_merge_duration_seconds
cortex_ingester_tsdb_wal_replay_duration_seconds
The following configuration options are deprecated and will be removed in Grafana Mimir 2.10:
-blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup and its respective YAML configuration option tsdb.max_tsdb_opening_concurrency_on_startup.
The following configuration options that were deprecated in 2.6 are removed:
-store.max-query-length and its respective YAML configuration option limits.max_query_length.
The following configuration options that were deprecated in 2.5 are removed:
-azure.msi-resource.
The following experimental options and features are now stable:
We changed the default value of the block storage retention period: the default for -blocks-storage.tsdb.retention-period was 24h and is now 13h.
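A minimal YAML sketch of the new default, assuming the configuration keys mirror the CLI flags -blocks-storage.tsdb.retention-period and -querier.query-store-after:

```yaml
# Sketch of the new retention default and the related querier setting.
blocks_storage:
  tsdb:
    retention_period: 13h   # new default (was 24h)
querier:
  # If you override this above the default 12h, increase the
  # retention period above accordingly.
  query_store_after: 12h
```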
-out-of-order-blocks-external-label-enabled has been renamed to -ingester.out-of-order-blocks-external-label-enabled. #4440
The metrics cortex_bucket_store_series_get_all_duration_seconds and cortex_bucket_store_series_merge_duration_seconds have been removed.
The default value of -blocks-storage.tsdb.retention-period changed from 24h to 13h. If you're running Mimir with a custom configuration and you're overriding -querier.query-store-after to a value greater than the default 12h, then you should increase -blocks-storage.tsdb.retention-period accordingly. #4382
-blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup has been deprecated and will be removed in Mimir 2.10. #4445
The cortex_ingester_tsdb_wal_replay_duration_seconds metric has been removed. #4465
The /api/v1/upload/block/{block}/finish endpoint now returns a 429 status code when the compactor has reached the limit specified by -compactor.max-block-upload-validation-concurrency. #4598
… a 413 status code is returned. #4683
tls-ca-path, tls-cert-path, and tls-key-path will denote the path in Vault for the following CLI flags when -vault.enabled is true: #4446
-distributor.ha-tracker.etcd.*
-distributor.ring.etcd.*
-distributor.forwarding.grpc-client.*
-querier.store-gateway-client.*
-ingester.client.*
-ingester.ring.etcd.*
-querier.frontend-client.*
-query-frontend.grpc-client-config.*
-query-frontend.results-cache.redis.*
-blocks-storage.bucket-store.index-cache.redis.*
-blocks-storage.bucket-store.chunks-cache.redis.*
-blocks-storage.bucket-store.metadata-cache.redis.*
-compactor.ring.etcd.*
-store-gateway.sharding-ring.etcd.*
-ruler.client.*
-ruler.alertmanager-client.*
-ruler.ring.etcd.*
-ruler.query-frontend.grpc-client-config.*
-alertmanager.sharding-ring.etcd.*
-alertmanager.alertmanager-client.*
-memberlist.*
-query-scheduler.grpc-client-config.*
-query-scheduler.ring.etcd.*
-overrides-exporter.ring.etcd.*
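As a sketch of how these Vault-backed TLS paths could be supplied for one of the client configurations above; the vault block keys and the secret paths here are assumptions based on the flag names, not confirmed syntax:

```yaml
vault:
  enabled: true
  # With Vault enabled, the TLS path options of the clients listed above
  # are read from Vault instead of the local filesystem.
ingester:
  ring:
    etcd:
      tls_ca_path: secret/data/mimir/etcd-ca      # assumed path in Vault
      tls_cert_path: secret/data/mimir/etcd-cert  # assumed path in Vault
      tls_key_path: secret/data/mimir/etcd-key    # assumed path in Vault
```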
… enabled via -query-frontend.query-result-response-format=protobuf on the query frontend. #4286 #4352 #4354 #4376 #4377 #4387 #4396 #4425 #4442 #4494 #4512 #4513 #4526
Added the -<prefix>.s3.storage-class flag to configure the S3 storage class for objects written to S3 buckets. #4300
Added freebsd to the target OS list when generating binaries for a Mimir release. #4654
Added a prepare-shutdown endpoint which can be used as part of Kubernetes scale-down automations. #4718
… JAEGER_SERVICE_NAME environment variable. #4394
… -query-frontend.query-result-response-format=protobuf. #4304 #4318 #4375
Added -compactor.first-level-compaction-wait-period to configure how long the compactor should wait before compacting first-level blocks (uploaded by ingesters). This option reduces the chance that the compactor begins compacting blocks before all ingesters have uploaded their blocks to the storage. #4401
… -ruler.query-frontend.query-result-response-format=protobuf. #4331
… .* and .+ regular expression label matchers. #4432
… (a|b|c). #4647
Added the -query-frontend.results-cache-ttl and -query-frontend.results-cache-ttl-for-out-of-order-time-window options. These values can also be specified per tenant. Default values are unchanged (7 days and 10 minutes respectively). #4385
Added -blocks-storage.tsdb.wal-replay-concurrency, representing the maximum number of CPUs used during WAL replay. #4445
Added cortex_ingester_tsdb_open_duration_seconds_total to measure the total time it takes to open all existing TSDBs. The time tracked by this metric also includes the TSDB WAL replay duration. #4465
… -blocks-storage.bucket-store.batch-series-size. #4464
Added the options -blocks-storage.tsdb.block-postings-for-matchers-cache-ttl, -blocks-storage.tsdb.block-postings-for-matchers-cache-size, and -blocks-storage.tsdb.block-postings-for-matchers-cache-force.
-compactor.block-upload-validation-enabled has been added; compactor_block_upload_validation_enabled can be used to override it per tenant. -compactor.block-upload.block-validation-enabled was the previous global flag and has been removed. Added -compactor.max-block-upload-validation-concurrency. #4598
… -ingester.native-histograms-ingestion-enabled to true. #4063 #4639
Added cortex_query_fetched_index_bytes_total to measure TSDB index bytes fetched to execute a query. #4597
… -query-frontend.max-query-expression-size-bytes or max_query_expression_size_bytes. #4604
… distributor_limits block in runtime configuration in addition to the existing configuration. #4619
Added the -query-frontend.query-sharding-max-regexp-size-bytes limit to the query-frontend. When set to a value greater than 0, the query-frontend disables query sharding for any query with a regexp matcher longer than the configured limit. #4632
… cortex_bucket_store_series* metrics. #4673
Updated the base image from alpine:3.17.2 to alpine:3.17.3. #4685
Added -blocks-storage.bucket-store.series-selection-strategy, which can limit the impact of large posting lists (when many series share the same label name and value). #4667 #4695 #4698
stage="processed" for the metrics cortex_bucket_store_series_data_touched and cortex_bucket_store_series_data_size_touched_bytes when using fine-grained chunks caching now reports the correct values of chunks held in memory. #4449
… /etc/default/mimir and /etc/sysconfig/mimir as config to prevent overwrite. #4587
_config.job_names.<job> values can now be arrays of regular expressions in addition to a single string. Strings are still supported and behave as before. #4543
… (.*-mimir-) to (.*mimir-). #4603
… from 0 to 50%, and max unavailable from 1 to 0. #4381
The -blocks-storage.bucket-store.index-cache.memcached.max-idle-connections, -blocks-storage.bucket-store.chunks-cache.memcached.max-idle-connections, and -blocks-storage.bucket-store.metadata-cache.memcached.max-idle-connections settings are now configured based on max-get-multi-concurrency and max-async-concurrency. #4591
Renamed memcached_*_enabled config options to cache_*_enabled, and memcached_*_max_item_size_mb config options to cache_*_max_item_size_mb; added cache_*_backend config options.
Added the alertmanager_data_disk_size and alertmanager_data_disk_class configuration options; by default no storage class is set. #4389
Updated rollout-operator to v0.4.0. #4524
… memcached:1.6.19-alpine. #4581
Updated memcached-exporter to v0.11.2. #4570
Added the autoscaling_query_frontend_memory_target_utilization, autoscaling_ruler_query_frontend_memory_target_utilization, and autoscaling_ruler_memory_target_utilization configuration options for controlling the corresponding autoscaler memory thresholds. Each has a default of 1, i.e. 100%. #4612
… distributor_instance_limits using runtime configuration. #4627
The gauge panel type is now supported in mimirtool analyze dashboard. #4679
… User-Agent header on requests to Mimir or Prometheus servers. #4700
… -tests.write-read-series-test.histogram-samples-enabled. The metrics exposed by the tool now have a new label called type with possible values of float, histogram_float_counter, histogram_float_gauge, histogram_int_counter, histogram_int_gauge. The list of metrics impacted: #4457
mimir_continuous_test_writes_total
mimir_continuous_test_writes_failed_total
mimir_continuous_test_queries_total
mimir_continuous_test_queries_failed_total
mimir_continuous_test_query_result_checks_total
mimir_continuous_test_query_result_checks_failed_total
Added a mimir_continuous_test_build_info metric that reports version information, similar to the existing cortex_build_info metric exposed by other Mimir components. #4712
… split-groups and split-and-merge-shards recommendation on component page. #4623
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.7.3...mimir-2.8.0
Published by pstibrany over 1 year ago
Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.7.2...mimir-2.7.3
Published by pstibrany over 1 year ago
Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.6.1...mimir-2.6.2
Published by lamida over 1 year ago
This release contains 2 PRs from 2 authors. Thank you!
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.8.0-rc.1...mimir-2.8.0-rc.2
Published by lamida over 1 year ago
This release contains 8 PRs from 2 authors. Thank you!
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.8.0-rc.0...mimir-2.8.0-rc.1
Published by aldernero over 1 year ago
This release contains 3 PRs from 2 authors. Thank you!
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.6.0...mimir-2.6.1
Published by aldernero over 1 year ago
This release contains 2 PRs from 2 authors. Thank you!
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.7.1...mimir-2.7.2
Published by lamida over 1 year ago
This release contains 210 PRs from 53 authors, including new contributors Abdurrahman J. Allawala, Ashray Jain, Cyrill N, Daniel Barnes, Dave, David van der Spek, day4me, Devin Trejo, Dmitriy Okladin, Gabriel Santos, inbarpatashnik, Johannes Tandler, Julien Girard, KingJ, Miller, Rafał Boniecki, Raphael Ferreira, Raúl Marín, Ruslan Kovalov, Shagit Ziganshin, shanmugara, Wilfried ROSET. Thank you!
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.7.1...mimir-2.8.0-rc.0
Published by aldernero over 1 year ago
This release contains 177 PRs from 43 authors, including new contributors Bartosz Cisek, dggmsa, gmintoco, Ihor Urazov, James Ross, Jean-Philippe Quéméner, Jon Gutschon, l3ioo, lpugoy, Nicolás Pazos, Oscar, Reto Kupferschmid, ying-jeanne. Thank you!
Grafana Labs is excited to announce version 2.7.1 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Note: During the release process, version 2.7.0 was tagged too early, before completing the release checklist and production testing. Release 2.7.1 doesn't include any code changes since 2.7.0, but now has proper release notes, published documentation, and has been fully tested in our production environment.
The new default value of 5000 for -blocks-storage.bucket-store.batch-series-size enables store-gateway streaming in the default configuration. This means that series are loaded from object storage in batches rather than buffering them all in memory before returning to the querier. Enabling streaming can reduce memory utilization peaks in the store-gateway.
New keep_firing_for option in ruler configuration: this option determines the amount of time an alert should keep firing while the ruler expression doesn't return results.
Fine-grained chunks caching can be enabled via -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled=true. This should reduce CPU, memory utilization, and receive bandwidth of a store-gateway.
The new -query-frontend.query-sharding-target-series-per-shard option allows query sharding to take into account the cardinality of similar requests executed previously when computing the maximum number of shards to use. If you want to try it out, we recommend starting with a value of 2500.
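A minimal sketch of trying out cardinality-based query sharding; the YAML key is an assumption mirroring the CLI flag name, so treat it as illustrative rather than confirmed syntax:

```yaml
# Illustrative: per-tenant limit assumed to mirror
# -query-frontend.query-sharding-target-series-per-shard.
limits:
  query_sharding_target_series_per_shard: 2500  # recommended starting value
```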
The new -ingester.native-histograms-ingestion-enabled flag controls whether native histograms are stored or ignored. Support for querying native histograms is not complete yet and is expected to be available in the next release.
New Alertmanager metrics: cortex_alertmanager_dispatcher_aggregation_groups and cortex_alertmanager_dispatcher_alert_processing_duration_seconds.
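The keep_firing_for ruler option follows standard Prometheus-style rule syntax; a minimal sketch, where the alert name, expression, and 10m value are illustrative:

```yaml
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: histogram_quantile(0.99, sum by (le) (rate(request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        # Keep the alert firing for 10 minutes after the expression
        # stops returning results, to avoid flapping.
        keep_firing_for: 10m
        labels:
          severity: warning
```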
The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.
In Grafana Mimir 2.7, the default values of the following configuration options have changed:
-blocks-storage.bucket-store.batch-series-size is now enabled by default with a value of 5000.
-ruler.evaluation-delay-duration has changed from 0 to 1m.
In Grafana Mimir 2.7, the following configuration options are now deprecated:
-blocks-storage.bucket-store.chunks-cache.subrange-size, since there's no benefit to changing the default of 16000.
-blocks-storage.bucket-store.consistency-delay has been deprecated and will be removed in Mimir 2.9.
-compactor.consistency-delay has been deprecated and will be removed in Mimir 2.9.
-ingester.ring.readiness-check-ring-health has been deprecated and will be removed in Mimir 2.9.
In Grafana Mimir 2.7, the following options, metrics, and labels have been removed:
-blocks-storage.ephemeral-tsdb.*
-distributor.ephemeral-series-enabled
-distributor.ephemeral-series-matchers
-ingester.max-ephemeral-series-per-user
-ingester.instance-limits.max-ephemeral-series
cortex_ingester_ephemeral_series
cortex_ingester_ephemeral_series_created_total
cortex_ingester_ephemeral_series_removed_total
cortex_ingester_ingested_ephemeral_samples_total
cortex_ingester_ingested_ephemeral_samples_failures_total
cortex_ingester_memory_ephemeral_users
cortex_ingester_queries_ephemeral_total
cortex_ingester_queried_ephemeral_samples
cortex_ingester_queried_ephemeral_series
The {__mimir_storage__="ephemeral"} selector no longer works. All label values with the ephemeral- prefix within the reason label of the cortex_discarded_samples_total metric are no longer available.
-blocks-storage.bucket-store.index-header.map-populate-enabled has been removed.
-blocks-storage.bucket-store.index-header.stream-reader-enabled has been removed.
-blocks-storage.bucket-store.index-header.stream-reader-max-idle-file-handles has been renamed to -blocks-storage.bucket-store.index-header.max-idle-file-handles, and the corresponding configuration file option has been renamed from stream_reader_max_idle_file_handles to max_idle_file_handles.
-ingester.ring.readiness-check-ring-health has been deprecated and will be removed in Mimir 2.9. #4422
Changed the default of the -ruler.evaluation-delay-duration option from 0 to 1m. #4250
Errors with status code 422 coming from the store-gateway are propagated and not converted to the consistency check error anymore. #4100
When a query hits the max_fetched_chunks_per_query and max_fetched_series_per_query limits, an error with the status code 422 is created and returned. #4056
Deprecated -blocks-storage.bucket-store.chunks-cache.subrange-size, since there's no benefit to changing the default of 16000. #4135
The following experimental options have been removed:
-blocks-storage.ephemeral-tsdb.*
-distributor.ephemeral-series-enabled
-distributor.ephemeral-series-matchers
-ingester.max-ephemeral-series-per-user
-ingester.instance-limits.max-ephemeral-series
The {__mimir_storage__="ephemeral"} selector no longer works. All label values with the ephemeral- prefix in the reason label of the cortex_discarded_samples_total metric are no longer available. The following metrics have been removed:
cortex_ingester_ephemeral_series
cortex_ingester_ephemeral_series_created_total
cortex_ingester_ephemeral_series_removed_total
cortex_ingester_ingested_ephemeral_samples_total
cortex_ingester_ingested_ephemeral_samples_failures_total
cortex_ingester_memory_ephemeral_users
cortex_ingester_queries_ephemeral_total
cortex_ingester_queried_ephemeral_samples
cortex_ingester_queried_ephemeral_series
-blocks-storage.bucket-store.index-header.map-populate-enabled has been removed.
-blocks-storage.bucket-store.index-header.stream-reader-enabled has been removed.
-blocks-storage.bucket-store.index-header.stream-reader-max-idle-file-handles has been renamed to -blocks-storage.bucket-store.index-header.max-idle-file-handles, and the corresponding configuration file option has been renamed from stream_reader_max_idle_file_handles to max_idle_file_handles.
The default of -blocks-storage.bucket-store.batch-series-size is 5000. #4330
-compactor.consistency-delay has been deprecated and will be removed in Mimir 2.9. #4409
-blocks-storage.bucket-store.consistency-delay has been deprecated and will be removed in Mimir 2.9. #4409
Added keep_firing_for support to alerting rules. #4099
-ingester.native-histograms-ingestion-enabled controls whether native histograms are stored or ignored. #4159
Added -query-frontend.query-sharding-target-series-per-shard to allow query sharding to take into account the cardinality of similar requests executed previously. This feature uses the same cache that's used for results caching. #4121 #4177 #4188 #4254
Added the out_of_order_blocks_external_label_enabled shipper option to label out-of-order blocks before shipping them to cloud storage. #4182 #4297
Added a reason label to cortex_compactor_runs_failed_total. The value can be shutdown or error. #4012
… max_fetched_series_per_query. #4056
Added a data_type label with values postings, series, and chunks on cortex_bucket_store_partitioner_extended_ranges_total, cortex_bucket_store_partitioner_expanded_ranges_total, cortex_bucket_store_partitioner_requested_ranges_total, cortex_bucket_store_partitioner_expanded_bytes_total, and cortex_bucket_store_partitioner_requested_bytes_total. #4095
Added the cortex_frontend_query_response_codec_duration_seconds and cortex_frontend_query_response_codec_payload_bytes metrics to measure the time taken and bytes read/written while encoding and decoding query result payloads. #4110
Added the metrics cortex_alertmanager_dispatcher_aggregation_groups and cortex_alertmanager_dispatcher_alert_processing_duration_seconds. #4151
… -query-frontend.query-result-response-format=protobuf. #4153
… -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled=true. #4163 #4174 #4227
Added encode and other stages to the cortex_bucket_store_series_request_stage_duration_seconds metric. #4179
Updated the base image from alpine:3.17.1 to alpine:3.17.2. #4240
Added a stage label to the metrics cortex_bucket_store_series_data_fetched, cortex_bucket_store_series_data_size_fetched_bytes, cortex_bucket_store_series_data_touched, and cortex_bucket_store_series_data_size_touched_bytes. This label only applies to data_type="chunks". For fetched metrics with data_type="chunks", the stage label has 2 values: fetched (the chunks or bytes that were fetched from the cache or the object store) and refetched (the chunks or bytes that had to be refetched from the cache or the object store because their size was underestimated during the first fetch). For touched metrics with data_type="chunks", the stage label has 2 values: processed (the chunks or bytes that were read from the fetched chunks or bytes and were processed in memory) and returned (the chunks or bytes that were selected from the processed bytes to satisfy the query). #4227 #4316
… compactor.partial-block-deletion-delay to potentially issue fewer requests to object storage. #4246
Added -*.memcached.min-idle-connections-headroom-percentage support to configure the minimum number of idle connections to keep open as a percentage (0-100) of the number of recently used idle connections. This feature is disabled when set to a negative value (default), which means idle connections are kept open indefinitely. #4249
Added -compactor.block-upload.block-validation-enabled with the default true to configure whether block validation occurs on backfilled blocks. #3411
… -blocks-storage.tsdb.head-compaction-interval. Subsequent checks will happen at the configured interval. This should help to spread the TSDB head compaction among different ingesters over the configured interval. #4364
-blocks-storage.tsdb.head-compaction-interval has been increased from 5m to 15m. #4364
Return Canceled rather than an Aborted or Internal error when the calling querier cancels a label names or values request, and return Internal if processing the request fails for another reason. #4061
… 499 in the metrics instead of 503 or 422. #4099
… /ingester/flush or when TSDB is idle. #4180
The conversion of the max-series-per-user, max-series-per-metric, max-metadata-per-user, and max-metadata-per-metric limits into corresponding local limits now takes into account the number of ingesters in each zone. #4238
… the cortex_ingester_memory_series metric consistently with cortex_ingester_memory_series_created_total and cortex_ingester_memory_series_removed_total. #4312
The MimirMemcachedRequestErrors alert has been renamed to MimirCacheRequestErrors. #4242
Added a MimirAutoscalerKedaFailing alert, firing when a KEDA scaler is failing. #4045
Changed MimirAutoscalerNotActive to not fire if the scaling metric does not exist, to avoid false positives on scaled objects with 0 min replicas. #4045
MimirCompactorHasNotSuccessfullyRunCompaction is no longer triggered by frequent compactor restarts. #4012
… the query-frontend-discovery service only when Mimir is deployed in microservice mode without query-scheduler. #4353
… ruler-query-frontend configuration to allow cache reuse for cardinality-estimation based sharding. #4257
Added a weight param to newQuerierScaledObject and newRulerQuerierScaledObject to allow running multiple querier deployments on different node types. #4141
… $._config.shuffle_sharding.*. #4363
Added keep_firing_for support to rules configuration. #4099
Added -tls-insecure-skip-verify to rules, alertmanager and backfill commands. #4162
… -backend.read-timeout to 150s, to accommodate the default querier and query frontend timeout of 120s. #4262
… X-Scope-OrgID header when logging a comparison failure. #4262
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.6.0...mimir-2.7.1
Published by 56quarters over 1 year ago
This release contains 259 PRs from 40 authors, including new contributors breadly7, bubu11e, Đurica Yuri Nikolić, Felix Beuke, Jack, klagroix, Martin Chodur, Ørjan Ommundsen, Sascha Sternheim, Wu Zhiyuan. Thank you!
Grafana Labs is excited to announce version 2.6 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Lower memory usage in store-gateway by streaming series results
The store-gateway can now stream results back to the querier instead of buffering them. This is expected to greatly reduce peak memory consumption while keeping latency the same. This is still an experimental feature, but Grafana Labs is already running it in production and there are no known issues. This feature can be enabled by setting the -blocks-storage.bucket-store.batch-series-size configuration option (if you want to try it out, we recommend setting it to 5000).
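A minimal sketch of enabling store-gateway streaming with the recommended value, assuming the YAML key mirrors the CLI flag:

```yaml
blocks_storage:
  bucket_store:
    # Load series in batches of 5000 instead of buffering all of them
    # in memory before returning them to the querier.
    batch_series_size: 5000
```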
Improved stability in store-gateway by removing mmap usage
The store-gateway can now use an alternate code path to read index-headers that does not use memory-mapped files. This is expected to improve the stability of the store-gateway. This is still an experimental feature, but Grafana Labs is already running it in production and there are no known issues. The feature can be enabled by setting -blocks-storage.bucket-store.index-header.stream-reader-enabled=true
.
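For completeness, a sketch of the same setting in YAML form (key layout assumed from the CLI flag name, not confirmed by these notes):

```yaml
# Equivalent of -blocks-storage.bucket-store.index-header.stream-reader-enabled=true
blocks_storage:
  bucket_store:
    index_header:
      stream_reader_enabled: true
```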
Webex support
Alertmanager can now use Webex to send alerts.
tenantID template function
A new template function `tenantID`, returning the ID of the tenant owning the alert, has been added.
grafanaExploreURL template function
A new template function `grafanaExploreURL`, returning the URL to the Grafana Explore page with a range query, has been added.
The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the corresponding documentation for more information.
In Grafana Mimir 2.6 we have removed the following previously deprecated or experimental configuration options:
- `-blocks-storage.bucket-store.max-concurrent-reject-over-limit` and its respective YAML configuration option `blocks_storage.bucket_store.max_concurrent_reject_over_limit`.
- `-query-frontend.align-querier-with-step` and its respective YAML configuration option `frontend.align_querier_with_step`.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.8:
- `-store.max-query-length` and its respective YAML configuration option `limits.max_query_length` have been replaced with `-querier.max-partial-query-length` and `limits.max_partial_query_length`.

The following experimental options and features are now stable:
- `-query-frontend.max-total-query-length` and its respective YAML configuration option `limits.max_total_query_length`.
- `-distributor.request-rate-limit` and `-distributor.request-burst-limit` and their respective YAML configuration options `limits.request_rate_limit` and `limits.request_rate_burst`.
- `-ingester.max-global-exemplars-per-user` and its respective YAML configuration option `limits.max_global_exemplars_per_user`.
- `-ingester.tsdb-config-update-period` and its respective YAML configuration option `ingester.tsdb_config_update_period`.
- `/api/v1/query_exemplars`.

Notable bugfixes:
- Updated `github.com/thanos-io/objstore` to address an issue with Multipart PUT on S3-compatible object storage. PR 3802, PR 3821
- Fixed handling when `metric_relabel_configs` in overrides contains a null element. PR 3868
- Added `-querier.max-partial-query-length` to limit the time range for partial queries at the querier level, deprecating `-store.max-query-length`. #3825 #4017
- Removed the `-blocks-storage.bucket-store.max-concurrent-reject-over-limit` flag. #3706
- `-query-frontend.align-querier-with-step` has been removed. #3982
- Store-gateway series streaming, enabled by setting `-blocks-storage.bucket-store.batch-series-size` to a value in the high thousands (5000-10000). This is still an experimental feature and is subject to a changing API and instability. #3540 #3546 #3587 #3606 #3611 #3620 #3645 #3355 #3697 #3666 #3687 #3728 #3739 #3751 #3779 #3839
- Added the `-validation.separate-metrics-group-label` flag. This allows further separation of the `cortex_discarded_samples_total` metric by an additional `group` label, which is configured by this flag to be the value of a specific label on an incoming timeseries. Active groups are tracked and inactive groups are cleaned up on a defined interval. The maximum number of groups tracked is controlled by the `-max-separate-metrics-groups-per-user` flag. #3439
- Added `-overrides-exporter.ring.enabled`. When enabled, the ring is used to establish a leader replica for the export of limit override metrics. #3908 #3953
- Experimental ephemeral storage: series are kept for a short period (controlled by `-blocks-storage.ephemeral-tsdb.retention-period`, which defaults to 10 minutes), and then removed from memory. To use ephemeral storage, the distributor must be configured with the `-distributor.ephemeral-series-enabled` option. Series matching `-distributor.ephemeral-series-matchers` will be marked for storing into ephemeral storage in ingesters. Each tenant needs to have ephemeral storage enabled by using the `-ingester.max-ephemeral-series-per-user` limit, which defaults to 0 (no ephemeral storage). Ingesters have a new `-ingester.instance-limits.max-ephemeral-series` limit for the total number of series in ephemeral storage across all tenants. If ingestion of samples into ephemeral storage fails, the `cortex_discarded_samples_total` metric will use values prefixed with `ephemeral-` for the `reason` label. Querying of ephemeral storage is possible by using `{__mimir_storage__="ephemeral"}` as a metric selector. The following new metrics related to ephemeral storage are introduced: #3897 #3922 #3961 #3997 #4004
cortex_ingester_ephemeral_series
cortex_ingester_ephemeral_series_created_total
cortex_ingester_ephemeral_series_removed_total
cortex_ingester_ingested_ephemeral_samples_total
cortex_ingester_ingested_ephemeral_samples_failures_total
cortex_ingester_memory_ephemeral_users
cortex_ingester_queries_ephemeral_total
cortex_ingester_queried_ephemeral_samples
cortex_ingester_queried_ephemeral_series
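The ephemeral-storage flags described above could be wired into YAML roughly as follows. This is a sketch only: the key names are assumed from the CLI flag names, and the feature is experimental and subject to change.

```yaml
# Hypothetical YAML equivalents of the ephemeral-storage flags above.
distributor:
  ephemeral_series_enabled: true
blocks_storage:
  ephemeral_tsdb:
    retention_period: 10m   # default per the notes above
limits:
  max_ephemeral_series_per_user: 100000  # 0 (the default) disables ephemeral storage
```

Series in ephemeral storage can then be queried with the `{__mimir_storage__="ephemeral"}` selector, as noted above.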
- `thanos_shipper_last_successful_upload_time`: Unix timestamp (in seconds) of the last successful TSDB block uploaded to the bucket. #3627
- Added `-ruler.alertmanager-client.tls-enabled` configuration for the alertmanager client. #3432 #3597
- `component=activity-tracker` label. #3556
- Experimental store-gateway index-header stream reader, enabled via `-blocks-storage.bucket-store.index-header.stream-reader-enabled`. #3639 #3691 #3703 #3742 #3785 #3787 #3797
- Added the `cortex_query_scheduler_cancelled_requests_total` metric to track the number of requests that are already cancelled when dequeued. #3696
- Added the `cortex_bucket_store_partitioner_extended_ranges_total` metric to keep track of the ranges that the partitioner decided to overextend and merge in order to save API calls to the object storage. #3769
- Changed the default of `-ruler.for-grace-period` from `10m` to `2m` and updated the help text. The new default value reflects how we operate Mimir at Grafana Labs. #3817
- New experimental TSDB head postings-for-matchers cache flags: `-blocks-storage.tsdb.head-postings-for-matchers-cache-ttl`, `-blocks-storage.tsdb.head-postings-for-matchers-cache-size`, `-blocks-storage.tsdb.head-postings-for-matchers-cache-force`.
- Added the template function `tenantID`, returning the ID of the tenant owning the alert. #3758
- Added the template function `grafanaExploreURL`, returning the URL to Grafana Explore with a range query. #3849
- Updated the base image from `alpine:3.16.2` to `alpine:3.17.1`. #3898
- Added the `/ingester/tsdb_metrics` endpoint to return tenant-specific TSDB metrics. #3923
- `-query-frontend.max-total-query-length` and its associated YAML configuration is now stable. #3882
- Added the `align_evaluation_time_on_interval` field, which causes all evaluations to happen on interval-aligned timestamps. #4013
- Fixed the `unsupported value type` error returned when calling `/ready` while some services are not running. #3625
- Fixed the `cortex_bucket_store_partitioner_requested_bytes_total` metric to not double count overlapping ranges. #3769
- Updated `github.com/thanos-io/objstore` to address an issue with Multipart PUT on S3-compatible object storage. #3802 #3821
- `cortex_` prefix as expected by dashboards. #3809
- Fixed handling when `metric_relabel_configs` in overrides contains a null element. #3868
- Return a `Canceled` rather than an `Aborted` error when the calling querier cancels the request. #4007
- Added the `MimirIngesterInstanceHasNoTenants` alert that fires when an ingester replica is not receiving write requests for any tenant. #3681
- Updated `MimirAllocatingTooMuchMemory` to check read-write deployment containers. #3710
- Added the `MimirAlertmanagerInstanceHasNoTenants` alert that fires when an alertmanager instance owns no tenants. #3826
- Added the `MimirRulerInstanceHasNoRuleGroups` alert that fires when a ruler replica is not assigned any rule group to evaluate. #3723
- (`$._config.autoscale.querier.hpa_name`). #3962
- `MimirIngesterRestarts` alert when Mimir is deployed in read-write mode. #3716
- `MimirIngesterHasNotShippedBlocks` and `MimirIngesterHasNotShippedBlocksSinceStart` alerts for when Mimir is deployed in read-write or monolithic modes, updated to use the new `thanos_shipper_last_successful_upload_time` metric. #3627
- `MimirMemoryMapAreasTooHigh` alert when Mimir is deployed in read-write mode. #3626
- Fixed `MimirCompactorSkippedBlocksWithOutOfOrderChunks` matching on a non-existent label. #3628
- Fixed the `Rollout Progress` dashboard incorrectly using Gateway metrics when the Gateway was not enabled. #3709
- Fixed `MimirCompactorHasNotUploadedBlocks` to not fire if the compactor has nothing to do. #3793
- Fixed `MimirAutoscalerNotActive` to not fire if the scaling metric is 0, to avoid false positives on scaled objects with 0 min replicas. #3999
- Replaced `policy/v1beta1` with `policy/v1` when configuring a PodDisruptionBudget for read-write deployment mode. #3811
- Removed the `-server.http-write-timeout` default option value from querier and query-frontend, as it defaults to a higher value in the code now and cannot be lower than `-querier.timeout`. #3836
- Replaced `-store.max-query-length` with `-query-frontend.max-total-query-length` in the query-frontend config. #3879
- Increased `mimir_backend_data_disk_size` from `100Gi` to `250Gi`. #3894
- Updated `rollout-operator` to `v0.2.0`. #3624
- Added `user_24M` and `user_32M` classes to operations config. #3367
- Updated `memcached:1.6.16-alpine` to `memcached:1.6.17-alpine`. #3914
- Fixed the `mimir-write` and `mimir-read` Kubernetes services to correctly balance requests among pods. #3855 #3864 #3906
- Updated the `ruler-query-frontend` and `mimir-read` gRPC server configuration to force clients to periodically re-resolve the backend addresses. #3862
- Fixed `mimir-read` CLI flags to ensure query-frontend configuration takes precedence over querier configuration. #3877
- Fixed `mimirtool config convert` to work with Mimir 2.4, 2.5, and 2.6 changes. #3952
- mimirtool can now be installed with `brew install mimirtool`. #3776
- Added `--concurrency` to the `mimirtool rules sync` command. #3996
- Fixed `mimirtool rules sync` to display the correct number of groups created and updated. #3918
- The `-querier.max-concurrent` flag must also be set for the query-frontend. #3678

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.5.0...mimir-2.6.0
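For the `-querier.max-concurrent` note above, a sketch of carrying the same value on both components (the YAML key layout is assumed from the flag name):

```yaml
# Shared between querier and query-frontend deployments; the query-frontend
# must see the same -querier.max-concurrent value as the queriers.
querier:
  max_concurrent: 16
```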
Published by 56quarters over 1 year ago
This release contains 255 PRs from 40 authors, including new contributors breadly7, bubu11e, Đurica Yuri Nikolić, Felix Beuke, Jack, klagroix, Martin Chodur, Ørjan Ommundsen, Sascha Sternheim, Wu Zhiyuan. Thank you!
Grafana Labs is excited to announce version 2.6.0-rc.0 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
The release notes for 2.6.0-rc.0 match those of the final 2.6 release above.
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.5.0...mimir-2.6.0-rc.0
Published by pstibrany almost 2 years ago
This release contains 230 PRs from 43 authors, including new contributors Aldo D'Aquino, Anıl Mısırlıoğlu, Charles Korn, Danny Staple, Dylan Crees, Eduardo Silvi, FG, Jesse Weaver, KarlisAG, Leegin-darknight, Rohan Kumar, Wille Faler, Y.Horie, manohar-koukuntla, paulroche, songjiayang, Éamon Ryan. Thank you!
Grafana Labs is excited to announce version 2.5 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Alertmanager Discord support
Alertmanager can now be configured to send alerts in Discord channels.
Configurable TLS minimum version and cipher suites
We added the flags -server.tls-min-version
and -server.tls-cipher-suites
that can be used to define the minimum TLS version and the supported cipher suites in all HTTP and gRPC servers in Mimir.
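A minimal YAML sketch of the new server TLS settings (the key names are assumed from the flag names, and the value formats shown are illustrative, not confirmed by these notes):

```yaml
server:
  tls_min_version: VersionTLS12
  tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
```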
Lower memory usage in store-gateway, ingester and alertmanager
We made various changes related to how index lookups are performed and how the active series custom trackers are implemented, which results in better performance and lower overall memory usage in the store-gateway and ingester.
We also optimized the alertmanager, which results in a 50% reduction in memory usage in use cases with larger numbers of tenants.
Improved Mimir dashboards
We added two new dashboards named Mimir / Overview resources
and Mimir / Overview networking
. Furthermore, we have made various improvements to the following existing dashboards:
- `Mimir / Overview`: Add "remote read", "metadata", and "exemplar" queries.
- `Mimir / Writes`: Add an optional row about the distributor's new forwarding feature.
- `Mimir / Tenants`: Add insights into the read path.

Zone-aware replication
Helm now supports deploying the ingesters and store-gateways across different availability zones. The replication is also zone-aware: multiple instances in one zone can fail without any service interruption, and rollouts can be performed faster because many instances in each zone can be restarted together, as opposed to all of them restarting in sequence.
This is a breaking change; for details on how to upgrade, please review the Helm changelog.
Running without root privileges
Mimir, GEM, and Agent processes no longer require root privileges to run.
Unified reverse proxy (`gateway`) configuration for Mimir and GEM
This change allows for an easier upgrade path from Mimir to GEM, without any downtime. The unified configuration also makes it possible to autoscale the GEM gateway pods, and it supports OpenShift Route. The change also deprecates the `nginx` section in the configuration; the section will be removed in release `7.0.0`.
Updated MinIO
The MinIO sub-chart was updated from `4.x` to `5.0.0`. Note that this update inherits a breaking change, because the MinIO gateway mode was removed.
Updated sizing plans
We updated our sizing plans to better reflect how we recommend running Mimir and GEM in production. Note that this includes a breaking change for users of the "small" plan; more details can be found in the Helm changelog.
Various quality of life improvements
- Added `Overrides` as a dependency to prevent panics when starting with `-target=flusher`. PR 3151
- `-azure.msi-resource` is now ignored, and will be removed in Mimir 2.7. This setting is now made automatically by Azure. #2682
- `-blocks-storage.tsdb.out-of-order-capacity-min` has been removed. #3261
- The default value of `-server.http-write-timeout` has changed from 30s to 2m. #3346
- Added the `-server.tls-min-version` and `-server.tls-cipher-suites` flags to configure cipher suites and the minimum TLS version supported by HTTP and gRPC servers. #2898
- `cortex_discarded_samples_total{reason="forwarded-sample-too-old"}` is increased. #3049 #3113
- (`--validation.create-grace-period`) to avoid querying too far into the future. #3172
- Added the `-usage-stats.installation-mode` configuration to track the installation mode via the anonymous usage statistics. #3244
- Added the `cortex_compactor_block_max_time_delta_seconds` histogram for detecting if compaction of blocks is lagging behind. #3240 #3429
- The `X-Scope-OrgId` header is included in requests forwarded to the configured forwarding endpoint. #3283 #3385
- Added `-shutdown-delay` to allow components to wait after receiving SIGTERM and before stopping. During this time the component returns 503 from the `/ready` endpoint. #3298
- Added the `RulerRemoteEvaluationFailing` alert, firing when communication between ruler and frontend fails in remote operational mode. #3177 #3389
- Added `Overrides` as a dependency to prevent panics when starting with `-target=flusher`. #3151
- Updated the `golang.org/x/text` dependency to fix CVE-2022-32149. #3285
- Increased the `MimirSchedulerQueriesStuck` `for` time to 7 minutes to account for the time it takes for HPA to scale up. #3223
- Removed the `Querier > Stages` panel from the `Mimir / Queries` dashboard. #3311
- The `autoscaling` section of the configuration has changed to support more components. #3378
  - `autoscaling.querier_enabled` becomes `autoscaling.querier.enabled`.
- `Mimir / Writes` dashboard. #3182 #3394 #3394 #3461
- Added panels to `Mimir / Writes` for distributor autoscaling metrics. #3378
- `persistentvolumeclaim` when using `deployment_type=baremetal` for `Disk space utilization` panels. #3173 #3184
- `MimirGossipMembersMismatch` alert when Mimir is deployed in read-write mode. #3489
- Replaced `policy/v1beta1` with `policy/v1` when configuring a PodDisruptionBudget. #3284
- `blocks_storage_backend` was renamed to `storage_backend` and is now used as the common storage backend for all components.
- `blocks_storage_azure_account_(name|key)` and `blocks_storage_s3_endpoint` configurations.
- `storage_s3_endpoint` is now rendered by default using the `aws_region` configuration instead of a hardcoded `us-east-1`.
- `ruler_client_type` and `alertmanager_client_type` were renamed to `ruler_storage_backend` and `alertmanager_storage_backend` respectively, and their corresponding CLI flags won't be rendered unless explicitly set to a value different from the one in `storage_backend` (like `local`).
- `alertmanager_s3_bucket_name`, `alertmanager_gcs_bucket_name` and `alertmanager_azure_container_name` have been removed, and replaced by a single `alertmanager_storage_bucket_name` configuration used for all object storages.
- The `genericBlocksStorageConfig` configuration object was removed, so any extensions to it will now be ignored. Use `blockStorageConfig` instead.
- The `rulerClientConfig` and `alertmanagerStorageClientConfig` configuration objects were renamed to `rulerStorageConfig` and `alertmanagerStorageConfig` respectively, so any extensions to their previous names will now be ignored. Use the new names instead.
- `*.s3.region` configurations are no longer rendered, as they are optional and the region can be inferred by Mimir by performing an initial API call to the endpoint.

To migrate:
- Rename the `blocks_storage_backend` key to `storage_backend`.
- Rename the `blocks_storage_(azure|s3)_*` configurations to `storage_(azure|s3)_*`.
- If the `ruler_storage_(azure|s3)_*` and `alertmanager_storage_(azure|s3)_*` keys were different from the `block_storage_*` ones, they should now be provided using CLI flags; see the configuration reference for more details.
- Remove `ruler_client_type` and `alertmanager_client_type` if their value matches `storage_backend`, or rename them to their new names otherwise.
- Remove any extensions to `genericBlocksStorageConfig`, `rulerClientConfig` and `alertmanagerStorageClientConfig`, moving them to the corresponding new options.
- Replace the removed alertmanager bucket name configurations with the single `alertmanager_storage_bucket_name` key.

The `overrides-exporter.libsonnet` file is now always imported. The overrides-exporter can be enabled in jsonnet by setting the following: #3379
{
_config+:: {
overrides_exporter_enabled: true,
}
}
The read-write deployment mode can be enabled in jsonnet by setting the following:
{
_config+:: {
deployment_mode: 'read-write',
// See operations/mimir/read-write-deployment.libsonnet for more configuration options.
mimir_write_replicas: 3,
mimir_read_replicas: 2,
mimir_backend_replicas: 3,
}
}
- `mimir-read` component when running the read-write deployment model. #3419
- Added `$._config.usageStatsConfig` to track the installation mode via the anonymous usage statistics. #3294
- (`$._config.query_tee_node_port`) is now optional. #3272
- Added the `mimirtool alertmanager verify` command to validate configuration without uploading. #3440
- Added the `mimirtool rules delete-namespace` command to delete all of the rule groups in a namespace, including the namespace itself. #3136
- `mimirtool analyze prometheus`: add concurrency and resiliency. #3349
  - `--concurrency` flag. Default: number of logical CPUs.
- `--log.level=debug` now correctly prints the response from the remote endpoint when a request fails. #3180
- Added the `MimirQuerierAutoscalerNotActive` runbook. #3186
- Updated the `MimirSchedulerQueriesStuck` runbook to reflect debug steps with querier auto-scaling enabled. #3223
- Added the `copyblocks` tool, to copy Mimir blocks between two GCS buckets. #3264
- `label` when running the `prepare` command. #3236

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.4.0...mimir-2.5.0
Published by replay almost 2 years ago
This release contains 227 PRs from 43 authors, including new contributors Aldo D'Aquino, Anıl Mısırlıoğlu, Charles Korn, Danny Staple, Dylan Crees, Eduardo Silvi, FG, Jesse Weaver, KarlisAG, Leegin-darknight, Rohan Kumar, Wille Faler, Y.Horie, manohar-koukuntla, paulroche, songjiayang, Éamon Ryan. Thank you!
Grafana Labs is excited to announce version 2.5.0-rc.0 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Alertmanager Discord support
Alertmanager can now be configured to send alerts in Discord channels.
Configurable TLS minimum version and cipher suites
We added the flags -server.tls-min-version
and -server.tls-cipher-suites
that can be used to define the minimum TLS version and the supported cipher suites in all HTTP and gRPC servers in Mimir.
Lower memory usage in store-gateway, ingester and alertmanager
We made various changes related to how index lookups are performed and how the active series custom trackers are implemented, which results in better performance and lower overall memory usage in the store-gateway and ingester.
We also optimized the alertmanager, which results in a 50% reduction in memory usage in use cases with larger numbers of tenants.
Improved Mimir dashboards
We added two new dashboards named Mimir / Overview resources
and Mimir / Overview networking
. Furthermore, we have made various improvements to the following existing dashboards:
Mimir / Overview
: Add "remote read", "metadata", and "exemplar" queries.Mimir / Writes
: Add optional row about the distributor's new forwarding feature.Mimir / Tenants
: Add insights into the read path.Zone aware replication
Helm now supports deploying the ingesters and store-gateways as different availability zones. The replication is also zone-aware, therefore multiple instances of one zone can fail without any service interruption and roll outs can be performed faster because many instances of each zone can be restarted together, as opposed to them all restarting in sequence.
This is a breaking change, for details on how to upgrade please review the Helm changelog.
Running without root privileges
All Mimir, GEM and Agent processes now don't require root privileges to run anymore.
Unified reverse proxy (gateway
) configuration for Mimir and GEM
This change allows for an easier upgrade path from Mimir to GEM, without any downtime. The unified configuration also makes it possible to autoscale the GEM gateway pods and it supports OpenShift Route. The change also deprecates the nginx
section in the configuration. The section will be removed in release 7.0.0
.
Updated MinIO
The MinIO sub-chart was updated from 4.x
to 5.0.0
, note that this update inherits a breaking change because the MinIO gateway mode was removed.
Updated sizing plans
We updated our sizing plans to make them reflect better how we recommend running Mimir and GEM in production. Note that this includes a breaking change for users of the "small" plan, more details can be found in the Helm changelog.
Various quality of life improvements
- Overrides is added as a dependency to prevent panics when starting with -target=flusher. PR 3151
- -azure.msi-resource is now ignored, and will be removed in Mimir 2.7. This setting is now made automatically by Azure. #2682
- -blocks-storage.tsdb.out-of-order-capacity-min has been removed. #3261
- The default value of -server.http-write-timeout has changed from 30s to 2m. #3346
- -server.tls-min-version and -server.tls-cipher-suites flags to configure cipher suites and min TLS version supported by HTTP and gRPC servers. #2898
- cortex_discarded_samples_total{reason="forwarded-sample-too-old"} is increased. #3049 #3113
- The creation grace period (-validation.create-grace-period) is now enforced to avoid querying too far into the future. #3172
- -usage-stats.installation-mode configuration to track the installation mode via the anonymous usage statistics. #3244
- cortex_compactor_block_max_time_delta_seconds histogram for detecting if compaction of blocks is lagging behind. #3240 #3429
- X-Scope-OrgId header in requests forwarded to the configured forwarding endpoint. #3283 #3385
- -shutdown-delay to allow components to wait after receiving SIGTERM and before stopping. During this time the component returns 503 from the /ready endpoint. #3298
- RulerRemoteEvaluationFailing alert, firing when communication between ruler and frontend fails in remote operational mode. #3177 #3389
- Overrides is added as a dependency to prevent panics when starting with -target=flusher. #3151
- Updated the golang.org/x/text dependency to fix CVE-2022-32149. #3285
- MimirSchedulerQueriesStuck "for" time changed to 7 minutes to account for the time it takes for HPA to scale up. #3223
- Querier > Stages panel from the Mimir / Queries dashboard. #3311
- The autoscaling section of the configuration has changed to support more components. #3378
  - autoscaling.querier_enabled becomes autoscaling.querier.enabled.
- Mimir / Writes dashboard. #3182 #3394 #3461
- Mimir / Writes for distributor autoscaling metrics. #3378
- persistentvolumeclaim when using deployment_type=baremetal for Disk space utilization panels. #3173 #3184
- MimirGossipMembersMismatch alert when Mimir is deployed in read-write mode. #3489
- Replaced policy/v1beta1 with policy/v1 when configuring a PodDisruptionBudget. #3284
- blocks_storage_backend was renamed to storage_backend and is now used as the common storage backend for all components.
  - blocks_storage_azure_account_(name|key) and blocks_storage_s3_endpoint configurations.
  - storage_s3_endpoint is now rendered by default using the aws_region configuration instead of a hardcoded us-east-1.
- ruler_client_type and alertmanager_client_type were renamed to ruler_storage_backend and alertmanager_storage_backend respectively, and their corresponding CLI flags won't be rendered unless explicitly set to a value different from the one in storage_backend (like local).
- alertmanager_s3_bucket_name, alertmanager_gcs_bucket_name and alertmanager_azure_container_name have been removed, and replaced by a single alertmanager_storage_bucket_name configuration used for all object storages.
- The genericBlocksStorageConfig configuration object was removed, and so any extensions to it will now be ignored. Use blockStorageConfig instead.
- The rulerClientConfig and alertmanagerStorageClientConfig configuration objects were renamed to rulerStorageConfig and alertmanagerStorageConfig respectively, and so any extensions to their previous names will now be ignored. Use the new names instead.
- *.s3.region flags are no longer rendered, as they are optional and the region can be inferred by Mimir by performing an initial API call to the endpoint.
- Renaming the blocks_storage_backend key to storage_backend.
- Renaming the blocks_storage_(azure|s3)_* configurations to storage_(azure|s3)_*.
- If the ruler_storage_(azure|s3)_* and alertmanager_storage_(azure|s3)_* keys were different from the block_storage_* ones, they should now be provided using CLI flags; see the configuration reference for more details.
- Removing ruler_client_type and alertmanager_client_type if their value matches storage_backend, or renaming them to their new names otherwise.
- Removing genericBlocksStorageConfig, rulerClientConfig and alertmanagerStorageClientConfig and moving them to the corresponding new options.
- alertmanager_storage_bucket_name key.
- The overrides-exporter.libsonnet file is now always imported. The overrides-exporter can be enabled in jsonnet by setting the following: #3379

{
  _config+:: {
    overrides_exporter_enabled: true,
  }
}

{
  _config+:: {
    deployment_mode: 'read-write',
    // See operations/mimir/read-write-deployment.libsonnet for more configuration options.
    mimir_write_replicas: 3,
    mimir_read_replicas: 2,
    mimir_backend_replicas: 3,
  }
}

- mimir-read component when running the read-write deployment model. #3419
- $._config.usageStatsConfig to track the installation mode via the anonymous usage statistics. #3294
- ($._config.query_tee_node_port) is now optional. #3272
- mimirtool alertmanager verify command to validate configuration without uploading. #3440
- mimirtool rules delete-namespace command to delete all of the rule groups in a namespace, including the namespace itself. #3136
- mimirtool analyze prometheus: add concurrency and resiliency. #3349
  - --concurrency flag. Default: number of logical CPUs
- --log.level=debug now correctly prints the response from the remote endpoint when a request fails. #3180
- MimirQuerierAutoscalerNotActive runbook. #3186
- Updated the MimirSchedulerQueriesStuck runbook to reflect debug steps with querier auto-scaling enabled. #3223
- copyblocks tool, to copy Mimir blocks between two GCS buckets. #3264
- label when running the prepare command. #3236
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.4.0...mimir-2.5.0-rc.0
Published by pracucci almost 2 years ago
This release contains 190 PRs from 29 authors, including new contributors Fayzal Ghantiwala, Furkan Türkal, Joe Blubaugh, Justin Lei, Nicolas DUPEUX, Paul Puschmann, Radu Domnu, Shubham Ranjan. Thank you!
Grafana Labs is excited to announce version 2.4 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Note: If you are upgrading from Grafana Mimir 2.3, review the list of important changes that follow.
Query-scheduler ring-based service discovery:
The query-scheduler is an optional, stateless component that retains a queue of queries to execute, and distributes the workload to available queriers. To use the query-scheduler, query-frontends and queriers must discover the addresses of the query-scheduler instances.
In addition to DNS-based service discovery, Mimir 2.4 introduces the ring-based service discovery for the query-scheduler. When enabled, the query-schedulers join their own hash ring (similar to other Mimir components), and the query-frontends and queriers discover query-scheduler instances via the ring.
Ring-based service discovery makes it easier to set up the query-scheduler in environments where you can't easily define a DNS entry that resolves to the running query-scheduler instances. For more information, refer to query-scheduler configuration.
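To make the setup concrete, here is a minimal YAML sketch of enabling ring-based service discovery. The exact option names are assumptions inferred from this release's description (a service discovery mode switch plus a query-scheduler hash ring backed by a KV store); verify them against the query-scheduler configuration reference before use.

```yaml
# Hedged sketch: switch the query-scheduler from DNS-based to ring-based
# service discovery. Option names are assumptions; check the docs.
query_scheduler:
  service_discovery_mode: ring   # assumed; default discovery is DNS-based
  ring:
    kvstore:
      store: memberlist          # query-schedulers join their own hash ring
```

With this in place, query-frontends and queriers look up live query-scheduler instances via the ring rather than via a DNS entry.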
New API endpoint exposes per-tenant limits:
Mimir 2.4 introduces a new API endpoint, which is available on all Mimir components that load the runtime configuration. The endpoint exposes the limits of the authenticated tenant. You can use this new API endpoint when developing custom integrations with Mimir that require looking up the actual limits that are applied on a given tenant. For more information, refer to Get tenant limits.
New TLS configuration options:
Mimir 2.4 introduces new options to configure the accepted TLS cipher suites, and the minimum versions for the HTTP and gRPC clients that are used between Mimir components, or by Mimir to communicate to external services such as Consul or etcd.
You can use these new configuration options to override the default TLS settings and meet your security policy requirements. For more information, refer to Securing Grafana Mimir communications with TLS.
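As a sketch, the new options are set per client connection. The flag names below appear in this release's changelog; the example values (a Go-style TLS version name and cipher-suite identifiers) are illustrative assumptions, so confirm accepted values in the TLS documentation.

```
# Hedged sketch: restrict the ingester client and memberlist to TLS 1.2+
# with specific cipher suites. Values shown are illustrative.
-ingester.client.tls-min-version=VersionTLS12
-ingester.client.tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
-memberlist.tls-min-version=VersionTLS12
```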
Maximum range query length limit:
Mimir 2.4 introduces the new configuration option -query-frontend.max-total-query-length
to limit the maximum range query length, which is computed as the query's end
minus start
timestamp. This limit is enforced in the query-frontend and defaults to -store.max-query-length
if unset.
The new configuration option allows you to set different limits between the received query maximum length (-query-frontend.max-total-query-length
) and the maximum length of partial queries after splitting and sharding (-store.max-query-length
).
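To make the interaction of the two limits concrete, the following small Python sketch (not Mimir source code) models the check described above: the total range length, computed as end minus start, is validated against -query-frontend.max-total-query-length, with zero treated as "limit disabled".

```python
# Sketch of the range-query length check described in the text (not Mimir code).
from datetime import timedelta

def check_total_query_length(start_ms: int, end_ms: int, max_total: timedelta) -> bool:
    """Accept a range query unless end - start exceeds the limit (0 = disabled)."""
    length = timedelta(milliseconds=end_ms - start_ms)
    return max_total == timedelta(0) or length <= max_total

day_ms = 24 * 3600 * 1000
# A 30-day query passes a 31-day limit but fails a 7-day limit.
assert check_total_query_length(0, 30 * day_ms, timedelta(days=31))
assert not check_total_query_length(0, 30 * day_ms, timedelta(days=7))
```

After splitting and sharding, each partial query would be checked the same way against the separate -store.max-query-length limit.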
The following experimental features have been promoted to stable:
The mimir-distributed
Helm chart is the best way to install Mimir on Kubernetes. As part of the Mimir 2.4 release, we’re also releasing version 3.2 of the mimir-distributed
Helm chart.
Notable enhancements follow. For the full list of changes, see the Helm chart changelog.
ingester:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: target
operator: In
values:
- ingester
topologyKey: "kubernetes.io/hostname"
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/component
operator: In
values:
- ingester
topologyKey: "kubernetes.io/hostname"
store_gateway:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: target
operator: In
values:
- store-gateway
topologyKey: "kubernetes.io/hostname"
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/component
operator: In
values:
- store-gateway
topologyKey: "kubernetes.io/hostname"
In Grafana Mimir 2.4, the default values of the following configuration options have changed:
- -distributor.remote-timeout has changed from 20s to 2s.
- -distributor.forwarding.request-timeout has changed from 10s to 2s.
- -blocks-storage.tsdb.head-compaction-concurrency has changed from 5 to 1.
- The default hash ring heartbeat period has changed from 5s to 15s.
In Grafana Mimir 2.4, the following deprecated configuration options have been removed:
- limits.active_series_custom_trackers_config.
- -ingester.ring.join-after and its respective YAML configuration option ingester.ring.join_after.
- -querier.shuffle-sharding-ingesters-lookback-period and its respective YAML configuration option querier.shuffle_sharding_ingesters_lookback_period.
With Grafana Mimir 2.4, the anonymous usage statistics tracking is enabled by default.
Mimir maintainers use this anonymous information to learn more about how the open source community runs Mimir and what the Mimir team should focus on when working on the next features and documentation improvements.
If possible, we ask you to keep the usage reporting feature enabled.
In case you want to opt-out from anonymous usage statistics reporting, refer to Disable the anonymous usage statistics reporting.
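For reference, a minimal opt-out sketch follows. The usage_stats block and the equivalent CLI flag are assumptions based on the -usage-stats.* prefix used elsewhere in these notes; confirm the exact names in the documentation.

```yaml
# Hedged sketch: disable anonymous usage statistics reporting.
# Key name assumed; equivalent CLI flag assumed to be -usage-stats.enabled=false.
usage_stats:
  enabled: false
```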
- Accept-Encoding: snappy HTTP request header.
- -distributor.remote-timeout changed to 2s from 20s and -distributor.forwarding.request-timeout to 2s from 10s to improve distributor resource usage when ingesters crash. #2728 #2912
- -ingester.ring.store value. #2981
- HELP that is longer than -validation.max-metadata-length is now truncated silently, instead of being dropped with a 400 status code. #2993
- -ingester.ring.readiness-check-ring-health changed from true to false. #2953
- -ingester.out-of-order-time-window. #2940
- Changed from 5s to 15s. Now the default heartbeat period for all Mimir hash rings is 15s. #3033
- (-blocks-storage.tsdb.head-compaction-concurrency) changed from 5 to 1, in order to reduce CPU spikes. #3093
- (-ruler.query-frontend.address) is now stable. #3109
- Removed active_series_custom_trackers_config. Please use active_series_custom_trackers instead. #3110
- Removed -ingester.ring.join-after. #3111
- Removed -querier.shuffle-sharding-ingesters-lookback-period. The value of -querier.query-ingesters-within is now used internally for shuffle sharding lookback, while you can use -querier.shuffle-sharding-ingesters-enabled to enable or disable shuffle sharding on the read path. #3111
- (-memberlist.cluster-label and -memberlist.cluster-label-verification-disabled) is now marked as stable. #3108
- /api/v1/user_limits exposed by all components that load runtime configuration. This endpoint exposes realtime limits for the authenticated tenant, in JSON format. #2864 #3017
- -query-scheduler.max-used-instances to restrict the number of query-schedulers effectively used regardless of how many replicas are running. This feature can be useful when using the experimental read-write deployment mode. #3005
- cortex_frontend_query_result_cache_skipped_total and cortex_frontend_query_result_cache_attempted_total metrics to track the reason why query results are not cached. #2855
- cortex_distributor_forward_errors_total, with status_code="failed". #2968
- httpgrpc messages from the weaveworks/common library. #2996
- New TLS configuration flags:
  -alertmanager.alertmanager-client.tls-cipher-suites
  -alertmanager.alertmanager-client.tls-min-version
  -alertmanager.sharding-ring.etcd.tls-cipher-suites
  -alertmanager.sharding-ring.etcd.tls-min-version
  -compactor.ring.etcd.tls-cipher-suites
  -compactor.ring.etcd.tls-min-version
  -distributor.forwarding.grpc-client.tls-cipher-suites
  -distributor.forwarding.grpc-client.tls-min-version
  -distributor.ha-tracker.etcd.tls-cipher-suites
  -distributor.ha-tracker.etcd.tls-min-version
  -distributor.ring.etcd.tls-cipher-suites
  -distributor.ring.etcd.tls-min-version
  -ingester.client.tls-cipher-suites
  -ingester.client.tls-min-version
  -ingester.ring.etcd.tls-cipher-suites
  -ingester.ring.etcd.tls-min-version
  -memberlist.tls-cipher-suites
  -memberlist.tls-min-version
  -querier.frontend-client.tls-cipher-suites
  -querier.frontend-client.tls-min-version
  -querier.store-gateway-client.tls-cipher-suites
  -querier.store-gateway-client.tls-min-version
  -query-frontend.grpc-client-config.tls-cipher-suites
  -query-frontend.grpc-client-config.tls-min-version
  -query-scheduler.grpc-client-config.tls-cipher-suites
  -query-scheduler.grpc-client-config.tls-min-version
  -query-scheduler.ring.etcd.tls-cipher-suites
  -query-scheduler.ring.etcd.tls-min-version
  -ruler.alertmanager-client.tls-cipher-suites
  -ruler.alertmanager-client.tls-min-version
  -ruler.client.tls-cipher-suites
  -ruler.client.tls-min-version
  -ruler.query-frontend.grpc-client-config.tls-cipher-suites
  -ruler.query-frontend.grpc-client-config.tls-min-version
  -ruler.ring.etcd.tls-cipher-suites
  -ruler.ring.etcd.tls-min-version
  -store-gateway.sharding-ring.etcd.tls-cipher-suites
  -store-gateway.sharding-ring.etcd.tls-min-version
- -blocks-storage.bucket-store.max-concurrent-reject-over-limit option to allow requests that exceed the max number of inflight object storage requests to be rejected. #2999
- -query-frontend.max-total-query-length flag, which defaults to -store.max-query-length if unset or set to 0. #3058
- (-ingester.out-of-order-allowance from ingesters). #2935
- -ruler.recording-rules-evaluation-enabled
- -ruler.alerting-rules-evaluation-enabled
- (-compactor.blocks-retention-period) to avoid querying past this period. #3134
- Updated the golang.org/x/net dependency to fix CVE-2022-27664. #3124
- 500 status code when a 400 was received from the ingester. #3211
- gossip_member_label is now set for ruler-queriers. #3141
- MimirSchedulerQueriesStuck runbook. #3006
- mimirtool analyze parameters documentation. #3094
- -success-only=true and the captured request failed. #2863
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.3.1...mimir-2.4.0
Published by pracucci about 2 years ago
This release contains 8 PRs from 2 authors. Thank you!
All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.4.0-rc.0...mimir-2.4.0-rc.1
Published by pracucci about 2 years ago
This release contains 166 PRs from 29 authors. Thank you!
Grafana Labs is excited to announce version 2.4 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Note: If you are upgrading from Grafana Mimir 2.3, review the list of important changes that follow.
Query-scheduler ring-based service discovery: The query-scheduler is an optional, stateless component that retains a queue of queries to execute, and distributes the workload to available queriers. To use the query-scheduler, query-frontends and queriers must discover the addresses of the query-scheduler instances.
In addition to DNS-based service discovery, Mimir 2.4 introduces the ring-based service discovery for the query-scheduler. When enabled, the query-schedulers join their own hash ring (similar to other Mimir components), and the query-frontends and queriers discover query-scheduler instances via the ring.
Ring-based service discovery makes it easier to set up the query-scheduler in environments where you can’t easily define a DNS entry that resolves to the running query-scheduler instances. For more information, refer to query-scheduler configuration.
New API endpoint exposes per-tenant limits: Mimir 2.4 introduces a new API endpoint, which is available on all Mimir components that load the runtime configuration. The endpoint exposes the limits of the authenticated tenant. You can use this new API endpoint when developing custom integrations with Mimir that require looking up the actual limits that are applied on a given tenant. For more information, refer to Get tenant limits.
New TLS configuration options: Mimir 2.4 introduces new options to configure the accepted TLS cipher suites, and the minimum versions for the HTTP and gRPC clients that are used between Mimir components, or by Mimir to communicate to external services such as Consul or etcd.
You can use these new configuration options to override the default TLS settings and meet your security policy requirements. For more information, refer to Securing Grafana Mimir communications with TLS.
Maximum range query length limit: Mimir 2.4 introduces the new configuration option -query-frontend.max-total-query-length
to limit the maximum range query length, which is computed as the query’s end minus start timestamp. This limit is enforced in the query-frontend and defaults to -store.max-query-length
if unset.
The new configuration option allows you to set different limits between the received query maximum length (-query-frontend.max-total-query-length
) and the maximum length of partial queries after splitting and sharding (-store.max-query-length
).
The mimir-distributed
Helm chart is the best way to install Mimir on Kubernetes. As part of the Mimir 2.4 release, we’re also releasing version 3.2 of the mimir-distributed
Helm chart.
Notable enhancements follow. For the full list of changes, see the Helm chart changelog.
Added support for topologySpreadConstraints.
Replaced the default anti-affinity rules with topologySpreadConstraints for all components, which puts fewer restrictions on where Kubernetes can run pods.
Important: if you are not using the sizing plans (small.yaml, large.yaml, capped-small.yaml, capped-large.yaml) in production, you must reintroduce pod affinity rules for the ingester and store-gateway. This also fixes a missing label selector for the ingester. Merge the following with your custom values file:
ingester:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: target
operator: In
values:
- ingester
topologyKey: "kubernetes.io/hostname"
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/component
operator: In
values:
- ingester
topologyKey: "kubernetes.io/hostname"
store_gateway:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: target
operator: In
values:
- store-gateway
topologyKey: "kubernetes.io/hostname"
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/component
operator: In
values:
- store-gateway
topologyKey: "kubernetes.io/hostname"
Updated the anti-affinity rules in the sizing plans (small.yaml, large.yaml, capped-small.yaml, capped-large.yaml). The sizing plans now enforce that no two pods of the ingester, store-gateway, or alertmanager StatefulSets are scheduled on the same Node. Pods from different StatefulSets can share a Node.
Support for the OpenShift Route resource for nginx has been added.
In Grafana Mimir 2.4, the default values of the following configuration options have changed:
- -distributor.remote-timeout has changed from 20s to 2s.
- -distributor.forwarding.request-timeout has changed from 10s to 2s.
- -blocks-storage.tsdb.head-compaction-concurrency has changed from 5 to 1.
- The default hash ring heartbeat period has changed from 5s to 15s.
With Grafana Mimir 2.4, the anonymous usage statistics tracking is enabled by default. Mimir maintainers use this anonymous information to learn more about how the open source community runs Mimir and what the Mimir team should focus on when working on the next features and documentation improvements. If possible, we ask you to keep the usage reporting feature enabled. In case you want to opt out of anonymous usage statistics reporting, refer to Disable the anonymous usage statistics reporting.
- Accept-Encoding: snappy HTTP request header.
- -distributor.remote-timeout changed to 2s from 20s and -distributor.forwarding.request-timeout to 2s from 10s to improve distributor resource usage when ingesters crash. #2728 #2912
- -ingester.ring.store value. #2981
- HELP that is longer than -validation.max-metadata-length is now truncated silently, instead of being dropped with a 400 status code. #2993
- -ingester.ring.readiness-check-ring-health changed from true to false. #2953
- -ingester.out-of-order-time-window. #2940
- Changed from 5s to 15s. Now the default heartbeat period for all Mimir hash rings is 15s. #3033
- (-blocks-storage.tsdb.head-compaction-concurrency) changed from 5 to 1, in order to reduce CPU spikes. #3093
- (-ruler.query-frontend.address) is now stable. #3109
- Removed active_series_custom_trackers_config. Please use active_series_custom_trackers instead. #3110
- Removed -ingester.ring.join-after. #3111
- Removed -querier.shuffle-sharding-ingesters-lookback-period. The value of -querier.query-ingesters-within is now used internally for shuffle sharding lookback, while you can use -querier.shuffle-sharding-ingesters-enabled to enable or disable shuffle sharding on the read path. #3111
- (-memberlist.cluster-label and -memberlist.cluster-label-verification-disabled) is now marked as stable. #3108
- (-compactor.blocks-retention-period) to avoid querying past this period. #3134
- /api/v1/user_limits exposed by all components that load runtime configuration. This endpoint exposes realtime limits for the authenticated tenant, in JSON format. #2864 #3017
- -query-scheduler.max-used-instances to restrict the number of query-schedulers effectively used regardless of how many replicas are running. This feature can be useful when using the experimental read-write deployment mode. #3005
- cortex_frontend_query_result_cache_skipped_total and cortex_frontend_query_result_cache_attempted_total metrics to track the reason why query results are not cached. #2855
- cortex_distributor_forward_errors_total, with status_code="failed". #2968
- httpgrpc messages from the weaveworks/common library. #2996
- New TLS configuration flags:
  -alertmanager.alertmanager-client.tls-cipher-suites
  -alertmanager.alertmanager-client.tls-min-version
  -alertmanager.sharding-ring.etcd.tls-cipher-suites
  -alertmanager.sharding-ring.etcd.tls-min-version
  -compactor.ring.etcd.tls-cipher-suites
  -compactor.ring.etcd.tls-min-version
  -distributor.forwarding.grpc-client.tls-cipher-suites
  -distributor.forwarding.grpc-client.tls-min-version
  -distributor.ha-tracker.etcd.tls-cipher-suites
  -distributor.ha-tracker.etcd.tls-min-version
  -distributor.ring.etcd.tls-cipher-suites
  -distributor.ring.etcd.tls-min-version
  -ingester.client.tls-cipher-suites
  -ingester.client.tls-min-version
  -ingester.ring.etcd.tls-cipher-suites
  -ingester.ring.etcd.tls-min-version
  -memberlist.tls-cipher-suites
  -memberlist.tls-min-version
  -querier.frontend-client.tls-cipher-suites
  -querier.frontend-client.tls-min-version
  -querier.store-gateway-client.tls-cipher-suites
  -querier.store-gateway-client.tls-min-version
  -query-frontend.grpc-client-config.tls-cipher-suites
  -query-frontend.grpc-client-config.tls-min-version
  -query-scheduler.grpc-client-config.tls-cipher-suites
  -query-scheduler.grpc-client-config.tls-min-version
  -query-scheduler.ring.etcd.tls-cipher-suites
  -query-scheduler.ring.etcd.tls-min-version
  -ruler.alertmanager-client.tls-cipher-suites
  -ruler.alertmanager-client.tls-min-version
  -ruler.client.tls-cipher-suites
  -ruler.client.tls-min-version
  -ruler.query-frontend.grpc-client-config.tls-cipher-suites
  -ruler.query-frontend.grpc-client-config.tls-min-version
  -ruler.ring.etcd.tls-cipher-suites
  -ruler.ring.etcd.tls-min-version
  -store-gateway.sharding-ring.etcd.tls-cipher-suites
  -store-gateway.sharding-ring.etcd.tls-min-version
- -blocks-storage.bucket-store.max-concurrent-reject-over-limit option to allow requests that exceed the max number of inflight object storage requests to be rejected. #2999
- -query-frontend.max-total-query-length flag, which defaults to -store.max-query-length if unset or set to 0. #3058
- (-ingester.out-of-order-allowance from ingesters). #2935
- -ruler.recording-rules-evaluation-enabled
- -ruler.alerting-rules-evaluation-enabled
- Updated the golang.org/x/net dependency to fix CVE-2022-27664. #3124
- gossip_member_label is now set for ruler-queriers. #3141
- MimirSchedulerQueriesStuck runbook. #3006
- mimirtool analyze parameters documentation. #3094
- -success-only=true and the captured request failed. #2863
Published by treid314 about 2 years ago
This release contains 5 PRs from 1 author. Thank you!
Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.3.0...mimir-2.3.1
Published by treid314 about 2 years ago
Grafana Labs is excited to announce version 2.3 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Note: If you are upgrading from Grafana Mimir 2.2, review the list of important changes that follow.
This release contains 370 PRs from 39 authors. Thank you!
Ingest metrics in OpenTelemetry format:
This release of Grafana Mimir introduces experimental support for ingesting metrics from the OpenTelemetry Collector's otlphttp
exporter. This adds a second ingestion option for users of the OTel Collector; Mimir was already compatible with the prometheusremotewrite
exporter. For more information, please see Configure OTel Collector.
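To illustrate the new ingestion path, here is a minimal OTel Collector exporter sketch. The endpoint path follows the /otlp/v1/metrics endpoint mentioned later in these notes; the host, port, and tenant header value are illustrative assumptions, so adapt them to your deployment.

```yaml
# Hedged sketch of an OpenTelemetry Collector config pushing metrics to Mimir.
# Host, port, and tenant ID below are illustrative.
exporters:
  otlphttp:
    endpoint: http://mimir:9009/otlp   # Mimir serves OTLP metrics under /otlp/v1/metrics
    headers:
      X-Scope-OrgID: demo              # tenant ID when multi-tenancy is enabled
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]
```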
Tenant federation for metadata queries:
Users with tenant federation enabled could already issue instant queries, range queries, and exemplar queries to multiple tenants at once and receive a single aggregated result. With Grafana Mimir 2.3, we've added tenant federation support to the /api/v1/metadata
endpoint as well.
Simpler object storage configuration:
Users can now configure block, alertmanager, and ruler storage all at once with the common
YAML config option key (or -common.storage.*
CLI flags). By centralizing your object storage configuration in one place, this enhancement makes configuration faster and less error prone. Users may still individually configure storage for each of these components if they desire. For more information, see the Common Configurations.
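As a sketch, the common key lets you define the object storage client once. The sub-keys shown are assumptions based on the description above (backend selection plus per-backend settings); consult the Common Configurations documentation for the exact schema.

```yaml
# Hedged sketch: one object storage definition shared by blocks, ruler,
# and alertmanager storage. Sub-key names are assumptions; see the docs.
common:
  storage:
    backend: s3
    s3:
      endpoint: s3.us-east-2.amazonaws.com   # illustrative endpoint
      access_key_id: ${AWS_ACCESS_KEY_ID}
      secret_access_key: ${AWS_SECRET_ACCESS_KEY}
# Component-specific settings (such as bucket names) can still be set
# individually for blocks, ruler, and alertmanager storage.
```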
.deb and .rpm packages for Mimir:
Starting with version 2.3, we're publishing .deb and .rpm files for Grafana Mimir, which will make installing and running it on Debian or RedHat-based Linux systems much easier. Thank you to community contributor wilfriedroset for your work to implement this!
Import historic data:
Users can now backfill time series data from their existing Prometheus or Cortex installation into Mimir using mimirtool
, making it possible to migrate to Grafana Mimir without losing your existing metrics data. This support is still considered experimental and does not yet work for data stored in Thanos. To learn more about this feature, please see mimirtool backfill
and Configure TSDB block upload.
Increased instant query performance:
Grafana Mimir now supports splitting instant queries by time. This allows it to better parallelize execution of instant queries and therefore return results faster. At present, splitting is only supported for a subset of instant queries, which means not all instant queries will see a speedup. This feature is currently experimental and is disabled by default. It can be enabled with the split_instant_queries_by_interval
YAML config option in the limits
section (or the CLI flag -query-frontend.split-instant-queries-by-interval
).
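For example, the experimental splitting can be enabled with a limits entry like the following; the key and flag come from the paragraph above, while the 1h value is only an illustrative choice.

```yaml
# Hedged sketch: enable experimental instant query splitting (off by default).
# Equivalent CLI flag: -query-frontend.split-instant-queries-by-interval
limits:
  split_instant_queries_by_interval: 1h   # 0 disables splitting
```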
The Mimir Helm chart is the best way to install Mimir on Kubernetes. As part of the Mimir 2.3 release, we’re also releasing version 3.1 of the Mimir Helm chart.
Notable enhancements follow. For the full list of changes, see the Helm chart changelog.
- X-Scope-OrgID header equal to the value of Mimir's no_auth_tenant parameter by default. The previous release had set the value of X-Scope-OrgID to anonymous by default, which complicated the process of migrating to Mimir.
In Grafana Mimir 2.3 we have removed the following previously deprecated configuration options:
- The extend_writes parameter in the distributor YAML configuration and the -distributor.extend-writes CLI flag have been removed.
- The active_series_custom_trackers parameter has been removed from the YAML configuration. It had already been moved to the runtime configuration. See #1188 for details.
- The blocks-storage.tsdb.isolation-enabled parameter in the YAML configuration and the -blocks-storage.tsdb.isolation-enabled CLI flag have been removed.
With Grafana Mimir 2.3 we have also updated the default value for the CLI flag -distributor.ha-tracker.max-clusters to 100 to provide Denial-of-Service protection. Previously -distributor.ha-tracker.max-clusters was unlimited by default, which could allow a tenant with HA dedupe enabled to overload the HA tracker with __cluster__ label values that could cause the HA dedupe database to fail.
Also, as noted above, the administrator password for Helm chart deployments using the built-in MinIO is now set differently.
- 429 to 500 when the request queue is full in the query-frontend. This corrects behavior in the query-frontend where a retryable 429 "Too Many Outstanding Requests" error from a querier was incorrectly returned as an unretryable 500 system error.
- cortex_ingester_tsdb_out_of_order_samples_appended_total. On multitenant clusters this helps us find the rate of appended out-of-order samples for a specific tenant. #2493
- -ruler.search-pending-for and -ruler.flush-period (and their respective YAML config options). #2288
- -*.consul.cas-retry-delay flags. They have a default value of 1s, while previously there was no delay between retries. #2309
- -store-gateway.thread-pool-size. #2423
- Changed the default of -distributor.ha-tracker.max-clusters to 100 to provide a DoS protection. #2465
- The /api/v1/upload/block/{block} endpoint for starting block upload is now /api/v1/upload/block/{block}/start, and the previous endpoint /api/v1/upload/block/{block}?uploadComplete=true for finishing block upload is now /api/v1/upload/block/{block}/finish. A new API endpoint has been added: /api/v1/upload/block/{block}/check. #2486 #2548
- Changed the -compactor.max-compaction-time default from 0s (disabled) to 1h. When compacting blocks for a tenant, the compactor will move to compact blocks of another tenant or re-plan blocks to compact at least every 1h. #2514
- Removed the extend_writes (see #1856) YAML key and -distributor.extend-writes CLI flag from the distributor config. #2551
- Removed the active_series_custom_trackers (see #1188) YAML key from the ingester config. #2552
- __mimir_cluster is reserved by Mimir and not allowed to store metrics. #2643
- Moved /purger/delete_tenant and /purger/delete_tenant_status to the compactor at /compactor/delete_tenant and /compactor/delete_tenant_status. The new endpoints on the compactor are stable. #2644
- Changed the leave timeout (-memberlist.leave-timeout duration) from 5s to 20s and the connection timeout (-memberlist.packet-dial-timeout) from 5s to 2s. This makes the leave timeout 10x the connection timeout, so that we can communicate the leave to at least 1 node if the first 9 we try to contact time out. #2669
- 412 Precondition Failed and log an info message when alertmanager isn't configured for a tenant. #2635
- The default max_global_series_per_metric limit is now 0 (disabled). Setting this limit by default does not provide much benefit because series are sharded by all labels. #2714
- -blocks-storage.tsdb.new-chunk-disk-mapper has been removed; the new chunk disk mapper is now always used and is no longer marked experimental. The default value of -blocks-storage.tsdb.head-chunks-write-queue-size has changed to 1000000. This enables the async chunk queue by default, which leads to improved latency on the write path when new chunks are created in ingesters. #2762
- Removed the -blocks-storage.tsdb.isolation-enabled option. TSDB-level isolation is now always disabled in Mimir. #2782
- -compactor.partial-block-deletion-delay must either be set to 0 (to disable partial blocks deletion) or a value higher than 4h. #2787
- -query-frontend.align-querier-with-step has been deprecated. Please use -query-frontend.align-queries-with-step instead. #2840
- -compactor.partial-block-deletion-delay, as a duration string, allows you to set the delay since a partial block has been modified before marking it for deletion. A value of 0, the default, disables this feature.
- cortex_compactor_blocks_marked_for_deletion_total has a new value for the reason label, reason="partial", when a block deletion marker is triggered by the partial block deletion delay.
- /otlp/v1/metrics. #695 #2436 #2461
- -query-frontend.split-instant-queries-by-interval. #2469 #2564 #2565 #2570 #2571 #2572 #2573 #2574 #2575 #2576 #2581 #2582 #2601 #2632 #2633 #2634 #2641 #2642 #2766
- proxy_url configuration option in the receiver's configuration. #2317
- -memberlist.cluster-label and -memberlist.cluster-label-verification-disabled CLI flags (and their respective YAML config options). #2354
- common YAML config option key (or -common.storage.* CLI flags). #2330 #2347
- meta.json file: number of series, samples and chunks. #2425
- cortex_ingester_client_request_duration_seconds histogram metric, to correctly track requests taking longer than 1s (up until 16s). #2445
- -distributor.instance-limits.max-inflight-push-requests-bytes. This limit protects the distributor against multiple large requests that together may cause an OOM, but are only a few, so do not trigger the max-inflight-push-requests limit. #2413
- -runtime-config.file that will be merged in left-to-right order. #2583
- cortex_distributor_query_ingester_chunks_deduped_total and cortex_distributor_query_ingester_chunks_total metrics for determining how effective ingester chunk deduplication at query time is. #2713
- alpine:3.16.2. #2729
- <prometheus-http-prefix>/api/v1/status/buildinfo endpoint. #2724
- -querier.max-concurrent
. #2598cortex_distributor_received_requests_total
and cortex_distributor_requests_in_total
metrics to provide visiblity into appropriate per-tenant request limits. #2770forwarding_endpoint
), instead of using per-rule endpoints. This takes precendence over per-rule endpoints. #2801err-mimir-distributor-max-write-message-size
to the errors catalog. #2470*status.Status
error when running in remote operational mode. #2417-alertmanager.web.external-url
is either a path starting with /
, or a full URL including the scheme and hostname. #2381 #2542MimirIngesterHasUnshippedBlocks
and stale cortex_ingester_oldest_unshipped_block_timestamp_seconds
when some block uploads fail. #2435-compactor.partial-block-deletion-delay
: compactor didn't correctly check for modification time of all block files. #25591 < bool 0
. #2558cortex_discarded_requests_total
metric, which previously was not registered and therefore not exported. #2712MimirAllocatingTooMuchMemory
alert for ingesters. #2480RolloutOperatorNotReconciling
alert, firing if the optional rollout-operator is not successfully reconciling. #2700deployment_type: 'baremetal'
in the mixin _config
. #2657distributor_allow_multiple_replicas_on_same_node
query_frontend_allow_multiple_replicas_on_same_node
querier_allow_multiple_replicas_on_same_node
ruler_allow_multiple_replicas_on_same_node
distributor_topology_spread_max_skew
query_frontend_topology_spread_max_skew
querier_topology_spread_max_skew
ruler_topology_spread_max_skew
max_global_series_per_metric
to 0 in all plans, and as a default value. #2669memberlist_cluster_label
and memberlist_cluster_label_verification_disabled
. #2349autoscaling_ruler_querier_enabled
: true
to enable autoscaling.autoscaling_ruler_querier_min_replicas
: minimum number of ruler-querier replicas.autoscaling_ruler_querier_max_replicas
: maximum number of ruler-querier replicas.autoscaling_prometheus_url
: Prometheus base URL from which to scrape Mimir metrics (e.g. http://prometheus.default:9090/prometheus
).memcached:1.6.16-alpine
. #2740$._config.configmaps
and $._config.runtime_config_files
to make it easy to add new configmaps or runtime config file to all components. #2748mimirtool backfill
command to upload Prometheus blocks using API available in the compactor. #1822-server.service-port
to -server.http-service-port
. #2683cortex_querytee_request_duration_seconds
to cortex_querytee_backend_request_duration_seconds
. Metric cortex_querytee_request_duration_seconds
is now reported without label backend
. #2683query-tee
to allow testing gRPC requests to Mimir instances. #2683mimirtool
commands in the HTTP API documentation. #2516markblocks
now processes multiple blocks concurrently. #2677Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.2.0...mimir-2.3.0
Published by treid314 about 2 years ago
This release contains 33 contributions from 9 authors. Thank you!
Note: We tagged a 2.3.0-rc.1 but found a panic in the alertmanager before publishing the 2.3.0-rc.1 pre-release. With 2.3.0-rc.2 we have included the fix for the alertmanager and created a new tag and release candidate.
- `-blocks-storage.tsdb.new-chunk-disk-mapper` has been removed; the new chunk disk mapper is now always used and is no longer marked experimental. The default value of `-blocks-storage.tsdb.head-chunks-write-queue-size` has changed to 1000000, which enables the async chunk queue by default and improves latency on the write path when new chunks are created in ingesters. #2762
- Removed the `-blocks-storage.tsdb.isolation-enabled` option. TSDB-level isolation is now always disabled in Mimir. #2782
- `-compactor.partial-block-deletion-delay` must either be set to 0 (to disable partial block deletion) or to a value higher than `4h`. #2787
- `-query-frontend.align-querier-with-step` has been deprecated. Please use `-query-frontend.align-queries-with-step` instead. #2840
- Decreased `-distributor.remote-timeout` to `2s` from `20s` and `-distributor.forwarding.request-timeout` to `2s` from `10s` to improve distributor resource usage when ingesters crash. #2728
- Added the `cortex_distributor_query_ingester_chunks_deduped_total` and `cortex_distributor_query_ingester_chunks_total` metrics for determining how effective ingester chunk deduplication at query time is. #2713
- Updated the Docker base image to `alpine:3.16.2`. #2729
- Added the `<prometheus-http-prefix>/api/v1/status/buildinfo` endpoint. #2724
- `-querier.max-concurrent`. #2598
- Added the `cortex_distributor_received_requests_total` and `cortex_distributor_requests_in_total` metrics to provide visibility into appropriate per-tenant request limits. #2770
- Metrics can now be forwarded to a single endpoint (`forwarding_endpoint`) instead of using per-rule endpoints; this takes precedence over per-rule endpoints. #2801
- Added `err-mimir-distributor-max-write-message-size` to the errors catalog. #2470
- Fixed a panic when `ruler.external_url` is explicitly set to an empty string (`""`) in YAML. #2915
- Updated the `Writes` dashboard to account for samples ingested via the new OTLP ingestion endpoint. #2919
- Added support for `deployment_type: 'baremetal'` in the mixin `_config`. #2657
- Updated the memcached image to `memcached:1.6.16-alpine`. #2740
- Added `$._config.configmaps` and `$._config.runtime_config_files` to make it easy to add new configmaps or a runtime config file to all components. #2748
- Renamed `-server.service-port` to `-server.http-service-port`. #2683
- Renamed the metric `cortex_querytee_request_duration_seconds` to `cortex_querytee_backend_request_duration_seconds`. Metric `cortex_querytee_request_duration_seconds` is now reported without the `backend` label. #2683
- Updated `query-tee` to allow testing gRPC requests to Mimir instances. #2683

Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.3.0-rc0...mimir-2.3.0-rc.2
Published by treid314 about 2 years ago
This release contains 333 PRs from 39 authors. Thank you!
Grafana Labs is excited to announce version 2.3 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
The highlights that follow include the top features, enhancements, and bugfixes in this release. If you are upgrading from Grafana Mimir 2.2, there is upgrade-related information as well.
For the complete list of changes, see the Changelog.
Ingest metrics in OpenTelemetry format:
This release of Grafana Mimir introduces experimental support for ingesting metrics from the OpenTelemetry Collector's `otlphttp` exporter. This adds a second ingestion option for users of the OTel Collector; Mimir was already compatible with the `prometheusremotewrite` exporter. For more information, please see Configure OTel Collector.
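For example, the OTel Collector can be pointed at Mimir with a configuration along these lines (a minimal sketch: the gateway hostname and tenant ID are placeholders, the `otlp` receiver configuration is omitted, and the `otlphttp` exporter appends `/v1/metrics` to the configured endpoint, matching Mimir's `/otlp/v1/metrics` path):

```yaml
exporters:
  otlphttp:
    # Base URL of the Mimir OTLP endpoint; the exporter appends /v1/metrics.
    # "mimir-gateway" is a placeholder for your Mimir gateway or nginx.
    endpoint: http://mimir-gateway:8080/otlp
    headers:
      # Tenant ID header; required when multi-tenancy is enabled.
      X-Scope-OrgID: demo

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]
```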
Increased instant query performance:
Grafana Mimir now supports splitting instant queries by time. This allows it to better parallelize execution of instant queries and therefore return results faster. At present, splitting is only supported for a subset of instant queries, which means not all instant queries will see a speedup. This feature is being released as experimental and is disabled by default. It can be enabled by setting `-query-frontend.split-instant-queries-by-interval`.
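Conceptually, the query-frontend rewrites a splittable instant query into partial queries over consecutive subranges and merges the partial results. A rough sketch of the range-splitting step (illustrative only; the function and names below are not from the Mimir codebase):

```python
from datetime import datetime, timedelta

def split_range(start, end, interval):
    """Split the [start, end) range into consecutive subranges of at most `interval`."""
    splits = []
    cursor = start
    while cursor < end:
        upper = min(cursor + interval, end)
        splits.append((cursor, upper))
        cursor = upper
    return splits

# A 3h instant-query range split into three 1h partials that can be
# evaluated in parallel and then merged, e.g. sum_over_time(metric[3h])
# becomes the sum of three partial sum_over_time(metric[1h]) results.
t0 = datetime(2022, 8, 1)
parts = split_range(t0, t0 + timedelta(hours=3), timedelta(hours=1))
```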
Tenant federation for metadata queries:
Users with tenant federation enabled could previously issue instant queries, range queries, and exemplar queries to multiple tenants at once and receive a single aggregated result. With Grafana Mimir 2.3, we've added tenant federation support to the `/api/v1/metadata` endpoint as well.
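A federated request looks like any other Prometheus API call, except that the `X-Scope-OrgID` header carries multiple tenant IDs joined with `|`. A minimal sketch (the tenant names are placeholders):

```python
def federated_headers(tenants):
    """Build request headers for a multi-tenant (federated) Mimir query.

    With tenant federation enabled, Mimir reads the tenant IDs from the
    X-Scope-OrgID header, joined with '|'.
    """
    return {"X-Scope-OrgID": "|".join(tenants)}

# e.g. GET <mimir>/prometheus/api/v1/metadata with these headers returns
# metric metadata aggregated across both tenants.
headers = federated_headers(["team-a", "team-b"])
```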
Simpler object storage configuration:
Users can now configure block, alertmanager, and ruler storage all at once with the `common` YAML config option key (or `-common.storage.*` CLI flags). By centralizing your object storage configuration in one place, this enhancement makes configuration faster and less error-prone. Users can still individually configure storage for each of these components if they desire. For more information, see the Common Configurations.
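For example, a single `common` block can point all three stores at the same S3 credentials while each component keeps its own bucket. This is a sketch based on the documented layout; the endpoint, region, and bucket names are placeholders:

```yaml
common:
  storage:
    backend: s3
    s3:
      endpoint: s3.us-east-2.amazonaws.com
      region: us-east-2
      access_key_id: "${AWS_ACCESS_KEY_ID}"
      secret_access_key: "${AWS_SECRET_ACCESS_KEY}"

# Component-specific settings still apply on top of the common block.
blocks_storage:
  s3:
    bucket_name: mimir-blocks

ruler_storage:
  s3:
    bucket_name: mimir-ruler

alertmanager_storage:
  s3:
    bucket_name: mimir-alertmanager
```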
DEB and RPM packages for Mimir:
Starting with version 2.3, we're publishing deb and rpm files for Grafana Mimir, which will make installing and running it on Debian or Red Hat-based Linux systems much easier. Thank you to community contributor wilfriedroset for your work to implement this!
Import historic data to Grafana Mimir:
Users can now backfill time series data from their existing Prometheus or Cortex installation into Mimir using `mimirtool`, making it possible to migrate to Grafana Mimir without losing your existing metrics data. This support is still considered experimental and does not yet work for data stored in Thanos. To learn more about this feature, please see mimirtool backfill and Configure TSDB block upload.
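A typical invocation looks like the sketch below; the compactor address, tenant ID, and block directories are placeholders, and the flags follow the usual `mimirtool` conventions:

```shell
# Upload local TSDB blocks to the tenant's block storage via the compactor API.
mimirtool backfill \
  --address=http://mimir-compactor:8080 \
  --id=anonymous \
  ./old-tsdb/block-1 ./old-tsdb/block-2
```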
New Helm chart minor release: The Mimir Helm chart is the best way to install Mimir on Kubernetes. As part of the Mimir 2.3 release, we’re also releasing version 3.1 of the Mimir Helm chart. Notable enhancements follow. For the full list of changes, see the Helm chart changelog.
- The `X-Scope-OrgID` header is now set equal to the value of Mimir's `no_auth_tenant` parameter by default. The previous release had set the value of `X-Scope-OrgID` to `anonymous` by default, which complicated the process of migrating to Mimir.

In Grafana Mimir 2.3 we have removed the following previously deprecated configuration options:
- The `extend_writes` parameter in the distributor YAML configuration and the `-distributor.extend-writes` CLI flag have been removed.
- The `active_series_custom_trackers` parameter has been removed from the YAML configuration. It had already been moved to the runtime configuration. See #1188 for details.

With Grafana Mimir 2.3 we have also updated the default value for `-distributor.ha-tracker.max-clusters` to `100` to provide Denial-of-Service protection. Previously `-distributor.ha-tracker.max-clusters` was unlimited by default, which could allow a tenant with HA Dedupe enabled to overload the HA tracker with `__cluster__` label values that could cause the HA Dedupe database to fail.
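For background, HA deduplication elects one replica per (tenant, cluster) pair and drops samples from the others, failing over when the elected replica stops sending. The toy sketch below illustrates the idea and why capping the number of distinct clusters per tenant matters; it is a conceptual model, not Mimir's actual HA tracker implementation:

```python
import time

class HATracker:
    """Toy sketch of HA deduplication (not Mimir's implementation).

    For each (tenant, cluster) pair, samples are accepted only from the
    elected replica; other replicas' samples are dropped as duplicates.
    Unbounded distinct cluster labels would grow this map without limit,
    which is why 2.3 caps clusters per tenant at 100 by default.
    """

    def __init__(self, failover_timeout=30.0, max_clusters=100):
        self.elected = {}  # (tenant, cluster) -> (replica, last_seen)
        self.failover_timeout = failover_timeout
        self.max_clusters = max_clusters

    def accept(self, tenant, cluster, replica, now=None):
        now = time.monotonic() if now is None else now
        key = (tenant, cluster)
        if key not in self.elected:
            # Enforce the per-tenant cluster cap before tracking a new cluster.
            if sum(1 for t, _ in self.elected if t == tenant) >= self.max_clusters:
                raise ValueError("too many HA clusters for tenant")
            self.elected[key] = (replica, now)
            return True
        leader, last_seen = self.elected[key]
        if replica == leader:
            self.elected[key] = (leader, now)  # refresh the leader's lease
            return True
        if now - last_seen > self.failover_timeout:
            self.elected[key] = (replica, now)  # fail over to the new replica
            return True
        return False  # duplicate sample from a non-elected replica
```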
- The query-frontend now returns `429` instead of `500` when the request queue is full. This corrects behavior in the query-frontend where a `429 "Too Many Outstanding Requests"` error (a retriable error) from a querier was incorrectly returned as a `500` system error (an unretriable error).
- Added the `cortex_ingester_tsdb_out_of_order_samples_appended_total` metric. On multitenant clusters this helps us find the rate of appended out-of-order samples for a specific tenant. #2493
- `-ruler.search-pending-for` and `-ruler.flush-period` (and their respective YAML config options). #2288
- Added the `-*.consul.cas-retry-delay` flags. They have a default value of `1s`, while previously there was no delay between retries. #2309
- `-store-gateway.thread-pool-size`. #2423
- Changed the default value of `-distributor.ha-tracker.max-clusters` to `100` to provide DoS protection. #2465
- The previous `/api/v1/upload/block/{block}` endpoint for starting block upload is now `/api/v1/upload/block/{block}/start`, and the previous endpoint `/api/v1/upload/block/{block}?uploadComplete=true` for finishing block upload is now `/api/v1/upload/block/{block}/finish`. A new API endpoint has been added: `/api/v1/upload/block/{block}/check`. #2486 #2548
- Changed the `-compactor.max-compaction-time` default from `0s` (disabled) to `1h`. When compacting blocks for a tenant, the compactor will move on to compact blocks of another tenant or re-plan blocks to compact at least every 1h. #2514
- Removed the deprecated `extend_writes` (see #1856) YAML key and `-distributor.extend-writes` CLI flag from the distributor config. #2551
- Removed the deprecated `active_series_custom_trackers` (see #1188) YAML key from the ingester config. #2552
- The `__mimir_cluster` label is reserved by Mimir and not allowed to store metrics. #2643
- Moved the `/purger/delete_tenant` and `/purger/delete_tenant_status` endpoints to the compactor at `/compactor/delete_tenant` and `/compactor/delete_tenant_status`. The new endpoints on the compactor are stable. #2644
- Increased the memberlist leave timeout (`-memberlist.leave-timeout`) from 5s to 20s and reduced the connection timeout (`-memberlist.packet-dial-timeout`) from 5s to 2s. This makes the leave timeout 10x the connection timeout, so that the leave can be communicated to at least 1 node if the first 9 we try to contact time out. #2669
- Return `412 Precondition Failed` and log an info message when the alertmanager isn't configured for a tenant. #2635
- Changed the default `max_global_series_per_metric` limit to `0` (disabled). Setting this limit by default does not provide much benefit because series are sharded by all labels. #2714
- `-compactor.partial-block-deletion-delay`, as a duration string, lets you set the delay since a partial block was last modified before marking it for deletion. A value of `0`, the default, disables this feature. The `cortex_compactor_blocks_marked_for_deletion_total` metric has a new value for the `reason` label, `reason="partial"`, used when a block deletion marker is triggered by the partial block deletion delay.
- Experimental support for ingesting metrics in OTLP format via `/otlp/v1/metrics`. #695 #2436 #2461
- Experimental support for splitting instant queries by interval, enabled with `-query-frontend.split-instant-queries-by-interval`. #2469 #2564 #2565 #2570 #2571 #2572 #2573 #2574 #2575 #2576 #2581 #2582 #2601 #2632 #2633 #2634 #2641 #2642 #2766
- Added support for the `proxy_url` configuration option in the receiver's configuration. #2317
- Added the `-memberlist.cluster-label` and `-memberlist.cluster-label-verification-disabled` CLI flags (and their respective YAML config options). #2354
- Block, alertmanager, and ruler storage can now be configured at once with the `common` YAML config option key (or `-common.storage.*` CLI flags). #2330 #2347
- Added statistics to the block's `meta.json` file: number of series, samples, and chunks. #2425
- Added more buckets to the `cortex_ingester_client_request_duration_seconds` histogram metric, to correctly track requests taking longer than 1s (up until 16s). #2445
- Added `-distributor.instance-limits.max-inflight-push-requests-bytes`. This limit protects the distributor against multiple large requests that together may cause an OOM, but are only a few, so do not trigger the `max-inflight-push-requests` limit. #2413
- Multiple files can now be specified in `-runtime-config.file`; they will be merged in left-to-right order. #2583
- `*status.Status` error when running in remote operational mode. #2417
- Validate that `-alertmanager.web.external-url` is either a path starting with `/`, or a full URL including the scheme and hostname. #2381 #2542
- Fixed the `MimirIngesterHasUnshippedBlocks` alert and stale `cortex_ingester_oldest_unshipped_block_timestamp_seconds` metric when some block uploads fail. #2435
- Fixed `-compactor.partial-block-deletion-delay`: the compactor didn't correctly check the modification time of all block files. #2559
- `1 < bool 0`. #2558
- Fixed the `cortex_discarded_requests_total` metric, which previously was not registered and therefore not exported. #2712
- Added the `MimirAllocatingTooMuchMemory` alert for ingesters. #2480
- Added the `RolloutOperatorNotReconciling` alert, firing if the optional rollout-operator is not successfully reconciling. #2700
- Added the following jsonnet `_config` options: `distributor_allow_multiple_replicas_on_same_node`, `query_frontend_allow_multiple_replicas_on_same_node`, `querier_allow_multiple_replicas_on_same_node`, `ruler_allow_multiple_replicas_on_same_node`, `distributor_topology_spread_max_skew`, `query_frontend_topology_spread_max_skew`, `querier_topology_spread_max_skew`, `ruler_topology_spread_max_skew`.
- Set `max_global_series_per_metric` to 0 in all plans, and as a default value. #2669
- Added `memberlist_cluster_label` and `memberlist_cluster_label_verification_disabled`. #2349
- Added ruler-querier autoscaling options: `autoscaling_ruler_querier_enabled`: `true` to enable autoscaling; `autoscaling_ruler_querier_min_replicas`: minimum number of ruler-querier replicas; `autoscaling_ruler_querier_max_replicas`: maximum number of ruler-querier replicas; `autoscaling_prometheus_url`: Prometheus base URL from which to scrape Mimir metrics (e.g. `http://prometheus.default:9090/prometheus`).
- Added the `mimirtool backfill` command to upload Prometheus blocks using the API available in the compactor. #1822
- Documented the `mimirtool` commands in the HTTP API documentation. #2516
- `markblocks` now processes multiple blocks concurrently. #2677

Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.2.0...mimir-2.3.0-rc0