Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
AGPL-3.0 License
Published by krajorama about 2 years ago
Grafana Labs is excited to announce version 2.2 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
The highlights that follow include the top features, enhancements, and bugfixes in this release. If you are upgrading from Grafana Mimir 2.1, there is upgrade-related information as well.
For the complete list of changes, see the Changelog.
This release contains 214 contributions from 32 authors. Thank you!
Support for ingesting out-of-order samples: Grafana Mimir includes new, experimental support for ingesting out-of-order samples.
This support is configurable, and it allows you to set how far out-of-order Mimir accepts samples on a per-tenant basis.
This feature still needs additional testing; we do not recommend using it in a production environment.
For more information, see Configuring out-of-order samples ingestion.
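As a rough illustration of what a per-tenant out-of-order window means, the following Python sketch classifies an incoming sample relative to the newest timestamp already ingested. It is a simplified teaching model, not Mimir's implementation: the function name `classify_sample`, the integer millisecond timestamps, and the assumption that the window is measured back from the newest ingested timestamp are all illustrative.

```python
from datetime import timedelta


def classify_sample(sample_ts_ms: int, newest_ts_ms: int, window: timedelta) -> str:
    """Classify a sample in a simplified out-of-order model.

    Assumption: the window is measured backwards from the newest
    timestamp already ingested (newest_ts_ms). Duplicate-timestamp
    handling is out of scope for this sketch.
    """
    window_ms = int(window.total_seconds() * 1000)
    if sample_ts_ms >= newest_ts_ms:
        return "in-order"
    if window_ms == 0:
        # A zero window disables out-of-order ingestion in this model.
        return "rejected: out-of-order ingestion disabled"
    if newest_ts_ms - sample_ts_ms <= window_ms:
        return "accepted: out-of-order"
    # Out of order and beyond the allowed window.
    return "rejected: sample-too-old"


if __name__ == "__main__":
    newest = 10 * 60 * 1000  # newest ingested sample at t = 10 minutes
    for age_minutes in (0, 1, 6):
        ts = newest - age_minutes * 60 * 1000
        print(age_minutes, classify_sample(ts, newest, timedelta(minutes=5)))
```

With a hypothetical five-minute window, a sample one minute behind the newest timestamp is accepted, while a sample six minutes behind is rejected as too old.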
Improved error messages: The error messages that Mimir reports are more human readable, and the messages include error codes that are easily searchable.
For error descriptions, see the Grafana Mimir runbooks’ Errors catalog.
Configurable prefix for object storage: Mimir can now store block data, rules, and alerts in one bucket, with each under its own user-defined prefix, rather than requiring one bucket for each.
You can configure the storage prefix by using the -<storage>.storage-prefix option for the corresponding storage: ruler-storage, alertmanager-storage, or blocks-storage.
Store-gateway performance optimization
The store-gateway can now pre-populate the file system cache when memory-mapping index-header files.
This prevents the store-gateway from appearing to be stuck while loading index-headers.
This feature is experimental and disabled by default; enable it using the flag -blocks-storage.bucket-store.index-header.map-populate-enabled.
Faster ingester startup: Ingesters now replay their WALs (write ahead logs) about 50% faster, and they also re-join the ring sooner under some conditions.
Helm Chart improvements: The Mimir Helm chart is the best way to install Mimir on Kubernetes. As part of the Mimir 2.2 release, we're also releasing version 3.0 of the Helm chart. Notable enhancements follow. For the full list of changes, see the Helm chart changelog.
Added an extraEnvFrom capability to all Mimir services to enable you to inject secrets via environment variables.
Added global.extraEnv and global.extraEnvFrom, which apply to all pods. Note that the memcached and minio pods are not included.
Moved the Mimir configuration from a Secret to a ConfigMap, which makes it easier to quickly see the differences between your Mimir configurations between upgrades. We especially like the Helm diff plugin for this purpose.
Added a structuredConfig option, which allows you to overwrite specific key-value pairs in the mimir.config template, which saves you from having to maintain the entire mimir.config in your own values.yaml file.
Adjusted the settings for -ingester.ring.unregister-on-shutdown and -distributor.extend-writes, for a smoother upgrade experience. Rolling restarts of ingesters are now less likely to cause spikes in resource usage.
All deprecated API endpoints that are under /api/v1/rules* and /prometheus/rules* have now been removed from the ruler component in favor of identical endpoints that use the prefix /prometheus/config/v1/rules*.
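The structuredConfig option in the Helm chart notes above overrides individual key-value pairs instead of replacing the whole configuration. The Python sketch below shows the general idea as a recursive merge of an override mapping into a base mapping. It is only an approximation of the concept, not the chart's actual templating logic; the function name `merge_overrides` and the sample values are hypothetical.

```python
from copy import deepcopy


def merge_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides applied key by key.

    Nested mappings are merged recursively; any other value
    (scalars, lists) replaces the base value outright.
    """
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical base configuration and a small override.
    base_config = {
        "multitenancy_enabled": True,
        "limits": {"ingestion_rate": 10000, "max_global_series_per_user": 150000},
    }
    structured_config = {"limits": {"ingestion_rate": 50000}}
    print(merge_overrides(base_config, structured_config))
```

Only the overridden key changes; every other setting in the base mapping is carried over untouched, which is what keeps a values.yaml file small.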
In Grafana Mimir 2.2, we have updated default values and some parameters to give you a better out-of-the-box experience:
Message size limits for gRPC messages that are exchanged between internal Mimir components have increased to 100 MiB from 4 MiB.
This helps to avoid internal server errors when pushing or querying large data.
The -blocks-storage.bucket-store.ignore-blocks-within parameter changed from 0 to 10h.
The default value of -querier.query-store-after changed from 0 to 12h.
For most-recent data, both changes improve query performance by querying only the ingesters, rather than object storage.
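To see why these defaults keep queries for recent data away from object storage, consider this simplified Python model of how a querier might choose backends for a time range. It is a sketch under stated assumptions: the function `backends_for_query` is hypothetical, the 13h value for `query_ingesters_within` is assumed for illustration, and only the 12h value for -querier.query-store-after comes from these notes. The store-gateway side setting (-blocks-storage.bucket-store.ignore-blocks-within) is not modeled.

```python
from datetime import datetime, timedelta, timezone


def backends_for_query(
    start: datetime,
    end: datetime,
    now: datetime,
    query_store_after: timedelta = timedelta(hours=12),
    query_ingesters_within: timedelta = timedelta(hours=13),  # assumed value
) -> list:
    """Return the backends a query would touch in this simplified model."""
    backends = []
    # Ingesters hold recent data: consult them if the range reaches
    # into the last query_ingesters_within (0 means always).
    if query_ingesters_within == timedelta(0) or end >= now - query_ingesters_within:
        backends.append("ingesters")
    # Object storage is consulted only for data older than
    # query_store_after (0 means always).
    if query_store_after == timedelta(0) or start <= now - query_store_after:
        backends.append("object-storage")
    return backends


if __name__ == "__main__":
    now = datetime(2022, 7, 1, 12, 0, tzinfo=timezone.utc)
    print(backends_for_query(now - timedelta(hours=1), now, now))
    print(backends_for_query(now - timedelta(hours=24), now, now))
```

In this model a one-hour dashboard query touches only the ingesters, while a 24-hour query touches both backends.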
The option -querier.shuffle-sharding-ingesters-lookback-period has been deprecated.
If you previously changed this option from its default of 0s, set -querier.shuffle-sharding-ingesters-enabled to true and specify the lookback period by setting the -querier.query-ingesters-within option.
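The migration rule above can be written down as a small Python helper. This is a hypothetical convenience function (`migrate_shuffle_sharding_flags` is not part of Mimir or its tooling); it only restates the rule: if the old lookback was changed from 0s, enable shuffle sharding explicitly and carry the lookback over to -querier.query-ingesters-within.

```python
from typing import List


def migrate_shuffle_sharding_flags(old_lookback: str) -> List[str]:
    """Translate a deprecated lookback value into the replacement flags.

    old_lookback is the value previously passed to
    -querier.shuffle-sharding-ingesters-lookback-period, as a duration
    string such as "13h". An unchanged default ("0s") needs no migration.
    """
    if old_lookback.strip() in ("", "0", "0s"):
        return []
    return [
        "-querier.shuffle-sharding-ingesters-enabled=true",
        f"-querier.query-ingesters-within={old_lookback.strip()}",
    ]


if __name__ == "__main__":
    print(migrate_shuffle_sharding_flags("0s"))
    print(migrate_shuffle_sharding_flags("13h"))
```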
The -memberlist.abort-if-join-fails parameter now defaults to false.
When Mimir is using memberlist as the backend store for its hash ring, and it fails to join the memberlist cluster, Mimir no longer aborts startup by default.
If you have used a previous version of the Mimir Helm chart, you must address some of the chart's breaking changes before upgrading to Helm chart version 3.0. For detailed information about how to do this, see Upgrade the Grafana Mimir Helm chart from version 2.1 to 3.0.
Increased the default values of -server.grpc-max-recv-msg-size-bytes and -server.grpc-max-send-msg-size-bytes from 4MB to 100MB. #1884
-blocks-storage.bucket-store.ignore-blocks-within now defaults to 10h (previously 0)
-querier.query-store-after now defaults to 12h (previously 0)
-querier.query-ingesters-within
-querier.query-store-after
Deprecated -ingester.ring.join-after. Mimir now behaves as if this setting is always set to 0s. This configuration option will be removed in Mimir 2.4.0. #1965
Blocks uploaded by ingesters no longer contain the __org_id__ label. Compactor now ignores this label and will compact blocks with and without this label together. The mimirconvert tool will remove the label from blocks as an "unknown" label. #1972
Deprecated -querier.shuffle-sharding-ingesters-lookback-period, instead adding -querier.shuffle-sharding-ingesters-enabled to enable or disable shuffle sharding on the read path. The value of -querier.query-ingesters-within is now used internally for shuffle sharding lookback. #2110
-memberlist.abort-if-join-fails now defaults to false. Previously it defaulted to true. #2168
/api/v1/rules* and /prometheus/rules* configuration endpoints are removed. Use /prometheus/config/v1/rules*. #2182
-ingester.exemplars-update-period has been renamed to -ingester.tsdb-config-update-period. You can use it to update multiple, per-tenant TSDB configurations. #2187
-ingester.out-of-order-time-window, as a duration string, allows you to set how far back in time a sample can be. The default is 0s, where s is seconds.
cortex_ingester_tsdb_out_of_order_samples_appended_total metric tracks the total number of out-of-order samples ingested by the ingester.
cortex_discarded_samples_total has a new label reason="sample-too-old", when the -ingester.out-of-order-time-window flag is greater than zero. The label tracks the number of samples that were discarded for being too old; they were out of order, but beyond the time window allowed. The labels reason="sample-out-of-order" and reason="sample-out-of-bounds" are not used when out-of-order ingestion is enabled.
-distributor.request-rate-limit
-distributor.request-burst-limit
cortex_discarded_requests_total
-store-gateway.thread-pool-size and is disabled by default. Replaces the ability to run index header operations in a dedicated thread pool. #1660 #1812
memberlist_client_received_broadcasts_dropped_total counter tracks the number of dropped per-key messages. #1912
*_storage.storage_prefix). This enables using the same bucket for the three components. #1686 #1951
alpine:3.16.0. #2028
-blocks-storage.bucket-store.index-header.map-populate-enabled=true. Note this flag only has an effect when running on Linux. #2019 #2054
-compactor.block-upload-enabled. #1694 #2126
cortex_distributor_forward_errors_total for error codes resulting from forwarding requests. #2077
/ready endpoint now returns and logs detailed services information. #2055
-querier.query-store-after validation fails. #1914
cortex_ruler_queries_failed_total metric for any remote query error except 4xx when remote operational mode is enabled. #2053 #2143
-ingester.ring.unregister-on-shutdown=false with long -ingester.ring.heartbeat-period. #2085
-querier.timeout (default 2m). #2090 #2222
active_series_custom_trackers_config to active_series_custom_trackers. For backwards compatibility, both versions will be supported until Mimir v2.4. When both fields are specified, active_series_custom_trackers_config takes precedence over active_series_custom_trackers. #2101
cortex_discarded_metadata_total metric. #2096
/runtime_config page. #2065
vector and time functions were sharded, which made expressions like vector(1) > 0 and vector(1) fail. #2355
Split the mimir_queries rules group into mimir_queries and mimir_ingester_queries to keep the number of rules per group within the default per-tenant limit. #1885
gateway_enabled: true in the mixin config and recompiling the mixin running make build-mixin. #1955
MimirFrontendQueriesStuck and MimirSchedulerQueriesStuck to consider ruler query path components. #1949
MimirRulerTooManyFailedQueries severity to critical. #2165
datasource_regex to customise the regular expression used to select valid datasources for Mimir dashboards. #1802
MimirStoreGatewayNoSyncedTenants alert that fires when there is a store-gateway owning no tenants. #1882
recording_rules_range_interval configurable for cases where Mimir metrics are scraped less often than every 30 seconds. #2118
container_memory_usage_bytes:sum recording rule. #1865
MimirGossipMembersMismatch alerts if Mimir alertmanager is activated. #1870
MimirRulerMissedEvaluations to show % of missed alerts as a value between 0 and 100 instead of 0 and 1. #1895
MimirCompactorHasNotUploadedBlocks alert false positive when Mimir is deployed in monolithic mode. #1902
MimirGossipMembersMismatch to make it less sensitive during rollouts and fire one alert per installation, not per job. #1926
MimirAllocatingTooMuchMemory alerts if no container limits are supplied. #1905
Mimir / Queries dashboard. #1928
$__rate_interval for rate queries in dashboards to support scrape intervals of >15s. #2011
MimirCompactorHasNotUploadedBlocks distinct to avoid rule evaluation failures due to duplicate series being generated. #2197
MimirGossipMembersMismatch alert when using remote ruler evaluation. #2159
-querier.query-store-after, -querier.shuffle-sharding-ingesters-lookback-period, -blocks-storage.bucket-store.ignore-blocks-within, and -blocks-storage.tsdb.close-idle-tsdb-timeout CLI flags since the values now match defaults. #1915 #1921
-blocks-storage.bucket-store.chunks-cache.memcached.timeout to 450ms to increase use of cached data. #2035
memberlist_ring_enabled configuration now applies to Alertmanager. #2102 #2103 #2107
memberlist_ring_enabled is now true. It means that all hash rings use Memberlist as default KV store instead of Consul (previous default). #2161
-ingester.max-global-metadata-per-user to correspond to 20% of the configured max number of series per tenant. #2250
-ingester.max-global-metadata-per-metric to be 10. #2250
_config.multi_zone_ingester_max_unavailable to 25. #2251
autoscaling_querier_enabled: true to enable autoscaling.
autoscaling_querier_min_replicas: minimum number of querier replicas.
autoscaling_querier_max_replicas: maximum number of querier replicas.
autoscaling_prometheus_url: Prometheus base URL from which to scrape Mimir metrics (e.g. http://prometheus.default:9090/prometheus).
ruler_remote_evaluation_enabled), which deploys and uses a dedicated query path for rule evaluation. This enables the benefits of the query-frontend for rule evaluation, such as query sharding. #2073
compactor service, that can be used to route requests directly to compactor (e.g. admin UI). #2063
consul_enabled configuration option to provide the ability to disable consul. It is automatically set to false when memberlist_ring_enabled is true and multikv_migration_enabled (used for migration from Consul to memberlist) is not set. #2093 #2152
--use-legacy-routes now toggles between using /prometheus/config/v1/rules (default) and /api/v1/rules (legacy) endpoints. #2182
-tests.smoke-test flag to run the mimir-continuous-test suite once and immediately exit. #2047 #2094
/memberlist admin page. #2166
MimirRequestLatency was expanded with more practical advice. #1967
Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.1.0...mimir-2.2.0
Published by colega over 2 years ago
This release contains 26 contributions from 6 authors. Thank you!
vector and time functions were sharded, which made expressions like vector(1) > 0 and vector(1) fail. #2355
Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.2.0-rc.0...mimir-2.2.0-rc.1
Published by pstibrany over 2 years ago
This release contains 214 contributions from 32 authors. Thank you!
Grafana Labs is excited to announce version 2.2 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
Highlights include the top features, enhancements, and bugfixes in this release. If you are upgrading from Grafana Mimir 2.1, there is migration-related information as well.
For the complete list of changes, see the Changelog.
Support for ingesting out-of-order samples: Grafana Mimir includes new, experimental support for ingesting out-of-order samples.
This support is configurable: you can set, on a per-tenant basis, how far out of order a sample can be and still be accepted by Mimir.
Note that this feature still needs extensive testing and is not yet production-ready.
Error messages: The error messages that Mimir reports are more human readable, and the messages include error codes that are easily searchable.
Configurable prefix for object storage: Mimir can now store block data, rules, and alerts in one bucket, each under its own user-defined prefix, rather than requiring one bucket for each.
You can configure the storage prefix by using the -<storage>.storage-prefix option for the corresponding storage: ruler-storage, alertmanager-storage, or blocks-storage.
Helm Chart update: TBD
The store-gateway can now optionally pre-populate the file system cache when memory-mapping index-header files.
This can help the store-gateway avoid appearing stuck while loading index-headers.
You can enable this feature with the new experimental flag -blocks-storage.bucket-store.index-header.map-populate-enabled.
Faster ingester startup: Ingesters now replay their write-ahead logs (WALs) about 50% faster, and they also re-join the ring sooner under some conditions.
We have updated default values and some parameters in Grafana Mimir 2.2 to give you a better out-of-the-box experience:
Message size limits for gRPC messages exchanged between internal Mimir components increased to 100 MiB from the previous 4 MiB.
This helps to avoid internal server errors when pushing or querying large data.
The -blocks-storage.bucket-store.ignore-blocks-within parameter changed from 0 to 10h.
The default value of -querier.query-store-after changed from 0 to 12h.
Both changes improve query performance for most-recent data by querying only the ingesters, rather than object storage.
The option -querier.shuffle-sharding-ingesters-lookback-period has been deprecated.
If you previously changed this option from its default of 0s, set -querier.shuffle-sharding-ingesters-enabled to true and specify the lookback period by setting the -querier.query-ingesters-within option.
The -memberlist.abort-if-join-fails parameter now defaults to false.
When Mimir is using memberlist as the backend store for its hash ring, and it fails to join the memberlist cluster, Mimir no longer aborts startup by default.
Increased the default values of -server.grpc-max-recv-msg-size-bytes and -server.grpc-max-send-msg-size-bytes from 4MB to 100MB. #1884
-blocks-storage.bucket-store.ignore-blocks-within now defaults to 10h (previously 0)
-querier.query-store-after now defaults to 12h (previously 0)
-querier.query-ingesters-within
-querier.query-store-after
Deprecated -ingester.ring.join-after. Mimir now behaves as if this setting is always set to 0s. This configuration option will be removed in Mimir 2.4.0. #1965
Blocks uploaded by ingesters no longer contain the __org_id__ label. Compactor now ignores this label and will compact blocks with and without this label together. The mimirconvert tool will remove the label from blocks as an "unknown" label. #1972
Deprecated -querier.shuffle-sharding-ingesters-lookback-period, instead adding -querier.shuffle-sharding-ingesters-enabled to enable or disable shuffle sharding on the read path. The value of -querier.query-ingesters-within is now used internally for shuffle sharding lookback. #2110
-memberlist.abort-if-join-fails now defaults to false. Previously it defaulted to true. #2168
/api/v1/rules* and /prometheus/rules* configuration endpoints are removed. Use /prometheus/config/v1/rules*. #2182
-ingester.exemplars-update-period has been renamed to -ingester.tsdb-config-update-period. You can use it to update multiple, per-tenant TSDB configurations. #2187
-ingester.out-of-order-time-window, as a duration string, allows you to set how far back in time a sample can be. The default is 0s, where s is seconds.
cortex_ingester_tsdb_out_of_order_samples_appended_total metric tracks the total number of out-of-order samples ingested by the ingester.
cortex_discarded_samples_total has a new label reason="sample-too-old", when the -ingester.out-of-order-time-window flag is greater than zero. The label tracks the number of samples that were discarded for being too old; they were out of order, but beyond the time window allowed.
-distributor.request-rate-limit
-distributor.request-burst-limit
cortex_discarded_requests_total
-store-gateway.thread-pool-size and is disabled by default. Replaces the ability to run index header operations in a dedicated thread pool. #1660 #1812
memberlist_client_received_broadcasts_dropped_total counter tracks the number of dropped per-key messages. #1912
*_storage.storage_prefix). This enables using the same bucket for the three components. #1686 #1951
alpine:3.16.0. #2028
-blocks-storage.bucket-store.index-header.map-populate-enabled=true. Note this flag only has an effect when running on Linux. #2019 #2054
-compactor.block-upload-enabled. #1694 #2126
cortex_distributor_forward_errors_total for error codes resulting from forwarding requests. #2077
/ready endpoint now returns and logs detailed services information. #2055
-querier.query-store-after validation fails. #1914
cortex_ruler_queries_failed_total metric for any remote query error except 4xx when remote operational mode is enabled. #2053 #2143
-ingester.ring.unregister-on-shutdown=false with long -ingester.ring.heartbeat-period. #2085
-querier.timeout (default 2m). #2090 #2222
active_series_custom_trackers_config to active_series_custom_trackers. For backwards compatibility, both versions will be supported until Mimir v2.4. When both fields are specified, active_series_custom_trackers_config takes precedence over active_series_custom_trackers. #2101
cortex_discarded_metadata_total metric. #2096
/runtime_config page. #2065
Split the mimir_queries rules group into mimir_queries and mimir_ingester_queries to keep the number of rules per group within the default per-tenant limit. #1885
gateway_enabled: true in the mixin config and recompiling the mixin running make build-mixin. #1955
MimirFrontendQueriesStuck and MimirSchedulerQueriesStuck to consider ruler query path components. #1949
MimirRulerTooManyFailedQueries severity to critical. #2165
datasource_regex to customise the regular expression used to select valid datasources for Mimir dashboards. #1802
MimirStoreGatewayNoSyncedTenants alert that fires when there is a store-gateway owning no tenants. #1882
recording_rules_range_interval configurable for cases where Mimir metrics are scraped less often than every 30 seconds. #2118
container_memory_usage_bytes:sum recording rule. #1865
MimirGossipMembersMismatch alerts if Mimir alertmanager is activated. #1870
MimirRulerMissedEvaluations to show % of missed alerts as a value between 0 and 100 instead of 0 and 1. #1895
MimirCompactorHasNotUploadedBlocks alert false positive when Mimir is deployed in monolithic mode. #1902
MimirGossipMembersMismatch to make it less sensitive during rollouts and fire one alert per installation, not per job. #1926
MimirAllocatingTooMuchMemory alerts if no container limits are supplied. #1905
Mimir / Queries dashboard. #1928
$__rate_interval for rate queries in dashboards to support scrape intervals of >15s. #2011
MimirCompactorHasNotUploadedBlocks distinct to avoid rule evaluation failures due to duplicate series being generated. #2197
MimirGossipMembersMismatch alert when using remote ruler evaluation. #2159
-querier.query-store-after, -querier.shuffle-sharding-ingesters-lookback-period, -blocks-storage.bucket-store.ignore-blocks-within, and -blocks-storage.tsdb.close-idle-tsdb-timeout CLI flags since the values now match defaults. #1915 #1921
-blocks-storage.bucket-store.chunks-cache.memcached.timeout to 450ms to increase use of cached data. #2035
memberlist_ring_enabled configuration now applies to Alertmanager. #2102 #2103 #2107
memberlist_ring_enabled is now true. It means that all hash rings use Memberlist as default KV store instead of Consul (previous default). #2161
-ingester.max-global-metadata-per-user to correspond to 20% of the configured max number of series per tenant. #2250
-ingester.max-global-metadata-per-metric to be 10. #2250
_config.multi_zone_ingester_max_unavailable to 25. #2251
autoscaling_querier_enabled: true to enable autoscaling.
autoscaling_querier_min_replicas: minimum number of querier replicas.
autoscaling_querier_max_replicas: maximum number of querier replicas.
autoscaling_prometheus_url: Prometheus base URL from which to scrape Mimir metrics (e.g. http://prometheus.default:9090/prometheus).
ruler_remote_evaluation_enabled), which deploys and uses a dedicated query path for rule evaluation. This enables the benefits of the query-frontend for rule evaluation, such as query sharding. #2073
compactor service, that can be used to route requests directly to compactor (e.g. admin UI). #2063
consul_enabled configuration option to provide the ability to disable consul. It is automatically set to false when memberlist_ring_enabled is true and multikv_migration_enabled (used for migration from Consul to memberlist) is not set. #2093 #2152
--use-legacy-routes now toggles between using /prometheus/config/v1/rules (default) and /api/v1/rules (legacy) endpoints. #2182
-tests.smoke-test flag to run the mimir-continuous-test suite once and immediately exit. #2047 #2094
/memberlist admin page. #2166
MimirRequestLatency was expanded with more practical advice. #1967
Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.1.0...mimir-2.2.0-rc.0
Published by johannaratliff over 2 years ago
Grafana Labs is excited to announce version 2.1 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
Below we highlight the top features, enhancements and bugfixes in this release, as well as relevant callouts for those upgrading from Grafana Mimir 2.0. The complete list of changes is recorded in the Changelog.
Mimir on ARM: We now publish Docker images for both amd64 and arm64, making it easier for those on ARM-based machines to develop and run Mimir. Multiplatform images are available from the Mimir Docker registry. Note that our existing integration test suite only uses the amd64 images, which means we cannot make any functional or performance guarantees about the arm64 images.
Remote ruler mode for improved rule evaluation performance: We've added a remote mode for the Grafana Mimir ruler, in which the ruler delegates rule evaluation to the query-frontend rather than evaluating rules directly within the ruler process itself. This allows recording and alerting rules to benefit from the query parallelization techniques implemented in the query-frontend (like query sharding). Remote mode is considered experimental and is off by default. To enable, see remote ruler.
Per-tenant custom trackers for monitoring cardinality: In Grafana Mimir 2.0, we introduced a custom tracker feature that allows you to track the count of active series over time that match a specific label matcher. In Grafana Mimir 2.1, we've made it possible to configure custom trackers via the runtime configuration file. This means you can now define different trackers for each tenant in your cluster and modify those trackers without an ingester restart.
Reduce cardinality of Grafana Mimir's /metrics endpoint: While Grafana Mimir does a good job of exposing a relatively small number of series about its own state, this number can tick up when running Grafana Mimir clusters with high tenant counts or high active series counts. To reduce this number (and the accompanying cost of scraping and storing these time series), we made several optimizations which decreased series count on the /metrics endpoint by more than 10%.
We've updated the default values for 2 parameters in Grafana Mimir to give users better out-of-the-box performance:
We've changed the default for -blocks-storage.tsdb.isolation-enabled from true to false. We've marked this flag as deprecated and will remove it completely in 2 releases. TSDB isolation is a feature inherited from Prometheus that didn't provide any benefit given Grafana Mimir's distributed architecture, and in our 1 billion series load test we found it actually hurt performance. Disabling it reduced our ingester 99th percentile latency by 90%.
The store-gateway attributes cache is now enabled by default (achieved by updating the default for -blocks-storage.bucket-store.chunks-cache.attributes-in-memory-max-items from 0 to 50000). This in-memory cache makes it faster to look up object attributes for chunk data. We've been running this optional cache internally for a while and, upon a recent configuration audit, realized it made sense to do the same for all users. The increase in store-gateway memory utilization from enabling this cache is negligible and easily justified given the performance gains.
Running the Alertmanager with local storage broke in Grafana Mimir 2.0 when we removed the ability to run the Alertmanager without sharding. With this bugfix, we've made it possible to again run Alertmanager with local storage. However, for production use, we still recommend using an external store since this is needed to persist Alertmanager state (e.g. silences) between replicas.
-alertmanager.alertmanager-client.grpc-max-recv-msg-size now defaults to 100 MiB (previously was not configurable and set to 16 MiB)
-alertmanager.alertmanager-client.grpc-max-send-msg-size now defaults to 100 MiB (previously was not configurable and set to 4 MiB)
-alertmanager.max-recv-msg-size now defaults to 100 MiB (previously was 16 MiB)
user label to metrics cortex_ingester_ingested_samples_total and cortex_ingester_ingested_samples_failures_total. #1533
Changed the -blocks-storage.tsdb.isolation-enabled default from true to false. The config option has also been deprecated and will be removed in 2 minor versions. #1655
-blocks-storage.bucket-store.chunks-cache.attributes-in-memory-max-items=50000. #1727
Removed cortex_compactor_garbage_collected_blocks_total since it duplicates cortex_compactor_blocks_marked_for_deletion_total. #1728
org_id label now use user label. #1634 #1758
user and integration when the metric value is zero: #1783
cortex_alertmanager_notifications_total
cortex_alertmanager_notifications_failed_total
cortex_alertmanager_notification_requests_total
cortex_alertmanager_notification_requests_failed_total
cortex_alertmanager_notification_rate_limited_total
cortex_member_ring_tokens_owned
cortex_member_ring_tokens_to_own
cortex_ring_tokens_owned
cortex_ring_member_ownership_percent
cortex_request_duration_seconds_count{route=~"/cortex.Ingester/(QueryStream|QueryExemplars)"} instead. #1797
cortex_distributor_ingester_queries_total
cortex_distributor_ingester_query_failures_total
cortex_distributor_ingester_appends_total
cortex_distributor_ingester_append_failures_total
-distributor.extend-writes. Now Mimir always behaves as if this setting was set to false, which we expect to be safe for every Mimir cluster setup. #1856
query-frontend setup responses will be buffered until they've been completed. #1735
evaluation_delay for each rule group via rules group configuration file. #1474
-ruler.query-frontend.address
-ruler.query-frontend.grpc-client-config.grpc-max-recv-msg-size
-ruler.query-frontend.grpc-client-config.grpc-max-send-msg-size
-ruler.query-frontend.grpc-client-config.grpc-compression
-ruler.query-frontend.grpc-client-config.grpc-client-rate-limit
-ruler.query-frontend.grpc-client-config.grpc-client-rate-limit-burst
-ruler.query-frontend.grpc-client-config.backoff-on-ratelimits
-ruler.query-frontend.grpc-client-config.backoff-min-period
-ruler.query-frontend.grpc-client-config.backoff-max-period
-ruler.query-frontend.grpc-client-config.backoff-retries
-ruler.query-frontend.grpc-client-config.tls-enabled
-ruler.query-frontend.grpc-client-config.tls-ca-path
-ruler.query-frontend.grpc-client-config.tls-cert-path
-ruler.query-frontend.grpc-client-config.tls-key-path
-ruler.query-frontend.grpc-client-config.tls-server-name
-ruler.query-frontend.grpc-client-config.tls-insecure-skip-verify
-alertmanager.max-concurrent-get-requests-per-tenant. #1547
-alertmanager.alertmanager-client.backoff-max-period
-alertmanager.alertmanager-client.backoff-min-period
-alertmanager.alertmanager-client.backoff-on-ratelimits
-alertmanager.alertmanager-client.backoff-retries
-alertmanager.alertmanager-client.grpc-client-rate-limit
-alertmanager.alertmanager-client.grpc-client-rate-limit-burst
-alertmanager.alertmanager-client.grpc-compression
-alertmanager.alertmanager-client.grpc-max-recv-msg-size
-alertmanager.alertmanager-client.grpc-max-send-msg-size
insight=true field to alertmanager dispatch logs. #1379
-blocks-storage.bucket-store.index-header-thread-pool-size and is disabled by default. #1660
component=query-frontend label to results cache memcached metrics to fix a panic when Mimir is running in single binary mode and results cache is enabled. #1704
text/html. #1575
multi KV. #1587
multi KV store in ruler and querier. #1665
-alertmanager-storage.backend=local. Note that when using this storage type, the Alertmanager is not able to persist state remotely, so it is not recommended for production use. #1836
user label from logs instead of org_id. #1634
a76bee5913c97c918d9e56a3cc88cc28 to b0d38d318bbddd80476246d4930f9e55
68b66aed90ccab448009089544a8d6c6 to a6883fb22799ac74479c7db872451092
9c408e1d55681ecb8a22c9fab46875cc to 1b3443aea86db629e6efdb7d05c53823
df9added6f1f4332f95848cca48ebd99 to 09a5c49e9cdb2f2b24c6d184574a07fd
61bb048ced9817b2d3e07677fb1c6290 to 5d9d0b4724c0f80d68467088ec61e003
d5a3a4489d57c733b5677fb55370a723 to e1324ee2a434f4158c00a9ee279d3292
b5c95fee2e5e7c4b5930826ff6e89a12 to 1e2c358600ac53f09faea133f811b5bb
d9931b1054053c8b972d320774bb8f1d to b3abe8d5c040395cc36615cb4334c92d
8d6ba60eccc4b6eedfa329b24b1bd339 to e327503188913dc38ad571c647eef643
c0464f0d8bd026f776c9006b05910000 to 54b2a0a4748b3bd1aefa92ce5559a1c2
2fd2cda9eea8d8af9fbc0a5960425120 to cc86fd5aa9301c6528986572ad974db9
7544a3a62b1be6ffd919fc990ab8ba8f to 7f0b5567d543a1698e695b530eb7f5de
44d12bcb1f95661c6ab6bc946dfc3473 to 631e15d5d85afb2ca8e35d62984eeaa0
88c041017b96856c9176e07cf557bdcf to 64bbad83507b7289b514725658e10352
e6f3091e29d2636e3b8393447e925668 to 6089e1ce1e678788f46312a0a1e647e6
35fa247ce651ba189debf33d7ae41611 to 35fa247ce651ba189debf33d7ae41611
bc6e12d4fe540e4a1785b9d3ca0ffdd9 to bc6e12d4fe540e4a1785b9d3ca0ffdd9
0156f6d15aa234d452a33a4f13c838e3 to 8280707b8f16e7b87b840fc1cc92d4c5
681cd62b680b7154811fe73af55dcfd4 to 978c1cb452585c96697a238eaac7fe2d
c0464f0d8bd026f776c9006b0591bb0b to bc9160e50b52e89e0e49c840fea3d379
mimir-continuous-test tool: #1676
MimirContinuousTestNotRunningOnWrites
MimirContinuousTestNotRunningOnReads
MimirContinuousTestFailed
per_cluster_label support to allow changing the label name used to differentiate between Kubernetes clusters. #1651
MimirRequestErrors and MimirRequestLatency #1702
gateway_enabled (defaults to true) to disable gateway panels from dashboards. #1761
per_instance_label in all dashboards and alerts. #1697
mimir-continuous-test. To deploy mimir-continuous-test you can use the following configuration: #1675 #1850
_config+: {
  continuous_test_enabled: true,
  continuous_test_tenant_id: 'type-tenant-id',
  continuous_test_write_endpoint: 'http://type-write-path-hostname',
  continuous_test_read_endpoint: 'http://type-read-path-hostname/prometheus',
},
ingester_allow_multiple_replicas_on_same_node configuration key. #1581
node_selector configuration option to select Kubernetes nodes where Mimir should run. #1596
PodDisruptionBudget of withMaxUnavailable = 1, to ensure we maintain quorum during rollouts. #1683
store_gateway_allow_multiple_replicas_on_same_node configuration key. #1730
store_gateway_zone_a_args, store_gateway_zone_b_args and store_gateway_zone_c_args configuration options. #1807
multikv_switch_primary_secondary config option to flip primary and secondary in runtime config.
config convert: Retain Cortex defaults for blocks_storage.backend, ruler_storage.backend, alertmanager_storage.backend, auth.type, activity_tracker.filepath, alertmanager.data_dir, blocks_storage.filesystem.dir, compactor.data_dir, ruler.rule_path, ruler_storage.filesystem.dir, and graphite.querier.schemas.backend. #1626 #1762
markblocks tool that creates no-compact and delete marks for the blocks. #1551
mimir-continuous-test tool to continuously run smoke tests on live Mimir clusters. #1535 #1540 #1653 #1603 #1630 #1691 #1675 #1676 #1692 #1706 #1709 #1775 #1777 #1778 #1795
mimir-rules-action GitHub action, located at operations/mimir-rules-action/, used to lint, prepare, verify, diff, and sync rules to a Mimir cluster. #1723
Published by johannaratliff over 2 years ago
CHANGELOG since mimir-2.1.0-rc.0
-distributor.extend-writes. Now Mimir always behaves as if this setting was set to false, which we expect to be safe for every Mimir cluster setup. #1856
Published by johannaratliff over 2 years ago
Grafana Labs is excited to announce version 2.1 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
Below we highlight the top features, enhancements and bugfixes in this release, as well as relevant callouts for those upgrading from Grafana Mimir 2.0. The complete list of changes is recorded in the Changelog.
Mimir on ARM: We now publish Docker images for both amd64 and arm64, making it easier for those on ARM-based machines to develop and run Mimir. Multiplatform images are available from the Mimir Docker registry. Note that our existing integration test suite only uses the amd64 images, which means we cannot make any functional or performance guarantees about the arm64 images.
Remote ruler mode for improved rule evaluation performance: We've added a remote mode for the Grafana Mimir ruler, in which the ruler delegates rule evaluation to the query-frontend rather than evaluating rules directly within the ruler process itself. This allows recording and alerting rules to benefit from the query parallelization techniques implemented in the query-frontend (like query sharding). Remote mode is considered experimental and is off by default. To enable it, see remote ruler.
Per-tenant custom trackers for monitoring cardinality: In Grafana Mimir 2.0, we introduced a custom tracker feature that allows you to track the count of active series over time that match a specific label matcher. In Grafana Mimir 2.1, we've made it possible to configure custom trackers via the runtime configuration file. This means you can now define different trackers for each tenant in your cluster and modify those trackers without an ingester restart.
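The idea behind per-tenant custom trackers can be illustrated with a minimal Python sketch. This is not Mimir's implementation or configuration format: the tenant names, tracker names, and the TRACKERS structure are hypothetical, and the sketch only supports equality matchers, whereas real label matchers are richer.

```python
# Hypothetical per-tenant tracker definitions: tracker name -> required label values.
# Assumption: equality-only matching, a simplification of real label matchers.
TRACKERS = {
    "tenant-a": {"dev": {"env": "dev"}, "prod": {"env": "prod"}},
    "tenant-b": {"checkout": {"service": "checkout"}},
}


def count_tracked_series(tenant, series, trackers=TRACKERS):
    """Count how many of a tenant's active series match each of its trackers."""
    tenant_trackers = trackers.get(tenant, {})
    counts = {name: 0 for name in tenant_trackers}
    for labels in series:
        for name, matcher in tenant_trackers.items():
            # A series matches when every label required by the tracker is equal.
            if all(labels.get(key) == value for key, value in matcher.items()):
                counts[name] += 1
    return counts


# Example active series, each represented as a dict of labels.
active = [
    {"__name__": "up", "env": "dev"},
    {"__name__": "up", "env": "prod"},
    {"__name__": "http_requests_total", "env": "prod"},
]
print(count_tracked_series("tenant-a", active))
```

Because the tracker definitions are passed in as data, swapping in a new definition changes what is counted on the next call, which loosely mirrors reloading trackers from a runtime configuration file without a restart.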
Reduce cardinality of Grafana Mimir's /metrics endpoint: While Grafana Mimir does a good job of exposing a relatively small number of series about its own state, this number can tick up when running Grafana Mimir clusters with high tenant counts or high active series counts. To reduce this number (and the accompanying cost of scraping and storing these time series), we made several optimizations which decreased series count on the /metrics endpoint by more than 10%.
We've updated the default values for 2 parameters in Grafana Mimir to give users better out-of-the-box performance:
We've changed the default for -blocks-storage.tsdb.isolation-enabled from true to false. We've marked this flag as deprecated and will remove it completely in 2 releases. TSDB isolation is a feature inherited from Prometheus that didn't provide any benefit given Grafana Mimir's distributed architecture, and in our 1 billion series load test we found it actually hurt performance. Disabling it reduced our ingester 99th percentile latency by 90%.
The store-gateway attributes cache is now enabled by default (achieved by updating the default for -blocks-storage.bucket-store.chunks-cache.attributes-in-memory-max-items from 0 to 50000). This in-memory cache makes it faster to look up object attributes for chunk data. We've been running this optional cache internally for a while and, upon a recent configuration audit, realized it made sense to do the same for all users. The increase in store-gateway memory utilization from enabling this cache is negligible and easily justified given the performance gains.
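To illustrate why a bounded in-memory attributes cache helps, here is a minimal Python sketch of a cache capped at a maximum number of items, where 0 disables caching. The class name, the fetch callback, and the least-recently-used eviction policy are assumptions made for illustration; the release notes do not describe how the store-gateway evicts entries.

```python
from collections import OrderedDict


class AttributesCache:
    """Bounded in-memory cache for object attributes (illustrative sketch only)."""

    def __init__(self, max_items=50000):
        self.max_items = max_items
        self._items = OrderedDict()

    def get(self, key, fetch):
        """Return cached attributes for key, calling fetch(key) on a miss."""
        if self.max_items <= 0:
            # A limit of 0 means the cache is disabled: always go to storage.
            return fetch(key)
        if key in self._items:
            # Mark the entry as most recently used.
            self._items.move_to_end(key)
            return self._items[key]
        value = fetch(key)
        self._items[key] = value
        if len(self._items) > self.max_items:
            # Assumed policy: evict the least recently used entry.
            self._items.popitem(last=False)
        return value


# Usage: count how often the (hypothetical) object storage lookup is hit.
calls = []


def fetch_attributes(key):
    calls.append(key)
    return {"size": len(key)}


cache = AttributesCache(max_items=2)
cache.get("chunk-a", fetch_attributes)
cache.get("chunk-a", fetch_attributes)  # served from memory, no second lookup
print(len(calls))
```

Repeated lookups for the same object are served from memory, so only the first one pays the cost of a round trip to object storage.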
local storage broke in Grafana Mimir 2.0 when we removed the ability to run the Alertmanager without sharding. With this bugfix, we've made it possible to again run Alertmanager with local storage. However, for production use, we still recommend using an external store since this is needed to persist Alertmanager state (e.g. silences) between replicas.
Published by pracucci over 2 years ago
Grafana Labs is excited to announce the first release of Grafana Mimir, the most scalable, most performant open source time series database in the world. In customer tests, we’ve shown that a single cluster can support more than 1 billion active time series.
Besides massive scale, Grafana Mimir offers a host of other benefits, including easy deployment, native multi-tenancy, high availability, durable long-term storage, and exceptional query performance on even the highest cardinality queries.
We’re launching Grafana Mimir with a 2.0 version number to signal our respect for Cortex, the project from which Grafana Mimir was forked. The choice of 2.0 also represents our conviction that Grafana Mimir is real-world-tested, production-ready software. It has served as the backbone of our Grafana Cloud Metrics and Grafana Enterprise Metrics products since their inception.
Learn more:
The complete list of changes is recorded in the Changelog.
Published by pracucci over 2 years ago