Published by squat over 4 years ago
--store.unhealthy-timeout was never respected.
web.route-prefix to correctly handle / and prefixes that do not begin with a /.
--store-strict flag. More information available here.
--wait-interval to specify compaction wait interval between consecutive compact runs when --wait is enabled.
--deduplication.replica-label flag to specify the replica label on which to deduplicate (hidden). Please note that this uses a NAIVE algorithm for merging (no smart replica deduplication, just chaining samples together). This works well for deduplication of blocks with precisely the same samples like those produced by Receiver replication. We plan to add a smarter algorithm in the following weeks.
max_item_size configuration option to memcached-based index cache. This should be set to the max item size configured in memcached (-I flag) in order to not waste network round-trips to cache items larger than the limit configured in memcached.
--experimental.enable-index-cache-postings-compression flag to enable re-encoding and compressing postings before storing them into the cache. Compressed postings take about 10% of the original size.
:<http-port>/loaded, which shows exactly the blocks that are currently seen by compactor and the store gateway. The compactor also serves a different bucket UI on :<http-port>/global, which shows the status of object storage without any filters.
thanos bucket replicate command to replicate blocks from one bucket to another.
deletion-mark.json file for the block that was chosen to be deleted. This file contains Unix time of when the block was marked for deletion. If you want to keep existing behavior, you should add --delete-delay=0s as a flag.
downsample command has moved and is now a sub-command of the thanos bucket sub-command; it cannot be called via thanos downsample any more.
=~".*" matchers or negation matchers (!=... or !~...) benefit the most.
--store.disable-index-header. The --experimental.enable-index-header flag was removed.
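The interaction between deletion-mark.json and --delete-delay can be sketched in Go as follows. This is only an illustration under stated assumptions: the JSON field names (id, deletion_time, version), the deletionMark type and the dueForDeletion helper are hypothetical and not confirmed by these notes. The only facts taken from the notes are that the file stores the Unix time at which the block was marked, and that a delay of 0s keeps the previous immediate-deletion behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// deletionMark models an ASSUMED layout of deletion-mark.json.
// Only the presence of a Unix timestamp is stated in the release notes.
type deletionMark struct {
	ID           string `json:"id"`            // block identifier (assumed field)
	DeletionTime int64  `json:"deletion_time"` // Unix seconds when the block was marked (assumed name)
	Version      int    `json:"version"`       // file format version (assumed field)
}

// dueForDeletion is a hypothetical helper: it reports whether a marked block
// may be removed, given the configured delay and the current Unix time.
func dueForDeletion(markJSON []byte, delay time.Duration, nowUnix int64) (bool, error) {
	var m deletionMark
	if err := json.Unmarshal(markJSON, &m); err != nil {
		return false, fmt.Errorf("parse deletion mark: %w", err)
	}
	// The block becomes eligible once the delay has fully elapsed.
	return nowUnix >= m.DeletionTime+int64(delay/time.Second), nil
}

func main() {
	raw := []byte(`{"id":"example-block","deletion_time":1600000000,"version":1}`)
	now := int64(1600000000 + 3600) // one hour after the block was marked

	for _, delay := range []time.Duration{0, 48 * time.Hour} {
		due, err := dueForDeletion(raw, delay, now)
		if err != nil {
			panic(err)
		}
		fmt.Printf("delay=%s due=%t\n", delay, due)
	}
}
```

With a zero delay the block is eligible as soon as it is marked, which corresponds to passing --delete-delay=0s; any positive delay postpones the removal.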
Published by metalmatze over 4 years ago
index-header mode run store with hidden experimental.enable-index-header flag.
--query.config and --query.config-file CLI flags. See documentation for further information.
thanos_proxy_store_empty_stream_responses_total metric for number of empty responses from stores.
--receive.local-endpoint flag and the endpoints in the hashring configuration file must now specify the receive gRPC port and must be updated to be a simple host:port combination, e.g. 127.0.0.1:10901, rather than a full HTTP URL, e.g. http://127.0.0.1:10902/api/v1/receive.
--tsdb.wal-compression to configure whether to enable tsdb wal compression in ruler and receiver.
thanos_query_duplicated_store_address to thanos_query_duplicated_store_addresses_total and thanos_rule_duplicated_query_address to thanos_rule_duplicated_query_addresses_total.
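When migrating --receive.local-endpoint and hashring endpoints, it can help to check that each value is a plain host:port rather than a URL. The Go sketch below shows one way to do that with the standard library. The validateEndpoint helper is hypothetical and not part of Thanos, and it only checks the shape of the value, not whether the port is really the receive gRPC port.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// validateEndpoint is a hypothetical helper that accepts only a simple
// host:port value and rejects full URLs or values with a trailing path.
func validateEndpoint(endpoint string) error {
	if strings.Contains(endpoint, "://") {
		return fmt.Errorf("%q looks like a URL; expected host:port", endpoint)
	}
	host, port, err := net.SplitHostPort(endpoint)
	if err != nil {
		return fmt.Errorf("%q is not host:port: %w", endpoint, err)
	}
	if host == "" {
		return fmt.Errorf("%q has an empty host", endpoint)
	}
	// SplitHostPort does not validate the port, so parse it explicitly.
	if _, err := strconv.ParseUint(port, 10, 16); err != nil {
		return fmt.Errorf("%q has an invalid port %q", endpoint, port)
	}
	return nil
}

func main() {
	// The two example values come from the release note above.
	for _, e := range []string{
		"127.0.0.1:10901",
		"http://127.0.0.1:10902/api/v1/receive",
	} {
		if err := validateEndpoint(e); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Println("ok:", e)
	}
}
```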
Published by GiedriusS almost 5 years ago
Thanks to all contributors! ❤️
Highlights: Store now supports memcached; StoreAPI has a new skip-chunks option which is used to greatly speed up the /api/v1/series endpoint; Store/Compactor has improved synchronization of meta JSON files; Ruler supports TLS and authentication; fixed a potential data loss when uploading older blocks or when the upload is taking a long time while the Compactor is running; the compaction process should take significantly less RAM but a longer time.
⚠️ memcached support is marked experimental for now ⚠️
As always, here is the detailed changelog:
#1919 Compactor: Fixed potential data loss when uploading older blocks, or upload taking long time while compactor is running.
#1937 Compactor: Improved synchronization of meta JSON files.
Compactor now properly handles partial block uploads for all operations like retention apply, downsampling and compaction. Additionally:
thanos_compact_sync_meta_* metrics. Use thanos_blocks_meta_* metrics instead.
thanos_consistency_delay_seconds and thanos_compactor_aborted_partial_uploads_deletion_attempts_total metrics.
#1936 Store: Improved synchronization of meta JSON files. Store now properly handles corrupted disk cache. Added meta.json sync metrics.
#1856 Receive: close DBReadOnly after flushing to fix a memory leak.
#1882 Receive: upload to object storage as 'receive' rather than 'sidecar'.
#1907 Store: Fixed the duration unit for the metric thanos_bucket_store_series_gate_duration_seconds.
#1931 Compact: Fixed the compactor successfully exiting when actually an error occurred while compacting a blocks group.
#1872 Ruler: /api/v1/rules now shows a properly formatted value
#1945 master container images are now built with Go 1.13
#1956 Ruler: now properly ignores duplicated query addresses
#1975 Store Gateway: fixed panic caused by memcached servers selector when there's 1 memcached node
AWS_CONTAINER_CREDENTIALS_FULL_URI by upgrading to minio-go v6.0.44
--alertmanagers.config and --alertmanagers.config-file CLI flags. See documentation for further information.
--alertmanagers.sd-dns-interval CLI option to specify the interval between DNS resolutions of Alertmanager hosts.
/api/v1/series endpoint.
/api/v1/labels now understands POST - useful for sending bigger requests
#1947 Upgraded Prometheus dependencies to v2.15.2. This includes:
[ 5m]
#1833 --shipper.upload-compacted flag has been promoted to non hidden, non experimental state. More info available here.
#1867 Ruler: now sets a Thanos/$version User-Agent in requests
#1887 Service discovery now deduplicates targets between different target groups
Published by GiedriusS almost 5 years ago
Published by GiedriusS almost 5 years ago
Published by bwplotka almost 5 years ago
Thanks to all contributors!
Worth-noting changes: Support for AlibabaCloud object storage; LightStep tracing; Ruler fixes, Store UI page fixed, Store gateway has now metrics for startup cycle plus optimization.
--grpc-grace-period CLI option to components which serve gRPC to set how long to wait until gRPC Server shuts down.
--prometheus.ready_timeout CLI option to the sidecar to set how long to wait until Prometheus starts up.
AliYun OSS object storage, see documents for further information.
--http-grace-period CLI option to components which serve HTTP to set how long to wait until HTTP Server shuts down.
--listen to --http-address to match other components.
thanos_compactor_iterations_total on Thanos Compactor which shows the number of successful iterations.
thanos bucket web now supports --web.external-prefix for proxying on a subpath.
--web.prefix-header flags to allow for bucket UI to be accessible behind a reverse proxy.
/-/healthy endpoint now starts to respond with success earlier. /metrics endpoint starts serving metrics earlier as well. Make sure to point your readiness probes to the /-/ready endpoint rather than /metrics.
failed to assert type of rule ... message.
--web.external-prefix 404s for static resources.
offset.
thanos_compact_group_compactions_total now counts block compactions, so operations that resulted in a compacted block. The old behaviour
thanos_compact_group_compaction_runs_started_total and thanos_compact_group_compaction_runs_completed_total which counts compaction runs overall.
prober_ready and prober_healthy metrics are removed, for sake of status. Now status exposes same metric with a label, check. check can have "healthy" or "ready" depending on status of the probe.
Published by bwplotka almost 5 years ago
RC release for v0.9.0
See changes here
Published by bwplotka about 5 years ago
thanos_store_nodes_grpc_connections metric is now per external_labels and store_type. It is a recommended metric for Querier storeAPIs. thanos_store_node_info is marked as obsolete and will be removed in next release.
"@thanos_compatibility_store_type=store" label. This is to have the current Store Gateway compatible with Querier pre v0.8.0.
debug.advertise-compatibility-label=false flag on Store Gateway.
See full CHANGELOG here
Published by bwplotka about 5 years ago
min-time (e.g. 3h only).
Make sure you check out Prometheus 2.13.0 as well. New release drastically improves usage and resource consumption of both Prometheus and sidecar with Thanos: https://prometheus.io/blog/2019/10/10/remote-read-meets-streaming/
--selector.relabel-config-file and selector.relabel-config) into Thanos Store and Compact components.
/-/ready and /-/healthy endpoints.
/-/ready and /-/healthy endpoints.
/-/ready and /-/healthy endpoints.
/-/ready and /-/healthy endpoints.
/-/ready and /-/healthy endpoints.
replicaLabels param for /query and /query_range querier endpoints. When provided, it overrides the query.replica-label CLI flags.
resendDelay flag.
query.replica-label configuration can be provided more than once, e.g. --query.replica-label=prometheus_replica --query.replica-label=service.
labels to label to be consistent with other commands.
+ in it.
See full CHANGELOG here
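As an illustration of the replicaLabels request parameter mentioned above, the following Go sketch builds a Querier instant-query URL that passes two replica labels. Several details are assumptions rather than facts from these notes: the exact parameter key (written here as replicaLabels[]), the /api/v1/query path, the base address, and the buildQueryURL helper itself are all hypothetical, so check them against the Querier documentation for your version.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// replicaLabelsParam is the ASSUMED query parameter key; the release notes
// only call it "replicaLabels".
const replicaLabelsParam = "replicaLabels[]"

// buildQueryURL is a hypothetical helper that assembles an instant-query URL
// with per-request replica labels, which take the place of the
// query.replica-label flags for that request.
func buildQueryURL(base, promQL string, replicaLabels []string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parse base URL: %w", err)
	}
	// Keep any route prefix already present on the base URL.
	u.Path = strings.TrimRight(u.Path, "/") + "/api/v1/query" // assumed API path

	q := url.Values{}
	q.Set("query", promQL)
	for _, label := range replicaLabels {
		q.Add(replicaLabelsParam, label)
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// The label names mirror the flag example in the release note above.
	u, err := buildQueryURL("http://localhost:10902", "up",
		[]string{"prometheus_replica", "service"})
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```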
Published by domgreen about 5 years ago
Accepted into CNCF:
#thanos / #thanos-dev / #thanos-prs
thanos_receive_config_hash, thanos_receive_config_last_reload_successful and thanos_receive_config_last_reload_success_timestamp_seconds metrics to track latest configuration change
2.13 or 2.12-master.
part_size configuration option for HTTP multipart requests minimum part size for S3 storage type
thanos_receive_hashring_nodes and thanos_receive_hashring_tenants metrics to monitor status of hash-rings
/-/ready and /-/healthy endpoints to Thanos sidecar.
/-/ready and /-/healthy endpoints to Thanos compact.
min-time & max-time
downsampling.disable.
/series API end-point now properly returns an empty array just like Prometheus if there are no results
http_requests_total and http_request_duration_seconds_bucket; Thanos Query no longer exposes thanos_query_api_instant_query_duration_seconds, thanos_query_api_range_query_duration_second metrics and Thanos Receive no longer exposes thanos_http_request_duration_seconds, thanos_http_requests_total, thanos_http_response_size_bytes.
Published by domgreen about 5 years ago
TL;DR: Moved to CNCF, added streaming between Prometheus and Sidecar, allowed time sharding on Store Gateway, and many bug fixes.
More detailed information on the release can be found here https://github.com/thanos-io/thanos/blob/master/CHANGELOG.md
Published by bwplotka about 5 years ago
Published by GiedriusS over 5 years ago
TL;DR: Jaeger tracing support (tracing flag changed), various observability improvements, Thanos receiver improvements, improved external label propagation, including federated Queriers (!), and other fixes.
NOTE: Thanks to improved external labels propagation, if you have duplicate Queriers in your Querier configuration with hierarchical federation of multiple Queriers, Thanos will now detect this case and block all duplicates. New releases (potentially in v0.6.1) will just warn and block all but one.
#1097 Added thanos check rules linter for Thanos rule rules files.
#1253 Add support for specifying a maximum amount of retries when using Azure Blob storage (default: no retries).
#1244 Thanos Compact now exposes new metrics thanos_compact_downsample_total and thanos_compact_downsample_failures_total which are useful to catch when errors happen
#1260 Thanos Query/Rule now exposes metrics thanos_querier_store_apis_dns_provider_results and thanos_ruler_query_apis_dns_provider_results which tell how many addresses were configured and how many were actually discovered respectively
#1248 Add a web UI to show the state of remote storage.
#1217 Thanos Receive gained basic hashring support
#1262 Thanos Receive got a new metric thanos_http_requests_total which shows how many requests were handled by it
#1243 Thanos Receive got an ability to forward time series data between nodes. Now you can pass the hashring configuration via --receive.hashrings-file; the refresh interval --receive.hashrings-file-refresh-interval; the local node's name --receive.local-endpoint; and finally the header's name which is used to determine the tenant --receive.tenant-header.
#1147 Support for the Jaeger tracer has been added!
breaking New common flags were added for configuring tracing: --tracing.config-file and --tracing.config. You can either pass a file to Thanos with the tracing configuration or pass it in the command line itself. Old --gcloudtrace.* flags were removed ⚠️
To migrate over the old --gcloudtrace.* configuration, your tracing configuration should look like this:
---
type: STACKDRIVER
config:
- service_name: 'foo'
  project_id: '123'
  sample_factor: 123
The other type you can use is JAEGER now. The config keys and values are Jaeger specific and you can find all of the information here.
#1284 Add support for multiple label-sets in Info gRPC service. This deprecates the single Labels slice of the InfoResponse; in a future release, backward compatible handling for the single set of Labels will be removed. Upgrading to v0.6.0 or higher is advised.
#1314 Removes http_request_duration_microseconds (Summary) and adds http_request_duration_seconds (Histogram) from http server instrumentation used in Thanos APIs and UIs.
#1287 Sidecar now waits on Prometheus' external labels before starting the uploading process
#1261 Thanos Receive now exposes metrics thanos_http_request_duration_seconds and thanos_http_response_size_bytes properly of each handler
#1274 Iteration limit has been lifted from the LRU cache so there should be no more spam of error messages as they were harmless
#1321 Thanos Query now fails early on a query which only uses external labels - this improves clarity in certain situations
#1227 Some context handling issues were fixed in Thanos Compact; some unnecessary memory allocations were removed in the hot path of Thanos Store.
#1183 Compactor now correctly propagates retriable/haltable errors which means that it will not unnecessarily restart if such an error occurs
#1231 Receive now correctly handles SIGINT and closes without deadlocking
#1278 Fixed inflated values problem with sum() on Thanos Query
#1280 Fixed a problem with concurrent writes to a map in Thanos Query while rendering the UI
#1311 Fixed occasional panics in Compact and Store when using Azure Blob cloud storage caused by lack of error checking in client library.
#1322 Removed duplicated closing of the gRPC listener - this gets rid of harmless messages like store gRPC listener: close tcp 0.0.0.0:10901: use of closed network connection when those programs are being closed
Published by GiedriusS over 5 years ago
Published by bwplotka over 5 years ago
TL;DR: Store LRU cache is no longer leaking, Upgraded Thanos UI to Prometheus 2.9, Fixed auto-downsampling, Moved to Go 1.12.5 and more.
This version moved tarballs to Golang 1.12.5 from 1.11 as well, so same warning applies if you use container_memory_usage_bytes from cadvisor. Use container_memory_working_set_bytes instead.
breaking As announced a couple of times, this release also removes gossip with all configuration flags (--cluster.*).
#1118 breaking swift: Added support for cross-domain authentication by introducing userDomainID, userDomainName, projectDomainID, projectDomainName.
The outdated terms tenantID, tenantName are deprecated and have been replaced by projectID, projectName.
#1066 Upgrade Thanos ui to Prometheus v2.9.1.
Changes from the upstream:
#1156 Moved CI and docker multistage to Golang 1.12.5 for latest mem alloc improvements.
#1103 Updated go-cos deps. (COS bucket client).
#1149 Updated google Golang API deps (GCS bucket client).
#1190 Updated minio deps (S3 bucket client). This fixes minio retries.
#1133 Use prometheus v2.9.2, common v0.4.0 & tsdb v0.8.0.
Changes from the upstreams:
--cluster.* flags removed and Thanos will error out if any is provided.
See full CHANGELOG here