thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.

APACHE-2.0 License

thanos - v0.18.0-rc.0

Published by squat almost 4 years ago

Highlights

  • Several big optimizations speeding up query performance were introduced!
  • A new command was added, thanos tools bucket rewrite, enabling the deletion of series from a given block.
  • The Query Frontend now supports proxying requests to the labels and series API endpoints.
  • thanos tools bucket replicate can now copy particular blocks by ID.
  • The number of series touched by the Store component when servicing a Series call can now be limited using a CLI flag.

Added

  • #3380 Mixin: Add block deletion panels for compactor dashboards.
  • #3568 Store: Optimized inject label stage of index lookup.
  • #3566 StoreAPI: Support label matchers in labels API.
  • #3531 Store: Optimized common cases for time selecting smaller amount of series by avoiding looking up symbols.
  • #3469 StoreAPI: Added hints field to LabelNamesRequest and LabelValuesRequest. Hints are an opaque data structure that can be used to carry additional information from the store and its content is implementation-specific.
  • #3421 Tools: Added thanos tools bucket rewrite command, which allows deleting series from a given block.
  • #3509 Store: Added a CLI flag to limit the number of series that are touched.
  • #3444 Query Frontend: Make POST request to downstream URL for labels and series API endpoints.
  • #3388 Tools: Bucket replicator now can specify block IDs to copy.
  • #3385 Tools: Bucket prints extra statistics for block index with debug log-level.
  • #3121 Receive: Added --receive.hashrings alternative to receive.hashrings-file flag (lower priority). The flag expects the literal hashring configuration in JSON format.
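
As a rough illustration of what a literal JSON hashring value for --receive.hashrings might look like, the sketch below decodes one in Go. The field names (hashring, tenants, endpoints), the endpoint addresses, and the ParseHashrings helper are assumptions made for this example, not taken from the release notes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HashringConfig models one hashring entry. The JSON field names are
// assumptions for this sketch; check the Receive documentation for the
// authoritative schema.
type HashringConfig struct {
	Hashring  string   `json:"hashring,omitempty"`
	Tenants   []string `json:"tenants,omitempty"`
	Endpoints []string `json:"endpoints"`
}

// ParseHashrings (hypothetical helper) decodes a literal JSON hashring
// configuration and rejects entries without endpoints.
func ParseHashrings(raw string) ([]HashringConfig, error) {
	var cfg []HashringConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return nil, fmt.Errorf("parsing hashrings: %w", err)
	}
	for i, h := range cfg {
		if len(h.Endpoints) == 0 {
			return nil, fmt.Errorf("hashring %d has no endpoints", i)
		}
	}
	return cfg, nil
}

func main() {
	// Example value as it could be passed on the command line.
	raw := `[{"hashring":"default","endpoints":["receive-0:10901","receive-1:10901"]}]`
	cfg, err := ParseHashrings(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(cfg), cfg[0].Endpoints)
}
```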

Fixed

  • #3567 Mixin: Reintroduce thanos_objstore_bucket_operation_failures_total alert.
  • #3527 Query Frontend: Fix query_range behavior when start/end times are the same
  • #3560 Query Frontend: Allow separate label cache
  • #3672 Rule: Prevent crashing due to no such host error when using dnssrv+ or dnssrvnoa+.
  • #3461 Compact, Shipper, Store: Fixed panic when no external labels are set in block metadata.

Changed

  • #3496 S3: Respect SignatureV2 flag for all credential providers.
  • #2732 Swift: Switched to a new library ncw/swift providing large objects support.
    By default, segments will be uploaded to the same container directory segments/ if the file is bigger than 1GB.
    To change the defaults see the docs.
  • #3626 Shipper: Failed upload of meta.json file doesn't cause block cleanup anymore. This has a potential to generate corrupted blocks under specific conditions. Partial block is left in bucket for later cleanup.
thanos - v0.17.2

Published by metalmatze almost 4 years ago

Fixed

  • #3532 compact: do not cleanup blocks on boot. Reverts the behavior change introduced in #3115 as in some very bad cases the boot of Thanos Compact took a very long time since there were a lot of blocks-to-be-cleaned.
  • #3520 Fix index out of bound bug when comparing ZLabelSets.
thanos - v0.17.1

Published by metalmatze almost 4 years ago

Fixed

  • #3480 Query-frontend: Fixed regression.

Changed

  • #3498 Enabled debug.SetPanicOnFault(true), which allows us to recover from queries causing segmentation faults (e.g. unmapped memory access).
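
The general pattern behind this change can be sketched as follows: enable debug.SetPanicOnFault for the goroutine doing the work and convert the resulting panic into an error with recover. The runGuarded helper is hypothetical, and the example uses an explicit panic as a stand-in for a real memory fault, since triggering one portably is not practical in a short example.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// runGuarded (hypothetical helper) runs fn and turns a panic into an error.
func runGuarded(fn func()) (err error) {
	// Ask the runtime to panic, rather than crash the process, on an
	// unexpected fault in this goroutine. Restore the old setting on return.
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old)

	// Deferred last, so it runs first and captures the panic value.
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()

	fn()
	return nil
}

func main() {
	// Stand-in for a query that touches unmapped memory.
	err := runGuarded(func() { panic("simulated fault") })
	fmt.Println(err)
}
```

Note that debug.SetPanicOnFault applies only to the calling goroutine, so the call has to be made on the goroutine that may fault.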
thanos - v0.17.0

Published by metalmatze almost 4 years ago

Highlights

  • BlockViewer now allows downloading meta.json of a block directly from Compact/Web UI.
  • thanos query-frontend now supports query splitting and retries for label names and label values APIs.
  • thanos tools bucket replicate can now copy particular blocks by time range.
  • thanos query is now using dynamic lookback delta when downsampled data is used by default.
  • More katacoda tutorials!
  • Compactor is aware of 64GB index size limits and automatically avoids compacting blocks to a bigger size than this. Admin can
    also manually exclude block from further compaction by uploading no-compact-mark.json file to bucket block directory.
  • Added thanos tools bucket mark for marking blocks with no compact or delayed deletion marker.
  • When enabled ( --store.enable-index-header-lazy-reader flag), Store Gateway will mmap only those blocks caches that are needed for the queries. This option will be set to default in next release.
  • ⚠️ Breaking change ⚠️ Dozens of metrics were renamed for consistency. (Particularly thanos_querier_... to thanos_query_... and thanos_compactor_... to thanos_compact_...)

Added

  • #3259 Thanos BlockViewer: Added a button in the blockviewer that allows users to download the metadata of a block.
  • #3261 Thanos Store: Use segment files specified in meta.json file, if present. If not present, Store does the LIST operation as before.
  • #3276 Query Frontend: Support query splitting and retry for label names, label values and series requests.
  • #3315 Query Frontend: Support results caching for label names, label values and series requests.
  • #3346 Ruler UI: Fix a bug preventing the /rules endpoint from loading.
  • #3115 compact: now deletes partially uploaded blocks and blocks with deletion marks concurrently. It does that at the beginning and then every --compact.cleanup-interval time period. By default it is 5 minutes.
  • #3312 s3: add list_objects_version config option for compatibility.
  • #3356 Query Frontend: Add a flag to disable step alignment middleware for query range.
  • #3378 Ruler: added the ability to send queries via the HTTP method POST. Helps when alerting/recording rules are extra long because it encodes the actual parameters inside of the body instead of the URI. Thanos Ruler now uses POST by default unless --query.http-method is set to GET.
  • #3381 Querier UI: Add ability to enable or disable metric autocomplete functionality.
  • #2979 Replicator: Add the ability to replicate blocks within a time frame by passing --min-time and --max-time
  • #3398 Query Frontend: Add default config for query frontend memcached config.
  • #3277 Thanos Query: Introduce dynamic lookback interval. This allows queries with large step to make use of downsampled data.
  • #3409 Compactor: Added support for no-compact-mark.json which excludes the block from compaction.
  • #3245 Query Frontend: Add query-frontend.org-id-header flag to specify HTTP header(s) to populate slow query log (e.g. X-Grafana-User).
  • #3431 Store: Added experimental support to lazy load index-headers at query time. When enabled via --store.enable-index-header-lazy-reader flag, the store-gateway will load into memory an index-header only once it's required at query time. Index-header will be automatically released after --store.index-header-lazy-reader-idle-timeout of inactivity.
    • This generally reduces the baseline memory usage of the store when inactive, as well as the total number of mapped files (which is limited to 64k in some systems).
  • #3415 Tools: Added thanos tools bucket mark command that marks a given block for deletion or for no-compact.
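
To make the GET versus POST difference from #3378 concrete, here is a minimal sketch that builds an instant query request both ways. It assumes a Prometheus-compatible /api/v1/query endpoint; the newInstantQueryRequest helper and the querier address are hypothetical and do not reflect how Thanos Ruler is implemented internally.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newInstantQueryRequest (hypothetical helper) builds an instant query.
// With GET the parameters travel in the URI; with POST they are
// form-encoded into the request body, so long expressions do not
// inflate the URL.
func newInstantQueryRequest(method, baseURL, query string) (*http.Request, error) {
	params := url.Values{"query": {query}}
	endpoint := strings.TrimRight(baseURL, "/") + "/api/v1/query"

	if method == http.MethodGet {
		return http.NewRequest(http.MethodGet, endpoint+"?"+params.Encode(), nil)
	}

	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(params.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	expr := "sum(rate(http_requests_total[5m])) by (job)"
	post, _ := newInstantQueryRequest(http.MethodPost, "http://querier:9090", expr)
	get, _ := newInstantQueryRequest(http.MethodGet, "http://querier:9090", expr)
	// The POST URL stays short regardless of the expression length.
	fmt.Println(len(post.URL.String()), len(get.URL.String()))
}
```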

Fixed

  • #3257 Ruler: Prevent Ruler from crashing when using default DNS to lookup hosts that results in "No such hosts" errors.
  • #3331 Disable Azure blob exception logging
  • #3341 Disable Azure blob syslog exception logging
  • #3414 Set CORS for Query Frontend

Changed

  • #3452 Store: Index cache posting compression is now enabled by default. Removed experimental.enable-index-cache-postings-compression flag.
  • #3410 Compactor: Changed metric thanos_compactor_blocks_marked_for_deletion_total to thanos_compactor_blocks_marked_total with marker label.
    Compactor will now automatically disable compaction for blocks with a large index that would output blocks after compaction larger than the specified value (by default: 64GB). This automatically
    handles the Prometheus format limit.
  • #2906 Tools: Refactor Bucket replicate execution. Removed all thanos_replicate_origin_.* metrics.
    • thanos_replicate_origin_meta_loads_total can be replaced by blocks_meta_synced{state="loaded"}.
    • thanos_replicate_origin_partial_meta_reads_total can be replaced by blocks_meta_synced{state="failed"}.
  • #3309 Compact: breaking ⚠️ Rename metrics to match naming convention. This includes metrics starting with thanos_compactor to thanos_compact, thanos_querier to thanos_query and thanos_ruler to thanos_rule.
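
When updating dashboards or alerts for the #3309 rename, a small prefix-rewriting helper may be handy. The sketch below covers only the three prefix changes named above; the migrateMetricName helper and the example metric name are hypothetical, and individual metrics may have changed in other ways as well.

```go
package main

import (
	"fmt"
	"strings"
)

// renamedPrefixes lists the prefix changes named in the release notes.
var renamedPrefixes = []struct{ from, to string }{
	{"thanos_compactor_", "thanos_compact_"},
	{"thanos_querier_", "thanos_query_"},
	{"thanos_ruler_", "thanos_rule_"},
}

// migrateMetricName (hypothetical helper) rewrites an old metric name to
// the new prefix. Names without a renamed prefix are returned unchanged.
func migrateMetricName(name string) string {
	for _, p := range renamedPrefixes {
		if strings.HasPrefix(name, p.from) {
			return p.to + strings.TrimPrefix(name, p.from)
		}
	}
	return name
}

func main() {
	// "thanos_compactor_example_total" is a made-up metric name.
	fmt.Println(migrateMetricName("thanos_compactor_example_total"))
}
```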
thanos - v0.17.0-rc.0

Published by metalmatze almost 4 years ago

Highlights

  • BlockViewer now allows downloading meta.json of a block directly from Compact/Web UI.
  • thanos query-frontend now supports query splitting and retries for label names and label values APIs.
  • thanos tools bucket replicate can now copy particular blocks by ID or time range.
  • thanos query is now using dynamic lookback delta when downsampled data is used by default.
  • More katacoda tutorials!
  • Compactor is aware of 64GB index size limits and automatically avoids compacting blocks to a bigger size than this. Admin can
    also manually exclude block from further compaction by uploading no-compact-mark.json file to bucket block directory.
  • Added thanos tools bucket mark for marking blocks with no compact or delayed deletion marker.
  • When enabled ( --store.enable-index-header-lazy-reader flag), Store Gateway will mmap only those blocks caches that are needed for the queries. This option will be set to default in next release.
  • ⚠️ Breaking change ⚠️ Dozens of metrics were renamed for consistency. (Particularly thanos_querier_... to thanos_query_... and thanos_compactor_... to thanos_compact_...)

Added

  • #3259 Thanos BlockViewer: Added a button in the blockviewer that allows users to download the metadata of a block.
  • #3261 Thanos Store: Use segment files specified in meta.json file, if present. If not present, Store does the LIST operation as before.
  • #3276 Query Frontend: Support query splitting and retry for label names, label values and series requests.
  • #3315 Query Frontend: Support results caching for label names, label values and series requests.
  • #3346 Ruler UI: Fix a bug preventing the /rules endpoint from loading.
  • #3115 compact: now deletes partially uploaded blocks and blocks with deletion marks concurrently. It does that at the beginning and then every --compact.cleanup-interval time period. By default it is 5 minutes.
  • #3312 s3: add list_objects_version config option for compatibility.
  • #3356 Query Frontend: Add a flag to disable step alignment middleware for query range.
  • #3378 Ruler: added the ability to send queries via the HTTP method POST. Helps when alerting/recording rules are extra long because it encodes the actual parameters inside of the body instead of the URI. Thanos Ruler now uses POST by default unless --query.http-method is set to GET.
  • #3381 Querier UI: Add ability to enable or disable metric autocomplete functionality.
  • #2979 Replicator: Add the ability to replicate blocks within a time frame by passing --min-time and --max-time
  • #3398 Query Frontend: Add default config for query frontend memcached config.
  • #3277 Thanos Query: Introduce dynamic lookback interval. This allows queries with large step to make use of downsampled data.
  • #3409 Compactor: Added support for no-compact-mark.json which excludes the block from compaction.
  • #3245 Query Frontend: Add query-frontend.org-id-header flag to specify HTTP header(s) to populate slow query log (e.g. X-Grafana-User).
  • #3431 Store: Added experimental support to lazy load index-headers at query time. When enabled via --store.enable-index-header-lazy-reader flag, the store-gateway will load into memory an index-header only once it's required at query time. Index-header will be automatically released after --store.index-header-lazy-reader-idle-timeout of inactivity.
    • This generally reduces the baseline memory usage of the store when inactive, as well as the total number of mapped files (which is limited to 64k in some systems).
  • #3415 Tools: Added thanos tools bucket mark command that marks a given block for deletion or for no-compact.

Fixed

  • #3257 Ruler: Prevent Ruler from crashing when using default DNS to lookup hosts that results in "No such hosts" errors.
  • #3331 Disable Azure blob exception logging
  • #3341 Disable Azure blob syslog exception logging
  • #3414 Set CORS for Query Frontend

Changed

  • #3410 Compactor: Changed metric thanos_compactor_blocks_marked_for_deletion_total to thanos_compactor_blocks_marked_total with marker label.
    Compactor will now automatically disable compaction for blocks with a large index that would output blocks after compaction larger than the specified value (by default: 64GB). This automatically
    handles the Prometheus format limit.
  • #2906 Tools: Refactor Bucket replicate execution. Removed all thanos_replicate_origin_.* metrics.
    • thanos_replicate_origin_meta_loads_total can be replaced by blocks_meta_synced{state="loaded"}.
    • thanos_replicate_origin_partial_meta_reads_total can be replaced by blocks_meta_synced{state="failed"}.
  • #3309 Compact: breaking ⚠️ Rename metrics to match naming convention. This includes metrics starting with thanos_compactor to thanos_compact, thanos_querier to thanos_query and thanos_ruler to thanos_rule.
thanos - v0.16.0

Published by bwplotka almost 4 years ago

Highlights:

  • New Thanos component, Query Frontend has more options and supports shared cache (currently: Memcached).
  • Added a debug mode in the Thanos UI that allows filtering the Stores to query by their IPs from the Store page (!). This helps enormously with, e.g., debugging the slowest store. All raw Thanos APIs allow passing
    storeMatch[] arguments with __address__ matchers.
  • Improved debuggability on all Thanos components by exposing off-CPU profiles thanks to fgprof endpoint.
  • Significantly improved sidecar latency and CPU usage for metrics fetches.

Fixed

  • #3234 UI: Fix assets not loading when --web.prefix-header is used.
  • #3184 Compactor: Fixed support for web.external-prefix for Compactor UI.

Added

  • #3114 Query Frontend: Added support for Memcached cache.
    • breaking Renamed flag log_queries_longer_than to log-queries-longer-than.
  • #3166 UIs: Added UI for passing a storeMatch[] parameter to queries.
  • #3181 Logging: Added debug level logging for responses between 300-399
  • #3133 Query: Allowed passing a storeMatch[] to Labels APIs; Time range metadata based store filtering is supported on Labels APIs.
  • #3146 Sidecar: Significantly improved sidecar latency (reduced ~2x). Added thanos_sidecar_prometheus_store_received_frames histogram metric.
  • #3147 Querier: Added query.metadata.default-time-range flag to specify the default metadata time range duration for retrieving labels through Labels and Series API when the range parameters are not specified. The zero value means range covers the time since the beginning.
  • #3207 Query Frontend: Added cache-compression-type flag to use compression in the query frontend cache.
  • #3122 *: All Thanos components have now /debug/fgprof endpoint on HTTP port allowing to get off-CPU profiles as well.
  • #3109 Query Frontend: Added support for Cache-Control HTTP response header which controls caching behaviour. So far no-store value is supported and it makes the response skip cache.
  • #3092 Tools: Added tools bucket cleanup CLI tool that deletes all blocks marked to be deleted.
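
The Cache-Control handling from #3109 can be pictured with a small check like the one below: a response carrying the no-store directive is not written to the results cache. This is a sketch only; the shouldCache helper is hypothetical and is not the Query Frontend's actual code.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// shouldCache (hypothetical helper) reports whether a response may be
// stored, given the values of its Cache-Control headers. Only the
// no-store directive is considered, matching the release note.
func shouldCache(cacheControl []string) bool {
	for _, v := range cacheControl {
		for _, directive := range strings.Split(v, ",") {
			if strings.EqualFold(strings.TrimSpace(directive), "no-store") {
				return false
			}
		}
	}
	return true
}

func main() {
	h := http.Header{}
	h.Add("Cache-Control", "no-store")
	fmt.Println(shouldCache(h.Values("Cache-Control")))
}
```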

Changed

  • #3154 Querier: breaking Added metric thanos_query_gate_queries_max. Remove metric thanos_query_concurrent_selects_gate_queries_in_flight.
  • #3154 Store: breaking Renamed metric thanos_bucket_store_queries_concurrent_max to thanos_bucket_store_series_gate_queries_max.
  • #3179 Store: context.Canceled will not increase thanos_objstore_bucket_operation_failures_total.
  • #3136 Sidecar: Improved detection of directory changes for Prometheus config.
    • breaking Added metric thanos_sidecar_reloader_config_apply_operations_total and rename metric thanos_sidecar_reloader_config_apply_errors_total to thanos_sidecar_reloader_config_apply_operations_failed_total.
  • #3022 *: Thanos images are now built with Go 1.15.
  • #3205 *: Updated TSDB to ~2.21
thanos - v0.16.0-rc.1

Published by bwplotka about 4 years ago

thanos - v0.16.0-rc.0

Published by bwplotka about 4 years ago

thanos - v0.15.0

Published by kakkoyun about 4 years ago

Highlights:

  • Added new Thanos component: Query Frontend responsible for response caching,
    query scheduling and parallelization (based on Cortex Query Frontend).
  • Added various new, improved UIs to Thanos based on React: Querier's BuildInfo & Flags, Ruler UI, BlockViewer.
  • Optimized Sidecar, Store, Receive, Ruler data retrieval with new TSDB ChunkIterator, capping chunks to 120 samples, fixed various leaks.
  • Fixed sample limit on Store Gateway.
  • Added S3 Server Side Encryption options.
  • Tons of other important fixes!

Thanks to all contributors! 🤗

Fixed

  • #2665 Swift: Fix issue with missing Content-Type HTTP headers.
  • #2800 Query: Fix handling of --web.external-prefix and --web.route-prefix.
  • #2834 Query: Fix rendered JSON state value for rules and alerts should be in lowercase.
  • #2866 Receive, Querier: Fixed leaks on receive and querier Store API Series, which were leaking on errors.
  • #2937 Receive: Fixing auto-configuration of --receive.local-endpoint.
  • #2895 Compact: Fix increment of thanos_compact_downsample_total metric for downsample of 5m resolution blocks.
  • #2858 Store: Fix --store.grpc.series-sample-limit implementation. The limit is now applied to the sum of all samples fetched across all queried blocks via a single Series call, instead of applying it individually to each block.
  • #2936 Compact: Fix ReplicaLabelRemover panic when replicaLabels are not specified.
  • #2956 Store: Fix fetching of chunks bigger than 16000 bytes.
  • #2970 Store: Upgrade minio-go/v7 to fix slowness when running on EKS.
  • #2957 Rule: breaking ⚠️ Now sets all of the relevant fields properly; avoids a panic when /api/v1/rules is called and the time zone is not UTC; rules field is an empty array now if no rules have been defined in a rule group.
    Thanos Rule's /api/v1/rules endpoint no longer returns the old, deprecated partial_response_strategy. The old, deprecated value has been fixed to WARN for quite some time. Please use partialResponseStrategy.
  • #2976 Query: Better rounding for incoming query timestamps.
  • #2929 Mixin: Fix expression for 'unhealthy sidecar' alert and increase the timeout to 10 minutes.
  • #3024 Query: Consider group name and file for deduplication.
  • #3012 Ruler,Receiver: Fix TSDB to delete blocks in atomic way.
  • #3046 Ruler,Receiver: Fixed framing of StoreAPI response, it was one chunk by one.
  • #3095 Ruler: Update the manager when all rule files are removed.
  • #3105 Querier: Fix overwriting maxSourceResolution when auto downsampling is enabled.
  • #3010 Querier: Added --query.lookback-delta flag to override the default lookback delta in PromQL. The lookback delta should be set to at least 2 times the slowest scrape interval. If unset, the PromQL default of 5m is used.
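
The change to --store.grpc.series-sample-limit in #2858 above is easiest to see side by side. The sketch below contrasts a limit applied to each block individually with one applied to the sum across all blocks in a single Series call. All names are hypothetical, and treating a limit of 0 as "no limit" is an assumption of this example.

```go
package main

import (
	"errors"
	"fmt"
)

var errLimitExceeded = errors.New("series sample limit exceeded")

// sampleLimiter counts samples reserved so far against a limit.
// A limit of 0 is treated as "no limit" (an assumption of this sketch).
type sampleLimiter struct {
	limit   uint64
	fetched uint64
}

func (l *sampleLimiter) reserve(n uint64) error {
	l.fetched += n
	if l.limit > 0 && l.fetched > l.limit {
		return errLimitExceeded
	}
	return nil
}

// checkPerBlock models the old behaviour: a fresh limiter per block.
func checkPerBlock(limit uint64, samplesPerBlock []uint64) error {
	for _, n := range samplesPerBlock {
		l := &sampleLimiter{limit: limit}
		if err := l.reserve(n); err != nil {
			return err
		}
	}
	return nil
}

// checkPerCall models the fixed behaviour: one limiter shared by all
// blocks touched by a single Series call.
func checkPerCall(limit uint64, samplesPerBlock []uint64) error {
	l := &sampleLimiter{limit: limit}
	for _, n := range samplesPerBlock {
		if err := l.reserve(n); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	blocks := []uint64{400, 400, 400} // samples fetched from three blocks
	fmt.Println(checkPerBlock(1000, blocks)) // each block is under the limit
	fmt.Println(checkPerCall(1000, blocks))  // the sum of 1200 is over the limit
}
```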

Added

  • #2305 Receive,Sidecar,Ruler: Propagate correct (stricter) MinTime for TSDBs that have no block.
  • #2849 Query, Ruler: Added request logging for HTTP server side.
  • #2832 ui React: Add runtime and build info page
  • #2926 API: Add new blocks HTTP API to serve blocks metadata. The status endpoints (/api/v1/status/flags, /api/v1/status/runtimeinfo and /api/v1/status/buildinfo) are now available on all components with a HTTP API.
  • #2892 Receive: Receiver fails when the initial upload fails.
  • #2865 ui: Migrate Thanos Ruler UI to React
  • #2964 Query: Add time range parameters to label APIs. Add start and end fields to Store API LabelNamesRequest and LabelValuesRequest.
  • #2996 Sidecar: Add reloader_config_apply_errors_total metric. Add new flags --reloader.watch-interval, and --reloader.retry-interval.
  • #2973 Add Thanos Query Frontend component.
  • #2980 Bucket Viewer: Migrate block viewer to React.
  • #2725 Add bucket index operation durations: thanos_bucket_store_cached_series_fetch_duration_seconds and thanos_bucket_store_cached_postings_fetch_duration_seconds.
  • #2931 Query: Allow passing a storeMatch[] to select matching stores when debugging the querier. See documentation
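
As an illustration of #2931, a query URL restricted to specific stores could be assembled as below. The sketch assumes the Prometheus-style /api/v1/query path and uses the __address__ matcher form; the buildQueryURL helper, the querier host, and the store address are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildQueryURL (hypothetical helper) returns an instant-query URL that
// carries one storeMatch[] parameter per store address.
func buildQueryURL(base, query string, storeAddrs ...string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/query"

	params := url.Values{}
	params.Set("query", query)
	for _, addr := range storeAddrs {
		// Each matcher selects a store by its address.
		params.Add("storeMatch[]", fmt.Sprintf(`{__address__=%q}`, addr))
	}
	u.RawQuery = params.Encode()
	return u.String(), nil
}

func main() {
	s, err := buildQueryURL("http://querier:9090", "up", "store-0:10901")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```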

Changed

  • #2893 Store: Rename metric thanos_bucket_store_cached_postings_compression_time_seconds to thanos_bucket_store_cached_postings_compression_time_seconds_total.
  • #2915 Receive,Ruler: Enable TSDB directory locking by default. Add a new flag (--tsdb.no-lockfile) to override behavior.
  • #2902 Querier UI: Separate dedupe and partial response checkboxes per panel in new UI.
  • #2991 Store: breaking ⚠️ operation label value getrange changed to get_range for thanos_store_bucket_cache_operation_requests_total and thanos_store_bucket_cache_operation_hits_total to be consistent with bucket operation metrics.
  • #2876 Receive,Ruler: Updated TSDB and switched to ChunkIterators instead of sample one, which avoids unnecessary decoding / encoding.
  • #3064 s3: breaking ⚠️ Add SSE/SSE-KMS/SSE-C configuration. The S3 encrypt_sse: true option is now deprecated in favour of sse_config. If you used encrypt_sse, the migration strategy is to set up the following block:
    sse_config:
      type: SSE-S3
thanos - v0.15.0-rc.1

Published by kakkoyun about 4 years ago

thanos - v0.15.0-rc.0

Published by bwplotka about 4 years ago

thanos - v0.14.0

Published by kakkoyun over 4 years ago

Highlights:

  • Upgrade to Prometheus @3268eac2ddda which is after v2.18.1.
    • TSDB now does memory-mapping of Head chunks and reduces memory usage.
  • Querier performs concurrent select per query.
  • Changed bucket tool bucket verify --id-whitelist flag to --id.
  • Store removed support to the legacy index.cache.json. The hidden flag --store.disable-index-header was removed.
  • Store decreased memory allocations while querying block's index.

Thanks all for contributing! 🤗

Fixed

  • #2637 Compact: Detect retryable errors that are inside of a wrapped tsdb.MultiError.
  • #2648 Store: Allow index cache and caching bucket to be configured at the same time.
  • #2705 minio-go: Added support for af-south-1 and eu-south-1 regions.
  • #2728 Query: Fixed panics when using a larger number of replica labels with short series label sets.
  • #2787 Update Prometheus mod to pull in prometheus/prometheus#7414.
  • #2807 Store: Decreased memory allocations while querying block's index.
  • #2809 Query: /api/v1/stores now guarantees to return a string in the lastError field.

Changed

  • #2658 #2703 Upgrade to Prometheus @3268eac2ddda which is after v2.18.1.
    • TSDB now does memory-mapping of Head chunks and reduces memory usage.
  • #2667 Store: Removed support to the legacy index.cache.json. The hidden flag --store.disable-index-header was removed.
  • #2613 Store: Renamed the caching bucket config option chunk_object_size_ttl to chunk_object_attrs_ttl.
  • #2667 Compact: The deprecated flag --index.generate-missing-cache-file and the metric thanos_compact_generated_index_total were removed.
  • #2603 Store/Querier: Significantly optimize cases where StoreAPIs or blocks return exact overlapping chunks (e.g. Store GW and sidecar or brute force Store Gateway HA).
  • #2671 breaking Tools: Bucket replicate flag --resolution is now in Go duration format.
  • #2671 Tools: Bucket replicate now replicates by default all blocks.
  • #2739 Changed bucket tool bucket verify --id-whitelist flag to --id.
  • #2748 Upgrade Prometheus to @66dfb951c4ca which is after v2.19.0.
    • PromQL now allows us to execute concurrent selects.

Added

  • #2671 Tools: Bucket replicate now allows passing repeated --compaction and --resolution flags.
  • #2657 Querier: Add the ability to perform concurrent select request per query.
  • #2754 UI: Add stores page in the React UI.
  • #2752 Compact: Add flag --block-viewer.global.sync-block-interval to configure metadata sync interval for the bucket UI.
thanos - v0.14.0-rc.1

Published by kakkoyun over 4 years ago

Fixed

  • #2848 Querier: Fixed a panic that occurred when /api/v1/stores was requested.
    • The issue was introduced in #2809 within the current release cycle.

Thanks all for contributing! 🤗

thanos - v0.14.0-rc.0

Published by kakkoyun over 4 years ago

Highlights:

  • Upgrade to Prometheus @3268eac2ddda which is after v2.18.1.
    • TSDB now does memory-mapping of Head chunks and reduces memory usage.
  • Querier performs concurrent select per query.
  • Changed bucket tool bucket verify --id-whitelist flag to --id.
  • Store removed support to the legacy index.cache.json. The hidden flag --store.disable-index-header was removed.
  • Store decreased memory allocations while querying block's index.

Thanks all for contributing! 🤗

Fixed

  • #2637 Compact: Detect retryable errors that are inside of a wrapped tsdb.MultiError.
  • #2648 Store: Allow index cache and caching bucket to be configured at the same time.
  • #2705 minio-go: Added support for af-south-1 and eu-south-1 regions.
  • #2728 Query: Fixed panics when using a larger number of replica labels with short series label sets.
  • #2787 Update Prometheus mod to pull in prometheus/prometheus#7414.
  • #2807 Store: Decreased memory allocations while querying block's index.
  • #2809 Query: /api/v1/stores now guarantees to return a string in the lastError field.

Changed

  • #2658 #2703 Upgrade to Prometheus @3268eac2ddda which is after v2.18.1.
    • TSDB now does memory-mapping of Head chunks and reduces memory usage.
  • #2667 Store: Removed support to the legacy index.cache.json. The hidden flag --store.disable-index-header was removed.
  • #2613 Store: Renamed the caching bucket config option chunk_object_size_ttl to chunk_object_attrs_ttl.
  • #2667 Compact: The deprecated flag --index.generate-missing-cache-file and the metric thanos_compact_generated_index_total were removed.
  • #2603 Store/Querier: Significantly optimize cases where StoreAPIs or blocks return exact overlapping chunks (e.g. Store GW and sidecar or brute force Store Gateway HA).
  • #2671 breaking Tools: Bucket replicate flag --resolution is now in Go duration format.
  • #2671 Tools: Bucket replicate now replicates by default all blocks.
  • #2739 Changed bucket tool bucket verify --id-whitelist flag to --id.
  • #2748 Upgrade Prometheus to @66dfb951c4ca which is after v2.19.0.
    • PromQL now allows us to execute concurrent selects.

Added

  • #2671 Tools: Bucket replicate now allows passing repeated --compaction and --resolution flags.
  • #2657 Querier: Add the ability to perform concurrent select request per query.
  • #2754 UI: Add stores page in the React UI.
  • #2752 Compact: Add flag --block-viewer.global.sync-block-interval to configure metadata sync interval for the bucket UI.
thanos - v0.13.0

Published by bwplotka over 4 years ago

Highlights:

  • Rare overlapping block issue fixed
  • Deduplication bug with counters fixed
  • Receiver is now multi-tenant (multi TSDB)
  • Added @memcached meta and chunk caching. (e.g allows to significantly limit objstore traffic)
  • TSDB isolation added for Ruler and Receive
  • Querier performance significantly improved especially when used with duplicated Store APIs (e.g for HA purposes).

Thanks all for contributing! 🤗

Fixed

  • #2548 Query: Fixed rare cases of double counter reset accounting when querying rate with deduplication enabled.
  • #2536 S3: Fixed AWS STS endpoint url to https for Web Identity providers on AWS EKS.
  • #2501 Query: Gracefully handle additional fields in SeriesResponse protobuf message that may be added in the future.
  • #2568 Query: Don't close the connection of strict, static nodes if establishing a connection had succeeded but Info() call failed.
  • #2615 Rule: Fix bugs where rules were out of sync.
  • #2614 Tracing: Disabled Elastic APM Go Agent default tracer on initialization to disable the default metric gatherer.
  • #2525 Query: Fixed logging for dns resolution error in the Query component.
  • #2484 Query/Ruler: Fixed issue #2483, when web.route-prefix is set, it is added twice in HTTP router prefix.
  • #2416 Bucket: Fixed issue #2416, where inspect --sort-by did not work correctly in all cases.
  • #2719 Query: irate and resets now use counter downsampling aggregations.
  • #2705 minio-go: Added support for af-south-1 and eu-south-1 regions.
  • #2753 Sidecar, Receive, Rule: Fixed possibility of out of order uploads in error cases. This could potentially cause Compactor to create overlapping blocks.

Added

  • #2012 Receive: Added multi-tenancy support (based on header)
  • #2502 StoreAPI: Added hints field to SeriesResponse. Hints are an opaque data structure that can be used to carry additional information from the store and its content is implementation-specific.
  • #2521 Sidecar: Added thanos_sidecar_reloader_reloads_failed_total, thanos_sidecar_reloader_reloads_total, thanos_sidecar_reloader_watch_errors_total, thanos_sidecar_reloader_watch_events_total and thanos_sidecar_reloader_watches metrics.
  • #2412 UI: Added React UI from Prometheus upstream. Currently only accessible from Query component as only /graph endpoint is migrated.
  • #2532 Store: Added hidden option --store.caching-bucket.config=<yaml content> (or --store.caching-bucket.config-file=<file.yaml>) for experimental caching bucket, that can cache chunks into shared memcached. This can speed up querying and reduce number of requests to object storage.
  • #2579 Store: Experimental caching bucket can now cache metadata as well. Config has changed from #2532.
  • #2526 Compact: In case there are no labels left after deduplication via --deduplication.replica-label, assign first replica-label with value deduped.
  • #2621 Receive: Added flag to configure forward request timeout. Receive write will complete request as soon as a quorum of writes succeeds.
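
The fallback described in #2526 can be sketched as a small label-rewriting step: strip the replica labels from a block's external labels and, if nothing remains, keep the first replica label with the value deduped. The dropReplicaLabels helper and the label names are hypothetical; this is not the Compactor's actual code.

```go
package main

import "fmt"

// dropReplicaLabels (hypothetical helper) returns a copy of ext without
// the given replica labels. If no labels are left, the first replica
// label is kept with the value "deduped" so the label set is not empty.
func dropReplicaLabels(ext map[string]string, replicaLabels []string) map[string]string {
	out := make(map[string]string, len(ext))
	for k, v := range ext {
		out[k] = v
	}
	for _, r := range replicaLabels {
		delete(out, r)
	}
	if len(out) == 0 && len(replicaLabels) > 0 {
		out[replicaLabels[0]] = "deduped"
	}
	return out
}

func main() {
	// Only a replica label is set, so the fallback applies.
	fmt.Println(dropReplicaLabels(map[string]string{"replica": "a"}, []string{"replica"}))
	// Other labels remain, so the replica label is simply removed.
	fmt.Println(dropReplicaLabels(map[string]string{"cluster": "eu", "replica": "a"}, []string{"replica"}))
}
```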

Changed

  • #2194 Updated to golang v1.14.2.
  • #2505 Store: Removed obsolete thanos_store_node_info metric.
  • #2513 Tools: Moved thanos bucket commands to thanos tools bucket, also
    moved thanos check rules to thanos tools rules-check. thanos tools rules-check also takes rules by --rules repeated flag not argument
    anymore.
  • #2548 Store, Querier: remove duplicated chunks on StoreAPI.
  • #2596 Updated Prometheus dependency to @cd73b3d33e064bbd846fc7a26dc8c313d46af382 which falls in between v2.17.0 and v2.18.0.
    • Receive,Rule: TSDB now supports isolation of append and queries.
    • Receive,Rule: TSDB now holds less WAL files after Head Truncation.
  • #2450 Store: Added Regex-set optimization for label=~"a|b|c" matchers.
  • #2526 Compact: In case there are no labels left after deduplication via --deduplication.replica-label, assign first replica-label with value deduped.
  • #2603 Store/Querier: Significantly optimize cases where StoreAPIs or blocks return exact overlapping chunks (e.g. Store GW and sidecar or brute force Store Gateway HA).
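
The idea behind the regex-set optimization in #2450 can be sketched like this: when a label=~"a|b|c" pattern is a plain alternation of literals, it can be answered with set lookups instead of running a regular expression. The helpers below are hypothetical and deliberately conservative; they are not the Store's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// literalSet (hypothetical helper) returns the alternatives of pattern if
// it consists only of literals separated by '|'. Any other regex
// metacharacter makes it bail out so the caller can fall back to a regex.
func literalSet(pattern string) ([]string, bool) {
	if pattern == "" {
		return nil, false
	}
	if strings.ContainsAny(pattern, `.+*?()[]{}^$\`) {
		return nil, false
	}
	return strings.Split(pattern, "|"), true
}

// newSetMatcher returns a matcher backed by a map lookup.
func newSetMatcher(values []string) func(string) bool {
	set := make(map[string]struct{}, len(values))
	for _, v := range values {
		set[v] = struct{}{}
	}
	return func(s string) bool {
		_, ok := set[s]
		return ok
	}
}

func main() {
	if values, ok := literalSet("a|b|c"); ok {
		match := newSetMatcher(values)
		fmt.Println(match("b"), match("d"))
	}
}
```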
thanos - v0.13.0-rc.2

Published by bwplotka over 4 years ago

Fixed

  • #2548 Query: Fixed rare cases of double counter reset accounting when querying rate with deduplication enabled.
  • #2536 S3: Fixed AWS STS endpoint url to https for Web Identity providers on AWS EKS.
  • #2501 Query: Gracefully handle additional fields in SeriesResponse protobuf message that may be added in the future.
  • #2568 Query: Don't close the connection of strict, static nodes if establishing a connection had succeeded but Info() call failed.
  • #2615 Rule: Fix bugs where rules were out of sync.
  • #2614 Tracing: Disabled Elastic APM Go Agent default tracer on initialization to disable the default metric gatherer.
  • #2525 Query: Fixed logging for dns resolution error in the Query component.
  • #2484 Query/Ruler: Fixed issue #2483, where web.route-prefix, when set, was added twice to the HTTP router prefix.
  • #2416 Bucket: Fixed issue #2416, where inspect --sort-by did not work correctly in all cases.
  • #2719 Query: irate and resets now use counter downsampling aggregations.
  • #2705 minio-go: Added support for af-south-1 and eu-south-1 regions.
  • #2753 Sidecar, Receive, Rule: Fixed possibility of out of order uploads in error cases. This could potentially cause Compactor to create overlapping blocks.

Added

  • #2012 Receive: Added multi-tenancy support (based on header)
  • #2502 StoreAPI: Added hints field to SeriesResponse. Hints are an opaque data structure that can be used to carry additional information from the store and its content is implementation-specific.
  • #2521 Sidecar: Added thanos_sidecar_reloader_reloads_failed_total, thanos_sidecar_reloader_reloads_total, thanos_sidecar_reloader_watch_errors_total, thanos_sidecar_reloader_watch_events_total and thanos_sidecar_reloader_watches metrics.
  • #2412 UI: Added React UI from Prometheus upstream. Currently only accessible from Query component as only /graph endpoint is migrated.
  • #2532 Store: Added hidden option --store.caching-bucket.config=<yaml content> (or --store.caching-bucket.config-file=<file.yaml>) for experimental caching bucket, that can cache chunks into shared memcached. This can speed up querying and reduce number of requests to object storage.
  • #2579 Store: Experimental caching bucket can now cache metadata as well. Config has changed from #2532.
  • #2526 Compact: In case there are no labels left after deduplication via --deduplication.replica-label, assign first replica-label with value deduped.
  • #2621 Receive: Added flag to configure forward request timeout. Receive write will complete request as soon as a quorum of writes succeeds.

Changed

  • #2194 Updated to golang v1.14.2.
  • #2505 Store: Removed obsolete thanos_store_node_info metric.
  • #2513 Tools: Moved thanos bucket commands to thanos tools bucket and moved thanos check rules to thanos tools rules-check. thanos tools rules-check now takes rules via the repeated --rules flag instead of as arguments.
  • #2548 Store, Querier: remove duplicated chunks on StoreAPI.
  • #2596 Updated Prometheus dependency to @cd73b3d33e064bbd846fc7a26dc8c313d46af382 which falls in between v2.17.0 and v2.18.0.
    • Receive,Rule: TSDB now supports isolation of append and queries.
    • Receive,Rule: TSDB now holds fewer WAL files after Head Truncation.
  • #2450 Store: Added Regex-set optimization for label=~"a|b|c" matchers.
  • #2526 Compact: In case there are no labels left after deduplication via --deduplication.replica-label, assign first replica-label with value deduped.
  • #2603 Store/Querier: Significantly optimized cases where StoreAPIs or blocks return exact overlapping chunks (e.g. Store GW and sidecar, or brute force Store Gateway HA).
thanos - v0.13.0-rc.1

Published by bwplotka over 4 years ago

Fixed

  • #2548 Query: Fixed rare cases of double counter reset accounting when querying rate with deduplication enabled.
  • #2536 S3: Fixed AWS STS endpoint url to https for Web Identity providers on AWS EKS.
  • #2501 Query: Gracefully handle additional fields in SeriesResponse protobuf message that may be added in the future.
  • #2568 Query: Don't close the connection of strict, static nodes if establishing a connection had succeeded but Info() call failed.
  • #2615 Rule: Fix bugs where rules were out of sync.
  • #2614 Tracing: Disabled Elastic APM Go Agent default tracer on initialization to disable the default metric gatherer.
  • #2525 Query: Fixed logging for dns resolution error in the Query component.
  • #2484 Query/Ruler: Fixed issue #2483, where web.route-prefix, when set, was added twice to the HTTP router prefix.
  • #2416 Bucket: Fixed issue #2416, where inspect --sort-by did not work correctly in all cases.
  • #2719 Query: irate and resets now use counter downsampling aggregations.
  • #2705 minio-go: Added support for af-south-1 and eu-south-1 regions.

Added

  • #2012 Receive: Added multi-tenancy support (based on header)
  • #2502 StoreAPI: Added hints field to SeriesResponse. Hints are an opaque data structure that can be used to carry additional information from the store and its content is implementation-specific.
  • #2521 Sidecar: Added thanos_sidecar_reloader_reloads_failed_total, thanos_sidecar_reloader_reloads_total, thanos_sidecar_reloader_watch_errors_total, thanos_sidecar_reloader_watch_events_total and thanos_sidecar_reloader_watches metrics.
  • #2412 UI: Added React UI from Prometheus upstream. Currently only accessible from Query component as only /graph endpoint is migrated.
  • #2532 Store: Added hidden option --store.caching-bucket.config=<yaml content> (or --store.caching-bucket.config-file=<file.yaml>) for experimental caching bucket, that can cache chunks into shared memcached. This can speed up querying and reduce number of requests to object storage.
  • #2579 Store: Experimental caching bucket can now cache metadata as well. Config has changed from #2532.
  • #2526 Compact: In case there are no labels left after deduplication via --deduplication.replica-label, assign first replica-label with value deduped.
  • #2621 Receive: Added flag to configure forward request timeout. Receive write will complete request as soon as a quorum of writes succeeds.

Changed

  • #2194 Updated to golang v1.14.2.
  • #2505 Store: Removed obsolete thanos_store_node_info metric.
  • #2513 Tools: Moved thanos bucket commands to thanos tools bucket and moved thanos check rules to thanos tools rules-check. thanos tools rules-check now takes rules via the repeated --rules flag instead of as arguments.
  • #2548 Store, Querier: remove duplicated chunks on StoreAPI.
  • #2596 Updated Prometheus dependency to @cd73b3d33e064bbd846fc7a26dc8c313d46af382 which falls in between v2.17.0 and v2.18.0.
    • Receive,Rule: TSDB now supports isolation of append and queries.
    • Receive,Rule: TSDB now holds fewer WAL files after Head Truncation.
  • #2450 Store: Added Regex-set optimization for label=~"a|b|c" matchers.
  • #2526 Compact: In case there are no labels left after deduplication via --deduplication.replica-label, assign first replica-label with value deduped.
  • #2603 Store/Querier: Significantly optimized cases where StoreAPIs or blocks return exact overlapping chunks (e.g. Store GW and sidecar, or brute force Store Gateway HA).
thanos - v0.13.0-rc.0

Published by bwplotka over 4 years ago

Highlights:

  • Fixed a very old bug of rare double counter reset accounting when querying rate with deduplication enabled, thanks to @SuperQ, who shared the raw data that reproduced it!
  • Thanos Receive is now multitenant 🎉 and closer to claiming production stability 💪
  • Huge progress in moving Thanos UIs to React, thanks to our GSoC mentee @prmsrswt!
  • Experimental Memcached caching for metadata and chunks thanks to @pstibrany and Cortex project ❤️
  • Optimized querier for various cases of overlapping data (e.g. sidecar + store GW overlap) 📉
  • Thanos Rule and Receive should use fewer resources thanks to Prometheus improvements 🎉

Fixed

  • #2548 Query: Fixed rare cases of double counter reset accounting when querying rate with deduplication enabled.
  • #2536 S3: Fixed AWS STS endpoint URL to https for Web Identity providers on AWS EKS.
  • #2501 Query: Gracefully handle additional fields in SeriesResponse protobuf message that may be added in the future.
  • #2568 Query: Don't close the connection of strict, static nodes if establishing a connection had succeeded but Info() call failed.
  • #2615 Rule: Fix bugs where rules were out of sync.
  • #2614 Tracing: Disabled Elastic APM Go Agent default tracer on initialization to disable the default metric gatherer.
  • #2525 Query: Fixed logging for dns resolution error in the Query component.
  • #2484 Query/Ruler: Fixed issue #2483, where web.route-prefix, when set, was added twice to the HTTP router prefix.
  • #2416 Bucket: Fixed issue #2416, where inspect --sort-by did not work correctly in all cases.

Added

  • #2012 Receive: Added multi-tenancy support (based on the header)
  • #2502 StoreAPI: Added hints field to SeriesResponse. Hints are an opaque data structure that can be used to carry additional information from the store and its content is implementation-specific.
  • #2521 Sidecar: Added thanos_sidecar_reloader_reloads_failed_total, thanos_sidecar_reloader_reloads_total, thanos_sidecar_reloader_watch_errors_total, thanos_sidecar_reloader_watch_events_total and thanos_sidecar_reloader_watches metrics.
  • #2412 UI: Added React UI from Prometheus upstream. Currently only accessible from Query component as only /graph endpoint is migrated.
  • #2532 Store: Added hidden option --store.caching-bucket.config=<yaml content> (or --store.caching-bucket.config-file=<file.yaml>) for experimental caching bucket, that can cache chunks into shared memcached. This can speed up querying and reduce number of requests to object storage.
  • #2579 Store: Experimental caching bucket can now cache metadata as well. Config has changed from #2532.
  • #2621 Receive: Added flag to configure forward request timeout. Receive write will complete request as soon as a quorum of writes succeeds.

Changed

  • #2194 Updated to golang v1.14.2.
  • #2505 Store: Removed obsolete thanos_store_node_info metric.
  • #2513 Tools: Moved thanos bucket commands to thanos tools bucket and moved thanos check rules to thanos tools rules-check. thanos tools rules-check now takes rules via the repeated --rules flag instead of as arguments.
  • #2548 Store, Querier: remove duplicated chunks on StoreAPI.
  • #2596 Updated Prometheus dependency to @cd73b3d33e064bbd846fc7a26dc8c313d46af382 which falls in between v2.17.0 and v2.18.0.
    • Receive,Rule: TSDB now supports isolation of append and queries.
    • Receive,Rule: TSDB now holds fewer WAL files after Head Truncation.
  • #2450 Store: Added Regex-set optimization for label=~"a|b|c" matchers.
  • #2526 Compact: In case there are no labels left after deduplication via --deduplication.replica-label, assign first replica-label with value deduped.
thanos - v0.12.2

Published by squat over 4 years ago

Fixed

  • #2459 Compact: Fixed issue with old blocks being marked and deleted in a (slow) loop.
  • #2533 Rule: do not wrap reload endpoint with /. Makes /-/reload accessible again when no prefix has been specified.
thanos - v0.12.1

Published by squat over 4 years ago

Fixed

  • #2411 Query: fix a bug where queries might not time out sometimes due to issues with one or more StoreAPIs.
  • #2474 Store: fix a panic caused by concurrent memory access during block filtering.
  • #2472 Compact: fix a bug where partial blocks were never deleted, causing spam of warnings.
  • #2484 Query/Ruler: fix issue #2483, where web.route-prefix, when set, was added twice to the HTTP router prefix.