Compare commits

...

68 Commits

Author SHA1 Message Date
Giedrius Statkevičius 3727363b49
Merge pull request #8335 from pedro-stanaka/fix/flaky-unit-test-store-proxy
fix: make TestProxyStore_SeriesSlowStores less flaky by removing timing assertions
2025-06-26 12:20:19 +03:00
Giedrius Statkevičius 37254e5779
Merge pull request #8336 from thanos-io/lazyindexheader_fix
indexheader: fix race between lazy index header creation
2025-06-26 11:19:12 +03:00
Giedrius Statkevičius 4b31bbaa6b indexheader: create lazy header in singleflight
Creation of the index header shares the underlying storage so we should
use singleflight here to only create it once.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-26 10:18:07 +03:00
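A minimal sketch of the singleflight approach described above, not the actual Thanos code: the reader type and create callback are hypothetical stand-ins, and golang.org/x/sync/singleflight collapses concurrent creations that share the same underlying storage into a single call.

package main

import (
    "fmt"

    "golang.org/x/sync/singleflight"
)

// reader is a hypothetical stand-in for the lazy binary index-header reader.
type reader struct{ block string }

var group singleflight.Group

// getReader collapses concurrent creations for the same block ID into one
// call, so the shared underlying storage is only opened once.
func getReader(blockID string, create func() (*reader, error)) (*reader, error) {
    v, err, _ := group.Do(blockID, func() (interface{}, error) {
        return create()
    })
    if err != nil {
        return nil, err
    }
    return v.(*reader), nil
}

func main() {
    r, err := getReader("01BLOCK", func() (*reader, error) {
        return &reader{block: "01BLOCK"}, nil
    })
    fmt.Println(r.block, err)
}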
Giedrius Statkevičius d6ee898a06 indexheader: produce race in test
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-26 10:01:21 +03:00
Giedrius Statkevičius 5a95d13802
Merge pull request #8333 from thanos-io/repro_8224
e2e: add repro for 8224
2025-06-26 08:01:35 +03:00
Pedro Tanaka b54d293dbd
fix: make TestProxyStore_SeriesSlowStores less flaky by removing timing assertions
The TestProxyStore_SeriesSlowStores test was failing intermittently in CI due to
strict timing assertions that were sensitive to system load and scheduling variations.

The test now focuses on functional correctness rather than precise timing,
making it more reliable in CI environments while still validating the
proxy store's timeout and partial response behavior.

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
2025-06-25 23:09:47 +02:00
Giedrius Statkevičius dfcbfe7c40 e2e: add repro for 8224
Add repro for https://github.com/thanos-io/thanos/issues/8224. Fix in
follow up PRs.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-25 18:07:48 +03:00
Giedrius Statkevičius 8b738c55b1
Merge pull request #8331 from thanos-io/merge-release-0.39-to-main-v2
Merge release 0.39 to main
2025-06-25 15:25:36 +03:00
Giedrius Statkevičius 69624ecbf1 Merge branch 'main' into merge-release-0.39-to-main-v2
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-25 14:59:35 +03:00
Giedrius Statkevičius 0453c9b144
*: release 0.39.0 (#8330)
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-25 14:05:34 +03:00
Saswata Mukherjee 9c955d21df
e2e: Check rule group label works (#8322)
* e2e: Check rule group label works

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fix fanout test

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-06-23 10:27:07 +01:00
Paul 7de9c13e5f
add rule tsdb.enable-native-histograms flag (#8321)
Signed-off-by: Paul Hsieh <supaulkawaii@gmail.com>
2025-06-23 10:06:00 +01:00
Giedrius Statkevičius a6c05e6df6
*: add CHANGELOG, update VERSION (#8320)
Prepare for 0.39.0-rc.0.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-20 07:12:19 +03:00
Giedrius Statkevičius 34a98c8efb
CHANGELOG: indicate release (#8319)
Indicate that 0.39.0 is in progress.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-19 17:59:12 +03:00
Giedrius Statkevičius 933f04f55e
query_frontend: only ready if downstream is ready (#8315)
We had an incident in prod where QFE was reporting that it is ready even
though the downstream didn't work due to a misconfigured load-balancer.
In this PR I am proposing sending periodic requests to downstream
to check whether it is working.

TestQueryFrontendTenantForward never worked so I deleted it.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-18 11:56:48 +03:00
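The full implementation appears in the query-frontend diff further down this page; purely as a condensed sketch of the idea (hypothetical setReady callback and a plain net/http client instead of the frontend's round tripper):

package main

import (
    "context"
    "fmt"
    "net/http"
    "time"
)

// probeDownstream keeps readiness in sync with the downstream querier:
// ready is only reported while GET <downstream>/-/ready succeeds.
func probeDownstream(ctx context.Context, downstreamURL string, setReady func(error)) {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()
    for {
        req, err := http.NewRequestWithContext(ctx, http.MethodGet, downstreamURL+"/-/ready", nil)
        if err != nil {
            setReady(err)
        } else if resp, rerr := http.DefaultClient.Do(req); rerr != nil {
            setReady(rerr)
        } else {
            resp.Body.Close()
            if resp.StatusCode >= 400 {
                setReady(fmt.Errorf("downstream returned %d", resp.StatusCode))
            } else {
                setReady(nil) // downstream reachable and ready
            }
        }
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
        }
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
    defer cancel()
    probeDownstream(ctx, "http://localhost:9090", func(err error) { fmt.Println("ready err:", err) })
}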
dependabot[bot] f1c0f4b9b8
build(deps): bump github.com/KimMachineGun/automemlimit (#8312)
Bumps [github.com/KimMachineGun/automemlimit](https://github.com/KimMachineGun/automemlimit) from 0.7.2 to 0.7.3.
- [Release notes](https://github.com/KimMachineGun/automemlimit/releases)
- [Commits](https://github.com/KimMachineGun/automemlimit/compare/v0.7.2...v0.7.3)

---
updated-dependencies:
- dependency-name: github.com/KimMachineGun/automemlimit
  dependency-version: 0.7.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-18 12:35:30 +05:30
Hongcheng Zhu a6370c7cc6
Add Prometheus counters for pending write requests and series requests in Receive (#8308)
Signed-off-by: HC Zhu <hczhu.mtv@gmail.com>
Co-authored-by: HC Zhu (DB) <hc.zhu@databricks.com>
2025-06-17 10:46:12 +05:30
Hongcheng Zhu 8f715b0b6b
Query: limit LazyRetrieval memory buffer size (#8296)
* Limit lazyRespSet memory buffer size using a ring buffer

Signed-off-by: HC Zhu <hczhu.mtv@gmail.com>

* store: make heap a bit more consistent

Add len comparison to make it more consistent.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Fix linter complaints

Signed-off-by: HC Zhu <hczhu.mtv@gmail.com>

---------

Signed-off-by: HC Zhu <hczhu.mtv@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: HC Zhu (DB) <hc.zhu@databricks.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: HC Zhu <hczhu.mtv@gmail.com>
2025-06-14 10:52:46 -07:00
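A standalone sketch of the bounding technique named above, assuming a plain string payload instead of the real response type: a fixed-capacity ring buffer that blocks producers once it is full, which caps how much memory the lazy retrieval can buffer.

package main

import (
    "fmt"
    "sync"
)

// boundedBuffer is an illustrative fixed-capacity ring buffer: producers block
// once the buffer is full, so buffered responses are bounded.
type boundedBuffer struct {
    mu          sync.Mutex
    notFull     *sync.Cond
    notEmpty    *sync.Cond
    buf         []string
    head, count int
}

func newBoundedBuffer(capacity int) *boundedBuffer {
    b := &boundedBuffer{buf: make([]string, capacity)}
    b.notFull = sync.NewCond(&b.mu)
    b.notEmpty = sync.NewCond(&b.mu)
    return b
}

func (b *boundedBuffer) put(resp string) {
    b.mu.Lock()
    defer b.mu.Unlock()
    for b.count == len(b.buf) {
        b.notFull.Wait()
    }
    b.buf[(b.head+b.count)%len(b.buf)] = resp
    b.count++
    b.notEmpty.Signal()
}

func (b *boundedBuffer) get() string {
    b.mu.Lock()
    defer b.mu.Unlock()
    for b.count == 0 {
        b.notEmpty.Wait()
    }
    resp := b.buf[b.head]
    b.head = (b.head + 1) % len(b.buf)
    b.count--
    b.notFull.Signal()
    return resp
}

func main() {
    b := newBoundedBuffer(20)
    go func() {
        for i := 0; i < 100; i++ {
            b.put(fmt.Sprintf("response-%d", i))
        }
    }()
    for i := 0; i < 100; i++ {
        _ = b.get()
    }
    fmt.Println("done")
}

The capacity of 20 here mirrors the --query.lazy-retrieval-max-buffered-responses default of 20 shown in the query and receive diffs further down.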
Filip Petkovski 6c27396458
Merge pull request #8306 from GregSharpe1/main
[docs] Updating documentation around --compact flags
2025-06-13 08:53:04 +02:00
Greg Sharpe d1afea6a69 Updating the documentation to reflect the correct flags when using --compact.enable-vertical-compaction.
Signed-off-by: Greg Sharpe <git+me@gregsharpe.co.uk>
2025-06-13 08:28:42 +02:00
gabyf 03d5b6bc28
tools: fix tool bucket inspect output arg description (#8252)
* docs: fix tool bucket output arg description

Signed-off-by: gabyf <zweeking.tech@gmail.com>

* fix(tools_bucket): output description from cvs to csv

Signed-off-by: gabyf <zweeking.tech@gmail.com>

---------

Signed-off-by: gabyf <zweeking.tech@gmail.com>
2025-06-12 16:35:42 -07:00
Giedrius Statkevičius 8769b97c86
go.mod: update promql engine + Prom dep (#8305)
Update dependencies. Almost everything works except for
https://github.com/prometheus/prometheus/pull/16252.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-12 10:50:03 +03:00
Aaron Walker 26f6e64365
Revert capnp to v3.0.0-alpha (#8300)
cef0b02 caused a regression of !7944. This reverts the version upgrade to the previously working version.

Signed-off-by: Aaron Walker <aaron@vcra.io>
2025-06-10 09:41:59 +05:30
dependabot[bot] 60533e4a22
build(deps): bump golang.org/x/time from 0.11.0 to 0.12.0 (#8302)
Bumps [golang.org/x/time](https://github.com/golang/time) from 0.11.0 to 0.12.0.
- [Commits](https://github.com/golang/time/compare/v0.11.0...v0.12.0)

---
updated-dependencies:
- dependency-name: golang.org/x/time
  dependency-version: 0.12.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-10 09:31:02 +05:30
dependabot[bot] 95a2b00f17
build(deps): bump github.com/alicebob/miniredis/v2 from 2.22.0 to 2.35.0 (#8303)
Bumps [github.com/alicebob/miniredis/v2](https://github.com/alicebob/miniredis) from 2.22.0 to 2.35.0.
- [Release notes](https://github.com/alicebob/miniredis/releases)
- [Changelog](https://github.com/alicebob/miniredis/blob/master/CHANGELOG.md)
- [Commits](https://github.com/alicebob/miniredis/compare/v2.22.0...v2.35.0)

---
updated-dependencies:
- dependency-name: github.com/alicebob/miniredis/v2
  dependency-version: 2.35.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-10 09:30:45 +05:30
dependabot[bot] 2ed24bdf5b
build(deps): bump github/codeql-action from 3.26.13 to 3.28.19 (#8304)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.13 to 3.28.19.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](f779452ac5...fca7ace96b)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.19
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-10 09:30:25 +05:30
Naman-Parlecha 23d60b8615
Fix: DataRace in TestEndpointSetUpdate_StrictEndpointMetadata test (#8288)
* fix: Fixing Unit Test TestEndpointSetUpdate_StrictEndpointMetadata

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* revert: CHANGELOG.md

Signed-off-by: Naman-Parlecha <namanparlecha@gmail.com>

---------

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Signed-off-by: Naman-Parlecha <namanparlecha@gmail.com>
2025-06-06 15:51:53 +03:00
Naman-Parlecha 290f16c0e9
Resolve GitHub Actions Failure (#8299)
* update: changing to new prometheus page

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* fix: disable-admin-op flag

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

---------

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-06-05 13:52:12 +03:00
Aaron Walker 4ad45948cd
Receive: Remove migration of legacy storage to multi-tsdb (#8289)
This has been in since 0.13 (~5 years ago). It fixes an issue where, if the default tenant has no data and gets churned, the migration assumes that per-tenant directories are actually blocks, which leaves those blocks unqueryable.

Signed-off-by: Aaron Walker <aaron@vcra.io>
2025-06-03 16:57:57 +03:00
Daniel Blando 15b1ef2ead
shipper: allow shipper sync to skip corrupted blocks (#8259)
* Allow shipper sync to skip corrupted blocks

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Move check to blockMetasFromOldest

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Split metrics. Return error

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* fix test

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Reorder shipper constructor variables

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Use opts in shipper constructor

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Fix typo

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

---------

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>
2025-06-02 23:30:16 -07:00
Naman-Parlecha 2029c9bee0
store: Add --disable-admin-operations Flag to Store Gateway (#8284)
* fix(sidebar): maintain expanded state based on current page

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* fixing changelog

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* store: --disable-admin-operation flag

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

docs: Adding Flag details

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

updated changelog

refactor: changelog

Signed-off-by: Naman-Parlecha <namanparlecha@gmail.com>

---------

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Signed-off-by: Naman-Parlecha <namanparlecha@gmail.com>
2025-06-01 15:26:58 -07:00
Saumya Shah 4e04420489
query: handle query.Analyze returning nil gracefully (#8199)
* fix: handle analyze returning nil gracefully

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* update CHANGELOG.md

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* fix format

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

---------

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>
2025-05-30 12:15:42 +03:00
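Illustrative only, with a hypothetical analysis type rather than the engine's real explain output: the essence of the fix is a nil check before using the analysis, surfacing a warning instead of panicking.

package main

import "fmt"

// analysis is a hypothetical stand-in for the engine's query analysis output.
type analysis struct{ name string }

// analyzeQuery returns a warning rather than dereferencing a nil analysis,
// which is the failure mode #8199 guards against.
func analyzeQuery(a *analysis) (string, []error) {
    if a == nil {
        return "", []error{fmt.Errorf("query not analyzable by this engine")}
    }
    return a.name, nil
}

func main() {
    out, warns := analyzeQuery(nil)
    fmt.Println(out, warns)
}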
Naman-Parlecha 36df30bbe8
fix: maintain expanded state based on current page (#8266)
* fix(sidebar): maintain expanded state based on current page

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* fixing changelog

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

---------

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Signed-off-by: Naman-Parlecha <namanparlecha@gmail.com>
2025-05-30 12:07:23 +03:00
Saumya Shah 390fd0a023
query, query-frontend, ruler: Add support for flags to use promQL experimental functions & bump promql-engine (#8245)
* feat: add support for experimental functions, if enabled

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* fix tests

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* allow setting enable-feature flag in ruler

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* add flag info in docs

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* add CHANGELOG

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* add hidden flag to throw err on query fallback, red in tests ^_^

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* bump promql-engine to latest version/commit

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* format docs

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

---------

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>
2025-05-30 10:04:28 +03:00
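The mechanism is visible in the diffs below: once the flag value is seen, the commands flip a switch in the Prometheus PromQL parser. A minimal sketch with the flag plumbing simplified:

package main

import (
    "fmt"

    "github.com/prometheus/prometheus/promql/parser"
)

func enableFeatures(features []string) {
    for _, f := range features {
        if f == "promql-experimental-functions" {
            // Same switch the query, query-frontend and ruler commands flip.
            parser.EnableExperimentalFunctions = true
        }
    }
}

func main() {
    enableFeatures([]string{"promql-experimental-functions"})
    fmt.Println(parser.EnableExperimentalFunctions)
}

Per the CHANGELOG entry for #8245 above, the same --enable-feature value is accepted by query, query-frontend and ruler.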
Anna Tran 12649d8be7
Force sync writes to meta.json in case of host crash (#8282)
* Force sync writes to meta.json in case of host crash

Signed-off-by: Anna Tran <trananna@amazon.com>

* Update CHANGELOG for fsync meta.json

Signed-off-by: Anna Tran <trananna@amazon.com>

---------

Signed-off-by: Anna Tran <trananna@amazon.com>
2025-05-29 12:23:49 +03:00
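A generic sketch of the crash-safe write pattern this change refers to (write to a temporary file, fsync it, then rename into place); this is the standard technique, not the exact Thanos code.

package main

import (
    "os"
    "path/filepath"
)

// writeFileSynced writes data so that a host crash cannot leave a truncated
// or empty meta.json: the temp file is fsynced before the atomic rename.
func writeFileSynced(path string, data []byte) error {
    tmp := path + ".tmp"
    f, err := os.Create(tmp)
    if err != nil {
        return err
    }
    if _, err := f.Write(data); err != nil {
        f.Close()
        return err
    }
    if err := f.Sync(); err != nil { // force the data to stable storage
        f.Close()
        return err
    }
    if err := f.Close(); err != nil {
        return err
    }
    return os.Rename(tmp, path)
}

func main() {
    dir, _ := os.MkdirTemp("", "meta")
    defer os.RemoveAll(dir)
    _ = writeFileSynced(filepath.Join(dir, "meta.json"), []byte(`{"version":1}`))
}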
Giedrius Statkevičius cef0b0200e
go.mod: mass update modules (#8277)
Maintenance task: let's update all modules.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-05-27 18:32:28 +03:00
Saumya Shah efc6eee8c6
query: fix query analyze to return appropriate results (#8262)
* call query analysis once the query is being executed

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* refactor the analyze logic

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* send not-analyzable warnings instead of returning an error

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* add separate warnings for the non-analyzable query state based on the engine

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

---------

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>
2025-05-27 16:13:30 +03:00
Siavash Safi da421eaffe
Shipper: fix missing meta file errors (#8268)
- fix meta file read error check
- use proper logs for missing meta file vs. other read errors

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2025-05-23 11:46:09 +00:00
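A sketch of the error-classification idea with a hypothetical helper (the real shipper reads its own meta type): a missing meta file is expected and handled quietly, while any other read error is surfaced.

package main

import (
    "errors"
    "fmt"
    "io/fs"
    "os"
)

// readMeta distinguishes "meta file not written yet" from genuine read errors,
// so callers can log the former at a lower level.
func readMeta(path string) ([]byte, bool, error) {
    b, err := os.ReadFile(path)
    if errors.Is(err, fs.ErrNotExist) {
        return nil, false, nil // missing: expected for brand-new blocks
    }
    if err != nil {
        return nil, false, fmt.Errorf("read %s: %w", path, err)
    }
    return b, true, nil
}

func main() {
    _, ok, err := readMeta("does-not-exist/meta.json")
    fmt.Println(ok, err)
}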
Giedrius Statkevičius d71a58cbd4
docs: fix receive page (#8267)
Fix the docs after the most recent merge.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-05-23 10:47:01 +00:00
Giedrius Statkevičius f847ff0262
receive: implement shuffle sharding (#8238)
See the documentation for details.

Closes #3821.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-05-22 11:08:23 +03:00
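The referenced documentation describes the actual configuration. Purely as an illustration of the general idea, shuffle sharding deterministically assigns each tenant a small, stable subset of receivers instead of the full hashring; one common way to compute such a subset:

package main

import (
    "fmt"
    "hash/fnv"
    "sort"
)

// shuffleShard picks shardSize nodes for a tenant by ranking all nodes with a
// tenant-seeded hash. The subset is stable per tenant and differs between
// tenants, which limits the blast radius of any single noisy tenant.
func shuffleShard(nodes []string, tenant string, shardSize int) []string {
    type ranked struct {
        node string
        rank uint64
    }
    rs := make([]ranked, 0, len(nodes))
    for _, n := range nodes {
        h := fnv.New64a()
        h.Write([]byte(tenant))
        h.Write([]byte(n))
        rs = append(rs, ranked{node: n, rank: h.Sum64()})
    }
    sort.Slice(rs, func(i, j int) bool { return rs[i].rank < rs[j].rank })
    if shardSize > len(rs) {
        shardSize = len(rs)
    }
    out := make([]string, 0, shardSize)
    for _, r := range rs[:shardSize] {
        out = append(out, r.node)
    }
    return out
}

func main() {
    nodes := []string{"receive-0", "receive-1", "receive-2", "receive-3", "receive-4"}
    fmt.Println(shuffleShard(nodes, "tenant-a", 2))
    fmt.Println(shuffleShard(nodes, "tenant-b", 2))
}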
dronenb ec9601aa0e
feat(promu): add darwin/arm64 (#8263)
* feat(promu): add darwin/arm64

Signed-off-by: Ben Dronen <dronenb@users.noreply.github.com>

* fix(promu): just use darwin

Signed-off-by: Ben Dronen <dronenb@users.noreply.github.com>

---------

Signed-off-by: Ben Dronen <dronenb@users.noreply.github.com>
2025-05-22 10:04:57 +02:00
Michael Hoffmann 759773c4dc
shipper: delete unused functions (#8260)
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-05-21 08:18:52 +00:00
Giedrius Statkevičius 88092449cd
docs: volunteer as shepherd (#8249)
* docs: volunteer as shepherd

Release the next version in a few weeks.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Fix formatting

Signed-off-by: Matej Gera <38492574+matej-g@users.noreply.github.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
Co-authored-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
2025-05-15 14:56:00 +03:00
Ayoub Mrini 34b3d64034
test(tools_test.go/Test_CheckRules_Glob): take into consideration RO current dirs while changing file permissions (#8014)

The process may not have the needed permissions on the file (not the owner, not root, and without the CAP_FOWNER capability)
to chmod it.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-05-14 13:20:14 +01:00
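The start of this change is visible in the tools test diff at the bottom of this page. A minimal sketch of the technique, assuming the fixture path exists: copy the fixture into t.TempDir() before chmod'ing it, so a read-only checkout or a file the process does not own cannot break the test.

package main

import (
    "os"
    "path/filepath"
    "testing"
)

// TestUnreadableFixture makes its own copy of a fixture unreadable instead of
// chmod'ing the checked-in file, which may fail in a read-only checkout or
// when the test process does not own the file.
func TestUnreadableFixture(t *testing.T) {
    src, err := os.ReadFile("testdata/rules-files/unreadable_valid.yaml")
    if err != nil {
        t.Skip("fixture not present in this sketch")
    }
    dst := filepath.Join(t.TempDir(), "unreadable_valid.yaml")
    if err := os.WriteFile(dst, src, 0o600); err != nil {
        t.Fatal(err)
    }
    if err := os.Chmod(dst, 0o000); err != nil {
        t.Fatal(err)
    }
    if _, err := os.ReadFile(dst); err == nil {
        t.Fatal("expected read to fail on unreadable file")
    }
}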
dongjiang 242b5f6307
add otlp clientType (#8243)
Signed-off-by: dongjiang <dongjiang1989@126.com>
2025-05-13 14:18:41 +03:00
Giedrius Statkevičius aa3e4199db
e2e: disable some more flaky tests (#8241)
These are flaky hence disable them.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-05-09 16:46:23 +03:00
Giedrius Statkevičius 81b4260f5f
reloader: disable some flaky tests (#8240)
Disabling some flaky tests.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-05-08 15:59:24 +03:00
Saumya Shah 2dfc749a85
UI: bump codemirror-promql dependency to latest version (#8230)
* bump codemirror-promql react dep to latest version

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* fix lint errors, build react-app

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* sync ui change of input expression

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* revert build files

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* build and update few warnings

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

---------

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>
2025-05-07 11:20:49 +01:00
Philip Gough 2a5a856e34
tools: Extend bucket ls options (#8225)
* tools: Extend bucket ls command with min and max time, selector config and timeout options

Signed-off-by: Philip Gough <philip.p.gough@gmail.com>

* make: docs

Signed-off-by: Philip Gough <philip.p.gough@gmail.com>

Update cmd/thanos/tools_bucket.go

Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Philip Gough <pgough@redhat.com>

Update cmd/thanos/tools_bucket.go

Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Philip Gough <pgough@redhat.com>

---------

Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
Signed-off-by: Philip Gough <pgough@redhat.com>
Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-04-25 10:33:34 +00:00
Giedrius Statkevičius cff147dbc0
receive: remove Get() method from hashring (#8226)
Get() is equivalent to GetN(1) so remove it. It's not used.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-04-25 09:59:37 +00:00
Saswata Mukherjee 7d7ea650b7
Receive: Ensure forward/replication metrics are incremented in err cases (#8212)
* Ensure forward/replication metrics are incremented in err cases

This commit ensures forward and replication metrics are incremented with
err labels.

This seemed to be missing; I came across it while working on a dashboard.

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Add changelog

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-04-22 11:35:34 +00:00
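A generic client_golang sketch of the pattern; the metric and label names here are made up and are not the exact ones used in Receive. The point of #8212 is that the error path increments the counter too.

package main

import (
    "errors"
    "fmt"

    "github.com/prometheus/client_golang/prometheus"
)

// forwardTotal is a hypothetical stand-in for Receive's forward/replication counters.
var forwardTotal = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "example_forward_requests_total",
        Help: "Forwarded requests by result.",
    },
    []string{"result"},
)

func forward(do func() error) error {
    if err := do(); err != nil {
        // The fix: error paths must increment the counter as well.
        forwardTotal.WithLabelValues("error").Inc()
        return err
    }
    forwardTotal.WithLabelValues("success").Inc()
    return nil
}

func main() {
    prometheus.MustRegister(forwardTotal)
    _ = forward(func() error { return errors.New("replica unreachable") })
    fmt.Println("forward attempted")
}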
Andrew Reilly 92db7aabb1
Update query.md documentation where example uses --query.tenant-default-id flag instead of --query.default-tenant-id (#8210)
Signed-off-by: Andrew Reilly <adr@maas.ca>
2025-04-22 11:27:49 +03:00
Filip Petkovski 66f54ac88d
Merge pull request #8216 from yuchen-db/fix-iter-race
Fix Pull iterator race between next() and stop()
2025-04-22 08:19:10 +02:00
Yuchen Wang a8220d7317 simplify unit test
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 17:54:47 -07:00
Yuchen Wang 909c08fa98 add comments
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 17:46:38 -07:00
Yuchen Wang 6663bb01ac update changelog
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 17:16:33 -07:00
Yuchen Wang d7876b4303 fix unit test
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 16:55:57 -07:00
Yuchen Wang 6f556d2bbb add unit test
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 15:46:54 -07:00
Yuchen Wang 0dcc9e9ccd add changelog
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 15:46:54 -07:00
Yuchen Wang f168dc0cbb fix Pull iter race between next() and stop()
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 15:46:54 -07:00
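Without knowing the exact capnp iterator internals, a generic sketch of how a next()/stop() race is usually closed: a mutex plus a stopped flag, so stop() can never release state that next() is still reading.

package main

import (
    "fmt"
    "sync"
)

// pullIter is an illustrative iterator whose next() and stop() may be called
// from different goroutines.
type pullIter struct {
    mu      sync.Mutex
    stopped bool
    items   []int
    pos     int
}

func (it *pullIter) next() (int, bool) {
    it.mu.Lock()
    defer it.mu.Unlock()
    if it.stopped || it.pos >= len(it.items) {
        return 0, false
    }
    v := it.items[it.pos]
    it.pos++
    return v, true
}

func (it *pullIter) stop() {
    it.mu.Lock()
    defer it.mu.Unlock()
    it.stopped = true
    it.items = nil // safe: no next() can be mid-read while we hold the lock
}

func main() {
    it := &pullIter{items: []int{1, 2, 3}}
    go it.stop()
    for {
        v, ok := it.next()
        if !ok {
            break
        }
        fmt.Println(v)
    }
}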
dependabot[bot] 8273ad013c
build(deps): bump github.com/golang-jwt/jwt/v5 from 5.2.1 to 5.2.2 (#8164)
Bumps [github.com/golang-jwt/jwt/v5](https://github.com/golang-jwt/jwt) from 5.2.1 to 5.2.2.
- [Release notes](https://github.com/golang-jwt/jwt/releases)
- [Changelog](https://github.com/golang-jwt/jwt/blob/main/VERSION_HISTORY.md)
- [Commits](https://github.com/golang-jwt/jwt/compare/v5.2.1...v5.2.2)

---
updated-dependencies:
- dependency-name: github.com/golang-jwt/jwt/v5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-17 15:08:12 +01:00
Michael Hoffmann 31c6115317
Query: fix partial response for distributed instant query (#8211)
This commit fixes a typo in partial response handling for distributed
instant queries.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-04-17 08:06:42 +00:00
Aaron Walker c0b5500cb5
Unhide tsdb.enable-native-histograms flag in receive (#8202)
Signed-off-by: Aaron Walker <aaron@vcra.io>
2025-04-11 13:33:24 +02:00
Michael Hoffmann ce2b51f93e
Sidecar: increase default prometheus timeout (#8192)
Adjust the default get-config timeout to match the default get-config
interval.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-04-07 15:29:22 +00:00
Naohiro Okada 1a559f9de8
fix changelog markdown. (#8190)
Signed-off-by: naohiroo <naohiro.dev@gmail.com>
2025-04-04 14:23:16 +02:00
Michael Hoffmann b2f5ee44a7
merge release 0.38.0 to main (#8186)
* Changelog: cut release 0.38-rc.0 (#8174)

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

* Changelog: cut release 0.38.0-rc.1 (#8180)

* Query: fix endpointset setup

This commit fixes an issue where we add non-strict, non-group endpoints
to the endpointset twice, once with resolved addresses from the dns
provider and once with its dns prefix.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

* deps: bump promql-engine (#8181)

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

* Changelog: cut release 0.38-rc.1

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

---------

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

* Changelog: cut release 0.38 (#8185)

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

---------

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-04-04 06:10:48 +00:00
Michael Hoffmann 08e5907cba
deps: bump promql-engine (#8181)
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-03-31 13:22:35 +00:00
Michael Hoffmann 2fccdfbf5a
Query: fix endpointset setup (#8175)
This commit fixes an issue where we add non-strict, non-group endpoints
to the endpointset twice, once with resolved addresses from the dns
provider and once with its dns prefix.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-03-27 07:02:21 +00:00
159 changed files with 4869 additions and 3220 deletions

View File

@ -48,7 +48,7 @@ jobs:
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@f779452ac5af1c261dce0346a8f964149f49322b # v3.26.13
uses: github/codeql-action/init@fca7ace96b7d713c7035871441bd52efbe39e27e # v3.28.19
with:
languages: ${{ matrix.language }}
config-file: ./.github/codeql/codeql-config.yml
@ -60,7 +60,7 @@ jobs:
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@f779452ac5af1c261dce0346a8f964149f49322b # v3.26.13
uses: github/codeql-action/autobuild@fca7ace96b7d713c7035871441bd52efbe39e27e # v3.28.19
# Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
@ -74,4 +74,4 @@ jobs:
# make release
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@f779452ac5af1c261dce0346a8f964149f49322b # v3.26.13
uses: github/codeql-action/analyze@fca7ace96b7d713c7035871441bd52efbe39e27e # v3.28.19

View File

@ -16,7 +16,7 @@ build:
crossbuild:
platforms:
- linux/amd64
- darwin/amd64
- darwin
- linux/arm64
- windows/amd64
- freebsd/amd64

View File

@ -18,7 +18,37 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
### Fixed
## [v0.38.0 - <in progress>](https://github.com/thanos-io/thanos/tree/release-0.38)
## [v0.39.0](https://github.com/thanos-io/thanos/tree/release-0.39) - 2025 06 25
In short: there are a bunch of fixes and small improvements. The shining items in this release are memory usage improvements in Thanos Query and shuffle sharding support in Thanos Receiver. Information about shuffle sharding support is available in the documentation. Thank you to all contributors!
### Added
- [#8308](https://github.com/thanos-io/thanos/pull/8308) Receive: Prometheus counters for pending write requests and series requests
- [#8225](https://github.com/thanos-io/thanos/pull/8225) tools: Extend bucket ls options.
- [#8238](https://github.com/thanos-io/thanos/pull/8238) Receive: add shuffle sharding support
- [#8284](https://github.com/thanos-io/thanos/pull/8284) Store: Add `--disable-admin-operations` Flag to Store Gateway
- [#8245](https://github.com/thanos-io/thanos/pull/8245) Querier/Query-Frontend/Ruler: Add `--enable-feature=promql-experimental-functions` flag option to enable using promQL experimental functions in respective Thanos components
- [#8259](https://github.com/thanos-io/thanos/pull/8259) Shipper: Add `--shipper.skip-corrupted-blocks` flag to allow `Sync()` to continue upload when finding a corrupted block
### Changed
- [#8282](https://github.com/thanos-io/thanos/pull/8282) Force sync writes to meta.json in case of host crash
- [#8192](https://github.com/thanos-io/thanos/pull/8192) Sidecar: fix default get config timeout
- [#8202](https://github.com/thanos-io/thanos/pull/8202) Receive: Unhide `--tsdb.enable-native-histograms` flag
- [#8315](https://github.com/thanos-io/thanos/pull/8315) Query-Frontend: only ready if downstream is ready
### Removed
- [#8289](https://github.com/thanos-io/thanos/pull/8289) Receive: *breaking :warning:* Removed migration of legacy-TSDB to multi-TSDB. Ensure you are running version >0.13
### Fixed
- [#8199](https://github.com/thanos-io/thanos/pull/8199) Query: handle panics or nil pointer dereference in querier gracefully when query analyze returns nil
- [#8211](https://github.com/thanos-io/thanos/pull/8211) Query: fix panic on nested partial response in distributed instant query
- [#8216](https://github.com/thanos-io/thanos/pull/8216) Query/Receive: fix iter race between `next()` and `stop()` introduced in https://github.com/thanos-io/thanos/pull/7821.
- [#8212](https://github.com/thanos-io/thanos/pull/8212) Receive: Ensure forward/replication metrics are incremented in err cases
- [#8296](https://github.com/thanos-io/thanos/pull/8296) Query: limit LazyRetrieval memory buffer size
## [v0.38.0](https://github.com/thanos-io/thanos/tree/release-0.38) - 03.04.2025
### Fixed
- [#8091](https://github.com/thanos-io/thanos/pull/8091) *: Add POST into allowed CORS methods header
@ -36,6 +66,8 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#8131](https://github.com/thanos-io/thanos/pull/8131) Store Gateway: Optimize regex matchers for .* and .+. #8131
- [#7808](https://github.com/thanos-io/thanos/pull/7808) Query: Support chain deduplication algorithm.
- [#8158](https://github.com/thanos-io/thanos/pull/8158) Rule: Add support for query offset.
- [#8110](https://github.com/thanos-io/thanos/pull/8110) Compact: implement native histogram downsampling.
- [#7996](https://github.com/thanos-io/thanos/pull/7996) Receive: Add OTLP endpoint.
### Changed
@ -43,6 +75,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#7012](https://github.com/thanos-io/thanos/pull/7012) Query: Automatically adjust `max_source_resolution` based on promql query to avoid querying data from higher resolution resulting empty results.
- [#8118](https://github.com/thanos-io/thanos/pull/8118) Query: Bumped promql-engine
- [#8135](https://github.com/thanos-io/thanos/pull/8135) Query: respect partial response in distributed engine
- [#8181](https://github.com/thanos-io/thanos/pull/8181) Deps: bump promql engine
### Removed
@ -52,6 +85,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#7970](https://github.com/thanos-io/thanos/pull/7970) Sidecar: Respect min-time setting.
- [#7962](https://github.com/thanos-io/thanos/pull/7962) Store: Fix potential deadlock in hedging request.
- [#8175](https://github.com/thanos-io/thanos/pull/8175) Query: fix endpointset setup
### Added

View File

@ -1 +1 @@
0.39.0-dev
0.40.0-dev

View File

@ -136,7 +136,7 @@ func (pc *prometheusConfig) registerFlag(cmd extkingpin.FlagClause) *prometheusC
Default("30s").DurationVar(&pc.getConfigInterval)
cmd.Flag("prometheus.get_config_timeout",
"Timeout for getting Prometheus config").
Default("5s").DurationVar(&pc.getConfigTimeout)
Default("30s").DurationVar(&pc.getConfigTimeout)
pc.httpClient = extflag.RegisterPathOrContent(
cmd,
"prometheus.http-client",
@ -203,6 +203,7 @@ type shipperConfig struct {
uploadCompacted bool
ignoreBlockSize bool
allowOutOfOrderUpload bool
skipCorruptedBlocks bool
hashFunc string
metaFileName string
}
@ -219,6 +220,11 @@ func (sc *shipperConfig) registerFlag(cmd extkingpin.FlagClause) *shipperConfig
"This can trigger compaction without those blocks and as a result will create an overlap situation. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
"about order.").
Default("false").Hidden().BoolVar(&sc.allowOutOfOrderUpload)
cmd.Flag("shipper.skip-corrupted-blocks",
"If true, shipper will skip corrupted blocks in the given iteration and retry later. This means that some newer blocks might be uploaded sooner than older blocks."+
"This can trigger compaction without those blocks and as a result will create an overlap situation. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
"about order.").
Default("false").Hidden().BoolVar(&sc.skipCorruptedBlocks)
cmd.Flag("hash-func", "Specify which hash function to use when calculating the hashes of produced files. If no function has been specified, it does not happen. This permits avoiding downloading some files twice albeit at some performance cost. Possible values are: \"\", \"SHA256\".").
Default("").EnumVar(&sc.hashFunc, "SHA256", "")
cmd.Flag("shipper.meta-file-name", "the file to store shipper metadata in").Default(shipper.DefaultMetaFilename).StringVar(&sc.metaFileName)

View File

@ -15,7 +15,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/run"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"

View File

@ -298,8 +298,7 @@ func setupEndpointSet(
addresses := make([]string, 0, len(endpointConfig.Endpoints))
for _, ecfg := range endpointConfig.Endpoints {
if addr := ecfg.Address; !ecfg.Group && !ecfg.Strict {
// originally only "--endpoint" addresses got resolved
if addr := ecfg.Address; dns.IsDynamicNode(addr) && !ecfg.Group {
addresses = append(addresses, addr)
}
}
@ -318,14 +317,16 @@ func setupEndpointSet(
endpointConfig := configProvider.config()
specs := make([]*query.GRPCEndpointSpec, 0)
// groups and non dynamic endpoints
for _, ecfg := range endpointConfig.Endpoints {
strict, group, addr := ecfg.Strict, ecfg.Group, ecfg.Address
if group {
specs = append(specs, query.NewGRPCEndpointSpec(fmt.Sprintf("thanos:///%s", addr), strict, append(dialOpts, extgrpc.EndpointGroupGRPCOpts()...)...))
} else {
} else if !dns.IsDynamicNode(addr) {
specs = append(specs, query.NewGRPCEndpointSpec(addr, strict, dialOpts...))
}
}
// dynamic endpoints
for _, addr := range dnsEndpointProvider.Addresses() {
specs = append(specs, query.NewGRPCEndpointSpec(addr, false, dialOpts...))
}

View File

@ -15,6 +15,7 @@ import (
"runtime/debug"
"syscall"
"github.com/alecthomas/kingpin/v2"
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/run"
@ -25,7 +26,6 @@ import (
versioncollector "github.com/prometheus/client_golang/prometheus/collectors/version"
"github.com/prometheus/common/version"
"go.uber.org/automaxprocs/maxprocs"
"gopkg.in/alecthomas/kingpin.v2"
"github.com/thanos-io/thanos/pkg/extkingpin"
"github.com/thanos-io/thanos/pkg/logging"

View File

@ -14,7 +14,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
promtest "github.com/prometheus/client_golang/prometheus/testutil"
"github.com/prometheus/prometheus/model/labels"

View File

@ -20,6 +20,7 @@ import (
"github.com/prometheus/common/route"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/promql"
"github.com/prometheus/prometheus/promql/parser"
apiv1 "github.com/thanos-io/thanos/pkg/api/query"
"github.com/thanos-io/thanos/pkg/api/query/querypb"
@ -53,9 +54,10 @@ import (
)
const (
promqlNegativeOffset = "promql-negative-offset"
promqlAtModifier = "promql-at-modifier"
queryPushdown = "query-pushdown"
promqlNegativeOffset = "promql-negative-offset"
promqlAtModifier = "promql-at-modifier"
queryPushdown = "query-pushdown"
promqlExperimentalFunctions = "promql-experimental-functions"
)
// registerQuery registers a query command.
@ -81,6 +83,8 @@ func registerQuery(app *extkingpin.App) {
defaultEngine := cmd.Flag("query.promql-engine", "Default PromQL engine to use.").Default(string(apiv1.PromqlEnginePrometheus)).
Enum(string(apiv1.PromqlEnginePrometheus), string(apiv1.PromqlEngineThanos))
disableQueryFallback := cmd.Flag("query.disable-fallback", "If set then thanos engine will throw an error if query falls back to prometheus engine").Hidden().Default("false").Bool()
extendedFunctionsEnabled := cmd.Flag("query.enable-x-functions", "Whether to enable extended rate functions (xrate, xincrease and xdelta). Only has effect when used with Thanos engine.").Default("false").Bool()
promqlQueryMode := cmd.Flag("query.mode", "PromQL query mode. One of: local, distributed.").
Default(string(apiv1.PromqlQueryModeLocal)).
@ -135,7 +139,7 @@ func registerQuery(app *extkingpin.App) {
activeQueryDir := cmd.Flag("query.active-query-path", "Directory to log currently active queries in the queries.active file.").Default("").String()
featureList := cmd.Flag("enable-feature", "Comma separated experimental feature names to enable.The current list of features is empty.").Hidden().Default("").Strings()
featureList := cmd.Flag("enable-feature", "Comma separated feature names to enable. Valid options for now: promql-experimental-functions (enables promql experimental functions in query)").Default("").Strings()
enableExemplarPartialResponse := cmd.Flag("exemplar.partial-response", "Enable partial response for exemplar endpoint. --no-exemplar.partial-response for disabling.").
Hidden().Default("true").Bool()
@ -198,6 +202,9 @@ func registerQuery(app *extkingpin.App) {
strictEndpointGroups := extkingpin.Addrs(cmd.Flag("endpoint-group-strict", "(Deprecated, Experimental): DNS name of statically configured Thanos API server groups (repeatable) that are always used, even if the health check fails.").PlaceHolder("<endpoint-group-strict>"))
lazyRetrievalMaxBufferedResponses := cmd.Flag("query.lazy-retrieval-max-buffered-responses", "The lazy retrieval strategy can buffer up to this number of responses. This is to limit the memory usage. This flag takes effect only when the lazy retrieval strategy is enabled.").
Default("20").Hidden().Int()
var storeRateLimits store.SeriesSelectLimits
storeRateLimits.RegisterFlags(cmd)
@ -208,6 +215,10 @@ func registerQuery(app *extkingpin.App) {
}
for _, feature := range *featureList {
if feature == promqlExperimentalFunctions {
parser.EnableExperimentalFunctions = true
level.Info(logger).Log("msg", "Experimental PromQL functions enabled.", "option", promqlExperimentalFunctions)
}
if feature == promqlAtModifier {
level.Warn(logger).Log("msg", "This option for --enable-feature is now permanently enabled and therefore a no-op.", "option", promqlAtModifier)
}
@ -225,7 +236,6 @@ func registerQuery(app *extkingpin.App) {
}
grpcLogOpts, logFilterMethods, err := logging.ParsegRPCOptions(reqLogConfig)
if err != nil {
return errors.Wrap(err, "error while parsing config for request logging")
}
@ -331,12 +341,14 @@ func registerQuery(app *extkingpin.App) {
store.NewTSDBSelector(tsdbSelector),
apiv1.PromqlEngineType(*defaultEngine),
apiv1.PromqlQueryMode(*promqlQueryMode),
*disableQueryFallback,
*tenantHeader,
*defaultTenant,
*tenantCertField,
*enforceTenancy,
*tenantLabel,
*queryDistributedWithOverlappingInterval,
*lazyRetrievalMaxBufferedResponses,
)
})
}
@ -393,12 +405,14 @@ func runQuery(
tsdbSelector *store.TSDBSelector,
defaultEngine apiv1.PromqlEngineType,
queryMode apiv1.PromqlQueryMode,
disableQueryFallback bool,
tenantHeader string,
defaultTenant string,
tenantCertField string,
enforceTenancy bool,
tenantLabel string,
queryDistributedWithOverlappingInterval bool,
lazyRetrievalMaxBufferedResponses int,
) error {
comp := component.Query
if alertQueryURL == "" {
@ -412,6 +426,7 @@ func runQuery(
options := []store.ProxyStoreOption{
store.WithTSDBSelector(tsdbSelector),
store.WithProxyStoreDebugLogging(debugLogging),
store.WithLazyRetrievalMaxBufferedResponsesForProxy(lazyRetrievalMaxBufferedResponses),
}
// Parse and sanitize the provided replica labels flags.
@ -466,6 +481,7 @@ func runQuery(
extendedFunctionsEnabled,
activeQueryTracker,
queryMode,
disableQueryFallback,
)
lookbackDeltaCreator := LookbackDeltaFactory(lookbackDelta, dynamicLookbackDelta)

View File

@ -4,6 +4,7 @@
package main
import (
"context"
"net"
"net/http"
"time"
@ -34,6 +35,7 @@ import (
"github.com/thanos-io/thanos/pkg/logging"
"github.com/thanos-io/thanos/pkg/prober"
"github.com/thanos-io/thanos/pkg/queryfrontend"
"github.com/thanos-io/thanos/pkg/runutil"
httpserver "github.com/thanos-io/thanos/pkg/server/http"
"github.com/thanos-io/thanos/pkg/server/http/middleware"
"github.com/thanos-io/thanos/pkg/tenancy"
@ -97,6 +99,8 @@ func registerQueryFrontend(app *extkingpin.App) {
cmd.Flag("query-frontend.enable-x-functions", "Enable experimental x- functions in query-frontend. --no-query-frontend.enable-x-functions for disabling.").
Default("false").BoolVar(&cfg.EnableXFunctions)
cmd.Flag("enable-feature", "Comma separated feature names to enable. Valid options for now: promql-experimental-functions (enables promql experimental functions in query-frontend)").Default("").StringsVar(&cfg.EnableFeatures)
cmd.Flag("query-range.max-query-length", "Limit the query time range (end - start time) in the query-frontend, 0 disables it.").
Default("0").DurationVar((*time.Duration)(&cfg.QueryRangeConfig.Limits.MaxQueryLength))
@ -301,6 +305,15 @@ func runQueryFrontend(
}
}
if len(cfg.EnableFeatures) > 0 {
for _, feature := range cfg.EnableFeatures {
if feature == promqlExperimentalFunctions {
parser.EnableExperimentalFunctions = true
level.Info(logger).Log("msg", "Experimental PromQL functions enabled.", "option", promqlExperimentalFunctions)
}
}
}
tripperWare, err := queryfrontend.NewTripperware(cfg.Config, reg, logger)
if err != nil {
return errors.Wrap(err, "setup tripperwares")
@ -384,8 +397,55 @@ func runQueryFrontend(
})
}
// Periodically check downstream URL to ensure it is reachable.
{
ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {
var firstRun = true
for {
if !firstRun {
select {
case <-ctx.Done():
return nil
case <-time.After(10 * time.Second):
}
}
timeoutCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
readinessUrl := cfg.DownstreamURL + "/-/ready"
req, err := http.NewRequestWithContext(timeoutCtx, http.MethodGet, readinessUrl, nil)
if err != nil {
return errors.Wrap(err, "creating request to downstream URL")
}
resp, err := roundTripper.RoundTrip(req)
if err != nil {
level.Warn(logger).Log("msg", "failed to reach downstream URL", "err", err, "readiness_url", readinessUrl)
statusProber.NotReady(err)
firstRun = false
continue
}
runutil.ExhaustCloseWithLogOnErr(logger, resp.Body, "downstream health check response body")
if resp.StatusCode/100 == 4 || resp.StatusCode/100 == 5 {
level.Warn(logger).Log("msg", "downstream URL returned an error", "status_code", resp.StatusCode, "readiness_url", readinessUrl)
statusProber.NotReady(errors.Errorf("downstream URL %s returned an error: %d", readinessUrl, resp.StatusCode))
firstRun = false
continue
}
statusProber.Ready()
}
}, func(err error) {
cancel()
})
}
level.Info(logger).Log("msg", "starting query frontend")
statusProber.Ready()
return nil
}

View File

@ -210,10 +210,9 @@ func runReceive(
}
}
// TODO(brancz): remove after a couple of versions
// Migrate non-multi-tsdb capable storage to multi-tsdb disk layout.
if err := migrateLegacyStorage(logger, conf.dataDir, conf.defaultTenantID); err != nil {
return errors.Wrapf(err, "migrate legacy storage in %v to default tenant %v", conf.dataDir, conf.defaultTenantID)
// Create TSDB for the default tenant.
if err := createDefautTenantTSDB(logger, conf.dataDir, conf.defaultTenantID); err != nil {
return errors.Wrapf(err, "create default tenant tsdb in %v", conf.dataDir)
}
relabelContentYaml, err := conf.relabelConfigPath.Content()
@ -243,6 +242,7 @@ func runReceive(
conf.tenantLabelName,
bkt,
conf.allowOutOfOrderUpload,
conf.skipCorruptedBlocks,
hashFunc,
multiTSDBOptions...,
)
@ -354,10 +354,14 @@ func runReceive(
return errors.Wrap(err, "setup gRPC server")
}
if conf.lazyRetrievalMaxBufferedResponses <= 0 {
return errors.New("--receive.lazy-retrieval-max-buffered-responses must be > 0")
}
options := []store.ProxyStoreOption{
store.WithProxyStoreDebugLogging(debugLogging),
store.WithMatcherCache(cache),
store.WithoutDedup(),
store.WithLazyRetrievalMaxBufferedResponsesForProxy(conf.lazyRetrievalMaxBufferedResponses),
}
proxy := store.NewProxyStore(
@ -593,7 +597,7 @@ func setupHashring(g *run.Group,
webHandler.Hashring(receive.SingleNodeHashring(conf.endpoint))
level.Info(logger).Log("msg", "Empty hashring config. Set up single node hashring.")
} else {
h, err := receive.NewMultiHashring(algorithm, conf.replicationFactor, c)
h, err := receive.NewMultiHashring(algorithm, conf.replicationFactor, c, reg)
if err != nil {
return errors.Wrap(err, "unable to create new hashring from config")
}
@ -795,38 +799,25 @@ func startTSDBAndUpload(g *run.Group,
return nil
}
func migrateLegacyStorage(logger log.Logger, dataDir, defaultTenantID string) error {
func createDefautTenantTSDB(logger log.Logger, dataDir, defaultTenantID string) error {
defaultTenantDataDir := path.Join(dataDir, defaultTenantID)
if _, err := os.Stat(defaultTenantDataDir); !os.IsNotExist(err) {
level.Info(logger).Log("msg", "default tenant data dir already present, not attempting to migrate storage")
level.Info(logger).Log("msg", "default tenant data dir already present, will not create")
return nil
}
if _, err := os.Stat(dataDir); os.IsNotExist(err) {
level.Info(logger).Log("msg", "no existing storage found, no data migration attempted")
level.Info(logger).Log("msg", "no existing storage found, not creating default tenant data dir")
return nil
}
level.Info(logger).Log("msg", "found legacy storage, migrating to multi-tsdb layout with default tenant", "defaultTenantID", defaultTenantID)
files, err := os.ReadDir(dataDir)
if err != nil {
return errors.Wrapf(err, "read legacy data dir: %v", dataDir)
}
level.Info(logger).Log("msg", "default tenant data dir not found, creating", "defaultTenantID", defaultTenantID)
if err := os.MkdirAll(defaultTenantDataDir, 0750); err != nil {
return errors.Wrapf(err, "create default tenant data dir: %v", defaultTenantDataDir)
}
for _, f := range files {
from := path.Join(dataDir, f.Name())
to := path.Join(defaultTenantDataDir, f.Name())
if err := os.Rename(from, to); err != nil {
return errors.Wrapf(err, "migrate file from %v to %v", from, to)
}
}
return nil
}
@ -895,6 +886,7 @@ type receiveConfig struct {
ignoreBlockSize bool
allowOutOfOrderUpload bool
skipCorruptedBlocks bool
reqLogConfig *extflag.PathOrContent
relabelConfigPath *extflag.PathOrContent
@ -907,6 +899,8 @@ type receiveConfig struct {
matcherCacheSize int
lazyRetrievalMaxBufferedResponses int
featureList *[]string
headExpandedPostingsCacheSize uint64
@ -1010,7 +1004,7 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
rc.tsdbOutOfOrderTimeWindow = extkingpin.ModelDuration(cmd.Flag("tsdb.out-of-order.time-window",
"[EXPERIMENTAL] Configures the allowed time window for ingestion of out-of-order samples. Disabled (0s) by default"+
"Please note if you enable this option and you use compactor, make sure you have the --enable-vertical-compaction flag enabled, otherwise you might risk compactor halt.",
"Please note if you enable this option and you use compactor, make sure you have the --compact.enable-vertical-compaction flag enabled, otherwise you might risk compactor halt.",
).Default("0s"))
cmd.Flag("tsdb.out-of-order.cap-max",
@ -1045,7 +1039,7 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("tsdb.enable-native-histograms",
"[EXPERIMENTAL] Enables the ingestion of native histograms.").
Default("false").Hidden().BoolVar(&rc.tsdbEnableNativeHistograms)
Default("false").BoolVar(&rc.tsdbEnableNativeHistograms)
cmd.Flag("writer.intern",
"[EXPERIMENTAL] Enables string interning in receive writer, for more optimized memory usage.").
@ -1062,6 +1056,12 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
"about order.").
Default("false").Hidden().BoolVar(&rc.allowOutOfOrderUpload)
cmd.Flag("shipper.skip-corrupted-blocks",
"If true, shipper will skip corrupted blocks in the given iteration and retry later. This means that some newer blocks might be uploaded sooner than older blocks."+
"This can trigger compaction without those blocks and as a result will create an overlap situation. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
"about order.").
Default("false").Hidden().BoolVar(&rc.skipCorruptedBlocks)
cmd.Flag("matcher-cache-size", "Max number of cached matchers items. Using 0 disables caching.").Default("0").IntVar(&rc.matcherCacheSize)
rc.reqLogConfig = extkingpin.RegisterRequestLoggingFlags(cmd)
@ -1074,6 +1074,9 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("receive.otlp-promote-resource-attributes", "(Repeatable) Resource attributes to include in OTLP metrics ingested by Receive.").Default("").StringsVar(&rc.otlpResourceAttributes)
rc.featureList = cmd.Flag("enable-feature", "Comma separated experimental feature names to enable. The current list of features is "+metricNamesFilter+".").Default("").Strings()
cmd.Flag("receive.lazy-retrieval-max-buffered-responses", "The lazy retrieval strategy can buffer up to this number of responses. This is to limit the memory usage. This flag takes effect only when the lazy retrieval strategy is enabled.").
Default("20").IntVar(&rc.lazyRetrievalMaxBufferedResponses)
}
// determineMode returns the ReceiverMode that this receiver is configured to run in.

View File

@ -14,6 +14,7 @@ import (
"os"
"path/filepath"
"strings"
"sync"
texttemplate "text/template"
"time"
@ -35,6 +36,7 @@ import (
"github.com/prometheus/prometheus/promql"
"github.com/prometheus/prometheus/promql/parser"
"github.com/prometheus/prometheus/rules"
"github.com/prometheus/prometheus/scrape"
"github.com/prometheus/prometheus/storage"
"github.com/prometheus/prometheus/storage/remote"
"github.com/prometheus/prometheus/tsdb"
@ -110,7 +112,9 @@ type ruleConfig struct {
storeRateLimits store.SeriesSelectLimits
ruleConcurrentEval int64
extendedFunctionsEnabled bool
extendedFunctionsEnabled bool
EnableFeatures []string
tsdbEnableNativeHistograms bool
}
type Expression struct {
@ -165,6 +169,11 @@ func registerRule(app *extkingpin.App) {
PlaceHolder("<endpoint>").StringsVar(&conf.grpcQueryEndpoints)
cmd.Flag("query.enable-x-functions", "Whether to enable extended rate functions (xrate, xincrease and xdelta). Only has effect when used with Thanos engine.").Default("false").BoolVar(&conf.extendedFunctionsEnabled)
cmd.Flag("enable-feature", "Comma separated feature names to enable. Valid options for now: promql-experimental-functions (enables promql experimental functions for ruler)").Default("").StringsVar(&conf.EnableFeatures)
cmd.Flag("tsdb.enable-native-histograms",
"[EXPERIMENTAL] Enables the ingestion of native histograms.").
Default("false").BoolVar(&conf.tsdbEnableNativeHistograms)
conf.rwConfig = extflag.RegisterPathOrContent(cmd, "remote-write.config", "YAML config for the remote-write configurations, that specify servers where samples should be sent to (see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write). This automatically enables stateless mode for ruler and no series will be stored in the ruler's TSDB. If an empty config (or file) is provided, the flag is ignored and ruler is run with its own TSDB.", extflag.WithEnvSubstitution())
@ -185,11 +194,12 @@ func registerRule(app *extkingpin.App) {
}
tsdbOpts := &tsdb.Options{
MinBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
MaxBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
RetentionDuration: int64(time.Duration(*tsdbRetention) / time.Millisecond),
NoLockfile: *noLockFile,
WALCompression: wlog.ParseCompressionType(*walCompression, string(wlog.CompressionSnappy)),
MinBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
MaxBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
RetentionDuration: int64(time.Duration(*tsdbRetention) / time.Millisecond),
NoLockfile: *noLockFile,
WALCompression: wlog.ParseCompressionType(*walCompression, string(wlog.CompressionSnappy)),
EnableNativeHistograms: conf.tsdbEnableNativeHistograms,
}
agentOpts := &agent.Options{
@ -469,7 +479,7 @@ func runRule(
// flushDeadline is set to 1m, but it is for metadata watcher only so not used here.
remoteStore := remote.NewStorage(slogger, reg, func() (int64, error) {
return 0, nil
}, conf.dataDir, 1*time.Minute, nil, false)
}, conf.dataDir, 1*time.Minute, &readyScrapeManager{})
if err := remoteStore.ApplyConfig(&config.Config{
GlobalConfig: config.GlobalConfig{
ExternalLabels: labelsTSDBToProm(conf.lset),
@ -581,6 +591,15 @@ func runRule(
}
}
if len(conf.EnableFeatures) > 0 {
for _, feature := range conf.EnableFeatures {
if feature == promqlExperimentalFunctions {
parser.EnableExperimentalFunctions = true
level.Info(logger).Log("msg", "Experimental PromQL functions enabled.", "option", promqlExperimentalFunctions)
}
}
}
// Run rule evaluation and alert notifications.
notifyFunc := func(ctx context.Context, expr string, alerts ...*rules.Alert) {
res := make([]*notifier.Alert, 0, len(alerts))
@ -839,7 +858,18 @@ func runRule(
}
}()
s := shipper.New(logger, reg, conf.dataDir, bkt, func() labels.Labels { return conf.lset }, metadata.RulerSource, nil, conf.shipper.allowOutOfOrderUpload, metadata.HashFunc(conf.shipper.hashFunc), conf.shipper.metaFileName)
s := shipper.New(
bkt,
conf.dataDir,
shipper.WithLogger(logger),
shipper.WithRegisterer(reg),
shipper.WithSource(metadata.RulerSource),
shipper.WithHashFunc(metadata.HashFunc(conf.shipper.hashFunc)),
shipper.WithMetaFileName(conf.shipper.metaFileName),
shipper.WithLabels(func() labels.Labels { return conf.lset }),
shipper.WithAllowOutOfOrderUploads(conf.shipper.allowOutOfOrderUpload),
shipper.WithSkipCorruptedBlocks(conf.shipper.skipCorruptedBlocks),
)
ctx, cancel := context.WithCancel(context.Background())
@ -1084,3 +1114,32 @@ func filterOutPromQLWarnings(warns []string, logger log.Logger, query string) []
}
return storeWarnings
}
// ReadyScrapeManager allows a scrape manager to be retrieved. Even if it's set at a later point in time.
type readyScrapeManager struct {
mtx sync.RWMutex
m *scrape.Manager
}
// Set the scrape manager.
func (rm *readyScrapeManager) Set(m *scrape.Manager) {
rm.mtx.Lock()
defer rm.mtx.Unlock()
rm.m = m
}
// Get the scrape manager. If is not ready, return an error.
func (rm *readyScrapeManager) Get() (*scrape.Manager, error) {
rm.mtx.RLock()
defer rm.mtx.RUnlock()
if rm.m != nil {
return rm.m, nil
}
return nil, ErrNotReady
}
// ErrNotReady is returned if the underlying scrape manager is not ready yet.
var ErrNotReady = errors.New("scrape manager not ready")

View File

@ -414,9 +414,19 @@ func runSidecar(
return errors.Wrapf(err, "aborting as no external labels found after waiting %s", promReadyTimeout)
}
uploadCompactedFunc := func() bool { return conf.shipper.uploadCompacted }
s := shipper.New(logger, reg, conf.tsdb.path, bkt, m.Labels, metadata.SidecarSource,
uploadCompactedFunc, conf.shipper.allowOutOfOrderUpload, metadata.HashFunc(conf.shipper.hashFunc), conf.shipper.metaFileName)
s := shipper.New(
bkt,
conf.tsdb.path,
shipper.WithLogger(logger),
shipper.WithRegisterer(reg),
shipper.WithSource(metadata.SidecarSource),
shipper.WithHashFunc(metadata.HashFunc(conf.shipper.hashFunc)),
shipper.WithMetaFileName(conf.shipper.metaFileName),
shipper.WithLabels(m.Labels),
shipper.WithUploadCompacted(conf.shipper.uploadCompacted),
shipper.WithAllowOutOfOrderUploads(conf.shipper.allowOutOfOrderUpload),
shipper.WithSkipCorruptedBlocks(conf.shipper.skipCorruptedBlocks),
)
return runutil.Repeat(30*time.Second, ctx.Done(), func() error {
if uploaded, err := s.Sync(ctx); err != nil {

View File

@ -105,7 +105,8 @@ type storeConfig struct {
indexHeaderLazyDownloadStrategy string
matcherCacheSize int
matcherCacheSize int
disableAdminOperations bool
}
func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
@ -229,6 +230,8 @@ func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("matcher-cache-size", "Max number of cached matchers items. Using 0 disables caching.").Default("0").IntVar(&sc.matcherCacheSize)
cmd.Flag("disable-admin-operations", "Disable UI/API admin operations like marking blocks for deletion and no compaction.").Default("false").BoolVar(&sc.disableAdminOperations)
sc.reqLogConfig = extkingpin.RegisterRequestLoggingFlags(cmd)
}

View File

@ -23,7 +23,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/run"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/olekukonko/tablewriter"
"github.com/opentracing/opentracing-go"
"github.com/pkg/errors"
@ -110,8 +111,11 @@ type bucketVerifyConfig struct {
}
type bucketLsConfig struct {
output string
excludeDelete bool
output string
excludeDelete bool
selectorRelabelConf extflag.PathOrContent
filterConf *store.FilterConfig
timeout time.Duration
}
type bucketWebConfig struct {
@ -181,10 +185,18 @@ func (tbc *bucketVerifyConfig) registerBucketVerifyFlag(cmd extkingpin.FlagClaus
}
func (tbc *bucketLsConfig) registerBucketLsFlag(cmd extkingpin.FlagClause) *bucketLsConfig {
tbc.selectorRelabelConf = *extkingpin.RegisterSelectorRelabelFlags(cmd)
tbc.filterConf = &store.FilterConfig{}
cmd.Flag("output", "Optional format in which to print each block's information. Options are 'json', 'wide' or a custom template.").
Short('o').Default("").StringVar(&tbc.output)
cmd.Flag("exclude-delete", "Exclude blocks marked for deletion.").
Default("false").BoolVar(&tbc.excludeDelete)
cmd.Flag("min-time", "Start of time range limit to list blocks. Thanos Tools will list blocks, which were created later than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y.").
Default("0000-01-01T00:00:00Z").SetValue(&tbc.filterConf.MinTime)
cmd.Flag("max-time", "End of time range limit to list. Thanos Tools will list only blocks, which were created earlier than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y.").
Default("9999-12-31T23:59:59Z").SetValue(&tbc.filterConf.MaxTime)
cmd.Flag("timeout", "Timeout to download metadata from remote storage").Default("5m").DurationVar(&tbc.timeout)
return tbc
}
@ -418,12 +430,30 @@ func registerBucketLs(app extkingpin.AppClause, objStoreConfig *extflag.PathOrCo
}
insBkt := objstoretracing.WrapWithTraces(objstore.WrapWithMetrics(bkt, extprom.WrapRegistererWithPrefix("thanos_", reg), bkt.Name()))
var filters []block.MetadataFilter
if tbc.timeout < time.Minute {
level.Warn(logger).Log("msg", "Timeout less than 1m could lead to frequent failures")
}
relabelContentYaml, err := tbc.selectorRelabelConf.Content()
if err != nil {
return errors.Wrap(err, "get content of relabel configuration")
}
relabelConfig, err := block.ParseRelabelConfig(relabelContentYaml, block.SelectorSupportedRelabelActions)
if err != nil {
return err
}
filters := []block.MetadataFilter{
block.NewLabelShardedMetaFilter(relabelConfig),
block.NewTimePartitionMetaFilter(tbc.filterConf.MinTime, tbc.filterConf.MaxTime),
}
if tbc.excludeDelete {
ignoreDeletionMarkFilter := block.NewIgnoreDeletionMarkFilter(logger, insBkt, 0, block.FetcherConcurrency)
filters = append(filters, ignoreDeletionMarkFilter)
}
baseBlockIDsFetcher := block.NewConcurrentLister(logger, insBkt)
fetcher, err := block.NewMetaFetcher(logger, block.FetcherConcurrency, insBkt, baseBlockIDsFetcher, "", extprom.WrapRegistererWithPrefix(extpromPrefix, reg), filters)
if err != nil {
@ -435,7 +465,7 @@ func registerBucketLs(app extkingpin.AppClause, objStoreConfig *extflag.PathOrCo
defer runutil.CloseWithLogOnErr(logger, insBkt, "bucket client")
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
ctx, cancel := context.WithTimeout(context.Background(), tbc.timeout)
defer cancel()
var (
@ -505,7 +535,7 @@ func registerBucketInspect(app extkingpin.AppClause, objStoreConfig *extflag.Pat
tbc := &bucketInspectConfig{}
tbc.registerBucketInspectFlag(cmd)
output := cmd.Flag("output", "Output format for result. Currently supports table, cvs, tsv.").Default("table").Enum(outputTypes...)
output := cmd.Flag("output", "Output format for result. Currently supports table, csv, tsv.").Default("table").Enum(outputTypes...)
cmd.Setup(func(g *run.Group, logger log.Logger, reg *prometheus.Registry, _ opentracing.Tracer, _ <-chan struct{}, _ bool) error {
@ -1471,8 +1501,15 @@ func registerBucketUploadBlocks(app extkingpin.AppClause, objStoreConfig *extfla
bkt = objstoretracing.WrapWithTraces(objstore.WrapWithMetrics(bkt, extprom.WrapRegistererWithPrefix("thanos_", reg), bkt.Name()))
s := shipper.New(logger, reg, tbc.path, bkt, func() labels.Labels { return lset }, metadata.BucketUploadSource,
nil, false, metadata.HashFunc(""), shipper.DefaultMetaFilename)
s := shipper.New(
bkt,
tbc.path,
shipper.WithLogger(logger),
shipper.WithRegisterer(reg),
shipper.WithSource(metadata.BucketUploadSource),
shipper.WithMetaFileName(shipper.DefaultMetaFilename),
shipper.WithLabels(func() labels.Labels { return lset }),
)
ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {
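The `registerBucketLs` hunk above now assembles a chain of metadata filters (relabel-based label sharding, a min/max time partition, and an optional deletion-mark filter) and runs the listing under a configurable timeout. Below is a simplified, self-contained model of that filter chain; the `blockMeta` and `metadataFilter` types are stand-ins for the real `block` package, under the assumption that each filter prunes a shared map of block metadata.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// blockMeta is a toy stand-in for a block's metadata.
type blockMeta struct {
	minTime, maxTime time.Time
	deleted          bool
}

// metadataFilter mirrors the idea behind block.MetadataFilter: each filter
// removes blocks that should not be listed.
type metadataFilter func(ctx context.Context, metas map[string]blockMeta) error

func timePartitionFilter(min, max time.Time) metadataFilter {
	return func(_ context.Context, metas map[string]blockMeta) error {
		for id, m := range metas {
			if m.maxTime.Before(min) || m.minTime.After(max) {
				delete(metas, id)
			}
		}
		return nil
	}
}

func ignoreDeletionMarkFilter() metadataFilter {
	return func(_ context.Context, metas map[string]blockMeta) error {
		for id, m := range metas {
			if m.deleted {
				delete(metas, id)
			}
		}
		return nil
	}
}

func main() {
	// The ls command now wraps the whole listing in a configurable timeout
	// (default 5m, with a warning below 1m).
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	now := time.Now()
	metas := map[string]blockMeta{
		"old":     {minTime: now.Add(-48 * time.Hour), maxTime: now.Add(-24 * time.Hour)},
		"deleted": {minTime: now.Add(-2 * time.Hour), maxTime: now, deleted: true},
		"recent":  {minTime: now.Add(-2 * time.Hour), maxTime: now},
	}

	filters := []metadataFilter{
		timePartitionFilter(now.Add(-12*time.Hour), now),
		ignoreDeletionMarkFilter(),
	}
	for _, f := range filters {
		if err := f(ctx, metas); err != nil {
			panic(err)
		}
	}
	fmt.Println("blocks left after filtering:", len(metas))
}
```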


@ -5,6 +5,7 @@ package main
import (
"os"
"path"
"testing"
"github.com/go-kit/log"
@ -47,9 +48,12 @@ func Test_CheckRules_Glob(t *testing.T) {
testutil.NotOk(t, checkRulesFiles(logger, files), "expected err for file %s", files)
// Unreadable path
files = &[]string{"./testdata/rules-files/unreadable_valid.yaml"}
filename := (*files)[0]
testutil.Ok(t, os.Chmod(filename, 0000), "failed to change file permissions of %s to 0000", filename)
// Move the initial file to a temp dir and make it unreadable there, in case the process cannot chmod the file in the current dir.
filename := "./testdata/rules-files/unreadable_valid.yaml"
bytesRead, err := os.ReadFile(filename)
testutil.Ok(t, err)
filename = path.Join(t.TempDir(), "file.yaml")
testutil.Ok(t, os.WriteFile(filename, bytesRead, 0000))
files = &[]string{filename}
testutil.NotOk(t, checkRulesFiles(logger, files), "expected err for file %s", files)
testutil.Ok(t, os.Chmod(filename, 0777), "failed to change file permissions of %s to 0777", filename)
}
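The rewritten test relies on a 0000-mode file being unreadable, which holds for regular users but not for root. A hypothetical guard along these lines (my suggestion, not part of the Thanos change) is sometimes added to keep such tests meaningful when run as root:

```go
package rulecheck_test // hypothetical package, for illustration only

import (
	"os"
	"path"
	"testing"
)

// TestUnreadableFile sketches the guard: mode 0000 does not block root,
// so the "unreadable file" assertion only holds for non-root users.
func TestUnreadableFile(t *testing.T) {
	if os.Geteuid() == 0 {
		t.Skip("running as root; a 0000-mode file would still be readable")
	}
	filename := path.Join(t.TempDir(), "file.yaml")
	if err := os.WriteFile(filename, []byte("groups: []"), 0000); err != nil {
		t.Fatal(err)
	}
	if _, err := os.ReadFile(filename); err == nil {
		t.Fatal("expected reading a 0000-mode file to fail")
	}
}
```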


@ -8,4 +8,4 @@ Welcome 👋🏼
This space was created for the Thanos community to share learnings, insights, best practices and cool things with the world. If you are interested in contributing relevant content to the Thanos blog, feel free to open a Pull Request (PR) against the [Thanos repo's blog directory](http://github.com/thanos-io/thanos). See ya there!
PS: For Prometheus-specific content, consider contributing to the [Prometheus blog space](https://prometheus.io/blog/) by creating a PR to the [Prometheus docs repo](https://github.com/prometheus/docs/tree/main/content/blog).
PS: For Prometheus-specific content, consider contributing to the [Prometheus blog space](https://prometheus.io/blog/) by creating a PR to the [Prometheus docs repo](https://github.com/prometheus/docs/tree/main/blog-posts).


@ -280,10 +280,75 @@ usage: thanos compact [<flags>]
Continuously compacts blocks in an object store bucket.
Flags:
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
HTTP Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--data-dir="./data" Data directory in which to cache blocks and
process compactions.
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--consistency-delay=30m Minimum age of fresh (non-compacted)
blocks before they are being processed.
Malformed blocks older than the maximum of
consistency-delay and 48h0m0s will be removed.
--retention.resolution-raw=0d
How long to retain raw samples in bucket.
Setting this to 0d will retain samples of this
resolution forever
--retention.resolution-5m=0d
How long to retain samples of resolution 1 (5
minutes) in bucket. Setting this to 0d will
retain samples of this resolution forever
--retention.resolution-1h=0d
How long to retain samples of resolution 2 (1
hour) in bucket. Setting this to 0d will retain
samples of this resolution forever
-w, --[no-]wait Do not exit after all compactions have been
processed and wait for new work.
--wait-interval=5m Wait interval between consecutive compaction
runs and bucket refreshes. Only works when
--wait flag specified.
--[no-]downsampling.disable
Disables downsampling. This is not recommended
as querying long time ranges without
non-downsampled data is not efficient and useful
e.g it is not possible to render all samples for
a human eye anyway
--block-discovery-strategy="concurrent"
One of concurrent, recursive. When set to
concurrent, stores will concurrently issue
@ -293,13 +358,13 @@ Flags:
recursively traversing into each directory.
This avoids N+1 calls at the expense of having
slower bucket iterations.
--block-meta-fetch-concurrency=32
Number of goroutines to use when fetching block
metadata from object storage.
--block-files-concurrency=1
Number of goroutines to use when
fetching/uploading block files from object
storage.
--block-meta-fetch-concurrency=32
Number of goroutines to use when fetching block
metadata from object storage.
--block-viewer.global.sync-block-interval=1m
Repeat interval for syncing the blocks between
local and remote view for /global Block Viewer
@ -308,32 +373,37 @@ Flags:
Maximum time for syncing the blocks between
local and remote view for /global Block Viewer
UI.
--bucket-web-label=BUCKET-WEB-LABEL
External block label to use as group title in
the bucket web UI
--compact.blocks-fetch-concurrency=1
Number of goroutines to use when download block
during compaction.
--compact.cleanup-interval=5m
How often we should clean up partially uploaded
blocks and blocks with deletion mark in the
background when --wait has been enabled. Setting
it to "0s" disables it - the cleaning will only
happen at the end of an iteration.
--compact.concurrency=1 Number of goroutines to use when compacting
groups.
--compact.progress-interval=5m
Frequency of calculating the compaction progress
in the background when --wait has been enabled.
Setting it to "0s" disables it. Now compaction,
downsampling and retention progress are
supported.
--consistency-delay=30m Minimum age of fresh (non-compacted)
blocks before they are being processed.
Malformed blocks older than the maximum of
consistency-delay and 48h0m0s will be removed.
--data-dir="./data" Data directory in which to cache blocks and
process compactions.
--compact.concurrency=1 Number of goroutines to use when compacting
groups.
--compact.blocks-fetch-concurrency=1
Number of goroutines to use when download block
during compaction.
--downsample.concurrency=1
Number of goroutines to use when downsampling
blocks.
--delete-delay=48h Time before a block marked for deletion is
deleted from bucket. If delete-delay is non
zero, blocks will be marked for deletion and
compactor component will delete blocks marked
for deletion from the bucket. If delete-delay
is 0, blocks will be deleted straight away.
Note that deleting blocks immediately can cause
query failures, if store gateway still has the
block loaded, or compactor is ignoring the
deletion because it's compacting the block at
the same time.
--deduplication.func= Experimental. Deduplication algorithm for
merging overlapping blocks. Possible values are:
"", "penalty". If no value is specified,
@ -360,48 +430,19 @@ Flags:
need a different deduplication algorithm (e.g
one that works well with Prometheus replicas),
please set it via --deduplication.func.
--delete-delay=48h Time before a block marked for deletion is
deleted from bucket. If delete-delay is non
zero, blocks will be marked for deletion and
compactor component will delete blocks marked
for deletion from the bucket. If delete-delay
is 0, blocks will be deleted straight away.
Note that deleting blocks immediately can cause
query failures, if store gateway still has the
block loaded, or compactor is ignoring the
deletion because it's compacting the block at
the same time.
--disable-admin-operations
Disable UI/API admin operations like marking
blocks for deletion and no compaction.
--downsample.concurrency=1
Number of goroutines to use when downsampling
blocks.
--downsampling.disable Disables downsampling. This is not recommended
as querying long time ranges without
non-downsampled data is not efficient and useful
e.g it is not possible to render all samples for
a human eye anyway
--enable-auto-gomemlimit Enable go runtime to automatically limit memory
consumption.
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
HTTP Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to compact.
Thanos Compactor will compact only blocks, which
happened later than this value. Option can be a
constant time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
--max-time=9999-12-31T23:59:59Z
End of time range limit to compact.
Thanos Compactor will compact only blocks,
@ -410,35 +451,14 @@ Flags:
duration relative to current time, such as -1d
or 2h45m. Valid duration units are ms, s, m, h,
d, w, y.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to compact.
Thanos Compactor will compact only blocks, which
happened later than this value. Option can be a
constant time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--retention.resolution-1h=0d
How long to retain samples of resolution 2 (1
hour) in bucket. Setting this to 0d will retain
samples of this resolution forever
--retention.resolution-5m=0d
How long to retain samples of resolution 1 (5
minutes) in bucket. Setting this to 0d will
retain samples of this resolution forever
--retention.resolution-raw=0d
How long to retain raw samples in bucket.
Setting this to 0d will retain samples of this
resolution forever
--[no-]web.disable Disable Block Viewer UI.
--selector.relabel-config-file=<file-path>
Path to YAML file with relabeling
configuration that allows selecting blocks
to act on based on their external labels.
It follows thanos sharding relabel-config
syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--selector.relabel-config=<content>
Alternative to 'selector.relabel-config-file'
flag (mutually exclusive). Content of YAML
@ -447,32 +467,10 @@ Flags:
external labels. It follows thanos sharding
relabel-config syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--selector.relabel-config-file=<file-path>
Path to YAML file with relabeling
configuration that allows selecting blocks
to act on based on their external labels.
It follows thanos sharding relabel-config
syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--version Show application version.
-w, --wait Do not exit after all compactions have been
processed and wait for new work.
--wait-interval=5m Wait interval between consecutive compaction
runs and bucket refreshes. Only works when
--wait flag specified.
--web.disable Disable Block Viewer UI.
--web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path. This
option is analogous to --web.route-prefix of
Prometheus.
--web.external-prefix="" Static prefix for all HTML links and redirect
URLs in the bucket web UI interface.
Actual endpoints are still served on / or the
@ -492,9 +490,14 @@ Flags:
stripped prefix value in X-Forwarded-Prefix
header. This allows thanos UI to be served on a
sub-path.
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path. This
option is analogous to --web.route-prefix of
Prometheus.
--[no-]web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--bucket-web-label=BUCKET-WEB-LABEL
External block label to use as group title in
the bucket web UI
--[no-]disable-admin-operations
Disable UI/API admin operations like marking
blocks for deletion and no compaction.
```


@ -200,198 +200,196 @@ usage: thanos query-frontend [<flags>]
Query frontend command implements a service deployed in front of queriers to
improve query parallelization and caching.
Flags:
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--cache-compression-type=""
Use compression in results cache.
Supported values are: 'snappy' and '' (disable
compression).
--enable-auto-gomemlimit Enable go runtime to automatically limit memory
consumption.
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
HTTP Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--labels.default-time-range=24h
The default metadata time range duration for
retrieving labels through Labels and Series API
when the range parameters are not specified.
--labels.max-query-parallelism=14
Maximum number of labels requests will be
scheduled in parallel by the Frontend.
--labels.max-retries-per-request=5
Maximum number of retries for a single
label/series API request; beyond this,
the downstream error is returned.
--labels.partial-response Enable partial response for labels requests
if no partial_response param is specified.
--no-labels.partial-response for disabling.
--labels.response-cache-config=<content>
Alternative to
'labels.response-cache-config-file' flag
(mutually exclusive). Content of YAML file that
contains response cache configuration.
--labels.response-cache-config-file=<file-path>
Path to YAML file that contains response cache
configuration.
--labels.response-cache-max-freshness=1m
Most recent allowed cacheable result for
labels requests, to prevent caching very recent
results that might still be in flux.
--labels.split-interval=24h
Split labels requests by an interval and
execute in parallel, it should be greater
than 0 when labels.response-cache-config is
configured.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--query-frontend.compress-responses
Compress HTTP responses.
--query-frontend.downstream-tripper-config=<content>
Alternative to
'query-frontend.downstream-tripper-config-file'
flag (mutually exclusive). Content of YAML file
that contains downstream tripper configuration.
If your downstream URL is localhost or
127.0.0.1 then it is highly recommended to
increase max_idle_conns_per_host to at least
100.
--query-frontend.downstream-tripper-config-file=<file-path>
Path to YAML file that contains downstream
tripper configuration. If your downstream URL
is localhost or 127.0.0.1 then it is highly
recommended to increase max_idle_conns_per_host
to at least 100.
--query-frontend.downstream-url="http://localhost:9090"
URL of downstream Prometheus Query compatible
API.
--query-frontend.enable-x-functions
Enable experimental x-
functions in query-frontend.
--no-query-frontend.enable-x-functions for
disabling.
--query-frontend.force-query-stats
Enables query statistics for all queries and
will export statistics as logs and service
headers.
--query-frontend.forward-header=<http-header-name> ...
List of headers forwarded by the query-frontend
to downstream queriers, default is empty
--query-frontend.log-queries-longer-than=0
Log queries that are slower than the specified
duration. Set to 0 to disable. Set to < 0 to
enable on all queries.
--query-frontend.org-id-header=<http-header-name> ...
Deprecation Warning - This flag
will be soon deprecated in favor of
query-frontend.tenant-header and both flags
cannot be used at the same time. Request header
names used to identify the source of slow
queries (repeated flag). The values of the
header will be added to the org id field in
the slow query log. If multiple headers match
the request, the first matching arg specified
will take precedence. If no headers match
'anonymous' will be used.
--query-frontend.slow-query-logs-user-header=<http-header-name>
Set the value of the field remote_user in the
slow query logs to the value of the given HTTP
header. Falls back to reading the user from the
basic auth header.
--query-frontend.vertical-shards=QUERY-FRONTEND.VERTICAL-SHARDS
Number of shards to use when
distributing shardable PromQL queries.
For more details, you can refer to
the Vertical query sharding proposal:
https://thanos.io/tip/proposals-accepted/202205-vertical-query-sharding.md
--query-range.align-range-with-step
Mutate incoming queries to align their
start and end with their step for better
cache-ability. Note: Grafana dashboards do that
by default.
--query-range.horizontal-shards=0
Split queries in this many requests
when query duration is below
query-range.max-split-interval.
--query-range.max-query-length=0
Limit the query time range (end - start time)
in the query-frontend, 0 disables it.
--query-range.max-query-parallelism=14
Maximum number of query range requests will be
scheduled in parallel by the Frontend.
--query-range.max-retries-per-request=5
Maximum number of retries for a single query
range request; beyond this, the downstream
error is returned.
--query-range.max-split-interval=0
Split query range below this interval in
query-range.horizontal-shards. Queries with a
range longer than this value will be split in
multiple requests of this length.
--query-range.min-split-interval=0
Split query range requests above this
interval in query-range.horizontal-shards
requests of equal range. Using
this parameter is not allowed with
query-range.split-interval. One should also set
query-range.split-min-horizontal-shards to a
value greater than 1 to enable splitting.
--query-range.partial-response
Enable partial response for query range
requests if no partial_response param is
specified. --no-query-range.partial-response
for disabling.
--query-range.request-downsampled
Make additional query for downsampled data in
case of empty or incomplete response to range
request.
--query-range.response-cache-config=<content>
Alternative to
'query-range.response-cache-config-file' flag
(mutually exclusive). Content of YAML file that
contains response cache configuration.
--query-range.response-cache-config-file=<file-path>
Path to YAML file that contains response cache
configuration.
--query-range.response-cache-max-freshness=1m
Most recent allowed cacheable result for query
range requests, to prevent caching very recent
results that might still be in flux.
--query-range.split-interval=24h
Split query range requests by an interval and
execute in parallel, it should be greater than
0 when query-range.response-cache-config is
configured.
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
-h, --[no-]help Show context-sensitive help (also try --help-long
and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--version Show application version.
--web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for HTTP
Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--[no-]web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to be
allowed by all.
--[no-]query-range.align-range-with-step
Mutate incoming queries to align their start and
end with their step for better cache-ability.
Note: Grafana dashboards do that by default.
--[no-]query-range.request-downsampled
Make additional query for downsampled data in
case of empty or incomplete response to range
request.
--query-range.split-interval=24h
Split query range requests by an interval and
execute in parallel, it should be greater than
0 when query-range.response-cache-config is
configured.
--query-range.min-split-interval=0
Split query range requests above this interval
in query-range.horizontal-shards requests of
equal range. Using this parameter is not allowed
with query-range.split-interval. One should also
set query-range.split-min-horizontal-shards to a
value greater than 1 to enable splitting.
--query-range.max-split-interval=0
Split query range below this interval in
query-range.horizontal-shards. Queries with a
range longer than this value will be split in
multiple requests of this length.
--query-range.horizontal-shards=0
Split queries in this many requests when query
duration is below query-range.max-split-interval.
--query-range.max-retries-per-request=5
Maximum number of retries for a single query
range request; beyond this, the downstream error
is returned.
--[no-]query-frontend.enable-x-functions
Enable experimental x-
functions in query-frontend.
--no-query-frontend.enable-x-functions for
disabling.
--enable-feature= ... Comma separated feature names to enable. Valid
options for now: promql-experimental-functions
(enables promql experimental functions in
query-frontend)
--query-range.max-query-length=0
Limit the query time range (end - start time) in
the query-frontend, 0 disables it.
--query-range.max-query-parallelism=14
Maximum number of query range requests will be
scheduled in parallel by the Frontend.
--query-range.response-cache-max-freshness=1m
Most recent allowed cacheable result for query
range requests, to prevent caching very recent
results that might still be in flux.
--[no-]query-range.partial-response
Enable partial response for query range requests
if no partial_response param is specified.
--no-query-range.partial-response for disabling.
--query-range.response-cache-config-file=<file-path>
Path to YAML file that contains response cache
configuration.
--query-range.response-cache-config=<content>
Alternative to
'query-range.response-cache-config-file' flag
(mutually exclusive). Content of YAML file that
contains response cache configuration.
--labels.split-interval=24h
Split labels requests by an interval and execute
in parallel, it should be greater than 0 when
labels.response-cache-config is configured.
--labels.max-retries-per-request=5
Maximum number of retries for a single
label/series API request; beyond this, the
downstream error is returned.
--labels.max-query-parallelism=14
Maximum number of labels requests will be
scheduled in parallel by the Frontend.
--labels.response-cache-max-freshness=1m
Most recent allowed cacheable result for labels
requests, to prevent caching very recent results
that might still be in flux.
--[no-]labels.partial-response
Enable partial response for labels requests
if no partial_response param is specified.
--no-labels.partial-response for disabling.
--labels.default-time-range=24h
The default metadata time range duration for
retrieving labels through Labels and Series API
when the range parameters are not specified.
--labels.response-cache-config-file=<file-path>
Path to YAML file that contains response cache
configuration.
--labels.response-cache-config=<content>
Alternative to
'labels.response-cache-config-file' flag
(mutually exclusive). Content of YAML file that
contains response cache configuration.
--cache-compression-type=""
Use compression in results cache. Supported
values are: 'snappy' and '' (disable compression).
--query-frontend.downstream-url="http://localhost:9090"
URL of downstream Prometheus Query compatible
API.
--query-frontend.downstream-tripper-config-file=<file-path>
Path to YAML file that contains downstream
tripper configuration. If your downstream URL
is localhost or 127.0.0.1 then it is highly
recommended to increase max_idle_conns_per_host
to at least 100.
--query-frontend.downstream-tripper-config=<content>
Alternative to
'query-frontend.downstream-tripper-config-file'
flag (mutually exclusive). Content of YAML file
that contains downstream tripper configuration.
If your downstream URL is localhost or 127.0.0.1
then it is highly recommended to increase
max_idle_conns_per_host to at least 100.
--[no-]query-frontend.compress-responses
Compress HTTP responses.
--query-frontend.log-queries-longer-than=0
Log queries that are slower than the specified
duration. Set to 0 to disable. Set to < 0 to
enable on all queries.
--[no-]query-frontend.force-query-stats
Enables query statistics for all queries and will
export statistics as logs and service headers.
--query-frontend.org-id-header=<http-header-name> ...
Deprecation Warning - This flag
will be soon deprecated in favor of
query-frontend.tenant-header and both flags
cannot be used at the same time. Request header
names used to identify the source of slow queries
(repeated flag). The values of the header will be
added to the org id field in the slow query log.
If multiple headers match the request, the first
matching arg specified will take precedence.
If no headers match 'anonymous' will be used.
--query-frontend.forward-header=<http-header-name> ...
List of headers forwarded by the query-frontend
to downstream queriers, default is empty
--query-frontend.vertical-shards=QUERY-FRONTEND.VERTICAL-SHARDS
Number of shards to use when
distributing shardable PromQL queries.
For more details, you can refer to
the Vertical query sharding proposal:
https://thanos.io/tip/proposals-accepted/202205-vertical-query-sharding.md
--query-frontend.slow-query-logs-user-header=<http-header-name>
Set the value of the field remote_user in the
slow query logs to the value of the given HTTP
header. Falls back to reading the user from the
basic auth header.
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
```


@ -281,7 +281,7 @@ Example file SD file in YAML:
### Tenant Metrics
Tenant information is captured in relevant Thanos exported metrics in the Querier, Query Frontend and Store. To make use of this functionality, requests to the Query/Query Frontend component should include the tenant-id in the appropriate HTTP request header as configured with `--query.tenant-header`. The tenant information is passed through components (including Query Frontend), down to the Thanos Store, enabling per-tenant metrics in these components as well. If no tenant header is set on requests to the query component, the default tenant as defined by `--query.tenant-default-id` will be used.
Tenant information is captured in relevant Thanos exported metrics in the Querier, Query Frontend and Store. To make use of this functionality, requests to the Query/Query Frontend component should include the tenant-id in the appropriate HTTP request header as configured with `--query.tenant-header`. The tenant information is passed through components (including Query Frontend), down to the Thanos Store, enabling per-tenant metrics in these components as well. If no tenant header is set on requests to the query component, the default tenant as defined by `--query.default-tenant-id` will be used.
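For instance, a client can attribute its queries to a tenant simply by setting the configured header on the request. A minimal sketch, assuming the default `THANOS-TENANT` header name and a Querier HTTP endpoint on `localhost:10902` (both assumptions, adjust to your setup):

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	// Build an instant-query request against the Querier HTTP API and tag it
	// with a tenant; the header name follows --query.tenant-header.
	q := url.Values{"query": []string{"up"}}
	req, err := http.NewRequest(http.MethodGet,
		"http://localhost:10902/api/v1/query?"+q.Encode(), nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("THANOS-TENANT", "team-a")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```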
### Tenant Enforcement
@ -299,96 +299,29 @@ usage: thanos query [<flags>]
Query node exposing PromQL enabled Query API with data retrieved from multiple
store nodes.
Flags:
--alert.query-url=ALERT.QUERY-URL
The external Thanos Query URL that would be set
in all alerts 'Source' field.
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--deduplication.func=penalty
Experimental. Deduplication algorithm for
merging overlapping series. Possible values
are: "penalty", "chain". If no value is
specified, penalty based deduplication
algorithm will be used. When set to chain, the
default compact deduplication merger is used,
which performs 1:1 deduplication for samples.
At least one replica label has to be set via
--query.replica-label flag.
--enable-auto-gomemlimit Enable go runtime to automatically limit memory
consumption.
--endpoint=<endpoint> ... (Deprecated): Addresses of statically
configured Thanos API servers (repeatable).
The scheme may be prefixed with 'dns+' or
'dnssrv+' to detect Thanos API servers through
respective DNS lookups.
--endpoint-group=<endpoint-group> ...
(Deprecated, Experimental): DNS name of
statically configured Thanos API server groups
(repeatable). Targets resolved from the DNS
name will be queried in a round-robin, instead
of a fanout manner. This flag should be used
when connecting a Thanos Query to HA groups of
Thanos components.
--endpoint-group-strict=<endpoint-group-strict> ...
(Deprecated, Experimental): DNS name of
statically configured Thanos API server groups
(repeatable) that are always used, even if the
health check fails.
--endpoint-strict=<endpoint-strict> ...
(Deprecated): Addresses of only statically
configured Thanos API servers that are always
used, even if the health check fails. Useful if
you have a caching layer on top.
--endpoint.sd-config=<content>
Alternative to 'endpoint.sd-config-file' flag
(mutually exclusive). Content of Config File
with endpoint definitions
--endpoint.sd-config-file=<file-path>
Path to Config File with endpoint definitions
--endpoint.sd-config-reload-interval=5m
Interval between endpoint config refreshes
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-client-server-name=""
Server name to verify the hostname on
the returned gRPC certificates. See
https://tools.ietf.org/html/rfc4366#section-3.1
--grpc-client-tls-ca="" TLS CA Certificates to use to verify gRPC
servers
--grpc-client-tls-cert="" TLS Certificates to use to identify this client
to the server
--grpc-client-tls-key="" TLS Key for the client's certificate
--grpc-client-tls-secure Use TLS when talking to the gRPC server
--grpc-client-tls-skip-verify
Disable TLS certificate verification i.e self
signed, signed by fake CA
--grpc-compression=none Compression algorithm to use for gRPC requests
to other clients. Must be one of: snappy, none
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
@ -396,179 +329,50 @@ Flags:
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--query.active-query-path=""
Directory to log currently active queries in
the queries.active file.
--query.auto-downsampling Enable automatic adjustment (step / 5) to what
source of data should be used in store gateways
if no max_source_resolution param is specified.
--query.conn-metric.label=external_labels... ...
Optional selection of query connection metric
labels to be collected from endpoint set
--query.default-evaluation-interval=1m
Set default evaluation interval for sub
queries.
--query.default-step=1s Set default step for range queries. Default
step is only used when step is not set in UI.
In such cases, Thanos UI will use default
step to calculate resolution (resolution
= max(rangeSeconds / 250, defaultStep)).
This will not work from Grafana, but Grafana
has __step variable which can be used.
--query.default-tenant-id="default-tenant"
Default tenant ID to use if tenant header is
not present
--query.enable-x-functions
Whether to enable extended rate functions
(xrate, xincrease and xdelta). Only has effect
when used with Thanos engine.
--query.enforce-tenancy Enforce tenancy on Query APIs. Responses
are returned only if the label value of the
configured tenant-label-name and the value of
the tenant header matches.
--query.lookback-delta=QUERY.LOOKBACK-DELTA
The maximum lookback duration for retrieving
metrics during expression evaluations.
PromQL always evaluates the query for the
certain timestamp (query range timestamps are
deduced by step). Since scrape intervals might
be different, PromQL looks back for given
amount of time to get latest sample. If it
exceeds the maximum lookback delta it assumes
series is stale and returns none (a gap).
This is why lookback delta should be set to at
least 2 times of the slowest scrape interval.
If unset it will use the promql default of 5m.
--query.max-concurrent=20 Maximum number of queries processed
concurrently by query node.
--query.max-concurrent-select=4
Maximum number of select requests made
concurrently per a query.
--query.metadata.default-time-range=0s
The default metadata time range duration for
retrieving labels through Labels and Series API
when the range parameters are not specified.
The zero value means range covers the time
since the beginning.
--query.mode=local PromQL query mode. One of: local, distributed.
--query.partial-response Enable partial response for queries if
no partial_response param is specified.
--no-query.partial-response for disabling.
--query.partition-label=QUERY.PARTITION-LABEL ...
Labels that partition the leaf queriers. This
is used to scope down the labelsets of leaf
queriers when using the distributed query mode.
If set, these labels must form a partition
of the leaf queriers. Partition labels must
not intersect with replica labels. Every TSDB
of a leaf querier must have these labels.
This is useful when there are multiple external
labels that are irrelevant for the partition as
it allows the distributed engine to ignore them
for some optimizations. If this is empty then
all labels are used as partition labels.
--query.promql-engine=prometheus
Default PromQL engine to use.
--query.replica-label=QUERY.REPLICA-LABEL ...
Labels to treat as a replica indicator along
which data is deduplicated. Still you will
be able to query without deduplication using
'dedup=false' parameter. Data includes time
series, recording rules, and alerting rules.
Flag may be specified multiple times as well as
a comma separated list of labels.
--query.telemetry.request-duration-seconds-quantiles=0.1... ...
The quantiles for exporting metrics about the
request duration quantiles.
--query.telemetry.request-samples-quantiles=100... ...
The quantiles for exporting metrics about the
samples count quantiles.
--query.telemetry.request-series-seconds-quantiles=10... ...
The quantiles for exporting metrics about the
series count quantiles.
--query.tenant-certificate-field=
Use TLS client's certificate field to determine
tenant for write requests. Must be one of
organization, organizationalUnit or commonName.
This setting will cause the query.tenant-header
flag value to be ignored.
--query.tenant-header="THANOS-TENANT"
HTTP header to determine tenant.
--query.tenant-label-name="tenant_id"
Label name to use when enforcing tenancy (if
--query.enforce-tenancy is enabled).
--query.timeout=2m Maximum time to process query by query node.
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--selector-label=<name>="<value>" ...
Query selector labels that will be exposed in
info endpoint (repeated).
--selector.relabel-config=<content>
Alternative to 'selector.relabel-config-file'
flag (mutually exclusive). Content of YAML
file with relabeling configuration that allows
selecting blocks to query based on their
external labels. It follows the Thanos sharding
relabel-config syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--selector.relabel-config-file=<file-path>
Path to YAML file with relabeling
configuration that allows selecting blocks
to query based on their external labels.
It follows the Thanos sharding relabel-config
syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request, The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.response-timeout=0ms
If a Store doesn't send any data in this
specified duration then a Store will be ignored
and partial data will be returned if it's
enabled. 0 disables timeout.
--store.sd-dns-interval=30s
Interval between DNS resolutions.
--store.sd-files=<path> ...
(Deprecated) Path to files that contain
addresses of store API servers. The path can be
a glob pattern (repeatable).
--store.sd-interval=5m (Deprecated) Refresh interval to re-read file
SD files. It is used as a resync fallback.
--store.unhealthy-timeout=5m
Timeout before an unhealthy store is cleaned
from the store UI page.
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--version Show application version.
--web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--[no-]grpc-client-tls-secure
Use TLS when talking to the gRPC server
--[no-]grpc-client-tls-skip-verify
Disable TLS certificate verification i.e self
signed, signed by fake CA
--grpc-client-tls-cert="" TLS Certificates to use to identify this client
to the server
--grpc-client-tls-key="" TLS Key for the client's certificate
--grpc-client-tls-ca="" TLS CA Certificates to use to verify gRPC
servers
--grpc-client-server-name=""
Server name to verify the hostname on
the returned gRPC certificates. See
https://tools.ietf.org/html/rfc4366#section-3.1
--grpc-compression=none Compression algorithm to use for gRPC requests
to other clients. Must be one of: snappy, none
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path.
Defaults to the value of --web.external-prefix.
This option is analogous to --web.route-prefix
of Prometheus.
--web.external-prefix="" Static prefix for all HTML links and
redirect URLs in the UI query web interface.
Actual endpoints are still served on / or the
@ -588,11 +392,217 @@ Flags:
stripped prefix value in X-Forwarded-Prefix
header. This allows thanos UI to be served on a
sub-path.
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path.
Defaults to the value of --web.external-prefix.
This option is analogous to --web.route-prefix
of Prometheus.
--[no-]web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--query.timeout=2m Maximum time to process query by query node.
--query.promql-engine=prometheus
Default PromQL engine to use.
--[no-]query.enable-x-functions
Whether to enable extended rate functions
(xrate, xincrease and xdelta). Only has effect
when used with Thanos engine.
--query.mode=local PromQL query mode. One of: local, distributed.
--query.max-concurrent=20 Maximum number of queries processed
concurrently by query node.
--query.lookback-delta=QUERY.LOOKBACK-DELTA
The maximum lookback duration for retrieving
metrics during expression evaluations.
PromQL always evaluates the query for the
certain timestamp (query range timestamps are
deduced by step). Since scrape intervals might
be different, PromQL looks back for given
amount of time to get latest sample. If it
exceeds the maximum lookback delta it assumes
series is stale and returns none (a gap).
This is why lookback delta should be set to at
least 2 times of the slowest scrape interval.
If unset it will use the promql default of 5m.
--query.max-concurrent-select=4
Maximum number of select requests made
concurrently per a query.
--query.conn-metric.label=external_labels... ...
Optional selection of query connection metric
labels to be collected from endpoint set
--deduplication.func=penalty
Experimental. Deduplication algorithm for
merging overlapping series. Possible values
are: "penalty", "chain". If no value is
specified, penalty based deduplication
algorithm will be used. When set to chain, the
default compact deduplication merger is used,
which performs 1:1 deduplication for samples.
At least one replica label has to be set via
--query.replica-label flag.
--query.replica-label=QUERY.REPLICA-LABEL ...
Labels to treat as a replica indicator along
which data is deduplicated. Still you will
be able to query without deduplication using
'dedup=false' parameter. Data includes time
series, recording rules, and alerting rules.
Flag may be specified multiple times as well as
a comma separated list of labels.
--query.partition-label=QUERY.PARTITION-LABEL ...
Labels that partition the leaf queriers. This
is used to scope down the labelsets of leaf
queriers when using the distributed query mode.
If set, these labels must form a partition
of the leaf queriers. Partition labels must
not intersect with replica labels. Every TSDB
of a leaf querier must have these labels.
This is useful when there are multiple external
labels that are irrelevant for the partition as
it allows the distributed engine to ignore them
for some optimizations. If this is empty then
all labels are used as partition labels.
--query.metadata.default-time-range=0s
The default metadata time range duration for
retrieving labels through Labels and Series API
when the range parameters are not specified.
The zero value means range covers the time
since the beginning.
--selector-label=<name>="<value>" ...
Query selector labels that will be exposed in
info endpoint (repeated).
--[no-]query.auto-downsampling
Enable automatic adjustment (step / 5) to what
source of data should be used in store gateways
if no max_source_resolution param is specified.
--[no-]query.partial-response
Enable partial response for queries if
no partial_response param is specified.
--no-query.partial-response for disabling.
--query.active-query-path=""
Directory to log currently active queries in
the queries.active file.
--enable-feature= ... Comma separated feature names to enable. Valid
options for now: promql-experimental-functions
(enables promql experimental functions in
query)
--query.default-evaluation-interval=1m
Set default evaluation interval for sub
queries.
--query.default-step=1s Set default step for range queries. Default
step is only used when step is not set in UI.
In such cases, Thanos UI will use default
step to calculate resolution (resolution
= max(rangeSeconds / 250, defaultStep)).
This will not work from Grafana, but Grafana
has __step variable which can be used.
--store.response-timeout=0ms
If a Store doesn't send any data in this
specified duration then a Store will be ignored
and partial data will be returned if it's
enabled. 0 disables timeout.
--selector.relabel-config-file=<file-path>
Path to YAML file with relabeling
configuration that allows selecting blocks
to query based on their external labels.
It follows the Thanos sharding relabel-config
syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--selector.relabel-config=<content>
Alternative to 'selector.relabel-config-file'
flag (mutually exclusive). Content of YAML
file with relabeling configuration that allows
selecting blocks to query based on their
external labels. It follows the Thanos sharding
relabel-config syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--alert.query-url=ALERT.QUERY-URL
The external Thanos Query URL that would be set
in all alerts 'Source' field.
--query.telemetry.request-duration-seconds-quantiles=0.1... ...
The quantiles for exporting metrics about the
request duration quantiles.
--query.telemetry.request-samples-quantiles=100... ...
The quantiles for exporting metrics about the
samples count quantiles.
--query.telemetry.request-series-seconds-quantiles=10... ...
The quantiles for exporting metrics about the
series count quantiles.
--query.tenant-header="THANOS-TENANT"
HTTP header to determine tenant.
--query.default-tenant-id="default-tenant"
Default tenant ID to use if tenant header is
not present
--query.tenant-certificate-field=
Use TLS client's certificate field to determine
tenant for write requests. Must be one of
organization, organizationalUnit or commonName.
This setting will cause the query.tenant-header
flag value to be ignored.
--[no-]query.enforce-tenancy
Enforce tenancy on Query APIs. Responses
are returned only if the label value of the
configured tenant-label-name and the value of
the tenant header matches.
--query.tenant-label-name="tenant_id"
Label name to use when enforcing tenancy (if
--query.enforce-tenancy is enabled).
--store.sd-dns-interval=30s
Interval between DNS resolutions.
--store.unhealthy-timeout=5m
Timeout before an unhealthy store is cleaned
from the store UI page.
--endpoint.sd-config-file=<file-path>
Path to Config File with endpoint definitions
--endpoint.sd-config=<content>
Alternative to 'endpoint.sd-config-file' flag
(mutually exclusive). Content of Config File
with endpoint definitions
--endpoint.sd-config-reload-interval=5m
Interval between endpoint config refreshes
--store.sd-files=<path> ...
(Deprecated) Path to files that contain
addresses of store API servers. The path can be
a glob pattern (repeatable).
--store.sd-interval=5m (Deprecated) Refresh interval to re-read file
SD files. It is used as a resync fallback.
--endpoint=<endpoint> ... (Deprecated): Addresses of statically
configured Thanos API servers (repeatable).
The scheme may be prefixed with 'dns+' or
'dnssrv+' to detect Thanos API servers through
respective DNS lookups.
--endpoint-group=<endpoint-group> ...
(Deprecated, Experimental): DNS name of
statically configured Thanos API server groups
(repeatable). Targets resolved from the DNS
name will be queried in a round-robin, instead
of a fanout manner. This flag should be used
when connecting a Thanos Query to HA groups of
Thanos components.
--endpoint-strict=<endpoint-strict> ...
(Deprecated): Addresses of only statically
configured Thanos API servers that are always
used, even if the health check fails. Useful if
you have a caching layer on top.
--endpoint-group-strict=<endpoint-group-strict> ...
(Deprecated, Experimental): DNS name of
statically configured Thanos API server groups
(repeatable) that are always used, even if the
health check fails.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request, The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
```


@ -22,6 +22,47 @@ The Ketama algorithm is a consistent hashing scheme which enables stable scaling
If you are using the `hashmod` algorithm and wish to migrate to `ketama`, the simplest and safest way would be to set up a new pool of receivers with `ketama` hashrings and start remote-writing to them. Provided you are on the latest Thanos version, old receivers will flush their TSDBs after the configured retention period and will upload blocks to object storage. Once you have verified that is done, decommission the old receivers.
#### Shuffle sharding
Ketama also supports [shuffle sharding](https://aws.amazon.com/builders-library/workload-isolation-using-shuffle-sharding/). It allows you to provide a single-tenant experience in a multi-tenant system. With shuffle sharding, a tenant gets a subset of all nodes in a hashring. You can configure shuffle sharding for any Ketama hashring like so:
```json
[
{
"endpoints": [
{"address": "node-1:10901", "capnproto_address": "node-1:19391", "az": "foo"},
{"address": "node-2:10901", "capnproto_address": "node-2:19391", "az": "bar"},
{"address": "node-3:10901", "capnproto_address": "node-3:19391", "az": "qux"},
{"address": "node-4:10901", "capnproto_address": "node-4:19391", "az": "foo"},
{"address": "node-5:10901", "capnproto_address": "node-5:19391", "az": "bar"},
{"address": "node-6:10901", "capnproto_address": "node-6:19391", "az": "qux"}
],
"algorithm": "ketama",
"shuffle_sharding_config": {
"shard_size": 2,
"cache_size": 100,
"overrides": [
{
"shard_size": 3,
"tenants": ["prefix-tenant-*"],
"tenant_matcher_type": "glob"
}
]
}
}
]
```
This will enable shuffle sharding with the default shard size of 2 and override it to 3 for every tenant that starts with `prefix-tenant-`.
`cache_size` sets the size of the in-memory LRU cache of computed subrings. It is not possible to cache everything, because an attacker could spam requests with random tenants and those subrings would stay in memory forever.
With this config, `shard_size/number_of_azs` nodes are chosen from each availability zone for each tenant. So, for example, a tenant matching the override (shard size 3) will get a unique and consistent set of 3 nodes.
You can use `zone_awareness_disabled` to disable zone awareness. This is useful when you have many separate AZs and it doesn't matter which ones are chosen. The shards will then ignore AZs, but the Ketama algorithm will still prefer spreading load across as many AZs as possible. That's why, with zone awareness disabled, it is recommended to set the shard size to `max(nodes_in_any_az, replication_factor)`.
Receive currently supports only stateless shuffle sharding, so it does not store or check whether there have been any overlaps between shards.
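To make the zone-awareness-disabled recommendation above concrete, here is a small illustrative Go helper (not Thanos source; the replication factor used in the example is a hypothetical value) that applies the `max(nodes_in_any_az, replication_factor)` rule:

```go
package main

import "fmt"

// recommendedShardSize applies the guidance above for hashrings with zone
// awareness disabled: use the larger of the node count in the biggest AZ and
// the replication factor. Illustrative helper only, not Thanos source code.
func recommendedShardSize(nodesInAnyAZ, replicationFactor int) int {
	if nodesInAnyAZ > replicationFactor {
		return nodesInAnyAZ
	}
	return replicationFactor
}

func main() {
	// The example hashring above has 2 nodes per AZ; assume a replication
	// factor of 3 for illustration.
	fmt.Println(recommendedShardSize(2, 3)) // 3
}
```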
### Hashmod (discouraged)
This algorithm uses a `hashmod` function over all labels to decide which receiver is responsible for a given timeseries. This is the default algorithm due to historical reasons. However, its usage for new Receive installations is discouraged since adding new Receiver nodes leads to series churn and memory usage spikes.
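As a rough illustration of why `hashmod` causes churn when nodes are added (a conceptual sketch of the general technique, not the Thanos implementation), the series' labels are hashed and the hash is taken modulo the number of endpoints, so changing the endpoint count remaps most series:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashmodEndpoint picks an endpoint by hashing a series' labels and taking the
// result modulo the endpoint count. Conceptual sketch, not Thanos source code.
func hashmodEndpoint(seriesLabels string, endpoints []string) string {
	h := fnv.New64a()
	h.Write([]byte(seriesLabels))
	return endpoints[h.Sum64()%uint64(len(endpoints))]
}

func main() {
	series := `{__name__="up", job="demo", instance="a"}`
	three := []string{"node-1:10901", "node-2:10901", "node-3:10901"}
	four := []string{"node-1:10901", "node-2:10901", "node-3:10901", "node-4:10901"}
	// Adding a node changes the modulus, so the same series may be routed to a
	// different receiver, which is what drives the churn described above.
	fmt.Println(hashmodEndpoint(series, three), "->", hashmodEndpoint(series, four))
}
```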
@ -331,7 +372,7 @@ Please see the metric `thanos_receive_forward_delay_seconds` to see if you need
The following formula is used for calculating quorum:
```go mdox-exec="sed -n '1029,1039p' pkg/receive/handler.go"
```go mdox-exec="sed -n '1046,1056p' pkg/receive/handler.go"
// writeQuorum returns minimum number of replicas that has to confirm write success before claiming replication success.
func (h *Handler) writeQuorum() int {
// NOTE(GiedriusS): this is here because otherwise RF=2 doesn't make sense as all writes
@ -354,46 +395,29 @@ usage: thanos receive [<flags>]
Accept Prometheus remote write API requests and write to local tsdb.
Flags:
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--enable-auto-gomemlimit Enable go runtime to automatically limit memory
consumption.
--enable-feature= ... Comma separated experimental feature names
to enable. The current list of features is
metric-names-filter.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
@ -401,40 +425,100 @@ Flags:
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request. The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--remote-write.address="0.0.0.0:19291"
Address to listen on for remote write requests.
--remote-write.server-tls-cert=""
TLS Certificate for HTTP server, leave blank to
disable TLS.
--remote-write.server-tls-key=""
TLS Key for the HTTP server, leave blank to
disable TLS.
--remote-write.server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--remote-write.server-tls-min-version="1.3"
TLS version for the gRPC server, leave blank
to default to TLS 1.3, allow values: ["1.0",
"1.1", "1.2", "1.3"]
--remote-write.client-tls-cert=""
TLS Certificates to use to identify this client
to the server.
--remote-write.client-tls-key=""
TLS Key for the client's certificate.
--[no-]remote-write.client-tls-secure
Use TLS when talking to the other receivers.
--[no-]remote-write.client-tls-skip-verify
Disable TLS certificate verification when
talking to the other receivers i.e self signed,
signed by fake CA.
--remote-write.client-tls-ca=""
TLS CA Certificates to use to verify servers.
--remote-write.client-server-name=""
Server name to verify the hostname
on the returned TLS certificates. See
https://tools.ietf.org/html/rfc4366#section-3.1
--tsdb.path="./data" Data directory of TSDB.
--label=key="value" ... External labels to announce. This flag will be
removed in the future when handling multiple
tsdb instances is added.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--matcher-cache-size=0 Max number of cached matchers items. Using 0
disables caching.
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--receive.capnproto-address="0.0.0.0:19391"
Address for the Cap'n Proto server.
--receive.default-tenant-id="default-tenant"
Default tenant ID to use when none is provided
via a header.
--receive.forward.async-workers=5
Number of concurrent workers processing
forwarding of remote-write requests.
--receive.grpc-compression=snappy
Compression algorithm to use for gRPC requests
to other receivers. Must be one of: snappy,
none
--receive.grpc-service-config=<content>
gRPC service configuration file
or content in JSON format. See
https://github.com/grpc/grpc/blob/master/doc/service_config.md
--tsdb.retention=15d How long to retain raw samples on local
storage. 0d - disables the retention
policy (i.e. infinite retention).
For more details on how retention is
enforced for individual tenants, please
refer to the Tenant lifecycle management
section in the Receive documentation:
https://thanos.io/tip/components/receive.md/#tenant-lifecycle-management
--receive.hashrings-file=<path>
Path to file that contains the hashring
configuration. A watcher is initialized
to watch changes and update the hashring
dynamically.
--receive.hashrings=<content>
Alternative to 'receive.hashrings-file' flag
(lower priority). Content of file that contains
@ -444,11 +528,6 @@ Flags:
the hashrings. Must be one of hashmod, ketama.
Will be overwritten by the tenant-specific
algorithm in the hashring config.
--receive.hashrings-file=<path>
Path to file that contains the hashring
configuration. A watcher is initialized
to watch changes and update the hashring
dynamically.
--receive.hashrings-file-refresh-interval=5m
Refresh interval to re-read the hashring
configuration file. (used as a fallback)
@ -458,23 +537,35 @@ Flags:
configuration. If it's empty AND hashring
configuration was provided, it means that
receive will run in RoutingOnly mode.
--receive.otlp-enable-target-info
Enables target information in OTLP metrics
ingested by Receive. If enabled, it converts
the resource to the target info metric
--receive.otlp-promote-resource-attributes= ...
(Repeatable) Resource attributes to include in
OTLP metrics ingested by Receive.
--receive.relabel-config=<content>
Alternative to 'receive.relabel-config-file'
flag (mutually exclusive). Content of YAML file
that contains relabeling configuration.
--receive.relabel-config-file=<file-path>
Path to YAML file that contains relabeling
configuration.
--receive.tenant-header="THANOS-TENANT"
HTTP header to determine tenant for write
requests.
--receive.tenant-certificate-field=
Use TLS client's certificate field to
determine tenant for write requests.
Must be one of organization, organizationalUnit
or commonName. This setting will cause the
receive.tenant-header flag value to be ignored.
--receive.default-tenant-id="default-tenant"
Default tenant ID to use when none is provided
via a header.
--receive.split-tenant-label-name=""
Label name through which the request will
be split into multiple tenants. This takes
precedence over the HTTP header.
--receive.tenant-label-name="tenant_id"
Label name through which the tenant will be
announced.
--receive.replica-header="THANOS-REPLICA"
HTTP header specifying the replica number of a
write request.
--receive.forward.async-workers=5
Number of concurrent workers processing
forwarding of remote-write requests.
--receive.grpc-compression=snappy
Compression algorithm to use for gRPC requests
to other receivers. Must be one of: snappy,
none
--receive.replication-factor=1
How many times to replicate incoming write
requests.
@ -482,95 +573,59 @@ Flags:
The protocol to use for replicating
remote-write requests. One of protobuf,
capnproto
--receive.split-tenant-label-name=""
Label name through which the request will
be split into multiple tenants. This takes
precedence over the HTTP header.
--receive.tenant-certificate-field=
Use TLS client's certificate field to
determine tenant for write requests.
Must be one of organization, organizationalUnit
or commonName. This setting will cause the
receive.tenant-header flag value to be ignored.
--receive.tenant-header="THANOS-TENANT"
HTTP header to determine tenant for write
requests.
--receive.tenant-label-name="tenant_id"
Label name through which the tenant will be
announced.
--remote-write.address="0.0.0.0:19291"
Address to listen on for remote write requests.
--remote-write.client-server-name=""
Server name to verify the hostname
on the returned TLS certificates. See
https://tools.ietf.org/html/rfc4366#section-3.1
--remote-write.client-tls-ca=""
TLS CA Certificates to use to verify servers.
--remote-write.client-tls-cert=""
TLS Certificates to use to identify this client
to the server.
--remote-write.client-tls-key=""
TLS Key for the client's certificate.
--remote-write.client-tls-secure
Use TLS when talking to the other receivers.
--remote-write.client-tls-skip-verify
Disable TLS certificate verification when
talking to the other receivers i.e self signed,
signed by fake CA.
--remote-write.server-tls-cert=""
TLS Certificate for HTTP server, leave blank to
disable TLS.
--remote-write.server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--remote-write.server-tls-key=""
TLS Key for the HTTP server, leave blank to
disable TLS.
--remote-write.server-tls-min-version="1.3"
TLS version for the gRPC server, leave blank
to default to TLS 1.3, allow values: ["1.0",
"1.1", "1.2", "1.3"]
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request. The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tsdb.allow-overlapping-blocks
--receive.capnproto-address="0.0.0.0:19391"
Address for the Cap'n Proto server.
--receive.grpc-service-config=<content>
gRPC service configuration file
or content in JSON format. See
https://github.com/grpc/grpc/blob/master/doc/service_config.md
--receive.relabel-config-file=<file-path>
Path to YAML file that contains relabeling
configuration.
--receive.relabel-config=<content>
Alternative to 'receive.relabel-config-file'
flag (mutually exclusive). Content of YAML file
that contains relabeling configuration.
--tsdb.too-far-in-future.time-window=0s
Configures the allowed time window for
ingesting samples too far in the future.
Disabled (0s) by default. Please note that
enabling this flag will reject samples further
in the future than the receiver's local NTP time
plus the configured duration; such samples
typically result from clock skew in remote write
clients.
--tsdb.out-of-order.time-window=0s
[EXPERIMENTAL] Configures the allowed
time window for ingestion of out-of-order
samples. Disabled (0s) by default. Please
note if you enable this option and you
use compactor, make sure you have the
--compact.enable-vertical-compaction flag
enabled, otherwise you might risk compactor
halt.
--tsdb.out-of-order.cap-max=0
[EXPERIMENTAL] Configures the maximum capacity
for out-of-order chunks (in samples). If set to
<=0, default value 32 is assumed.
--[no-]tsdb.allow-overlapping-blocks
Allow overlapping blocks, which in turn enables
vertical compaction and vertical query merge.
Does not do anything, enabled all the time.
--tsdb.block.expanded-postings-cache-size=0
[EXPERIMENTAL] If non-zero, enables expanded
postings cache for compacted blocks.
--tsdb.max-retention-bytes=0
Maximum number of bytes that can be stored for
blocks. A unit is required, supported units: B,
KB, MB, GB, TB, PB, EB. Ex: "512MB". Based on
powers-of-2, so 1KB is 1024B.
--[no-]tsdb.wal-compression
Compress the tsdb WAL.
--[no-]tsdb.no-lockfile Do not create lockfile in TSDB data directory.
In any case, the lockfiles will be deleted on
next startup.
--tsdb.head.expanded-postings-cache-size=0
[EXPERIMENTAL] If non-zero, enables expanded
postings cache for the head block.
--tsdb.block.expanded-postings-cache-size=0
[EXPERIMENTAL] If non-zero, enables expanded
postings cache for compacted blocks.
--tsdb.max-exemplars=0 Enables support for ingesting exemplars and
sets the maximum number of exemplars that will
be stored per tenant. In case the exemplar
@ -579,43 +634,41 @@ Flags:
ingesting a new exemplar will evict the oldest
exemplar from storage. 0 (or less) value of
this flag disables exemplars storage.
--tsdb.max-retention-bytes=0
Maximum number of bytes that can be stored for
blocks. A unit is required, supported units: B,
KB, MB, GB, TB, PB, EB. Ex: "512MB". Based on
powers-of-2, so 1KB is 1024B.
--tsdb.no-lockfile Do not create lockfile in TSDB data directory.
In any case, the lockfiles will be deleted on
next startup.
--tsdb.out-of-order.cap-max=0
[EXPERIMENTAL] Configures the maximum capacity
for out-of-order chunks (in samples). If set to
<=0, default value 32 is assumed.
--tsdb.out-of-order.time-window=0s
[EXPERIMENTAL] Configures the allowed time
window for ingestion of out-of-order samples.
Disabled (0s) by default. Please note if you
enable this option and you use compactor, make
sure you have the --enable-vertical-compaction
flag enabled, otherwise you might risk
compactor halt.
--tsdb.path="./data" Data directory of TSDB.
--tsdb.retention=15d How long to retain raw samples on local
storage. 0d - disables the retention
policy (i.e. infinite retention).
For more details on how retention is
enforced for individual tenants, please
refer to the Tenant lifecycle management
section in the Receive documentation:
https://thanos.io/tip/components/receive.md/#tenant-lifecycle-management
--tsdb.too-far-in-future.time-window=0s
Configures the allowed time window for
ingesting samples too far in the future.
Disabled (0s) by default. Please note that
enabling this flag will reject samples further
in the future than the receiver's local NTP time
plus the configured duration; such samples
typically result from clock skew in remote write
clients.
--tsdb.wal-compression Compress the tsdb WAL.
--version Show application version.
--[no-]tsdb.enable-native-histograms
[EXPERIMENTAL] Enables the ingestion of native
histograms.
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
--matcher-cache-size=0 Max number of cached matchers items. Using 0
disables caching.
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--[no-]receive.otlp-enable-target-info
Enables target information in OTLP metrics
ingested by Receive. If enabled, it converts
the resource to the target info metric
--receive.otlp-promote-resource-attributes= ...
(Repeatable) Resource attributes to include in
OTLP metrics ingested by Receive.
--enable-feature= ... Comma separated experimental feature names
to enable. The current list of features is
metric-names-filter.
--receive.lazy-retrieval-max-buffered-responses=20
The lazy retrieval strategy can buffer up to
this number of responses. This is to limit the
memory usage. This flag takes effect only when
the lazy retrieval strategy is enabled.
```


@ -269,106 +269,29 @@ usage: thanos rule [<flags>]
Ruler evaluating Prometheus rules against given Query nodes, exposing Store API
and storing old blocks in bucket.
Flags:
--alert.label-drop=ALERT.LABEL-DROP ...
Labels by name to drop before sending
to alertmanager. This allows alert to be
deduplicated on replica label (repeated).
Similar Prometheus alert relabelling
--alert.query-template="/graph?g0.expr={{.Expr}}&g0.tab=1"
Template to use in alerts source field.
Need only include {{.Expr}} parameter
--alert.query-url=ALERT.QUERY-URL
The external Thanos Query URL that would be set
in all alerts 'Source' field
--alert.relabel-config=<content>
Alternative to 'alert.relabel-config-file' flag
(mutually exclusive). Content of YAML file that
contains alert relabelling configuration.
--alert.relabel-config-file=<file-path>
Path to YAML file that contains alert
relabelling configuration.
--alertmanagers.config=<content>
Alternative to 'alertmanagers.config-file'
flag (mutually exclusive). Content
of YAML file that contains alerting
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence
over the '--alertmanagers.url' and
'--alertmanagers.send-timeout' flags.
--alertmanagers.config-file=<file-path>
Path to YAML file that contains alerting
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence
over the '--alertmanagers.url' and
'--alertmanagers.send-timeout' flags.
--alertmanagers.sd-dns-interval=30s
Interval between DNS resolutions of
Alertmanager hosts.
--alertmanagers.send-timeout=10s
Timeout for sending alerts to Alertmanager
--alertmanagers.url=ALERTMANAGERS.URL ...
Alertmanager replica URLs to push firing
alerts. Ruler claims success if push to
at least one alertmanager from discovered
succeeds. The scheme should not be empty
e.g `http` might be used. The scheme may be
prefixed with 'dns+' or 'dnssrv+' to detect
Alertmanager IPs through respective DNS
lookups. The port defaults to 9093 or the
SRV record's value. The URL path is used as a
prefix for the regular Alertmanager API path.
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--data-dir="data/" data directory
--enable-auto-gomemlimit Enable go runtime to automatically limit memory
consumption.
--eval-interval=1m The default evaluation interval to use.
--for-grace-period=10m Minimum duration between alert and restored
"for" state. This is maintained only for alerts
with configured "for" time greater than grace
period.
--for-outage-tolerance=1h Max time to tolerate prometheus outage for
restoring "for" state of alert.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--grpc-query-endpoint=<endpoint> ...
Addresses of Thanos gRPC query API servers
(repeatable). The scheme may be prefixed
with 'dns+' or 'dnssrv+' to detect Thanos API
servers through respective DNS lookups.
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
@ -376,145 +299,33 @@ Flags:
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--label=<name>="<value>" ...
Labels to be applied to all generated metrics
(repeated). Similar to external labels for
Prometheus, used to identify ruler and its
blocks as unique source.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--query=<query> ... Addresses of statically configured query
API servers (repeatable). The scheme may be
prefixed with 'dns+' or 'dnssrv+' to detect
query API servers through respective DNS
lookups.
--query.config=<content> Alternative to 'query.config-file' flag
(mutually exclusive). Content of YAML
file that contains query API servers
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence over the
'--query' and '--query.sd-files' flags.
--query.config-file=<file-path>
Path to YAML file that contains query API
servers configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence over the
'--query' and '--query.sd-files' flags.
--query.default-step=1s Default range query step to use. This is
only used in stateless Ruler and alert state
restoration.
--query.enable-x-functions
Whether to enable extended rate functions
(xrate, xincrease and xdelta). Only has effect
when used with Thanos engine.
--query.http-method=POST HTTP method to use when sending queries.
Possible options: [GET, POST]
--query.sd-dns-interval=30s
Interval between DNS resolutions.
--query.sd-files=<path> ...
Path to file that contains addresses of query
API servers. The path can be a glob pattern
(repeatable).
--query.sd-interval=5m Refresh interval to re-read file SD files.
(used as a fallback)
--remote-write.config=<content>
Alternative to 'remote-write.config-file'
flag (mutually exclusive). Content
of YAML config for the remote-write
configurations, that specify servers
where samples should be sent to (see
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
This automatically enables stateless mode
for ruler and no series will be stored in the
ruler's TSDB. If an empty config (or file) is
provided, the flag is ignored and ruler is run
with its own TSDB.
--remote-write.config-file=<file-path>
Path to YAML config for the remote-write
configurations, that specify servers
where samples should be sent to (see
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
This automatically enables stateless mode
for ruler and no series will be stored in the
ruler's TSDB. If an empty config (or file) is
provided, the flag is ignored and ruler is run
with its own TSDB.
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--resend-delay=1m Minimum amount of time to wait before resending
an alert to Alertmanager.
--restore-ignored-label=RESTORE-IGNORED-LABEL ...
Label names to be ignored when restoring alerts
from the remote storage. This is only used in
stateless mode.
--rule-concurrent-evaluation=1
How many rules can be evaluated concurrently.
Default is 1.
--rule-file=rules/ ... Rule files that should be used by rule
manager. Can be in glob format (repeated).
Note that rules are not automatically detected,
use SIGHUP or do HTTP POST /-/reload to re-read
them.
--rule-query-offset=0s The default rule group query_offset duration to
use.
--shipper.meta-file-name="thanos.shipper.json"
the file to store shipper metadata in
--shipper.upload-compacted
If true shipper will try to upload compacted
blocks as well. Useful for migration purposes.
Works only if compaction is disabled on
Prometheus. Do it once and then disable the
flag when done.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request. The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tsdb.block-duration=2h Block duration for TSDB block.
--tsdb.no-lockfile Do not create lockfile in TSDB data directory.
In any case, the lockfiles will be deleted on
next startup.
--tsdb.retention=48h Block retention time on local disk.
--tsdb.wal-compression Compress the tsdb WAL.
--version Show application version.
--web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path. This
option is analogous to --web.route-prefix of
Prometheus.
--web.external-prefix="" Static prefix for all HTML links and redirect
URLs in the bucket web UI interface.
Actual endpoints are still served on / or the
@ -534,10 +345,209 @@ Flags:
stripped prefix value in X-Forwarded-Prefix
header. This allows thanos UI to be served on a
sub-path.
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path. This
option is analogous to --web.route-prefix of
Prometheus.
--[no-]web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--[no-]shipper.upload-compacted
If true shipper will try to upload compacted
blocks as well. Useful for migration purposes.
Works only if compaction is disabled on
Prometheus. Do it once and then disable the
flag when done.
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
--shipper.meta-file-name="thanos.shipper.json"
the file to store shipper metadata in
--query=<query> ... Addresses of statically configured query
API servers (repeatable). The scheme may be
prefixed with 'dns+' or 'dnssrv+' to detect
query API servers through respective DNS
lookups.
--query.config-file=<file-path>
Path to YAML file that contains query API
servers configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence over the
'--query' and '--query.sd-files' flags.
--query.config=<content> Alternative to 'query.config-file' flag
(mutually exclusive). Content of YAML
file that contains query API servers
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence over the
'--query' and '--query.sd-files' flags.
--query.sd-files=<path> ...
Path to file that contains addresses of query
API servers. The path can be a glob pattern
(repeatable).
--query.sd-interval=5m Refresh interval to re-read file SD files.
(used as a fallback)
--query.sd-dns-interval=30s
Interval between DNS resolutions.
--query.http-method=POST HTTP method to use when sending queries.
Possible options: [GET, POST]
--query.default-step=1s Default range query step to use. This is
only used in stateless Ruler and alert state
restoration.
--alertmanagers.config-file=<file-path>
Path to YAML file that contains alerting
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence
over the '--alertmanagers.url' and
'--alertmanagers.send-timeout' flags.
--alertmanagers.config=<content>
Alternative to 'alertmanagers.config-file'
flag (mutually exclusive). Content
of YAML file that contains alerting
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence
over the '--alertmanagers.url' and
'--alertmanagers.send-timeout' flags.
--alertmanagers.url=ALERTMANAGERS.URL ...
Alertmanager replica URLs to push firing
alerts. Ruler claims success if push to
at least one alertmanager from discovered
succeeds. The scheme should not be empty
e.g `http` might be used. The scheme may be
prefixed with 'dns+' or 'dnssrv+' to detect
Alertmanager IPs through respective DNS
lookups. The port defaults to 9093 or the
SRV record's value. The URL path is used as a
prefix for the regular Alertmanager API path.
--alertmanagers.send-timeout=10s
Timeout for sending alerts to Alertmanager
--alertmanagers.sd-dns-interval=30s
Interval between DNS resolutions of
Alertmanager hosts.
--alert.query-url=ALERT.QUERY-URL
The external Thanos Query URL that would be set
in all alerts 'Source' field
--alert.label-drop=ALERT.LABEL-DROP ...
Labels by name to drop before sending
to alertmanager. This allows alert to be
deduplicated on replica label (repeated).
Similar Prometheus alert relabelling
--alert.relabel-config-file=<file-path>
Path to YAML file that contains alert
relabelling configuration.
--alert.relabel-config=<content>
Alternative to 'alert.relabel-config-file' flag
(mutually exclusive). Content of YAML file that
contains alert relabelling configuration.
--alert.query-template="/graph?g0.expr={{.Expr}}&g0.tab=1"
Template to use in alerts source field.
Need only include {{.Expr}} parameter
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request. The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--label=<name>="<value>" ...
Labels to be applied to all generated metrics
(repeated). Similar to external labels for
Prometheus, used to identify ruler and its
blocks as unique source.
--tsdb.block-duration=2h Block duration for TSDB block.
--tsdb.retention=48h Block retention time on local disk.
--[no-]tsdb.no-lockfile Do not create lockfile in TSDB data directory.
In any case, the lockfiles will be deleted on
next startup.
--[no-]tsdb.wal-compression
Compress the tsdb WAL.
--data-dir="data/" data directory
--rule-file=rules/ ... Rule files that should be used by rule
manager. Can be in glob format (repeated).
Note that rules are not automatically detected,
use SIGHUP or do HTTP POST /-/reload to re-read
them.
--resend-delay=1m Minimum amount of time to wait before resending
an alert to Alertmanager.
--eval-interval=1m The default evaluation interval to use.
--rule-query-offset=0s The default rule group query_offset duration to
use.
--for-outage-tolerance=1h Max time to tolerate prometheus outage for
restoring "for" state of alert.
--for-grace-period=10m Minimum duration between alert and restored
"for" state. This is maintained only for alerts
with configured "for" time greater than grace
period.
--restore-ignored-label=RESTORE-IGNORED-LABEL ...
Label names to be ignored when restoring alerts
from the remote storage. This is only used in
stateless mode.
--rule-concurrent-evaluation=1
How many rules can be evaluated concurrently.
Default is 1.
--grpc-query-endpoint=<endpoint> ...
Addresses of Thanos gRPC query API servers
(repeatable). The scheme may be prefixed
with 'dns+' or 'dnssrv+' to detect Thanos API
servers through respective DNS lookups.
--[no-]query.enable-x-functions
Whether to enable extended rate functions
(xrate, xincrease and xdelta). Only has effect
when used with Thanos engine.
--enable-feature= ... Comma separated feature names to enable. Valid
options for now: promql-experimental-functions
(enables promql experimental functions for
ruler)
--[no-]tsdb.enable-native-histograms
[EXPERIMENTAL] Enables the ingestion of native
histograms.
--remote-write.config-file=<file-path>
Path to YAML config for the remote-write
configurations, that specify servers
where samples should be sent to (see
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
This automatically enables stateless mode
for ruler and no series will be stored in the
ruler's TSDB. If an empty config (or file) is
provided, the flag is ignored and ruler is run
with its own TSDB.
--remote-write.config=<content>
Alternative to 'remote-write.config-file'
flag (mutually exclusive). Content
of YAML config for the remote-write
configurations, that specify servers
where samples should be sent to (see
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
This automatically enables stateless mode
for ruler and no series will be stored in the
ruler's TSDB. If an empty config (or file) is
provided, the flag is ignored and ruler is run
with its own TSDB.
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
```


@ -95,43 +95,29 @@ usage: thanos sidecar [<flags>]
Sidecar for Prometheus server.
Flags:
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--enable-auto-gomemlimit Enable go runtime to automatically limit memory
consumption.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
@ -139,81 +125,105 @@ Flags:
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to serve. Thanos
sidecar will serve only metrics, which happened
later than this value. Option can be a constant
time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--prometheus.url=http://localhost:9090
URL at which to reach Prometheus's API.
For better performance use local network.
--prometheus.ready_timeout=10m
Maximum time to wait for the Prometheus
instance to start up
--prometheus.get_config_interval=30s
How often to get Prometheus config
--prometheus.get_config_timeout=5s
--prometheus.get_config_timeout=30s
Timeout for getting Prometheus config
--prometheus.http-client-file=<file-path>
Path to YAML file or string with http
client configs. See Format details:
https://thanos.io/tip/components/sidecar.md/#configuration.
--prometheus.http-client=<content>
Alternative to 'prometheus.http-client-file'
flag (mutually exclusive). Content
of YAML file or string with http
client configs. See Format details:
https://thanos.io/tip/components/sidecar.md/#configuration.
--prometheus.http-client-file=<file-path>
Path to YAML file or string with http
client configs. See Format details:
https://thanos.io/tip/components/sidecar.md/#configuration.
--prometheus.ready_timeout=10m
Maximum time to wait for the Prometheus
instance to start up
--prometheus.url=http://localhost:9090
URL at which to reach Prometheus's API.
For better performance use local network.
--tsdb.path="./data" Data directory of TSDB.
--reloader.config-file="" Config file watched by the reloader.
--reloader.config-envsubst-file=""
Output file for environment variable
substituted config file.
--reloader.config-file="" Config file watched by the reloader.
--reloader.method=http Method used to reload the configuration.
--reloader.process-name="prometheus"
Executable name used to match the process being
reloaded when using the signal method.
--reloader.retry-interval=5s
Controls how often reloader retries config
reload in case of error.
--reloader.rule-dir=RELOADER.RULE-DIR ...
Rule directories for the reloader to refresh
(repeated field).
--reloader.watch-interval=3m
Controls how often reloader re-reads config and
rules.
--reloader.retry-interval=5s
Controls how often reloader retries config
reload in case of error.
--reloader.method=http Method used to reload the configuration.
--reloader.process-name="prometheus"
Executable name used to match the process being
reloaded when using the signal method.
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--shipper.meta-file-name="thanos.shipper.json"
the file to store shipper metadata in
--shipper.upload-compacted
https://thanos.io/tip/thanos/storage.md/#configuration
--[no-]shipper.upload-compacted
If true shipper will try to upload compacted
blocks as well. Useful for migration purposes.
Works only if compaction is disabled on
Prometheus. Do it once and then disable the
flag when done.
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
--shipper.meta-file-name="thanos.shipper.json"
the file to store shipper metadata in
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request. The Series call fails if
@ -221,21 +231,13 @@ Flags:
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tsdb.path="./data" Data directory of TSDB.
--version Show application version.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to serve. Thanos
sidecar will serve only metrics, which happened
later than this value. Option can be a constant
time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
```


@ -48,10 +48,124 @@ usage: thanos store [<flags>]
Store node giving access to blocks in a bucket provider. Now supported GCS, S3,
Azure, Swift, Tencent COS and Aliyun OSS.
Flags:
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
HTTP Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request. The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--data-dir="./data" Local data directory used for caching
purposes (index-header, in-mem cache items and
meta.jsons). If removed, no data will be lost,
just store will have to rebuild the cache.
NOTE: Putting raw blocks here will not
cause the store to read them. For such use
cases use Prometheus + sidecar. Ignored if
--no-cache-index-header option is specified.
--[no-]cache-index-header Cache TSDB index-headers on disk to reduce
startup time. When set to true, Thanos Store
will download index headers from remote object
storage on startup and create a header file on
disk. Use --data-dir to set the directory in
which index headers will be downloaded.
--index-cache-size=250MB Maximum size of items held in the in-memory
index cache. Ignored if --index-cache.config or
--index-cache.config-file option is specified.
--index-cache.config-file=<file-path>
Path to YAML file that contains index
cache configuration. See format details:
https://thanos.io/tip/components/store.md/#index-cache
--index-cache.config=<content>
Alternative to 'index-cache.config-file'
flag (mutually exclusive). Content of
YAML file that contains index cache
configuration. See format details:
https://thanos.io/tip/components/store.md/#index-cache
--chunk-pool-size=2GB Maximum size of concurrently allocatable
bytes reserved strictly to reuse for chunks in
memory.
--store.grpc.touched-series-limit=0
DEPRECATED: use store.limits.request-series.
--store.grpc.series-sample-limit=0
DEPRECATED: use store.limits.request-samples.
--store.grpc.downloaded-bytes-limit=0
Maximum amount of downloaded (either
fetched or touched) bytes in a single
Series/LabelNames/LabelValues call. The Series
call fails if this limit is exceeded. 0 means
no limit.
--store.grpc.series-max-concurrency=20
Maximum number of concurrent Series calls.
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--sync-block-duration=15m Repeat interval for syncing the blocks between
local and remote view.
--block-discovery-strategy="concurrent"
One of concurrent, recursive. When set to
concurrent, stores will concurrently issue
@ -61,71 +175,46 @@ Flags:
recursively traversing into each directory.
This avoids N+1 calls at the expense of having
slower bucket iterations.
--block-meta-fetch-concurrency=32
Number of goroutines to use when fetching block
metadata from object storage.
--block-sync-concurrency=20
Number of goroutines to use when constructing
index-cache.json blocks from object storage.
Must be equal or greater than 1.
--bucket-web-label=BUCKET-WEB-LABEL
External block label to use as group title in
the bucket web UI
--cache-index-header Cache TSDB index-headers on disk to reduce
startup time. When set to true, Thanos Store
will download index headers from remote object
storage on startup and create a header file on
disk. Use --data-dir to set the directory in
which index headers will be downloaded.
--chunk-pool-size=2GB Maximum size of concurrently allocatable
bytes reserved strictly to reuse for chunks in
memory.
--block-meta-fetch-concurrency=32
Number of goroutines to use when fetching block
metadata from object storage.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to serve. Thanos
Store will serve only metrics, which happened
later than this value. Option can be a constant
time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
--max-time=9999-12-31T23:59:59Z
End of time range limit to serve. Thanos Store
will serve only blocks, which happened earlier
than this value. Option can be a constant time
in RFC3339 format or time duration relative
to current time, such as -1d or 2h45m. Valid
duration units are ms, s, m, h, d, w, y.
--selector.relabel-config-file=<file-path>
Path to YAML file with relabeling
configuration that allows selecting blocks
to act on based on their external labels.
It follows thanos sharding relabel-config
syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--selector.relabel-config=<content>
Alternative to 'selector.relabel-config-file'
flag (mutually exclusive). Content of YAML
file with relabeling configuration that allows
selecting blocks to act on based on their
external labels. It follows thanos sharding
relabel-config syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--consistency-delay=0s Minimum age of all blocks before they are
being read. Set it to safe value (e.g 30m) if
your object storage is eventually consistent.
GCS and S3 are (roughly) strongly consistent.
--data-dir="./data" Local data directory used for caching
purposes (index-header, in-mem cache items and
meta.jsons). If removed, no data will be lost,
just store will have to rebuild the cache.
NOTE: Putting raw blocks here will not
cause the store to read them. For such use
cases use Prometheus + sidecar. Ignored if
--no-cache-index-header option is specified.
--enable-auto-gomemlimit Enable go runtime to automatically limit memory
consumption.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
HTTP Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--ignore-deletion-marks-delay=24h
Duration after which the blocks marked for
deletion will be filtered out while fetching
@ -147,111 +236,15 @@ Flags:
blocks before being deleted from bucket.
Default is 24h, half of the default value for
--delete-delay on compactor.
--index-cache-size=250MB Maximum size of items held in the in-memory
index cache. Ignored if --index-cache.config or
--index-cache.config-file option is specified.
--index-cache.config=<content>
Alternative to 'index-cache.config-file'
flag (mutually exclusive). Content of
YAML file that contains index cache
configuration. See format details:
https://thanos.io/tip/components/store.md/#index-cache
--index-cache.config-file=<file-path>
Path to YAML file that contains index
cache configuration. See format details:
https://thanos.io/tip/components/store.md/#index-cache
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--matcher-cache-size=0 Max number of cached matchers items. Using 0
disables caching.
--max-time=9999-12-31T23:59:59Z
End of time range limit to serve. Thanos Store
will serve only blocks, which happened earlier
than this value. Option can be a constant time
in RFC3339 format or time duration relative
to current time, such as -1d or 2h45m. Valid
duration units are ms, s, m, h, d, w, y.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to serve. Thanos
Store will serve only metrics, which happened
later than this value. Option can be a constant
time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--selector.relabel-config=<content>
Alternative to 'selector.relabel-config-file'
flag (mutually exclusive). Content of YAML
file with relabeling configuration that allows
selecting blocks to act on based on their
external labels. It follows thanos sharding
relabel-config syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--selector.relabel-config-file=<file-path>
Path to YAML file with relabeling
configuration that allows selecting blocks
to act on based on their external labels.
It follows thanos sharding relabel-config
syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--store.enable-index-header-lazy-reader
--[no-]store.enable-index-header-lazy-reader
If true, Store Gateway will lazy memory map
index-header only once the block is required by
a query.
--store.enable-lazy-expanded-postings
--[no-]store.enable-lazy-expanded-postings
If true, Store Gateway will estimate postings
size and try to lazily expand postings if
it downloads less data than expanding all
postings.
--store.grpc.downloaded-bytes-limit=0
Maximum amount of downloaded (either
fetched or touched) bytes in a single
Series/LabelNames/LabelValues call. The Series
call fails if this limit is exceeded. 0 means
no limit.
--store.grpc.series-max-concurrency=20
Maximum number of concurrent Series calls.
--store.grpc.series-sample-limit=0
DEPRECATED: use store.limits.request-samples.
--store.grpc.touched-series-limit=0
DEPRECATED: use store.limits.request-series.
--store.index-header-lazy-download-strategy=eager
Strategy of how to download index headers
lazily. Supported values: eager, lazy.
If eager, always download index header during
initial load. If lazy, download index header
during query time.
--store.limits.request-samples=0
The maximum samples allowed for a single
                                 Series request. The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.posting-group-max-key-series-ratio=100
Mark posting group as lazy if it fetches more
keys than R * max series the query should
@ -264,22 +257,13 @@ Flags:
accordingly. This config is only valid if lazy
expanded posting is enabled. 0 disables the
limit.
--sync-block-duration=15m Repeat interval for syncing the blocks between
local and remote view.
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--version Show application version.
--web.disable Disable Block Viewer UI.
--web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--store.index-header-lazy-download-strategy=eager
Strategy of how to download index headers
lazily. Supported values: eager, lazy.
If eager, always download index header during
initial load. If lazy, download index header
during query time.
--[no-]web.disable Disable Block Viewer UI.
--web.external-prefix="" Static prefix for all HTML links and redirect
URLs in the bucket web UI interface.
Actual endpoints are still served on / or the
@ -299,6 +283,27 @@ Flags:
stripped prefix value in X-Forwarded-Prefix
header. This allows thanos UI to be served on a
sub-path.
--[no-]web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--bucket-web-label=BUCKET-WEB-LABEL
External block label to use as group title in
the bucket web UI
--matcher-cache-size=0 Max number of cached matchers items. Using 0
disables caching.
--[no-]disable-admin-operations
Disable UI/API admin operations like marking
blocks for deletion and no compaction.
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
```

File diff suppressed because it is too large.

View File

@ -23,6 +23,9 @@ Release shepherd responsibilities:
| Release | Time of first RC | Shepherd (GitHub handle) |
|---------|------------------|-------------------------------|
| v0.39.0 | 2025.05.29 | `@GiedriusS` |
| v0.38.0 | 2025.03.25 | `@MichaHoffmann` |
| v0.37.0 | 2024.11.19 | `@saswatamcode` |
| v0.36.0 | 2024.06.26 | `@MichaHoffmann` |
| v0.35.0 | 2024.04.09 | `@saswatamcode` |
| v0.34.0 | 2024.01.14 | `@MichaHoffmann` |

go.mod
View File

@ -1,289 +1,315 @@
module github.com/thanos-io/thanos
go 1.24
go 1.24.0
require (
capnproto.org/go/capnp/v3 v3.0.0-alpha.30
cloud.google.com/go/trace v1.10.12
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace v1.8.3
github.com/KimMachineGun/automemlimit v0.6.1
capnproto.org/go/capnp/v3 v3.1.0-alpha.1
cloud.google.com/go/trace v1.11.4
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace v1.27.0
github.com/KimMachineGun/automemlimit v0.7.3
github.com/alecthomas/units v0.0.0-20240927000941-0f3dac36c52b
github.com/alicebob/miniredis/v2 v2.22.0
github.com/alicebob/miniredis/v2 v2.35.0
github.com/blang/semver/v4 v4.0.0
github.com/bradfitz/gomemcache v0.0.0-20190913173617-a41fca850d0b
github.com/bradfitz/gomemcache v0.0.0-20250403215159-8d39553ac7cf
github.com/caio/go-tdigest v3.1.0+incompatible
github.com/cespare/xxhash/v2 v2.3.0
github.com/chromedp/cdproto v0.0.0-20230802225258-3cf4e6d46a89
github.com/chromedp/chromedp v0.9.2
github.com/cortexproject/promqlsmith v0.0.0-20240506042652-6cfdd9739a5e
github.com/cortexproject/promqlsmith v0.0.0-20250407233056-90db95b1a4e4
github.com/cristalhq/hedgedhttp v0.9.1
github.com/dustin/go-humanize v1.0.1
github.com/efficientgo/core v1.0.0-rc.3
github.com/efficientgo/e2e v0.14.1-0.20230710114240-c316eb95ae5b
github.com/efficientgo/tools/extkingpin v0.0.0-20220817170617-6c25e3b627dd
github.com/efficientgo/tools/extkingpin v0.0.0-20230505153745-6b7392939a60
github.com/facette/natsort v0.0.0-20181210072756-2cd4dd1e2dcb
github.com/fatih/structtag v1.2.0
github.com/felixge/fgprof v0.9.5
github.com/fortytw2/leaktest v1.3.0
github.com/fsnotify/fsnotify v1.8.0
github.com/fsnotify/fsnotify v1.9.0
github.com/go-kit/log v0.2.1
github.com/go-openapi/strfmt v0.23.0
github.com/gogo/protobuf v1.3.2
github.com/gogo/status v1.1.1
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da
github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8
github.com/golang/protobuf v1.5.4
github.com/golang/snappy v0.0.4
github.com/golang/snappy v1.0.0
github.com/google/go-cmp v0.7.0
github.com/google/uuid v1.6.0
github.com/googleapis/gax-go v2.0.2+incompatible
github.com/grpc-ecosystem/go-grpc-middleware/providers/prometheus v1.0.1
github.com/grpc-ecosystem/go-grpc-middleware/v2 v2.1.0
github.com/grpc-ecosystem/go-grpc-middleware/v2 v2.3.2
github.com/hashicorp/golang-lru/v2 v2.0.7
github.com/jpillora/backoff v1.0.0
github.com/json-iterator/go v1.1.12
github.com/klauspost/compress v1.17.11
github.com/klauspost/compress v1.18.0
github.com/leanovate/gopter v0.2.9
github.com/lightstep/lightstep-tracer-go v0.25.0
github.com/lightstep/lightstep-tracer-go v0.26.0
github.com/lovoo/gcloud-opentracing v0.3.0
github.com/miekg/dns v1.1.62
github.com/miekg/dns v1.1.66
github.com/minio/sha256-simd v1.0.1
github.com/mitchellh/go-ps v1.0.0
github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f
github.com/oklog/run v1.1.0
github.com/oklog/ulid v1.3.1
github.com/oklog/ulid v1.3.1 // indirect
github.com/olekukonko/tablewriter v0.0.5
github.com/onsi/gomega v1.34.0
github.com/onsi/gomega v1.36.2
github.com/opentracing/basictracer-go v1.1.0
github.com/opentracing/opentracing-go v1.2.0
github.com/pkg/errors v0.9.1
github.com/prometheus-community/prom-label-proxy v0.8.1-0.20240127162815-c1195f9aabc0
github.com/prometheus/alertmanager v0.27.0
github.com/prometheus/client_golang v1.21.1
github.com/prometheus/client_model v0.6.1
github.com/prometheus/common v0.62.0
github.com/prometheus/exporter-toolkit v0.13.2
github.com/prometheus-community/prom-label-proxy v0.11.1
github.com/prometheus/alertmanager v0.28.1
github.com/prometheus/client_golang v1.22.0
github.com/prometheus/client_model v0.6.2
github.com/prometheus/common v0.63.0
github.com/prometheus/exporter-toolkit v0.14.0
// Prometheus maps version 3.x.y to tags v0.30x.y.
github.com/prometheus/prometheus v0.301.0
github.com/redis/rueidis v1.0.45-alpha.1
github.com/prometheus/prometheus v0.303.1
github.com/redis/rueidis v1.0.61
github.com/seiflotfy/cuckoofilter v0.0.0-20240715131351-a2f2c23f1771
github.com/sony/gobreaker v0.5.0
github.com/sony/gobreaker v1.0.0
github.com/stretchr/testify v1.10.0
github.com/thanos-io/objstore v0.0.0-20241111205755-d1dd89d41f97
github.com/thanos-io/promql-engine v0.0.0-20250302135832-accbf0891a16
github.com/thanos-io/promql-engine v0.0.0-20250522103302-dd83bd8fdb50
github.com/uber/jaeger-client-go v2.30.0+incompatible
github.com/vimeo/galaxycache v0.0.0-20210323154928-b7e5d71c067a
github.com/vimeo/galaxycache v1.3.1
github.com/weaveworks/common v0.0.0-20230728070032-dd9e68f319d5
go.elastic.co/apm v1.15.0
go.elastic.co/apm/module/apmot v1.15.0
go.opentelemetry.io/contrib/propagators/autoprop v0.54.0
go.opentelemetry.io/contrib/samplers/jaegerremote v0.23.0
go.opentelemetry.io/otel v1.35.0
go.opentelemetry.io/otel/bridge/opentracing v1.31.0
go.opentelemetry.io/otel/exporters/jaeger v1.16.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.34.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.33.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.34.0
go.opentelemetry.io/otel/sdk v1.35.0
go.opentelemetry.io/otel/trace v1.35.0
go.opentelemetry.io/contrib/propagators/autoprop v0.61.0
go.opentelemetry.io/contrib/samplers/jaegerremote v0.30.0
go.opentelemetry.io/otel v1.36.0
go.opentelemetry.io/otel/bridge/opentracing v1.36.0
go.opentelemetry.io/otel/exporters/jaeger v1.17.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.36.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.36.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.36.0
go.opentelemetry.io/otel/sdk v1.36.0
go.opentelemetry.io/otel/trace v1.36.0
go.uber.org/atomic v1.11.0
go.uber.org/automaxprocs v1.6.0
go.uber.org/goleak v1.3.0
go4.org/intern v0.0.0-20230525184215-6c62f75575cb
golang.org/x/crypto v0.32.0
golang.org/x/net v0.34.0
golang.org/x/sync v0.10.0
golang.org/x/text v0.21.0
golang.org/x/time v0.8.0
google.golang.org/grpc v1.69.4
golang.org/x/crypto v0.39.0
golang.org/x/net v0.41.0
golang.org/x/sync v0.15.0
golang.org/x/text v0.26.0
golang.org/x/time v0.12.0
google.golang.org/grpc v1.73.0
google.golang.org/grpc/examples v0.0.0-20211119005141-f45e61797429
google.golang.org/protobuf v1.36.3
gopkg.in/alecthomas/kingpin.v2 v2.2.6
google.golang.org/protobuf v1.36.6
gopkg.in/yaml.v2 v2.4.0
gopkg.in/yaml.v3 v3.0.1
)
require (
cloud.google.com/go v0.115.1 // indirect
cloud.google.com/go/auth v0.13.0 // indirect
cloud.google.com/go/auth/oauth2adapt v0.2.6 // indirect
cloud.google.com/go/compute/metadata v0.6.0 // indirect
cloud.google.com/go/iam v1.1.13 // indirect
cloud.google.com/go v0.118.0 // indirect
cloud.google.com/go/auth v0.15.1-0.20250317171031-671eed979bfd // indirect
cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
cloud.google.com/go/compute/metadata v0.7.0 // indirect
cloud.google.com/go/iam v1.3.1 // indirect
cloud.google.com/go/storage v1.43.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.16.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.8.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v1.10.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.3.0 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.2.2 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.18.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.10.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v1.11.1 // indirect
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.1 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.4.2 // indirect
)
require (
github.com/tjhop/slog-gokit v0.1.3
go.opentelemetry.io/collector/pdata v1.22.0
go.opentelemetry.io/collector/semconv v0.116.0
github.com/alecthomas/kingpin/v2 v2.4.0
github.com/oklog/ulid/v2 v2.1.1
github.com/prometheus/otlptranslator v0.0.0-20250527173959-2573485683d5
github.com/tjhop/slog-gokit v0.1.4
go.opentelemetry.io/collector/pdata v1.34.0
go.opentelemetry.io/collector/semconv v0.128.0
)
require github.com/dgryski/go-metro v0.0.0-20200812162917-85c65e2d0165 // indirect
require github.com/dgryski/go-metro v0.0.0-20250106013310-edb8663e5e33 // indirect
require (
github.com/HdrHistogram/hdrhistogram-go v1.1.2 // indirect
github.com/bboreham/go-loser v0.0.0-20230920113527-fcc2c21820a3 // indirect
github.com/cilium/ebpf v0.11.0 // indirect
github.com/containerd/cgroups/v3 v3.0.3 // indirect
github.com/docker/go-units v0.5.0 // indirect
github.com/elastic/go-licenser v0.3.1 // indirect
github.com/elastic/go-licenser v0.4.2 // indirect
github.com/go-ini/ini v1.67.0 // indirect
github.com/go-openapi/runtime v0.27.1 // indirect
github.com/goccy/go-json v0.10.3 // indirect
github.com/godbus/dbus/v5 v5.0.4 // indirect
github.com/golang-jwt/jwt/v5 v5.2.1 // indirect
github.com/google/s2a-go v0.1.8 // indirect
github.com/huaweicloud/huaweicloud-sdk-go-obs v3.23.3+incompatible // indirect
github.com/jcchavezs/porto v0.1.0 // indirect
github.com/go-openapi/runtime v0.28.0 // indirect
github.com/goccy/go-json v0.10.5 // indirect
github.com/golang-jwt/jwt/v5 v5.2.2 // indirect
github.com/google/s2a-go v0.1.9 // indirect
github.com/huaweicloud/huaweicloud-sdk-go-obs v3.25.4+incompatible // indirect
github.com/jcchavezs/porto v0.7.0 // indirect
github.com/leesper/go_rng v0.0.0-20190531154944-a612b043e353 // indirect
github.com/mdlayher/socket v0.4.1 // indirect
github.com/mdlayher/socket v0.5.1 // indirect
github.com/mdlayher/vsock v1.2.1 // indirect
github.com/metalmatze/signal v0.0.0-20210307161603-1c9aa721a97a // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/onsi/ginkgo v1.16.5 // indirect
github.com/opencontainers/runtime-spec v1.0.2 // indirect
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58 // indirect
github.com/sercand/kuberesolver/v4 v4.0.0 // indirect
github.com/zhangyunhao116/umap v0.0.0-20221211160557-cb7705fafa39 // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.54.0 // indirect
go.opentelemetry.io/contrib/propagators/ot v1.29.0 // indirect
go4.org/unsafe/assume-no-moving-gc v0.0.0-20230525183740-e7c30c78aeb2 // indirect
golang.org/x/lint v0.0.0-20210508222113-6edffad5e616 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20250115164207-1a7da9e5054f // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20250115164207-1a7da9e5054f // indirect
k8s.io/apimachinery v0.31.3 // indirect
k8s.io/client-go v0.31.3 // indirect
github.com/zhangyunhao116/umap v0.0.0-20250307031311-0b61e69e958b // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.61.0 // indirect
go.opentelemetry.io/contrib/propagators/ot v1.36.0 // indirect
go4.org/unsafe/assume-no-moving-gc v0.0.0-20231121144256-b99613f794b6 // indirect
golang.org/x/lint v0.0.0-20241112194109-818c5a804067 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20250603155806-513f23925822 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20250603155806-513f23925822 // indirect
k8s.io/apimachinery v0.33.1 // indirect
k8s.io/client-go v0.33.1 // indirect
k8s.io/klog/v2 v2.130.1 // indirect
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8 // indirect
zenhack.net/go/util v0.0.0-20230414204917-531d38494cf5 // indirect
k8s.io/utils v0.0.0-20250604170112-4c0f3b243397 // indirect
)
require (
github.com/GoogleCloudPlatform/opentelemetry-operations-go/internal/resourcemapping v0.32.3 // indirect
github.com/alecthomas/template v0.0.0-20190718012654-fb15b899a751 // indirect
github.com/alicebob/gopher-json v0.0.0-20200520072559-a9ecdc9d1d3a // indirect
github.com/aliyun/aliyun-oss-go-sdk v2.2.2+incompatible // indirect
github.com/GoogleCloudPlatform/opentelemetry-operations-go/internal/resourcemapping v0.52.0 // indirect
github.com/aliyun/aliyun-oss-go-sdk v3.0.2+incompatible // indirect
github.com/armon/go-radix v1.0.0 // indirect
github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2 // indirect
github.com/aws/aws-sdk-go v1.55.5 // indirect
github.com/aws/aws-sdk-go-v2 v1.16.0 // indirect
github.com/aws/aws-sdk-go-v2/config v1.15.1 // indirect
github.com/aws/aws-sdk-go-v2/credentials v1.11.0 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.12.1 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.1.7 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.4.1 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.3.8 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.9.1 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.11.1 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.16.1 // indirect
github.com/aws/smithy-go v1.11.1 // indirect
github.com/baidubce/bce-sdk-go v0.9.111 // indirect
github.com/aws/aws-sdk-go v1.55.7 // indirect
github.com/aws/aws-sdk-go-v2 v1.36.3 // indirect
github.com/aws/aws-sdk-go-v2/config v1.29.15 // indirect
github.com/aws/aws-sdk-go-v2/credentials v1.17.68 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.30 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.34 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.34 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.3 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.3 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.15 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.25.3 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.30.1 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.33.20 // indirect
github.com/aws/smithy-go v1.22.3 // indirect
github.com/baidubce/bce-sdk-go v0.9.230 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/cenkalti/backoff/v4 v4.3.0 // indirect
github.com/cenkalti/backoff/v5 v5.0.2 // indirect
github.com/chromedp/sysutil v1.0.0 // indirect
github.com/clbanning/mxj v1.8.4 // indirect
github.com/cncf/xds/go v0.0.0-20250501225837-2ac532fd4443 // indirect
github.com/coreos/go-systemd/v22 v22.5.0 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/dennwc/varint v1.0.0 // indirect
github.com/edsrzf/mmap-go v1.2.0 // indirect
github.com/elastic/go-sysinfo v1.8.1 // indirect
github.com/elastic/go-windows v1.0.1 // indirect
github.com/elastic/go-sysinfo v1.15.3 // indirect
github.com/elastic/go-windows v1.0.2 // indirect
github.com/envoyproxy/go-control-plane/envoy v1.32.4 // indirect
github.com/fatih/color v1.18.0 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/go-logfmt/logfmt v0.6.0 // indirect
github.com/go-logr/logr v1.4.2 // indirect
github.com/go-logr/logr v1.4.3 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-ole/go-ole v1.2.6 // indirect
github.com/go-openapi/analysis v0.22.2 // indirect
github.com/go-openapi/errors v0.22.0 // indirect
github.com/go-openapi/jsonpointer v0.21.0 // indirect
github.com/go-openapi/jsonreference v0.20.4 // indirect
github.com/go-openapi/loads v0.21.5 // indirect
github.com/go-openapi/spec v0.20.14 // indirect
github.com/go-openapi/swag v0.23.0 // indirect
github.com/go-openapi/validate v0.23.0 // indirect
github.com/go-openapi/analysis v0.23.0 // indirect
github.com/go-openapi/errors v0.22.1 // indirect
github.com/go-openapi/jsonpointer v0.21.1 // indirect
github.com/go-openapi/jsonreference v0.21.0 // indirect
github.com/go-openapi/loads v0.22.0 // indirect
github.com/go-openapi/spec v0.21.0 // indirect
github.com/go-openapi/swag v0.23.1 // indirect
github.com/go-openapi/validate v0.24.0 // indirect
github.com/go-viper/mapstructure/v2 v2.2.1 // indirect
github.com/gobwas/glob v0.2.3 // indirect
github.com/gobwas/httphead v0.1.0 // indirect
github.com/gobwas/pool v0.2.1 // indirect
github.com/gobwas/ws v1.2.1 // indirect
github.com/gofrs/flock v0.8.1 // indirect
github.com/gogo/googleapis v1.4.0 // indirect
github.com/gofrs/flock v0.12.1 // indirect
github.com/gogo/googleapis v1.4.1 // indirect
github.com/google/go-querystring v1.1.0 // indirect
github.com/google/pprof v0.0.0-20241210010833-40e02aabc2ad // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.4 // indirect
github.com/googleapis/gax-go/v2 v2.14.0 // indirect
github.com/gorilla/mux v1.8.0 // indirect
github.com/google/pprof v0.0.0-20250607225305-033d6d78b36a // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.6 // indirect
github.com/googleapis/gax-go/v2 v2.14.1 // indirect
github.com/gorilla/mux v1.8.1 // indirect
github.com/grafana/regexp v0.0.0-20240518133315-a468a5bfb3bc // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.25.1 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.26.3 // indirect
github.com/hashicorp/go-version v1.7.0 // indirect
github.com/jaegertracing/jaeger-idl v0.6.0 // indirect
github.com/jmespath/go-jmespath v0.4.0 // indirect
github.com/joeshaw/multierror v0.0.0-20140124173710-69b34d4ec901 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/julienschmidt/httprouter v1.3.0 // indirect
github.com/klauspost/cpuid/v2 v2.2.8 // indirect
github.com/klauspost/cpuid/v2 v2.2.10 // indirect
github.com/knadh/koanf/maps v0.1.2 // indirect
github.com/knadh/koanf/providers/confmap v1.0.0 // indirect
github.com/knadh/koanf/v2 v2.2.1 // indirect
github.com/kylelemons/godebug v1.1.0 // indirect
github.com/lightstep/lightstep-tracer-common/golang/gogo v0.0.0-20210210170715-a8dfcb80d3a7 // indirect
github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
github.com/mattn/go-runewidth v0.0.13 // indirect
github.com/mailru/easyjson v0.9.0 // indirect
github.com/mattn/go-colorable v0.1.14 // indirect
github.com/mattn/go-runewidth v0.0.16 // indirect
github.com/minio/md5-simd v1.1.2 // indirect
github.com/minio/minio-go/v7 v7.0.80 // indirect
github.com/mitchellh/copystructure v1.2.0 // indirect
github.com/mitchellh/mapstructure v1.5.0 // indirect
github.com/mitchellh/reflectwalk v1.0.2 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/mozillazg/go-httpheader v0.2.1 // indirect
github.com/mozillazg/go-httpheader v0.4.0 // indirect
github.com/ncw/swift v1.0.53 // indirect
github.com/opentracing-contrib/go-grpc v0.0.0-20210225150812-73cb765af46e // indirect
github.com/opentracing-contrib/go-stdlib v1.0.0 // indirect
github.com/oracle/oci-go-sdk/v65 v65.41.1 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/internal/exp/metrics v0.128.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/pdatautil v0.128.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor v0.128.0 // indirect
github.com/opentracing-contrib/go-grpc v0.1.2 // indirect
github.com/opentracing-contrib/go-stdlib v1.1.0 // indirect
github.com/oracle/oci-go-sdk/v65 v65.93.1 // indirect
github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
github.com/prometheus/procfs v0.15.1 // indirect
github.com/prometheus/sigv4 v0.1.0 // indirect
github.com/rivo/uniseg v0.2.0 // indirect
github.com/prometheus/procfs v0.16.1 // indirect
github.com/prometheus/sigv4 v0.1.2 // indirect
github.com/puzpuzpuz/xsync/v3 v3.5.1 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/rs/xid v1.6.0 // indirect
github.com/santhosh-tekuri/jsonschema v1.2.4 // indirect
github.com/shirou/gopsutil/v3 v3.22.9 // indirect
github.com/sirupsen/logrus v1.9.3 // indirect
github.com/stretchr/objx v0.5.2 // indirect
github.com/tencentyun/cos-go-sdk-v5 v0.7.40 // indirect
github.com/tklauser/go-sysconf v0.3.10 // indirect
github.com/tklauser/numcpus v0.4.0 // indirect
github.com/tencentyun/cos-go-sdk-v5 v0.7.66 // indirect
github.com/uber/jaeger-lib v2.4.1+incompatible // indirect
github.com/weaveworks/promrus v1.2.0 // indirect
github.com/yuin/gopher-lua v0.0.0-20210529063254-f4c35e4016d9 // indirect
github.com/yusufpapurcu/wmi v1.2.2 // indirect
github.com/xhit/go-str2duration/v2 v2.1.0 // indirect
github.com/youmark/pkcs8 v0.0.0-20240726163527-a2c0da244d78 // indirect
github.com/yuin/gopher-lua v1.1.1 // indirect
go.elastic.co/apm/module/apmhttp v1.15.0 // indirect
go.elastic.co/fastjson v1.1.0 // indirect
go.mongodb.org/mongo-driver v1.14.0 // indirect
go.elastic.co/fastjson v1.5.1 // indirect
go.mongodb.org/mongo-driver v1.17.4 // indirect
go.opencensus.io v0.24.0 // indirect
go.opentelemetry.io/auto/sdk v1.1.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace v0.58.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.58.0 // indirect
go.opentelemetry.io/contrib/propagators/aws v1.29.0 // indirect
go.opentelemetry.io/contrib/propagators/b3 v1.29.0 // indirect
go.opentelemetry.io/contrib/propagators/jaeger v1.29.0 // indirect
go.opentelemetry.io/otel/metric v1.35.0 // indirect
go.opentelemetry.io/proto/otlp v1.5.0 // indirect
go.opentelemetry.io/collector/component v1.34.0 // indirect
go.opentelemetry.io/collector/confmap v1.34.0 // indirect
go.opentelemetry.io/collector/confmap/xconfmap v0.128.0 // indirect
go.opentelemetry.io/collector/consumer v1.34.0 // indirect
go.opentelemetry.io/collector/featuregate v1.34.0 // indirect
go.opentelemetry.io/collector/internal/telemetry v0.128.0 // indirect
go.opentelemetry.io/collector/pipeline v0.128.0 // indirect
go.opentelemetry.io/collector/processor v1.34.0 // indirect
go.opentelemetry.io/contrib/bridges/otelzap v0.11.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace v0.61.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.61.0 // indirect
go.opentelemetry.io/contrib/propagators/aws v1.36.0 // indirect
go.opentelemetry.io/contrib/propagators/b3 v1.36.0 // indirect
go.opentelemetry.io/contrib/propagators/jaeger v1.36.0 // indirect
go.opentelemetry.io/otel/log v0.12.2 // indirect
go.opentelemetry.io/otel/metric v1.36.0 // indirect
go.opentelemetry.io/proto/otlp v1.7.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
golang.org/x/exp v0.0.0-20240613232115-7f521ea00fb8 // indirect
golang.org/x/mod v0.22.0 // indirect
golang.org/x/oauth2 v0.24.0 // indirect
golang.org/x/sys v0.30.0 // indirect
golang.org/x/tools v0.28.0 // indirect
gonum.org/v1/gonum v0.15.0 // indirect
google.golang.org/api v0.213.0 // indirect
google.golang.org/genproto v0.0.0-20240823204242-4ba0660f739c // indirect
howett.net/plist v0.0.0-20181124034731-591f970eefbb // indirect
go.uber.org/zap v1.27.0 // indirect
golang.org/x/exp v0.0.0-20250606033433-dcc06ee1d476 // indirect
golang.org/x/mod v0.25.0 // indirect
golang.org/x/oauth2 v0.30.0 // indirect
golang.org/x/sys v0.33.0 // indirect
golang.org/x/tools v0.34.0 // indirect
gonum.org/v1/gonum v0.16.0 // indirect
google.golang.org/api v0.228.0 // indirect
google.golang.org/genproto v0.0.0-20250122153221-138b5a5a4fd4 // indirect
howett.net/plist v1.0.1 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
zenhack.net/go/util v0.0.0-20230607025951-8b02fee814ae // indirect
)
replace (
// Pinning capnp due to https://github.com/thanos-io/thanos/issues/7944
capnproto.org/go/capnp/v3 => capnproto.org/go/capnp/v3 v3.0.0-alpha.30
// Using a 3rd-party branch for custom dialer - see https://github.com/bradfitz/gomemcache/pull/86.
// Required by Cortex https://github.com/cortexproject/cortex/pull/3051.
github.com/bradfitz/gomemcache => github.com/themihai/gomemcache v0.0.0-20180902122335-24332e2d58ab
// v3.3.1 with https://github.com/prometheus/prometheus/pull/16252.
github.com/prometheus/prometheus => github.com/thanos-io/thanos-prometheus v0.0.0-20250610133519-082594458a88
// Pin kuberesolver/v5 to support new grpc version. Need to upgrade kuberesolver version on weaveworks/common.
github.com/sercand/kuberesolver/v4 => github.com/sercand/kuberesolver/v5 v5.1.1

go.sum

File diff suppressed because it is too large.

View File

@ -9,7 +9,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/opentracing/opentracing-go"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"

View File

@ -17,7 +17,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/common/route"
"github.com/prometheus/prometheus/model/labels"
"github.com/thanos-io/objstore"

View File

@ -74,7 +74,8 @@ type queryCreator interface {
}
type QueryFactory struct {
mode PromqlQueryMode
mode PromqlQueryMode
disableFallback bool
prometheus *promql.Engine
thanosLocal *engine.Engine
@ -90,6 +91,7 @@ func NewQueryFactory(
enableXFunctions bool,
activeQueryTracker *promql.ActiveQueryTracker,
mode PromqlQueryMode,
disableFallback bool,
) *QueryFactory {
makeOpts := func(registry prometheus.Registerer) engine.Opts {
opts := engine.Opts{
@ -134,6 +136,7 @@ func NewQueryFactory(
prometheus: promEngine,
thanosLocal: thanosLocal,
thanosDistributed: thanosDistributed,
disableFallback: disableFallback,
}
}
@ -159,7 +162,7 @@ func (f *QueryFactory) makeInstantQuery(
res, err = f.thanosLocal.MakeInstantQuery(ctx, q, opts, qry.query, ts)
}
if err != nil {
if engine.IsUnimplemented(err) {
if engine.IsUnimplemented(err) && !f.disableFallback {
// fallback to prometheus
return f.prometheus.NewInstantQuery(ctx, q, opts, qry.query, ts)
}
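
For context on the new `disableFallback` knob, here is a minimal self-contained sketch of the behaviour it controls (illustrative names only, not the Thanos types; the real predicate is `engine.IsUnimplemented` from thanos-io/promql-engine): the Prometheus engine is consulted only for features the Thanos engine does not implement, and only while fallback is still enabled.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnimplemented stands in for the engine.IsUnimplemented check in
// thanos-io/promql-engine; the real predicate lives there.
var errUnimplemented = errors.New("unimplemented by thanos engine")

type factory struct {
	disableFallback bool
}

// evaluate tries the Thanos engine first and falls back to the Prometheus
// engine only for unimplemented features and only when fallback is enabled,
// mirroring the condition added in the hunk above.
func (f factory) evaluate(q string) (string, error) {
	res, err := thanosEngine(q)
	if err != nil {
		if errors.Is(err, errUnimplemented) && !f.disableFallback {
			return prometheusEngine(q)
		}
		return "", err
	}
	return res, nil
}

// Illustrative stand-ins for the two engines.
func thanosEngine(q string) (string, error)     { return "", errUnimplemented }
func prometheusEngine(q string) (string, error) { return "prometheus evaluated: " + q, nil }

func main() {
	fmt.Println(factory{disableFallback: false}.evaluate(`rate(http_requests_total[5m])`))
	fmt.Println(factory{disableFallback: true}.evaluate(`rate(http_requests_total[5m])`))
}
```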

View File

@ -120,7 +120,7 @@ func (g *GRPCAPI) Query(request *querypb.QueryRequest, server querypb.Query_Quer
})
if result.Err != nil {
if request.EnablePartialResponse {
if err := server.Send(querypb.NewQueryWarningsResponse(err)); err != nil {
if err := server.Send(querypb.NewQueryWarningsResponse(result.Err)); err != nil {
return err
}
return nil
@ -273,6 +273,9 @@ func extractQueryStats(qry promql.Query) *querypb.QueryStats {
}
if explQry, ok := qry.(engine.ExplainableQuery); ok {
analyze := explQry.Analyze()
if analyze == nil {
return stats
}
stats.SamplesTotal = analyze.TotalSamples()
stats.PeakSamples = analyze.PeakSamples()
}

View File

@ -407,17 +407,29 @@ func (qapi *QueryAPI) getQueryExplain(query promql.Query) (*engine.ExplainOutput
return eq.Explain(), nil
}
return nil, &api.ApiError{Typ: api.ErrorBadData, Err: errors.Errorf("Query not explainable")}
}
func (qapi *QueryAPI) parseQueryAnalyzeParam(r *http.Request, query promql.Query) (queryTelemetry, error) {
if r.FormValue(QueryAnalyzeParam) == "true" || r.FormValue(QueryAnalyzeParam) == "1" {
if eq, ok := query.(engine.ExplainableQuery); ok {
return processAnalysis(eq.Analyze()), nil
func (qapi *QueryAPI) parseQueryAnalyzeParam(r *http.Request) bool {
return (r.FormValue(QueryAnalyzeParam) == "true" || r.FormValue(QueryAnalyzeParam) == "1")
}
func analyzeQueryOutput(query promql.Query, engineType PromqlEngineType) (queryTelemetry, error) {
if eq, ok := query.(engine.ExplainableQuery); ok {
if analyze := eq.Analyze(); analyze != nil {
return processAnalysis(analyze), nil
} else {
return queryTelemetry{}, errors.Errorf("Query: %v not analyzable", query)
}
return queryTelemetry{}, errors.Errorf("Query not analyzable; change engine to 'thanos'")
}
return queryTelemetry{}, nil
var warning error
if engineType == PromqlEngineThanos {
warning = errors.New("Query fallback to prometheus engine; not analyzable.")
} else {
warning = errors.New("Query not analyzable; change engine to 'thanos'.")
}
return queryTelemetry{}, warning
}
func processAnalysis(a *engine.AnalyzeOutputNode) queryTelemetry {
@ -530,7 +542,6 @@ func (qapi *QueryAPI) queryExplain(r *http.Request) (interface{}, []error, *api.
var qErr error
qry, qErr = qapi.queryCreate.makeInstantQuery(ctx, engineParam, queryable, remoteEndpoints, planOrQuery{query: queryStr}, queryOpts, ts)
return qErr
}); err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorBadData, Err: err}, func() {}
}
@ -614,6 +625,7 @@ func (qapi *QueryAPI) query(r *http.Request) (interface{}, []error, *api.ApiErro
var (
qry promql.Query
seriesStats []storepb.SeriesStatsCounter
warnings []error
)
if err := tracing.DoInSpanWithErr(ctx, "instant_query_create", func(ctx context.Context) error {
@ -638,16 +650,10 @@ func (qapi *QueryAPI) query(r *http.Request) (interface{}, []error, *api.ApiErro
var qErr error
qry, qErr = qapi.queryCreate.makeInstantQuery(ctx, engineParam, queryable, remoteEndpoints, planOrQuery{query: queryStr}, queryOpts, ts)
return qErr
}); err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorBadData, Err: err}, func() {}
}
analysis, err := qapi.parseQueryAnalyzeParam(r, qry)
if err != nil {
return nil, nil, apiErr, func() {}
}
if err := tracing.DoInSpanWithErr(ctx, "query_gate_ismyturn", qapi.gate.Start); err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorExec, Err: err}, qry.Close
}
@ -669,6 +675,15 @@ func (qapi *QueryAPI) query(r *http.Request) (interface{}, []error, *api.ApiErro
}
return nil, nil, &api.ApiError{Typ: api.ErrorExec, Err: res.Err}, qry.Close
}
warnings = append(warnings, res.Warnings.AsErrors()...)
var analysis queryTelemetry
if qapi.parseQueryAnalyzeParam(r) {
analysis, err = analyzeQueryOutput(qry, engineParam)
if err != nil {
warnings = append(warnings, err)
}
}
aggregator := qapi.seriesStatsAggregatorFactory.NewAggregator(tenant)
for i := range seriesStats {
@ -686,7 +701,7 @@ func (qapi *QueryAPI) query(r *http.Request) (interface{}, []error, *api.ApiErro
Result: res.Value,
Stats: qs,
QueryAnalysis: analysis,
}, res.Warnings.AsErrors(), nil, qry.Close
}, warnings, nil, qry.Close
}
func (qapi *QueryAPI) queryRangeExplain(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
@ -813,7 +828,6 @@ func (qapi *QueryAPI) queryRangeExplain(r *http.Request) (interface{}, []error,
var qErr error
qry, qErr = qapi.queryCreate.makeRangeQuery(ctx, engineParam, queryable, remoteEndpoints, planOrQuery{query: queryStr}, queryOpts, start, end, step)
return qErr
}); err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorBadData, Err: err}, func() {}
}
@ -923,6 +937,7 @@ func (qapi *QueryAPI) queryRange(r *http.Request) (interface{}, []error, *api.Ap
var (
qry promql.Query
seriesStats []storepb.SeriesStatsCounter
warnings []error
)
if err := tracing.DoInSpanWithErr(ctx, "range_query_create", func(ctx context.Context) error {
queryable := qapi.queryableCreate(
@ -946,16 +961,10 @@ func (qapi *QueryAPI) queryRange(r *http.Request) (interface{}, []error, *api.Ap
var qErr error
qry, qErr = qapi.queryCreate.makeRangeQuery(ctx, engineParam, queryable, remoteEndpoints, planOrQuery{query: queryStr}, queryOpts, start, end, step)
return qErr
}); err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorBadData, Err: err}, func() {}
}
analysis, err := qapi.parseQueryAnalyzeParam(r, qry)
if err != nil {
return nil, nil, apiErr, func() {}
}
if err := tracing.DoInSpanWithErr(ctx, "query_gate_ismyturn", qapi.gate.Start); err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorExec, Err: err}, qry.Close
}
@ -964,7 +973,6 @@ func (qapi *QueryAPI) queryRange(r *http.Request) (interface{}, []error, *api.Ap
var res *promql.Result
tracing.DoInSpan(ctx, "range_query_exec", func(ctx context.Context) {
res = qry.Exec(ctx)
})
beforeRange := time.Now()
if res.Err != nil {
@ -976,6 +984,16 @@ func (qapi *QueryAPI) queryRange(r *http.Request) (interface{}, []error, *api.Ap
}
return nil, nil, &api.ApiError{Typ: api.ErrorExec, Err: res.Err}, qry.Close
}
warnings = append(warnings, res.Warnings.AsErrors()...)
var analysis queryTelemetry
if qapi.parseQueryAnalyzeParam(r) {
analysis, err = analyzeQueryOutput(qry, engineParam)
if err != nil {
warnings = append(warnings, err)
}
}
aggregator := qapi.seriesStatsAggregatorFactory.NewAggregator(tenant)
for i := range seriesStats {
aggregator.Aggregate(seriesStats[i])
@ -992,7 +1010,7 @@ func (qapi *QueryAPI) queryRange(r *http.Request) (interface{}, []error, *api.Ap
Result: res.Value,
Stats: qs,
QueryAnalysis: analysis,
}, res.Warnings.AsErrors(), nil, qry.Close
}, warnings, nil, qry.Close
}
func (qapi *QueryAPI) labelValues(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
@ -1145,7 +1163,6 @@ func (qapi *QueryAPI) series(r *http.Request) (interface{}, []error, *api.ApiErr
nil,
query.NoopSeriesStatsReporter,
).Querier(timestamp.FromTime(start), timestamp.FromTime(end))
if err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorExec, Err: err}, func() {}
}

View File

@ -99,6 +99,7 @@ var (
true,
nil,
PromqlQueryModeLocal,
false,
)
emptyRemoteEndpointsCreate = query.NewRemoteEndpointsCreator(

View File

@ -19,7 +19,7 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/thanos-io/objstore"

View File

@ -18,7 +18,8 @@ import (
"github.com/thanos-io/thanos/pkg/extprom"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"

View File

@ -18,7 +18,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/golang/groupcache/singleflight"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"

View File

@ -18,7 +18,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
promtest "github.com/prometheus/client_golang/prometheus/testutil"

View File

@ -16,7 +16,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/storage"

View File

@ -21,7 +21,7 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"

View File

@ -15,7 +15,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/prometheus/model/labels"

View File

@ -12,7 +12,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"

View File

@ -13,7 +13,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
promtestutil "github.com/prometheus/client_golang/prometheus/testutil"
"github.com/prometheus/prometheus/model/labels"
"github.com/thanos-io/objstore/providers/filesystem"

View File

@ -7,10 +7,14 @@ import (
"context"
"sync"
"time"
"unsafe"
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
xsync "golang.org/x/sync/singleflight"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/thanos-io/objstore"
@ -48,6 +52,7 @@ type ReaderPool struct {
// Keep track of all readers managed by the pool.
lazyReadersMx sync.Mutex
lazyReaders map[*LazyBinaryReader]struct{}
lazyReadersSF xsync.Group
lazyDownloadFunc LazyDownloadIndexHeaderFunc
}
@ -122,18 +127,16 @@ func NewReaderPool(logger log.Logger, lazyReaderEnabled bool, lazyReaderIdleTime
// with lazy reader enabled, this function will return a lazy reader. The returned lazy reader
// is tracked by the pool and automatically closed once the idle timeout expires.
func (p *ReaderPool) NewBinaryReader(ctx context.Context, logger log.Logger, bkt objstore.BucketReader, dir string, id ulid.ULID, postingOffsetsInMemSampling int, meta *metadata.Meta) (Reader, error) {
var reader Reader
var err error
if p.lazyReaderEnabled {
reader, err = NewLazyBinaryReader(ctx, logger, bkt, dir, id, postingOffsetsInMemSampling, p.metrics.lazyReader, p.metrics.binaryReader, p.onLazyReaderClosed, p.lazyDownloadFunc(meta))
} else {
reader, err = NewBinaryReader(ctx, logger, bkt, dir, id, postingOffsetsInMemSampling, p.metrics.binaryReader)
if !p.lazyReaderEnabled {
return NewBinaryReader(ctx, logger, bkt, dir, id, postingOffsetsInMemSampling, p.metrics.binaryReader)
}
if err != nil {
return nil, err
}
idBytes := id.Bytes()
lazyReader, err, _ := p.lazyReadersSF.Do(*(*string)(unsafe.Pointer(&idBytes)), func() (interface{}, error) {
return NewLazyBinaryReader(ctx, logger, bkt, dir, id, postingOffsetsInMemSampling, p.metrics.lazyReader, p.metrics.binaryReader, p.onLazyReaderClosed, p.lazyDownloadFunc(meta))
})
reader := lazyReader.(Reader)
// Keep track of lazy readers only if required.
if p.lazyReaderEnabled && p.lazyReaderIdleTimeout > 0 {

View File

@ -6,12 +6,16 @@ package indexheader
import (
"context"
"path/filepath"
"sync"
"testing"
"time"
"github.com/go-kit/log"
"github.com/prometheus/client_golang/prometheus"
promtestutil "github.com/prometheus/client_golang/prometheus/testutil"
"github.com/prometheus/prometheus/model/labels"
"github.com/stretchr/testify/require"
"github.com/thanos-io/objstore"
"github.com/thanos-io/objstore/providers/filesystem"
"github.com/efficientgo/core/testutil"
@ -132,3 +136,60 @@ func TestReaderPool_ShouldCloseIdleLazyReaders(t *testing.T) {
testutil.Equals(t, float64(2), promtestutil.ToFloat64(metrics.lazyReader.loadCount))
testutil.Equals(t, float64(2), promtestutil.ToFloat64(metrics.lazyReader.unloadCount))
}
func TestReaderPool_MultipleReaders(t *testing.T) {
ctx := context.Background()
blkDir := t.TempDir()
bkt := objstore.NewInMemBucket()
b1, err := e2eutil.CreateBlock(ctx, blkDir, []labels.Labels{
labels.New(labels.Label{Name: "a", Value: "1"}),
labels.New(labels.Label{Name: "a", Value: "2"}),
labels.New(labels.Label{Name: "a", Value: "3"}),
labels.New(labels.Label{Name: "a", Value: "4"}),
labels.New(labels.Label{Name: "b", Value: "1"}),
}, 100, 0, 1000, labels.New(labels.Label{Name: "ext1", Value: "val1"}), 124, metadata.NoneFunc, nil)
testutil.Ok(t, err)
require.NoError(t, block.Upload(ctx, log.NewNopLogger(), bkt, filepath.Join(blkDir, b1.String()), metadata.NoneFunc))
readerPool := NewReaderPool(
log.NewNopLogger(),
true,
time.Minute,
NewReaderPoolMetrics(prometheus.NewRegistry()),
AlwaysEagerDownloadIndexHeader,
)
dlDir := t.TempDir()
m, err := metadata.ReadFromDir(filepath.Join(blkDir, b1.String()))
testutil.Ok(t, err)
startWg := &sync.WaitGroup{}
startWg.Add(1)
waitWg := &sync.WaitGroup{}
const readersCount = 10
waitWg.Add(readersCount)
for i := 0; i < readersCount; i++ {
go func() {
defer waitWg.Done()
t.Logf("waiting")
startWg.Wait()
t.Logf("starting")
br, err := readerPool.NewBinaryReader(ctx, log.NewNopLogger(), bkt, dlDir, b1, 32, m)
testutil.Ok(t, err)
t.Cleanup(func() {
testutil.Ok(t, br.Close())
})
}()
}
startWg.Done()
waitWg.Wait()
}
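
The pool change and the race repro above boil down to the standard singleflight pattern: concurrent callers asking for the same key share one invocation of the expensive creation function, so the shared on-disk index-header is only built once. A minimal standalone sketch (not Thanos code; the key and values are illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"

	"golang.org/x/sync/singleflight"
)

func main() {
	var (
		g       singleflight.Group
		created atomic.Int32
		wg      sync.WaitGroup
	)
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// All goroutines use the same key (think: the block ULID), so the
			// expensive creation function runs once while the calls overlap.
			v, _, shared := g.Do("block-ulid", func() (interface{}, error) {
				created.Add(1)
				return "lazy reader", nil
			})
			_, _ = v, shared
		}()
	}
	wg.Wait()
	// Typically prints 1; singleflight only deduplicates calls that overlap in time.
	fmt.Println("creations:", created.Load())
}
```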

View File

@ -10,7 +10,7 @@ import (
"path"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/thanos-io/objstore"

View File

@ -12,7 +12,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/thanos-io/objstore"
"github.com/thanos-io/thanos/pkg/testutil/custom"

View File

@ -16,7 +16,8 @@ import (
"path/filepath"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/model/relabel"
@ -209,6 +210,11 @@ func (m Meta) WriteToDir(logger log.Logger, dir string) error {
runutil.CloseWithLogOnErr(logger, f, "close meta")
return err
}
// Force the kernel to persist the file on disk to avoid data loss if the host crashes.
if err := f.Sync(); err != nil {
return err
}
if err := f.Close(); err != nil {
return err
}
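
The added `f.Sync()` is the "flush before publish" step of the usual crash-safe write sequence. For comparison, a generic standalone sketch of that pattern (the helper name is made up here, not the Thanos code; a fully durable variant would also fsync the parent directory after the rename):

```go
package main

import (
	"os"
	"path/filepath"
)

// writeFileAtomic writes to a temporary file, syncs it, closes it, then renames
// it into place, so a crash never leaves a partially written file visible.
func writeFileAtomic(dir, name string, data []byte) error {
	tmp, err := os.CreateTemp(dir, name+".tmp-*")
	if err != nil {
		return err
	}
	// Best-effort cleanup; after a successful rename the temp name is already gone.
	defer os.Remove(tmp.Name())

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	// Force the kernel to persist the contents before the rename publishes them.
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, name))
}

func main() {
	_ = writeFileAtomic(os.TempDir(), "meta.json", []byte(`{"version":1}`))
}
```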

View File

@ -9,7 +9,8 @@ import (
"testing"
"github.com/efficientgo/core/testutil"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/prometheus/tsdb"
)

View File

@ -9,7 +9,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
"github.com/thanos-io/objstore"

View File

@ -12,7 +12,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
promtest "github.com/prometheus/client_golang/prometheus/testutil"

View File

@ -17,7 +17,7 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/golang/groupcache/singleflight"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/opentracing/opentracing-go"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"

View File

@ -16,7 +16,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
promtest "github.com/prometheus/client_golang/prometheus/testutil"

View File

@ -15,7 +15,8 @@ import (
"github.com/go-kit/log"
"github.com/oklog/run"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"

View File

@ -15,7 +15,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/prometheus/model/histogram"
"github.com/prometheus/prometheus/model/labels"

View File

@ -9,7 +9,8 @@ import (
"testing"
"github.com/efficientgo/core/testutil"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/storage"
"github.com/prometheus/prometheus/tsdb/chunkenc"

View File

@ -10,7 +10,7 @@ import (
"path/filepath"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/thanos-io/objstore"

View File

@ -12,7 +12,7 @@ import (
"testing"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"

View File

@ -10,7 +10,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/thanos-io/objstore"

View File

@ -13,7 +13,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
promtest "github.com/prometheus/client_golang/prometheus/testutil"

View File

@ -14,7 +14,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/common/model"
"github.com/prometheus/prometheus/model/labels"

View File

@ -4,7 +4,7 @@
package extflag
import (
"gopkg.in/alecthomas/kingpin.v2"
"github.com/alecthomas/kingpin/v2"
)
type FlagClause interface {

View File

@ -9,60 +9,14 @@ import (
"sort"
"text/template"
"github.com/alecthomas/kingpin/v2"
"github.com/go-kit/log"
"github.com/oklog/run"
"github.com/opentracing/opentracing-go"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"gopkg.in/alecthomas/kingpin.v2"
)
const UsageTemplate = `{{define "FormatCommand"}}\
{{if .FlagSummary}} {{.FlagSummary}}{{end}}\
{{range .Args}} {{if not .Required}}[{{end}}<{{.Name}}>{{if .Value|IsCumulative}}...{{end}}{{if not .Required}}]{{end}}{{end}}\
{{end}}\
{{define "FormatCommands"}}\
{{range .FlattenedCommands}}\
{{if not .Hidden}}\
{{.FullCommand}}{{if .Default}}*{{end}}{{template "FormatCommand" .}}
{{.Help|Wrap 4}}
{{end}}\
{{end}}\
{{end}}\
{{define "FormatUsage"}}\
{{template "FormatCommand" .}}{{if .Commands}} <command> [<args> ...]{{end}}
{{if .Help}}
{{.Help|Wrap 0}}\
{{end}}\
{{end}}\
{{if .Context.SelectedCommand}}\
usage: {{.App.Name}} {{.Context.SelectedCommand}}{{template "FormatUsage" .Context.SelectedCommand}}
{{else}}\
usage: {{.App.Name}}{{template "FormatUsage" .App}}
{{end}}\
{{if .Context.Flags}}\
Flags:
{{alphabeticalSort .Context.Flags|FlagsToTwoColumns|FormatTwoColumns}}
{{end}}\
{{if .Context.Args}}\
Args:
{{.Context.Args|ArgsToTwoColumns|FormatTwoColumns}}
{{end}}\
{{if .Context.SelectedCommand}}\
{{if len .Context.SelectedCommand.Commands}}\
Subcommands:
{{template "FormatCommands" .Context.SelectedCommand}}
{{end}}\
{{else if .App.Commands}}\
Commands:
{{template "FormatCommands" .App}}
{{end}}\
`
type FlagClause interface {
Flag(name, help string) *kingpin.FlagClause
}
@ -87,7 +41,6 @@ type App struct {
// NewApp returns new App.
func NewApp(app *kingpin.Application) *App {
app.HelpFlag.Short('h')
app.UsageTemplate(UsageTemplate)
app.UsageFuncs(template.FuncMap{
"alphabeticalSort": func(data []*kingpin.FlagModel) []*kingpin.FlagModel {
sort.Slice(data, func(i, j int) bool { return data[i].Name < data[j].Name })

View File

@ -7,10 +7,10 @@ import (
"fmt"
"strings"
"github.com/alecthomas/kingpin/v2"
extflag "github.com/efficientgo/tools/extkingpin"
"github.com/pkg/errors"
"github.com/prometheus/common/model"
"gopkg.in/alecthomas/kingpin.v2"
)
func ModelDuration(flags *kingpin.FlagClause) *model.Duration {

View File

@ -9,7 +9,8 @@ import (
"math/rand"
"time"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/opentracing/opentracing-go"
"go.opentelemetry.io/otel/trace"

View File

@ -6,9 +6,9 @@ package model
import (
"time"
"github.com/alecthomas/kingpin/v2"
"github.com/prometheus/common/model"
"github.com/prometheus/prometheus/model/timestamp"
"gopkg.in/alecthomas/kingpin.v2"
)
// TimeOrDurationValue is a custom kingping parser for time in RFC3339

View File

@ -7,8 +7,8 @@ import (
"testing"
"time"
"github.com/alecthomas/kingpin/v2"
"github.com/prometheus/prometheus/model/timestamp"
"gopkg.in/alecthomas/kingpin.v2"
"github.com/efficientgo/core/testutil"
"github.com/thanos-io/thanos/pkg/model"

View File

@ -14,7 +14,8 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/common/model"
"github.com/prometheus/prometheus/config"
"github.com/prometheus/prometheus/model/labels"

View File

@ -466,9 +466,12 @@ func TestEndpointSetUpdate_EndpointComingOnline(t *testing.T) {
func TestEndpointSetUpdate_StrictEndpointMetadata(t *testing.T) {
t.Parallel()
info := sidecarInfo
info.Store.MinTime = 111
info.Store.MaxTime = 222
infoCopy := *sidecarInfo
infoCopy.Store = &infopb.StoreInfo{
MinTime: 111,
MaxTime: 222,
}
info := &infoCopy
endpoints, err := startTestEndpoints([]testEndpointMeta{
{
err: fmt.Errorf("endpoint unavailable"),

View File

@ -265,9 +265,6 @@ eval instant at 0m label_replace(testmetric, "dst", "", "dst", ".*")
# label_replace fails when the regex is invalid.
eval_fail instant at 0m label_replace(testmetric, "dst", "value-$1", "src", "(.*")
# label_replace fails when the destination label name is not a valid Prometheus label name.
eval_fail instant at 0m label_replace(testmetric, "invalid-label-name", "", "src", "(.*)")
# label_replace fails when there would be duplicated identical output label sets.
eval_fail instant at 0m label_replace(testmetric, "src", "", "", "")

View File

@ -213,6 +213,7 @@ type Config struct {
DefaultTenant string
TenantCertField string
EnableXFunctions bool
EnableFeatures []string
}
// QueryRangeConfig holds the config for query range tripperware.

View File

@ -380,19 +380,19 @@ func sortPlanForQuery(q string) (sortPlan, error) {
if err != nil {
return 0, err
}
// Check if the root expression is topk or bottomk
// Check if the root expression is topk, bottomk, limitk or limit_ratio
if aggr, ok := expr.(*parser.AggregateExpr); ok {
if aggr.Op == parser.TOPK || aggr.Op == parser.BOTTOMK {
if aggr.Op == parser.TOPK || aggr.Op == parser.BOTTOMK || aggr.Op == parser.LIMITK || aggr.Op == parser.LIMIT_RATIO {
return mergeOnly, nil
}
}
checkForSort := func(expr parser.Expr) (sortAsc, sortDesc bool) {
if n, ok := expr.(*parser.Call); ok {
if n.Func != nil {
if n.Func.Name == "sort" {
if n.Func.Name == "sort" || n.Func.Name == "sort_by_label" {
sortAsc = true
}
if n.Func.Name == "sort_desc" {
if n.Func.Name == "sort_desc" || n.Func.Name == "sort_by_label_desc" {
sortDesc = true
}
}

View File
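
Illustrative sketch (not part of the diff): the hunk above extends the merge-only check to limitk and limit_ratio and the sort detection to sort_by_label / sort_by_label_desc. The standalone helper below, mergeOnlyRoot (an invented name), shows the same root-expression check against the upstream PromQL parser; note that limitk and limit_ratio parse only when the parser's experimental functions are enabled.

package main

import (
    "fmt"

    "github.com/prometheus/prometheus/promql/parser"
)

// mergeOnlyRoot mirrors the root-expression check above: a query whose
// outermost aggregation already limits the series set can be merged as-is.
func mergeOnlyRoot(q string) (bool, error) {
    expr, err := parser.ParseExpr(q)
    if err != nil {
        return false, err
    }
    if aggr, ok := expr.(*parser.AggregateExpr); ok {
        switch aggr.Op {
        case parser.TOPK, parser.BOTTOMK, parser.LIMITK, parser.LIMIT_RATIO:
            return true, nil
        }
    }
    return false, nil
}

func main() {
    // limitk/limit_ratio require the parser's experimental functions to be
    // enabled; topk works out of the box.
    ok, err := mergeOnlyRoot(`topk(5, http_requests_total)`)
    fmt.Println(ok, err) // true <nil>
}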

@ -42,6 +42,27 @@ const (
DefaultCapNProtoPort string = "19391"
)
type endpoints []Endpoint
func (e endpoints) Len() int {
return len(e)
}
func (e endpoints) Less(i, j int) bool {
// Sort by address first, then by CapNProtoAddress.
// First sort by address, then by CapNProtoAddress, then by AZ.
if e[i].Address == e[j].Address {
if e[i].CapNProtoAddress == e[j].CapNProtoAddress {
return e[i].AZ < e[j].AZ
}
return e[i].CapNProtoAddress < e[j].CapNProtoAddress
}
return e[i].Address < e[j].Address
}
func (e endpoints) Swap(i, j int) {
e[i], e[j] = e[j], e[i]
}
type Endpoint struct {
Address string `json:"address"`
CapNProtoAddress string `json:"capnproto_address"`
@ -104,6 +125,23 @@ type HashringConfig struct {
Endpoints []Endpoint `json:"endpoints"`
Algorithm HashringAlgorithm `json:"algorithm,omitempty"`
ExternalLabels labels.Labels `json:"external_labels,omitempty"`
// If non-zero then enable shuffle sharding.
ShuffleShardingConfig ShuffleShardingConfig `json:"shuffle_sharding_config,omitempty"`
}
type ShuffleShardingOverrideConfig struct {
ShardSize int `json:"shard_size"`
Tenants []string `json:"tenants,omitempty"`
TenantMatcherType tenantMatcher `json:"tenant_matcher_type,omitempty"`
}
type ShuffleShardingConfig struct {
ShardSize int `json:"shard_size"`
CacheSize int `json:"cache_size"`
// ZoneAwarenessDisabled disables zone awareness. We still try to spread the load
// across the available zones, but we don't try to balance the shards across zones.
ZoneAwarenessDisabled bool `json:"zone_awareness_disabled"`
Overrides []ShuffleShardingOverrideConfig `json:"overrides,omitempty"`
}
type tenantMatcher string

View File
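
Illustrative sketch (not part of the diff): a hashring entry with the new shuffle sharding fields populated in Go. The endpoint addresses, shard sizes and the tenant pattern are made up, and shuffle sharding only applies with the ketama algorithm, as enforced further down in this diff.

package main

import (
    "encoding/json"
    "fmt"

    "github.com/thanos-io/thanos/pkg/receive"
)

func main() {
    cfg := receive.HashringConfig{
        Endpoints: []receive.Endpoint{
            {Address: "node-1:10901", AZ: "az-1"},
            {Address: "node-2:10901", AZ: "az-2"},
            {Address: "node-3:10901", AZ: "az-3"},
        },
        Algorithm: receive.AlgorithmKetama, // shuffle sharding requires ketama
        ShuffleShardingConfig: receive.ShuffleShardingConfig{
            ShardSize: 2,
            CacheSize: 100,
            Overrides: []receive.ShuffleShardingOverrideConfig{
                // Hypothetical tenant pattern, purely for illustration.
                {Tenants: []string{"heavy-*"}, TenantMatcherType: receive.TenantMatcherGlob, ShardSize: 3},
            },
        },
    }
    out, _ := json.MarshalIndent(cfg, "", "  ")
    fmt.Println(string(out)) // prints the shuffle_sharding_config JSON shape
}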

@ -15,7 +15,7 @@ import (
"time"
"github.com/cespare/xxhash/v2"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"

View File

@ -10,7 +10,8 @@ import (
"errors"
"fmt"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/storage"
prom_tsdb "github.com/prometheus/prometheus/tsdb"

View File

@ -37,6 +37,7 @@ import (
"github.com/prometheus/prometheus/tsdb"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/trace"
"go.uber.org/atomic"
"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
@ -141,6 +142,9 @@ type Handler struct {
writeSamplesTotal *prometheus.HistogramVec
writeTimeseriesTotal *prometheus.HistogramVec
pendingWriteRequests prometheus.Gauge
pendingWriteRequestsCounter atomic.Int32
Limiter *Limiter
}
@ -222,6 +226,12 @@ func NewHandler(logger log.Logger, o *Options) *Handler {
Buckets: []float64{10, 50, 100, 500, 1000, 5000, 10000},
}, []string{"code", "tenant"},
),
pendingWriteRequests: promauto.With(registerer).NewGauge(
prometheus.GaugeOpts{
Name: "thanos_receive_pending_write_requests",
Help: "The number of pending write requests.",
},
),
}
h.forwardRequests.WithLabelValues(labelSuccess)
@ -324,6 +334,8 @@ func (h *Handler) Hashring(hashring Hashring) {
level.Error(h.logger).Log("msg", "closing gRPC connection failed, we might have leaked a file descriptor", "addr", node, "err", err.Error())
}
}
h.hashring.Close()
}
h.hashring = hashring
@ -1015,6 +1027,11 @@ func (h *Handler) sendRemoteWrite(
}
h.peers.markPeerAvailable(endpoint)
} else {
h.forwardRequests.WithLabelValues(labelError).Inc()
if !alreadyReplicated {
h.replications.WithLabelValues(labelError).Inc()
}
// Check if peer connection is unavailable, update the peer state to avoid spamming that peer.
if st, ok := status.FromError(err); ok {
if st.Code() == codes.Unavailable {
@ -1053,6 +1070,9 @@ func (h *Handler) RemoteWrite(ctx context.Context, r *storepb.WriteRequest) (*st
span, ctx := tracing.StartSpan(ctx, "receive_grpc")
defer span.Finish()
h.pendingWriteRequests.Set(float64(h.pendingWriteRequestsCounter.Add(1)))
defer h.pendingWriteRequestsCounter.Add(-1)
_, err := h.handleRequest(ctx, uint64(r.Replica), r.Tenant, &prompb.WriteRequest{Timeseries: r.Timeseries})
if err != nil {
level.Debug(h.logger).Log("msg", "failed to handle request", "err", err)

View File

@ -281,7 +281,7 @@ func newTestHandlerHashring(
hashringAlgo = AlgorithmHashmod
}
hashring, err := NewMultiHashring(hashringAlgo, replicationFactor, cfg)
hashring, err := NewMultiHashring(hashringAlgo, replicationFactor, cfg, prometheus.NewRegistry())
if err != nil {
return nil, nil, nil, err
}
@ -1108,6 +1108,7 @@ func benchmarkHandlerMultiTSDBReceiveRemoteWrite(b testutil.TB) {
"tenant_id",
nil,
false,
false,
metadata.NoneFunc,
)
defer func() { testutil.Ok(b, m.Close()) }()

View File

@ -4,19 +4,26 @@
package receive
import (
"crypto/md5"
"encoding/binary"
"fmt"
"math"
"math/rand"
"path/filepath"
"slices"
"sort"
"strconv"
"strings"
"sync"
"unsafe"
"github.com/cespare/xxhash/v2"
"github.com/go-kit/log"
"github.com/go-kit/log/level"
lru "github.com/hashicorp/golang-lru/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
"github.com/thanos-io/thanos/pkg/store/labelpb"
"github.com/thanos-io/thanos/pkg/store/storepb/prompb"
@ -51,22 +58,19 @@ func (i *insufficientNodesError) Error() string {
// for a specified tenant.
// It returns the node and any error encountered.
type Hashring interface {
// Get returns the first node that should handle the given tenant and time series.
Get(tenant string, timeSeries *prompb.TimeSeries) (Endpoint, error)
// GetN returns the nth node that should handle the given tenant and time series.
GetN(tenant string, timeSeries *prompb.TimeSeries, n uint64) (Endpoint, error)
// Nodes returns a sorted slice of nodes that are in this hashring. Addresses could be duplicated
// if, for example, the same address is used for multiple tenants in the multi-hashring.
Nodes() []Endpoint
Close()
}
// SingleNodeHashring always returns the same node.
type SingleNodeHashring string
// Get implements the Hashring interface.
func (s SingleNodeHashring) Get(tenant string, ts *prompb.TimeSeries) (Endpoint, error) {
return s.GetN(tenant, ts, 0)
}
func (s SingleNodeHashring) Close() {}
func (s SingleNodeHashring) Nodes() []Endpoint {
return []Endpoint{{Address: string(s), CapNProtoAddress: string(s)}}
@ -86,6 +90,8 @@ func (s SingleNodeHashring) GetN(_ string, _ *prompb.TimeSeries, n uint64) (Endp
// simpleHashring represents a group of nodes handling write requests by hashmoding individual series.
type simpleHashring []Endpoint
func (s simpleHashring) Close() {}
func newSimpleHashring(endpoints []Endpoint) (Hashring, error) {
for i := range endpoints {
if endpoints[i].AZ != "" {
@ -138,6 +144,8 @@ type ketamaHashring struct {
numEndpoints uint64
}
func (s ketamaHashring) Close() {}
func newKetamaHashring(endpoints []Endpoint, sectionsPerNode int, replicationFactor uint64) (*ketamaHashring, error) {
numSections := len(endpoints) * sectionsPerNode
@ -286,6 +294,12 @@ type multiHashring struct {
nodes []Endpoint
}
func (s *multiHashring) Close() {
for _, h := range s.hashrings {
h.Close()
}
}
// Get returns a target to handle the given tenant and time series.
func (m *multiHashring) Get(tenant string, ts *prompb.TimeSeries) (Endpoint, error) {
return m.GetN(tenant, ts, 0)
@ -335,11 +349,306 @@ func (m *multiHashring) Nodes() []Endpoint {
return m.nodes
}
// shuffleShardHashring wraps a hashring implementation and applies shuffle sharding logic
// to limit which nodes are used for each tenant.
type shuffleShardHashring struct {
baseRing Hashring
shuffleShardingConfig ShuffleShardingConfig
replicationFactor uint64
nodes []Endpoint
cache *lru.Cache[string, *ketamaHashring]
metrics *shuffleShardCacheMetrics
}
func (s *shuffleShardHashring) Close() {
s.metrics.close()
}
func (s *shuffleShardCacheMetrics) close() {
s.reg.Unregister(s.requestsTotal)
s.reg.Unregister(s.hitsTotal)
s.reg.Unregister(s.numItems)
s.reg.Unregister(s.maxItems)
s.reg.Unregister(s.evicted)
}
type shuffleShardCacheMetrics struct {
requestsTotal prometheus.Counter
hitsTotal prometheus.Counter
numItems prometheus.Gauge
maxItems prometheus.Gauge
evicted prometheus.Counter
reg prometheus.Registerer
}
func newShuffleShardCacheMetrics(reg prometheus.Registerer, hashringName string) *shuffleShardCacheMetrics {
reg = prometheus.WrapRegistererWith(prometheus.Labels{"hashring": hashringName}, reg)
return &shuffleShardCacheMetrics{
reg: reg,
requestsTotal: promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "thanos_shuffle_shard_cache_requests_total",
Help: "Total number of cache requests for shuffle shard subrings",
}),
hitsTotal: promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "thanos_shuffle_shard_cache_hits_total",
Help: "Total number of cache hits for shuffle shard subrings",
}),
numItems: promauto.With(reg).NewGauge(prometheus.GaugeOpts{
Name: "thanos_shuffle_shard_cache_items",
Help: "Total number of cached items",
}),
maxItems: promauto.With(reg).NewGauge(prometheus.GaugeOpts{
Name: "thanos_shuffle_shard_cache_max_items",
Help: "Maximum number of items that can be cached",
}),
evicted: promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "thanos_shuffle_shard_cache_evicted_total",
Help: "Total number of items evicted from the cache",
}),
}
}
// newShuffleShardHashring creates a new shuffle sharding hashring wrapper.
func newShuffleShardHashring(baseRing Hashring, shuffleShardingConfig ShuffleShardingConfig, replicationFactor uint64, reg prometheus.Registerer, name string) (*shuffleShardHashring, error) {
l := log.NewNopLogger()
level.Info(l).Log(
"msg", "Creating shuffle sharding hashring",
"default_shard_size", shuffleShardingConfig.ShardSize,
"total_nodes", len(baseRing.Nodes()),
)
if len(shuffleShardingConfig.Overrides) > 0 {
for _, override := range shuffleShardingConfig.Overrides {
level.Info(l).Log(
"msg", "Tenant shard size override",
"tenants", override.Tenants,
"tenant_matcher_type", override.TenantMatcherType,
"shard_size", override.ShardSize,
)
}
}
const DefaultShuffleShardingCacheSize = 100
if shuffleShardingConfig.CacheSize <= 0 {
shuffleShardingConfig.CacheSize = DefaultShuffleShardingCacheSize
}
metrics := newShuffleShardCacheMetrics(reg, name)
metrics.maxItems.Set(float64(shuffleShardingConfig.CacheSize))
cache, err := lru.NewWithEvict[string, *ketamaHashring](shuffleShardingConfig.CacheSize, func(key string, value *ketamaHashring) {
metrics.evicted.Inc()
metrics.numItems.Dec()
})
if err != nil {
return nil, err
}
ssh := &shuffleShardHashring{
baseRing: baseRing,
shuffleShardingConfig: shuffleShardingConfig,
replicationFactor: replicationFactor,
cache: cache,
metrics: metrics,
}
// Dedupe nodes as the base ring may have duplicates. We are only interested in unique nodes.
ssh.nodes = ssh.dedupedNodes()
nodeCountByAZ := make(map[string]int)
for _, node := range ssh.nodes {
var az string = node.AZ
if shuffleShardingConfig.ZoneAwarenessDisabled {
az = ""
}
nodeCountByAZ[az]++
}
maxNodesInAZ := 0
for _, count := range nodeCountByAZ {
maxNodesInAZ = max(maxNodesInAZ, count)
}
if shuffleShardingConfig.ShardSize > maxNodesInAZ {
level.Warn(l).Log(
"msg", "Shard size is larger than the maximum number of nodes in any AZ; some tenants might get all not working nodes if that AZ goes down",
"shard_size", shuffleShardingConfig.ShardSize,
"max_nodes_in_az", maxNodesInAZ,
)
}
for _, override := range shuffleShardingConfig.Overrides {
if override.ShardSize < maxNodesInAZ {
continue
}
level.Warn(l).Log(
"msg", "Shard size is larger than the maximum number of nodes in any AZ; some tenants might get all not working nodes if that AZ goes down",
"max_nodes_in_az", maxNodesInAZ,
"shard_size", override.ShardSize,
"tenants", override.Tenants,
"tenant_matcher_type", override.TenantMatcherType,
)
}
return ssh, nil
}
func (s *shuffleShardHashring) Nodes() []Endpoint {
return s.nodes
}
func (s *shuffleShardHashring) dedupedNodes() []Endpoint {
uniqueNodes := make(map[Endpoint]struct{})
for _, node := range s.baseRing.Nodes() {
uniqueNodes[node] = struct{}{}
}
// Convert the map back to a slice
nodes := make(endpoints, 0, len(uniqueNodes))
for node := range uniqueNodes {
nodes = append(nodes, node)
}
sort.Sort(nodes)
return nodes
}
// getShardSize returns the shard size for a specific tenant, taking into account any overrides.
func (s *shuffleShardHashring) getShardSize(tenant string) int {
for _, override := range s.shuffleShardingConfig.Overrides {
if override.TenantMatcherType == TenantMatcherTypeExact {
for _, t := range override.Tenants {
if t == tenant {
return override.ShardSize
}
}
} else if override.TenantMatcherType == TenantMatcherGlob {
for _, t := range override.Tenants {
matches, err := filepath.Match(t, tenant)
if err == nil && matches {
return override.ShardSize
}
}
}
}
// Default shard size is used if no overrides match
return s.shuffleShardingConfig.ShardSize
}
// ShuffleShardExpectedInstancesPerZone returns the expected number of instances per zone for a given shard size and number of zones.
// Copied from Cortex. Copyright Cortex Authors.
func ShuffleShardExpectedInstancesPerZone(shardSize, numZones int) int {
return int(math.Ceil(float64(shardSize) / float64(numZones)))
}
var (
seedSeparator = []byte{0}
)
// yoloBuf returns an unsafe []byte view of the string's underlying memory, as the name implies. Use at your own risk.
func yoloBuf(s string) []byte {
return *((*[]byte)(unsafe.Pointer(&s)))
}
// ShuffleShardSeed returns seed for random number generator, computed from provided identifier.
// Copied from Cortex. Copyright Cortex Authors.
func ShuffleShardSeed(identifier, zone string) int64 {
// Use the identifier to compute a hash we'll use to seed the random.
hasher := md5.New()
hasher.Write(yoloBuf(identifier)) // nolint:errcheck
if zone != "" {
hasher.Write(seedSeparator) // nolint:errcheck
hasher.Write(yoloBuf(zone)) // nolint:errcheck
}
checksum := hasher.Sum(nil)
// Generate the seed based on the first 64 bits of the checksum.
return int64(binary.BigEndian.Uint64(checksum))
}
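
Illustrative sketch (not part of the diff), assuming these exported helpers land in pkg/receive as shown: with a shard size of 4 across three zones, each zone contributes ceil(4/3) = 2 nodes, and the seed depends only on the tenant and zone, which is what keeps a tenant's shard stable.

package main

import (
    "fmt"

    "github.com/thanos-io/thanos/pkg/receive"
)

func main() {
    // ceil(4/3) = 2 instances are taken from every zone for a shard size of 4.
    fmt.Println(receive.ShuffleShardExpectedInstancesPerZone(4, 3)) // 2

    // Same (tenant, zone) pair -> same seed -> same shuffle order -> same shard.
    a := receive.ShuffleShardSeed("tenant-1", "az-1")
    b := receive.ShuffleShardSeed("tenant-1", "az-1")
    fmt.Println(a == b) // true
}
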
func (s *shuffleShardHashring) getTenantShardCached(tenant string) (*ketamaHashring, error) {
s.metrics.requestsTotal.Inc()
cached, ok := s.cache.Get(tenant)
if ok {
s.metrics.hitsTotal.Inc()
return cached, nil
}
h, err := s.getTenantShard(tenant)
if err != nil {
return nil, err
}
s.metrics.numItems.Inc()
s.cache.Add(tenant, h)
return h, nil
}
// getTenantShard returns or creates a consistent subset of nodes for a tenant.
func (s *shuffleShardHashring) getTenantShard(tenant string) (*ketamaHashring, error) {
nodes := s.Nodes()
nodesByAZ := make(map[string][]Endpoint)
for _, node := range nodes {
var az = node.AZ
if s.shuffleShardingConfig.ZoneAwarenessDisabled {
az = ""
}
nodesByAZ[az] = append(nodesByAZ[az], node)
}
ss := s.getShardSize(tenant)
var take int
if s.shuffleShardingConfig.ZoneAwarenessDisabled {
take = ss
} else {
take = ShuffleShardExpectedInstancesPerZone(ss, len(nodesByAZ))
}
var finalNodes = make([]Endpoint, 0, take*len(nodesByAZ))
for az, azNodes := range nodesByAZ {
seed := ShuffleShardSeed(tenant, az)
r := rand.New(rand.NewSource(seed))
r.Shuffle(len(azNodes), func(i, j int) {
azNodes[i], azNodes[j] = azNodes[j], azNodes[i]
})
if take > len(azNodes) {
return nil, fmt.Errorf("shard size %d is larger than number of nodes in AZ %s (%d)", ss, az, len(azNodes))
}
finalNodes = append(finalNodes, azNodes[:take]...)
}
return newKetamaHashring(finalNodes, SectionsPerNode, s.replicationFactor)
}
// GetN returns the nth endpoint for a tenant and time series, respecting the shuffle sharding.
func (s *shuffleShardHashring) GetN(tenant string, ts *prompb.TimeSeries, n uint64) (Endpoint, error) {
h, err := s.getTenantShardCached(tenant)
if err != nil {
return Endpoint{}, err
}
return h.GetN(tenant, ts, n)
}
// NewMultiHashring creates a multi-tenant hashring for a given slice of
// groups. Which hashring to use for a tenant is determined by the tenants
// field of the hashring configuration.
func NewMultiHashring(algorithm HashringAlgorithm, replicationFactor uint64, cfg []HashringConfig) (Hashring, error) {
func NewMultiHashring(algorithm HashringAlgorithm, replicationFactor uint64, cfg []HashringConfig, reg prometheus.Registerer) (Hashring, error) {
m := &multiHashring{
cache: make(map[string]Hashring),
}
@ -351,7 +660,7 @@ func NewMultiHashring(algorithm HashringAlgorithm, replicationFactor uint64, cfg
if h.Algorithm != "" {
activeAlgorithm = h.Algorithm
}
hashring, err = newHashring(activeAlgorithm, h.Endpoints, replicationFactor, h.Hashring, h.Tenants)
hashring, err = newHashring(activeAlgorithm, h.Endpoints, replicationFactor, h.Hashring, h.Tenants, h.ShuffleShardingConfig, reg)
if err != nil {
return nil, err
}
@ -372,17 +681,38 @@ func NewMultiHashring(algorithm HashringAlgorithm, replicationFactor uint64, cfg
return m, nil
}
func newHashring(algorithm HashringAlgorithm, endpoints []Endpoint, replicationFactor uint64, hashring string, tenants []string) (Hashring, error) {
func newHashring(algorithm HashringAlgorithm, endpoints []Endpoint, replicationFactor uint64, hashring string, tenants []string, shuffleShardingConfig ShuffleShardingConfig, reg prometheus.Registerer) (Hashring, error) {
switch algorithm {
case AlgorithmHashmod:
return newSimpleHashring(endpoints)
ringImpl, err := newSimpleHashring(endpoints)
if err != nil {
return nil, err
}
if shuffleShardingConfig.ShardSize > 0 {
return nil, fmt.Errorf("hashmod algorithm does not support shuffle sharding. Either use Ketama or remove shuffle sharding configuration")
}
return ringImpl, nil
case AlgorithmKetama:
return newKetamaHashring(endpoints, SectionsPerNode, replicationFactor)
ringImpl, err := newKetamaHashring(endpoints, SectionsPerNode, replicationFactor)
if err != nil {
return nil, err
}
if shuffleShardingConfig.ShardSize > 0 {
if shuffleShardingConfig.ShardSize > len(endpoints) {
return nil, fmt.Errorf("shard size %d is larger than number of nodes in hashring %s (%d)", shuffleShardingConfig.ShardSize, hashring, len(endpoints))
}
return newShuffleShardHashring(ringImpl, shuffleShardingConfig, replicationFactor, reg, hashring)
}
return ringImpl, nil
default:
l := log.NewNopLogger()
level.Warn(l).Log("msg", "Unrecognizable hashring algorithm. Fall back to hashmod algorithm.",
"hashring", hashring,
"tenants", tenants)
if shuffleShardingConfig.ShardSize > 0 {
return nil, fmt.Errorf("hashmod algorithm does not support shuffle sharding. Either use Ketama or remove shuffle sharding configuration")
}
return newSimpleHashring(endpoints)
}
}

View File
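
Illustrative sketch (not part of the diff) of wiring the updated API together, assuming it matches this diff exactly: NewMultiHashring now takes a Prometheus registerer for the shuffle-shard cache metrics, and callers use GetN and Close on the returned Hashring.

package main

import (
    "fmt"

    "github.com/prometheus/client_golang/prometheus"

    "github.com/thanos-io/thanos/pkg/receive"
    "github.com/thanos-io/thanos/pkg/store/labelpb"
    "github.com/thanos-io/thanos/pkg/store/storepb/prompb"
)

func main() {
    cfg := []receive.HashringConfig{{
        Endpoints: []receive.Endpoint{
            {Address: "node-1:10901", AZ: "az-1"},
            {Address: "node-2:10901", AZ: "az-2"},
            {Address: "node-3:10901", AZ: "az-3"},
        },
        Algorithm:             receive.AlgorithmKetama,
        ShuffleShardingConfig: receive.ShuffleShardingConfig{ShardSize: 2},
    }}

    // The registerer is used for the per-hashring shuffle-shard cache metrics;
    // Close unregisters them again when the hashring is swapped out.
    hr, err := receive.NewMultiHashring(receive.AlgorithmKetama, 2, cfg, prometheus.NewRegistry())
    if err != nil {
        panic(err)
    }
    defer hr.Close()

    ts := &prompb.TimeSeries{Labels: []labelpb.ZLabel{{Name: "__name__", Value: "up"}}}
    ep, err := hr.GetN("tenant-1", ts, 0) // first replica for this tenant/series
    if err != nil {
        panic(err)
    }
    fmt.Println(ep.Address)
}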

@ -12,6 +12,7 @@ import (
"github.com/efficientgo/core/testutil"
"github.com/stretchr/testify/require"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/prometheus/model/labels"
"github.com/thanos-io/thanos/pkg/store/labelpb"
@ -198,10 +199,10 @@ func TestHashringGet(t *testing.T) {
tenant: "t2",
},
} {
hs, err := NewMultiHashring(AlgorithmHashmod, 3, tc.cfg)
hs, err := NewMultiHashring(AlgorithmHashmod, 3, tc.cfg, prometheus.NewRegistry())
require.NoError(t, err)
h, err := hs.Get(tc.tenant, ts)
h, err := hs.GetN(tc.tenant, ts, 0)
if tc.nodes != nil {
if err != nil {
t.Errorf("case %q: got unexpected error: %v", tc.name, err)
@ -661,16 +662,200 @@ func TestInvalidAZHashringCfg(t *testing.T) {
{
cfg: []HashringConfig{{Endpoints: []Endpoint{{Address: "a", AZ: "1"}, {Address: "b", AZ: "2"}}}},
replicas: 2,
algorithm: AlgorithmHashmod,
expectedError: "Hashmod algorithm does not support AZ aware hashring configuration. Either use Ketama or remove AZ configuration.",
},
} {
t.Run("", func(t *testing.T) {
_, err := NewMultiHashring(tt.algorithm, tt.replicas, tt.cfg)
_, err := NewMultiHashring(tt.algorithm, tt.replicas, tt.cfg, prometheus.NewRegistry())
require.EqualError(t, err, tt.expectedError)
})
}
}
func TestShuffleShardHashring(t *testing.T) {
t.Parallel()
for _, tc := range []struct {
name string
endpoints []Endpoint
tenant string
shuffleShardCfg ShuffleShardingConfig
err string
usedNodes int
nodeAddrs map[string]struct{}
}{
{
usedNodes: 3,
name: "ketama with shuffle sharding",
endpoints: []Endpoint{
{Address: "node-1", AZ: "az-1"},
{Address: "node-2", AZ: "az-1"},
{Address: "node-3", AZ: "az-2"},
{Address: "node-4", AZ: "az-2"},
{Address: "node-5", AZ: "az-3"},
{Address: "node-6", AZ: "az-3"},
},
tenant: "tenant-1",
shuffleShardCfg: ShuffleShardingConfig{
ShardSize: 2,
Overrides: []ShuffleShardingOverrideConfig{
{
Tenants: []string{"special-tenant"},
ShardSize: 2,
},
},
},
},
{
usedNodes: 3,
name: "ketama with glob tenant override",
endpoints: []Endpoint{
{Address: "node-1", AZ: "az-1"},
{Address: "node-2", AZ: "az-1"},
{Address: "node-3", AZ: "az-2"},
{Address: "node-4", AZ: "az-2"},
{Address: "node-5", AZ: "az-3"},
{Address: "node-6", AZ: "az-3"},
},
tenant: "prefix-tenant",
shuffleShardCfg: ShuffleShardingConfig{
ShardSize: 2,
Overrides: []ShuffleShardingOverrideConfig{
{
Tenants: []string{"prefix*"},
ShardSize: 3,
TenantMatcherType: TenantMatcherGlob,
},
},
},
},
{
name: "big shard size",
endpoints: []Endpoint{
{Address: "node-1", AZ: "az-1"},
{Address: "node-2", AZ: "az-1"},
{Address: "node-3", AZ: "az-2"},
{Address: "node-4", AZ: "az-2"},
{Address: "node-5", AZ: "az-3"},
{Address: "node-6", AZ: "az-3"},
},
tenant: "prefix-tenant",
err: `shard size 20 is larger than number of nodes in AZ`,
shuffleShardCfg: ShuffleShardingConfig{
ShardSize: 2,
Overrides: []ShuffleShardingOverrideConfig{
{
Tenants: []string{"prefix*"},
ShardSize: 20,
TenantMatcherType: TenantMatcherGlob,
},
},
},
},
{
name: "zone awareness disabled",
endpoints: []Endpoint{
{Address: "node-1", AZ: "az-1"},
{Address: "node-2", AZ: "az-1"},
{Address: "node-3", AZ: "az-2"},
{Address: "node-4", AZ: "az-2"},
{Address: "node-5", AZ: "az-2"},
{Address: "node-6", AZ: "az-2"},
{Address: "node-7", AZ: "az-3"},
{Address: "node-8", AZ: "az-3"},
},
tenant: "prefix-tenant",
usedNodes: 3,
nodeAddrs: map[string]struct{}{
"node-1": {},
"node-2": {},
"node-6": {},
},
shuffleShardCfg: ShuffleShardingConfig{
ShardSize: 1,
ZoneAwarenessDisabled: true,
Overrides: []ShuffleShardingOverrideConfig{
{
Tenants: []string{"prefix*"},
ShardSize: 3,
TenantMatcherType: TenantMatcherGlob,
},
},
},
},
} {
t.Run(tc.name, func(t *testing.T) {
var baseRing Hashring
var err error
baseRing, err = newKetamaHashring(tc.endpoints, SectionsPerNode, 2)
require.NoError(t, err)
// Create the shuffle shard hashring
shardRing, err := newShuffleShardHashring(baseRing, tc.shuffleShardCfg, 2, prometheus.NewRegistry(), "test")
require.NoError(t, err)
// Test that the shuffle sharding is consistent
usedNodes := make(map[string]struct{})
// We'll sample multiple times to ensure consistency
for i := 0; i < 100; i++ {
ts := &prompb.TimeSeries{
Labels: []labelpb.ZLabel{
{
Name: "iteration",
Value: fmt.Sprintf("%d", i),
},
},
}
h, err := shardRing.GetN(tc.tenant, ts, 0)
if tc.err != "" {
require.Error(t, err)
require.Contains(t, err.Error(), tc.err)
return
}
require.NoError(t, err)
usedNodes[h.Address] = struct{}{}
}
require.Len(t, usedNodes, tc.usedNodes)
if tc.nodeAddrs != nil {
require.Len(t, usedNodes, len(tc.nodeAddrs))
require.Equal(t, tc.nodeAddrs, usedNodes)
}
// Test consistency - same tenant should always get same nodes.
for trial := 0; trial < 50; trial++ {
trialNodes := make(map[string]struct{})
for i := 0; i < 10+trial; i++ {
ts := &prompb.TimeSeries{
Labels: []labelpb.ZLabel{
{
Name: "iteration",
Value: fmt.Sprintf("%d", i),
},
{
Name: "trial",
Value: fmt.Sprintf("%d", trial),
},
},
}
h, err := shardRing.GetN(tc.tenant, ts, 0)
require.NoError(t, err)
trialNodes[h.Address] = struct{}{}
}
// Same tenant should get same set of nodes in every trial
require.Equal(t, usedNodes, trialNodes, "Inconsistent node sharding between trials")
}
})
}
}
func makeSeries() []prompb.TimeSeries {
numSeries := 10000
series := make([]prompb.TimeSeries, numSeries)

View File

@ -16,7 +16,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/prometheus/model/labels"
@ -61,6 +62,7 @@ type MultiTSDB struct {
mtx *sync.RWMutex
tenants map[string]*tenant
allowOutOfOrderUpload bool
skipCorruptedBlocks bool
hashFunc metadata.HashFunc
hashringConfigs []HashringConfig
@ -114,6 +116,7 @@ func NewMultiTSDB(
tenantLabelName string,
bucket objstore.Bucket,
allowOutOfOrderUpload bool,
skipCorruptedBlocks bool,
hashFunc metadata.HashFunc,
options ...MultiTSDBOption,
) *MultiTSDB {
@ -134,6 +137,7 @@ func NewMultiTSDB(
tenantLabelName: tenantLabelName,
bucket: bucket,
allowOutOfOrderUpload: allowOutOfOrderUpload,
skipCorruptedBlocks: skipCorruptedBlocks,
hashFunc: hashFunc,
matcherCache: storecache.NoopMatchersCache,
}
@ -753,16 +757,16 @@ func (t *MultiTSDB) startTSDB(logger log.Logger, tenantID string, tenant *tenant
var ship *shipper.Shipper
if t.bucket != nil {
ship = shipper.New(
logger,
reg,
dataDir,
t.bucket,
func() labels.Labels { return lset },
metadata.ReceiveSource,
nil,
t.allowOutOfOrderUpload,
t.hashFunc,
shipper.DefaultMetaFilename,
dataDir,
shipper.WithLogger(logger),
shipper.WithRegisterer(reg),
shipper.WithSource(metadata.ReceiveSource),
shipper.WithHashFunc(t.hashFunc),
shipper.WithMetaFileName(shipper.DefaultMetaFilename),
shipper.WithLabels(func() labels.Labels { return lset }),
shipper.WithAllowOutOfOrderUploads(t.allowOutOfOrderUpload),
shipper.WithSkipCorruptedBlocks(t.skipCorruptedBlocks),
)
}
var options []store.TSDBStoreOption

View File

@ -18,7 +18,8 @@ import (
"github.com/efficientgo/core/testutil"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/prometheus/model/exemplar"
"github.com/prometheus/prometheus/model/labels"
@ -53,7 +54,7 @@ func TestMultiTSDB(t *testing.T) {
NoLockfile: true,
MaxExemplars: 100,
EnableExemplarStorage: true,
}, labels.FromStrings("replica", "01"), "tenant_id", nil, false, metadata.NoneFunc)
}, labels.FromStrings("replica", "01"), "tenant_id", nil, false, false, metadata.NoneFunc)
defer func() { testutil.Ok(t, m.Close()) }()
testutil.Ok(t, m.Flush())
@ -136,6 +137,7 @@ func TestMultiTSDB(t *testing.T) {
"tenant_id",
nil,
false,
false,
metadata.NoneFunc,
)
defer func() { testutil.Ok(t, m.Close()) }()
@ -173,7 +175,7 @@ func TestMultiTSDB(t *testing.T) {
MaxBlockDuration: (2 * time.Hour).Milliseconds(),
RetentionDuration: (6 * time.Hour).Milliseconds(),
NoLockfile: true,
}, labels.FromStrings("replica", "01"), "tenant_id", nil, false, metadata.NoneFunc)
}, labels.FromStrings("replica", "01"), "tenant_id", nil, false, false, metadata.NoneFunc)
defer func() { testutil.Ok(t, m.Close()) }()
testutil.Ok(t, m.Flush())
@ -441,6 +443,7 @@ func TestMultiTSDBPrune(t *testing.T) {
"tenant_id",
test.bucket,
false,
false,
metadata.NoneFunc,
)
defer func() { testutil.Ok(t, m.Close()) }()
@ -516,6 +519,7 @@ func TestMultiTSDBRecreatePrunedTenant(t *testing.T) {
"tenant_id",
objstore.NewInMemBucket(),
false,
false,
metadata.NoneFunc,
)
defer func() { testutil.Ok(t, m.Close()) }()
@ -545,6 +549,7 @@ func TestMultiTSDBAddNewTenant(t *testing.T) {
"tenant_id",
objstore.NewInMemBucket(),
false,
false,
metadata.NoneFunc,
)
defer func() { testutil.Ok(t, m.Close()) }()
@ -620,6 +625,7 @@ func TestAlignedHeadFlush(t *testing.T) {
"tenant_id",
test.bucket,
false,
false,
metadata.NoneFunc,
)
defer func() { testutil.Ok(t, m.Close()) }()
@ -696,6 +702,7 @@ func TestMultiTSDBStats(t *testing.T) {
"tenant_id",
nil,
false,
false,
metadata.NoneFunc,
)
defer func() { testutil.Ok(t, m.Close()) }()
@ -727,6 +734,7 @@ func TestMultiTSDBWithNilStore(t *testing.T) {
"tenant_id",
nil,
false,
false,
metadata.NoneFunc,
)
defer func() { testutil.Ok(t, m.Close()) }()
@ -770,6 +778,7 @@ func TestProxyLabelValues(t *testing.T) {
"tenant_id",
nil,
false,
false,
metadata.NoneFunc,
)
defer func() { testutil.Ok(t, m.Close()) }()
@ -863,6 +872,7 @@ func BenchmarkMultiTSDB(b *testing.B) {
"tenant_id",
nil,
false,
false,
metadata.NoneFunc,
)
defer func() { testutil.Ok(b, m.Close()) }()
@ -943,7 +953,13 @@ func TestMultiTSDBDoesNotDeleteNotUploadedBlocks(t *testing.T) {
Uploaded: []ulid.ULID{mockBlockIDs[0]},
}))
tenant.ship = shipper.New(log.NewNopLogger(), nil, td, nil, nil, metadata.BucketUploadSource, nil, false, metadata.NoneFunc, "")
tenant.ship = shipper.New(
nil,
td,
shipper.WithLogger(log.NewNopLogger()),
shipper.WithSource(metadata.BucketUploadSource),
shipper.WithHashFunc(metadata.NoneFunc),
)
require.Equal(t, map[ulid.ULID]struct{}{
mockBlockIDs[0]: {},
}, tenant.blocksToDelete(nil))

View File

@ -25,9 +25,9 @@ import (
"go.opentelemetry.io/collector/pdata/pmetric"
conventions "go.opentelemetry.io/collector/semconv/v1.6.1"
prometheustranslator "github.com/prometheus/otlptranslator"
"github.com/prometheus/prometheus/model/timestamp"
"github.com/prometheus/prometheus/model/value"
prometheustranslator "github.com/prometheus/prometheus/storage/remote/otlptranslator/prometheus"
"github.com/thanos-io/thanos/pkg/store/storepb/prompb"
)

View File

@ -818,6 +818,7 @@ func initializeMultiTSDB(dir string) *MultiTSDB {
"tenant_id",
bucket,
false,
false,
metadata.NoneFunc,
)

View File

@ -439,6 +439,7 @@ func setupMultitsdb(t *testing.T, maxExemplars int64) (log.Logger, *MultiTSDB, A
"tenant_id",
nil,
false,
false,
metadata.NoneFunc,
)
t.Cleanup(func() { testutil.Ok(t, m.Close()) })
@ -504,6 +505,7 @@ func benchmarkWriter(b *testing.B, labelsNum int, seriesNum int, generateHistogr
"tenant_id",
nil,
false,
false,
metadata.NoneFunc,
)
b.Cleanup(func() { testutil.Ok(b, m.Close()) })

View File

@ -315,6 +315,8 @@ faulty_config:
}
func TestReloader_ConfigDirApply(t *testing.T) {
t.Skip("Flaky")
t.Parallel()
l, err := net.Listen("tcp", "localhost:0")
@ -618,6 +620,8 @@ func TestReloader_ConfigDirApply(t *testing.T) {
}
func TestReloader_ConfigDirApplyBasedOnWatchInterval(t *testing.T) {
t.Skip("Flaky")
t.Parallel()
l, err := net.Listen("tcp", "localhost:0")

View File

@ -12,7 +12,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/run"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/opentracing/opentracing-go"
"github.com/pkg/errors"
amlabels "github.com/prometheus/alertmanager/pkg/labels"

View File

@ -13,7 +13,7 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"

View File

@ -16,7 +16,7 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/tsdb"

View File

@ -263,12 +263,12 @@ func (g configRuleAdapter) validate() (errs []error) {
set[g.group.Name] = struct{}{}
for i, r := range g.group.Rules {
for _, node := range r.Validate() {
for _, node := range r.Validate(rulefmt.RuleNode{}) {
var ruleName string
if r.Alert.Value != "" {
ruleName = r.Alert.Value
if r.Alert != "" {
ruleName = r.Alert
} else {
ruleName = r.Record.Value
ruleName = r.Record
}
errs = append(errs, &rulefmt.Error{
Group: g.group.Name,

View File

@ -27,7 +27,6 @@ import (
type Server struct {
logger log.Logger
comp component.Component
prober *prober.HTTPProbe
mux *http.ServeMux
srv *http.Server
@ -62,7 +61,6 @@ func New(logger log.Logger, reg *prometheus.Registry, comp component.Component,
return &Server{
logger: log.With(logger, "service", "http/server", "component", comp.String()),
comp: comp,
prober: prober,
mux: mux,
srv: &http.Server{Addr: options.listen, Handler: h},
opts: options,

View File

@ -9,7 +9,7 @@ import (
"net/http"
"time"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
)
type ctxKey int

View File

@ -8,7 +8,7 @@ package shipper
import (
"context"
"encoding/json"
"math"
"io/fs"
"os"
"path"
"path/filepath"
@ -17,7 +17,7 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
@ -37,6 +37,7 @@ type metrics struct {
dirSyncFailures prometheus.Counter
uploads prometheus.Counter
uploadFailures prometheus.Counter
corruptedBlocks prometheus.Counter
uploadedCompacted prometheus.Gauge
}
@ -59,6 +60,10 @@ func newMetrics(reg prometheus.Registerer) *metrics {
Name: "thanos_shipper_upload_failures_total",
Help: "Total number of block upload failures",
})
m.corruptedBlocks = promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "thanos_shipper_corrupted_blocks_total",
Help: "Total number of corrupted blocks",
})
m.uploadedCompacted = promauto.With(reg).NewGauge(prometheus.GaugeOpts{
Name: "thanos_shipper_upload_compacted_done",
Help: "If 1 it means shipper uploaded all compacted blocks from the filesystem.",
@ -76,56 +81,135 @@ type Shipper struct {
source metadata.SourceType
metadataFilePath string
uploadCompactedFunc func() bool
uploadCompacted bool
allowOutOfOrderUploads bool
skipCorruptedBlocks bool
hashFunc metadata.HashFunc
labels func() labels.Labels
mtx sync.RWMutex
}
var (
ErrorSyncBlockCorrupted = errors.New("corrupted blocks found")
)
type shipperOptions struct {
logger log.Logger
r prometheus.Registerer
source metadata.SourceType
hashFunc metadata.HashFunc
metaFileName string
lbls func() labels.Labels
uploadCompacted bool
allowOutOfOrderUploads bool
skipCorruptedBlocks bool
}
type Option func(*shipperOptions)
// WithLogger sets the logger.
func WithLogger(logger log.Logger) Option {
return func(o *shipperOptions) {
o.logger = logger
}
}
// WithRegisterer sets the Prometheus registerer.
func WithRegisterer(r prometheus.Registerer) Option {
return func(o *shipperOptions) {
o.r = r
}
}
// WithSource sets the metadata source type.
func WithSource(source metadata.SourceType) Option {
return func(o *shipperOptions) {
o.source = source
}
}
// WithHashFunc sets the hash function.
func WithHashFunc(hashFunc metadata.HashFunc) Option {
return func(o *shipperOptions) {
o.hashFunc = hashFunc
}
}
// WithMetaFileName sets the meta file name.
func WithMetaFileName(name string) Option {
return func(o *shipperOptions) {
o.metaFileName = name
}
}
// WithLabels sets the labels function.
func WithLabels(lbls func() labels.Labels) Option {
return func(o *shipperOptions) {
o.lbls = lbls
}
}
// WithUploadCompacted sets whether to upload compacted blocks.
func WithUploadCompacted(upload bool) Option {
return func(o *shipperOptions) {
o.uploadCompacted = upload
}
}
// WithAllowOutOfOrderUploads sets whether to allow out of order uploads.
func WithAllowOutOfOrderUploads(allow bool) Option {
return func(o *shipperOptions) {
o.allowOutOfOrderUploads = allow
}
}
// WithSkipCorruptedBlocks sets whether to skip corrupted blocks.
func WithSkipCorruptedBlocks(skip bool) Option {
return func(o *shipperOptions) {
o.skipCorruptedBlocks = skip
}
}
func applyOptions(opts []Option) *shipperOptions {
so := new(shipperOptions)
for _, o := range opts {
o(so)
}
if so.logger == nil {
so.logger = log.NewNopLogger()
}
if so.lbls == nil {
so.lbls = func() labels.Labels { return labels.EmptyLabels() }
}
if so.metaFileName == "" {
so.metaFileName = DefaultMetaFilename
}
return so
}
// New creates a new shipper that detects new TSDB blocks in dir and uploads them to
// remote if necessary. It attaches the Thanos metadata section in each meta JSON file.
// If uploadCompacted is enabled, it also uploads compacted blocks that are already in the filesystem.
func New(
logger log.Logger,
r prometheus.Registerer,
dir string,
bucket objstore.Bucket,
lbls func() labels.Labels,
source metadata.SourceType,
uploadCompactedFunc func() bool,
allowOutOfOrderUploads bool,
hashFunc metadata.HashFunc,
metaFileName string,
) *Shipper {
if logger == nil {
logger = log.NewNopLogger()
}
if lbls == nil {
lbls = func() labels.Labels { return labels.EmptyLabels() }
}
func New(bucket objstore.Bucket, dir string, opts ...Option) *Shipper {
options := applyOptions(opts)
if metaFileName == "" {
metaFileName = DefaultMetaFilename
}
if uploadCompactedFunc == nil {
uploadCompactedFunc = func() bool {
return false
}
}
return &Shipper{
logger: logger,
logger: options.logger,
dir: dir,
bucket: bucket,
labels: lbls,
metrics: newMetrics(r),
source: source,
allowOutOfOrderUploads: allowOutOfOrderUploads,
uploadCompactedFunc: uploadCompactedFunc,
hashFunc: hashFunc,
metadataFilePath: filepath.Join(dir, filepath.Clean(metaFileName)),
labels: options.lbls,
metrics: newMetrics(options.r),
source: options.source,
allowOutOfOrderUploads: options.allowOutOfOrderUploads,
skipCorruptedBlocks: options.skipCorruptedBlocks,
uploadCompacted: options.uploadCompacted,
hashFunc: options.hashFunc,
metadataFilePath: filepath.Join(dir, filepath.Clean(options.metaFileName)),
}
}
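
Illustrative sketch (not part of the diff) of the new options-based constructor, mirroring the call-site updates later in this diff; the directory and label values are made up.

package main

import (
    "context"

    "github.com/go-kit/log"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/prometheus/model/labels"
    "github.com/thanos-io/objstore"

    "github.com/thanos-io/thanos/pkg/block/metadata"
    "github.com/thanos-io/thanos/pkg/shipper"
)

func main() {
    s := shipper.New(
        objstore.NewInMemBucket(), // any objstore.Bucket works here
        "/var/thanos/tsdb",        // hypothetical TSDB directory
        shipper.WithLogger(log.NewNopLogger()),
        shipper.WithRegisterer(prometheus.NewRegistry()),
        shipper.WithSource(metadata.ReceiveSource),
        shipper.WithHashFunc(metadata.NoneFunc),
        shipper.WithLabels(func() labels.Labels { return labels.FromStrings("replica", "a") }),
        shipper.WithAllowOutOfOrderUploads(false),
        shipper.WithSkipCorruptedBlocks(true),
    )
    // Sync uploads any new blocks it finds under the directory.
    _, _ = s.Sync(context.Background())
}
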
@ -136,42 +220,6 @@ func (s *Shipper) SetLabels(lbls labels.Labels) {
s.labels = func() labels.Labels { return lbls }
}
// Timestamps returns the minimum timestamp for which data is available and the highest timestamp
// of blocks that were successfully uploaded.
func (s *Shipper) Timestamps() (minTime, maxSyncTime int64, err error) {
meta, err := ReadMetaFile(s.metadataFilePath)
if err != nil {
return 0, 0, errors.Wrap(err, "read shipper meta file")
}
// Build a map of blocks we already uploaded.
hasUploaded := make(map[ulid.ULID]struct{}, len(meta.Uploaded))
for _, id := range meta.Uploaded {
hasUploaded[id] = struct{}{}
}
minTime = math.MaxInt64
maxSyncTime = math.MinInt64
metas, err := s.blockMetasFromOldest()
if err != nil {
return 0, 0, err
}
for _, m := range metas {
if m.MinTime < minTime {
minTime = m.MinTime
}
if _, ok := hasUploaded[m.ULID]; ok && m.MaxTime > maxSyncTime {
maxSyncTime = m.MaxTime
}
}
if minTime == math.MaxInt64 {
// No block yet found. We cannot assume any min block size so propagate 0 minTime.
minTime = 0
}
return minTime, maxSyncTime, nil
}
type lazyOverlapChecker struct {
synced bool
logger log.Logger
@ -255,8 +303,10 @@ func (s *Shipper) Sync(ctx context.Context) (uploaded int, err error) {
// If we encounter any error, proceed with an empty meta file and overwrite it later.
// The meta file is only used to avoid unnecessary bucket.Exists calls,
// which are properly handled by the system if they occur anyway.
if !os.IsNotExist(err) {
level.Warn(s.logger).Log("msg", "reading meta file failed, will override it", "err", err)
if errors.Is(err, fs.ErrNotExist) {
level.Info(s.logger).Log("msg", "no meta file found, creating empty meta data to write later")
} else {
level.Error(s.logger).Log("msg", "failed to read meta file, creating empty meta data to write later", "err", err)
}
meta = &Meta{Version: MetaVersion1}
}
@ -271,13 +321,22 @@ func (s *Shipper) Sync(ctx context.Context) (uploaded int, err error) {
meta.Uploaded = nil
var (
checker = newLazyOverlapChecker(s.logger, s.bucket, func() labels.Labels { return s.labels() })
uploadErrs int
checker = newLazyOverlapChecker(s.logger, s.bucket, func() labels.Labels { return s.labels() })
uploadErrs int
failedExecution = true
)
uploadCompacted := s.uploadCompactedFunc()
metas, err := s.blockMetasFromOldest()
if err != nil {
defer func() {
if failedExecution {
s.metrics.dirSyncFailures.Inc()
} else {
s.metrics.dirSyncs.Inc()
}
}()
metas, failedBlocks, err := s.blockMetasFromOldest()
// Only ignore the corrupted-blocks error when skipping corrupted blocks is enabled.
if err != nil && (!errors.Is(errors.Cause(err), ErrorSyncBlockCorrupted) || !s.skipCorruptedBlocks) {
return 0, err
}
for _, m := range metas {
@ -296,7 +355,7 @@ func (s *Shipper) Sync(ctx context.Context) (uploaded int, err error) {
// As part of the normal flow we only ship blocks at the first compaction level.
if m.Compaction.Level > 1 {
if !uploadCompacted {
if !s.uploadCompacted {
continue
}
}
@ -304,7 +363,7 @@ func (s *Shipper) Sync(ctx context.Context) (uploaded int, err error) {
// Check against bucket if the meta file for this block exists.
ok, err := s.bucket.Exists(ctx, path.Join(m.ULID.String(), block.MetaFilename))
if err != nil {
return 0, errors.Wrap(err, "check exists")
return uploaded, errors.Wrap(err, "check exists")
}
if ok {
meta.Uploaded = append(meta.Uploaded, m.ULID)
@ -314,13 +373,13 @@ func (s *Shipper) Sync(ctx context.Context) (uploaded int, err error) {
// Skip overlap check if out of order uploads is enabled.
if m.Compaction.Level > 1 && !s.allowOutOfOrderUploads {
if err := checker.IsOverlapping(ctx, m.BlockMeta); err != nil {
return 0, errors.Errorf("Found overlap or error during sync, cannot upload compacted block, details: %v", err)
return uploaded, errors.Errorf("Found overlap or error during sync, cannot upload compacted block, details: %v", err)
}
}
if err := s.upload(ctx, m); err != nil {
if !s.allowOutOfOrderUploads {
return 0, errors.Wrapf(err, "upload %v", m.ULID)
return uploaded, errors.Wrapf(err, "upload %v", m.ULID)
}
// No error returned, just log line. This is because we want other blocks to be uploaded even
@ -337,13 +396,14 @@ func (s *Shipper) Sync(ctx context.Context) (uploaded int, err error) {
level.Warn(s.logger).Log("msg", "updating meta file failed", "err", err)
}
s.metrics.dirSyncs.Inc()
if uploadErrs > 0 {
failedExecution = false
if uploadErrs > 0 || len(failedBlocks) > 0 {
s.metrics.uploadFailures.Add(float64(uploadErrs))
return uploaded, errors.Errorf("failed to sync %v blocks", uploadErrs)
s.metrics.corruptedBlocks.Add(float64(len(failedBlocks)))
return uploaded, errors.Errorf("failed to sync %v/%v blocks", uploadErrs, len(failedBlocks))
}
if uploadCompacted {
if s.uploadCompacted {
s.metrics.uploadedCompacted.Set(1)
} else {
s.metrics.uploadedCompacted.Set(0)
@ -408,10 +468,10 @@ func (s *Shipper) upload(ctx context.Context, meta *metadata.Meta) error {
// blockMetasFromOldest returns the block meta of each block found in dir
// sorted by minTime asc.
func (s *Shipper) blockMetasFromOldest() (metas []*metadata.Meta, _ error) {
func (s *Shipper) blockMetasFromOldest() (metas []*metadata.Meta, failedBlocks []string, _ error) {
fis, err := os.ReadDir(s.dir)
if err != nil {
return nil, errors.Wrap(err, "read dir")
return nil, nil, errors.Wrap(err, "read dir")
}
names := make([]string, 0, len(fis))
for _, fi := range fis {
@ -425,21 +485,35 @@ func (s *Shipper) blockMetasFromOldest() (metas []*metadata.Meta, _ error) {
fi, err := os.Stat(dir)
if err != nil {
return nil, errors.Wrapf(err, "stat block %v", dir)
if s.skipCorruptedBlocks {
level.Error(s.logger).Log("msg", "stat block", "err", err, "block", dir)
failedBlocks = append(failedBlocks, n)
continue
}
return nil, nil, errors.Wrapf(err, "stat block %v", dir)
}
if !fi.IsDir() {
continue
}
m, err := metadata.ReadFromDir(dir)
if err != nil {
return nil, errors.Wrapf(err, "read metadata for block %v", dir)
if s.skipCorruptedBlocks {
level.Error(s.logger).Log("msg", "read metadata for block", "err", err, "block", dir)
failedBlocks = append(failedBlocks, n)
continue
}
return nil, nil, errors.Wrapf(err, "read metadata for block %v", dir)
}
metas = append(metas, m)
}
sort.Slice(metas, func(i, j int) bool {
return metas[i].BlockMeta.MinTime < metas[j].BlockMeta.MinTime
})
return metas, nil
if len(failedBlocks) > 0 {
err = ErrorSyncBlockCorrupted
}
return metas, failedBlocks, err
}
func hardlinkBlock(src, dst string) error {
@ -501,6 +575,11 @@ func WriteMetaFile(logger log.Logger, path string, meta *Meta) error {
runutil.CloseWithLogOnErr(logger, f, "write meta file close")
return err
}
// Force the kernel to persist the file on disk to avoid data loss if the host crashes.
if err := f.Sync(); err != nil {
return err
}
if err := f.Close(); err != nil {
return err
}

View File

@ -19,7 +19,7 @@ import (
"github.com/thanos-io/thanos/pkg/extprom"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
promtest "github.com/prometheus/client_golang/prometheus/testutil"
"github.com/prometheus/prometheus/model/labels"
@ -44,19 +44,25 @@ func TestShipper_SyncBlocks_e2e(t *testing.T) {
dir := t.TempDir()
extLset := labels.FromStrings("prometheus", "prom-1")
shipper := New(log.NewLogfmtLogger(os.Stderr), nil, dir, metricsBucket, func() labels.Labels { return extLset }, metadata.TestSource, nil, false, metadata.NoneFunc, DefaultMetaFilename)
shipper := New(
metricsBucket,
dir,
WithLogger(log.NewLogfmtLogger(os.Stderr)),
WithSource(metadata.TestSource),
WithHashFunc(metadata.NoneFunc),
WithLabels(func() labels.Labels { return extLset }),
)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// Create 10 new blocks. 9 of them (non compacted) should be actually uploaded.
var (
expBlocks = map[ulid.ULID]struct{}{}
expFiles = map[string][]byte{}
randr = rand.New(rand.NewSource(0))
now = time.Now()
ids = []ulid.ULID{}
maxSyncSoFar int64
expBlocks = map[ulid.ULID]struct{}{}
expFiles = map[string][]byte{}
randr = rand.New(rand.NewSource(0))
now = time.Now()
ids = []ulid.ULID{}
)
for i := 0; i < 10; i++ {
id := ulid.MustNew(uint64(i), randr)
@ -120,7 +126,6 @@ func TestShipper_SyncBlocks_e2e(t *testing.T) {
if i != 5 {
ids = append(ids, id)
maxSyncSoFar = meta.MaxTime
testutil.Equals(t, 1, b)
} else {
// 5 blocks uploaded so far - 5 existence checks & 25 uploads (5 files each).
@ -167,12 +172,6 @@ func TestShipper_SyncBlocks_e2e(t *testing.T) {
shipMeta, err = ReadMetaFile(shipper.metadataFilePath)
testutil.Ok(t, err)
testutil.Equals(t, &Meta{Version: MetaVersion1, Uploaded: ids}, shipMeta)
// Verify timestamps were updated correctly.
minTotal, maxSync, err := shipper.Timestamps()
testutil.Ok(t, err)
testutil.Equals(t, timestamp.FromTime(now), minTotal)
testutil.Equals(t, maxSyncSoFar, maxSync)
}
for id := range expBlocks {
@ -211,8 +210,15 @@ func TestShipper_SyncBlocksWithMigrating_e2e(t *testing.T) {
p.DisableCompaction()
testutil.Ok(t, p.Restart(context.Background(), logger))
uploadCompactedFunc := func() bool { return true }
shipper := New(log.NewLogfmtLogger(os.Stderr), nil, dir, bkt, func() labels.Labels { return extLset }, metadata.TestSource, uploadCompactedFunc, false, metadata.NoneFunc, DefaultMetaFilename)
shipper := New(
bkt,
dir,
WithLogger(log.NewLogfmtLogger(os.Stderr)),
WithSource(metadata.TestSource),
WithHashFunc(metadata.NoneFunc),
WithLabels(func() labels.Labels { return extLset }),
WithUploadCompacted(true),
)
// Create 10 new blocks. 9 of them (non compacted) should be actually uploaded.
var (
@ -313,12 +319,6 @@ func TestShipper_SyncBlocksWithMigrating_e2e(t *testing.T) {
shipMeta, err = ReadMetaFile(shipper.metadataFilePath)
testutil.Ok(t, err)
testutil.Equals(t, &Meta{Version: MetaVersion1, Uploaded: ids}, shipMeta)
// Verify timestamps were updated correctly.
minTotal, maxSync, err := shipper.Timestamps()
testutil.Ok(t, err)
testutil.Equals(t, timestamp.FromTime(now), minTotal)
testutil.Equals(t, meta.MaxTime, maxSync)
}
for id := range expBlocks {
@ -359,10 +359,17 @@ func TestShipper_SyncOverlapBlocks_e2e(t *testing.T) {
p.DisableCompaction()
testutil.Ok(t, p.Restart(context.Background(), logger))
uploadCompactedFunc := func() bool { return true }
// Here, the allowOutOfOrderUploads flag is set to true, which allows blocks with overlaps to be uploaded.
shipper := New(log.NewLogfmtLogger(os.Stderr), nil, dir, bkt, func() labels.Labels { return extLset }, metadata.TestSource, uploadCompactedFunc, true, metadata.NoneFunc, DefaultMetaFilename)
shipper := New(
bkt,
dir,
WithLogger(log.NewLogfmtLogger(os.Stderr)),
WithSource(metadata.TestSource),
WithHashFunc(metadata.NoneFunc),
WithLabels(func() labels.Labels { return extLset }),
WithUploadCompacted(true),
WithAllowOutOfOrderUploads(true),
)
// Creating 2 overlapping blocks - both uploaded when OOO uploads allowed.
var (
expBlocks = map[ulid.ULID]struct{}{}

View File

@ -6,16 +6,18 @@ package shipper
import (
"context"
"fmt"
"math"
"math/rand"
"os"
"path"
"path/filepath"
"sort"
"strings"
"testing"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
promtest "github.com/prometheus/client_golang/prometheus/testutil"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/tsdb"
@ -26,66 +28,6 @@ import (
"github.com/thanos-io/thanos/pkg/block/metadata"
)
func TestShipperTimestamps(t *testing.T) {
dir := t.TempDir()
s := New(nil, nil, dir, nil, nil, metadata.TestSource, nil, false, metadata.NoneFunc, DefaultMetaFilename)
// Missing thanos meta file.
_, _, err := s.Timestamps()
testutil.NotOk(t, err)
meta := &Meta{Version: MetaVersion1}
testutil.Ok(t, WriteMetaFile(log.NewNopLogger(), s.metadataFilePath, meta))
// Nothing uploaded, nothing in the filesystem. We assume that
// we are still waiting for TSDB to dump first TSDB block.
mint, maxt, err := s.Timestamps()
testutil.Ok(t, err)
testutil.Equals(t, int64(0), mint)
testutil.Equals(t, int64(math.MinInt64), maxt)
id1 := ulid.MustNew(1, nil)
testutil.Ok(t, os.Mkdir(path.Join(dir, id1.String()), os.ModePerm))
testutil.Ok(t, metadata.Meta{
BlockMeta: tsdb.BlockMeta{
ULID: id1,
MaxTime: 2000,
MinTime: 1000,
Version: 1,
},
}.WriteToDir(log.NewNopLogger(), path.Join(dir, id1.String())))
mint, maxt, err = s.Timestamps()
testutil.Ok(t, err)
testutil.Equals(t, int64(1000), mint)
testutil.Equals(t, int64(math.MinInt64), maxt)
id2 := ulid.MustNew(2, nil)
testutil.Ok(t, os.Mkdir(path.Join(dir, id2.String()), os.ModePerm))
testutil.Ok(t, metadata.Meta{
BlockMeta: tsdb.BlockMeta{
ULID: id2,
MaxTime: 4000,
MinTime: 2000,
Version: 1,
},
}.WriteToDir(log.NewNopLogger(), path.Join(dir, id2.String())))
mint, maxt, err = s.Timestamps()
testutil.Ok(t, err)
testutil.Equals(t, int64(1000), mint)
testutil.Equals(t, int64(math.MinInt64), maxt)
meta = &Meta{
Version: MetaVersion1,
Uploaded: []ulid.ULID{id1},
}
testutil.Ok(t, WriteMetaFile(log.NewNopLogger(), s.metadataFilePath, meta))
mint, maxt, err = s.Timestamps()
testutil.Ok(t, err)
testutil.Equals(t, int64(1000), mint)
testutil.Equals(t, int64(2000), maxt)
}
func TestIterBlockMetas(t *testing.T) {
dir := t.TempDir()
@ -122,9 +64,60 @@ func TestIterBlockMetas(t *testing.T) {
},
}.WriteToDir(log.NewNopLogger(), path.Join(dir, id3.String())))
shipper := New(nil, nil, dir, nil, nil, metadata.TestSource, nil, false, metadata.NoneFunc, DefaultMetaFilename)
metas, err := shipper.blockMetasFromOldest()
shipper := New(
nil,
dir,
WithSource(metadata.TestSource),
WithHashFunc(metadata.NoneFunc),
)
metas, failedBlocks, err := shipper.blockMetasFromOldest()
testutil.Ok(t, err)
testutil.Equals(t, 0, len(failedBlocks))
testutil.Equals(t, sort.SliceIsSorted(metas, func(i, j int) bool {
return metas[i].BlockMeta.MinTime < metas[j].BlockMeta.MinTime
}), true)
}
func TestIterBlockMetasWhenMissingMeta(t *testing.T) {
dir := t.TempDir()
id1 := ulid.MustNew(1, nil)
testutil.Ok(t, os.Mkdir(path.Join(dir, id1.String()), os.ModePerm))
testutil.Ok(t, metadata.Meta{
BlockMeta: tsdb.BlockMeta{
ULID: id1,
MaxTime: 2000,
MinTime: 1000,
Version: 1,
},
}.WriteToDir(log.NewNopLogger(), path.Join(dir, id1.String())))
id2 := ulid.MustNew(2, nil)
testutil.Ok(t, os.Mkdir(path.Join(dir, id2.String()), os.ModePerm))
id3 := ulid.MustNew(3, nil)
testutil.Ok(t, os.Mkdir(path.Join(dir, id3.String()), os.ModePerm))
testutil.Ok(t, metadata.Meta{
BlockMeta: tsdb.BlockMeta{
ULID: id3,
MaxTime: 3000,
MinTime: 2000,
Version: 1,
},
}.WriteToDir(log.NewNopLogger(), path.Join(dir, id3.String())))
shipper := New(
nil,
dir,
WithSource(metadata.TestSource),
WithHashFunc(metadata.NoneFunc),
WithSkipCorruptedBlocks(true),
)
metas, failedBlocks, err := shipper.blockMetasFromOldest()
testutil.NotOk(t, err)
testutil.Equals(t, 1, len(failedBlocks))
testutil.Equals(t, id2.String(), failedBlocks[0])
testutil.Equals(t, 2, len(metas))
testutil.Equals(t, sort.SliceIsSorted(metas, func(i, j int) bool {
return metas[i].BlockMeta.MinTime < metas[j].BlockMeta.MinTime
}), true)
@ -153,19 +146,29 @@ func BenchmarkIterBlockMetas(b *testing.B) {
})
b.ResetTimer()
shipper := New(nil, nil, dir, nil, nil, metadata.TestSource, nil, false, metadata.NoneFunc, DefaultMetaFilename)
_, err := shipper.blockMetasFromOldest()
shipper := New(
nil,
dir,
WithSource(metadata.TestSource),
WithHashFunc(metadata.NoneFunc),
)
_, _, err := shipper.blockMetasFromOldest()
testutil.Ok(b, err)
}
func TestShipperAddsSegmentFiles(t *testing.T) {
dir := t.TempDir()
inmemory := objstore.NewInMemBucket()
metrics := prometheus.NewRegistry()
lbls := labels.FromStrings("test", "test")
s := New(nil, nil, dir, inmemory, func() labels.Labels { return lbls }, metadata.TestSource, nil, false, metadata.NoneFunc, DefaultMetaFilename)
s := New(
inmemory,
dir,
WithRegisterer(metrics),
WithSource(metadata.TestSource),
WithHashFunc(metadata.NoneFunc),
WithLabels(func() labels.Labels { return lbls }),
)
id := ulid.MustNew(1, nil)
blockDir := path.Join(dir, id.String())
@ -196,6 +199,102 @@ func TestShipperAddsSegmentFiles(t *testing.T) {
testutil.Ok(t, err)
testutil.Equals(t, []string{segmentFile}, meta.Thanos.SegmentFiles)
testutil.Ok(t, promtest.GatherAndCompare(metrics, strings.NewReader(`
# HELP thanos_shipper_dir_syncs_total Total number of dir syncs
# TYPE thanos_shipper_dir_syncs_total counter
thanos_shipper_dir_syncs_total{} 1
`), `thanos_shipper_dir_syncs_total`))
}
func TestShipperSkipCorruptedBlocks(t *testing.T) {
dir := t.TempDir()
inmemory := objstore.NewInMemBucket()
metrics := prometheus.NewRegistry()
lbls := labels.FromStrings("test", "test")
s := New(
inmemory,
dir,
WithRegisterer(metrics),
WithSource(metadata.TestSource),
WithHashFunc(metadata.NoneFunc),
WithLabels(func() labels.Labels { return lbls }),
WithSkipCorruptedBlocks(true),
)
id1 := ulid.MustNew(1, nil)
blockDir1 := path.Join(dir, id1.String())
chunksDir1 := path.Join(blockDir1, block.ChunksDirname)
testutil.Ok(t, os.MkdirAll(chunksDir1, os.ModePerm))
testutil.Ok(t, metadata.Meta{
BlockMeta: tsdb.BlockMeta{
ULID: id1,
MaxTime: 2000,
MinTime: 1000,
Version: 1,
Stats: tsdb.BlockStats{
NumSamples: 1000, // Not really, but shipper needs nonzero value.
},
},
}.WriteToDir(log.NewNopLogger(), path.Join(dir, id1.String())))
testutil.Ok(t, os.WriteFile(filepath.Join(blockDir1, "index"), []byte("index file"), 0666))
segmentFile := "00001"
testutil.Ok(t, os.WriteFile(filepath.Join(chunksDir1, segmentFile), []byte("hello world"), 0666))
id2 := ulid.MustNew(2, nil)
blockDir2 := path.Join(dir, id2.String())
chunksDir2 := path.Join(blockDir2, block.ChunksDirname)
testutil.Ok(t, os.MkdirAll(chunksDir2, os.ModePerm))
testutil.Ok(t, os.WriteFile(filepath.Join(blockDir2, "index"), []byte("index file"), 0666))
testutil.Ok(t, os.WriteFile(filepath.Join(chunksDir2, segmentFile), []byte("hello world"), 0666))
uploaded, err := s.Sync(context.Background())
testutil.NotOk(t, err)
testutil.Equals(t, 1, uploaded)
testutil.Ok(t, promtest.GatherAndCompare(metrics, strings.NewReader(`
# HELP thanos_shipper_upload_failures_total Total number of block upload failures
# TYPE thanos_shipper_upload_failures_total counter
thanos_shipper_upload_failures_total{} 0
`), `thanos_shipper_upload_failures_total`))
testutil.Ok(t, promtest.GatherAndCompare(metrics, strings.NewReader(`
# HELP thanos_shipper_corrupted_blocks_total Total number of corrupted blocks
# TYPE thanos_shipper_corrupted_blocks_total counter
thanos_shipper_corrupted_blocks_total{} 1
`), `thanos_shipper_corrupted_blocks_total`))
}
func TestShipperNotSkipCorruptedBlocks(t *testing.T) {
dir := t.TempDir()
inmemory := objstore.NewInMemBucket()
metrics := prometheus.NewRegistry()
lbls := labels.FromStrings("test", "test")
s := New(
inmemory,
dir,
WithRegisterer(metrics),
WithSource(metadata.TestSource),
WithHashFunc(metadata.NoneFunc),
WithLabels(func() labels.Labels { return lbls }),
)
id := ulid.MustNew(2, nil)
blockDir := path.Join(dir, id.String())
chunksDir := path.Join(blockDir, block.ChunksDirname)
segmentFile := "00001"
testutil.Ok(t, os.MkdirAll(chunksDir, os.ModePerm))
testutil.Ok(t, os.WriteFile(filepath.Join(blockDir, "index"), []byte("index file"), 0666))
testutil.Ok(t, os.WriteFile(filepath.Join(chunksDir, segmentFile), []byte("hello world"), 0666))
uploaded, err := s.Sync(context.Background())
testutil.NotOk(t, err)
testutil.Equals(t, 0, uploaded)
testutil.Ok(t, promtest.GatherAndCompare(metrics, strings.NewReader(`
# HELP thanos_shipper_dir_sync_failures_total Total number of failed dir syncs
# TYPE thanos_shipper_dir_sync_failures_total counter
thanos_shipper_dir_sync_failures_total{} 1
`), `thanos_shipper_dir_sync_failures_total`))
}
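Taken together, the two tests above pin down the WithSkipCorruptedBlocks behavior: with the option, Sync still returns an error for the unreadable block but uploads the healthy one and increments thanos_shipper_corrupted_blocks_total; without it, the whole dir sync fails, nothing is uploaded, and thanos_shipper_dir_sync_failures_total is incremented instead. Below is a caller-side sketch under those assumptions; it only uses the constructor shape and option names shown in the tests, while the bucket, directory path, source, and hash function chosen here are illustrative and not taken from this diff.

package main

import (
	"context"
	"log"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/thanos-io/objstore"
	"github.com/thanos-io/thanos/pkg/block/metadata"
	"github.com/thanos-io/thanos/pkg/shipper"
)

func main() {
	bkt := objstore.NewInMemBucket() // stand-in for a real object storage client
	lbls := labels.FromStrings("replica", "a")

	s := shipper.New(
		bkt,
		"/var/thanos/tsdb", // hypothetical local directory holding TSDB blocks
		shipper.WithSource(metadata.SidecarSource),
		shipper.WithHashFunc(metadata.SHA256Func),
		shipper.WithLabels(func() labels.Labels { return lbls }),
		shipper.WithSkipCorruptedBlocks(true), // keep uploading healthy blocks past corrupted ones
	)

	uploaded, err := s.Sync(context.Background())
	if err != nil {
		// With skipping enabled, err reports the blocks that could not be read
		// while uploaded still counts the blocks that made it to the bucket.
		log.Printf("sync finished with errors, uploaded %d block(s): %v", uploaded, err)
		return
	}
	log.Printf("uploaded %d block(s)", uploaded)
}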
func TestReadMetaFile(t *testing.T) {
@ -231,11 +330,15 @@ func TestReadMetaFile(t *testing.T) {
func TestShipperExistingThanosLabels(t *testing.T) {
dir := t.TempDir()
inmemory := objstore.NewInMemBucket()
lbls := labels.FromStrings("test", "test")
-s := New(nil, nil, dir, inmemory, func() labels.Labels { return lbls }, metadata.TestSource, nil, false, metadata.NoneFunc, DefaultMetaFilename)
+s := New(
+inmemory,
+dir,
+WithSource(metadata.TestSource),
+WithHashFunc(metadata.NoneFunc),
+WithLabels(func() labels.Labels { return lbls }),
+)
id := ulid.MustNew(1, nil)
id2 := ulid.MustNew(2, nil)


@ -25,7 +25,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/gogo/protobuf/types"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
@ -446,6 +447,8 @@ type BucketStore struct {
postingGroupMaxKeySeriesRatio float64
sortingStrategy sortingStrategy
// This flag limits memory usage when the lazy retrieval strategy (newLazyRespSet()) is used.
lazyRetrievalMaxBufferedResponses int
blockEstimatedMaxSeriesFunc BlockEstimator
blockEstimatedMaxChunkFunc BlockEstimator
@ -604,9 +607,13 @@ func WithSeriesMatchRatio(seriesMatchRatio float64) BucketStoreOption {
// WithDontResort disables series resorting in Store Gateway.
func WithDontResort(true bool) BucketStoreOption {
return func(s *BucketStore) {
-if true {
-s.sortingStrategy = sortingStrategyNone
-}
+s.sortingStrategy = sortingStrategyNone
}
}
func WithLazyRetrievalMaxBufferedResponsesForBucket(n int) BucketStoreOption {
return func(s *BucketStore) {
s.lazyRetrievalMaxBufferedResponses = n
}
}
@ -683,6 +690,8 @@ func NewBucketStore(
indexHeaderLazyDownloadStrategy: indexheader.AlwaysEagerDownloadIndexHeader,
requestLoggerFunc: NoopRequestLoggerFunc,
blockLifecycleCallback: &noopBlockLifecycleCallback{},
lazyRetrievalMaxBufferedResponses: 20,
}
for _, option := range options {
@ -1728,6 +1737,7 @@ func (s *BucketStore) Series(req *storepb.SeriesRequest, seriesSrv storepb.Store
shardMatcher,
false,
s.metrics.emptyPostingCount.WithLabelValues(tenant),
max(s.lazyRetrievalMaxBufferedResponses, 1),
)
}
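The new lazyRetrievalMaxBufferedResponses field defaults to 20 in NewBucketStore, can be overridden with WithLazyRetrievalMaxBufferedResponsesForBucket, and is floored at 1 where Series passes it into the lazy response path. Conceptually it caps how many responses the lazy retrieval strategy may hold in memory ahead of the consumer. The snippet below is only an illustration of that bounding idea using a buffered channel, not the actual newLazyRespSet implementation; the function and variable names are made up for the sketch.

package main

import "fmt"

// boundedStream emits responses from produce through a channel whose capacity
// caps how many unconsumed responses can pile up in memory at once.
func boundedStream(produce func(i int) (string, bool), maxBuffered int) <-chan string {
	if maxBuffered < 1 {
		maxBuffered = 1 // mirrors the max(s.lazyRetrievalMaxBufferedResponses, 1) floor above
	}
	out := make(chan string, maxBuffered)
	go func() {
		defer close(out)
		for i := 0; ; i++ {
			resp, ok := produce(i)
			if !ok {
				return
			}
			out <- resp // blocks once maxBuffered responses are waiting, bounding memory
		}
	}()
	return out
}

func main() {
	produce := func(i int) (string, bool) { return fmt.Sprintf("series-%d", i), i < 5 }
	for resp := range boundedStream(produce, 2) {
		fmt.Println(resp)
	}
}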


@ -17,7 +17,8 @@ import (
"github.com/alecthomas/units"
"github.com/go-kit/log"
"github.com/gogo/status"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
dto "github.com/prometheus/client_model/go"
"github.com/prometheus/prometheus/model/labels"


@ -31,7 +31,8 @@ import (
"github.com/leanovate/gopter"
"github.com/leanovate/gopter/gen"
"github.com/leanovate/gopter/prop"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
promtest "github.com/prometheus/client_golang/prometheus/testutil"
@ -1789,12 +1790,13 @@ func TestBucketSeries_OneBlock_InMemIndexCacheSegfault(t *testing.T) {
b1.meta.ULID: b1,
b2.meta.ULID: b2,
},
-queryGate: gate.NewNoop(),
-chunksLimiterFactory: NewChunksLimiterFactory(0),
-seriesLimiterFactory: NewSeriesLimiterFactory(0),
-bytesLimiterFactory: NewBytesLimiterFactory(0),
-seriesBatchSize: SeriesBatchSize,
-requestLoggerFunc: NoopRequestLoggerFunc,
+queryGate: gate.NewNoop(),
+chunksLimiterFactory: NewChunksLimiterFactory(0),
+seriesLimiterFactory: NewSeriesLimiterFactory(0),
+bytesLimiterFactory: NewBytesLimiterFactory(0),
+seriesBatchSize: SeriesBatchSize,
+requestLoggerFunc: NoopRequestLoggerFunc,
+lazyRetrievalMaxBufferedResponses: 1,
}
t.Run("invoke series for one block. Fill the cache on the way.", func(t *testing.T) {


@ -9,7 +9,8 @@ import (
"strconv"
"strings"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
"github.com/prometheus/prometheus/model/labels"


@ -10,7 +10,8 @@ import (
"strings"
"testing"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/prometheus/model/labels"
"golang.org/x/crypto/blake2b"

Some files were not shown because too many files have changed in this diff.