Compare commits


776 Commits

Author SHA1 Message Date
Giedrius Statkevičius 1c4d17bd12
Merge pull request #8466 from thanos-io/add_missing_path
scripts/genproto: add missing dir
2025-09-03 15:05:22 +03:00
Giedrius Statkevičius cafefb8428
Merge pull request #8464 from thanos-io/assume_unmark_upstream
block: assume that we do not unmark a block for deletion
2025-09-03 12:43:40 +03:00
Giedrius Statkevičius d4330a1bb7 scripts/genproto: add missing dir
This dir was missing from the script. Regenerate the file. Removes some
dead code.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-09-03 12:33:08 +03:00
Giedrius Statkevičius dddd98dab0
Merge pull request #8465 from thanos-io/volunteer_release
docs: volunteer as v0.40.0 shepherd
2025-09-03 12:13:23 +03:00
Giedrius Statkevičius 6e231d08e5 block: assume that we do not unmark a block for deletion
Just like we assume that the meta.json file doesn't change, let's also
assume that we do not unmark a block for deletion.

This solves a critical issue in Thanos Store where there is a race
between deletion in compactor and the loading of a block:
- Deletion starts from meta.json; deletion markers are deleted at the end
- Store sees that block, loads it by using the local in-memory and disk
  cache
- By the time the deletion marker filtering function is executed, the
  marker has already been deleted by the Compactor
- Store happily tries to load that block

The root cause is that we are doing listing & checking markers in two or
more separate steps. Since that is inevitable, we need to assume that
the marker won't disappear while the block is still there. This is the
case when everything is working normally.
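
To illustrate the assumption, a minimal sketch (hypothetical names, not the actual fetcher filter): once a deletion mark has been observed for a block, that fact is cached and never re-checked, so a marker removed mid-listing by the Compactor cannot make the block look live again.

```go
// Hypothetical sketch, not the actual Thanos fetcher code.
package block

import "sync"

// deletionMarkCache remembers which blocks have ever been seen with a
// deletion mark, mirroring the assumption that a block is never unmarked.
type deletionMarkCache struct {
	mu     sync.Mutex
	marked map[string]struct{} // block ULIDs observed with a deletion mark
}

func newDeletionMarkCache() *deletionMarkCache {
	return &deletionMarkCache{marked: map[string]struct{}{}}
}

// IsMarked consults the cache first; only unknown blocks hit object storage.
func (c *deletionMarkCache) IsMarked(blockID string, checkStorage func(id string) bool) bool {
	c.mu.Lock()
	_, seen := c.marked[blockID]
	c.mu.Unlock()
	if seen {
		return true // assume the mark never disappears while the block exists
	}
	if checkStorage(blockID) {
		c.mu.Lock()
		c.marked[blockID] = struct{}{}
		c.mu.Unlock()
		return true
	}
	return false
}
```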

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-09-03 11:34:36 +03:00
Giedrius Statkevičius 0a26f5ea7c docs: volunteer as v0.40.0 shepherd
Let's do a release just before PromCon.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-09-03 11:11:06 +03:00
Giedrius Statkevičius 743e5ed125
Merge pull request #8456 from parthivrmenon/docs/clarify-compactor-wording
docs: clarify Compactor wording in compact.md
2025-09-01 14:36:41 +03:00
Parthiv Roshan Menon a2863633aa docs: restore alert rules in examples/alerts/alerts.md
MDOX regenerated the alert rules content from null back to actual rules.

Signed-off-by: Parthiv Roshan Menon <parthiv.menon@smarsh.com>
2025-09-01 11:39:39 +05:30
Parthiv Roshan Menon 060e52a3d3 docs: apply MDOX auto-formatting
- Fix email formatting in MAINTAINERS.md (remove angle brackets)
- Update HTTP config defaults in component documentation
- Apply consistent formatting and spacing across documentation files
- Remove empty alert rules in examples/alerts/alerts.md

Signed-off-by: Parthiv Roshan Menon <parthiv.menon@smarsh.com>
2025-09-01 11:28:28 +05:30
Parthiv Roshan Menon a2f617722f docs: ignore itnext.io in link validation due to TLS issues
Signed-off-by: Parthiv Roshan Menon <parthiv.menon@smarsh.com>
2025-09-01 11:28:28 +05:30
Parthiv Roshan Menon 6bee829914 docs: clarify Compactor wording in compact.md
Signed-off-by: Parthiv Roshan Menon <parthiv.menon@smarsh.com>
2025-09-01 11:28:21 +05:30
Parthiv Roshan Menon 345f3660c5 docs: fix broken links in documentation
Signed-off-by: Parthiv Roshan Menon <parthiv.menon@smarsh.com>
2025-09-01 11:28:03 +05:30
Giedrius Statkevičius c93f82d7ca
Merge pull request #8454 from thanos-io/fix_bug_8442
compact: ensure we don't mark blocks for deletion again
2025-08-29 13:31:58 +03:00
Giedrius Statkevičius ebd746c952 block: fix race
Protect reads and writes with the mutex.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-29 13:04:28 +03:00
Giedrius Statkevičius 638bf440eb compact: ensure we don't mark blocks for deletion again
Fix #8442 by not marking blocks for deletion again if they were just
deleted.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-28 14:47:16 +03:00
Saswata Mukherjee 9a10cb2fcc
*: Restore certain omitempty tags after modernize (#8452)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-08-28 10:21:39 +01:00
Giedrius Statkevičius fcea5d2c99
Merge pull request #8443 from clwluvw/querier-rl-cleanup
query: remove unused replica labels map in querier
2025-08-28 11:10:26 +03:00
Giedrius Statkevičius 662dbb6e29
Merge pull request #8450 from saswatamcode/modernize
*: Apply modernize analyzer to the codebase
2025-08-28 11:09:45 +03:00
Saswata Mukherjee fa18f8982a
Fix lint
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-08-27 17:09:33 +01:00
Saswata Mukherjee 733df9aedb
*: Apply modernize analyzer to the codebase
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-08-27 16:33:31 +01:00
Giedrius Statkevičius dc403eaac7
Merge pull request #8449 from thanos-io/add_repro_for_8442
compact: adding repro for 8442
2025-08-27 11:31:44 +03:00
Giedrius Statkevičius df3e1963bd compact: adding repro for 8442
Adding a test that reproduces the race in 8442 - race between garbage
collection and deletion of marked blocks. Garbage collection should
never mark those blocks for deletion again if it has a consistent state.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-27 10:43:09 +03:00
Seena Fallah 74aba58f6c query: remove unused replica labels map in querier
Remove the unused local variable `rl` in the newQuerier function, which
created a map from replicaLabels that was never used.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2025-08-25 20:36:38 +02:00
Giedrius Statkevičius b9844418d6
Merge pull request #8441 from thanos-io/promu_bump_125
.promu: bump to 1.25
2025-08-25 12:52:37 +03:00
Giedrius Statkevičius 55b92a693c .promu: bump to 1.25
Relevant PR was merged so let's bump to 1.25.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-25 12:42:18 +03:00
Giedrius Statkevičius 1fe840667d
Merge pull request #8439 from thanos-io/update_go_125
Update to Go 1.25
2025-08-25 12:27:49 +03:00
Giedrius Statkevičius 90bbc8b149 *: fix linter issues
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-25 11:52:44 +03:00
Giedrius Statkevičius 2d23cd68f6 *: update to Go 1.25
I found myself wanting the newest os.Root.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-25 10:47:20 +03:00
Matej Gera 801bda7a90
Remove @matej-g from maintainers (#8437)
* Remove @matej-g from maintainers

Signed-off-by: GitHub <noreply@github.com>

* Fix failing link

Signed-off-by: GitHub <noreply@github.com>

---------

Signed-off-by: GitHub <noreply@github.com>
2025-08-21 15:33:11 +02:00
Giedrius Statkevičius e61ad9c156
Merge pull request #8427 from erikgb/dockerfile-copy-chown
refactor: chown on COPY in Dockerfile to reduce image size
2025-08-19 10:02:49 +03:00
Erik Godding Boye 8dedfd08f1 refactor: chown on COPY in Dockerfile to reduce image size
Signed-off-by: Erik Godding Boye <egboye@gmail.com>
2025-08-18 17:54:38 +02:00
Giedrius Statkevičius ca9e3637ec
Merge pull request #8426 from thanos-io/add_buffers_guide
docs: add first "internal" guide
2025-08-18 17:50:50 +03:00
Giedrius Statkevičius 6c762830d0 store/labelpb: add unmarshaling benchmarks
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-18 15:45:54 +03:00
Michael Hoffmann b05b036a2f
Merge pull request #8423 from thanos-io/mhoffmann/fix-ruler-loading-rules-with-heredoc
ruler: fix marshalling rules with heredoc
2025-08-13 14:49:10 +02:00
Michael Hoffmann 39c56224bc ruler: fix marshalling rules with heredoc
This fixes an issue with ruler where we would load

```
groups:
    - name: test.rules
      rules:
        - alert: LastUpdateTime
          annotations:
            summary: test
          expr: |2
               max without (instance) (
                  time()
                -
                  (last_updated > 0)
              )
            >
              60 * 60 * 4
          for: 5m
          labels:
            priority: "3"
```

and then remarshal it as

```
groups:
    - name: test.rules
      rules:
        - alert: LastUpdateTime
          annotations:
            summary: test
          expr: |4
               max without (instance) (
                  time()
                -
                  (last_updated > 0)
              )
            >
              60 * 60 * 4
          for: 5m
          labels:
            priority: "3"

```

which is not valid yaml anymore.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-08-13 10:47:27 +00:00
Giedrius Statkevičius eebe0507d4 docs: add buffers guide
Add an initial buffers guide - just outlining my ideas. Will try
removing gogoproto once again and write custom labels unmarshaling code.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-13 10:30:24 +03:00
Michael Hoffmann 519fda5fcd
Merge pull request #8413 from thanos-io/mhoffmann/misc-bump-prometheus-and-promql-engine
deps: bump prometheus and promql-engine
2025-08-07 13:40:22 +02:00
Michael Hoffmann 38d1bd8d20 deps: bump prometheus and promql-engine
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-08-07 10:20:51 +00:00
Giedrius Statkevičius 8d77519d03
Merge pull request #8397 from thanos-io/remove_kakkoyun
Remove @kakkoyun from maintainers
2025-08-05 11:09:28 +03:00
Giedrius Statkevičius 899abc193b
Merge pull request #8410 from thanos-io/partial_delete_marked
compact: ignore blocks with deletion mark in partial deletes
2025-08-04 17:18:43 +03:00
Giedrius Statkevičius d06dc234df compact: ignore blocks with deletion mark in partial deletes
Block deletion always starts with meta.json, so if there are multiple
compactor shards, another shard can also start trying to delete the
same block because it detects it as partial. Hence, ignore deletion
marks in the partial block cleaning function because blocks with a
deletion mark are handled in the other flow.
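
A rough sketch of the resulting behavior (illustrative names, not the real compactor API):

```go
// Illustrative names only; not the real compactor API.
package compact

// cleanPartialBlocks deletes partially uploaded blocks but leaves anything
// with a deletion mark to the regular marked-for-deletion flow.
func cleanPartialBlocks(
	partial map[string]error, // block ULID -> reason it was considered partial
	hasDeletionMark func(id string) bool,
	deleteBlock func(id string) error,
) error {
	for id := range partial {
		if hasDeletionMark(id) {
			// Another compactor shard is already deleting this block
			// (meta.json goes first, the marker last); do not touch it here.
			continue
		}
		if err := deleteBlock(id); err != nil {
			return err
		}
	}
	return nil
}
```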

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-04 15:52:50 +03:00
Giedrius Statkevičius 5fc08d5573
Merge pull request #8409 from thanos-io/fix_arg
compact: fix argument
2025-08-04 15:42:59 +03:00
Giedrius Statkevičius a562652784 *: fix tests
Partial block deletion is covered by unit tests so removing it from e2e
tests as it is impossible to mock the last modified date.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-04 15:27:23 +03:00
Giedrius Statkevičius e73cd5084c compact: fix argument
This should be the block's ID, not the bucket's name.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-04 11:57:59 +03:00
Michael Hoffmann 696193cc35
Merge pull request #8403 from thanos-io/mhoffmann/bump-promu-to-set-build-tags
build: bump promu to set build tags
2025-07-31 17:01:12 +02:00
NickGoog 8d3d636734
Don't start sidecar if REMOTE_WRITE_ENABLED env var present (use receive instead) (#8404)
Based on the design online, running both these components simultaneously
seems unintended: https://thanos.io/tip/thanos/quick-tutorial.md

Signed-off-by: NickGoog <hartunian@google.com>
2025-07-31 14:55:42 +01:00
Michael Hoffmann 227def9692 build: bump promu to set build tags
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-31 12:10:39 +00:00
NickGoog 17ca834087
Fix ObjStore config lookup for non-MinIO use (#8399)
Also removes unwanted previous quickstart.sh change from CHANGELOG.

Signed-off-by: NickGoog <66492516+NickGoog@users.noreply.github.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2025-07-31 12:59:40 +01:00
Giedrius Statkevičius 8a92550350
Merge pull request #8402 from thanos-io/trim_labelsets
query/receive: trim labelsets from String()
2025-07-31 14:00:44 +03:00
Giedrius Statkevičius 7ebe35e809 query/receive: trim labelsets from String()
We have lots of tenants & labels, so adding them to String() makes the
error messages really long and unreadable. Remove them entirely from
String().

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-31 13:40:04 +03:00
Giedrius Statkevičius 0acf54d9ea
Merge pull request #8334 from pedro-stanaka/feat/query-check-endpoints-on-startup
Query: wait for initial endpoint discovery before becoming ready
2025-07-31 10:57:56 +03:00
Giedrius Statkevičius 75dac8cb98 Merge remote-tracking branch 'origin/main' into feat/query-check-endpoints-on-startup
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-31 10:44:50 +03:00
Giedrius Statkevičius 1e740e385f
Merge pull request #8401 from thanos-io/rework_only_write
block/compact: rework consistency check, make writers only write
2025-07-31 10:17:52 +03:00
Giedrius Statkevičius 57031c7b18 shipper: fix tests
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-31 10:00:59 +03:00
Giedrius Statkevičius d42ecb5086 block/compact: rework consistency check, make writers only write
- It's weird that on upload errors, we try to clean everything and only
  then write again. It's an extra operation we don't need since whether
  a block exists or not hinges on the existence of meta.json. We don't
  need to delete the old, identical files before uploading them again.
- Consequently, we need to always use the _upload_ time, not block
  creation time when checking for consistency or when deleting partially
  uploaded blocks. Directories as such don't exist in object storage;
  they are a client-side "illusion", so we need to iterate through the
  partial block's directory to fetch the last modified date.
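
A sketch of the second point, using a stand-in interface rather than the exact objstore API:

```go
// Stand-in interface; the real code goes through the objstore bucket client.
package block

import (
	"context"
	"time"
)

type objectStore interface {
	Iter(ctx context.Context, prefix string, f func(name string) error) error
	LastModified(ctx context.Context, name string) (time.Time, error)
}

// blockUploadTime treats the block "directory" as a key prefix and returns the
// newest LastModified among its objects, i.e. the effective upload time.
func blockUploadTime(ctx context.Context, bkt objectStore, blockDir string) (time.Time, error) {
	var newest time.Time
	err := bkt.Iter(ctx, blockDir, func(name string) error {
		t, err := bkt.LastModified(ctx, name)
		if err != nil {
			return err
		}
		if t.After(newest) {
			newest = t
		}
		return nil
	})
	return newest, err
}
```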

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-30 14:05:11 +03:00
NickGoog 1b21d7151c
Update `thanos query` flag `--store` to `--endpoint`. (#8400)
`--store` appears out of date.

Signed-off-by: NickGoog <hartunian@google.com>
2025-07-30 09:02:43 +01:00
Giedrius Statkevičius 88d0ae8071
Merge pull request #8398 from thanos-io/cleanup_surface
block: output cleanup err
2025-07-29 15:03:36 +03:00
Giedrius Statkevičius 8459bd21d7 block: output cleanup err
Now, if cleanup fails then we don't know why it failed. Surface the
error.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-29 11:51:36 +03:00
Kemal Akkoyun 763ba4d413
Remove @kakkoyun from maintainers
- Fix several formatting issues
- Convert old maintainers' section to a table

Signed-off-by: Kemal Akkoyun <kemal.akkoyun@datadoghq.com>
2025-07-28 20:36:24 +02:00
Giedrius Statkevičius 49a560d09d
Merge pull request #7758 from thibaultmg/life_of_a_sample_part_2
Blog article submission: Life of a Sample in Thanos Part II
2025-07-26 15:03:50 +03:00
Harry John c3d4ea7cdd
*: Update promql-engine and prometheus (#8388)
* *: Update promql-engine and prometheus

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>

* Fix data race

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>

---------

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2025-07-25 10:39:09 -07:00
Thibault Mange 98130c25d6
fix inaccuracies
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2025-07-25 14:32:51 +02:00
Thibault Mange bf8777dcc5
Update docs/blog/2023-11-20-life-of-a-sample-part-2.md
Co-authored-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2025-07-25 10:44:25 +02:00
James Geisler be2f408d9e
[tools] add flag for uploading compacted blocks to bucket upload-blocks (#8359)
* add flag for uploading compacted blocks to thanos tools

Signed-off-by: James Geisler <geislerjamesd@gmail.com>

* update changelog

Signed-off-by: James Geisler <geislerjamesd@gmail.com>

* fix doc check

Signed-off-by: James Geisler <geislerjamesd@gmail.com>

---------

Signed-off-by: James Geisler <geislerjamesd@gmail.com>
2025-07-23 17:58:01 -07:00
Pedro Tanaka f4ee5cb617
query: perform initial DNS resolution for gRPC endpoint groups
Extract resolution logic into updateResolver() and call it synchronously
in Build() to ensure endpoint groups are resolved on startup, not just
during periodic updates. This prevents potential connection delays when
the query component starts.

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2025-07-23 16:21:32 +02:00
Pedro Tanaka d7830d319f
Query: Resolve DNS before endpoint discovery on startup
Add initial DNS resolution phase before starting periodic endpoint
updates to fix race condition where Query could become ready with
zero discovered endpoints.

Previously, the first endpoint update could run before DNS resolution
completed (both use runutil.Repeat which runs immediately), causing
Query to be ready but unable to serve requests for up to 5 seconds.

Now DNS resolution happens synchronously on startup, ensuring addresses
are available when the first endpoint update runs. This eliminates the
window where Query reports ready but has no endpoints discovered.

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2025-07-23 15:07:02 +02:00
Pedro Tanaka 73db5294cf
Make Thanos Query wait for initial endpoint discovery before becoming ready
Problem:
We observed a race condition where Thanos Query components were marking themselves as ready before discovering any endpoints. This created a timing gap that could lead to query failures:

- Query pods become ready immediately upon startup
- Endpoint discovery happens asynchronously in the background
- Queries arriving between readiness and endpoint discovery fail

Solution:
This commit modifies the Thanos Query readiness behavior to wait for the initial endpoint discovery to complete before marking the pod as ready. This ensures that when a Query pod reports ready, it has already attempted to discover and connect to available endpoints.

Changes:
1. Added synchronization to EndpointSet:
   - Added firstUpdateOnce flag and firstUpdateChan channel to track first update completion
   - Added WaitForFirstUpdate() method to block until initial discovery completes

2. Modified Query startup sequence:
   - gRPC server now waits for WaitForFirstUpdate() before calling statusProber.Ready()
   - Leverages existing runutil.Repeat behavior which runs the update function immediately

3. Timeout protection:
   - Uses store response timeout or 30 seconds as default timeout
   - Logs warning if timeout occurs but still proceeds to ready state

4. Added comprehensive tests for the new WaitForFirstUpdate functionality

Impact:
- Positive: Eliminates the race condition where queries could be routed to Query pods that haven't discovered any endpoints yet
- Negative: Slightly increases startup time as pods won't be ready until endpoint discovery completes (typically <1s in normal conditions)
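
A simplified sketch of the synchronization described above (the firstUpdateOnce/firstUpdateChan fields and WaitForFirstUpdate method follow the commit; everything else is illustrative):

```go
// Simplified; only the synchronization added by this change is shown.
package query

import (
	"context"
	"sync"
)

type EndpointSet struct {
	firstUpdateOnce sync.Once
	firstUpdateChan chan struct{}
	// ... endpoints, resolvers, etc.
}

func NewEndpointSet() *EndpointSet {
	return &EndpointSet{firstUpdateChan: make(chan struct{})}
}

// Update runs endpoint discovery; the first completed run closes the channel
// exactly once so waiters are released.
func (e *EndpointSet) Update(ctx context.Context) {
	// ... discover and connect to endpoints ...
	e.firstUpdateOnce.Do(func() { close(e.firstUpdateChan) })
}

// WaitForFirstUpdate blocks until the initial discovery finished or the
// context (store response timeout / 30s default) expires; on timeout the
// caller logs a warning and proceeds to ready anyway.
func (e *EndpointSet) WaitForFirstUpdate(ctx context.Context) error {
	select {
	case <-e.firstUpdateChan:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```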

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2025-07-23 15:07:00 +02:00
Giedrius Statkevičius e30e831b1c
Merge pull request #8389 from thanos-io/bust_cache
block: bust cache if modified timestamp differs
2025-07-23 14:51:08 +03:00
Giedrius Statkevičius cdecd4ee3f block: use sync.Map for fetcher
f.cached can now be modified concurrently so use a sync.Map.
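
A minimal illustration of the change (the surrounding type is hypothetical):

```go
// Hypothetical surrounding type; only the field change matters.
package block

import "sync"

type fetcher struct {
	cached sync.Map // block ULID -> cached metadata, safe for concurrent use
}

func (f *fetcher) get(id string) (any, bool) { return f.cached.Load(id) }
func (f *fetcher) put(id string, meta any)   { f.cached.Store(id, meta) }
```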

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-23 14:06:24 +03:00
Giedrius Statkevičius 97196973f3 block: bust cache if modified timestamp differs
In the parquet converter, we mark the original meta.json file with a
flag when it gets converted so that Thanos Store wouldn't load it. For
that to work, we need to bust the local cache when that happens.

For tests, we need the updated objstore module so I am doing that as
well.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-23 13:32:41 +03:00
Giedrius Statkevičius de1a2236eb
Merge pull request #8372 from harry671003/update_grpc
*: Update GRPC
2025-07-22 16:29:23 +03:00
🌲 Harry 🌊 John 🏔 ba255aaccd Fix data race
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2025-07-21 15:18:32 -07:00
🌲 Harry 🌊 John 🏔 f1991970bf *: update GRPC
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2025-07-21 14:25:25 -07:00
Michael Hoffmann 9073c8d0c5
Merge pull request #8384 from thanos-io/r0392_merge_to_main
Merge release 0.39.2 to main
2025-07-21 09:23:29 +02:00
Michael Hoffmann ba5c91aefb Merge remote-tracking branch 'origin/main' into r0392_merge_to_main 2025-07-21 06:55:52 +00:00
Michael Hoffmann 36681afb5e
Merge pull request #8379 from thanos-io/rel_0392
Release 0.39.2
2025-07-21 08:19:58 +02:00
Michael Hoffmann 5dd0031fab Release 0.39.2
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-17 14:17:28 +00:00
Joel Verezhak c0273e1d1a fix: querier panic (#8374)
Thanos Query crashes with "concurrent map iteration and map write" panic
in distributed mode when multiple goroutines access the same `annotations.Annotations`
map concurrently.

```
panic: concurrent map iteration and map write
github.com/prometheus/prometheus/util/annotations.(*Annotations).Merge(...)
github.com/thanos-io/promql-engine/engine.(*compatibilityQuery).Exec(...)
```

Here I replaced direct access to `res.Warnings.AsErrors()` with a thread-safe copy:
```go
// Before (unsafe)
warnings = append(warnings, res.Warnings.AsErrors()...)

// After (thread-safe)
safeWarnings := annotations.New().Merge(res.Warnings)
warnings = append(warnings, safeWarnings.AsErrors()...)
```

Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
Co-authored-by: Joel Verezhak <jverezhak@open-systems.com>
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-17 14:17:28 +00:00
Michael Hoffmann e78458176e query: add custom values to prompb methods (#8375)
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-17 14:17:28 +00:00
Michael Hoffmann 20900389bb
query: add custom values to prompb methods (#8375)
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-17 13:28:16 +00:00
Joel Verezhak 6f4895633a
fix: querier panic (#8374)
Thanos Query crashes with "concurrent map iteration and map write" panic
in distributed mode when multiple goroutines access the same `annotations.Annotations`
map concurrently.

```
panic: concurrent map iteration and map write
github.com/prometheus/prometheus/util/annotations.(*Annotations).Merge(...)
github.com/thanos-io/promql-engine/engine.(*compatibilityQuery).Exec(...)
```

Here I replaced direct access to `res.Warnings.AsErrors()` with a thread-safe copy:
```go
// Before (unsafe)
warnings = append(warnings, res.Warnings.AsErrors()...)

// After (thread-safe)
safeWarnings := annotations.New().Merge(res.Warnings)
warnings = append(warnings, safeWarnings.AsErrors()...)
```

Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
Co-authored-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-17 13:28:03 +00:00
Giedrius Statkevičius 0dc0b29fc8
Merge pull request #8366 from verejoel/feature/parquet-migration-flag
feat: ignore parquet migrated blocks in store gateway
2025-07-16 22:51:23 +03:00
Giedrius Statkevičius b4951291c7 *: always enable, clean up tests+code
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-16 18:30:16 +03:00
Giedrius Statkevičius 77f12e3e97
Merge pull request #8370 from open-ch/fix/querier-relabel-config
fix: query announced endpoints match relabel-config
2025-07-15 14:03:48 +03:00
Joel Verezhak dddffa99c4
fix acceptance test
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-14 01:01:43 +02:00
Joel Verezhak f2ff735e76
return only one store
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-13 21:52:56 +02:00
Joel Verezhak dee991e0d9
acceptance test
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-13 21:42:13 +02:00
Joel Verezhak 0972c43f29
acceptance test
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-13 21:17:30 +02:00
Joel Verezhak bd88416a19
rename method
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-13 20:35:41 +02:00
Joel Verezhak 0bb3e73e9d
refactor
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-13 20:24:08 +02:00
Joel Verezhak 9f2acf9df9
lint
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-12 01:38:19 +02:00
Joel Verezhak 8b3c29acc7
fix: querier external labels match relabel config
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-12 01:16:20 +02:00
Joel Verezhak ecd54dafd0
feat: ignore parquet migrated blocks in store gateway
Signed-off-by: Joel Verezhak <j.verezhak@gmail.com>
2025-07-08 17:46:19 +02:00
Giedrius Statkevičius b51ef67654
Merge pull request #8364 from thanos-io/use_prom_consts
*: use prometheus consts
2025-07-08 15:33:45 +03:00
Giedrius Statkevičius c8e9c2b12c *: use prometheus consts
Use Prometheus consts instead of using our own.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-08 14:50:55 +03:00
Michael Hoffmann 0f81bb792a
query: make grpc service config for endpoint groups configurable (#8287)
We add a "service_config" field to endpoint config file that we can use
to override the default service_config for endpoint groups. This enables
us to configure retry policy or loadbalncing on an endpoint level.
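
For reference, a hedged sketch of what such a per-group service config can express; the JSON follows the standard gRPC service-config format, while the wiring from the Thanos "service_config" field into dial options is simplified to a helper here:

```go
// The JSON is a standard gRPC service config; how Thanos plumbs the
// "service_config" field through is simplified to a dial helper.
package query

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

const exampleServiceConfig = `{
  "loadBalancingConfig": [{"round_robin": {}}],
  "methodConfig": [{
    "name": [{}],
    "retryPolicy": {
      "maxAttempts": 3,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  }]
}`

func dialEndpointGroup(target string) (*grpc.ClientConn, error) {
	return grpc.Dial(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(exampleServiceConfig),
	)
}
```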

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-08 08:54:32 +01:00
Giedrius Statkevičius ddd5ff85f4
Merge pull request #8352 from thanos-io/r0391_merge_to_main
Merge release-0.39 to main
2025-07-01 17:13:53 +03:00
Giedrius Statkevičius 49cccb4d83 CHANGELOG: fix formatting
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-01 16:34:53 +03:00
Giedrius Statkevičius d6a926e613 Merge branch 'main' into r0391_merge_to_main
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-01 16:28:16 +03:00
Giedrius Statkevičius ad743914dd
Merge pull request #8351 from thanos-io/rel_0391
Release 0.39.1
2025-07-01 13:15:10 +03:00
Giedrius Statkevičius 35309514d1
Merge pull request #8347 from Saumya40-codes/update-docs-links
docs: update changed repositories links in docs/ to correct location
2025-07-01 13:02:03 +03:00
Giedrius Statkevičius e9bdd79df2 CHANGELOG: release 0.39.1
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-01 12:59:58 +03:00
Giedrius Statkevičius 5583757964 qfe: defer properly
Refactor this check into a separate function so that the defer runs at
the end of it and cleans up resources properly.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-01 12:58:11 +03:00
Giedrius Statkevičius 4240ff3579 cmd/query_frontend: use original roundtripper + close immediately
Let's avoid using all the Cortex roundtripper machinery by using the
downstream roundtripper directly and then close the body immediately so
as not to allocate any memory for the body of the response.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-01 12:58:05 +03:00
Giedrius Statkevičius 7c5ba37e5e
Merge pull request #8349 from thanos-io/defer_qfe
qfe: defer properly
2025-07-01 11:28:58 +03:00
Giedrius Statkevičius 938c083d6b qfe: defer properly
Refactor this check into a separate function so that the defer runs at
the end of it and cleans up resources properly.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-01 10:34:37 +03:00
Saumya Shah 9847758315 update changed repositories urls in docs/
Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>
2025-07-01 08:44:37 +05:30
Giedrius Statkevičius 246502a29b
Merge pull request #8338 from thanos-io/tweak_qfe
cmd/query_frontend: use original roundtripper + close immediately
2025-06-30 14:16:06 +03:00
Giedrius Statkevičius d87029eea4 cmd/query_frontend: use original roundtripper + close immediately
Let's avoid using all the Cortex roundtripper machinery by using the
downstream roundtripper directly and then close the body immediately so
as not to allocate any memory for the body of the response.
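
A simplified sketch of the idea (not the actual query-frontend code):

```go
// Not the actual query-frontend code; just the shape of the check.
package queryfrontend

import (
	"fmt"
	"net/http"
)

// probeDownstream sends the request through the plain RoundTripper and closes
// the body right away, so no response body is buffered just for a readiness
// check.
func probeDownstream(rt http.RoundTripper, req *http.Request) error {
	resp, err := rt.RoundTrip(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close() // the body content is irrelevant for the check
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("downstream not ready: %s", resp.Status)
	}
	return nil
}
```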

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-26 12:40:15 +03:00
Giedrius Statkevičius 3727363b49
Merge pull request #8335 from pedro-stanaka/fix/flaky-unit-test-store-proxy
fix: make TestProxyStore_SeriesSlowStores less flaky by removing timing assertions
2025-06-26 12:20:19 +03:00
Giedrius Statkevičius 37254e5779
Merge pull request #8336 from thanos-io/lazyindexheader_fix
indexheader: fix race between lazy index header creation
2025-06-26 11:19:12 +03:00
Giedrius Statkevičius 4b31bbaa6b indexheader: create lazy header in singleflight
Creation of the index header shares the underlying storage so we should
use singleflight here to only create it once.
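
A sketch of the pattern with an illustrative wrapper type (not the real indexheader code):

```go
// Illustrative wrapper; the real change lives in the indexheader package.
package indexheader

import (
	"sync"

	"golang.org/x/sync/singleflight"
)

type lazyHeaders struct {
	g       singleflight.Group
	headers sync.Map // block ULID -> created header
}

// get builds the header for a block at most once even under concurrent calls:
// all callers for the same blockID share a single create() via singleflight.
func (l *lazyHeaders) get(blockID string, create func() (any, error)) (any, error) {
	if h, ok := l.headers.Load(blockID); ok {
		return h, nil
	}
	h, err, _ := l.g.Do(blockID, func() (any, error) {
		hdr, err := create()
		if err != nil {
			return nil, err
		}
		l.headers.Store(blockID, hdr)
		return hdr, nil
	})
	return h, err
}
```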

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-26 10:18:07 +03:00
Giedrius Statkevičius d6ee898a06 indexheader: produce race in test
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-26 10:01:21 +03:00
Giedrius Statkevičius 5a95d13802
Merge pull request #8333 from thanos-io/repro_8224
e2e: add repro for 8224
2025-06-26 08:01:35 +03:00
Pedro Tanaka b54d293dbd
fix: make TestProxyStore_SeriesSlowStores less flaky by removing timing assertions
The TestProxyStore_SeriesSlowStores test was failing intermittently in CI due to
strict timing assertions that were sensitive to system load and scheduling variations.

The test now focuses on functional correctness rather than precise timing,
making it more reliable in CI environments while still validating the
proxy store's timeout and partial response behavior.

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
2025-06-25 23:09:47 +02:00
Giedrius Statkevičius dfcbfe7c40 e2e: add repro for 8224
Add repro for https://github.com/thanos-io/thanos/issues/8224. Fix in
follow up PRs.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-25 18:07:48 +03:00
Giedrius Statkevičius 8b738c55b1
Merge pull request #8331 from thanos-io/merge-release-0.39-to-main-v2
Merge release 0.39 to main
2025-06-25 15:25:36 +03:00
Giedrius Statkevičius 69624ecbf1 Merge branch 'main' into merge-release-0.39-to-main-v2
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-25 14:59:35 +03:00
Giedrius Statkevičius 0453c9b144
*: release 0.39.0 (#8330)
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-25 14:05:34 +03:00
Saswata Mukherjee 9c955d21df
e2e: Check rule group label works (#8322)
* e2e: Check rule group label works

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fix fanout test

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-06-23 10:27:07 +01:00
Paul 7de9c13e5f
add rule tsdb.enable-native-histograms flag (#8321)
Signed-off-by: Paul Hsieh <supaulkawaii@gmail.com>
2025-06-23 10:06:00 +01:00
Giedrius Statkevičius a6c05e6df6
*: add CHANGELOG, update VERSION (#8320)
Prepare for 0.39.0-rc.0.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-20 07:12:19 +03:00
Giedrius Statkevičius 34a98c8efb
CHANGELOG: indicate release (#8319)
Indicate that 0.39.0 is in progress.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-19 17:59:12 +03:00
Giedrius Statkevičius 933f04f55e
query_frontend: only ready if downstream is ready (#8315)
We had an incident in prod where QFE was reporting that it was ready even
though the downstream didn't work due to a misconfigured load-balancer.
In this PR I am proposing sending periodic requests to the downstream
to check whether it is working.

TestQueryFrontendTenantForward never worked so I deleted it.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-18 11:56:48 +03:00
dependabot[bot] f1c0f4b9b8
build(deps): bump github.com/KimMachineGun/automemlimit (#8312)
Bumps [github.com/KimMachineGun/automemlimit](https://github.com/KimMachineGun/automemlimit) from 0.7.2 to 0.7.3.
- [Release notes](https://github.com/KimMachineGun/automemlimit/releases)
- [Commits](https://github.com/KimMachineGun/automemlimit/compare/v0.7.2...v0.7.3)

---
updated-dependencies:
- dependency-name: github.com/KimMachineGun/automemlimit
  dependency-version: 0.7.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-18 12:35:30 +05:30
Hongcheng Zhu a6370c7cc6
Add Prometheus counters for pending write requests and series requests in Receive (#8308)
Signed-off-by: HC Zhu <hczhu.mtv@gmail.com>
Co-authored-by: HC Zhu (DB) <hc.zhu@databricks.com>
2025-06-17 10:46:12 +05:30
Hongcheng Zhu 8f715b0b6b
Query: limit LazyRetrieval memory buffer size (#8296)
* Limit lazyRespSet memory buffer size using a ring buffer

Signed-off-by: HC Zhu <hczhu.mtv@gmail.com>

* store: make heap a bit more consistent

Add len comparison to make it more consistent.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Fix linter complains

Signed-off-by: HC Zhu <hczhu.mtv@gmail.com>

---------

Signed-off-by: HC Zhu <hczhu.mtv@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: HC Zhu (DB) <hc.zhu@databricks.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: HC Zhu <hczhu.mtv@gmail.com>
2025-06-14 10:52:46 -07:00
Filip Petkovski 6c27396458
Merge pull request #8306 from GregSharpe1/main
[docs] Updating documentation around --compact flags
2025-06-13 08:53:04 +02:00
Greg Sharpe d1afea6a69 Updating the documentation to reflect the correct flags when using --compact.enable-vertical-compaction.
Signed-off-by: Greg Sharpe <git+me@gregsharpe.co.uk>
2025-06-13 08:28:42 +02:00
gabyf 03d5b6bc28
tools: fix tool bucket inspect output arg description (#8252)
* docs: fix tool bucket output arg description

Signed-off-by: gabyf <zweeking.tech@gmail.com>

* fix(tools_bucket): output description from cvs to csv

Signed-off-by: gabyf <zweeking.tech@gmail.com>

---------

Signed-off-by: gabyf <zweeking.tech@gmail.com>
2025-06-12 16:35:42 -07:00
Giedrius Statkevičius 8769b97c86
go.mod: update promql engine + Prom dep (#8305)
Update dependencies. Almost everything works except for
https://github.com/prometheus/prometheus/pull/16252.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-12 10:50:03 +03:00
Aaron Walker 26f6e64365
Revert capnp to v3.0.0-alpha (#8300)
cef0b02 caused a regression of !7944. This reverts the version upgrade to the previously working version

Signed-off-by: Aaron Walker <aaron@vcra.io>
2025-06-10 09:41:59 +05:30
dependabot[bot] 60533e4a22
build(deps): bump golang.org/x/time from 0.11.0 to 0.12.0 (#8302)
Bumps [golang.org/x/time](https://github.com/golang/time) from 0.11.0 to 0.12.0.
- [Commits](https://github.com/golang/time/compare/v0.11.0...v0.12.0)

---
updated-dependencies:
- dependency-name: golang.org/x/time
  dependency-version: 0.12.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-10 09:31:02 +05:30
dependabot[bot] 95a2b00f17
build(deps): bump github.com/alicebob/miniredis/v2 from 2.22.0 to 2.35.0 (#8303)
Bumps [github.com/alicebob/miniredis/v2](https://github.com/alicebob/miniredis) from 2.22.0 to 2.35.0.
- [Release notes](https://github.com/alicebob/miniredis/releases)
- [Changelog](https://github.com/alicebob/miniredis/blob/master/CHANGELOG.md)
- [Commits](https://github.com/alicebob/miniredis/compare/v2.22.0...v2.35.0)

---
updated-dependencies:
- dependency-name: github.com/alicebob/miniredis/v2
  dependency-version: 2.35.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-10 09:30:45 +05:30
dependabot[bot] 2ed24bdf5b
build(deps): bump github/codeql-action from 3.26.13 to 3.28.19 (#8304)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.13 to 3.28.19.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](f779452ac5...fca7ace96b)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.19
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-10 09:30:25 +05:30
Naman-Parlecha 23d60b8615
Fix: DataRace in TestEndpointSetUpdate_StrictEndpointMetadata test (#8288)
* fix: Fixing Unit Test TestEndpointSetUpdate_StrictEndpointMetadata

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* revert: CHANGELOG.md

Signed-off-by: Naman-Parlecha <namanparlecha@gmail.com>

---------

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Signed-off-by: Naman-Parlecha <namanparlecha@gmail.com>
2025-06-06 15:51:53 +03:00
Naman-Parlecha 290f16c0e9
Resolve GitHub Actions Failure (#8299)
* update: changing to new prometheus page

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* fix: disable-admin-op flag

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

---------

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-06-05 13:52:12 +03:00
Aaron Walker 4ad45948cd
Receive: Remove migration of legacy storage to multi-tsdb (#8289)
This has been in since 0.13 (~5 years ago). This fixes issues caused when the default-tenant does not have any data and gets churned, which made the migration assume that per-tenant directories are actually blocks, leaving those blocks unqueryable.

Signed-off-by: Aaron Walker <aaron@vcra.io>
2025-06-03 16:57:57 +03:00
Daniel Blando 15b1ef2ead
shipper: allow shipper sync to skip corrupted blocks (#8259)
* Allow shipper sync to skip corrupted blocks

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Move check to blockMetasFromOldest

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Split metrics. Return error

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* fix test

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Reorder shipper contructor variables

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Use opts in shipper constructor

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Fix typo

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

---------

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>
2025-06-02 23:30:16 -07:00
Naman-Parlecha 2029c9bee0
store: Add --disable-admin-operations Flag to Store Gateway (#8284)
* fix(sidebar): maintain expanded state based on current page

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* fixing changelog

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* store: --disable-admin-operation flag

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

docs: Adding Flag details

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

updated changelog

refactor: changelog

Signed-off-by: Naman-Parlecha <namanparlecha@gmail.com>

---------

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Signed-off-by: Naman-Parlecha <namanparlecha@gmail.com>
2025-06-01 15:26:58 -07:00
Saumya Shah 4e04420489
query: handle query.Analyze returning nil gracefully (#8199)
* fix: handle analyze returning nil gracefully

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* update CHANGELOG.md

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* fix format

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

---------

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>
2025-05-30 12:15:42 +03:00
Naman-Parlecha 36df30bbe8
fix: maintain expanded state based on current page (#8266)
* fix(sidebar): maintain expanded state based on current page

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* fixing changelog

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

---------

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Signed-off-by: Naman-Parlecha <namanparlecha@gmail.com>
2025-05-30 12:07:23 +03:00
Saumya Shah 390fd0a023
query, query-frontend, ruler: Add support for flags to use promQL experimental functions & bump promql-engine (#8245)
* feat: add support for experimental functions, if enabled

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* fix tests

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* allow setting enable-feature flag in ruler

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* add flag info in docs

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* add CHANGELOG

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* add hidden flag to throw err on query fallback, red in tests ^_^

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* bump promql-engine to latest version/commit

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* format docs

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

---------

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>
2025-05-30 10:04:28 +03:00
Anna Tran 12649d8be7
Force sync writes to meta.json in case of host crash (#8282)
* Force sync writes to meta.json in case of host crash

Signed-off-by: Anna Tran <trananna@amazon.com>

* Update CHANGELOG for fsync meta.json

Signed-off-by: Anna Tran <trananna@amazon.com>

---------

Signed-off-by: Anna Tran <trananna@amazon.com>
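
An illustrative version of the durability pattern this change describes (paths and helper name are hypothetical):

```go
// Hypothetical helper; the point is the f.Sync() and the directory sync.
package metadata

import (
	"os"
	"path/filepath"
)

// writeMetaDurably writes meta.json via a temp file, fsyncs the file, renames
// it into place, and fsyncs the directory so the rename survives a host crash.
func writeMetaDurably(dir string, data []byte) error {
	tmp := filepath.Join(dir, "meta.json.tmp")
	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil { // force file contents to disk
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	if err := os.Rename(tmp, filepath.Join(dir, "meta.json")); err != nil {
		return err
	}
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync() // persist the rename itself
}
```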
2025-05-29 12:23:49 +03:00
Giedrius Statkevičius cef0b0200e
go.mod: mass update modules (#8277)
Maintenance task: let's update all modules.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-05-27 18:32:28 +03:00
Saumya Shah efc6eee8c6
query: fix query analyze to return appropriate results (#8262)
* call query analysis once querys being exec

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* refract the analyze logic

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* send not analyzable warnings instead of returning err

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* add seperate warnings in query non analyzable state based on engine

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

---------

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>
2025-05-27 16:13:30 +03:00
Siavash Safi da421eaffe
Shipper: fix missing meta file errors (#8268)
- fix meta file read error check
- use proper logs for missing meta file vs. other read errors
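
A minimal sketch of the distinction, with hypothetical names:

```go
// Hypothetical names; only the error-handling split is the point.
package shipper

import (
	"errors"
	"os"
	"path/filepath"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

// readMeta logs a missing meta file quietly (expected for fresh blocks) but
// reports any other read error loudly.
func readMeta(logger log.Logger, blockDir string) ([]byte, error) {
	b, err := os.ReadFile(filepath.Join(blockDir, "meta.json"))
	if errors.Is(err, os.ErrNotExist) {
		level.Debug(logger).Log("msg", "meta file not found, skipping block", "dir", blockDir)
		return nil, err
	}
	if err != nil {
		level.Error(logger).Log("msg", "reading meta file failed", "dir", blockDir, "err", err)
		return nil, err
	}
	return b, nil
}
```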

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2025-05-23 11:46:09 +00:00
Giedrius Statkevičius d71a58cbd4
docs: fix receive page (#8267)
Fix the docs after the most recent merge.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-05-23 10:47:01 +00:00
Giedrius Statkevičius f847ff0262
receive: implement shuffle sharding (#8238)
See the documentation for details.

Closes #3821.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-05-22 11:08:23 +03:00
dronenb ec9601aa0e
feat(promu): add darwin/arm64 (#8263)
* feat(promu): add darwin/arm64

Signed-off-by: Ben Dronen <dronenb@users.noreply.github.com>

* fix(promu): just use darwin

Signed-off-by: Ben Dronen <dronenb@users.noreply.github.com>

---------

Signed-off-by: Ben Dronen <dronenb@users.noreply.github.com>
2025-05-22 10:04:57 +02:00
Michael Hoffmann 759773c4dc
shipper: delete unused functions (#8260)
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-05-21 08:18:52 +00:00
Giedrius Statkevičius 88092449cd
docs: volunteer as shepherd (#8249)
* docs: volunteer as shepherd

Release the next version in a few weeks.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Fix formatting

Signed-off-by: Matej Gera <38492574+matej-g@users.noreply.github.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
Co-authored-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
2025-05-15 14:56:00 +03:00
Ayoub Mrini 34b3d64034
test(tools_test.go/Test_CheckRules_Glob): take into consideration RO current dirs while (#8014)
changing file permissions.

The process may not have the needed permissions on the file (not the
owner, not root, and lacking the CAP_FOWNER capability) to chmod it.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-05-14 13:20:14 +01:00
dongjiang 242b5f6307
add otlp clientType (#8243)
Signed-off-by: dongjiang <dongjiang1989@126.com>
2025-05-13 14:18:41 +03:00
Giedrius Statkevičius aa3e4199db
e2e: disable some more flaky tests (#8241)
These are flaky hence disable them.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-05-09 16:46:23 +03:00
Giedrius Statkevičius 81b4260f5f
reloader: disable some flaky tests (#8240)
Disabling some flaky tests.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-05-08 15:59:24 +03:00
Saumya Shah 2dfc749a85
UI: bump codemirror-promql dependency to latest version (#8230)
* bump codemirror-promql react dep to latest version

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* fix lint errors, build react-app

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* sync ui change of input expression

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* revert build files

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

* build and update few warnings

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>

---------

Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>
2025-05-07 11:20:49 +01:00
Philip Gough 2a5a856e34
tools: Extend bucket ls options (#8225)
* tools: Extend bucket ls command with min and max time, selector config and timeout options

Signed-off-by: Philip Gough <philip.p.gough@gmail.com>

* make: docs

Signed-off-by: Philip Gough <philip.p.gough@gmail.com>

Update cmd/thanos/tools_bucket.go

Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Philip Gough <pgough@redhat.com>

Update cmd/thanos/tools_bucket.go

Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Philip Gough <pgough@redhat.com>

---------

Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
Signed-off-by: Philip Gough <pgough@redhat.com>
Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-04-25 10:33:34 +00:00
Giedrius Statkevičius cff147dbc0
receive: remove Get() method from hashring (#8226)
Get() is equivalent to GetN(1) so remove it. It's not used.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-04-25 09:59:37 +00:00
Saswata Mukherjee 7d7ea650b7
Receive: Ensure forward/replication metrics are incremented in err cases (#8212)
* Ensure forward/replication metrics are incremented in err cases

This commit ensures forward and replication metrics are incremented with
err labels.

This seemed to be missing, came across this whilst working on a
dashboard.

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Add changelog

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-04-22 11:35:34 +00:00
Andrew Reilly 92db7aabb1
Update query.md documentation where example uses --query.tenant-default-id flag instead of --query.default-tenant-id (#8210)
Signed-off-by: Andrew Reilly <adr@maas.ca>
2025-04-22 11:27:49 +03:00
Filip Petkovski 66f54ac88d
Merge pull request #8216 from yuchen-db/fix-iter-race
Fix Pull iterator race between next() and stop()
2025-04-22 08:19:10 +02:00
Yuchen Wang a8220d7317 simplify unit test
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 17:54:47 -07:00
Yuchen Wang 909c08fa98 add comments
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 17:46:38 -07:00
Yuchen Wang 6663bb01ac update changelog
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 17:16:33 -07:00
Yuchen Wang d7876b4303 fix unit test
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 16:55:57 -07:00
Yuchen Wang 6f556d2bbb add unit test
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 15:46:54 -07:00
Yuchen Wang 0dcc9e9ccd add changelog
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 15:46:54 -07:00
Yuchen Wang f168dc0cbb fix Pull iter race between next() and stop()
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-20 15:46:54 -07:00
dependabot[bot] 8273ad013c
build(deps): bump github.com/golang-jwt/jwt/v5 from 5.2.1 to 5.2.2 (#8164)
Bumps [github.com/golang-jwt/jwt/v5](https://github.com/golang-jwt/jwt) from 5.2.1 to 5.2.2.
- [Release notes](https://github.com/golang-jwt/jwt/releases)
- [Changelog](https://github.com/golang-jwt/jwt/blob/main/VERSION_HISTORY.md)
- [Commits](https://github.com/golang-jwt/jwt/compare/v5.2.1...v5.2.2)

---
updated-dependencies:
- dependency-name: github.com/golang-jwt/jwt/v5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-17 15:08:12 +01:00
Michael Hoffmann 31c6115317
Query: fix partial response for distributed instant query (#8211)
This commit fixes a typo in partial response handling for distributed
instant queries.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-04-17 08:06:42 +00:00
Aaron Walker c0b5500cb5
Unhide tsdb.enable-native-histograms flag in receive (#8202)
Signed-off-by: Aaron Walker <aaron@vcra.io>
2025-04-11 13:33:24 +02:00
Michael Hoffmann ce2b51f93e
Sidecar: increase default prometheus timeout (#8192)
Adjust the default get-config timeout to match the default get-config
interval.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-04-07 15:29:22 +00:00
Naohiro Okada 1a559f9de8
fix changelog markdown. (#8190)
Signed-off-by: naohiroo <naohiro.dev@gmail.com>
2025-04-04 14:23:16 +02:00
Michael Hoffmann b2f5ee44a7
merge release 0.38.0 to main (#8186)
* Changelog: cut release 0.38-rc.0 (#8174)

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

* Changelog: cut release 0.38.0-rc.1 (#8180)

* Query: fix endpointset setup

This commit fixes an issue where we add non-strict, non-group endpoints
to the endpointset twice, once with resolved addresses from the dns
provider and once with its dns prefix.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

* deps: bump promql-engine (#8181)

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

* Changelog: cut release 0.38-rc.1

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

---------

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

* Changelog: cut release 0.38 (#8185)

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>

---------

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-04-04 06:10:48 +00:00
Michael Hoffmann 08e5907cba
deps: bump promql-engine (#8181)
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-03-31 13:22:35 +00:00
Michael Hoffmann 2fccdfbf5a
Query: fix endpointset setup (#8175)
This commit fixes an issue where we add non-strict, non-group endpoints
to the endpointset twice, once with resolved addresses from the dns
provider and once with its dns prefix.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-03-27 07:02:21 +00:00
Michael Hoffmann da855a12dc
Changelog: mark 0.38 as in-progress (#8173)
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-03-25 12:57:29 +00:00
Michael Hoffmann 68844d46d7
e2e: use prom 3 (#8165)
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-03-24 18:10:54 +00:00
Saumya Shah d1345b999e
update: interactive tests to update non-supported store flags to endpoint in querier (#8157)
Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>
2025-03-17 11:07:46 +00:00
Ben Kochie 8da0a2b69b
rule: Add support for query offset (#8158)
Support Prometheus rule manager upstream "query offset" feature.
* Add support for a default rule query offset via command flag.
* Add per rule group query_offset support.

Fixes: https://github.com/thanos-io/thanos/issues/7596

Signed-off-by: SuperQ <superq@gmail.com>
2025-03-14 08:24:32 +00:00
Michał Mazur 1f5bff2a01
query: Support chain deduplication algorithm (#7808)
Signed-off-by: Michał Mazur <mmazur.box@gmail.com>
2025-03-13 08:24:18 -07:00
dependabot[bot] e3acaeb8d6
build(deps): bump go.opentelemetry.io/otel/sdk from 1.34.0 to 1.35.0 (#8148)
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.34.0 to 1.35.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.34.0...v1.35.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-11 10:12:47 +02:00
Michael Hoffmann 097b2a4783
Query: bump promql-engine, fix fallout for distributed mode (#8135)
This PR bumps the thanos promql-engine repository, fixes the fallout and
makes distributed mode respect the user requested partial response
setting.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-03-10 08:39:45 +00:00
Saswata Mukherjee 0414eef64d
Bump common+client_golang to deal with utf-8 (#8134)
* Bump common+client_golang to deal with utf-8

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fix+Add tests

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Bump to 1.21

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-03-04 15:31:05 +00:00
Giedrius Statkevičius 03c96d05a0
compact: implement native histogram downsampling (#8110)
* test/e2e: add native histogram downsampling test case

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* downsample: port other PR

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* compact: fix after review

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-03-03 21:07:05 +02:00
dependabot[bot] 812688e573
build(deps): bump peter-evans/create-pull-request from 7.0.6 to 7.0.7 (#8123)
Bumps [peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request) from 7.0.6 to 7.0.7.
- [Release notes](https://github.com/peter-evans/create-pull-request/releases)
- [Commits](67ccf781d6...dd2324fc52)

---
updated-dependencies:
- dependency-name: peter-evans/create-pull-request
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-03 11:20:06 +00:00
dependabot[bot] 7457649c0f
build(deps): bump actions/cache from 4.0.2 to 4.2.1 (#8122)
Bumps [actions/cache](https://github.com/actions/cache) from 4.0.2 to 4.2.1.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](0c45773b62...0c907a75c2)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-03 11:19:49 +00:00
Michael Hoffmann 81dfb50c1e
Query: bump promql-engine (#8118)
Bumping PromQL engine, fixing fallback fallout.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
Co-authored-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-03-03 11:18:08 +00:00
Ben Ye c69f11214d
Optimize wildcard matchers for .* and .+ (#8131)
* optimize wildcard matchers for .* and .+

Signed-off-by: yeya24 <benye@amazon.com>

* add changelog

Signed-off-by: yeya24 <benye@amazon.com>

---------

Signed-off-by: yeya24 <benye@amazon.com>
2025-03-02 19:05:17 -08:00
Ben Ye 4ba7d596a8
Infer max query downsample resolution from promql query (#7012)
* Adjust max_source_resolution automatically based on promql queries

Signed-off-by: Ben Ye <benye@amazon.com>

* fix data race

Signed-off-by: yeya24 <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: yeya24 <benye@amazon.com>
2025-02-25 15:28:43 -08:00
민선 (minnie) 4a83459892
query : add missing xincrease/xrate aggregation (#8120) 2025-02-25 09:14:48 -08:00
dependabot[bot] f230915c1c
build(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp (#8067)
Bumps [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp](https://github.com/open-telemetry/opentelemetry-go) from 1.29.0 to 1.34.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.29.0...v1.34.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-20 22:42:18 -08:00
Yi Jin 426b2f19f1
[issue-8106] fix tenant hashring glob with multiple match patterns (#8107)
Signed-off-by: Yi Jin <yi.jin@databricks.com>
2025-02-20 12:11:27 +02:00
Michael Hoffmann 151ae7490e
Query: dynamic endpointgroups are allowed (#8113)
This PR fixes a bug where dynamic endpoint groups are silently ignored.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
Co-authored-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-02-19 10:31:58 +00:00
Marta 71bbafb611
Remove quote from replica label. (#8075)
Signed-off-by: Marta <me@marta.nz>
2025-02-18 11:35:11 -08:00
Saumya Shah 6fa81f797c
*: bump Go to 1.24 (#8105)
* bump Go version to 1.24

Signed-off-by: Saumyacodes-40 <saumyabshah90@gmail.com>

* update .go-version

Signed-off-by: Saumyacodes-40 <saumyabshah90@gmail.com>

* fix failing test; formatted string func now requires a constant string literal

Signed-off-by: Saumyacodes-40 <saumyabshah90@gmail.com>

* test: point faillint to thanos-community/faillint

Signed-off-by: Saumyacodes-40 <saumyabshah90@gmail.com>

* fix commit hash

Signed-off-by: Saumyacodes-40 <saumyabshah90@gmail.com>

* use branch instead of hash

Signed-off-by: Saumyacodes-40 <saumyabshah90@gmail.com>

* test: update faillint.mod/sum

Signed-off-by: Saumyacodes-40 <saumyabshah90@gmail.com>

* bump golangci-lint

Signed-off-by: Saumyacodes-40 <saumyabshah90@gmail.com>

* update .bingo/faillint.mod based on new deps upgrade

Signed-off-by: Saumyacodes-40 <saumyabshah90@gmail.com>

* address required changes

Signed-off-by: Saumyacodes-40 <saumyabshah90@gmail.com>

---------

Signed-off-by: Saumyacodes-40 <saumyabshah90@gmail.com>
2025-02-18 12:41:03 +00:00
dependabot[bot] fe651280a5
build(deps): bump github.com/tjhop/slog-gokit from 0.1.2 to 0.1.3 (#8109)
Bumps [github.com/tjhop/slog-gokit](https://github.com/tjhop/slog-gokit) from 0.1.2 to 0.1.3.
- [Release notes](https://github.com/tjhop/slog-gokit/releases)
- [Changelog](https://github.com/tjhop/slog-gokit/blob/main/.goreleaser.yaml)
- [Commits](https://github.com/tjhop/slog-gokit/compare/v0.1.2...v0.1.3)

---
updated-dependencies:
- dependency-name: github.com/tjhop/slog-gokit
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-18 10:44:51 +02:00
SungJin1212 346d18bb0f
Update Prometheus version to v3.1.0 (#8090)
Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
2025-02-12 12:17:00 +02:00
Giedrius Statkevičius 38f4c3c6a2
store: lock around iterating over s.blocks (#8088)
Hold a lock around s.blocks while iterating over it. I have seen a
case where a block was somehow added to a blockSet twice, and it
being removed from s.blocks concurrently is the only way that could happen.
This is the only "bad" access I have been able to spot.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-02-07 10:54:59 +02:00
Piotr Śliwka a13fc75c04
Fix deadlock in metadata fetcher (#8092)
This is a fix for a bug we encountered in our production deployment at
$WORK, where Thanos Store spontaneously stops refreshing its metadata
cache every time our (Ceph-based) object storage starts rate-limiting
Thanos's requests too much. This causes the Store to permanently stop
discovering new blocks in the bucket and to keep trying to access old,
long-gone blocks (which have already been compacted and removed),
breaking subsequent queries to the Store and mandating a manual restart.
More details about the bug and the fix below.

Initially, the `GetActiveAndPartialBlockIDs` method spawns a fixed number
of 64 goroutines to issue concurrent requests to the remote storage. Each
time the storage causes the `f.bkt.Exists` call to return an error, one
of the goroutines exits, returning said error and effectively reducing
the concurrency of processing the remaining block IDs in `metaChan`.
While this is not a big problem in the case of one or two errors, it is
entirely possible that, in the case of prolonged storage problems, all 64
goroutines quit, resulting in `metaChan` filling up and blocking the
`f.bkt.Iter` iterator below. This causes the whole method to get stuck
indefinitely, even if the storage becomes fully operational again.

This commit fixes the issue by allowing the iterator to return as soon
as a single processing goroutine errors out, so that the method can
reliably finish, returning the error as intended. Additionally, the
processing goroutines are adjusted as well, to make them all quit early
without consuming remaining items in `metaChan`. While the latter is not
strictly necessary to fix this bug, it doesn't make sense to let any
remaining goroutines keep issuing requests to the storage if the method
is already bound to return a nil result along with the first encountered
error.
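
For illustration, a minimal sketch of the failure mode and of one way to avoid it (hypothetical names and simplified logic using errgroup, not the actual fetcher code):

```go
package main

import (
	"context"
	"errors"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// fetchIDs mimics the pattern above: a producer feeds block IDs to a fixed
// pool of workers. With errgroup, the first worker error cancels gctx, so the
// producer stops sending instead of blocking forever on a full channel.
func fetchIDs(ctx context.Context, ids []string, exists func(string) (bool, error)) error {
	ch := make(chan string, 64)
	eg, gctx := errgroup.WithContext(ctx)

	for i := 0; i < 64; i++ {
		eg.Go(func() error {
			for {
				select {
				case id, ok := <-ch:
					if !ok {
						return nil
					}
					if _, err := exists(id); err != nil {
						// Returning cancels gctx, letting the producer and
						// the other workers quit early.
						return err
					}
				case <-gctx.Done():
					return gctx.Err()
				}
			}
		})
	}

	var sendErr error
producer:
	for _, id := range ids {
		select {
		case ch <- id:
		case <-gctx.Done():
			sendErr = gctx.Err()
			break producer
		}
	}
	close(ch)

	if err := eg.Wait(); err != nil {
		return err // the first worker error wins
	}
	return sendErr
}

func main() {
	err := fetchIDs(context.Background(), []string{"a", "b", "c"},
		func(string) (bool, error) { return false, errors.New("storage rate limited") })
	fmt.Println(err)
}
```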

Signed-off-by: Piotr Śliwka <psliwka@opera.com>
2025-02-07 10:54:32 +02:00
Célian GARCIA c25a356214
fix: add POST into allowed CORS methods header (#8091)
Signed-off-by: Célian Garcia <celian.garcia@amadeus.com>
2025-02-06 16:49:44 +02:00
SungJin1212 57efc2aacd
Add a func to convert go-kit log to slog (#7969)
Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
2025-02-02 23:07:37 -08:00
Ben Ye 8cd83bfd2b
Extend posting-group-max-key-series-ratio for add all posting group (#8083)
Signed-off-by: yeya24 <benye@amazon.com>
2025-02-02 10:32:59 -08:00
Ben Ye 45013e176f
skip match label values for certain matchers (#8084)
Signed-off-by: yeya24 <benye@amazon.com>
2025-02-02 10:30:31 -08:00
Michael Hoffmann 2367777322
query, rule: make endpoint discovery dynamically reloadable (#7890)
* Removed previously deprecated and hidden flags to configure endpoints ( --rule, --target, ...)
* Added new flags --endpoint.sd-config, --endpoint-sd-config-reload-interval to configure a dynamic SD file
* Moved endpoint set construction into cmd/thanos/endpointset.go for a little cleanup
* Renamed "thanos_(querier/ruler)_duplicated_store_addresses_total" to
  "thanos_(querier/ruler)_duplicated_endpoint_addresses_total"

The new config makes it possible to also set "strict" and "group" flags on endpoints instead
of only their addresses, enabling file-based service discovery for endpoint groups too.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-01-15 16:47:59 +02:00
Nicolas Takashi 300a9ed653
[FEATURE] adding otlp endpoint (#7996)
* [FEATURE] adding otlp endpoint

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [FEATURE] adding otlp endpoint

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [FEATURE] adding otlp endpoint

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [FIX] e2e tests for otlp receiver

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [CHORE] adding otlp flags

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [DOC] updating docs

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [CHORE] copying otlptranslator

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [CHORE] copying otlptranslator tests

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [CHORE] copying otlptranslator tests

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [CHORE] copying otlptranslator tests

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [FIX] lint issues

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [FIX] lint issues

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [FIX] lint issues

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [CHORE] using multi errors

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [FIX] span naming convention

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [TEST] adding handler otlp unit test

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [TEST] upgrade collector version

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [FIX] golang lint

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [CHORE] adding allow size bytes limit gate

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [FIX] unit test otlp endpoint

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* Apply suggestions from code review

Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [FIX] unit test otlp endpoint

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [DOC] updating docs

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [CHORE] applying pr comments

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* Update pkg/receive/handler_otlp.go

Co-authored-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* Update cmd/thanos/receive.go

Co-authored-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [DOCS] updating

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* [FIX] go mod lint issues

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>

* Fix TestFromMetrics error comparison

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
2025-01-15 11:31:04 +00:00
Pedro Tanaka a3b78c231c
QFE: fixing stats middleware when cache is enabled (#8046)
* QFE: fixing stats middleware when cache is enabled

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Clarify strange config parameter

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2025-01-15 10:28:35 +00:00
Ben Ye caffc1181b
make lazy posting series match ratio configurable (#8049)
Signed-off-by: Ben Ye <benye@amazon.com>
2025-01-15 10:16:25 +00:00
Daniel Sabsay 4ba0ba4038
pkg/cacheutil: Async op fix (#8044)
* Add test for AsyncOperationProcessor stop() behavior

The existing implementation sometimes drops operations that are
still on the queue when .stop() is called.

If multiple communications in a select statement can proceed, one is
chosen pseudo-randomly: https://go.dev/ref/spec#Select_statements

This means that sometimes a processor worker will process a remaining
operation, and sometimes it won't.
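
For illustration, a minimal sketch of the pattern (hypothetical names, not the actual cacheutil code): when both channels are ready, `select` picks one pseudo-randomly, so the worker must drain the queue on stop to avoid dropping operations.

```go
package main

import "fmt"

// worker illustrates the race: when both channels are ready, select picks one
// pseudo-randomly (https://go.dev/ref/spec#Select_statements), so without the
// drain loop a queued operation could be skipped on stop().
func worker(ops <-chan func(), stop <-chan struct{}) {
	for {
		select {
		case op := <-ops:
			op()
		case <-stop:
			// Drain anything already queued before exiting so stop() never
			// drops pending operations.
			for {
				select {
				case op := <-ops:
					op()
				default:
					return
				}
			}
		}
	}
}

func main() {
	ops := make(chan func(), 1)
	stop := make(chan struct{})
	ops <- func() { fmt.Println("processed queued op") }
	close(stop) // stop is immediately ready, racing with the queued op
	worker(ops, stop)
}
```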

Signed-off-by: Daniel Sabsay <sabsay@adobe.com>

* Fix async_op test regarding stop() behavior

Signed-off-by: Daniel Sabsay <sabsay@adobe.com>

* add header to test file

Signed-off-by: Daniel Sabsay <sabsay@adobe.com>

---------

Signed-off-by: Daniel Sabsay <sabsay@adobe.com>
Co-authored-by: Daniel Sabsay <sabsay@adobe.com>
2025-01-10 09:47:50 +02:00
Michael Hoffmann f250d681fd
query: fix panic when selecting non-default engine (#8050)
Fix duplicate metrics registration when selecting non-default engine

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
Co-authored-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-01-09 10:38:33 +00:00
Alan Protasio 0d42636167
Matcher cache/series (#8045)
* Add option to cache matcher on the get series call

Signed-off-by: alanprot <alanprot@gmail.com>

* Adding matcher cacheoption for the store gateway

Signed-off-by: alanprot <alanprot@gmail.com>

* lint/docs

Signed-off-by: alanprot <alanprot@gmail.com>

* change desc

Signed-off-by: alanprot <alanprot@gmail.com>

* change desc

Signed-off-by: alanprot <alanprot@gmail.com>

* change desc

Signed-off-by: alanprot <alanprot@gmail.com>

* fix test

Signed-off-by: alanprot <alanprot@gmail.com>

* Caching only regex matchers

Signed-off-by: alanprot <alanprot@gmail.com>

* changelog

Signed-off-by: alanprot <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
2025-01-07 22:05:37 +00:00
Ben Ye 0e95c464dd
Fix binary reader download duration histogram (#8017)
* fix binary reader download duration histogram

Signed-off-by: Ben Ye <benye@amazon.com>

* enable native histograms

Signed-off-by: Ben Ye <benye@amazon.com>

* changelog

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2025-01-06 09:34:20 -08:00
Ben Ye ca2e23ffb5
add block lifecycle callback (#8036)
Signed-off-by: Ben Ye <benye@amazon.com>
2025-01-06 11:41:20 +00:00
Ben Ye 2ff07b2ce7
Optimize sort keys by server in memcache client (#8026)
* optimize sort keys by server in memcache client

Signed-off-by: Ben Ye <benye@amazon.com>

* address comments

Signed-off-by: Ben Ye <benye@amazon.com>

* remove unused mockAddr

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2025-01-06 09:46:56 +00:00
Alan Protasio bed76cf4dd
Fix matcher cache (#8039)
* Fix matcher cache

Signed-off-by: alanprot <alanprot@gmail.com>

* Simplifying cache interface

Signed-off-by: alanprot <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
2025-01-05 16:02:39 -08:00
Ben Ye 6e29530000
optimize store gateway bytes limiter reserve with type request (#8025)
Signed-off-by: Ben Ye <benye@amazon.com>
2025-01-05 09:59:57 +00:00
dependabot[bot] 4a246cee50
build(deps): bump peter-evans/create-pull-request from 6.1.0 to 7.0.6 (#8028)
Bumps [peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request) from 6.1.0 to 7.0.6.
- [Release notes](https://github.com/peter-evans/create-pull-request/releases)
- [Commits](c5a7806660...67ccf781d6)

---
updated-dependencies:
- dependency-name: peter-evans/create-pull-request
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-05 09:58:37 +00:00
Milind Dethe 40d844de5b
receive: unhide tsdb.out-of-order.time-window and tsdb.out-of-order.cap-max (#8032)
* Unhide tsdb.out-of-order.time-window and tsdb.out-of-order.cap-max

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

* make docs

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

* make docs

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

---------

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
2025-01-05 09:58:08 +00:00
Ben Ye 07734b97fa
add pool for expanded posting slice (#8035)
* add pool for expanded posting slice

Signed-off-by: Ben Ye <benye@amazon.com>

* check nil postings

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2025-01-05 09:57:43 +00:00
Roberto O. Fernández Crisial 803556cb57
Updating x/net package (#8034)
Signed-off-by: Roberto O. Fernández Crisial <roberto.crisial@ip-192-168-0-7.ec2.internal>
Signed-off-by: Roberto O. Fernández Crisial <rofc@rofc.com.ar>
2025-01-03 11:44:37 -08:00
Pedro Tanaka 626d0e5bfb
Receiver: cache matchers for series calls (#7353)
* Receiver: cache matchers for series calls

We have tried caching matchers before with a time-based expiration cache; this time we are trying an LRU cache.

We saw some of our receivers busy compiling regexes, with high CPU usage, similar to the profile of the benchmark I added here:

* Adding matcher cache for method `MatchersToPromMatchers` and a new version which uses the cache.
* The main change is in the `matchesExternalLabels` function, which now receives a cache instance (a simplified sketch follows below).
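
A minimal sketch of the caching idea (names are illustrative; the real implementation lives in the storecache package and uses an LRU plus singleflight):

```go
package matchercache

import (
	"fmt"
	"sync"

	"github.com/prometheus/prometheus/model/labels"
)

type cacheKey struct {
	typ         labels.MatchType
	name, value string
}

// Cache keeps compiled matchers so regex compilation does not dominate the
// hot path. This sketch uses a plain map with crude eviction instead of LRU.
type Cache struct {
	mu      sync.Mutex
	entries map[cacheKey]*labels.Matcher
	maxSize int
}

func New(maxSize int) *Cache {
	return &Cache{entries: make(map[cacheKey]*labels.Matcher), maxSize: maxSize}
}

// GetOrCompile returns a cached compiled matcher, compiling and storing it on a miss.
func (c *Cache) GetOrCompile(t labels.MatchType, name, value string) (*labels.Matcher, error) {
	k := cacheKey{typ: t, name: name, value: value}

	c.mu.Lock()
	defer c.mu.Unlock()
	if m, ok := c.entries[k]; ok {
		return m, nil
	}
	m, err := labels.NewMatcher(t, name, value)
	if err != nil {
		return nil, fmt.Errorf("compile matcher: %w", err)
	}
	if len(c.entries) >= c.maxSize {
		// Crude reset instead of a real least-recently-used eviction policy.
		c.entries = make(map[cacheKey]*labels.Matcher)
	}
	c.entries[k] = m
	return m, nil
}
```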

adding matcher cache and refactor matchers

Co-authored-by: Andre Branchizio <andre.branchizio@shopify.com>

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

Using the cache in proxy and tsdb stores (only receiver)

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

fixing problem with deep equality

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

adding some docs

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

Adding benchmark

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

undo unecessary changes

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

Adjusting metric names

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

adding changelog

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

wiring changes to the receiver

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

Fixing linting

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

docs

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* using singleflight to get or set items

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* improve metrics

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Introduce interface for matchers cache

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fixing unit test

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* adding changelog

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fixing benchmark

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* moving matcher cache to storecache package

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Trying to make the cache more reusable introducing interface

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

Fixing problem with wrong initialization

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

Moving interface to storecache package

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

remove empty file and fix calls to constructor passing nil;

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Fix false entry on change log

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Removing default value for registry and rename test file

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Using fmt.Errf()

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Remove method that is not on interface anymore

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Remove duplicate get call

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
2025-01-03 09:59:24 -08:00
Pengyu Wang ca40906c83
Bump devcontainer Dockerfile base image from go1.22 to go1.23 (#8031)
Signed-off-by: Pengyu Wang <hncswpy@gmail.com>
2024-12-31 13:44:21 -08:00
Harry John 6f03fcb0ba
QFE: Fix @ modifier not being applied correctly on subqueries (#8016)
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2024-12-26 15:38:17 -08:00
Filip Petkovski 2d041dc774
Merge pull request #8018 from Juneezee/refactor/xxhash
*: replace `cespare/xxhash` with `cespare/xxhash/v2`
2024-12-24 15:37:35 +01:00
Eng Zer Jun d298d5afee
*: replace `cespare/xxhash` with `cespare/xxhash/v2`
`github.com/cespare/xxhash/v2` is the latest version with bug fixes and
improvements.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2024-12-24 20:35:00 +08:00
Filip Petkovski ab13e10b3d
Merge pull request #8021 from Juneezee/refactor/exp
*: replace `golang.org/x/exp` with standard library
2024-12-24 13:20:09 +01:00
Eng Zer Jun 90bfef6a91
Tidy `go.mod` properly
Two sections in total: one for direct dependencies, and one for indirect
dependencies.

Reference: https://github.com/golang/go/issues/56471
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2024-12-24 01:17:41 +08:00
Eng Zer Jun b6556852a7
*: replace `golang.org/x/exp` with standard library
These experimental packages are now available in the Go standard
library. Since we upgraded our minimum Go version to 1.23 in PR
https://github.com/thanos-io/thanos/pull/7796, we can replace them with
the standard library:

	1. golang.org/x/exp/slices -> slices [1]
	2. golang.org/x/exp/maps -> maps [2]
	3. golang.org/x/exp/rand -> math/rand/v2 [3]

[1]: https://go.dev/doc/go1.21#slices
[2]: https://go.dev/doc/go1.21#maps
[3]: https://go.dev/doc/go1.22#math_rand_v2
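
For reference, illustrative drop-in replacements under Go >= 1.23 matching the mapping above (note that the standard library `maps.Keys` returns an iterator rather than a slice):

```go
package main

import (
	"fmt"
	"maps"
	"math/rand/v2"
	"slices"
)

func main() {
	// golang.org/x/exp/slices -> slices
	xs := []int{3, 1, 2}
	slices.Sort(xs)

	// golang.org/x/exp/maps -> maps: Keys now returns an iterator, so collect it.
	m := map[string]int{"a": 1, "b": 2}
	keys := slices.Collect(maps.Keys(m))
	slices.Sort(keys)

	// golang.org/x/exp/rand -> math/rand/v2
	n := rand.IntN(10)

	fmt.Println(xs, keys, n)
}
```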

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2024-12-24 01:16:32 +08:00
Filip Petkovski c5025b79af
Merge pull request #8002 from thanos-io/dependabot/go_modules/golang.org/x/crypto-0.31.0
build(deps): bump golang.org/x/crypto from 0.28.0 to 0.31.0
2024-12-20 16:06:01 +01:00
dependabot[bot] 8311e3de70
build(deps): bump golang.org/x/crypto from 0.28.0 to 0.31.0
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.28.0 to 0.31.0.
- [Commits](https://github.com/golang/crypto/compare/v0.28.0...v0.31.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-20 14:51:51 +00:00
Coleen Iona Quadros 4a847851e4
Add labels to rules UI (#8009)
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-12-19 16:28:06 +00:00
Filip Petkovski b881f6c41b
Merge pull request #8005 from pedro-stanaka/fix-e2e-maybe
Fixing E2E tests using headless Chrome
2024-12-18 12:03:19 +01:00
Abel Simon 80c89fdb19
query: distributed engine - allow querying overlapping intervals (commit signed) (#8003)
* chore: add possibility to run individual e2e tests

Signed-off-by: Abel Simon <abelsimon48@gmail.com>

* chore: add metric pointer for too old sample logs

Signed-off-by: Abel Simon <abelsimon48@gmail.com>

* feat: add flag for distributed queries with overlapping intervals

Signed-off-by: Abel Simon <abelsimon48@gmail.com>

* chore: add failing overlapping interval test

Signed-off-by: Abel Simon <abelsimon48@gmail.com>

* chore: fix base branch diff

Signed-off-by: Abel Simon <abelsimon48@gmail.com>

---------

Signed-off-by: Abel Simon <abelsimon48@gmail.com>
2024-12-18 10:51:21 +00:00
Pedro Tanaka e2cb509f5f
Forcing headless and making sure we can run it before creating the context for testing
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-12-18 11:47:03 +01:00
Giedrius Statkevičius 8e4eb42d4c .github: run e2e tests on newer ubuntu
I cannot reproduce the chromedp panics locally, so I'm trying to see if a newer
Ubuntu version helps.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-12-18 12:32:09 +02:00
Rémi Vichery 1b58ed13c9
receive: fix maxBufferedResponses channel size to avoid deadlock (#7978)
* Fix maxBufferedResponses channel size to avoid deadlock

Fixes #7977

Signed-off-by: Remi Vichery <remi@alkira.com>

* Add changelog entry

Signed-off-by: Remi Vichery <remi@alkira.com>

* adjust line numbers in docs/components/receive.md to match updated code

Signed-off-by: Remi Vichery <remi@alkira.com>

---------

Signed-off-by: Remi Vichery <remi@alkira.com>
2024-12-18 10:23:26 +02:00
Michael Hoffmann 1ca8292729
api: bump promql engine and fix fallout (#8000)
* bump to new promql-engine version and fix fallout
* new promql-engine makes it possible to provide more options at runtime

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
Co-authored-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2024-12-17 16:28:23 +00:00
Saswata Mukherjee 683cf171e9
Merge pull request #7982 from saswatamcode/merge-release-0.37.2-to-main
Merge release 0.37.2 back to main
2024-12-11 10:10:45 +00:00
Saswata Mukherjee 3ac552d95a
Merge branch 'main' into merge-release-0.37.2-to-main
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-12-11 09:32:33 +00:00
Saswata Mukherjee 18291a78d4
Merge pull request #7980 from saswatamcode/cut-release-0.37.2
* Fix potential deadlock in hedging request (#7962)

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>

* sidecar: fix limit mintime (#7970)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Cut patch release v0.37.2

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fix changelog

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: SungJin1212 <tjdwls1201@gmail.com>
Co-authored-by: Michael Hoffmann <mhoffm@posteo.de>
2024-12-11 09:03:29 +00:00
Saswata Mukherjee c071d513f9
Fix changelog
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-12-11 08:51:31 +00:00
Saswata Mukherjee 49a0587b54
Cut patch release v0.37.2
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-12-11 08:51:31 +00:00
Michael Hoffmann 351f75b597
sidecar: fix limit mintime (#7970)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-12-11 08:51:31 +00:00
SungJin1212 6cdc1ae96d
Fix potential deadlock in hedging request (#7962)
Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
2024-12-11 08:50:48 +00:00
Ben Ye 0ea6bac096
store gateway: fix merge fetched postings with lazy postings (#7979)
Signed-off-by: Ben Ye <benye@amazon.com>
2024-12-10 15:43:02 -08:00
Michael Hoffmann b3645c8017
sidecar: fix limit mintime (#7970)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-12-10 10:22:07 +00:00
Ben Ye 51c7dcd8c2
Mark posting group lazy if it has a lot of keys (#7961)
* mark posting group lazy if it has a lot of add keys

Signed-off-by: Ben Ye <benye@amazon.com>

* update docs

Signed-off-by: Ben Ye <benye@amazon.com>

* rename labels

Signed-off-by: Ben Ye <benye@amazon.com>

* changelog

Signed-off-by: Ben Ye <benye@amazon.com>

* change to use max key series ratio

Signed-off-by: Ben Ye <benye@amazon.com>

* update docs

Signed-off-by: Ben Ye <benye@amazon.com>

* mention metrics

Signed-off-by: Ben Ye <benye@amazon.com>

* update docs

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-12-09 23:13:11 -08:00
SungJin1212 d0d93dbf3e
Fix potential deadlock in hedging request (#7962)
Signed-off-by: kade.lee <tjdwls1201@gmail.com>
2024-12-05 12:39:58 +00:00
Saswata Mukherjee 7037331e6e
Merge pull request #7959 from saswatamcode/merge-release-0.37.1-to-main
Merge release 0.37.1 to main
2024-12-04 09:46:16 +00:00
Saswata Mukherjee 5d2f3b687e
Merge branch 'main' into merge-release-0.37.1-to-main
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-12-04 09:27:43 +00:00
Saswata Mukherjee e0812e2f46
Cut patch release `v0.37.1` (#7952)
* Merge pull request #7674 from didukh86/query_frontend_tls_redis_fix

Query-frontend: Fix connection to Redis cluster with TLS.
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Capnp: Use segment from existing message (#7945)

* Capnp: Use segment from existing message

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Downgrade capnproto

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* [Receive] Fix race condition when adding multiple new tenants at once (#7941)

* [Receive] fix race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* add a change log

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* address comments

Signed-off-by: Yi Jin <yi.jin@databricks.com>

---------

Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Cut patch release v0.37.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Update promql-engine for subquery fix (#7953)

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Sidecar: Ensure limit param is positive for compatibility with older Prometheus (#7954)

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Update changelog

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fix changelog

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Yi Jin <96499497+jnyi@users.noreply.github.com>
2024-12-04 08:22:07 +00:00
Saswata Mukherjee dec2686f99
Sidecar: Ensure limit param is positive for compatibility with older Prometheus (#7954)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-12-03 18:55:46 +00:00
Saswata Mukherjee 1a328c124b
Update promql-engine for subquery fix (#7953)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-12-03 13:33:10 +00:00
Yi Jin 1ea4e69908
[Receive] Fix race condition when adding multiple new tenants at once (#7941)
* [Receive] fix race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* add a change log

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* address comments

Signed-off-by: Yi Jin <yi.jin@databricks.com>

---------

Signed-off-by: Yi Jin <yi.jin@databricks.com>
2024-12-03 10:52:06 +02:00
Saswata Mukherjee 51fddeb28d
Merge pull request #7946 from saswatamcode/merge-release-0.37-to-main
Merge release v0.37.0 to main
2024-11-28 17:51:16 +00:00
Saswata Mukherjee 96cc4f17ff
Fix ver
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-11-28 17:22:20 +00:00
Saswata Mukherjee cd0ac33697
Merge branch 'main' into merge-release-0.37-to-main
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-11-28 17:20:28 +00:00
Filip Petkovski dd86ec8d0a
Capnp: Use segment from existing message (#7945)
* Capnp: Use segment from existing message

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Downgrade capnproto

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-11-28 17:15:20 +00:00
bluesky6529 e133849d38
remove redundant redis config (#7942)
Signed-off-by: Helen Tseng <bluesky6529@gmail.com>
2024-11-28 17:12:34 +00:00
Filip Petkovski e4d8234c86
Merge pull request #7674 from didukh86/query_frontend_tls_redis_fix
Query-frontend: Fix connection to Redis cluster with TLS.
2024-11-28 17:44:47 +01:00
Filip Petkovski 1a3dc07892
Merge branch 'main' into query_frontend_tls_redis_fix 2024-11-28 17:32:22 +01:00
Philip Gough 1d76335611
receive: Allow specifying a custom gRPC service config via flag (#7907)
Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
2024-11-25 13:00:26 +00:00
Giedrius Statkevičius a55844d52a
receive/expandedpostingscache: fix race (#7937)
Porting https://github.com/cortexproject/cortex/pull/6369 to our code
base. Add test that fails without the fix.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-11-25 12:14:08 +02:00
Saswata Mukherjee 889d527630
Cut release for v0.37.0 (#7936)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-11-25 09:57:16 +00:00
Giedrius Statkevičius b144ebac20
receive: port expanded postings cache from Cortex (#7914)
Port expanded postings cache from Cortex. Huge kudos to @alanprot for
the implementation. I added a TODO item to convert our whole internal
caching infra to be promise based.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-11-21 11:27:53 +02:00
Saswata Mukherjee 02568235c4
Cut first release candidate for v0.37.0 (#7921)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-11-19 11:21:32 +00:00
Saswata Mukherjee fd0643206a
docs: Fix formatting again (#7928)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-11-18 17:25:35 +00:00
Saswata Mukherjee 6a2be98876
docs: Add link to ignore (#7926)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-11-18 14:45:18 +00:00
Saswata Mukherjee df9cca7d31
Update objstore and promql-engine to latest (#7924)
* Update objstore and promql-engine to latest

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fixes after upgrade

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-11-18 14:44:42 +00:00
Ben Ye f998fc59c1
Close block series client at the end to not reuse chunk buf (#7915)
* always close block series client at the end

Signed-off-by: Ben Ye <benye@amazon.com>

* add back close for loser tree

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-11-18 11:36:52 +00:00
Saswata Mukherjee 8c49344fea
Changelog: Mark v0.37 release in progress (#7920)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-11-18 09:42:38 +00:00
Saswata Mukherjee 2a975d3366
Skip TestDistributedEngineWithDisjointTSDBs (#7911)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-11-15 10:16:59 +00:00
Michael Hoffmann caa972ffd1
store, query: remote engine bug (#7904)
* Fix a storage GW bug that loses TSDB infos when joining them
* E2E test demonstrating a bug in the MinT calculation in distributed
  Engine

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-11-15 09:50:10 +00:00
Giedrius Statkevičius 20af3eb7df
receive/capnp: remove close (#7909)
I always get this in logs:
```
err: receive capnp conn: close tcp ...: use of closed network connection
```

This is also visible in the e2e test.

After Done() returns, the connection is closed either way so no need to
close it again.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-11-15 09:02:29 +00:00
Milind Dethe dc4c49f249
store: support hedged requests (#7860)
* support hedged requests in store

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

* hedged roundtripper with tdigest for dynamic delay

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

* refactor struct and fix lint

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

* Improve hedging implementation

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

* Improved hedging implementation

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

* Update store doc

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

* fix white space

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

* add enabled field

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

---------

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
2024-11-14 16:06:56 +02:00
Ben Ye f9da21ec0b
Fix store debug matchers panic on regex matcher (#7903)
* fix store debug matchers panic on regex

Signed-off-by: Ben Ye <benye@amazon.com>

* add test

Signed-off-by: Ben Ye <benye@amazon.com>

* changelog

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-11-13 15:49:48 +02:00
Simon Pasquier bfbabbb89a
Fix ExternalLabels() for Prometheus v3.0 (#7893)
Prometheus v3.0.0-rc.0 introduces a new scrape protocol
(`PrometheusText1.0.0`) which is present by default in the global
configuration. It breaks the Thanos sidecar when it wants to retrieve
the external labels.

This change replaces the use of the Prometheus `GlobalConfig` struct by
a minimal struct which unmarshals only the `external_labels` key.

See also https://github.com/prometheus-operator/prometheus-operator/issues/7078
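
A hedged sketch of the approach (type and function names are illustrative, not the exact Thanos code): unmarshal only the `external_labels` key so newer keys elsewhere in the Prometheus config cannot break parsing.

```go
package promconfig

import "gopkg.in/yaml.v3"

// externalLabelsConfig unmarshals only what the sidecar needs, so unknown or
// newer keys elsewhere in the Prometheus config cannot break parsing.
type externalLabelsConfig struct {
	Global struct {
		ExternalLabels map[string]string `yaml:"external_labels"`
	} `yaml:"global"`
}

// ParseExternalLabels extracts external labels from a full Prometheus config.
func ParseExternalLabels(raw []byte) (map[string]string, error) {
	var cfg externalLabelsConfig
	if err := yaml.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	return cfg.Global.ExternalLabels, nil
}
```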

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2024-11-08 09:57:04 +00:00
Giedrius Statkevičius 928bc7aafb
*: bump Go version (#7891)
Use 1.23.3 as it contains a critical fix: https://github.com/golang/go/issues/70001

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-11-07 16:16:23 +02:00
Filip Petkovski 79593cb834
Merge pull request #7885 from fpetkovski/close-loser-tree
Fix bug in Bucket Series
2024-11-07 12:00:47 +01:00
Filip Petkovski 065c3beff2
Merge branch 'main' into close-loser-tree
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-11-07 11:47:58 +01:00
Giedrius Statkevičius ab43b2b20c
compact: add SyncMetas() timeout (#7887)
Add a wait_interval*3 timeout to SyncMetas(). We had an incident in
production where object storage had had some problems and the syncer got
stuck because there was no timeout. The timeout value is arbitrary but
exists so that the syncer can't get stuck for eternity.
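
A minimal sketch of the idea (names are hypothetical; the real change lives inside the Thanos syncer):

```go
package syncer

import (
	"context"
	"time"
)

// syncWithTimeout bounds a SyncMetas-style call by wait_interval*3 so a
// misbehaving object store cannot hang the syncer forever.
func syncWithTimeout(ctx context.Context, waitInterval time.Duration, sync func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, 3*waitInterval)
	defer cancel()
	return sync(ctx)
}
```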

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-11-06 16:26:54 +02:00
Michael Hoffmann 761487ccf5
Sidecar: use prometheus metrics for min timestamp (#7820)
Read "minT" from prometheus metrics so that we also set it for sidecars
that are not uploading blocks.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-11-06 15:00:35 +02:00
Giedrius Statkevičius df3df36986
discovery: preserve results from other resolve calls (#7886)
Properly preserve results from other resolve calls. There is an
assumption that resolve() is always called with the same addresses, but
that is not true with gRPC and `--endpoint-group`. Without this fix,
multiple resolves could happen at the same time but some of the callers
would not be able to retrieve the results, leading to random errors.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-11-06 09:46:21 +02:00
Michał Mazur ebfc03e5fc
query-frontend: Fix cache keys for dynamic split intervals (#7832) 2024-11-05 13:57:03 -08:00
Filip Petkovski 4550964eb5
Close loser tree outside of span
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-11-05 19:56:44 +01:00
Filip Petkovski 62eb843f94
Add CHANGELOG entry
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-11-05 13:38:02 +01:00
Filip Petkovski 77bd9c0cbb
Fix bug in Bucket Series
Applies the fix described in https://github.com/thanos-io/thanos/issues/7883.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-11-05 13:36:42 +01:00
Filip Petkovski 9bc3cc0d05
Merge pull request #7854 from pedro-stanaka/feat/qfe-force-stats-collection
QFE: new middleware to force query statistics collection
2024-11-05 09:31:46 +01:00
Ben Ye d6d19c568f
upgrade Prometheus to fix round function (#7877)
Signed-off-by: Ben Ye <benye@amazon.com>
2024-11-04 13:36:26 -08:00
Pedro Tanaka 3d47cdac9e
fixing one more unit test
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-11-04 16:21:23 +01:00
Pedro Tanaka 457b861228
Bind to existing stats tag
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-11-04 15:12:55 +01:00
Pedro Tanaka d16b0985da
CR comments
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-11-04 15:12:55 +01:00
Pedro Tanaka cb922bb2d8
adjust docs
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-11-04 15:12:55 +01:00
Pedro Tanaka e08d5bcad1
Adding CHANGELOG
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-10-31 17:56:37 +01:00
Pedro Tanaka 11a17086c0
Using context propagation to add sample information
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-10-31 17:48:21 +01:00
Pedro Tanaka e2f6ca34f9
Update stats protobuf
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-10-31 17:48:21 +01:00
Pedro Tanaka 9d2e5e0b69
QFE: Create new stats middleware to force query statistics collection
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-10-31 17:48:21 +01:00
pureiboi 62038110b1
allow user to specify tls version for backward compatibility (#7654)
* optional tls version logic

Signed-off-by: pureiboi <17396188+pureiboi@users.noreply.github.com>

* update cmd description and match doc

Signed-off-by: pureiboi <17396188+pureiboi@users.noreply.github.com>

* feat: update doc with make docs

Signed-off-by: pureiboi <17396188+pureiboi@users.noreply.github.com>

* fix indentation by linter

Signed-off-by: pureiboi <17396188+pureiboi@users.noreply.github.com>

---------

Signed-off-by: pureiboi <17396188+pureiboi@users.noreply.github.com>
2024-10-29 14:58:30 +02:00
Giedrius Statkevičius 19dc4b9478
Cut down test times (#7861)
Refactor so that leak detection happens in TestMain;
Use t.Parallel() everywhere;
Reduce series/sample counts in some tests that reuse the same functions
we use for benchmarking, i.e. leave the higher loads for benchmarks
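
A sketch of the TestMain pattern this refers to, assuming go.uber.org/goleak for leak detection:

```go
package example

import (
	"testing"

	"go.uber.org/goleak"
)

// TestMain verifies goroutine leaks once per package instead of per test.
func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m)
}

func TestSomething(t *testing.T) {
	t.Parallel() // individual tests opt into parallel execution
	// ...
}
```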

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-29 10:08:01 +02:00
Ben Kochie 5749c8c332
Improve replica flag handling (#7855)
Add a string parsing utility function to improve the handling of replica
label flags. This allows for easier handling of flags when multiple
replica labels are needed.
* Split flag parts that are comma separated.
* Remove any empty strings.
* Sort and deduplicate the slice.

For example in the case of multiple replica labels like:
`--query.replica-label=prometheus_replica,thanos_rule_replica`
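
A sketch of the described parsing (the function name is hypothetical): split comma-separated parts, drop empties, then sort and deduplicate.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// parseFlagLabels splits comma-separated flag values, drops empty strings,
// and returns a sorted, deduplicated slice of label names.
func parseFlagLabels(values []string) []string {
	var out []string
	for _, v := range values {
		for _, part := range strings.Split(v, ",") {
			part = strings.TrimSpace(part)
			if part != "" {
				out = append(out, part)
			}
		}
	}
	slices.Sort(out)
	return slices.Compact(out)
}

func main() {
	fmt.Println(parseFlagLabels([]string{"prometheus_replica,thanos_rule_replica", "prometheus_replica", ""}))
	// Output: [prometheus_replica thanos_rule_replica]
}
```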

Signed-off-by: SuperQ <superq@gmail.com>
2024-10-29 07:56:24 +00:00
Filip Petkovski a31af1da03
Merge pull request #7859 from logzio/logzio-logo
Add Logz.io to adopters
2024-10-24 15:13:43 +02:00
Michał Mazur 05724b9e93 Add Logz.io to adopters
Signed-off-by: Michał Mazur <mmazur.box@gmail.com>
2024-10-24 13:38:18 +02:00
Giedrius Statkevičius c10b695a5b
receive/multitsdb: defer unlock properly (#7857)
Do not forget to unlock.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-23 23:56:07 +03:00
Yu Long e5bb3a490e
UI: Select time range with mouse drag feature (#7853)
* UI: Select time range with mouse drag

Signed-off-by: Yu Long <yu.long@databricks.com>

* QueryFrontend: pass "stats" parameter forward (#7852)

If a querier sees a "stats" parameter in the query request, it will attach important information about the query execution to the response.
But currently, even if a user sets this value, the Query Frontend will lose it in its middleware/roundtrippers.

This PR fixes this problem by properly encoding/decoding the requests in QFE.

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Yu Long <yu.long@databricks.com>

* build(deps): bump go.opentelemetry.io/otel/bridge/opentracing (#7851)

Bumps [go.opentelemetry.io/otel/bridge/opentracing](https://github.com/open-telemetry/opentelemetry-go) from 1.29.0 to 1.31.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.29.0...v1.31.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/bridge/opentracing
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Yu Long <yu.long@databricks.com>

* Update CHANGELOG

Signed-off-by: Yu Long <yu.long@databricks.com>

* Apply fix to linter error (from orig prom PR)

Signed-off-by: Yu Long <yu.long@databricks.com>

* Fix not-null assertion bug from orig PR

Signed-off-by: Yu Long <yu.long@databricks.com>

* Commit generated files

Signed-off-by: Yu Long <yu.long@databricks.com>

* Fix unit test

Signed-off-by: Yu Long <yu.long@databricks.com>

---------

Signed-off-by: Yu Long <yu.long@databricks.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Yu Long <yu.long@databricks.com>
Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-23 11:44:35 +05:30
dependabot[bot] ea89306d0d
build(deps): bump go.opentelemetry.io/otel/bridge/opentracing (#7851)
Bumps [go.opentelemetry.io/otel/bridge/opentracing](https://github.com/open-telemetry/opentelemetry-go) from 1.29.0 to 1.31.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.29.0...v1.31.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/bridge/opentracing
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-22 09:09:57 +05:30
Pedro Tanaka 1bdcc655d8
QueryFrontend: pass "stats" parameter forward (#7852)
If a querier sees a "stats" parameter in the query request, it will attach important information about the query execution to the response.
But currently, even if a user sets this value, the Query Frontend will lose it in its middleware/roundtrippers.

This PR fixes this problem by properly encoding/decoding the requests in QFE.

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-10-22 09:09:43 +05:30
Filip Petkovski 7d95913c50
Merge pull request #7843 from pedro-stanaka/fix/qfe-only-log-slow-queries
QFE: only log slow query, if it is a query endpoint
2024-10-21 10:44:46 +02:00
Pedro Tanaka 6ab96c3702
QFE: only log slow query, if it is a query endpoint
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-10-21 09:28:46 +02:00
Filip Petkovski 731e4607d3
Merge pull request #7838 from fpetkovski/optimize-validate-labels
Optimize validateLabels
2024-10-17 14:00:53 +02:00
Filip Petkovski 16130149ec
Fix lint
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-10-17 13:19:21 +02:00
Filip Petkovski 37d8e07226
Optimize validateLabels
Validating labels in the capnproto writer seems to use a notable
amount of CPU, mostly because it needlessly allocates bytes for each
label validation.

This commit optimizes that function to have zero allocs.
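
A hedged sketch of the zero-allocation idea (not the exact Thanos code): validate names and values directly as strings, since `utf8.ValidString` walks the string in place without allocating.

```go
package labelcheck

import (
	"errors"
	"unicode/utf8"
)

var errInvalidLabel = errors.New("label name or value is empty or not valid UTF-8")

// validateLabel checks a single name/value pair without allocating:
// the string is inspected in place, so the hot path stays alloc-free.
func validateLabel(name, value string) error {
	if name == "" || !utf8.ValidString(name) || !utf8.ValidString(value) {
		return errInvalidLabel
	}
	return nil
}
```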

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-10-17 13:13:08 +02:00
Taras Didukh da6344a0ba
Merge branch 'main' into query_frontend_tls_redis_fix
Signed-off-by: Taras Didukh <didukh86@gmail.com>
2024-10-17 14:03:41 +03:00
Filip Petkovski 65b664c500
Implement capnproto replication (#7659)
* Implement capnproto replication

Our profiles from production show that a lot of CPU and memory in receivers
is used for unmarshaling protobuf messages. Although it is not possible to change
the remote-write format, we have the freedom to change the protocol used
for replicating timeseries data.

This commit introduces a new feature in receivers where replication can be done
using Cap'n Proto instead of gRPC + Protobuf. The advantage of the former protocol
is that deserialization is far cheaper and fields can be accessed directly from
the received message (byte slice) without allocating intermediate objects.
There is an additional cost for serialization because we have to convert from
Protobuf to the Cap'n proto format, but in our setup this still results in a net
reduction in resource usage.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Pass logger

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Update capnp

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Modify flag

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Lint

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix spellcheck

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Use previous version

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Update docker base

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Bump go

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Update docs/components/receive.md

Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Validate labels

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* e2e: add receive test with capnp replication

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* receive: make copy only when necessary

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Fix failing test

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add CHANGELOG entry

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add capnproto Make target

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Replace panics with errors

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix benchmark

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix CHANGELOG

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-17 09:45:01 +00:00
Vasiliy Rumyantsev 274f95e74f
store: label_values: fetch less postings (#7814)
* label_values: fetch less postings

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>

* CHANGELOG.md

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>

* added acceptance test

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>

* removed redundant comment

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>

* check if matcher is EQ matcher

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>

* Update CHANGELOG.md

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>

---------

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
2024-10-17 07:49:26 +00:00
Giedrius Statkevičius fe51bd66c3
rule: add concurrent evals functionality (#7835)
Expose the new concurrent evaluation functionality from Ruler.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-16 15:38:46 +03:00
Filip Petkovski 6a3704c12b
Merge pull request #7834 from ntk148v/main
docs: add Thanos store memcached deployment note
2024-10-16 11:41:22 +02:00
Kien Nguyen Tuan a9061f3d84 docs: add Thanos store memcached deployment note
add Memcached deployment in Kubernetes, similar to Cortex [1].

[1] https://cortexmetrics.io/docs/blocks-storage/store-gateway/#memcached-index-cache

Signed-off-by: Kien Nguyen Tuan <kiennt98@fpt.com>
2024-10-16 15:50:01 +07:00
Filip Petkovski f33a44879c
Merge pull request #7833 from fpetkovski/update-go-1.23
Update go to 1.23 in the CI
2024-10-16 09:48:38 +02:00
Filip Petkovski 9045370ca9
Update go to 1.23 in the CI
This commit updates the go version to 1.23 in the CI, including
unit, e2e tests and promu crossbuild.

It also bumps bingo dependencies where needed.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-10-16 09:14:58 +02:00
Filip Petkovski 9c925bfae1
Fix coroutine leak (#7821)
* Fix coroutine leak

The in-process client uses a pull-based iterator which needs
to be closed; otherwise it will leak the underlying coroutine.
When this happens, the TSDB reader remains open, which blocks head
compaction indefinitely.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix race condition

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix CHANGELOG

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Improve tests

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix blockSeriesClient

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix unit test

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix another unit test

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-10-15 08:39:02 +00:00
dependabot[bot] 175baf13de
build(deps): bump golang.org/x/time from 0.6.0 to 0.7.0 (#7826)
Bumps [golang.org/x/time](https://github.com/golang/time) from 0.6.0 to 0.7.0.
- [Commits](https://github.com/golang/time/compare/v0.6.0...v0.7.0)

---
updated-dependencies:
- dependency-name: golang.org/x/time
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-15 10:18:06 +03:00
dependabot[bot] 8b40ed075d
build(deps): bump github/codeql-action from 3.26.10 to 3.26.13 (#7822)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.10 to 3.26.13.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](e2b3eafc8d...f779452ac5)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-15 10:04:21 +03:00
dependabot[bot] a9daf63350
build(deps): bump go.opentelemetry.io/otel/trace from 1.29.0 to 1.31.0 (#7825)
Bumps [go.opentelemetry.io/otel/trace](https://github.com/open-telemetry/opentelemetry-go) from 1.29.0 to 1.31.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.29.0...v1.31.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/trace
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-15 10:03:30 +03:00
dependabot[bot] 1de5a87fb2
build(deps): bump github.com/prometheus/common from 0.59.1 to 0.60.0 (#7824)
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.59.1 to 0.60.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/RELEASE.md)
- [Commits](https://github.com/prometheus/common/compare/v0.59.1...v0.60.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-15 10:03:02 +03:00
dependabot[bot] 8bea523c51
build(deps): bump google.golang.org/protobuf from 1.34.2 to 1.35.1 (#7827)
Bumps google.golang.org/protobuf from 1.34.2 to 1.35.1.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-15 10:02:27 +03:00
Filip Petkovski 2f39d248d8
Merge pull request #7815 from fpetkovski/disable-chunk-trimming
Disable chunk trimming in Receivers
2024-10-14 12:48:23 +02:00
Filip Petkovski a79c710fc5
Fix docs
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-10-14 12:11:41 +02:00
Filip Petkovski 328385ac14 Extend godoc
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-10-14 11:48:11 +02:00
Filip Petkovski bdbcd9d39a Disable chunk trimming in Receivers
When trimming is not disabled, receivers end up re-encoding all chunks
in order to drop samples that fall outside the requested range.
This is very expensive and causes ingestion problems during high
query load.

This commit disables trimming which should reduce CPU usage in receivers.
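
For context, a hedged sketch of the knob involved (the exact wiring in Thanos differs): Prometheus' `storage.SelectHints` carries a `DisableTrimming` flag, and setting it asks the querier not to re-encode chunks just to cut samples outside the requested range.

```go
package main

import (
	"fmt"

	"github.com/prometheus/prometheus/storage"
)

func main() {
	// DisableTrimming tells the TSDB querier to return whole chunks rather
	// than re-encoding them to drop samples outside [Start, End].
	hints := &storage.SelectHints{
		Start:           0,
		End:             1000,
		DisableTrimming: true,
	}
	fmt.Println(hints.DisableTrimming)
}
```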

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-10-14 11:48:11 +02:00
Giedrius Statkevičius d215f5b349
api: use jsoniter (#7816) 2024-10-11 10:38:38 -07:00
Giedrius Statkevičius af0900bfd2
*: bump deps + enable compaction randomization (#7813)
* *: bump deps + enable compaction randomization

Bump go.mod dependencies of prometheus and thanos promql-engine. Enable
randomized compaction start to help with reducing latency with multiple
TSDBs.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* CHANGELOG: add item

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* *: fix CI

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-11 14:15:32 +03:00
Filip Petkovski 6623a3c02f
Use iterators for in-process Series calls (#7796)
* Use iterators for in-process Series calls

The TSDBStore has two implementations of Series. One uses a goroutine
and the other buffers series in memory. Both are used for different
use cases and trade off CPU and memory accordingly.

In order to reconcile these two approaches, we can use an iterator
which relies on coroutines that have a much lower overhead than goroutines.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Update golangci-lint

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix lint

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-10-10 10:48:32 +02:00
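A hedged sketch of the buffered-versus-iterator trade-off, assuming Go 1.23's `iter.Seq`; the real change may use a different coroutine mechanism and different types:

```
package main

import (
	"fmt"
	"iter"
)

type series struct{ name string }

// bufferedSeries returns everything up front: simple, but holds all series in memory.
func bufferedSeries(src []series) []series {
	out := make([]series, len(src))
	copy(out, src)
	return out
}

// streamSeries yields one series at a time; the consumer pulls values without
// a goroutine + channel pair per Series call.
func streamSeries(src []series) iter.Seq[series] {
	return func(yield func(series) bool) {
		for _, s := range src {
			if !yield(s) {
				return
			}
		}
	}
}

func main() {
	src := []series{{"up"}, {"go_goroutines"}}
	for s := range streamSeries(src) {
		fmt.Println(s.name)
	}
	fmt.Println(len(bufferedSeries(src)))
}
```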
Walther Lee 55d0e09fa7
Query: Skip formatting strings if debug logging is disabled (#7678)
* skip formatting debug str if debug logging is disabled

Signed-off-by: Walther Lee <walthere.lee@gmail.com>

* make static strings const

Signed-off-by: Walther Lee <walthere.lee@gmail.com>

---------

Signed-off-by: Walther Lee <walthere.lee@gmail.com>
2024-10-10 07:54:39 +05:30
Filip Petkovski f265c3b062
Merge pull request #7787 from niaurys/add_cuckoo_filter_on_metric_names
receive/multitsdb: add cuckoo filter on metric names
2024-10-02 08:07:23 +02:00
Filip Petkovski f8af674646
Disable dedup proxy in multi-tsdb (#7793)
The receiver manages independent TSDBs which do not have duplicated series.
For this reason it should be safe to disable deduplication of chunks and
reduce CPU usage for this path.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-10-01 21:00:58 +03:00
Mindaugas Niaura 701d312a1b rebase on main
Signed-off-by: Mindaugas Niaura <mindaugas.niaura@vinted.com>
2024-10-01 16:45:42 +03:00
Mindaugas Niaura a659d10d3c address PR comments, use options in tsdb initializations
Signed-off-by: Mindaugas Niaura <mindaugas.niaura@vinted.com>
2024-10-01 16:39:09 +03:00
Mindaugas Niaura d2b6089d58 fix TSDB pruning
Signed-off-by: Mindaugas Niaura <mindaugas.niaura@vinted.com>
2024-10-01 16:38:49 +03:00
Mindaugas Niaura f3493a1f92 use matchers in store filter
Signed-off-by: Mindaugas Niaura <mindaugas.niaura@vinted.com>
2024-10-01 16:38:47 +03:00
Mindaugas Niaura 068c92cf3b avoid copy in CuckooFilterMetricNameFilter
Signed-off-by: Mindaugas Niaura <mindaugas.niaura@vinted.com>
2024-10-01 16:38:17 +03:00
Mindaugas Niaura a4298bb850 add test cases for testFilter
Signed-off-by: Mindaugas Niaura <mindaugas.niaura@vinted.com>
2024-10-01 16:38:13 +03:00
Mindaugas Niaura bc3a1828fa add enable-feature flag to Receiver docs, fix newEndpointRef typo
Signed-off-by: Mindaugas Niaura <mindaugas.niaura@vinted.com>
2024-10-01 16:37:39 +03:00
Mindaugas Niaura 1cd8d90d4f receive/multitsdb: add cuckoo filter on metric names
Signed-off-by: Mindaugas Niaura <mindaugas.niaura@vinted.com>
2024-10-01 16:37:34 +03:00
dependabot[bot] b31a6376bd
build(deps): bump github/codeql-action from 3.26.6 to 3.26.10 (#7789)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.6 to 3.26.10.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](4dd16135b6...e2b3eafc8d)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-01 16:36:23 +03:00
dependabot[bot] cec6c6f643
build(deps): bump github.com/redis/rueidis from 1.0.14-go1.18 to 1.0.47 (#7792)
Bumps [github.com/redis/rueidis](https://github.com/redis/rueidis) from 1.0.14-go1.18 to 1.0.47.
- [Release notes](https://github.com/redis/rueidis/releases)
- [Commits](https://github.com/redis/rueidis/compare/v1.0.14-go1.18...v1.0.47)

---
updated-dependencies:
- dependency-name: github.com/redis/rueidis
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-01 16:36:00 +03:00
Giedrius Statkevičius bcd20a3b88
*: switch back to gogoproto, rm stringlabels (#7790)
* Revert "store: add chunk pooling (#7771)"

This reverts commit a2113fd81c.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Revert "query/store: memoize PromLabels() call (#7767)"

This reverts commit 735db72a4b.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Revert "store: compare labels directly (#7766)"

This reverts commit 30f453edd8.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Revert "store: don't create intermediate labels (#7762)"

This reverts commit 8cd3fae938.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Revert "*: build with stringlabels (#7745)"

This reverts commit 883fade9bd.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Revert "*: enable gRPC pooling (#7742)"

This reverts commit ca8ab90266.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Revert "*: switch to vtprotobuf (#7721)"

This reverts commit a8e7109d50.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Revert "*: removing gogoproto extensions (#7718)"

This reverts commit 97710f41b0.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Revert "*: rm ZLabels (#7675)"

This reverts commit 8c8a88e2f9.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-01 16:08:34 +03:00
Giedrius Statkevičius 4b012f9a59
store: reuse chunks map (#7783)
Reuse chunks map instead of creating a new one each time. This is a hot
path and shows up in profiles.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-01 12:01:28 +03:00
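A tiny illustrative sketch of the map-reuse pattern (names are made up; the `clear` builtin requires Go 1.21+):

```
package main

import "fmt"

type chunkRef uint64

// process reuses the same map across calls instead of allocating a fresh one
// per Series request; clear() empties it while keeping the allocated buckets.
func process(reusable map[chunkRef]struct{}, refs []chunkRef) int {
	clear(reusable)
	for _, r := range refs {
		reusable[r] = struct{}{}
	}
	return len(reusable)
}

func main() {
	m := make(map[chunkRef]struct{}, 1024)
	fmt.Println(process(m, []chunkRef{1, 2, 2, 3})) // 3
	fmt.Println(process(m, []chunkRef{4, 4}))       // 1, same backing map
}
```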
Taras Didukh 81350e8a0e
Merge branch 'main' into query_frontend_tls_redis_fix 2024-09-27 15:53:30 +03:00
Filip Petkovski 6ff5e1b3a2
Merge pull request #7785 from xnet-mobile/main
Add XNET to Adopters
2024-09-27 08:10:12 +02:00
Filip Petkovski 80179ce4f1
Merge pull request #7763 from pedro-stanaka/feat/http-wrappers-native-histo
ruler: use native histograms for client metrics
2024-09-27 07:28:24 +02:00
xnet-mobile 8ac0b4de7e
Update adopters.yml
Signed-off-by: xnet-mobile <105046137+xnet-mobile@users.noreply.github.com>
2024-09-27 03:32:56 +01:00
xnet-mobile 972fd1f9f2
Add files via upload
Signed-off-by: xnet-mobile <105046137+xnet-mobile@users.noreply.github.com>
2024-09-27 03:31:59 +01:00
Giedrius Statkevičius 90215ad135
receive: memoize exemplar/TSDB clients (#7782)
We call this on each Series() so memoize the creation of this slice.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-09-25 14:23:54 +03:00
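An illustrative sketch of memoizing a per-call slice; the `multiTSDB`/`client` names are hypothetical, not the actual Thanos types:

```
package main

import "fmt"

type client struct{ tenant string }

type multiTSDB struct {
	tenants []string

	cached []client // rebuilt only when the tenant set changes
	dirty  bool
}

// clients returns the cached slice; without this kind of memoization the
// slice would be rebuilt on every Series() call.
func (m *multiTSDB) clients() []client {
	if m.cached == nil || m.dirty {
		m.cached = m.cached[:0]
		for _, t := range m.tenants {
			m.cached = append(m.cached, client{tenant: t})
		}
		m.dirty = false
	}
	return m.cached
}

func main() {
	m := &multiTSDB{tenants: []string{"a", "b"}, dirty: true}
	fmt.Println(len(m.clients()), len(m.clients())) // built once, then reused
}
```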
Pedro Tanaka 303b1f1ab5
Merge branch 'main' into feat/http-wrappers-native-histo 2024-09-24 08:15:54 +02:00
Filip Petkovski 585899439b
Merge pull request #7764 from dongjiang1989/support-gomemlimit
chore: add GOMEMLIMIT in runtimeinfo api
2024-09-23 12:38:18 +02:00
dongjiang c797ec5636
Merge branch 'main' into support-gomemlimit 2024-09-23 16:00:47 +08:00
Giedrius Statkevičius a2113fd81c
store: add chunk pooling (#7771)
Pool byte slices inside of aggrchunks.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-09-23 10:39:41 +03:00
dongjiang 5bc638b392
Merge branch 'main' into support-gomemlimit 2024-09-22 18:19:49 +08:00
Kemal Akkoyun e69bf72337
Update affiliation of kakkoyun (#7773)
Signed-off-by: Kemal Akkoyun <kakkoyun@users.noreply.github.com>
2024-09-22 11:37:03 +02:00
Giedrius Statkevičius 103ef36e4a
e2e/e2ethanos: fix avalanche version (#7772)
`main` has some breakage, so use an older version for now.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-09-21 20:43:20 +03:00
dongjiang b93ec5369d
Merge branch 'main' into support-gomemlimit 2024-09-20 21:47:41 +08:00
Filip Petkovski 832d17ae64
Merge pull request #7768 from pedro-stanaka/banner-remove-thanoscon
Website: remove ThanosCon banner
2024-09-20 13:07:32 +02:00
Giedrius Statkevičius 735db72a4b
query/store: memoize PromLabels() call (#7767)
We use the stringlabels call so some allocations are inevitable but we
can be much smarter about it:

```
func (s *storeSeriesSet) At() (labels.Labels, []*storepb.AggrChunk) {
	return s.series[s.i].PromLabels(), s.series[s.i].Chunks <--- not memoized, new alloc on every At() call; need to memoize because of stringlabel. One alloc is inevitable.
}
```

```
lset, chks := s.SeriesSet.At()
if s.peek == nil {
	s.peek = &Series{Labels: labelpb.PromLabelsToLabelpbLabels(lset), Chunks: chks} <-- converting back to labelpb ?
	continue
}
```

```
if labels.Compare(lset, s.peek.PromLabels()) != 0 { <--- PromLabels() called; we can avoid this call
	s.lset, s.chunks = s.peek.PromLabels(), s.peek.Chunks <- PromLabels() called; we can avoid this
	s.peek = &Series{Labels: labelpb.PromLabelsToLabelpbLabels(lset), Chunks: chks} <--- converting back to labelpb; we can avoid this
	return true
}
```

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-09-20 13:00:02 +03:00
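A self-contained sketch of the memoization idea, with a string standing in for the converted `labels.Labels` value; not the actual implementation:

```
package main

import "fmt"

type label struct{ name, value string }

type protoSeries struct {
	labels []label

	promCached string // stands in for the converted labels.Labels value
	converted  bool
}

// promLabels converts the protobuf labels once and reuses the result,
// instead of paying one allocation per At()/Compare call.
func (s *protoSeries) promLabels() string {
	if !s.converted {
		out := ""
		for _, l := range s.labels {
			out += l.name + "=" + l.value + ","
		}
		s.promCached = out
		s.converted = true
	}
	return s.promCached
}

func main() {
	s := &protoSeries{labels: []label{{"__name__", "up"}}}
	fmt.Println(s.promLabels() == s.promLabels()) // true, converted only once
}
```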
Pedro Tanaka 439c12f791
Website: remove ThanosCon banner
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-09-20 11:43:45 +02:00
Giedrius Statkevičius 30f453edd8
store: compare labels directly (#7766)
Do not create intermediate prometheus labels and compare the labels
directly.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-09-20 11:56:07 +03:00
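A minimal sketch of comparing sorted label pairs directly, without building an intermediate Prometheus labels representation; the `label` type here is illustrative:

```
package main

import "fmt"

type label struct{ name, value string }

// compareLabels compares two sorted label sets lexicographically without
// first converting them into another labels type.
func compareLabels(a, b []label) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i].name != b[i].name {
			if a[i].name < b[i].name {
				return -1
			}
			return 1
		}
		if a[i].value != b[i].value {
			if a[i].value < b[i].value {
				return -1
			}
			return 1
		}
	}
	return len(a) - len(b)
}

func main() {
	x := []label{{"job", "api"}}
	y := []label{{"job", "web"}}
	fmt.Println(compareLabels(x, y) < 0) // true
}
```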
Pedro Tanaka 5d36a5af10
Adding Pedro Tanaka as Triager (#7765)
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-09-20 09:04:43 +01:00
dongjiang1989 87d71ffe05
chore add GOMEMLIMIT
Signed-off-by: dongjiang1989 <dongjiang1989@126.com>
2024-09-20 11:34:23 +08:00
Pedro Tanaka 91a20eb980
adding changelog
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
2024-09-20 01:37:28 +02:00
Pedro Tanaka 4a46856cba
ruler: use native histograms for client metrics
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
2024-09-20 01:33:51 +02:00
Giedrius Statkevičius 8cd3fae938
store: don't create intermediate labels (#7762)
Just compare labelpb.Label directly instead of creating promlabels from
them.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-09-19 17:33:12 +03:00
Taras Didukh b95510618c
Merge branch 'main' into query_frontend_tls_redis_fix 2024-09-19 16:18:40 +03:00
Thibault Mange ade0aed6f4
remove internal links
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange a9ae3070b9
fix links
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange 62ec424747
format
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange a631728945
fix img size
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange 38a98c7ec0
add store limits
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange e2fb8c034b
fix typo
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange 72a4952f48
add part II
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:13 +02:00
Thibault Mange 1625665caf
Fix blog article img rendering for Life of a sample Part I (#7761)
* add img style attribute

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix formatting

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix link

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove internal links

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 12:28:27 +01:00
Taras Didukh e4e645a40a
Merge branch 'main' into query_frontend_tls_redis_fix 2024-09-19 11:07:26 +03:00
Filip Petkovski 2bdb909af9
Merge pull request #7756 from fpetkovski/generalize-pool
Generalize the bucketed bytes pool
2024-09-18 15:54:43 +02:00
Thibault Mange 9dd7905a88
Blog article submission: Life of a Sample in Thanos Part I (#7748)
* part_1

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* typo

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Update docs/blog/2023-11-20-life-of-a-sample-part-1.md

Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Update docs/blog/2023-11-20-life-of-a-sample-part-1.md

Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Update docs/blog/2023-11-20-life-of-a-sample-part-1.md

Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Update docs/blog/2023-11-20-life-of-a-sample-part-1.md

Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Update docs/blog/2023-11-20-life-of-a-sample-part-1.md

Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add sidecar, remove invalid links

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-09-18 13:41:00 +01:00
Filip Petkovski 95c9bcfffb
Fix test lint
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-09-17 14:48:27 +02:00
Filip Petkovski 381cad5d76
Generalize the bucketed bytes pool
Now that we have generics, we can generalize the bucketed bytes pool
to be used with slices of any type T.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-09-17 14:43:18 +02:00
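A rough, self-contained sketch of a generic bucketed pool built on `sync.Pool`; the actual Thanos implementation may differ in bucket selection and zeroing behaviour:

```
package main

import (
	"fmt"
	"sync"
)

// BucketedPool keeps one sync.Pool per capacity bucket so that slices of any
// element type T can be reused instead of reallocated.
type BucketedPool[T any] struct {
	buckets []int // sorted capacities, e.g. 1<<10, 1<<12, ...
	pools   []sync.Pool
}

func NewBucketedPool[T any](buckets ...int) *BucketedPool[T] {
	return &BucketedPool[T]{buckets: buckets, pools: make([]sync.Pool, len(buckets))}
}

func (p *BucketedPool[T]) Get(size int) []T {
	for i, b := range p.buckets {
		if size <= b {
			if s, ok := p.pools[i].Get().(*[]T); ok {
				return (*s)[:0]
			}
			return make([]T, 0, b)
		}
	}
	return make([]T, 0, size)
}

func (p *BucketedPool[T]) Put(s []T) {
	c := cap(s)
	for i, b := range p.buckets {
		if c == b {
			s = s[:0]
			p.pools[i].Put(&s)
			return
		}
	}
	// Slices that don't match a bucket are simply dropped.
}

func main() {
	pool := NewBucketedPool[byte](1024, 4096)
	buf := pool.Get(100)
	buf = append(buf, "hello"...)
	pool.Put(buf)
	// sync.Pool may drop entries under GC pressure, but normally this reuses
	// the 1024-capacity slice we just returned.
	fmt.Println(cap(pool.Get(100)))
}
```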
Michael Hoffmann 883fade9bd
*: build with stringlabels (#7745)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-09-13 18:52:22 +02:00
Giedrius Statkevičius ca8ab90266
*: enable gRPC pooling (#7742)
Use the new CodecV2 interface to enable pooling gRPC
marshaling/unmarshaling buffers. Also, add missing includes to
scripts/genproto.sh so that we could enable the `pool` flag in the next
PR.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-09-13 12:34:41 +03:00
Michael Hoffmann 7bddb603e4
dep: bump objstore (#7741)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-09-13 11:06:11 +02:00
Tidhar Klein Orbach cbf4fb4b47
store: added a log error print when proxy limit are violated (#7683)
* store: added a log error print when proxy limit are violated

Signed-off-by: Tidhar Klein Orbach <tidhar.o@taboola.com>

* Update pkg/store/proxy.go

Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Tidhar Klein Orbach <tizki@users.noreply.github.com>

---------

Signed-off-by: Tidhar Klein Orbach <tidhar.o@taboola.com>
Signed-off-by: Tidhar Klein Orbach <tizki@users.noreply.github.com>
Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-09-13 08:14:54 +02:00
Harry John f4af8aa9dd
util: Pass limit to MergeSlice (#7706) 2024-09-12 18:30:19 -07:00
Michael Hoffmann d28918916e
query: add partition labels flag (#7722)
* query: add partition labels flag

The distributed engine decides when to push down certain operations by
checking if the external labels are still present, i.e. we can push down
a binary operation if its vector matching includes all external labels.
This is great, but if you have multiple external labels that are
irrelevant for the partition, this is problematic since query authors
must be aware of those irrelevant labels and must incorporate them into
their queries.
This PR attempts to solve that by giving an option to focus on the
labels that are relevant for the partition.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Update cmd/thanos/query.go

Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

---------

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-09-12 19:33:38 +02:00
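A small sketch of the push-down decision this commit describes: a binary operation can be pushed down only if its vector-matching labels cover the configured partition labels. The function and its semantics are illustrative, not the exact Thanos logic:

```
package main

import "fmt"

// canPushDown reports whether a binary operation can be executed entirely on
// one partition: its vector-matching labels must cover every label that
// distinguishes the partition. Restricting the check to explicitly configured
// partition labels (instead of all external labels) is the idea behind the flag.
func canPushDown(matchingLabels, partitionLabels []string) bool {
	have := map[string]bool{}
	for _, l := range matchingLabels {
		have[l] = true
	}
	for _, l := range partitionLabels {
		if !have[l] {
			return false
		}
	}
	return true
}

func main() {
	// With all external labels considered, the query author would also have to
	// match on "replica"; with partition labels it is enough to match "region".
	fmt.Println(canPushDown([]string{"region"}, []string{"region", "replica"})) // false
	fmt.Println(canPushDown([]string{"region"}, []string{"region"}))            // true
}
```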
Giedrius Statkevičius a8e7109d50
*: switch to vtprotobuf (#7721)
Finally, let's enable vtprotobuf!

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-09-12 16:39:18 +03:00
Giedrius Statkevičius 97710f41b0
*: removing gogoproto extensions (#7718)
Removed all gogoproto extensions and dealt with the changes. 2nd step in
removing gogoproto.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-09-10 15:17:06 +03:00
Michael Hoffmann 153607f4bf
receive: mark too-far-in-future flag as non-experimental (#7707)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-09-09 11:21:09 +02:00
Michael Hoffmann 27412d2868
*: get rid of store info api (#7704)
We have supported the Info gRPC API for 3 years now. We used to use Store API
Info as a fallback if we encountered an endpoint that does not implement
Info gRPC, but that should not happen anymore.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-09-06 18:19:41 +02:00
Giedrius Statkevičius 0966192a44
receive/multitsdb: remove double lock (#7700)
Do not double lock here, as in some situations it could lead to a
deadlock.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-09-05 11:17:32 +03:00
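A minimal illustration of why the double lock is dangerous: Go's `sync.Mutex` is not reentrant, so a common fix is a separate `*Locked` variant for callers that already hold the lock (names here are made up):

```
package main

import (
	"fmt"
	"sync"
)

type store struct {
	mtx  sync.Mutex
	data map[string]int
}

// get locks the mutex itself, so callers must NOT hold it already:
// locking a sync.Mutex twice from the same goroutine deadlocks.
func (s *store) get(k string) int {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	return s.data[k]
}

// getLocked is the variant callers use while already holding s.mtx.
func (s *store) getLocked(k string) int {
	return s.data[k]
}

func main() {
	s := &store{data: map[string]int{"a": 1}}
	s.mtx.Lock()
	v := s.getLocked("a") // calling s.get("a") here would deadlock
	s.mtx.Unlock()
	fmt.Println(v)
}
```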
Giedrius Statkevičius 8c8a88e2f9
*: rm ZLabels (#7675)
* server/grpc: add pooling

Add pooling for grpc requests/responses.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* *: rm ZLabels and friends

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* *: fix tests

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* go.mod: revert changes

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-09-05 07:46:34 +02:00
qiyang 09db52562d
Store: Fix panic too smaller buffer (#7658)
Co-authored-by: dominic.qi <dominic.qi@jaco.live>
Co-authored-by: Ben Ye <benye@amazon.com>
2024-09-04 10:39:45 -07:00
Taras Didukh 53250395e2
Remove empty lines
Signed-off-by: Taras Didukh <didukh86@gmail.com>
2024-09-04 20:02:01 +03:00
Taras Didukh 4c5baced3b
Merge branch 'main' into query_frontend_tls_redis_fix
Signed-off-by: Taras Didukh <didukh86@gmail.com>
2024-09-04 18:15:49 +03:00
dependabot[bot] 75f0328ebb Bump golang.org/x/time from 0.5.0 to 0.6.0 (#7601)
Bumps [golang.org/x/time](https://github.com/golang/time) from 0.5.0 to 0.6.0.
- [Commits](https://github.com/golang/time/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: golang.org/x/time
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Taras Didukh <taras.didukh@advancedmd.com>
2024-09-04 18:00:20 +03:00
dependabot[bot] 1c2d6ca60b build(deps): bump github/codeql-action from 3.26.2 to 3.26.5 (#7667)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.2 to 3.26.5.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](429e197704...2c779ab0d0)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Taras Didukh <taras.didukh@advancedmd.com>
2024-09-04 18:00:20 +03:00
Milind Dethe 93200ea437 website: max-height for version-picker dropdown (#7642)
Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
Signed-off-by: Taras Didukh <taras.didukh@advancedmd.com>
2024-09-04 18:00:20 +03:00
dependabot[bot] 65c3a0e2e7 build(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp (#7666)
Bumps [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp](https://github.com/open-telemetry/opentelemetry-go) from 1.27.0 to 1.29.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.27.0...v1.29.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Taras Didukh <taras.didukh@advancedmd.com>
2024-09-04 18:00:07 +03:00
Michael Hoffmann a230123399 query: queryable is not respecting limits (#7679)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-09-04 18:00:07 +03:00
Taras Didukh c89dee19e5 Add record to the changelog
Signed-off-by: Taras Didukh <taras.didukh@advancedmd.com>
2024-09-04 17:58:47 +03:00
didukh86 52d84eb7c1 Query-frontend: Fix connection to Redis with TLS.
Issue: https://github.com/thanos-io/thanos/issues/7672

Signed-off-by: didukh86 <didukh86@gmail.com>
Signed-off-by: didukh86 <78904472+didukh86@users.noreply.github.com>
Signed-off-by: didukh86 <didukh86@gmail.com>
Signed-off-by: Taras Didukh <taras.didukh@advancedmd.com>
2024-09-04 17:58:47 +03:00
Saswata Mukherjee 9f2af3f78f
Fix CodeQL checks on main (#7698)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-09-04 10:19:55 +01:00
dependabot[bot] d7f45e723f
build(deps): bump github/codeql-action from 3.26.5 to 3.26.6 (#7685)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.5 to 3.26.6.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](2c779ab0d0...4dd16135b6)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-04 09:30:57 +01:00
dependabot[bot] 26392d5515
build(deps): bump go.opentelemetry.io/contrib/propagators/autoprop (#7688)
Bumps [go.opentelemetry.io/contrib/propagators/autoprop](https://github.com/open-telemetry/opentelemetry-go-contrib) from 0.53.0 to 0.54.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go-contrib/compare/zpages/v0.53.0...zpages/v0.54.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/propagators/autoprop
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-04 09:29:49 +01:00
dependabot[bot] f752793c65
build(deps): bump go.opentelemetry.io/otel/bridge/opentracing (#7689)
Bumps [go.opentelemetry.io/otel/bridge/opentracing](https://github.com/open-telemetry/opentelemetry-go) from 1.28.0 to 1.29.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.28.0...v1.29.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/bridge/opentracing
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-04 09:29:20 +01:00
Filip Petkovski 956fe47611
Merge pull request #7697 from thanos-io/dependabot/go_modules/github.com/prometheus/common-0.58.0
build(deps): bump github.com/prometheus/common from 0.55.0 to 0.58.0
2024-09-04 08:22:18 +02:00
Harry John 2c488dcf1f
store: Implement metadata API limit in stores (#7652)
* Store: Implement metadata API limit in stores

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>

* Apply seriesLimit in nextBatch

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>

---------

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2024-09-03 14:19:14 -07:00
dependabot[bot] 295d8a924c
build(deps): bump go.opentelemetry.io/contrib/samplers/jaegerremote (#7686)
Bumps [go.opentelemetry.io/contrib/samplers/jaegerremote](https://github.com/open-telemetry/opentelemetry-go-contrib) from 0.22.0 to 0.23.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go-contrib/compare/v0.22.0...v0.23.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/samplers/jaegerremote
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-03 22:02:22 +01:00
dependabot[bot] 15b221baaf
build(deps): bump github.com/prometheus/common from 0.55.0 to 0.58.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.55.0 to 0.58.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/RELEASE.md)
- [Commits](https://github.com/prometheus/common/compare/v0.55.0...v0.58.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-03 21:00:00 +00:00
dependabot[bot] a1fc99706a
build(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc (#7692)
Bumps [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go) from 1.28.0 to 1.29.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.28.0...v1.29.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-03 21:59:50 +01:00
dependabot[bot] 113c416eaf
build(deps): bump github.com/felixge/fgprof from 0.9.4 to 0.9.5 (#7691)
Bumps [github.com/felixge/fgprof](https://github.com/felixge/fgprof) from 0.9.4 to 0.9.5.
- [Release notes](https://github.com/felixge/fgprof/releases)
- [Commits](https://github.com/felixge/fgprof/compare/v0.9.4...v0.9.5)

---
updated-dependencies:
- dependency-name: github.com/felixge/fgprof
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-03 21:58:23 +01:00
dependabot[bot] 727c3c9be1
build(deps): bump github.com/prometheus/client_golang (#7693)
Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.19.1 to 1.20.2.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.19.1...v1.20.2)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-03 21:58:04 +01:00
Mikhail Nozdrachev 74651777de
Receive: fix `thanos_receive_write_{timeseries,samples}` stats (#7643)
* Revert "Receive: fix stats (#7373)"

This reverts commit 66841fbb1e.

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>

* Receive: fix `thanos_receive_write_{timeseries,samples}` stats

There are two paths data can be written to a receiver through: the HTTP
endpoint or the gRPC endpoint, and `thanos_receive_write_{timeseries,samples}`
only count the number of timeseries/samples received through the HTTP
endpoint.

So, there is no risk that a sample will be counted twice, once as a
remote write and once as a local write. On the other hand, we still need
to account for the replication factor, and only counting local writes is
not enough as there might be no local writes at all (e.g. in RouterOnly
mode).

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>

---------

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
2024-09-03 14:17:08 +02:00
Mikhail Nozdrachev 1c5e7f1ab0
test: Fix flaky receive/multitsdb test (#7694)
There is a race condition in `TestMultiTSDBPrune` due to a dangling goroutine
which can fail outside of the test function's lifetime if the database object
is closed before `Sync()` is finished.

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
2024-09-03 12:07:32 +02:00
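A self-contained sketch of the general fix pattern: track the background goroutine and wait for it before closing the database, so it cannot outlive the test (types are illustrative):

```
package main

import (
	"fmt"
	"sync"
)

type db struct{ closed bool }

func (d *db) sync() error {
	if d.closed {
		return fmt.Errorf("sync on closed db")
	}
	return nil
}

func main() {
	d := &db{}

	// Track the background Sync goroutine so the test (or its cleanup) can
	// wait for it before closing the database, instead of letting it race
	// past the test function's lifetime.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		if err := d.sync(); err != nil {
			fmt.Println("sync error:", err)
		}
	}()

	wg.Wait()       // without this, Close below could race with sync()
	d.closed = true // stands in for db.Close()
	fmt.Println("closed after sync finished")
}
```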
dependabot[bot] dfeaf6e258
build(deps): bump github.com/onsi/gomega from 1.33.1 to 1.34.2 (#7681)
Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.33.1 to 1.34.2.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/gomega/compare/v1.33.1...v1.34.2)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-02 13:35:36 +03:00
dependabot[bot] acf423dac6
Bump golang.org/x/time from 0.5.0 to 0.6.0 (#7601)
Bumps [golang.org/x/time](https://github.com/golang/time) from 0.5.0 to 0.6.0.
- [Commits](https://github.com/golang/time/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: golang.org/x/time
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-02 09:23:41 +00:00
dependabot[bot] c200719861
build(deps): bump github/codeql-action from 3.26.2 to 3.26.5 (#7667)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.2 to 3.26.5.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](429e197704...2c779ab0d0)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-02 08:27:21 +00:00
Milind Dethe d1b8382eab
website: max-height for version-picker dropdown (#7642)
Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
2024-09-02 09:11:54 +01:00
dependabot[bot] 3d03cb4885
build(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp (#7666)
Bumps [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp](https://github.com/open-telemetry/opentelemetry-go) from 1.27.0 to 1.29.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.27.0...v1.29.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-02 09:09:59 +01:00
Michael Hoffmann 3270568f6b
query: queryable is not respecting limits (#7679)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-09-02 09:55:16 +03:00
Giedrius Statkevičius 8d3d34be70
receive: change quorum calculation for RF=2 (#7669)
As discussed during ThanosCon, I am updating the handling for RF=2 to
require only one successful write because requiring all writes to
succeed all the time doesn't make sense and causes lots of confusion for
users. The only other alternative is to forbid RF=2, but I think we
shouldn't do that because people would be forced to add extra resources
when moving from a Sidecar-based setup.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-08-28 12:12:45 +03:00
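A hedged sketch of the quorum rule described above; the exact calculation in Thanos may differ, but the RF=2 special case is the point:

```
package main

import "fmt"

// writeQuorum returns how many replicas must acknowledge a write.
// The usual majority is floor(rf/2)+1; with the relaxed handling described
// above, replication factor 2 only needs a single successful write.
func writeQuorum(replicationFactor int) int {
	if replicationFactor == 2 {
		return 1
	}
	return replicationFactor/2 + 1
}

func main() {
	for _, rf := range []int{1, 2, 3, 5} {
		fmt.Printf("rf=%d quorum=%d\n", rf, writeQuorum(rf))
	}
	// rf=1 quorum=1, rf=2 quorum=1, rf=3 quorum=2, rf=5 quorum=3
}
```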
dependabot[bot] 5dc91d287c
Bump github.com/miekg/dns from 1.1.59 to 1.1.62 (#7651)
Bumps [github.com/miekg/dns](https://github.com/miekg/dns) from 1.1.59 to 1.1.62.
- [Changelog](https://github.com/miekg/dns/blob/master/Makefile.release)
- [Commits](https://github.com/miekg/dns/compare/v1.1.59...v1.1.62)

---
updated-dependencies:
- dependency-name: github.com/miekg/dns
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-28 11:23:30 +03:00
Mario Trangoni a82a121c1a
codespell: check `pkg` folder (#7655)
* pkg/store: Fix all spelling issues discovered by codespell.

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* pkg/block: Fix all spelling issues discovered by codespell.

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* pkg/query: Fix all spelling issues discovered by codespell.

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* pkg/rules: Fix all spelling issues discovered by codespell.

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* pkg: Fix all spelling issues discovered by codespell.

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* codespell: Adjust CI job to exclude some pkg exceptions

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

---------

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2024-08-28 11:23:05 +03:00
Walther Lee 8af5139a73
implement memcachedClient.Set in internal/cortex (#7610)
Signed-off-by: Walther Lee <walthere.lee@gmail.com>
2024-08-27 11:44:30 +03:00
pureiboi ce52e9fda8
fix(ui): add null check to find overlapping blocks logic (#7644)
Signed-off-by: pureiboi <17396188+pureiboi@users.noreply.github.com>
2024-08-27 11:39:13 +03:00
Walther Lee d96661353d
Store: Fix LabelNames and LabelValues when using non-equal matchers (#7661)
* fix non-equal matchers in bucket FilterExtLabelsMatchers

Signed-off-by: Walther Lee <walthere.lee@gmail.com>

* add acceptance tests

Signed-off-by: Walther Lee <walthere.lee@gmail.com>

---------

Signed-off-by: Walther Lee <walthere.lee@gmail.com>
2024-08-24 07:00:36 +02:00
Ritesh Sonawane 6737c8dd2f
Added Scaling Prometheus with Thanos Blog from CloudRaft (#7653)
* Added Scaling Prometheus with Thanos Blog from CloudRaft

Signed-off-by: riteshsonawane1372 <riteshsonawane1372@gmail.com>

* signed commit

Signed-off-by: riteshsonawane1372 <riteshsonawane1372@gmail.com>

---------

Signed-off-by: riteshsonawane1372 <riteshsonawane1372@gmail.com>
2024-08-20 10:31:21 +01:00
Harshita Sao 9301004d55
fix: fixed the token-permission and pinned dependencies issue (#7649) 2024-08-19 09:10:47 -07:00
Filip Petkovski e197368a0c
Merge pull request #7646 from mjtrangoni/fix-spelling-issues
Fix spelling issues discovered by codespell
2024-08-19 16:03:49 +02:00
Mario Trangoni 1690d5b49b
codespell: Add GitHub actions job to the CI
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2024-08-19 12:18:52 +02:00
Filip Petkovski c3e83fc4b4
Merge pull request #7650 from harshitasao/vulnerability-fix
vulnerability fix
2024-08-19 11:38:24 +02:00
harshitasao 4d43e436d1 vulnerability fix
Signed-off-by: harshitasao <harshitasao@gmail.com>
2024-08-18 18:50:51 +05:30
Filip Petkovski e62dbebe09
Merge pull request #7645 from fpetkovski/stringlabels
Add support for stringlabels in Thanos Query
2024-08-18 11:03:05 +02:00
Filip Petkovski b55845d084
Add CI step
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-08-16 17:39:45 +02:00
Mario Trangoni 0523e6eae9
Fix all spelling issues discovered by codespell.
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2024-08-16 16:01:11 +02:00
Mario Trangoni bfa8beec78
mixin: Fix all spelling issues discovered by codespell.
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2024-08-16 15:49:29 +02:00
Mario Trangoni 3cef1b6bc8
docs: Fix all spelling issues discovered by codespell.
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2024-08-16 15:43:46 +02:00
Filip Petkovski e86e200155
Remove compatibility label
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-08-16 15:41:27 +02:00
Mario Trangoni af663bc696
tutorials: Fix all spelling issues discovered by codespell.
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2024-08-16 15:25:15 +02:00
Mario Trangoni b42286198a
examples: Fix all spelling issues discovered by codespell.
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2024-08-16 15:19:39 +02:00
Filip Petkovski 0bc02dd536
Use EmptyLabels()
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-08-16 14:57:34 +02:00
Filip Petkovski f7befd2339
Add support for stringlabels in Thanos Query
This commit finalizes support for the stringlabels build tag
so that we can build the binary.

I would assume that we will still get panics if we run a store
since the Series call still relies on casting one pointer type
to another. This will be fixed in a follow up PR.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-08-16 14:09:34 +02:00
Filip Petkovski 825b7c66ac
Merge pull request #7641 from mjtrangoni/fix-errcheck
golangci-lint: Update deprecated linter configurations
2024-08-15 11:13:56 +02:00
Mario Trangoni cc138f1840
golangci: Replace deprecated `run.deadline` with `run.timeout`.
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2024-08-15 10:32:21 +02:00
Mario Trangoni 6612d56338
golangci: Replace deprecated `run.skip-dirs` with `issues.exclude-dirs`.
See,
level=warning msg="[config_reader] The configuration option `run.skip-dirs` is deprecated, please use `issues.exclude-dirs`."

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2024-08-15 10:29:31 +02:00
Mario Trangoni e36f574f06
golangci: Fix output format configuration
See,
level=warning msg="[config_reader] The configuration option `output.format` is deprecated, please use `output.formats`"

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2024-08-15 10:23:03 +02:00
Mario Trangoni 57a3acbba4
golangci: Fix errcheck configuration
See,
level=warning msg="[config_reader] The configuration option `linters.errcheck.exclude` is deprecated, please use `linters.errcheck.exclude-functions`."

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2024-08-15 10:17:27 +02:00
Ben Ye 692a4a478f
Check context cancellation every 128 iterations (#7622) 2024-08-15 06:36:14 +01:00
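A minimal example of the pattern named in the commit title: check `ctx.Err()` only every 128 iterations so the cancellation check stays off the hot path:

```
package main

import (
	"context"
	"fmt"
	"time"
)

// sum checks ctx.Err() once every 128 iterations: frequent enough to abort
// promptly, cheap enough not to show up in hot-loop profiles.
func sum(ctx context.Context, xs []int64) (int64, error) {
	var total int64
	for i, x := range xs {
		if i&127 == 0 {
			if err := ctx.Err(); err != nil {
				return 0, err
			}
		}
		total += x
	}
	return total, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	total, err := sum(ctx, make([]int64, 1_000_000))
	fmt.Println(total, err)
}
```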
Saswata Mukherjee 4e08c206cb
Merge release 0.36.1 to main (#7639)
* CHANGELOG: Mark 0.36 as in progress

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut release candidate v0.36.0-rc.0 (#7490)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut release candidate 0.36.0 rc.1 (#7510)

* *: fix server grpc histograms (#7493)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Close endpoints after the gRPC server has terminated (#7509)

Endpoints are currently closed as soon as we receive a SIGTERM or SIGINT.
This causes in-flight queries to get cancelled since outgoing connections
get closed instantly.

This commit moves the endpoints.Close call after the grpc server shutdown
to make sure connections are available as long as the server is running.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Cut release candidate v0.36.0-rc.1

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

---------

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Cut release v0.36.0 (#7578)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut patch release `v0.36.1` (#7636)

* Proxy: Query goroutine leak when `store.response-timeout` is set (#7618)

time.AfterFunc() returns a time.Timer object whose C field is nil,
according to the documentation. A goroutine blocks forever on reading
from a `nil` channel, leading to a goroutine leak on random slow
queries.

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>

* pkg/clientconfig: fix TLS configs with only CA (#7634)

065e3dd75a introduced a regression: TLS configurations for Thanos Ruler
query and alerting with only a CA file failed to load.

For instance, the following snippet is a valid query configuration:

```
- static_configs:
  - prometheus.example.com:9090
  scheme: https
  http_config:
    tls_config:
      ca_file: /etc/ssl/cert.pem
```

The test fixtures (CA, certificate and key files) are copied from
prometheus/common and are valid until 2072.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Cut patch release v0.36.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fix failing e2e test (#7620)

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
Co-authored-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Co-authored-by: Harry John <johrry@amazon.com>

---------

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
Co-authored-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Co-authored-by: Harry John <johrry@amazon.com>
2024-08-14 09:45:11 +01:00
Saswata Mukherjee 08b0993244
Fix changelog on main after 0.36 release (#7635)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-08-13 10:52:13 +01:00
Simon Pasquier 4fd2d8a273
pkg/clientconfig: fix TLS configs with only CA (#7634)
065e3dd75a introduced a regression: TLS configurations for Thanos Ruler
query and alerting with only a CA file failed to load.

For instance, the following snippet is a valid query configuration:

```
- static_configs:
  - prometheus.example.com:9090
  scheme: https
  http_config:
    tls_config:
      ca_file: /etc/ssl/cert.pem
```

The test fixtures (CA, certificate and key files) are copied from
prometheus/common and are valid until 2072.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2024-08-13 10:18:31 +01:00
Mikhail Nozdrachev 4050c73f8f
Proxy: Query goroutine leak when `store.response-timeout` is set (#7618)
time.AfterFunc() returns a time.Timer object whose C field is nil,
according to the documentation. A goroutine blocks forever on reading
from a `nil` channel, leading to a goroutine leak on random slow
queries.

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
2024-08-13 08:35:54 +01:00
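A runnable illustration of the bug: `time.AfterFunc` returns a `Timer` whose `C` field is nil, so any goroutine reading from it blocks forever; code that needs a channel to wait on should use `time.NewTimer` instead:

```
package main

import (
	"fmt"
	"time"
)

func main() {
	// Buggy pattern: time.AfterFunc returns a Timer whose C field is nil,
	// so a goroutine reading from t.C blocks forever (one leak per query).
	t := time.AfterFunc(time.Hour, func() {})
	fmt.Println(t.C == nil) // true; <-t.C would never return
	t.Stop()

	// Safe pattern when the code needs a channel to select on: time.NewTimer.
	nt := time.NewTimer(10 * time.Millisecond)
	defer nt.Stop()
	<-nt.C
	fmt.Println("timeout fired")
}
```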
Harry John 49617f4d16
Fix failing e2e test (#7620)
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2024-08-12 19:15:28 +01:00
Ben Ye 2375b59ee3
fix GetActiveAndPartialBlockIDs panic (#7621)
Signed-off-by: Ben Ye <benye@amazon.com>
2024-08-12 09:13:04 +01:00
Simon Pasquier c9500df77b
Update CHANGELOG.md after #7614 (#7619)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2024-08-10 07:10:23 +01:00
Simon Pasquier dcadaae80f
*: fix debug log formatting (#7614)
Before the change:

```
... msg="maxprocs: No GOMAXPROCS change to reset%!(EXTRA []interface {}=[])
```

After this change:

```
... msg="maxprocs: No GOMAXPROCS change to reset"
```

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2024-08-09 09:29:59 +01:00
Harry John a3a7c3b75c
API: Add limit param in metadata APIs (#7609) 2024-08-08 12:59:59 -07:00
Ben Ye 73648360ff
Only increment ruler warning eval metric for non PromQL warnings (#7592) 2024-08-07 11:02:31 -07:00
Filip Petkovski 08af5d7b55
Merge pull request #7608 from ahurtaud/amadeuslogo
website: Update amadeus logo to latest
2024-08-07 18:50:39 +02:00
Alban Hurtaud e24b922593
Merge branch 'main' into amadeuslogo 2024-08-07 15:07:14 +02:00
Alban HURTAUD ca47024035 Update amadeus logo to latest
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
2024-08-07 14:20:09 +02:00
Michael Hoffmann e155196618
Merge release 0.36 to main (#7588)
* CHANGELOG: Mark 0.36 as in progress

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut release candidate v0.36.0-rc.0 (#7490)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut release candidate 0.36.0 rc.1 (#7510)

* *: fix server grpc histograms (#7493)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Close endpoints after the gRPC server has terminated (#7509)

Endpoints are currently closed as soon as we receive a SIGTERM or SIGINT.
This causes in-flight queries to get cancelled since outgoing connections
get closed instantly.

This commit moves the endpoints.Close call after the grpc server shutdown
to make sure connections are available as long as the server is running.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Cut release candidate v0.36.0-rc.1

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

---------

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Cut release v0.36.0 (#7578)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

---------

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-08-02 14:55:26 +02:00
Michael Hoffmann a3d0aad67e
docs: add saswatas youtube introduction to blog (#7589)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-08-02 13:58:59 +02:00
Yuan-Kui Li 7c360d1930
Add Synology to adopters (#7581)
Signed-off-by: Yuan-Kui Li <yuankuili@synology.com>
2024-08-02 08:27:57 +01:00
Harry John f19b8c6161
Add @harry671003 to triagers (#7576)
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2024-07-31 14:30:35 +02:00
Michael Hoffmann bc42129651
discovery: use thanos resolver for endpoint groups (#7565)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-07-31 11:35:08 +02:00
Filip Petkovski 70b129888c
Merge pull request #7567 from thanos-io/metalmatze-maintainer-removal
Remove metalmatze from Thanos maintainers
2024-07-30 14:10:24 +02:00
Matthias Loibl e37cebc54a
Merge branch 'main' into metalmatze-maintainer-removal 2024-07-30 11:52:38 +01:00
Filip Petkovski 6b05aa4cc4
Merge pull request #7568 from SuperQ/thanos_engine_doc
Update Thanos PromQL Engine docs
2024-07-30 09:44:54 +02:00
SuperQ 4d2e84c101
Update Thanos PromQL Engine docs
Move the section on the distributed engine mode into the "Thanos PromQL
Engine" section since the new engine is required for distributed mode.
This also fixes an alignment issue which makes the distributed mode look
like it's part of the Tenancy section.

Also rename the section header to give it clearer "Thanos PromQL Engine"
branding.

Signed-off-by: SuperQ <superq@gmail.com>
2024-07-29 15:08:34 +02:00
Matthias Loibl e21dd45c3d
Remove metalmatze from Thanos maintainers
Thank you all for the last 5 years!

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2024-07-29 10:27:42 +02:00
dependabot[bot] 639bf8f216
Bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc (#7525)
Bumps [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go) from 1.27.0 to 1.28.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.27.0...v1.28.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-23 17:51:14 +05:30
dependabot[bot] 971785e4d5
Bump golang.org/x/net from 0.26.0 to 0.27.0 (#7544)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.26.0 to 0.27.0.
- [Commits](https://github.com/golang/net/compare/v0.26.0...v0.27.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-23 17:48:38 +05:30
Jacob Baungård Hansen fb20d8ced6
api/rules: Add filtering on rule name/group/file (#7560)
This commits adds the option of filtering rules by rule name, rule
group, or file. This brings the rule API closer in-line with the current
Prometheus api.

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>
2024-07-23 09:18:30 +05:30
dependabot[bot] 990a60b726
Bump go.opentelemetry.io/otel/bridge/opentracing from 1.21.0 to 1.28.0 (#7528)
Bumps [go.opentelemetry.io/otel/bridge/opentracing](https://github.com/open-telemetry/opentelemetry-go) from 1.21.0 to 1.28.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.21.0...v1.28.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/bridge/opentracing
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-22 10:28:12 -07:00
dependabot[bot] 0da18ad763
Bump golang.org/x/crypto from 0.24.0 to 0.25.0 (#7545)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.24.0 to 0.25.0.
- [Commits](https://github.com/golang/crypto/compare/v0.24.0...v0.25.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-22 10:20:48 -07:00
Harry John 466b0beb97
Update prometheus and promql-engine dependencies (#7558)
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2024-07-22 10:14:22 -07:00
Nishant Bansal 5765d3c1c9
Fix issue #7550: Bug fix and complete test coverage for tools.go (#7552)
Signed-off-by: Nishant Bansal <nishant.bansal.mec21@iitbhu.ac.in>
2024-07-21 18:51:32 +05:30
Harry John f77eff80ab
Build with Go 1.22 (#7559)
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2024-07-19 19:57:45 -07:00
Thomas Hartland 35c0dbec85
compact: Update filtered blocks list before second downsample pass (#7492)
* compact: Update filtered blocks list before second downsample pass

If the second downsampling pass is given the same filteredMetas
list as the first pass, it will create duplicates of blocks
created in the first pass.

It will also not be able to do further downsampling, e.g. 5m->1h,
using blocks created in the first pass, as it will not be aware
of them.

The metadata was already being synced before the second pass,
but not updated into the filteredMetas list.

Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk>

* Update changelog

Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk>

* e2e/compact: Fix number of blocks cleaned assertion

The value was increased in 2ed48f7 to fix the test,
with the reasoning that the hardcoded value must
have been taken from a run of the CI that didn't
reach the max value due to CI worker lag.

More likely the real reason is that commit 68bef3f
the day before had caused blocks to be duplicated
during downsampling.

The duplicate block is immediately marked for deletion,
causing an extra +1 in the number of blocks cleaned.

Subtracting one from the value again now that the
block duplication issue is fixed.

Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk>

* e2e/compact: Revert change to downsample count assertion

Combined with the previous commit this effectively reverts
all of 2ed48f7, in which two assertions were changed to
(unknowingly) account for a bug which had just been
introduced in the downsampling code, causing duplicate blocks.

I am less sure of the reasoning for this assertion change,
but after running through the e2e tests several times locally,
it is consistent that the only downsampling happens in the
"compact-working" step, and so all other steps would report 0
for their total downsamples metric.

Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk>

---------

Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk>
2024-07-13 13:11:26 -07:00
dependabot[bot] 34e0729607
Bump go.opentelemetry.io/contrib/samplers/jaegerremote (#7529)
Bumps [go.opentelemetry.io/contrib/samplers/jaegerremote](https://github.com/open-telemetry/opentelemetry-go-contrib) from 0.7.0 to 0.22.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go-contrib/compare/v0.7.0...v0.22.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/samplers/jaegerremote
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-11 16:41:48 +02:00
dependabot[bot] 50c304dcd7
Bump go.opentelemetry.io/contrib/propagators/autoprop (#7530)
Bumps [go.opentelemetry.io/contrib/propagators/autoprop](https://github.com/open-telemetry/opentelemetry-go-contrib) from 0.38.0 to 0.53.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go-contrib/compare/zpages/v0.38.0...zpages/v0.53.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/propagators/autoprop
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-11 15:05:52 +01:00
Joel Verezhak fb76b226bd
Remove trailing period from SRV records (#7494)
Recently ran into an issue with Istio in particular, where leaving the
trailing dot on the SRV record returned by `dnssrvnoa` lookups led to an
inability to connect to the endpoint. Removing the trailing dot fixes
this behaviour.

Now, technically, this is a valid URL, and it shouldn't be a problem.
One could definitely argue that Istio should be responsible here for
ensuring that the traffic is delivered. The problem seems rooted in how
Istio attempts to do wildcard matching on URLs it receives - including
the dot leads it to look up an empty DNS field, which is invalid.

The approach I take here is actually copied from how Prometheus does it.
Therefore I hope we can sneak this through with the argument that 'this
is how Prometheus does it', regardless of whether or not this is
philosophically correct...
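
As a rough illustration of the approach (the helper below is hypothetical; the real change lives in the dnssrvnoa resolution path):

```go
import (
	"net"
	"strconv"
	"strings"
)

// srvAddr builds a host:port address from an SRV record, dropping the
// trailing dot from the target, e.g. "store.svc.cluster.local." becomes
// "store.svc.cluster.local:10901".
func srvAddr(rec *net.SRV) string {
	host := strings.TrimSuffix(rec.Target, ".")
	return net.JoinHostPort(host, strconv.Itoa(int(rec.Port)))
}
```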

Signed-off-by: verejoel <j.verezhak@gmail.com>
2024-07-09 06:07:52 +00:00
Pedro Tanaka 6f1245483e
QFE: disable double compression middleware (#7511)
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-07-08 11:07:49 +01:00
Vasiliy Rumyantsev cb27548cc4
removed mention of unused pkg (#7515)
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
2024-07-07 19:57:14 -07:00
Filip Petkovski a922b219ef
Close endpoints after the gRPC server has terminated (#7509)
Endpoints are currently closed as soon as we receive a SIGTERM or SIGINT.
This causes in-flight queries to get cancelled since outgoing connections
get closed instantly.

This commit moves the endpoints.Close call after the grpc server shutdown
to make sure connections are available as long as the server is running.
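
A minimal sketch of the resulting shutdown ordering (the function and its wiring are illustrative, not the actual code):

```go
import (
	"io"

	"google.golang.org/grpc"
)

// shutdown stops the gRPC server first, so in-flight queries can still use
// outgoing connections, and only then closes the endpoint set.
func shutdown(term <-chan struct{}, srv *grpc.Server, endpoints io.Closer) {
	<-term                // SIGTERM/SIGINT received
	srv.GracefulStop()    // wait for in-flight RPCs to finish
	_ = endpoints.Close() // now it is safe to drop outgoing connections
}
```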

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-07-03 15:20:23 +02:00
Rishabh Soni 0ae5bfc22e
chore: Add nirmata to adopters (#7506)
* Update adopters.yml

Signed-off-by: Rishabh Soni <risrock02@gmail.com>

* Add files via upload

Signed-off-by: Rishabh Soni <risrock02@gmail.com>

---------

Signed-off-by: Rishabh Soni <risrock02@gmail.com>
2024-07-02 14:45:50 -07:00
Pranshu Srivastava fcc88c028a
reloader: allow suppressing envvar errors (#7429)
Allow suppressing environment variable expansion errors when a variable is
unset, and thus keep the reloader from crashing; the unexpanded references
are instead left as-is.
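
A self-contained sketch of that behaviour (the suppressErrors flag name is an assumption):

```go
import (
	"fmt"
	"os"
)

// expandEnv expands $VAR / ${VAR} references. When suppressErrors is set,
// unset variables are re-emitted untouched instead of causing an error.
func expandEnv(content string, suppressErrors bool) (string, error) {
	var missing []string
	out := os.Expand(content, func(name string) string {
		if v, ok := os.LookupEnv(name); ok {
			return v
		}
		missing = append(missing, name)
		return "${" + name + "}" // leave the reference in place
	})
	if len(missing) > 0 && !suppressErrors {
		return "", fmt.Errorf("unset environment variables: %v", missing)
	}
	return out, nil
}
```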

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
2024-07-02 09:41:27 +01:00
Michael Hoffmann 417595c4e5
*: fix server grpc histograms (#7493)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-06-27 19:12:15 +02:00
Michael Hoffmann 57b42d1bf4
CHANGELOG: Mark 0.36 as in progress (#7486)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-06-26 17:39:40 +02:00
Michael Hoffmann 9a96e346ed
Proxy: fix response set panic (#7484)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-06-26 06:47:47 +02:00
dependabot[bot] d82b2bd36d
Bump actions/cache from 3 to 4 (#7458)
Bumps [actions/cache](https://github.com/actions/cache) from 3 to 4.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-25 08:10:55 +01:00
dependabot[bot] b31034e08c
Bump google.golang.org/protobuf from 1.34.1 to 1.34.2 (#7437)
Bumps google.golang.org/protobuf from 1.34.1 to 1.34.2.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-25 08:09:36 +01:00
dependabot[bot] 0f109dcb1a
Bump braces from 3.0.2 to 3.0.3 in /pkg/ui/react-app (#7424)
Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.
- [Changelog](https://github.com/micromatch/braces/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/braces/compare/3.0.2...3.0.3)

---
updated-dependencies:
- dependency-name: braces
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-25 08:08:35 +01:00
dependabot[bot] 6a08cd1c19
Bump actions/setup-node from 3 to 4 (#7433)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 3 to 4.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](https://github.com/actions/setup-node/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-25 08:08:09 +01:00
dependabot[bot] a179bdf3dc
Bump go.elastic.co/apm/module/apmot from 1.11.0 to 1.15.0 (#7441)
Bumps [go.elastic.co/apm/module/apmot](https://github.com/elastic/apm-agent-go) from 1.11.0 to 1.15.0.
- [Release notes](https://github.com/elastic/apm-agent-go/releases)
- [Changelog](https://github.com/elastic/apm-agent-go/blob/main/CHANGELOG.asciidoc)
- [Commits](https://github.com/elastic/apm-agent-go/compare/v1.11.0...v1.15.0)

---
updated-dependencies:
- dependency-name: go.elastic.co/apm/module/apmot
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-25 08:07:30 +01:00
dependabot[bot] ac9ed2b939
Bump github.com/opentracing/basictracer-go from 1.0.0 to 1.1.0 (#7449)
Bumps [github.com/opentracing/basictracer-go](https://github.com/opentracing/basictracer-go) from 1.0.0 to 1.1.0.
- [Release notes](https://github.com/opentracing/basictracer-go/releases)
- [Commits](https://github.com/opentracing/basictracer-go/compare/v1.0.0...v1.1.0)

---
updated-dependencies:
- dependency-name: github.com/opentracing/basictracer-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-25 08:07:09 +01:00
Michael Hoffmann ddcdeebebe
chore: update objstore (#7477)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-06-22 09:44:37 -07:00
Michael Hoffmann d0045e9ea9
chore: fix docs check (#7478)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-06-22 13:27:59 +02:00
Michael Hoffmann 52975fca66
Store: fix merge race (#7476)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-06-22 12:34:14 +02:00
Michael Hoffmann 0ff119d9fd
Store: add failing test to show an issue with tsdb selector (#7468)
The TSDB selector is more powerful than label matchers. The issue is
that we propagate which TSDBs to select with label matchers, but they
cannot convey enough information to select the right TSDB. This is an
example of a configuration that would select too many TSDBs.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-06-20 08:05:14 +02:00
Ben Ye 0272269bdb
put query stats logging under s.debugLogging (#7471) 2024-06-19 12:43:30 -07:00
Aleksei Atavin d9095d10c4
Bump objstore version (#7469)
Signed-off-by: Aleksei Atavin <axeo@aiven.io>
2024-06-18 17:57:37 +01:00
Ben Ye 065e3dd75a
Upgrade Prometheus common and Prometheus to latest main (#7465)
* upgrade Prometheus common and Prometheus to latest main

Signed-off-by: Ben Ye <benye@amazon.com>

* lint

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-06-17 14:22:27 -07:00
Harry John aa10ec301a
chore: updating objstore (#7462)
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2024-06-15 21:36:10 -07:00
Ben Kochie ceb0515693
Use the default dependabot labeling (#7457)
Dependabot already includes `dependencies` on PRs. Removing the config
will cause it to also include an ecosystem label like `go`.

Signed-off-by: SuperQ <superq@gmail.com>
2024-06-14 09:01:09 +01:00
Justin Jung 651a4a440e
Enhanced bytes limiter with data type param (#7414)
* Refactor existing stats incrementation for touched and fetched data

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Add TypedBytesLimiter

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Remove addAndCheck func

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update BytesLimiter interface to accept dataType param

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Added tests

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix build + changelog

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix wrong data type

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Changed storeDataType to be exported

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Revert []BytesLimiter to BytesLimiter

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Lint

Signed-off-by: Justin Jung <jungjust@amazon.com>

* More reverts

Signed-off-by: Justin Jung <jungjust@amazon.com>

* More

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Rename DefaultBytesLimiterFactory back to NewBytesLimiterFactory

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Changed StoreDataType from string to int

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Removed nil check for bytesLimiter

Signed-off-by: Justin Jung <jungjust@amazon.com>

* nit

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Removed changelog

Signed-off-by: Justin Jung <jungjust@amazon.com>

---------

Signed-off-by: Justin Jung <jungjust@amazon.com>
2024-06-13 09:04:22 -07:00
dependabot[bot] 86382a8328
Bump actions/setup-go from 3 to 5 (#7435)
Bumps [actions/setup-go](https://github.com/actions/setup-go) from 3 to 5.
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](https://github.com/actions/setup-go/compare/v3...v5)

---
updated-dependencies:
- dependency-name: actions/setup-go
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-13 10:27:23 +02:00
dependabot[bot] 3439c634c9
Bump github.com/onsi/gomega from 1.29.0 to 1.33.1 (#7448)
Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.29.0 to 1.33.1.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/gomega/compare/v1.29.0...v1.33.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-13 09:00:56 +01:00
dependabot[bot] d375979c34
Bump github.com/klauspost/compress from 1.17.8 to 1.17.9 (#7447)
Bumps [github.com/klauspost/compress](https://github.com/klauspost/compress) from 1.17.8 to 1.17.9.
- [Release notes](https://github.com/klauspost/compress/releases)
- [Changelog](https://github.com/klauspost/compress/blob/master/.goreleaser.yml)
- [Commits](https://github.com/klauspost/compress/compare/v1.17.8...v1.17.9)

---
updated-dependencies:
- dependency-name: github.com/klauspost/compress
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-13 08:59:31 +01:00
dependabot[bot] a2c800197c
Bump github.com/felixge/fgprof from 0.9.2 to 0.9.4 (#7453)
Bumps [github.com/felixge/fgprof](https://github.com/felixge/fgprof) from 0.9.2 to 0.9.4.
- [Release notes](https://github.com/felixge/fgprof/releases)
- [Commits](https://github.com/felixge/fgprof/compare/v0.9.2...v0.9.4)

---
updated-dependencies:
- dependency-name: github.com/felixge/fgprof
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-13 08:58:49 +01:00
dependabot[bot] 8a597d2d78
Bump peter-evans/create-pull-request from 3 to 6 (#7432)
Bumps [peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request) from 3 to 6.
- [Release notes](https://github.com/peter-evans/create-pull-request/releases)
- [Commits](https://github.com/peter-evans/create-pull-request/compare/v3...v6)

---
updated-dependencies:
- dependency-name: peter-evans/create-pull-request
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-13 08:57:33 +01:00
dependabot[bot] 97439d7c7d
Bump github/codeql-action from 2 to 3 (#7434)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2 to 3.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/v2...v3)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-13 08:57:09 +01:00
dependabot[bot] 92b8d7b132
Bump actions/checkout from 3 to 4 (#7431)
Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-13 08:56:30 +01:00
Ben Kochie 0dd79e78c0
Fixup dependabot config (#7421)
* Remove unused/broken `vendor` key.
* Increase Go PR limit from 5 to 20.
* Fixup yaml consistency.

Signed-off-by: SuperQ <superq@gmail.com>
2024-06-13 07:00:56 +00:00
Aritra Basu 3c569da319
Updates devcontainer dockerfile (#7428)
Fetches the right version of prometheus from
the releases api rather than the tags api

Signed-off-by: Aritra24 <24430013+aritra24@users.noreply.github.com>
2024-06-13 07:00:09 +00:00
Filip Petkovski 10c417f0b8
Use cached label sets (#7420)
The distributed engine retrieves label sets once per query, and
doing the expensive copying and conversion uses a lot of memory.

We already set them in the format we need in the endpoint status,
so we can retrieve them from there.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-06-13 07:55:25 +01:00
Filip Petkovski 8ec0c2496d
Merge pull request #7423 from eqfarhad/main
Changelog - update the changelog entry position
2024-06-11 11:34:18 +02:00
Alan Protasio 882d6a11af
[Chore] Update Prometheus (#7416)
* Update Prometheus

Signed-off-by: alanprot <alanprot@gmail.com>

* Updating prometheus to 4e664035e84e

Signed-off-by: alanprot <alanprot@gmail.com>

* Temporarily pinning prometheus common

Signed-off-by: alanprot <alanprot@gmail.com>

* fixing lint

Signed-off-by: alanprot <alanprot@gmail.com>

* Using jsoniter to encode promql responses

Signed-off-by: alanprot <alanprot@gmail.com>

* Removing e2e test case with an invalid hyphen in a matcher -> Prometheus now supports this use case

Signed-off-by: alanprot <alanprot@gmail.com>

* Updating prometheus to v0.52.2-0.20240606174736-edd558884b24

Signed-off-by: alanprot <alanprot@gmail.com>

* pinning grpc to v1.63.2

Signed-off-by: alanprot <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-21-10.us-west-2.compute.internal>
2024-06-10 15:19:55 -07:00
farhad ba950f6ab0 Changelog - update the changelog entry position
Dependency: Update minio-go to v7.0.70 which includes support for EKS Pod Identity.

Signed-off-by: farhad <eqfarhad@gmail.com>
2024-06-10 22:46:30 +02:00
Michael Hoffmann 8aa42c8a86
Sidecar: fix startup sequence (#7403)
Previously we deferred starting the gRPC server by blocking the whole
startup until we could ping Prometheus. This breaks use cases that rely
on the config reloader to start Prometheus.
We fix it by using a channel to defer starting the gRPC server
and loading external labels concurrently in an actor.
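
A rough sketch of that sequencing using a run group (waitForPrometheus and loadExternalLabels are assumed helpers; error handling is elided):

```go
import (
	"net"

	"github.com/oklog/run"
	"google.golang.org/grpc"
)

func setup(g *run.Group, srv *grpc.Server, lis net.Listener) {
	promReady := make(chan struct{})

	// Actor 1: wait for Prometheus (started by the config reloader) and load
	// external labels, then signal readiness.
	g.Add(func() error {
		waitForPrometheus()
		loadExternalLabels()
		close(promReady)
		return nil
	}, func(error) {})

	// Actor 2: start serving gRPC only once Prometheus has responded, instead
	// of blocking the whole startup sequence on the ping.
	g.Add(func() error {
		<-promReady
		return srv.Serve(lis)
	}, func(error) { srv.GracefulStop() })
}
```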

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-06-10 11:02:25 -07:00
Filip Petkovski b72f7da340
Merge pull request #7409 from fpetkovski/trace-query-calls
Split promql span into query create and exec spans
2024-06-10 08:37:22 +02:00
Giedrius Statkevičius c08dc141dd
receive: remove serverAsClient usage (#7411)
* receive: remove serverAsClient usage

Remove serverAsClient usage to reduce CPU usage.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* receive: remove unused param

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* receive: make local client lazy

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-06-06 14:05:40 +03:00
Jeroenvdl 58447e1a6e
Added Conclusion Xforce (#7412)
Co-authored-by: Jeroen van de Lockand <jeroen.vandelockand@conclusionxforce.nl>
2024-06-04 13:27:56 -07:00
Filip Petkovski 65ff447ab9
Split promql span into query create and exec spans
This commit splits the single promql_query_exec span into two
separate spans, covering query creation and execution.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-06-03 16:26:25 +02:00
Filip Petkovski 167032d24a
Merge pull request #7363 from freenowtech/slow-query-logs-user-header
Query-frontend: Set value of remote_user field in Slow Query Logs from HTTP header
2024-06-03 11:43:02 +02:00
Markus Meyer c9c00248b9
Merge branch 'main' into slow-query-logs-user-header
Signed-off-by: Markus Meyer <hydrantanderwand@gmail.com>
2024-06-03 10:03:56 +02:00
Saswata Mukherjee 863d914432
Merge pull request #7398 from saswatamcode/merge-release-0.35.1-to-main
Merge release 0.35.1 to main
2024-05-29 18:48:40 +01:00
Giedrius Statkevičius a252b24327
compactor: hold lock for a shorter amount of time (#7265)
If we are constantly running the compactor in a loop then we shouldn't pay
the price of constantly holding the lock in the garbage collection
function. In practice, holding the lock means that we have to wait
two or sometimes even three times as long as it takes to sync metas.
That doesn't make sense since we are running the compactor in a loop and
the compacted blocks are properly taken care of.
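
In essence the change keeps the expensive meta sync out of the critical section, roughly (sketch with assumed helpers):

```go
// Before: the lock was held across the meta sync as well.
if err := syncMetas(ctx); err != nil { // slow, no lock needed
	return err
}

mtx.Lock()
garbageCollectBlocks(metas) // the only part that needs the lock
mtx.Unlock()
```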

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-05-29 17:17:43 +03:00
Saswata Mukherjee 927fa21b6d
Fix changelog
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-05-29 09:46:52 +01:00
Saswata Mukherjee d948cde989
Merge v0.35.1 to main
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-05-29 09:45:18 +01:00
Saswata Mukherjee 419a0d9c28
Merge v0.35.1 to main
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-05-28 15:17:40 +01:00
Saswata Mukherjee 086a698b21
Cut patch release `v0.35.1` (#7394)
* compact: recover from panics (#7318)

For https://github.com/thanos-io/thanos/issues/6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Sidecar: wait for prometheus on startup (#7323)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Receive: fix serverAsClient.Series goroutines leak (#6948)

* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Receive: fix stats (#7373)

If we account stats for remote write and local writes we will count them
twice since the remote write will be counted locally again by the remote
receiver instance.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (#7382)

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* small fix

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Query: dont pass query hints to avoid triggering pushdown (#7392)

If we have a new querier, it will create query hints even though the
pushdown feature is no longer present. Old sidecars will then trigger
query pushdown, which leads to broken max, min, max_over_time and
min_over_time.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut patch release v0.35.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-05-28 14:19:33 +01:00
Pedro Tanaka dfa7dd5720
*: Using native histograms for grpc middleware metrics (#7393)
* *: Using native histograms for grpc middleware metrics

Since we updated the middleware library, we can now use native histograms to keep track of latencies in gRPC calls.
This is a semi-breaking change for people who enabled native histogram collection on the Prometheus instances monitoring Thanos.
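
For reference, this is what enabling native histograms looks like on the instrumentation side with client_golang (a generic sketch, not the middleware's exact configuration):

```go
import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// The histogram additionally exposes a native (sparse, exponential-bucket)
// representation when scraped by a Prometheus with native histograms enabled.
var grpcHandlingSeconds = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:                            "grpc_server_handling_seconds",
		Help:                            "Time spent handling gRPC calls.",
		NativeHistogramBucketFactor:     1.1, // ~10% bucket resolution
		NativeHistogramMaxBucketNumber:  100, // cap memory usage
		NativeHistogramMinResetDuration: time.Hour,
	},
	[]string{"grpc_method"},
)
```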

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

adding change log

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* removing empty space;

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Put full disclaimer in changelog

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-28 11:13:39 +02:00
Michael Hoffmann 1c7ecab799
Query: dont pass query hints to avoid triggering pushdown (#7392)
If we have a new querier, it will create query hints even though the
pushdown feature is no longer present. Old sidecars will then trigger
query pushdown, which leads to broken max, min, max_over_time and
min_over_time.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-05-28 08:11:51 +01:00
Filip Petkovski c03e70f570
Merge pull request #6651 from coleenquadros/update_grpc_mw
Update go-grpc-middleware to v2.0.0
2024-05-27 14:45:01 +02:00
Coleen Iona Quadros cf472a90cd update query pkg to use internal tracing pkg to accomodate update of grpc middleware
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 13:04:17 +02:00
Coleen Iona Quadros b84242a038 changelog
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 12:22:16 +02:00
Coleen Iona Quadros bd1f066d65 linting
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 12:22:16 +02:00
Coleen Iona Quadros 3e863ef4cd lint
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 12:22:16 +02:00
Coleen Iona Quadros 7849529948 docs-s
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 12:22:16 +02:00
Coleen Iona Quadros 2789d1be4f lint
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 12:22:14 +02:00
Coleen Iona Quadros 95d4c57f60 add request id in logging field
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:52:08 +02:00
Coleen Iona Quadros 61fc465d3e lint
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:52:08 +02:00
Coleen Iona Quadros 789ef6a408 fix taggingsuite test
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:52:08 +02:00
Coleen Iona Quadros 98ce7e209f Update CHANGELOG.md
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:52:08 +02:00
Coleen Iona Quadros 3d7da101f9 add changelog
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:52:08 +02:00
Coleen Iona Quadros 42cc9c8eb6 Update CHANGELOG.md
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:52:08 +02:00
Coleen Iona Quadros 695912ea17 ctx
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:52:08 +02:00
Coleen Iona Quadros 7554405e95 Update go_grpc_middleware to v2.0.0
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:52:06 +02:00
Coleen Iona Quadros e54087234c update changelog
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:07:25 +02:00
Coleen Iona Quadros 4ae6d73dab Update CHANGELOG.md
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:07:25 +02:00
Coleen Iona Quadros 44e31e30f4 add changelog
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:07:25 +02:00
Coleen Iona Quadros e01e4ab32f remove tags interceptor
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:07:25 +02:00
Coleen Iona Quadros 3ab833e8c7 remove tags interceptor
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
2024-05-27 10:07:25 +02:00
Filip Petkovski 447cb96ad2
Merge pull request #7389 from derrix060/patch-1 2024-05-24 19:21:05 +02:00
Mario Apra 4aa17cba50
Update info on thanoscon
Since the event has already passed, remove the mention of how to submit
a new topic and how to register.

Signed-off-by: Mario Apra <mariotapra@gmail.com>
2024-05-24 17:39:30 +01:00
Michael Hoffmann 9be63b3a73
Query: set keepalive for store grpc client (#7385)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-05-24 15:52:22 +02:00
Pedro Tanaka fcda8e7290
Appending warn to changelog about breaking change (#7388)
* Appending warn to changelog about breaking change

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Including warning emoji

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-24 15:52:07 +02:00
Filip Petkovski 1282e8422b
Merge pull request #7361 from pedro-stanaka/feat/remote-tracking-stats
Query: adding stats to the remote engine
2024-05-24 13:19:36 +02:00
Michael Hoffmann 60179ef60e
Proxy: unify store filtering (#7371)
Stores were not pruned properly in LabelNames and LabelValues gRPC calls. While
the results are not wrong, this leads to inefficient fan-out for setups with
many endpoints.
We took the opportunity to unify the store filtering and, more generally, the
larger layout of the gRPC methods, including logging and tracing.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-05-24 13:15:57 +02:00
Filip Petkovski 77c88647e5
Merge pull request #7384 from fpetkovski/grpc-request-id
Add request ID to client grpc spans
2024-05-24 12:40:28 +02:00
Filip Petkovski 120880635a
Add request ID to gRPC and HTTP client spans
This commit adds the request ID as a span tag to outgoing (client)
http and gRPC requests. This would allow easier correlation
of traces and logs using the request ID.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-05-24 09:59:26 +02:00
Pedro Tanaka c79c9ff657 Adding checks for backward compatibility
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka 847f0d4fe8 Do not declare reference, instead declare value object
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka 8fcb50ad76 last fix on tests
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka d7df5157d1 fixing details
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka 57aaa11f0b early continue on stats consume
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka 4c55387780 Only send stats at the end
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka e634c1062b go mod tidy
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka e7da0c7a6e Using latest main for promql-engine
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka ef033d5ff8 Adding changelog
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka f003848fd2 fixing lint
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka 0202811132 adjusting logging
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka 7b7aa38f39 Implement query sample statistics in promql interface
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka c4b3f05d89 using new version of engine
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Pedro Tanaka 09f8d0bc18 Query: adding stats to the remote engine
We are currently losing track of query stats because the remote engine does not transmit performance stats on gRPC calls.
In this PR I am adding some fields to the Query API response to include some stats.

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-23 16:57:42 +02:00
Saswata Mukherjee 253856231a
*: Ensure objstore flag values are masked & disable debug/pprof/cmdline (#7382)
* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* small fix

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-05-23 10:34:06 +01:00
Michael Hoffmann 8834a47ad4
UI: use prometheus POST query API (#7377)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-05-22 15:05:50 +03:00
Markus Meyer 011501fbbc
Merge branch 'main' into slow-query-logs-user-header 2024-05-21 17:49:18 +02:00
Filip Petkovski 9db31c2fc5
Merge pull request #7336 from fpetkovski/endpoint-collect-timeout
Add timeout to endpointset metric collector
2024-05-21 16:50:12 +02:00
Markus Meyer f062718adf
Merge branch 'main' into slow-query-logs-user-header 2024-05-21 11:44:25 +02:00
Michael Hoffmann 66841fbb1e
Receive: fix stats (#7373)
If we account stats for both remote writes and local writes, we will count them
twice, since the remote write will be counted again locally by the remote
receiver instance.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-05-21 10:40:09 +01:00
Markus Meyer 4031f5e83c
Merge branch 'main' into slow-query-logs-user-header
Signed-off-by: Markus Meyer <hydrantanderwand@gmail.com>
2024-05-21 11:16:33 +02:00
Michael Hoffmann fa0b4bde26
Docs: update my affiliation (#7375)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-05-21 08:36:31 +01:00
Michael Hoffmann d671a95d5d
misc: convert more code to build with stringlabels (#7372)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-05-20 13:17:55 +03:00
Thibault Mange 258154a149
Receive: fix serverAsClient.Series goroutines leak (#6948)
* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-05-20 12:45:47 +03:00
Ben Ye 9e6cbd9fdd
Allow configurable request logger in Store Gateway (#7367)
* allow configurable request logger for Store Gateway

Signed-off-by: Ben Ye <benye@amazon.com>

* lint

Signed-off-by: Ben Ye <benye@amazon.com>

* lint

Signed-off-by: Ben Ye <benye@amazon.com>

* fix tests

Signed-off-by: Ben Ye <benye@amazon.com>

* fix test

Signed-off-by: Ben Ye <benye@amazon.com>

* address comments

Signed-off-by: Ben Ye <benye@amazon.com>

* fix tests

Signed-off-by: Ben Ye <benye@amazon.com>

* changelog

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-05-17 13:37:36 -07:00
Markus Meyer 25ff6b6b94 fix lint issues in docs
Signed-off-by: Markus Meyer <m.meyer@mytaxi.com>
2024-05-17 14:08:46 +02:00
Markus Meyer 0ba4422877 update changelog
Signed-off-by: Markus Meyer <m.meyer@mytaxi.com>
2024-05-17 14:08:46 +02:00
Markus Meyer ee8dfbaad3 Update docs
Signed-off-by: Markus Meyer <m.meyer@mytaxi.com>
2024-05-17 14:08:46 +02:00
Markus Meyer 6774ba0d45 Implement flag --query-frontend.slow-query-logs-user-header
Signed-off-by: Markus Meyer <m.meyer@mytaxi.com>
2024-05-17 14:08:46 +02:00
Giedrius Statkevičius e7524245bb
compact/planner: fix issue 6775 (#7334)
* compact/planner: fix issue 6775

It doesn't make sense to vertically compact downsampled blocks so mark
them with the no compact marker if downsampled blocks were detected in
the plan. Seems like the Planner is the best place for this logic - I
just repeated the previous pattern with the large index file filter.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* CHANGELOG: add item

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-05-17 00:57:49 -07:00
Filip Petkovski 9707a4f7da
Propagate request ID through gRPC context (#7356)
* Propagate request ID through gRPC context

The request ID only gets propagated through HTTP calls and is not available
in gRPC servers.

This commit adds interceptors to gRPC servers and clients to make sure request ID
propagation happens.
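
A minimal sketch of such a client-side interceptor (the header name and the requestIDFromContext helper are assumptions):

```go
import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/metadata"
)

// requestIDUnaryClientInterceptor copies the request ID from the local context
// into outgoing gRPC metadata so the receiving server can pick it up again.
func requestIDUnaryClientInterceptor(
	ctx context.Context, method string, req, reply interface{},
	cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption,
) error {
	if id, ok := requestIDFromContext(ctx); ok {
		ctx = metadata.AppendToOutgoingContext(ctx, "x-request-id", id)
	}
	return invoker(ctx, method, req, reply, cc, opts...)
}
```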

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add license

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-05-15 10:41:04 -07:00
dependabot[bot] 9b26db4050
Bump webpack-dev-middleware from 5.3.1 to 5.3.4 in /pkg/ui/react-app (#7348)
Bumps [webpack-dev-middleware](https://github.com/webpack/webpack-dev-middleware) from 5.3.1 to 5.3.4.
- [Release notes](https://github.com/webpack/webpack-dev-middleware/releases)
- [Changelog](https://github.com/webpack/webpack-dev-middleware/blob/v5.3.4/CHANGELOG.md)
- [Commits](https://github.com/webpack/webpack-dev-middleware/compare/v5.3.1...v5.3.4)

---
updated-dependencies:
- dependency-name: webpack-dev-middleware
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-13 11:16:37 +01:00
dependabot[bot] 6d312d3ef3
Bump ip from 1.1.5 to 1.1.9 in /pkg/ui/react-app (#7344)
Bumps [ip](https://github.com/indutny/node-ip) from 1.1.5 to 1.1.9.
- [Commits](https://github.com/indutny/node-ip/compare/v1.1.5...v1.1.9)

---
updated-dependencies:
- dependency-name: ip
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-13 11:15:22 +01:00
Filip Petkovski da2bbb6b76
Align tenant pruning according to wall clock (#7299)
* Align tenant pruning according to wall clock.

Pruning a tenant currently acquires a lock on the tenant's TSDB,
which blocks reads from incoming queries. We have noticed spikes in
query latency when tenants get decommissioned since each receiver will
prune the tenant at a different time.

To reduce the window where queries get degraded, this commit makes sure that
pruning happens at predictable intervals by aligning it to the wall clock, similar
to how head compaction is aligned.

The commit also changes the tenant deletion condition to look at the duration
from the min time of the tenant, rather than from the last append time.
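
The alignment itself boils down to something like this (the interval and names are illustrative):

```go
import "time"

// nextAlignedTick returns the next wall-clock-aligned instant; with a 2h
// interval every receiver fires at 00:00, 02:00, 04:00, ..., so replicas
// prune tenants at roughly the same moment.
func nextAlignedTick(now time.Time, interval time.Duration) time.Time {
	return now.Truncate(interval).Add(interval)
}

// Usage inside the pruning loop (sketch):
//   time.Sleep(time.Until(nextAlignedTick(time.Now(), 2*time.Hour)))
//   pruneTenants()
```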

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Improve tests

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-05-13 08:57:33 +00:00
Filip Petkovski 2d738f0ded
Merge pull request #7342 from thanos-io/dependabot/npm_and_yarn/pkg/ui/react-app/webpack-5.91.0
Bump webpack from 5.70.0 to 5.91.0 in /pkg/ui/react-app
2024-05-10 10:36:24 +02:00
Filip Petkovski a27c96a6dd
Rebuild react app
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-05-10 09:56:51 +02:00
dependabot[bot] 6dbf535f8e
Bump webpack from 5.70.0 to 5.91.0 in /pkg/ui/react-app
Bumps [webpack](https://github.com/webpack/webpack) from 5.70.0 to 5.91.0.
- [Release notes](https://github.com/webpack/webpack/releases)
- [Commits](https://github.com/webpack/webpack/compare/v5.70.0...v5.91.0)

---
updated-dependencies:
- dependency-name: webpack
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-09 22:55:05 +00:00
Pedro Tanaka 970cbbee60
*: Promql changes to add support to extended functions throught Thanos (#7338)
* fixing extended functions support in more places

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Adding new failint for the Parse() method

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Adding new method for ParseMetricSelector

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Fixing missing imports

Extending test to check behavior

More missing imports

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Fixing method name

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Solving references to forbidden functions

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Treating promql validation from ParseExpr

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fixing funcs

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-07 12:05:19 +03:00
Farhad ab8f2b39e5
Dependency - Update minio-go to v7.0.70 (#7335)
* Update minio-go to v7.0.70

Add support for EKS Pod Identity
fix issue: #7157

Signed-off-by: farhad <eqfarhad@gmail.com>

* Changelog - support for EKS Pod Identity

Updated changelog

Signed-off-by: farhad <eqfarhad@gmail.com>

---------

Signed-off-by: farhad <eqfarhad@gmail.com>
2024-05-06 15:19:08 -07:00
Filip Petkovski cad8f9346b
Add timeout to endpointset metric collector
We have seen deadlocks with endpoint discovery caused by the metric
collector hanging and not releasing the store labels lock. This causes
the endpoint update to hang, which also makes all endpoint readers hang on
acquiring a read lock for the resolved endpoints slice.

This commit makes sure the Collect method on the metrics collector has
a built-in timeout to guard against cases where an upstream caller never
reads from the collection channel.
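
A hedged sketch of a Collect with such a timeout (the collector type, gather helper and timeout value are assumptions):

```go
import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

const collectTimeout = 5 * time.Second

// Collect sends gathered metrics but gives up if the caller stops reading
// from the channel, so the collector never blocks indefinitely while the
// endpoint-set lock is held.
func (c *endpointSetCollector) Collect(ch chan<- prometheus.Metric) {
	deadline := time.After(collectTimeout)
	for _, m := range c.gather() { // gather() snapshots state under the lock
		select {
		case ch <- m:
		case <-deadline:
			return // upstream caller went away; bail out instead of blocking
		}
	}
}
```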

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-05-06 14:31:14 +02:00
Filip Petkovski 07838b88a2
Merge pull request #7326 from pedro-stanaka/fix/exemplars-store-multitsdb
Query: fix exemplar proxying for receivers with multiple tenants
2024-05-06 14:28:11 +02:00
Vanshika 7ce3f359cc
Ruler UI: usage of alert.query-template inside Rules UI (#7329)
* rule

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>

* rule-changes

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>

* prettier

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>

* Rebuild

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>

* changes after make react-app

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>

---------

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
2024-05-06 09:40:52 +01:00
Vanshika 6bbd899a4c
level change (#7330)
Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
2024-05-06 08:11:35 +01:00
Pedro Tanaka 70e3bd9bf4
Query: Fixing extended functions in distributed querier (#7331)
* Adding repro case for broken query with distributed engine

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Fixing problem with distributed queries and xfunctios

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Adding support for extended functions in tenancy enforcement

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Moving custom parser to new package

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* fixing go-lint

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Using same opts and reorganize imports

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* fixing problem with query format

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* fixing flaky tests

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* removing extra test

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* yet another flaky test

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

---------

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
2024-05-04 21:37:53 +02:00
Filip Petkovski 90558f5a16
Merge pull request #7332 from fpetkovski/trace-remote-queries
Emit tracing span for remote queries
2024-05-03 14:10:10 +02:00
Giedrius Statkevičius 038a0b22c5
e2e/compact: add repro for issue #6775 (#7333)
Adding a minimal test case for issue #6775 - reproduces the panic in the
compactor.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-05-03 13:06:53 +03:00
Pedro Tanaka 167c32f9d6
rename func
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
2024-05-03 11:46:32 +02:00
Filip Petkovski 170eabc0fe
Fix lint
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-05-03 10:57:58 +02:00
Filip Petkovski 8cb4d8774c
Emit tracing span for remote queries
This commit adds a new tracing span for remotely delegated queries
with attributes related to the query and remote engine.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-05-03 10:50:24 +02:00
Pedro Tanaka 20a608ae7e
adding changelog
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-03 09:21:23 +02:00
Pedro Tanaka f5fb9af2ac
Query: fixing matching of exemplar stores with multi tenants
When using the exemplars proxy to search for exemplars on receivers, if one receiver had tenants that did not match the selector on the external label, it would get
skipped completely, even if it had a tenant that actually matched.
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-03 09:21:02 +02:00
Pedro Tanaka 3ad558e53e
adding broken test case
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-05-03 09:21:02 +02:00
Michael Hoffmann c94b34cd5d
Sidecar: wait for prometheus on startup (#7323)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-05-03 07:52:43 +02:00
Saswata Mukherjee f0b1753eda
Merge pull request #7322 from saswatamcode/merge-release-0.35-to-main
Merge release 0.35 to main
2024-05-02 12:41:16 +01:00
Saswata Mukherjee 02cb11529e
Fix version
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-05-02 12:10:03 +01:00
Saswata Mukherjee d506ace7eb
Merge branch 'main' into merge-release-0.35-to-main 2024-05-02 12:09:09 +01:00
Saswata Mukherjee d9a0efab57
Cut release v0.35.0 (#7320)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-05-02 10:40:12 +01:00
Giedrius Statkevičius 6e08d1a1ad
compact: recover from panics (#7318)
For https://github.com/thanos-io/thanos/issues/6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-05-01 23:52:36 +03:00
Ben Ye 1e745af672
fix reader getting wrong posting offsets when querying multiple values (#7301) 2024-05-01 09:19:08 -07:00
Filip Petkovski 17afd29c6c
Merge pull request #7317 from fpetkovski/otel-resource-attrs
Allow specifying OTLP resource attributes for traces
2024-04-30 15:50:41 +02:00
Filip Petkovski 5960dd6d97
Add CHANGELOG entry
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-04-30 11:47:48 +02:00
Filip Petkovski a534d10f8d
Allow specifying OTLP resource attributes for traces
This commit adds a resource_attributes field to the OTLP tracing configuration.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-04-30 11:44:03 +02:00
Kartikay 2d23490230
added Trademark URL (#7107)
Signed-off-by: Kartikay <kartikay_2101ce32@iitp.ac.in>
2024-04-29 11:13:22 -07:00
Saswata Mukherjee bcad1e1af9
Cut release candidate v0.35.0-rc.0 (#7314)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-04-29 14:22:05 +01:00
Saswata Mukherjee d9508cc3f4
CHANGELOG: Mark 0.35 as in progress (#7312)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2024-04-29 12:54:14 +01:00
Michael Hoffmann bd74665efb
Stores: respect replica labels in LabelValues and LabelNames (#7310)
* Proxy: acceptance test for proxy store with replica labels

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Stores: handle replica labels in label_value and label_names grpcs

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

---------

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-04-29 13:30:12 +02:00
Michael Hoffmann 4145f03e49
Proxy: acceptance tests for relabel filter (#7309)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-04-29 05:42:13 +01:00
Michael Hoffmann fed2870cbc
Store: batch tsdb infos (#7308)
Batch TSDB Infos for bucket store for blocks with overlapping ranges.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-04-26 21:16:58 +02:00
Giedrius Statkevičius 6bf98f929d
store: use loser trees (#7304)
Remove a long-standing TODO item in the code - let's use the great loser
tree implementation by Bryan. It is faster than the heap because fewer
comparisons are needed. Should be a nice improvement given that the heap
is used in a lot of hot paths.

Since Prometheus also uses this library, it's tricky to import the "any"
version. I tried doing https://github.com/bboreham/go-loser/pull/3 but
it's still impossible to do that. Let's just copy/paste the code, it's
not a lot.

Bench:

```
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/store
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
             │   oldkway   │               newkway               │
             │   sec/op    │    sec/op     vs base               │
KWayMerge-16   2.292m ± 3%   2.075m ± 15%  -9.47% (p=0.023 n=10)

             │   oldkway    │               newkway               │
             │     B/op     │     B/op      vs base               │
KWayMerge-16   1.553Mi ± 0%   1.585Mi ± 0%  +2.04% (p=0.000 n=10)

             │   oldkway   │              newkway               │
             │  allocs/op  │  allocs/op   vs base               │
KWayMerge-16   27.26k ± 0%   26.27k ± 0%  -3.66% (p=0.000 n=10)
```

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-04-26 09:39:39 -07:00
Pedro Tanaka e6fc833018
*: Updating hashicorp LRU cache to v2 (#7306)
* *: Updating hashicorp LRU cache to v2

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Adding some new comments regarding removing complexity of TTL

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Using new version everywhere

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* rephrase the comment

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-04-25 23:15:44 -07:00
Pedro Tanaka a007648423
Query|Receiver: Do not log full request on ProxyStore by default (#7305)
* Query|Receiver|Store: Do not log full request on ProxyStore by default

We had a problem in production where a sudden increase in requests with long matchers was putting our receivers under a lot of pressure.
Upon checking profiles, we saw that the problem was calls to Log().
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Adding changelog

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-04-25 22:50:28 -07:00
Nicolas Takashi 23d2052864
[CHORE] considering X-Forwarded-For on HTTP Logging (#7303)
Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>
2024-04-25 06:18:57 +01:00
Michael Hoffmann 57016bdfe1
Sidecar: mark as unqueryable if prometheus is down (#7297)
If the Prometheus that belongs to a sidecar is down, we don't need to
query the sidecar. This PR makes it so that we then take the sidecar out of
the endpoint set. We do the same for all other store APIs by
returning an error in the info/Info gRPC call if they are marked as not
ready.
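
A minimal sketch of the not-ready behaviour on the Info RPC (the types and readiness flag are illustrative, loosely mirroring the info service):

```go
import (
	"context"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Info returns Unavailable while the sidecar's Prometheus is unreachable, so
// queriers drop this endpoint from the endpoint set instead of querying it.
func (s *sidecarInfoServer) Info(ctx context.Context, _ *infopb.InfoRequest) (*infopb.InfoResponse, error) {
	if !s.promUp.Load() { // readiness flag flipped by the Prometheus heartbeat
		return nil, status.Error(codes.Unavailable, "Prometheus is not ready")
	}
	return s.buildInfoResponse(), nil
}
```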

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-04-24 16:00:16 +02:00
Ben Ye 7c8fe85682
Optimize empty posting check in lazy posting (#7298)
* change lazy postings empty posting check to use cardinality

Signed-off-by: Ben Ye <benye@amazon.com>

* update lazy posting test

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-04-23 11:34:30 -07:00
Filip Petkovski a96e7f3c63
Show warnings in query frontend (#7289)
* Show warnings in query frontend

QFE currently does not parse warnings from downstream queriers.
This commit fixes that by adding the field to proto messages and
modifying the merge function to take warnings into account.
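
Conceptually the merge just keeps the union of downstream warnings, e.g. (self-contained sketch):

```go
// mergeWarnings combines and deduplicates warnings from partial responses so
// they survive merging instead of being dropped.
func mergeWarnings(responses ...[]string) []string {
	seen := map[string]struct{}{}
	var out []string
	for _, warns := range responses {
		for _, w := range warns {
			if _, ok := seen[w]; ok {
				continue
			}
			seen[w] = struct{}{}
			out = append(out, w)
		}
	}
	return out
}
```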

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add CHANGELOG entry

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Omit empty warnings

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-04-23 08:29:04 +02:00
Filip Petkovski c3cd031d43
Merge pull request #7219 from guillaumelecerf/bugfix/client-tls-external-termination
Receive: stop relying on grpc server config to set grpc client secure/skipVerify
2024-04-22 13:05:43 +02:00
Guillaume Lecerf 9998c9b1e1
Receive: stop relying on grpc server config to set grpc client secure/skipVerify
Signed-off-by: Guillaume Lecerf <guillaume.lecerf@iziwork.com>
2024-04-22 12:01:37 +02:00
Filip Petkovski 6582c81716
Merge pull request #7286 from fpetkovski/instant-query-warns
Propagate warnings from instant queries
2024-04-18 14:17:10 +02:00
Filip Petkovski fe0931dcae
Add CHANGELOG entry
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-04-18 12:39:33 +02:00
Filip Petkovski b0be15586c
Propagate warnings from instant queries
Warnings from remote instant queries get turned into errors, which
is a bug. It should be up to the root client to decide whether warnings
should be shown as such, or converted to errors.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-04-18 12:38:00 +02:00
Ben Kochie f7ba14066f
Compact: Replace group with resolution in downsample metrics (#7283)
Compaction downsample metrics have too high a cardinality, causing metric
bloat on large installations. The group information is better suited to logs.
* Replace with a resolution label to reduce cardinality.

Fixes: #5841
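
Illustration of the cardinality change (the metric name and help text are indicative only):

```go
import "github.com/prometheus/client_golang/prometheus"

// Before: a "group" label whose value space grows with the number of
// compaction groups. After: a bounded "resolution" label (raw, 5m, 1h);
// the group information goes to logs instead.
var downsamplesTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "thanos_compact_downsample_total",
		Help: "Total number of downsampling attempts.",
	},
	[]string{"resolution"},
)
```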

Signed-off-by: SuperQ <superq@gmail.com>
2024-04-18 12:24:07 +03:00
Giedrius Statkevičius 5fb0c69d19
receive/multitsdb: do not delete not uploaded blocks (#7166)
* receive/multitsdb: do not delete not uploaded blocks

If a block hasn't been uploaded yet then tell the TSDB layer not to
delete it. This prevents a nasty race where the TSDB layer can delete
a block before the shipper gets to it. I saw this happen with a very
small block.
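
The guard boils down to filtering the set of deletable blocks by upload status before handing it back to the TSDB's blocks-to-delete hook (generic sketch; the isUploaded check is an assumption):

```go
// keepUploadedOnly removes not-yet-uploaded blocks from the set the TSDB
// considers deletable, so a block can only disappear after the shipper has
// put it into object storage.
func keepUploadedOnly[K comparable](deletable map[K]struct{}, isUploaded func(K) bool) map[K]struct{} {
	for id := range deletable {
		if !isUploaded(id) {
			delete(deletable, id)
		}
	}
	return deletable
}
```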

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* receive/multitsdb: change order

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* shipper/receive: just use a single lock

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-04-16 09:44:40 +03:00
Nicolas Takashi 9338e1e131
[CHORE] adding user agent (#7281)
Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>
2024-04-15 10:53:08 -07:00
magiceses 968899fabc
Fix incorrect comments (#7268)
Signed-off-by: Magiceses <magiceses0118@gmail.com>
2024-04-12 12:36:37 +03:00
Yi Jin 140bc87193
Receive: fix issue-7248 with parallel receive_forward (#7267)
* Receive: fix issue-7248 by introducing a worker pool

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* fix unit test bug

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* fix CLI flags not pass into the receive handler

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* address comments

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* init context in constructor

Signed-off-by: Yi Jin <yi.jin@databricks.com>

---------

Signed-off-by: Yi Jin <yi.jin@databricks.com>
2024-04-11 11:02:29 -07:00
okestro-yj.yoo 652e8cc41e
change the reflect package to an unsafe package (#7143)
- as 'reflect.StringHeader' is deprecated, it is replaced with the unsafe package.

Signed-off-by: Youngjun <yj.yoo@okestro.com>
2024-04-11 18:37:17 +05:30
Pedro Tanaka 8227108dba
query: fixing dedup iterator when working on mixed sample types (#7271)
* query: fixing dedup iterator when working on mixed sample types

There was a panic when the dedup iterator worked on two chunks containing both native histograms and floats (XOR encoded).

Co-authored-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Adding changelog

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fixing lint

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* removing comments

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Fixing repro test case

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fixing initialization

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fixing changelog

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* adding header to new file

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* using t.Run

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fixing ordering of samples in tests

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Co-authored-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>
2024-04-11 02:52:36 -07:00
Giedrius Statkevičius 5280bb607b
receive/handler: implement tenant label splitting (#7256)
* receive/handler: implement tenant label splitting

Implement splitting incoming HTTP requests along some label inside of
the timeseries themselves. This functionality is useful when you have
one big application exposing lots of series and, for instance, you have
a label `team` that identifies different owners of metrics in that
application. Then using this you can use that `team` label to have
different tenants in Thanos.

The only negative thing that I could spot is that if, after splitting, one of
the requests fails, then that status code is used for all tenants, which skews
the Receiver metrics a little bit. I think that can be left as a TODO
task.
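
Sketch of the splitting itself, with simplified local types (the real implementation works on the remote-write protobuf):

```go
type label struct{ Name, Value string }

type timeSeries struct {
	Labels  []label
	Samples []float64 // simplified
}

// splitByTenantLabel groups incoming series by the value of splitLabel
// (e.g. "team"); series without that label fall back to defaultTenant.
func splitByTenantLabel(series []timeSeries, splitLabel, defaultTenant string) map[string][]timeSeries {
	byTenant := map[string][]timeSeries{}
	for _, ts := range series {
		tenant := defaultTenant
		for _, l := range ts.Labels {
			if l.Name == splitLabel {
				tenant = l.Value
				break
			}
		}
		byTenant[tenant] = append(byTenant[tenant], ts)
	}
	return byTenant
}
```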

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* test/e2e: add more receiver tests

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* thanos/receive: note that splitting takes precedence over HTTP

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* thanos/receive: fix typo

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-04-10 16:20:29 +05:30
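
A minimal sketch of the splitting described in the commit above, assuming plain series with string label pairs and a configurable split label such as `team`; the types are hypothetical and this is not the Receive handler code:

package main

import "fmt"

type labelPair struct{ Name, Value string }

type series struct{ Labels []labelPair }

// splitByTenantLabel groups incoming series by the value of the given label
// so that each group can be written under its own tenant.
func splitByTenantLabel(all []series, splitLabel, defaultTenant string) map[string][]series {
	out := map[string][]series{}
	for _, s := range all {
		tenant := defaultTenant
		for _, l := range s.Labels {
			if l.Name == splitLabel {
				tenant = l.Value
				break
			}
		}
		out[tenant] = append(out[tenant], s)
	}
	return out
}

func main() {
	in := []series{
		{Labels: []labelPair{{"__name__", "http_requests_total"}, {"team", "checkout"}}},
		{Labels: []labelPair{{"__name__", "http_requests_total"}, {"team", "search"}}},
		{Labels: []labelPair{{"__name__", "up"}}},
	}
	for tenant, ss := range splitByTenantLabel(in, "team", "default-tenant") {
		fmt.Println(tenant, len(ss))
	}
}
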
Filip Petkovski 953ce26ad7
Merge pull request #7261 from pedro-stanaka/feat/plan-serialize-optimize
query: forward query plan in the remote query request
2024-04-09 18:43:51 +02:00
Filip Petkovski f7853dd12c
Merge pull request #7266 from NotAFile/clarify-relabel-selector-docs
Clarify documentation around selector.relabel-config option
2024-04-09 12:20:47 +02:00
Giedrius Statkevičius a106d5f75b
api/ui: show peak/total samples in analyze (#7269)
Show the new peak/total fields in analyze output next to each operator.
Add tooltips to explain the meaning of each field.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-04-09 12:26:34 +03:00
Pedro Tanaka 9ef4b5ac0d
last nits
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-04-09 10:13:33 +02:00
Pedro Tanaka 4ae0449c73
Refactor to method
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-04-09 10:13:33 +02:00
Pedro Tanaka 409cfed1b1
refactor, add tests
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-04-09 10:13:33 +02:00
Pedro Tanaka 0bed7efe08
fallback in case we can't use the plan
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-04-09 10:13:32 +02:00
Pedro Tanaka 11f87d8cdb
Refactor query creation from plan
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-04-09 10:13:32 +02:00
Pedro Tanaka f5bcc13bef
Using proper constructors passing the query plan
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-04-09 10:13:32 +02:00
Pedro Tanaka 350796bb9a
Passing the plan along as the query in remote executions
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-04-09 10:13:32 +02:00
Pedro Tanaka 79b11f5b93
removing second precision engine, upstream already truncates
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-04-09 10:13:32 +02:00
Pedro Tanaka a6dc67b003
Propagate the query plan
* Serialize the plan for remote executions

latest engine

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

Propagate marshaled plan and introduce optimizer

Propagate the query plan in the remote engine requests and introduce a new SetProjectionColumns optimizer

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Fixing passing down of plan

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* go mod tidy

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* avoid panics

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* delete dev file

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* undo small refactor

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* improve test

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

generating protos

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

fixing v1

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

delete unused method

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

Set projection labels after distributing queries
2024-04-09 10:13:31 +02:00
Ben Ye d6d3645f13
don't halt compaction due to overlapping sources when vertical compaction is enabled (#7225)
Signed-off-by: Ben Ye <benye@amazon.com>
2024-04-09 10:04:42 +03:00
notafile 74c5dc9c33 clarify writing around sharding
Signed-off-by: notafile <nota@notafile.com>
2024-04-08 17:40:28 +02:00
notafile 78b5bbcd60 clarify documentation around selector.relabel-config option
Signed-off-by: notafile <nota@notafile.com>
2024-04-08 17:40:18 +02:00
Tidhar Klein Orbach 6b3aa32e20
Fix 7244 error targets page (#7245)
* added UNKNOWN to TargetHealth_value at targets proto

Signed-off-by: Tidhar Klein Orbach <tizkiko@gmail.com>

* added TargetHealth_value UNKNOWN to rpc.pb.go

Signed-off-by: Tidhar Klein Orbach <tizkiko@gmail.com>

---------

Signed-off-by: Tidhar Klein Orbach <tizkiko@gmail.com>
2024-04-08 12:05:12 +02:00
Giedrius Statkevičius 40465eecfd
go.mod: bump promql-engine (#7263)
Bump promql-engine version to include samples counting.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-04-05 14:14:04 +03:00
Yi Jin 603fb38478
fix query_test when --race enabled (#7258)
Signed-off-by: Yi Jin <yi.jin@databricks.com>
2024-04-05 09:54:17 +03:00
NeerajNagure 3048d99108
Tracing: added missing sampler types (#7231)
* added missing sampler types

Signed-off-by: Neeraj Nagure <nagureneeraj@gmail.com>

* added changelog entry

Signed-off-by: Neeraj Nagure <nagureneeraj@gmail.com>

* fixed changelog entry

Signed-off-by: Neeraj Nagure <nagureneeraj@gmail.com>

* Fixed changelog entry conflict

Signed-off-by: Neeraj Nagure <nagureneeraj@gmail.com>

---------

Signed-off-by: Neeraj Nagure <nagureneeraj@gmail.com>
2024-04-04 18:20:28 -07:00
suhas-chikkanna f80fd94732
Added Shield in adopters (#7254)
* Added Shield in adopters

Signed-off-by: suhas.chikkanna.shield <suhas.chikkanna@shield.com>

* Upload compatible image 

Signed-off-by: suhas-chikkanna <162577490+suhas-chikkanna@users.noreply.github.com>

---------

Signed-off-by: suhas.chikkanna.shield <suhas.chikkanna@shield.com>
Signed-off-by: suhas-chikkanna <162577490+suhas-chikkanna@users.noreply.github.com>
Co-authored-by: suhas.chikkanna.shield <suhas.chikkanna@shield.com>
2024-04-03 15:02:08 +00:00
Kemal Akkoyun e8027459fe
Update kakkoyun's affiliation (#7251) 2024-04-02 11:34:55 +02:00
Filip Petkovski c7b1cc9231
Merge pull request #7250 from roth-wine/pr-fix-changelog-gomemlimit-reference
fix(changelog): fix GOMEMLIMIT pull request reference
2024-04-02 09:19:22 +02:00
philipp.roth 8cdece5d6d
fix(changelog): fix GOMEMLIMIT pull request reference
Signed-off-by: roth-wine <philipp.roth@hetzner.com>
2024-04-02 08:49:14 +02:00
Ben Ye 4bf7867da4
change shipper to not overwrite all external labels (#7247)
Signed-off-by: Ben Ye <benye@amazon.com>
2024-04-01 08:01:10 -07:00
Ben Ye 881beb95e1
remove write method from Compactor interface (#7246)
Signed-off-by: Ben Ye <benye@amazon.com>
2024-03-31 23:44:57 -07:00
Michael Hoffmann f707f8c9f6
docs: add thanoscon 2024 talks (#7243)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-03-29 09:37:43 +01:00
Alec Rajeev 943401f726
update docs for receive routing only with limits (#7241)
Signed-off-by: Alec Rajeev <13004609+alecrajeev@users.noreply.github.com>
2024-03-28 11:59:24 -07:00
Ben Ye 4f664e3151
Bump Prometheus to include new label regex optimization (#7232)
* bump Prometheus version to include new label matcher regex value optimization

Signed-off-by: Ben Ye <benye@amazon.com>

* update

Signed-off-by: Ben Ye <benye@amazon.com>

* fix again

Signed-off-by: Ben Ye <benye@amazon.com>

* include latest fix

Signed-off-by: Ben Ye <benye@amazon.com>

* update go mod

Signed-off-by: Ben Ye <benye@amazon.com>

* fix explain test

Signed-off-by: Ben Ye <benye@amazon.com>

* fix test again

Signed-off-by: Ben Ye <benye@amazon.com>

* update again

Signed-off-by: Ben Ye <benye@amazon.com>

* update

Signed-off-by: Ben Ye <benye@amazon.com>

* fix tests so far

Signed-off-by: Ben Ye <benye@amazon.com>

* fix compactor tests

Signed-off-by: Ben Ye <benye@amazon.com>

* use own out of order chunk index

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-03-28 10:45:07 -07:00
Ben Ye 4d7a75f40a
Fix lazy expanded postings cache and bug of non equal matcher (#7220)
* fix lazy expanded postings cache and a bug with not-equal matchers on non-existent values

Signed-off-by: Ben Ye <benye@amazon.com>

* test case for remove keys noop

Signed-off-by: Ben Ye <benye@amazon.com>

* add promqlsmith fuzz test

Signed-off-by: Ben Ye <benye@amazon.com>

* update

Signed-off-by: Ben Ye <benye@amazon.com>

* changelog

Signed-off-by: Ben Ye <benye@amazon.com>

* fix go mod

Signed-off-by: Ben Ye <benye@amazon.com>

* rename test

Signed-off-by: Ben Ye <benye@amazon.com>

* fix series request timestamp

Signed-off-by: Ben Ye <benye@amazon.com>

* skip e2e test

Signed-off-by: Ben Ye <benye@amazon.com>

* handle non lazy expanded case

Signed-off-by: Ben Ye <benye@amazon.com>

* update comment

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-03-27 19:00:44 +01:00
Markus Möslinger 6c613fcd13
UI: Showing Block Size Stats (#7233)
* feat(ui): added BlockSizeStats calculation to blocks page

A block can have a list of contained files set in `.thanos.files`.
If the `files` array is set, all referenced files with `size_bytes` set are counted:
- sum of all `chunk/*` file sizes
- size of index file
- total size (sum of both)

Shows statistics about the selected block in the block details view:
- Total size of block
- Size of index (and percentage of total)
- Size of all chunks (and percentage of total)
- Daily growth, based on total size and block duration

Output is humanized up to Pebibytes and fixed to two decimal places;
raw bytes are accessible through mouse over / title text.

Signed-off-by: Markus Möslinger <markus.moeslinger@socra.dev>

* feat(ui): added aggregated BlockSizeStats to blocks row title

Added total size of all blocks from a source to the row title, beneath the source name.

The shown total size is humanized up to pebibytes and fixed to two decimal places;
raw bytes value is accessible through mouse over / title text.

The shown value will refresh with selected compaction levels, but doesn't take block filter into account.

I thought about showing daily growth as well, but just summing all milliseconds of all blocks doesn't work with overlapping blocks / multiple resolutions.

Signed-off-by: Markus Möslinger <markus.moeslinger@socra.dev>

* chore(docs): added UI block size PR to CHANGELOG.md

Signed-off-by: Markus Möslinger <markus.moeslinger@socra.dev>

* chore(ui): removed comments

Automatic code formatting duplicated some comments near import statements.

Signed-off-by: Markus Möslinger <markus.moeslinger@socra.dev>

---------

Signed-off-by: Markus Möslinger <markus.moeslinger@socra.dev>
2024-03-26 09:25:20 +02:00
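
A minimal sketch of the size calculation described above, assuming a flat list of file entries with relative paths and size_bytes as recorded in a block's meta; struct names are hypothetical:

package main

import (
	"fmt"
	"strings"
)

type fileMeta struct {
	RelPath   string
	SizeBytes int64
}

type blockSizeStats struct {
	ChunksBytes, IndexBytes, TotalBytes int64
}

// computeBlockSizeStats sums the sizes of all chunk files and the index file
// (skipping entries without a size) and reports the total as the sum of both.
func computeBlockSizeStats(files []fileMeta) blockSizeStats {
	var st blockSizeStats
	for _, f := range files {
		if f.SizeBytes <= 0 {
			continue
		}
		switch {
		case strings.HasPrefix(f.RelPath, "chunks/"):
			st.ChunksBytes += f.SizeBytes
		case f.RelPath == "index":
			st.IndexBytes += f.SizeBytes
		}
	}
	st.TotalBytes = st.ChunksBytes + st.IndexBytes
	return st
}

func main() {
	st := computeBlockSizeStats([]fileMeta{
		{RelPath: "chunks/000001", SizeBytes: 512 << 20},
		{RelPath: "index", SizeBytes: 64 << 20},
		{RelPath: "meta.json", SizeBytes: 4 << 10},
	})
	fmt.Printf("total=%d index=%d chunks=%d\n", st.TotalBytes, st.IndexBytes, st.ChunksBytes)
}
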
Thibault Mange b55ffbc2c4
Query-frontend: fix missing redis username config (#7224)
* add username cfg to rueidis client

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-03-25 08:41:36 +00:00
Ben Ye b721f09ddf
bump objstore package version to latest main (#7228)
Signed-off-by: Ben Ye <benye@amazon.com>
2024-03-25 08:40:56 +00:00
Nicolas Takashi 93c79b6182
[CHORE] adding auto GOMEMLIMIT flag (#7223)
Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-03-24 13:55:22 +01:00
Ben Ye deb615fff6
expose NewPromSeriesSet (#7214)
Signed-off-by: Ben Ye <benye@amazon.com>
2024-03-23 02:05:27 -07:00
Filip Petkovski 4a2a4555d2
Update thanos-io/promql-engine (#7215)
* Update thanos-io/promql-engine

This commit updates the promql-engine module to latest main and modifies
the remote engine based on the breaking change.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix lint

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-03-21 01:57:33 -07:00
Filip Petkovski 6df670f10a
Merge pull request #7194 from xBazilio/retry-downsample-errors 2024-03-18 09:45:32 +01:00
Filip Petkovski 2623e4963e
Merge branch 'main' into retry-downsample-errors 2024-03-18 09:44:11 +01:00
Filip Petkovski f731719f95
Add support for TSDB selector in querier (#7200)
* Add support for TSDB selector in querier

This PR allows using the query distributed mode against a set of multi-tenant receivers
as described in https://github.com/thanos-io/thanos/blob/main/docs/proposals-done/202301-distributed-query-execution.md#distributed-execution-against-receive-components.

The feature is enabled by a selector.relabel-config flag in the Query component
which allows it to select a subset of TSDBs to query based on their external labels.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add CHANGELOG entry and fix docs

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix tests

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add comments

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add test case for MatchersForLabelSets

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix failing test

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Use an unbuffered channel

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Change flag description

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Remove parameter from ServerAsClient

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-03-14 09:13:55 +01:00
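
A minimal sketch of the selection idea, assuming external label sets represented as plain maps and equality-only requirements; the real flag takes a relabel config, so this only illustrates the filtering step:

package main

import "fmt"

// selectTSDBs keeps only the TSDBs whose external labels satisfy every
// required label=value pair.
func selectTSDBs(externalLabelSets []map[string]string, required map[string]string) []map[string]string {
	var keep []map[string]string
	for _, lset := range externalLabelSets {
		ok := true
		for name, value := range required {
			if lset[name] != value {
				ok = false
				break
			}
		}
		if ok {
			keep = append(keep, lset)
		}
	}
	return keep
}

func main() {
	tsdbs := []map[string]string{
		{"tenant": "team-a", "replica": "0"},
		{"tenant": "team-b", "replica": "0"},
	}
	fmt.Println(selectTSDBs(tsdbs, map[string]string{"tenant": "team-a"}))
}
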
Filip Petkovski dea822d1ee
Merge pull request #7207 from MichaHoffmann/mhoffm-make-server-as-client-channels-unbuffered
storepb: make ServerAsClient channels unbuffered
2024-03-13 16:20:23 +01:00
Michael Hoffmann bbfb8fdff5
storepb: make ServerAsClient channels unbuffered
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-03-13 14:39:20 +01:00
Hélia Barroso 528f06639f
[FEAT] Adding blog post (#7202)
Signed-off-by: Helia Barroso <helia.barroso@hotmail.com>
Co-authored-by: Helia Barroso <helia.barroso@hotmail.com>
2024-03-13 10:09:15 +02:00
Daniel Hrabovcak 7eda7ff69d
Reloader: Add support for watching and decompressing Prometheus configuration directories (#7199)
Signed-off-by: Daniel Hrabovcak <thespiritxiii@gmail.com>
2024-03-12 08:25:26 +00:00
Filip Petkovski 7acce0cbb8
Merge pull request #7164 from pedro-stanaka/fix/dedup-iter
compact: properly treat native histogram deduplication in chunk series merger
2024-03-11 07:50:48 +01:00
Filip Petkovski 3019dfecde
Merge pull request #7193 from Improwised/support-page-changes
change platform-engineer logo size and make 'https://thanos.io/support/' responsive
2024-03-10 16:57:18 +01:00
Munir Khakhi d889195061
Merge branch 'main' into support-page-changes 2024-03-09 11:27:11 +05:30
Jacob Baungård Hansen 5910ed68b5
Query UI: Only show tenant box with enforcement on (#7186)
With this commit we only show the tenant-ui box when enforcement of
tenancy is on, as it is not needed otherwise.

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>
2024-03-08 17:53:13 +02:00
Munir Khakhi 0be6c877b6
Merge branch 'main' into support-page-changes 2024-03-08 18:34:11 +05:30
Giedrius Statkevičius 31af6daa6a
rule: do not turn off if resolving fails (#7192)
Do not turn off Ruler if resolving fails. We can still (try to) evaluate
rules even if Alertmanager is not available.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-03-08 14:30:41 +02:00
Vasiliy Rumyantsev 4c10194482 downsample: retry objstore related errors
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
2024-03-07 17:56:13 +03:00
Vasiliy Rumyantsev 5fa40d7d1c downsample: retry objstore related errors
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
2024-03-07 17:53:49 +03:00
Payal17122000 adca80be49 fix: make support page responsive and change size of platform-engineer logo
fix: add anchor tag to all images
Signed-off-by: Payal17122000 <raviyapayal17@gmail.com>
2024-03-07 19:06:24 +05:30
Payal Raviya cbc9738961
Merge branch 'thanos-io:main' into main 2024-03-07 18:36:35 +05:30
Daniel Mellado e40e364280
Bump google.golang.org/protobuf to v1.33.0 (#7191)
This PR bumps the version of google.golang.org/protobuf to v1.33.0 to fix a
potential vulnerability in the protojson.Unmarshal function [1] that can
occur when unmarshaling a message with a protobuf value.

Even if the function isn't used directly in Thanos it would be safer to
just bump it directly.

[1] https://pkg.go.dev/vuln/GO-2024-2611

Signed-off-by: Daniel Mellado <dmellado@redhat.com>
2024-03-07 14:40:19 +02:00
Munir Khakhi 673c82f611
Added platformengineers (#7181)
Signed-off-by: Munir Khakhi <munir@improwised.com>
2024-03-07 11:19:23 +02:00
Munir Khakhi 6664247d7f
Added platformengineers
Signed-off-by: Munir Khakhi <munir@improwised.com>
2024-03-05 14:48:35 +05:30
Filip Petkovski a97a6ff92f
Merge pull request #7180 from fpetkovski/fix-docs
Fix docs
2024-03-05 09:20:08 +01:00
Filip Petkovski 9694f01210
Fix docs
Fixes docs formatting and updates the distributed execution link to the done proposal.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-03-05 08:49:39 +01:00
Giedrius Statkevičius c06d55dfec
cortex/querier: fix analysis merging (#7179)
We were not merging analysis properly - mergo was overwriting data.
Instead of using a whole library for this, just write two small
functions and use them. Add test to cover this.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-03-05 09:47:22 +02:00
Filip Petkovski 4166776cf4
Merge pull request #7175 from fpetkovski/distributed-query-mode
Unhide distributed execution mode
2024-03-05 08:34:14 +01:00
Filip Petkovski f6fed686ae
Add changelog entry
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-03-04 14:18:31 +01:00
Filip Petkovski be02591c32
Unhide distributed execution mode
This commit exposes the distributed query execution mode to end-users by unhiding the
flag used to toggle this feature.

The commit also adds documentation on when the mode is appropriate to be used.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-03-04 14:18:31 +01:00
Giedrius Statkevičius 084fb23f07
.circleci: bump setup_docker_version version (#7177)
The current image is deprecated. See
https://discuss.circleci.com/t/remote-docker-image-deprecations-and-eol-for-2024/50176.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-03-04 14:48:04 +02:00
Giedrius Statkevičius 4c7997d25a
receive: add support for globbing tenant specifiers (#7155)
We want to be able to route all tenants which begin with certain letters
to some receivers so we need to have some kind of globbing/regex support
in the hashring. This PR adds that functionality. We've been using this
in prod successfully.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-02-27 19:00:11 +02:00
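
A minimal sketch of glob-style tenant matching, using the standard library path.Match for shell-style patterns; it only illustrates the idea and is not the hashring implementation:

package main

import (
	"fmt"
	"path"
)

// tenantMatches reports whether a tenant name matches any of the configured
// glob patterns, e.g. "team-a-*" matching "team-a-prod".
func tenantMatches(patterns []string, tenant string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, tenant); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"team-a-*", "exact-tenant"}
	for _, t := range []string{"team-a-prod", "team-b-prod", "exact-tenant"} {
		fmt.Println(t, tenantMatches(patterns, t))
	}
}
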
Jacob Baungård Hansen 360d24de10
Query UI: Add tenant box (#6867)
* Query UI: Add tenant box

With this commit a tenant box is added to the query UI. It can be used
to specify which tenant to use when making a query.

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>

* Re-compile static react app

Recompiles the static react app as now needed following:
https://github.com/thanos-io/thanos/pull/6900

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>

* Move changelog item to appropriate future release

After merging it was under the 0.34 release.

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>

* Move query path tenancy proposal to done

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>

---------

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>
2024-02-27 07:48:29 -08:00
Michael Hoffmann 5ab87be31d
Merge pull request #7163 from thanos-io/fix_queryrange_analysis
queryfrontend: fix analysis after API changes
2024-02-27 15:28:03 +01:00
Pedro Tanaka deabad94fc
*: properly treat native histogram deduplication in chunk series merger
We have detected a problem in the chunk series merger where it will
panic when it encounters native histogram chunks.
I am using thanos as a library for a project and wanted to use the
penalty function to dedup blocks from Prometheus instances.

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-02-27 12:47:54 +01:00
Ben Ye e7cd6c1c60
bugfix: lazy posting optimization with wrong cardinality for estimation (#7122)
* bugfix: catch lazy posting optimization using wrong cardinality for estimation

Signed-off-by: Ben Ye <benye@amazon.com>

* update changelog

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-02-26 10:46:00 -08:00
Giedrius Statkevičius a532ccd421 queryfrontend: fix analysis after API changes
Fix the analysis functionality with query-frontend after the recent
changes. Added tests for this.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-02-26 17:10:34 +02:00
Xiaochao Dong f72b767f37
cache: implement the circuit breaker pattern for asynchronous set operations in the cache client (#7010)
* Implement the circuit breaker pattern for asynchronous set operations in the cache client

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* Add feature flag for circuitbreaker

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* Sync docs

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* Skip configuration validation if the circuit breaker is disabled

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* Make lint happy

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* Abstract the logic of the circuit breaker

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

---------

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
2024-02-25 15:26:31 -08:00
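
A minimal sketch of a circuit breaker for asynchronous cache set operations, assuming a simple consecutive-failure threshold and a cool-down period; names and behaviour are illustrative, not the actual cache client:

package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errCircuitOpen = errors.New("circuit breaker open")

type circuitBreaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	openUntil   time.Time
	coolDown    time.Duration
}

// do runs fn unless the breaker is open; consecutive failures open the breaker
// once the threshold is reached, and the cool-down lets calls through again later.
func (cb *circuitBreaker) do(fn func() error) error {
	cb.mu.Lock()
	if time.Now().Before(cb.openUntil) {
		cb.mu.Unlock()
		return errCircuitOpen
	}
	cb.mu.Unlock()

	err := fn()

	cb.mu.Lock()
	defer cb.mu.Unlock()
	if err != nil {
		cb.failures++
		if cb.failures >= cb.maxFailures {
			cb.openUntil = time.Now().Add(cb.coolDown)
			cb.failures = 0
		}
		return err
	}
	cb.failures = 0
	return nil
}

func main() {
	cb := &circuitBreaker{maxFailures: 2, coolDown: time.Second}
	fail := func() error { return errors.New("memcached timeout") }
	for i := 0; i < 4; i++ {
		fmt.Println(cb.do(fail))
	}
}
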
Filip Petkovski 2f1f83f661
Allow using different listing strategies (#7134)
* Allow using different listing strategies

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Expose flags for block list strategy

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Run make docs

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix whitespace

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add CHANGELOG entry

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-02-24 11:52:24 -08:00
Giedrius Statkevičius 75152c4e7b
cache/caching_bucket: add path to hash (#7158)
Add the path to the hash. This allows identifying different instances by
different config paths.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-02-23 10:02:31 +02:00
Giedrius Statkevičius 508d82e1ed
e2e/query_frontend: add tests for explain/analyze (#7160)
Adding tests for explain/analyze with QFE. Will add fixes as separate
PR.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-02-23 09:55:48 +02:00
Filip Petkovski ed44e01af9
Copy labels from remote instant queries (#7151)
Similar to https://github.com/thanos-io/thanos/pull/6957, we should copy
labels from remote instant queries so that memory does not get overwritten
when processing series in a central engine.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-02-22 10:22:06 -08:00
Samuel Dufel 42289ca3b5
Extended func support - doc update (#7161)
* Add support for extended promql functions in rule

Adds a flag to register the extended promql functions supported by the thanos
query engine when running the rule component.  This will allow rule config
files containing query expressions with (xrate / xincrease / xdelta) to pass
validation.  This will only work if the query endpoint in use is running the
thanos engine.

Signed-off-by: Samuel Dufel <samuel.dufel@shopify.com>

* Update rendered docs with added flag

Signed-off-by: Samuel Dufel <samuel.dufel@shopify.com>

---------

Signed-off-by: Samuel Dufel <samuel.dufel@shopify.com>
2024-02-22 20:11:20 +02:00
Samuel Dufel 1723d1d421
Add support for extended promql functions in rule (#7105)
Adds a flag to register the extended promql functions supported by the thanos
query engine when running the rule component.  This will allow rule config
files containing query expressions with (xrate / xincrease / xdelta) to pass
validation.  This will only work if the query endpoint in use is running the
thanos engine.

Signed-off-by: Samuel Dufel <samuel.dufel@shopify.com>
2024-02-20 21:24:22 -08:00
Michael Hoffmann fc3b360e32
Merge pull request #7150 from MichaHoffmann/merge-release-0.34.1-to-main
Merge release 0.34.1 to main
2024-02-20 15:01:17 +01:00
Michael Hoffmann 40114ce851
Merge remote-tracking branch 'origin/main' into merge-release-0.34.1-to-main 2024-02-20 13:55:31 +01:00
Michael Hoffmann 8249048f3b
Merge remote-tracking branch 'origin/main' into merge-release-0.34.1-to-main 2024-02-20 13:53:55 +01:00
Giedrius Statkevičius 987fac66af
cache: attach object storage hash to iter key (#6880)
Attach object storage hash to the iter key so that it would be possible
to reuse the same cache storage e.g. Redis for different buckets.
Without this, the results are funny to say the least if you accidentally
attempt to do that. Thus, let's add the hash to reduce the possibility
of an accident for our users.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-02-19 19:13:52 +02:00
Michael Hoffmann 4cf1559998
Merge pull request #7131 from MichaHoffmann/mhoffm-cut-release-0.34.1
Cut patch release 0.34.1
2024-02-19 17:38:43 +01:00
Giedrius Statkevičius 8fa5ff9659
docs: fix link (#7129)
The link has moved to another location since Cisco bought Banzai Cloud.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-02-18 11:15:51 +01:00
Chetan Deshmukh 70c8eb6b26
Adding InfraCloud as Enterprise support partner (#7141)
* adding InfraCloud as Enterprise support partner

Signed-off-by: Chetan Deshmukh <cdeshmukh@infracloud.io>

* replaced svg file to match layout

Signed-off-by: Chetan Deshmukh <cdeshmukh@infracloud.io>

* added alt-text and horizontal image

Signed-off-by: Chetan Deshmukh <cdeshmukh@infracloud.io>

---------

Signed-off-by: Chetan Deshmukh <cdeshmukh@infracloud.io>
2024-02-15 09:33:10 +02:00
Pedro Tanaka 4a82ba7f69
Adding new method on BucketedBytes to expose used memory (#7137)
* Adding new method on bucketed bytes to expose used

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Removing interface, using RWMutex

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-02-14 10:45:02 +00:00
Pedro Tanaka e78d867333
Fixing log line for remote engine in debug mode (#7133)
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-02-12 09:37:35 -08:00
Filip Petkovski f5ca5a8417
Merge pull request #7132 from bavarianbidi/update_helm_installation_instruction
docs: update helm installation instruction
2024-02-12 16:26:41 +01:00
Mario Constanti 7640f0f72f docs: run make docs for helm installation instruction
Signed-off-by: Mario Constanti <github@constanti.de>
2024-02-12 15:47:16 +01:00
Mario Constanti 8ffb953ccb
Merge branch 'main' into update_helm_installation_instruction 2024-02-12 14:54:09 +01:00
Giedrius Statkevičius f28680ceeb
docs: fix link (#7129)
The link has moved to another location since Cisco bought Banzai Cloud.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-02-12 12:31:07 +02:00
Mario Constanti 0bf17ae237 docs: update helm installation instruction
The prometheus helm chart has been a community-maintained chart for a few
years. With that, the old example pointed to an old chart and the
provided example values also no longer work.

This updates the documentation.

Signed-off-by: Mario Constanti <github@constanti.de>
2024-02-12 06:39:46 +01:00
Michael Hoffmann 4a4b66908b
VERSION: cut release 0.34.1
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-02-11 19:35:50 +01:00
hanyuting8 3b7951cdfd
Upgrade grpc to 1.57.2 (#7078)
1. In the replace section of go.mod, due to weaveworks/common#239, the grpc version is 1.45.0, but there are vulnerabilities in this version. In order to fix CVE-2023-44478, the grpc version needs to be upgraded to 1.57.2.
2. In order to upgrade gRPC, the version of weaveworks/common also needs to be upgraded, otherwise the build will fail.

Signed-off-by: hanyuting8 <hytxidian@163.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-02-11 19:35:37 +01:00
Michael Hoffmann 3da5c1c2f8
Receive: dont rely on slice labels (#7100)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-02-09 13:28:42 -08:00
Jake Keeys 21ed9bbc0c
default to alertmanager v2 api (#7123)
Signed-off-by: Jake Keeys <jake@keeys.org>
2024-02-09 16:22:04 +02:00
Giedrius Statkevičius 29831f840c
receive/handler: do not double lock (#7124)
markPeerUnavailable was always taking a lock and in one case we were
calling it with a lock already taken. Fix this.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-02-09 15:06:49 +02:00
Kartikay 37092db5f1
fix minio store gateway err (#7114)
Signed-off-by: Kartikay <kartikay_2101ce32@iitp.ac.in>
2024-02-08 08:19:25 +00:00
Giedrius Statkevičius 94f971bc2b
receive/handler: fix locking twice (#7112)
Fix bug introduced in https://github.com/thanos-io/thanos/pull/6898: we
were RLock()ing twice. This leads to a deadlock in some situations.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-02-01 15:41:36 +02:00
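
A minimal sketch of why taking a read lock twice is dangerous: with sync.RWMutex a pending writer blocks new readers, so a second RLock() can get stuck while the first one is still held (illustrative only):

package main

import (
	"fmt"
	"sync"
	"time"
)

// A goroutine that re-acquires RLock while another goroutine is blocked in
// Lock() can deadlock, because the pending writer blocks all new readers.
// Here the second reader runs in its own goroutine so the program only stalls
// instead of deadlocking outright.
func main() {
	var mu sync.RWMutex

	mu.RLock()
	go func() {
		mu.Lock() // writer waits for the first RLock to be released
		mu.Unlock()
	}()
	time.Sleep(100 * time.Millisecond)

	done := make(chan struct{})
	go func() {
		mu.RLock() // second RLock queues behind the pending writer
		mu.RUnlock()
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("no blocking this time")
	case <-time.After(time.Second):
		fmt.Println("second RLock is stuck behind a pending writer")
	}
	mu.RUnlock()
}
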
Filip Petkovski 50ce7a2806
Update prometheus/prometheus (#7096)
* Update prometheus/prometheus

This commit updates prometheus/prometheus to latest main (60b6266e).

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix file discovery

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-01-31 10:33:18 +02:00
Michael Hoffmann 13e1558054
Merge pull request #7099 from MichaHoffmann/mhoffm-dont-use-slice-labels-continued
Store: dont rely on slice labels continued
2024-01-29 16:05:54 +01:00
Michael Hoffmann 925e31a514
Merge pull request #7101 from MichaHoffmann/merge-release-0.34-to-main
merge release 0.34 to main
2024-01-29 15:10:20 +01:00
Michael Hoffmann 9eb6591cf9
Merge remote-tracking branch 'origin/main' into merge-release-0.34-to-main
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-01-29 10:31:54 +01:00
Michael Hoffmann 2f861d852e Store: dont rely on slice labels continued
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-01-29 09:16:40 +01:00
Michael Hoffmann 6a0a49101a
all: get rid of query pushdown to simplify query path (#7014)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-01-29 09:42:45 +02:00
Michael Hoffmann 1cf333e28d
Stores: convert tests to not rely on slice labels (#7098)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-01-27 09:54:17 -08:00
Michael Hoffmann 18d740f292
CHANGELOG: cut release 0.34 (#7095)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-01-26 13:45:53 -08:00
Giedrius Statkevičius daa34a52cc
receive: use async remote writing (#7045) 2024-01-26 10:18:53 -08:00
Mikhail Nozdrachev fce0fe2458
receive: race condition in handler Close() when stopped early (#7087)
Receiver hangs waiting for the HTTP Handler to shut down if an error occurs
before the Handler is initialized. This might happen, for example, if the hashring
is too small for a given replication factor.

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
2024-01-24 09:41:52 +02:00
Michael Hoffmann 15a60f9db7
Merge pull request #7086 from MichaHoffmann/mhoffm-cut-release-0.34.0-rc.1
VERSION: cut release 0.34.0-rc.1
2024-01-23 18:11:29 +01:00
Michael Hoffmann df467f7e5a
VERSION: cut release 0.34.0-rc.1
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-01-23 11:23:14 +01:00
Michael Hoffmann b4aee0ef54
Store: fix label values edge case (#7082)
If the requested label is an external label and we have series matchers
we should only return results if the series matchers actually match a
series.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-01-23 12:10:40 +02:00
Ben Ye e215fa599b
Fix lazy postings with zero length (#7083)
* fix lazy postings with zero length

Signed-off-by: Ben Ye <benye@amazon.com>

* changelog

Signed-off-by: Ben Ye <benye@amazon.com>

* unit tests

Signed-off-by: Ben Ye <benye@amazon.com>

* fix doc

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-01-22 10:25:32 -08:00
Michael Hoffmann 6b18338ccd
Store: acceptance test for proxy store (#7084)
* Add basic acceptance tests for proxy store
* Fix bug where invalid requests got ignored because of partial response
  strategy

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2024-01-22 11:37:48 +01:00
hanyuting8 058f92070f
Upgrade grpc to 1.57.2 (#7078)
1. In the replace section of go.mod, due to weaveworks/common#239, the grpc version is 1.45.0, but there are vulnerabilities in this version. In order to fix CVE-2023-44478, the grpc version needs to be upgraded to 1.57.2.
2. In order to upgrade gRPC, the version of weaveworks/common also needs to be upgraded, otherwise the build will fail.

Signed-off-by: hanyuting8 <hytxidian@163.com>
2024-01-21 09:15:31 -08:00
Douglas Camata 4a73fc3cdb
Receive: refactor handler for improved readability and organization (#6898)
* [wip] First checkpoint

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* [wip] Second checkpoint

All tests passing, unit and e2e.

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Small random refactors

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add some useful trace tags

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Concurrent and traced local writes

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Improve variable names in remote writes

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Rename `newFanoutForward` function

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* More refactors

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix linting issue

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add a quorum test with sloppy quorum

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* [wip] Try to make retries work

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* [wip] Checkpoint: wait group still hanging

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Some refactors

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add some commented code so I don't lose it

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Adapt tests

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove sloppy quorum code

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Move some code around

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove even more leftover of sloppy quorum

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Extract a type to hold function params

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove unused struct field

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove useless variable

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove type that wasn't used enough

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Delete function to tighten up max buffered responses

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add comments to some functions

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix peer up check

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix size of replication tracking slices

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Rename context

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Don't do local writes concurrently

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove extra error logging

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix syntax after merge

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add missing methods to peersContainer

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix handler test

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Reset peers state on hashring changes

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Handle PR comment regarding waitgroup

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Set span tags to help debug

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix concurrency issue

We close the request as soon as quorum is reached and leave a few goroutines running to finish replication and do cleanups.

This means that the context from the HTTP request is cancelled... which ends up also cancelling the pending replication requests.

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix request ID middleware

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix `distributeTimeseriesToReplicas` comment

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Extract var with 1-indexed replication index

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Rename methods in peersContainer interface

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Make peerGroup `getConnection` check if peers are up

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove yet one more not useful log

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove logger from `h.sendWrites`

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

---------

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>
2024-01-17 15:31:47 +02:00
Michael Hoffmann a0ce64d274
Merge pull request #7065 from vinted/multitsdb_overlapping
receive: disable overlapping compaction
2024-01-16 17:18:00 +01:00
Jacob Baungård Hansen 3de122f331
CI: Ensure static react-app is checked in (#7063)
* CI: Ensure static react-app is checked in

With this commit the CI system should fail if changes to the react-app
have been made without checking in the changes.

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>

* Add `react-app` as dependency `check-react-app`

To ensure the react-app is rebuilt before checking for changes.

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>

---------

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>
2024-01-16 16:23:12 +05:30
Giedrius Statkevičius 80a5ce6b15 receive: disable overlapping compaction
Use the new TSDB flag to disable overlapping compaction to fix OOO
samples handling in the Receive component.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-01-16 12:50:56 +02:00
Alex Le 324846f66d
Make RetryError and HaltError able to be fetched for root cause (#7043)
* Make RetryError and HaltError able to be fetched for root cause

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Added unit test

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix lint

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fixed IsRetryError and IsHaltError functions

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>
2024-01-15 11:46:23 -08:00
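
A minimal sketch of making wrapped error types unwrappable so the root cause can be fetched with the standard errors package; the type names here are hypothetical, not the actual compact package:

package main

import (
	"errors"
	"fmt"
)

// haltError wraps a fatal compaction error; Unwrap exposes the root cause so
// callers can use errors.Is and errors.As on it.
type haltError struct{ err error }

func (e haltError) Error() string { return "halt: " + e.err.Error() }
func (e haltError) Unwrap() error { return e.err }

var errCorruptedBlock = errors.New("corrupted block")

func main() {
	err := haltError{err: fmt.Errorf("compact group: %w", errCorruptedBlock)}
	fmt.Println(errors.Is(err, errCorruptedBlock)) // true: the root cause is reachable
}
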
Giedrius Statkevičius bee20b9d2a
go.mod: update Prometheus version (#7047)
Update Prometheus version to include
https://github.com/prometheus/prometheus/pull/13242 which is important
for me - it unblocks further postings work.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-01-15 12:48:22 +02:00
Jacob Baungård Hansen a7e8a644d0
UI: Don't always force tracing (#7062)
Forced tracing was... forced to true always, even if the checkbox in the UI
to enable tracing was not actually checked.

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>
2024-01-15 11:25:44 +01:00
606 changed files with 51248 additions and 17524 deletions

.bingo/.gitignore vendored

@@ -11,3 +11,4 @@
!variables.env
*tmp.mod
*tmp.sum


@@ -6,7 +6,6 @@ This is directory which stores Go modules with pinned buildable package that is
* Run `bingo get <tool>` to install <tool> that have own module file in this directory.
* For Makefile: Make sure to put `include .bingo/Variables.mk` in your Makefile, then use $(<upper case tool name>) variable where <tool> is the .bingo/<tool>.mod.
* For shell: Run `source .bingo/variables.env` to source all environment variable for each tool.
* For go: Import `.bingo/variables.go` to for variable names.
* See https://github.com/bwplotka/bingo or -h on how to add, remove or change binaries dependencies.
## Requirements


@@ -1,4 +1,4 @@
# Auto generated binary variables helper managed by https://github.com/bwplotka/bingo v0.8. DO NOT EDIT.
# Auto generated binary variables helper managed by https://github.com/bwplotka/bingo v0.9. DO NOT EDIT.
# All tools are designed to be build inside $GOBIN.
BINGO_DIR := $(dir $(lastword $(MAKEFILE_LIST)))
GOPATH ?= $(shell go env GOPATH)
@@ -17,29 +17,35 @@ GO ?= $(shell which go)
# @echo "Running alertmanager"
# @$(ALERTMANAGER) <flags/args..>
#
ALERTMANAGER := $(GOBIN)/alertmanager-v0.24.0
ALERTMANAGER := $(GOBIN)/alertmanager-v0.27.0
$(ALERTMANAGER): $(BINGO_DIR)/alertmanager.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/alertmanager-v0.24.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=alertmanager.mod -o=$(GOBIN)/alertmanager-v0.24.0 "github.com/prometheus/alertmanager/cmd/alertmanager"
@echo "(re)installing $(GOBIN)/alertmanager-v0.27.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=alertmanager.mod -o=$(GOBIN)/alertmanager-v0.27.0 "github.com/prometheus/alertmanager/cmd/alertmanager"
BINGO := $(GOBIN)/bingo-v0.8.1-0.20230820182247-0568407746a2
BINGO := $(GOBIN)/bingo-v0.9.0
$(BINGO): $(BINGO_DIR)/bingo.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/bingo-v0.8.1-0.20230820182247-0568407746a2"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=bingo.mod -o=$(GOBIN)/bingo-v0.8.1-0.20230820182247-0568407746a2 "github.com/bwplotka/bingo"
@echo "(re)installing $(GOBIN)/bingo-v0.9.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=bingo.mod -o=$(GOBIN)/bingo-v0.9.0 "github.com/bwplotka/bingo"
FAILLINT := $(GOBIN)/faillint-v1.11.0
CAPNPC_GO := $(GOBIN)/capnpc-go-v3.0.1-alpha.2.0.20240830165715-46ccd63a72af
$(CAPNPC_GO): $(BINGO_DIR)/capnpc-go.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/capnpc-go-v3.0.1-alpha.2.0.20240830165715-46ccd63a72af"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=capnpc-go.mod -o=$(GOBIN)/capnpc-go-v3.0.1-alpha.2.0.20240830165715-46ccd63a72af "capnproto.org/go/capnp/v3/capnpc-go"
FAILLINT := $(GOBIN)/faillint-v1.15.0
$(FAILLINT): $(BINGO_DIR)/faillint.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/faillint-v1.11.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=faillint.mod -o=$(GOBIN)/faillint-v1.11.0 "github.com/fatih/faillint"
@echo "(re)installing $(GOBIN)/faillint-v1.15.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=faillint.mod -o=$(GOBIN)/faillint-v1.15.0 "github.com/fatih/faillint"
GOIMPORTS := $(GOBIN)/goimports-v0.12.0
GOIMPORTS := $(GOBIN)/goimports-v0.23.0
$(GOIMPORTS): $(BINGO_DIR)/goimports.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/goimports-v0.12.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=goimports.mod -o=$(GOBIN)/goimports-v0.12.0 "golang.org/x/tools/cmd/goimports"
@echo "(re)installing $(GOBIN)/goimports-v0.23.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=goimports.mod -o=$(GOBIN)/goimports-v0.23.0 "golang.org/x/tools/cmd/goimports"
GOJSONTOYAML := $(GOBIN)/gojsontoyaml-v0.1.0
$(GOJSONTOYAML): $(BINGO_DIR)/gojsontoyaml.mod
@@ -47,11 +53,11 @@ $(GOJSONTOYAML): $(BINGO_DIR)/gojsontoyaml.mod
@echo "(re)installing $(GOBIN)/gojsontoyaml-v0.1.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=gojsontoyaml.mod -o=$(GOBIN)/gojsontoyaml-v0.1.0 "github.com/brancz/gojsontoyaml"
GOLANGCI_LINT := $(GOBIN)/golangci-lint-v1.54.1
GOLANGCI_LINT := $(GOBIN)/golangci-lint-v2.4.0
$(GOLANGCI_LINT): $(BINGO_DIR)/golangci-lint.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/golangci-lint-v1.54.1"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=golangci-lint.mod -o=$(GOBIN)/golangci-lint-v1.54.1 "github.com/golangci/golangci-lint/cmd/golangci-lint"
@echo "(re)installing $(GOBIN)/golangci-lint-v2.4.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=golangci-lint.mod -o=$(GOBIN)/golangci-lint-v2.4.0 "github.com/golangci/golangci-lint/v2/cmd/golangci-lint"
GOTESPLIT := $(GOBIN)/gotesplit-v0.2.1
$(GOTESPLIT): $(BINGO_DIR)/gotesplit.mod
@@ -95,11 +101,11 @@ $(MDOX): $(BINGO_DIR)/mdox.mod
@echo "(re)installing $(GOBIN)/mdox-v0.9.1-0.20220713110358-25b9abcf90a0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=mdox.mod -o=$(GOBIN)/mdox-v0.9.1-0.20220713110358-25b9abcf90a0 "github.com/bwplotka/mdox"
MINIO := $(GOBIN)/minio-v0.0.0-20220720015624-ce8397f7d944
MINIO := $(GOBIN)/minio-v0.0.0-20241014163537-3da7c9cce3de
$(MINIO): $(BINGO_DIR)/minio.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/minio-v0.0.0-20220720015624-ce8397f7d944"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=minio.mod -o=$(GOBIN)/minio-v0.0.0-20220720015624-ce8397f7d944 "github.com/minio/minio"
@echo "(re)installing $(GOBIN)/minio-v0.0.0-20241014163537-3da7c9cce3de"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=minio.mod -o=$(GOBIN)/minio-v0.0.0-20241014163537-3da7c9cce3de "github.com/minio/minio"
PROMDOC := $(GOBIN)/promdoc-v0.8.0
$(PROMDOC): $(BINGO_DIR)/promdoc.mod
@@ -107,11 +113,11 @@ $(PROMDOC): $(BINGO_DIR)/promdoc.mod
@echo "(re)installing $(GOBIN)/promdoc-v0.8.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=promdoc.mod -o=$(GOBIN)/promdoc-v0.8.0 "github.com/plexsystems/promdoc"
PROMETHEUS := $(GOBIN)/prometheus-v0.37.0
PROMETHEUS := $(GOBIN)/prometheus-v0.54.1
$(PROMETHEUS): $(BINGO_DIR)/prometheus.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/prometheus-v0.37.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=prometheus.mod -o=$(GOBIN)/prometheus-v0.37.0 "github.com/prometheus/prometheus/cmd/prometheus"
@echo "(re)installing $(GOBIN)/prometheus-v0.54.1"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=prometheus.mod -o=$(GOBIN)/prometheus-v0.54.1 "github.com/prometheus/prometheus/cmd/prometheus"
PROMTOOL := $(GOBIN)/promtool-v0.47.0
$(PROMTOOL): $(BINGO_DIR)/promtool.mod
@@ -119,11 +125,11 @@ $(PROMTOOL): $(BINGO_DIR)/promtool.mod
@echo "(re)installing $(GOBIN)/promtool-v0.47.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=promtool.mod -o=$(GOBIN)/promtool-v0.47.0 "github.com/prometheus/prometheus/cmd/promtool"
PROMU := $(GOBIN)/promu-v0.5.0
PROMU := $(GOBIN)/promu-v0.17.0
$(PROMU): $(BINGO_DIR)/promu.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/promu-v0.5.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=promu.mod -o=$(GOBIN)/promu-v0.5.0 "github.com/prometheus/promu"
@echo "(re)installing $(GOBIN)/promu-v0.17.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=promu.mod -o=$(GOBIN)/promu-v0.17.0 "github.com/prometheus/promu"
PROTOC_GEN_GOGOFAST := $(GOBIN)/protoc-gen-gogofast-v1.3.2
$(PROTOC_GEN_GOGOFAST): $(BINGO_DIR)/protoc-gen-gogofast.mod
@@ -131,9 +137,9 @@ $(PROTOC_GEN_GOGOFAST): $(BINGO_DIR)/protoc-gen-gogofast.mod
@echo "(re)installing $(GOBIN)/protoc-gen-gogofast-v1.3.2"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=protoc-gen-gogofast.mod -o=$(GOBIN)/protoc-gen-gogofast-v1.3.2 "github.com/gogo/protobuf/protoc-gen-gogofast"
SHFMT := $(GOBIN)/shfmt-v3.7.0
SHFMT := $(GOBIN)/shfmt-v3.8.0
$(SHFMT): $(BINGO_DIR)/shfmt.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/shfmt-v3.7.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=shfmt.mod -o=$(GOBIN)/shfmt-v3.7.0 "mvdan.cc/sh/v3/cmd/shfmt"
@echo "(re)installing $(GOBIN)/shfmt-v3.8.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=shfmt.mod -o=$(GOBIN)/shfmt-v3.8.0 "mvdan.cc/sh/v3/cmd/shfmt"


@@ -1,5 +1,7 @@
module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.14
go 1.21
require github.com/prometheus/alertmanager v0.24.0 // cmd/alertmanager
toolchain go1.23.1
require github.com/prometheus/alertmanager v0.27.0 // cmd/alertmanager

File diff suppressed because it is too large


@@ -2,4 +2,4 @@ module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.14
require github.com/bwplotka/bingo v0.8.1-0.20230820182247-0568407746a2
require github.com/bwplotka/bingo v0.9.0


@@ -4,6 +4,8 @@ github.com/bwplotka/bingo v0.6.0 h1:AlRrI9J/GVjOUSZbsYQ5WS8X8FnLpTbEAhUVW5iOQ7M=
github.com/bwplotka/bingo v0.6.0/go.mod h1:/qx0tLceUEeAs1R8QnIF+n9+Q0xUe7hmdQTB2w0eDYk=
github.com/bwplotka/bingo v0.8.1-0.20230820182247-0568407746a2 h1:nvLMMDf/Lw2JdJe2KzXjnL7IhIU+j48CXFZEuR9uPHQ=
github.com/bwplotka/bingo v0.8.1-0.20230820182247-0568407746a2/go.mod h1:GxC/y/xbmOK5P29cn+B3HuOSw0s2gruddT3r+rDizDw=
github.com/bwplotka/bingo v0.9.0 h1:slnsdJYExR4iRalHR6/ZiYnr9vSazOuFGmc2LdX293g=
github.com/bwplotka/bingo v0.9.0/go.mod h1:GxC/y/xbmOK5P29cn+B3HuOSw0s2gruddT3r+rDizDw=
github.com/cpuguy83/go-md2man/v2 v2.0.2/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
github.com/creack/pty v1.1.15/go.mod h1:MOBLtS5ELjhRRrroQr9kyvTxUAFNvYEK993ew/Vr4O4=

.bingo/capnpc-go.mod Normal file

@@ -0,0 +1,5 @@
module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.23.1
require capnproto.org/go/capnp/v3 v3.0.1-alpha.2.0.20240830165715-46ccd63a72af // capnpc-go

.bingo/capnpc-go.sum Normal file

@@ -0,0 +1,6 @@
capnproto.org/go/capnp/v3 v3.0.1-alpha.2.0.20240830165715-46ccd63a72af h1:A5wxH0ZidOtYYUGjhtBaRuB87M73bGfc06uWB8sHpg0=
capnproto.org/go/capnp/v3 v3.0.1-alpha.2.0.20240830165715-46ccd63a72af/go.mod h1:2vT5D2dtG8sJGEoEKU17e+j7shdaYp1Myl8X03B3hmc=
github.com/colega/zeropool v0.0.0-20230505084239-6fb4a4f75381 h1:d5EKgQfRQvO97jnISfR89AiCCCJMwMFoSxUiU0OGCRU=
github.com/colega/zeropool v0.0.0-20230505084239-6fb4a4f75381/go.mod h1:OU76gHeRo8xrzGJU3F3I1CqX1ekM8dfJw0+wPeMwnp0=
golang.org/x/sync v0.7.0 h1:YsImfSBoP9QPYL0xyKJPq0gcaJdG3rInoqxTWbfQu9M=
golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=


@@ -1,5 +1,11 @@
module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.14
go 1.23.0
require github.com/fatih/faillint v1.11.0
toolchain go1.24.0
replace github.com/fatih/faillint => github.com/thanos-community/faillint v0.0.0-20250217160734-830c2205d383
require github.com/fatih/faillint v1.15.0
require golang.org/x/sync v0.16.0 // indirect


@@ -6,25 +6,54 @@ github.com/fatih/faillint v1.10.0 h1:NQ2zhSNuYp0g23/6gyCSi2IfdVIfOk/JkSzpWSDEnYQ
github.com/fatih/faillint v1.10.0/go.mod h1:upblMxCjN4sL78nBbOHFEH9UGHTSw61M3Kj9BMS0UL0=
github.com/fatih/faillint v1.11.0 h1:EhmAKe8k0Cx2gnf+/JiX/IAeeKjwsQao5dY8oG6cQB4=
github.com/fatih/faillint v1.11.0/go.mod h1:d9kdQwFcr+wD4cLXOdjTw1ENUUvv5+z0ctJ5Wm0dTvA=
github.com/fatih/faillint v1.13.0 h1:9Dn9ZvK7bPTFmAkQ0FvhBRF4qD+LZg0ZgelyeBc7kKE=
github.com/fatih/faillint v1.13.0/go.mod h1:YiTDDtwQSL6MNRPtYG0n/rGE9orYt92aohq/P2QYBLA=
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/thanos-community/faillint v0.0.0-20250217160734-830c2205d383 h1:cuHWR5WwIVpmvccpJ2iYgEWIo1SQQCvPWtYSPOiVvoU=
github.com/thanos-community/faillint v0.0.0-20250217160734-830c2205d383/go.mod h1:KM6cUIJEIVjYDUACgnDrky9bsAP4/+d37G7sGbEn6I0=
github.com/yuin/goldmark v1.4.1/go.mod h1:mwnBkeHKe2W/ZEtQ+71ViKU8L12m81fl3OWwC1Zlc8k=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.13.0/go.mod h1:y6Z2r+Rw4iayiXXAIxJIDAJ1zMW4yaTpebo8fPOliYc=
golang.org/x/crypto v0.19.0/go.mod h1:Iy9bg/ha4yyC70EfRS8jz+B6ybOBKMaSxLj6P6oBDfU=
golang.org/x/crypto v0.21.0/go.mod h1:0BP7YvVV9gBbVKyeTG0Gyn+gZm94bibOW5BjDEYAOMs=
golang.org/x/crypto v0.33.0/go.mod h1:bVdXmD7IV/4GdElGPozy6U7lWdRXA4qyRVGJV57uQ5M=
golang.org/x/mod v0.5.1 h1:OJxoQ/rynoF0dcCdI7cLPktw/hR2cueqYfjm43oqK38=
golang.org/x/mod v0.5.1/go.mod h1:5OXOZSfqPIIbmVBIIKWRFfZjPR0E5r58TLhUjH0a2Ro=
golang.org/x/mod v0.6.0-dev.0.20220106191415-9b9b3d81d5e3 h1:kQgndtyPBW/JIYERgdxfwMYh3AVStj88WQTlNDi2a+o=
golang.org/x/mod v0.6.0-dev.0.20220106191415-9b9b3d81d5e3/go.mod h1:3p9vT2HGsQu2K1YbXdKPJLVgG5VJdoTa1poYQBtP1AY=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4 h1:6zppjxzCulZykYSLyVDYbneBfbaBIQPYMevg0bEwv2s=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
golang.org/x/mod v0.8.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.12.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.15.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.16.0 h1:QX4fJ0Rr5cPQCF7O9lh9Se4pmwfwskqZfq5moyldzic=
golang.org/x/mod v0.16.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.23.0 h1:Zb7khfcRGKk+kqfxFaP5tZqCnDZMjC5VtUBs87Hr6QM=
golang.org/x/mod v0.23.0/go.mod h1:6SkKJ3Xj0I0BrPOZoBy3bdMptDDU9oJrpohJ3eWZ1fY=
golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
golang.org/x/net v0.0.0-20211015210444-4f30a5c0130f/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y=
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
golang.org/x/net v0.10.0/go.mod h1:0qNGK6F8kojg2nk9dLZ2mShWaEBan6FAoqfSigmmuDg=
golang.org/x/net v0.15.0/go.mod h1:idbUs1IY1+zTqbi8yxTbhexhEEk5ur9LInksu6HrEpk=
golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44=
golang.org/x/net v0.22.0/go.mod h1:JKghWKKOSdJwpW2GEx0Ja7fmaKnMsbu+MWVZTokSYmg=
golang.org/x/net v0.35.0/go.mod h1:EglIi67kWsHKlRzzVMUD93VMSWGFOMSZgxFjparz1Qk=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.3.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y=
golang.org/x/sync v0.6.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.11.0 h1:GGz8+XQP4FvTTrjZPzNKTMFtSXH80RAzG+5ghFPgK9w=
golang.org/x/sync v0.11.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.16.0 h1:ycBJEhp9p4vXvUZNszeOq0kGTPghopOL8q0fq3vstxw=
golang.org/x/sync v0.16.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
@ -35,12 +64,31 @@ golang.org/x/sys v0.0.0-20211019181941-9d821ace8654/go.mod h1:oPkhp1MJrh7nUepCBc
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f h1:v4INt8xihDGvnrfjMDVXGxw9wrfxYyCjk0KbXjhR55s=
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.8.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.18.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.30.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/telemetry v0.0.0-20240228155512-f48c80bd79b2/go.mod h1:TeRTkGYfJXctD9OcfyVLyj2J3IxLnKwHJR8f4D8a3YE=
golang.org/x/telemetry v0.0.0-20240521205824-bda55230c457/go.mod h1:pRgIJT+bRLFKnoM1ldnzKoxTIn14Yxz928LQRYYgIN0=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k=
golang.org/x/term v0.8.0/go.mod h1:xPskH00ivmX89bAKVGSKKtLOWNx2+17Eiy94tnKShWo=
golang.org/x/term v0.12.0/go.mod h1:owVbMEjm3cBLCHdkQu9b1opXd4ETQWc3BhuQGKgXgvU=
golang.org/x/term v0.17.0/go.mod h1:lLRBjIVuehSbZlaOtGMbcMncT+aqLLLmKrsjNrUguwk=
golang.org/x/term v0.18.0/go.mod h1:ILwASektA3OnRv7amZ1xhE/KTR+u50pbXfZ03+6Nx58=
golang.org/x/term v0.29.0/go.mod h1:6bl4lRlvVuDgSf3179VpIxBF0o10JUpXWOnI7nErv7s=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
golang.org/x/text v0.13.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.22.0/go.mod h1:YRoo4H8PVmsu+E3Ou7cqLVH8oXWIHVoX0jqUWALQhfY=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.1.8 h1:P1HhGGuLW4aAclzjtmJdf0mJOjVUZUzOTqkAkWL+l6w=
@ -49,6 +97,12 @@ golang.org/x/tools v0.1.10 h1:QjFRCZxdOhBJ/UNgnBZLbNV13DlbnK0quyivTnXJM20=
golang.org/x/tools v0.1.10/go.mod h1:Uh6Zz+xoGYZom868N8YTex3t7RhtHDBrE8Gzo9bV56E=
golang.org/x/tools v0.1.12 h1:VveCTK38A2rkS8ZqFY25HIDFscX5X9OoEhJd3quQmXU=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
golang.org/x/tools v0.6.0/go.mod h1:Xwgl3UAJ/d3gWutnCtw505GrjyAbvKui8lOU390QaIU=
golang.org/x/tools v0.13.0/go.mod h1:HvlwmtVNQAhOuCjW7xxvovg8wbNq7LwfXh/k7wXUl58=
golang.org/x/tools v0.19.0 h1:tfGCXNR1OsFG+sVdLAitlpjAvD/I6dHDKnYrpEZUHkw=
golang.org/x/tools v0.19.0/go.mod h1:qoJWxmGSIBmAeriMx19ogtrEPrGtDbPK634QFIcLAhc=
golang.org/x/tools v0.30.0 h1:BgcpHewrV5AUp2G9MebG4XPFI1E2W41zU1SaqVA9vJY=
golang.org/x/tools v0.30.0/go.mod h1:c347cR/OJfw5TI+GfX7RUPNMdDRRbjvYTS0jPyvsVtY=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 h1:go1bK/D/BFZV2I8cIQd1NKEZ+0owSTG1fDTci4IqFcE=

View File

@ -2,4 +2,4 @@ module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.14
require golang.org/x/tools v0.12.0 // cmd/goimports
require golang.org/x/tools v0.23.0 // cmd/goimports
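
Each of these .bingo module files pins one development tool at an exact version; the `// cmd/goimports` comment records which package provides the binary. A minimal sketch of how such a pin is normally bumped, assuming the standard bingo workflow this repository uses (the exact invocation may differ):

# hedged sketch: re-pin goimports at the version shown in the new require line
bingo get golang.org/x/tools/cmd/goimports@v0.23.0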

View File

@ -1,3 +1,4 @@
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
github.com/yuin/goldmark v1.4.1/go.mod h1:mwnBkeHKe2W/ZEtQ+71ViKU8L12m81fl3OWwC1Zlc8k=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
@ -5,6 +6,10 @@ golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACk
golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.12.0/go.mod h1:NF0Gs7EO5K4qLn+Ylc+fih8BSTeIjAP05siRnAh98yw=
golang.org/x/crypto v0.13.0/go.mod h1:y6Z2r+Rw4iayiXXAIxJIDAJ1zMW4yaTpebo8fPOliYc=
golang.org/x/crypto v0.19.0/go.mod h1:Iy9bg/ha4yyC70EfRS8jz+B6ybOBKMaSxLj6P6oBDfU=
golang.org/x/crypto v0.23.0/go.mod h1:CKFgDieR+mRhux2Lsu27y0fO304Db0wZe70UKqHu0v8=
golang.org/x/crypto v0.25.0/go.mod h1:T+wALwcMOSE0kXgUAnPAHqTLW+XHgcELELW8VaDgm/M=
golang.org/x/mod v0.2.0 h1:KU7oHjnv3XNWfa5COkzUifxZmxp1TyI7ImMXqFxLwvQ=
golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4 h1:6zppjxzCulZykYSLyVDYbneBfbaBIQPYMevg0bEwv2s=
@ -12,6 +17,10 @@ golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91
golang.org/x/mod v0.8.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.12.0 h1:rmsUpXtvNzj340zd98LZ4KntptpfRHwpFOHG188oHXc=
golang.org/x/mod v0.12.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.15.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.17.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.19.0 h1:fEdghXQSo20giMthA7cd28ZC+jts4amQ3YMXiP5oMQ8=
golang.org/x/mod v0.19.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20200226121028-0de0cce0169b/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
@ -21,12 +30,19 @@ golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
golang.org/x/net v0.10.0/go.mod h1:0qNGK6F8kojg2nk9dLZ2mShWaEBan6FAoqfSigmmuDg=
golang.org/x/net v0.14.0/go.mod h1:PpSgVXXLK0OxS0F31C1/tv6XNguvCrnXIDrFMspZIUI=
golang.org/x/net v0.15.0/go.mod h1:idbUs1IY1+zTqbi8yxTbhexhEEk5ur9LInksu6HrEpk=
golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44=
golang.org/x/net v0.25.0/go.mod h1:JkAGAh7GEvH74S6FOH42FLoXpXbE/aqXSrIQjXgsiwM=
golang.org/x/net v0.27.0/go.mod h1:dDi0PyhWNoiUOrAS8uXv/vnScO4wnHQO4mj9fn/RytE=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.3.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y=
golang.org/x/sync v0.6.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.7.0 h1:YsImfSBoP9QPYL0xyKJPq0gcaJdG3rInoqxTWbfQu9M=
golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
@ -40,11 +56,21 @@ golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.8.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.11.0 h1:eG7RXZHdqOJ1i+0lgLgCpSXAp6M3LYlAo6osgSi0xOM=
golang.org/x/sys v0.11.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.20.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.22.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/telemetry v0.0.0-20240228155512-f48c80bd79b2/go.mod h1:TeRTkGYfJXctD9OcfyVLyj2J3IxLnKwHJR8f4D8a3YE=
golang.org/x/telemetry v0.0.0-20240521205824-bda55230c457/go.mod h1:pRgIJT+bRLFKnoM1ldnzKoxTIn14Yxz928LQRYYgIN0=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k=
golang.org/x/term v0.8.0/go.mod h1:xPskH00ivmX89bAKVGSKKtLOWNx2+17Eiy94tnKShWo=
golang.org/x/term v0.11.0/go.mod h1:zC9APTIj3jG3FdV/Ons+XE1riIZXG4aZ4GTHiPZJPIU=
golang.org/x/term v0.12.0/go.mod h1:owVbMEjm3cBLCHdkQu9b1opXd4ETQWc3BhuQGKgXgvU=
golang.org/x/term v0.17.0/go.mod h1:lLRBjIVuehSbZlaOtGMbcMncT+aqLLLmKrsjNrUguwk=
golang.org/x/term v0.20.0/go.mod h1:8UkIAJTvZgivsXaD6/pH6U9ecQzZ45awqEOzuCvwpFY=
golang.org/x/term v0.22.0/go.mod h1:F3qCibpT5AMpCRfhfT53vVJwhLtIVHhB9XDjfFvnMI4=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
@ -52,6 +78,10 @@ golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
golang.org/x/text v0.12.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
golang.org/x/text v0.13.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.15.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.16.0/go.mod h1:GhwF1Be+LQoKShO3cGOHzqOgRrGaYc9AvblQOmPVHnI=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.0.0-20200526224456-8b020aee10d2 h1:21BqcH/onxtGHn1A2GDOJjZnbt4Nlez629S3eaR+eYs=
@ -62,6 +92,10 @@ golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc
golang.org/x/tools v0.6.0/go.mod h1:Xwgl3UAJ/d3gWutnCtw505GrjyAbvKui8lOU390QaIU=
golang.org/x/tools v0.12.0 h1:YW6HUoUmYBpwSgyaGaZq1fHjrBjX1rlpZ54T6mu2kss=
golang.org/x/tools v0.12.0/go.mod h1:Sc0INKfu04TlqNoRA1hgpFZbhYXHPr4V5DzpSBTPqQM=
golang.org/x/tools v0.13.0/go.mod h1:HvlwmtVNQAhOuCjW7xxvovg8wbNq7LwfXh/k7wXUl58=
golang.org/x/tools v0.21.1-0.20240508182429-e35e4ccd0d2d/go.mod h1:aiJjzUbINMkxbQROHiO6hDPo2LHcIPhhQsa9DLh0yGk=
golang.org/x/tools v0.23.0 h1:SGsXPZ+2l4JsgaCKkx+FQ9YZ5XEtA1GZYuoDjenLjvg=
golang.org/x/tools v0.23.0/go.mod h1:pnu6ufv6vQkll6szChhK3C3L/ruaIv5eBeztNG8wtsI=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543 h1:E7g+9GITq07hpfrRu66IVDexMakfv52eLZ2CXBWiKr4=

View File

@ -1,5 +1,5 @@
module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.14
go 1.25.0
require github.com/golangci/golangci-lint v1.54.1 // cmd/golangci-lint
require github.com/golangci/golangci-lint/v2 v2.4.0 // cmd/golangci-lint

File diff suppressed because it is too large.


View File

@ -1,4 +1,6 @@
github.com/Songmu/gotesplit v0.2.1 h1:qJFvR75nJpeKyMQFwyDtFrcc6zDWhrHAkks7DvM8oLo=
github.com/Songmu/gotesplit v0.2.1/go.mod h1:sVBfmLT26b1H5VhUpq8cRhCVK75GAmW9c8r2NiK0gzk=
github.com/jstemmer/go-junit-report v1.0.0 h1:8X1gzZpR+nVQLAht+L/foqOeX2l9DTZoaIPbEQHxsds=
github.com/jstemmer/go-junit-report v1.0.0/go.mod h1:Brl9GWCQeLvo8nXZwPNNblvFj/XSXhF0NWZEnDohbsk=
golang.org/x/sync v0.0.0-20220513210516-0976fa681c29 h1:w8s32wxx3sY+OjLlv9qltkLU5yvJzxjjgiHWLjdIcw4=
golang.org/x/sync v0.0.0-20220513210516-0976fa681c29/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=

View File

@ -1,5 +1,7 @@
module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.14
go 1.22
require github.com/minio/minio v0.0.0-20220720015624-ce8397f7d944
toolchain go1.23.1
require github.com/minio/minio v0.0.0-20241014163537-3da7c9cce3de

File diff suppressed because it is too large.

View File

@ -1,10 +1,12 @@
module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.14
go 1.21.0
toolchain go1.23.1
replace k8s.io/klog => github.com/simonpasquier/klog-gokit v0.3.0
replace k8s.io/klog/v2 => github.com/simonpasquier/klog-gokit/v3 v3.0.0
replace k8s.io/klog/v2 => github.com/simonpasquier/klog-gokit/v3 v3.3.0
exclude github.com/linode/linodego v1.0.0
@ -12,4 +14,4 @@ exclude github.com/grpc-ecosystem/grpc-gateway v1.14.7
exclude google.golang.org/api v0.30.0
require github.com/prometheus/prometheus v0.37.0 // cmd/prometheus
require github.com/prometheus/prometheus v0.54.1 // cmd/prometheus

File diff suppressed because it is too large.

View File

@ -1,5 +1,7 @@
module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.14
go 1.21
require github.com/prometheus/promu v0.5.0
toolchain go1.23.8
require github.com/prometheus/promu v0.17.0

File diff suppressed because it is too large.

View File

@ -1,5 +1,7 @@
module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.14
go 1.21
require mvdan.cc/sh/v3 v3.7.0 // cmd/shfmt
toolchain go1.22.5
require mvdan.cc/sh/v3 v3.8.0 // cmd/shfmt

View File

@ -1,11 +1,14 @@
github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
github.com/creack/pty v1.1.17/go.mod h1:MOBLtS5ELjhRRrroQr9kyvTxUAFNvYEK993ew/Vr4O4=
github.com/creack/pty v1.1.18/go.mod h1:MOBLtS5ELjhRRrroQr9kyvTxUAFNvYEK993ew/Vr4O4=
github.com/creack/pty v1.1.21/go.mod h1:MOBLtS5ELjhRRrroQr9kyvTxUAFNvYEK993ew/Vr4O4=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/frankban/quicktest v1.14.0/go.mod h1:NeW+ay9A/U67EYXNFA1nPE8e/tnQv/09mUdL/ijj8og=
github.com/frankban/quicktest v1.14.5/go.mod h1:4ptaffx2x8+WTWXmUCuVU6aPUX1/Mz7zb5vbUoiM6w0=
github.com/frankban/quicktest v1.14.6/go.mod h1:4ptaffx2x8+WTWXmUCuVU6aPUX1/Mz7zb5vbUoiM6w0=
github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.9/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/renameio v0.1.0 h1:GOZbcHa3HfsPKPlmyPyN2KEohoMXOhdMbHrvbpl2QaA=
github.com/google/renameio v0.1.0/go.mod h1:KWCgfxg9yswjAJkECMjeO8J8rahYeXnNhOm40UhjYkI=
github.com/google/renameio v1.0.1 h1:Lh/jXZmvZxb0BBeSY5VKEfidcbcbenKjZFzM/q0fSeU=
@ -29,22 +32,27 @@ github.com/rogpeppe/go-internal v1.6.1/go.mod h1:xXDCJY+GAPziupqXw64V24skbSoqbTE
github.com/rogpeppe/go-internal v1.8.1/go.mod h1:JeRgkft04UBgHMgCIwADu4Pn6Mtm5d4nPKWu0nJ5d+o=
github.com/rogpeppe/go-internal v1.9.0/go.mod h1:WtVeX8xhTBvf0smdhujwtBcq4Qrzq/fJaraNFVN+nFs=
github.com/rogpeppe/go-internal v1.10.1-0.20230524175051-ec119421bb97/go.mod h1:ddIwULY96R17DhadqLgMfk9H9tvdUzkipdSkR5nkCZA=
github.com/rogpeppe/go-internal v1.12.0/go.mod h1:E+RYuTGaKKdloAfM02xzb0FW3Paa99yedzYV+kq4uf4=
github.com/sergi/go-diff v1.0.0/go.mod h1:0CfEIISq7TuYL3j771MWULgwwjU+GofnZX9QAmXWZgo=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.18.0/go.mod h1:R0j02AL6hcrfOiy9T4ZYp/rcWeMxM3L6QYxlOuEG1mg=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
golang.org/x/mod v0.9.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.14.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
golang.org/x/net v0.20.0/go.mod h1:z8BVo6PvndSri0LbOE3hAn0apkU+1YvI6E70E9jsnvY=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.2.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.6.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20191026070338-33540a1f6037/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200217220822-9197077df867 h1:JoRuNIf+rpHl+VhScRQQvzbHed86tKkqwPMV34T8myw=
@ -57,6 +65,8 @@ golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBc
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.8.0 h1:EBmGv8NaZBZTWvrbjNoL6HVt+IVy3QDQpJs7VRIw3tU=
golang.org/x/sys v0.8.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.17.0 h1:25cE3gD+tdBA7lp7QfhuV+rJiE9YXTcS3VG1SqssI/Y=
golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/term v0.0.0-20191110171634-ad39bd3f0407 h1:5zh5atpUEdIc478E/ebrIaHLKcfVvG6dL/fGv7BcMoM=
golang.org/x/term v0.0.0-20191110171634-ad39bd3f0407/go.mod h1:Nr5EML6q2oocZ2LXRh80K7BxOlk5/8JxuGnuhpl+muw=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
@ -64,12 +74,16 @@ golang.org/x/term v0.0.0-20210927222741-03fcf44c2211 h1:JGgROgKl9N8DuW20oFS5gxc+
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
golang.org/x/term v0.8.0 h1:n5xxQn2i3PC0yLAbjTpNT85q/Kgzcr2gIoX9OrJUols=
golang.org/x/term v0.8.0/go.mod h1:xPskH00ivmX89bAKVGSKKtLOWNx2+17Eiy94tnKShWo=
golang.org/x/term v0.17.0 h1:mkTF7LCd6WGJNL3K1Ad7kwxNfYAW6a8a8QqtMblp/4U=
golang.org/x/term v0.17.0/go.mod h1:lLRBjIVuehSbZlaOtGMbcMncT+aqLLLmKrsjNrUguwk=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
golang.org/x/tools v0.17.0/go.mod h1:xsh6VxdV005rRVaS6SSAf9oiAqljS7UZUacMZ8Bnsps=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
@ -82,9 +96,13 @@ mvdan.cc/editorconfig v0.1.1-0.20200121172147-e40951bde157 h1:VBYz8greWWP8BDpRX0
mvdan.cc/editorconfig v0.1.1-0.20200121172147-e40951bde157/go.mod h1:Ge4atmRUYqueGppvJ7JNrtqpqokoJEFxYbP0Z+WeKS8=
mvdan.cc/editorconfig v0.2.0 h1:XL+7ys6ls/RKrkUNFQvEwIvNHh+JKx8Mj1pUV5wQxQE=
mvdan.cc/editorconfig v0.2.0/go.mod h1:lvnnD3BNdBYkhq+B4uBuFFKatfp02eB6HixDvEz91C0=
mvdan.cc/editorconfig v0.2.1-0.20231228180347-1925077f8eb2 h1:8nmqQGVnHUtHuT+yvuA49lQK0y5il5IOr2PtCBkDI2M=
mvdan.cc/editorconfig v0.2.1-0.20231228180347-1925077f8eb2/go.mod h1:r8RiQJRtzrPrZdcdEs5VCMqvRxAzYDUu9a4S9z7fKh8=
mvdan.cc/sh/v3 v3.1.2 h1:PG5BYlwtrkZTbJXUy25r0/q9shB5ObttCaknkOIB1XQ=
mvdan.cc/sh/v3 v3.1.2/go.mod h1:F+Vm4ZxPJxDKExMLhvjuI50oPnedVXpfjNSrusiTOno=
mvdan.cc/sh/v3 v3.5.1 h1:hmP3UOw4f+EYexsJjFxvU38+kn+V/s2CclXHanIBkmQ=
mvdan.cc/sh/v3 v3.5.1/go.mod h1:1JcoyAKm1lZw/2bZje/iYKWicU/KMd0rsyJeKHnsK4E=
mvdan.cc/sh/v3 v3.7.0 h1:lSTjdP/1xsddtaKfGg7Myu7DnlHItd3/M2tomOcNNBg=
mvdan.cc/sh/v3 v3.7.0/go.mod h1:K2gwkaesF/D7av7Kxl0HbF5kGOd2ArupNTX3X44+8l8=
mvdan.cc/sh/v3 v3.8.0 h1:ZxuJipLZwr/HLbASonmXtcvvC9HXY9d2lXZHnKGjFc8=
mvdan.cc/sh/v3 v3.8.0/go.mod h1:w04623xkgBVo7/IUK89E0g8hBykgEpN0vgOj3RJr6MY=

View File

@ -1,4 +1,4 @@
# Auto generated binary variables helper managed by https://github.com/bwplotka/bingo v0.8. DO NOT EDIT.
# Auto generated binary variables helper managed by https://github.com/bwplotka/bingo v0.9. DO NOT EDIT.
# All tools are designed to be build inside $GOBIN.
# Those variables will work only until 'bingo get' was invoked, or if tools were installed via Makefile's Variables.mk.
GOBIN=${GOBIN:=$(go env GOBIN)}
@ -8,17 +8,19 @@ if [ -z "$GOBIN" ]; then
fi
ALERTMANAGER="${GOBIN}/alertmanager-v0.24.0"
ALERTMANAGER="${GOBIN}/alertmanager-v0.27.0"
BINGO="${GOBIN}/bingo-v0.8.1-0.20230820182247-0568407746a2"
BINGO="${GOBIN}/bingo-v0.9.0"
FAILLINT="${GOBIN}/faillint-v1.11.0"
CAPNPC_GO="${GOBIN}/capnpc-go-v3.0.1-alpha.2.0.20240830165715-46ccd63a72af"
GOIMPORTS="${GOBIN}/goimports-v0.12.0"
FAILLINT="${GOBIN}/faillint-v1.15.0"
GOIMPORTS="${GOBIN}/goimports-v0.23.0"
GOJSONTOYAML="${GOBIN}/gojsontoyaml-v0.1.0"
GOLANGCI_LINT="${GOBIN}/golangci-lint-v1.54.1"
GOLANGCI_LINT="${GOBIN}/golangci-lint-v2.4.0"
GOTESPLIT="${GOBIN}/gotesplit-v0.2.1"
@ -34,17 +36,17 @@ JSONNETFMT="${GOBIN}/jsonnetfmt-v0.18.0"
MDOX="${GOBIN}/mdox-v0.9.1-0.20220713110358-25b9abcf90a0"
MINIO="${GOBIN}/minio-v0.0.0-20220720015624-ce8397f7d944"
MINIO="${GOBIN}/minio-v0.0.0-20241014163537-3da7c9cce3de"
PROMDOC="${GOBIN}/promdoc-v0.8.0"
PROMETHEUS="${GOBIN}/prometheus-v0.37.0"
PROMETHEUS="${GOBIN}/prometheus-v0.54.1"
PROMTOOL="${GOBIN}/promtool-v0.47.0"
PROMU="${GOBIN}/promu-v0.5.0"
PROMU="${GOBIN}/promu-v0.17.0"
PROTOC_GEN_GOGOFAST="${GOBIN}/protoc-gen-gogofast-v1.3.2"
SHFMT="${GOBIN}/shfmt-v3.7.0"
SHFMT="${GOBIN}/shfmt-v3.8.0"
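
These generated variables let scripts invoke a pinned tool without hard-coding its versioned binary name. A minimal usage sketch, assuming the file is generated at .bingo/variables.env as bingo normally does:

# hedged sketch: source the generated variables and run the pinned linter
source .bingo/variables.env
"${GOLANGCI_LINT}" run ./...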

View File

@ -8,56 +8,13 @@ orbs:
executors:
golang:
docker:
- image: cimg/go:1.21-node
- image: cimg/go:1.25.0-node
golang-test:
docker:
- image: cimg/go:1.21-node
- image: cimg/go:1.25.0-node
- image: quay.io/thanos/docker-swift-onlyone-authv2-keystone:v0.1
jobs:
test:
executor: golang-test
environment:
GO111MODULE: "on"
steps:
- git-shallow-clone/checkout
- go/load-cache
- go/mod-download
- run:
name: Download bingo modules
command: |
make install-tool-deps
- go/save-cache
- setup_remote_docker:
version: 20.10.12
- run:
name: Create Secret if PR is not forked
# GCS integration tests are run only for author's PR that have write access, because these tests
# require credentials. Env variables that sets up these tests will work only for these kind of PRs.
command: |
if ! [ -z ${GCP_PROJECT} ]; then
echo $GOOGLE_APPLICATION_CREDENTIALS_CONTENT > $GOOGLE_APPLICATION_CREDENTIALS
echo "Awesome! GCS and S3 AWS integration tests are enabled."
fi
- run:
name: "Run unit tests."
no_output_timeout: "30m"
environment:
THANOS_TEST_OBJSTORE_SKIP: GCS,S3,AZURE,COS,ALIYUNOSS,BOS,OCI,OBS
# Variables for Swift testing.
OS_AUTH_URL: http://127.0.0.1:5000/v2.0
OS_PASSWORD: s3cr3t
OS_PROJECT_NAME: admin
OS_REGION_NAME: RegionOne
OS_USERNAME: admin
# taskset sets CPU affinity to 2 (current CPU limit).
command: |
if [ -z ${GCP_PROJECT} ]; then
export THANOS_TEST_OBJSTORE_SKIP=${THANOS_TEST_OBJSTORE_SKIP}
fi
echo "Skipping tests for object storages: ${THANOS_TEST_OBJSTORE_SKIP}"
taskset 2 make test
# Cross build is needed for publish_release but needs to be done outside of docker.
cross_build:
machine: true
@ -82,7 +39,7 @@ jobs:
- git-shallow-clone/checkout
- go/mod-download-cached
- setup_remote_docker:
version: 20.10.12
version: docker24
- attach_workspace:
at: .
# Register qemu to support multi-arch.
@ -104,7 +61,7 @@ jobs:
- git-shallow-clone/checkout
- go/mod-download-cached
- setup_remote_docker:
version: 20.10.12
version: docker24
- attach_workspace:
at: .
- run: make tarballs-release
@ -127,19 +84,11 @@ workflows:
version: 2
thanos:
jobs:
- test:
filters:
tags:
only: /.*/
- publish_main:
requires:
- test
filters:
branches:
only: main
- cross_build:
requires:
- test
filters:
tags:
only: /^v[0-9]+(\.[0-9]+){2}(-.+|[^-.]*)$/
@ -147,7 +96,6 @@ workflows:
ignore: /.*/
- publish_release:
requires:
- test
- cross_build
filters:
tags:

View File

@ -1,9 +1,9 @@
# For details, see https://github.com/devcontainers/images/tree/main/src/go
FROM mcr.microsoft.com/devcontainers/go:1.21
FROM mcr.microsoft.com/devcontainers/go:1.23
RUN echo "Downloading prometheus..." \
&& curl -sSL -H "Accept: application/vnd.github.v3+json" "https://api.github.com/repos/prometheus/prometheus/tags" -o /tmp/tags.json \
&& VERSION_LIST="$(jq -r '.[] | select(.name | contains("rc") | not) | .name | split("v") | .[1]' /tmp/tags.json | tr -d '"' | sort -rV)" \
&& curl -sSL -H "Accept: application/vnd.github.v3+json" "https://api.github.com/repos/prometheus/prometheus/releases" -o /tmp/releases.json \
&& VERSION_LIST="$(jq -r '.[] | select(.tag_name | contains("rc") | not) | .tag_name | split("v") | .[1]' /tmp/releases.json | tr -d '"' | sort -rV)" \
&& PROMETHEUS_LATEST_VERSION="$(echo "${VERSION_LIST}" | head -n 1)" \
&& PROMETHEUS_FILE_NAME="prometheus-${PROMETHEUS_LATEST_VERSION}.linux-amd64" \
&& curl -fsSLO "https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_LATEST_VERSION}/${PROMETHEUS_FILE_NAME}.tar.gz" \
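
The lookup above now queries the releases API instead of the tags API, presumably so only published non-rc releases are considered rather than every git tag. To sanity-check the new pipeline outside the image build, roughly the following should print the latest non-rc Prometheus version (same jq filter as in the RUN step; treat it as a sketch):

# hedged sketch: reproduce the version lookup from the RUN step above
curl -sSL -H "Accept: application/vnd.github.v3+json" \
  "https://api.github.com/repos/prometheus/prometheus/releases" \
  | jq -r '.[] | select(.tag_name | contains("rc") | not) | .tag_name | split("v") | .[1]' \
  | sort -rV | head -n 1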

View File

@ -1,3 +0,0 @@
(github.com/go-kit/log.Logger).Log
fmt.Fprintln
fmt.Fprint

View File

@ -1,21 +1,16 @@
---
version: 2
updates:
- package-ecosystem: "gomod"
directory: "/"
vendor: false
schedule:
interval: "weekly"
labels: ["dependencies"]
open-pull-requests-limit: 20
- package-ecosystem: "docker"
directory: "/"
schedule:
interval: "weekly"
labels: ["dependencies"]
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: weekly
labels:
- "dependencies"
interval: weekly

View File

@ -20,6 +20,10 @@ on:
schedule:
- cron: '30 12 * * 1'
permissions:
contents: read
security-events: write
jobs:
analyze:
name: Analyze
@ -35,16 +39,16 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Set up Go
uses: actions/setup-go@v3
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.21.x
go-version: 1.22.x
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@fca7ace96b7d713c7035871441bd52efbe39e27e # v3.28.19
with:
languages: ${{ matrix.language }}
config-file: ./.github/codeql/codeql-config.yml
@ -56,7 +60,7 @@ jobs:
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
uses: github/codeql-action/autobuild@fca7ace96b7d713c7035871441bd52efbe39e27e # v3.28.19
# Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
@ -70,4 +74,4 @@ jobs:
# make release
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@fca7ace96b7d713c7035871441bd52efbe39e27e # v3.28.19

View File

@ -3,12 +3,18 @@ on:
schedule:
- cron: '0 * * * *'
name: busybox-update workflow
permissions:
contents: read
jobs:
checkVersionAndCreatePR:
permissions:
contents: write # for peter-evans/create-pull-request to create branch
pull-requests: write # for peter-evans/create-pull-request to create a PR
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Run busybox updater
run: |
@ -17,7 +23,7 @@ jobs:
shell: bash
- name: Create Pull Request
uses: peter-evans/create-pull-request@v3
uses: peter-evans/create-pull-request@dd2324fc52d5d43c699a5636bcf19fceaa70c284 # v7.0.7
with:
signoff: true
token: ${{ secrets.GITHUB_TOKEN }}

View File

@ -7,6 +7,9 @@ on:
tags:
pull_request:
permissions:
contents: read
jobs:
check:
runs-on: ubuntu-latest
@ -15,21 +18,21 @@ jobs:
GOBIN: /tmp/.bin
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Install Go
uses: actions/setup-go@v3
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.21.x
go-version: 1.25.x
- uses: actions/cache@v3
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:
path: ~/go/pkg/mod
key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
restore-keys: |
${{ runner.os }}-go-
- uses: actions/cache@v3
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:
path: .mdoxcache
key: ${{ runner.os }}-mdox-${{ hashFiles('docs/**/*.md', 'examples/**/*.md', 'mixin/**/*.md', '*.md') }}

View File

@ -7,8 +7,42 @@ on:
tags:
pull_request:
# TODO(bwplotka): Add tests here.
permissions:
contents: read
jobs:
unit:
runs-on: ubuntu-latest
name: Thanos unit tests
env:
THANOS_TEST_OBJSTORE_SKIP: GCS,S3,AZURE,COS,ALIYUNOSS,BOS,OCI,OBS,SWIFT
OS_AUTH_URL: http://127.0.0.1:5000/v2.0
OS_PASSWORD: s3cr3t
OS_PROJECT_NAME: admin
OS_REGION_NAME: RegionOne
OS_USERNAME: admin
GOBIN: /tmp/.bin
services:
swift:
image: 'quay.io/thanos/docker-swift-onlyone-authv2-keystone:v0.1'
ports:
- 5000:5000
steps:
- name: Checkout code
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Install Go.
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.25.x
- name: Install bingo modules
run: make install-tool-deps
- name: Add GOBIN to path
run: echo "/tmp/.bin" >> $GITHUB_PATH
- name: Run unit tests
run: make test
cross-build-check:
runs-on: ubuntu-latest
name: Go build for different platforms
@ -16,14 +50,14 @@ jobs:
GOBIN: /tmp/.bin
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Install Go
uses: actions/setup-go@v3
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.21.x
go-version: 1.25.x
- uses: actions/cache@v3
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:
path: |
~/.cache/go-build
@ -36,6 +70,33 @@ jobs:
- name: Cross build check
run: make crossbuild
build-stringlabels:
runs-on: ubuntu-latest
name: Go build with -tags=stringlabels
env:
GOBIN: /tmp/.bin
steps:
- name: Checkout code
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Install Go
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.25.x
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:
path: |
~/.cache/go-build
~/.cache/golangci-lint
~/go/pkg/mod
key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
restore-keys: |
${{ runner.os }}-go-
- name: Cross build check
run: go build -tags=stringlabels ./cmd/thanos
lint:
runs-on: ubuntu-latest
name: Linters (Static Analysis) for Go
@ -43,14 +104,14 @@ jobs:
GOBIN: /tmp/.bin
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Install Go
uses: actions/setup-go@v3
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.21.x
go-version: 1.25.x
- uses: actions/cache@v3
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:
path: |
~/.cache/go-build
@ -66,26 +127,40 @@ jobs:
- name: Linting & vetting
run: make go-lint
codespell:
runs-on: ubuntu-latest
name: Check misspelled words
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Run codespell
uses: codespell-project/actions-codespell@v2
with:
check_filenames: false
check_hidden: true
skip: ./pkg/ui/*,./pkg/store/6545postingsrepro,./internal/*,./mixin/vendor/*,./.bingo/*,go.mod,go.sum
ignore_words_list: intrumentation,mmaped,nd,ot,re-use,ser,serie,sme,sudu,tast,te,ans
e2e:
strategy:
fail-fast: false
matrix:
parallelism: [8]
index: [0, 1, 2, 3, 4, 5, 6, 7]
runs-on: ubuntu-latest
runs-on: ubuntu-24.04
name: Thanos end-to-end tests
env:
GOBIN: /tmp/.bin
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Install Go.
uses: actions/setup-go@v3
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.21.x
go-version: 1.25.x
- uses: actions/cache@v3
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:
path: |
~/.cache/go-build

View File

@ -6,17 +6,20 @@ on:
pull_request:
branches: [main]
permissions:
contents: read
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Set up Go
uses: actions/setup-go@v3
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.21.x
go-version: 1.22.x
- name: Generate
run: make examples
@ -29,12 +32,12 @@ jobs:
name: Linters (Static Analysis) for Jsonnet (mixin)
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Install Go
uses: actions/setup-go@v3
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.21.x
go-version: 1.22.x
- name: Format
run: |

View File

@ -6,6 +6,9 @@ on:
- main
pull_request:
permissions:
contents: read
jobs:
build:
runs-on: ubuntu-latest
@ -15,18 +18,19 @@ jobs:
name: React UI test on Node ${{ matrix.node }}
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Install nodejs
uses: actions/setup-node@v3
uses: actions/setup-node@1e60f620b9541d16bece96c5465dc8ee9832be0b # v4.0.3
with:
node-version: ${{ matrix.node }}
- uses: actions/cache@v3
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:
path: ~/.npm
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-
- run: CI=false make check-react-app
- run: make react-app-test

View File

@ -1 +1 @@
1.21
1.24

View File

@ -1,82 +1,68 @@
# This file contains all available configuration options
# with their default values.
# options for analysis running
version: "2"
run:
# timeout for analysis, e.g. 30s, 5m, default is 1m
deadline: 5m
# exit code when at least one issue was found, default is 1
build-tags:
- slicelabels
issues-exit-code: 1
# which dirs to skip: they won't be analyzed;
# can use regexp here: generated.*, regexp is applied on full path;
# default value is empty list, but next dirs are always skipped independently
# from this option's value:
# vendor$, third_party$, testdata$, examples$, Godeps$, builtin$
skip-dirs:
- vendor
- internal/cortex
# output configuration options
output:
# colored-line-number|line-number|json|tab|checkstyle, default is "colored-line-number"
format: colored-line-number
# print lines of code with issue, default is true
print-issued-lines: true
# print linter name in the end of issue text, default is true
print-linter-name: true
linters:
enable:
# Sorted alphabetically.
- errcheck
- goconst
- godot
- misspell
- promlinter
- unparam
settings:
errcheck:
exclude-functions:
- (github.com/go-kit/log.Logger).Log
- fmt.Fprintln
- fmt.Fprint
goconst:
min-occurrences: 5
misspell:
locale: US
exclusions:
generated: lax
presets:
- comments
- common-false-positives
- legacy
- std-error-handling
rules:
- linters:
- promlinter
path: _test\.go
- linters:
- unused
text: SourceStoreAPI.implementsStoreAPI
- linters:
- unused
text: SourceStoreAPI.producesBlocks
- linters:
- unused
text: Source.producesBlocks
- linters:
- unused
text: newMockAlertmanager
- linters:
- unused
text: ruleAndAssert
paths:
- vendor
- internal/cortex
- .bingo
- third_party$
- builtin$
- examples$
formatters:
enable:
- gofmt
- goimports
- gosimple
- govet
- ineffassign
- misspell
- staticcheck
- typecheck
- unparam
- unused
- exportloopref
- promlinter
linters-settings:
errcheck:
exclude: ./.errcheck_excludes.txt
misspell:
locale: US
goconst:
min-occurrences: 5
issues:
exclude-rules:
# We don't check metrics naming in the tests.
- path: _test\.go
linters:
- promlinter
# These are not being checked since these methods exist
# so that no one else could implement them.
- linters:
- unused
text: "SourceStoreAPI.implementsStoreAPI"
- linters:
- unused
text: "SourceStoreAPI.producesBlocks"
- linters:
- unused
text: "Source.producesBlocks"
- linters:
- unused
text: "newMockAlertmanager"
- linters:
- unused
text: "ruleAndAssert"
exclusions:
generated: lax
paths:
- vendor
- internal/cortex
- .bingo
- third_party$
- builtin$
- examples$
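
The replacement above follows the golangci-lint v2 configuration layout: `version: "2"`, linters and formatters split into separate sections, and the old skip-dirs and exclude rules consolidated under `exclusions`. For a similar migration elsewhere, golangci-lint v2 ships a migration helper; a sketch, assuming the v2 binary is installed:

# hedged sketch: rewrite a v1 .golangci.yml into the v2 schema, then validate it
golangci-lint migrate
golangci-lint config verify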

View File

@ -42,3 +42,14 @@ validators:
type: 'ignore'
- regex: 'twitter\.com'
type: 'ignore'
- regex: 'outshift\.cisco\.com\/blog\/multi-cluster-monitoring'
type: 'ignore'
# Expired certificate
- regex: 'bestpractices\.coreinfrastructure\.org\/projects\/3048'
type: 'ignore'
# Frequent DNS issues.
- regex: 'build\.thebeat\.co'
type: 'ignore'
# TLS certificate issues
- regex: 'itnext\.io'
type: 'ignore'

View File

@ -1,12 +1,16 @@
go:
version: 1.21
version: 1.25
repository:
path: github.com/thanos-io/thanos
build:
binaries:
- name: thanos
path: ./cmd/thanos
flags: -a -tags netgo
flags: -a
tags:
all:
- netgo
- slicelabels
ldflags: |
-X github.com/prometheus/common/version.Version={{.Version}}
-X github.com/prometheus/common/version.Revision={{.Revision}}
@ -16,8 +20,6 @@ build:
crossbuild:
platforms:
- linux/amd64
- darwin/amd64
- linux/arm64
- windows/amd64
- freebsd/amd64
- linux/ppc64le
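
With the build tags moved into the `tags: all:` block, promu now applies `netgo` and `slicelabels` to every platform it cross-builds. A rough local equivalent, mirroring the stringlabels build check in the workflow above but with the tags this file now sets:

# hedged sketch: local build with the same tags .promu.yml applies via promu
go build -tags=netgo,slicelabels ./cmd/thanos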

View File

@ -12,19 +12,311 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
### Fixed
- [#8334](https://github.com/thanos-io/thanos/pull/8334) Query: wait for initial endpoint discovery before becoming ready
### Added
- [#8366](https://github.com/thanos-io/thanos/pull/8366) Store: optionally ignore Parquet migrated blocks
- [#8359](https://github.com/thanos-io/thanos/pull/8359) Tools: add `--shipper.upload-compacted` flag to `bucket upload-blocks` for uploading compacted blocks
### Changed
- [#8370](https://github.com/thanos-io/thanos/pull/8370) Query: announced labelset now reflects relabel-config
### Removed
### [v0.39.2](https://github.com/thanos-io/thanos/tree/release-0.39) - 2025 07 17
### Fixed
- [#8374](https://github.com/thanos-io/thanos/pull/8374) Query: fix panic when concurrently accessing annotations map
- [#8375](https://github.com/thanos-io/thanos/pull/8375) Query: fix native histogram buckets in distributed queries
### [v0.39.1](https://github.com/thanos-io/thanos/tree/release-0.39) - 2025 07 01
Fixes a memory leak issue on query-frontend. The bug only affects v0.39.0.
### Fixed
- [#8349](https://github.com/thanos-io/thanos/pull/8349) Query-Frontend: properly clean up resources
- [#8338](https://github.com/thanos-io/thanos/pull/8338) Query-Frontend: use original roundtripper + close immediately
## [v0.39.0](https://github.com/thanos-io/thanos/tree/release-0.39) - 2025 06 25
In short: there are a bunch of fixes and small improvements. The shining items in this release are memory usage improvements in Thanos Query and shuffle sharding support in Thanos Receiver. Information about shuffle sharding support is available in the documentation. Thank you to all contributors!
### Added
- [#8308](https://github.com/thanos-io/thanos/pull/8308) Receive: Prometheus counters for pending write requests and series requests
- [#8225](https://github.com/thanos-io/thanos/pull/8225) tools: Extend bucket ls options.
- [#8238](https://github.com/thanos-io/thanos/pull/8238) Receive: add shuffle sharding support
- [#8284](https://github.com/thanos-io/thanos/pull/8284) Store: Add `--disable-admin-operations` Flag to Store Gateway
- [#8245](https://github.com/thanos-io/thanos/pull/8245) Querier/Query-Frontend/Ruler: Add `--enable-feature=promql-experimental-functions` flag to enable PromQL experimental functions in the respective Thanos components
- [#8259](https://github.com/thanos-io/thanos/pull/8259) Shipper: Add `--shipper.skip-corrupted-blocks` flag to allow `Sync()` to continue upload when finding a corrupted block
### Changed
- [#8282](https://github.com/thanos-io/thanos/pull/8282) Force sync writes to meta.json in case of host crash
- [#8192](https://github.com/thanos-io/thanos/pull/8192) Sidecar: fix default get config timeout
- [#8202](https://github.com/thanos-io/thanos/pull/8202) Receive: Unhide `--tsdb.enable-native-histograms` flag
- [#8315](https://github.com/thanos-io/thanos/pull/8315) Query-Frontend: only ready if downstream is ready
### Removed
- [#8289](https://github.com/thanos-io/thanos/pull/8289) Receive: *breaking :warning:* Removed migration of legacy-TSDB to multi-TSDB. Ensure you are running version >0.13
### Fixed
- [#8199](https://github.com/thanos-io/thanos/pull/8199) Query: handle panics or nil pointer dereference in querier gracefully when query analyze returns nil
- [#8211](https://github.com/thanos-io/thanos/pull/8211) Query: fix panic on nested partial response in distributed instant query
- [#8216](https://github.com/thanos-io/thanos/pull/8216) Query/Receive: fix iter race between `next()` and `stop()` introduced in https://github.com/thanos-io/thanos/pull/7821.
- [#8212](https://github.com/thanos-io/thanos/pull/8212) Receive: Ensure forward/replication metrics are incremented in err cases
- [#8296](https://github.com/thanos-io/thanos/pull/8296) Query: limit LazyRetrieval memory buffer size
## [v0.38.0](https://github.com/thanos-io/thanos/tree/release-0.38) - 03.04.2025
### Fixed
- [#8091](https://github.com/thanos-io/thanos/pull/8091) *: Add POST into allowed CORS methods header
- [#8046](https://github.com/thanos-io/thanos/pull/8046) Query-Frontend: Fix query statistic reporting for range queries when caching is enabled.
- [#7978](https://github.com/thanos-io/thanos/pull/7978) Receive: Fix deadlock during local writes when `split-tenant-label-name` is used
- [#8016](https://github.com/thanos-io/thanos/pull/8016) Query Frontend: Fix @ modifier not being applied correctly on sub queries.
### Added
- [#7907](https://github.com/thanos-io/thanos/pull/7907) Receive: Add `--receive.grpc-service-config` flag to configure gRPC service config for the receivers.
- [#7961](https://github.com/thanos-io/thanos/pull/7961) Store Gateway: Add `--store.posting-group-max-keys` flag to mark posting group as lazy if it exceeds number of keys limit. Added `thanos_bucket_store_lazy_expanded_posting_groups_total` for total number of lazy posting groups and corresponding reasons.
- [#8000](https://github.com/thanos-io/thanos/pull/8000) Query: Bump promql-engine, pass partial response through options
- [#7353](https://github.com/thanos-io/thanos/pull/7353) [#8045](https://github.com/thanos-io/thanos/pull/8045) Receiver/StoreGateway: Add `--matcher-cache-size` option to enable caching for regex matchers in series calls.
- [#8017](https://github.com/thanos-io/thanos/pull/8017) Store Gateway: Use native histograms for binary reader load and download duration, and fix the download duration metric.
- [#8131](https://github.com/thanos-io/thanos/pull/8131) Store Gateway: Optimize regex matchers for `.*` and `.+`.
- [#7808](https://github.com/thanos-io/thanos/pull/7808) Query: Support chain deduplication algorithm.
- [#8158](https://github.com/thanos-io/thanos/pull/8158) Rule: Add support for query offset.
- [#8110](https://github.com/thanos-io/thanos/pull/8110) Compact: implement native histogram downsampling.
- [#7996](https://github.com/thanos-io/thanos/pull/7996) Receive: Add OTLP endpoint.
### Changed
- [#7890](https://github.com/thanos-io/thanos/pull/7890) Query,Ruler: *breaking :warning:* deprecated `--store.sd-file` and `--store.sd-interval` to be replaced with `--endpoint.sd-config` and `--endpoint-sd-config-reload-interval`; removed legacy flags to pass endpoints `--store`, `--metadata`, `--rule`, `--exemplar`.
- [#7012](https://github.com/thanos-io/thanos/pull/7012) Query: Automatically adjust `max_source_resolution` based on the PromQL query to avoid querying data from a higher resolution and getting empty results.
- [#8118](https://github.com/thanos-io/thanos/pull/8118) Query: Bumped promql-engine
- [#8135](https://github.com/thanos-io/thanos/pull/8135) Query: respect partial response in distributed engine
- [#8181](https://github.com/thanos-io/thanos/pull/8181) Deps: bump promql engine
### Removed
## [v0.37.2](https://github.com/thanos-io/thanos/tree/release-0.37) - 11.12.2024
### Fixed
- [#7970](https://github.com/thanos-io/thanos/pull/7970) Sidecar: Respect min-time setting.
- [#7962](https://github.com/thanos-io/thanos/pull/7962) Store: Fix potential deadlock in hedging request.
- [#8175](https://github.com/thanos-io/thanos/pull/8175) Query: fix endpointset setup
### Added
### Changed
### Removed
## [v0.34.0](https://github.com/thanos-io/thanos/tree/release-0.34) - release in progress
## [v0.37.1](https://github.com/thanos-io/thanos/tree/release-0.37) - 04.12.2024
### Fixed
- [#7674](https://github.com/thanos-io/thanos/pull/7674) Query-frontend: Fix connection to Redis cluster with TLS.
- [#7945](https://github.com/thanos-io/thanos/pull/7945) Receive: Capnproto - use segment from existing message.
- [#7941](https://github.com/thanos-io/thanos/pull/7941) Receive: Fix race condition when adding multiple new tenants, see [issue-7892](https://github.com/thanos-io/thanos/issues/7892).
- [#7954](https://github.com/thanos-io/thanos/pull/7954) Sidecar: Ensure limit param is positive for compatibility with older Prometheus.
- [#7953](https://github.com/thanos-io/thanos/pull/7953) Query: Update promql-engine for subquery avg fix.
### Added
### Changed
### Removed
## [v0.37.0](https://github.com/thanos-io/thanos/tree/release-0.37) - 25.11.2024
### Fixed
- [#7511](https://github.com/thanos-io/thanos/pull/7511) Query Frontend: fix doubled gzip compression for response body.
- [#7592](https://github.com/thanos-io/thanos/pull/7592) Ruler: Only increment `thanos_rule_evaluation_with_warnings_total` metric for non PromQL warnings.
- [#7614](https://github.com/thanos-io/thanos/pull/7614) *: fix debug log formatting.
- [#7492](https://github.com/thanos-io/thanos/pull/7492) Compactor: update filtered blocks list before second downsample pass.
- [#7658](https://github.com/thanos-io/thanos/pull/7658) Store: Fix panic because too small buffer in pool.
- [#7643](https://github.com/thanos-io/thanos/pull/7643) Receive: fix thanos_receive_write_{timeseries,samples} stats
- [#7644](https://github.com/thanos-io/thanos/pull/7644) fix(ui): add null check to find overlapping blocks logic
- [#7674](https://github.com/thanos-io/thanos/pull/7674) Query-frontend: Fix connection to Redis cluster with TLS.
- [#7814](https://github.com/thanos-io/thanos/pull/7814) Store: label_values: if matchers contain `__name__`=="something", do not add <labelname> != "" to fetch less postings.
- [#7679](https://github.com/thanos-io/thanos/pull/7679) Query: respect store.limit.* flags when evaluating queries
- [#7821](https://github.com/thanos-io/thanos/pull/7821) Query/Receive: Fix goroutine leak introduced in https://github.com/thanos-io/thanos/pull/7796.
- [#7843](https://github.com/thanos-io/thanos/pull/7843) Query Frontend: fix slow query logging for non-query endpoints.
- [#7852](https://github.com/thanos-io/thanos/pull/7852) Query Frontend: pass "stats" parameter forward to queriers and fix Prometheus stats merging.
- [#7832](https://github.com/thanos-io/thanos/pull/7832) Query Frontend: Fix cache keys for dynamic split intervals.
- [#7885](https://github.com/thanos-io/thanos/pull/7885) Store: Return chunks to the pool after completing a Series call.
- [#7893](https://github.com/thanos-io/thanos/pull/7893) Sidecar: Fix retrieval of external labels for Prometheus v3.0.0.
- [#7903](https://github.com/thanos-io/thanos/pull/7903) Query: Fix panic on regex store matchers.
- [#7915](https://github.com/thanos-io/thanos/pull/7915) Store: Close block series client at the end to not reuse chunk buffer
- [#7941](https://github.com/thanos-io/thanos/pull/7941) Receive: Fix race condition when adding multiple new tenants, see [issue-7892](https://github.com/thanos-io/thanos/issues/7892).
### Added
- [#7763](https://github.com/thanos-io/thanos/pull/7763) Ruler: use native histograms for client latency metrics.
- [#7609](https://github.com/thanos-io/thanos/pull/7609) API: Add limit param to metadata APIs (series, label names, label values).
- [#7429](https://github.com/thanos-io/thanos/pull/7429): Reloader: introduce `TolerateEnvVarExpansionErrors` to allow suppressing errors when expanding environment variables in the configuration file. When set, this will ensure that the reloader won't consider the operation to fail when an unset environment variable is encountered. Note that all unset environment variables are left as is, whereas all set environment variables are expanded as usual.
- [#7560](https://github.com/thanos-io/thanos/pull/7560) Query: Added the possibility of filtering rules by rule_name, rule_group or file to HTTP api.
- [#7652](https://github.com/thanos-io/thanos/pull/7652) Store: Implement metadata API limit in stores.
- [#7659](https://github.com/thanos-io/thanos/pull/7659) Receive: Add support for replication using [Cap'n Proto](https://capnproto.org/). This protocol has a lower CPU and memory footprint, which leads to a reduction in resource usage in Receivers. Before enabling it, make sure that all receivers are updated to a version which supports this replication method.
- [#7853](https://github.com/thanos-io/thanos/pull/7853) UI: Add support for selecting graph time range with mouse drag.
- [#7855](https://github.com/thanos-io/thanos/pull/7855) Compact/Query: Add support for comma-separated replica labels.
- [#7654](https://github.com/thanos-io/thanos/pull/7654) *: Add '--grpc-server-tls-min-version' flag to allow user to specify TLS version, otherwise default to TLS 1.3
- [#7854](https://github.com/thanos-io/thanos/pull/7854) Query Frontend: Add `--query-frontend.force-query-stats` flag to force collection of query statistics from upstream queriers.
- [#7860](https://github.com/thanos-io/thanos/pull/7860) Store: Support hedged requests
- [#7924](https://github.com/thanos-io/thanos/pull/7924) *: Upgrade promql-engine to `v0.0.0-20241106100125-097e6e9f425a` and objstore to `v0.0.0-20241111205755-d1dd89d41f97`
- [#7835](https://github.com/thanos-io/thanos/pull/7835) Ruler: Add ability to do concurrent rule evaluations
- [#7722](https://github.com/thanos-io/thanos/pull/7722) Query: Add partition labels flag to partition leaf querier in distributed mode
### Changed
- [#7494](https://github.com/thanos-io/thanos/pull/7494) Ruler: remove trailing period from SRV records returned by discovery `dnssrvnoa` lookups
- [#7565](https://github.com/thanos-io/thanos/pull/7565) Query: Use Thanos resolver for endpoint groups.
- [#7741](https://github.com/thanos-io/thanos/pull/7741) Deps: Bump Objstore to `v0.0.0-20240913074259-63feed0da069`
- [#7813](https://github.com/thanos-io/thanos/pull/7813) Receive: enable initial TSDB compaction time randomization
- [#7820](https://github.com/thanos-io/thanos/pull/7820) Sidecar: Use prometheus metrics for min timestamp
- [#7886](https://github.com/thanos-io/thanos/pull/7886) Discovery: Preserve results from other resolve calls
- [#7745](https://github.com/thanos-io/thanos/pull/7745) *: Build with Prometheus stringlabels tags
- [#7669](https://github.com/thanos-io/thanos/pull/7669) Receive: Change quorum calculation for rf=2
### Removed
- [#7704](https://github.com/thanos-io/thanos/pull/7704) *: *breaking :warning:* remove Store gRPC Info function. This has been deprecated for 3 years; it's time to remove it.
- [#7793](https://github.com/thanos-io/thanos/pull/7793) Receive: Disable dedup proxy in multi-tsdb
- [#7678](https://github.com/thanos-io/thanos/pull/7678) Query: Skip formatting strings if debug logging is disabled
## [v0.36.1](https://github.com/thanos-io/thanos/tree/release-0.36)
### Fixed
- [#7634](https://github.com/thanos-io/thanos/pull/7634) Rule: fix Query and Alertmanager TLS configurations with CA only.
- [#7618](https://github.com/thanos-io/thanos/pull/7618) Proxy: Query goroutine leak when store.response-timeout is set
### Added
### Changed
### Removed
## [v0.36.0](https://github.com/thanos-io/thanos/tree/release-0.36)
### Fixed
- [#7326](https://github.com/thanos-io/thanos/pull/7326) Query: fixing exemplars proxy when querying stores with multiple tenants.
- [#7403](https://github.com/thanos-io/thanos/pull/7403) Sidecar: fix startup sequence
- [#7484](https://github.com/thanos-io/thanos/pull/7484) Proxy: fix panic in lazy response set
- [#7493](https://github.com/thanos-io/thanos/pull/7493) *: fix server grpc histograms
### Added
- [#7317](https://github.com/thanos-io/thanos/pull/7317) Tracing: allow specifying resource attributes for the OTLP configuration.
- [#7367](https://github.com/thanos-io/thanos/pull/7367) Store Gateway: log request ID in request logs.
- [#7361](https://github.com/thanos-io/thanos/pull/7361) Query: *breaking :warning:* pass query stats from remote execution from server to client. We changed the protobuf of the QueryAPI, if you use `query.mode=distributed` you need to update your client (upper level Queriers) first, before updating leaf Queriers (servers).
- [#7363](https://github.com/thanos-io/thanos/pull/7363) Query-frontend: set value of remote_user field in Slow Query Logs from HTTP header
- [#7335](https://github.com/thanos-io/thanos/pull/7335) Dependency: Update minio-go to v7.0.70 which includes support for EKS Pod Identity.
- [#7477](https://github.com/thanos-io/thanos/pull/7477) *: Bump objstore to `20240622095743-1afe5d4bc3cd`
### Changed
- [#7334](https://github.com/thanos-io/thanos/pull/7334) Compactor: do not vertically compact downsampled blocks. Such cases are now marked with `no-compact-mark.json`. Fixes panic `panic: unexpected seriesToChunkEncoder lack of iterations`.
- [#7393](https://github.com/thanos-io/thanos/pull/7393) *: *breaking :warning:* Using native histograms for grpc middleware metrics. Metrics `grpc_client_handling_seconds` and `grpc_server_handling_seconds` will now be native histograms, if you have enabled native histogram scraping you will need to update your PromQL expressions to use the new metric names.
### Removed
## [v0.35.1](https://github.com/thanos-io/thanos/tree/release-0.35) - 28.05.2024
### Fixed
- [#7323](https://github.com/thanos-io/thanos/pull/7323) Sidecar: wait for prometheus on startup
- [#6948](https://github.com/thanos-io/thanos/pull/6948) Receive: fix goroutines leak during series requests to thanos store api.
- [#7382](https://github.com/thanos-io/thanos/pull/7382) *: Ensure objstore flag values are masked & disable debug/pprof/cmdline
- [#7392](https://github.com/thanos-io/thanos/pull/7392) Query: fix broken min, max for pre 0.34.1 sidecars
- [#7373](https://github.com/thanos-io/thanos/pull/7373) Receive: Fix stats for remote write
- [#7318](https://github.com/thanos-io/thanos/pull/7318) Compactor: Recover from panic to log block ID
### Added
### Changed
### Removed
## [v0.35.0](https://github.com/thanos-io/thanos/tree/release-0.35) - 02.05.2024
### Fixed
- [#7083](https://github.com/thanos-io/thanos/pull/7083) Store Gateway: Fix lazy expanded postings with 0 length failed to be cached.
- [#7080](https://github.com/thanos-io/thanos/pull/7080) Receive: race condition in handler Close() when stopped early
- [#7132](https://github.com/thanos-io/thanos/pull/7132) Documentation: fix broken helm installation instruction
- [#7134](https://github.com/thanos-io/thanos/pull/7134) Store, Compact: Revert the recursive block listing mechanism introduced in https://github.com/thanos-io/thanos/pull/6474 and use the same strategy as in 0.31. Introduce a `--block-discovery-strategy` flag to control the listing strategy so that a recursive lister can still be used if the tradeoff of slower but cheaper discovery is preferred. A usage sketch follows this list.
- [#7122](https://github.com/thanos-io/thanos/pull/7122) Store Gateway: Fix lazy expanded postings estimate base cardinality using posting group with remove keys.
- [#7166](https://github.com/thanos-io/thanos/pull/7166) Receive/MultiTSDB: Do not delete non-uploaded blocks
- [#7179](https://github.com/thanos-io/thanos/pull/7179) Query: Fix merging of query analysis
- [#7224](https://github.com/thanos-io/thanos/pull/7224) Query-frontend: Add Redis username to the client configuration.
- [#7220](https://github.com/thanos-io/thanos/pull/7220) Store Gateway: Fix lazy expanded postings caching partial expanded postings and bug of estimating remove postings with non existent value. Added `PromQLSmith` based fuzz test to improve correctness.
- [#7225](https://github.com/thanos-io/thanos/pull/7225) Compact: Don't halt due to overlapping sources when vertical compaction is enabled
- [#7244](https://github.com/thanos-io/thanos/pull/7244) Query: Fix Internal Server Error unknown targetHealth: "unknown" when trying to open the targets page.
- [#7248](https://github.com/thanos-io/thanos/pull/7248) Receive: Fix RemoteWriteAsync being executed sequentially, which caused high latency in the ingestion path.
- [#7271](https://github.com/thanos-io/thanos/pull/7271) Query: fixing dedup iterator when working on mixed sample types.
- [#7289](https://github.com/thanos-io/thanos/pull/7289) Query Frontend: show warnings from downstream queries.
- [#7308](https://github.com/thanos-io/thanos/pull/7308) Store: Batch TSDB Infos for blocks.
- [#7301](https://github.com/thanos-io/thanos/pull/7301) Store Gateway: fix index header reader `PostingsOffsets` returning wrong values.
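A minimal sketch of the `--block-discovery-strategy` flag from the entry above (the component and bucket config path are illustrative; the flag values come from the flag's help text):

```sh
# Default: issue one concurrent list call per top-level directory in the bucket.
thanos store --block-discovery-strategy=concurrent --objstore.config-file=bucket.yml

# Slower but cheaper: recursively iterate through all objects in the bucket.
thanos store --block-discovery-strategy=recursive --objstore.config-file=bucket.yml
```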
### Added
- [#7155](https://github.com/thanos-io/thanos/pull/7155) Receive: Add tenant globbing support to hashring config
- [#7231](https://github.com/thanos-io/thanos/pull/7231) Tracing: added missing sampler types
- [#7194](https://github.com/thanos-io/thanos/pull/7194) Downsample: retry objstore related errors
- [#7105](https://github.com/thanos-io/thanos/pull/7105) Rule: add flag `--query.enable-x-functions` to allow usage of extended promql functions (xrate, xincrease, xdelta) in loaded rules
- [#6867](https://github.com/thanos-io/thanos/pull/6867) Query UI: Tenant input box added to the Query UI, in order to be able to specify which tenant the query should use.
- [#7186](https://github.com/thanos-io/thanos/pull/7186) Query UI: Only show tenant input box when query tenant enforcement is enabled
- [#7175](https://github.com/thanos-io/thanos/pull/7175) Query: Add `--query.mode=distributed` which enables the new distributed mode of the Thanos query engine. A usage sketch follows this list.
- [#7199](https://github.com/thanos-io/thanos/pull/7199) Reloader: Add support for watching and decompressing Prometheus configuration directories
- [#7200](https://github.com/thanos-io/thanos/pull/7175) Query: Add `--selector.relabel-config` and `--selector.relabel-config-file` flags which allows scoping the Querier to a subset of matched TSDBs.
- [#7233](https://github.com/thanos-io/thanos/pull/7233) UI: Showing Block Size Stats
- [#7256](https://github.com/thanos-io/thanos/pull/7256) Receive: Split remote-write HTTP requests via tenant labels of series
- [#7269](https://github.com/thanos-io/thanos/pull/7269) Query UI: Show peak/total samples in query analysis
- [#7280](https://github.com/thanos-io/thanos/pull/7281) *: Adding User-Agent to request logs
- [#7219](https://github.com/thanos-io/thanos/pull/7219) Receive: add `--remote-write.client-tls-secure` and `--remote-write.client-tls-skip-verify` flags to stop relying on grpc server config to determine grpc client secure/skipVerify.
- [#7297](https://github.com/thanos-io/thanos/pull/7297) *: mark as not queryable if status is not ready
- [#7302](https://github.com/thanos-io/thanos/pull/7303) Consider the `X-Forwarded-For` header for the remote address in the logs.
- [#7304](https://github.com/thanos-io/thanos/pull/7304) Store: Use loser trees for merging results
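A minimal sketch of the distributed query mode referenced above, assuming the Thanos PromQL engine is selected and two leaf Queriers are reachable over gRPC (the endpoint addresses are illustrative):

```sh
thanos query \
  --query.mode=distributed \
  --query.promql-engine=thanos \
  --endpoint=leaf-querier-eu.example.com:10901 \
  --endpoint=leaf-querier-us.example.com:10901
```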
### Changed
- [#7123](https://github.com/thanos-io/thanos/pull/7123) Rule: Change default Alertmanager API version to v2.
- [#7192](https://github.com/thanos-io/thanos/pull/7192) Rule: Do not turn off ruler even if resolving fails
- [#7223](https://github.com/thanos-io/thanos/pull/7223) Automatically detect memory limits and configure GOMEMLIMIT to match.
- [#7283](https://github.com/thanos-io/thanos/pull/7283) Compact: *breaking :warning:* Replace group with resolution in compact downsample metrics to avoid cardinality explosion with large numbers of groups.
- [#7305](https://github.com/thanos-io/thanos/pull/7305) Query|Receiver: Do not log full request on ProxyStore by default.
### Removed
## [v0.34.1](https://github.com/thanos-io/thanos/tree/release-0.34) - 11.02.24
### Fixed
- [#7078](https://github.com/thanos-io/thanos/pull/7078) *: Bump gRPC to 1.57.2
### Added
### Changed
### Removed
## [v0.34.0](https://github.com/thanos-io/thanos/tree/release-0.34) - 26.01.24
### Fixed
- [#7011](https://github.com/thanos-io/thanos/pull/7011) Query Frontend: queries with negative offset should check whether they are cacheable or not.
- [#6874](https://github.com/thanos-io/thanos/pull/6874) Sidecar: fix labels returned by 'api/v1/series' in presence of conflicting external and inner labels.
- [#7009](https://github.com/thanos-io/thanos/pull/7009) Rule: Fix spacing error in URL.
- [#7082](https://github.com/thanos-io/thanos/pull/7082) Stores: fix label values edge case when requesting external label values with matchers
- [#7114](https://github.com/thanos-io/thanos/pull/7114) Stores: fix file path bug for minio v7.0.61
### Added
@ -40,6 +332,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6887](https://github.com/thanos-io/thanos/pull/6887) Query Frontend: *breaking :warning:* Add tenant label to relevant exported metrics. Note that this change may cause some pre-existing custom dashboard queries to be incorrect due to the added label.
- [#7028](https://github.com/thanos-io/thanos/pull/7028) Query|Query Frontend: Add new `--query-frontend.enable-x-functions` flag to enable experimental extended functions.
- [#6884](https://github.com/thanos-io/thanos/pull/6884) Tools: Add upload-block command to upload blocks to object storage.
- [#7010](https://github.com/thanos-io/thanos/pull/7010) Cache: Added `set_async_circuit_breaker_*` to utilize the circuit breaker pattern for dynamically thresholding asynchronous set operations.
### Changed
@ -47,6 +340,8 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
### Removed
- [#7014](https://github.com/thanos-io/thanos/pull/7014) *: *breaking :warning:* Removed experimental query pushdown feature to simplify query path. This feature has had high complexity for too little benefits. The responsibility for query pushdown will be moved to the distributed mode of the new 'thanos' promql engine.
## [v0.33.0](https://github.com/thanos-io/thanos/tree/release-0.33) - 18.12.2023
### Fixed
@ -73,6 +368,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6753](https://github.com/thanos-io/thanos/pull/6753) mixin(Rule): *breaking :warning:* Fixed the mixin rules with duplicate names and updated the promtool version from v0.37.0 to v0.47.0
- [#6772](https://github.com/thanos-io/thanos/pull/6772) *: Bump prometheus to v0.47.2-0.20231006112807-a5a4eab679cc
- [#6794](https://github.com/thanos-io/thanos/pull/6794) Receive: the exported HTTP metrics now use the specified default tenant for requests where no tenants are found.
- [#6651](https://github.com/thanos-io/thanos/pull/6651) *: Update go_grpc_middleware to v2.0.0. Remove Tags Interceptor from Thanos. Tags interceptor is removed from v2.0.0 go-grpc-middleware and is not needed anymore.
### Removed
@ -111,6 +407,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6692](https://github.com/thanos-io/thanos/pull/6692) Store: Fix matching bug when using empty alternative in regex matcher, for example (a||b).
- [#6679](https://github.com/thanos-io/thanos/pull/6697) Store: Fix block deduplication
- [#6706](https://github.com/thanos-io/thanos/pull/6706) Store: Series responses should always be sorted
- [#7286](https://github.com/thanos-io/thanos/pull/7286) Query: Propagate instant query warnings in distributed execution mode.
### Added
@ -198,7 +495,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6342](https://github.com/thanos-io/thanos/pull/6342) Cache/Redis: Upgrade `rueidis` to v1.0.2 to improve error handling while shrinking a redis cluster.
- [#6325](https://github.com/thanos-io/thanos/pull/6325) Store: return gRPC resource exhausted error for byte limiter.
- [#6399](https://github.com/thanos-io/thanos/pull/6399) *: Fix double-counting bug in http_request_duration metric
- [#6428](https://github.com/thanos-io/thanos/pull/6428) Report gRPC connnection errors in the logs.
- [#6428](https://github.com/thanos-io/thanos/pull/6428) Report gRPC connection errors in the logs.
- [#6519](https://github.com/thanos-io/thanos/pull/6519) Reloader: Use timeout for initial apply.
- [#6509](https://github.com/thanos-io/thanos/pull/6509) Store Gateway: Remove `memWriter` from `fileWriter` to reduce memory usage when sync index headers.
- [#6556](https://github.com/thanos-io/thanos/pull/6556) Thanos compact: respect block-files-concurrency setting when downsampling
@ -320,7 +617,7 @@ NOTE: Querier's `query.promql-engine` flag enabling new PromQL engine is now unh
- [#5889](https://github.com/thanos-io/thanos/pull/5889) Query Frontend: Added support for vertical sharding `label_replace` and `label_join` functions.
- [#5865](https://github.com/thanos-io/thanos/pull/5865) Compact: Retry on sync metas error.
- [#5819](https://github.com/thanos-io/thanos/pull/5819) Store: Added a few objectives for Store's data summaries (touched/fetched amount and sizes). They are: 50, 95, and 99 quantiles.
- [#5837](https://github.com/thanos-io/thanos/pull/5837) Store: Added streaming retrival of series from object storage.
- [#5837](https://github.com/thanos-io/thanos/pull/5837) Store: Added streaming retrieval of series from object storage.
- [#5940](https://github.com/thanos-io/thanos/pull/5940) Objstore: Support for authenticating to Swift using application credentials.
- [#5945](https://github.com/thanos-io/thanos/pull/5945) Tools: Added new `no-downsample` marker to skip blocks when downsampling via `thanos tools bucket mark --marker=no-downsample-mark.json`. This will skip downsampling for blocks with the new marker.
- [#5977](https://github.com/thanos-io/thanos/pull/5977) Tools: Added remove flag on bucket mark command to remove deletion, no-downsample or no-compact markers on the block
@ -330,7 +627,7 @@ NOTE: Querier's `query.promql-engine` flag enabling new PromQL engine is now unh
- [#5785](https://github.com/thanos-io/thanos/pull/5785) Query: `thanos_store_nodes_grpc_connections` now trims `external_labels` label names longer than 1000 characters. It also allows customizing which labels to preserve using the `query.conn-metric.label` flag.
- [#5542](https://github.com/thanos-io/thanos/pull/5542) Mixin: Added query concurrency panel to Querier dashboard.
- [#5846](https://github.com/thanos-io/thanos/pull/5846) Query Frontend: vertical query sharding supports subqueries.
- [#5593](https://github.com/thanos-io/thanos/pull/5593) Cache: switch Redis client to [Rueidis](https://github.com/rueian/rueidis). Rueidis is [faster](https://github.com/rueian/rueidis#benchmark-comparison-with-go-redis-v9) and provides [client-side caching](https://redis.io/docs/manual/client-side-caching/). It is highly recommended to use it so that repeated requests for the same key would not be needed.
- [#5593](https://github.com/thanos-io/thanos/pull/5593) Cache: switch Redis client to [Rueidis](https://github.com/rueian/rueidis). Rueidis is [faster](https://github.com/rueian/rueidis#benchmark-comparison-with-go-redis-v9) and provides [client-side caching](https://redis.io/docs/develop/use/client-side-caching/). It is highly recommended to use it so that repeated requests for the same key would not be needed.
- [#5896](https://github.com/thanos-io/thanos/pull/5896) *: Upgrade Prometheus to v0.40.7 without implementing native histogram support. *Querying native histograms will fail with `Error executing query: invalid chunk encoding "<unknown>"` and native histograms in write requests are ignored.*
- [#5909](https://github.com/thanos-io/thanos/pull/5909) Receive: Compact tenant head after no appends have happened for 1.5 `tsdb.max-block-size`.
- [#5838](https://github.com/thanos-io/thanos/pull/5838) Mixin: Added data touched type to Store dashboard.
@ -476,7 +773,7 @@ NOTE: Querier's `query.promql-engine` flag enabling new PromQL engine is now unh
- [#5170](https://github.com/thanos-io/thanos/pull/5170) All: Upgraded the TLS version from TLS1.2 to TLS1.3.
- [#5205](https://github.com/thanos-io/thanos/pull/5205) Rule: Add ruler labels as external labels in stateless ruler mode.
- [#5206](https://github.com/thanos-io/thanos/pull/5206) Cache: Add timeout for groupcache's fetch operation.
- [#5218](https://github.com/thanos-io/thanos/pull/5218) Tools: Thanos tools bucket downsample is now running continously.
- [#5218](https://github.com/thanos-io/thanos/pull/5218) Tools: Thanos tools bucket downsample is now running continuously.
- [#5231](https://github.com/thanos-io/thanos/pull/5231) Tools: Bucket verify tool ignores blocks with deletion markers.
- [#5244](https://github.com/thanos-io/thanos/pull/5244) Query: Promote negative offset and `@` modifier to stable features as per Prometheus [#10121](https://github.com/prometheus/prometheus/pull/10121).
- [#5255](https://github.com/thanos-io/thanos/pull/5255) InfoAPI: Set store API unavailable when stores are not ready.
@ -1236,7 +1533,7 @@ sse_config:
- [#1666](https://github.com/thanos-io/thanos/pull/1666) Compact: `thanos_compact_group_compactions_total` now counts block compactions, so operations that resulted in a compacted block. The old behaviour is now exposed by new metric: `thanos_compact_group_compaction_runs_started_total` and `thanos_compact_group_compaction_runs_completed_total` which counts compaction runs overall.
- [#1748](https://github.com/thanos-io/thanos/pull/1748) Updated all dependencies.
- [#1694](https://github.com/thanos-io/thanos/pull/1694) `prober_ready` and `prober_healthy` metrics are removed, for sake of `status`. Now `status` exposes same metric with a label, `check`. `check` can have "healty" or "ready" depending on status of the probe.
- [#1694](https://github.com/thanos-io/thanos/pull/1694) `prober_ready` and `prober_healthy` metrics are removed, for sake of `status`. Now `status` exposes same metric with a label, `check`. `check` can have "healthy" or "ready" depending on status of the probe.
- [#1790](https://github.com/thanos-io/thanos/pull/1790) Ruler: Fixes subqueries support for ruler.
- [#1769](https://github.com/thanos-io/thanos/pull/1769) & [#1545](https://github.com/thanos-io/thanos/pull/1545) Adjusted most of the metrics histogram buckets.
@ -1468,7 +1765,7 @@ This version moved tarballs to Golang 1.12.5 from 1.11 as well, so same warning
- query:
- [BUGFIX] Make sure subquery range is taken into account for selection #5467
- [ENHANCEMENT] Check for cancellation on every step of a range evaluation. #5131
- [BUGFIX] Exponentation operator to drop metric name in result of operation. #5329
- [BUGFIX] Exponentiation operator to drop metric name in result of operation. #5329
- [BUGFIX] Fix output sample values for scalar-to-vector comparison operations. #5454
- rule:
- [BUGFIX] Reload rules: copy state on both name and labels. #5368

View File

@ -68,7 +68,7 @@ The following section explains various suggestions and procedures to note during
* It is strongly recommended that you use a Linux distribution or macOS for development.
* Running [WSL 2 (on Windows)](https://learn.microsoft.com/en-us/windows/wsl/) is also possible. Note that if during development you run a local Kubernetes cluster and have a Service with `service.spec.sessionAffinity: ClientIP`, it will break things until it's removed[^windows_xt_recent].
* Go 1.21.x or higher.
* Go 1.22.x or higher.
* Docker (to run e2e tests)
* For React UI, you will need a working NodeJS environment and the npm package manager to compile the Web UI assets.
@ -164,7 +164,7 @@ $ git push origin <your_branch_for_new_pr>
**Formatting**
First of all, fall back to `make help` to see all availible commands. There are a few checks that happen when making a PR and these need to pass. We can make sure locally before making the PR by using commands that are related to your changes:
First of all, fall back to `make help` to see all available commands. There are a few checks that happen when making a PR and these need to pass. We can make sure locally before making the PR by using commands that are related to your changes:
- `make docs` generates, formats and cleans up white noise.
- `make changed-docs` does same as above, but just for changed docs by checking `git diff` on which files are changed.
- `make check-docs` generates, formats, cleans up white noise and checks links. Since it can be annoying to wait on link check results (it takes forever), you can use `make docs` to skip the check.

View File

@ -3,13 +3,13 @@ ARG BASE_DOCKER_SHA="14d68ca3d69fceaa6224250c83d81d935c053fb13594c811038c4611945
FROM quay.io/prometheus/busybox@sha256:${BASE_DOCKER_SHA}
LABEL maintainer="The Thanos Authors"
COPY /thanos_tmp_for_docker /bin/thanos
RUN adduser \
-D `#Dont assign a password` \
-H `#Dont create home directory` \
-u 1001 `#User id`\
thanos && \
chown thanos /bin/thanos
thanos
COPY --chown=thanos /thanos_tmp_for_docker /bin/thanos
USER 1001
ENTRYPOINT [ "/bin/thanos" ]

View File

@ -1,14 +1,14 @@
# Taking a non-alpine image for e2e tests so that cgo can be enabled for the race detector.
FROM golang:1.21 as builder
FROM golang:1.25.0 as builder
WORKDIR $GOPATH/src/github.com/thanos-io/thanos
COPY . $GOPATH/src/github.com/thanos-io/thanos
RUN CGO_ENABLED=1 go build -o $GOBIN/thanos -race ./cmd/thanos
RUN CGO_ENABLED=1 go build -tags slicelabels -o $GOBIN/thanos -race ./cmd/thanos
# -----------------------------------------------------------------------------
FROM golang:1.21
FROM golang:1.25.0
LABEL maintainer="The Thanos Authors"
COPY --from=builder $GOBIN/thanos /bin/thanos

View File

@ -1,6 +1,6 @@
# By default we pin to amd64 sha. Use make docker to automatically adjust for arm64 versions.
ARG BASE_DOCKER_SHA="14d68ca3d69fceaa6224250c83d81d935c053fb13594c811038c461194599973"
FROM golang:1.21-alpine3.18 as builder
FROM golang:1.24.0-alpine3.20 as builder
WORKDIR $GOPATH/src/github.com/thanos-io/thanos
# Change in the docker context invalidates the cache so to leverage docker

View File

@ -5,15 +5,12 @@
| Bartłomiej Płotka | bwplotka@gmail.com | `@bwplotka` | [@bwplotka](https://github.com/bwplotka) | Google |
| Frederic Branczyk | fbranczyk@gmail.com | `@brancz` | [@brancz](https://github.com/brancz) | Polar Signals |
| Giedrius Statkevičius | giedriuswork@gmail.com | `@Giedrius Statkevičius` | [@GiedriusS](https://github.com/GiedriusS) | Vinted |
| Kemal Akkoyun | kakkoyun@gmail.com | `@kakkoyun` | [@kakkoyun](https://github.com/kakkoyun) | Polar Signals |
| Lucas Servén Marín | lserven@gmail.com | `@squat` | [@squat](https://github.com/squat) | Red Hat |
| Prem Saraswat | prmsrswt@gmail.com | `@Prem Saraswat` | [@onprem](https://github.com/onprem) | Red Hat |
| Matthias Loibl | mail@matthiasloibl.com | `@metalmatze` | [@metalmatze](https://github.com/metalmatze) | Polar Signals |
| Ben Ye | yb532204897@gmail.com | `@yeya24` | [@yeya24](https://github.com/yeya24) | Amazon Web Services |
| Matej Gera | matejgera@gmail.com | `@Matej Gera` | [@matej-g](https://github.com/matej-g) | Coralogix |
| Filip Petkovski | filip.petkovsky@gmail.com | `@Filip Petkovski` | [@fpetkovski](https://github.com/fpetkovski) | Shopify |
| Saswata Mukherjee | saswata.mukhe@gmail.com | `@saswatamcode` | [@saswatamcode](https://github.com/saswatamcode) | Red Hat |
| Michael Hoffmann | mhoffm@posteo.de | `@Michael Hoffmann` | [@MichaHoffmann](https://github.com/MichaHoffmann) | Aiven |
| Michael Hoffmann | mhoffm@posteo.de | `@Michael Hoffmann` | [@MichaHoffmann](https://github.com/MichaHoffmann) | Cloudflare |
We are a bunch of people from different companies with various interests and skills. We are from different parts of the world: Germany, Holland, Lithuania, US, UK and India. We have something in common though: We all share the love for OpenSource, Go, Prometheus, :coffee: and Observability topics.
@ -31,15 +28,17 @@ We also have some nice souls that help triaging issues and PRs. See [here](https
Full list of triage persons is displayed below:
| Name | Slack | GitHub | Company |
|----------------|------------------|----------------------------------------------------|---------|
| Adrien Fillon | `@Adrien F` | [@adrien-f](https://github.com/adrien-f) | |
| Ian Billett | `@billett` | [@bill3tt](https://github.com/bill3tt) | Red Hat |
| Martin Chodur | `@FUSAKLA` | [@fusakla](https://github.com/fusakla) | |
| Michael Dai | `@jojohappy` | [@jojohappy](https://github.com/jojohappy) | |
| Xiang Dai | `@daixiang0` | [@daixiang0](https://github.com/daixiang0) | |
| Jimmie Han | `@hanjm` | [@hanjm](https://github.com/hanjm) | Tencent |
| Douglas Camata | `@douglascamata` | [@douglascamata](https://github.com/douglascamata) | Red Hat |
| Name | Slack | GitHub | Company |
|----------------|------------------|----------------------------------------------------|---------------------|
| Adrien Fillon | `@Adrien F` | [@adrien-f](https://github.com/adrien-f) | |
| Ian Billett | `@billett` | [@bill3tt](https://github.com/bill3tt) | Red Hat |
| Martin Chodur | `@FUSAKLA` | [@fusakla](https://github.com/fusakla) | |
| Michael Dai | `@jojohappy` | [@jojohappy](https://github.com/jojohappy) | |
| Xiang Dai | `@daixiang0` | [@daixiang0](https://github.com/daixiang0) | |
| Jimmie Han | `@hanjm` | [@hanjm](https://github.com/hanjm) | Tencent |
| Douglas Camata | `@douglascamata` | [@douglascamata](https://github.com/douglascamata) | Red Hat |
| Harry John | `@harry671003` | [@harry671003](https://github.com/harry671003) | Amazon Web Services |
| Pedro Tanaka | `@pedro.tanaka` | [@pedro-stanaka](https://github.com/pedro-stanaka) | Shopify |
Please reach any of the maintainer on slack or email if you want to help as well.
@ -61,7 +60,7 @@ This helps to also estimate how long it can potentially take to review the PR or
#### Help wanted
`help wanted ` label should be present if the issue is not really assigned (or the PR has to be reviewed) and we are looking for the volunteers (:
`help wanted` label should be present if the issue is not really assigned (or the PR has to be reviewed) and we are looking for the volunteers (:
#### Good first issue
@ -103,8 +102,15 @@ In time we plan to set up maintainers team that will be organization independent
## Initial authors
Fabian Reinartz @fabxc and Bartłomiej Płotka @bwplotka
Fabian Reinartz [@fabxc](https://github.com/fabxc) and Bartłomiej Płotka [@bwplotka](https://github.com/bwplotka)
## Previous Maintainers
## Emeritus Maintainers
Dominic Green, Povilas Versockas, Marco Pracucci
| Name | GitHub |
|-------------------|----------------------------------------------|
| Dominic Green | [@domgreen](https://github.com/domgreen) |
| Povilas Versockas | [@povilasv](https://github.com/povilasv) |
| Marco Pracucci | [@pracucci](https://github.com/pracucci) |
| Matthias Loibl | [@metalmatze](https://github.com/metalmatze) |
| Kemal Akkoyun | [@kakkoyun](https://github.com/kakkoyun) |
| Matej Gera | [@matej-g](https://github.com/matej-g) |

View File

@ -121,6 +121,10 @@ $(REACT_APP_OUTPUT_DIR): $(REACT_APP_NODE_MODULES_PATH) $(REACT_APP_SOURCE_FILES
.PHONY: react-app
react-app: $(REACT_APP_OUTPUT_DIR)
.PHONY: check-react-app
check-react-app: react-app
$(call require_clean_work_tree,'all generated files should be committed, run make react-app and commit changes.')
.PHONY: react-app-lint
react-app-lint: $(REACT_APP_NODE_MODULES_PATH)
@echo ">> running React app linting"
@ -145,7 +149,7 @@ react-app-start: $(REACT_APP_NODE_MODULES_PATH)
build: ## Builds Thanos binary using `promu`.
build: check-git deps $(PROMU)
@echo ">> building Thanos binary in $(PREFIX)"
@$(PROMU) build --prefix $(PREFIX)
@$(PROMU) build -v --prefix $(PREFIX)
GIT_BRANCH=$(shell $(GIT) rev-parse --abbrev-ref HEAD)
.PHONY: crossbuild
@ -291,6 +295,13 @@ proto: ## Generates Go files from Thanos proto files.
proto: check-git $(GOIMPORTS) $(PROTOC) $(PROTOC_GEN_GOGOFAST)
@GOIMPORTS_BIN="$(GOIMPORTS)" PROTOC_BIN="$(PROTOC)" PROTOC_GEN_GOGOFAST_BIN="$(PROTOC_GEN_GOGOFAST)" PROTOC_VERSION="$(PROTOC_VERSION)" scripts/genproto.sh
.PHONY: capnp
capnp: ## Generates Go files from Thanos capnproto files.
capnp: check-git
capnp compile -I $(shell go list -m -f '{{.Dir}}' capnproto.org/go/capnp/v3)/std -ogo pkg/receive/writecapnp/write_request.capnp
@$(GOIMPORTS) -w pkg/receive/writecapnp/write_request.capnp.go
go run ./scripts/copyright
.PHONY: tarballs-release
tarballs-release: ## Build tarballs.
tarballs-release: $(PROMU)
@ -308,7 +319,7 @@ test: export THANOS_TEST_ALERTMANAGER_PATH= $(ALERTMANAGER)
test: check-git install-tool-deps
@echo ">> install thanos GOOPTS=${GOOPTS}"
@echo ">> running unit tests (without /test/e2e). Do export THANOS_TEST_OBJSTORE_SKIP=GCS,S3,AZURE,SWIFT,COS,ALIYUNOSS,BOS,OCI,OBS if you want to skip e2e tests against all real store buckets. Current value: ${THANOS_TEST_OBJSTORE_SKIP}"
@go test -race -timeout 15m $(shell go list ./... | grep -v /vendor/ | grep -v /test/e2e);
@go test -tags slicelabels -race -timeout 15m $(shell go list ./... | grep -v /vendor/ | grep -v /test/e2e);
.PHONY: test-local
test-local: ## Runs test excluding tests for ALL object storage integrations.
@ -329,7 +340,11 @@ test-e2e: docker-e2e $(GOTESPLIT)
# NOTE(GiedriusS):
# * If you want to limit CPU time available in e2e tests then pass E2E_DOCKER_CPUS environment variable. For example, E2E_DOCKER_CPUS=0.05 limits CPU time available
# to spawned Docker containers to 0.05 cores.
@$(GOTESPLIT) -total ${GH_PARALLEL} -index ${GH_INDEX} ./test/e2e/... -- ${GOTEST_OPTS}
@if [ -n "$(SINGLE_E2E_TEST)" ]; then \
$(GOTESPLIT) -total ${GH_PARALLEL} -index ${GH_INDEX} ./test/e2e -- -tags slicelabels -run $(SINGLE_E2E_TEST) ${GOTEST_OPTS}; \
else \
$(GOTESPLIT) -total ${GH_PARALLEL} -index ${GH_INDEX} ./test/e2e/... -- -tags slicelabels ${GOTEST_OPTS}; \
fi
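# Illustrative usage of SINGLE_E2E_TEST (the test name below is hypothetical; any
# pattern accepted by `go test -run` works):
#   make test-e2e SINGLE_E2E_TEST=TestQueryNativeHistograms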
.PHONY: test-e2e-local
test-e2e-local: ## Runs all thanos e2e tests locally.
@ -391,8 +406,7 @@ go-lint: check-git deps $(GOLANGCI_LINT) $(FAILLINT)
$(call require_clean_work_tree,'detected not clean work tree before running lint, previous job changed something?')
@echo ">> verifying modules being imported"
@# TODO(bwplotka): Add, Printf, DefaultRegisterer, NewGaugeFunc and MustRegister once exception are accepted. Add fmt.{Errorf}=github.com/pkg/errors.{Errorf} once https://github.com/fatih/faillint/issues/10 is addressed.
@$(FAILLINT) -paths "errors=github.com/pkg/errors,\
github.com/prometheus/tsdb=github.com/prometheus/prometheus/tsdb,\
@$(FAILLINT) -paths "github.com/prometheus/tsdb=github.com/prometheus/prometheus/tsdb,\
github.com/prometheus/prometheus/pkg/testutils=github.com/thanos-io/thanos/pkg/testutil,\
github.com/prometheus/client_golang/prometheus.{DefaultGatherer,DefBuckets,NewUntypedFunc,UntypedFunc},\
github.com/prometheus/client_golang/prometheus.{NewCounter,NewCounterVec,NewCounterVec,NewGauge,NewGaugeVec,NewGaugeFunc,\
@ -400,10 +414,11 @@ NewHistorgram,NewHistogramVec,NewSummary,NewSummaryVec}=github.com/prometheus/cl
NewCounterVec,NewCounterVec,NewGauge,NewGaugeVec,NewGaugeFunc,NewHistorgram,NewHistogramVec,NewSummary,NewSummaryVec},\
github.com/NYTimes/gziphandler.{GzipHandler}=github.com/klauspost/compress/gzhttp.{GzipHandler},\
sync/atomic=go.uber.org/atomic,github.com/cortexproject/cortex=github.com/thanos-io/thanos/internal/cortex,\
github.com/prometheus/prometheus/promql/parser.{ParseExpr,ParseMetricSelector}=github.com/thanos-io/thanos/pkg/extpromql.{ParseExpr,ParseMetricSelector},\
io/ioutil.{Discard,NopCloser,ReadAll,ReadDir,ReadFile,TempDir,TempFile,Writefile}" $(shell go list ./... | grep -v "internal/cortex")
@$(FAILLINT) -paths "fmt.{Print,Println,Sprint}" -ignore-tests ./...
@echo ">> linting all of the Go files GOGC=${GOGC}"
@$(GOLANGCI_LINT) run
@$(GOLANGCI_LINT) run --build-tags=slicelabels
@echo ">> ensuring Copyright headers"
@go run ./scripts/copyright
@echo ">> ensuring generated proto files are up to date"

View File

@ -4,7 +4,7 @@
[![CI](https://github.com/thanos-io/thanos/workflows/CI/badge.svg)](https://github.com/thanos-io/thanos/actions?query=workflow%3ACI) [![CI](https://circleci.com/gh/thanos-io/thanos.svg?style=svg)](https://circleci.com/gh/thanos-io/thanos) [![go](https://github.com/thanos-io/thanos/workflows/go/badge.svg)](https://github.com/thanos-io/thanos/actions?query=workflow%3Ago) [![react](https://github.com/thanos-io/thanos/workflows/react/badge.svg)](https://github.com/thanos-io/thanos/actions?query=workflow%3Areact) [![docs](https://github.com/thanos-io/thanos/workflows/docs/badge.svg)](https://github.com/thanos-io/thanos/actions?query=workflow%3Adocs) [![Gitpod ready-to-code](https://img.shields.io/badge/Gitpod-ready--to--code-blue?logo=gitpod)](https://gitpod.io/#https://github.com/thanos-io/thanos) [![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=109162639)
> 📢 [ThanosCon](https://thanos.io/blog/2023-20-11-thanoscon/) is happening on 19th March as a co-located half-day on KubeCon EU in Paris. Join us there! 🤗 CFP is open until 3rd December!
> 📢 [ThanosCon](https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/co-located-events/thanoscon/) happened on 19th March 2024 as a co-located half-day on KubeCon EU in Paris.
## Overview

View File

@ -1 +1 @@
0.34.0-rc.0
0.40.0-dev

View File

@ -42,10 +42,12 @@ import (
"github.com/thanos-io/thanos/pkg/extprom"
extpromhttp "github.com/thanos-io/thanos/pkg/extprom/http"
"github.com/thanos-io/thanos/pkg/logging"
"github.com/thanos-io/thanos/pkg/logutil"
"github.com/thanos-io/thanos/pkg/prober"
"github.com/thanos-io/thanos/pkg/runutil"
httpserver "github.com/thanos-io/thanos/pkg/server/http"
"github.com/thanos-io/thanos/pkg/store"
"github.com/thanos-io/thanos/pkg/strutil"
"github.com/thanos-io/thanos/pkg/tracing"
"github.com/thanos-io/thanos/pkg/ui"
)
@ -205,7 +207,7 @@ func runCompact(
return err
}
bkt, err := client.NewBucket(logger, confContentYaml, component.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.String(), nil)
if err != nil {
return err
}
@ -239,17 +241,26 @@ func runCompact(
consistencyDelayMetaFilter := block.NewConsistencyDelayMetaFilter(logger, conf.consistencyDelay, extprom.WrapRegistererWithPrefix("thanos_", reg))
timePartitionMetaFilter := block.NewTimePartitionMetaFilter(conf.filterConf.MinTime, conf.filterConf.MaxTime)
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, insBkt)
baseMetaFetcher, err := block.NewBaseFetcher(logger, conf.blockMetaFetchConcurrency, insBkt, baseBlockIDsFetcher, conf.dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg))
var blockLister block.Lister
switch syncStrategy(conf.blockListStrategy) {
case concurrentDiscovery:
blockLister = block.NewConcurrentLister(logger, insBkt)
case recursiveDiscovery:
blockLister = block.NewRecursiveLister(logger, insBkt)
default:
return errors.Errorf("unknown sync strategy %s", conf.blockListStrategy)
}
baseMetaFetcher, err := block.NewBaseFetcher(logger, conf.blockMetaFetchConcurrency, insBkt, blockLister, conf.dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg))
if err != nil {
return errors.Wrap(err, "create meta fetcher")
}
enableVerticalCompaction := conf.enableVerticalCompaction
if len(conf.dedupReplicaLabels) > 0 {
dedupReplicaLabels := strutil.ParseFlagLabels(conf.dedupReplicaLabels)
if len(dedupReplicaLabels) > 0 {
enableVerticalCompaction = true
level.Info(logger).Log(
"msg", "deduplication.replica-label specified, enabling vertical compaction", "dedupReplicaLabels", strings.Join(conf.dedupReplicaLabels, ","),
"msg", "deduplication.replica-label specified, enabling vertical compaction", "dedupReplicaLabels", strings.Join(dedupReplicaLabels, ","),
)
}
if enableVerticalCompaction {
@ -267,7 +278,7 @@ func runCompact(
labelShardedMetaFilter,
consistencyDelayMetaFilter,
ignoreDeletionMarkFilter,
block.NewReplicaLabelRemover(logger, conf.dedupReplicaLabels),
block.NewReplicaLabelRemover(logger, dedupReplicaLabels),
duplicateBlocksFilter,
noCompactMarkerFilter,
}
@ -280,6 +291,11 @@ func runCompact(
cf.UpdateOnChange(func(blocks []metadata.Meta, err error) {
api.SetLoaded(blocks, err)
})
var syncMetasTimeout = conf.waitInterval
if !conf.wait {
syncMetasTimeout = 0
}
sy, err = compact.NewMetaSyncer(
logger,
reg,
@ -289,6 +305,7 @@ func runCompact(
ignoreDeletionMarkFilter,
compactMetrics.blocksMarked.WithLabelValues(metadata.DeletionMarkFilename, ""),
compactMetrics.garbageCollectedBlocks,
syncMetasTimeout,
)
if err != nil {
return errors.Wrap(err, "create syncer")
@ -318,7 +335,7 @@ func runCompact(
case compact.DedupAlgorithmPenalty:
mergeFunc = dedup.NewChunkSeriesMerger()
if len(conf.dedupReplicaLabels) == 0 {
if len(dedupReplicaLabels) == 0 {
return errors.New("penalty based deduplication needs at least one replica label specified")
}
case "":
@ -330,7 +347,7 @@ func runCompact(
// Instantiate the compactor with different time slices. Timestamps in TSDB
// are in milliseconds.
comp, err := tsdb.NewLeveledCompactor(ctx, reg, logger, levels, downsample.NewPool(), mergeFunc)
comp, err := tsdb.NewLeveledCompactor(ctx, reg, logutil.GoKitLogToSlog(logger), levels, downsample.NewPool(), mergeFunc)
if err != nil {
return errors.Wrap(err, "create compactor")
}
@ -361,13 +378,20 @@ func runCompact(
conf.blockFilesConcurrency,
conf.compactBlocksFetchConcurrency,
)
var planner compact.Planner
tsdbPlanner := compact.NewPlanner(logger, levels, noCompactMarkerFilter)
planner := compact.WithLargeTotalIndexSizeFilter(
largeIndexFilterPlanner := compact.WithLargeTotalIndexSizeFilter(
tsdbPlanner,
insBkt,
int64(conf.maxBlockIndexSize),
compactMetrics.blocksMarked.WithLabelValues(metadata.NoCompactMarkFilename, metadata.IndexSizeExceedingNoCompactReason),
)
if enableVerticalCompaction {
planner = compact.WithVerticalCompactionDownsampleFilter(largeIndexFilterPlanner, insBkt, compactMetrics.blocksMarked.WithLabelValues(metadata.NoCompactMarkFilename, metadata.DownsampleVerticalCompactionNoCompactReason))
} else {
planner = largeIndexFilterPlanner
}
blocksCleaner := compact.NewBlocksCleaner(logger, insBkt, ignoreDeletionMarkFilter, deleteDelay, compactMetrics.blocksCleaned, compactMetrics.blockCleanupFailures)
compactor, err := compact.NewBucketCompactor(
logger,
@ -379,6 +403,7 @@ func runCompact(
insBkt,
conf.compactionConcurrency,
conf.skipBlockWithOutOfOrderChunks,
blocksCleaner,
)
if err != nil {
return errors.Wrap(err, "create bucket compactor")
@ -414,14 +439,7 @@ func runCompact(
cleanMtx.Lock()
defer cleanMtx.Unlock()
if err := sy.SyncMetas(ctx); err != nil {
return errors.Wrap(err, "syncing metas")
}
compact.BestEffortCleanAbortedPartialUploads(ctx, logger, sy.Partial(), insBkt, compactMetrics.partialUploadDeleteAttempts, compactMetrics.blocksCleaned, compactMetrics.blockCleanupFailures)
if err := blocksCleaner.DeleteMarkedBlocks(ctx); err != nil {
return errors.Wrap(err, "cleaning marked blocks")
}
compact.BestEffortCleanAbortedPartialUploads(ctx, logger, sy.Partial(), insBkt, compactMetrics.partialUploadDeleteAttempts, compactMetrics.blocksCleaned, compactMetrics.blockCleanupFailures, ignoreDeletionMarkFilter.DeletionMarkBlocks())
compactMetrics.cleanups.Inc()
return nil
@ -448,9 +466,9 @@ func runCompact(
}
for _, meta := range filteredMetas {
groupKey := meta.Thanos.GroupKey()
downsampleMetrics.downsamples.WithLabelValues(groupKey)
downsampleMetrics.downsampleFailures.WithLabelValues(groupKey)
resolutionLabel := meta.Thanos.ResolutionString()
downsampleMetrics.downsamples.WithLabelValues(resolutionLabel)
downsampleMetrics.downsampleFailures.WithLabelValues(resolutionLabel)
}
if err := downsampleBucket(
@ -473,6 +491,14 @@ func runCompact(
return errors.Wrap(err, "sync before second pass of downsampling")
}
// Regenerate the filtered list of blocks after the sync,
// to include the blocks created by the first pass.
filteredMetas = sy.Metas()
noDownsampleBlocks = noDownsampleMarkerFilter.NoDownsampleMarkedBlocks()
for ul := range noDownsampleBlocks {
delete(filteredMetas, ul)
}
if err := downsampleBucket(
ctx,
logger,
@ -693,6 +719,7 @@ type compactConfig struct {
wait bool
waitInterval time.Duration
disableDownsampling bool
blockListStrategy string
blockMetaFetchConcurrency int
blockFilesConcurrency int
blockViewerSyncBlockInterval time.Duration
@ -754,6 +781,9 @@ func (cc *compactConfig) registerFlag(cmd extkingpin.FlagClause) {
"as querying long time ranges without non-downsampled data is not efficient and useful e.g it is not possible to render all samples for a human eye anyway").
Default("false").BoolVar(&cc.disableDownsampling)
strategies := strings.Join([]string{string(concurrentDiscovery), string(recursiveDiscovery)}, ", ")
cmd.Flag("block-discovery-strategy", "One of "+strategies+". When set to concurrent, stores will concurrently issue one call per directory to discover active blocks in the bucket. The recursive strategy iterates through all objects in the bucket, recursively traversing into each directory. This avoids N+1 calls at the expense of having slower bucket iterations.").
Default(string(concurrentDiscovery)).StringVar(&cc.blockListStrategy)
cmd.Flag("block-meta-fetch-concurrency", "Number of goroutines to use when fetching block metadata from object storage.").
Default("32").IntVar(&cc.blockMetaFetchConcurrency)
cmd.Flag("block-files-concurrency", "Number of goroutines to use when fetching/uploading block files from object storage.").
@ -791,8 +821,9 @@ func (cc *compactConfig) registerFlag(cmd extkingpin.FlagClause) {
"When set to penalty, penalty based deduplication algorithm will be used. At least one replica label has to be set via --deduplication.replica-label flag.").
Default("").EnumVar(&cc.dedupFunc, compact.DedupAlgorithmPenalty, "")
cmd.Flag("deduplication.replica-label", "Label to treat as a replica indicator of blocks that can be deduplicated (repeated flag). This will merge multiple replica blocks into one. This process is irreversible."+
"Experimental. When one or more labels are set, compactor will ignore the given labels so that vertical compaction can merge the blocks."+
cmd.Flag("deduplication.replica-label", "Experimental. Label to treat as a replica indicator of blocks that can be deduplicated (repeated flag). This will merge multiple replica blocks into one. This process is irreversible. "+
"Flag may be specified multiple times as well as a comma separated list of labels. "+
"When one or more labels are set, compactor will ignore the given labels so that vertical compaction can merge the blocks."+
"Please note that by default this uses a NAIVE algorithm for merging which works well for deduplication of blocks with **precisely the same samples** like produced by Receiver replication."+
"If you need a different deduplication algorithm (e.g one that works well with Prometheus replicas), please set it via --deduplication.func.").
StringsVar(&cc.dedupReplicaLabels)
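// Illustrative usage (a sketch; the label names are assumptions): the flag may be
// repeated or given as a comma-separated list, e.g.
//   --deduplication.replica-label=replica
//   --deduplication.replica-label=replica,rule_replica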

View File

@ -8,17 +8,23 @@ package main
import (
"net/url"
"sort"
"strconv"
"strings"
"time"
"github.com/KimMachineGun/automemlimit/memlimit"
extflag "github.com/efficientgo/tools/extkingpin"
"github.com/go-kit/log"
"github.com/opentracing/opentracing-go"
"github.com/pkg/errors"
"google.golang.org/grpc"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/common/model"
"github.com/prometheus/prometheus/model/labels"
"github.com/thanos-io/thanos/pkg/extgrpc"
"github.com/thanos-io/thanos/pkg/extgrpc/snappy"
"github.com/thanos-io/thanos/pkg/extkingpin"
"github.com/thanos-io/thanos/pkg/shipper"
)
@ -28,6 +34,7 @@ type grpcConfig struct {
tlsSrvCert string
tlsSrvKey string
tlsSrvClientCA string
tlsMinVersion string
gracePeriod time.Duration
maxConnectionAge time.Duration
}
@ -45,6 +52,9 @@ func (gc *grpcConfig) registerFlag(cmd extkingpin.FlagClause) *grpcConfig {
cmd.Flag("grpc-server-tls-client-ca",
"TLS CA to verify clients against. If no client CA is specified, there is no client verification on server side. (tls.NoClientCert)").
Default("").StringVar(&gc.tlsSrvClientCA)
cmd.Flag("grpc-server-tls-min-version",
"TLS supported minimum version for gRPC server. If no version is specified, it'll default to 1.3. Allowed values: [\"1.0\", \"1.1\", \"1.2\", \"1.3\"]").
Default("1.3").StringVar(&gc.tlsMinVersion)
cmd.Flag("grpc-server-max-connection-age", "The grpc server max connection age. This controls how often to re-establish connections and redo TLS handshakes.").
Default("60m").DurationVar(&gc.maxConnectionAge)
cmd.Flag("grpc-grace-period",
@ -54,6 +64,38 @@ func (gc *grpcConfig) registerFlag(cmd extkingpin.FlagClause) *grpcConfig {
return gc
}
type grpcClientConfig struct {
secure bool
skipVerify bool
cert, key, caCert string
serverName string
compression string
}
func (gc *grpcClientConfig) registerFlag(cmd extkingpin.FlagClause) *grpcClientConfig {
cmd.Flag("grpc-client-tls-secure", "Use TLS when talking to the gRPC server").Default("false").BoolVar(&gc.secure)
cmd.Flag("grpc-client-tls-skip-verify", "Disable TLS certificate verification i.e self signed, signed by fake CA").Default("false").BoolVar(&gc.skipVerify)
cmd.Flag("grpc-client-tls-cert", "TLS Certificates to use to identify this client to the server").Default("").StringVar(&gc.cert)
cmd.Flag("grpc-client-tls-key", "TLS Key for the client's certificate").Default("").StringVar(&gc.key)
cmd.Flag("grpc-client-tls-ca", "TLS CA Certificates to use to verify gRPC servers").Default("").StringVar(&gc.caCert)
cmd.Flag("grpc-client-server-name", "Server name to verify the hostname on the returned gRPC certificates. See https://tools.ietf.org/html/rfc4366#section-3.1").Default("").StringVar(&gc.serverName)
compressionOptions := strings.Join([]string{snappy.Name, compressionNone}, ", ")
cmd.Flag("grpc-compression", "Compression algorithm to use for gRPC requests to other clients. Must be one of: "+compressionOptions).Default(compressionNone).EnumVar(&gc.compression, snappy.Name, compressionNone)
return gc
}
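// Illustrative invocation (a sketch; the CA path is an assumption):
//   --grpc-client-tls-secure --grpc-client-tls-ca=/etc/thanos/ca.crt --grpc-compression=snappy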
func (gc *grpcClientConfig) dialOptions(logger log.Logger, reg prometheus.Registerer, tracer opentracing.Tracer) ([]grpc.DialOption, error) {
dialOpts, err := extgrpc.StoreClientGRPCOpts(logger, reg, tracer, gc.secure, gc.skipVerify, gc.cert, gc.key, gc.caCert, gc.serverName)
if err != nil {
return nil, errors.Wrapf(err, "building gRPC client")
}
if gc.compression != compressionNone {
dialOpts = append(dialOpts, grpc.WithDefaultCallOptions(grpc.UseCompressor(gc.compression)))
}
return dialOpts, nil
}
type httpConfig struct {
bindAddress string
tlsConfig string
@ -94,7 +136,7 @@ func (pc *prometheusConfig) registerFlag(cmd extkingpin.FlagClause) *prometheusC
Default("30s").DurationVar(&pc.getConfigInterval)
cmd.Flag("prometheus.get_config_timeout",
"Timeout for getting Prometheus config").
Default("5s").DurationVar(&pc.getConfigTimeout)
Default("30s").DurationVar(&pc.getConfigTimeout)
pc.httpClient = extflag.RegisterPathOrContent(
cmd,
"prometheus.http-client",
@ -161,6 +203,7 @@ type shipperConfig struct {
uploadCompacted bool
ignoreBlockSize bool
allowOutOfOrderUpload bool
skipCorruptedBlocks bool
hashFunc string
metaFileName string
}
@ -177,6 +220,11 @@ func (sc *shipperConfig) registerFlag(cmd extkingpin.FlagClause) *shipperConfig
"This can trigger compaction without those blocks and as a result will create an overlap situation. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
"about order.").
Default("false").Hidden().BoolVar(&sc.allowOutOfOrderUpload)
cmd.Flag("shipper.skip-corrupted-blocks",
"If true, shipper will skip corrupted blocks in the given iteration and retry later. This means that some newer blocks might be uploaded sooner than older blocks."+
"This can trigger compaction without those blocks and as a result will create an overlap situation. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
"about order.").
Default("false").Hidden().BoolVar(&sc.skipCorruptedBlocks)
cmd.Flag("hash-func", "Specify which hash function to use when calculating the hashes of produced files. If no function has been specified, it does not happen. This permits avoiding downloading some files twice albeit at some performance cost. Possible values are: \"\", \"SHA256\".").
Default("").EnumVar(&sc.hashFunc, "SHA256", "")
cmd.Flag("shipper.meta-file-name", "the file to store shipper metadata in").Default(shipper.DefaultMetaFilename).StringVar(&sc.metaFileName)
@ -265,21 +313,60 @@ func (ac *alertMgrConfig) registerFlag(cmd extflag.FlagClause) *alertMgrConfig {
}
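// parseFlagLabels parses repeated name="value" pairs into a sorted label set.
// Note that the value part must be a quoted string, since it is passed through
// strconv.Unquote, e.g. cluster="eu-1".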
func parseFlagLabels(s []string) (labels.Labels, error) {
var lset labels.Labels
var lset labels.ScratchBuilder
for _, l := range s {
parts := strings.SplitN(l, "=", 2)
if len(parts) != 2 {
return nil, errors.Errorf("unrecognized label %q", l)
return labels.EmptyLabels(), errors.Errorf("unrecognized label %q", l)
}
if !model.LabelName.IsValid(model.LabelName(parts[0])) {
return nil, errors.Errorf("unsupported format for label %s", l)
return labels.EmptyLabels(), errors.Errorf("unsupported format for label %s", l)
}
val, err := strconv.Unquote(parts[1])
if err != nil {
return nil, errors.Wrap(err, "unquote label value")
return labels.EmptyLabels(), errors.Wrap(err, "unquote label value")
}
lset = append(lset, labels.Label{Name: parts[0], Value: val})
lset.Add(parts[0], val)
}
sort.Sort(lset)
return lset, nil
lset.Sort()
return lset.Labels(), nil
}
type goMemLimitConfig struct {
enableAutoGoMemlimit bool
memlimitRatio float64
}
func (gml *goMemLimitConfig) registerFlag(cmd extkingpin.FlagClause) *goMemLimitConfig {
cmd.Flag("enable-auto-gomemlimit",
"Enable go runtime to automatically limit memory consumption.").
Default("false").BoolVar(&gml.enableAutoGoMemlimit)
cmd.Flag("auto-gomemlimit.ratio",
"The ratio of reserved GOMEMLIMIT memory to the detected maximum container or system memory.").
Default("0.9").FloatVar(&gml.memlimitRatio)
return gml
}
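// Illustrative usage (a sketch): reserve 80% of the detected container or system
// memory for the Go runtime:
//   --enable-auto-gomemlimit --auto-gomemlimit.ratio=0.8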
func configureGoAutoMemLimit(common goMemLimitConfig) error {
if common.memlimitRatio <= 0.0 || common.memlimitRatio > 1.0 {
return errors.New("--auto-gomemlimit.ratio must be greater than 0 and less than or equal to 1.")
}
if common.enableAutoGoMemlimit {
if _, err := memlimit.SetGoMemLimitWithOpts(
memlimit.WithRatio(common.memlimitRatio),
memlimit.WithProvider(
memlimit.ApplyFallback(
memlimit.FromCgroup,
memlimit.FromSystem,
),
),
); err != nil {
return errors.Wrap(err, "Failed to set GOMEMLIMIT automatically")
}
}
return nil
}

View File

@ -15,7 +15,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/run"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
@ -28,10 +29,12 @@ import (
"github.com/thanos-io/thanos/pkg/block"
"github.com/thanos-io/thanos/pkg/block/metadata"
"github.com/thanos-io/thanos/pkg/compact"
"github.com/thanos-io/thanos/pkg/compact/downsample"
"github.com/thanos-io/thanos/pkg/component"
"github.com/thanos-io/thanos/pkg/errutil"
"github.com/thanos-io/thanos/pkg/extprom"
"github.com/thanos-io/thanos/pkg/logutil"
"github.com/thanos-io/thanos/pkg/prober"
"github.com/thanos-io/thanos/pkg/runutil"
httpserver "github.com/thanos-io/thanos/pkg/server/http"
@ -49,16 +52,16 @@ func newDownsampleMetrics(reg *prometheus.Registry) *DownsampleMetrics {
m.downsamples = promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
Name: "thanos_compact_downsample_total",
Help: "Total number of downsampling attempts.",
}, []string{"group"})
}, []string{"resolution"})
m.downsampleFailures = promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
Name: "thanos_compact_downsample_failures_total",
Help: "Total number of failed downsampling attempts.",
}, []string{"group"})
}, []string{"resolution"})
m.downsampleDuration = promauto.With(reg).NewHistogramVec(prometheus.HistogramOpts{
Name: "thanos_compact_downsample_duration_seconds",
Help: "Duration of downsample runs",
Buckets: []float64{60, 300, 900, 1800, 3600, 7200, 14400}, // 1m, 5m, 15m, 30m, 60m, 120m, 240m
}, []string{"group"})
}, []string{"resolution"})
return m
}
@ -83,14 +86,14 @@ func RunDownsample(
return err
}
bkt, err := client.NewBucket(logger, confContentYaml, component.Downsample.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Downsample.String(), nil)
if err != nil {
return err
}
insBkt := objstoretracing.WrapWithTraces(objstore.WrapWithMetrics(bkt, extprom.WrapRegistererWithPrefix("thanos_", reg), bkt.Name()))
// While fetching blocks, filter out blocks that were marked for no downsample.
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, insBkt)
baseBlockIDsFetcher := block.NewConcurrentLister(logger, insBkt)
metaFetcher, err := block.NewMetaFetcher(logger, block.FetcherConcurrency, insBkt, baseBlockIDsFetcher, "", extprom.WrapRegistererWithPrefix("thanos_", reg), []block.MetadataFilter{
block.NewDeduplicateFilter(block.FetcherConcurrency),
downsample.NewGatherNoDownsampleMarkFilter(logger, insBkt, block.FetcherConcurrency),
@ -129,9 +132,9 @@ func RunDownsample(
}
for _, meta := range metas {
groupKey := meta.Thanos.GroupKey()
metrics.downsamples.WithLabelValues(groupKey)
metrics.downsampleFailures.WithLabelValues(groupKey)
resolutionLabel := meta.Thanos.ResolutionString()
metrics.downsamples.WithLabelValues(resolutionLabel)
metrics.downsampleFailures.WithLabelValues(resolutionLabel)
}
if err := downsampleBucket(ctx, logger, metrics, insBkt, metas, dataDir, downsampleConcurrency, blockFilesConcurrency, hashFunc, false); err != nil {
return errors.Wrap(err, "downsampling failed")
@ -250,10 +253,8 @@ func downsampleBucket(
defer workerCancel()
level.Debug(logger).Log("msg", "downsampling bucket", "concurrency", downsampleConcurrency)
for i := 0; i < downsampleConcurrency; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for range downsampleConcurrency {
wg.Go(func() {
for m := range metaCh {
resolution := downsample.ResLevel1
errMsg := "downsampling to 5 min"
@ -262,13 +263,13 @@ func downsampleBucket(
errMsg = "downsampling to 60 min"
}
if err := processDownsampling(workerCtx, logger, bkt, m, dir, resolution, hashFunc, metrics, acceptMalformedIndex, blockFilesConcurrency); err != nil {
metrics.downsampleFailures.WithLabelValues(m.Thanos.GroupKey()).Inc()
metrics.downsampleFailures.WithLabelValues(m.Thanos.ResolutionString()).Inc()
errCh <- errors.Wrap(err, errMsg)
}
metrics.downsamples.WithLabelValues(m.Thanos.GroupKey()).Inc()
metrics.downsamples.WithLabelValues(m.Thanos.ResolutionString()).Inc()
}
}()
})
}
// Workers scheduled, distribute blocks.
@ -358,7 +359,7 @@ func processDownsampling(
err := block.Download(ctx, logger, bkt, m.ULID, bdir, objstore.WithFetchConcurrency(blockFilesConcurrency))
if err != nil {
return errors.Wrapf(err, "download block %s", m.ULID)
return compact.NewRetryError(errors.Wrapf(err, "download block %s", m.ULID))
}
level.Info(logger).Log("msg", "downloaded block", "id", m.ULID, "duration", time.Since(begin), "duration_ms", time.Since(begin).Milliseconds())
@ -375,7 +376,7 @@ func processDownsampling(
pool = downsample.NewPool()
}
b, err := tsdb.OpenBlock(logger, bdir, pool)
b, err := tsdb.OpenBlock(logutil.GoKitLogToSlog(logger), bdir, pool, nil)
if err != nil {
return errors.Wrapf(err, "open block %s", m.ULID)
}
@ -390,7 +391,7 @@ func processDownsampling(
downsampleDuration := time.Since(begin)
level.Info(logger).Log("msg", "downsampled block",
"from", m.ULID, "to", id, "duration", downsampleDuration, "duration_ms", downsampleDuration.Milliseconds())
metrics.downsampleDuration.WithLabelValues(m.Thanos.GroupKey()).Observe(downsampleDuration.Seconds())
metrics.downsampleDuration.WithLabelValues(m.Thanos.ResolutionString()).Observe(downsampleDuration.Seconds())
stats, err := block.GatherIndexHealthStats(ctx, logger, filepath.Join(resdir, block.IndexFilename), m.MinTime, m.MaxTime)
if err == nil {
@ -419,7 +420,7 @@ func processDownsampling(
err = block.Upload(ctx, logger, bkt, resdir, hashFunc)
if err != nil {
return errors.Wrapf(err, "upload downsampled block %s", id)
return compact.NewRetryError(errors.Wrapf(err, "upload downsampled block %s", id))
}
level.Info(logger).Log("msg", "uploaded block", "id", id, "duration", time.Since(begin), "duration_ms", time.Since(begin).Milliseconds())

380
cmd/thanos/endpointset.go Normal file
View File

@ -0,0 +1,380 @@
// Copyright (c) The Thanos Authors.
// Licensed under the Apache License 2.0.
package main
import (
"context"
"fmt"
"sync"
"time"
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/run"
"google.golang.org/grpc"
"gopkg.in/yaml.v3"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
"github.com/prometheus/common/model"
"github.com/prometheus/prometheus/discovery"
"github.com/prometheus/prometheus/discovery/file"
"github.com/prometheus/prometheus/discovery/targetgroup"
"github.com/thanos-io/thanos/pkg/component"
"github.com/thanos-io/thanos/pkg/discovery/cache"
"github.com/thanos-io/thanos/pkg/discovery/dns"
"github.com/thanos-io/thanos/pkg/errors"
"github.com/thanos-io/thanos/pkg/extgrpc"
"github.com/thanos-io/thanos/pkg/extkingpin"
"github.com/thanos-io/thanos/pkg/extprom"
"github.com/thanos-io/thanos/pkg/logutil"
"github.com/thanos-io/thanos/pkg/query"
"github.com/thanos-io/thanos/pkg/runutil"
)
// fileContent is the interface of methods that we need from extkingpin.PathOrContent.
// We need to abstract it for now so we can implement a default if the user does not provide one.
type fileContent interface {
Content() ([]byte, error)
Path() string
}
type endpointSettings struct {
Strict bool `yaml:"strict"`
Group bool `yaml:"group"`
Address string `yaml:"address"`
ServiceConfig string `yaml:"service_config"`
}
type EndpointConfig struct {
Endpoints []endpointSettings `yaml:"endpoints"`
}
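// An illustrative endpoint configuration file (a sketch derived from the yaml tags
// above; the addresses and the gRPC service config are assumptions):
//
//	endpoints:
//	  - address: store-gateway.monitoring.svc:10901
//	    strict: true
//	  - address: sidecar-headless.monitoring.svc:10901
//	    group: true
//	    service_config: |
//	      {"loadBalancingConfig": [{"round_robin": {}}]}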
type endpointConfigProvider struct {
mu sync.Mutex
cfg EndpointConfig
// statically defined endpoints from flags for backwards compatibility
endpoints []string
endpointGroups []string
strictEndpoints []string
strictEndpointGroups []string
}
func (er *endpointConfigProvider) config() EndpointConfig {
er.mu.Lock()
defer er.mu.Unlock()
res := EndpointConfig{Endpoints: make([]endpointSettings, len(er.cfg.Endpoints))}
copy(res.Endpoints, er.cfg.Endpoints)
return res
}
func (er *endpointConfigProvider) parse(configFile fileContent) (EndpointConfig, error) {
content, err := configFile.Content()
if err != nil {
return EndpointConfig{}, errors.Wrapf(err, "unable to load config content: %s", configFile.Path())
}
var cfg EndpointConfig
if err := yaml.Unmarshal(content, &cfg); err != nil {
return EndpointConfig{}, errors.Wrapf(err, "unable to unmarshal config content: %s", configFile.Path())
}
return cfg, nil
}
func (er *endpointConfigProvider) addStaticEndpoints(cfg *EndpointConfig) {
for _, e := range er.endpoints {
cfg.Endpoints = append(cfg.Endpoints, endpointSettings{
Address: e,
})
}
for _, e := range er.endpointGroups {
cfg.Endpoints = append(cfg.Endpoints, endpointSettings{
Address: e,
Group: true,
})
}
for _, e := range er.strictEndpoints {
cfg.Endpoints = append(cfg.Endpoints, endpointSettings{
Address: e,
Strict: true,
})
}
for _, e := range er.strictEndpointGroups {
cfg.Endpoints = append(cfg.Endpoints, endpointSettings{
Address: e,
Group: true,
Strict: true,
})
}
}
func validateEndpointConfig(cfg EndpointConfig) error {
for _, ecfg := range cfg.Endpoints {
if dns.IsDynamicNode(ecfg.Address) && ecfg.Strict {
return errors.Newf("%s is a dynamically specified endpoint i.e. it uses SD and that is not permitted under strict mode.", ecfg.Address)
}
if !ecfg.Group && len(ecfg.ServiceConfig) != 0 {
return errors.Newf("%s service_config is only valid for endpoint groups.", ecfg.Address)
}
}
return nil
}
func newEndpointConfigProvider(
logger log.Logger,
configFile fileContent,
configReloadInterval time.Duration,
staticEndpoints []string,
staticEndpointGroups []string,
staticStrictEndpoints []string,
staticStrictEndpointGroups []string,
) (*endpointConfigProvider, error) {
res := &endpointConfigProvider{
endpoints: staticEndpoints,
endpointGroups: staticEndpointGroups,
strictEndpoints: staticStrictEndpoints,
strictEndpointGroups: staticStrictEndpointGroups,
}
if configFile == nil {
configFile = extkingpin.NewNopConfig()
}
cfg, err := res.parse(configFile)
if err != nil {
return nil, errors.Wrapf(err, "unable to load config file")
}
res.addStaticEndpoints(&cfg)
res.cfg = cfg
if err := validateEndpointConfig(cfg); err != nil {
return nil, errors.Wrapf(err, "unable to validate endpoints")
}
// only static endpoints
if len(configFile.Path()) == 0 {
return res, nil
}
if err := extkingpin.PathContentReloader(context.Background(), configFile, logger, func() {
res.mu.Lock()
defer res.mu.Unlock()
level.Info(logger).Log("msg", "reloading endpoint config")
cfg, err := res.parse(configFile)
if err != nil {
level.Error(logger).Log("msg", "failed to reload endpoint config", "err", err)
return
}
res.addStaticEndpoints(&cfg)
if err := validateEndpointConfig(cfg); err != nil {
level.Error(logger).Log("msg", "failed to validate endpoint config", "err", err)
return
}
res.cfg = cfg
}, configReloadInterval); err != nil {
return nil, errors.Wrapf(err, "unable to create config reloader")
}
return res, nil
}
func setupEndpointSet(
g *run.Group,
comp component.Component,
reg prometheus.Registerer,
logger log.Logger,
configFile fileContent,
configReloadInterval time.Duration,
legacyFileSDFiles []string,
legacyFileSDInterval time.Duration,
legacyEndpoints []string,
legacyEndpointGroups []string,
legacyStrictEndpoints []string,
legacyStrictEndpointGroups []string,
dnsSDResolver string,
dnsSDInterval time.Duration,
unhealthyTimeout time.Duration,
endpointTimeout time.Duration,
dialOpts []grpc.DialOption,
queryConnMetricLabels ...string,
) (*query.EndpointSet, error) {
configProvider, err := newEndpointConfigProvider(
logger,
configFile,
configReloadInterval,
legacyEndpoints,
legacyEndpointGroups,
legacyStrictEndpoints,
legacyStrictEndpointGroups,
)
if err != nil {
return nil, errors.Wrapf(err, "unable to load config initially")
}
// Register resolver for the "thanos:///" scheme for endpoint-groups
dns.RegisterGRPCResolver(
logger,
dns.NewProvider(
logger,
extprom.WrapRegistererWithPrefix(fmt.Sprintf("thanos_%s_endpoint_groups_", comp), reg),
dns.ResolverType(dnsSDResolver),
),
dnsSDInterval,
)
dnsEndpointProvider := dns.NewProvider(
logger,
extprom.WrapRegistererWithPrefix(fmt.Sprintf("thanos_%s_endpoints_", comp), reg),
dns.ResolverType(dnsSDResolver),
)
duplicatedEndpoints := promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: fmt.Sprintf("thanos_%s_duplicated_endpoint_addresses_total", comp),
Help: "The number of times a duplicated endpoint addresses is detected from the different configs",
})
removeDuplicateEndpointSpecs := func(specs []*query.GRPCEndpointSpec) []*query.GRPCEndpointSpec {
set := make(map[string]*query.GRPCEndpointSpec)
for _, spec := range specs {
addr := spec.Addr()
if _, ok := set[addr]; ok {
level.Warn(logger).Log("msg", "Duplicate endpoint address is provided", "addr", addr)
duplicatedEndpoints.Inc()
}
set[addr] = spec
}
deduplicated := make([]*query.GRPCEndpointSpec, 0, len(set))
for _, value := range set {
deduplicated = append(deduplicated, value)
}
return deduplicated
}
var fileSD *file.Discovery
if len(legacyFileSDFiles) > 0 {
conf := &file.SDConfig{
Files: legacyFileSDFiles,
RefreshInterval: model.Duration(legacyFileSDInterval),
}
var err error
if fileSD, err = file.NewDiscovery(conf, logutil.GoKitLogToSlog(logger), conf.NewDiscovererMetrics(reg, discovery.NewRefreshMetrics(reg))); err != nil {
return nil, fmt.Errorf("unable to create new legacy file sd config: %w", err)
}
}
legacyFileSDCache := cache.New()
// Perform initial DNS resolution before starting periodic updates.
// This ensures that DNS providers have addresses when the first endpoint update runs.
{
resolveCtx, resolveCancel := context.WithTimeout(context.Background(), dnsSDInterval)
defer resolveCancel()
level.Info(logger).Log("msg", "performing initial DNS resolution for endpoints")
endpointConfig := configProvider.config()
addresses := make([]string, 0, len(endpointConfig.Endpoints))
for _, ecfg := range endpointConfig.Endpoints {
// Only resolve non-group dynamic endpoints here.
// Group endpoints are resolved by the gRPC resolver in its Build() method.
if addr := ecfg.Address; dns.IsDynamicNode(addr) && !ecfg.Group {
addresses = append(addresses, addr)
}
}
// Note: legacyFileSDCache will be empty at this point since file SD hasn't started yet
if len(addresses) > 0 {
if err := dnsEndpointProvider.Resolve(resolveCtx, addresses, true); err != nil {
level.Error(logger).Log("msg", "initial DNS resolution failed", "err", err)
}
}
level.Info(logger).Log("msg", "initial DNS resolution completed")
}
ctx, cancel := context.WithCancel(context.Background())
if fileSD != nil {
fileSDUpdates := make(chan []*targetgroup.Group)
g.Add(func() error {
fileSD.Run(ctx, fileSDUpdates)
return nil
}, func(err error) {
cancel()
})
g.Add(func() error {
for {
select {
case update := <-fileSDUpdates:
// Discoverers sometimes send nil updates, so we need to check for them to avoid panics.
if update == nil {
continue
}
legacyFileSDCache.Update(update)
case <-ctx.Done():
return nil
}
}
}, func(err error) {
cancel()
})
}
{
g.Add(func() error {
return runutil.Repeat(dnsSDInterval, ctx.Done(), func() error {
ctxUpdateIter, cancelUpdateIter := context.WithTimeout(ctx, dnsSDInterval)
defer cancelUpdateIter()
endpointConfig := configProvider.config()
addresses := make([]string, 0, len(endpointConfig.Endpoints))
for _, ecfg := range endpointConfig.Endpoints {
if addr := ecfg.Address; dns.IsDynamicNode(addr) && !ecfg.Group {
addresses = append(addresses, addr)
}
}
addresses = append(addresses, legacyFileSDCache.Addresses()...)
if err := dnsEndpointProvider.Resolve(ctxUpdateIter, addresses, true); err != nil {
level.Error(logger).Log("msg", "failed to resolve addresses for endpoints", "err", err)
}
return nil
})
}, func(error) {
cancel()
})
}
endpointset := query.NewEndpointSet(time.Now, logger, reg, func() []*query.GRPCEndpointSpec {
endpointConfig := configProvider.config()
specs := make([]*query.GRPCEndpointSpec, 0)
// groups and non dynamic endpoints
for _, ecfg := range endpointConfig.Endpoints {
strict, group, addr := ecfg.Strict, ecfg.Group, ecfg.Address
if group {
specs = append(specs, query.NewGRPCEndpointSpec(fmt.Sprintf("thanos:///%s", addr), strict, append(dialOpts, extgrpc.EndpointGroupGRPCOpts(ecfg.ServiceConfig)...)...))
} else if !dns.IsDynamicNode(addr) {
specs = append(specs, query.NewGRPCEndpointSpec(addr, strict, dialOpts...))
}
}
// dynamic endpoints
for _, addr := range dnsEndpointProvider.Addresses() {
specs = append(specs, query.NewGRPCEndpointSpec(addr, false, dialOpts...))
}
return removeDuplicateEndpointSpecs(specs)
}, unhealthyTimeout, endpointTimeout, queryConnMetricLabels...)
g.Add(func() error {
return runutil.Repeat(endpointTimeout, ctx.Done(), func() error {
ctxIter, cancelIter := context.WithTimeout(ctx, endpointTimeout)
defer cancelIter()
endpointset.Update(ctxIter)
return nil
})
}, func(error) {
cancel()
})
return endpointset, nil
}
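As a reference for the new file above, here is a minimal, hypothetical sketch of the endpoint configuration YAML that the endpointSettings/EndpointConfig types accept; the addresses and the service config value are made-up examples, and the constraints mirror validateEndpointConfig (strict is rejected for SD-style addresses, service_config only applies to endpoint groups). It is not part of the diff.

// Minimal sketch (not part of the diff): illustrates the YAML shape accepted by
// EndpointConfig above. Addresses and the service config value are hypothetical.
package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// These types mirror the ones defined in endpointset.go above.
type endpointSettings struct {
	Strict        bool   `yaml:"strict"`
	Group         bool   `yaml:"group"`
	Address       string `yaml:"address"`
	ServiceConfig string `yaml:"service_config"`
}

type EndpointConfig struct {
	Endpoints []endpointSettings `yaml:"endpoints"`
}

func main() {
	// A plain endpoint, a strict endpoint, and an endpoint group with a custom
	// gRPC service config (only valid when group is true, per validateEndpointConfig).
	raw := []byte(`
endpoints:
  - address: store-gateway:10901
  - address: sidecar:10901
    strict: true
  - address: receive.default.svc.cluster.local:10901
    group: true
    service_config: '{"loadBalancingPolicy": "round_robin"}'
`)
	var cfg EndpointConfig
	if err := yaml.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}

The config file is optional; when only the legacy flags are used, addStaticEndpoints folds them into the same structure and no reloader is started.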


@ -15,6 +15,7 @@ import (
"runtime/debug"
"syscall"
"github.com/alecthomas/kingpin/v2"
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/run"
@ -22,9 +23,9 @@ import (
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/collectors"
versioncollector "github.com/prometheus/client_golang/prometheus/collectors/version"
"github.com/prometheus/common/version"
"go.uber.org/automaxprocs/maxprocs"
"gopkg.in/alecthomas/kingpin.v2"
"github.com/thanos-io/thanos/pkg/extkingpin"
"github.com/thanos-io/thanos/pkg/logging"
@ -49,6 +50,10 @@ func main() {
Default(logging.LogFormatLogfmt).Enum(logging.LogFormatLogfmt, logging.LogFormatJSON)
tracingConfig := extkingpin.RegisterCommonTracingFlags(app)
goMemLimitConf := goMemLimitConfig{}
goMemLimitConf.registerFlag(app)
registerSidecar(app)
registerStore(app)
registerQuery(app)
@ -61,10 +66,15 @@ func main() {
cmd, setup := app.Parse()
logger := logging.NewLogger(*logLevel, *logFormat, *debugName)
if err := configureGoAutoMemLimit(goMemLimitConf); err != nil {
level.Error(logger).Log("msg", "failed to configure Go runtime memory limits", "err", err)
os.Exit(1)
}
// Running in a container with CPU limits but with an empty or wrong GOMAXPROCS env var could lead to CPU throttling.
// maxprocs automates the adjustment by using cgroups info about the CPU limit as the value for runtime.GOMAXPROCS.
undo, err := maxprocs.Set(maxprocs.Logger(func(template string, args ...interface{}) {
level.Debug(logger).Log("msg", fmt.Sprintf(template, args))
undo, err := maxprocs.Set(maxprocs.Logger(func(template string, args ...any) {
level.Debug(logger).Log("msg", fmt.Sprintf(template, args...))
}))
defer undo()
if err != nil {
@ -73,7 +83,7 @@ func main() {
metrics := prometheus.NewRegistry()
metrics.MustRegister(
version.NewCollector("thanos"),
versioncollector.NewCollector("thanos"),
collectors.NewGoCollector(
collectors.WithGoCollectorRuntimeMetrics(collectors.GoRuntimeMetricsRule{Matcher: regexp.MustCompile("/.*")}),
),
@ -204,6 +214,11 @@ func getFlagsMap(flags []*kingpin.FlagModel) map[string]string {
if boilerplateFlags.GetFlag(f.Name) != nil {
continue
}
// Mask inline objstore flag which can have credentials.
if f.Name == "objstore.config" || f.Name == "objstore.config-file" {
flagsMap[f.Name] = "<REDACTED>"
continue
}
flagsMap[f.Name] = f.Value.String()
}


@ -14,13 +14,15 @@ import (
"time"
"github.com/go-kit/log"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/prometheus/client_golang/prometheus"
promtest "github.com/prometheus/client_golang/prometheus/testutil"
"github.com/prometheus/prometheus/model/labels"
"github.com/thanos-io/objstore"
"github.com/efficientgo/core/testutil"
"github.com/thanos-io/thanos/pkg/block"
"github.com/thanos-io/thanos/pkg/block/metadata"
"github.com/thanos-io/thanos/pkg/compact/downsample"
@ -31,6 +33,11 @@ type erroringBucket struct {
bkt objstore.InstrumentedBucket
}
// Provider returns the provider of the bucket.
func (b *erroringBucket) Provider() objstore.ObjProvider {
return b.bkt.Provider()
}
func (b *erroringBucket) Close() error {
return b.bkt.Close()
}
@ -89,8 +96,8 @@ func (b *erroringBucket) Attributes(ctx context.Context, name string) (objstore.
// Upload the contents of the reader as an object into the bucket.
// Upload should be idempotent.
func (b *erroringBucket) Upload(ctx context.Context, name string, r io.Reader) error {
return b.bkt.Upload(ctx, name, r)
func (b *erroringBucket) Upload(ctx context.Context, name string, r io.Reader, opts ...objstore.ObjectUploadOption) error {
return b.bkt.Upload(ctx, name, r, opts...)
}
// Delete removes the object with the given name.
@ -104,6 +111,16 @@ func (b *erroringBucket) Name() string {
return b.bkt.Name()
}
// IterWithAttributes allows to iterate over objects in the bucket with their attributes.
func (b *erroringBucket) IterWithAttributes(ctx context.Context, dir string, f func(objstore.IterObjectAttributes) error, options ...objstore.IterOption) error {
return b.bkt.IterWithAttributes(ctx, dir, f, options...)
}
// SupportedIterOptions returns the supported iteration options.
func (b *erroringBucket) SupportedIterOptions() []objstore.IterOptionType {
return b.bkt.SupportedIterOptions()
}
// Ensures that downsampleBucket() stops its work properly
// after an error occurs with some blocks in the backlog.
// Testing for https://github.com/thanos-io/thanos/issues/4960.
@ -122,10 +139,10 @@ func TestRegression4960_Deadlock(t *testing.T) {
id, err = e2eutil.CreateBlock(
ctx,
dir,
[]labels.Labels{{{Name: "a", Value: "1"}}},
[]labels.Labels{labels.FromStrings("a", "1")},
1, 0, downsample.ResLevel1DownsampleRange+1, // Pass the minimum ResLevel1DownsampleRange check.
labels.Labels{{Name: "e1", Value: "1"}},
downsample.ResLevel0, metadata.NoneFunc)
labels.FromStrings("e1", "1"),
downsample.ResLevel0, metadata.NoneFunc, nil)
testutil.Ok(t, err)
testutil.Ok(t, block.Upload(ctx, logger, bkt, path.Join(dir, id.String()), metadata.NoneFunc))
}
@ -133,10 +150,10 @@ func TestRegression4960_Deadlock(t *testing.T) {
id2, err = e2eutil.CreateBlock(
ctx,
dir,
[]labels.Labels{{{Name: "a", Value: "2"}}},
[]labels.Labels{labels.FromStrings("a", "2")},
1, 0, downsample.ResLevel1DownsampleRange+1, // Pass the minimum ResLevel1DownsampleRange check.
labels.Labels{{Name: "e1", Value: "2"}},
downsample.ResLevel0, metadata.NoneFunc)
labels.FromStrings("e1", "2"),
downsample.ResLevel0, metadata.NoneFunc, nil)
testutil.Ok(t, err)
testutil.Ok(t, block.Upload(ctx, logger, bkt, path.Join(dir, id2.String()), metadata.NoneFunc))
}
@ -144,10 +161,10 @@ func TestRegression4960_Deadlock(t *testing.T) {
id3, err = e2eutil.CreateBlock(
ctx,
dir,
[]labels.Labels{{{Name: "a", Value: "2"}}},
[]labels.Labels{labels.FromStrings("a", "2")},
1, 0, downsample.ResLevel1DownsampleRange+1, // Pass the minimum ResLevel1DownsampleRange check.
labels.Labels{{Name: "e1", Value: "2"}},
downsample.ResLevel0, metadata.NoneFunc)
labels.FromStrings("e1", "2"),
downsample.ResLevel0, metadata.NoneFunc, nil)
testutil.Ok(t, err)
testutil.Ok(t, block.Upload(ctx, logger, bkt, path.Join(dir, id3.String()), metadata.NoneFunc))
}
@ -156,8 +173,8 @@ func TestRegression4960_Deadlock(t *testing.T) {
testutil.Ok(t, err)
metrics := newDownsampleMetrics(prometheus.NewRegistry())
testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.GroupKey())))
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, bkt)
testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.ResolutionString())))
baseBlockIDsFetcher := block.NewConcurrentLister(logger, bkt)
metaFetcher, err := block.NewMetaFetcher(nil, block.FetcherConcurrency, bkt, baseBlockIDsFetcher, "", nil, nil)
testutil.Ok(t, err)
@ -184,10 +201,10 @@ func TestCleanupDownsampleCacheFolder(t *testing.T) {
id, err = e2eutil.CreateBlock(
ctx,
dir,
[]labels.Labels{{{Name: "a", Value: "1"}}},
[]labels.Labels{labels.FromStrings("a", "1")},
1, 0, downsample.ResLevel1DownsampleRange+1, // Pass the minimum ResLevel1DownsampleRange check.
labels.Labels{{Name: "e1", Value: "1"}},
downsample.ResLevel0, metadata.NoneFunc)
labels.FromStrings("e1", "1"),
downsample.ResLevel0, metadata.NoneFunc, nil)
testutil.Ok(t, err)
testutil.Ok(t, block.Upload(ctx, logger, bkt, path.Join(dir, id.String()), metadata.NoneFunc))
}
@ -196,15 +213,15 @@ func TestCleanupDownsampleCacheFolder(t *testing.T) {
testutil.Ok(t, err)
metrics := newDownsampleMetrics(prometheus.NewRegistry())
testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.GroupKey())))
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, bkt)
testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.ResolutionString())))
baseBlockIDsFetcher := block.NewConcurrentLister(logger, bkt)
metaFetcher, err := block.NewMetaFetcher(nil, block.FetcherConcurrency, bkt, baseBlockIDsFetcher, "", nil, nil)
testutil.Ok(t, err)
metas, _, err := metaFetcher.Fetch(ctx)
testutil.Ok(t, err)
testutil.Ok(t, downsampleBucket(ctx, logger, metrics, bkt, metas, dir, 1, 1, metadata.NoneFunc, false))
testutil.Equals(t, 1.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.GroupKey())))
testutil.Equals(t, 1.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.ResolutionString())))
_, err = os.Stat(dir)
testutil.Assert(t, os.IsNotExist(err), "index cache dir should not exist at the end of execution")

File diff suppressed because it is too large.


@ -4,6 +4,8 @@
package main
import (
"context"
"maps"
"net"
"net/http"
"time"
@ -34,6 +36,7 @@ import (
"github.com/thanos-io/thanos/pkg/logging"
"github.com/thanos-io/thanos/pkg/prober"
"github.com/thanos-io/thanos/pkg/queryfrontend"
"github.com/thanos-io/thanos/pkg/runutil"
httpserver "github.com/thanos-io/thanos/pkg/server/http"
"github.com/thanos-io/thanos/pkg/server/http/middleware"
"github.com/thanos-io/thanos/pkg/tenancy"
@ -72,10 +75,10 @@ func registerQueryFrontend(app *extkingpin.App) {
// Query range tripperware flags.
cmd.Flag("query-range.align-range-with-step", "Mutate incoming queries to align their start and end with their step for better cache-ability. Note: Grafana dashboards do that by default.").
Default("true").BoolVar(&cfg.QueryRangeConfig.AlignRangeWithStep)
Default("true").BoolVar(&cfg.AlignRangeWithStep)
cmd.Flag("query-range.request-downsampled", "Make additional query for downsampled data in case of empty or incomplete response to range request.").
Default("true").BoolVar(&cfg.QueryRangeConfig.RequestDownsampled)
Default("true").BoolVar(&cfg.RequestDownsampled)
cmd.Flag("query-range.split-interval", "Split query range requests by an interval and execute in parallel, it should be greater than 0 when query-range.response-cache-config is configured.").
Default("24h").DurationVar(&cfg.QueryRangeConfig.SplitQueriesByInterval)
@ -83,13 +86,13 @@ func registerQueryFrontend(app *extkingpin.App) {
cmd.Flag("query-range.min-split-interval", "Split query range requests above this interval in query-range.horizontal-shards requests of equal range. "+
"Using this parameter is not allowed with query-range.split-interval. "+
"One should also set query-range.split-min-horizontal-shards to a value greater than 1 to enable splitting.").
Default("0").DurationVar(&cfg.QueryRangeConfig.MinQuerySplitInterval)
Default("0").DurationVar(&cfg.MinQuerySplitInterval)
cmd.Flag("query-range.max-split-interval", "Split query range below this interval in query-range.horizontal-shards. Queries with a range longer than this value will be split in multiple requests of this length.").
Default("0").DurationVar(&cfg.QueryRangeConfig.MaxQuerySplitInterval)
Default("0").DurationVar(&cfg.MaxQuerySplitInterval)
cmd.Flag("query-range.horizontal-shards", "Split queries in this many requests when query duration is below query-range.max-split-interval.").
Default("0").Int64Var(&cfg.QueryRangeConfig.HorizontalShards)
Default("0").Int64Var(&cfg.HorizontalShards)
cmd.Flag("query-range.max-retries-per-request", "Maximum number of retries for a single query range request; beyond this, the downstream error is returned.").
Default("5").IntVar(&cfg.QueryRangeConfig.MaxRetries)
@ -97,6 +100,8 @@ func registerQueryFrontend(app *extkingpin.App) {
cmd.Flag("query-frontend.enable-x-functions", "Enable experimental x- functions in query-frontend. --no-query-frontend.enable-x-functions for disabling.").
Default("false").BoolVar(&cfg.EnableXFunctions)
cmd.Flag("enable-feature", "Comma separated feature names to enable. Valid options for now: promql-experimental-functions (enables promql experimental functions in query-frontend)").Default("").StringsVar(&cfg.EnableFeatures)
cmd.Flag("query-range.max-query-length", "Limit the query time range (end - start time) in the query-frontend, 0 disables it.").
Default("0").DurationVar((*time.Duration)(&cfg.QueryRangeConfig.Limits.MaxQueryLength))
@ -146,6 +151,8 @@ func registerQueryFrontend(app *extkingpin.App) {
cmd.Flag("query-frontend.log-queries-longer-than", "Log queries that are slower than the specified duration. "+
"Set to 0 to disable. Set to < 0 to enable on all queries.").Default("0").DurationVar(&cfg.CortexHandlerConfig.LogQueriesLongerThan)
cmd.Flag("query-frontend.force-query-stats", "Enables query statistics for all queries and will export statistics as logs and service headers.").Default("false").BoolVar(&cfg.CortexHandlerConfig.QueryStatsEnabled)
cmd.Flag("query-frontend.org-id-header", "Deprecation Warning - This flag will be soon deprecated in favor of query-frontend.tenant-header"+
" and both flags cannot be used at the same time. "+
"Request header names used to identify the source of slow queries (repeated flag). "+
@ -161,6 +168,8 @@ func registerQueryFrontend(app *extkingpin.App) {
cmd.Flag("query-frontend.vertical-shards", "Number of shards to use when distributing shardable PromQL queries. For more details, you can refer to the Vertical query sharding proposal: https://thanos.io/tip/proposals-accepted/202205-vertical-query-sharding.md").IntVar(&cfg.NumShards)
cmd.Flag("query-frontend.slow-query-logs-user-header", "Set the value of the field remote_user in the slow query logs to the value of the given HTTP header. Falls back to reading the user from the basic auth header.").PlaceHolder("<http-header-name>").Default("").StringVar(&cfg.CortexHandlerConfig.SlowQueryLogsUserHeader)
reqLogConfig := extkingpin.RegisterRequestLoggingFlags(cmd)
cmd.Setup(func(g *run.Group, logger log.Logger, reg *prometheus.Registry, tracer opentracing.Tracer, _ <-chan struct{}, _ bool) error {
@ -266,8 +275,9 @@ func runQueryFrontend(
return errors.Wrap(err, "initializing the query range cache config")
}
cfg.QueryRangeConfig.ResultsCacheConfig = &queryrange.ResultsCacheConfig{
Compression: cfg.CacheCompression,
CacheConfig: *cacheConfig,
Compression: cfg.CacheCompression,
CacheConfig: *cacheConfig,
CacheQueryableSamplesStats: cfg.CortexHandlerConfig.QueryStatsEnabled,
}
}
@ -291,8 +301,15 @@ func runQueryFrontend(
}
if cfg.EnableXFunctions {
for fname, v := range parse.XFunctions {
parser.Functions[fname] = v
maps.Copy(parser.Functions, parse.XFunctions)
}
if len(cfg.EnableFeatures) > 0 {
for _, feature := range cfg.EnableFeatures {
if feature == promqlExperimentalFunctions {
parser.EnableExperimentalFunctions = true
level.Info(logger).Log("msg", "Experimental PromQL functions enabled.", "option", promqlExperimentalFunctions)
}
}
}
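For clarity on the refactor above, maps.Copy (from the standard maps package added to the imports) copies every key/value pair from the source map into the destination, which is equivalent to the removed range loop. A minimal sketch, not part of the diff:

// Minimal sketch (not part of the diff): maps.Copy copies every key/value pair
// from src into dst, overwriting existing keys, the same effect as the removed loop.
package main

import (
	"fmt"
	"maps"
)

func main() {
	dst := map[string]int{"rate": 1}
	src := map[string]int{"xrate": 2, "xincrease": 3}
	maps.Copy(dst, src) // equivalent to: for k, v := range src { dst[k] = v }
	fmt.Println(dst)    // map[rate:1 xincrease:3 xrate:2]
}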
@ -311,13 +328,13 @@ func runQueryFrontend(
return err
}
roundTripper, err := cortexfrontend.NewDownstreamRoundTripper(cfg.DownstreamURL, downstreamTripper)
downstreamRT, err := cortexfrontend.NewDownstreamRoundTripper(cfg.DownstreamURL, downstreamTripper)
if err != nil {
return errors.Wrap(err, "setup downstream roundtripper")
}
// Wrap the downstream RoundTripper into query frontend Tripperware.
roundTripper = tripperWare(roundTripper)
roundTripper := tripperWare(downstreamRT)
// Create the query frontend transport.
handler := transport.NewHandler(*cfg.CortexHandlerConfig, roundTripper, logger, nil)
@ -350,19 +367,17 @@ func runQueryFrontend(
if !cfg.webDisableCORS {
api.SetCORS(w)
}
tracing.HTTPMiddleware(
tracer,
name,
logger,
ins.NewHandler(
middleware.RequestID(
tracing.HTTPMiddleware(
tracer,
name,
gzhttp.GzipHandler(
middleware.RequestID(
logMiddleware.HTTPMiddleware(name, f),
),
logger,
ins.NewHandler(
name,
logMiddleware.HTTPMiddleware(name, f),
),
// Cortex frontend middlewares require orgID.
),
// Cortex frontend middlewares require orgID.
).ServeHTTP(w, r.WithContext(user.InjectOrgID(r.Context(), orgId)))
})
return hf
@ -381,8 +396,57 @@ func runQueryFrontend(
})
}
// Periodically check downstream URL to ensure it is reachable.
{
ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {
var firstRun = true
doCheckDownstream := func() (rerr error) {
timeoutCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
readinessUrl := cfg.DownstreamURL + "/-/ready"
req, err := http.NewRequestWithContext(timeoutCtx, http.MethodGet, readinessUrl, nil)
if err != nil {
return errors.Wrap(err, "creating request to downstream URL")
}
resp, err := downstreamRT.RoundTrip(req)
if err != nil {
return errors.Wrapf(err, "roundtripping to downstream URL %s", readinessUrl)
}
defer runutil.CloseWithErrCapture(&rerr, resp.Body, "downstream health check response body")
if resp.StatusCode/100 == 4 || resp.StatusCode/100 == 5 {
return errors.Errorf("downstream URL %s returned an error: %d", readinessUrl, resp.StatusCode)
}
return nil
}
for {
if !firstRun {
select {
case <-ctx.Done():
return nil
case <-time.After(10 * time.Second):
}
}
firstRun = false
if err := doCheckDownstream(); err != nil {
statusProber.NotReady(err)
} else {
statusProber.Ready()
}
}
}, func(err error) {
cancel()
})
}
level.Info(logger).Log("msg", "starting query frontend")
statusProber.Ready()
return nil
}


@ -7,8 +7,6 @@ import (
"testing"
"time"
"github.com/prometheus/prometheus/promql"
"github.com/efficientgo/core/testutil"
)
@ -87,7 +85,7 @@ func TestLookbackDeltaFactory(t *testing.T) {
}
)
for _, td := range tData {
lookbackCreate := LookbackDeltaFactory(promql.EngineOpts{LookbackDelta: td.lookbackDelta}, td.dynamicLookbackDelta)
lookbackCreate := LookbackDeltaFactory(td.lookbackDelta, td.dynamicLookbackDelta)
for _, tc := range td.tcs {
got := lookbackCreate(tc.stepMillis)
testutil.Equals(t, tc.expect, got)


@ -5,6 +5,8 @@ package main
import (
"context"
"fmt"
"net"
"os"
"path"
"strings"
@ -15,7 +17,6 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
grpc_logging "github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/logging"
"github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tags"
"github.com/oklog/run"
"github.com/opentracing/opentracing-go"
"github.com/pkg/errors"
@ -25,16 +26,16 @@ import (
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/model/relabel"
"github.com/prometheus/prometheus/tsdb"
"github.com/prometheus/prometheus/tsdb/wlog"
"google.golang.org/grpc"
"gopkg.in/yaml.v2"
"github.com/prometheus/prometheus/util/compression"
"github.com/thanos-io/objstore"
"github.com/thanos-io/objstore/client"
objstoretracing "github.com/thanos-io/objstore/tracing/opentracing"
"google.golang.org/grpc"
"gopkg.in/yaml.v2"
"github.com/thanos-io/thanos/pkg/block/metadata"
"github.com/thanos-io/thanos/pkg/component"
"github.com/thanos-io/thanos/pkg/compressutil"
"github.com/thanos-io/thanos/pkg/exemplars"
"github.com/thanos-io/thanos/pkg/extgrpc"
"github.com/thanos-io/thanos/pkg/extgrpc/snappy"
@ -49,12 +50,16 @@ import (
grpcserver "github.com/thanos-io/thanos/pkg/server/grpc"
httpserver "github.com/thanos-io/thanos/pkg/server/http"
"github.com/thanos-io/thanos/pkg/store"
storecache "github.com/thanos-io/thanos/pkg/store/cache"
"github.com/thanos-io/thanos/pkg/store/labelpb"
"github.com/thanos-io/thanos/pkg/tenancy"
"github.com/thanos-io/thanos/pkg/tls"
)
const compressionNone = "none"
const (
compressionNone = "none"
metricNamesFilter = "metric-names-filter"
)
func registerReceive(app *extkingpin.App) {
cmd := app.Command(component.Receive.String(), "Accept Prometheus remote write API requests and write to local tsdb.")
@ -71,11 +76,12 @@ func registerReceive(app *extkingpin.App) {
if !model.LabelName.IsValid(model.LabelName(conf.tenantLabelName)) {
return errors.Errorf("unsupported format for tenant label name, got %s", conf.tenantLabelName)
}
if len(lset) == 0 {
if lset.Len() == 0 {
return errors.New("no external labels configured for receive, uniquely identifying external labels must be configured (ideally with `receive_` prefix); see https://thanos.io/tip/thanos/storage.md#external-labels for details.")
}
tagOpts, grpcLogOpts, err := logging.ParsegRPCOptions(conf.reqLogConfig)
grpcLogOpts, logFilterMethods, err := logging.ParsegRPCOptions(conf.reqLogConfig)
if err != nil {
return errors.Wrap(err, "error while parsing config for request logging")
}
@ -88,7 +94,7 @@ func registerReceive(app *extkingpin.App) {
MaxBytes: int64(conf.tsdbMaxBytes),
OutOfOrderCapMax: conf.tsdbOutOfOrderCapMax,
NoLockfile: conf.noLockFile,
WALCompression: wlog.ParseCompressionType(conf.walCompression, string(wlog.CompressionSnappy)),
WALCompression: compressutil.ParseCompressionType(conf.walCompression, compression.Snappy),
MaxExemplars: conf.tsdbMaxExemplars,
EnableExemplarStorage: conf.tsdbMaxExemplars > 0,
HeadChunksWriteQueueSize: int(conf.tsdbWriteQueueSize),
@ -105,7 +111,8 @@ func registerReceive(app *extkingpin.App) {
debugLogging,
reg,
tracer,
grpcLogOpts, tagOpts,
grpcLogOpts,
logFilterMethods,
tsdbOpts,
lset,
component.Receive,
@ -123,7 +130,7 @@ func runReceive(
reg *prometheus.Registry,
tracer opentracing.Tracer,
grpcLogOpts []grpc_logging.Option,
tagOpts []tags.Option,
logFilterMethods []string,
tsdbOpts *tsdb.Options,
lset labels.Labels,
comp component.SourceStoreAPI,
@ -135,7 +142,18 @@ func runReceive(
level.Info(logger).Log("mode", receiveMode, "msg", "running receive")
rwTLSConfig, err := tls.NewServerConfig(log.With(logger, "protocol", "HTTP"), conf.rwServerCert, conf.rwServerKey, conf.rwServerClientCA)
multiTSDBOptions := []receive.MultiTSDBOption{
receive.WithHeadExpandedPostingsCacheSize(conf.headExpandedPostingsCacheSize),
receive.WithBlockExpandedPostingsCacheSize(conf.compactedBlocksExpandedPostingsCacheSize),
}
for _, feature := range *conf.featureList {
if feature == metricNamesFilter {
multiTSDBOptions = append(multiTSDBOptions, receive.WithMetricNameFilterEnabled())
level.Info(logger).Log("msg", "metric name filter feature enabled")
}
}
rwTLSConfig, err := tls.NewServerConfig(log.With(logger, "protocol", "HTTP"), conf.rwServerCert, conf.rwServerKey, conf.rwServerClientCA, conf.rwServerTlsMinVersion)
if err != nil {
return err
}
@ -144,8 +162,8 @@ func runReceive(
logger,
reg,
tracer,
conf.grpcConfig.tlsSrvCert != "",
conf.grpcConfig.tlsSrvClientCA == "",
conf.rwClientSecure,
conf.rwClientSkipVerify,
conf.rwClientCert,
conf.rwClientKey,
conf.rwClientServerCA,
@ -158,6 +176,10 @@ func runReceive(
dialOpts = append(dialOpts, grpc.WithDefaultCallOptions(grpc.UseCompressor(conf.compression)))
}
if conf.grpcServiceConfig != "" {
dialOpts = append(dialOpts, grpc.WithDefaultServiceConfig(conf.grpcServiceConfig))
}
var bkt objstore.Bucket
confContentYaml, err := conf.objStoreConfig.Content()
if err != nil {
@ -179,7 +201,7 @@ func runReceive(
}
// The background shipper continuously scans the data directory and uploads
// new blocks to object storage service.
bkt, err = client.NewBucket(logger, confContentYaml, comp.String())
bkt, err = client.NewBucket(logger, confContentYaml, comp.String(), nil)
if err != nil {
return err
}
@ -189,10 +211,9 @@ func runReceive(
}
}
// TODO(brancz): remove after a couple of versions
// Migrate non-multi-tsdb capable storage to multi-tsdb disk layout.
if err := migrateLegacyStorage(logger, conf.dataDir, conf.defaultTenantID); err != nil {
return errors.Wrapf(err, "migrate legacy storage in %v to default tenant %v", conf.dataDir, conf.defaultTenantID)
// Create TSDB for the default tenant.
if err := createDefautTenantTSDB(logger, conf.dataDir, conf.defaultTenantID); err != nil {
return errors.Wrapf(err, "create default tenant tsdb in %v", conf.dataDir)
}
relabelContentYaml, err := conf.relabelConfigPath.Content()
@ -204,6 +225,15 @@ func runReceive(
return errors.Wrap(err, "parse relabel configuration")
}
var cache = storecache.NoopMatchersCache
if conf.matcherCacheSize > 0 {
cache, err = storecache.NewMatchersCache(storecache.WithSize(conf.matcherCacheSize), storecache.WithPromRegistry(reg))
if err != nil {
return errors.Wrap(err, "failed to create matchers cache")
}
multiTSDBOptions = append(multiTSDBOptions, receive.WithMatchersCache(cache))
}
dbs := receive.NewMultiTSDB(
conf.dataDir,
logger,
@ -213,7 +243,9 @@ func runReceive(
conf.tenantLabelName,
bkt,
conf.allowOutOfOrderUpload,
conf.skipCorruptedBlocks,
hashFunc,
multiTSDBOptions...,
)
writer := receive.NewWriter(log.With(logger, "component", "receive-writer"), dbs, &receive.WriterOptions{
Intern: conf.writerInterning,
@ -237,24 +269,30 @@ func runReceive(
}
webHandler := receive.NewHandler(log.With(logger, "component", "receive-handler"), &receive.Options{
Writer: writer,
ListenAddress: conf.rwAddress,
Registry: reg,
Endpoint: conf.endpoint,
TenantHeader: conf.tenantHeader,
TenantField: conf.tenantField,
DefaultTenantID: conf.defaultTenantID,
ReplicaHeader: conf.replicaHeader,
ReplicationFactor: conf.replicationFactor,
RelabelConfigs: relabelConfig,
ReceiverMode: receiveMode,
Tracer: tracer,
TLSConfig: rwTLSConfig,
DialOpts: dialOpts,
ForwardTimeout: time.Duration(*conf.forwardTimeout),
MaxBackoff: time.Duration(*conf.maxBackoff),
TSDBStats: dbs,
Limiter: limiter,
Writer: writer,
ListenAddress: conf.rwAddress,
Registry: reg,
Endpoint: conf.endpoint,
TenantHeader: conf.tenantHeader,
TenantField: conf.tenantField,
DefaultTenantID: conf.defaultTenantID,
ReplicaHeader: conf.replicaHeader,
ReplicationFactor: conf.replicationFactor,
RelabelConfigs: relabelConfig,
ReceiverMode: receiveMode,
Tracer: tracer,
TLSConfig: rwTLSConfig,
SplitTenantLabelName: conf.splitTenantLabelName,
DialOpts: dialOpts,
ForwardTimeout: time.Duration(*conf.forwardTimeout),
MaxBackoff: time.Duration(*conf.maxBackoff),
TSDBStats: dbs,
Limiter: limiter,
AsyncForwardWorkerCount: conf.asyncForwardWorkerCount,
ReplicationProtocol: receive.ReplicationProtocol(conf.replicationProtocol),
OtlpEnableTargetInfo: conf.otlpEnableTargetInfo,
OtlpResourceAttributes: conf.otlpResourceAttributes,
})
grpcProbe := prober.NewGRPC()
@ -312,14 +350,19 @@ func runReceive(
level.Debug(logger).Log("msg", "setting up gRPC server")
{
tlsCfg, err := tls.NewServerConfig(log.With(logger, "protocol", "gRPC"), conf.grpcConfig.tlsSrvCert, conf.grpcConfig.tlsSrvKey, conf.grpcConfig.tlsSrvClientCA)
tlsCfg, err := tls.NewServerConfig(log.With(logger, "protocol", "gRPC"), conf.grpcConfig.tlsSrvCert, conf.grpcConfig.tlsSrvKey, conf.grpcConfig.tlsSrvClientCA, conf.grpcConfig.tlsMinVersion)
if err != nil {
return errors.Wrap(err, "setup gRPC server")
}
options := []store.ProxyStoreOption{}
if debugLogging {
options = append(options, store.WithProxyStoreDebugLogging())
if conf.lazyRetrievalMaxBufferedResponses <= 0 {
return errors.New("--receive.lazy-retrieval-max-buffered-responses must be > 0")
}
options := []store.ProxyStoreOption{
store.WithProxyStoreDebugLogging(debugLogging),
store.WithMatcherCache(cache),
store.WithoutDedup(),
store.WithLazyRetrievalMaxBufferedResponsesForProxy(conf.lazyRetrievalMaxBufferedResponses),
}
proxy := store.NewProxyStore(
@ -341,7 +384,7 @@ func runReceive(
infoSrv := info.NewInfoServer(
component.Receive.String(),
info.WithLabelSetFunc(func() []labelpb.ZLabelSet { return proxy.LabelSet() }),
info.WithStoreInfoFunc(func() *infopb.StoreInfo {
info.WithStoreInfoFunc(func() (*infopb.StoreInfo, error) {
if httpProbe.IsReady() {
minTime, maxTime := proxy.TimeRange()
return &infopb.StoreInfo{
@ -350,14 +393,14 @@ func runReceive(
SupportsSharding: true,
SupportsWithoutReplicaLabels: true,
TsdbInfos: proxy.TSDBInfos(),
}
}, nil
}
return nil
return nil, errors.New("Not ready")
}),
info.WithExemplarsInfoFunc(),
)
srv := grpcserver.New(logger, receive.NewUnRegisterer(reg), tracer, grpcLogOpts, tagOpts, comp, grpcProbe,
srv := grpcserver.New(logger, receive.NewUnRegisterer(reg), tracer, grpcLogOpts, logFilterMethods, comp, grpcProbe,
grpcserver.WithServer(store.RegisterStoreServer(rw, logger)),
grpcserver.WithServer(store.RegisterWritableStoreServer(rw)),
grpcserver.WithServer(exemplars.RegisterExemplarsServer(exemplars.NewMultiTSDB(dbs.TSDBExemplars))),
@ -416,7 +459,13 @@ func runReceive(
{
ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {
return runutil.Repeat(2*time.Hour, ctx.Done(), func() error {
pruneInterval := 2 * time.Duration(tsdbOpts.MaxBlockDuration) * time.Millisecond
return runutil.Repeat(time.Minute, ctx.Done(), func() error {
currentTime := time.Now()
currentTotalMinutes := currentTime.Hour()*60 + currentTime.Minute()
if currentTotalMinutes%int(pruneInterval.Minutes()) != 0 {
return nil
}
if err := dbs.Prune(ctx); err != nil {
level.Error(logger).Log("err", err)
}
@ -443,6 +492,26 @@ func runReceive(
}
}
{
capNProtoWriter := receive.NewCapNProtoWriter(logger, dbs, &receive.CapNProtoWriterOptions{
TooFarInFutureTimeWindow: int64(time.Duration(*conf.tsdbTooFarInFutureTimeWindow)),
})
handler := receive.NewCapNProtoHandler(logger, capNProtoWriter)
listener, err := net.Listen("tcp", conf.replicationAddr)
if err != nil {
return err
}
server := receive.NewCapNProtoServer(listener, handler, logger)
g.Add(func() error {
return server.ListenAndServe()
}, func(err error) {
server.Shutdown()
if err := listener.Close(); err != nil {
level.Warn(logger).Log("msg", "Cap'n Proto server did not shut down gracefully", "err", err.Error())
}
})
}
level.Info(logger).Log("msg", "starting receiver")
return nil
}
@ -529,7 +598,7 @@ func setupHashring(g *run.Group,
webHandler.Hashring(receive.SingleNodeHashring(conf.endpoint))
level.Info(logger).Log("msg", "Empty hashring config. Set up single node hashring.")
} else {
h, err := receive.NewMultiHashring(algorithm, conf.replicationFactor, c)
h, err := receive.NewMultiHashring(algorithm, conf.replicationFactor, c, reg)
if err != nil {
return errors.Wrap(err, "unable to create new hashring from config")
}
@ -731,38 +800,25 @@ func startTSDBAndUpload(g *run.Group,
return nil
}
func migrateLegacyStorage(logger log.Logger, dataDir, defaultTenantID string) error {
func createDefautTenantTSDB(logger log.Logger, dataDir, defaultTenantID string) error {
defaultTenantDataDir := path.Join(dataDir, defaultTenantID)
if _, err := os.Stat(defaultTenantDataDir); !os.IsNotExist(err) {
level.Info(logger).Log("msg", "default tenant data dir already present, not attempting to migrate storage")
level.Info(logger).Log("msg", "default tenant data dir already present, will not create")
return nil
}
if _, err := os.Stat(dataDir); os.IsNotExist(err) {
level.Info(logger).Log("msg", "no existing storage found, no data migration attempted")
level.Info(logger).Log("msg", "no existing storage found, not creating default tenant data dir")
return nil
}
level.Info(logger).Log("msg", "found legacy storage, migrating to multi-tsdb layout with default tenant", "defaultTenantID", defaultTenantID)
files, err := os.ReadDir(dataDir)
if err != nil {
return errors.Wrapf(err, "read legacy data dir: %v", dataDir)
}
level.Info(logger).Log("msg", "default tenant data dir not found, creating", "defaultTenantID", defaultTenantID)
if err := os.MkdirAll(defaultTenantDataDir, 0750); err != nil {
return errors.Wrapf(err, "create default tenant data dir: %v", defaultTenantDataDir)
}
for _, f := range files {
from := path.Join(dataDir, f.Name())
to := path.Join(defaultTenantDataDir, f.Name())
if err := os.Rename(from, to); err != nil {
return errors.Wrapf(err, "migrate file from %v to %v", from, to)
}
}
return nil
}
@ -773,14 +829,18 @@ type receiveConfig struct {
grpcConfig grpcConfig
rwAddress string
rwServerCert string
rwServerKey string
rwServerClientCA string
rwClientCert string
rwClientKey string
rwClientServerCA string
rwClientServerName string
replicationAddr string
rwAddress string
rwServerCert string
rwServerKey string
rwServerClientCA string
rwClientCert string
rwClientKey string
rwClientSecure bool
rwClientServerCA string
rwClientServerName string
rwClientSkipVerify bool
rwServerTlsMinVersion string
dataDir string
labelStrs []string
@ -792,17 +852,19 @@ type receiveConfig struct {
hashringsFileContent string
hashringsAlgorithm string
refreshInterval *model.Duration
endpoint string
tenantHeader string
tenantField string
tenantLabelName string
defaultTenantID string
replicaHeader string
replicationFactor uint64
forwardTimeout *model.Duration
maxBackoff *model.Duration
compression string
refreshInterval *model.Duration
endpoint string
tenantHeader string
tenantField string
tenantLabelName string
defaultTenantID string
replicaHeader string
replicationFactor uint64
forwardTimeout *model.Duration
maxBackoff *model.Duration
compression string
replicationProtocol string
grpcServiceConfig string
tsdbMinBlockDuration *model.Duration
tsdbMaxBlockDuration *model.Duration
@ -816,14 +878,16 @@ type receiveConfig struct {
tsdbMemorySnapshotOnShutdown bool
tsdbEnableNativeHistograms bool
walCompression bool
noLockFile bool
writerInterning bool
walCompression bool
noLockFile bool
writerInterning bool
splitTenantLabelName string
hashFunc string
ignoreBlockSize bool
allowOutOfOrderUpload bool
skipCorruptedBlocks bool
reqLogConfig *extflag.PathOrContent
relabelConfigPath *extflag.PathOrContent
@ -831,6 +895,19 @@ type receiveConfig struct {
writeLimitsConfig *extflag.PathOrContent
storeRateLimits store.SeriesSelectLimits
limitsConfigReloadTimer time.Duration
asyncForwardWorkerCount uint
matcherCacheSize int
lazyRetrievalMaxBufferedResponses int
featureList *[]string
headExpandedPostingsCacheSize uint64
compactedBlocksExpandedPostingsCacheSize uint64
otlpEnableTargetInfo bool
otlpResourceAttributes []string
}
func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
@ -847,10 +924,16 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("remote-write.server-tls-client-ca", "TLS CA to verify clients against. If no client CA is specified, there is no client verification on server side. (tls.NoClientCert)").Default("").StringVar(&rc.rwServerClientCA)
cmd.Flag("remote-write.server-tls-min-version", "TLS version for the gRPC server, leave blank to default to TLS 1.3, allow values: [\"1.0\", \"1.1\", \"1.2\", \"1.3\"]").Default("1.3").StringVar(&rc.rwServerTlsMinVersion)
cmd.Flag("remote-write.client-tls-cert", "TLS Certificates to use to identify this client to the server.").Default("").StringVar(&rc.rwClientCert)
cmd.Flag("remote-write.client-tls-key", "TLS Key for the client's certificate.").Default("").StringVar(&rc.rwClientKey)
cmd.Flag("remote-write.client-tls-secure", "Use TLS when talking to the other receivers.").Default("false").BoolVar(&rc.rwClientSecure)
cmd.Flag("remote-write.client-tls-skip-verify", "Disable TLS certificate verification when talking to the other receivers i.e self signed, signed by fake CA.").Default("false").BoolVar(&rc.rwClientSkipVerify)
cmd.Flag("remote-write.client-tls-ca", "TLS CA Certificates to use to verify servers.").Default("").StringVar(&rc.rwClientServerCA)
cmd.Flag("remote-write.client-server-name", "Server name to verify the hostname on the returned TLS certificates. See https://tools.ietf.org/html/rfc4366#section-3.1").Default("").StringVar(&rc.rwClientServerName)
@ -884,15 +967,27 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("receive.default-tenant-id", "Default tenant ID to use when none is provided via a header.").Default(tenancy.DefaultTenant).StringVar(&rc.defaultTenantID)
cmd.Flag("receive.split-tenant-label-name", "Label name through which the request will be split into multiple tenants. This takes precedence over the HTTP header.").Default("").StringVar(&rc.splitTenantLabelName)
cmd.Flag("receive.tenant-label-name", "Label name through which the tenant will be announced.").Default(tenancy.DefaultTenantLabel).StringVar(&rc.tenantLabelName)
cmd.Flag("receive.replica-header", "HTTP header specifying the replica number of a write request.").Default(receive.DefaultReplicaHeader).StringVar(&rc.replicaHeader)
cmd.Flag("receive.forward.async-workers", "Number of concurrent workers processing forwarding of remote-write requests.").Default("5").UintVar(&rc.asyncForwardWorkerCount)
compressionOptions := strings.Join([]string{snappy.Name, compressionNone}, ", ")
cmd.Flag("receive.grpc-compression", "Compression algorithm to use for gRPC requests to other receivers. Must be one of: "+compressionOptions).Default(snappy.Name).EnumVar(&rc.compression, snappy.Name, compressionNone)
cmd.Flag("receive.replication-factor", "How many times to replicate incoming write requests.").Default("1").Uint64Var(&rc.replicationFactor)
replicationProtocols := []string{string(receive.ProtobufReplication), string(receive.CapNProtoReplication)}
cmd.Flag("receive.replication-protocol", "The protocol to use for replicating remote-write requests. One of "+strings.Join(replicationProtocols, ", ")).
Default(string(receive.ProtobufReplication)).
EnumVar(&rc.replicationProtocol, replicationProtocols...)
cmd.Flag("receive.capnproto-address", "Address for the Cap'n Proto server.").Default(fmt.Sprintf("0.0.0.0:%s", receive.DefaultCapNProtoPort)).StringVar(&rc.replicationAddr)
cmd.Flag("receive.grpc-service-config", "gRPC service configuration file or content in JSON format. See https://github.com/grpc/grpc/blob/master/doc/service_config.md").PlaceHolder("<content>").Default("").StringVar(&rc.grpcServiceConfig)
rc.forwardTimeout = extkingpin.ModelDuration(cmd.Flag("receive-forward-timeout", "Timeout for each forward request.").Default("5s").Hidden())
rc.maxBackoff = extkingpin.ModelDuration(cmd.Flag("receive-forward-max-backoff", "Maximum backoff for each forward fan-out request").Default("5s").Hidden())
@ -904,18 +999,18 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
rc.tsdbMaxBlockDuration = extkingpin.ModelDuration(cmd.Flag("tsdb.max-block-duration", "Max duration for local TSDB blocks").Default("2h").Hidden())
rc.tsdbTooFarInFutureTimeWindow = extkingpin.ModelDuration(cmd.Flag("tsdb.too-far-in-future.time-window",
"[EXPERIMENTAL] Configures the allowed time window for ingesting samples too far in the future. Disabled (0s) by default"+
"Configures the allowed time window for ingesting samples too far in the future. Disabled (0s) by default. "+
"Please note enable this flag will reject samples in the future of receive local NTP time + configured duration due to clock skew in remote write clients.",
).Default("0s"))
rc.tsdbOutOfOrderTimeWindow = extkingpin.ModelDuration(cmd.Flag("tsdb.out-of-order.time-window",
"[EXPERIMENTAL] Configures the allowed time window for ingestion of out-of-order samples. Disabled (0s) by default"+
"Please note if you enable this option and you use compactor, make sure you have the --enable-vertical-compaction flag enabled, otherwise you might risk compactor halt.",
).Default("0s").Hidden())
"Please note if you enable this option and you use compactor, make sure you have the --compact.enable-vertical-compaction flag enabled, otherwise you might risk compactor halt.",
).Default("0s"))
cmd.Flag("tsdb.out-of-order.cap-max",
"[EXPERIMENTAL] Configures the maximum capacity for out-of-order chunks (in samples). If set to <=0, default value 32 is assumed.",
).Default("0").Hidden().Int64Var(&rc.tsdbOutOfOrderCapMax)
).Default("0").Int64Var(&rc.tsdbOutOfOrderCapMax)
cmd.Flag("tsdb.allow-overlapping-blocks", "Allow overlapping blocks, which in turn enables vertical compaction and vertical query merge. Does not do anything, enabled all the time.").Default("false").BoolVar(&rc.tsdbAllowOverlappingBlocks)
@ -925,6 +1020,9 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("tsdb.no-lockfile", "Do not create lockfile in TSDB data directory. In any case, the lockfiles will be deleted on next startup.").Default("false").BoolVar(&rc.noLockFile)
cmd.Flag("tsdb.head.expanded-postings-cache-size", "[EXPERIMENTAL] If non-zero, enables expanded postings cache for the head block.").Default("0").Uint64Var(&rc.headExpandedPostingsCacheSize)
cmd.Flag("tsdb.block.expanded-postings-cache-size", "[EXPERIMENTAL] If non-zero, enables expanded postings cache for compacted blocks.").Default("0").Uint64Var(&rc.compactedBlocksExpandedPostingsCacheSize)
cmd.Flag("tsdb.max-exemplars",
"Enables support for ingesting exemplars and sets the maximum number of exemplars that will be stored per tenant."+
" In case the exemplar storage becomes full (number of stored exemplars becomes equal to max-exemplars),"+
@ -942,7 +1040,7 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("tsdb.enable-native-histograms",
"[EXPERIMENTAL] Enables the ingestion of native histograms.").
Default("false").Hidden().BoolVar(&rc.tsdbEnableNativeHistograms)
Default("false").BoolVar(&rc.tsdbEnableNativeHistograms)
cmd.Flag("writer.intern",
"[EXPERIMENTAL] Enables string interning in receive writer, for more optimized memory usage.").
@ -959,11 +1057,27 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
"about order.").
Default("false").Hidden().BoolVar(&rc.allowOutOfOrderUpload)
cmd.Flag("shipper.skip-corrupted-blocks",
"If true, shipper will skip corrupted blocks in the given iteration and retry later. This means that some newer blocks might be uploaded sooner than older blocks."+
"This can trigger compaction without those blocks and as a result will create an overlap situation. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
"about order.").
Default("false").Hidden().BoolVar(&rc.skipCorruptedBlocks)
cmd.Flag("matcher-cache-size", "Max number of cached matchers items. Using 0 disables caching.").Default("0").IntVar(&rc.matcherCacheSize)
rc.reqLogConfig = extkingpin.RegisterRequestLoggingFlags(cmd)
rc.writeLimitsConfig = extflag.RegisterPathOrContent(cmd, "receive.limits-config", "YAML file that contains limit configuration.", extflag.WithEnvSubstitution(), extflag.WithHidden())
cmd.Flag("receive.limits-config-reload-timer", "Minimum amount of time to pass for the limit configuration to be reloaded. Helps to avoid excessive reloads.").
Default("1s").Hidden().DurationVar(&rc.limitsConfigReloadTimer)
cmd.Flag("receive.otlp-enable-target-info", "Enables target information in OTLP metrics ingested by Receive. If enabled, it converts the resource to the target info metric").Default("true").BoolVar(&rc.otlpEnableTargetInfo)
cmd.Flag("receive.otlp-promote-resource-attributes", "(Repeatable) Resource attributes to include in OTLP metrics ingested by Receive.").Default("").StringsVar(&rc.otlpResourceAttributes)
rc.featureList = cmd.Flag("enable-feature", "Comma separated experimental feature names to enable. The current list of features is "+metricNamesFilter+".").Default("").Strings()
cmd.Flag("receive.lazy-retrieval-max-buffered-responses", "The lazy retrieval strategy can buffer up to this number of responses. This is to limit the memory usage. This flag takes effect only when the lazy retrieval strategy is enabled.").
Default("20").IntVar(&rc.lazyRetrievalMaxBufferedResponses)
}
// determineMode returns the ReceiverMode that this receiver is configured to run in.


@ -8,12 +8,14 @@ import (
"context"
"fmt"
"html/template"
"maps"
"math/rand"
"net/http"
"net/url"
"os"
"path/filepath"
"strings"
"sync"
texttemplate "text/template"
"time"
@ -21,7 +23,6 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
grpc_logging "github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/logging"
"github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tags"
"github.com/oklog/run"
"github.com/opentracing/opentracing-go"
"github.com/pkg/errors"
@ -34,32 +35,39 @@ import (
"github.com/prometheus/prometheus/model/relabel"
"github.com/prometheus/prometheus/notifier"
"github.com/prometheus/prometheus/promql"
"github.com/prometheus/prometheus/promql/parser"
"github.com/prometheus/prometheus/rules"
"github.com/prometheus/prometheus/scrape"
"github.com/prometheus/prometheus/storage"
"github.com/prometheus/prometheus/storage/remote"
"github.com/prometheus/prometheus/tsdb"
"github.com/prometheus/prometheus/tsdb/agent"
"github.com/prometheus/prometheus/tsdb/wlog"
"github.com/prometheus/prometheus/util/compression"
"gopkg.in/yaml.v2"
"github.com/thanos-io/objstore"
"github.com/thanos-io/objstore/client"
objstoretracing "github.com/thanos-io/objstore/tracing/opentracing"
"gopkg.in/yaml.v2"
"github.com/thanos-io/promql-engine/execution/parse"
"github.com/thanos-io/thanos/pkg/alert"
v1 "github.com/thanos-io/thanos/pkg/api/rule"
"github.com/thanos-io/thanos/pkg/block/metadata"
"github.com/thanos-io/thanos/pkg/clientconfig"
"github.com/thanos-io/thanos/pkg/component"
"github.com/thanos-io/thanos/pkg/compressutil"
"github.com/thanos-io/thanos/pkg/discovery/dns"
"github.com/thanos-io/thanos/pkg/errutil"
"github.com/thanos-io/thanos/pkg/extannotations"
"github.com/thanos-io/thanos/pkg/extgrpc"
"github.com/thanos-io/thanos/pkg/extkingpin"
"github.com/thanos-io/thanos/pkg/extprom"
extpromhttp "github.com/thanos-io/thanos/pkg/extprom/http"
"github.com/thanos-io/thanos/pkg/extpromql"
"github.com/thanos-io/thanos/pkg/info"
"github.com/thanos-io/thanos/pkg/info/infopb"
"github.com/thanos-io/thanos/pkg/logging"
"github.com/thanos-io/thanos/pkg/logutil"
"github.com/thanos-io/thanos/pkg/prober"
"github.com/thanos-io/thanos/pkg/promclient"
"github.com/thanos-io/thanos/pkg/query"
@ -76,8 +84,6 @@ import (
"github.com/thanos-io/thanos/pkg/ui"
)
const dnsSDResolver = "miekgdns"
type ruleConfig struct {
http httpConfig
grpc grpcConfig
@ -95,16 +101,22 @@ type ruleConfig struct {
rwConfig *extflag.PathOrContent
resendDelay time.Duration
evalInterval time.Duration
outageTolerance time.Duration
forGracePeriod time.Duration
ruleFiles []string
objStoreConfig *extflag.PathOrContent
dataDir string
lset labels.Labels
ignoredLabelNames []string
storeRateLimits store.SeriesSelectLimits
resendDelay time.Duration
evalInterval time.Duration
queryOffset time.Duration
outageTolerance time.Duration
forGracePeriod time.Duration
ruleFiles []string
objStoreConfig *extflag.PathOrContent
dataDir string
lset labels.Labels
ignoredLabelNames []string
storeRateLimits store.SeriesSelectLimits
ruleConcurrentEval int64
extendedFunctionsEnabled bool
EnableFeatures []string
tsdbEnableNativeHistograms bool
}
type Expression struct {
@ -145,16 +157,26 @@ func registerRule(app *extkingpin.App) {
Default("1m").DurationVar(&conf.resendDelay)
cmd.Flag("eval-interval", "The default evaluation interval to use.").
Default("1m").DurationVar(&conf.evalInterval)
cmd.Flag("rule-query-offset", "The default rule group query_offset duration to use.").
Default("0s").DurationVar(&conf.queryOffset)
cmd.Flag("for-outage-tolerance", "Max time to tolerate prometheus outage for restoring \"for\" state of alert.").
Default("1h").DurationVar(&conf.outageTolerance)
cmd.Flag("for-grace-period", "Minimum duration between alert and restored \"for\" state. This is maintained only for alerts with configured \"for\" time greater than grace period.").
Default("10m").DurationVar(&conf.forGracePeriod)
cmd.Flag("restore-ignored-label", "Label names to be ignored when restoring alerts from the remote storage. This is only used in stateless mode.").
StringsVar(&conf.ignoredLabelNames)
cmd.Flag("rule-concurrent-evaluation", "How many rules can be evaluated concurrently. Default is 1.").Default("1").Int64Var(&conf.ruleConcurrentEval)
cmd.Flag("grpc-query-endpoint", "Addresses of Thanos gRPC query API servers (repeatable). The scheme may be prefixed with 'dns+' or 'dnssrv+' to detect Thanos API servers through respective DNS lookups.").
PlaceHolder("<endpoint>").StringsVar(&conf.grpcQueryEndpoints)
cmd.Flag("query.enable-x-functions", "Whether to enable extended rate functions (xrate, xincrease and xdelta). Only has effect when used with Thanos engine.").Default("false").BoolVar(&conf.extendedFunctionsEnabled)
cmd.Flag("enable-feature", "Comma separated feature names to enable. Valid options for now: promql-experimental-functions (enables promql experimental functions for ruler)").Default("").StringsVar(&conf.EnableFeatures)
cmd.Flag("tsdb.enable-native-histograms",
"[EXPERIMENTAL] Enables the ingestion of native histograms.").
Default("false").BoolVar(&conf.tsdbEnableNativeHistograms)
conf.rwConfig = extflag.RegisterPathOrContent(cmd, "remote-write.config", "YAML config for the remote-write configurations, that specify servers where samples should be sent to (see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write). This automatically enables stateless mode for ruler and no series will be stored in the ruler's TSDB. If an empty config (or file) is provided, the flag is ignored and ruler is run with its own TSDB.", extflag.WithEnvSubstitution())
conf.objStoreConfig = extkingpin.RegisterCommonObjStoreFlags(cmd, "", false)
@ -174,15 +196,16 @@ func registerRule(app *extkingpin.App) {
}
tsdbOpts := &tsdb.Options{
MinBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
MaxBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
RetentionDuration: int64(time.Duration(*tsdbRetention) / time.Millisecond),
NoLockfile: *noLockFile,
WALCompression: wlog.ParseCompressionType(*walCompression, string(wlog.CompressionSnappy)),
MinBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
MaxBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
RetentionDuration: int64(time.Duration(*tsdbRetention) / time.Millisecond),
NoLockfile: *noLockFile,
WALCompression: compressutil.ParseCompressionType(*walCompression, compression.Snappy),
EnableNativeHistograms: conf.tsdbEnableNativeHistograms,
}
agentOpts := &agent.Options{
WALCompression: wlog.ParseCompressionType(*walCompression, string(wlog.CompressionSnappy)),
WALCompression: compressutil.ParseCompressionType(*walCompression, compression.Snappy),
NoLockfile: *noLockFile,
}
@ -227,7 +250,8 @@ func registerRule(app *extkingpin.App) {
return errors.Wrap(err, "error while parsing config for request logging")
}
tagOpts, grpcLogOpts, err := logging.ParsegRPCOptions(reqLogConfig)
grpcLogOpts, logFilterMethods, err := logging.ParsegRPCOptions(reqLogConfig)
if err != nil {
return errors.Wrap(err, "error while parsing config for request logging")
}
@ -243,7 +267,7 @@ func registerRule(app *extkingpin.App) {
getFlagsMap(cmd.Flags()),
httpLogOpts,
grpcLogOpts,
tagOpts,
logFilterMethods,
tsdbOpts,
agentOpts,
)
@ -285,7 +309,7 @@ func newRuleMetrics(reg *prometheus.Registry) *RuleMetrics {
m.ruleEvalWarnings = factory.NewCounterVec(
prometheus.CounterOpts{
Name: "thanos_rule_evaluation_with_warnings_total",
Help: "The total number of rule evaluation that were successful but had warnings which can indicate partial error.",
Help: "The total number of rule evaluation that were successful but had non PromQL warnings which can indicate partial error.",
}, []string{"strategy"},
)
m.ruleEvalWarnings.WithLabelValues(strings.ToLower(storepb.PartialResponseStrategy_ABORT.String()))
@ -307,7 +331,7 @@ func runRule(
flagsMap map[string]string,
httpLogOpts []logging.Option,
grpcLogOpts []grpc_logging.Option,
tagOpts []tags.Option,
logFilterMethods []string,
tsdbOpts *tsdb.Options,
agentOpts *agent.Options,
) error {
@ -318,7 +342,7 @@ func runRule(
if len(conf.queryConfigYAML) > 0 {
queryCfg, err = clientconfig.LoadConfigs(conf.queryConfigYAML)
if err != nil {
return err
return errors.Wrap(err, "query configuration")
}
} else {
queryCfg, err = clientconfig.BuildConfigFromHTTPAddresses(conf.query.addrs)
@ -375,17 +399,17 @@ func runRule(
cfg.HTTPConfig.HTTPClientConfig.ClientMetrics = queryClientMetrics
c, err := clientconfig.NewHTTPClient(cfg.HTTPConfig.HTTPClientConfig, "query")
if err != nil {
return err
return fmt.Errorf("failed to create HTTP query client: %w", err)
}
c.Transport = tracing.HTTPTripperware(logger, c.Transport)
queryClient, err := clientconfig.NewClient(logger, cfg.HTTPConfig.EndpointsConfig, c, queryProvider.Clone())
if err != nil {
return err
return fmt.Errorf("failed to create query client: %w", err)
}
queryClients = append(queryClients, queryClient)
promClients = append(promClients, promclient.NewClient(queryClient, logger, "thanos-rule"))
// Discover and resolve query addresses.
addDiscoveryGroups(g, queryClient, conf.query.dnsSDInterval)
addDiscoveryGroups(g, queryClient, conf.query.dnsSDInterval, logger)
}
if cfg.GRPCConfig != nil {
@ -394,17 +418,6 @@ func runRule(
}
if len(grpcEndpoints) > 0 {
duplicatedGRPCEndpoints := promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "thanos_rule_grpc_endpoints_duplicated_total",
Help: "The number of times a duplicated grpc endpoint is detected from the different configs in rule",
})
dnsEndpointProvider := dns.NewProvider(
logger,
extprom.WrapRegistererWithPrefix("thanos_rule_grpc_endpoints_", reg),
dnsSDResolver,
)
dialOpts, err := extgrpc.StoreClientGRPCOpts(
logger,
reg,
@ -420,36 +433,27 @@ func runRule(
return err
}
grpcEndpointSet = prepareEndpointSet(
grpcEndpointSet, err = setupEndpointSet(
g,
logger,
comp,
reg,
[]*dns.Provider{dnsEndpointProvider},
duplicatedGRPCEndpoints,
logger,
nil,
1*time.Minute,
nil,
1*time.Minute,
grpcEndpoints,
nil,
nil,
nil,
nil,
dialOpts,
conf.query.dnsSDResolver,
conf.query.dnsSDInterval,
5*time.Minute,
5*time.Second,
dialOpts,
)
// Periodically update the GRPC addresses from query config by resolving them using DNS SD if necessary.
{
ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {
return runutil.Repeat(5*time.Second, ctx.Done(), func() error {
resolveCtx, resolveCancel := context.WithTimeout(ctx, 5*time.Second)
defer resolveCancel()
if err := dnsEndpointProvider.Resolve(resolveCtx, grpcEndpoints); err != nil {
level.Error(logger).Log("msg", "failed to resolve addresses passed using grpc query config", "err", err)
}
return nil
})
}, func(error) {
cancel()
})
if err != nil {
return err
}
}
@ -473,10 +477,11 @@ func runRule(
return errors.Wrapf(err, "failed to parse remote write config %v", string(rwCfgYAML))
}
slogger := logutil.GoKitLogToSlog(logger)
// flushDeadline is set to 1m, but it is for metadata watcher only so not used here.
remoteStore := remote.NewStorage(logger, reg, func() (int64, error) {
remoteStore := remote.NewStorage(slogger, reg, func() (int64, error) {
return 0, nil
}, conf.dataDir, 1*time.Minute, nil)
}, conf.dataDir, 1*time.Minute, &readyScrapeManager{})
if err := remoteStore.ApplyConfig(&config.Config{
GlobalConfig: config.GlobalConfig{
ExternalLabels: labelsTSDBToProm(conf.lset),
@ -486,18 +491,18 @@ func runRule(
return errors.Wrap(err, "applying config to remote storage")
}
agentDB, err = agent.Open(logger, reg, remoteStore, conf.dataDir, agentOpts)
agentDB, err = agent.Open(slogger, reg, remoteStore, conf.dataDir, agentOpts)
if err != nil {
return errors.Wrap(err, "start remote write agent db")
}
fanoutStore := storage.NewFanout(logger, agentDB, remoteStore)
fanoutStore := storage.NewFanout(slogger, agentDB, remoteStore)
appendable = fanoutStore
// Use a separate queryable to restore the ALERTS firing states.
// We cannot use remoteStore directly because it uses remote read for
// query. However, remote read is not implemented in Thanos Receiver.
queryable = thanosrules.NewPromClientsQueryable(logger, queryClients, promClients, conf.query.httpMethod, conf.query.step, conf.ignoredLabelNames)
} else {
tsdbDB, err = tsdb.Open(conf.dataDir, log.With(logger, "component", "tsdb"), reg, tsdbOpts, nil)
tsdbDB, err = tsdb.Open(conf.dataDir, logutil.GoKitLogToSlog(log.With(logger, "component", "tsdb")), reg, tsdbOpts, nil)
if err != nil {
return errors.Wrap(err, "open TSDB")
}
@ -572,7 +577,7 @@ func runRule(
return err
}
// Discover and resolve Alertmanager addresses.
addDiscoveryGroups(g, amClient, conf.alertmgr.alertmgrsDNSSDInterval)
addDiscoveryGroups(g, amClient, conf.alertmgr.alertmgrsDNSSDInterval, logger)
alertmgrs = append(alertmgrs, alert.NewAlertmanager(logger, amClient, time.Duration(cfg.Timeout), cfg.APIVersion))
}
@ -582,6 +587,19 @@ func runRule(
alertQ = alert.NewQueue(logger, reg, 10000, 100, labelsTSDBToProm(conf.lset), conf.alertmgr.alertExcludeLabels, alertRelabelConfigs)
)
{
if conf.extendedFunctionsEnabled {
maps.Copy(parser.Functions, parse.XFunctions)
}
if len(conf.EnableFeatures) > 0 {
for _, feature := range conf.EnableFeatures {
if feature == promqlExperimentalFunctions {
parser.EnableExperimentalFunctions = true
level.Info(logger).Log("msg", "Experimental PromQL functions enabled.", "option", promqlExperimentalFunctions)
}
}
}
// Run rule evaluation and alert notifications.
notifyFunc := func(ctx context.Context, expr string, alerts ...*rules.Alert) {
res := make([]*notifier.Alert, 0, len(alerts))
@ -610,22 +628,29 @@ func runRule(
alertQ.Push(res)
}
managerOpts := rules.ManagerOptions{
NotifyFunc: notifyFunc,
Logger: logutil.GoKitLogToSlog(logger),
Appendable: appendable,
ExternalURL: nil,
Queryable: queryable,
ResendDelay: conf.resendDelay,
OutageTolerance: conf.outageTolerance,
ForGracePeriod: conf.forGracePeriod,
DefaultRuleQueryOffset: func() time.Duration { return conf.queryOffset },
}
if conf.ruleConcurrentEval > 1 {
managerOpts.MaxConcurrentEvals = conf.ruleConcurrentEval
managerOpts.ConcurrentEvalsEnabled = true
}
ctx, cancel := context.WithCancel(context.Background())
logger = log.With(logger, "component", "rules")
ruleMgr = thanosrules.NewManager(
tracing.ContextWithTracer(ctx, tracer),
reg,
conf.dataDir,
rules.ManagerOptions{
NotifyFunc: notifyFunc,
Logger: logger,
Appendable: appendable,
ExternalURL: nil,
Queryable: queryable,
ResendDelay: conf.resendDelay,
OutageTolerance: conf.outageTolerance,
ForGracePeriod: conf.forGracePeriod,
},
managerOpts,
queryFuncCreator(logger, queryClients, promClients, grpcEndpointSet, metrics.duplicatedQuery, metrics.ruleEvalWarnings, conf.query.httpMethod, conf.query.doNotAddThanosParams),
conf.lset,
// In our case the querying URL is the external URL because in Prometheus
@ -708,7 +733,7 @@ func runRule(
)
// Start gRPC server.
tlsCfg, err := tls.NewServerConfig(log.With(logger, "protocol", "gRPC"), conf.grpc.tlsSrvCert, conf.grpc.tlsSrvKey, conf.grpc.tlsSrvClientCA)
tlsCfg, err := tls.NewServerConfig(log.With(logger, "protocol", "gRPC"), conf.grpc.tlsSrvCert, conf.grpc.tlsSrvKey, conf.grpc.tlsSrvClientCA, conf.grpc.tlsMinVersion)
if err != nil {
return errors.Wrap(err, "setup gRPC server")
}
@ -728,7 +753,7 @@ func runRule(
info.WithLabelSetFunc(func() []labelpb.ZLabelSet {
return tsdbStore.LabelSet()
}),
info.WithStoreInfoFunc(func() *infopb.StoreInfo {
info.WithStoreInfoFunc(func() (*infopb.StoreInfo, error) {
if httpProbe.IsReady() {
mint, maxt := tsdbStore.TimeRange()
return &infopb.StoreInfo{
@ -737,9 +762,9 @@ func runRule(
SupportsSharding: true,
SupportsWithoutReplicaLabels: true,
TsdbInfos: tsdbStore.TSDBInfos(),
}
}, nil
}
return nil
return nil, errors.New("Not ready")
}),
)
storeServer := store.NewLimitedStoreServer(store.NewInstrumentedStoreServer(reg, tsdbStore), reg, conf.storeRateLimits)
@ -749,7 +774,7 @@ func runRule(
options = append(options, grpcserver.WithServer(
info.RegisterInfoServer(info.NewInfoServer(component.Rule.String(), infoOptions...)),
))
s := grpcserver.New(logger, reg, tracer, grpcLogOpts, tagOpts, comp, grpcProbe, options...)
s := grpcserver.New(logger, reg, tracer, grpcLogOpts, logFilterMethods, comp, grpcProbe, options...)
g.Add(func() error {
statusProber.Ready()
@ -820,7 +845,7 @@ func runRule(
if len(confContentYaml) > 0 {
// The background shipper continuously scans the data directory and uploads
// new blocks to Google Cloud Storage or an S3-compatible storage service.
bkt, err := client.NewBucket(logger, confContentYaml, component.Rule.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Rule.String(), nil)
if err != nil {
return err
}
@ -833,7 +858,18 @@ func runRule(
}
}()
s := shipper.New(logger, reg, conf.dataDir, bkt, func() labels.Labels { return conf.lset }, metadata.RulerSource, nil, conf.shipper.allowOutOfOrderUpload, metadata.HashFunc(conf.shipper.hashFunc), conf.shipper.metaFileName)
s := shipper.New(
bkt,
conf.dataDir,
shipper.WithLogger(logger),
shipper.WithRegisterer(reg),
shipper.WithSource(metadata.RulerSource),
shipper.WithHashFunc(metadata.HashFunc(conf.shipper.hashFunc)),
shipper.WithMetaFileName(conf.shipper.metaFileName),
shipper.WithLabels(func() labels.Labels { return conf.lset }),
shipper.WithAllowOutOfOrderUploads(conf.shipper.allowOutOfOrderUpload),
shipper.WithSkipCorruptedBlocks(conf.shipper.skipCorruptedBlocks),
)
ctx, cancel := context.WithCancel(context.Background())
@ -873,13 +909,7 @@ func removeLockfileIfAny(logger log.Logger, dataDir string) error {
}
func labelsTSDBToProm(lset labels.Labels) (res labels.Labels) {
for _, l := range lset {
res = append(res, labels.Label{
Name: l.Name,
Value: l.Value,
})
}
return res
return lset.Copy()
}
func queryFuncCreator(
@ -926,6 +956,8 @@ func queryFuncCreator(
level.Error(logger).Log("err", err, "query", qs)
continue
}
warns = filterOutPromQLWarnings(warns, logger, qs)
if len(warns) > 0 {
ruleEvalWarnings.WithLabelValues(strings.ToLower(partialResponseStrategy.String())).Inc()
// TODO(bwplotka): Propagate those to UI, probably requires changing rule manager code ):
@ -939,7 +971,12 @@ func queryFuncCreator(
queryAPIClients := grpcEndpointSet.GetQueryAPIClients()
for _, i := range rand.Perm(len(queryAPIClients)) {
e := query.NewRemoteEngine(logger, queryAPIClients[i], query.Opts{})
q, err := e.NewInstantQuery(ctx, nil, qs, t)
expr, err := extpromql.ParseExpr(qs)
if err != nil {
level.Error(logger).Log("err", err, "query", qs)
continue
}
q, err := e.NewInstantQuery(ctx, nil, expr, t)
if err != nil {
level.Error(logger).Log("err", err, "query", qs)
continue
@ -952,12 +989,13 @@ func queryFuncCreator(
continue
}
if len(result.Warnings) > 0 {
warnings := make([]string, 0, len(result.Warnings))
for _, warn := range result.Warnings {
warnings = append(warnings, warn.Error())
}
warnings = filterOutPromQLWarnings(warnings, logger, qs)
if len(warnings) > 0 {
ruleEvalWarnings.WithLabelValues(strings.ToLower(partialResponseStrategy.String())).Inc()
warnings := make([]string, 0, len(result.Warnings))
for _, w := range result.Warnings {
warnings = append(warnings, w.Error())
}
level.Warn(logger).Log("warnings", strings.Join(warnings, ", "), "query", qs)
}
@ -969,7 +1007,7 @@ func queryFuncCreator(
}
}
func addDiscoveryGroups(g *run.Group, c *clientconfig.HTTPClient, interval time.Duration) {
func addDiscoveryGroups(g *run.Group, c *clientconfig.HTTPClient, interval time.Duration, logger log.Logger) {
ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {
c.Discover(ctx)
@ -979,9 +1017,10 @@ func addDiscoveryGroups(g *run.Group, c *clientconfig.HTTPClient, interval time.
})
g.Add(func() error {
return runutil.Repeat(interval, ctx.Done(), func() error {
runutil.RepeatInfinitely(logger, interval, ctx.Done(), func() error {
return c.Resolve(ctx)
})
return nil
}, func(error) {
cancel()
})
@ -1062,3 +1101,45 @@ func validateTemplate(tmplStr string) error {
}
return nil
}
// Filter out PromQL-related warnings from the warning response and keep only store-related warnings.
func filterOutPromQLWarnings(warns []string, logger log.Logger, query string) []string {
storeWarnings := make([]string, 0, len(warns))
for _, warn := range warns {
if extannotations.IsPromQLAnnotation(warn) {
level.Warn(logger).Log("warning", warn, "query", query)
continue
}
storeWarnings = append(storeWarnings, warn)
}
return storeWarnings
}
// ReadyScrapeManager allows a scrape manager to be retrieved, even if it's set at a later point in time.
type readyScrapeManager struct {
mtx sync.RWMutex
m *scrape.Manager
}
// Set the scrape manager.
func (rm *readyScrapeManager) Set(m *scrape.Manager) {
rm.mtx.Lock()
defer rm.mtx.Unlock()
rm.m = m
}
// Get the scrape manager. If it is not ready, return an error.
func (rm *readyScrapeManager) Get() (*scrape.Manager, error) {
rm.mtx.RLock()
defer rm.mtx.RUnlock()
if rm.m != nil {
return rm.m, nil
}
return nil, ErrNotReady
}
// ErrNotReady is returned if the underlying scrape manager is not ready yet.
var ErrNotReady = errors.New("scrape manager not ready")

View File

@ -7,6 +7,10 @@ import (
"testing"
"github.com/efficientgo/core/testutil"
"github.com/go-kit/log"
"github.com/prometheus/prometheus/util/annotations"
"github.com/thanos-io/thanos/pkg/extpromql"
)
func Test_parseFlagLabels(t *testing.T) {
@ -19,19 +23,7 @@ func Test_parseFlagLabels(t *testing.T) {
expectErr: false,
},
{
s: []string{`label-Name="LabelVal"`}, // Unsupported labelname.
expectErr: true,
},
{
s: []string{`label:Name="LabelVal"`}, // Unsupported labelname.
expectErr: true,
},
{
s: []string{`1abelName="LabelVal"`}, // Unsupported labelname.
expectErr: true,
},
{
s: []string{`label_Name"LabelVal"`}, // Missing "=" seprator.
s: []string{`label_Name"LabelVal"`}, // Missing "=" separator.
expectErr: true,
},
{
@ -110,3 +102,59 @@ func Test_tableLinkForExpression(t *testing.T) {
testutil.Equals(t, resStr, td.expectStr)
}
}
func TestFilterOutPromQLWarnings(t *testing.T) {
logger := log.NewNopLogger()
query := "foo"
expr, err := extpromql.ParseExpr(`rate(prometheus_build_info[5m])`)
testutil.Ok(t, err)
possibleCounterInfo := annotations.NewPossibleNonCounterInfo("foo", expr.PositionRange())
badBucketLabelWarning := annotations.NewBadBucketLabelWarning("foo", "0.99", expr.PositionRange())
for _, tc := range []struct {
name string
warnings []string
expected []string
}{
{
name: "nil warning",
expected: make([]string, 0),
},
{
name: "empty warning",
warnings: make([]string, 0),
expected: make([]string, 0),
},
{
name: "no PromQL warning",
warnings: []string{
"some_warning_message",
},
expected: []string{
"some_warning_message",
},
},
{
name: "PromQL warning",
warnings: []string{
possibleCounterInfo.Error(),
},
expected: make([]string, 0),
},
{
name: "filter out all PromQL warnings",
warnings: []string{
possibleCounterInfo.Error(),
badBucketLabelWarning.Error(),
"some_warning_message",
},
expected: []string{
"some_warning_message",
},
},
} {
t.Run(tc.name, func(t *testing.T) {
output := filterOutPromQLWarnings(tc.warnings, logger, query)
testutil.Equals(t, tc.expected, output)
})
}
}

View File

@ -16,7 +16,6 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
grpc_logging "github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/logging"
"github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tags"
"github.com/oklog/run"
"github.com/opentracing/opentracing-go"
"github.com/pkg/errors"
@ -59,7 +58,9 @@ func registerSidecar(app *extkingpin.App) {
conf := &sidecarConfig{}
conf.registerFlag(cmd)
cmd.Setup(func(g *run.Group, logger log.Logger, reg *prometheus.Registry, tracer opentracing.Tracer, _ <-chan struct{}, _ bool) error {
tagOpts, grpcLogOpts, err := logging.ParsegRPCOptions(conf.reqLogConfig)
grpcLogOpts, logFilterMethods, err := logging.ParsegRPCOptions(conf.reqLogConfig)
if err != nil {
return errors.Wrap(err, "error while parsing config for request logging")
}
@ -101,7 +102,7 @@ func registerSidecar(app *extkingpin.App) {
extprom.WrapRegistererWithPrefix("thanos_sidecar_", reg),
&opts)
return runSidecar(g, logger, reg, tracer, rl, component.Sidecar, *conf, httpClient, grpcLogOpts, tagOpts)
return runSidecar(g, logger, reg, tracer, rl, component.Sidecar, *conf, httpClient, grpcLogOpts, logFilterMethods)
})
}
@ -115,7 +116,7 @@ func runSidecar(
conf sidecarConfig,
httpClient *http.Client,
grpcLogOpts []grpc_logging.Option,
tagOpts []tags.Option,
logFilterMethods []string,
) error {
var m = &promMetadata{
@ -134,10 +135,9 @@ func runSidecar(
return errors.Wrap(err, "getting object store config")
}
var uploads = true
if len(confContentYaml) == 0 {
var uploads = len(confContentYaml) != 0
if !uploads {
level.Info(logger).Log("msg", "no supported bucket was configured, uploads will be disabled")
uploads = false
}
grpcProbe := prober.NewGRPC()
@ -148,24 +148,31 @@ func runSidecar(
prober.NewInstrumentation(comp, logger, extprom.WrapRegistererWithPrefix("thanos_", reg)),
)
srv := httpserver.New(logger, reg, comp, httpProbe,
httpserver.WithListen(conf.http.bindAddress),
httpserver.WithGracePeriod(time.Duration(conf.http.gracePeriod)),
httpserver.WithTLSConfig(conf.http.tlsConfig),
)
// Setup the HTTP server.
{
srv := httpserver.New(logger, reg, comp, httpProbe,
httpserver.WithListen(conf.http.bindAddress),
httpserver.WithGracePeriod(time.Duration(conf.http.gracePeriod)),
httpserver.WithTLSConfig(conf.http.tlsConfig),
)
g.Add(func() error {
statusProber.Healthy()
g.Add(func() error {
statusProber.Healthy()
return srv.ListenAndServe()
}, func(err error) {
return srv.ListenAndServe()
}, func(err error) {
statusProber.NotReady(err)
defer statusProber.NotHealthy(err)
statusProber.NotReady(err)
defer statusProber.NotHealthy(err)
srv.Shutdown(err)
})
srv.Shutdown(err)
})
}
// Setup all the concurrent groups.
// Once we have loaded external labels from prometheus we can use this to signal the servers
// that they can start now.
readyToStartGRPC := make(chan struct{})
// Setup Prometheus Heartbeats.
{
promUp := promauto.With(reg).NewGauge(prometheus.GaugeOpts{
Name: "thanos_sidecar_prometheus_up",
@ -177,14 +184,35 @@ func runSidecar(
// Only check Prometheus's flags when upload is enabled.
if uploads {
// Check prometheus's flags to ensure same sidecar flags.
if err := validatePrometheus(ctx, m.client, logger, conf.shipper.ignoreBlockSize, m); err != nil {
return errors.Wrap(err, "validate Prometheus flags")
// We retry infinitely until we have validated the Prometheus flags.
err := runutil.Retry(conf.prometheus.getConfigInterval, ctx.Done(), func() error {
iterCtx, iterCancel := context.WithTimeout(context.Background(), conf.prometheus.getConfigTimeout)
defer iterCancel()
if err := validatePrometheus(iterCtx, m.client, logger, conf.shipper.ignoreBlockSize, m); err != nil {
level.Warn(logger).Log(
"msg", "failed to validate prometheus flags. Is Prometheus running? Retrying",
"err", err,
)
return err
}
level.Info(logger).Log(
"msg", "successfully validated prometheus flags",
)
return nil
})
if err != nil {
return errors.Wrap(err, "failed to validate prometheus flags")
}
}
// We retry infinitely until we reach and fetch BuildVersion from our Prometheus.
err := runutil.Retry(2*time.Second, ctx.Done(), func() error {
if err := m.BuildVersion(ctx); err != nil {
err := runutil.Retry(conf.prometheus.getConfigInterval, ctx.Done(), func() error {
iterCtx, iterCancel := context.WithTimeout(context.Background(), conf.prometheus.getConfigTimeout)
defer iterCancel()
if err := m.BuildVersion(iterCtx); err != nil {
level.Warn(logger).Log(
"msg", "failed to fetch prometheus version. Is Prometheus running? Retrying",
"err", err,
@ -203,14 +231,23 @@ func runSidecar(
// Blocking query of external labels before joining as a Source Peer into gossip.
// We retry infinitely until we reach and fetch labels from our Prometheus.
err = runutil.Retry(2*time.Second, ctx.Done(), func() error {
if err := m.UpdateLabels(ctx); err != nil {
err = runutil.Retry(conf.prometheus.getConfigInterval, ctx.Done(), func() error {
iterCtx, iterCancel := context.WithTimeout(context.Background(), conf.prometheus.getConfigTimeout)
defer iterCancel()
if err := m.UpdateTimestamps(iterCtx); err != nil {
level.Warn(logger).Log(
"msg", "failed to fetch timestamps. Is Prometheus running? Retrying",
"err", err,
)
return err
}
if err := m.UpdateLabels(iterCtx); err != nil {
level.Warn(logger).Log(
"msg", "failed to fetch initial external labels. Is Prometheus running? Retrying",
"err", err,
)
promUp.Set(0)
statusProber.NotReady(err)
return err
}
@ -218,39 +255,48 @@ func runSidecar(
"msg", "successfully loaded prometheus external labels",
"external_labels", m.Labels().String(),
)
promUp.Set(1)
statusProber.Ready()
return nil
})
if err != nil {
return errors.Wrap(err, "initial external labels query")
}
if len(m.Labels()) == 0 {
if m.Labels().Len() == 0 {
return errors.New("no external labels configured on Prometheus server, uniquely identifying external labels must be configured; see https://thanos.io/tip/thanos/storage.md#external-labels for details.")
}
promUp.Set(1)
statusProber.Ready()
close(readyToStartGRPC)
// Periodically query the Prometheus config. We use this as a heartbeat as well as for updating
// the external labels we apply.
return runutil.Repeat(conf.prometheus.getConfigInterval, ctx.Done(), func() error {
iterCtx, iterCancel := context.WithTimeout(context.Background(), conf.prometheus.getConfigTimeout)
defer iterCancel()
if err := m.UpdateLabels(iterCtx); err != nil {
level.Warn(logger).Log("msg", "heartbeat failed", "err", err)
if err := m.UpdateTimestamps(iterCtx); err != nil {
level.Warn(logger).Log("msg", "updating timestamps failed", "err", err)
promUp.Set(0)
statusProber.NotReady(err)
} else {
promUp.Set(1)
statusProber.Ready()
return nil
}
if err := m.UpdateLabels(iterCtx); err != nil {
level.Warn(logger).Log("msg", "updating labels failed", "err", err)
promUp.Set(0)
statusProber.NotReady(err)
return nil
}
promUp.Set(1)
statusProber.Ready()
return nil
})
}, func(error) {
cancel()
})
}
// Setup the Reloader.
{
ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {
@ -259,6 +305,8 @@ func runSidecar(
cancel()
})
}
// Setup the gRPC server.
{
c := promclient.NewWithTracingClient(logger, httpClient, clientconfig.ThanosUserAgent)
@ -268,7 +316,7 @@ func runSidecar(
}
tlsCfg, err := tls.NewServerConfig(log.With(logger, "protocol", "gRPC"),
conf.grpc.tlsSrvCert, conf.grpc.tlsSrvKey, conf.grpc.tlsSrvClientCA)
conf.grpc.tlsSrvCert, conf.grpc.tlsSrvKey, conf.grpc.tlsSrvClientCA, conf.grpc.tlsMinVersion)
if err != nil {
return errors.Wrap(err, "setup gRPC server")
}
@ -280,18 +328,18 @@ func runSidecar(
info.WithLabelSetFunc(func() []labelpb.ZLabelSet {
return promStore.LabelSet()
}),
info.WithStoreInfoFunc(func() *infopb.StoreInfo {
info.WithStoreInfoFunc(func() (*infopb.StoreInfo, error) {
if httpProbe.IsReady() {
mint, maxt := promStore.Timestamps()
mint, maxt := m.Timestamps()
return &infopb.StoreInfo{
MinTime: mint,
MaxTime: maxt,
SupportsSharding: true,
SupportsWithoutReplicaLabels: true,
TsdbInfos: promStore.TSDBInfos(),
}
}, nil
}
return nil
return nil, errors.New("Not ready")
}),
info.WithExemplarsInfoFunc(),
info.WithRulesInfoFunc(),
@ -300,7 +348,7 @@ func runSidecar(
)
storeServer := store.NewLimitedStoreServer(store.NewInstrumentedStoreServer(reg, promStore), reg, conf.storeRateLimits)
s := grpcserver.New(logger, reg, tracer, grpcLogOpts, tagOpts, comp, grpcProbe,
s := grpcserver.New(logger, reg, tracer, grpcLogOpts, logFilterMethods, comp, grpcProbe,
grpcserver.WithServer(store.RegisterStoreServer(storeServer, logger)),
grpcserver.WithServer(rules.RegisterRulesServer(rules.NewPrometheus(conf.prometheus.url, c, m.Labels))),
grpcserver.WithServer(targets.RegisterTargetsServer(targets.NewPrometheus(conf.prometheus.url, c, m.Labels))),
@ -312,19 +360,27 @@ func runSidecar(
grpcserver.WithMaxConnAge(conf.grpc.maxConnectionAge),
grpcserver.WithTLSConfig(tlsCfg),
)
ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {
select {
case <-ctx.Done():
return ctx.Err()
case <-readyToStartGRPC:
}
statusProber.Ready()
return s.ListenAndServe()
}, func(err error) {
cancel()
statusProber.NotReady(err)
s.Shutdown(err)
})
}
if uploads {
// The background shipper continuously scans the data directory and uploads
// new blocks to Google Cloud Storage or an S3-compatible storage service.
bkt, err := client.NewBucket(logger, confContentYaml, component.Sidecar.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Sidecar.String(), nil)
if err != nil {
return err
}
@ -350,7 +406,7 @@ func runSidecar(
defer cancel()
if err := runutil.Retry(2*time.Second, extLabelsCtx.Done(), func() error {
if len(m.Labels()) == 0 {
if m.Labels().Len() == 0 {
return errors.New("not uploading as no external labels are configured yet - is Prometheus healthy/reachable?")
}
return nil
@ -358,21 +414,24 @@ func runSidecar(
return errors.Wrapf(err, "aborting as no external labels found after waiting %s", promReadyTimeout)
}
uploadCompactedFunc := func() bool { return conf.shipper.uploadCompacted }
s := shipper.New(logger, reg, conf.tsdb.path, bkt, m.Labels, metadata.SidecarSource,
uploadCompactedFunc, conf.shipper.allowOutOfOrderUpload, metadata.HashFunc(conf.shipper.hashFunc), conf.shipper.metaFileName)
s := shipper.New(
bkt,
conf.tsdb.path,
shipper.WithLogger(logger),
shipper.WithRegisterer(reg),
shipper.WithSource(metadata.SidecarSource),
shipper.WithHashFunc(metadata.HashFunc(conf.shipper.hashFunc)),
shipper.WithMetaFileName(conf.shipper.metaFileName),
shipper.WithLabels(m.Labels),
shipper.WithUploadCompacted(conf.shipper.uploadCompacted),
shipper.WithAllowOutOfOrderUploads(conf.shipper.allowOutOfOrderUpload),
shipper.WithSkipCorruptedBlocks(conf.shipper.skipCorruptedBlocks),
)
return runutil.Repeat(30*time.Second, ctx.Done(), func() error {
if uploaded, err := s.Sync(ctx); err != nil {
level.Warn(logger).Log("err", err, "uploaded", uploaded)
}
minTime, _, err := s.Timestamps()
if err != nil {
level.Warn(logger).Log("msg", "reading timestamps failed", "err", err)
return nil
}
m.UpdateTimestamps(minTime, math.MaxInt64)
return nil
})
}, func(error) {
@ -447,16 +506,19 @@ func (s *promMetadata) UpdateLabels(ctx context.Context) error {
return nil
}
func (s *promMetadata) UpdateTimestamps(mint, maxt int64) {
func (s *promMetadata) UpdateTimestamps(ctx context.Context) error {
s.mtx.Lock()
defer s.mtx.Unlock()
if mint < s.limitMinTime.PrometheusTimestamp() {
mint = s.limitMinTime.PrometheusTimestamp()
mint, err := s.client.LowestTimestamp(ctx, s.promURL)
if err != nil {
return err
}
s.mint = mint
s.maxt = maxt
s.mint = max(s.limitMinTime.PrometheusTimestamp(), mint)
s.maxt = math.MaxInt64
return nil
}
func (s *promMetadata) Labels() labels.Labels {

View File

@ -7,6 +7,7 @@ import (
"context"
"fmt"
"strconv"
"strings"
"time"
"github.com/alecthomas/units"
@ -14,13 +15,13 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
grpclogging "github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/logging"
"github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tags"
"github.com/oklog/run"
"github.com/opentracing/opentracing-go"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
commonmodel "github.com/prometheus/common/model"
"github.com/prometheus/common/route"
"gopkg.in/yaml.v2"
"github.com/thanos-io/objstore"
"github.com/thanos-io/objstore/client"
@ -32,6 +33,7 @@ import (
"github.com/thanos-io/thanos/pkg/block/metadata"
"github.com/thanos-io/thanos/pkg/component"
hidden "github.com/thanos-io/thanos/pkg/extflag"
"github.com/thanos-io/thanos/pkg/exthttp"
"github.com/thanos-io/thanos/pkg/extkingpin"
"github.com/thanos-io/thanos/pkg/extprom"
extpromhttp "github.com/thanos-io/thanos/pkg/extprom/http"
@ -44,6 +46,7 @@ import (
"github.com/thanos-io/thanos/pkg/runutil"
grpcserver "github.com/thanos-io/thanos/pkg/server/grpc"
httpserver "github.com/thanos-io/thanos/pkg/server/http"
"github.com/thanos-io/thanos/pkg/server/http/middleware"
"github.com/thanos-io/thanos/pkg/store"
storecache "github.com/thanos-io/thanos/pkg/store/cache"
"github.com/thanos-io/thanos/pkg/store/labelpb"
@ -56,42 +59,54 @@ const (
retryIntervalDuration = 10
)
type syncStrategy string
const (
concurrentDiscovery syncStrategy = "concurrent"
recursiveDiscovery syncStrategy = "recursive"
)
type storeConfig struct {
indexCacheConfigs extflag.PathOrContent
objStoreConfig extflag.PathOrContent
dataDir string
cacheIndexHeader bool
grpcConfig grpcConfig
httpConfig httpConfig
indexCacheSizeBytes units.Base2Bytes
chunkPoolSize units.Base2Bytes
estimatedMaxSeriesSize uint64
estimatedMaxChunkSize uint64
seriesBatchSize int
storeRateLimits store.SeriesSelectLimits
maxDownloadedBytes units.Base2Bytes
maxConcurrency int
component component.StoreAPI
debugLogging bool
syncInterval time.Duration
blockSyncConcurrency int
blockMetaFetchConcurrency int
filterConf *store.FilterConfig
selectorRelabelConf extflag.PathOrContent
advertiseCompatibilityLabel bool
consistencyDelay commonmodel.Duration
ignoreDeletionMarksDelay commonmodel.Duration
disableWeb bool
webConfig webConfig
label string
postingOffsetsInMemSampling int
cachingBucketConfig extflag.PathOrContent
reqLogConfig *extflag.PathOrContent
lazyIndexReaderEnabled bool
lazyIndexReaderIdleTimeout time.Duration
lazyExpandedPostingsEnabled bool
indexCacheConfigs extflag.PathOrContent
objStoreConfig extflag.PathOrContent
dataDir string
cacheIndexHeader bool
grpcConfig grpcConfig
httpConfig httpConfig
indexCacheSizeBytes units.Base2Bytes
chunkPoolSize units.Base2Bytes
estimatedMaxSeriesSize uint64
estimatedMaxChunkSize uint64
seriesBatchSize int
storeRateLimits store.SeriesSelectLimits
maxDownloadedBytes units.Base2Bytes
maxConcurrency int
component component.StoreAPI
debugLogging bool
syncInterval time.Duration
blockListStrategy string
blockSyncConcurrency int
blockMetaFetchConcurrency int
filterConf *store.FilterConfig
selectorRelabelConf extflag.PathOrContent
advertiseCompatibilityLabel bool
consistencyDelay commonmodel.Duration
ignoreDeletionMarksDelay commonmodel.Duration
disableWeb bool
webConfig webConfig
label string
postingOffsetsInMemSampling int
cachingBucketConfig extflag.PathOrContent
reqLogConfig *extflag.PathOrContent
lazyIndexReaderEnabled bool
lazyIndexReaderIdleTimeout time.Duration
lazyExpandedPostingsEnabled bool
postingGroupMaxKeySeriesRatio float64
indexHeaderLazyDownloadStrategy string
matcherCacheSize int
disableAdminOperations bool
}
func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
@ -137,6 +152,10 @@ func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("sync-block-duration", "Repeat interval for syncing the blocks between local and remote view.").
Default("15m").DurationVar(&sc.syncInterval)
strategies := strings.Join([]string{string(concurrentDiscovery), string(recursiveDiscovery)}, ", ")
cmd.Flag("block-discovery-strategy", "One of "+strategies+". When set to concurrent, stores will concurrently issue one call per directory to discover active blocks in the bucket. The recursive strategy iterates through all objects in the bucket, recursively traversing into each directory. This avoids N+1 calls at the expense of having slower bucket iterations.").
Default(string(concurrentDiscovery)).StringVar(&sc.blockListStrategy)
cmd.Flag("block-sync-concurrency", "Number of goroutines to use when constructing index-cache.json blocks from object storage. Must be equal or greater than 1.").
Default("20").IntVar(&sc.blockSyncConcurrency)
@ -189,6 +208,9 @@ func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("store.enable-lazy-expanded-postings", "If true, Store Gateway will estimate postings size and try to lazily expand postings if it downloads less data than expanding all postings.").
Default("false").BoolVar(&sc.lazyExpandedPostingsEnabled)
cmd.Flag("store.posting-group-max-key-series-ratio", "Mark posting group as lazy if it fetches more keys than R * max series the query should fetch. With R set to 100, a posting group which fetches 100K keys will be marked as lazy if the current query only fetches 1000 series. thanos_bucket_store_lazy_expanded_posting_groups_total shows lazy expanded postings groups with reasons and you can tune this config accordingly. This config is only valid if lazy expanded posting is enabled. 0 disables the limit.").
Default("100").Float64Var(&sc.postingGroupMaxKeySeriesRatio)
cmd.Flag("store.index-header-lazy-download-strategy", "Strategy of how to download index headers lazily. Supported values: eager, lazy. If eager, always download index header during initial load. If lazy, download index header during query time.").
Default(string(indexheader.EagerDownloadStrategy)).
EnumVar(&sc.indexHeaderLazyDownloadStrategy, string(indexheader.EagerDownloadStrategy), string(indexheader.LazyDownloadStrategy))
@ -206,6 +228,10 @@ func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("bucket-web-label", "External block label to use as group title in the bucket web UI").StringVar(&sc.label)
cmd.Flag("matcher-cache-size", "Max number of cached matchers items. Using 0 disables caching.").Default("0").IntVar(&sc.matcherCacheSize)
cmd.Flag("disable-admin-operations", "Disable UI/API admin operations like marking blocks for deletion and no compaction.").Default("false").BoolVar(&sc.disableAdminOperations)
sc.reqLogConfig = extkingpin.RegisterRequestLoggingFlags(cmd)
}
@ -227,7 +253,8 @@ func registerStore(app *extkingpin.App) {
return errors.Wrap(err, "error while parsing config for request logging")
}
tagOpts, grpcLogOpts, err := logging.ParsegRPCOptions(conf.reqLogConfig)
grpcLogOpts, logFilterMethods, err := logging.ParsegRPCOptions(conf.reqLogConfig)
if err != nil {
return errors.Wrap(err, "error while parsing config for request logging")
}
@ -240,7 +267,7 @@ func registerStore(app *extkingpin.App) {
tracer,
httpLogOpts,
grpcLogOpts,
tagOpts,
logFilterMethods,
*conf,
getFlagsMap(cmd.Flags()),
)
@ -255,7 +282,7 @@ func runStore(
tracer opentracing.Tracer,
httpLogOpts []logging.Option,
grpcLogOpts []grpclogging.Option,
tagOpts []tags.Option,
logFilterMethods []string,
conf storeConfig,
flagsMap map[string]string,
) error {
@ -294,8 +321,11 @@ func runStore(
if err != nil {
return err
}
bkt, err := client.NewBucket(logger, confContentYaml, conf.component.String())
customBktConfig := exthttp.DefaultCustomBucketConfig()
if err := yaml.Unmarshal(confContentYaml, &customBktConfig); err != nil {
return errors.Wrap(err, "parsing config YAML file")
}
bkt, err := client.NewBucket(logger, confContentYaml, conf.component.String(), exthttp.CreateHedgedTransportWithConfig(customBktConfig))
if err != nil {
return err
}
@ -309,7 +339,7 @@ func runStore(
r := route.New()
if len(cachingBucketConfigYaml) > 0 {
insBkt, err = storecache.NewCachingBucketFromYaml(cachingBucketConfigYaml, insBkt, logger, reg, r)
insBkt, err = storecache.NewCachingBucketFromYaml(cachingBucketConfigYaml, insBkt, logger, reg, r, conf.cachingBucketConfig.Path())
if err != nil {
return errors.Wrap(err, "create caching bucket")
}
@ -345,16 +375,34 @@ func runStore(
return errors.Wrap(err, "create index cache")
}
var matchersCache = storecache.NoopMatchersCache
if conf.matcherCacheSize > 0 {
matchersCache, err = storecache.NewMatchersCache(storecache.WithSize(conf.matcherCacheSize), storecache.WithPromRegistry(reg))
if err != nil {
return errors.Wrap(err, "failed to create matchers cache")
}
}
var blockLister block.Lister
switch syncStrategy(conf.blockListStrategy) {
case concurrentDiscovery:
blockLister = block.NewConcurrentLister(logger, insBkt)
case recursiveDiscovery:
blockLister = block.NewRecursiveLister(logger, insBkt)
default:
return errors.Errorf("unknown sync strategy %s", conf.blockListStrategy)
}
ignoreDeletionMarkFilter := block.NewIgnoreDeletionMarkFilter(logger, insBkt, time.Duration(conf.ignoreDeletionMarksDelay), conf.blockMetaFetchConcurrency)
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, insBkt)
metaFetcher, err := block.NewMetaFetcher(logger, conf.blockMetaFetchConcurrency, insBkt, baseBlockIDsFetcher, dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg),
[]block.MetadataFilter{
block.NewTimePartitionMetaFilter(conf.filterConf.MinTime, conf.filterConf.MaxTime),
block.NewLabelShardedMetaFilter(relabelConfig),
block.NewConsistencyDelayMetaFilter(logger, time.Duration(conf.consistencyDelay), extprom.WrapRegistererWithPrefix("thanos_", reg)),
ignoreDeletionMarkFilter,
block.NewDeduplicateFilter(conf.blockMetaFetchConcurrency),
})
filters := []block.MetadataFilter{
block.NewTimePartitionMetaFilter(conf.filterConf.MinTime, conf.filterConf.MaxTime),
block.NewLabelShardedMetaFilter(relabelConfig),
block.NewConsistencyDelayMetaFilter(logger, time.Duration(conf.consistencyDelay), extprom.WrapRegistererWithPrefix("thanos_", reg)),
ignoreDeletionMarkFilter,
block.NewDeduplicateFilter(conf.blockMetaFetchConcurrency),
block.NewParquetMigratedMetaFilter(logger),
}
metaFetcher, err := block.NewMetaFetcher(logger, conf.blockMetaFetchConcurrency, insBkt, blockLister, dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg), filters)
if err != nil {
return errors.Wrap(err, "meta fetcher")
}
@ -373,8 +421,16 @@ func runStore(
options := []store.BucketStoreOption{
store.WithLogger(logger),
store.WithRequestLoggerFunc(func(ctx context.Context, logger log.Logger) log.Logger {
reqID, ok := middleware.RequestIDFromContext(ctx)
if ok {
return log.With(logger, "request-id", reqID)
}
return logger
}),
store.WithRegistry(reg),
store.WithIndexCache(indexCache),
store.WithMatchersCache(matchersCache),
store.WithQueryGate(queriesGate),
store.WithChunkPool(chunkPool),
store.WithFilterConfig(conf.filterConf),
@ -395,6 +451,8 @@ func runStore(
return conf.estimatedMaxChunkSize
}),
store.WithLazyExpandedPostings(conf.lazyExpandedPostingsEnabled),
store.WithPostingGroupMaxKeySeriesRatio(conf.postingGroupMaxKeySeriesRatio),
store.WithSeriesMatchRatio(0.5), // TODO: expose series match ratio as config.
store.WithIndexHeaderLazyDownloadStrategy(
indexheader.IndexHeaderLazyDownloadStrategy(conf.indexHeaderLazyDownloadStrategy).StrategyToDownloadFunc(),
),
@ -470,7 +528,7 @@ func runStore(
info.WithLabelSetFunc(func() []labelpb.ZLabelSet {
return bs.LabelSet()
}),
info.WithStoreInfoFunc(func() *infopb.StoreInfo {
info.WithStoreInfoFunc(func() (*infopb.StoreInfo, error) {
if httpProbe.IsReady() {
mint, maxt := bs.TimeRange()
return &infopb.StoreInfo{
@ -479,21 +537,21 @@ func runStore(
SupportsSharding: true,
SupportsWithoutReplicaLabels: true,
TsdbInfos: bs.TSDBInfos(),
}
}, nil
}
return nil
return nil, errors.New("Not ready")
}),
)
// Start query (proxy) gRPC StoreAPI.
{
tlsCfg, err := tls.NewServerConfig(log.With(logger, "protocol", "gRPC"), conf.grpcConfig.tlsSrvCert, conf.grpcConfig.tlsSrvKey, conf.grpcConfig.tlsSrvClientCA)
tlsCfg, err := tls.NewServerConfig(log.With(logger, "protocol", "gRPC"), conf.grpcConfig.tlsSrvCert, conf.grpcConfig.tlsSrvKey, conf.grpcConfig.tlsSrvClientCA, conf.grpcConfig.tlsMinVersion)
if err != nil {
return errors.Wrap(err, "setup gRPC server")
}
storeServer := store.NewInstrumentedStoreServer(reg, bs)
s := grpcserver.New(logger, reg, tracer, grpcLogOpts, tagOpts, conf.component, grpcProbe,
s := grpcserver.New(logger, reg, tracer, grpcLogOpts, logFilterMethods, conf.component, grpcProbe,
grpcserver.WithServer(store.RegisterStoreServer(storeServer, logger)),
grpcserver.WithServer(info.RegisterInfoServer(infoSrv)),
grpcserver.WithListen(conf.grpcConfig.bindAddress),

View File

@ -0,0 +1,18 @@
groups:
- name: test-alert-group
partial_response_strategy: "warn"
interval: 2m
rules:
- alert: TestAlert
expr: 1
labels:
key: value
annotations:
key: value
- name: test-rule-group
partial_response_strategy: "warn"
interval: 2m
rules:
- record: test_metric
expr: 1

View File

@ -55,7 +55,6 @@ func checkRulesFiles(logger log.Logger, patterns *[]string) error {
if err != nil || matches == nil {
err = errors.New("matching file not found")
level.Error(logger).Log("result", "FAILED", "error", err)
level.Info(logger).Log()
failed.Add(err)
continue
}
@ -64,8 +63,7 @@ func checkRulesFiles(logger log.Logger, patterns *[]string) error {
f, er := os.Open(fn)
if er != nil {
level.Error(logger).Log("result", "FAILED", "error", er)
level.Info(logger).Log()
failed.Add(err)
failed.Add(er)
continue
}
defer func() { _ = f.Close() }()
@ -77,7 +75,6 @@ func checkRulesFiles(logger log.Logger, patterns *[]string) error {
level.Error(logger).Log("error", e.Error())
failed.Add(e)
}
level.Info(logger).Log()
continue
}
level.Info(logger).Log("result", "SUCCESS", "rules found", n)

View File

@ -23,7 +23,8 @@ import (
"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/oklog/run"
"github.com/oklog/ulid"
"github.com/oklog/ulid/v2"
"github.com/olekukonko/tablewriter"
"github.com/opentracing/opentracing-go"
"github.com/pkg/errors"
@ -54,6 +55,7 @@ import (
"github.com/thanos-io/thanos/pkg/extprom"
extpromhttp "github.com/thanos-io/thanos/pkg/extprom/http"
"github.com/thanos-io/thanos/pkg/logging"
"github.com/thanos-io/thanos/pkg/logutil"
"github.com/thanos-io/thanos/pkg/model"
"github.com/thanos-io/thanos/pkg/prober"
"github.com/thanos-io/thanos/pkg/promclient"
@ -109,8 +111,11 @@ type bucketVerifyConfig struct {
}
type bucketLsConfig struct {
output string
excludeDelete bool
output string
excludeDelete bool
selectorRelabelConf extflag.PathOrContent
filterConf *store.FilterConfig
timeout time.Duration
}
type bucketWebConfig struct {
@ -161,8 +166,9 @@ type bucketMarkBlockConfig struct {
}
type bucketUploadBlocksConfig struct {
path string
labels []string
path string
labels []string
uploadCompacted bool
}
func (tbc *bucketVerifyConfig) registerBucketVerifyFlag(cmd extkingpin.FlagClause) *bucketVerifyConfig {
@ -180,10 +186,18 @@ func (tbc *bucketVerifyConfig) registerBucketVerifyFlag(cmd extkingpin.FlagClaus
}
func (tbc *bucketLsConfig) registerBucketLsFlag(cmd extkingpin.FlagClause) *bucketLsConfig {
tbc.selectorRelabelConf = *extkingpin.RegisterSelectorRelabelFlags(cmd)
tbc.filterConf = &store.FilterConfig{}
cmd.Flag("output", "Optional format in which to print each block's information. Options are 'json', 'wide' or a custom template.").
Short('o').Default("").StringVar(&tbc.output)
cmd.Flag("exclude-delete", "Exclude blocks marked for deletion.").
Default("false").BoolVar(&tbc.excludeDelete)
cmd.Flag("min-time", "Start of time range limit to list blocks. Thanos Tools will list blocks, which were created later than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y.").
Default("0000-01-01T00:00:00Z").SetValue(&tbc.filterConf.MinTime)
cmd.Flag("max-time", "End of time range limit to list. Thanos Tools will list only blocks, which were created earlier than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y.").
Default("9999-12-31T23:59:59Z").SetValue(&tbc.filterConf.MaxTime)
cmd.Flag("timeout", "Timeout to download metadata from remote storage").Default("5m").DurationVar(&tbc.timeout)
return tbc
}
@ -287,6 +301,7 @@ func (tbc *bucketRetentionConfig) registerBucketRetentionFlag(cmd extkingpin.Fla
func (tbc *bucketUploadBlocksConfig) registerBucketUploadBlocksFlag(cmd extkingpin.FlagClause) *bucketUploadBlocksConfig {
cmd.Flag("path", "Path to the directory containing blocks to upload.").Default("./data").StringVar(&tbc.path)
cmd.Flag("label", "External labels to add to the uploaded blocks (repeated).").PlaceHolder("key=\"value\"").StringsVar(&tbc.labels)
cmd.Flag("shipper.upload-compacted", "If true shipper will try to upload compacted blocks as well.").Default("false").BoolVar(&tbc.uploadCompacted)
return tbc
}
@ -327,7 +342,7 @@ func registerBucketVerify(app extkingpin.AppClause, objStoreConfig *extflag.Path
return err
}
bkt, err := client.NewBucket(logger, confContentYaml, component.Bucket.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Bucket.String(), nil)
if err != nil {
return err
}
@ -346,7 +361,7 @@ func registerBucketVerify(app extkingpin.AppClause, objStoreConfig *extflag.Path
}
} else {
// nil Prometheus registerer: don't create conflicting metrics.
backupBkt, err = client.NewBucket(logger, backupconfContentYaml, component.Bucket.String())
backupBkt, err = client.NewBucket(logger, backupconfContentYaml, component.Bucket.String(), nil)
if err != nil {
return err
}
@ -365,7 +380,7 @@ func registerBucketVerify(app extkingpin.AppClause, objStoreConfig *extflag.Path
// We ignore any block that has the deletion marker file.
filters := []block.MetadataFilter{block.NewIgnoreDeletionMarkFilter(logger, insBkt, 0, block.FetcherConcurrency)}
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, insBkt)
baseBlockIDsFetcher := block.NewConcurrentLister(logger, insBkt)
fetcher, err := block.NewMetaFetcher(logger, block.FetcherConcurrency, insBkt, baseBlockIDsFetcher, "", extprom.WrapRegistererWithPrefix(extpromPrefix, reg), filters)
if err != nil {
return err
@ -411,19 +426,37 @@ func registerBucketLs(app extkingpin.AppClause, objStoreConfig *extflag.PathOrCo
return err
}
bkt, err := client.NewBucket(logger, confContentYaml, component.Bucket.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Bucket.String(), nil)
if err != nil {
return err
}
insBkt := objstoretracing.WrapWithTraces(objstore.WrapWithMetrics(bkt, extprom.WrapRegistererWithPrefix("thanos_", reg), bkt.Name()))
var filters []block.MetadataFilter
if tbc.timeout < time.Minute {
level.Warn(logger).Log("msg", "Timeout less than 1m could lead to frequent failures")
}
relabelContentYaml, err := tbc.selectorRelabelConf.Content()
if err != nil {
return errors.Wrap(err, "get content of relabel configuration")
}
relabelConfig, err := block.ParseRelabelConfig(relabelContentYaml, block.SelectorSupportedRelabelActions)
if err != nil {
return err
}
filters := []block.MetadataFilter{
block.NewLabelShardedMetaFilter(relabelConfig),
block.NewTimePartitionMetaFilter(tbc.filterConf.MinTime, tbc.filterConf.MaxTime),
}
if tbc.excludeDelete {
ignoreDeletionMarkFilter := block.NewIgnoreDeletionMarkFilter(logger, insBkt, 0, block.FetcherConcurrency)
filters = append(filters, ignoreDeletionMarkFilter)
}
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, insBkt)
baseBlockIDsFetcher := block.NewConcurrentLister(logger, insBkt)
fetcher, err := block.NewMetaFetcher(logger, block.FetcherConcurrency, insBkt, baseBlockIDsFetcher, "", extprom.WrapRegistererWithPrefix(extpromPrefix, reg), filters)
if err != nil {
return err
@ -434,7 +467,7 @@ func registerBucketLs(app extkingpin.AppClause, objStoreConfig *extflag.PathOrCo
defer runutil.CloseWithLogOnErr(logger, insBkt, "bucket client")
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
ctx, cancel := context.WithTimeout(context.Background(), tbc.timeout)
defer cancel()
var (
@ -504,7 +537,7 @@ func registerBucketInspect(app extkingpin.AppClause, objStoreConfig *extflag.Pat
tbc := &bucketInspectConfig{}
tbc.registerBucketInspectFlag(cmd)
output := cmd.Flag("output", "Output format for result. Currently supports table, cvs, tsv.").Default("table").Enum(outputTypes...)
output := cmd.Flag("output", "Output format for result. Currently supports table, csv, tsv.").Default("table").Enum(outputTypes...)
cmd.Setup(func(g *run.Group, logger log.Logger, reg *prometheus.Registry, _ opentracing.Tracer, _ <-chan struct{}, _ bool) error {
@ -519,13 +552,13 @@ func registerBucketInspect(app extkingpin.AppClause, objStoreConfig *extflag.Pat
return err
}
bkt, err := client.NewBucket(logger, confContentYaml, component.Bucket.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Bucket.String(), nil)
if err != nil {
return err
}
insBkt := objstoretracing.WrapWithTraces(objstore.WrapWithMetrics(bkt, extprom.WrapRegistererWithPrefix("thanos_", reg), bkt.Name()))
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, insBkt)
baseBlockIDsFetcher := block.NewConcurrentLister(logger, insBkt)
fetcher, err := block.NewMetaFetcher(logger, block.FetcherConcurrency, insBkt, baseBlockIDsFetcher, "", extprom.WrapRegistererWithPrefix(extpromPrefix, reg), nil)
if err != nil {
return err
@ -629,7 +662,7 @@ func registerBucketWeb(app extkingpin.AppClause, objStoreConfig *extflag.PathOrC
return err
}
bkt, err := client.NewBucket(logger, confContentYaml, component.Bucket.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Bucket.String(), nil)
if err != nil {
return errors.Wrap(err, "bucket client")
}
@ -669,7 +702,7 @@ func registerBucketWeb(app extkingpin.AppClause, objStoreConfig *extflag.PathOrC
return err
}
// TODO(bwplotka): Allow Bucket UI to visualize the state of block as well.
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, insBkt)
baseBlockIDsFetcher := block.NewConcurrentLister(logger, insBkt)
fetcher, err := block.NewMetaFetcher(logger, block.FetcherConcurrency, insBkt, baseBlockIDsFetcher, "", extprom.WrapRegistererWithPrefix(extpromPrefix, reg),
[]block.MetadataFilter{
block.NewTimePartitionMetaFilter(filterConf.MinTime, filterConf.MaxTime),
@ -826,7 +859,7 @@ func registerBucketCleanup(app extkingpin.AppClause, objStoreConfig *extflag.Pat
return err
}
bkt, err := client.NewBucket(logger, confContentYaml, component.Cleanup.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Cleanup.String(), nil)
if err != nil {
return err
}
@ -848,7 +881,7 @@ func registerBucketCleanup(app extkingpin.AppClause, objStoreConfig *extflag.Pat
var sy *compact.Syncer
{
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, insBkt)
baseBlockIDsFetcher := block.NewConcurrentLister(logger, insBkt)
baseMetaFetcher, err := block.NewBaseFetcher(logger, tbc.blockSyncConcurrency, insBkt, baseBlockIDsFetcher, "", extprom.WrapRegistererWithPrefix(extpromPrefix, reg))
if err != nil {
return errors.Wrap(err, "create meta fetcher")
@ -870,6 +903,7 @@ func registerBucketCleanup(app extkingpin.AppClause, objStoreConfig *extflag.Pat
ignoreDeletionMarkFilter,
stubCounter,
stubCounter,
0,
)
if err != nil {
return errors.Wrap(err, "create syncer")
@ -883,8 +917,8 @@ func registerBucketCleanup(app extkingpin.AppClause, objStoreConfig *extflag.Pat
level.Info(logger).Log("msg", "synced blocks done")
compact.BestEffortCleanAbortedPartialUploads(ctx, logger, sy.Partial(), insBkt, stubCounter, stubCounter, stubCounter)
if err := blocksCleaner.DeleteMarkedBlocks(ctx); err != nil {
compact.BestEffortCleanAbortedPartialUploads(ctx, logger, sy.Partial(), insBkt, stubCounter, stubCounter, stubCounter, ignoreDeletionMarkFilter.DeletionMarkBlocks())
if _, err := blocksCleaner.DeleteMarkedBlocks(ctx); err != nil {
return errors.Wrap(err, "error cleaning blocks")
}
@ -1013,12 +1047,12 @@ func getKeysAlphabetically(labels map[string]string) []string {
// matchesSelector checks if blockMeta contains every label from
// the selector with the correct value.
func matchesSelector(blockMeta *metadata.Meta, selectorLabels labels.Labels) bool {
for _, l := range selectorLabels {
if v, ok := blockMeta.Thanos.Labels[l.Name]; !ok || v != l.Value {
return false
}
}
return true
matches := true
selectorLabels.Range(func(l labels.Label) {
val, ok := blockMeta.Thanos.Labels[l.Name]
matches = matches && ok && val == l.Value
})
return matches
}
// getIndex calculates the index of s in strs.
@ -1083,7 +1117,7 @@ func registerBucketMarkBlock(app extkingpin.AppClause, objStoreConfig *extflag.P
return err
}
bkt, err := client.NewBucket(logger, confContentYaml, component.Mark.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Mark.String(), nil)
if err != nil {
return err
}
@ -1163,7 +1197,7 @@ func registerBucketRewrite(app extkingpin.AppClause, objStoreConfig *extflag.Pat
return err
}
bkt, err := client.NewBucket(logger, confContentYaml, component.Rewrite.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Rewrite.String(), nil)
if err != nil {
return err
}
@ -1232,7 +1266,7 @@ func registerBucketRewrite(app extkingpin.AppClause, objStoreConfig *extflag.Pat
if err != nil {
return errors.Wrapf(err, "read meta of %v", id)
}
b, err := tsdb.OpenBlock(logger, filepath.Join(tbc.tmpDir, id.String()), chunkPool)
b, err := tsdb.OpenBlock(logutil.GoKitLogToSlog(logger), filepath.Join(tbc.tmpDir, id.String()), chunkPool, nil)
if err != nil {
return errors.Wrapf(err, "open block %v", id)
}
@ -1371,7 +1405,7 @@ func registerBucketRetention(app extkingpin.AppClause, objStoreConfig *extflag.P
return err
}
bkt, err := client.NewBucket(logger, confContentYaml, component.Retention.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Retention.String(), nil)
if err != nil {
return err
}
@ -1391,7 +1425,7 @@ func registerBucketRetention(app extkingpin.AppClause, objStoreConfig *extflag.P
var sy *compact.Syncer
{
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, insBkt)
baseBlockIDsFetcher := block.NewConcurrentLister(logger, insBkt)
baseMetaFetcher, err := block.NewBaseFetcher(logger, tbc.blockSyncConcurrency, insBkt, baseBlockIDsFetcher, "", extprom.WrapRegistererWithPrefix(extpromPrefix, reg))
if err != nil {
return errors.Wrap(err, "create meta fetcher")
@ -1413,6 +1447,7 @@ func registerBucketRetention(app extkingpin.AppClause, objStoreConfig *extflag.P
ignoreDeletionMarkFilter,
stubCounter,
stubCounter,
0,
)
if err != nil {
return errors.Wrap(err, "create syncer")
@ -1460,7 +1495,7 @@ func registerBucketUploadBlocks(app extkingpin.AppClause, objStoreConfig *extfla
return errors.Wrap(err, "unable to parse objstore config")
}
bkt, err := client.NewBucket(logger, confContentYaml, component.Upload.String())
bkt, err := client.NewBucket(logger, confContentYaml, component.Upload.String(), nil)
if err != nil {
return errors.Wrap(err, "unable to create bucket")
}
@ -1468,8 +1503,16 @@ func registerBucketUploadBlocks(app extkingpin.AppClause, objStoreConfig *extfla
bkt = objstoretracing.WrapWithTraces(objstore.WrapWithMetrics(bkt, extprom.WrapRegistererWithPrefix("thanos_", reg), bkt.Name()))
s := shipper.New(logger, reg, tbc.path, bkt, func() labels.Labels { return lset }, metadata.BucketUploadSource,
nil, false, metadata.HashFunc(""), shipper.DefaultMetaFilename)
s := shipper.New(
bkt,
tbc.path,
shipper.WithLogger(logger),
shipper.WithRegisterer(reg),
shipper.WithSource(metadata.BucketUploadSource),
shipper.WithMetaFileName(shipper.DefaultMetaFilename),
shipper.WithLabels(func() labels.Labels { return lset }),
shipper.WithUploadCompacted(tbc.uploadCompacted),
)
ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {

View File

@ -4,6 +4,8 @@
package main
import (
"os"
"path"
"testing"
"github.com/go-kit/log"
@ -44,4 +46,14 @@ func Test_CheckRules_Glob(t *testing.T) {
// invalid path
files = &[]string{"./testdata/rules-files/*.yamlaaa"}
testutil.NotOk(t, checkRulesFiles(logger, files), "expected err for file %s", files)
// Unreadable path
// Move the initial file to a temp dir and make it unreadable there, in case the process cannot chmod the file in the current dir.
filename := "./testdata/rules-files/unreadable_valid.yaml"
bytesRead, err := os.ReadFile(filename)
testutil.Ok(t, err)
filename = path.Join(t.TempDir(), "file.yaml")
testutil.Ok(t, os.WriteFile(filename, bytesRead, 0000))
files = &[]string{filename}
testutil.NotOk(t, checkRulesFiles(logger, files), "expected err for file %s", files)
}

View File

@ -0,0 +1,210 @@
---
title: "Life of a Sample in Thanos, and How to Configure It: Ingestion (Part I)"
date: "2024-09-16"
author: Thibault Mangé (https://github.com/thibaultmg)
---
## Life of a Sample in Thanos, and How to Configure It: Ingestion (Part I)
### Introduction
Thanos is a sophisticated distributed system with a broad range of capabilities, and with that comes a certain level of configuration complexity. In this series of articles, we will take a deep dive into the lifecycle of a sample within Thanos, tracking its journey from initial ingestion to final retrieval. Our focus will be to explain Thanos's critical internal mechanisms and highlight the essential configurations for each component, guiding you toward achieving your desired operational results. We will be covering the following Thanos components:
* **Receive**: Ingests samples from remote Prometheus instances and uploads blocks to object storage.
* **Sidecar**: Attaches to Prometheus pods as a sidecar container, ingests its data and uploads blocks to object storage.
* **Compactor**: Merges and deduplicates blocks in object storage.
* **Store**: Exposes blocks in object storage for querying.
* **Query**: Retrieves data from stores and processes queries.
* **Query Frontend**: Distributes incoming queries to Querier instances.
The objective of this series of articles is to make Thanos more accessible to new users, helping alleviate any initial apprehensions. We will also assume that the working environment is Kubernetes. Given the extensive ground to cover, our goal is to remain concise throughout this exploration.
Before diving deeper, please check the [annexes](#annexes) to clarify some essential terminology. If you are already familiar with these concepts, feel free to skip ahead.
### The Sample Origin: Do You Have Close Integration Capabilities?
The sample usually originates from a Prometheus instance that is scraping targets in a cluster. There are two possible scenarios:
* The **Prometheus instances are under your control and you can access them from your Thanos deployment**. In this case, you can use the Thanos sidecar, which you attach to the pod running the Prometheus server. The Thanos sidecar reads the raw samples directly from the Prometheus server using the [remote read API](https://prometheus.io/docs/prometheus/latest/querying/remote_read_api/). From there, the sidecar behaves much like a **Receiver**, exposing its local data via the Store API, minus the routing and ingestion parts. Thus, we will not delve further into this use case.
* The **Prometheus servers are running in clusters that you do not control**. In this case, you cannot attach a sidecar to the Prometheus server and you cannot fetch its data. The samples will travel to your Thanos system using the remote write protocol. This is the scenario we will focus on.
Also, bear in mind that if adding Thanos for collecting your clusters' metrics removes the need for a full-fledged local Prometheus (with querying and alerting), you can save some resources by using the [Prometheus Agent mode](https://prometheus.io/docs/prometheus/latest/feature_flags/#prometheus-agent). In this configuration, Prometheus only scrapes the targets and forwards the data to the Thanos system.
The following diagram illustrates the two scenarios:
<img src="img/life-of-a-sample/close-integration.png" alt="Close integration vs external client" style="max-width: 600px; display: block;margin: 0 auto;"/>
Comparing the two deployment modes, the Sidecar Mode is generally preferable due to its simpler configuration and fewer moving parts. However, if this isn't possible, opt for the **Receive Mode**. Bear in mind, this mode requires careful configuration to ensure high availability, scalability, and durability. It adds another layer of indirection and comes with the overhead of operating the additional component.
### Sending Samples to Thanos
#### The Remote Write Protocol
Let's start with our first Thanos component, the **Receive** or **Receiver**, the entry point to the system. It was introduced with this [proposal](https://thanos.io/tip/proposals-done/201812-thanos-remote-receive.md/). This component facilitates the ingestion of metrics from multiple clients, eliminating the need for close integration with the clients' Prometheus deployments.
Thanos Receive exposes a remote-write endpoint (see [Prometheus remote-write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write)) that Prometheus servers can use to transmit metrics. The only prerequisite on the client side is to configure the remote write endpoint on each Prometheus server, a feature natively supported by Prometheus.
On the Receive component, the remote write endpoint is configured with the `--remote-write.address` flag. You can also configure TLS options using other `--remote-write.*` flags. You can see the full list of the Receiver flags [here](https://thanos.io/tip/components/receive.md/#flags).
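As a reference, here is a minimal sketch of the client-side Prometheus configuration. The endpoint hostname is hypothetical and the tenant header is optional; adapt both to your deployment:
```yaml
# prometheus.yml on the client cluster (also valid in Prometheus Agent mode).
remote_write:
  - url: http://thanos-receive.example.com:19291/api/v1/receive  # hypothetical Receive endpoint
    headers:
      THANOS-TENANT: team-a  # only needed if you rely on the default tenant header
```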
The remote-write protocol is based on HTTP POST requests. The payload consists of a protobuf message containing a list of time-series samples and labels. Generally, a payload contains at most one sample per time series and spans numerous time series. Metrics are typically scraped every 15 seconds, with a maximum remote-write delay of around 5 seconds, keeping the latency from scraping to query availability on the receiver low.
#### Tuning the Remote Write Protocol
The Prometheus remote write configuration offers various parameters to tailor the connection specifications, parallelism, and payload properties (compression, batch size, etc.). While these may seem like implementation details for Prometheus, understanding them is essential for optimizing ingestion, as they form a sensitive part of the system.
From an implementation standpoint, the key idea is to read directly from the TSDB WAL (Write Ahead Log), a simple mechanism commonly used by databases to ensure data durability. If you wish to delve deeper into the TSDB WAL, check out this [great article](https://ganeshvernekar.com/blog/prometheus-tsdb-wal-and-checkpoint). Once samples are extracted from the WAL, they are aggregated into parallel queues (shards) as remote-write payloads. When a queue reaches its limit or a maximum timeout is reached, the remote-write client stops reading the WAL and dispatches the data, and the cycle continues. The parallelism is defined by the number of shards, which is dynamically optimized. More insights on Prometheus's remote write can be found in the [documentation](https://prometheus.io/docs/practices/remote_write/). You can also find troubleshooting tips on [Grafana's blog](https://grafana.com/blog/2021/04/12/how-to-troubleshoot-remote-write-issues-in-prometheus/#troubleshooting-and-metrics).
The following diagram illustrates the impacts of each parameter on the remote write protocol:
<img src="img/life-of-a-sample/remote-write.png" alt="Remote write" width="700"/>
Key Points to Consider:
* **The send deadline setting**: `batch_send_deadline` should be set to around 5s to minimize latency. This timeframe strikes a balance between minimizing latency and avoiding excessive request frequency that could burden the Receiver. While a 5-second delay might seem substantial in critical alert scenarios, it is generally acceptable considering the typical resolution time for most issues.
* **The backoff settings**: The `min_backoff` should ideally be no less than 250 milliseconds, and the `max_backoff` should be at least 10 seconds. These settings help prevent Receiver overload, particularly in situations like system restarts, by controlling the rate of data sending.
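The corresponding knobs live in the `queue_config` section of the Prometheus remote-write configuration. The values below are illustrative starting points following the guidance above, not universal recommendations:
```yaml
remote_write:
  - url: http://thanos-receive.example.com:19291/api/v1/receive
    queue_config:
      batch_send_deadline: 5s    # flush a shard at least every 5s to bound end-to-end latency
      min_backoff: 250ms         # do not retry failed sends faster than this
      max_backoff: 10s           # cap retries so a recovering Receiver is not hammered
      max_samples_per_send: 2000 # payload size per request
      capacity: 10000            # per-shard buffer before the WAL reader blocks
```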
#### Protecting the Receiver from Overuse
In scenarios where you have limited control over client configurations, it becomes essential to shield the Receive component from potential misuse or overload. The Receive component includes several configuration options designed for this purpose, comprehensively detailed in the [official documentation](https://thanos.io/tip/components/receive.md/#limits--gates-experimental). Below is a diagram illustrating the impact of these configuration settings:
<img src="img/life-of-a-sample/receive-limits.png" alt="Receive limits" width="900"/>
When implementing a topology with separate router and ingestor roles (as we will see later), these limits should be enforced at the router level.
Key points to consider:
* **Series and samples limits**: Typically, with a standard target scrape interval of 15 seconds and a maximum remote write delay of 5 seconds, the `series_limit` and `samples_limit` tend to be functionally equivalent. However, in scenarios where the remote writer is recovering from downtime, the `samples_limit` may become more restrictive, as the payload might include multiple samples for the same series.
* **Handling request limits**: If a request exceeds these limits, the system responds with a 413 (Entity Too Large) HTTP error. Currently, Prometheus does not support splitting requests in response to this error, leading to data loss.
* **Active series limiting**: The limitation on active series persists as long as the count remains above the set threshold in the Receivers' TSDBs. Active series represent the number of time series currently stored in the TSDB's (Time Series Database) head block. The head block is the in-memory portion of the TSDB where incoming samples are temporarily stored before being compacted into persistent on-disk blocks. The head block is typically compacted every two hours. This is when stale series are removed, and the active series count decreases. Requests reaching this limit are rejected with a 429 (Too Many Requests) HTTP code, triggering retries.
Considering these aspects, it is important to carefully monitor and adjust these limits. While they serve as necessary safeguards, overly restrictive settings can inadvertently lead to data loss.
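For illustration, here is a sketch of what the file referenced by `--receive.limits-config-file` (or passed inline via `--receive.limits-config`) might look like. The field names follow the limits and gates documentation and the values are arbitrary; double-check both against the docs for your Thanos version:
```yaml
write:
  global:
    max_concurrency: 30          # gate on concurrent remote-write requests
  default:
    request:
      size_bytes_limit: 1572864  # reject write requests larger than ~1.5 MiB
      series_limit: 5000         # reject requests carrying more series than this
      samples_limit: 10000       # reject requests carrying more samples than this
    head_series_limit: 1000000   # active series limit per tenant (429 when exceeded)
  tenants:
    team-a:                      # per-tenant overrides, tenant name is hypothetical
      request:
        series_limit: 20000
```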
### Receiving Samples with High Availability and Durability
#### The Need for Multiple Receive Instances
Relying on a single instance of Thanos Receive is not sufficient for two main reasons:
* Scalability: As your metrics grow, so does the need to scale your infrastructure.
* Reliability: If a single Receive instance fails, it disrupts metric collection, affecting rule evaluation and alerting. Furthermore, during downtime, Prometheus servers will buffer data in their Write-Ahead Log (WAL). If the outage exceeds the WAL's retention duration (default is 2 hours), this can lead to data loss.
#### The Hashring Mechanism
To achieve high availability, it is necessary to deploy multiple Receive replicas. However, it is not just about having more instances; it is crucial to maintain consistency in sample ingestion. In other words, samples from a given time series should always be ingested by the same Receive instance. This keeps operations efficient: when it is not achieved, other operations such as compacting or querying the data bear a higher load.
To that effect, you guessed it, the Receive component uses a hashring! With the hashring, every Receive participant knows and agrees on who must ingest which sample. When clients send data, they connect to any Receive instance, which then routes the data to the correct instances based on the hashring. This is why the Receive component is also known as the **IngestorRouter**.
<img src="img/life-of-a-sample/ingestor-router.png" alt="IngestorRouter" style="max-width: 600px; display: block;margin: 0 auto;"/>
All Receive instances must share a consistent view of the hashring. In practice, the hashring is provided to every instance as a configuration file (via the `--receive.hashrings-file` flag, hot-reloaded on change), and each instance forwards samples belonging to another node to that peer's endpoint as listed in the hashring.
There are two possible hashrings:
* **hashmod**: This algorithm distributes time series by hashing labels modulo the number of instances. It is effective at evenly distributing the load. The downside is that scaling operations on the hashring cause a high churn of time series across the nodes, requiring each node to flush its TSDB head and upload its recent blocks to the object storage. During this operation, which can last a few minutes, the receivers cannot ingest data, causing downtime. This is especially critical if you are running big Receive nodes: the more data they hold, the longer the downtime.
* **ketama**: A more recent addition, this is an implementation of a consistent hashing algorithm. During scaling operations, most time series remain attached to the same nodes, so no TSDB operation or data upload is needed before switching to the new configuration. As a result, the downtime is minimal, just the time for the nodes to agree on the new hashring. As a downside, it can be less efficient at evenly distributing the load compared to hashmod.
The hashring algorithm is configured with the `--receive.hashrings-algorithm` flag. You can use the [Thanos Receive Controller](https://github.com/observatorium/thanos-receive-controller) to automate the management of the hashring.
Key points to consider:
* The case for hashmod: If your load is stable for the foreseeable future, the `hashmod` algorithm is a good choice. It is more efficient in evenly distributing the load. Otherwise, `ketama` is recommended for its operational benefits.
* The case for small Receive nodes: If you have smaller Receive nodes, the downtime during scaling operations with the `hashmod` algorithm will be shorter, as there is less data to upload to the object storage. Also, when using the `ketama` algorithm, if a node fails, its requests are directly redistributed to the remaining nodes. This could overwhelm them if there are too few of them and result in downtime. With more nodes, the added load represents a smaller fraction of the total load.
* Protecting the nodes after recovery: During downtime, the Receive replies with 503 to the clients, which is interpreted as a temporary failure, and remote writes are retried. Once it recovers, your Receive will have to catch up and ingest a lot of data. This is why we recommend using the `--receive.limits-config` flag to limit the amount of data that can be received, preventing the Receive from being overwhelmed while catching up.
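To make the hashring concrete, here is a hypothetical Kubernetes ConfigMap holding the hashring file that each Receive instance consumes via `--receive.hashrings-file`, together with `--receive.hashrings-algorithm=ketama`. Endpoint names are illustrative, and a controller such as the Thanos Receive Controller can generate this file for you:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-receive-hashrings
data:
  hashrings.json: |
    [
      {
        "hashring": "default",
        "endpoints": [
          "thanos-receive-0.thanos-receive.monitoring.svc:10901",
          "thanos-receive-1.thanos-receive.monitoring.svc:10901",
          "thanos-receive-2.thanos-receive.monitoring.svc:10901"
        ]
      }
    ]
```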
#### Ensuring Samples Durability
For clients requiring high data durability, the `--receive.replication-factor` flag ensures data duplication across multiple receivers. When set to `n`, the Receive only replies with a successful processing response to the client once it has duplicated the data to `n-1` other receivers. Additionally, an external replica label can be added to each Receive instance (`--label` flag) to mark replicated data. This setup increases data resilience but also expands the data footprint.
For even greater durability, replication can take into account the [availability zones](https://thanos.io/tip/components/receive.md/#az-aware-ketama-hashring-experimental) of the Receive instances. It will ensure that data is replicated to instances in different availability zones, reducing the risk of data loss in case of a zone failure. This is however only supported with the `ketama` algorithm.
Beyond the increased storage cost of replication, another downside is the increased load on the Receive instances that must now forward a given request to multiple nodes, according to the time series labels. Nodes receiving the first replica then must forward the series to the next Receive node until the replication factor is satisfied. This multiplies the internodes communication, especially with big hashrings.
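Putting the durability flags together, a sketch of the relevant Receive arguments might look as follows (Kubernetes-style container args, with a hypothetical `POD_NAME` environment variable used as the replica value):
```yaml
containers:
  - name: thanos-receive
    args:
      - receive
      - --receive.replication-factor=3       # each series is written to 3 receivers
      - --label=replica="$(POD_NAME)"        # external replica label marking each copy
      - --receive.hashrings-algorithm=ketama # required for AZ-aware replication
```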
#### Improving Scalability and Reliability
A new deployment topology was [proposed](https://thanos.io/tip/proposals-accepted/202012-receive-split.md/), separating the **router** and **ingestor** roles. The hashring configuration is read by the routers, which will direct each time series to the appropriate ingestor and its replicas. This role separation provides some important benefits:
* **Scalability**: The routers and ingestors have different constraints and can be scaled independently. The router requires a performant network and CPU to route the samples, while the ingestor needs significant memory and storage. The router is stateless, while the ingestor is stateful. This separation of concerns also enables the setup of more complex topologies, such as chaining routers and having multiple hashrings. For example, you can have different hashrings attached to the routers, grouping distinct tenants with different service levels supported by isolated groups of ingestors.
* **Reliability**: During hashring reconfigurations, especially with the hashmod algorithm, some nodes may become ready before others, leading to a partially operational hashring that results in many request failures because replicas cannot be forwarded. This triggers retries, increasing the load and causing instabilities. Relieving the ingestors from the routing responsibilities makes them more stable and less prone to overload. This is especially important as they are stateful components. Routers, on the other hand, are stateless and can be easily scaled up and down.
<img src="img/life-of-a-sample/router-and-ingestor.png" alt="IngestorRouter" style="max-width: 600px; display: block;margin: 0 auto;"/>
The Receive instance behaves in the following way:
* When both a hashring and `receive.local-endpoint` are set, it acts as a **RouterIngestor**. This last flag enables the router to identify itself in the hashring as an ingestor and ingest the data when appropriate.
* When no hashring is set, it simply ingests the data and acts as an **Ingestor**.
* When only the hashring is set, it acts as a **Router** and forwards the data to the correct ingestor.
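As a rough sketch, the two roles differ only in which of these flags they receive (paths and values below are hypothetical):
```yaml
# Router: knows the hashring, owns no TSDB.
router-args:
  - receive
  - --receive.hashrings-file=/etc/thanos/hashrings.json
  - --receive.hashrings-algorithm=ketama
  # no --receive.local-endpoint: this instance only routes

# Ingestor: no hashring, just ingests what routers forward and uploads blocks.
ingestor-args:
  - receive
  - --tsdb.path=/var/thanos/receive
  - --objstore.config-file=/etc/thanos/objstore.yaml
```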
#### Handling Out-of-Order Timestamps
To enhance reliability in data ingestion, Thanos Receive supports out-of-order samples.
Samples are ingested into the Receiver's TSDB, which has strict requirements for the order of timestamps:
* Samples are expected to have increasing timestamps for a given time series.
* A new sample cannot be more than 1 hour older than the most recent sample of any time series in the TSDB.
When these requirements are not met, the samples are dropped, and an out-of-order warning is logged. However, there are scenarios where out-of-order samples may occur, often because of [clients' misconfigurations](https://thanos.io/tip/operating/troubleshooting.md/#possible-cause-1) or delayed remote write requests, which can cause samples to arrive out of order depending on the remote write implementation. Additional examples at the Prometheus level can be found in [this article](https://promlabs.com/blog/2022/12/15/understanding-duplicate-samples-and-out-of-order-timestamp-errors-in-prometheus/).
As you are not necessarily in control of your clients' setups, you may want to increase resilience against these issues. Support for out-of-order samples has been implemented for the TSDB. This feature can be enabled with the `tsdb.out-of-order.time-window` flag on the Receiver. The downsides are:
* An increase in the TSDB's memory usage, proportional to the number of out-of-order samples.
* The TSDB will produce blocks with overlapping time periods, which the compactor must handle. Ensure the `--compact.enable-vertical-compaction` [flag](https://thanos.io/tip/components/compact.md/#enabling-vertical-compaction) is enabled on the compactor to manage these overlapping blocks. We will cover this in more detail in the next article.
Additionally, consider setting the `tsdb.too-far-in-future.time-window` flag to a value higher than the default 0s to account for possible clock drifts between clients and the Receiver.
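A sketch of these two flags on the Receiver, with illustrative values to tune against your clients' behavior:
```yaml
args:
  - receive
  - --tsdb.out-of-order.time-window=30m      # accept samples arriving up to 30m late
  - --tsdb.too-far-in-future.time-window=1m  # tolerate small client clock drift
```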
### Conclusion
In this first part, we have covered the initial steps of the sample lifecycle in Thanos, focusing on the ingestion process. We have explored the remote write protocol, the Receive component, and the critical configurations needed to ensure high availability and durability. Now, our sample is safely ingested and stored in the system. In the next part, we will continue following our sample's journey, delving into the data management and querying processes.
See the full list of articles in this series:
* Life of a Sample in Thanos, and How to Configure It: Ingestion (Part I)
* Life of a Sample in Thanos, and How to Configure It: Data Management (Part II)
* Life of a Sample in Thanos, and How to Configure It: Querying (Part III)
### Annexes
#### Metrics Terminology: Samples, Labels and Series
* **Sample**: A sample in Prometheus represents a single data point, capturing a measurement of a specific system aspect or property at a given moment. It is the fundamental unit of data in Prometheus, reflecting real-time system states.
* **Labels**: Every sample in Prometheus is tagged with labels, which are key-value pairs that add context and metadata. These labels typically include:
* The nature of the metric being measured.
* The source or origin of the metric.
* Other relevant contextual details.
* **External labels**: External labels are appended by the scraping or receiving component (like a Prometheus server or Thanos Receive). They enable:
* **Sharding**: Included in the `meta.json` file of the block created by Thanos, these labels are used by the compactor and the store to shard blocks processing effectively.
* **Deduplication**: In high-availability setups where Prometheus servers scrape the same targets, external labels help identify and deduplicate similar samples.
* **Tenancy isolation**: In multi-tenant systems, external labels are used to segregate data per tenant, ensuring logical data isolation.
* **Series** or **Time Series**: In the context of monitoring, a Series (the more generic term) is necessarily a time series. A series is defined by a unique set of label-value combinations. For instance:
```
http_requests_total{method="GET", handler="/users", status="200"}
^ ^
Series name (label `__name__`) Labels (key=value format)
```
In this example, `http_requests_total` is the value of a specific label (`__name__`). The unique combination of labels creates a distinct series. Prometheus scrapes these series, attaching a timestamp to each sample, thereby forming a time series.
For our discussion, samples can be of various types, but we will treat them as simple integers for simplicity.
The following diagram illustrates the relationship between samples, labels and series:
<img src="img/life-of-a-sample/series-terminology.png" alt="Series terminology" width="500"/>
#### TSDB Terminology: Chunks, Chunk Files and Blocks
Thanos adopts its [storage architecture](https://thanos.io/tip/thanos/storage.md/#data-in-object-storage) from [Prometheus](https://prometheus.io/docs/prometheus/latest/storage/), utilizing the TSDB (Time Series Database) [file format](https://github.com/prometheus/prometheus/blob/release-2.48/tsdb/docs/format/README.md) as its foundation. Let's review some key terminology that is needed to understand some of the configuration options.
**Samples** from a given time series are first aggregated into small **chunks**. The storage format of a chunk is highly compressed ([see documentation](https://github.com/prometheus/prometheus/blob/release-2.48/tsdb/docs/format/chunks.md#xor-chunk-data)). Accessing a given sample of the chunk requires decoding all preceding values stored in this chunk. This is why chunks hold up to 120 samples, a number chosen to strike a balance between compression benefits and the performance of reading data.
Chunks are created over time for each time series. As time progresses, these chunks are assembled into **chunk files**. Each chunk file, encapsulating chunks from various time series, is limited to 512MiB to manage memory usage effectively during read operations. Initially, these files cover a span of two hours and are subsequently organized into a larger entity known as a **block**.
A **block** is a directory containing the chunk files in a specific time range, an index and some metadata. The two-hour duration for initial blocks is chosen for optimizing factors like storage efficiency and read performance. Over time, these two-hour blocks undergo horizontal compaction by the compactor, merging them into larger blocks. This process is designed to optimize long-term storage by extending the time period each block covers.
The following diagram illustrates the relationship between chunks, chunk files and blocks:
<img src="img/life-of-a-sample/storage-terminology.png" alt="TSDB terminology" width="900"/>

View File

@ -0,0 +1,245 @@
---
title: "Life of a Sample in Thanos and How to Configure It: Data Management (Part II)"
date: "2024-09-16"
author: Thibault Mangé (https://github.com/thibaultmg)
---
## Life of a Sample in Thanos and How to Configure It: Data Management (Part II)
### Introduction
In the first part of this series, we followed the life of a sample from its inception in a Prometheus server to our Thanos Receivers. We will now explore how Thanos manages the data ingested by the Receivers and optimizes it in the object store for reduced cost and fast retrieval.
Let's delve into these topics and more in the second part of the series.
### Preparing Samples for Object Storage: Building Chunks and Blocks
#### Using Object Storage
A key feature of Thanos is its ability to leverage economical object storage solutions like AWS S3 for long-term data retention. This contrasts with Prometheus's typical approach of storing data locally for shorter periods.
The Receive component is responsible for preparing data for object storage. Thanos adopts the TSDB (Time Series Database) data model, with some adaptations, for its object storage. This involves aggregating samples over time to construct TSDB Blocks. Please refer to the annexes of the first part if this vocabulary is not clear to you.
These blocks are built by aggregating data over two-hour periods. Once a block is ready, it is sent to the object storage, which is configured using the `--objstore.config` flag. This configuration is uniform across all components requiring object storage access.
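For reference, a minimal `--objstore.config` (or `--objstore.config-file` content) for an S3-compatible bucket might look like the following; bucket name, endpoint and credentials are placeholders:
```yaml
type: S3
config:
  bucket: thanos-blocks
  endpoint: s3.us-east-1.amazonaws.com
  region: us-east-1
  access_key: "<access-key>"
  secret_key: "<secret-key>"
```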
On restarts, the Receive component ensures data preservation by immediately flushing existing data to object storage, even if it does not constitute a full two-hour block. These partial blocks are less efficient but are then optimized by the compactor, as we will see later.
The Receive is also able to [isolate data](https://thanos.io/tip/components/receive.md/#tenant-lifecycle-management) coming from different tenants. The tenant can be identified in the request by different means: a header (`--receive.tenant-header`), a label (`--receive.split-tenant-label-name`) or a certificate (`--receive.tenant-certificate-field`). Their data is ingested into different TSDB instances (you might hear this referred to as the multiTSDB). The benefits are twofold:
* It allows for parallelization of the block-building process, especially on the compactor side as we will see later.
* It allows for smaller indexes. Indeed, labels tend to be similar for samples coming from the same source, leading to more effective compression.
<img src="img/life-of-a-sample/multi-tsdb.png" alt="Data expansion" style="max-width: 600px; display: block;margin: 0 auto;"/>
When a block is ready, it is uploaded to the object store with the block external label defined by the flag `--receive.tenant-label-name`. This corresponds to the `thanos.labels` field of the [block metadata](https://thanos.io/tip/thanos/storage.md/#metadata-file-metajson). This will be used by the compactor to group blocks together, as we will see later.
#### Exposing Local Data for Queries
During the block-building phase, the data is not accessible to the Store Gateway as it has not been uploaded to the object store yet. To counter that, the Receive component also serves as a data store, making the local data available for query through the `Store API`. This is a common gRPC API used across all Thanos components for time series data access, set with the `--grpc-address` flag. The Receive will serve all data it has. The more data it serves, the more resources it will use for this duty in addition to ingesting client data.
<img src="img/life-of-a-sample/receive-store-api.png" alt="Data expansion" style="max-width: 600px; display: block;margin: 0 auto;"/>
The amount of data the Receive component serves can be managed through two parameters:
* `--tsdb.retention`: Sets the local storage retention duration. The minimum is 2 hours, aligning with block construction periods.
* `--store.limits.request-samples` and `--store.limits.request-series`: These parameters limit the volume of data that can be queried by setting a maximum on the number of samples and/or the number of series. If these limits are exceeded, the query will be denied to ensure system stability.
Key points to consider:
* The primary objective of the Receive component is to ensure **reliable data ingestion**. However, the more data it serves through the Store API, the more resources it will use for this duty in addition to ingesting client data. You should set the retention duration to the minimum required for your use case to optimize resource allocation. The minimum value for 2-hour blocks would be a 4-hour retention to account for availability in the Store Gateway after the block is uploaded to object storage. To prevent data loss, if the Receive component fails to upload blocks before the retention limit is reached, it will hold them until the upload succeeds.
* Even when the retention duration is short, your Receive instance could be overwhelmed by a query selecting too much data. You should set limits in place to ensure the stability of the Receive instances. These limits must be carefully set to enable Store API clients to retrieve the data they need while preventing resource exhaustion. The longer the retention, the higher the limits should be as the number of samples and series will increase.
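A sketch of how these flags might be combined on a Receive instance, with illustrative values only:
```yaml
args:
  - receive
  - --tsdb.retention=6h                      # a bit more than two 2-hour blocks of local data
  - --store.limits.request-series=100000     # refuse Store API queries selecting too many series
  - --store.limits.request-samples=50000000  # refuse queries touching too many samples
```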
### Maintaining Data: Compaction, Downsampling, and Retention
#### The Need for Compaction
The Receive component implements many strategies to ingest samples reliably. However, this can result in unoptimized data in object storage. This is due to:
* Inefficient partial blocks sent to object storage on shutdowns.
* Duplicated data when replication is set. Several Receive instances will send the same data to object storage.
* Incomplete blocks (invalid blocks) sent to object storage when the Receive fails in the middle of an upload.
The following diagram illustrates the impact on data expansion in object storage when samples from a given target are ingested from a high-availability Prometheus setup (with 2 instances) and replication is set on the Receive (factor 3):
<img src="img/life-of-a-sample/data-expansion.png" alt="Data expansion" style="max-width: 600px; display: block;margin: 0 auto;"/>
This leads to a threefold increase in label volume (one for each block) and a sixfold increase in sample volume! This is where the Compactor comes into play.
The Compactor component is responsible for maintaining and optimizing data in object storage. It is a long-running process when configured to wait for new blocks with the `--wait` flag. It also needs access to the object storage using the `--objstore.config` flag.
Under normal operating conditions, the Compactor will check for new blocks every 5 minutes. By default, it will only consider blocks that are older than 30 minutes (configured with the `--consistency-delay` flag) to avoid reading partially uploaded blocks. It will then process these blocks in a structured manner, compacting them according to defined settings that we will discuss in the next sections.
#### Compaction Modes
Compaction consists of merging blocks that have overlapping or adjacent time ranges. This is called **horizontal compaction**. Using the [Metadata file](https://thanos.io/tip/thanos/storage.md/#metadata-file-metajson) which contains the minimum and maximum timestamps of samples in the block, the Compactor can determine if two blocks overlap. If they do, they are merged into a new block. This new block will have its compaction level index increased by one. So from two adjacent blocks of 2 hours each, we will get a new block of 4 hours.
During this compaction, the Compactor will also deduplicate samples. This is called [**vertical compaction**](https://thanos.io/tip/components/compact.md/#vertical-compactions). The Compactor provides two deduplication modes:
* `one-to-one`: This is the default mode. It deduplicates samples that have the same timestamp and the same value but different replica label values. The replica label is configured by the `--deduplication.replica-label` flag, which can be repeated to account for several replication labels. It is usually set to `replica`; make sure it is set as an external label on the Receivers with the flag `--label=replica=xxx`. The benefit of this mode is that it is straightforward and removes the data replicated by the Receive. However, it cannot remove data replicated by high-availability Prometheus setups, because those samples are rarely scraped at exactly the same timestamps, as demonstrated by the diagram below.
* `penalty`: This is a more complex deduplication algorithm that can deduplicate data coming from high-availability Prometheus setups. It is selected with the `--deduplication.func` flag and also requires setting the `--deduplication.replica-label` flag to the label that identifies the replica, usually `prometheus_replica`.
Here is a diagram illustrating how Prometheus replicas generate samples with different timestamps that cannot be deduplicated with the `one-to-one` mode:
<img src="img/life-of-a-sample/ha-prometheus-duplicates.png" alt="High availability prometheus duplication" style="max-width: 600px; display: block;margin: 0 auto;"/>
Getting back to our example illustrating the data duplication happening in the object storage, here is how each compaction process will impact the data:
<img src="img/life-of-a-sample/compactor-compaction.png" alt="Compactor compaction" width="700"/>
First, horizontal compaction will merge blocks together. This will mostly have an effect on the labels data that are stored in a compressed format in a single index binary file attached to a single block. Then, one-to-one deduplication will remove identical samples and delete the related replica label. Finally, penalty deduplication will remove duplicated samples resulting from concurrent scrapes in high-availability Prometheus setups and remove the related replica label.
You want to deduplicate data as much as possible because it will lower your object storage cost and improve query performance. However, using the penalty mode presents some limitations. For more details, see [the documentation](https://thanos.io/tip/components/compact.md/#vertical-compaction-risks).
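For illustration, here is a sketch of the Compactor flags enabling vertical compaction with penalty deduplication for high-availability Prometheus setups; with the default `one-to-one` mode you would instead point `--deduplication.replica-label` at the Receive replication label (for example `replica`):
```yaml
args:
  - compact
  - --wait
  - --compact.enable-vertical-compaction
  - --deduplication.func=penalty
  - --deduplication.replica-label=prometheus_replica  # label identifying the HA Prometheus replica
```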
Key points to consider:
* You want blocks that are not too big because they will be slow to query. However, you also want to limit the number of blocks because having too many will increase the number of requests to the object storage. Also, the more blocks there are, the less compaction occurs, and the more data there is to store and load into memory.
* You do not need to worry about too small blocks, as the Compactor will merge them together. However, you could have too big blocks. This can happen if you have very high cardinality workloads or churn-heavy workloads like CI runs, build pipelines, serverless functions, or batch jobs, which often lead to huge cardinality explosions as the metrics labels will be changing often.
* The main solution to this is splitting the data into several block streams, as we will see later. This is Thanos's sharding strategy.
* There are also cases where you might want to limit the size of the blocks. To that effect, you can use the following parameters:
* You can limit the compaction levels with `--debug.max-compaction-level` to prevent the Compactor from creating blocks that are too big. This is especially useful when you have a high metrics churn rate. Level 1 is the default and will create blocks of 2 hours. Level 2 will create blocks of 8 hours, level 3 of 2 days, and up to level 4 of 14 days. Without this limit, the Compactor will create blocks of up to 2 weeks. This is not a magic bullet; it does not limit the data size of the blocks. It just limits the number of blocks that can be merged together. The downside of using this setting is that it will increase the number of blocks in the object storage. They will use more space, and the query performance might be impacted.
* The flag `compact.block-max-index-size` can be used more effectively to specify the maximum index size beyond which the Compactor will stop block compaction, independently of its compaction level. Once a block's index exceeds this size, the system marks it for no further compaction. The default value is 64 GB, which is the maximum index size the TSDB supports. As a result, some block streams might appear discontinuous in the UI, displaying a lower compaction level than the surrounding blocks.
#### Scaling the Compactor: Block Streams
Not all blocks covering the same time range are compacted together. Instead, the Compactor organizes them into distinct [compaction groups or block streams](https://thanos.io/tip/components/compact.md/#compaction-groups--block-streams). The key here is to leverage external labels to group data originating from the same source. This strategic grouping is particularly effective for compacting indexes, as blocks from the same source tend to have nearly identical labels.
You can improve the performance of the Compactor by:
* Increasing the number of concurrent compactions using the `--max-concurrent` flag. Bear in mind that you must scale storage, memory and CPU resources accordingly (linearly).
* Sharding the data. In this mode, each Compactor will process a disjoint set of block streams. This is done by setting up the `--selector.relabel-config` flag on the external labels. For example:
```yaml
- action: hashmod
source_labels:
- tenant_id # An external label that identifies some block streams
target_label: shard
modulus: 2 # The number of Compactor replicas
- action: keep
source_labels:
- shard
regex: 0 # The shard number assigned to this Compactor
```
In this configuration, the `hashmod` action is used to distribute blocks across multiple Compactor instances based on the `tenant_id` label. The `modulus` should match the number of Compactor replicas you have. Each replica will then only process the blocks that match its shard number, as defined by the `regex` in the `keep` action.
#### Downsampling and Retention
The Compactor also optimizes data reads for long-range queries. If you are querying data for several months, you do not need the typical 15-second raw resolution. Processing such a query will be very inefficient, as it will retrieve a lot of unnecessary data that you will not be able to visualize with such detail in your UI. In worst-case scenarios, it may even cause some components of your Thanos setup to fail due to memory exhaustion.
To enable performant long range queries, the Compactor can downsample data using `--retention.resolution-*` flags. It supports two downsampling levels: 5 minutes and 1 hour. These are the resolutions of the downsampled series. They will typically come on top of the raw data, so that you can have both raw and downsampled data. This will enable you to spot abnormal patterns over long-range queries and then zoom into specific parts using the raw data. We will discuss how to configure the query to use the downsampled data in the next article.
When the Compactor performs downsampling, it does more than simply reduce the number of data points by removing intermediate samples. While reducing the volume of data is a primary goal, especially to improve performance for long-range queries, the Compactor ensures that essential statistical properties of the original data are preserved. This is crucial for maintaining the accuracy and integrity of any aggregations or analyses performed on the downsampled data. In addition to the downsampled data, it stores the count, minimum, maximum, and sum of the downsampled window. Functions like sum(), min(), max(), and avg() can then be computed correctly over the downsampled data because the necessary statistical information is preserved.
This downsampled data is then stored in its own block, one per downsampling level for each corresponding raw block.
Key points to consider:
* Downsampling is not for reducing the volume of data in object storage. It is for improving long-range query performance, making your system more versatile and stable.
* Thanos recommends having the same retention duration for raw and downsampled data. This will enable you to have a consistent view of your data over time.
* As a rule of thumb, you can consider that each downsampling level adds roughly as much storage as the raw data itself, although in practice it is often a bit less than that.
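Following the recommendation above to keep the same retention for raw and downsampled data, the retention flags might look like this (one year is just an example):
```yaml
args:
  - compact
  - --wait
  - --retention.resolution-raw=365d
  - --retention.resolution-5m=365d
  - --retention.resolution-1h=365d
```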
#### The Compactor UI and the Block Streams
The Compactor's functionality and the progress of its operations can be monitored through the **Block Viewer UI**. This web-based interface is accessible if the Compactor is configured with the `--http-address` flag. Additional UI settings are controlled via `--web.*` and `--block-viewer.*` flags. The Compactor UI provides a visual representation of the compaction process, showing how blocks are grouped and compacted over time. Here is a glimpse of what the UI looks like:
<img src="img/life-of-a-sample/compactor-ui.png" alt="Receive and Store data overlap" width="800"/>
Occasionally, some blocks may display an artificially high compaction level in the UI, appearing lower in the stream compared to adjacent blocks. This scenario often occurs in situations like rolling Receiver upgrades, where Receivers restart sequentially, leading to the creation and upload of partial blocks to the object store. The Compactor then vertically compacts these blocks as they arrive, resulting in a temporary increase in compaction levels. When these blocks are horizontally compacted with adjacent blocks, they will be displayed higher up in the stream.
As explained earlier with compaction levels, by default, the Compactor's strategy involves compacting 2-hour blocks into 8-hour blocks once they are available, then progressing to 2-day blocks, and up to 14-day blocks, following a structured compaction timeline.
### Exposing Bucket Data for Queries: The Store Gateway and the Store API
#### Exposing Data for Queries
The Store Gateway acts as a facade for the object storage, making bucket data accessible via the Thanos Store API, a feature first introduced with the Receive component. The Store Gateway exposes the Store API with the `--grpc-address` flag.
The Store Gateway requires access to the object storage bucket to retrieve data, which is configured with the `--objstore.config` flag. You can use the `--max-time` flag to specify which blocks should be considered by the Store Gateway. For example, if your Receive instances are serving data up to 10 hours, you may configure `--max-time=-8h` so that it does not consider blocks more recent than 8 hours. This avoids returning the same data as the Receivers while ensuring some overlap between the two.
To function optimally, the Store Gateway relies on caches. To understand their usefulness, let's first explore how the Store Gateway retrieves data from the blocks in the object storage.
#### Retrieving Samples from the Object Store
Consider the following simple query, executed on the Querier:
```promql
# Between now and 2 days ago, compute the rate of http requests per second, filtered by method and status
rate(http_requests_total{method="GET", status="200"}[5m])
```
This PromQL query will be parsed by the Querier, which will emit a Thanos [Store API](https://github.com/thanos-io/thanos/blob/main/pkg/store/storepb/rpc.proto) request to the Store Gateway with the following parameters:
```proto
SeriesRequest request = {
min_time: [Timestamp 2 days ago],
max_time: [Current Timestamp],
max_resolution_window: 1h, // the minimum time range between two samples, relates to the downsampling levels
matchers: [
{ name: "__name__", value: "http_requests_total", type: EQUAL },
{ name: "method", value: "GET", type: EQUAL },
{ name: "status", value: "200", type: EQUAL }
]
}
```
The Store Gateway processes this request in several steps:
* **Metadata processing**: The Store Gateway first examines the block [metadata](https://thanos.io/tip/thanos/storage.md/#metadata-file-metajson) to determine the relevance of each block to the query. It evaluates the time range (`minTime` and `maxTime`) and external labels (`thanos.labels`). Blocks are deemed relevant if their timestamps overlap with the query's time range and if their resolution (`thanos.downsample.resolution`) matches the query's maximum allowed resolution.
* **Index processing**: Next, the Store Gateway retrieves the [indexes](https://thanos.io/tip/thanos/storage.md/#index-format-index) of candidate blocks. This involves:
* Fetching postings lists for each label specified in the query. These are inverted indexes where each label and value has an associated sorted list of all the corresponding time series IDs. Example:
* `"__name__=http_requests_total": [1, 2, 3]`,
* `"method=GET": [1, 2, 6]`,
* `"status=200": [1, 2, 5]`
* Intersecting these postings lists to select series matching all query labels. In our example these are series 1 and 2.
* Retrieving the series section from the index for these series, which includes, for each chunk, the chunk file reference, the time range it covers and its offset within the file. Example:
* Series 1: [Chunk 1: mint=t0, maxt=t1, fileRef=0001, offset=0], ...
* Determining the relevant chunks based on their time range intersection with the query.
* **Chunks retrieval**: The Store Gateway then fetches the appropriate chunks, either from the object storage directly or from a chunk cache. When retrieving from the object store, the Gateway leverages its API to read only the needed bytes (i.e., using S3 range requests), bypassing the need to download entire chunk files.
Then, the Gateway streams the selected chunks to the requesting Querier.
#### Optimizing the Store Gateway
Understanding the retrieval algorithm highlights the critical role of an external [index cache](https://thanos.io/tip/components/store.md/#index-cache) in the Store Gateway's operation. This is configured using the `--index-cache.config` flag. Indexes contain all labels and values of the block, which can result in large sizes. When the cache is full, Least Recently Used (LRU) eviction is applied. In scenarios where no external cache is configured, a portion of the memory will be utilized as a cache, managed via the `--index-cache.size` flag.
Moreover, the direct retrieval of chunks from object storage can be suboptimal, and result in excessive costs, especially with a high volume of queries. To mitigate this, employing a [caching bucket](https://thanos.io/tip/components/store.md/#caching-bucket) can significantly reduce the number of queries to the object storage. This is configured using the `--store.caching-bucket.config` flag. This chunk caching strategy is particularly effective when data access patterns are predominantly focused on recent data. By caching these frequently accessed chunks, query performance is enhanced, and the load on object storage is reduced.
Finally, you can implement the same safeguards as the Receive component by setting limits on the number of samples and series that can be queried. This is accomplished using the same `--store.limits.request-samples` and `--store.limits.request-series` flags.
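As an example, a memcached-backed index cache configuration passed via `--index-cache.config` could look like the sketch below; the caching bucket (`--store.caching-bucket.config`) uses a similar but distinct schema, so check the Store Gateway documentation for the exact fields supported by your version. The memcached address is hypothetical:
```yaml
type: MEMCACHED
config:
  addresses:
    - memcached.monitoring.svc:11211
  max_async_concurrency: 20
  max_item_size: 1MiB
```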
#### Scaling the Store Gateway
The performance of Thanos Store components can be notably improved by managing concurrency and implementing sharding strategies.
Adjusting the level of concurrency can have a significant impact on performance. This is managed through the `--store.grpc.series-max-concurrency` flag, which sets the number of allowed concurrent series requests on the Store API. Other lower-level concurrency settings are also available.
After optimizing the store processing, you can distribute the query load using sharding strategies similar to what was done with the Compactor. Using a relabel configuration, you can assign a disjoint set of blocks to each Store Gateway replica. Here's an example of how to set up sharding using the `--selector.relabel-config` flag:
```yaml
- action: hashmod
source_labels:
- tenant_id # An external label that identifies some block streams
target_label: shard
modulus: 2 # The number of Store Gateways replicas
- action: keep
source_labels:
- shard
regex: 0 # The shard number assigned to this Store Gateway
```
Sharding based on the `__block_id` is not recommended because it prevents Stores from selecting the most relevant data resolution needed for a query. For example, one store might see only the raw data and return it, while another store sees the downsampled version for the same query and also returns it. This duplication creates unnecessary overhead.
External-label-based sharding avoids this issue. By giving a store a complete view of a stream's data (both raw and downsampled), it can effectively select the most appropriate resolution.
If external label sharding is not sufficient, you can combine it with time partitioning using the `--min-time` and `--max-time` flags. This process is done at the chunk level, meaning you can use shorter time intervals for recent data in 2 hour blocks, but you must use longer intervals for older data to account for horizontal compaction. The goal is for any store instance to have a complete view of the stream's data at every resolution for a given time slot, allowing it to return the unique and most appropriate data.
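Combining both dimensions, a Store Gateway replica serving one shard of streams for a bounded time window might be configured roughly as follows (the relabel file would contain a config similar to the one shown above; values are illustrative):
```yaml
args:
  - store
  - --min-time=-2w   # ignore blocks older than two weeks
  - --max-time=-8h   # leave the most recent data to the Receivers
  - --selector.relabel-config-file=/etc/thanos/store-shard.yaml
```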
### Conclusion
In this second part, we explored how Thanos manages data for efficient storage and retrieval. We examined how the Receive component prepares samples and exposes local data for queries, and how the Compactor optimizes data through compaction and downsampling. We also discussed how the Store Gateway retrieves data and can be optimized by leveraging indexes and implementing sharding strategies.
Now that our samples are efficiently stored and prepared for queries, we can move on to the final part of this series, where we will explore how this distributed data is retrieved by query components like the Querier.
See the full list of articles in this series:
* Life of a Sample in Thanos, and How to Configure It: Ingestion (Part I)
* Life of a Sample in Thanos, and How to Configure It: Data Management (Part II)
* Life of a Sample in Thanos, and How to Configure It: Querying (Part III)

Binary image files added under img/life-of-a-sample/ (blog post figures); content not shown.

View File

@ -8,4 +8,4 @@ Welcome 👋🏼
This space was created for the Thanos community to share learnings, insights, best practices and cool things to the world. If you are interested in contributing relevant content to Thanos blog, feel free to add Pull Request (PR) to [Thanos repo's blog directory](http://github.com/thanos-io/thanos). See ya there!
PS: For Prometheus specific content, consider contributing to [Prometheus blog space](https://prometheus.io/blog/) by creating PR to [Prometheus docs repo](https://github.com/prometheus/docs/tree/main/content/blog).
PS: For Prometheus specific content, consider contributing to [Prometheus blog space](https://prometheus.io/blog/) by creating PR to [Prometheus docs repo](https://github.com/prometheus/docs/tree/main/blog-posts).

View File

@ -57,7 +57,7 @@ This rule also means that there could be a problem when both compacted and non-c
> **NOTE:** In future versions of Thanos it's possible that both restrictions will be removed once [vertical compaction](#vertical-compactions) reaches production status.
You can though run multiple Compactors against a single Bucket as long as each instance compacts a separate stream of blocks. You can do this in order to [scale the compaction process](#scalability).
It is possible to run multiple Compactors against a single Bucket, provided each instance handles a separate stream of blocks. This allows you to [scale the compaction process](#scalability).
### Vertical Compactions
@ -106,7 +106,7 @@ external_labels: {cluster="us1", replica="1", receive="true", environment="produ
external_labels: {cluster="us1", replica="1", receive="true", environment="staging"}
```
and set `--deduplication.replica-label="replica"`, Compactor will assume those as:
and set `--deduplication.replica-label=replica`, Compactor will assume those as:
```
external_labels: {cluster="eu1", receive="true", environment="production"} (2 streams, resulted in one)
@ -152,6 +152,8 @@ message AggrChunk {
This means that for each series we collect various aggregations with a given interval: 5m or 1h (depending on resolution). This allows us to keep precision on large duration queries, without fetching too many samples.
Native histogram downsampling leverages the fact that one can aggregate & reduce schema i.e. downsample native histograms. Native histograms only store 3 aggregations - counter, count, and sum. Sum and count are used to produce "an average" native histogram. Counter is a counter that is used with functions irate, rate, increase, and resets.
### ⚠ Downsampling: Note About Resolution and Retention ⚠️
Resolution is a distance between data points on your graphs. E.g.
@ -278,14 +280,91 @@ usage: thanos compact [<flags>]
Continuously compacts blocks in an object store bucket.
Flags:
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
HTTP Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--data-dir="./data" Data directory in which to cache blocks and
process compactions.
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--consistency-delay=30m Minimum age of fresh (non-compacted)
blocks before they are being processed.
Malformed blocks older than the maximum of
consistency-delay and 48h0m0s will be removed.
--retention.resolution-raw=0d
How long to retain raw samples in bucket.
Setting this to 0d will retain samples of this
resolution forever
--retention.resolution-5m=0d
How long to retain samples of resolution 1 (5
minutes) in bucket. Setting this to 0d will
retain samples of this resolution forever
--retention.resolution-1h=0d
How long to retain samples of resolution 2 (1
hour) in bucket. Setting this to 0d will retain
samples of this resolution forever
-w, --[no-]wait Do not exit after all compactions have been
processed and wait for new work.
--wait-interval=5m Wait interval between consecutive compaction
runs and bucket refreshes. Only works when
--wait flag specified.
--[no-]downsampling.disable
Disables downsampling. This is not recommended
as querying long time ranges without
non-downsampled data is not efficient and useful
e.g it is not possible to render all samples for
a human eye anyway
--block-discovery-strategy="concurrent"
One of concurrent, recursive. When set to
concurrent, stores will concurrently issue
one call per directory to discover active
blocks in the bucket. The recursive strategy
iterates through all objects in the bucket,
recursively traversing into each directory.
This avoids N+1 calls at the expense of having
slower bucket iterations.
--block-meta-fetch-concurrency=32
Number of goroutines to use when fetching block
metadata from object storage.
--block-files-concurrency=1
Number of goroutines to use when
fetching/uploading block files from object
storage.
--block-meta-fetch-concurrency=32
Number of goroutines to use when fetching block
metadata from object storage.
--block-viewer.global.sync-block-interval=1m
Repeat interval for syncing the blocks between
local and remote view for /global Block Viewer
@ -294,56 +373,26 @@ Flags:
Maximum time for syncing the blocks between
local and remote view for /global Block Viewer
UI.
--bucket-web-label=BUCKET-WEB-LABEL
External block label to use as group title in
the bucket web UI
--compact.blocks-fetch-concurrency=1
Number of goroutines to use when download block
during compaction.
--compact.cleanup-interval=5m
How often we should clean up partially uploaded
blocks and blocks with deletion mark in the
background when --wait has been enabled. Setting
it to "0s" disables it - the cleaning will only
happen at the end of an iteration.
--compact.concurrency=1 Number of goroutines to use when compacting
groups.
--compact.progress-interval=5m
Frequency of calculating the compaction progress
in the background when --wait has been enabled.
Setting it to "0s" disables it. Now compaction,
downsampling and retention progress are
supported.
--consistency-delay=30m Minimum age of fresh (non-compacted)
blocks before they are being processed.
Malformed blocks older than the maximum of
consistency-delay and 48h0m0s will be removed.
--data-dir="./data" Data directory in which to cache blocks and
process compactions.
--deduplication.func= Experimental. Deduplication algorithm for
merging overlapping blocks. Possible values are:
"", "penalty". If no value is specified,
the default compact deduplication merger
is used, which performs 1:1 deduplication
for samples. When set to penalty, penalty
based deduplication algorithm will be used.
At least one replica label has to be set via
--deduplication.replica-label flag.
--deduplication.replica-label=DEDUPLICATION.REPLICA-LABEL ...
Label to treat as a replica indicator of blocks
that can be deduplicated (repeated flag). This
will merge multiple replica blocks into one.
This process is irreversible.Experimental.
When one or more labels are set, compactor
will ignore the given labels so that vertical
compaction can merge the blocks.Please note
that by default this uses a NAIVE algorithm
for merging which works well for deduplication
of blocks with **precisely the same samples**
like produced by Receiver replication.If you
need a different deduplication algorithm (e.g
one that works well with Prometheus replicas),
please set it via --deduplication.func.
--compact.concurrency=1 Number of goroutines to use when compacting
groups.
--compact.blocks-fetch-concurrency=1
Number of goroutines to use when download block
during compaction.
--downsample.concurrency=1
Number of goroutines to use when downsampling
blocks.
--delete-delay=48h Time before a block marked for deletion is
deleted from bucket. If delete-delay is non
zero, blocks will be marked for deletion and
@ -355,35 +404,45 @@ Flags:
block loaded, or compactor is ignoring the
deletion because it's compacting the block at
the same time.
--disable-admin-operations
Disable UI/API admin operations like marking
blocks for deletion and no compaction.
--downsample.concurrency=1
Number of goroutines to use when downsampling
blocks.
--downsampling.disable Disables downsampling. This is not recommended
as querying long time ranges without
non-downsampled data is not efficient and useful
e.g it is not possible to render all samples for
a human eye anyway
--deduplication.func= Experimental. Deduplication algorithm for
merging overlapping blocks. Possible values are:
"", "penalty". If no value is specified,
the default compact deduplication merger
is used, which performs 1:1 deduplication
for samples. When set to penalty, penalty
based deduplication algorithm will be used.
At least one replica label has to be set via
--deduplication.replica-label flag.
--deduplication.replica-label=DEDUPLICATION.REPLICA-LABEL ...
Experimental. Label to treat as a replica
indicator of blocks that can be deduplicated
(repeated flag). This will merge multiple
replica blocks into one. This process is
irreversible. Flag may be specified multiple
times as well as a comma separated list of
labels. When one or more labels are set,
compactor will ignore the given labels so that
vertical compaction can merge the blocks.Please
note that by default this uses a NAIVE algorithm
for merging which works well for deduplication
of blocks with **precisely the same samples**
like produced by Receiver replication.If you
need a different deduplication algorithm (e.g
one that works well with Prometheus replicas),
please set it via --deduplication.func.
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
HTTP Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to compact.
Thanos Compactor will compact only blocks, which
happened later than this value. Option can be a
constant time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
--max-time=9999-12-31T23:59:59Z
End of time range limit to compact.
Thanos Compactor will compact only blocks,
@ -392,68 +451,26 @@ Flags:
duration relative to current time, such as -1d
or 2h45m. Valid duration units are ms, s, m, h,
d, w, y.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to compact.
Thanos Compactor will compact only blocks, which
happened later than this value. Option can be a
constant time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--retention.resolution-1h=0d
How long to retain samples of resolution 2 (1
hour) in bucket. Setting this to 0d will retain
samples of this resolution forever
--retention.resolution-5m=0d
How long to retain samples of resolution 1 (5
minutes) in bucket. Setting this to 0d will
retain samples of this resolution forever
--retention.resolution-raw=0d
How long to retain raw samples in bucket.
Setting this to 0d will retain samples of this
resolution forever
--[no-]web.disable Disable Block Viewer UI.
--selector.relabel-config-file=<file-path>
Path to YAML file with relabeling
configuration that allows selecting blocks
to act on based on their external labels.
It follows thanos sharding relabel-config
syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--selector.relabel-config=<content>
Alternative to 'selector.relabel-config-file'
flag (mutually exclusive). Content of
YAML file that contains relabeling
configuration that allows selecting
blocks. It follows native Prometheus
relabel-config syntax. See format details:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
--selector.relabel-config-file=<file-path>
Path to YAML file that contains relabeling
configuration that allows selecting
blocks. It follows native Prometheus
relabel-config syntax. See format details:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--version Show application version.
-w, --wait Do not exit after all compactions have been
processed and wait for new work.
--wait-interval=5m Wait interval between consecutive compaction
runs and bucket refreshes. Only works when
--wait flag specified.
--web.disable Disable Block Viewer UI.
--web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
flag (mutually exclusive). Content of YAML
file with relabeling configuration that allows
selecting blocks to act on based on their
external labels. It follows thanos sharding
relabel-config syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path. This
option is analogous to --web.route-prefix of
Prometheus.
--web.external-prefix="" Static prefix for all HTML links and redirect
URLs in the bucket web UI interface.
Actual endpoints are still served on / or the
@ -473,9 +490,14 @@ Flags:
stripped prefix value in X-Forwarded-Prefix
header. This allows thanos UI to be served on a
sub-path.
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path. This
option is analogous to --web.route-prefix of
Prometheus.
--[no-]web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--bucket-web-label=BUCKET-WEB-LABEL
External block label to use as group title in
the bucket web UI
--[no-]disable-admin-operations
Disable UI/API admin operations like marking
blocks for deletion and no compaction.
```

View File

@ -77,6 +77,13 @@ config:
max_get_multi_batch_size: 0
dns_provider_update_interval: 0s
auto_discovery: false
set_async_circuit_breaker_config:
enabled: false
half_open_max_requests: 0
open_duration: 0s
min_requests: 0
consecutive_failures: 0
failure_percent: 0
expiration: 0s
```
@ -132,6 +139,13 @@ config:
master_name: ""
max_async_buffer_size: 10000
max_async_concurrency: 20
set_async_circuit_breaker_config:
enabled: false
half_open_max_requests: 10
open_duration: 5s
min_requests: 50
consecutive_failures: 5
failure_percent: 0.05
expiration: 24h0m0s
```
@ -143,6 +157,8 @@ Other cache configuration parameters, you can refer to [redis-index-cache](store
Query Frontend supports the `--query-frontend.log-queries-longer-than` flag to log queries running longer than a given duration.
The field `remote_user` can be read from an HTTP header, like `X-Grafana-User`, by setting `--query-frontend.slow-query-logs-user-header`.
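As a hedged illustration (the threshold and header name are assumptions, not taken from the original docs), the two flags could be combined like this:

```bash
# Sketch only: log queries slower than 5s and attribute them to the user
# carried in the X-Grafana-User header (values here are assumptions).
thanos query-frontend \
  --query-frontend.downstream-url="http://localhost:9090" \
  --query-frontend.log-queries-longer-than=5s \
  --query-frontend.slow-query-logs-user-header=X-Grafana-User
```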
## Naming
Naming is hard :) Please check [here](https://github.com/thanos-io/thanos/pull/2434#discussion_r408300683) to see why we chose `query-frontend` as the name.
@ -184,184 +200,196 @@ usage: thanos query-frontend [<flags>]
Query frontend command implements a service deployed in front of queriers to
improve query parallelization and caching.
Flags:
--cache-compression-type=""
Use compression in results cache.
Supported values are: 'snappy' and '' (disable
compression).
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
HTTP Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--labels.default-time-range=24h
The default metadata time range duration for
retrieving labels through Labels and Series API
when the range parameters are not specified.
--labels.max-query-parallelism=14
Maximum number of labels requests will be
scheduled in parallel by the Frontend.
--labels.max-retries-per-request=5
Maximum number of retries for a single
label/series API request; beyond this,
the downstream error is returned.
--labels.partial-response Enable partial response for labels requests
if no partial_response param is specified.
--no-labels.partial-response for disabling.
--labels.response-cache-config=<content>
Alternative to
'labels.response-cache-config-file' flag
(mutually exclusive). Content of YAML file that
contains response cache configuration.
--labels.response-cache-config-file=<file-path>
Path to YAML file that contains response cache
configuration.
--labels.response-cache-max-freshness=1m
Most recent allowed cacheable result for
labels requests, to prevent caching very recent
results that might still be in flux.
--labels.split-interval=24h
Split labels requests by an interval and
execute in parallel, it should be greater
than 0 when labels.response-cache-config is
configured.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--query-frontend.compress-responses
Compress HTTP responses.
--query-frontend.downstream-tripper-config=<content>
Alternative to
'query-frontend.downstream-tripper-config-file'
flag (mutually exclusive). Content of YAML file
that contains downstream tripper configuration.
If your downstream URL is localhost or
127.0.0.1 then it is highly recommended to
increase max_idle_conns_per_host to at least
100.
--query-frontend.downstream-tripper-config-file=<file-path>
Path to YAML file that contains downstream
tripper configuration. If your downstream URL
is localhost or 127.0.0.1 then it is highly
recommended to increase max_idle_conns_per_host
to at least 100.
--query-frontend.downstream-url="http://localhost:9090"
URL of downstream Prometheus Query compatible
API.
--query-frontend.enable-x-functions
Enable experimental x-
functions in query-frontend.
--no-query-frontend.enable-x-functions for
disabling.
--query-frontend.forward-header=<http-header-name> ...
List of headers forwarded by the query-frontend
to downstream queriers, default is empty
--query-frontend.log-queries-longer-than=0
Log queries that are slower than the specified
duration. Set to 0 to disable. Set to < 0 to
enable on all queries.
--query-frontend.org-id-header=<http-header-name> ...
Deprecation Warning - This flag
will be soon deprecated in favor of
query-frontend.tenant-header and both flags
cannot be used at the same time. Request header
names used to identify the source of slow
queries (repeated flag). The values of the
header will be added to the org id field in
the slow query log. If multiple headers match
the request, the first matching arg specified
will take precedence. If no headers match
'anonymous' will be used.
--query-frontend.vertical-shards=QUERY-FRONTEND.VERTICAL-SHARDS
Number of shards to use when
distributing shardable PromQL queries.
For more details, you can refer to
the Vertical query sharding proposal:
https://thanos.io/tip/proposals-accepted/202205-vertical-query-sharding.md
--query-range.align-range-with-step
Mutate incoming queries to align their
start and end with their step for better
cache-ability. Note: Grafana dashboards do that
by default.
--query-range.horizontal-shards=0
Split queries in this many requests
when query duration is below
query-range.max-split-interval.
--query-range.max-query-length=0
Limit the query time range (end - start time)
in the query-frontend, 0 disables it.
--query-range.max-query-parallelism=14
Maximum number of query range requests will be
scheduled in parallel by the Frontend.
--query-range.max-retries-per-request=5
Maximum number of retries for a single query
range request; beyond this, the downstream
error is returned.
--query-range.max-split-interval=0
Split query range below this interval in
query-range.horizontal-shards. Queries with a
range longer than this value will be split in
multiple requests of this length.
--query-range.min-split-interval=0
Split query range requests above this
interval in query-range.horizontal-shards
requests of equal range. Using
this parameter is not allowed with
query-range.split-interval. One should also set
query-range.split-min-horizontal-shards to a
value greater than 1 to enable splitting.
--query-range.partial-response
Enable partial response for query range
requests if no partial_response param is
specified. --no-query-range.partial-response
for disabling.
--query-range.request-downsampled
Make additional query for downsampled data in
case of empty or incomplete response to range
request.
--query-range.response-cache-config=<content>
Alternative to
'query-range.response-cache-config-file' flag
(mutually exclusive). Content of YAML file that
contains response cache configuration.
--query-range.response-cache-config-file=<file-path>
Path to YAML file that contains response cache
configuration.
--query-range.response-cache-max-freshness=1m
Most recent allowed cacheable result for query
range requests, to prevent caching very recent
results that might still be in flux.
--query-range.split-interval=24h
Split query range requests by an interval and
execute in parallel, it should be greater than
0 when query-range.response-cache-config is
configured.
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
-h, --[no-]help Show context-sensitive help (also try --help-long
and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--version Show application version.
--web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for HTTP
Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--[no-]web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to be
allowed by all.
--[no-]query-range.align-range-with-step
Mutate incoming queries to align their start and
end with their step for better cache-ability.
Note: Grafana dashboards do that by default.
--[no-]query-range.request-downsampled
Make additional query for downsampled data in
case of empty or incomplete response to range
request.
--query-range.split-interval=24h
Split query range requests by an interval and
execute in parallel, it should be greater than
0 when query-range.response-cache-config is
configured.
--query-range.min-split-interval=0
Split query range requests above this interval
in query-range.horizontal-shards requests of
equal range. Using this parameter is not allowed
with query-range.split-interval. One should also
set query-range.split-min-horizontal-shards to a
value greater than 1 to enable splitting.
--query-range.max-split-interval=0
Split query range below this interval in
query-range.horizontal-shards. Queries with a
range longer than this value will be split in
multiple requests of this length.
--query-range.horizontal-shards=0
Split queries in this many requests when query
duration is below query-range.max-split-interval.
--query-range.max-retries-per-request=5
Maximum number of retries for a single query
range request; beyond this, the downstream error
is returned.
--[no-]query-frontend.enable-x-functions
Enable experimental x-
functions in query-frontend.
--no-query-frontend.enable-x-functions for
disabling.
--enable-feature= ... Comma separated feature names to enable. Valid
options for now: promql-experimental-functions
(enables promql experimental functions in
query-frontend)
--query-range.max-query-length=0
Limit the query time range (end - start time) in
the query-frontend, 0 disables it.
--query-range.max-query-parallelism=14
Maximum number of query range requests will be
scheduled in parallel by the Frontend.
--query-range.response-cache-max-freshness=1m
Most recent allowed cacheable result for query
range requests, to prevent caching very recent
results that might still be in flux.
--[no-]query-range.partial-response
Enable partial response for query range requests
if no partial_response param is specified.
--no-query-range.partial-response for disabling.
--query-range.response-cache-config-file=<file-path>
Path to YAML file that contains response cache
configuration.
--query-range.response-cache-config=<content>
Alternative to
'query-range.response-cache-config-file' flag
(mutually exclusive). Content of YAML file that
contains response cache configuration.
--labels.split-interval=24h
Split labels requests by an interval and execute
in parallel, it should be greater than 0 when
labels.response-cache-config is configured.
--labels.max-retries-per-request=5
Maximum number of retries for a single
label/series API request; beyond this, the
downstream error is returned.
--labels.max-query-parallelism=14
Maximum number of labels requests will be
scheduled in parallel by the Frontend.
--labels.response-cache-max-freshness=1m
Most recent allowed cacheable result for labels
requests, to prevent caching very recent results
that might still be in flux.
--[no-]labels.partial-response
Enable partial response for labels requests
if no partial_response param is specified.
--no-labels.partial-response for disabling.
--labels.default-time-range=24h
The default metadata time range duration for
retrieving labels through Labels and Series API
when the range parameters are not specified.
--labels.response-cache-config-file=<file-path>
Path to YAML file that contains response cache
configuration.
--labels.response-cache-config=<content>
Alternative to
'labels.response-cache-config-file' flag
(mutually exclusive). Content of YAML file that
contains response cache configuration.
--cache-compression-type=""
Use compression in results cache. Supported
values are: 'snappy' and '' (disable compression).
--query-frontend.downstream-url="http://localhost:9090"
URL of downstream Prometheus Query compatible
API.
--query-frontend.downstream-tripper-config-file=<file-path>
Path to YAML file that contains downstream
tripper configuration. If your downstream URL
is localhost or 127.0.0.1 then it is highly
recommended to increase max_idle_conns_per_host
to at least 100.
--query-frontend.downstream-tripper-config=<content>
Alternative to
'query-frontend.downstream-tripper-config-file'
flag (mutually exclusive). Content of YAML file
that contains downstream tripper configuration.
If your downstream URL is localhost or 127.0.0.1
then it is highly recommended to increase
max_idle_conns_per_host to at least 100.
--[no-]query-frontend.compress-responses
Compress HTTP responses.
--query-frontend.log-queries-longer-than=0
Log queries that are slower than the specified
duration. Set to 0 to disable. Set to < 0 to
enable on all queries.
--[no-]query-frontend.force-query-stats
Enables query statistics for all queries and will
export statistics as logs and service headers.
--query-frontend.org-id-header=<http-header-name> ...
Deprecation Warning - This flag
will be soon deprecated in favor of
query-frontend.tenant-header and both flags
cannot be used at the same time. Request header
names used to identify the source of slow queries
(repeated flag). The values of the header will be
added to the org id field in the slow query log.
If multiple headers match the request, the first
matching arg specified will take precedence.
If no headers match 'anonymous' will be used.
--query-frontend.forward-header=<http-header-name> ...
List of headers forwarded by the query-frontend
to downstream queriers, default is empty
--query-frontend.vertical-shards=QUERY-FRONTEND.VERTICAL-SHARDS
Number of shards to use when
distributing shardable PromQL queries.
For more details, you can refer to
the Vertical query sharding proposal:
https://thanos.io/tip/proposals-accepted/202205-vertical-query-sharding.md
--query-frontend.slow-query-logs-user-header=<http-header-name>
Set the value of the field remote_user in the
slow query logs to the value of the given HTTP
header. Falls back to reading the user from the
basic auth header.
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
```

View File

@ -103,15 +103,32 @@ thanos query \
This logic can also be controlled via parameter on QueryAPI. More details below.
## Experimental PromQL Engine
### Deduplication Algorithms
By default, Thanos querier comes with standard Prometheus PromQL engine. However, when `--query.promql-engine=thanos` is specified, Thanos will use [experimental Thanos PromQL engine](http://github.com/thanos-community/promql-engine) which is a drop-in, efficient implementation of PromQL engine with query planner and optimizers.
Thanos Querier supports different algorithms for deduplicating overlapping series. You can choose the deduplication algorithm using the `--deduplication.func` flag. The available options are:
* `penalty` (default): This is the default deduplication algorithm used by Thanos. It fills gaps only after a certain penalty window. This helps avoid flapping between replicas due to minor differences or delays.
* `chain`: This algorithm performs 1:1 deduplication for samples. It merges all available data points from the replicas without applying any penalty. This is useful in deployments based on receivers, where each replica is populated by the same data. In such cases, using the penalty algorithm may cause gaps even when data is available in other replicas.
Note that deduplication of HA groups is not supported by the `chain` algorithm.
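A minimal sketch, assuming a Receiver-based deployment where the replica label is named `replica` (the label name is an assumption):

```bash
# Sketch only: replicas carry identical samples, so chain deduplication
# merges them 1:1 without a penalty window.
thanos query \
  --query.replica-label=replica \
  --deduplication.func=chain
```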
## Thanos PromQL Engine (experimental)
By default, the Thanos Querier comes with the standard Prometheus PromQL engine. However, when `--query.promql-engine=thanos` is specified, Thanos will use the [experimental Thanos PromQL engine](http://github.com/thanos-io/promql-engine), which is a drop-in, efficient implementation of the PromQL engine with a query planner and optimizers.
To learn more, see [the introduction talk](https://youtu.be/pjkWzDVxWk4?t=3609) from [the PromConEU 2022](https://promcon.io/2022-munich/talks/opening-pandoras-box-redesigning/).
This feature is still **experimental** given active development. All queries should be supported thanks to a built-in fallback to the old PromQL engine if something is not yet implemented.
For new engine bugs/issues, please use https://github.com/thanos-community/promql-engine GitHub issues.
For new engine bugs/issues, please use https://github.com/thanos-io/promql-engine GitHub issues.
### Distributed execution mode
When using Thanos PromQL Engine the distributed execution mode can be enabled using `--query.mode=distributed`. When this mode is enabled, the Querier will break down each query into independent fragments and delegate them to components which implement the Query API.
This mode is particularly useful in architectures where multiple independent Queriers are deployed in separate environments (different regions or different Kubernetes clusters) and are federated through a separate central Querier. A Querier running in the distributed mode will only talk to Queriers, or other components which implement the Query API. Endpoints which only act as Stores (e.g. Store Gateways or Rulers), and are directly connected to a distributed Querier, will not be included in the execution of a distributed query. This constraint should help with keeping the distributed query execution simple and efficient, but could be removed in the future if there are good use cases for it.
For further details on the design and use cases of this feature, see the [official design document](https://thanos.io/tip/proposals-done/202301-distributed-query-execution.md/).
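As a hedged sketch (the endpoint addresses are assumptions), a central Querier federating two remote Queriers with the Thanos engine in distributed mode might be started like this:

```bash
# Sketch only: central Querier delegating query fragments to two leaf
# Queriers (addresses are assumptions).
thanos query \
  --query.promql-engine=thanos \
  --query.mode=distributed \
  --endpoint=querier-eu.example.org:10901 \
  --endpoint=querier-us.example.org:10901
```

Note that the flag reference below lists `--endpoint` as deprecated in favor of the endpoint SD config flags, so newer setups may prefer `--endpoint.sd-config-file` for pointing at the leaf Queriers.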
## Query API Overview
@ -264,7 +281,7 @@ Example file SD file in YAML:
### Tenant Metrics
Tenant information is captured in relevant Thanos exported metrics in the Querier, Query Frontend and Store. In order to make use of this functionality, requests to the Query/Query Frontend component should include the tenant-id in the appropriate HTTP request header as configured with `--query.tenant-header`. The tenant information is passed through components (including Query Frontend), down to the Thanos Store, enabling per-tenant metrics in these components also. If no tenant header is set on requests to the query component, the default tenant as defined by `--query.tenant-default-id` will be used.
Tenant information is captured in relevant Thanos exported metrics in the Querier, Query Frontend and Store. In order to make use of this functionality, requests to the Query/Query Frontend component should include the tenant-id in the appropriate HTTP request header as configured with `--query.tenant-header`. The tenant information is passed through components (including Query Frontend), down to the Thanos Store, enabling per-tenant metrics in these components also. If no tenant header is set on requests to the query component, the default tenant as defined by `--query.default-tenant-id` will be used.
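For example (the host and tenant ID are assumptions), an instant query issued on behalf of a tenant could look like this, using the default `THANOS-TENANT` header:

```bash
# Sketch only: query as tenant "team-a" via the default tenant header
# (host and tenant ID are assumptions).
curl -H 'THANOS-TENANT: team-a' \
  'http://thanos-query.example.org:10902/api/v1/query?query=up'
```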
### Tenant Enforcement
@ -282,69 +299,29 @@ usage: thanos query [<flags>]
Query node exposing PromQL enabled Query API with data retrieved from multiple
store nodes.
Flags:
--alert.query-url=ALERT.QUERY-URL
The external Thanos Query URL that would be set
in all alerts 'Source' field.
--enable-feature= ... Comma separated experimental feature names
to enable.The current list of features is
query-pushdown.
--endpoint=<endpoint> ... Addresses of statically configured Thanos
API servers (repeatable). The scheme may be
prefixed with 'dns+' or 'dnssrv+' to detect
Thanos API servers through respective DNS
lookups.
--endpoint-group=<endpoint-group> ...
Experimental: DNS name of statically configured
Thanos API server groups (repeatable). Targets
resolved from the DNS name will be queried in
a round-robin, instead of a fanout manner.
This flag should be used when connecting a
Thanos Query to HA groups of Thanos components.
--endpoint-group-strict=<endpoint-group-strict> ...
Experimental: DNS name of statically configured
Thanos API server groups (repeatable) that are
always used, even if the health check fails.
--endpoint-strict=<staticendpoint> ...
Addresses of only statically configured Thanos
API servers that are always used, even if
the health check fails. Useful if you have a
caching layer on top.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-client-server-name=""
Server name to verify the hostname on
the returned gRPC certificates. See
https://tools.ietf.org/html/rfc4366#section-3.1
--grpc-client-tls-ca="" TLS CA Certificates to use to verify gRPC
servers
--grpc-client-tls-cert="" TLS Certificates to use to identify this client
to the server
--grpc-client-tls-key="" TLS Key for the client's certificate
--grpc-client-tls-secure Use TLS when talking to the gRPC server
--grpc-client-tls-skip-verify
Disable TLS certificate verification i.e self
signed, signed by fake CA
--grpc-compression=none Compression algorithm to use for gRPC requests
to other clients. Must be one of: snappy, none
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
-h, --help Show context-sensitive help (also try
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
@ -352,161 +329,50 @@ Flags:
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--query.active-query-path=""
Directory to log currently active queries in
the queries.active file.
--query.auto-downsampling Enable automatic adjustment (step / 5) to what
source of data should be used in store gateways
if no max_source_resolution param is specified.
--query.conn-metric.label=external_labels... ...
Optional selection of query connection metric
labels to be collected from endpoint set
--query.default-evaluation-interval=1m
Set default evaluation interval for sub
queries.
--query.default-step=1s Set default step for range queries. Default
step is only used when step is not set in UI.
In such cases, Thanos UI will use default
step to calculate resolution (resolution
= max(rangeSeconds / 250, defaultStep)).
This will not work from Grafana, but Grafana
has __step variable which can be used.
--query.default-tenant-id="default-tenant"
Default tenant ID to use if tenant header is
not present
--query.enable-x-functions
Whether to enable extended rate functions
(xrate, xincrease and xdelta). Only has effect
when used with Thanos engine.
--query.enforce-tenancy Enforce tenancy on Query APIs. Responses
are returned only if the label value of the
configured tenant-label-name and the value of
the tenant header matches.
--query.lookback-delta=QUERY.LOOKBACK-DELTA
The maximum lookback duration for retrieving
metrics during expression evaluations.
PromQL always evaluates the query for the
certain timestamp (query range timestamps are
deduced by step). Since scrape intervals might
be different, PromQL looks back for given
amount of time to get latest sample. If it
exceeds the maximum lookback delta it assumes
series is stale and returns none (a gap).
This is why lookback delta should be set to at
least 2 times of the slowest scrape interval.
If unset it will use the promql default of 5m.
--query.max-concurrent=20 Maximum number of queries processed
concurrently by query node.
--query.max-concurrent-select=4
Maximum number of select requests made
concurrently per a query.
--query.metadata.default-time-range=0s
The default metadata time range duration for
retrieving labels through Labels and Series API
when the range parameters are not specified.
The zero value means range covers the time
since the beginning.
--query.partial-response Enable partial response for queries if
no partial_response param is specified.
--no-query.partial-response for disabling.
--query.promql-engine=prometheus
Default PromQL engine to use.
--query.replica-label=QUERY.REPLICA-LABEL ...
Labels to treat as a replica indicator along
which data is deduplicated. Still you will
be able to query without deduplication using
'dedup=false' parameter. Data includes time
series, recording rules, and alerting rules.
--query.telemetry.request-duration-seconds-quantiles=0.1... ...
The quantiles for exporting metrics about the
request duration quantiles.
--query.telemetry.request-samples-quantiles=100... ...
The quantiles for exporting metrics about the
samples count quantiles.
--query.telemetry.request-series-seconds-quantiles=10... ...
The quantiles for exporting metrics about the
series count quantiles.
--query.tenant-certificate-field=
Use TLS client's certificate field to determine
tenant for write requests. Must be one of
organization, organizationalUnit or commonName.
This setting will cause the query.tenant-header
flag value to be ignored.
--query.tenant-header="THANOS-TENANT"
HTTP header to determine tenant.
--query.tenant-label-name="tenant_id"
Label name to use when enforcing tenancy (if
--query.enforce-tenancy is enabled).
--query.timeout=2m Maximum time to process query by query node.
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--selector-label=<name>="<value>" ...
Query selector labels that will be exposed in
info endpoint (repeated).
--store=<store> ... Deprecation Warning - This flag is deprecated
and replaced with `endpoint`. Addresses of
statically configured store API servers
(repeatable). The scheme may be prefixed with
'dns+' or 'dnssrv+' to detect store API servers
through respective DNS lookups.
--store-strict=<staticstore> ...
Deprecation Warning - This flag is deprecated
and replaced with `endpoint-strict`. Addresses
of only statically configured store API servers
that are always used, even if the health check
fails. Useful if you have a caching layer on
top.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request, The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.response-timeout=0ms
If a Store doesn't send any data in this
specified duration then a Store will be ignored
and partial data will be returned if it's
enabled. 0 disables timeout.
--store.sd-dns-interval=30s
Interval between DNS resolutions.
--store.sd-files=<path> ...
Path to files that contain addresses of store
API servers. The path can be a glob pattern
(repeatable).
--store.sd-interval=5m Refresh interval to re-read file SD files.
It is used as a resync fallback.
--store.unhealthy-timeout=5m
Timeout before an unhealthy store is cleaned
from the store UI page.
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--version Show application version.
--web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--[no-]grpc-client-tls-secure
Use TLS when talking to the gRPC server
--[no-]grpc-client-tls-skip-verify
Disable TLS certificate verification i.e self
signed, signed by fake CA
--grpc-client-tls-cert="" TLS Certificates to use to identify this client
to the server
--grpc-client-tls-key="" TLS Key for the client's certificate
--grpc-client-tls-ca="" TLS CA Certificates to use to verify gRPC
servers
--grpc-client-server-name=""
Server name to verify the hostname on
the returned gRPC certificates. See
https://tools.ietf.org/html/rfc4366#section-3.1
--grpc-compression=none Compression algorithm to use for gRPC requests
to other clients. Must be one of: snappy, none
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path.
Defaults to the value of --web.external-prefix.
This option is analogous to --web.route-prefix
of Prometheus.
--web.external-prefix="" Static prefix for all HTML links and
redirect URLs in the UI query web interface.
Actual endpoints are still served on / or the
@ -526,11 +392,217 @@ Flags:
stripped prefix value in X-Forwarded-Prefix
header. This allows thanos UI to be served on a
sub-path.
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path.
Defaults to the value of --web.external-prefix.
This option is analogous to --web.route-prefix
of Prometheus.
--[no-]web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--query.timeout=2m Maximum time to process query by query node.
--query.promql-engine=prometheus
Default PromQL engine to use.
--[no-]query.enable-x-functions
Whether to enable extended rate functions
(xrate, xincrease and xdelta). Only has effect
when used with Thanos engine.
--query.mode=local PromQL query mode. One of: local, distributed.
--query.max-concurrent=20 Maximum number of queries processed
concurrently by query node.
--query.lookback-delta=QUERY.LOOKBACK-DELTA
The maximum lookback duration for retrieving
metrics during expression evaluations.
PromQL always evaluates the query for the
certain timestamp (query range timestamps are
deduced by step). Since scrape intervals might
be different, PromQL looks back for given
amount of time to get latest sample. If it
exceeds the maximum lookback delta it assumes
series is stale and returns none (a gap).
This is why lookback delta should be set to at
least 2 times of the slowest scrape interval.
If unset it will use the promql default of 5m.
--query.max-concurrent-select=4
Maximum number of select requests made
concurrently per a query.
--query.conn-metric.label=external_labels... ...
Optional selection of query connection metric
labels to be collected from endpoint set
--deduplication.func=penalty
Experimental. Deduplication algorithm for
merging overlapping series. Possible values
are: "penalty", "chain". If no value is
specified, penalty based deduplication
algorithm will be used. When set to chain, the
default compact deduplication merger is used,
which performs 1:1 deduplication for samples.
At least one replica label has to be set via
--query.replica-label flag.
--query.replica-label=QUERY.REPLICA-LABEL ...
Labels to treat as a replica indicator along
which data is deduplicated. Still you will
be able to query without deduplication using
'dedup=false' parameter. Data includes time
series, recording rules, and alerting rules.
Flag may be specified multiple times as well as
a comma separated list of labels.
--query.partition-label=QUERY.PARTITION-LABEL ...
Labels that partition the leaf queriers. This
is used to scope down the labelsets of leaf
queriers when using the distributed query mode.
If set, these labels must form a partition
of the leaf queriers. Partition labels must
not intersect with replica labels. Every TSDB
of a leaf querier must have these labels.
This is useful when there are multiple external
labels that are irrelevant for the partition as
it allows the distributed engine to ignore them
for some optimizations. If this is empty then
all labels are used as partition labels.
--query.metadata.default-time-range=0s
The default metadata time range duration for
retrieving labels through Labels and Series API
when the range parameters are not specified.
The zero value means range covers the time
since the beginning.
--selector-label=<name>="<value>" ...
Query selector labels that will be exposed in
info endpoint (repeated).
--[no-]query.auto-downsampling
Enable automatic adjustment (step / 5) to what
source of data should be used in store gateways
if no max_source_resolution param is specified.
--[no-]query.partial-response
Enable partial response for queries if
no partial_response param is specified.
--no-query.partial-response for disabling.
--query.active-query-path=""
Directory to log currently active queries in
the queries.active file.
--enable-feature= ... Comma separated feature names to enable. Valid
options for now: promql-experimental-functions
(enables promql experimental functions in
query)
--query.default-evaluation-interval=1m
Set default evaluation interval for sub
queries.
--query.default-step=1s Set default step for range queries. Default
step is only used when step is not set in UI.
In such cases, Thanos UI will use default
step to calculate resolution (resolution
= max(rangeSeconds / 250, defaultStep)).
This will not work from Grafana, but Grafana
has __step variable which can be used.
--store.response-timeout=0ms
If a Store doesn't send any data in this
specified duration then a Store will be ignored
and partial data will be returned if it's
enabled. 0 disables timeout.
--selector.relabel-config-file=<file-path>
Path to YAML file with relabeling
configuration that allows selecting blocks
to query based on their external labels.
It follows the Thanos sharding relabel-config
syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--selector.relabel-config=<content>
Alternative to 'selector.relabel-config-file'
flag (mutually exclusive). Content of YAML
file with relabeling configuration that allows
selecting blocks to query based on their
external labels. It follows the Thanos sharding
relabel-config syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--alert.query-url=ALERT.QUERY-URL
The external Thanos Query URL that would be set
in all alerts 'Source' field.
--query.telemetry.request-duration-seconds-quantiles=0.1... ...
The quantiles for exporting metrics about the
request duration quantiles.
--query.telemetry.request-samples-quantiles=100... ...
The quantiles for exporting metrics about the
samples count quantiles.
--query.telemetry.request-series-seconds-quantiles=10... ...
The quantiles for exporting metrics about the
series count quantiles.
--query.tenant-header="THANOS-TENANT"
HTTP header to determine tenant.
--query.default-tenant-id="default-tenant"
Default tenant ID to use if tenant header is
not present
--query.tenant-certificate-field=
Use TLS client's certificate field to determine
tenant for write requests. Must be one of
organization, organizationalUnit or commonName.
This setting will cause the query.tenant-header
flag value to be ignored.
--[no-]query.enforce-tenancy
Enforce tenancy on Query APIs. Responses
are returned only if the label value of the
configured tenant-label-name and the value of
the tenant header matches.
--query.tenant-label-name="tenant_id"
Label name to use when enforcing tenancy (if
--query.enforce-tenancy is enabled).
--store.sd-dns-interval=30s
Interval between DNS resolutions.
--store.unhealthy-timeout=5m
Timeout before an unhealthy store is cleaned
from the store UI page.
--endpoint.sd-config-file=<file-path>
Path to Config File with endpoint definitions
--endpoint.sd-config=<content>
Alternative to 'endpoint.sd-config-file' flag
(mutually exclusive). Content of Config File
with endpoint definitions
--endpoint.sd-config-reload-interval=5m
Interval between endpoint config refreshes
--store.sd-files=<path> ...
(Deprecated) Path to files that contain
addresses of store API servers. The path can be
a glob pattern (repeatable).
--store.sd-interval=5m (Deprecated) Refresh interval to re-read file
SD files. It is used as a resync fallback.
--endpoint=<endpoint> ... (Deprecated): Addresses of statically
configured Thanos API servers (repeatable).
The scheme may be prefixed with 'dns+' or
'dnssrv+' to detect Thanos API servers through
respective DNS lookups.
--endpoint-group=<endpoint-group> ...
(Deprecated, Experimental): DNS name of
statically configured Thanos API server groups
(repeatable). Targets resolved from the DNS
name will be queried in a round-robin, instead
of a fanout manner. This flag should be used
when connecting a Thanos Query to HA groups of
Thanos components.
--endpoint-strict=<endpoint-strict> ...
(Deprecated): Addresses of only statically
configured Thanos API servers that are always
used, even if the health check fails. Useful if
you have a caching layer on top.
--endpoint-group-strict=<endpoint-group-strict> ...
(Deprecated, Experimental): DNS name of
statically configured Thanos API server groups
(repeatable) that are always used, even if the
health check fails.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request, The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
```

View File

@ -22,10 +22,69 @@ The Ketama algorithm is a consistent hashing scheme which enables stable scaling
If you are using the `hashmod` algorithm and wish to migrate to `ketama`, the simplest and safest way would be to set up a new pool of receivers with `ketama` hashrings and start remote-writing to them. Provided you are on the latest Thanos version, old receivers will flush their TSDBs after the configured retention period and will upload blocks to object storage. Once you have verified that is done, decommission the old receivers.
#### Shuffle sharding
Ketama also supports [shuffle sharding](https://aws.amazon.com/builders-library/workload-isolation-using-shuffle-sharding/). It allows you to provide a single-tenant experience in a multi-tenant system. With shuffle sharding, a tenant gets a subset of all nodes in a hashring. You can configure shuffle sharding for any Ketama hashring like so:
```json
[
{
"endpoints": [
{"address": "node-1:10901", "capnproto_address": "node-1:19391", "az": "foo"},
{"address": "node-2:10901", "capnproto_address": "node-2:19391", "az": "bar"},
{"address": "node-3:10901", "capnproto_address": "node-3:19391", "az": "qux"},
{"address": "node-4:10901", "capnproto_address": "node-4:19391", "az": "foo"},
{"address": "node-5:10901", "capnproto_address": "node-5:19391", "az": "bar"},
{"address": "node-6:10901", "capnproto_address": "node-6:19391", "az": "qux"}
],
"algorithm": "ketama",
"shuffle_sharding_config": {
"shard_size": 2,
"cache_size": 100,
"overrides": [
{
"shard_size": 3,
"tenants": ["prefix-tenant-*"],
"tenant_matcher_type": "glob"
}
]
}
}
]
```
This will enable shuffle sharding with the default shard size of 2 and override it to 3 for every tenant that starts with `prefix-tenant-`.
`cache_size` sets the size of the in-memory LRU cache of the computed subrings. It is not possible to cache everything because an attacker could possibly spam requests with random tenants and those subrings would stay in-memory forever.
With this config, `shard_size/number_of_azs` nodes are chosen from each availability zone for each tenant. So, each tenant will get a unique and consistent set of 3 nodes.
You can use `zone_awareness_disabled` to disable zone awareness. This is useful in the case where you have many separate AZs and it doesn't matter which one to choose. The shards will ignore AZs but the Ketama algorithm will later prefer spreading load through as many AZs as possible. That's why with zone awareness disabled it is recommended to set the shard size to be `max(nodes_in_any_az, replication_factor)`.
Receive currently only supports stateless shuffle sharding, so it does not store shard assignments or check whether there have been any overlaps between shards.
### Hashmod (discouraged)
This algorithm uses a `hashmod` function over all labels to decide which receiver is responsible for a given timeseries. This is the default algorithm for historical reasons. However, its use for new Receive installations is discouraged, since adding new Receiver nodes leads to series churn and memory usage spikes.
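For intuition only, a rough sketch of the hashmod idea follows. The hash function (FNV here) and label serialization are assumptions for illustration, not the exact ones Thanos uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// pickReceiver hashes the full label set of a series and takes the result
// modulo the number of endpoints. Adding or removing an endpoint changes
// the modulus, which is why series get reshuffled (series churn).
func pickReceiver(labels map[string]string, endpoints []string) string {
	// Serialize labels deterministically before hashing.
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte(labels[k]))
	}
	return endpoints[h.Sum64()%uint64(len(endpoints))]
}

func main() {
	series := map[string]string{"__name__": "up", "job": "node", "instance": "a:9100"}
	fmt.Println(pickReceiver(series, []string{"node-1:10901", "node-2:10901", "node-3:10901"}))
}
```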
### Replication protocols
By default, Receivers replicate data using Protobuf over gRPC. Deserializing protobuf-encoded messages can be resource-intensive and cause significant GC pressure. Alternatively, you can use [Cap'N Proto](https://capnproto.org/) for replication encoding and as the RPC framework.
In order to enable this mode, use the `--receive.replication-protocol=capnproto` option on the receiver. Thanos will try to infer the Cap'N Proto address of each peer in the hashring from the existing gRPC address. You can also set the Cap'N Proto address explicitly, as follows:
```json
[
{
"endpoints": [
{"address": "node-1:10901", "capnproto_address": "node-1:19391"},
{"address": "node-2:10901", "capnproto_address": "node-2:19391"},
{"address": "node-3:10901", "capnproto_address": "node-3:19391"}
]
}
]
```
### Hashring management and autoscaling in Kubernetes
The [Thanos Receive Controller](https://github.com/observatorium/thanos-receive-controller) project aims to automate hashring management when running Thanos in Kubernetes. In combination with the Ketama hashring algorithm, this controller can also be used to keep hashrings up to date when Receivers are scaled automatically using an HPA or [Keda](https://keda.sh/).
@ -76,6 +135,26 @@ type: GCS
config:
bucket: ""
service_account: ""
use_grpc: false
grpc_conn_pool_size: 0
http_config:
idle_conn_timeout: 1m30s
response_header_timeout: 2m
insecure_skip_verify: false
tls_handshake_timeout: 10s
expect_continue_timeout: 1s
max_idle_conns: 100
max_idle_conns_per_host: 100
max_conns_per_host: 0
tls_config:
ca_file: ""
cert_file: ""
key_file: ""
server_name: ""
insecure_skip_verify: false
disable_compression: false
chunk_size_bytes: 0
max_retries: 0
prefix: ""
```
@ -95,6 +174,39 @@ The example content of `hashring.json`:
With such a configuration, any Receive listens for remote write requests on `<ip>:10908/api/v1/receive` and will forward them to the correct node in the hashring if needed for tenancy and replication.
It is possible to only match certain `tenant`s inside of a hashring file. For example:
```json
[
{
"tenants": ["foobar"],
"endpoints": [
"127.0.0.1:1234",
"127.0.0.1:12345",
"127.0.0.1:1235"
]
}
]
```
The specified endpoints will be used if the tenant is set to `foobar`. It is possible to use glob matching through the parameter `tenant_matcher_type`, which can have the value `glob`. In this case, the strings inside the array are taken as glob patterns and matched against the `tenant` inside of a remote-write request. For instance:
```json
[
{
"tenants": ["foo*"],
"tenant_matcher_type": "glob",
"endpoints": [
"127.0.0.1:1234",
"127.0.0.1:12345",
"127.0.0.1:1235"
]
}
]
```
This will still match the tenant `foobar` and any other tenant which begins with the letters `foo`.
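To illustrate the two matcher types, here is a hedged sketch that uses Go's `path.Match` as a stand-in for glob matching; the actual glob library and semantics Thanos uses may differ.

```go
package main

import (
	"fmt"
	"path"
)

// tenantMatches mimics the two matcher types described above: exact string
// comparison by default, and glob patterns when tenant_matcher_type is "glob".
func tenantMatches(tenant string, patterns []string, matcherType string) bool {
	for _, p := range patterns {
		if matcherType == "glob" {
			// path.Match supports '*' wildcards; assumed here for illustration.
			if ok, _ := path.Match(p, tenant); ok {
				return true
			}
		} else if p == tenant {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(tenantMatches("foobar", []string{"foo*"}, "glob")) // true
	fmt.Println(tenantMatches("foobar", []string{"foobar"}, ""))   // true
	fmt.Println(tenantMatches("barfoo", []string{"foo*"}, "glob")) // false
}
```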
### AZ-aware Ketama hashring (experimental)
To ensure an even spread of replication over nodes in different availability zones, you can include an `az` definition in your hashring config. If, for example, we have a 6-node cluster spread over 3 availability zones, A, B and C, we could use the following example `hashring.json`:
@ -148,7 +260,7 @@ To configure the gates and limits you can use one of the two options:
- `--receive.limits-config-file=<file-path>`: where `<file-path>` is the path to the YAML file. Any modification to the indicated file will trigger a configuration reload. If the updated configuration is invalid, an error will be logged and it won't replace the previous valid configuration.
- `--receive.limits-config=<content>`: where `<content>` is the content of the YAML file.
By default all the limits and gates are **disabled**.
By default all the limits and gates are **disabled**. These options should be added to the routing-receivers when using the [Routing Receive and Ingesting Receive](https://thanos.io/tip/proposals-accepted/202012-receive-split.md/).
### Understanding the configuration file
@ -248,6 +360,34 @@ NOTE:
- Thanos Receive performs best-effort limiting. In case meta-monitoring is down/unreachable, Thanos Receive will not impose limits and will only log errors about meta-monitoring being unreachable. The same applies when one receiver cannot be scraped.
- Support for different limit configuration for different tenants is planned for the future.
## Asynchronous workers
Instead of spawning a new goroutine each time the Receiver forwards a request to another node, it spawns a fixed number of goroutines (workers) that perform the work. This avoids spawning potentially tens or even hundreds of thousands of goroutines if someone starts sending a lot of small requests.
The number of workers is controlled by `--receive.forward.async-workers=`.
Check the metric `thanos_receive_forward_delay_seconds` to see whether you need to increase the number of forwarding workers.
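The forwarding workers follow the common Go worker-pool pattern: a fixed set of goroutines consuming requests from a shared channel. Below is a minimal sketch of that pattern, not the actual Thanos implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// forwardRequest stands in for forwarding a remote-write request to a peer.
func forwardRequest(req int) { fmt.Println("forwarded request", req) }

func main() {
	const asyncWorkers = 5 // analogous to --receive.forward.async-workers=5
	work := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < asyncWorkers; i++ {
		wg.Add(1)
		go func() { // fixed pool instead of one goroutine per request
			defer wg.Done()
			for req := range work {
				forwardRequest(req)
			}
		}()
	}

	for req := 0; req < 20; req++ {
		work <- req
	}
	close(work)
	wg.Wait()
}
```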
## Quorum
The following formula is used for calculating quorum:
```go mdox-exec="sed -n '1046,1056p' pkg/receive/handler.go"
// writeQuorum returns minimum number of replicas that has to confirm write success before claiming replication success.
func (h *Handler) writeQuorum() int {
// NOTE(GiedriusS): this is here because otherwise RF=2 doesn't make sense as all writes
// would need to succeed all the time. Another way to think about it is when migrating
// from a Sidecar based setup with 2 Prometheus nodes to a Receiver setup, we want to
// keep the same guarantees.
if h.options.ReplicationFactor == 2 {
return 1
}
return int((h.options.ReplicationFactor / 2) + 1)
}
```
So, if the replication factor is 2, then at least one write must succeed. With RF=3, two writes must succeed, and so on.
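For reference, plugging a few replication factors into the same formula (a self-contained sketch that mirrors the logic shown above):

```go
package main

import "fmt"

// writeQuorum mirrors the formula above: RF=2 is special-cased to 1,
// otherwise it is floor(RF/2) + 1.
func writeQuorum(replicationFactor uint64) int {
	if replicationFactor == 2 {
		return 1
	}
	return int((replicationFactor / 2) + 1)
}

func main() {
	for rf := uint64(1); rf <= 5; rf++ {
		fmt.Printf("RF=%d -> quorum=%d\n", rf, writeQuorum(rf))
	}
	// RF=1 -> 1, RF=2 -> 1, RF=3 -> 2, RF=4 -> 3, RF=5 -> 3
}
```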
## Flags
```$ mdox-exec="thanos receive --help"
@ -255,33 +395,29 @@ usage: thanos receive [<flags>]
Accept Prometheus remote write API requests and write to local tsdb.
Flags:
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
-h, --help Show context-sensitive help (also try
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
@ -289,29 +425,100 @@ Flags:
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request, The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--remote-write.address="0.0.0.0:19291"
Address to listen on for remote write requests.
--remote-write.server-tls-cert=""
TLS Certificate for HTTP server, leave blank to
disable TLS.
--remote-write.server-tls-key=""
TLS Key for the HTTP server, leave blank to
disable TLS.
--remote-write.server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--remote-write.server-tls-min-version="1.3"
TLS version for the gRPC server, leave blank
to default to TLS 1.3, allow values: ["1.0",
"1.1", "1.2", "1.3"]
--remote-write.client-tls-cert=""
TLS Certificates to use to identify this client
to the server.
--remote-write.client-tls-key=""
TLS Key for the client's certificate.
--[no-]remote-write.client-tls-secure
Use TLS when talking to the other receivers.
--[no-]remote-write.client-tls-skip-verify
Disable TLS certificate verification when
talking to the other receivers i.e self signed,
signed by fake CA.
--remote-write.client-tls-ca=""
TLS CA Certificates to use to verify servers.
--remote-write.client-server-name=""
Server name to verify the hostname
on the returned TLS certificates. See
https://tools.ietf.org/html/rfc4366#section-3.1
--tsdb.path="./data" Data directory of TSDB.
--label=key="value" ... External labels to announce. This flag will be
removed in the future when handling multiple
tsdb instances is added.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--receive.default-tenant-id="default-tenant"
Default tenant ID to use when none is provided
via a header.
--receive.grpc-compression=snappy
Compression algorithm to use for gRPC requests
to other receivers. Must be one of: snappy,
none
--tsdb.retention=15d How long to retain raw samples on local
storage. 0d - disables the retention
policy (i.e. infinite retention).
For more details on how retention is
enforced for individual tenants, please
refer to the Tenant lifecycle management
section in the Receive documentation:
https://thanos.io/tip/components/receive.md/#tenant-lifecycle-management
--receive.hashrings-file=<path>
Path to file that contains the hashring
configuration. A watcher is initialized
to watch changes and update the hashring
dynamically.
--receive.hashrings=<content>
Alternative to 'receive.hashrings-file' flag
(lower priority). Content of file that contains
@ -321,11 +528,6 @@ Flags:
the hashrings. Must be one of hashmod, ketama.
Will be overwritten by the tenant-specific
algorithm in the hashring config.
--receive.hashrings-file=<path>
Path to file that contains the hashring
configuration. A watcher is initialized
to watch changes and update the hashring
dynamically.
--receive.hashrings-file-refresh-interval=5m
Refresh interval to re-read the hashring
configuration file. (used as a fallback)
@ -335,18 +537,8 @@ Flags:
configuration. If it's empty AND hashring
configuration was provided, it means that
receive will run in RoutingOnly mode.
--receive.relabel-config=<content>
Alternative to 'receive.relabel-config-file'
flag (mutually exclusive). Content of YAML file
that contains relabeling configuration.
--receive.relabel-config-file=<file-path>
Path to YAML file that contains relabeling
configuration.
--receive.replica-header="THANOS-REPLICA"
HTTP header specifying the replica number of a
write request.
--receive.replication-factor=1
How many times to replicate incoming write
--receive.tenant-header="THANOS-TENANT"
HTTP header to determine tenant for write
requests.
--receive.tenant-certificate-field=
Use TLS client's certificate field to
@ -354,69 +546,86 @@ Flags:
Must be one of organization, organizationalUnit
or commonName. This setting will cause the
receive.tenant-header flag value to be ignored.
--receive.tenant-header="THANOS-TENANT"
HTTP header to determine tenant for write
requests.
--receive.default-tenant-id="default-tenant"
Default tenant ID to use when none is provided
via a header.
--receive.split-tenant-label-name=""
Label name through which the request will
be split into multiple tenants. This takes
precedence over the HTTP header.
--receive.tenant-label-name="tenant_id"
Label name through which the tenant will be
announced.
--remote-write.address="0.0.0.0:19291"
Address to listen on for remote write requests.
--remote-write.client-server-name=""
Server name to verify the hostname
on the returned TLS certificates. See
https://tools.ietf.org/html/rfc4366#section-3.1
--remote-write.client-tls-ca=""
TLS CA Certificates to use to verify servers.
--remote-write.client-tls-cert=""
TLS Certificates to use to identify this client
to the server.
--remote-write.client-tls-key=""
TLS Key for the client's certificate.
--remote-write.server-tls-cert=""
TLS Certificate for HTTP server, leave blank to
disable TLS.
--remote-write.server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--remote-write.server-tls-key=""
TLS Key for the HTTP server, leave blank to
disable TLS.
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request, The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tsdb.allow-overlapping-blocks
--receive.replica-header="THANOS-REPLICA"
HTTP header specifying the replica number of a
write request.
--receive.forward.async-workers=5
Number of concurrent workers processing
forwarding of remote-write requests.
--receive.grpc-compression=snappy
Compression algorithm to use for gRPC requests
to other receivers. Must be one of: snappy,
none
--receive.replication-factor=1
How many times to replicate incoming write
requests.
--receive.replication-protocol=protobuf
The protocol to use for replicating
remote-write requests. One of protobuf,
capnproto
--receive.capnproto-address="0.0.0.0:19391"
Address for the Cap'n Proto server.
--receive.grpc-service-config=<content>
gRPC service configuration file
or content in JSON format. See
https://github.com/grpc/grpc/blob/master/doc/service_config.md
--receive.relabel-config-file=<file-path>
Path to YAML file that contains relabeling
configuration.
--receive.relabel-config=<content>
Alternative to 'receive.relabel-config-file'
flag (mutually exclusive). Content of YAML file
that contains relabeling configuration.
--tsdb.too-far-in-future.time-window=0s
Configures the allowed time window for
ingesting samples too far in the future.
Disabled (0s) by default. Please note enable
this flag will reject samples in the future of
receive local NTP time + configured duration
due to clock skew in remote write clients.
--tsdb.out-of-order.time-window=0s
[EXPERIMENTAL] Configures the allowed
time window for ingestion of out-of-order
samples. Disabled (0s) by defaultPlease
note if you enable this option and you
use compactor, make sure you have the
--compact.enable-vertical-compaction flag
enabled, otherwise you might risk compactor
halt.
--tsdb.out-of-order.cap-max=0
[EXPERIMENTAL] Configures the maximum capacity
for out-of-order chunks (in samples). If set to
<=0, default value 32 is assumed.
--[no-]tsdb.allow-overlapping-blocks
Allow overlapping blocks, which in turn enables
vertical compaction and vertical query merge.
Does not do anything, enabled all the time.
--tsdb.max-retention-bytes=0
Maximum number of bytes that can be stored for
blocks. A unit is required, supported units: B,
KB, MB, GB, TB, PB, EB. Ex: "512MB". Based on
powers-of-2, so 1KB is 1024B.
--[no-]tsdb.wal-compression
Compress the tsdb WAL.
--[no-]tsdb.no-lockfile Do not create lockfile in TSDB data directory.
In any case, the lockfiles will be deleted on
next startup.
--tsdb.head.expanded-postings-cache-size=0
[EXPERIMENTAL] If non-zero, enables expanded
postings cache for the head block.
--tsdb.block.expanded-postings-cache-size=0
[EXPERIMENTAL] If non-zero, enables expanded
postings cache for compacted blocks.
--tsdb.max-exemplars=0 Enables support for ingesting exemplars and
sets the maximum number of exemplars that will
be stored per tenant. In case the exemplar
@ -425,32 +634,41 @@ Flags:
ingesting a new exemplar will evict the oldest
exemplar from storage. 0 (or less) value of
this flag disables exemplars storage.
--tsdb.max-retention-bytes=0
Maximum number of bytes that can be stored for
blocks. A unit is required, supported units: B,
KB, MB, GB, TB, PB, EB. Ex: "512MB". Based on
powers-of-2, so 1KB is 1024B.
--tsdb.no-lockfile Do not create lockfile in TSDB data directory.
In any case, the lockfiles will be deleted on
next startup.
--tsdb.path="./data" Data directory of TSDB.
--tsdb.retention=15d How long to retain raw samples on local
storage. 0d - disables the retention
policy (i.e. infinite retention).
For more details on how retention is
enforced for individual tenants, please
refer to the Tenant lifecycle management
section in the Receive documentation:
https://thanos.io/tip/components/receive.md/#tenant-lifecycle-management
--tsdb.too-far-in-future.time-window=0s
[EXPERIMENTAL] Configures the allowed time
window for ingesting samples too far in the
future. Disabled (0s) by defaultPlease note
enable this flag will reject samples in the
future of receive local NTP time + configured
duration due to clock skew in remote write
clients.
--tsdb.wal-compression Compress the tsdb WAL.
--version Show application version.
--[no-]tsdb.enable-native-histograms
[EXPERIMENTAL] Enables the ingestion of native
histograms.
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
--matcher-cache-size=0 Max number of cached matchers items. Using 0
disables caching.
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--[no-]receive.otlp-enable-target-info
Enables target information in OTLP metrics
ingested by Receive. If enabled, it converts
the resource to the target info metric
--receive.otlp-promote-resource-attributes= ...
(Repeatable) Resource attributes to include in
OTLP metrics ingested by Receive.
--enable-feature= ... Comma separated experimental feature names
to enable. The current list of features is
metric-names-filter.
--receive.lazy-retrieval-max-buffered-responses=20
The lazy retrieval strategy can buffer up to
this number of responses. This is to limit the
memory usage. This flag takes effect only when
the lazy retrieval strategy is enabled.
```


@ -18,6 +18,7 @@ The data of each Rule node can be labeled to satisfy the clusters labeling schem
thanos rule \
--data-dir "/path/to/data" \
--eval-interval "30s" \
--rule-query-offset "10s" \
--rule-file "/path/to/rules/*.rules.yaml" \
--alert.query-url "http://0.0.0.0:9090" \ # This tells what query URL to link to in UI.
--alertmanagers.url "http://alert.thanos.io" \
@ -64,6 +65,9 @@ name: <string>
# How often rules in the group are evaluated.
[ interval: <duration> | default = global.evaluation_interval ]
# Offset the rule evaluation timestamp of this particular group by the specified duration into the past.
[ query_offset: <duration> | default = --rule-query-offset flag ]
rules:
[ - <rule> ... ]
```
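To illustrate what the offset means in practice, here is a hedged sketch: the group's rules are evaluated as of a timestamp shifted into the past by the configured offset (illustrative only, not the Ruler's actual code).

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Assume --rule-query-offset=10s and no per-group query_offset override.
	queryOffset := 10 * time.Second

	now := time.Now()
	evalTS := now.Add(-queryOffset) // queries are evaluated as of this timestamp
	fmt.Println("evaluating rules at", evalTS.Format(time.RFC3339))
}
```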
@ -265,96 +269,29 @@ usage: thanos rule [<flags>]
Ruler evaluating Prometheus rules against given Query nodes, exposing Store API
and storing old blocks in bucket.
Flags:
--alert.label-drop=ALERT.LABEL-DROP ...
Labels by name to drop before sending
to alertmanager. This allows alert to be
deduplicated on replica label (repeated).
Similar Prometheus alert relabelling
--alert.query-template="/graph?g0.expr={{.Expr}}&g0.tab=1"
Template to use in alerts source field.
Need only include {{.Expr}} parameter
--alert.query-url=ALERT.QUERY-URL
The external Thanos Query URL that would be set
in all alerts 'Source' field
--alert.relabel-config=<content>
Alternative to 'alert.relabel-config-file' flag
(mutually exclusive). Content of YAML file that
contains alert relabelling configuration.
--alert.relabel-config-file=<file-path>
Path to YAML file that contains alert
relabelling configuration.
--alertmanagers.config=<content>
Alternative to 'alertmanagers.config-file'
flag (mutually exclusive). Content
of YAML file that contains alerting
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence
over the '--alertmanagers.url' and
'--alertmanagers.send-timeout' flags.
--alertmanagers.config-file=<file-path>
Path to YAML file that contains alerting
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence
over the '--alertmanagers.url' and
'--alertmanagers.send-timeout' flags.
--alertmanagers.sd-dns-interval=30s
Interval between DNS resolutions of
Alertmanager hosts.
--alertmanagers.send-timeout=10s
Timeout for sending alerts to Alertmanager
--alertmanagers.url=ALERTMANAGERS.URL ...
Alertmanager replica URLs to push firing
alerts. Ruler claims success if push to
at least one alertmanager from discovered
succeeds. The scheme should not be empty
e.g `http` might be used. The scheme may be
prefixed with 'dns+' or 'dnssrv+' to detect
Alertmanager IPs through respective DNS
lookups. The port defaults to 9093 or the
SRV record's value. The URL path is used as a
prefix for the regular Alertmanager API path.
--data-dir="data/" data directory
--eval-interval=1m The default evaluation interval to use.
--for-grace-period=10m Minimum duration between alert and restored
"for" state. This is maintained only for alerts
with configured "for" time greater than grace
period.
--for-outage-tolerance=1h Max time to tolerate prometheus outage for
restoring "for" state of alert.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--grpc-query-endpoint=<endpoint> ...
Addresses of Thanos gRPC query API servers
(repeatable). The scheme may be prefixed
with 'dns+' or 'dnssrv+' to detect Thanos API
servers through respective DNS lookups.
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
-h, --help Show context-sensitive help (also try
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
@ -362,136 +299,33 @@ Flags:
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--label=<name>="<value>" ...
Labels to be applied to all generated metrics
(repeated). Similar to external labels for
Prometheus, used to identify ruler and its
blocks as unique source.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--query=<query> ... Addresses of statically configured query
API servers (repeatable). The scheme may be
prefixed with 'dns+' or 'dnssrv+' to detect
query API servers through respective DNS
lookups.
--query.config=<content> Alternative to 'query.config-file' flag
(mutually exclusive). Content of YAML
file that contains query API servers
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence over the
'--query' and '--query.sd-files' flags.
--query.config-file=<file-path>
Path to YAML file that contains query API
servers configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence over the
'--query' and '--query.sd-files' flags.
--query.default-step=1s Default range query step to use. This is
only used in stateless Ruler and alert state
restoration.
--query.http-method=POST HTTP method to use when sending queries.
Possible options: [GET, POST]
--query.sd-dns-interval=30s
Interval between DNS resolutions.
--query.sd-files=<path> ...
Path to file that contains addresses of query
API servers. The path can be a glob pattern
(repeatable).
--query.sd-interval=5m Refresh interval to re-read file SD files.
(used as a fallback)
--remote-write.config=<content>
Alternative to 'remote-write.config-file'
flag (mutually exclusive). Content
of YAML config for the remote-write
configurations, that specify servers
where samples should be sent to (see
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
This automatically enables stateless mode
for ruler and no series will be stored in the
ruler's TSDB. If an empty config (or file) is
provided, the flag is ignored and ruler is run
with its own TSDB.
--remote-write.config-file=<file-path>
Path to YAML config for the remote-write
configurations, that specify servers
where samples should be sent to (see
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
This automatically enables stateless mode
for ruler and no series will be stored in the
ruler's TSDB. If an empty config (or file) is
provided, the flag is ignored and ruler is run
with its own TSDB.
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--resend-delay=1m Minimum amount of time to wait before resending
an alert to Alertmanager.
--restore-ignored-label=RESTORE-IGNORED-LABEL ...
Label names to be ignored when restoring alerts
from the remote storage. This is only used in
stateless mode.
--rule-file=rules/ ... Rule files that should be used by rule
manager. Can be in glob format (repeated).
Note that rules are not automatically detected,
use SIGHUP or do HTTP POST /-/reload to re-read
them.
--shipper.meta-file-name="thanos.shipper.json"
the file to store shipper metadata in
--shipper.upload-compacted
If true shipper will try to upload compacted
blocks as well. Useful for migration purposes.
Works only if compaction is disabled on
Prometheus. Do it once and then disable the
flag when done.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request, The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tsdb.block-duration=2h Block duration for TSDB block.
--tsdb.no-lockfile Do not create lockfile in TSDB data directory.
In any case, the lockfiles will be deleted on
next startup.
--tsdb.retention=48h Block retention time on local disk.
--tsdb.wal-compression Compress the tsdb WAL.
--version Show application version.
--web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path. This
option is analogous to --web.route-prefix of
Prometheus.
--web.external-prefix="" Static prefix for all HTML links and redirect
URLs in the bucket web UI interface.
Actual endpoints are still served on / or the
@ -511,10 +345,209 @@ Flags:
stripped prefix value in X-Forwarded-Prefix
header. This allows thanos UI to be served on a
sub-path.
--web.route-prefix="" Prefix for API and UI endpoints. This allows
thanos UI to be served on a sub-path. This
option is analogous to --web.route-prefix of
Prometheus.
--[no-]web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--[no-]shipper.upload-compacted
If true shipper will try to upload compacted
blocks as well. Useful for migration purposes.
Works only if compaction is disabled on
Prometheus. Do it once and then disable the
flag when done.
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
--shipper.meta-file-name="thanos.shipper.json"
the file to store shipper metadata in
--query=<query> ... Addresses of statically configured query
API servers (repeatable). The scheme may be
prefixed with 'dns+' or 'dnssrv+' to detect
query API servers through respective DNS
lookups.
--query.config-file=<file-path>
Path to YAML file that contains query API
servers configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence over the
'--query' and '--query.sd-files' flags.
--query.config=<content> Alternative to 'query.config-file' flag
(mutually exclusive). Content of YAML
file that contains query API servers
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence over the
'--query' and '--query.sd-files' flags.
--query.sd-files=<path> ...
Path to file that contains addresses of query
API servers. The path can be a glob pattern
(repeatable).
--query.sd-interval=5m Refresh interval to re-read file SD files.
(used as a fallback)
--query.sd-dns-interval=30s
Interval between DNS resolutions.
--query.http-method=POST HTTP method to use when sending queries.
Possible options: [GET, POST]
--query.default-step=1s Default range query step to use. This is
only used in stateless Ruler and alert state
restoration.
--alertmanagers.config-file=<file-path>
Path to YAML file that contains alerting
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence
over the '--alertmanagers.url' and
'--alertmanagers.send-timeout' flags.
--alertmanagers.config=<content>
Alternative to 'alertmanagers.config-file'
flag (mutually exclusive). Content
of YAML file that contains alerting
configuration. See format details:
https://thanos.io/tip/components/rule.md/#configuration.
If defined, it takes precedence
over the '--alertmanagers.url' and
'--alertmanagers.send-timeout' flags.
--alertmanagers.url=ALERTMANAGERS.URL ...
Alertmanager replica URLs to push firing
alerts. Ruler claims success if push to
at least one alertmanager from discovered
succeeds. The scheme should not be empty
e.g `http` might be used. The scheme may be
prefixed with 'dns+' or 'dnssrv+' to detect
Alertmanager IPs through respective DNS
lookups. The port defaults to 9093 or the
SRV record's value. The URL path is used as a
prefix for the regular Alertmanager API path.
--alertmanagers.send-timeout=10s
Timeout for sending alerts to Alertmanager
--alertmanagers.sd-dns-interval=30s
Interval between DNS resolutions of
Alertmanager hosts.
--alert.query-url=ALERT.QUERY-URL
The external Thanos Query URL that would be set
in all alerts 'Source' field
--alert.label-drop=ALERT.LABEL-DROP ...
Labels by name to drop before sending
to alertmanager. This allows alert to be
deduplicated on replica label (repeated).
Similar Prometheus alert relabelling
--alert.relabel-config-file=<file-path>
Path to YAML file that contains alert
relabelling configuration.
--alert.relabel-config=<content>
Alternative to 'alert.relabel-config-file' flag
(mutually exclusive). Content of YAML file that
contains alert relabelling configuration.
--alert.query-template="/graph?g0.expr={{.Expr}}&g0.tab=1"
Template to use in alerts source field.
Need only include {{.Expr}} parameter
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request, The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--label=<name>="<value>" ...
Labels to be applied to all generated metrics
(repeated). Similar to external labels for
Prometheus, used to identify ruler and its
blocks as unique source.
--tsdb.block-duration=2h Block duration for TSDB block.
--tsdb.retention=48h Block retention time on local disk.
--[no-]tsdb.no-lockfile Do not create lockfile in TSDB data directory.
In any case, the lockfiles will be deleted on
next startup.
--[no-]tsdb.wal-compression
Compress the tsdb WAL.
--data-dir="data/" data directory
--rule-file=rules/ ... Rule files that should be used by rule
manager. Can be in glob format (repeated).
Note that rules are not automatically detected,
use SIGHUP or do HTTP POST /-/reload to re-read
them.
--resend-delay=1m Minimum amount of time to wait before resending
an alert to Alertmanager.
--eval-interval=1m The default evaluation interval to use.
--rule-query-offset=0s The default rule group query_offset duration to
use.
--for-outage-tolerance=1h Max time to tolerate prometheus outage for
restoring "for" state of alert.
--for-grace-period=10m Minimum duration between alert and restored
"for" state. This is maintained only for alerts
with configured "for" time greater than grace
period.
--restore-ignored-label=RESTORE-IGNORED-LABEL ...
Label names to be ignored when restoring alerts
from the remote storage. This is only used in
stateless mode.
--rule-concurrent-evaluation=1
How many rules can be evaluated concurrently.
Default is 1.
--grpc-query-endpoint=<endpoint> ...
Addresses of Thanos gRPC query API servers
(repeatable). The scheme may be prefixed
with 'dns+' or 'dnssrv+' to detect Thanos API
servers through respective DNS lookups.
--[no-]query.enable-x-functions
Whether to enable extended rate functions
(xrate, xincrease and xdelta). Only has effect
when used with Thanos engine.
--enable-feature= ... Comma separated feature names to enable. Valid
options for now: promql-experimental-functions
(enables promql experimental functions for
ruler)
--[no-]tsdb.enable-native-histograms
[EXPERIMENTAL] Enables the ingestion of native
histograms.
--remote-write.config-file=<file-path>
Path to YAML config for the remote-write
configurations, that specify servers
where samples should be sent to (see
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
This automatically enables stateless mode
for ruler and no series will be stored in the
ruler's TSDB. If an empty config (or file) is
provided, the flag is ignored and ruler is run
with its own TSDB.
--remote-write.config=<content>
Alternative to 'remote-write.config-file'
flag (mutually exclusive). Content
of YAML config for the remote-write
configurations, that specify servers
where samples should be sent to (see
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
This automatically enables stateless mode
for ruler and no series will be stored in the
ruler's TSDB. If an empty config (or file) is
provided, the flag is ignored and ruler is run
with its own TSDB.
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
```


@ -56,6 +56,26 @@ type: GCS
config:
bucket: ""
service_account: ""
use_grpc: false
grpc_conn_pool_size: 0
http_config:
idle_conn_timeout: 1m30s
response_header_timeout: 2m
insecure_skip_verify: false
tls_handshake_timeout: 10s
expect_continue_timeout: 1s
max_idle_conns: 100
max_idle_conns_per_host: 100
max_conns_per_host: 0
tls_config:
ca_file: ""
cert_file: ""
key_file: ""
server_name: ""
insecure_skip_verify: false
disable_compression: false
chunk_size_bytes: 0
max_retries: 0
prefix: ""
```
@ -75,33 +95,29 @@ usage: thanos sidecar [<flags>]
Sidecar for Prometheus server.
Flags:
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
-h, --help Show context-sensitive help (also try
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
@ -109,81 +125,105 @@ Flags:
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to serve. Thanos
sidecar will serve only metrics, which happened
later than this value. Option can be a constant
time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--prometheus.url=http://localhost:9090
URL at which to reach Prometheus's API.
For better performance use local network.
--prometheus.ready_timeout=10m
Maximum time to wait for the Prometheus
instance to start up
--prometheus.get_config_interval=30s
How often to get Prometheus config
--prometheus.get_config_timeout=5s
--prometheus.get_config_timeout=30s
Timeout for getting Prometheus config
--prometheus.http-client-file=<file-path>
Path to YAML file or string with http
client configs. See Format details:
https://thanos.io/tip/components/sidecar.md/#configuration.
--prometheus.http-client=<content>
Alternative to 'prometheus.http-client-file'
flag (mutually exclusive). Content
of YAML file or string with http
client configs. See Format details:
https://thanos.io/tip/components/sidecar.md/#configuration.
--prometheus.http-client-file=<file-path>
Path to YAML file or string with http
client configs. See Format details:
https://thanos.io/tip/components/sidecar.md/#configuration.
--prometheus.ready_timeout=10m
Maximum time to wait for the Prometheus
instance to start up
--prometheus.url=http://localhost:9090
URL at which to reach Prometheus's API.
For better performance use local network.
--tsdb.path="./data" Data directory of TSDB.
--reloader.config-file="" Config file watched by the reloader.
--reloader.config-envsubst-file=""
Output file for environment variable
substituted config file.
--reloader.config-file="" Config file watched by the reloader.
--reloader.method=http Method used to reload the configuration.
--reloader.process-name="prometheus"
Executable name used to match the process being
reloaded when using the signal method.
--reloader.retry-interval=5s
Controls how often reloader retries config
reload in case of error.
--reloader.rule-dir=RELOADER.RULE-DIR ...
Rule directories for the reloader to refresh
(repeated field).
--reloader.watch-interval=3m
Controls how often reloader re-reads config and
rules.
--reloader.retry-interval=5s
Controls how often reloader retries config
reload in case of error.
--reloader.method=http Method used to reload the configuration.
--reloader.process-name="prometheus"
Executable name used to match the process being
reloaded when using the signal method.
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--shipper.meta-file-name="thanos.shipper.json"
the file to store shipper metadata in
--shipper.upload-compacted
https://thanos.io/tip/thanos/storage.md/#configuration
--[no-]shipper.upload-compacted
If true shipper will try to upload compacted
blocks as well. Useful for migration purposes.
Works only if compaction is disabled on
Prometheus. Do it once and then disable the
flag when done.
--hash-func= Specify which hash function to use when
calculating the hashes of produced files.
If no function has been specified, it does not
happen. This permits avoiding downloading some
files twice albeit at some performance cost.
Possible values are: "", "SHA256".
--shipper.meta-file-name="thanos.shipper.json"
the file to store shipper metadata in
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request, The Series call fails if
@ -191,21 +231,13 @@ Flags:
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tsdb.path="./data" Data directory of TSDB.
--version Show application version.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to serve. Thanos
sidecar will serve only metrics, which happened
later than this value. Option can be a constant
time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
```


@ -15,6 +15,26 @@ type: GCS
config:
bucket: ""
service_account: ""
use_grpc: false
grpc_conn_pool_size: 0
http_config:
idle_conn_timeout: 1m30s
response_header_timeout: 2m
insecure_skip_verify: false
tls_handshake_timeout: 10s
expect_continue_timeout: 1s
max_idle_conns: 100
max_idle_conns_per_host: 100
max_conns_per_host: 0
tls_config:
ca_file: ""
cert_file: ""
key_file: ""
server_name: ""
insecure_skip_verify: false
disable_compression: false
chunk_size_bytes: 0
max_retries: 0
prefix: ""
```
@ -28,30 +48,70 @@ usage: thanos store [<flags>]
Store node giving access to blocks in a bucket provider. Now supported GCS, S3,
Azure, Swift, Tencent COS and Aliyun OSS.
Flags:
--block-meta-fetch-concurrency=32
Number of goroutines to use when fetching block
metadata from object storage.
--block-sync-concurrency=20
Number of goroutines to use when constructing
index-cache.json blocks from object storage.
Must be equal or greater than 1.
--bucket-web-label=BUCKET-WEB-LABEL
External block label to use as group title in
the bucket web UI
--cache-index-header Cache TSDB index-headers on disk to reduce
startup time. When set to true, Thanos Store
will download index headers from remote object
storage on startup and create a header file on
disk. Use --data-dir to set the directory in
which index headers will be downloaded.
--chunk-pool-size=2GB Maximum size of concurrently allocatable
bytes reserved strictly to reuse for chunks in
memory.
--consistency-delay=0s Minimum age of all blocks before they are
being read. Set it to safe value (e.g 30m) if
your object storage is eventually consistent.
GCS and S3 are (roughly) strongly consistent.
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--[no-]version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--[no-]enable-auto-gomemlimit
Enable go runtime to automatically limit memory
consumption.
--auto-gomemlimit.ratio=0.9
The ratio of reserved GOMEMLIMIT memory to the
detected maximum container or system memory.
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
HTTP Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-min-version="1.3"
TLS supported minimum version for gRPC server.
If no version is specified, it'll default to
1.3. Allowed values: ["1.0", "1.1", "1.2",
"1.3"]
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request, The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--data-dir="./data" Local data directory used for caching
purposes (index-header, in-mem cache items and
meta.jsons). If removed, no data will be lost,
@ -60,33 +120,101 @@ Flags:
cause the store to read them. For such use
cases use Prometheus + sidecar. Ignored if
--no-cache-index-header option is specified.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
from other components.
--grpc-grace-period=2m Time to wait after an interrupt received for
GRPC Server.
--grpc-server-max-connection-age=60m
The grpc server max connection age. This
controls how often to re-establish connections
and redo TLS handshakes.
--grpc-server-tls-cert="" TLS Certificate for gRPC server, leave blank to
disable TLS
--grpc-server-tls-client-ca=""
TLS CA to verify clients against. If no
client CA is specified, there is no client
verification on server side. (tls.NoClientCert)
--grpc-server-tls-key="" TLS Key for the gRPC server, leave blank to
disable TLS
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
HTTP Server.
--http.config="" [EXPERIMENTAL] Path to the configuration file
that can enable TLS or authentication for all
HTTP endpoints.
--[no-]cache-index-header Cache TSDB index-headers on disk to reduce
startup time. When set to true, Thanos Store
will download index headers from remote object
storage on startup and create a header file on
disk. Use --data-dir to set the directory in
which index headers will be downloaded.
--index-cache-size=250MB Maximum size of items held in the in-memory
index cache. Ignored if --index-cache.config or
--index-cache.config-file option is specified.
--index-cache.config-file=<file-path>
Path to YAML file that contains index
cache configuration. See format details:
https://thanos.io/tip/components/store.md/#index-cache
--index-cache.config=<content>
Alternative to 'index-cache.config-file'
flag (mutually exclusive). Content of
YAML file that contains index cache
configuration. See format details:
https://thanos.io/tip/components/store.md/#index-cache
--chunk-pool-size=2GB Maximum size of concurrently allocatable
bytes reserved strictly to reuse for chunks in
memory.
--store.grpc.touched-series-limit=0
DEPRECATED: use store.limits.request-series.
--store.grpc.series-sample-limit=0
DEPRECATED: use store.limits.request-samples.
--store.grpc.downloaded-bytes-limit=0
Maximum amount of downloaded (either
fetched or touched) bytes in a single
Series/LabelNames/LabelValues call. The Series
call fails if this limit is exceeded. 0 means
no limit.
--store.grpc.series-max-concurrency=20
Maximum number of concurrent Series calls.
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--sync-block-duration=15m Repeat interval for syncing the blocks between
local and remote view.
--block-discovery-strategy="concurrent"
One of concurrent, recursive. When set to
concurrent, stores will concurrently issue
one call per directory to discover active
blocks in the bucket. The recursive strategy
iterates through all objects in the bucket,
recursively traversing into each directory.
This avoids N+1 calls at the expense of having
slower bucket iterations.
--block-sync-concurrency=20
Number of goroutines to use when constructing
index-cache.json blocks from object storage.
Must be equal or greater than 1.
--block-meta-fetch-concurrency=32
Number of goroutines to use when fetching block
metadata from object storage.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to serve. Thanos
Store will serve only metrics, which happened
later than this value. Option can be a constant
time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
--max-time=9999-12-31T23:59:59Z
End of time range limit to serve. Thanos Store
will serve only blocks, which happened earlier
than this value. Option can be a constant time
in RFC3339 format or time duration relative
to current time, such as -1d or 2h45m. Valid
duration units are ms, s, m, h, d, w, y.
--selector.relabel-config-file=<file-path>
Path to YAML file with relabeling
configuration that allows selecting blocks
to act on based on their external labels.
It follows thanos sharding relabel-config
syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--selector.relabel-config=<content>
Alternative to 'selector.relabel-config-file'
flag (mutually exclusive). Content of YAML
file with relabeling configuration that allows
selecting blocks to act on based on their
external labels. It follows thanos sharding
relabel-config syntax. For format details see:
https://thanos.io/tip/thanos/sharding.md/#relabelling
--consistency-delay=0s Minimum age of all blocks before they are
being read. Set it to safe value (e.g 30m) if
your object storage is eventually consistent.
GCS and S3 are (roughly) strongly consistent.
--ignore-deletion-marks-delay=24h
Duration after which the blocks marked for
deletion will be filtered out while fetching
@ -108,124 +236,34 @@ Flags:
blocks before being deleted from bucket.
Default is 24h, half of the default value for
--delete-delay on compactor.
--index-cache-size=250MB Maximum size of items held in the in-memory
index cache. Ignored if --index-cache.config or
--index-cache.config-file option is specified.
--index-cache.config=<content>
Alternative to 'index-cache.config-file'
flag (mutually exclusive). Content of
YAML file that contains index cache
configuration. See format details:
https://thanos.io/tip/components/store.md/#index-cache
--index-cache.config-file=<file-path>
Path to YAML file that contains index
cache configuration. See format details:
https://thanos.io/tip/components/store.md/#index-cache
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--log.level=info Log filtering level.
--max-time=9999-12-31T23:59:59Z
End of time range limit to serve. Thanos Store
will serve only blocks, which happened earlier
than this value. Option can be a constant time
in RFC3339 format or time duration relative
to current time, such as -1d or 2h45m. Valid
duration units are ms, s, m, h, d, w, y.
--min-time=0000-01-01T00:00:00Z
Start of time range limit to serve. Thanos
Store will serve only metrics, which happened
later than this value. Option can be a constant
time in RFC3339 format or time duration
relative to current time, such as -1d or 2h45m.
Valid duration units are ms, s, m, h, d, w, y.
--objstore.config=<content>
Alternative to 'objstore.config-file'
flag (mutually exclusive). Content of
YAML file that contains object store
configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--objstore.config-file=<file-path>
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--selector.relabel-config=<content>
Alternative to 'selector.relabel-config-file'
flag (mutually exclusive). Content of
YAML file that contains relabeling
configuration that allows selecting
blocks. It follows native Prometheus
relabel-config syntax. See format details:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
--selector.relabel-config-file=<file-path>
Path to YAML file that contains relabeling
configuration that allows selecting
blocks. It follows native Prometheus
relabel-config syntax. See format details:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
--store.enable-index-header-lazy-reader
--[no-]store.enable-index-header-lazy-reader
If true, Store Gateway will lazy memory map
index-header only once the block is required by
a query.
--store.enable-lazy-expanded-postings
--[no-]store.enable-lazy-expanded-postings
If true, Store Gateway will estimate postings
size and try to lazily expand postings if
it downloads less data than expanding all
postings.
--store.grpc.downloaded-bytes-limit=0
Maximum amount of downloaded (either
fetched or touched) bytes in a single
Series/LabelNames/LabelValues call. The Series
call fails if this limit is exceeded. 0 means
no limit.
--store.grpc.series-max-concurrency=20
Maximum number of concurrent Series calls.
--store.grpc.series-sample-limit=0
DEPRECATED: use store.limits.request-samples.
--store.grpc.touched-series-limit=0
DEPRECATED: use store.limits.request-series.
--store.posting-group-max-key-series-ratio=100
Mark posting group as lazy if it fetches more
keys than R * max series the query should
fetch. With R set to 100, a posting group which
fetches 100K keys will be marked as lazy if
the current query only fetches 1000 series.
thanos_bucket_store_lazy_expanded_posting_groups_total
shows lazy expanded postings groups with
reasons and you can tune this config
accordingly. This config is only valid if lazy
expanded posting is enabled. 0 disables the
limit.
--store.index-header-lazy-download-strategy=eager
Strategy of how to download index headers
lazily. Supported values: eager, lazy.
If eager, always download index header during
initial load. If lazy, download index header
during query time.
--store.limits.request-samples=0
The maximum samples allowed for a single
Series request, The Series call fails if
this limit is exceeded. 0 means no limit.
NOTE: For efficiency the limit is internally
implemented as 'chunks limit' considering each
chunk contains a maximum of 120 samples.
--store.limits.request-series=0
The maximum series allowed for a single Series
request. The Series call fails if this limit is
exceeded. 0 means no limit.
--sync-block-duration=15m Repeat interval for syncing the blocks between
local and remote view.
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(mutually exclusive). Content of YAML file
with tracing configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tracing.config-file=<file-path>
Path to YAML file with tracing
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--version Show application version.
--web.disable Disable Block Viewer UI.
--web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--[no-]web.disable Disable Block Viewer UI.
--web.external-prefix="" Static prefix for all HTML links and redirect
URLs in the bucket web UI interface.
Actual endpoints are still served on / or the
@ -245,6 +283,27 @@ Flags:
stripped prefix value in X-Forwarded-Prefix
header. This allows thanos UI to be served on a
sub-path.
--[no-]web.disable-cors Whether to disable CORS headers to be set by
Thanos. By default Thanos sets CORS headers to
be allowed by all.
--bucket-web-label=BUCKET-WEB-LABEL
External block label to use as group title in
the bucket web UI
--matcher-cache-size=0 Max number of cached matchers items. Using 0
disables caching.
--[no-]disable-admin-operations
Disable UI/API admin operations like marking
blocks for deletion and no compaction.
--request.logging-config-file=<file-path>
Path to YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
--request.logging-config=<content>
Alternative to 'request.logging-config-file'
flag (mutually exclusive). Content
of YAML file with request logging
configuration. See format details:
https://thanos.io/tip/thanos/logging.md/#configuration
```
@ -325,6 +384,13 @@ config:
max_get_multi_batch_size: 0
dns_provider_update_interval: 0s
auto_discovery: false
set_async_circuit_breaker_config:
enabled: false
half_open_max_requests: 0
open_duration: 0s
min_requests: 0
consecutive_failures: 0
failure_percent: 0
enabled_items: []
ttl: 0s
```
@ -333,6 +399,12 @@ The **required** settings are:
- `addresses`: list of memcached addresses that will be resolved with the [DNS service discovery](../service-discovery.md#dns-service-discovery) provider. If your cluster supports auto-discovery, you should use the flag `auto_discovery` instead and only point to *one of* the memcached servers. This typically means that there should be only one address specified that resolves to any of the alive memcached servers. Use this for Amazon ElastiCache and other similar services.
**NOTE**: The Memcached client uses a jump hash algorithm to shard cached entries across a cluster of Memcached servers. For this reason, you should make sure the memcached servers are not behind any kind of load balancer and that their addresses are configured so that servers are added to or removed from the end of the list whenever a scale up/down occurs. For example, if you're running Memcached in Kubernetes, you may:
1. Deploy your Memcached cluster using a [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
2. Create a [headless](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services) service for Memcached StatefulSet
3. Configure Thanos's memcached `addresses` using the `dnssrvnoa+` [DNS service discovery](../service-discovery.md#dns-service-discovery) provider, for example as sketched below
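Putting those three steps together, a minimal sketch of the resulting index cache configuration could look like this; the Service name `memcached`, namespace `monitoring`, and port name `client` are assumptions for illustration:
```yaml
type: MEMCACHED
config:
  # The SRV lookup against the headless Service returns every StatefulSet pod
  # directly, so no load balancer sits between Thanos Store and memcached.
  addresses: ["dnssrvnoa+_client._tcp.memcached.monitoring.svc.cluster.local"]
```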
While the remaining settings are **optional**:
- `timeout`: the socket read/write timeout.
@ -340,10 +412,17 @@ While the remaining settings are **optional**:
- `max_async_concurrency`: maximum number of concurrent asynchronous operations that can occur.
- `max_async_buffer_size`: maximum number of enqueued asynchronous operations allowed.
- `max_get_multi_concurrency`: maximum number of concurrent connections when fetching keys. If set to `0`, the concurrency is unlimited.
- `max_get_multi_batch_size`: maximum number of keys a single underlying operation should fetch. If more keys are specified, internally keys are splitted into multiple batches and fetched concurrently, honoring `max_get_multi_concurrency`. If set to `0`, the batch size is unlimited.
- `max_get_multi_batch_size`: maximum number of keys a single underlying operation should fetch. If more keys are specified, internally keys are split into multiple batches and fetched concurrently, honoring `max_get_multi_concurrency`. If set to `0`, the batch size is unlimited.
- `max_item_size`: maximum size of an item to be stored in memcached. This option should be set to the same value as memcached's `-I` flag (defaults to 1MB) in order to avoid wasting network round trips to store items larger than the max item size allowed in memcached. If set to `0`, the item size is unlimited.
- `dns_provider_update_interval`: the DNS discovery update interval.
- `auto_discovery`: whether to use the auto-discovery mechanism for memcached.
- `set_async_circuit_breaker_config`: the configuration for the circuit breaker for asynchronous set operations.
  - `enabled`: `true` to enable the circuit breaker for asynchronous operations (a tuned example is sketched after this list). The circuit breaker consists of three states: closed, half-open, and open. It begins in the closed state. When the total requests exceed `min_requests` and either consecutive failures occur or the failure percentage is excessively high according to the configured values, the circuit breaker transitions to the open state. This results in the rejection of all asynchronous operations. After `open_duration`, the circuit breaker transitions to the half-open state, where it allows `half_open_max_requests` asynchronous operations to be processed in order to test whether the conditions have improved. If they have not, the state transitions back to open; if they have, it transitions to the closed state. After each 10-second interval in the closed state, the circuit breaker resets its metrics and repeats this cycle.
  - `half_open_max_requests`: maximum number of requests allowed to pass through when the circuit breaker is half-open. If set to 0, the circuit breaker allows only 1 request.
  - `open_duration`: the period of the open state after which the circuit breaker becomes half-open. If set to 0, the circuit breaker uses the default value of 60 seconds.
  - `min_requests`: minimum number of requests required to trigger the circuit breaker; 0 means no requirement.
  - `consecutive_failures`: number of consecutive failures (once `min_requests` is reached) that causes the circuit breaker to open.
  - `failure_percent`: failure percentage (once `min_requests` is reached) that causes the circuit breaker to open.
- `enabled_items`: selectively choose what types of items to cache. Supported values are `Postings`, `Series` and `ExpandedPostings`. By default, all items are cached.
- `ttl`: ttl to store index cache items in memcached.
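For illustration, a hedged guess at a tuned circuit breaker block is sketched below; the numbers are arbitrary examples, not recommendations:
```yaml
set_async_circuit_breaker_config:
  enabled: true
  # Allow 10 probe requests while half-open.
  half_open_max_requests: 10
  # Stay open for 10s before probing again.
  open_duration: 10s
  # Evaluate failures only after at least 50 requests were seen.
  min_requests: 50
  # Open after 5 consecutive failures or a 5% failure rate.
  consecutive_failures: 5
  failure_percent: 0.05
```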
@ -376,6 +455,13 @@ config:
master_name: ""
max_async_buffer_size: 10000
max_async_concurrency: 20
set_async_circuit_breaker_config:
enabled: false
half_open_max_requests: 10
open_duration: 5s
min_requests: 50
consecutive_failures: 5
failure_percent: 0.05
enabled_items: []
ttl: 0s
```
@ -392,7 +478,7 @@ While the remaining settings are **optional**:
- `dial_timeout`: the redis dial timeout.
- `read_timeout`: the redis read timeout.
- `write_timeout`: the redis write timeout.
- `cache_size` size of the in-memory cache used for client-side caching. Client-side caching is enabled when this value is not zero. See [official documentation](https://redis.io/docs/manual/client-side-caching/) for more. It is highly recommended to enable this so that Thanos Store would not need to continuously retrieve data from Redis for repeated requests of the same key(-s).
- `cache_size` size of the in-memory cache used for client-side caching. Client-side caching is enabled when this value is not zero. See [official documentation](https://redis.io/docs/latest/develop/reference/client-side-caching/) for more. It is highly recommended to enable this so that Thanos Store would not need to continuously retrieve data from Redis for repeated requests of the same key(-s).
- `enabled_items`: selectively choose what types of items to cache. Supported values are `Postings`, `Series` and `ExpandedPostings`. By default, all items are cached.
- `ttl`: ttl to store index cache items in redis.
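As an example, here is a sketch of a Redis index cache configuration with client-side caching enabled; the address, cache size and TTL are placeholders:
```yaml
type: REDIS
config:
  addr: redis.monitoring.svc.cluster.local:6379
  # A non-zero cache_size turns on client-side caching.
  cache_size: 512MB
  enabled_items: [Postings, ExpandedPostings]
  ttl: 24h
```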
@ -400,10 +486,6 @@ Here is an example of what effect client-side caching could have:
<img src="../img/rueidis-client-side.png" class="img-fluid" alt="Example of client-side in action - reduced network usage by a lot"/>
- `pool_size`: maximum number of socket connections.
- `min_idle_conns`: specifies the minimum number of idle connections which is useful when establishing new connection is slow.
- `idle_timeout`: amount of time after which client closes idle connections. Should be less than server's timeout.
- `max_conn_age`: connection age at which client retires (closes) the connection.
- `max_get_multi_concurrency`: specifies the maximum number of concurrent GetMulti() operations.
- `get_multi_batch_size`: specifies the maximum size per batch for mget.
- `max_set_multi_concurrency`: specifies the maximum number of concurrent SetMulti() operations.
@ -510,6 +592,33 @@ Note that there must be no trailing slash in the `peers` configuration i.e. one
If the timeout is set to zero then there is no timeout for fetching, and the fetch's lifetime is equal to the original request's lifetime. It is recommended to keep it higher than zero; a generous value is generally preferred because the fetching operation potentially includes loading data from remote object storage.
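For reference, a groupcache caching configuration honoring the constraints above (no trailing slash in `peers`, non-zero `timeout`) might look like the sketch below; the URLs and group name are placeholders, and the exact field set should be checked against your Thanos version:
```yaml
type: GROUPCACHE
config:
  self_url: http://thanos-store-0.thanos-store.monitoring.svc:10902
  # No trailing slashes here, as noted above.
  peers:
    - http://thanos-store-0.thanos-store.monitoring.svc:10902
    - http://thanos-store-1.thanos-store.monitoring.svc:10902
  groupcache_group: thanos_index_cache
  # Keep the fetch timeout above zero, as recommended above.
  timeout: 2s
```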
## Hedged Requests
Thanos Store Gateway supports `hedged requests` to enhance performance and reliability, particularly in high-latency environments. This feature addresses `long-tail latency issues` that can occur between the Thanos Store Gateway and an external cache, reducing the impact of slower response times on overall performance.
The configuration options for hedged requests allow for tuning based on latency tolerance and cost considerations, as some providers may charge per request.
In the `bucket.yml` file, you can specify the following fields under `hedging_config`:
- `enabled`: bool to enable hedged requests.
- `up_to`: maximum number of hedged requests allowed for each initial request.
- **Purpose**: controls the redundancy level of hedged requests to improve response times.
- **Cost vs. Benefit**: increasing `up_to` can reduce latency but may increase costs, as some providers charge per request. Higher values provide diminishing returns on latency beyond a certain level.
- `quantile`: latency threshold, specified as a quantile (e.g., percentile), which determines when additional hedged requests should be sent.
- **Purpose**: controls when hedged requests are triggered based on response time distribution.
- **Cost vs. Benefit**: a lower quantile (e.g., 0.7) initiates hedged requests sooner, potentially raising costs while lowering latency variance. A higher quantile (e.g., 0.95) will initiate hedged requests later, reducing cost by limiting redundancy.
By default, `hedging_config` is set as follows:
```yaml
hedging_config:
enabled: false
up_to: 3
quantile: 0.9
```
This configuration sends up to three additional requests if the initial request's response time exceeds the 90th-percentile latency.
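A sketch of a cost-conscious `bucket.yml` with hedging enabled is shown below; the S3 fields are placeholders, and placing `hedging_config` at the top level alongside `type`/`config` mirrors the default block above:
```yaml
type: S3
config:
  bucket: example-bucket
  endpoint: s3.example.com
hedging_config:
  enabled: true
  # Hedge later (95th percentile) and send at most two extra requests.
  up_to: 2
  quantile: 0.95
```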
## Index Header
In order to query series inside blocks from object storage, the Store Gateway has to know certain initial info about each block's index. To achieve this, on startup the Gateway builds an `index-header` for each block and stores it on local disk; this `index-header` is built by downloading specific pieces of the original block's index, which are stored on local disk, mmapped, and then used by the Store Gateway.

File diff suppressed because it is too large

View File

@ -38,7 +38,7 @@ There are certain rules you can follow to make the MOST from your time with us!
- Try to be independent and responsible for the feature you want to deliver. The sooner you start to lead your task, the better for you! It's hard in the beginning but try to think about the user experience. Is it hard or easy to make mistake using it? How difficult is it to migrate to this feature? Is there anything we can do to reduce data loss errors?
- Try to help others by **reviewing** other contributors, mentees or mentors' Pull Requests! It sounds scary, but this is actually the best way to learn about coding practices, patterns and how to maintain high quality codebase! (GIFs on PRs are welcome as well!)
- Try using an [iterative process for development](https://en.wikipedia.org/wiki/Iterative_and_incremental_development). Start with small and simple assumptions, and once you have a working example ready, keep improving and discussing with the mentors. Small changes are easy to review and easy to accept 😄.
- Try working out a [proof of concept](https://en.wikipedia.org/wiki/Proof_of_concept), which can be used as a baseline, and can be improved upon. These are real-world projects, so it's not possible to have a deterministic solution everytime, and proof of concepts are quick way to determine feasibility.
- Try working out a [proof of concept](https://en.wikipedia.org/wiki/Proof_of_concept), which can be used as a baseline, and can be improved upon. These are real-world projects, so it's not possible to have a deterministic solution every time, and proof of concepts are quick way to determine feasibility.
> At the end of mentorship, it's not the end! You are welcome to join our Community Office Hours. See [this](https://docs.google.com/document/d/137XnxfOT2p1NcNUq6NWZjwmtlSdA6Wyti86Pd6cyQhs/edit#) for details. This is the meeting for any Thanos contributor, but you will find fellow current and ex-mentees on the meeting too.

View File

@ -88,6 +88,18 @@ See up to date [jsonnet mixins](https://github.com/thanos-io/thanos/tree/main/mi
## Talks
* 2024
* [Enlightning - Scaling Your Metrics with Thanos](https://www.youtube.com/live/1qvcVJiVx7M)
* [6 Learnings from Building Thanos Project](https://www.youtube.com/watch?v=ur8dDFaNEFg)
* [Monitoring the World: Scaling Thanos in Dynamic Prometheus Environments](https://www.youtube.com/watch?v=ofhvbG0iTjU)
* [Scaling Thanos at Reddit](https://www.youtube.com/watch?v=c18RGbAxCfI)
* [Thanos Project Updates](https://www.youtube.com/watch?v=wmNtCj5D4_A)
* [Connecting Thanos to the Outer Rim via Query API](https://www.youtube.com/watch?v=E8L8fuRj66o)
* [Multiverse of Thanos: Making Thanos Multi-Tenanted](https://www.youtube.com/watch?v=SAyPQ2d8v4Q)
* [Thanos Receiver Deep Dive](https://www.youtube.com/watch?v=jn_zIfBuUyE)
* [From UI to Storage: Unraveling the Magic of Thanos Query Processing](https://www.youtube.com/watch?v=ZGQIitaKoTM)
* [Thanos Infinity Stones and How You Can Operate Them!](https://www.youtube.com/watch?v=e8kvX6mRlyE)
* 2023
* [Planetscale monitoring: Handling billions of active series with Prometheus and Thanos](https://www.youtube.com/watch?v=Or8r46fSaOg)
* [Taming the Tsunami: low latency ingestion of push-based metrics in Prometheus](https://www.youtube.com/watch?v=W81x1j765hc)
@ -125,19 +137,23 @@ See up to date [jsonnet mixins](https://github.com/thanos-io/thanos/tree/main/mi
## Blog posts
* 2024:
* [Scaling Prometheus with Thanos.](https://www.cloudraft.io/blog/scaling-prometheus-with-thanos)
* [Streamlining Long-Term Storage Query Performance for Metrics With Thanos.](https://blog.devops.dev/streamlining-long-term-storage-query-performance-for-metrics-with-thanos-b44419c70cc4)
* 2023:
* [Thanos Ruler and Prometheus Rules — a match made in heaven.](https://medium.com/@helia.barroso/thanos-ruler-and-prometheus-rules-a-match-made-in-heaven-a4f08f2399ac)
* 2022:
* [Thanos at Medallia: A Hybrid Architecture Scaled to Support 1 Billion+ Series Across 40+ Data Centers](https://thanos.io/blog/2022-09-08-thanos-at-medallia/)
* [Deploy Thanos Receive with native OCI Object Storage on Oracle Kubernetes Engine](https://medium.com/@lmukadam/deploy-thanos-receive-with-native-oci-object-storage-on-kubernetes-829326ea0bc6)
* [Leveraging Consul for Thanos Query Discovery](https://nicolastakashi.medium.com/leveraging-consul-for-thanos-query-discovery-34212d496c88)
* [Leveraging Consul for Thanos Query Discovery](https://itnext.io/leveraging-consul-for-thanos-query-discovery-34212d496c88)
* 2021:
* [Adopting Thanos at LastPass](https://krisztianfekete.org/adopting-thanos-at-lastpass/)
* 2020:
* [Banzai Cloud user story](https://banzaicloud.com/blog/multi-cluster-monitoring/)
* [Banzai Cloud user story](https://outshift.cisco.com/blog/multi-cluster-monitoring)
* [Monitoring the Beat microservices: A tale of evolution](https://build.thebeat.co/monitoring-the-beat-microservices-a-tale-of-evolution-4e246882606e)
* 2019:
@ -146,7 +162,7 @@ See up to date [jsonnet mixins](https://github.com/thanos-io/thanos/tree/main/mi
* [HelloFresh blog posts part 1](https://engineering.hellofresh.com/monitoring-at-hellofresh-part-1-architecture-677b4bd6b728)
* [HelloFresh blog posts part 2](https://engineering.hellofresh.com/monitoring-at-hellofresh-part-2-operating-the-monitoring-system-8175cd939c1d)
* [Thanos deployment](https://www.metricfire.com/blog/ha-kubernetes-monitoring-using-prometheus-and-thanos)
* [Taboola user story](https://blog.taboola.com/monitoring-and-metering-scale/)
* [Taboola user story](https://www.taboola.com/engineering/monitoring-and-metering-scale/)
* [Thanos via Prometheus Operator](https://kkc.github.io/2019/02/10/prometheus-operator-with-thanos/)
* 2018:

View File

@ -0,0 +1,67 @@
# Buffers guide
## Intro
This is a guide to buffers in Thanos. The goal is to show how data moves around, which objects are allocated or copied, what the lifetime of each object is, and so on. With this information we will be able to make better decisions on how to make the code more garbage collector (GC) friendly.
### Situation in 0.39.2
We only use protobuf encodings and compression is optional:
```
gRPC gets a compressed protobuf message -> decompress -> protobuf decoder
```
We still use gogoproto, so in the protobuf decoder we specify a custom type for labels - ZLabels. This is a "hack" that uses unsafe underneath. With the `slicelabels` tag, it is possible to create labels.Labels objects (required by the PromQL layer) and reuse references to strings allocated in the protobuf layer. The protobuf message's bytes buffer is never recycled and it lives as long as possible until it is collected by the GC. Chunks and all other objects are still copied.
### gRPC gets the ability to recycle messages
Nowadays, gRPC can and by default does recycle decoded message buffers, so a new `[]byte` does not have to be allocated for every message at the gRPC layer. But this means that we have to be conscious of the allocations that we make.
Previously we had:
```go
[]struct {
Name string
Value string
}
```
So each element of the slice holds two string headers (a pointer and a length each) plus the strings themselves. Fortunately, we use unsafe code, so we don't allocate a new string object for each label; these strings instead point directly into the decoded `[]byte` buffers.
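That zero-copy trick looks roughly like the sketch below (not the actual ZLabels code; `unsafe.String`/`unsafe.SliceData` need Go 1.20+):
```go
package labelzero

import "unsafe"

// yoloString reinterprets a byte slice as a string without copying it. The
// result is only valid while the underlying buffer is neither reused nor
// mutated - which is exactly why the decoded protobuf buffer can never be
// recycled while such label strings are still referenced.
func yoloString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}
```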
With `stringlabel` and careful use of `labels.ScratchBuilder` we could put all labels into one string object. The only consequence is that we have to copy the protobuf message's data into this special format, but copying data in memory is (probably) faster than making the GC iterate through possibly millions of small objects.
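A minimal sketch of that approach, assuming the Prometheus labels package is built with the `stringlabels` tag and that `pairs` stands in for whatever the protobuf decoder yields:
```go
package main

import (
	"fmt"

	"github.com/prometheus/prometheus/model/labels"
)

// buildLabels copies decoded name/value pairs into a single labels.Labels
// value. With the stringlabels layout this is one string allocation per
// series instead of one small object per label, which is cheaper for the GC.
func buildLabels(b *labels.ScratchBuilder, pairs [][2]string) labels.Labels {
	b.Reset()
	for _, p := range pairs {
		b.Add(p[0], p[1]) // data is copied out of the protobuf buffer here
	}
	b.Sort()
	return b.Labels()
}

func main() {
	var b labels.ScratchBuilder
	fmt.Println(buildLabels(&b, [][2]string{
		{"__name__", "http_requests_total"},
		{"pod", "thanos-store-0"},
	}))
}
```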
Also, ideally we wouldn't have to allocate buffers for whole messages and would instead stream them into the readers, but if the messages are compressed there is no way to do that, since generic compression (in most cases) requires the whole message in memory. Cap'n Proto is also message-based, so you need to read a message fully; the only bonus is that it gives you full control over the lifetime of the messages. Most `grpc-go` encoders immediately put the message's buffer back after decoding it, BUT it is possible to hold a reference for longer:
[CodecV2 ref](https://pkg.go.dev/google.golang.org/grpc/encoding#CodecV2)
Hence, the only possibility for further improvement at the moment, it seems, is to associate the lifetime of messages with the query itself so that we could avoid copying `[]byte`s for the chunks (mostly).
I wrote a benchmark and it seems like `stringlabel` + hand-rolled unmarshaling code wins:
```
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/store/labelpb
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
│ labelsbench │
│ sec/op │
LabelUnmarshal/Unmarshal_regular-16 123.0µ ± 41%
LabelUnmarshal/Unmarshal_ZLabel-16 65.43µ ± 40%
LabelUnmarshal/Unmarshal_easyproto-16 41.19µ ± 45%
geomean 69.20µ
│ labelsbench │
│ B/op │
LabelUnmarshal/Unmarshal_regular-16 84.22Ki ± 0%
LabelUnmarshal/Unmarshal_ZLabel-16 68.59Ki ± 0%
LabelUnmarshal/Unmarshal_easyproto-16 32.03Ki ± 0%
geomean 56.99Ki
│ labelsbench │
│ allocs/op │
LabelUnmarshal/Unmarshal_regular-16 2.011k ± 0%
LabelUnmarshal/Unmarshal_ZLabel-16 11.00 ± 0%
LabelUnmarshal/Unmarshal_easyproto-16 1.000 ± 0%
geomean 28.07
```

View File

@ -164,7 +164,7 @@ metadata:
### Forward proxy Envoy configuration `envoy.yaml`
This is a static v2 envoy configuration (v3 example below). You will need to update this configuration for every sidecar you would like to talk to. There are also several options for dynamic configuration, like envoy XDS (and other associated dynamic config modes), or using something like terraform (if thats your deployment method) to generate the configs at deployment time. NOTE: This config **does not** send a client certificate to authenticate with remote clusters, see envoy v3 config.
This is a static v2 envoy configuration (v3 example below). You will need to update this configuration for every sidecar you would like to talk to. There are also several options for dynamic configuration, like envoy XDS (and other associated dynamic config modes), or using something like terraform (if that's your deployment method) to generate the configs at deployment time. NOTE: This config **does not** send a client certificate to authenticate with remote clusters, see envoy v3 config.
```yaml
admin:

View File

@ -55,7 +55,7 @@ The main motivation for considering deletions in the object storage are the foll
* **reason for deletion**
* The entered details are processed by the CLI tool to create a tombstone file (unique for a request and irrespective of the presence of series), and the file is uploaded to the object storage making it accessible to all components.
* **Filename optimization**: The filename is created from the hash of the matchers, minTime and maxTime. This helps re-write an existing tombstone whenever the same request is made in the future, hence avoiding duplication of the same request. (NOTE: Requests which entail common deletions still create different tombstones.)
* Store Gateway masks the series on processing the global tombstone files from the object storage. At chunk level, whenever there's a match with the data corresponding to atleast one of the tombstones, we skip the chunk, potentially resulting in the masking of chunk.
* Store Gateway masks the series on processing the global tombstone files from the object storage. At chunk level, whenever there's a match with the data corresponding to at least one of the tombstones, we skip the chunk, potentially resulting in the masking of chunk.
## Considerations

View File

@ -203,9 +203,9 @@ func (s *seriesServer) Send(r *storepb.SeriesResponse) error {
}
```
Now that the `SeriesStats` are propagated into the `storepb.SeriesServer`, we can ammend the `selectFn` function to return a tuple of `(storage.SeriesSet, storage.SeriesSetCounter, error)`
Now that the `SeriesStats` are propagated into the `storepb.SeriesServer`, we can amend the `selectFn` function to return a tuple of `(storage.SeriesSet, storage.SeriesSetCounter, error)`
Ammending the QueryableCreator to provide a func parameter:
Amending the QueryableCreator to provide a func parameter:
```go
type SeriesStatsReporter func(seriesStats storepb.SeriesStatsCounter)

View File

@ -134,7 +134,7 @@ Using the reference implementation, we benchmarked query execution and memory us
We then ran the following query on the reference dataset for 10-15 minutes: `sum by (pod) (http_requests_total)`
The memory usage of Queriers with and without sharding was ~650MB and ~1.5GB respectively, as shown n the screenshots bellow.
The memory usage of Queriers with and without sharding was ~650MB and ~1.5GB respectively, as shown n the screenshots below.
Memory usage with sharding:

View File

@ -180,7 +180,7 @@ message SeriesRefMap {
### 9.2 Per-Receive Validation
We can implement the same new endpoints as mentioned in the previous approach, on Thanos Receive, but do merging and checking operations on each Receive node in the hashring, i.e change the existing Router and Ingestor modes to handle the same limting logic.
We can implement the same new endpoints as mentioned in the previous approach, on Thanos Receive, but do merging and checking operations on each Receive node in the hashring, i.e change the existing Router and Ingestor modes to handle the same limiting logic.
The implementation would be as follows,

View File

@ -192,7 +192,7 @@ Receivers do not need to re-shard data on rollouts; instead, they must flush the
This may produce small, and therefore unoptimized, TSDB blocks in object storage, however these are optimized away by the Thanos compactor by merging the small blocks into bigger blocks. The compaction process is done concurrently in a separate deployment to the receivers. Timestamps involved are produced by the sending Prometheus, therefore no clock synchronization is necessary.
When changing a soft tenant to a hard tenant (or vise versa), all blocks on all nodes in hashrings in which the tenant is present must be flushed.
When changing a soft tenant to a hard tenant (or vice versa), all blocks on all nodes in hashrings in which the tenant is present must be flushed.
## Open questions

View File

@ -131,7 +131,7 @@ Example usages would be:
* Add/import relabel config into Thanos, add relevant logic.
* Hook it for selecting blocks on Store Gateway
* Advertise original labels of "approved" blocs on resulted external labels.
* Advertise original labels of "approved" blocks on resulted external labels.
* Hook it for selecting blocks on Compactor.
* Add documentation about the following concern: care must be taken when changing the selection for the compactor, to ensure only a single compactor is ever running over each Source's blocks.

Some files were not shown because too many files have changed in this diff