Compare commits

...

124 Commits

Author SHA1 Message Date
Giedrius Statkevičius 1c4d17bd12
Merge pull request #8466 from thanos-io/add_missing_path
scripts/genproto: add missing dir
2025-09-03 15:05:22 +03:00
Giedrius Statkevičius cafefb8428
Merge pull request #8464 from thanos-io/assume_unmark_upstream
block: assume that we do not unmark a block for deletion
2025-09-03 12:43:40 +03:00
Giedrius Statkevičius d4330a1bb7 scripts/genproto: add missing dir
This dir was missing from the script. Regenerate the file. Removes some
dead code.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-09-03 12:33:08 +03:00
Giedrius Statkevičius dddd98dab0
Merge pull request #8465 from thanos-io/volunteer_release
docs: volunteer as v0.40.0 shepherd
2025-09-03 12:13:23 +03:00
Giedrius Statkevičius 6e231d08e5 block: assume that we do not unmark a block for deletion
Just like we assume that the meta.json file doesn't change, let's also
assume that we do not unmark a block for deletion.

This solves a critical issue in Thanos Store where there is a race
between deletion in compactor and the loading of a block:
- Deletion starts from meta.json; deletion markers are deleted at the end
- Store sees that block, loads it by using the local in-memory and disk
  cache
- By the time the deletion marker filtering function is executed, the
  marker has been deleted by Compactor
- Store happily tries to load that block

The root cause is that we are doing listing & checking markers in two or
more separate steps. Since that is inevitable, we need to assume that
the marker won't disappear while the block is still there. This is the
case when everything is working normally.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-09-03 11:34:36 +03:00
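
One way to picture the assumption above is a filter that never forgets a deletion mark it has already seen. This is a minimal Go sketch, not the actual Thanos code: `stickyMarkFilter`, `hasMarker`, and the block IDs are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// stickyMarkFilter is a hypothetical filter: once a block has been seen with
// a deletion marker, it is treated as marked for the rest of the process
// lifetime, even if the marker object later disappears.
type stickyMarkFilter struct {
	mu     sync.Mutex
	marked map[string]struct{}
}

func newStickyMarkFilter() *stickyMarkFilter {
	return &stickyMarkFilter{marked: make(map[string]struct{})}
}

// filter returns the listed blocks that are safe to load. hasMarker stands in
// for a lookup of the deletion marker in object storage (an assumption).
func (f *stickyMarkFilter) filter(listed []string, hasMarker func(id string) bool) []string {
	f.mu.Lock()
	defer f.mu.Unlock()

	var loadable []string
	for _, id := range listed {
		if _, seen := f.marked[id]; seen {
			continue // Marked before: never "unmark", whatever storage says now.
		}
		if hasMarker(id) {
			f.marked[id] = struct{}{}
			continue
		}
		loadable = append(loadable, id)
	}
	return loadable
}

func main() {
	f := newStickyMarkFilter()
	markerPresent := true
	hasMarker := func(id string) bool { return id == "block-b" && markerPresent }

	fmt.Println(f.filter([]string{"block-a", "block-b"}, hasMarker))
	markerPresent = false // Simulates Compactor removing the marker mid-deletion.
	fmt.Println(f.filter([]string{"block-a", "block-b"}, hasMarker))
}
```

Both calls keep only `block-a`: the second listing still sees `block-b`, but the remembered mark keeps it from being loaded.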
Giedrius Statkevičius 0a26f5ea7c docs: volunteer as v0.40.0 shepherd
Let's do a release just before PromCon.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-09-03 11:11:06 +03:00
Giedrius Statkevičius 743e5ed125
Merge pull request #8456 from parthivrmenon/docs/clarify-compactor-wording
docs: clarify Compactor wording in compact.md
2025-09-01 14:36:41 +03:00
Parthiv Roshan Menon a2863633aa docs: restore alert rules in examples/alerts/alerts.md
MDOX regenerated the alert rules content from null back to actual rules.

Signed-off-by: Parthiv Roshan Menon <parthiv.menon@smarsh.com>
2025-09-01 11:39:39 +05:30
Parthiv Roshan Menon 060e52a3d3 docs: apply MDOX auto-formatting
- Fix email formatting in MAINTAINERS.md (remove angle brackets)
- Update HTTP config defaults in component documentation
- Apply consistent formatting and spacing across documentation files
- Remove empty alert rules in examples/alerts/alerts.md

Signed-off-by: Parthiv Roshan Menon <parthiv.menon@smarsh.com>
2025-09-01 11:28:28 +05:30
Parthiv Roshan Menon a2f617722f docs: ignore itnext.io in link validation due to TLS issues
Signed-off-by: Parthiv Roshan Menon <parthiv.menon@smarsh.com>
2025-09-01 11:28:28 +05:30
Parthiv Roshan Menon 6bee829914 docs: clarify Compactor wording in compact.md
Signed-off-by: Parthiv Roshan Menon <parthiv.menon@smarsh.com>
2025-09-01 11:28:21 +05:30
Parthiv Roshan Menon 345f3660c5 docs: fix broken links in documentation
Signed-off-by: Parthiv Roshan Menon <parthiv.menon@smarsh.com>
2025-09-01 11:28:03 +05:30
Giedrius Statkevičius c93f82d7ca
Merge pull request #8454 from thanos-io/fix_bug_8442
compact: ensure we don't mark blocks for deletion again
2025-08-29 13:31:58 +03:00
Giedrius Statkevičius ebd746c952 block: fix race
Protect reads and writes with the mutex.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-29 13:04:28 +03:00
Giedrius Statkevičius 638bf440eb compact: ensure we don't mark blocks for deletion again
Fix #8442 by not marking blocks for deletion again if they were just
deleted.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-28 14:47:16 +03:00
Saswata Mukherjee 9a10cb2fcc
*: Restore certain omitempty tags after modernize (#8452)
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-08-28 10:21:39 +01:00
Giedrius Statkevičius fcea5d2c99
Merge pull request #8443 from clwluvw/querier-rl-cleanup
query: remove unused replica labels map in querier
2025-08-28 11:10:26 +03:00
Giedrius Statkevičius 662dbb6e29
Merge pull request #8450 from saswatamcode/modernize
*: Apply modernize analyzer to the codebase
2025-08-28 11:09:45 +03:00
Saswata Mukherjee fa18f8982a
Fix lint
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-08-27 17:09:33 +01:00
Saswata Mukherjee 733df9aedb
*: Apply modernize analyzer to the codebase
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-08-27 16:33:31 +01:00
Giedrius Statkevičius dc403eaac7
Merge pull request #8449 from thanos-io/add_repro_for_8442
compact: adding repro for 8442
2025-08-27 11:31:44 +03:00
Giedrius Statkevičius df3e1963bd compact: adding repro for 8442
Adding a test that reproduces the race in 8442 - race between garbage
collection and deletion of marked blocks. Garbage collection should
never mark those blocks for deletion again if it has a consistent state.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-27 10:43:09 +03:00
Seena Fallah 74aba58f6c query: remove unused replica labels map in querier
Remove the unused local variable `rl` in the newQuerier function, which
created a map from replicaLabels but was never used.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2025-08-25 20:36:38 +02:00
Giedrius Statkevičius b9844418d6
Merge pull request #8441 from thanos-io/promu_bump_125
.promu: bump to 1.25
2025-08-25 12:52:37 +03:00
Giedrius Statkevičius 55b92a693c .promu: bump to 1.25
Relevant PR was merged so let's bump to 1.25.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-25 12:42:18 +03:00
Giedrius Statkevičius 1fe840667d
Merge pull request #8439 from thanos-io/update_go_125
Update to Go 1.25
2025-08-25 12:27:49 +03:00
Giedrius Statkevičius 90bbc8b149 *: fix linter issues
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-25 11:52:44 +03:00
Giedrius Statkevičius 2d23cd68f6 *: update to Go 1.25
I found myself wanting the newest os.Root.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-25 10:47:20 +03:00
Matej Gera 801bda7a90
Remove @matej-g from maintainers (#8437)
* Remove @matej-g from maintainers

Signed-off-by: GitHub <noreply@github.com>

* Fix failing link

Signed-off-by: GitHub <noreply@github.com>

---------

Signed-off-by: GitHub <noreply@github.com>
2025-08-21 15:33:11 +02:00
Giedrius Statkevičius e61ad9c156
Merge pull request #8427 from erikgb/dockerfile-copy-chown
refactor: chown on COPY in Dockerfile to reduce image size
2025-08-19 10:02:49 +03:00
Erik Godding Boye 8dedfd08f1 refactor: chown on COPY in Dockerfile to reduce image size
Signed-off-by: Erik Godding Boye <egboye@gmail.com>
2025-08-18 17:54:38 +02:00
Giedrius Statkevičius ca9e3637ec
Merge pull request #8426 from thanos-io/add_buffers_guide
docs: add first "internal" guide
2025-08-18 17:50:50 +03:00
Giedrius Statkevičius 6c762830d0 store/labelpb: add unmarshaling benchmarks
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-18 15:45:54 +03:00
Michael Hoffmann b05b036a2f
Merge pull request #8423 from thanos-io/mhoffmann/fix-ruler-loading-rules-with-heredoc
ruler: fix marshalling rules with heredoc
2025-08-13 14:49:10 +02:00
Michael Hoffmann 39c56224bc ruler: fix marshalling rules with heredoc
This fixes an issue with ruler where we would load

```
groups:
    - name: test.rules
      rules:
        - alert: LastUpdateTime
          annotations:
            summary: test
          expr: |2
               max without (instance) (
                  time()
                -
                  (last_updated > 0)
              )
            >
              60 * 60 * 4
          for: 5m
          labels:
            priority: "3"
```

and then remarshal it as

```
groups:
    - name: test.rules
      rules:
        - alert: LastUpdateTime
          annotations:
            summary: test
          expr: |4
               max without (instance) (
                  time()
                -
                  (last_updated > 0)
              )
            >
              60 * 60 * 4
          for: 5m
          labels:
            priority: "3"

```

which is not valid yaml anymore.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-08-13 10:47:27 +00:00
Giedrius Statkevičius eebe0507d4 docs: add buffers guide
Add an initial buffers guide - just outlining my ideas. Will try
removing gogoproto once again and write custom labels unmarshaling code.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-13 10:30:24 +03:00
Michael Hoffmann 519fda5fcd
Merge pull request #8413 from thanos-io/mhoffmann/misc-bump-prometheus-and-promql-engine
deps: bump prometheus and promql-engine
2025-08-07 13:40:22 +02:00
Michael Hoffmann 38d1bd8d20 deps: bump prometheus and promql-engine
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-08-07 10:20:51 +00:00
Giedrius Statkevičius 8d77519d03
Merge pull request #8397 from thanos-io/remove_kakkoyun
Remove @kakkoyun from maintainers
2025-08-05 11:09:28 +03:00
Giedrius Statkevičius 899abc193b
Merge pull request #8410 from thanos-io/partial_delete_marked
compact: ignore blocks with deletion mark in partial deletes
2025-08-04 17:18:43 +03:00
Giedrius Statkevičius d06dc234df compact: ignore blocks with deletion mark in partial deletes
Block deletion always starts with meta.json, so if there are multiple
compactor shards, another shard can also try to delete the same block
because it detects it as partial. Hence, ignore blocks with deletion
marks in the partial block cleaning function because such blocks are
handled in the other flow.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-04 15:52:50 +03:00
Giedrius Statkevičius 5fc08d5573
Merge pull request #8409 from thanos-io/fix_arg
compact: fix argument
2025-08-04 15:42:59 +03:00
Giedrius Statkevičius a562652784 *: fix tests
Partial block deletion is covered by unit tests so removing it from e2e
tests as it is impossible to mock the last modified date.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-04 15:27:23 +03:00
Giedrius Statkevičius e73cd5084c compact: fix argument
This should be the block's ID, not the bucket's name.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-08-04 11:57:59 +03:00
Michael Hoffmann 696193cc35
Merge pull request #8403 from thanos-io/mhoffmann/bump-promu-to-set-build-tags
build: bump promu to set build tags
2025-07-31 17:01:12 +02:00
NickGoog 8d3d636734
Don't start sidecar if REMOTE_WRITE_ENABLED env var present (use receive instead) (#8404)
Based on the design online, running both these components simultaneously
seems unintended: https://thanos.io/tip/thanos/quick-tutorial.md

Signed-off-by: NickGoog <hartunian@google.com>
2025-07-31 14:55:42 +01:00
Michael Hoffmann 227def9692 build: bump promu to set build tags
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-31 12:10:39 +00:00
NickGoog 17ca834087
Fix ObjStore config lookup for non-MinIO use (#8399)
Also removes unwanted previous quickstart.sh change from CHANGELOG.

Signed-off-by: NickGoog <66492516+NickGoog@users.noreply.github.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2025-07-31 12:59:40 +01:00
Giedrius Statkevičius 8a92550350
Merge pull request #8402 from thanos-io/trim_labelsets
query/receive: trim labelsets from String()
2025-07-31 14:00:44 +03:00
Giedrius Statkevičius 7ebe35e809 query/receive: trim labelsets from String()
We have lots of tenants & labels, so adding them to String() makes the
error messages REALLY long and unreadable. I suggest removing them
entirely from String().

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-31 13:40:04 +03:00
Giedrius Statkevičius 0acf54d9ea
Merge pull request #8334 from pedro-stanaka/feat/query-check-endpoints-on-startup
Query: wait for initial endpoint discovery before becoming ready
2025-07-31 10:57:56 +03:00
Giedrius Statkevičius 75dac8cb98 Merge remote-tracking branch 'origin/main' into feat/query-check-endpoints-on-startup
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-31 10:44:50 +03:00
Giedrius Statkevičius 1e740e385f
Merge pull request #8401 from thanos-io/rework_only_write
block/compact: rework consistency check, make writers only write
2025-07-31 10:17:52 +03:00
Giedrius Statkevičius 57031c7b18 shipper: fix tests
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-31 10:00:59 +03:00
Giedrius Statkevičius d42ecb5086 block/compact: rework consistency check, make writers only write
- It's weird that on upload errors, we try to clean everything and only
  then write again. It's an extra operation we don't need since whether
  a block exists or not hinges on the existence of meta.json. We don't
  need to delete old, same files before trying to upload them again.
- Consequently, we need to always use the _upload_ time, not block
  creation time when checking for consistency or when deleting partially
  uploaded blocks. Directories as such don't exist in object storages,
  it's a client-side "illusion", so we need to iterate through the
  partial block's directory to fetch the last modified date.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-30 14:05:11 +03:00
NickGoog 1b21d7151c
Update `thanos query` flag `--store` to `--endpoint`. (#8400)
`--store` appears out of date.

Signed-off-by: NickGoog <hartunian@google.com>
2025-07-30 09:02:43 +01:00
Giedrius Statkevičius 88d0ae8071
Merge pull request #8398 from thanos-io/cleanup_surface
block: output cleanup err
2025-07-29 15:03:36 +03:00
Giedrius Statkevičius 8459bd21d7 block: output cleanup err
Currently, if cleanup fails, we don't know why it failed. Surface the
error.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-29 11:51:36 +03:00
Kemal Akkoyun 763ba4d413
Remove @kakkoyun from maintainers
- Fix several formatting issues
- Convert old maintainers' section to a table

Signed-off-by: Kemal Akkoyun <kemal.akkoyun@datadoghq.com>
2025-07-28 20:36:24 +02:00
Giedrius Statkevičius 49a560d09d
Merge pull request #7758 from thibaultmg/life_of_a_sample_part_2
Blog article submission: Life of a Sample in Thanos Part II
2025-07-26 15:03:50 +03:00
Harry John c3d4ea7cdd
*: Update promql-engine and prometheus (#8388)
* *: Update promql-engine and prometheus

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>

* Fix data race

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>

---------

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2025-07-25 10:39:09 -07:00
Thibault Mange 98130c25d6
fix inaccuracies
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2025-07-25 14:32:51 +02:00
Thibault Mange bf8777dcc5
Update docs/blog/2023-11-20-life-of-a-sample-part-2.md
Co-authored-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2025-07-25 10:44:25 +02:00
James Geisler be2f408d9e
[tools] add flag for uploading compacted blocks to bucket upload-blocks (#8359)
* add flag for uploading compacted blocks to thanos tools

Signed-off-by: James Geisler <geislerjamesd@gmail.com>

* update changelog

Signed-off-by: James Geisler <geislerjamesd@gmail.com>

* fix doc check

Signed-off-by: James Geisler <geislerjamesd@gmail.com>

---------

Signed-off-by: James Geisler <geislerjamesd@gmail.com>
2025-07-23 17:58:01 -07:00
Pedro Tanaka f4ee5cb617
query: perform initial DNS resolution for gRPC endpoint groups
Extract resolution logic into updateResolver() and call it synchronously
in Build() to ensure endpoint groups are resolved on startup, not just
during periodic updates. This prevents potential connection delays when
the query component starts.

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2025-07-23 16:21:32 +02:00
Pedro Tanaka d7830d319f
Query: Resolve DNS before endpoint discovery on startup
Add initial DNS resolution phase before starting periodic endpoint
updates to fix race condition where Query could become ready with
zero discovered endpoints.

Previously, the first endpoint update could run before DNS resolution
completed (both use runutil.Repeat which runs immediately), causing
Query to be ready but unable to serve requests for up to 5 seconds.

Now DNS resolution happens synchronously on startup, ensuring addresses
are available when the first endpoint update runs. This eliminates the
window where Query reports ready but has no endpoints discovered.

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2025-07-23 15:07:02 +02:00
Pedro Tanaka 73db5294cf
Make Thanos Query wait for initial endpoint discovery before becoming ready
Problem:
We observed a race condition where Thanos Query components were marking themselves as ready before discovering any endpoints. This created a timing gap that could lead to query failures:

- Query pods become ready immediately upon startup
- Endpoint discovery happens asynchronously in the background
- Queries arriving between readiness and endpoint discovery fail

Solution:
This commit modifies the Thanos Query readiness behavior to wait for the initial endpoint discovery to complete before marking the pod as ready. This ensures that when a Query pod reports ready, it has already attempted to discover and connect to available endpoints.

Changes:
1. Added synchronization to EndpointSet:
   - Added firstUpdateOnce flag and firstUpdateChan channel to track first update completion
   - Added WaitForFirstUpdate() method to block until initial discovery completes

2. Modified Query startup sequence:
   - gRPC server now waits for WaitForFirstUpdate() before calling statusProber.Ready()
   - Leverages existing runutil.Repeat behavior which runs the update function immediately

3. Timeout protection:
   - Uses store response timeout or 30 seconds as default timeout
   - Logs warning if timeout occurs but still proceeds to ready state

4. Added comprehensive tests for the new WaitForFirstUpdate functionality

Impact:
- Positive: Eliminates the race condition where queries could be routed to Query pods that haven't discovered any endpoints yet
- Negative: Slightly increases startup time as pods won't be ready until endpoint discovery completes (typically <1s in normal conditions)

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2025-07-23 15:07:00 +02:00
Giedrius Statkevičius e30e831b1c
Merge pull request #8389 from thanos-io/bust_cache
block: bust cache if modified timestamp differs
2025-07-23 14:51:08 +03:00
Giedrius Statkevičius cdecd4ee3f block: use sync.Map for fetcher
f.cached can now be modified concurrently so use a sync.Map.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-23 14:06:24 +03:00
Giedrius Statkevičius 97196973f3 block: bust cache if modified timestamp differs
In the parquet converter, we mark the original meta.json file with a
flag when it gets converted so that Thanos Store wouldn't load it. For
that to work, we need to bust the local cache when that happens.

For tests, we need the updated objstore module so I am doing that as
well.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-23 13:32:41 +03:00
Giedrius Statkevičius de1a2236eb
Merge pull request #8372 from harry671003/update_grpc
*: Update GRPC
2025-07-22 16:29:23 +03:00
🌲 Harry 🌊 John 🏔 ba255aaccd Fix data race
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2025-07-21 15:18:32 -07:00
🌲 Harry 🌊 John 🏔 f1991970bf *: update GRPC
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2025-07-21 14:25:25 -07:00
Michael Hoffmann 9073c8d0c5
Merge pull request #8384 from thanos-io/r0392_merge_to_main
Merge release 0.39.2 to main
2025-07-21 09:23:29 +02:00
Michael Hoffmann ba5c91aefb Merge remote-tracking branch 'origin/main' into r0392_merge_to_main 2025-07-21 06:55:52 +00:00
Michael Hoffmann 36681afb5e
Merge pull request #8379 from thanos-io/rel_0392
Release 0.39.2
2025-07-21 08:19:58 +02:00
Michael Hoffmann 5dd0031fab Release 0.39.2
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-17 14:17:28 +00:00
Joel Verezhak c0273e1d1a fix: querier panic (#8374)
Thanos Query crashes with "concurrent map iteration and map write" panic
in distributed mode when multiple goroutines access the same `annotations.Annotations`
map concurrently.

```
panic: concurrent map iteration and map write
github.com/prometheus/prometheus/util/annotations.(*Annotations).Merge(...)
github.com/thanos-io/promql-engine/engine.(*compatibilityQuery).Exec(...)
```

Here I replaced direct access to `res.Warnings.AsErrors()` with a thread-safe copy:
```go
// Before (unsafe)
warnings = append(warnings, res.Warnings.AsErrors()...)

// After (thread-safe)
safeWarnings := annotations.New().Merge(res.Warnings)
warnings = append(warnings, safeWarnings.AsErrors()...)
```

Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
Co-authored-by: Joel Verezhak <jverezhak@open-systems.com>
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-17 14:17:28 +00:00
Michael Hoffmann e78458176e query: add custom values to prompb methods (#8375)
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-17 14:17:28 +00:00
Michael Hoffmann 20900389bb
query: add custom values to prompb methods (#8375)
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-17 13:28:16 +00:00
Joel Verezhak 6f4895633a
fix: querier panic (#8374)
Thanos Query crashes with "concurrent map iteration and map write" panic
in distributed mode when multiple goroutines access the same `annotations.Annotations`
map concurrently.

```
panic: concurrent map iteration and map write
github.com/prometheus/prometheus/util/annotations.(*Annotations).Merge(...)
github.com/thanos-io/promql-engine/engine.(*compatibilityQuery).Exec(...)
```

Here I replaced direct access to `res.Warnings.AsErrors()` with a thread-safe copy:
```go
// Before (unsafe)
warnings = append(warnings, res.Warnings.AsErrors()...)

// After (thread-safe)
safeWarnings := annotations.New().Merge(res.Warnings)
warnings = append(warnings, safeWarnings.AsErrors()...)
```

Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
Co-authored-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-17 13:28:03 +00:00
Giedrius Statkevičius 0dc0b29fc8
Merge pull request #8366 from verejoel/feature/parquet-migration-flag
feat: ignore parquet migrated blocks in store gateway
2025-07-16 22:51:23 +03:00
Giedrius Statkevičius b4951291c7 *: always enable, clean up tests+code
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-16 18:30:16 +03:00
Giedrius Statkevičius 77f12e3e97
Merge pull request #8370 from open-ch/fix/querier-relabel-config
fix: query announced endpoints match relabel-config
2025-07-15 14:03:48 +03:00
Joel Verezhak dddffa99c4
fix acceptance test
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-14 01:01:43 +02:00
Joel Verezhak f2ff735e76
return only one store
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-13 21:52:56 +02:00
Joel Verezhak dee991e0d9
acceptance test
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-13 21:42:13 +02:00
Joel Verezhak 0972c43f29
acceptance test
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-13 21:17:30 +02:00
Joel Verezhak bd88416a19
rename method
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-13 20:35:41 +02:00
Joel Verezhak 0bb3e73e9d
refactor
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-13 20:24:08 +02:00
Joel Verezhak 9f2acf9df9
lint
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-12 01:38:19 +02:00
Joel Verezhak 8b3c29acc7
fix: querier external labels match relabel config
Signed-off-by: Joel Verezhak <jverezhak@open-systems.com>
2025-07-12 01:16:20 +02:00
Joel Verezhak ecd54dafd0
feat: ignore parquet migrated blocks in store gateway
Signed-off-by: Joel Verezhak <j.verezhak@gmail.com>
2025-07-08 17:46:19 +02:00
Giedrius Statkevičius b51ef67654
Merge pull request #8364 from thanos-io/use_prom_consts
*: use prometheus consts
2025-07-08 15:33:45 +03:00
Giedrius Statkevičius c8e9c2b12c *: use prometheus consts
Use Prometheus consts instead of using our own.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-08 14:50:55 +03:00
Michael Hoffmann 0f81bb792a
query: make grpc service config for endpoint groups configurable (#8287)
We add a "service_config" field to the endpoint config file that we can
use to override the default service_config for endpoint groups. This
enables us to configure retry policy or load balancing on an endpoint level.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-08 08:54:32 +01:00
Giedrius Statkevičius ddd5ff85f4
Merge pull request #8352 from thanos-io/r0391_merge_to_main
Merge release-0.39 to main
2025-07-01 17:13:53 +03:00
Giedrius Statkevičius 49cccb4d83 CHANGELOG: fix formatting
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-01 16:34:53 +03:00
Giedrius Statkevičius d6a926e613 Merge branch 'main' into r0391_merge_to_main
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-01 16:28:16 +03:00
Giedrius Statkevičius 35309514d1
Merge pull request #8347 from Saumya40-codes/update-docs-links
docs: update changed repositories links in docs/ to correct location
2025-07-01 13:02:03 +03:00
Giedrius Statkevičius 7c5ba37e5e
Merge pull request #8349 from thanos-io/defer_qfe
qfe: defer properly
2025-07-01 11:28:58 +03:00
Giedrius Statkevičius 938c083d6b qfe: defer properly
Refactor this check into a separate function so that defer would run at
the end of it and clean up resources properly.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-07-01 10:34:37 +03:00
Saumya Shah 9847758315 update changed repositories urls in docs/
Signed-off-by: Saumya Shah <saumyabshah90@gmail.com>
2025-07-01 08:44:37 +05:30
Giedrius Statkevičius 246502a29b
Merge pull request #8338 from thanos-io/tweak_qfe
cmd/query_frontend: use original roundtripper + close immediately
2025-06-30 14:16:06 +03:00
Giedrius Statkevičius d87029eea4 cmd/query_frontend: use original roundtripper + close immediately
Let's avoid using all the Cortex roundtripper machinery by using the
downstream roundtripper directly and then closing the body immediately so
as not to allocate any memory for the body of the response.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-26 12:40:15 +03:00
Giedrius Statkevičius 3727363b49
Merge pull request #8335 from pedro-stanaka/fix/flaky-unit-test-store-proxy
fix: make TestProxyStore_SeriesSlowStores less flaky by removing timing assertions
2025-06-26 12:20:19 +03:00
Giedrius Statkevičius 37254e5779
Merge pull request #8336 from thanos-io/lazyindexheader_fix
indexheader: fix race between lazy index header creation
2025-06-26 11:19:12 +03:00
Giedrius Statkevičius 4b31bbaa6b indexheader: create lazy header in singleflight
Creation of the index header shares the underlying storage so we should
use singleflight here to only create it once.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-26 10:18:07 +03:00
Giedrius Statkevičius d6ee898a06 indexheader: produce race in test
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-26 10:01:21 +03:00
Giedrius Statkevičius 5a95d13802
Merge pull request #8333 from thanos-io/repro_8224
e2e: add repro for 8224
2025-06-26 08:01:35 +03:00
Pedro Tanaka b54d293dbd
fix: make TestProxyStore_SeriesSlowStores less flaky by removing timing assertions
The TestProxyStore_SeriesSlowStores test was failing intermittently in CI due to
strict timing assertions that were sensitive to system load and scheduling variations.

The test now focuses on functional correctness rather than precise timing,
making it more reliable in CI environments while still validating the
proxy store's timeout and partial response behavior.

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
2025-06-25 23:09:47 +02:00
Giedrius Statkevičius dfcbfe7c40 e2e: add repro for 8224
Add repro for https://github.com/thanos-io/thanos/issues/8224. Fix in
follow up PRs.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-25 18:07:48 +03:00
Giedrius Statkevičius 8b738c55b1
Merge pull request #8331 from thanos-io/merge-release-0.39-to-main-v2
Merge release 0.39 to main
2025-06-25 15:25:36 +03:00
Giedrius Statkevičius 69624ecbf1 Merge branch 'main' into merge-release-0.39-to-main-v2
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-25 14:59:35 +03:00
Saswata Mukherjee 9c955d21df
e2e: Check rule group label works (#8322)
* e2e: Check rule group label works

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fix fanout test

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
2025-06-23 10:27:07 +01:00
Paul 7de9c13e5f
add rule tsdb.enable-native-histograms flag (#8321)
Signed-off-by: Paul Hsieh <supaulkawaii@gmail.com>
2025-06-23 10:06:00 +01:00
Giedrius Statkevičius 34a98c8efb
CHANGELOG: indicate release (#8319)
Indicate that 0.39.0 is in progress.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-06-19 17:59:12 +03:00
Thibault Mange ade0aed6f4
remove internal links
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange a9ae3070b9
fix links
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange 62ec424747
format
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange a631728945
fix img size
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange 38a98c7ec0
add store limits
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange e2fb8c034b
fix typo
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:18 +02:00
Thibault Mange 72a4952f48
add part II
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
2024-09-19 13:29:13 +02:00
267 changed files with 5077 additions and 6321 deletions

@ -6,7 +6,6 @@ This is directory which stores Go modules with pinned buildable package that is
* Run `bingo get <tool>` to install <tool> that have own module file in this directory.
* For Makefile: Make sure to put `include .bingo/Variables.mk` in your Makefile, then use $(<upper case tool name>) variable where <tool> is the .bingo/<tool>.mod.
* For shell: Run `source .bingo/variables.env` to source all environment variable for each tool.
* For go: Import `.bingo/variables.go` for variable names.
* See https://github.com/bwplotka/bingo or -h on how to add, remove or change binaries dependencies.
## Requirements

@ -35,11 +35,11 @@ $(CAPNPC_GO): $(BINGO_DIR)/capnpc-go.mod
@echo "(re)installing $(GOBIN)/capnpc-go-v3.0.1-alpha.2.0.20240830165715-46ccd63a72af"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=capnpc-go.mod -o=$(GOBIN)/capnpc-go-v3.0.1-alpha.2.0.20240830165715-46ccd63a72af "capnproto.org/go/capnp/v3/capnpc-go"
-FAILLINT := $(GOBIN)/faillint-v1.13.0
+FAILLINT := $(GOBIN)/faillint-v1.15.0
$(FAILLINT): $(BINGO_DIR)/faillint.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
-@echo "(re)installing $(GOBIN)/faillint-v1.13.0"
-@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=faillint.mod -o=$(GOBIN)/faillint-v1.13.0 "github.com/fatih/faillint"
+@echo "(re)installing $(GOBIN)/faillint-v1.15.0"
+@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=faillint.mod -o=$(GOBIN)/faillint-v1.15.0 "github.com/fatih/faillint"
GOIMPORTS := $(GOBIN)/goimports-v0.23.0
$(GOIMPORTS): $(BINGO_DIR)/goimports.mod
@ -53,11 +53,11 @@ $(GOJSONTOYAML): $(BINGO_DIR)/gojsontoyaml.mod
@echo "(re)installing $(GOBIN)/gojsontoyaml-v0.1.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=gojsontoyaml.mod -o=$(GOBIN)/gojsontoyaml-v0.1.0 "github.com/brancz/gojsontoyaml"
GOLANGCI_LINT := $(GOBIN)/golangci-lint-v1.64.5
GOLANGCI_LINT := $(GOBIN)/golangci-lint-v2.4.0
$(GOLANGCI_LINT): $(BINGO_DIR)/golangci-lint.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/golangci-lint-v1.64.5"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=golangci-lint.mod -o=$(GOBIN)/golangci-lint-v1.64.5 "github.com/golangci/golangci-lint/cmd/golangci-lint"
@echo "(re)installing $(GOBIN)/golangci-lint-v2.4.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=golangci-lint.mod -o=$(GOBIN)/golangci-lint-v2.4.0 "github.com/golangci/golangci-lint/v2/cmd/golangci-lint"
GOTESPLIT := $(GOBIN)/gotesplit-v0.2.1
$(GOTESPLIT): $(BINGO_DIR)/gotesplit.mod
@ -125,11 +125,11 @@ $(PROMTOOL): $(BINGO_DIR)/promtool.mod
@echo "(re)installing $(GOBIN)/promtool-v0.47.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=promtool.mod -o=$(GOBIN)/promtool-v0.47.0 "github.com/prometheus/prometheus/cmd/promtool"
PROMU := $(GOBIN)/promu-v0.5.0
PROMU := $(GOBIN)/promu-v0.17.0
$(PROMU): $(BINGO_DIR)/promu.mod
@# Install binary/ries using Go 1.14+ build command. This is using bwplotka/bingo-controlled, separate go module with pinned dependencies.
@echo "(re)installing $(GOBIN)/promu-v0.5.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=promu.mod -o=$(GOBIN)/promu-v0.5.0 "github.com/prometheus/promu"
@echo "(re)installing $(GOBIN)/promu-v0.17.0"
@cd $(BINGO_DIR) && GOWORK=off $(GO) build -mod=mod -modfile=promu.mod -o=$(GOBIN)/promu-v0.17.0 "github.com/prometheus/promu"
PROTOC_GEN_GOGOFAST := $(GOBIN)/protoc-gen-gogofast-v1.3.2
$(PROTOC_GEN_GOGOFAST): $(BINGO_DIR)/protoc-gen-gogofast.mod

View File

@ -1,11 +1,11 @@
module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.22.0
go 1.23.0
toolchain go1.24.0
replace github.com/fatih/faillint => github.com/thanos-community/faillint v0.0.0-20250217160734-830c2205d383
require github.com/fatih/faillint v1.13.0
require github.com/fatih/faillint v1.15.0
require golang.org/x/sync v0.11.0 // indirect
require golang.org/x/sync v0.16.0 // indirect

View File

@ -52,6 +52,8 @@ golang.org/x/sync v0.3.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y=
golang.org/x/sync v0.6.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.11.0 h1:GGz8+XQP4FvTTrjZPzNKTMFtSXH80RAzG+5ghFPgK9w=
golang.org/x/sync v0.11.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.16.0 h1:ycBJEhp9p4vXvUZNszeOq0kGTPghopOL8q0fq3vstxw=
golang.org/x/sync v0.16.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=

View File

@ -1,7 +1,5 @@
module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.23.0
go 1.25.0
toolchain go1.24.0
require github.com/golangci/golangci-lint v1.64.5 // cmd/golangci-lint
require github.com/golangci/golangci-lint/v2 v2.4.0 // cmd/golangci-lint

File diff suppressed because it is too large

View File

@ -1,5 +1,7 @@
module _ // Auto generated by https://github.com/bwplotka/bingo. DO NOT EDIT
go 1.14
go 1.21
require github.com/prometheus/promu v0.5.0
toolchain go1.23.8
require github.com/prometheus/promu v0.17.0

File diff suppressed because it is too large

View File

@ -14,13 +14,13 @@ BINGO="${GOBIN}/bingo-v0.9.0"
CAPNPC_GO="${GOBIN}/capnpc-go-v3.0.1-alpha.2.0.20240830165715-46ccd63a72af"
FAILLINT="${GOBIN}/faillint-v1.13.0"
FAILLINT="${GOBIN}/faillint-v1.15.0"
GOIMPORTS="${GOBIN}/goimports-v0.23.0"
GOJSONTOYAML="${GOBIN}/gojsontoyaml-v0.1.0"
GOLANGCI_LINT="${GOBIN}/golangci-lint-v1.64.5"
GOLANGCI_LINT="${GOBIN}/golangci-lint-v2.4.0"
GOTESPLIT="${GOBIN}/gotesplit-v0.2.1"
@ -44,7 +44,7 @@ PROMETHEUS="${GOBIN}/prometheus-v0.54.1"
PROMTOOL="${GOBIN}/promtool-v0.47.0"
PROMU="${GOBIN}/promu-v0.5.0"
PROMU="${GOBIN}/promu-v0.17.0"
PROTOC_GEN_GOGOFAST="${GOBIN}/protoc-gen-gogofast-v1.3.2"

View File

@ -8,10 +8,10 @@ orbs:
executors:
golang:
docker:
- image: cimg/go:1.24.0-node
- image: cimg/go:1.25.0-node
golang-test:
docker:
- image: cimg/go:1.24.0-node
- image: cimg/go:1.25.0-node
- image: quay.io/thanos/docker-swift-onlyone-authv2-keystone:v0.1
jobs:

View File

@ -23,7 +23,7 @@ jobs:
- name: Install Go
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.24.x
go-version: 1.25.x
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:

View File

@ -34,7 +34,7 @@ jobs:
- name: Install Go.
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.24.x
go-version: 1.25.x
- name: Install bingo modules
run: make install-tool-deps
@ -55,7 +55,7 @@ jobs:
- name: Install Go
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.24.x
go-version: 1.25.x
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:
@ -82,7 +82,7 @@ jobs:
- name: Install Go
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.24.x
go-version: 1.25.x
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:
@ -109,7 +109,7 @@ jobs:
- name: Install Go
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.24.x
go-version: 1.25.x
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:
@ -158,7 +158,7 @@ jobs:
- name: Install Go.
uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.24.x
go-version: 1.25.x
- uses: actions/cache@0c907a75c2c80ebcb7f088228285e798b750cf8f # v4.2.1
with:

View File

@ -1,81 +1,68 @@
# This file contains all available configuration options
# with their default values.
# options for analysis running
version: "2"
run:
# timeout for analysis, e.g. 30s, 5m, default is 1m
timeout: 5m
# exit code when at least one issue was found, default is 1
build-tags:
- slicelabels
issues-exit-code: 1
# output configuration options
output:
# The formats used to render issues.
formats:
- format: colored-line-number
path: stdout
# print lines of code with issue, default is true
print-issued-lines: true
# print linter name in the end of issue text, default is true
print-linter-name: true
linters:
enable:
# Sorted alphabetically.
- errcheck
- goconst
- godot
- misspell
- promlinter
- unparam
settings:
errcheck:
exclude-functions:
- (github.com/go-kit/log.Logger).Log
- fmt.Fprintln
- fmt.Fprint
goconst:
min-occurrences: 5
misspell:
locale: US
exclusions:
generated: lax
presets:
- comments
- common-false-positives
- legacy
- std-error-handling
rules:
- linters:
- promlinter
path: _test\.go
- linters:
- unused
text: SourceStoreAPI.implementsStoreAPI
- linters:
- unused
text: SourceStoreAPI.producesBlocks
- linters:
- unused
text: Source.producesBlocks
- linters:
- unused
text: newMockAlertmanager
- linters:
- unused
text: ruleAndAssert
paths:
- vendor
- internal/cortex
- .bingo
- third_party$
- builtin$
- examples$
formatters:
enable:
- gofmt
- goimports
- gosimple
- govet
- ineffassign
- misspell
- staticcheck
- typecheck
- unparam
- unused
- promlinter
linters-settings:
errcheck:
# List of functions to exclude from checking, where each entry is a single function to exclude.
exclude-functions:
- (github.com/go-kit/log.Logger).Log
- fmt.Fprintln
- fmt.Fprint
misspell:
locale: US
goconst:
min-occurrences: 5
issues:
exclude-rules:
# We don't check metrics naming in the tests.
- path: _test\.go
linters:
- promlinter
# These are not being checked since these methods exist
# so that no one else could implement them.
- linters:
- unused
text: "SourceStoreAPI.implementsStoreAPI"
- linters:
- unused
text: "SourceStoreAPI.producesBlocks"
- linters:
- unused
text: "Source.producesBlocks"
- linters:
- unused
text: "newMockAlertmanager"
- linters:
- unused
text: "ruleAndAssert"
# Which dirs to exclude: issues from them won't be reported.
exclude-dirs:
- vendor
- internal/cortex
exclusions:
generated: lax
paths:
- vendor
- internal/cortex
- .bingo
- third_party$
- builtin$
- examples$

View File

@ -50,3 +50,6 @@ validators:
# Frequent DNS issues.
- regex: 'build\.thebeat\.co'
type: 'ignore'
# TLS certificate issues
- regex: 'itnext\.io'
type: 'ignore'

View File

@ -1,12 +1,16 @@
go:
version: 1.23
version: 1.25
repository:
path: github.com/thanos-io/thanos
build:
binaries:
- name: thanos
path: ./cmd/thanos
flags: -a -tags netgo
flags: -a
tags:
all:
- netgo
- slicelabels
ldflags: |
-X github.com/prometheus/common/version.Version={{.Version}}
-X github.com/prometheus/common/version.Revision={{.Revision}}
@ -16,8 +20,6 @@ build:
crossbuild:
platforms:
- linux/amd64
- darwin
- linux/arm64
- windows/amd64
- freebsd/amd64
- linux/ppc64le

View File

@ -8,6 +8,30 @@ NOTE: As semantic versioning states all 0.y.z releases can contain breaking chan
We use *breaking :warning:* to mark changes that are not backward compatible (relates only to v0.y.z releases.)
## Unreleased
### Fixed
- [#8334](https://github.com/thanos-io/thanos/pull/8334) Query: wait for initial endpoint discovery before becoming ready
### Added
- [#8366](https://github.com/thanos-io/thanos/pull/8366) Store: optionally ignore Parquet migrated blocks
- [#8359](https://github.com/thanos-io/thanos/pull/8359) Tools: add `--shipper.upload-compacted` flag for uploading compacted blocks to bucket upload-blocks
### Changed
- [#8370](https://github.com/thanos-io/thanos/pull/8370) Query: announced labelset now reflects relabel-config
### Removed
### [v0.39.2](https://github.com/thanos-io/thanos/tree/release-0.39) - 2025 07 17
### Fixed
- [#8374](https://github.com/thanos-io/thanos/pull/8374) Query: fix panic when concurrently accessing annotations map
- [#8375](https://github.com/thanos-io/thanos/pull/8375) Query: fix native histogram buckets in distributed queries
### [v0.39.1](https://github.com/thanos-io/thanos/tree/release-0.39) - 2025 07 01
Fixes a memory leak issue on query-frontend. The bug only affects v0.39.0.
@ -17,7 +41,7 @@ Fixes a memory leak issue on query-frontend. The bug only affects v0.39.0.
- [#8349](https://github.com/thanos-io/thanos/pull/8349) Query-Frontend: properly clean up resources
- [#8338](https://github.com/thanos-io/thanos/pull/8338) Query-Frontend: use original roundtripper + close immediately
### [v0.39.0](https://github.com/thanos-io/thanos/tree/release-0.39) - 2025 06 25
## [v0.39.0](https://github.com/thanos-io/thanos/tree/release-0.39) - 2025 06 25
In short: there are a bunch of fixes and small improvements. The shining items in this release are memory usage improvements in Thanos Query and shuffle sharding support in Thanos Receiver. Information about shuffle sharding support is available in the documentation. Thank you to all contributors!
@ -42,7 +66,6 @@ In short: there are a bunch of fixes and small improvements. The shining items i
### Fixed
- [#8199](https://github.com/thanos-io/thanos/pull/8199) Query: handle panics or nil pointer dereference in querier gracefully when query analyze returns nil
- [#8211](https://github.com/thanos-io/thanos/pull/8211) Query: fix panic on nested partial response in distributed instant query
- [#8216](https://github.com/thanos-io/thanos/pull/8216) Query/Receive: fix iter race between `next()` and `stop()` introduced in https://github.com/thanos-io/thanos/pull/7821.
- [#8212](https://github.com/thanos-io/thanos/pull/8212) Receive: Ensure forward/replication metrics are incremented in err cases
@ -604,7 +627,7 @@ NOTE: Querier's `query.promql-engine` flag enabling new PromQL engine is now unh
- [#5785](https://github.com/thanos-io/thanos/pull/5785) Query: `thanos_store_nodes_grpc_connections` now trimms `external_labels` label name longer than 1000 character. It also allows customizations in what labels to preserve using `query.conn-metric.label` flag.
- [#5542](https://github.com/thanos-io/thanos/pull/5542) Mixin: Added query concurrency panel to Querier dashboard.
- [#5846](https://github.com/thanos-io/thanos/pull/5846) Query Frontend: vertical query sharding supports subqueries.
- [#5593](https://github.com/thanos-io/thanos/pull/5593) Cache: switch Redis client to [Rueidis](https://github.com/rueian/rueidis). Rueidis is [faster](https://github.com/rueian/rueidis#benchmark-comparison-with-go-redis-v9) and provides [client-side caching](https://redis.io/docs/manual/client-side-caching/). It is highly recommended to use it so that repeated requests for the same key would not be needed.
- [#5593](https://github.com/thanos-io/thanos/pull/5593) Cache: switch Redis client to [Rueidis](https://github.com/rueian/rueidis). Rueidis is [faster](https://github.com/rueian/rueidis#benchmark-comparison-with-go-redis-v9) and provides [client-side caching](https://redis.io/docs/develop/use/client-side-caching/). It is highly recommended to use it so that repeated requests for the same key would not be needed.
- [#5896](https://github.com/thanos-io/thanos/pull/5896) *: Upgrade Prometheus to v0.40.7 without implementing native histogram support. *Querying native histograms will fail with `Error executing query: invalid chunk encoding "<unknown>"` and native histograms in write requests are ignored.*
- [#5909](https://github.com/thanos-io/thanos/pull/5909) Receive: Compact tenant head after no appends have happened for 1.5 `tsdb.max-block-size`.
- [#5838](https://github.com/thanos-io/thanos/pull/5838) Mixin: Added data touched type to Store dashboard.

View File

@ -3,13 +3,13 @@ ARG BASE_DOCKER_SHA="14d68ca3d69fceaa6224250c83d81d935c053fb13594c811038c4611945
FROM quay.io/prometheus/busybox@sha256:${BASE_DOCKER_SHA}
LABEL maintainer="The Thanos Authors"
COPY /thanos_tmp_for_docker /bin/thanos
RUN adduser \
-D `#Dont assign a password` \
-H `#Dont create home directory` \
-u 1001 `#User id`\
thanos && \
chown thanos /bin/thanos
thanos
COPY --chown=thanos /thanos_tmp_for_docker /bin/thanos
USER 1001
ENTRYPOINT [ "/bin/thanos" ]

View File

@ -1,14 +1,14 @@
# Taking a non-alpine image for e2e tests so that cgo can be enabled for the race detector.
FROM golang:1.24.0 as builder
FROM golang:1.25.0 as builder
WORKDIR $GOPATH/src/github.com/thanos-io/thanos
COPY . $GOPATH/src/github.com/thanos-io/thanos
RUN CGO_ENABLED=1 go build -o $GOBIN/thanos -race ./cmd/thanos
RUN CGO_ENABLED=1 go build -tags slicelabels -o $GOBIN/thanos -race ./cmd/thanos
# -----------------------------------------------------------------------------
FROM golang:1.24.0
FROM golang:1.25.0
LABEL maintainer="The Thanos Authors"
COPY --from=builder $GOBIN/thanos /bin/thanos

View File

@ -5,11 +5,9 @@
| Bartłomiej Płotka | bwplotka@gmail.com | `@bwplotka` | [@bwplotka](https://github.com/bwplotka) | Google |
| Frederic Branczyk | fbranczyk@gmail.com | `@brancz` | [@brancz](https://github.com/brancz) | Polar Signals |
| Giedrius Statkevičius | giedriuswork@gmail.com | `@Giedrius Statkevičius` | [@GiedriusS](https://github.com/GiedriusS) | Vinted |
| Kemal Akkoyun | kakkoyun@gmail.com | `@kakkoyun` | [@kakkoyun](https://github.com/kakkoyun) | Independent |
| Lucas Servén Marín | lserven@gmail.com | `@squat` | [@squat](https://github.com/squat) | Red Hat |
| Prem Saraswat | prmsrswt@gmail.com | `@Prem Saraswat` | [@onprem](https://github.com/onprem) | Red Hat |
| Ben Ye | yb532204897@gmail.com | `@yeya24` | [@yeya24](https://github.com/yeya24) | Amazon Web Services |
| Matej Gera | matejgera@gmail.com | `@Matej Gera` | [@matej-g](https://github.com/matej-g) | Coralogix |
| Filip Petkovski | filip.petkovsky@gmail.com | `@Filip Petkovski` | [@fpetkovski](https://github.com/fpetkovski) | Shopify |
| Saswata Mukherjee | saswata.mukhe@gmail.com | `@saswatamcode` | [@saswatamcode](https://github.com/saswatamcode) | Red Hat |
| Michael Hoffmann | mhoffm@posteo.de | `@Michael Hoffmann` | [@MichaHoffmann](https://github.com/MichaHoffmann) | Cloudflare |
@ -62,7 +60,7 @@ This helps to also estimate how long it can potentially take to review the PR or
#### Help wanted
`help wanted ` label should be present if the issue is not really assigned (or the PR has to be reviewed) and we are looking for the volunteers (:
`help wanted` label should be present if the issue is not really assigned (or the PR has to be reviewed) and we are looking for the volunteers (:
#### Good first issue
@ -104,8 +102,15 @@ In time we plan to set up maintainers team that will be organization independent
## Initial authors
Fabian Reinartz @fabxc and Bartłomiej Płotka @bwplotka
Fabian Reinartz [@fabxc](https://github.com/fabxc) and Bartłomiej Płotka [@bwplotka](https://github.com/bwplotka)
## Previous Maintainers
## Emeritus Maintainers
Dominic Green, Povilas Versockas, Marco Pracucci, Matthias Loibl
| Name | GitHub |
|-------------------|----------------------------------------------|
| Dominic Green | [@domgreen](https://github.com/domgreen) |
| Povilas Versockas | [@povilasv](https://github.com/povilasv) |
| Marco Pracucci | [@pracucci](https://github.com/pracucci) |
| Matthias Loibl | [@metalmatze](https://github.com/metalmatze) |
| Kemal Akkoyun | [@kakkoyun](https://github.com/kakkoyun) |
| Matej Gera | [@matej-g](https://github.com/matej-g) |

View File

@ -149,7 +149,7 @@ react-app-start: $(REACT_APP_NODE_MODULES_PATH)
build: ## Builds Thanos binary using `promu`.
build: check-git deps $(PROMU)
@echo ">> building Thanos binary in $(PREFIX)"
@$(PROMU) build --prefix $(PREFIX)
@$(PROMU) build -v --prefix $(PREFIX)
GIT_BRANCH=$(shell $(GIT) rev-parse --abbrev-ref HEAD)
.PHONY: crossbuild
@ -319,7 +319,7 @@ test: export THANOS_TEST_ALERTMANAGER_PATH= $(ALERTMANAGER)
test: check-git install-tool-deps
@echo ">> install thanos GOOPTS=${GOOPTS}"
@echo ">> running unit tests (without /test/e2e). Do export THANOS_TEST_OBJSTORE_SKIP=GCS,S3,AZURE,SWIFT,COS,ALIYUNOSS,BOS,OCI,OBS if you want to skip e2e tests against all real store buckets. Current value: ${THANOS_TEST_OBJSTORE_SKIP}"
@go test -race -timeout 15m $(shell go list ./... | grep -v /vendor/ | grep -v /test/e2e);
@go test -tags slicelabels -race -timeout 15m $(shell go list ./... | grep -v /vendor/ | grep -v /test/e2e);
.PHONY: test-local
test-local: ## Runs test excluding tests for ALL object storage integrations.
@ -341,9 +341,9 @@ test-e2e: docker-e2e $(GOTESPLIT)
# * If you want to limit CPU time available in e2e tests then pass E2E_DOCKER_CPUS environment variable. For example, E2E_DOCKER_CPUS=0.05 limits CPU time available
# to spawned Docker containers to 0.05 cores.
@if [ -n "$(SINGLE_E2E_TEST)" ]; then \
$(GOTESPLIT) -total ${GH_PARALLEL} -index ${GH_INDEX} ./test/e2e -- -run $(SINGLE_E2E_TEST) ${GOTEST_OPTS}; \
$(GOTESPLIT) -total ${GH_PARALLEL} -index ${GH_INDEX} ./test/e2e -- -tags slicelabels -run $(SINGLE_E2E_TEST) ${GOTEST_OPTS}; \
else \
$(GOTESPLIT) -total ${GH_PARALLEL} -index ${GH_INDEX} ./test/e2e/... -- ${GOTEST_OPTS}; \
$(GOTESPLIT) -total ${GH_PARALLEL} -index ${GH_INDEX} ./test/e2e/... -- -tags slicelabels ${GOTEST_OPTS}; \
fi
.PHONY: test-e2e-local
@ -418,7 +418,7 @@ github.com/prometheus/prometheus/promql/parser.{ParseExpr,ParseMetricSelector}=g
io/ioutil.{Discard,NopCloser,ReadAll,ReadDir,ReadFile,TempDir,TempFile,Writefile}" $(shell go list ./... | grep -v "internal/cortex")
@$(FAILLINT) -paths "fmt.{Print,Println,Sprint}" -ignore-tests ./...
@echo ">> linting all of the Go files GOGC=${GOGC}"
@$(GOLANGCI_LINT) run
@$(GOLANGCI_LINT) run --build-tags=slicelabels
@echo ">> ensuring Copyright headers"
@go run ./scripts/copyright
@echo ">> ensuring generated proto files are up to date"

View File

@ -1 +1 @@
0.39.1
0.40.0-dev

View File

@ -403,6 +403,7 @@ func runCompact(
insBkt,
conf.compactionConcurrency,
conf.skipBlockWithOutOfOrderChunks,
blocksCleaner,
)
if err != nil {
return errors.Wrap(err, "create bucket compactor")
@ -438,14 +439,7 @@ func runCompact(
cleanMtx.Lock()
defer cleanMtx.Unlock()
if err := sy.SyncMetas(ctx); err != nil {
return errors.Wrap(err, "syncing metas")
}
compact.BestEffortCleanAbortedPartialUploads(ctx, logger, sy.Partial(), insBkt, compactMetrics.partialUploadDeleteAttempts, compactMetrics.blocksCleaned, compactMetrics.blockCleanupFailures)
if err := blocksCleaner.DeleteMarkedBlocks(ctx); err != nil {
return errors.Wrap(err, "cleaning marked blocks")
}
compact.BestEffortCleanAbortedPartialUploads(ctx, logger, sy.Partial(), insBkt, compactMetrics.partialUploadDeleteAttempts, compactMetrics.blocksCleaned, compactMetrics.blockCleanupFailures, ignoreDeletionMarkFilter.DeletionMarkBlocks())
compactMetrics.cleanups.Inc()
return nil

View File

@ -253,10 +253,8 @@ func downsampleBucket(
defer workerCancel()
level.Debug(logger).Log("msg", "downsampling bucket", "concurrency", downsampleConcurrency)
for i := 0; i < downsampleConcurrency; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for range downsampleConcurrency {
wg.Go(func() {
for m := range metaCh {
resolution := downsample.ResLevel1
errMsg := "downsampling to 5 min"
@ -271,7 +269,7 @@ func downsampleBucket(
}
metrics.downsamples.WithLabelValues(m.Thanos.ResolutionString()).Inc()
}
}()
})
}
// Workers scheduled, distribute blocks.
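
Reviewer note: the hunk above swaps the manual `wg.Add(1)` / `go func()` / `defer wg.Done()` triple for `wg.Go`, which exists on `sync.WaitGroup` from Go 1.25 onwards. A minimal sketch of the same worker-pool shape follows, written in the older spelling so it also compiles on earlier toolchains. `processAll` and its parameters are hypothetical names for illustration, not code from this repository.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll fans items out to a fixed number of workers and returns how many
// items were handled. Illustrative only; not taken from the Thanos code base.
func processAll(items []int, workers int) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		handled int
	)
	ch := make(chan int)

	for i := 0; i < workers; i++ {
		// On Go 1.25+ the next three statements can be written as
		// wg.Go(func() { ... }), which is what the diff does.
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range ch {
				mu.Lock()
				handled++
				mu.Unlock()
			}
		}()
	}

	// Workers scheduled, distribute items, then wait for the pool to drain.
	for _, it := range items {
		ch <- it
	}
	close(ch)
	wg.Wait()
	return handled
}

func main() {
	fmt.Println(processAll([]int{1, 2, 3, 4, 5}, 3))
}
```

Behaviour should be unchanged by the rewrite: every worker is still registered before it starts, and `wg.Wait()` still returns only after all workers have drained the channel.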

View File

@ -42,9 +42,10 @@ type fileContent interface {
}
type endpointSettings struct {
Strict bool `yaml:"strict"`
Group bool `yaml:"group"`
Address string `yaml:"address"`
Strict bool `yaml:"strict"`
Group bool `yaml:"group"`
Address string `yaml:"address"`
ServiceConfig string `yaml:"service_config"`
}
type EndpointConfig struct {
@ -115,6 +116,9 @@ func validateEndpointConfig(cfg EndpointConfig) error {
if dns.IsDynamicNode(ecfg.Address) && ecfg.Strict {
return errors.Newf("%s is a dynamically specified endpoint i.e. it uses SD and that is not permitted under strict mode.", ecfg.Address)
}
if !ecfg.Group && len(ecfg.ServiceConfig) != 0 {
return errors.Newf("%s service_config is only valid for endpoint groups.", ecfg.Address)
}
}
return nil
}
@ -257,6 +261,33 @@ func setupEndpointSet(
}
legacyFileSDCache := cache.New()
// Perform initial DNS resolution before starting periodic updates.
// This ensures that DNS providers have addresses when the first endpoint update runs.
{
resolveCtx, resolveCancel := context.WithTimeout(context.Background(), dnsSDInterval)
defer resolveCancel()
level.Info(logger).Log("msg", "performing initial DNS resolution for endpoints")
endpointConfig := configProvider.config()
addresses := make([]string, 0, len(endpointConfig.Endpoints))
for _, ecfg := range endpointConfig.Endpoints {
// Only resolve non-group dynamic endpoints here.
// Group endpoints are resolved by the gRPC resolver in its Build() method.
if addr := ecfg.Address; dns.IsDynamicNode(addr) && !ecfg.Group {
addresses = append(addresses, addr)
}
}
// Note: legacyFileSDCache will be empty at this point since file SD hasn't started yet
if len(addresses) > 0 {
if err := dnsEndpointProvider.Resolve(resolveCtx, addresses, true); err != nil {
level.Error(logger).Log("msg", "initial DNS resolution failed", "err", err)
}
}
level.Info(logger).Log("msg", "initial DNS resolution completed")
}
ctx, cancel := context.WithCancel(context.Background())
if fileSD != nil {
@ -321,7 +352,7 @@ func setupEndpointSet(
for _, ecfg := range endpointConfig.Endpoints {
strict, group, addr := ecfg.Strict, ecfg.Group, ecfg.Address
if group {
specs = append(specs, query.NewGRPCEndpointSpec(fmt.Sprintf("thanos:///%s", addr), strict, append(dialOpts, extgrpc.EndpointGroupGRPCOpts()...)...))
specs = append(specs, query.NewGRPCEndpointSpec(fmt.Sprintf("thanos:///%s", addr), strict, append(dialOpts, extgrpc.EndpointGroupGRPCOpts(ecfg.ServiceConfig)...)...))
} else if !dns.IsDynamicNode(addr) {
specs = append(specs, query.NewGRPCEndpointSpec(addr, strict, dialOpts...))
}
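
Reviewer note: the new validation rule in this file rejects `service_config` on anything that is not an endpoint group. The sketch below isolates just that rule, assuming a plain struct with the same field names as the diff. The `dns.IsDynamicNode` strict-mode check is left out because it depends on Thanos internals, and the addresses and the JSON value are made-up sample data.

```go
package main

import "fmt"

// endpointSettings mirrors the fields shown in the diff; YAML tags are
// omitted to keep the sketch dependency-free.
type endpointSettings struct {
	Strict        bool
	Group         bool
	Address       string
	ServiceConfig string
}

// validateServiceConfig applies only the rule added in this change:
// a service config may be set on endpoint groups and nowhere else.
func validateServiceConfig(e endpointSettings) error {
	if !e.Group && len(e.ServiceConfig) != 0 {
		return fmt.Errorf("%s service_config is only valid for endpoint groups", e.Address)
	}
	return nil
}

func main() {
	// Sample gRPC service config, used purely as an illustration.
	sc := `{"loadBalancingConfig":[{"round_robin":{}}]}`

	group := endpointSettings{Group: true, Address: "dns+stores.example:10901", ServiceConfig: sc}
	single := endpointSettings{Address: "store-0.example:10901", ServiceConfig: sc}

	fmt.Println(validateServiceConfig(group))
	fmt.Println(validateServiceConfig(single))
}
```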

View File

@ -73,7 +73,7 @@ func main() {
// Running in container with limits but with empty/wrong value of GOMAXPROCS env var could lead to throttling by cpu
// maxprocs will automate adjustment by using cgroups info about cpu limit if it set as value for runtime.GOMAXPROCS.
undo, err := maxprocs.Set(maxprocs.Logger(func(template string, args ...interface{}) {
undo, err := maxprocs.Set(maxprocs.Logger(func(template string, args ...any) {
level.Debug(logger).Log("msg", fmt.Sprintf(template, args...))
}))
defer undo()

View File

@ -33,6 +33,11 @@ type erroringBucket struct {
bkt objstore.InstrumentedBucket
}
// Provider returns the provider of the bucket.
func (b *erroringBucket) Provider() objstore.ObjProvider {
return b.bkt.Provider()
}
func (b *erroringBucket) Close() error {
return b.bkt.Close()
}
@ -91,8 +96,8 @@ func (b *erroringBucket) Attributes(ctx context.Context, name string) (objstore.
// Upload the contents of the reader as an object into the bucket.
// Upload should be idempotent.
func (b *erroringBucket) Upload(ctx context.Context, name string, r io.Reader) error {
return b.bkt.Upload(ctx, name, r)
func (b *erroringBucket) Upload(ctx context.Context, name string, r io.Reader, opts ...objstore.ObjectUploadOption) error {
return b.bkt.Upload(ctx, name, r, opts...)
}
// Delete removes the object with the given name.
@ -134,9 +139,9 @@ func TestRegression4960_Deadlock(t *testing.T) {
id, err = e2eutil.CreateBlock(
ctx,
dir,
[]labels.Labels{{{Name: "a", Value: "1"}}},
[]labels.Labels{labels.FromStrings("a", "1")},
1, 0, downsample.ResLevel1DownsampleRange+1, // Pass the minimum ResLevel1DownsampleRange check.
labels.Labels{{Name: "e1", Value: "1"}},
labels.FromStrings("e1", "1"),
downsample.ResLevel0, metadata.NoneFunc, nil)
testutil.Ok(t, err)
testutil.Ok(t, block.Upload(ctx, logger, bkt, path.Join(dir, id.String()), metadata.NoneFunc))
@ -145,9 +150,9 @@ func TestRegression4960_Deadlock(t *testing.T) {
id2, err = e2eutil.CreateBlock(
ctx,
dir,
[]labels.Labels{{{Name: "a", Value: "2"}}},
[]labels.Labels{labels.FromStrings("a", "2")},
1, 0, downsample.ResLevel1DownsampleRange+1, // Pass the minimum ResLevel1DownsampleRange check.
labels.Labels{{Name: "e1", Value: "2"}},
labels.FromStrings("e1", "2"),
downsample.ResLevel0, metadata.NoneFunc, nil)
testutil.Ok(t, err)
testutil.Ok(t, block.Upload(ctx, logger, bkt, path.Join(dir, id2.String()), metadata.NoneFunc))
@ -156,9 +161,9 @@ func TestRegression4960_Deadlock(t *testing.T) {
id3, err = e2eutil.CreateBlock(
ctx,
dir,
[]labels.Labels{{{Name: "a", Value: "2"}}},
[]labels.Labels{labels.FromStrings("a", "2")},
1, 0, downsample.ResLevel1DownsampleRange+1, // Pass the minimum ResLevel1DownsampleRange check.
labels.Labels{{Name: "e1", Value: "2"}},
labels.FromStrings("e1", "2"),
downsample.ResLevel0, metadata.NoneFunc, nil)
testutil.Ok(t, err)
testutil.Ok(t, block.Upload(ctx, logger, bkt, path.Join(dir, id3.String()), metadata.NoneFunc))
@ -196,9 +201,9 @@ func TestCleanupDownsampleCacheFolder(t *testing.T) {
id, err = e2eutil.CreateBlock(
ctx,
dir,
[]labels.Labels{{{Name: "a", Value: "1"}}},
[]labels.Labels{labels.FromStrings("a", "1")},
1, 0, downsample.ResLevel1DownsampleRange+1, // Pass the minimum ResLevel1DownsampleRange check.
labels.Labels{{Name: "e1", Value: "1"}},
labels.FromStrings("e1", "1"),
downsample.ResLevel0, metadata.NoneFunc, nil)
testutil.Ok(t, err)
testutil.Ok(t, block.Upload(ctx, logger, bkt, path.Join(dir, id.String()), metadata.NoneFunc))

View File

@ -4,6 +4,7 @@
package main
import (
"context"
"fmt"
"net/http"
"strings"
@ -553,6 +554,7 @@ func runQuery(
tenantCertField,
enforceTenancy,
tenantLabel,
tsdbSelector,
)
api.Register(router.WithPrefix("/api/v1"), tracer, logger, ins, logMiddleware)
@ -622,9 +624,25 @@ func runQuery(
)
g.Add(func() error {
// Wait for initial endpoint update before marking as ready
// Use store response timeout as timeout, if not set, use 30 seconds as default
timeout := storeResponseTimeout
if timeout == 0 {
timeout = 30 * time.Second
}
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
level.Info(logger).Log("msg", "waiting for initial endpoint discovery before marking gRPC as ready", "timeout", timeout)
if err := endpointSet.WaitForFirstUpdate(ctx); err != nil {
level.Warn(logger).Log("msg", "timeout waiting for first endpoint update before marking gRPC as ready", "err", err, "timeout", timeout)
} else {
level.Info(logger).Log("msg", "initial endpoint discovery completed, marking gRPC as ready")
}
statusProber.Ready()
return s.ListenAndServe()
}, func(error) {
}, func(err error) {
statusProber.NotReady(err)
s.Shutdown(err)
endpointSet.Close()
@ -660,10 +678,7 @@ func LookbackDeltaFactory(
}
return func(maxSourceResolutionMillis int64) time.Duration {
for i := len(resolutions) - 1; i >= 1; i-- {
left := resolutions[i-1]
if resolutions[i-1] < ld {
left = ld
}
left := max(resolutions[i-1], ld)
if left < maxSourceResolutionMillis {
return lds[i]
}
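
Reviewer note: the `LookbackDeltaFactory` hunk replaces a four-line clamp with the `max` builtin (available since Go 1.21). The sketch below puts both forms side by side to show they agree; `leftOld` and `leftNew` are illustrative names, not functions in the repository.

```go
package main

import "fmt"

// leftOld is the pre-change form: start from res and raise it to ld if needed.
func leftOld(res, ld int64) int64 {
	left := res
	if res < ld {
		left = ld
	}
	return left
}

// leftNew is the post-change form using the max builtin (Go 1.21+).
func leftNew(res, ld int64) int64 {
	return max(res, ld)
}

func main() {
	for _, c := range [][2]int64{{0, 300000}, {300000, 300000}, {3600000, 300000}} {
		fmt.Println(leftOld(c[0], c[1]) == leftNew(c[0], c[1]))
	}
}
```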

View File

@ -5,6 +5,7 @@ package main
import (
"context"
"maps"
"net"
"net/http"
"time"
@ -74,10 +75,10 @@ func registerQueryFrontend(app *extkingpin.App) {
// Query range tripperware flags.
cmd.Flag("query-range.align-range-with-step", "Mutate incoming queries to align their start and end with their step for better cache-ability. Note: Grafana dashboards do that by default.").
Default("true").BoolVar(&cfg.QueryRangeConfig.AlignRangeWithStep)
Default("true").BoolVar(&cfg.AlignRangeWithStep)
cmd.Flag("query-range.request-downsampled", "Make additional query for downsampled data in case of empty or incomplete response to range request.").
Default("true").BoolVar(&cfg.QueryRangeConfig.RequestDownsampled)
Default("true").BoolVar(&cfg.RequestDownsampled)
cmd.Flag("query-range.split-interval", "Split query range requests by an interval and execute in parallel, it should be greater than 0 when query-range.response-cache-config is configured.").
Default("24h").DurationVar(&cfg.QueryRangeConfig.SplitQueriesByInterval)
@ -85,13 +86,13 @@ func registerQueryFrontend(app *extkingpin.App) {
cmd.Flag("query-range.min-split-interval", "Split query range requests above this interval in query-range.horizontal-shards requests of equal range. "+
"Using this parameter is not allowed with query-range.split-interval. "+
"One should also set query-range.split-min-horizontal-shards to a value greater than 1 to enable splitting.").
Default("0").DurationVar(&cfg.QueryRangeConfig.MinQuerySplitInterval)
Default("0").DurationVar(&cfg.MinQuerySplitInterval)
cmd.Flag("query-range.max-split-interval", "Split query range below this interval in query-range.horizontal-shards. Queries with a range longer than this value will be split in multiple requests of this length.").
Default("0").DurationVar(&cfg.QueryRangeConfig.MaxQuerySplitInterval)
Default("0").DurationVar(&cfg.MaxQuerySplitInterval)
cmd.Flag("query-range.horizontal-shards", "Split queries in this many requests when query duration is below query-range.max-split-interval.").
Default("0").Int64Var(&cfg.QueryRangeConfig.HorizontalShards)
Default("0").Int64Var(&cfg.HorizontalShards)
cmd.Flag("query-range.max-retries-per-request", "Maximum number of retries for a single query range request; beyond this, the downstream error is returned.").
Default("5").IntVar(&cfg.QueryRangeConfig.MaxRetries)
@ -300,9 +301,7 @@ func runQueryFrontend(
}
if cfg.EnableXFunctions {
for fname, v := range parse.XFunctions {
parser.Functions[fname] = v
}
maps.Copy(parser.Functions, parse.XFunctions)
}
if len(cfg.EnableFeatures) > 0 {


@ -26,7 +26,7 @@ import (
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/model/relabel"
"github.com/prometheus/prometheus/tsdb"
"github.com/prometheus/prometheus/tsdb/wlog"
"github.com/prometheus/prometheus/util/compression"
"github.com/thanos-io/objstore"
"github.com/thanos-io/objstore/client"
objstoretracing "github.com/thanos-io/objstore/tracing/opentracing"
@ -35,6 +35,7 @@ import (
"github.com/thanos-io/thanos/pkg/block/metadata"
"github.com/thanos-io/thanos/pkg/component"
"github.com/thanos-io/thanos/pkg/compressutil"
"github.com/thanos-io/thanos/pkg/exemplars"
"github.com/thanos-io/thanos/pkg/extgrpc"
"github.com/thanos-io/thanos/pkg/extgrpc/snappy"
@ -93,7 +94,7 @@ func registerReceive(app *extkingpin.App) {
MaxBytes: int64(conf.tsdbMaxBytes),
OutOfOrderCapMax: conf.tsdbOutOfOrderCapMax,
NoLockfile: conf.noLockFile,
WALCompression: wlog.ParseCompressionType(conf.walCompression, string(wlog.CompressionSnappy)),
WALCompression: compressutil.ParseCompressionType(conf.walCompression, compression.Snappy),
MaxExemplars: conf.tsdbMaxExemplars,
EnableExemplarStorage: conf.tsdbMaxExemplars > 0,
HeadChunksWriteQueueSize: int(conf.tsdbWriteQueueSize),


@ -8,6 +8,7 @@ import (
"context"
"fmt"
"html/template"
"maps"
"math/rand"
"net/http"
"net/url"
@ -41,7 +42,7 @@ import (
"github.com/prometheus/prometheus/storage/remote"
"github.com/prometheus/prometheus/tsdb"
"github.com/prometheus/prometheus/tsdb/agent"
"github.com/prometheus/prometheus/tsdb/wlog"
"github.com/prometheus/prometheus/util/compression"
"gopkg.in/yaml.v2"
"github.com/thanos-io/objstore"
@ -54,6 +55,7 @@ import (
"github.com/thanos-io/thanos/pkg/block/metadata"
"github.com/thanos-io/thanos/pkg/clientconfig"
"github.com/thanos-io/thanos/pkg/component"
"github.com/thanos-io/thanos/pkg/compressutil"
"github.com/thanos-io/thanos/pkg/discovery/dns"
"github.com/thanos-io/thanos/pkg/errutil"
"github.com/thanos-io/thanos/pkg/extannotations"
@ -112,8 +114,9 @@ type ruleConfig struct {
storeRateLimits store.SeriesSelectLimits
ruleConcurrentEval int64
extendedFunctionsEnabled bool
EnableFeatures []string
extendedFunctionsEnabled bool
EnableFeatures []string
tsdbEnableNativeHistograms bool
}
type Expression struct {
@ -170,6 +173,10 @@ func registerRule(app *extkingpin.App) {
cmd.Flag("query.enable-x-functions", "Whether to enable extended rate functions (xrate, xincrease and xdelta). Only has effect when used with Thanos engine.").Default("false").BoolVar(&conf.extendedFunctionsEnabled)
cmd.Flag("enable-feature", "Comma separated feature names to enable. Valid options for now: promql-experimental-functions (enables promql experimental functions for ruler)").Default("").StringsVar(&conf.EnableFeatures)
cmd.Flag("tsdb.enable-native-histograms",
"[EXPERIMENTAL] Enables the ingestion of native histograms.").
Default("false").BoolVar(&conf.tsdbEnableNativeHistograms)
conf.rwConfig = extflag.RegisterPathOrContent(cmd, "remote-write.config", "YAML config for the remote-write configurations, that specify servers where samples should be sent to (see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write). This automatically enables stateless mode for ruler and no series will be stored in the ruler's TSDB. If an empty config (or file) is provided, the flag is ignored and ruler is run with its own TSDB.", extflag.WithEnvSubstitution())
conf.objStoreConfig = extkingpin.RegisterCommonObjStoreFlags(cmd, "", false)
@ -189,15 +196,16 @@ func registerRule(app *extkingpin.App) {
}
tsdbOpts := &tsdb.Options{
MinBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
MaxBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
RetentionDuration: int64(time.Duration(*tsdbRetention) / time.Millisecond),
NoLockfile: *noLockFile,
WALCompression: wlog.ParseCompressionType(*walCompression, string(wlog.CompressionSnappy)),
MinBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
MaxBlockDuration: int64(time.Duration(*tsdbBlockDuration) / time.Millisecond),
RetentionDuration: int64(time.Duration(*tsdbRetention) / time.Millisecond),
NoLockfile: *noLockFile,
WALCompression: compressutil.ParseCompressionType(*walCompression, compression.Snappy),
EnableNativeHistograms: conf.tsdbEnableNativeHistograms,
}
agentOpts := &agent.Options{
WALCompression: wlog.ParseCompressionType(*walCompression, string(wlog.CompressionSnappy)),
WALCompression: compressutil.ParseCompressionType(*walCompression, compression.Snappy),
NoLockfile: *noLockFile,
}
@ -580,9 +588,7 @@ func runRule(
)
{
if conf.extendedFunctionsEnabled {
for k, fn := range parse.XFunctions {
parser.Functions[k] = fn
}
maps.Copy(parser.Functions, parse.XFunctions)
}
if len(conf.EnableFeatures) > 0 {


@ -393,14 +393,16 @@ func runStore(
return errors.Errorf("unknown sync strategy %s", conf.blockListStrategy)
}
ignoreDeletionMarkFilter := block.NewIgnoreDeletionMarkFilter(logger, insBkt, time.Duration(conf.ignoreDeletionMarksDelay), conf.blockMetaFetchConcurrency)
metaFetcher, err := block.NewMetaFetcher(logger, conf.blockMetaFetchConcurrency, insBkt, blockLister, dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg),
[]block.MetadataFilter{
block.NewTimePartitionMetaFilter(conf.filterConf.MinTime, conf.filterConf.MaxTime),
block.NewLabelShardedMetaFilter(relabelConfig),
block.NewConsistencyDelayMetaFilter(logger, time.Duration(conf.consistencyDelay), extprom.WrapRegistererWithPrefix("thanos_", reg)),
ignoreDeletionMarkFilter,
block.NewDeduplicateFilter(conf.blockMetaFetchConcurrency),
})
filters := []block.MetadataFilter{
block.NewTimePartitionMetaFilter(conf.filterConf.MinTime, conf.filterConf.MaxTime),
block.NewLabelShardedMetaFilter(relabelConfig),
block.NewConsistencyDelayMetaFilter(logger, time.Duration(conf.consistencyDelay), extprom.WrapRegistererWithPrefix("thanos_", reg)),
ignoreDeletionMarkFilter,
block.NewDeduplicateFilter(conf.blockMetaFetchConcurrency),
block.NewParquetMigratedMetaFilter(logger),
}
metaFetcher, err := block.NewMetaFetcher(logger, conf.blockMetaFetchConcurrency, insBkt, blockLister, dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg), filters)
if err != nil {
return errors.Wrap(err, "meta fetcher")
}


@ -166,8 +166,9 @@ type bucketMarkBlockConfig struct {
}
type bucketUploadBlocksConfig struct {
path string
labels []string
path string
labels []string
uploadCompacted bool
}
func (tbc *bucketVerifyConfig) registerBucketVerifyFlag(cmd extkingpin.FlagClause) *bucketVerifyConfig {
@ -300,6 +301,7 @@ func (tbc *bucketRetentionConfig) registerBucketRetentionFlag(cmd extkingpin.Fla
func (tbc *bucketUploadBlocksConfig) registerBucketUploadBlocksFlag(cmd extkingpin.FlagClause) *bucketUploadBlocksConfig {
cmd.Flag("path", "Path to the directory containing blocks to upload.").Default("./data").StringVar(&tbc.path)
cmd.Flag("label", "External labels to add to the uploaded blocks (repeated).").PlaceHolder("key=\"value\"").StringsVar(&tbc.labels)
cmd.Flag("shipper.upload-compacted", "If true shipper will try to upload compacted blocks as well.").Default("false").BoolVar(&tbc.uploadCompacted)
return tbc
}
@ -915,8 +917,8 @@ func registerBucketCleanup(app extkingpin.AppClause, objStoreConfig *extflag.Pat
level.Info(logger).Log("msg", "synced blocks done")
compact.BestEffortCleanAbortedPartialUploads(ctx, logger, sy.Partial(), insBkt, stubCounter, stubCounter, stubCounter)
if err := blocksCleaner.DeleteMarkedBlocks(ctx); err != nil {
compact.BestEffortCleanAbortedPartialUploads(ctx, logger, sy.Partial(), insBkt, stubCounter, stubCounter, stubCounter, ignoreDeletionMarkFilter.DeletionMarkBlocks())
if _, err := blocksCleaner.DeleteMarkedBlocks(ctx); err != nil {
return errors.Wrap(err, "error cleaning blocks")
}
@ -1509,6 +1511,7 @@ func registerBucketUploadBlocks(app extkingpin.AppClause, objStoreConfig *extfla
shipper.WithSource(metadata.BucketUploadSource),
shipper.WithMetaFileName(shipper.DefaultMetaFilename),
shipper.WithLabels(func() labels.Labels { return lset }),
shipper.WithUploadCompacted(tbc.uploadCompacted),
)
ctx, cancel := context.WithCancel(context.Background())


@ -0,0 +1,245 @@
---
title: Life of a Sample in Thanos and How to Configure It - Data Management - Part II
date: "2024-09-16"
author: Thibault Mangé (https://github.com/thibaultmg)
---
## Life of a Sample in Thanos and How to Configure It - Data Management - Part II
### Introduction
In the first part of this series, we followed the life of a sample from its inception in a Prometheus server to our Thanos Receivers. We will now explore how Thanos manages the data ingested by the Receivers and optimizes it in the object store for reduced cost and fast retrieval.
Let's delve into these topics and more in the second part of the series.
### Preparing Samples for Object Storage: Building Chunks and Blocks
#### Using Object Storage
A key feature of Thanos is its ability to leverage economical object storage solutions like AWS S3 for long-term data retention. This contrasts with Prometheus's typical approach of storing data locally for shorter periods.
The Receive component is responsible for preparing data for object storage. Thanos adopts the TSDB (Time Series Database) data model, with some adaptations, for its object storage. This involves aggregating samples over time to construct TSDB Blocks. Please refer to the annexes of the first part if this vocabulary is not clear to you.
These blocks are built by aggregating data over two-hour periods. Once a block is ready, it is sent to the object storage, which is configured using the `--objstore.config` flag. This configuration is uniform across all components requiring object storage access.
On restarts, the Receive component ensures data preservation by immediately flushing existing data to object storage, even if it does not constitute a full two-hour block. These partial blocks are less efficient but are then optimized by the compactor, as we will see later.
The Receive is also able to [isolate data](https://thanos.io/tip/components/receive.md/#tenant-lifecycle-management) coming from different tenants. The tenant can be identified in the request by different means: a header (`--receive.tenant-header`), a label (`--receive.split-tenant-label-name`) or a certificate (`--receive.tenant-certificate-field`). Each tenant's data is ingested into a separate TSDB instance (you might hear this referred to as the multiTSDB). The benefits are twofold:
* It allows for parallelization of the block-building process, especially on the compactor side as we will see later.
* It allows for smaller indexes. Indeed, labels tend to be similar for samples coming from the same source, leading to more effective compression.
<img src="img/life-of-a-sample/multi-tsdb.png" alt="Data expansion" style="max-width: 600px; display: block;margin: 0 auto;"/>
When a block is ready, it is uploaded to the object store with the block external label defined by the flag `--receive.tenant-label-name`. This corresponds to the `thanos.labels` field of the [block metadata](https://thanos.io/tip/thanos/storage.md/#metadata-file-metajson). This will be used by the compactor to group blocks together, as we will see later.
#### Exposing Local Data for Queries
During the block-building phase, the data is not accessible to the Store Gateway as it has not been uploaded to the object store yet. To counter that, the Receive component also serves as a data store, making the local data available for query through the `Store API`. This is a common gRPC API used across all Thanos components for time series data access, set with the `--grpc-address` flag. The Receive will serve all data it has. The more data it serves, the more resources it will use for this duty in addition to ingesting client data.
<img src="img/life-of-a-sample/receive-store-api.png" alt="Data expansion" style="max-width: 600px; display: block;margin: 0 auto;"/>
The amount of data the Receive component serves can be managed through two parameters:
* `--tsdb.retention`: Sets the local storage retention duration. The minimum is 2 hours, aligning with block construction periods.
* `--store.limits.request-samples` and `--store.limits.request-series`: These parameters limit the volume of data that can be queried by setting a maximum on the number of samples and/or the number of series. If these limits are exceeded, the query will be denied to ensure system stability.
Key points to consider:
* The primary objective of the Receive component is to ensure **reliable data ingestion**. However, the more data it serves through the Store API, the more resources it will use for this duty in addition to ingesting client data. You should set the retention duration to the minimum required for your use case to optimize resource allocation. The minimum value for 2-hour blocks would be a 4-hour retention to account for availability in the Store Gateway after the block is uploaded to object storage. To prevent data loss, if the Receive component fails to upload blocks before the retention limit is reached, it will hold them until the upload succeeds.
* Even when the retention duration is short, your Receive instance could be overwhelmed by a query selecting too much data. You should set limits in place to ensure the stability of the Receive instances. These limits must be carefully set to enable Store API clients to retrieve the data they need while preventing resource exhaustion. The longer the retention, the higher the limits should be as the number of samples and series will increase.
### Maintaining Data: Compaction, Downsampling, and Retention
#### The Need for Compaction
The Receive component implements many strategies to ingest samples reliably. However, this can result in unoptimized data in object storage. This is due to:
* Inefficient partial blocks sent to object storage on shutdowns.
* Duplicated data when replication is set. Several Receive instances will send the same data to object storage.
* Incomplete blocks (invalid blocks) sent to object storage when the Receive fails in the middle of an upload.
The following diagram illustrates the impact on data expansion in object storage when samples from a given target are ingested from a high-availability Prometheus setup (with 2 instances) and replication is set on the Receive (factor 3):
<img src="img/life-of-a-sample/data-expansion.png" alt="Data expansion" style="max-width: 600px; display: block;margin: 0 auto;"/>
This leads to a threefold increase in label volume (one for each block) and a sixfold increase in sample volume! This is where the Compactor comes into play.
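The arithmetic behind these figures can be sketched as follows. This is a simplified model of the example above (2 Prometheus instances, replication factor 3); the function name `expansion_factors` is ours, and it assumes that each Receive replica uploads its own block with its own index.

```python
def expansion_factors(prometheus_replicas: int, replication_factor: int) -> dict:
    """Return how many copies of labels and samples end up in object storage."""
    # Each Receive replica uploads its own block, and every block carries its own index.
    label_copies = replication_factor
    # Every Prometheus instance sends its own copy of a sample,
    # and each copy is then replicated across the Receive replicas.
    sample_copies = prometheus_replicas * replication_factor
    return {"labels": label_copies, "samples": sample_copies}


print(expansion_factors(prometheus_replicas=2, replication_factor=3))
# {'labels': 3, 'samples': 6}
```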
The Compactor component is responsible for maintaining and optimizing data in object storage. It is a long-running process when configured to wait for new blocks with the `--wait` flag. It also needs access to the object storage using the `--objstore.config` flag.
Under normal operating conditions, the Compactor will check for new blocks every 5 minutes. By default, it will only consider blocks that are older than 30 minutes (configured with the `--consistency-delay` flag) to avoid reading partially uploaded blocks. It will then process these blocks in a structured manner, compacting them according to defined settings that we will discuss in the next sections.
#### Compaction Modes
Compaction consists of merging blocks that have overlapping or adjacent time ranges. This is called **horizontal compaction**. Using the [Metadata file](https://thanos.io/tip/thanos/storage.md/#metadata-file-metajson) which contains the minimum and maximum timestamps of samples in the block, the Compactor can determine if two blocks overlap. If they do, they are merged into a new block. This new block will have its compaction level index increased by one. So from two adjacent blocks of 2 hours each, we will get a new block of 4 hours.
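As a minimal sketch of this idea, the snippet below decides whether two blocks can be merged from their time ranges alone. The `BlockMeta` class and its fields are illustrative stand-ins for the `minTime`, `maxTime` and compaction level found in the metadata file, and the maximum timestamp is treated as exclusive here, which is an assumption. The real Compactor planner is more involved than this.

```python
from dataclasses import dataclass

HOUR_MS = 60 * 60 * 1000


@dataclass
class BlockMeta:
    min_time: int   # start of the block, in milliseconds
    max_time: int   # end of the block, in milliseconds (assumed exclusive)
    level: int = 1  # compaction level


def can_merge(a: BlockMeta, b: BlockMeta) -> bool:
    """True when the two time ranges overlap or are adjacent."""
    return a.min_time <= b.max_time and b.min_time <= a.max_time


def merge(a: BlockMeta, b: BlockMeta) -> BlockMeta:
    """The merged block spans both ranges and has its level increased by one."""
    return BlockMeta(
        min_time=min(a.min_time, b.min_time),
        max_time=max(a.max_time, b.max_time),
        level=max(a.level, b.level) + 1,
    )


first = BlockMeta(0, 2 * HOUR_MS)
second = BlockMeta(2 * HOUR_MS, 4 * HOUR_MS)
if can_merge(first, second):
    merged = merge(first, second)
    print((merged.max_time - merged.min_time) // HOUR_MS, "hours, level", merged.level)
```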
During this compaction, the Compactor will also deduplicate samples. This is called [**vertical compaction**](https://thanos.io/tip/components/compact.md/#vertical-compactions). The Compactor provides two deduplication modes:
* `one-to-one`: This is the default mode. It will deduplicate samples that have the same timestamp and the same value but different replica label values. The replica label is configured by the `--deduplication.replica-label` flag, which can be repeated to account for several replica labels. It is usually set to `replica`; make sure it is set up as an external label on the Receivers with the flag `--label=replica=xxx`. The benefit of this mode is that it is straightforward and will remove replicated data from the Receive. However, it is not able to remove data replicated by high-availability Prometheus setups because these samples will rarely be scraped at exactly the same timestamps, as demonstrated by the diagram below.
* `penalty`: This is a more complex deduplication algorithm that is able to deduplicate data coming from high-availability Prometheus setups. It can be set with the `--deduplication.func` flag and also requires setting the `--deduplication.replica-label` flag to the label that identifies the Prometheus replica, usually `prometheus_replica`.
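To make the limitation of the `one-to-one` mode concrete, here is a toy model of it. It is not the Thanos implementation: it simply keeps one copy of every sample that is identical in timestamp and value across replicas, which is the behavior described above. The function name and the sample data are made up for illustration.

```python
def one_to_one_dedup(replicas):
    """Merge the series of all replicas, keeping one copy of identical samples.

    Each replica is a list of (timestamp, value) tuples.
    """
    seen = set()
    merged = []
    for series in replicas:
        for sample in series:
            if sample not in seen:
                seen.add(sample)
                merged.append(sample)
    return sorted(merged)


# Three Receive replicas hold exactly the same samples: duplicates are removed.
receive_replicas = [[(0, 1.0), (15, 2.0)]] * 3
print(len(one_to_one_dedup(receive_replicas)))

# Two Prometheus replicas scrape the same target a few seconds apart:
# the timestamps differ, so nothing can be removed in this mode.
prometheus_replicas = [[(0, 1.0), (15, 2.0)], [(3, 1.0), (18, 2.0)]]
print(len(one_to_one_dedup(prometheus_replicas)))
```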
Here is a diagram illustrating how Prometheus replicas generate samples with different timestamps that cannot be deduplicated with the `one-to-one` mode:
<img src="img/life-of-a-sample/ha-prometheus-duplicates.png" alt="High availability prometheus duplication" style="max-width: 600px; display: block;margin: 0 auto;"/>
Getting back to our example illustrating the data duplication happening in the object storage, here is how each compaction process will impact the data:
<img src="img/life-of-a-sample/compactor-compaction.png" alt="Compactor compaction" width="700"/>
First, horizontal compaction will merge blocks together. This will mostly have an effect on the labels data that are stored in a compressed format in a single index binary file attached to a single block. Then, one-to-one deduplication will remove identical samples and delete the related replica label. Finally, penalty deduplication will remove duplicated samples resulting from concurrent scrapes in high-availability Prometheus setups and remove the related replica label.
You want to deduplicate data as much as possible because it will lower your object storage cost and improve query performance. However, using the penalty mode presents some limitations. For more details, see [the documentation](https://thanos.io/tip/components/compact.md/#vertical-compaction-risks).
Key points to consider:
* You want blocks that are not too big because they will be slow to query. However, you also want to limit the number of blocks because having too many will increase the number of requests to the object storage. Also, the more blocks there are, the less compaction occurs, and the more data there is to store and load into memory.
* You do not need to worry about blocks that are too small, as the Compactor will merge them together. However, you could end up with blocks that are too big. This can happen if you have very high cardinality workloads or churn-heavy workloads like CI runs, build pipelines, serverless functions, or batch jobs, which often lead to huge cardinality explosions as the metric labels change often.
* The main solution to this is splitting the data into several block streams, as we will see later. This is Thanos's sharding strategy.
* There are also cases where you might want to limit the size of the blocks. To that effect, you can use the following parameters:
* You can limit the compaction levels with `--debug.max-compaction-level` to prevent the Compactor from creating blocks that are too big. This is especially useful when you have a high metrics churn rate. Level 1 is the starting level and corresponds to blocks of 2 hours. Level 2 will create blocks of 8 hours, level 3 of 2 days, and level 4 of 14 days. Without this limit, the Compactor will create blocks of up to 2 weeks. This is not a magic bullet; it does not limit the data size of the blocks. It just limits the number of blocks that can be merged together. The downside of using this setting is that it will increase the number of blocks in the object storage. They will use more space, and the query performance might be impacted.
* The flag `--compact.block-max-index-size` can be used more effectively to specify the maximum index size beyond which the Compactor will stop block compaction, independently of its compaction level. Once a block's index exceeds this size, the system marks it for no further compaction. The default value is 64 GB, which is the maximum index size the TSDB supports. As a result, some block streams might appear discontinuous in the UI, displaying a lower compaction level than the surrounding blocks.
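The trade-off of capping the compaction level can be illustrated with a small calculation. The sketch below uses the block ranges listed above and counts how many blocks a single block stream holds for two weeks of data, assuming a continuous and fully compacted stream and ignoring downsampled blocks. The names are ours.

```python
# Block range, in hours, produced at each compaction level.
LEVEL_RANGE_HOURS = {1: 2, 2: 8, 3: 48, 4: 336}
TWO_WEEKS_HOURS = 14 * 24


def blocks_per_two_weeks(max_level: int) -> int:
    """Number of blocks covering two weeks when compaction stops at max_level."""
    return TWO_WEEKS_HOURS // LEVEL_RANGE_HOURS[max_level]


for level in sorted(LEVEL_RANGE_HOURS):
    print(f"max level {level}: {blocks_per_two_weeks(level)} block(s)")
# max level 1: 168 block(s)
# max level 2: 42 block(s)
# max level 3: 7 block(s)
# max level 4: 1 block(s)
```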
#### Scaling the Compactor: Block Streams
Not all blocks covering the same time range are compacted together. Instead, the Compactor organizes them into distinct [compaction groups or block streams](https://thanos.io/tip/components/compact.md/#compaction-groups--block-streams). The key here is to leverage external labels to group data originating from the same source. This strategic grouping is particularly effective for compacting indexes, as blocks from the same source tend to have nearly identical labels.
You can improve the performance of the Compactor by:
* Increasing the number of concurrent compactions using the `--compact.concurrency` flag. Bear in mind that you must scale storage, memory and CPU resources accordingly (linearly).
* Sharding the data. In this mode, each Compactor will process a disjoint set of block streams. This is done by setting up the `--selector.relabel-config` flag on the external labels. For example:
```yaml
- action: hashmod
  source_labels:
    - tenant_id # An external label that identifies some block streams
  target_label: shard
  modulus: 2 # The number of Compactor replicas
- action: keep
  source_labels:
    - shard
  regex: 0 # The shard number assigned to this Compactor
```
In this configuration, the `hashmod` action is used to distribute blocks across multiple Compactor instances based on the `tenant_id` label. The `modulus` should match the number of Compactor replicas you have. Each replica will then only process the blocks that match its shard number, as defined by the `regex` in the `keep` action.
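The following sketch mimics what the two relabel rules do to a block stream. It assumes that `hashmod` hashes the source label value with MD5 and reduces the last 8 bytes modulo `modulus`, which is our reading of the Prometheus relabeling behavior; treat the exact hash as an assumption. The tenant names are made up.

```python
import hashlib


def hashmod(value: str, modulus: int) -> int:
    """Map a label value to a shard number in [0, modulus)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    # Assumption: only the last 8 bytes of the digest are used, read as big-endian.
    return int.from_bytes(digest[8:], "big") % modulus


def keeps_stream(tenant_id: str, shard: int, modulus: int) -> bool:
    """Equivalent of the `keep` rule: process the stream only if it hashes to our shard."""
    return hashmod(tenant_id, modulus) == shard


tenants = ["team-a", "team-b", "team-c", "team-d"]
for shard in range(2):
    print(shard, [t for t in tenants if keeps_stream(t, shard, modulus=2)])
```

Whatever the hash function, the important property is that every block stream is assigned to exactly one shard, so no two Compactors ever work on the same blocks.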
#### Downsampling and Retention
The Compactor also optimizes data reads for long-range queries. If you are querying data for several months, you do not need the typical 15-second raw resolution. Processing such a query will be very inefficient, as it will retrieve a lot of unnecessary data that you will not be able to visualize with such detail in your UI. In worst-case scenarios, it may even cause some components of your Thanos setup to fail due to memory exhaustion.
To enable performant long-range queries, the Compactor downsamples data, and the `--retention.resolution-*` flags control how long each resolution is kept. It supports two downsampling levels: 5 minutes and 1 hour. These are the resolutions of the downsampled series. They will typically come on top of the raw data, so that you can have both raw and downsampled data. This will enable you to spot abnormal patterns over long-range queries and then zoom into specific parts using the raw data. We will discuss how to configure the query to use the downsampled data in the next article.
When the Compactor performs downsampling, it does more than simply reduce the number of data points by removing intermediate samples. While reducing the volume of data is a primary goal, especially to improve performance for long-range queries, the Compactor ensures that essential statistical properties of the original data are preserved. This is crucial for maintaining the accuracy and integrity of any aggregations or analyses performed on the downsampled data. In addition to the downsampled data, it stores the count, minimum, maximum, and sum of the downsampled window. Functions like sum(), min(), max(), and avg() can then be computed correctly over the downsampled data because the necessary statistical information is preserved.
This downsampled data is then stored in its own block, one per downsampling level for each corresponding raw block.
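A simplified view of what is kept for each downsampling window can be sketched as follows. This is not how Thanos encodes its chunks; it only shows that keeping the count, sum, minimum and maximum of each window is enough to answer aggregations such as the average afterwards. The function name and the sample data are illustrative.

```python
def downsample(samples, window):
    """Aggregate (timestamp_seconds, value) samples into fixed windows.

    Returns a dict keyed by window start with count, sum, min and max.
    """
    buckets = {}
    for ts, value in samples:
        start = ts - ts % window
        agg = buckets.setdefault(
            start, {"count": 0, "sum": 0.0, "min": value, "max": value}
        )
        agg["count"] += 1
        agg["sum"] += value
        agg["min"] = min(agg["min"], value)
        agg["max"] = max(agg["max"], value)
    return buckets


# 10 minutes of raw data scraped every 15 seconds, with values 0, 1, 2, ...
raw = [(ts, float(v)) for ts, v in zip(range(0, 600, 15), range(40))]
five_minutes = downsample(raw, window=300)

first_window = five_minutes[0]
print(first_window)
# The average is derived from the stored sum and count, not from raw samples.
print(first_window["sum"] / first_window["count"])
```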
Key points to consider:
* Downsampling is not for reducing the volume of data in object storage. It is for improving long-range query performance, making your system more versatile and stable.
* Thanos recommends having the same retention duration for raw and downsampled data. This will enable you to have a consistent view of your data over time.
* As a rule of thumb, you can consider that each downsampled data level increases the storage need by onefold compared to the raw data, although it is often a bit less than that.
#### The Compactor UI and the Block Streams
The Compactor's functionality and the progress of its operations can be monitored through the **Block Viewer UI**. This web-based interface is accessible if the Compactor is configured with the `--http-address` flag. Additional UI settings are controlled via `--web.*` and `--block-viewer.*` flags. The Compactor UI provides a visual representation of the compaction process, showing how blocks are grouped and compacted over time. Here is a glimpse of what the UI looks like:
<img src="img/life-of-a-sample/compactor-ui.png" alt="Receive and Store data overlap" width="800"/>
Occasionally, some blocks may display an artificially high compaction level in the UI, appearing lower in the stream compared to adjacent blocks. This scenario often occurs in situations like rolling Receiver upgrades, where Receivers restart sequentially, leading to the creation and upload of partial blocks to the object store. The Compactor then vertically compacts these blocks as they arrive, resulting in a temporary increase in compaction levels. When these blocks are horizontally compacted with adjacent blocks, they will be displayed higher up in the stream.
As explained earlier with compaction levels, by default, the Compactor's strategy involves compacting 2-hour blocks into 8-hour blocks once they are available, then progressing to 2-day blocks, and up to 14 days, following a structured compaction timeline.
### Exposing Bucket Data for Queries: The Store Gateway and the Store API
#### Exposing Data for Queries
The Store Gateway acts as a facade for the object storage, making bucket data accessible via the Thanos Store API, a feature first introduced with the Receive component. The Store Gateway exposes the Store API with the `--grpc-address` flag.
The Store Gateway requires access to the object storage bucket to retrieve data, which is configured with the `--objstore.config` flag. You can use the `--max-time` flag to specify which blocks should be considered by the Store Gateway. For example, if your Receive instances are serving data up to 10 hours, you may configure `--max-time=-8h` so that it does not consider blocks more recent than 8 hours. This avoids returning the same data as the Receivers while ensuring some overlap between the two.
To function optimally, the Store Gateway relies on caches. To understand their usefulness, let's first explore how the Store Gateway retrieves data from the blocks in the object storage.
#### Retrieving Samples from the Object Store
Consider the following simple query run on the Querier:
```promql
# Between now and 2 days ago, compute the rate of http requests per second, filtered by method and status
rate(http_requests_total{method="GET", status="200"}[5m])
```
This PromQL query will be parsed by the Querier, which will emit a Thanos [Store API](https://github.com/thanos-io/thanos/blob/main/pkg/store/storepb/rpc.proto) request to the Store Gateway with the following parameters:
```proto
SeriesRequest request = {
  min_time: [Timestamp 2 days ago],
  max_time: [Current Timestamp],
  max_resolution_window: 1h, // the minimum time range between two samples, relates to the downsampling levels
  matchers: [
    { name: "__name__", value: "http_requests_total", type: EQUAL },
    { name: "method", value: "GET", type: EQUAL },
    { name: "status", value: "200", type: EQUAL }
  ]
}
```
The Store Gateway processes this request in several steps:
* **Metadata processing**: The Store Gateway first examines the block [metadata](https://thanos.io/tip/thanos/storage.md/#metadata-file-metajson) to determine the relevance of each block to the query. It evaluates the time range (`minTime` and `maxTime`) and external labels (`thanos.labels`). Blocks are deemed relevant if their timestamps overlap with the query's time range and if their resolution (`thanos.downsample.resolution`) matches the query's maximum allowed resolution.
* **Index processing**: Next, the Store Gateway retrieves the [indexes](https://thanos.io/tip/thanos/storage.md/#index-format-index) of candidate blocks. This involves:
* Fetching postings lists for each label specified in the query. These are inverted indexes where each label and value has an associated sorted list of all the corresponding time series IDs. Example:
* `"__name__=http_requests_total": [1, 2, 3]`,
* `"method=GET": [1, 2, 6]`,
* `"status=200": [1, 2, 5]`
* Intersecting these postings lists to select series matching all query labels. In our example these are series 1 and 2.
* Retrieving the series section from the index for these series, which includes the chunk files, the time ranges and offset position in the file. Example:
* Series 1: [Chunk 1: mint=t0, maxt=t1, fileRef=0001, offset=0], ...
* Determining the relevant chunks based on their time range intersection with the query.
* **Chunks retrieval**: The Store Gateway then fetches the appropriate chunks, either from the object storage directly or from a chunk cache. When retrieving from the object store, the Gateway leverages its API to read only the needed bytes (e.g., using S3 range requests), bypassing the need to download entire chunk files.
Then, the Gateway streams the selected chunks to the requesting Querier.
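The index processing step can be sketched with the example postings lists above. This is a toy model, not the Store Gateway code: the postings are held in a plain dictionary, the chunk metadata is invented for the example, and all names are ours.

```python
postings = {
    ("__name__", "http_requests_total"): [1, 2, 3],
    ("method", "GET"): [1, 2, 6],
    ("status", "200"): [1, 2, 5],
}


def intersect(a, b):
    """Intersect two sorted lists of series IDs with a two-pointer merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def select_series(matchers):
    """Return the IDs of the series matching every (label, value) pair."""
    lists = [postings[m] for m in matchers]
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


def overlapping_chunks(chunks, query_min, query_max):
    """Keep only the chunks whose time range intersects the query range."""
    return [c for c in chunks if c["mint"] <= query_max and c["maxt"] >= query_min]


series_ids = select_series(list(postings))
print(series_ids)  # [1, 2]

# Hypothetical chunk metadata for series 1, as read from the index.
series_1_chunks = [
    {"mint": 0, "maxt": 100, "file_ref": "0001", "offset": 0},
    {"mint": 101, "maxt": 200, "file_ref": "0001", "offset": 512},
    {"mint": 201, "maxt": 300, "file_ref": "0001", "offset": 1024},
]
print([c["offset"] for c in overlapping_chunks(series_1_chunks, 150, 250)])  # [512, 1024]
```

The file reference and offset of each selected chunk are what allow the Gateway to issue a range request for just those bytes.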
#### Optimizing the Store Gateway
Understanding the retrieval algorithm highlights the critical role of an external [index cache](https://thanos.io/tip/components/store.md/#index-cache) in the Store Gateway's operation. This is configured using the `--index-cache.config` flag. Indexes contain all labels and values of the block, which can result in large sizes. When the cache is full, Least Recently Used (LRU) eviction is applied. In scenarios where no external cache is configured, a portion of the memory will be utilized as a cache, managed via the `--index-cache.size` flag.
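To illustrate what LRU eviction means for a size-bounded cache, here is a minimal sketch. It is not the Thanos index cache implementation; the class, its methods and the keys are hypothetical, and the budget is expressed in bytes in the spirit of `--index-cache.size`.

```python
from collections import OrderedDict


class LRUCache:
    """A byte-budgeted cache that evicts the least recently used entries first."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        # Reading an entry makes it the most recently used one.
        self.items.move_to_end(key)
        return self.items[key]

    def set(self, key, value: bytes):
        if len(value) > self.max_bytes:
            return  # The entry can never fit, so it is not cached.
        if key in self.items:
            self.used -= len(self.items.pop(key))
        # Evict from the least recently used end until the new entry fits.
        while self.used + len(value) > self.max_bytes:
            _, evicted = self.items.popitem(last=False)
            self.used -= len(evicted)
        self.items[key] = value
        self.used += len(value)


cache = LRUCache(max_bytes=10)
cache.set("postings:method=GET", b"12345")
cache.set("postings:status=200", b"12345")
cache.get("postings:method=GET")         # Touch the first entry.
cache.set("series:1", b"123")            # Does not fit: the oldest entry is evicted.
print(cache.get("postings:status=200"))  # None
```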
Moreover, the direct retrieval of chunks from object storage can be suboptimal, and result in excessive costs, especially with a high volume of queries. To mitigate this, employing a [caching bucket](https://thanos.io/tip/components/store.md/#caching-bucket) can significantly reduce the number of queries to the object storage. This is configured using the `--store.caching-bucket.config` flag. This chunk caching strategy is particularly effective when data access patterns are predominantly focused on recent data. By caching these frequently accessed chunks, query performance is enhanced, and the load on object storage is reduced.
Finally, you can implement the same safeguards as the Receive component by setting limits on the number of samples and series that can be queried. This is accomplished using the same `--store.limits.request-samples` and `--store.limits.request-series` flags.
#### Scaling the Store Gateway
The performance of Thanos Store components can be notably improved by managing concurrency and implementing sharding strategies.
Adjusting the level of concurrency can have a significant impact on performance. This is managed through the `--store.grpc.series-max-concurrency` flag, which sets the number of allowed concurrent series requests on the Store API. Other lower-level concurrency settings are also available.
After optimizing the store processing, you can distribute the query load using sharding strategies similar to what was done with the Compactor. Using a relabel configuration, you can assign a disjoint set of blocks to each Store Gateway replica. Here's an example of how to set up sharding using the `--selector.relabel-config` flag:
```yaml
- action: hashmod
  source_labels:
    - tenant_id # An external label that identifies some block streams
  target_label: shard
  modulus: 2 # The number of Store Gateway replicas
- action: keep
  source_labels:
    - shard
  regex: 0 # The shard number assigned to this Store Gateway
```
Sharding based on the `__block_id` is not recommended because it prevents Stores from selecting the most relevant data resolution needed for a query. For example, one store might see only the raw data and return it, while another store sees the downsampled version for the same query and also returns it. This duplication creates unnecessary overhead.
External-label-based sharding avoids this issue. By giving a store a complete view of a stream's data (both raw and downsampled), it can effectively select the most appropriate resolution.
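To make the `hashmod` rule in the sharding example concrete, the following Go sketch reproduces the arithmetic that Prometheus-style relabeling applies to a label value: it takes the MD5 digest of the value, reads the last 8 bytes as a big-endian integer, and reduces it modulo the number of shards. The function name `shardFor` and the tenant names are illustrative assumptions, and the sketch assumes a single source label (with several source labels, the values are joined with a separator before hashing).

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// shardFor mirrors the arithmetic of the hashmod relabel action for a single
// source label value: MD5 the value, interpret the last 8 bytes of the digest
// as a big-endian uint64, and take it modulo the number of shards.
// The name shardFor is hypothetical; it is not part of Thanos or Prometheus.
func shardFor(labelValue string, modulus uint64) uint64 {
	sum := md5.Sum([]byte(labelValue))
	return binary.BigEndian.Uint64(sum[8:]) % modulus
}

func main() {
	const storeGateways = 2 // must match `modulus` in the relabel config
	for _, tenant := range []string{"tenant-a", "tenant-b", "tenant-c"} {
		fmt.Printf("%s -> shard %d\n", tenant, shardFor(tenant, storeGateways))
	}
}
```

Because the shard depends only on the external label value, every block of a given stream (raw and downsampled) lands on the same Store Gateway, which is the property the `keep` rule relies on.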
If external label sharding is not sufficient, you can combine it with time partitioning using the `--min-time` and `--max-time` flags. This process is done at the chunk level, meaning you can use shorter time intervals for recent data in 2 hour blocks, but you must use longer intervals for older data to account for horizontal compaction. The goal is for any store instance to have a complete view of the stream's data at every resolution for a given time slot, allowing it to return the unique and most appropriate data.
### Conclusion
In this second part, we explored how Thanos manages data for efficient storage and retrieval. We examined how the Receive component prepares samples and exposes local data for queries, and how the Compactor optimizes data through compaction and downsampling. We also discussed how the Store Gateway retrieves data and can be optimized by leveraging indexes and implementing sharding strategies.
Now that our samples are efficiently stored and prepared for queries, we can move on to the final part of this series, where we will explore how this distributed data is retrieved by query components like the Querier.
See the full list of articles in this series:
* Life of a sample in Thanos, and how to configure it - Ingestion - Part I
* Life of a sample in Thanos, and how to configure it - Data Management - Part II
* Life of a sample in Thanos, and how to configure it - Querying - Part III

View File

@ -57,7 +57,7 @@ This rule also means that there could be a problem when both compacted and non-c
> **NOTE:** In future versions of Thanos it's possible that both restrictions will be removed once [vertical compaction](#vertical-compactions) reaches production status.
You can though run multiple Compactors against a single Bucket as long as each instance compacts a separate stream of blocks. You can do this in order to [scale the compaction process](#scalability).
It is possible to run multiple Compactors against a single Bucket, provided each instance handles a separate stream of blocks. This allows you to [scale the compaction process](#scalability).
### Vertical Compactions

View File

@ -114,13 +114,13 @@ Note that deduplication of HA groups is not supported by the `chain` algorithm.
## Thanos PromQL Engine (experimental)
By default, Thanos querier comes with standard Prometheus PromQL engine. However, when `--query.promql-engine=thanos` is specified, Thanos will use [experimental Thanos PromQL engine](http://github.com/thanos-community/promql-engine) which is a drop-in, efficient implementation of PromQL engine with query planner and optimizers.
By default, Thanos querier comes with standard Prometheus PromQL engine. However, when `--query.promql-engine=thanos` is specified, Thanos will use [experimental Thanos PromQL engine](http://github.com/thanos-io/promql-engine) which is a drop-in, efficient implementation of PromQL engine with query planner and optimizers.
To learn more, see [the introduction talk](https://youtu.be/pjkWzDVxWk4?t=3609) from [the PromConEU 2022](https://promcon.io/2022-munich/talks/opening-pandoras-box-redesigning/).
This feature is still **experimental** given active development. All queries should be supported due to built-in fallback to old PromQL if something is not yet implemented.
For new engine bugs/issues, please use https://github.com/thanos-community/promql-engine GitHub issues.
For new engine bugs/issues, please use https://github.com/thanos-io/promql-engine GitHub issues.
### Distributed execution mode

View File

@ -138,13 +138,13 @@ config:
use_grpc: false
grpc_conn_pool_size: 0
http_config:
idle_conn_timeout: 0s
response_header_timeout: 0s
idle_conn_timeout: 1m30s
response_header_timeout: 2m
insecure_skip_verify: false
tls_handshake_timeout: 0s
expect_continue_timeout: 0s
max_idle_conns: 0
max_idle_conns_per_host: 0
tls_handshake_timeout: 10s
expect_continue_timeout: 1s
max_idle_conns: 100
max_idle_conns_per_host: 100
max_conns_per_host: 0
tls_config:
ca_file: ""

View File

@ -503,6 +503,9 @@ Flags:
options for now: promql-experimental-functions
(enables promql experimental functions for
ruler)
--[no-]tsdb.enable-native-histograms
[EXPERIMENTAL] Enables the ingestion of native
histograms.
--remote-write.config-file=<file-path>
Path to YAML config for the remote-write
configurations, that specify servers

View File

@ -59,13 +59,13 @@ config:
use_grpc: false
grpc_conn_pool_size: 0
http_config:
idle_conn_timeout: 0s
response_header_timeout: 0s
idle_conn_timeout: 1m30s
response_header_timeout: 2m
insecure_skip_verify: false
tls_handshake_timeout: 0s
expect_continue_timeout: 0s
max_idle_conns: 0
max_idle_conns_per_host: 0
tls_handshake_timeout: 10s
expect_continue_timeout: 1s
max_idle_conns: 100
max_idle_conns_per_host: 100
max_conns_per_host: 0
tls_config:
ca_file: ""

View File

@ -18,13 +18,13 @@ config:
use_grpc: false
grpc_conn_pool_size: 0
http_config:
idle_conn_timeout: 0s
response_header_timeout: 0s
idle_conn_timeout: 1m30s
response_header_timeout: 2m
insecure_skip_verify: false
tls_handshake_timeout: 0s
expect_continue_timeout: 0s
max_idle_conns: 0
max_idle_conns_per_host: 0
tls_handshake_timeout: 10s
expect_continue_timeout: 1s
max_idle_conns: 100
max_idle_conns_per_host: 100
max_conns_per_host: 0
tls_config:
ca_file: ""
@ -478,7 +478,7 @@ While the remaining settings are **optional**:
- `dial_timeout`: the redis dial timeout.
- `read_timeout`: the redis read timeout.
- `write_timeout`: the redis write timeout.
- `cache_size` size of the in-memory cache used for client-side caching. Client-side caching is enabled when this value is not zero. See [official documentation](https://redis.io/docs/manual/client-side-caching/) for more. It is highly recommended to enable this so that Thanos Store would not need to continuously retrieve data from Redis for repeated requests of the same key(-s).
- `cache_size` size of the in-memory cache used for client-side caching. Client-side caching is enabled when this value is not zero. See [official documentation](https://redis.io/docs/latest/develop/reference/client-side-caching/) for more. It is highly recommended to enable this so that Thanos Store would not need to continuously retrieve data from Redis for repeated requests of the same key(-s).
- `enabled_items`: selectively choose what types of items to cache. Supported values are `Postings`, `Series` and `ExpandedPostings`. By default, all items are cached.
- `ttl`: ttl to store index cache items in redis.

View File

@ -114,13 +114,13 @@ config:
use_grpc: false
grpc_conn_pool_size: 0
http_config:
idle_conn_timeout: 0s
response_header_timeout: 0s
idle_conn_timeout: 1m30s
response_header_timeout: 2m
insecure_skip_verify: false
tls_handshake_timeout: 0s
expect_continue_timeout: 0s
max_idle_conns: 0
max_idle_conns_per_host: 0
tls_handshake_timeout: 10s
expect_continue_timeout: 1s
max_idle_conns: 100
max_idle_conns_per_host: 100
max_conns_per_host: 0
tls_config:
ca_file: ""
@ -719,13 +719,13 @@ config:
use_grpc: false
grpc_conn_pool_size: 0
http_config:
idle_conn_timeout: 0s
response_header_timeout: 0s
idle_conn_timeout: 1m30s
response_header_timeout: 2m
insecure_skip_verify: false
tls_handshake_timeout: 0s
expect_continue_timeout: 0s
max_idle_conns: 0
max_idle_conns_per_host: 0
tls_handshake_timeout: 10s
expect_continue_timeout: 1s
max_idle_conns: 100
max_idle_conns_per_host: 100
max_conns_per_host: 0
tls_config:
ca_file: ""
@ -825,13 +825,13 @@ config:
use_grpc: false
grpc_conn_pool_size: 0
http_config:
idle_conn_timeout: 0s
response_header_timeout: 0s
idle_conn_timeout: 1m30s
response_header_timeout: 2m
insecure_skip_verify: false
tls_handshake_timeout: 0s
expect_continue_timeout: 0s
max_idle_conns: 0
max_idle_conns_per_host: 0
tls_handshake_timeout: 10s
expect_continue_timeout: 1s
max_idle_conns: 100
max_idle_conns_per_host: 100
max_conns_per_host: 0
tls_config:
ca_file: ""
@ -1051,6 +1051,9 @@ Flags:
upload.
--label=key="value" ... External labels to add to the uploaded blocks
(repeated).
--[no-]shipper.upload-compacted
If true shipper will try to upload compacted
blocks as well.
```

View File

@ -147,7 +147,7 @@ See up to date [jsonnet mixins](https://github.com/thanos-io/thanos/tree/main/mi
* 2022:
* [Thanos at Medallia: A Hybrid Architecture Scaled to Support 1 Billion+ Series Across 40+ Data Centers](https://thanos.io/blog/2022-09-08-thanos-at-medallia/)
* [Deploy Thanos Receive with native OCI Object Storage on Oracle Kubernetes Engine](https://medium.com/@lmukadam/deploy-thanos-receive-with-native-oci-object-storage-on-kubernetes-829326ea0bc6)
* [Leveraging Consul for Thanos Query Discovery](https://nicolastakashi.medium.com/leveraging-consul-for-thanos-query-discovery-34212d496c88)
* [Leveraging Consul for Thanos Query Discovery](https://itnext.io/leveraging-consul-for-thanos-query-discovery-34212d496c88)
* 2021:
* [Adopting Thanos at LastPass](https://krisztianfekete.org/adopting-thanos-at-lastpass/)

View File

@ -0,0 +1,67 @@
# Buffers guide
## Intro
This is a guide to buffers in Thanos. The goal is to show how data moves around, which objects are allocated or copied, what the lifetime of each object is, and so on. With this information we can make better decisions about how to make the code more garbage collector (GC) friendly.
### Situation in 0.39.2
We only use protobuf encodings and compression is optional:
```
gRPC gets a compressed protobuf message -> decompress -> protobuf decoder
```
We still use gogoproto, so in the protobuf decoder we specify a custom type for labels: ZLabels. This is a "hack" that uses unsafe underneath. With the `slicelabels` tag, it is possible to create `labels.Labels` objects (required by the PromQL layer) that reuse references to strings allocated in the protobuf layer. The protobuf message's byte buffer is never recycled; it stays alive until nothing references it and the GC collects it. Chunks and all other objects are still copied.
### gRPC gets the ability to recycle messages
gRPC can now recycle the buffers of decoded messages, and does so by default, so that the gRPC layer does not need to allocate a new `[]byte` every time. This means that we have to be conscious of the allocations that we make.
Previously we had:
```go
[]struct {
	Name  string
	Value string
}
```
So, a slice in which each element holds two strings (each one a pointer plus a length), plus the string data itself. Fortunately, we use unsafe code, so we don't allocate a new string object for each string; the strings instead point into the message's `[]byte`.
With `stringlabels` and careful use of `labels.ScratchBuilder` we could put all labels into one string object. The only consequence is that we have to copy the protobuf message's data into this special format, but copying data in memory is probably faster than having to iterate through possibly millions of objects at GC time.
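The idea of packing all labels into one string can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual Prometheus encoding or Thanos code: the `Label` type and the `packLabels`, `unpackLabels`, and `readString` helpers are hypothetical names, and the layout (a uvarint length prefix before each name and value) is chosen only to show why the GC sees a single object instead of one per string.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Label is a plain name/value pair, as in the slice-based representation.
type Label struct {
	Name, Value string
}

// packLabels copies all names and values into one buffer, each prefixed with
// its length as a uvarint, and returns the buffer as a single string.
func packLabels(ls []Label) string {
	var buf []byte
	for _, l := range ls {
		buf = binary.AppendUvarint(buf, uint64(len(l.Name)))
		buf = append(buf, l.Name...)
		buf = binary.AppendUvarint(buf, uint64(len(l.Value)))
		buf = append(buf, l.Value...)
	}
	return string(buf)
}

// unpackLabels walks the packed string and returns the labels. The returned
// strings are substrings of packed, so they share its backing memory.
func unpackLabels(packed string) []Label {
	var out []Label
	for len(packed) > 0 {
		var l Label
		l.Name, packed = readString(packed)
		l.Value, packed = readString(packed)
		out = append(out, l)
	}
	return out
}

// readString decodes one uvarint length prefix followed by that many bytes,
// and returns the decoded string plus the remaining input.
// It assumes well-formed input produced by packLabels.
func readString(s string) (string, string) {
	var n uint64
	var shift uint
	i := 0
	for {
		b := s[i]
		i++
		n |= uint64(b&0x7f) << shift
		if b < 0x80 {
			break
		}
		shift += 7
	}
	return s[i : i+int(n)], s[i+int(n):]
}

func main() {
	packed := packLabels([]Label{{"__name__", "up"}, {"job", "thanos"}})
	fmt.Println(len(packed), unpackLabels(packed))
}
```

The trade-off is the one described above: building the packed string costs one copy of the label bytes, but afterwards the whole label set is a single allocation for the garbage collector to track.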
Also, ideally we wouldn't allocate memory for whole messages and would instead stream them into the readers. But if the messages are compressed, that is not possible, since generic compression (in most cases) requires the whole message to be in memory. Cap'n Proto is also message-based, so a message has to be read fully; its only bonus is that it gives you full control over the lifetime of the messages. Most `grpc-go` codecs immediately put the message's buffer back after decoding it, but it is possible to hold a reference for longer:
[CodecV2 ref](https://pkg.go.dev/google.golang.org/grpc/encoding#CodecV2)
Hence, it seems that the only possibility for further improvement at the moment is to tie the lifetime of messages to the query itself, so that we can (mostly) avoid copying the `[]byte` data for chunks.
I wrote a benchmark, and it seems like `stringlabels` plus hand-rolled unmarshaling code wins:
```
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/store/labelpb
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
│ labelsbench │
│ sec/op │
LabelUnmarshal/Unmarshal_regular-16 123.0µ ± 41%
LabelUnmarshal/Unmarshal_ZLabel-16 65.43µ ± 40%
LabelUnmarshal/Unmarshal_easyproto-16 41.19µ ± 45%
geomean 69.20µ
│ labelsbench │
│ B/op │
LabelUnmarshal/Unmarshal_regular-16 84.22Ki ± 0%
LabelUnmarshal/Unmarshal_ZLabel-16 68.59Ki ± 0%
LabelUnmarshal/Unmarshal_easyproto-16 32.03Ki ± 0%
geomean 56.99Ki
│ labelsbench │
│ allocs/op │
LabelUnmarshal/Unmarshal_regular-16 2.011k ± 0%
LabelUnmarshal/Unmarshal_ZLabel-16 11.00 ± 0%
LabelUnmarshal/Unmarshal_easyproto-16 1.000 ± 0%
geomean 28.07
```

View File

@ -11,7 +11,7 @@ menu: proposals-accepted
* https://github.com/thanos-io/thanos/pull/5250
* https://github.com/thanos-io/thanos/pull/4917
* https://github.com/thanos-io/thanos/pull/5350
* https://github.com/thanos-community/promql-engine/issues/25
* https://github.com/thanos-io/promql-engine/issues/25
## 2 Why
@ -75,7 +75,7 @@ Keeping PromQL execution in Query components allows for deduplication between Pr
<img src="../img/distributed-execution-proposal-1.png" alt="Distributed query execution" width="400"/>
The initial version of the solution can be found here: https://github.com/thanos-community/promql-engine/pull/139
The initial version of the solution can be found here: https://github.com/thanos-io/promql-engine/pull/139
### Query rewrite algorithm

View File

@ -92,7 +92,7 @@ Enforcing tenancy label in queries:
#### Apply verification and enforcement logic in the Query Frontend instead of Querier.
The Query Frontend is an optional component on any Thanos deployment, while the Querier is always present. Plus, there might be deployments with multiple Querier layers where one or more might need to apply tenant verification and enforcement. On top of this, doing it in the Querier supports future work on using the [new Thanos PromQL engine](https://github.com/thanos-community/promql-engine), which can potentially make the Query Frontend unnecessary.
The Query Frontend is an optional component on any Thanos deployment, while the Querier is always present. Plus, there might be deployments with multiple Querier layers where one or more might need to apply tenant verification and enforcement. On top of this, doing it in the Querier supports future work on using the [new Thanos PromQL engine](https://github.com/thanos-io/promql-engine), which can potentially make the Query Frontend unnecessary.
#### Add the tenant identification as an optional field in the Store API protobuffer spec instead of an HTTP header.

View File

@ -143,4 +143,4 @@ An alternative to this is to use the existing [hashmod](https://prometheus.io/do
Once a Prometheus instance has been drained and no longer has targets to scrape we will wish to scale down and remove the instance. However, we will need to ensure that the data that is currently in the WAL block but not uploaded to object storage is flushed before we can remove the instance. Failing to do so will mean that any data in the WAL is lost when the Prometheus node is terminated. During this flush period until it is confirmed that the WAL has been uploaded we should still have the Prometheus instance serve requests for the data in the WAL.
See [prometheus/tsdb - Issue 346](https://github.com/prometheus/tsdb/issues/346) for more information.
See [prometheus/tsdb - Issue 346](https://github.com/prometheus-junkyard/tsdb/issues/346) for more information.

View File

@ -23,6 +23,7 @@ Release shepherd responsibilities:
| Release | Time of first RC | Shepherd (GitHub handle) |
|---------|------------------|-------------------------------|
| v0.40.0 | 2025.10.14 | `@GiedriusS` |
| v0.39.0 | 2025.05.29 | `@GiedriusS` |
| v0.38.0 | 2025.03.25 | `@MichaHoffmann` |
| v0.37.0 | 2024.11.19 | `@saswatamcode` |

View File

@ -290,13 +290,13 @@ config:
use_grpc: false
grpc_conn_pool_size: 0
http_config:
idle_conn_timeout: 0s
response_header_timeout: 0s
idle_conn_timeout: 1m30s
response_header_timeout: 2m
insecure_skip_verify: false
tls_handshake_timeout: 0s
expect_continue_timeout: 0s
max_idle_conns: 0
max_idle_conns_per_host: 0
tls_handshake_timeout: 10s
expect_continue_timeout: 1s
max_idle_conns: 100
max_idle_conns_per_host: 100
max_conns_per_host: 0
tls_config:
ca_file: ""
@ -379,12 +379,15 @@ Config file format is the following:
```yaml mdox-exec="go run scripts/cfggen/main.go --name=azure.Config"
type: AZURE
config:
az_tenant_id: ""
client_id: ""
client_secret: ""
storage_account: ""
storage_account_key: ""
storage_connection_string: ""
storage_create_container: false
storage_create_container: true
container: ""
endpoint: ""
endpoint: blob.core.windows.net
user_assigned_id: ""
max_retries: 0
reader_config:
@ -395,13 +398,13 @@ config:
retry_delay: 0s
max_retry_delay: 0s
http_config:
idle_conn_timeout: 0s
response_header_timeout: 0s
idle_conn_timeout: 1m30s
response_header_timeout: 2m
insecure_skip_verify: false
tls_handshake_timeout: 0s
expect_continue_timeout: 0s
max_idle_conns: 0
max_idle_conns_per_host: 0
tls_handshake_timeout: 10s
expect_continue_timeout: 1s
max_idle_conns: 100
max_idle_conns_per_host: 100
max_conns_per_host: 0
tls_config:
ca_file: ""

View File

@ -23,6 +23,7 @@ import (
"gopkg.in/yaml.v2"
"github.com/efficientgo/core/testutil"
"github.com/thanos-io/objstore"
"github.com/thanos-io/objstore/client"
"github.com/thanos-io/objstore/providers/s3"
tracingclient "github.com/thanos-io/thanos/pkg/tracing/client"
@ -176,7 +177,7 @@ func TestReadOnlyThanosSetup(t *testing.T) {
// │ │
// └───────────┘
bkt1Config, err := yaml.Marshal(client.BucketConfig{
Type: client.S3,
Type: objstore.S3,
Config: s3.Config{
Bucket: "bkt1",
AccessKey: e2edb.MinioAccessKey,
@ -198,7 +199,7 @@ func TestReadOnlyThanosSetup(t *testing.T) {
)
bkt2Config, err := yaml.Marshal(client.BucketConfig{
Type: client.S3,
Type: objstore.S3,
Config: s3.Config{
Bucket: "bkt2",
AccessKey: e2edb.MinioAccessKey,

97
go.mod
View File

@ -1,10 +1,10 @@
module github.com/thanos-io/thanos
go 1.24.0
go 1.25.0
require (
capnproto.org/go/capnp/v3 v3.1.0-alpha.1
cloud.google.com/go/trace v1.11.4
cloud.google.com/go/trace v1.11.6
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace v1.27.0
github.com/KimMachineGun/automemlimit v0.7.3
github.com/alecthomas/units v0.0.0-20240927000941-0f3dac36c52b
@ -49,7 +49,7 @@ require (
github.com/minio/sha256-simd v1.0.1
github.com/mitchellh/go-ps v1.0.0
github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f
github.com/oklog/run v1.1.0
github.com/oklog/run v1.2.0
github.com/oklog/ulid v1.3.1 // indirect
github.com/olekukonko/tablewriter v0.0.5
github.com/onsi/gomega v1.36.2
@ -58,18 +58,18 @@ require (
github.com/pkg/errors v0.9.1
github.com/prometheus-community/prom-label-proxy v0.11.1
github.com/prometheus/alertmanager v0.28.1
github.com/prometheus/client_golang v1.22.0
github.com/prometheus/client_golang v1.23.0-rc.1
github.com/prometheus/client_model v0.6.2
github.com/prometheus/common v0.63.0
github.com/prometheus/common v0.65.1-0.20250703115700-7f8b2a0d32d3
github.com/prometheus/exporter-toolkit v0.14.0
// Prometheus maps version 3.x.y to tags v0.30x.y.
github.com/prometheus/prometheus v0.303.1
github.com/prometheus/prometheus v0.305.1-0.20250721065454-b09cf6be8d56
github.com/redis/rueidis v1.0.61
github.com/seiflotfy/cuckoofilter v0.0.0-20240715131351-a2f2c23f1771
github.com/sony/gobreaker v1.0.0
github.com/stretchr/testify v1.10.0
github.com/thanos-io/objstore v0.0.0-20241111205755-d1dd89d41f97
github.com/thanos-io/promql-engine v0.0.0-20250522103302-dd83bd8fdb50
github.com/thanos-io/objstore v0.0.0-20250804093838-71d60dfee488
github.com/thanos-io/promql-engine v0.0.0-20250731151205-1a520ea6a26d
github.com/uber/jaeger-client-go v2.30.0+incompatible
github.com/vimeo/galaxycache v1.3.1
github.com/weaveworks/common v0.0.0-20230728070032-dd9e68f319d5
@ -95,32 +95,33 @@ require (
golang.org/x/text v0.26.0
golang.org/x/time v0.12.0
google.golang.org/grpc v1.73.0
google.golang.org/grpc/examples v0.0.0-20211119005141-f45e61797429
google.golang.org/grpc/examples v0.0.0-20230224211313-3775f633ce20
google.golang.org/protobuf v1.36.6
gopkg.in/yaml.v2 v2.4.0
gopkg.in/yaml.v3 v3.0.1
)
require (
cloud.google.com/go v0.118.0 // indirect
cloud.google.com/go/auth v0.15.1-0.20250317171031-671eed979bfd // indirect
cloud.google.com/go v0.120.0 // indirect
cloud.google.com/go/auth v0.16.2 // indirect
cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
cloud.google.com/go/compute/metadata v0.7.0 // indirect
cloud.google.com/go/iam v1.3.1 // indirect
cloud.google.com/go/storage v1.43.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.18.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.10.0 // indirect
cloud.google.com/go/iam v1.5.2 // indirect
cloud.google.com/go/storage v1.50.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.18.1 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.10.1 // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v1.11.1 // indirect
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.1 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.4.2 // indirect
)
require (
github.com/VictoriaMetrics/easyproto v0.1.4
github.com/alecthomas/kingpin/v2 v2.4.0
github.com/oklog/ulid/v2 v2.1.1
github.com/prometheus/otlptranslator v0.0.0-20250527173959-2573485683d5
github.com/prometheus/otlptranslator v0.0.0-20250620074007-94f535e0c588
github.com/tjhop/slog-gokit v0.1.4
go.opentelemetry.io/collector/pdata v1.34.0
go.opentelemetry.io/collector/pdata v1.35.0
go.opentelemetry.io/collector/semconv v0.128.0
)
@ -145,7 +146,6 @@ require (
github.com/onsi/ginkgo v1.16.5 // indirect
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58 // indirect
github.com/sercand/kuberesolver/v4 v4.0.0 // indirect
github.com/zhangyunhao116/umap v0.0.0-20250307031311-0b61e69e958b // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.61.0 // indirect
go.opentelemetry.io/contrib/propagators/ot v1.36.0 // indirect
go4.org/unsafe/assume-no-moving-gc v0.0.0-20231121144256-b99613f794b6 // indirect
@ -159,11 +159,14 @@ require (
)
require (
cel.dev/expr v0.23.1 // indirect
cloud.google.com/go/monitoring v1.24.2 // indirect
github.com/GoogleCloudPlatform/opentelemetry-operations-go/detectors/gcp v1.27.0 // indirect
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric v0.50.0 // indirect
github.com/GoogleCloudPlatform/opentelemetry-operations-go/internal/resourcemapping v0.52.0 // indirect
github.com/aliyun/aliyun-oss-go-sdk v3.0.2+incompatible // indirect
github.com/armon/go-radix v1.0.0 // indirect
github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2 // indirect
github.com/aws/aws-sdk-go v1.55.7 // indirect
github.com/aws/aws-sdk-go-v2 v1.36.3 // indirect
github.com/aws/aws-sdk-go-v2/config v1.29.15 // indirect
github.com/aws/aws-sdk-go-v2/credentials v1.17.68 // indirect
@ -190,8 +193,10 @@ require (
github.com/elastic/go-sysinfo v1.15.3 // indirect
github.com/elastic/go-windows v1.0.2 // indirect
github.com/envoyproxy/go-control-plane/envoy v1.32.4 // indirect
github.com/envoyproxy/protoc-gen-validate v1.2.1 // indirect
github.com/fatih/color v1.18.0 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/go-jose/go-jose/v4 v4.0.5 // indirect
github.com/go-logfmt/logfmt v0.6.0 // indirect
github.com/go-logr/logr v1.4.3 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
@ -203,7 +208,7 @@ require (
github.com/go-openapi/spec v0.21.0 // indirect
github.com/go-openapi/swag v0.23.1 // indirect
github.com/go-openapi/validate v0.24.0 // indirect
github.com/go-viper/mapstructure/v2 v2.2.1 // indirect
github.com/go-viper/mapstructure/v2 v2.3.0 // indirect
github.com/gobwas/glob v0.2.3 // indirect
github.com/gobwas/httphead v0.1.0 // indirect
github.com/gobwas/pool v0.2.1 // indirect
@ -213,13 +218,12 @@ require (
github.com/google/go-querystring v1.1.0 // indirect
github.com/google/pprof v0.0.0-20250607225305-033d6d78b36a // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.6 // indirect
github.com/googleapis/gax-go/v2 v2.14.1 // indirect
github.com/googleapis/gax-go/v2 v2.14.2 // indirect
github.com/gorilla/mux v1.8.1 // indirect
github.com/grafana/regexp v0.0.0-20240518133315-a468a5bfb3bc // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.26.3 // indirect
github.com/hashicorp/go-version v1.7.0 // indirect
github.com/jaegertracing/jaeger-idl v0.6.0 // indirect
github.com/jmespath/go-jmespath v0.4.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/julienschmidt/httprouter v1.3.0 // indirect
github.com/klauspost/cpuid/v2 v2.2.10 // indirect
@ -231,8 +235,9 @@ require (
github.com/mailru/easyjson v0.9.0 // indirect
github.com/mattn/go-colorable v0.1.14 // indirect
github.com/mattn/go-runewidth v0.0.16 // indirect
github.com/minio/crc64nvme v1.0.1 // indirect
github.com/minio/md5-simd v1.1.2 // indirect
github.com/minio/minio-go/v7 v7.0.80 // indirect
github.com/minio/minio-go/v7 v7.0.93 // indirect
github.com/mitchellh/copystructure v1.2.0 // indirect
github.com/mitchellh/mapstructure v1.5.0 // indirect
github.com/mitchellh/reflectwalk v1.0.2 // indirect
@ -240,42 +245,48 @@ require (
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/mozillazg/go-httpheader v0.4.0 // indirect
github.com/ncw/swift v1.0.53 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/internal/exp/metrics v0.128.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/pdatautil v0.128.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor v0.128.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/internal/exp/metrics v0.129.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/pdatautil v0.129.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor v0.129.0 // indirect
github.com/opentracing-contrib/go-grpc v0.1.2 // indirect
github.com/opentracing-contrib/go-stdlib v1.1.0 // indirect
github.com/oracle/oci-go-sdk/v65 v65.93.1 // indirect
github.com/philhofer/fwd v1.1.3-0.20240916144458-20a13a1f6b7c // indirect
github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect
github.com/planetscale/vtprotobuf v0.6.1-0.20240319094008-0393e58bdf10 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/prometheus/procfs v0.16.1 // indirect
github.com/prometheus/sigv4 v0.1.2 // indirect
github.com/prometheus/sigv4 v0.2.0 // indirect
github.com/puzpuzpuz/xsync/v3 v3.5.1 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/rs/xid v1.6.0 // indirect
github.com/santhosh-tekuri/jsonschema v1.2.4 // indirect
github.com/sirupsen/logrus v1.9.3 // indirect
github.com/spiffe/go-spiffe/v2 v2.5.0 // indirect
github.com/stretchr/objx v0.5.2 // indirect
github.com/tencentyun/cos-go-sdk-v5 v0.7.66 // indirect
github.com/tinylib/msgp v1.3.0 // indirect
github.com/uber/jaeger-lib v2.4.1+incompatible // indirect
github.com/weaveworks/promrus v1.2.0 // indirect
github.com/xhit/go-str2duration/v2 v2.1.0 // indirect
github.com/youmark/pkcs8 v0.0.0-20240726163527-a2c0da244d78 // indirect
github.com/yuin/gopher-lua v1.1.1 // indirect
github.com/zeebo/errs v1.4.0 // indirect
go.elastic.co/apm/module/apmhttp v1.15.0 // indirect
go.elastic.co/fastjson v1.5.1 // indirect
go.mongodb.org/mongo-driver v1.17.4 // indirect
go.opencensus.io v0.24.0 // indirect
go.opentelemetry.io/auto/sdk v1.1.0 // indirect
go.opentelemetry.io/collector/component v1.34.0 // indirect
go.opentelemetry.io/collector/confmap v1.34.0 // indirect
go.opentelemetry.io/collector/confmap/xconfmap v0.128.0 // indirect
go.opentelemetry.io/collector/consumer v1.34.0 // indirect
go.opentelemetry.io/collector/featuregate v1.34.0 // indirect
go.opentelemetry.io/collector/internal/telemetry v0.128.0 // indirect
go.opentelemetry.io/collector/pipeline v0.128.0 // indirect
go.opentelemetry.io/collector/processor v1.34.0 // indirect
go.opentelemetry.io/collector/component v1.35.0 // indirect
go.opentelemetry.io/collector/confmap v1.35.0 // indirect
go.opentelemetry.io/collector/confmap/xconfmap v0.129.0 // indirect
go.opentelemetry.io/collector/consumer v1.35.0 // indirect
go.opentelemetry.io/collector/featuregate v1.35.0 // indirect
go.opentelemetry.io/collector/internal/telemetry v0.129.0 // indirect
go.opentelemetry.io/collector/pipeline v0.129.0 // indirect
go.opentelemetry.io/collector/processor v1.35.0 // indirect
go.opentelemetry.io/contrib/bridges/otelzap v0.11.0 // indirect
go.opentelemetry.io/contrib/detectors/gcp v1.35.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace v0.61.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.61.0 // indirect
go.opentelemetry.io/contrib/propagators/aws v1.36.0 // indirect
@ -283,6 +294,7 @@ require (
go.opentelemetry.io/contrib/propagators/jaeger v1.36.0 // indirect
go.opentelemetry.io/otel/log v0.12.2 // indirect
go.opentelemetry.io/otel/metric v1.36.0 // indirect
go.opentelemetry.io/otel/sdk/metric v1.36.0 // indirect
go.opentelemetry.io/proto/otlp v1.7.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
go.uber.org/zap v1.27.0 // indirect
@ -292,11 +304,11 @@ require (
golang.org/x/sys v0.33.0 // indirect
golang.org/x/tools v0.34.0 // indirect
gonum.org/v1/gonum v0.16.0 // indirect
google.golang.org/api v0.228.0 // indirect
google.golang.org/genproto v0.0.0-20250122153221-138b5a5a4fd4 // indirect
google.golang.org/api v0.239.0 // indirect
google.golang.org/genproto v0.0.0-20250505200425-f936aa4a68b2 // indirect
howett.net/plist v1.0.1 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
zenhack.net/go/util v0.0.0-20230607025951-8b02fee814ae // indirect
zenhack.net/go/util v0.0.0-20230414204917-531d38494cf5 // indirect
)
replace (
@ -307,17 +319,14 @@ replace (
// Required by Cortex https://github.com/cortexproject/cortex/pull/3051.
github.com/bradfitz/gomemcache => github.com/themihai/gomemcache v0.0.0-20180902122335-24332e2d58ab
// v3.3.1 with https://github.com/prometheus/prometheus/pull/16252.
github.com/prometheus/prometheus => github.com/thanos-io/thanos-prometheus v0.0.0-20250610133519-082594458a88
// Pin kuberesolver/v5 to support new grpc version. Need to upgrade kuberesolver version on weaveworks/common.
github.com/sercand/kuberesolver/v4 => github.com/sercand/kuberesolver/v5 v5.1.1
github.com/vimeo/galaxycache => github.com/thanos-community/galaxycache v0.0.0-20211122094458-3a32041a1f1e
// Pinning grpc due https://github.com/grpc/grpc-go/issues/7314
google.golang.org/grpc => google.golang.org/grpc v1.63.2
// Overriding to use latest commit.
gopkg.in/alecthomas/kingpin.v2 => github.com/alecthomas/kingpin v1.3.8-0.20210301060133-17f40c25f497
// The domain `zenhack.net` expired.
zenhack.net/go/util => github.com/zenhack/go-util v0.0.0-20231005031245-66f5419c2aea
)

1459
go.sum

File diff suppressed because it is too large

View File

@ -49,7 +49,7 @@ func (c GenNumMiddleware) Stop() {
// InjectCacheGenNumber returns a derived context containing the cache gen.
func InjectCacheGenNumber(ctx context.Context, cacheGen string) context.Context {
return context.WithValue(ctx, interface{}(cacheGenContextKey), cacheGen)
return context.WithValue(ctx, any(cacheGenContextKey), cacheGen)
}
// ExtractCacheGenNumbersFromHeaders gets the cache gen from the context.

View File

@ -20,9 +20,9 @@ import (
func fillCache(t *testing.T, cache cache.Cache) ([]string, [][]byte) {
keys := []string{}
bufs := [][]byte{}
for i := 0; i < 111; i++ {
for i := range 111 {
keys = append(keys, fmt.Sprintf("test%d", i))
bufs = append(bufs, []byte(fmt.Sprintf("buf%d", i)))
bufs = append(bufs, fmt.Appendf(nil, "buf%d", i))
}
cache.Store(context.Background(), keys, bufs)
@ -30,7 +30,7 @@ func fillCache(t *testing.T, cache cache.Cache) ([]string, [][]byte) {
}
func testCacheSingle(t *testing.T, cache cache.Cache, keys []string, data [][]byte) {
for i := 0; i < 100; i++ {
for range 100 {
index := rand.Intn(len(keys))
key := keys[index]
@ -57,7 +57,7 @@ func testCacheMultiple(t *testing.T, cache cache.Cache, keys []string, data [][]
}
func testCacheMiss(t *testing.T, cache cache.Cache) {
for i := 0; i < 100; i++ {
for range 100 {
key := strconv.Itoa(rand.Int()) // arbitrary key which should fail: no chunk key is a single integer
found, bufs, missing := cache.Fetch(context.Background(), []string{key})
require.Empty(t, found)

View File

@ -47,7 +47,7 @@ func TestFifoCacheEviction(t *testing.T) {
// Check put / get works
keys := []string{}
values := [][]byte{}
for i := 0; i < cnt; i++ {
for i := range cnt {
key := fmt.Sprintf("%02d", i)
value := make([]byte, len(key))
copy(value, key)
@ -68,7 +68,7 @@ func TestFifoCacheEviction(t *testing.T) {
assert.Equal(t, testutil.ToFloat64(c.staleGets), float64(0))
assert.Equal(t, testutil.ToFloat64(c.memoryBytes), float64(cnt*sizeOf(itemTemplate)))
for i := 0; i < cnt; i++ {
for i := range cnt {
key := fmt.Sprintf("%02d", i)
value, ok := c.Get(ctx, key)
require.True(t, ok)
@ -110,7 +110,7 @@ func TestFifoCacheEviction(t *testing.T) {
assert.Equal(t, testutil.ToFloat64(c.staleGets), float64(0))
assert.Equal(t, testutil.ToFloat64(c.memoryBytes), float64(cnt*sizeOf(itemTemplate)))
for i := 0; i < cnt-evicted; i++ {
for i := range cnt - evicted {
_, ok := c.Get(ctx, fmt.Sprintf("%02d", i))
require.False(t, ok)
}
@ -148,7 +148,7 @@ func TestFifoCacheEviction(t *testing.T) {
for i := cnt; i < cnt+evicted; i++ {
value, ok := c.Get(ctx, fmt.Sprintf("%02d", i))
require.True(t, ok)
require.Equal(t, []byte(fmt.Sprintf("%02d", i*2)), value)
require.Equal(t, fmt.Appendf(nil, "%02d", i*2), value)
}
assert.Equal(t, testutil.ToFloat64(c.entriesAdded), float64(3))

View File

@ -194,7 +194,7 @@ func (c *memcachedClient) dialViaCircuitBreaker(network, address string, timeout
}
c.Unlock()
conn, err := cb.Execute(func() (interface{}, error) {
conn, err := cb.Execute(func() (any, error) {
return net.DialTimeout(network, address, timeout)
})
if err != nil {

View File

@ -46,19 +46,19 @@ func testMemcache(t *testing.T, memcache *cache.Memcached) {
bufs := make([][]byte, 0, numKeys)
// Insert 1000 keys skipping all multiples of 5.
for i := 0; i < numKeys; i++ {
for i := range numKeys {
keysIncMissing = append(keysIncMissing, fmt.Sprint(i))
if i%5 == 0 {
continue
}
keys = append(keys, fmt.Sprint(i))
bufs = append(bufs, []byte(fmt.Sprint(i)))
bufs = append(bufs, fmt.Append(nil, i))
}
memcache.Store(ctx, keys, bufs)
found, bufs, missing := memcache.Fetch(ctx, keysIncMissing)
for i := 0; i < numKeys; i++ {
for i := range numKeys {
if i%5 == 0 {
require.Equal(t, fmt.Sprint(i), missing[0])
missing = missing[1:]
@ -121,17 +121,17 @@ func testMemcacheFailing(t *testing.T, memcache *cache.Memcached) {
keys := make([]string, 0, numKeys)
bufs := make([][]byte, 0, numKeys)
// Insert 1000 keys skipping all multiples of 5.
for i := 0; i < numKeys; i++ {
for i := range numKeys {
keysIncMissing = append(keysIncMissing, fmt.Sprint(i))
if i%5 == 0 {
continue
}
keys = append(keys, fmt.Sprint(i))
bufs = append(bufs, []byte(fmt.Sprint(i)))
bufs = append(bufs, fmt.Append(nil, i))
}
memcache.Store(ctx, keys, bufs)
for i := 0; i < 10; i++ {
for range 10 {
found, bufs, missing := memcache.Fetch(ctx, keysIncMissing)
require.Equal(t, len(found), len(bufs))
@ -185,9 +185,9 @@ func testMemcachedStopping(t *testing.T, memcache *cache.Memcached) {
ctx := context.Background()
keys := make([]string, 0, numKeys)
bufs := make([][]byte, 0, numKeys)
for i := 0; i < numKeys; i++ {
for i := range numKeys {
keys = append(keys, fmt.Sprint(i))
bufs = append(bufs, []byte(fmt.Sprint(i)))
bufs = append(bufs, fmt.Append(nil, i))
}
memcache.Store(ctx, keys, bufs)

View File

@ -38,7 +38,7 @@ func TestRedisCache(t *testing.T) {
require.Len(t, found, nHit)
require.Len(t, missed, 0)
for i := 0; i < nHit; i++ {
for i := range nHit {
require.Equal(t, keys[i], found[i])
require.Equal(t, bufs[i], data[i])
}
@ -48,7 +48,7 @@ func TestRedisCache(t *testing.T) {
require.Len(t, found, 0)
require.Len(t, missed, nMiss)
for i := 0; i < nMiss; i++ {
for i := range nMiss {
require.Equal(t, miss[i], missed[i])
}
}

View File

@ -81,7 +81,7 @@ func (s Sample) MarshalJSON() ([]byte, error) {
if err != nil {
return nil, err
}
return []byte(fmt.Sprintf("[%s,%s]", t, v)), nil
return fmt.Appendf(nil, "[%s,%s]", t, v), nil
}
// UnmarshalJSON implements json.Unmarshaler.

File diff suppressed because it is too large

View File

@ -26,13 +26,13 @@ var (
is re-used. But since the slices are far far larger, we come out ahead.
*/
slicePool = sync.Pool{
New: func() interface{} {
New: func() any {
return make([]PreallocTimeseries, 0, expectedTimeseries)
},
}
timeSeriesPool = sync.Pool{
New: func() interface{} {
New: func() any {
return &TimeSeries{
Labels: make([]LabelAdapter, 0, expectedLabels),
Samples: make([]Sample, 0, expectedSamplesPerSeries),

View File

@ -9,6 +9,7 @@ import (
"errors"
"fmt"
"io"
"maps"
"net/http"
"net/url"
"strconv"
@ -132,9 +133,7 @@ func (f *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
}
hs := w.Header()
for h, vs := range resp.Header {
hs[h] = vs
}
maps.Copy(hs, resp.Header)
if f.cfg.QueryStatsEnabled {
writeServiceTimingHeader(queryResponseTime, hs, stats)
@ -200,7 +199,7 @@ func (f *Handler) reportSlowQuery(
remoteUser, _, _ = r.BasicAuth()
}
logMessage := append([]interface{}{
logMessage := append([]any{
"msg", "slow query detected",
"method", r.Method,
"host", r.Host,
@ -237,7 +236,7 @@ func (f *Handler) reportQueryStats(r *http.Request, queryString url.Values, quer
f.activeUsers.UpdateUserTimestamp(userID, time.Now())
// Log stats.
logMessage := append([]interface{}{
logMessage := append([]any{
"msg", "query stats",
"component", "query-frontend",
"method", r.Method,
@ -269,14 +268,14 @@ func (f *Handler) parseRequestQueryString(r *http.Request, bodyBuf bytes.Buffer)
return r.Form
}
func formatQueryString(queryString url.Values) (fields []interface{}) {
func formatQueryString(queryString url.Values) (fields []any) {
for k, v := range queryString {
fields = append(fields, fmt.Sprintf("param_%s", k), strings.Join(v, ","))
}
return fields
}
func (f *Handler) addStatsToLogMessage(message []interface{}, stats *querier_stats.Stats) []interface{} {
func (f *Handler) addStatsToLogMessage(message []any, stats *querier_stats.Stats) []any {
if stats != nil {
message = append(message, "peak_samples", stats.LoadPeakSamples())
message = append(message, "total_samples_loaded", stats.LoadTotalSamples())
@ -285,7 +284,7 @@ func (f *Handler) addStatsToLogMessage(message []interface{}, stats *querier_sta
return message
}
func addQueryRangeToLogMessage(logMessage []interface{}, queryString url.Values) []interface{} {
func addQueryRangeToLogMessage(logMessage []any, queryString url.Values) []any {
queryRange := extractQueryRange(queryString)
if queryRange != time.Duration(0) {
logMessage = append(logMessage, "query_range_hours", int(queryRange.Hours()))

View File

@ -28,10 +28,9 @@ func BenchmarkPrometheusCodec_DecodeResponse(b *testing.B) {
require.NoError(b, err)
b.Log("test prometheus response size:", len(encodedRes))
b.ResetTimer()
b.ReportAllocs()
for n := 0; n < b.N; n++ {
for b.Loop() {
_, err := PrometheusCodec.DecodeResponse(context.Background(), &http.Response{
StatusCode: 200,
Body: io.NopCloser(bytes.NewReader(encodedRes)),
@ -50,10 +49,9 @@ func BenchmarkPrometheusCodec_EncodeResponse(b *testing.B) {
// Generate a mocked response and marshal it.
res := mockPrometheusResponse(numSeries, numSamplesPerSeries)
b.ResetTimer()
b.ReportAllocs()
for n := 0; n < b.N; n++ {
for b.Loop() {
_, err := PrometheusCodec.EncodeResponse(context.Background(), res)
require.NoError(b, err)
}
@ -61,10 +59,10 @@ func BenchmarkPrometheusCodec_EncodeResponse(b *testing.B) {
func mockPrometheusResponse(numSeries, numSamplesPerSeries int) *PrometheusResponse {
stream := make([]SampleStream, numSeries)
for s := 0; s < numSeries; s++ {
for s := range numSeries {
// Generate random samples.
samples := make([]cortexpb.Sample, numSamplesPerSeries)
for i := 0; i < numSamplesPerSeries; i++ {
for i := range numSamplesPerSeries {
samples[i] = cortexpb.Sample{
Value: rand.Float64(),
TimestampMs: int64(i),

View File

@ -12,6 +12,7 @@ import (
"math"
"net/http"
"net/url"
"slices"
"sort"
"strconv"
"strings"
@ -750,7 +751,7 @@ func StatsMerge(resps []Response) *PrometheusResponseStats {
keys = append(keys, key)
}
sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
slices.Sort(keys)
result := &PrometheusResponseStats{Samples: &PrometheusResponseSamplesStats{
PeakSamples: peakSamples,
@ -935,7 +936,7 @@ func (d Duration) MarshalJSON() ([]byte, error) {
}
func (d *Duration) UnmarshalJSON(b []byte) error {
var v interface{}
var v any
if err := json.Unmarshal(b, &v); err != nil {
return err
}

View File

@ -236,7 +236,7 @@ func prettyPrintJsonBody(t *testing.T, body io.ReadCloser) string {
bodyContent, err := io.ReadAll(body)
require.NoError(t, err)
var jsonData interface{}
var jsonData any
err = json.Unmarshal(bodyContent, &jsonData)
require.NoError(t, err)

View File

@ -9,6 +9,7 @@ import (
"fmt"
"github.com/thanos-io/thanos/pkg/extpromql"
"net/http"
"slices"
"sort"
"strings"
"time"
@ -276,11 +277,9 @@ func (s resultsCache) Do(ctx context.Context, r Request) (Response, error) {
// shouldCacheResponse says whether the response should be cached or not.
func (s resultsCache) shouldCacheResponse(ctx context.Context, req Request, r Response, maxCacheTime int64) bool {
headerValues := getHeaderValuesWithName(r, cacheControlHeader)
for _, v := range headerValues {
if v == noStoreValue {
level.Debug(s.logger).Log("msg", fmt.Sprintf("%s header in response is equal to %s, not caching the response", cacheControlHeader, noStoreValue))
return false
}
if slices.Contains(headerValues, noStoreValue) {
level.Debug(s.logger).Log("msg", fmt.Sprintf("%s header in response is equal to %s, not caching the response", cacheControlHeader, noStoreValue))
return false
}
if !s.isAtModifierCachable(req, maxCacheTime) {
@ -334,7 +333,12 @@ func (s resultsCache) isAtModifierCachable(r Request, maxCacheTime int64) bool {
}
// This resolves the start() and end() used with the @ modifier.
expr = promql.PreprocessExpr(expr, timestamp.Time(r.GetStart()), timestamp.Time(r.GetEnd()))
expr, err = promql.PreprocessExpr(expr, timestamp.Time(r.GetStart()), timestamp.Time(r.GetEnd()), time.Duration(r.GetStep())*time.Millisecond)
if err != nil {
// We are being pessimistic in such cases.
level.Warn(s.logger).Log("msg", "failed to preprocess expr", "query", query, "err", err)
return false
}
end := r.GetEnd()
atModCachable := true
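
Review note: `shouldCacheResponse` now uses `slices.Contains` instead of a manual loop over the header values. The sketch below isolates that check. The constant value `"no-store"` is an assumption (the diff does not show the definition of `noStoreValue`), and `hasNoStore` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"slices"
)

// noStoreValue is assumed to be "no-store"; its definition is not in the diff.
const noStoreValue = "no-store"

// hasNoStore reports whether any header value is exactly noStoreValue.
// slices.Contains compares whole elements, so a combined value such as
// "no-store, max-age=0" does not match, just as in the removed loop.
func hasNoStore(values []string) bool {
	return slices.Contains(values, noStoreValue)
}

func main() {
	fmt.Println(hasNoStore([]string{"private", "no-store"}))
	fmt.Println(hasNoStore([]string{"no-store, max-age=0"}))
	fmt.Println(hasNoStore(nil))
}
```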

View File

@ -388,7 +388,6 @@ func Test_evaluateAtModifier(t *testing.T) {
expectedErrorCode: http.StatusBadRequest,
},
} {
tt := tt
t.Run(tt.in, func(t *testing.T) {
t.Parallel()
out, err := EvaluateAtModifierFunction(tt.in, start, end)

View File

@ -40,11 +40,8 @@ func DoRequests(ctx context.Context, downstream Handler, reqs []Request, limits
}()
respChan, errChan := make(chan RequestResponse), make(chan error)
parallelism := validation.SmallestPositiveIntPerTenant(tenantIDs, limits.MaxQueryParallelism)
if parallelism > len(reqs) {
parallelism = len(reqs)
}
for i := 0; i < parallelism; i++ {
parallelism := min(validation.SmallestPositiveIntPerTenant(tenantIDs, limits.MaxQueryParallelism), len(reqs))
for range parallelism {
go func() {
for req := range intermediate {
resp, err := downstream.Do(ctx, req)

View File

@ -61,7 +61,7 @@ func TestMatrixToSeriesSetSortsMetricLabels(t *testing.T) {
func TestDeletedSeriesIterator(t *testing.T) {
cs := ConcreteSeries{labels: labels.FromStrings("foo", "bar")}
// Insert random stuff from (0, 1000).
for i := 0; i < 1000; i++ {
for i := range 1000 {
cs.samples = append(cs.samples, model.SamplePair{Timestamp: model.Time(i), Value: model.SampleValue(rand.Float64())})
}
@ -118,7 +118,7 @@ func TestDeletedSeriesIterator(t *testing.T) {
func TestDeletedIterator_WithSeek(t *testing.T) {
cs := ConcreteSeries{labels: labels.FromStrings("foo", "bar")}
// Insert random stuff from (0, 1000).
for i := 0; i < 1000; i++ {
for i := range 1000 {
cs.samples = append(cs.samples, model.SamplePair{Timestamp: model.Time(i), Value: model.SampleValue(rand.Float64())})
}

View File

@ -49,9 +49,9 @@ func (c CIDRSliceCSV) String() string {
// Set implements flag.Value
func (c *CIDRSliceCSV) Set(s string) error {
parts := strings.Split(s, ",")
parts := strings.SplitSeq(s, ",")
for _, part := range parts {
for part := range parts {
cidr := &CIDR{}
if err := cidr.Set(part); err != nil {
return errors.Wrapf(err, "cidr: %s", part)
@ -64,7 +64,7 @@ func (c *CIDRSliceCSV) Set(s string) error {
}
// UnmarshalYAML implements yaml.Unmarshaler.
func (c *CIDRSliceCSV) UnmarshalYAML(unmarshal func(interface{}) error) error {
func (c *CIDRSliceCSV) UnmarshalYAML(unmarshal func(any) error) error {
var s string
if err := unmarshal(&s); err != nil {
return err
@ -80,6 +80,6 @@ func (c *CIDRSliceCSV) UnmarshalYAML(unmarshal func(interface{}) error) error {
}
// MarshalYAML implements yaml.Marshaler.
func (c CIDRSliceCSV) MarshalYAML() (interface{}, error) {
func (c CIDRSliceCSV) MarshalYAML() (any, error) {
return c.String(), nil
}

View File

@ -19,7 +19,7 @@ func (v *Secret) Set(s string) error {
}
// UnmarshalYAML implements yaml.Unmarshaler.
func (v *Secret) UnmarshalYAML(unmarshal func(interface{}) error) error {
func (v *Secret) UnmarshalYAML(unmarshal func(any) error) error {
var s string
if err := unmarshal(&s); err != nil {
return err
@ -29,7 +29,7 @@ func (v *Secret) UnmarshalYAML(unmarshal func(interface{}) error) error {
}
// MarshalYAML implements yaml.Marshaler.
func (v Secret) MarshalYAML() (interface{}, error) {
func (v Secret) MarshalYAML() (any, error) {
if len(v.Value) == 0 {
return "", nil
}

View File

@ -47,7 +47,7 @@ func (b BasicAuth) IsEnabled() bool {
}
// WriteJSONResponse writes some JSON as a HTTP response.
func WriteJSONResponse(w http.ResponseWriter, v interface{}) {
func WriteJSONResponse(w http.ResponseWriter, v any) {
w.Header().Set("Content-Type", "application/json")
data, err := json.Marshal(v)
@ -63,7 +63,7 @@ func WriteJSONResponse(w http.ResponseWriter, v interface{}) {
}
// WriteYAMLResponse writes some YAML as a HTTP response.
func WriteYAMLResponse(w http.ResponseWriter, v interface{}) {
func WriteYAMLResponse(w http.ResponseWriter, v any) {
// There is not standardised content-type for YAML, text/plain ensures the
// YAML is displayed in the browser instead of offered as a download
w.Header().Set("Content-Type", "text/plain; charset=utf-8")
@ -98,7 +98,7 @@ func WriteHTMLResponse(w http.ResponseWriter, message string) {
// RenderHTTPResponse either responds with json or a rendered html page using the passed in template
// by checking the Accepts header
func RenderHTTPResponse(w http.ResponseWriter, v interface{}, t *template.Template, r *http.Request) {
func RenderHTTPResponse(w http.ResponseWriter, v any, t *template.Template, r *http.Request) {
accept := r.Header.Get("Accept")
if strings.Contains(accept, "application/json") {
WriteJSONResponse(w, v)
@ -112,7 +112,7 @@ func RenderHTTPResponse(w http.ResponseWriter, v interface{}, t *template.Templa
}
// StreamWriteYAMLResponse stream writes data as http response
func StreamWriteYAMLResponse(w http.ResponseWriter, iter chan interface{}, logger log.Logger) {
func StreamWriteYAMLResponse(w http.ResponseWriter, iter chan any, logger log.Logger) {
w.Header().Set("Content-Type", "application/yaml")
for v := range iter {
data, err := yaml.Marshal(v)

View File

@ -21,7 +21,7 @@ import (
var (
bytesBufferPool = sync.Pool{
New: func() interface{} {
New: func() any {
return bytes.NewBuffer(nil)
},
}

View File

@ -28,15 +28,15 @@ func (n noopSpanContext) ForeachBaggageItem(handler func(k, v string) bool) {}
func (n noopSpan) Context() opentracing.SpanContext { return defaultNoopSpanContext }
func (n noopSpan) SetBaggageItem(key, val string) opentracing.Span { return defaultNoopSpan }
func (n noopSpan) BaggageItem(key string) string { return emptyString }
func (n noopSpan) SetTag(key string, value interface{}) opentracing.Span { return n }
func (n noopSpan) SetTag(key string, value any) opentracing.Span { return n }
func (n noopSpan) LogFields(fields ...log.Field) {}
func (n noopSpan) LogKV(keyVals ...interface{}) {}
func (n noopSpan) LogKV(keyVals ...any) {}
func (n noopSpan) Finish() {}
func (n noopSpan) FinishWithOptions(opts opentracing.FinishOptions) {}
func (n noopSpan) SetOperationName(operationName string) opentracing.Span { return n }
func (n noopSpan) Tracer() opentracing.Tracer { return defaultNoopTracer }
func (n noopSpan) LogEvent(event string) {}
func (n noopSpan) LogEventWithPayload(event string, payload interface{}) {}
func (n noopSpan) LogEventWithPayload(event string, payload any) {}
func (n noopSpan) Log(data opentracing.LogData) {}
// StartSpan belongs to the Tracer interface.
@ -45,11 +45,11 @@ func (n noopTracer) StartSpan(operationName string, opts ...opentracing.StartSpa
}
// Inject belongs to the Tracer interface.
func (n noopTracer) Inject(sp opentracing.SpanContext, format interface{}, carrier interface{}) error {
func (n noopTracer) Inject(sp opentracing.SpanContext, format any, carrier any) error {
return nil
}
// Extract belongs to the Tracer interface.
func (n noopTracer) Extract(format interface{}, carrier interface{}) (opentracing.SpanContext, error) {
func (n noopTracer) Extract(format any, carrier any) (opentracing.SpanContext, error) {
return nil, opentracing.ErrSpanContextNotFound
}

View File

@ -33,14 +33,14 @@ type SpanLogger struct {
}
// New makes a new SpanLogger, where logs will be sent to the global logger.
func New(ctx context.Context, method string, kvps ...interface{}) (*SpanLogger, context.Context) {
func New(ctx context.Context, method string, kvps ...any) (*SpanLogger, context.Context) {
return NewWithLogger(ctx, util_log.Logger, method, kvps...)
}
// NewWithLogger makes a new SpanLogger with a custom log.Logger to send logs
// to. The provided context will have the logger attached to it and can be
// retrieved with FromContext or FromContextWithFallback.
func NewWithLogger(ctx context.Context, l log.Logger, method string, kvps ...interface{}) (*SpanLogger, context.Context) {
func NewWithLogger(ctx context.Context, l log.Logger, method string, kvps ...any) (*SpanLogger, context.Context) {
span, ctx := opentracing.StartSpanFromContext(ctx, method)
if ids, _ := tenant.TenantIDs(ctx); len(ids) > 0 {
span.SetTag(TenantIDTagName, ids)
@ -86,7 +86,7 @@ func FromContextWithFallback(ctx context.Context, fallback log.Logger) *SpanLogg
// Log implements gokit's Logger interface; sends logs to underlying logger and
// also puts the on the spans.
func (s *SpanLogger) Log(kvps ...interface{}) error {
func (s *SpanLogger) Log(kvps ...any) error {
s.Logger.Log(kvps...)
fields, err := otlog.InterleavedKVToFields(kvps...)
if err != nil {

View File

@ -28,8 +28,8 @@ func TestSpanLogger_Log(t *testing.T) {
}
func TestSpanLogger_CustomLogger(t *testing.T) {
var logged [][]interface{}
var logger funcLogger = func(keyvals ...interface{}) error {
var logged [][]any
var logger funcLogger = func(keyvals ...any) error {
logged = append(logged, keyvals)
return nil
}
@ -42,7 +42,7 @@ func TestSpanLogger_CustomLogger(t *testing.T) {
span = FromContextWithFallback(context.Background(), logger)
_ = span.Log("msg", "fallback spanlogger")
expect := [][]interface{}{
expect := [][]any{
{"method", "test", "msg", "original spanlogger"},
{"msg", "restored spanlogger"},
{"msg", "fallback spanlogger"},
@ -71,8 +71,8 @@ func createSpan(ctx context.Context) *mocktracer.MockSpan {
return logger.Span.(*mocktracer.MockSpan)
}
type funcLogger func(keyvals ...interface{}) error
type funcLogger func(keyvals ...any) error
func (f funcLogger) Log(keyvals ...interface{}) error {
func (f funcLogger) Log(keyvals ...any) error {
return f(keyvals...)
}

View File

@ -32,7 +32,7 @@ func TestTimeFromMillis(t *testing.T) {
func TestDurationWithJitter(t *testing.T) {
const numRuns = 1000
for i := 0; i < numRuns; i++ {
for range numRuns {
actual := DurationWithJitter(time.Minute, 0.5)
assert.GreaterOrEqual(t, int64(actual), int64(30*time.Second))
assert.LessOrEqual(t, int64(actual), int64(90*time.Second))
@ -46,7 +46,7 @@ func TestDurationWithJitter_ZeroInputDuration(t *testing.T) {
func TestDurationWithPositiveJitter(t *testing.T) {
const numRuns = 1000
for i := 0; i < numRuns; i++ {
for range numRuns {
actual := DurationWithPositiveJitter(time.Minute, 0.5)
assert.GreaterOrEqual(t, int64(actual), int64(60*time.Second))
assert.LessOrEqual(t, int64(actual), int64(90*time.Second))

View File

@ -8,6 +8,7 @@ import (
"encoding/json"
"errors"
"flag"
"maps"
"math"
"strings"
"time"
@ -208,7 +209,7 @@ func (l *Limits) Validate(shardByAllLabels bool) error {
}
// UnmarshalYAML implements the yaml.Unmarshaler interface.
func (l *Limits) UnmarshalYAML(unmarshal func(interface{}) error) error {
func (l *Limits) UnmarshalYAML(unmarshal func(any) error) error {
// We want to set l to the defaults and then overwrite it with the input.
// To make unmarshal fill the plain data struct rather than calling UnmarshalYAML
// again, we have to hide it using a type indirection. See prometheus/config.
@ -243,9 +244,7 @@ func (l *Limits) UnmarshalJSON(data []byte) error {
func (l *Limits) copyNotificationIntegrationLimits(defaults NotificationRateLimitMap) {
l.NotificationRateLimitPerIntegration = make(map[string]float64, len(defaults))
for k, v := range defaults {
l.NotificationRateLimitPerIntegration[k] = v
}
maps.Copy(l.NotificationRateLimitPerIntegration, defaults)
}
// When we load YAML from disk, we want the various per-customer limits

View File

@ -67,7 +67,6 @@ func TestLimits_Validate(t *testing.T) {
}
for testName, testData := range tests {
testData := testData
t.Run(testName, func(t *testing.T) {
assert.Equal(t, testData.expected, testData.limits.Validate(testData.shardByAllLabels))
@ -185,7 +184,7 @@ func TestLimitsTagsYamlMatchJson(t *testing.T) {
n := limits.NumField()
var mismatch []string
for i := 0; i < n; i++ {
for i := range n {
field := limits.Field(i)
// Note that we aren't requiring YAML and JSON tags to match, just that
@ -225,7 +224,7 @@ func TestLimitsAlwaysUsesPromDuration(t *testing.T) {
n := limits.NumField()
var badDurationType []string
for i := 0; i < n; i++ {
for i := range n {
field := limits.Field(i)
if field.Type == stdlibDuration {
badDurationType = append(badDurationType, field.Name)

View File

@ -33,7 +33,7 @@ func (m NotificationRateLimitMap) Set(s string) error {
}
// UnmarshalYAML implements yaml.Unmarshaler.
func (m NotificationRateLimitMap) UnmarshalYAML(unmarshal func(interface{}) error) error {
func (m NotificationRateLimitMap) UnmarshalYAML(unmarshal func(any) error) error {
newMap := map[string]float64{}
return m.updateMap(unmarshal(newMap), newMap)
}
@ -53,6 +53,6 @@ func (m NotificationRateLimitMap) updateMap(unmarshalErr error, newMap map[strin
}
// MarshalYAML implements yaml.Marshaler.
func (m NotificationRateLimitMap) MarshalYAML() (interface{}, error) {
func (m NotificationRateLimitMap) MarshalYAML() (any, error) {
return map[string]float64(m), nil
}

View File

@ -28,7 +28,7 @@ func TestQueue_Pop_all_Pushed(t *testing.T) {
pushes := 3
q := NewQueue(nil, nil, qcapacity, batchsize, labels.EmptyLabels(), nil, nil)
for i := 0; i < pushes; i++ {
for range pushes {
q.Push([]*notifier.Alert{
{},
{},

View File

@ -44,7 +44,7 @@ var supportedAPIVersions = []APIVersion{
}
// UnmarshalYAML implements the yaml.Unmarshaler interface.
func (v *APIVersion) UnmarshalYAML(unmarshal func(interface{}) error) error {
func (v *APIVersion) UnmarshalYAML(unmarshal func(any) error) error {
var s string
if err := unmarshal(&s); err != nil {
return errors.Wrap(err, "invalid Alertmanager API version")
@ -72,7 +72,7 @@ func DefaultAlertmanagerConfig() AlertmanagerConfig {
}
// UnmarshalYAML implements the yaml.Unmarshaler interface.
func (c *AlertmanagerConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
func (c *AlertmanagerConfig) UnmarshalYAML(unmarshal func(any) error) error {
*c = DefaultAlertmanagerConfig()
type plain AlertmanagerConfig
return unmarshal((*plain)(c))

View File

@ -118,11 +118,11 @@ type RuntimeInfo struct {
type RuntimeInfoFn func() RuntimeInfo
type response struct {
Status status `json:"status"`
Data interface{} `json:"data,omitempty"`
ErrorType ErrorType `json:"errorType,omitempty"`
Error string `json:"error,omitempty"`
Warnings []string `json:"warnings,omitempty"`
Status status `json:"status"`
Data any `json:"data,omitempty"`
ErrorType ErrorType `json:"errorType,omitempty"`
Error string `json:"error,omitempty"`
Warnings []string `json:"warnings,omitempty"`
}
// SetCORS enables cross-site script calls.
@ -132,7 +132,7 @@ func SetCORS(w http.ResponseWriter) {
}
}
type ApiFunc func(r *http.Request) (interface{}, []error, *ApiError, func())
type ApiFunc func(r *http.Request) (any, []error, *ApiError, func())
type BaseAPI struct {
logger log.Logger
@ -167,19 +167,19 @@ func (api *BaseAPI) Register(r *route.Router, tracer opentracing.Tracer, logger
r.Get("/status/buildinfo", instr("status_build", api.serveBuildInfo))
}
func (api *BaseAPI) options(r *http.Request) (interface{}, []error, *ApiError, func()) {
func (api *BaseAPI) options(r *http.Request) (any, []error, *ApiError, func()) {
return nil, nil, nil, func() {}
}
func (api *BaseAPI) flags(r *http.Request) (interface{}, []error, *ApiError, func()) {
func (api *BaseAPI) flags(r *http.Request) (any, []error, *ApiError, func()) {
return api.flagsMap, nil, nil, func() {}
}
func (api *BaseAPI) serveRuntimeInfo(r *http.Request) (interface{}, []error, *ApiError, func()) {
func (api *BaseAPI) serveRuntimeInfo(r *http.Request) (any, []error, *ApiError, func()) {
return api.runtimeInfo(), nil, nil, func() {}
}
func (api *BaseAPI) serveBuildInfo(r *http.Request) (interface{}, []error, *ApiError, func()) {
func (api *BaseAPI) serveBuildInfo(r *http.Request) (any, []error, *ApiError, func()) {
return api.buildInfo, nil, nil, func() {}
}
@ -255,7 +255,7 @@ func shouldNotCacheBecauseOfWarnings(warnings []error) bool {
return false
}
func Respond(w http.ResponseWriter, data interface{}, warnings []error, logger log.Logger) {
func Respond(w http.ResponseWriter, data any, warnings []error, logger log.Logger) {
w.Header().Set("Content-Type", "application/json")
if shouldNotCacheBecauseOfWarnings(warnings) {
w.Header().Set("Cache-Control", "no-store")
@ -283,7 +283,7 @@ func Respond(w http.ResponseWriter, data interface{}, warnings []error, logger l
}
}
func RespondError(w http.ResponseWriter, apiErr *ApiError, data interface{}, logger log.Logger) {
func RespondError(w http.ResponseWriter, apiErr *ApiError, data any, logger log.Logger) {
w.Header().Set("Content-Type", "application/json")
w.Header().Set("Cache-Control", "no-store")

View File

@ -193,12 +193,11 @@ func (nilWriter) WriteHeader(statusCode int) {}
func BenchmarkRespond(b *testing.B) {
floats := []promql.FPoint{}
for i := 0; i < 10000; i++ {
for i := range 10000 {
floats = append(floats, promql.FPoint{T: 1435781451 + int64(i), F: 1234.123 + float64(i)})
}
b.ResetTimer()
for i := 0; i < b.N; i++ {
for b.Loop() {
Respond(&nilWriter{}, promql.Matrix{
promql.Series{
Metric: promLabels.FromMap(map[string]string{"__name__": "up", "job": "prometheus"}),

View File

@ -93,7 +93,7 @@ func (bapi *BlocksAPI) Register(r *route.Router, tracer opentracing.Tracer, logg
r.Post("/blocks/mark", instr("blocks_mark", bapi.markBlock))
}
func (bapi *BlocksAPI) markBlock(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
func (bapi *BlocksAPI) markBlock(r *http.Request) (any, []error, *api.ApiError, func()) {
if bapi.disableAdminOperations {
return nil, nil, &api.ApiError{Typ: api.ErrorBadData, Err: errors.New("Admin operations are disabled")}, func() {}
}
@ -132,7 +132,7 @@ func (bapi *BlocksAPI) markBlock(r *http.Request) (interface{}, []error, *api.Ap
return nil, nil, nil, func() {}
}
func (bapi *BlocksAPI) blocks(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
func (bapi *BlocksAPI) blocks(r *http.Request) (any, []error, *api.ApiError, func()) {
viewParam := r.URL.Query().Get("view")
if viewParam == "loaded" {
bapi.loadedLock.Lock()

View File

@ -40,10 +40,10 @@ type endpointTestCase struct {
params map[string]string
query url.Values
method string
response interface{}
response any
errType baseAPI.ErrorType
}
type responeCompareFunction func(interface{}, interface{}) bool
type responeCompareFunction func(any, any) bool
func testEndpoint(t *testing.T, test endpointTestCase, name string, responseCompareFunc responeCompareFunction) bool {
return t.Run(name, func(t *testing.T) {
@ -57,9 +57,10 @@ func testEndpoint(t *testing.T, test endpointTestCase, name string, responseComp
params := test.query.Encode()
var body io.Reader
if test.method == http.MethodPost {
switch test.method {
case http.MethodPost:
body = strings.NewReader(params)
} else if test.method == "" {
case "":
test.method = "ANY"
reqURL += "?" + params
}

View File

@ -25,14 +25,14 @@ import (
)
type GRPCAPI struct {
now func() time.Time
replicaLabels []string
queryableCreate query.QueryableCreator
remoteEndpointsCreate query.RemoteEndpointsCreator
queryCreator queryCreator
defaultEngine querypb.EngineType
lookbackDeltaCreate func(int64) time.Duration
defaultMaxResolutionSeconds time.Duration
now func() time.Time
replicaLabels []string
queryableCreate query.QueryableCreator
remoteEndpointsCreate query.RemoteEndpointsCreator
queryCreator queryCreator
defaultEngine querypb.EngineType
lookbackDeltaCreate func(int64) time.Duration
defaultMaxResolution time.Duration
}
func NewGRPCAPI(
@ -43,17 +43,17 @@ func NewGRPCAPI(
queryCreator queryCreator,
defaultEngine querypb.EngineType,
lookbackDeltaCreate func(int64) time.Duration,
defaultMaxResolutionSeconds time.Duration,
defaultMaxResolution time.Duration,
) *GRPCAPI {
return &GRPCAPI{
now: now,
replicaLabels: replicaLabels,
queryableCreate: queryableCreator,
remoteEndpointsCreate: remoteEndpointsCreator,
queryCreator: queryCreator,
defaultEngine: defaultEngine,
lookbackDeltaCreate: lookbackDeltaCreate,
defaultMaxResolutionSeconds: defaultMaxResolutionSeconds,
now: now,
replicaLabels: replicaLabels,
queryableCreate: queryableCreator,
remoteEndpointsCreate: remoteEndpointsCreator,
queryCreator: queryCreator,
defaultEngine: defaultEngine,
lookbackDeltaCreate: lookbackDeltaCreate,
defaultMaxResolution: defaultMaxResolution,
}
}
@ -75,7 +75,7 @@ func (g *GRPCAPI) Query(request *querypb.QueryRequest, server querypb.Query_Quer
maxResolution := request.MaxResolutionSeconds
if request.MaxResolutionSeconds == 0 {
maxResolution = g.defaultMaxResolutionSeconds.Milliseconds() / 1000
maxResolution = g.defaultMaxResolution.Milliseconds() / 1000
}
storeMatchers, err := querypb.StoreMatchersToLabelMatchers(request.StoreMatchers)
@ -173,7 +173,7 @@ func (g *GRPCAPI) QueryRange(request *querypb.QueryRangeRequest, srv querypb.Que
maxResolution := request.MaxResolutionSeconds
if request.MaxResolutionSeconds == 0 {
maxResolution = g.defaultMaxResolutionSeconds.Milliseconds() / 1000
maxResolution = g.defaultMaxResolution.Milliseconds() / 1000
}
storeMatchers, err := querypb.StoreMatchersToLabelMatchers(request.StoreMatchers)
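The renamed field defaultMaxResolution is a time.Duration, and both Query and QueryRange turn it into whole seconds with Milliseconds() / 1000 when the request leaves MaxResolutionSeconds at zero. A minimal sketch of that fallback, assuming a hypothetical helper named resolveMaxResolution:

```go
package main

import (
	"fmt"
	"time"
)

// resolveMaxResolution is a hypothetical helper mirroring the fallback in the
// diff: a zero request value falls back to the configured default, converted
// from a time.Duration to whole seconds.
func resolveMaxResolution(requestSeconds int64, def time.Duration) int64 {
	if requestSeconds == 0 {
		return def.Milliseconds() / 1000
	}
	return requestSeconds
}

func main() {
	def := 90 * time.Second
	fmt.Println(resolveMaxResolution(0, def))   // request unset: default is used
	fmt.Println(resolveMaxResolution(300, def)) // request set: request value is used
}
```

Dropping the "Seconds" suffix from the field name fits this picture: the value is a duration, and the unit only appears at the point of conversion.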


@ -12,6 +12,7 @@ import (
"github.com/efficientgo/core/testutil"
"github.com/go-kit/log"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/promql"
"github.com/prometheus/prometheus/storage"
"github.com/prometheus/prometheus/util/annotations"
@ -31,7 +32,7 @@ import (
func TestGRPCQueryAPIWithQueryPlan(t *testing.T) {
logger := log.NewNopLogger()
reg := prometheus.NewRegistry()
proxy := store.NewProxyStore(logger, reg, func() []store.Client { return nil }, component.Store, nil, 1*time.Minute, store.LazyRetrieval)
proxy := store.NewProxyStore(logger, reg, func() []store.Client { return nil }, component.Store, labels.EmptyLabels(), 1*time.Minute, store.LazyRetrieval)
queryableCreator := query.NewQueryableCreator(logger, reg, proxy, 1, 1*time.Minute, dedup.AlgorithmPenalty)
remoteEndpointsCreator := query.NewRemoteEndpointsCreator(logger, func() []query.Client { return nil }, nil, 1*time.Minute, true, true)
lookbackDeltaFunc := func(i int64) time.Duration { return 5 * time.Minute }
@ -39,7 +40,7 @@ func TestGRPCQueryAPIWithQueryPlan(t *testing.T) {
expr, err := extpromql.ParseExpr("metric")
testutil.Ok(t, err)
lplan := logicalplan.NewFromAST(expr, &equery.Options{}, logicalplan.PlanOptions{})
lplan, err := logicalplan.NewFromAST(expr, &equery.Options{}, logicalplan.PlanOptions{})
testutil.Ok(t, err)
// Create a mock query plan.
planBytes, err := logicalplan.Marshal(lplan.Root())
@ -75,7 +76,7 @@ func TestGRPCQueryAPIWithQueryPlan(t *testing.T) {
func TestGRPCQueryAPIErrorHandling(t *testing.T) {
logger := log.NewNopLogger()
reg := prometheus.NewRegistry()
proxy := store.NewProxyStore(logger, reg, func() []store.Client { return nil }, component.Store, nil, 1*time.Minute, store.LazyRetrieval)
proxy := store.NewProxyStore(logger, reg, func() []store.Client { return nil }, component.Store, labels.EmptyLabels(), 1*time.Minute, store.LazyRetrieval)
queryableCreator := query.NewQueryableCreator(logger, reg, proxy, 1, 1*time.Minute, dedup.AlgorithmPenalty)
remoteEndpointsCreator := query.NewRemoteEndpointsCreator(logger, func() []query.Client { return nil }, nil, 1*time.Minute, true, true)
lookbackDeltaFunc := func(i int64) time.Duration { return 5 * time.Minute }


@ -43,6 +43,7 @@ import (
"github.com/prometheus/prometheus/storage"
"github.com/prometheus/prometheus/util/annotations"
"github.com/prometheus/prometheus/util/stats"
v1 "github.com/prometheus/prometheus/web/api/v1"
"github.com/thanos-io/promql-engine/engine"
"github.com/thanos-io/thanos/pkg/api"
@ -110,6 +111,7 @@ type QueryAPI struct {
replicaLabels []string
endpointStatus func() []query.EndpointStatus
tsdbSelector *store.TSDBSelector
defaultRangeQueryStep time.Duration
defaultInstantQueryMaxSourceResolution time.Duration
@ -159,6 +161,7 @@ func NewQueryAPI(
tenantCertField string,
enforceTenancy bool,
tenantLabel string,
tsdbSelector *store.TSDBSelector,
) *QueryAPI {
if statsAggregatorFactory == nil {
statsAggregatorFactory = &store.NoopSeriesStatsAggregatorFactory{}
@ -194,6 +197,7 @@ func NewQueryAPI(
tenantCertField: tenantCertField,
enforceTenancy: enforceTenancy,
tenantLabel: tenantLabel,
tsdbSelector: tsdbSelector,
queryRangeHist: promauto.With(reg).NewHistogram(prometheus.HistogramOpts{
Name: "thanos_query_range_requested_timespan_duration_seconds",
@ -444,7 +448,7 @@ func processAnalysis(a *engine.AnalyzeOutputNode) queryTelemetry {
return analysis
}
func (qapi *QueryAPI) queryExplain(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
func (qapi *QueryAPI) queryExplain(r *http.Request) (any, []error, *api.ApiError, func()) {
engineParam, apiErr := qapi.parseEngineParam(r)
if apiErr != nil {
return nil, nil, apiErr, func() {}
@ -554,7 +558,7 @@ func (qapi *QueryAPI) queryExplain(r *http.Request) (interface{}, []error, *api.
return explanation, nil, nil, func() {}
}
func (qapi *QueryAPI) query(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
func (qapi *QueryAPI) query(r *http.Request) (any, []error, *api.ApiError, func()) {
ts, err := parseTimeParam(r, "time", qapi.baseAPI.Now())
if err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorBadData, Err: err}, func() {}
@ -675,7 +679,9 @@ func (qapi *QueryAPI) query(r *http.Request) (interface{}, []error, *api.ApiErro
}
return nil, nil, &api.ApiError{Typ: api.ErrorExec, Err: res.Err}, qry.Close
}
warnings = append(warnings, res.Warnings.AsErrors()...)
// this prevents a panic when annotations are concurrently accessed
safeWarnings := annotations.New().Merge(res.Warnings)
warnings = append(warnings, safeWarnings.AsErrors()...)
var analysis queryTelemetry
if qapi.parseQueryAnalyzeParam(r) {
@ -704,7 +710,7 @@ func (qapi *QueryAPI) query(r *http.Request) (interface{}, []error, *api.ApiErro
}, warnings, nil, qry.Close
}
func (qapi *QueryAPI) queryRangeExplain(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
func (qapi *QueryAPI) queryRangeExplain(r *http.Request) (any, []error, *api.ApiError, func()) {
engineParam, apiErr := qapi.parseEngineParam(r)
if apiErr != nil {
return nil, nil, apiErr, func() {}
@ -840,7 +846,7 @@ func (qapi *QueryAPI) queryRangeExplain(r *http.Request) (interface{}, []error,
return explanation, nil, nil, func() {}
}
func (qapi *QueryAPI) queryRange(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
func (qapi *QueryAPI) queryRange(r *http.Request) (any, []error, *api.ApiError, func()) {
start, err := parseTime(r.FormValue("start"))
if err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorBadData, Err: err}, func() {}
@ -984,7 +990,9 @@ func (qapi *QueryAPI) queryRange(r *http.Request) (interface{}, []error, *api.Ap
}
return nil, nil, &api.ApiError{Typ: api.ErrorExec, Err: res.Err}, qry.Close
}
warnings = append(warnings, res.Warnings.AsErrors()...)
// this prevents a panic when annotations are concurrently accessed
safeWarnings := annotations.New().Merge(res.Warnings)
warnings = append(warnings, safeWarnings.AsErrors()...)
var analysis queryTelemetry
if qapi.parseQueryAnalyzeParam(r) {
@ -1013,7 +1021,7 @@ func (qapi *QueryAPI) queryRange(r *http.Request) (interface{}, []error, *api.Ap
}, warnings, nil, qry.Close
}
func (qapi *QueryAPI) labelValues(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
func (qapi *QueryAPI) labelValues(r *http.Request) (any, []error, *api.ApiError, func()) {
ctx := r.Context()
name := route.Param(ctx, "name")
@ -1108,7 +1116,7 @@ func (qapi *QueryAPI) labelValues(r *http.Request) (interface{}, []error, *api.A
return vals, warnings.AsErrors(), nil, func() {}
}
func (qapi *QueryAPI) series(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
func (qapi *QueryAPI) series(r *http.Request) (any, []error, *api.ApiError, func()) {
if err := r.ParseForm(); err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorInternal, Err: errors.Wrap(err, "parse form")}, func() {}
}
@ -1199,7 +1207,7 @@ func (qapi *QueryAPI) series(r *http.Request) (interface{}, []error, *api.ApiErr
return metrics, warnings.AsErrors(), nil, func() {}
}
func (qapi *QueryAPI) labelNames(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
func (qapi *QueryAPI) labelNames(r *http.Request) (any, []error, *api.ApiError, func()) {
start, end, err := parseMetadataTimeRange(r, qapi.defaultMetadataTimeRange)
if err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorBadData, Err: err}, func() {}
@ -1288,27 +1296,40 @@ func (qapi *QueryAPI) labelNames(r *http.Request) (interface{}, []error, *api.Ap
return names, warnings.AsErrors(), nil, func() {}
}
func (qapi *QueryAPI) stores(_ *http.Request) (interface{}, []error, *api.ApiError, func()) {
func (qapi *QueryAPI) stores(_ *http.Request) (any, []error, *api.ApiError, func()) {
statuses := make(map[string][]query.EndpointStatus)
for _, status := range qapi.endpointStatus() {
// Don't consider an endpoint if we cannot retrieve component type.
if status.ComponentType == nil {
continue
}
statuses[status.ComponentType.String()] = append(statuses[status.ComponentType.String()], status)
// Apply TSDBSelector filtering to LabelSets if selector is configured
filteredStatus := status
if qapi.tsdbSelector != nil && len(status.LabelSets) > 0 {
matches, filteredLabelSets := qapi.tsdbSelector.MatchLabelSets(status.LabelSets...)
if !matches {
continue
}
if filteredLabelSets != nil {
filteredStatus.LabelSets = filteredLabelSets
}
}
statuses[status.ComponentType.String()] = append(statuses[status.ComponentType.String()], filteredStatus)
}
return statuses, nil, nil, func() {}
}
// NewTargetsHandler creates a handler compatible with HTTP /api/v1/targets https://prometheus.io/docs/prometheus/latest/querying/api/#targets
// which uses gRPC Unary Targets API.
func NewTargetsHandler(client targets.UnaryClient, enablePartialResponse bool) func(*http.Request) (interface{}, []error, *api.ApiError, func()) {
func NewTargetsHandler(client targets.UnaryClient, enablePartialResponse bool) func(*http.Request) (any, []error, *api.ApiError, func()) {
ps := storepb.PartialResponseStrategy_ABORT
if enablePartialResponse {
ps = storepb.PartialResponseStrategy_WARN
}
return func(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
return func(r *http.Request) (any, []error, *api.ApiError, func()) {
stateParam := r.URL.Query().Get("state")
state, ok := targetspb.TargetsRequest_State_value[strings.ToUpper(stateParam)]
if !ok {
@ -1334,13 +1355,13 @@ func NewTargetsHandler(client targets.UnaryClient, enablePartialResponse bool) f
// NewAlertsHandler creates a handler compatible with HTTP /api/v1/alerts https://prometheus.io/docs/prometheus/latest/querying/api/#alerts
// which uses gRPC Unary Rules API (Rules API works for both /alerts and /rules).
func NewAlertsHandler(client rules.UnaryClient, enablePartialResponse bool) func(*http.Request) (interface{}, []error, *api.ApiError, func()) {
func NewAlertsHandler(client rules.UnaryClient, enablePartialResponse bool) func(*http.Request) (any, []error, *api.ApiError, func()) {
ps := storepb.PartialResponseStrategy_ABORT
if enablePartialResponse {
ps = storepb.PartialResponseStrategy_WARN
}
return func(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
return func(r *http.Request) (any, []error, *api.ApiError, func()) {
span, ctx := tracing.StartSpan(r.Context(), "receive_http_request")
defer span.Finish()
@ -1380,13 +1401,13 @@ func NewAlertsHandler(client rules.UnaryClient, enablePartialResponse bool) func
// NewRulesHandler creates a handler compatible with HTTP /api/v1/rules https://prometheus.io/docs/prometheus/latest/querying/api/#rules
// which uses gRPC Unary Rules API.
func NewRulesHandler(client rules.UnaryClient, enablePartialResponse bool) func(*http.Request) (interface{}, []error, *api.ApiError, func()) {
func NewRulesHandler(client rules.UnaryClient, enablePartialResponse bool) func(*http.Request) (any, []error, *api.ApiError, func()) {
ps := storepb.PartialResponseStrategy_ABORT
if enablePartialResponse {
ps = storepb.PartialResponseStrategy_WARN
}
return func(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
return func(r *http.Request) (any, []error, *api.ApiError, func()) {
span, ctx := tracing.StartSpan(r.Context(), "receive_http_request")
defer span.Finish()
@ -1430,13 +1451,13 @@ func NewRulesHandler(client rules.UnaryClient, enablePartialResponse bool) func(
// NewExemplarsHandler creates handler compatible with HTTP /api/v1/query_exemplars https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars
// which uses gRPC Unary Exemplars API.
func NewExemplarsHandler(client exemplars.UnaryClient, enablePartialResponse bool) func(*http.Request) (interface{}, []error, *api.ApiError, func()) {
func NewExemplarsHandler(client exemplars.UnaryClient, enablePartialResponse bool) func(*http.Request) (any, []error, *api.ApiError, func()) {
ps := storepb.PartialResponseStrategy_ABORT
if enablePartialResponse {
ps = storepb.PartialResponseStrategy_WARN
}
return func(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
return func(r *http.Request) (any, []error, *api.ApiError, func()) {
span, ctx := tracing.StartSpan(r.Context(), "exemplar_query_request")
defer span.Finish()
@ -1446,11 +1467,11 @@ func NewExemplarsHandler(client exemplars.UnaryClient, enablePartialResponse boo
err error
)
start, err := parseTimeParam(r, "start", infMinTime)
start, err := parseTimeParam(r, "start", v1.MinTime)
if err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorBadData, Err: err}, func() {}
}
end, err := parseTimeParam(r, "end", infMaxTime)
end, err := parseTimeParam(r, "end", v1.MaxTime)
if err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorBadData, Err: err}, func() {}
}
@ -1473,17 +1494,12 @@ func NewExemplarsHandler(client exemplars.UnaryClient, enablePartialResponse boo
}
}
var (
infMinTime = time.Unix(math.MinInt64/1000+62135596801, 0)
infMaxTime = time.Unix(math.MaxInt64/1000-62135596801, 999999999)
)
func parseMetadataTimeRange(r *http.Request, defaultMetadataTimeRange time.Duration) (time.Time, time.Time, error) {
// If start and end time not specified as query parameter, we get the range from the beginning of time by default.
var defaultStartTime, defaultEndTime time.Time
if defaultMetadataTimeRange == 0 {
defaultStartTime = infMinTime
defaultEndTime = infMaxTime
defaultStartTime = v1.MinTime
defaultEndTime = v1.MaxTime
} else {
now := time.Now()
defaultStartTime = now.Add(-defaultMetadataTimeRange)
@ -1575,13 +1591,13 @@ func toHintLimit(limit int) int {
// NewMetricMetadataHandler creates handler compatible with HTTP /api/v1/metadata https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata
// which uses gRPC Unary Metadata API.
func NewMetricMetadataHandler(client metadata.UnaryClient, enablePartialResponse bool) func(*http.Request) (interface{}, []error, *api.ApiError, func()) {
func NewMetricMetadataHandler(client metadata.UnaryClient, enablePartialResponse bool) func(*http.Request) (any, []error, *api.ApiError, func()) {
ps := storepb.PartialResponseStrategy_ABORT
if enablePartialResponse {
ps = storepb.PartialResponseStrategy_WARN
}
return func(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
return func(r *http.Request) (any, []error, *api.ApiError, func()) {
span, ctx := tracing.StartSpan(r.Context(), "metadata_http_request")
defer span.Finish()
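The stores handler in this file now filters each endpoint's LabelSets through qapi.tsdbSelector. The control flow can be sketched as follows; labelSet, endpointStatus and matchLabelSets are simplified, hypothetical stand-ins for labels.Labels, query.EndpointStatus and TSDBSelector.MatchLabelSets, and the nil-selector check from the diff is omitted:

```go
package main

import "fmt"

// labelSet and endpointStatus are hypothetical, simplified stand-ins for
// labels.Labels and query.EndpointStatus.
type labelSet map[string]string

type endpointStatus struct {
	Name      string
	LabelSets []labelSet
}

// matchLabelSets is a hypothetical selector: it keeps the label sets whose
// "tenant" label equals the wanted value and reports whether any matched.
func matchLabelSets(tenant string, sets ...labelSet) (bool, []labelSet) {
	var kept []labelSet
	for _, s := range sets {
		if s["tenant"] == tenant {
			kept = append(kept, s)
		}
	}
	return len(kept) > 0, kept
}

// filterStatuses mirrors the loop in stores: endpoints without label sets pass
// through untouched, endpoints with no matching label set are skipped, and
// matching endpoints expose only the label sets that survived filtering.
func filterStatuses(tenant string, statuses []endpointStatus) []endpointStatus {
	var out []endpointStatus
	for _, status := range statuses {
		filtered := status // copy, so the caller's status is not modified
		if len(status.LabelSets) > 0 {
			matches, kept := matchLabelSets(tenant, status.LabelSets...)
			if !matches {
				continue
			}
			if kept != nil {
				filtered.LabelSets = kept
			}
		}
		out = append(out, filtered)
	}
	return out
}

func main() {
	statuses := []endpointStatus{
		{Name: "store-1", LabelSets: []labelSet{{"tenant": "a"}, {"tenant": "b"}}},
		{Name: "store-2", LabelSets: []labelSet{{"tenant": "b"}}},
		{Name: "store-3"},
	}
	for _, s := range filterStatuses("a", statuses) {
		fmt.Println(s.Name, len(s.LabelSets))
	}
}
```

Working on a copy of the status, as the diff does with filteredStatus, keeps the slice returned by endpointStatus() untouched.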


@ -75,13 +75,13 @@ type endpointTestCase struct {
params map[string]string
query url.Values
method string
response interface{}
response any
errType baseAPI.ErrorType
}
type responeCompareFunction func(interface{}, interface{}) bool
type responeCompareFunction func(any, any) bool
// Checks if both responses have Stats present or not.
func lookupStats(a, b interface{}) bool {
func lookupStats(a, b any) bool {
ra := a.(*queryData)
rb := b.(*queryData)
return (ra.Stats == nil && rb.Stats == nil) || (ra.Stats != nil && rb.Stats != nil)
@ -124,9 +124,11 @@ func testEndpoint(t *testing.T, test endpointTestCase, name string, responseComp
params := test.query.Encode()
var body io.Reader
if test.method == http.MethodPost {
switch test.method {
case http.MethodPost:
body = strings.NewReader(params)
} else if test.method == "" {
case "":
test.method = "ANY"
reqURL += "?" + params
}
@ -163,38 +165,13 @@ func testEndpoint(t *testing.T, test endpointTestCase, name string, responseComp
func TestQueryEndpoints(t *testing.T) {
lbls := []labels.Labels{
{
labels.Label{Name: "__name__", Value: "test_metric1"},
labels.Label{Name: "foo", Value: "bar"},
},
{
labels.Label{Name: "__name__", Value: "test_metric1"},
labels.Label{Name: "foo", Value: "boo"},
},
{
labels.Label{Name: "__name__", Value: "test_metric2"},
labels.Label{Name: "foo", Value: "boo"},
},
{
labels.Label{Name: "__name__", Value: "test_metric_replica1"},
labels.Label{Name: "foo", Value: "bar"},
labels.Label{Name: "replica", Value: "a"},
},
{
labels.Label{Name: "__name__", Value: "test_metric_replica1"},
labels.Label{Name: "foo", Value: "boo"},
labels.Label{Name: "replica", Value: "a"},
},
{
labels.Label{Name: "__name__", Value: "test_metric_replica1"},
labels.Label{Name: "foo", Value: "boo"},
labels.Label{Name: "replica", Value: "b"},
},
{
labels.Label{Name: "__name__", Value: "test_metric_replica1"},
labels.Label{Name: "foo", Value: "boo"},
labels.Label{Name: "replica1", Value: "a"},
},
labels.FromStrings("__name__", "test_metric1", "foo", "bar"),
labels.FromStrings("__name__", "test_metric1", "foo", "boo"),
labels.FromStrings("__name__", "test_metric2", "foo", "boo"),
labels.FromStrings("__name__", "test_metric_replica1", "foo", "bar", "replica", "a"),
labels.FromStrings("__name__", "test_metric_replica1", "foo", "boo", "replica", "a"),
labels.FromStrings("__name__", "test_metric_replica1", "foo", "boo", "replica", "b"),
labels.FromStrings("__name__", "test_metric_replica1", "foo", "boo", "replica1", "a"),
}
db, err := e2eutil.NewTSDB()
@ -203,7 +180,7 @@ func TestQueryEndpoints(t *testing.T) {
app := db.Appender(context.Background())
for _, lbl := range lbls {
for i := int64(0); i < 10; i++ {
for i := range int64(10) {
_, err := app.Append(0, lbl, i*60000, float64(i))
testutil.Ok(t, err)
}
@ -286,76 +263,24 @@ func TestQueryEndpoints(t *testing.T) {
ResultType: parser.ValueTypeVector,
Result: promql.Vector{
{
Metric: labels.Labels{
{
Name: "__name__",
Value: "test_metric_replica1",
},
{
Name: "foo",
Value: "bar",
},
{
Name: "replica",
Value: "a",
},
},
T: 123000,
F: 2,
Metric: labels.FromStrings("__name__", "test_metric_replica1", "foo", "bar", "replica", "a"),
T: 123000,
F: 2,
},
{
Metric: labels.Labels{
{
Name: "__name__",
Value: "test_metric_replica1",
},
{
Name: "foo",
Value: "boo",
},
{
Name: "replica",
Value: "a",
},
},
T: 123000,
F: 2,
Metric: labels.FromStrings("__name__", "test_metric_replica1", "foo", "boo", "replica", "a"),
T: 123000,
F: 2,
},
{
Metric: labels.Labels{
{
Name: "__name__",
Value: "test_metric_replica1",
},
{
Name: "foo",
Value: "boo",
},
{
Name: "replica",
Value: "b",
},
},
T: 123000,
F: 2,
Metric: labels.FromStrings("__name__", "test_metric_replica1", "foo", "boo", "replica", "b"),
T: 123000,
F: 2,
},
{
Metric: labels.Labels{
{
Name: "__name__",
Value: "test_metric_replica1",
},
{
Name: "foo",
Value: "boo",
},
{
Name: "replica1",
Value: "a",
},
},
T: 123000,
F: 2,
Metric: labels.FromStrings("__name__", "test_metric_replica1", "foo", "boo", "replica1", "a"),
T: 123000,
F: 2,
},
},
},
@ -373,50 +298,19 @@ func TestQueryEndpoints(t *testing.T) {
ResultType: parser.ValueTypeVector,
Result: promql.Vector{
{
Metric: labels.Labels{
{
Name: "__name__",
Value: "test_metric_replica1",
},
{
Name: "foo",
Value: "bar",
},
},
T: 123000,
F: 2,
Metric: labels.FromStrings("__name__", "test_metric_replica1", "foo", "bar"),
T: 123000,
F: 2,
},
{
Metric: labels.Labels{
{
Name: "__name__",
Value: "test_metric_replica1",
},
{
Name: "foo",
Value: "boo",
},
},
T: 123000,
F: 2,
Metric: labels.FromStrings("__name__", "test_metric_replica1", "foo", "boo"),
T: 123000,
F: 2,
},
{
Metric: labels.Labels{
{
Name: "__name__",
Value: "test_metric_replica1",
},
{
Name: "foo",
Value: "boo",
},
{
Name: "replica1",
Value: "a",
},
},
T: 123000,
F: 2,
Metric: labels.FromStrings("__name__", "test_metric_replica1", "foo", "boo", "replica1", "a"),
T: 123000,
F: 2,
},
},
},
@ -433,32 +327,14 @@ func TestQueryEndpoints(t *testing.T) {
ResultType: parser.ValueTypeVector,
Result: promql.Vector{
{
Metric: labels.Labels{
{
Name: "__name__",
Value: "test_metric_replica1",
},
{
Name: "foo",
Value: "bar",
},
},
T: 123000,
F: 2,
Metric: labels.FromStrings("__name__", "test_metric_replica1", "foo", "bar"),
T: 123000,
F: 2,
},
{
Metric: labels.Labels{
{
Name: "__name__",
Value: "test_metric_replica1",
},
{
Name: "foo",
Value: "boo",
},
},
T: 123000,
F: 2,
Metric: labels.FromStrings("__name__", "test_metric_replica1", "foo", "boo"),
T: 123000,
F: 2,
},
},
},
@ -504,7 +380,6 @@ func TestQueryEndpoints(t *testing.T) {
}
return res
}(500, 1),
Metric: nil,
},
},
},
@ -526,7 +401,6 @@ func TestQueryEndpoints(t *testing.T) {
{F: 1, T: timestamp.FromTime(start.Add(1 * time.Second))},
{F: 2, T: timestamp.FromTime(start.Add(2 * time.Second))},
},
Metric: nil,
},
},
},
@ -779,7 +653,6 @@ func TestQueryAnalyzeEndpoints(t *testing.T) {
}
return res
}(500, 1),
Metric: nil,
},
},
QueryAnalysis: queryTelemetry{},
@ -796,7 +669,7 @@ func TestQueryAnalyzeEndpoints(t *testing.T) {
func newProxyStoreWithTSDBStore(db store.TSDBReader) *store.ProxyStore {
c := &storetestutil.TestClient{
Name: "1",
StoreClient: storepb.ServerAsClient(store.NewTSDBStore(nil, db, component.Query, nil)),
StoreClient: storepb.ServerAsClient(store.NewTSDBStore(nil, db, component.Query, labels.EmptyLabels())),
MinTime: math.MinInt64, MaxTime: math.MaxInt64,
}
@ -805,7 +678,7 @@ func newProxyStoreWithTSDBStore(db store.TSDBReader) *store.ProxyStore {
nil,
func() []store.Client { return []store.Client{c} },
component.Query,
nil,
labels.EmptyLabels(),
0,
store.EagerRetrieval,
)
@ -813,41 +686,16 @@ func newProxyStoreWithTSDBStore(db store.TSDBReader) *store.ProxyStore {
func TestMetadataEndpoints(t *testing.T) {
var old = []labels.Labels{
{
labels.Label{Name: "__name__", Value: "test_metric1"},
labels.Label{Name: "foo", Value: "bar"},
},
{
labels.Label{Name: "__name__", Value: "test_metric1"},
labels.Label{Name: "foo", Value: "boo"},
},
{
labels.Label{Name: "__name__", Value: "test_metric2"},
labels.Label{Name: "foo", Value: "boo"},
},
labels.FromStrings("__name__", "test_metric1", "foo", "bar"),
labels.FromStrings("__name__", "test_metric1", "foo", "boo"),
labels.FromStrings("__name__", "test_metric2", "foo", "boo"),
}
var recent = []labels.Labels{
{
labels.Label{Name: "__name__", Value: "test_metric_replica1"},
labels.Label{Name: "foo", Value: "bar"},
labels.Label{Name: "replica", Value: "a"},
},
{
labels.Label{Name: "__name__", Value: "test_metric_replica1"},
labels.Label{Name: "foo", Value: "boo"},
labels.Label{Name: "replica", Value: "a"},
},
{
labels.Label{Name: "__name__", Value: "test_metric_replica1"},
labels.Label{Name: "foo", Value: "boo"},
labels.Label{Name: "replica", Value: "b"},
},
{
labels.Label{Name: "__name__", Value: "test_metric_replica2"},
labels.Label{Name: "foo", Value: "boo"},
labels.Label{Name: "replica1", Value: "a"},
},
labels.FromStrings("__name__", "test_metric_replica1", "foo", "bar", "replica", "a"),
labels.FromStrings("__name__", "test_metric_replica1", "foo", "boo", "replica", "a"),
labels.FromStrings("__name__", "test_metric_replica1", "foo", "boo", "replica", "b"),
labels.FromStrings("__name__", "test_metric_replica2", "foo", "boo", "replica1", "a"),
}
dir := t.TempDir()
@ -858,7 +706,7 @@ func TestMetadataEndpoints(t *testing.T) {
for _, lbl := range old {
var samples []chunks.Sample
for i := int64(0); i < 10; i++ {
for i := range int64(10) {
samples = append(samples, sample{
t: i * 60_000,
f: float64(i),
@ -883,7 +731,7 @@ func TestMetadataEndpoints(t *testing.T) {
app = db.Appender(context.Background())
)
for _, lbl := range recent {
for i := int64(0); i < 10; i++ {
for i := range int64(10) {
_, err := app.Append(0, lbl, start+(i*60_000), float64(i)) // ms
testutil.Ok(t, err)
}
@ -1937,7 +1785,7 @@ func TestRulesHandler(t *testing.T) {
type test struct {
params map[string]string
query url.Values
response interface{}
response any
}
expectedAll := []testpromcompatibility.Rule{
testpromcompatibility.RecordingRule{
@ -2003,7 +1851,7 @@ func TestRulesHandler(t *testing.T) {
EvaluationTime: all[3].GetAlert().EvaluationDurationSeconds,
Duration: all[3].GetAlert().DurationSeconds,
KeepFiringFor: all[3].GetAlert().KeepFiringForSeconds,
Annotations: nil,
Annotations: labels.EmptyLabels(),
Alerts: []*testpromcompatibility.Alert{},
Type: "alerting",
},
@ -2018,7 +1866,7 @@ func TestRulesHandler(t *testing.T) {
EvaluationTime: all[4].GetAlert().EvaluationDurationSeconds,
Duration: all[4].GetAlert().DurationSeconds,
KeepFiringFor: all[4].GetAlert().KeepFiringForSeconds,
Annotations: nil,
Annotations: labels.EmptyLabels(),
Alerts: []*testpromcompatibility.Alert{},
Type: "alerting",
},
@ -2113,7 +1961,7 @@ func TestRulesHandler(t *testing.T) {
func BenchmarkQueryResultEncoding(b *testing.B) {
var mat promql.Matrix
for i := 0; i < 1000; i++ {
for i := range 1000 {
lset := labels.FromStrings(
"__name__", "my_test_metric_name",
"instance", fmt.Sprintf("abcdefghijklmnopqrstuvxyz-%d", i),

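Throughout this test file, labels.Labels literals are replaced by labels.FromStrings and nil label values by labels.EmptyLabels(). The diff does not state the motivation; one plausible reading (an assumption) is that constructors keep the tests independent of how labels.Labels is represented internally. A minimal sketch of what such a constructor does, using a hypothetical slice-based label type and a hypothetical fromStrings function rather than the Prometheus package:

```go
package main

import (
	"fmt"
	"sort"
)

// label is a hypothetical stand-in for a single name/value pair.
type label struct{ Name, Value string }

// fromStrings is a hypothetical, simplified analogue of a FromStrings-style
// constructor: it takes alternating names and values, rejects an odd number
// of arguments, and returns the labels sorted by name.
func fromStrings(pairs ...string) []label {
	if len(pairs)%2 != 0 {
		panic("fromStrings: odd number of strings")
	}
	out := make([]label, 0, len(pairs)/2)
	for i := 0; i < len(pairs); i += 2 {
		out = append(out, label{Name: pairs[i], Value: pairs[i+1]})
	}
	sort.Slice(out, func(a, b int) bool { return out[a].Name < out[b].Name })
	return out
}

func main() {
	for _, l := range fromStrings("foo", "bar", "__name__", "test_metric1") {
		fmt.Printf("%s=%q\n", l.Name, l.Value)
	}
}
```

Compared with the struct literals removed above, the one-line form also shrinks each expected result from more than a dozen lines to three.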

@ -57,7 +57,7 @@ func (rapi *RuleAPI) Register(r *route.Router, tracer opentracing.Tracer, logger
instr := api.GetInstr(tracer, logger, ins, logMiddleware, rapi.disableCORS)
r.Get("/alerts", instr("alerts", func(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
r.Get("/alerts", instr("alerts", func(r *http.Request) (any, []error, *api.ApiError, func()) {
return struct{ Alerts []*rulespb.AlertInstance }{Alerts: rapi.alerts.Active()}, nil, nil, func() {}
}))
r.Get("/rules", instr("rules", qapi.NewRulesHandler(rapi.ruleGroups, false)))


@ -68,7 +68,7 @@ func (sapi *StatusAPI) Register(r *route.Router, tracer opentracing.Tracer, logg
r.Get("/api/v1/status/tsdb", instr("tsdb_status", sapi.httpServeStats))
}
func (sapi *StatusAPI) httpServeStats(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
func (sapi *StatusAPI) httpServeStats(r *http.Request) (any, []error, *api.ApiError, func()) {
stats, sterr := sapi.getTSDBStats(r, labels.MetricName)
if sterr != nil {
return nil, nil, sterr, func() {}


@ -141,39 +141,29 @@ func upload(ctx context.Context, logger log.Logger, bkt objstore.Bucket, bdir st
return errors.Wrap(err, "gather meta file stats")
}
if err := objstore.UploadDir(ctx, logger, bkt, filepath.Join(bdir, ChunksDirname), path.Join(id.String(), ChunksDirname), options...); err != nil {
return errors.Wrap(err, "upload chunks")
}
if err := objstore.UploadFile(ctx, logger, bkt, filepath.Join(bdir, IndexFilename), path.Join(id.String(), IndexFilename)); err != nil {
return errors.Wrap(err, "upload index")
}
meta.Thanos.UploadTime = time.Now().UTC()
if err := meta.Write(&metaEncoded); err != nil {
return errors.Wrap(err, "encode meta file")
}
if err := objstore.UploadDir(ctx, logger, bkt, filepath.Join(bdir, ChunksDirname), path.Join(id.String(), ChunksDirname), options...); err != nil {
return cleanUp(logger, bkt, id, errors.Wrap(err, "upload chunks"))
}
if err := objstore.UploadFile(ctx, logger, bkt, filepath.Join(bdir, IndexFilename), path.Join(id.String(), IndexFilename)); err != nil {
return cleanUp(logger, bkt, id, errors.Wrap(err, "upload index"))
}
// meta.json always needs to be uploaded last. This allows block directories without a meta file to be treated as pending uploads.
if err := bkt.Upload(ctx, path.Join(id.String(), MetaFilename), strings.NewReader(metaEncoded.String())); err != nil {
// Don't call cleanUp here. Despite getting error, meta.json may have been uploaded in certain cases,
// and even though cleanUp will not see it yet, meta.json may appear in the bucket later.
// (Eg. S3 is known to behave this way when it returns 503 "SlowDown" error).
// If meta.json is not uploaded, this will produce partial blocks, but such blocks will be cleaned later.
// Syncer always checks if meta.json exists in the next iteration and will retry if it does not.
// This is to avoid partial uploads.
return errors.Wrap(err, "upload meta file")
}
return nil
}
func cleanUp(logger log.Logger, bkt objstore.Bucket, id ulid.ULID, err error) error {
// Cleanup the dir with an uncancelable context.
cleanErr := Delete(context.Background(), logger, bkt, id)
if cleanErr != nil {
return errors.Wrapf(err, "failed to clean block after upload issue. Partial block in system. Err: %s", err.Error())
}
return err
}
// MarkForDeletion creates a file which stores information about when the block was marked for deletion.
func MarkForDeletion(ctx context.Context, logger log.Logger, bkt objstore.Bucket, id ulid.ULID, details string, markedForDeletion prometheus.Counter) error {
deletionMarkFile := path.Join(id.String(), metadata.DeletionMarkFilename)
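After this change, upload writes chunks, then the index, then meta.json, and no longer deletes what it already wrote when a step fails; the updated test comment says partial blocks are left for the Compactor. A minimal sketch of that ordering, assuming a hypothetical in-memory bucket (memBucket) and a hypothetical uploadBlock function instead of the objstore API:

```go
package main

import (
	"errors"
	"fmt"
)

// uploader is a hypothetical, minimal view of a bucket.
type uploader interface {
	Upload(name string, data []byte) error
}

// memBucket is a hypothetical in-memory bucket that fails for one object name.
type memBucket struct {
	objects map[string][]byte
	failOn  string
}

func (b *memBucket) Upload(name string, data []byte) error {
	if name == b.failOn {
		return errors.New("upload failed")
	}
	b.objects[name] = data
	return nil
}

// uploadBlock writes chunks and index first and meta.json last, so a block
// directory without meta.json can be treated as a pending or partial upload.
// On failure it returns immediately and leaves earlier objects in place.
func uploadBlock(bkt uploader, id string, chunks, index, meta []byte) error {
	steps := []struct {
		name string
		data []byte
	}{
		{id + "/chunks/000001", chunks},
		{id + "/index", index},
		{id + "/meta.json", meta},
	}
	for _, s := range steps {
		if err := bkt.Upload(s.name, s.data); err != nil {
			return fmt.Errorf("upload %s: %w", s.name, err)
		}
	}
	return nil
}

func main() {
	bkt := &memBucket{objects: map[string][]byte{}, failOn: "b1/index"}
	err := uploadBlock(bkt, "b1", []byte("c"), []byte("i"), []byte("m"))
	fmt.Println(err != nil, len(bkt.objects))
}
```

Because meta.json is only written after everything else succeeded, readers that require meta.json never observe a half-uploaded block as complete.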


@ -145,52 +145,21 @@ func TestUpload(t *testing.T) {
testutil.Equals(t, 3, len(bkt.Objects()))
testutil.Equals(t, 3727, len(bkt.Objects()[path.Join(b1.String(), ChunksDirname, "000001")]))
testutil.Equals(t, 401, len(bkt.Objects()[path.Join(b1.String(), IndexFilename)]))
testutil.Equals(t, 595, len(bkt.Objects()[path.Join(b1.String(), MetaFilename)]))
testutil.Equals(t, true, 600 < len(bkt.Objects()[path.Join(b1.String(), MetaFilename)]))
m := &metadata.Meta{}
testutil.Ok(t, json.Unmarshal(bkt.Objects()[path.Join(b1.String(), MetaFilename)], m))
testutil.Equals(t, b1, m.ULID)
testutil.Equals(t, int64(1000), m.MaxTime)
testutil.Equals(t, int64(0), m.MinTime)
testutil.Equals(t, uint64(500), m.Stats.NumSamples)
testutil.Equals(t, uint64(500), m.Stats.NumFloatSamples)
testutil.Equals(t, uint64(5), m.Stats.NumSeries)
testutil.Equals(t, uint64(5), m.Stats.NumChunks)
testutil.Equals(t, 1, len(m.Compaction.Sources))
testutil.Equals(t, labels.FromStrings("ext1", "val1"), labels.FromMap(m.Thanos.Labels))
testutil.Equals(t, 3, len(m.Thanos.Files))
// File stats are gathered.
testutil.Equals(t, fmt.Sprintf(`{
"ulid": "%s",
"minTime": 0,
"maxTime": 1000,
"stats": {
"numSamples": 500,
"numSeries": 5,
"numChunks": 5
},
"compaction": {
"level": 1,
"sources": [
"%s"
]
},
"version": 1,
"thanos": {
"labels": {
"ext1": "val1"
},
"downsample": {
"resolution": 124
},
"source": "test",
"files": [
{
"rel_path": "chunks/000001",
"size_bytes": 3727
},
{
"rel_path": "index",
"size_bytes": 401
},
{
"rel_path": "meta.json"
}
],
"index_stats": {
"series_max_size": 16
}
}
}
`, b1.String(), b1.String()), string(bkt.Objects()[path.Join(b1.String(), MetaFilename)]))
}
{
// Test Upload is idempotent.
@ -198,7 +167,7 @@ func TestUpload(t *testing.T) {
testutil.Equals(t, 3, len(bkt.Objects()))
testutil.Equals(t, 3727, len(bkt.Objects()[path.Join(b1.String(), ChunksDirname, "000001")]))
testutil.Equals(t, 401, len(bkt.Objects()[path.Join(b1.String(), IndexFilename)]))
testutil.Equals(t, 595, len(bkt.Objects()[path.Join(b1.String(), MetaFilename)]))
testutil.Equals(t, true, 600 < len(bkt.Objects()[path.Join(b1.String(), MetaFilename)]))
}
{
// Upload with no external labels should be blocked.
@ -230,7 +199,18 @@ func TestUpload(t *testing.T) {
testutil.Equals(t, 6, len(bkt.Objects()))
testutil.Equals(t, 3727, len(bkt.Objects()[path.Join(b2.String(), ChunksDirname, "000001")]))
testutil.Equals(t, 401, len(bkt.Objects()[path.Join(b2.String(), IndexFilename)]))
testutil.Equals(t, 574, len(bkt.Objects()[path.Join(b2.String(), MetaFilename)]))
m := &metadata.Meta{}
testutil.Ok(t, json.Unmarshal(bkt.Objects()[path.Join(b2.String(), MetaFilename)], m))
testutil.Equals(t, b2, m.ULID)
testutil.Equals(t, int64(1000), m.MaxTime)
testutil.Equals(t, int64(0), m.MinTime)
testutil.Equals(t, uint64(500), m.Stats.NumSamples)
testutil.Equals(t, uint64(500), m.Stats.NumFloatSamples)
testutil.Equals(t, uint64(5), m.Stats.NumSeries)
testutil.Equals(t, uint64(5), m.Stats.NumChunks)
testutil.Equals(t, 1, len(m.Compaction.Sources))
testutil.Equals(t, 3, len(m.Thanos.Files))
}
}
@ -560,8 +540,9 @@ func TestUploadCleanup(t *testing.T) {
uploadErr := Upload(ctx, log.NewNopLogger(), errBkt, path.Join(tmpDir, b1.String()), metadata.NoneFunc)
testutil.Assert(t, errors.Is(uploadErr, errUploadFailed))
// If upload of index fails, block is deleted.
testutil.Equals(t, 0, len(bkt.Objects()))
// If upload of index fails, the objects remain because the deletion of partial blocks
// is taken care of by the Compactor.
testutil.Equals(t, 2, len(bkt.Objects()))
testutil.Assert(t, len(bkt.Objects()[path.Join(DebugMetas, fmt.Sprintf("%s.json", b1.String()))]) == 0)
}
@ -588,8 +569,8 @@ type errBucket struct {
failSuffix string
}
func (eb errBucket) Upload(ctx context.Context, name string, r io.Reader) error {
err := eb.Bucket.Upload(ctx, name, r)
func (eb errBucket) Upload(ctx context.Context, name string, r io.Reader, opts ...objstore.ObjectUploadOption) error {
err := eb.Bucket.Upload(ctx, name, r, opts...)
if err != nil {
return err
}


@ -7,6 +7,7 @@ import (
"context"
"encoding/json"
"io"
"maps"
"os"
"path"
"path/filepath"
@ -42,7 +43,8 @@ const FetcherConcurrency = 32
// to allow depending projects (eg. Cortex) to implement their own custom metadata fetcher while tracking
// compatible metrics.
type BaseFetcherMetrics struct {
Syncs prometheus.Counter
Syncs prometheus.Counter
CacheBusts prometheus.Counter
}
// FetcherMetrics holds metrics tracked by the metadata fetcher. This struct and its fields are exported
@ -92,6 +94,9 @@ const (
// MarkedForNoDownsampleMeta is label for blocks which are loaded but also marked for no downsample. This label is also counted in `loaded` label metric.
MarkedForNoDownsampleMeta = "marked-for-no-downsample"
// ParquetMigratedMeta is label for blocks which are marked as migrated to parquet format.
ParquetMigratedMeta = "parquet-migrated"
// Modified label values.
replicaRemovedMeta = "replica-label-removed"
)
@ -104,6 +109,11 @@ func NewBaseFetcherMetrics(reg prometheus.Registerer) *BaseFetcherMetrics {
Name: "base_syncs_total",
Help: "Total blocks metadata synchronization attempts by base Fetcher",
})
m.CacheBusts = promauto.With(reg).NewCounter(prometheus.CounterOpts{
Subsystem: FetcherSubSys,
Name: "base_cache_busts_total",
Help: "Total blocks metadata cache busts by base Fetcher",
})
return &m
}
@ -162,6 +172,7 @@ func DefaultSyncedStateLabelValues() [][]string {
{duplicateMeta},
{MarkedForDeletionMeta},
{MarkedForNoCompactionMeta},
{ParquetMigratedMeta},
}
}
@ -175,7 +186,7 @@ func DefaultModifiedLabelValues() [][]string {
type Lister interface {
// GetActiveAndPartialBlockIDs streams the IDs of active blocks via the channel and returns partial blocks in the response.
// Active blocks are blocks which contain meta.json, while partial blocks are blocks without meta.json
GetActiveAndPartialBlockIDs(ctx context.Context, ch chan<- ulid.ULID) (partialBlocks map[ulid.ULID]bool, err error)
GetActiveAndPartialBlockIDs(ctx context.Context, activeBlocks chan<- ActiveBlockFetchData) (partialBlocks map[ulid.ULID]bool, err error)
}
// RecursiveLister lists block IDs by recursively iterating through a bucket.
@ -191,9 +202,17 @@ func NewRecursiveLister(logger log.Logger, bkt objstore.InstrumentedBucketReader
}
}
func (f *RecursiveLister) GetActiveAndPartialBlockIDs(ctx context.Context, ch chan<- ulid.ULID) (partialBlocks map[ulid.ULID]bool, err error) {
type ActiveBlockFetchData struct {
lastModified time.Time
ulid.ULID
}
func (f *RecursiveLister) GetActiveAndPartialBlockIDs(ctx context.Context, activeBlocks chan<- ActiveBlockFetchData) (partialBlocks map[ulid.ULID]bool, err error) {
partialBlocks = make(map[ulid.ULID]bool)
err = f.bkt.Iter(ctx, "", func(name string) error {
err = f.bkt.IterWithAttributes(ctx, "", func(attrs objstore.IterObjectAttributes) error {
name := attrs.Name
parts := strings.Split(name, "/")
dir, file := parts[0], parts[len(parts)-1]
id, ok := IsBlockDir(dir)
@ -206,15 +225,20 @@ func (f *RecursiveLister) GetActiveAndPartialBlockIDs(ctx context.Context, ch ch
if !IsBlockMetaFile(file) {
return nil
}
partialBlocks[id] = false
lastModified, _ := attrs.LastModified()
delete(partialBlocks, id)
select {
case <-ctx.Done():
return ctx.Err()
case ch <- id:
case activeBlocks <- ActiveBlockFetchData{
ULID: id,
lastModified: lastModified,
}:
}
return nil
}, objstore.WithRecursiveIter())
}, objstore.WithUpdatedAt(), objstore.WithRecursiveIter())
return partialBlocks, err
}
@ -232,16 +256,17 @@ func NewConcurrentLister(logger log.Logger, bkt objstore.InstrumentedBucketReade
}
}
func (f *ConcurrentLister) GetActiveAndPartialBlockIDs(ctx context.Context, ch chan<- ulid.ULID) (partialBlocks map[ulid.ULID]bool, err error) {
func (f *ConcurrentLister) GetActiveAndPartialBlockIDs(ctx context.Context, activeBlocks chan<- ActiveBlockFetchData) (partialBlocks map[ulid.ULID]bool, err error) {
const concurrency = 64
partialBlocks = make(map[ulid.ULID]bool)
var (
metaChan = make(chan ulid.ULID, concurrency)
eg, gCtx = errgroup.WithContext(ctx)
mu sync.Mutex
)
for i := 0; i < concurrency; i++ {
for range concurrency {
eg.Go(func() error {
for uid := range metaChan {
// TODO(bwplotka): If that causes problems (obj store rate limits), add longer ttl to cached items.
@ -258,10 +283,14 @@ func (f *ConcurrentLister) GetActiveAndPartialBlockIDs(ctx context.Context, ch c
mu.Unlock()
continue
}
select {
case <-gCtx.Done():
return gCtx.Err()
case ch <- uid:
case activeBlocks <- ActiveBlockFetchData{
ULID: uid,
lastModified: time.Time{}, // Not used, cache busting is only implemented by the recursive lister because otherwise we would have to call Attributes() (one extra call).
}:
}
}
return nil
@ -314,12 +343,16 @@ type BaseFetcher struct {
blockIDsLister Lister
// Optional local directory to cache meta.json files.
cacheDir string
syncs prometheus.Counter
g singleflight.Group
cacheDir string
syncs prometheus.Counter
cacheBusts prometheus.Counter
g singleflight.Group
mtx sync.Mutex
cached map[ulid.ULID]*metadata.Meta
mtx sync.Mutex
cached *sync.Map
modifiedTimestamps map[ulid.ULID]time.Time
}
// NewBaseFetcher constructs BaseFetcher.
@ -347,8 +380,9 @@ func NewBaseFetcherWithMetrics(logger log.Logger, concurrency int, bkt objstore.
bkt: bkt,
blockIDsLister: blockIDsLister,
cacheDir: cacheDir,
cached: map[ulid.ULID]*metadata.Meta{},
cached: &sync.Map{},
syncs: metrics.Syncs,
cacheBusts: metrics.CacheBusts,
}, nil
}
@ -377,12 +411,12 @@ func NewMetaFetcherWithMetrics(logger log.Logger, concurrency int, bkt objstore.
}
// NewMetaFetcher transforms BaseFetcher into actually usable *MetaFetcher.
func (f *BaseFetcher) NewMetaFetcher(reg prometheus.Registerer, filters []MetadataFilter, logTags ...interface{}) *MetaFetcher {
func (f *BaseFetcher) NewMetaFetcher(reg prometheus.Registerer, filters []MetadataFilter, logTags ...any) *MetaFetcher {
return f.NewMetaFetcherWithMetrics(NewFetcherMetrics(reg, nil, nil), filters, logTags...)
}
// NewMetaFetcherWithMetrics transforms BaseFetcher into actually usable *MetaFetcher.
func (f *BaseFetcher) NewMetaFetcherWithMetrics(fetcherMetrics *FetcherMetrics, filters []MetadataFilter, logTags ...interface{}) *MetaFetcher {
func (f *BaseFetcher) NewMetaFetcherWithMetrics(fetcherMetrics *FetcherMetrics, filters []MetadataFilter, logTags ...any) *MetaFetcher {
return &MetaFetcher{metrics: fetcherMetrics, wrapped: f, filters: filters, logger: log.With(f.logger, logTags...)}
}
@ -391,6 +425,22 @@ var (
ErrorSyncMetaCorrupted = errors.New("meta.json corrupted")
)
func (f *BaseFetcher) metaUpdated(id ulid.ULID, modified time.Time) bool {
if f.modifiedTimestamps[id].IsZero() {
return false
}
return !f.modifiedTimestamps[id].Equal(modified)
}
func (f *BaseFetcher) bustCacheForID(id ulid.ULID) {
f.cacheBusts.Inc()
f.cached.Delete(id)
if err := os.RemoveAll(filepath.Join(f.cacheDir, id.String())); err != nil {
level.Warn(f.logger).Log("msg", "failed to remove cached meta.json dir", "dir", filepath.Join(f.cacheDir, id.String()), "err", err)
}
}
// loadMeta returns metadata from object storage or error.
// It returns `ErrorSyncMetaNotFound` and `ErrorSyncMetaCorrupted` sentinel errors in those cases.
func (f *BaseFetcher) loadMeta(ctx context.Context, id ulid.ULID) (*metadata.Meta, error) {
@ -399,8 +449,8 @@ func (f *BaseFetcher) loadMeta(ctx context.Context, id ulid.ULID) (*metadata.Met
cachedBlockDir = filepath.Join(f.cacheDir, id.String())
)
if m, seen := f.cached[id]; seen {
return m, nil
if m, seen := f.cached.Load(id); seen {
return m.(*metadata.Meta), nil
}
// Best effort load from local dir.
@ -457,8 +507,9 @@ func (f *BaseFetcher) loadMeta(ctx context.Context, id ulid.ULID) (*metadata.Met
}
type response struct {
metas map[ulid.ULID]*metadata.Meta
partial map[ulid.ULID]error
metas map[ulid.ULID]*metadata.Meta
partial map[ulid.ULID]error
modifiedTimestamps map[ulid.ULID]time.Time
// If len(metaErrs) > 0, the view is incomplete: some metas failed to load.
metaErrs errutil.MultiError
@ -466,26 +517,34 @@ type response struct {
corruptedMetas float64
}
func (f *BaseFetcher) fetchMetadata(ctx context.Context) (interface{}, error) {
func (f *BaseFetcher) fetchMetadata(ctx context.Context) (any, error) {
f.syncs.Inc()
var (
resp = response{
metas: make(map[ulid.ULID]*metadata.Meta),
partial: make(map[ulid.ULID]error),
metas: make(map[ulid.ULID]*metadata.Meta),
partial: make(map[ulid.ULID]error),
modifiedTimestamps: make(map[ulid.ULID]time.Time),
}
eg errgroup.Group
ch = make(chan ulid.ULID, f.concurrency)
mtx sync.Mutex
eg errgroup.Group
activeBlocksCh = make(chan ActiveBlockFetchData, f.concurrency)
mtx sync.Mutex
)
level.Debug(f.logger).Log("msg", "fetching meta data", "concurrency", f.concurrency)
for i := 0; i < f.concurrency; i++ {
eg.Go(func() error {
for id := range ch {
for activeBlockFetchMD := range activeBlocksCh {
id := activeBlockFetchMD.ULID
if f.metaUpdated(id, activeBlockFetchMD.lastModified) {
f.bustCacheForID(id)
}
meta, err := f.loadMeta(ctx, id)
if err == nil {
mtx.Lock()
resp.metas[id] = meta
resp.modifiedTimestamps[id] = activeBlockFetchMD.lastModified
mtx.Unlock()
continue
}
@ -518,8 +577,8 @@ func (f *BaseFetcher) fetchMetadata(ctx context.Context) (interface{}, error) {
var err error
// Workers scheduled, distribute blocks.
eg.Go(func() error {
defer close(ch)
partialBlocks, err = f.blockIDsLister.GetActiveAndPartialBlockIDs(ctx, ch)
defer close(activeBlocksCh)
partialBlocks, err = f.blockIDsLister.GetActiveAndPartialBlockIDs(ctx, activeBlocksCh)
return err
})
@ -541,13 +600,18 @@ func (f *BaseFetcher) fetchMetadata(ctx context.Context) (interface{}, error) {
}
// Only for complete view of blocks update the cache.
cached := make(map[ulid.ULID]*metadata.Meta, len(resp.metas))
cached := &sync.Map{}
for id, m := range resp.metas {
cached[id] = m
cached.Store(id, m)
}
modifiedTimestamps := make(map[ulid.ULID]time.Time, len(resp.modifiedTimestamps))
maps.Copy(modifiedTimestamps, resp.modifiedTimestamps)
f.mtx.Lock()
f.cached = cached
f.modifiedTimestamps = modifiedTimestamps
f.mtx.Unlock()
// Best effort cleanup of disk-cached metas.
@ -593,7 +657,7 @@ func (f *BaseFetcher) fetch(ctx context.Context, metrics *FetcherMetrics, filter
// Run this in thread safe run group.
// TODO(bwplotka): Consider custom singleflight with ttl.
v, err := f.g.Do("", func() (i interface{}, err error) {
v, err := f.g.Do("", func() (i any, err error) {
// NOTE: Only the context of the first goroutine to enter is used.
return f.fetchMetadata(ctx)
})
@ -604,9 +668,7 @@ func (f *BaseFetcher) fetch(ctx context.Context, metrics *FetcherMetrics, filter
// Copy as same response might be reused by different goroutines.
metas := make(map[ulid.ULID]*metadata.Meta, len(resp.metas))
for id, m := range resp.metas {
metas[id] = m
}
maps.Copy(metas, resp.metas)
metrics.Synced.WithLabelValues(FailedMeta).Set(float64(len(resp.metaErrs)))
metrics.Synced.WithLabelValues(NoMeta).Set(resp.noMetas)
@ -632,8 +694,12 @@ func (f *BaseFetcher) fetch(ctx context.Context, metrics *FetcherMetrics, filter
func (f *BaseFetcher) countCached() int {
f.mtx.Lock()
defer f.mtx.Unlock()
return len(f.cached)
var i int
f.cached.Range(func(_, _ any) bool {
i++
return true
})
return i
}
type MetaFetcher struct {
@ -755,20 +821,32 @@ func NewDeduplicateFilter(concurrency int) *DefaultDeduplicateFilter {
// Filter filters out duplicate blocks that can be formed
// from two or more overlapping blocks that fully submatch the source blocks of the older blocks.
func (f *DefaultDeduplicateFilter) Filter(_ context.Context, metas map[ulid.ULID]*metadata.Meta, synced GaugeVec, modified GaugeVec) error {
f.duplicateIDs = f.duplicateIDs[:0]
var wg sync.WaitGroup
var filterWg, dupWg sync.WaitGroup
var groupChan = make(chan []*metadata.Meta)
var dupsChan = make(chan ulid.ULID)
dupWg.Go(func() {
dups := make([]ulid.ULID, 0)
for dup := range dupsChan {
if metas[dup] != nil {
dups = append(dups, dup)
}
synced.WithLabelValues(duplicateMeta).Inc()
delete(metas, dup)
}
f.mu.Lock()
f.duplicateIDs = dups
f.mu.Unlock()
})
// Start up workers to deduplicate workgroups when they're ready.
for i := 0; i < f.concurrency; i++ {
wg.Add(1)
go func() {
defer wg.Done()
filterWg.Go(func() {
for group := range groupChan {
f.filterGroup(group, metas, synced)
f.filterGroup(group, dupsChan)
}
}()
})
}
// We need only look within a compaction group for duplicates, so splitting by group key gives us parallelizable streams.
@ -781,12 +859,15 @@ func (f *DefaultDeduplicateFilter) Filter(_ context.Context, metas map[ulid.ULID
groupChan <- group
}
close(groupChan)
wg.Wait()
filterWg.Wait()
close(dupsChan)
dupWg.Wait()
return nil
}
func (f *DefaultDeduplicateFilter) filterGroup(metaSlice []*metadata.Meta, metas map[ulid.ULID]*metadata.Meta, synced GaugeVec) {
func (f *DefaultDeduplicateFilter) filterGroup(metaSlice []*metadata.Meta, dupsChan chan ulid.ULID) {
sort.Slice(metaSlice, func(i, j int) bool {
ilen := len(metaSlice[i].Compaction.Sources)
jlen := len(metaSlice[j].Compaction.Sources)
@ -817,19 +898,16 @@ childLoop:
coveringSet = append(coveringSet, child)
}
f.mu.Lock()
for _, duplicate := range duplicates {
if metas[duplicate] != nil {
f.duplicateIDs = append(f.duplicateIDs, duplicate)
}
synced.WithLabelValues(duplicateMeta).Inc()
delete(metas, duplicate)
dupsChan <- duplicate
}
f.mu.Unlock()
}
// DuplicateIDs returns slice of block ids that are filtered out by DefaultDeduplicateFilter.
func (f *DefaultDeduplicateFilter) DuplicateIDs() []ulid.ULID {
f.mu.Lock()
defer f.mu.Unlock()
return f.duplicateIDs
}
@ -872,9 +950,7 @@ func (r *ReplicaLabelRemover) Filter(_ context.Context, metas map[ulid.ULID]*met
countReplicaLabelRemoved := make(map[string]int, len(metas))
for u, meta := range metas {
l := make(map[string]string)
for n, v := range meta.Thanos.Labels {
l[n] = v
}
maps.Copy(l, meta.Thanos.Labels)
for _, replicaLabel := range r.replicaLabels {
if _, exists := l[replicaLabel]; exists {
@ -932,10 +1008,18 @@ func NewConsistencyDelayMetaFilterWithoutMetrics(logger log.Logger, consistencyD
// Filter filters out blocks that are newer than the specified consistency delay.
func (f *ConsistencyDelayMetaFilter) Filter(_ context.Context, metas map[ulid.ULID]*metadata.Meta, synced GaugeVec, modified GaugeVec) error {
for id, meta := range metas {
var metaUploadTime = meta.Thanos.UploadTime
var tooFresh bool
if !metaUploadTime.IsZero() {
tooFresh = time.Since(metaUploadTime) < f.consistencyDelay
} else {
tooFresh = ulid.Now()-id.Time() < uint64(f.consistencyDelay/time.Millisecond)
}
// TODO(khyatisoneji): Remove the checks about Thanos Source
// by implementing delete delay to fetch metas.
// TODO(bwplotka): Check consistency delay based on file upload / modification time instead of ULID.
if ulid.Now()-id.Time() < uint64(f.consistencyDelay/time.Millisecond) &&
if tooFresh &&
meta.Thanos.Source != metadata.BucketRepairSource &&
meta.Thanos.Source != metadata.CompactorSource &&
meta.Thanos.Source != metadata.CompactorRepairSource {
@ -979,9 +1063,7 @@ func (f *IgnoreDeletionMarkFilter) DeletionMarkBlocks() map[ulid.ULID]*metadata.
defer f.mtx.Unlock()
deletionMarkMap := make(map[ulid.ULID]*metadata.DeletionMark, len(f.deletionMarkMap))
for id, meta := range f.deletionMarkMap {
deletionMarkMap[id] = meta
}
maps.Copy(deletionMarkMap, f.deletionMarkMap)
return deletionMarkMap
}
@ -999,11 +1081,16 @@ func (f *IgnoreDeletionMarkFilter) Filter(ctx context.Context, metas map[ulid.UL
}
var (
eg errgroup.Group
ch = make(chan ulid.ULID, f.concurrency)
mtx sync.Mutex
eg errgroup.Group
ch = make(chan ulid.ULID, f.concurrency)
mtx sync.Mutex
preFilterMetas = make(map[ulid.ULID]struct{}, len(metas))
)
for k := range metas {
preFilterMetas[k] = struct{}{}
}
for i := 0; i < f.concurrency; i++ {
eg.Go(func() error {
var lastErr error
@ -1058,7 +1145,19 @@ func (f *IgnoreDeletionMarkFilter) Filter(ctx context.Context, metas map[ulid.UL
}
f.mtx.Lock()
f.deletionMarkMap = deletionMarkMap
if f.deletionMarkMap == nil {
f.deletionMarkMap = make(map[ulid.ULID]*metadata.DeletionMark)
}
maps.Copy(f.deletionMarkMap, deletionMarkMap)
for u := range f.deletionMarkMap {
if _, exists := preFilterMetas[u]; exists {
continue
}
delete(f.deletionMarkMap, u)
}
f.mtx.Unlock()
return nil
@ -1086,3 +1185,46 @@ func ParseRelabelConfig(contentYaml []byte, supportedActions map[relabel.Action]
return relabelConfig, nil
}
var _ MetadataFilter = &ParquetMigratedMetaFilter{}
// ParquetMigratedMetaFilter is a metadata filter that filters out blocks that have been
// migrated to parquet format. The filter checks for the presence of the parquet_migrated
// extension key with a value of true.
// Not goroutine-safe.
type ParquetMigratedMetaFilter struct {
logger log.Logger
}
// NewParquetMigratedMetaFilter creates a new ParquetMigratedMetaFilter.
func NewParquetMigratedMetaFilter(logger log.Logger) *ParquetMigratedMetaFilter {
return &ParquetMigratedMetaFilter{
logger: logger,
}
}
// Filter filters out blocks that have been marked as migrated to parquet format.
func (f *ParquetMigratedMetaFilter) Filter(_ context.Context, metas map[ulid.ULID]*metadata.Meta, synced GaugeVec, modified GaugeVec) error {
for id, meta := range metas {
if meta.Thanos.Extensions == nil {
continue
}
extensionsMap, ok := meta.Thanos.Extensions.(map[string]any)
if !ok {
continue
}
parquetMigrated, exists := extensionsMap[metadata.ParquetMigratedExtensionKey]
if !exists {
continue
}
if migratedBool, ok := parquetMigrated.(bool); ok && migratedBool {
level.Debug(f.logger).Log("msg", "filtering out parquet migrated block", "block", id)
synced.WithLabelValues(ParquetMigratedMeta).Inc()
delete(metas, id)
}
}
return nil
}
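
The cache-busting check added to the base fetcher in this file (`metaUpdated` and `bustCacheForID`) can be illustrated in isolation. The following is a minimal sketch, not the Thanos implementation: `metaCache`, `load`, and `runDemo` are hypothetical names, block IDs are plain strings, and the "meta" is a string instead of `*metadata.Meta`. It only shows the rule that a zero previous timestamp never triggers a bust, while any other mismatch does.

```go
package main

import (
	"fmt"
	"time"
)

// metaCache is a hypothetical, simplified stand-in for the fetcher cache.
// It remembers the cached meta and the last-modified time seen for each block.
type metaCache struct {
	metas    map[string]string
	modified map[string]time.Time
	busts    int
}

// updated mirrors the idea of metaUpdated: an unknown (zero) previous
// timestamp never counts as an update; any other difference does.
func (c *metaCache) updated(id string, lastModified time.Time) bool {
	prev := c.modified[id]
	if prev.IsZero() {
		return false
	}
	return !prev.Equal(lastModified)
}

// load returns the cached meta, dropping the cached entry first if the
// object's modification time changed since the previous call.
func (c *metaCache) load(id string, lastModified time.Time, fetch func() string) string {
	if c.updated(id, lastModified) {
		c.busts++
		delete(c.metas, id)
	}
	m, ok := c.metas[id]
	if !ok {
		m = fetch()
		c.metas[id] = m
	}
	c.modified[id] = lastModified
	return m
}

// runDemo loads the same block three times; only the third call sees a
// different modification time and therefore refetches.
func runDemo() (first, second, third string, busts int) {
	c := &metaCache{metas: map[string]string{}, modified: map[string]time.Time{}}
	t0 := time.Unix(1000, 0)
	first = c.load("block-1", t0, func() string { return "v1" })
	second = c.load("block-1", t0, func() string { return "v2" })
	third = c.load("block-1", t0.Add(time.Minute), func() string { return "v2" })
	return first, second, third, c.busts
}

func main() {
	a, b, c, n := runDemo()
	fmt.Println(a, b, c, "busts:", n)
}
```

A lister that always reports a zero timestamp, as the concurrent lister in this diff does, never triggers a bust under this rule.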


@ -22,6 +22,7 @@ import (
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
promtest "github.com/prometheus/client_golang/prometheus/testutil"
"github.com/prometheus/prometheus/tsdb"
"github.com/thanos-io/objstore"
@ -72,30 +73,38 @@ func TestMetaFetcher_Fetch(t *testing.T) {
dir := t.TempDir()
var ulidToDelete ulid.ULID
r := prometheus.NewRegistry()
noopLogger := log.NewNopLogger()
insBkt := objstore.WithNoopInstr(bkt)
baseBlockIDsFetcher := NewConcurrentLister(noopLogger, insBkt)
baseFetcher, err := NewBaseFetcher(noopLogger, 20, insBkt, baseBlockIDsFetcher, dir, r)
r := prometheus.NewRegistry()
recursiveLister := NewRecursiveLister(noopLogger, insBkt)
recursiveBaseFetcher, err := NewBaseFetcher(noopLogger, 20, insBkt, recursiveLister, dir, r)
testutil.Ok(t, err)
fetcher := baseFetcher.NewMetaFetcher(r, []MetadataFilter{
recursiveFetcher := recursiveBaseFetcher.NewMetaFetcher(r, []MetadataFilter{
&ulidFilter{ulidToDelete: &ulidToDelete},
}, nil)
for i, tcase := range []struct {
for _, tcase := range []struct {
name string
do func()
do func(cleanCache func())
filterULID ulid.ULID
expectedMetas []ulid.ULID
expectedCorruptedMeta []ulid.ULID
expectedNoMeta []ulid.ULID
expectedFiltered int
expectedMetaErr error
expectedCacheBusts int
expectedSyncs int
// If this is set then use it.
fetcher *MetaFetcher
baseFetcher *BaseFetcher
}{
{
name: "empty bucket",
do: func() {},
do: func(_ func()) {},
expectedMetas: ULIDs(),
expectedCorruptedMeta: ULIDs(),
@ -103,7 +112,7 @@ func TestMetaFetcher_Fetch(t *testing.T) {
},
{
name: "3 metas in bucket",
do: func() {
do: func(_ func()) {
var meta metadata.Meta
meta.Version = 1
meta.ULID = ULID(1)
@ -126,28 +135,8 @@ func TestMetaFetcher_Fetch(t *testing.T) {
expectedNoMeta: ULIDs(),
},
{
name: "nothing changed",
do: func() {},
expectedMetas: ULIDs(1, 2, 3),
expectedCorruptedMeta: ULIDs(),
expectedNoMeta: ULIDs(),
},
{
name: "fresh cache",
do: func() {
baseFetcher.cached = map[ulid.ULID]*metadata.Meta{}
},
expectedMetas: ULIDs(1, 2, 3),
expectedCorruptedMeta: ULIDs(),
expectedNoMeta: ULIDs(),
},
{
name: "fresh cache: meta 2 and 3 have corrupted data on disk ",
do: func() {
baseFetcher.cached = map[ulid.ULID]*metadata.Meta{}
name: "meta 2 and 3 have corrupted data on disk ",
do: func(cleanCache func()) {
testutil.Ok(t, os.Remove(filepath.Join(dir, "meta-syncer", ULID(2).String(), MetaFilename)))
f, err := os.OpenFile(filepath.Join(dir, "meta-syncer", ULID(3).String(), MetaFilename), os.O_WRONLY, os.ModePerm)
@ -164,7 +153,7 @@ func TestMetaFetcher_Fetch(t *testing.T) {
},
{
name: "block without meta",
do: func() {
do: func(_ func()) {
testutil.Ok(t, bkt.Upload(ctx, path.Join(ULID(4).String(), "some-file"), bytes.NewBuffer([]byte("something"))))
},
@ -174,7 +163,7 @@ func TestMetaFetcher_Fetch(t *testing.T) {
},
{
name: "corrupted meta.json",
do: func() {
do: func(_ func()) {
testutil.Ok(t, bkt.Upload(ctx, path.Join(ULID(5).String(), MetaFilename), bytes.NewBuffer([]byte("{ not a json"))))
},
@ -182,46 +171,71 @@ func TestMetaFetcher_Fetch(t *testing.T) {
expectedCorruptedMeta: ULIDs(5),
expectedNoMeta: ULIDs(4),
},
{
name: "some added some deleted",
do: func() {
testutil.Ok(t, Delete(ctx, log.NewNopLogger(), bkt, ULID(2)))
{
name: "filter not existing ulid",
do: func(_ func()) {},
filterULID: ULID(10),
expectedMetas: ULIDs(1, 2, 3),
expectedCorruptedMeta: ULIDs(5),
expectedNoMeta: ULIDs(4),
},
{
name: "filter ulid 1",
do: func(_ func()) {
var meta metadata.Meta
meta.Version = 1
meta.ULID = ULID(6)
meta.ULID = ULID(1)
var buf bytes.Buffer
testutil.Ok(t, json.NewEncoder(&buf).Encode(&meta))
testutil.Ok(t, bkt.Upload(ctx, path.Join(meta.ULID.String(), metadata.MetaFilename), &buf))
},
expectedMetas: ULIDs(1, 3, 6),
expectedCorruptedMeta: ULIDs(5),
expectedNoMeta: ULIDs(4),
},
{
name: "filter not existing ulid",
do: func() {},
filterULID: ULID(10),
expectedMetas: ULIDs(1, 3, 6),
expectedCorruptedMeta: ULIDs(5),
expectedNoMeta: ULIDs(4),
},
{
name: "filter ulid 1",
do: func() {},
filterULID: ULID(1),
expectedMetas: ULIDs(3, 6),
expectedMetas: ULIDs(2, 3),
expectedCorruptedMeta: ULIDs(5),
expectedNoMeta: ULIDs(4),
expectedFiltered: 1,
},
{
name: "use recursive lister",
do: func(cleanCache func()) {
cleanCache()
},
fetcher: recursiveFetcher,
baseFetcher: recursiveBaseFetcher,
expectedMetas: ULIDs(1, 2, 3),
expectedCorruptedMeta: ULIDs(5),
expectedNoMeta: ULIDs(4),
},
{
name: "update timestamp, expect a cache bust",
do: func(_ func()) {
var meta metadata.Meta
meta.Version = 1
meta.MaxTime = 123456
meta.ULID = ULID(1)
var buf bytes.Buffer
testutil.Ok(t, json.NewEncoder(&buf).Encode(&meta))
testutil.Ok(t, bkt.Upload(ctx, path.Join(meta.ULID.String(), metadata.MetaFilename), &buf))
},
fetcher: recursiveFetcher,
baseFetcher: recursiveBaseFetcher,
expectedMetas: ULIDs(1, 2, 3),
expectedCorruptedMeta: ULIDs(5),
expectedNoMeta: ULIDs(4),
expectedFiltered: 0,
expectedCacheBusts: 1,
expectedSyncs: 2,
},
{
name: "error: not supported meta version",
do: func() {
do: func(_ func()) {
var meta metadata.Meta
meta.Version = 20
meta.ULID = ULID(7)
@ -231,14 +245,40 @@ func TestMetaFetcher_Fetch(t *testing.T) {
testutil.Ok(t, bkt.Upload(ctx, path.Join(meta.ULID.String(), metadata.MetaFilename), &buf))
},
expectedMetas: ULIDs(1, 3, 6),
expectedMetas: ULIDs(1, 2, 3),
expectedCorruptedMeta: ULIDs(5),
expectedNoMeta: ULIDs(4),
expectedMetaErr: errors.New("incomplete view: unexpected meta file: 00000000070000000000000000/meta.json version: 20"),
},
} {
if ok := t.Run(tcase.name, func(t *testing.T) {
tcase.do()
r := prometheus.NewRegistry()
var fetcher *MetaFetcher
var baseFetcher *BaseFetcher
if tcase.baseFetcher != nil {
baseFetcher = tcase.baseFetcher
} else {
lister := NewConcurrentLister(noopLogger, insBkt)
bf, err := NewBaseFetcher(noopLogger, 20, insBkt, lister, dir, r)
testutil.Ok(t, err)
baseFetcher = bf
}
if tcase.fetcher != nil {
fetcher = tcase.fetcher
} else {
fetcher = baseFetcher.NewMetaFetcher(r, []MetadataFilter{
&ulidFilter{ulidToDelete: &ulidToDelete},
}, nil)
}
tcase.do(func() {
baseFetcher.cached.Clear()
testutil.Ok(t, os.RemoveAll(filepath.Join(dir, "meta-syncer")))
})
ulidToDelete = tcase.filterULID
metas, partial, err := fetcher.Fetch(ctx)
@ -282,8 +322,10 @@ func TestMetaFetcher_Fetch(t *testing.T) {
if tcase.expectedMetaErr != nil {
expectedFailures = 1
}
testutil.Equals(t, float64(i+1), promtest.ToFloat64(baseFetcher.syncs))
testutil.Equals(t, float64(i+1), promtest.ToFloat64(fetcher.metrics.Syncs))
testutil.Equals(t, float64(max(1, tcase.expectedSyncs)), promtest.ToFloat64(baseFetcher.syncs))
testutil.Equals(t, float64(tcase.expectedCacheBusts), promtest.ToFloat64(baseFetcher.cacheBusts))
testutil.Equals(t, float64(max(1, tcase.expectedSyncs)), promtest.ToFloat64(fetcher.metrics.Syncs))
testutil.Equals(t, float64(len(tcase.expectedMetas)), promtest.ToFloat64(fetcher.metrics.Synced.WithLabelValues(LoadedMeta)))
testutil.Equals(t, float64(len(tcase.expectedNoMeta)), promtest.ToFloat64(fetcher.metrics.Synced.WithLabelValues(NoMeta)))
testutil.Equals(t, float64(tcase.expectedFiltered), promtest.ToFloat64(fetcher.metrics.Synced.WithLabelValues("filtered")))
@ -376,9 +418,9 @@ func TestLabelShardedMetaFilter_Filter_Hashmod(t *testing.T) {
source_labels: ["shard"]
regex: %d
`
for i := 0; i < 3; i++ {
for i := range 3 {
t.Run(fmt.Sprintf("%v", i), func(t *testing.T) {
relabelConfig, err := ParseRelabelConfig([]byte(fmt.Sprintf(relabelContentYamlFmt, BlockIDLabel, i)), SelectorSupportedRelabelActions)
relabelConfig, err := ParseRelabelConfig(fmt.Appendf(nil, relabelContentYamlFmt, BlockIDLabel, i), SelectorSupportedRelabelActions)
testutil.Ok(t, err)
f := NewLabelShardedMetaFilter(relabelConfig)
@ -947,10 +989,7 @@ func TestReplicaLabelRemover_Modify(t *testing.T) {
func compareSliceWithMapKeys(tb testing.TB, m map[ulid.ULID]*metadata.Meta, s []ulid.ULID) {
_, file, line, _ := runtime.Caller(1)
matching := true
if len(m) != len(s) {
matching = false
}
matching := len(m) == len(s)
for _, val := range s {
if m[val] == nil {
@ -1027,6 +1066,7 @@ func TestConsistencyDelayMetaFilter_Filter_0(t *testing.T) {
u.ULID(now.Add(-20 * time.Hour)): {Thanos: metadata.Thanos{Source: metadata.SidecarSource}},
u.ULID(now.Add(-20 * time.Hour)): {Thanos: metadata.Thanos{Source: metadata.ReceiveSource}},
u.ULID(now.Add(-20 * time.Hour)): {Thanos: metadata.Thanos{Source: metadata.RulerSource}},
u.ULID(now): {Thanos: metadata.Thanos{UploadTime: time.Now().Add(-20 * time.Hour), Source: metadata.RulerSource}},
u.ULID(now.Add(-20 * time.Hour)): {Thanos: metadata.Thanos{Source: metadata.BucketRepairSource}},
u.ULID(now.Add(-20 * time.Hour)): {Thanos: metadata.Thanos{Source: metadata.CompactorSource}},
u.ULID(now.Add(-20 * time.Hour)): {Thanos: metadata.Thanos{Source: metadata.CompactorRepairSource}},
@ -1151,7 +1191,7 @@ func BenchmarkDeduplicateFilter_Filter(b *testing.B) {
},
}
for j := 0; j < 100; j++ {
for range 100 {
cases[0][id].Compaction.Sources = append(cases[0][id].Compaction.Sources, ulid.MustNew(count, nil))
count++
}
@ -1174,7 +1214,7 @@ func BenchmarkDeduplicateFilter_Filter(b *testing.B) {
Downsample: metadata.ThanosDownsample{Resolution: res},
},
}
for j := 0; j < 100; j++ {
for range 100 {
cases[1][id].Compaction.Sources = append(cases[1][id].Compaction.Sources, ulid.MustNew(count, nil))
count++
}
@ -1212,3 +1252,227 @@ func Test_ParseRelabelConfig(t *testing.T) {
testutil.NotOk(t, err)
testutil.Equals(t, "unsupported relabel action: labelmap", err.Error())
}
func TestParquetMigratedMetaFilter_Filter(t *testing.T) {
logger := log.NewNopLogger()
filter := NewParquetMigratedMetaFilter(logger)
// Simulate what might happen when extensions are loaded from JSON
extensions := struct {
ParquetMigrated bool `json:"parquet_migrated"`
}{
ParquetMigrated: true,
}
for _, c := range []struct {
name string
metas map[ulid.ULID]*metadata.Meta
check func(t *testing.T, metas map[ulid.ULID]*metadata.Meta, err error)
}{
{
name: "block with other extensions",
metas: map[ulid.ULID]*metadata.Meta{
ulid.MustNew(2, nil): {
Thanos: metadata.Thanos{
Extensions: map[string]any{
"other_key": "other_value",
},
},
},
},
check: func(t *testing.T, metas map[ulid.ULID]*metadata.Meta, err error) {
testutil.Ok(t, err)
testutil.Equals(t, 1, len(metas))
},
},
{
name: "no extensions",
metas: map[ulid.ULID]*metadata.Meta{
ulid.MustNew(1, nil): {
Thanos: metadata.Thanos{
Extensions: nil,
},
},
},
check: func(t *testing.T, metas map[ulid.ULID]*metadata.Meta, err error) {
testutil.Equals(t, 1, len(metas))
testutil.Ok(t, err)
},
},
{
name: "block with parquet_migrated=false",
metas: map[ulid.ULID]*metadata.Meta{
ulid.MustNew(3, nil): {
Thanos: metadata.Thanos{
Extensions: map[string]any{
metadata.ParquetMigratedExtensionKey: false,
},
},
},
},
check: func(t *testing.T, metas map[ulid.ULID]*metadata.Meta, err error) {
testutil.Equals(t, 1, len(metas))
testutil.Ok(t, err)
},
},
{
name: "block with parquet_migrated=true",
metas: map[ulid.ULID]*metadata.Meta{
ulid.MustNew(4, nil): {
Thanos: metadata.Thanos{
Extensions: map[string]any{
metadata.ParquetMigratedExtensionKey: true,
},
},
},
},
check: func(t *testing.T, metas map[ulid.ULID]*metadata.Meta, err error) {
testutil.Equals(t, 0, len(metas))
testutil.Ok(t, err)
},
},
{
name: "mixed blocks with parquet_migrated",
metas: map[ulid.ULID]*metadata.Meta{
ulid.MustNew(5, nil): {
Thanos: metadata.Thanos{
Extensions: map[string]any{
metadata.ParquetMigratedExtensionKey: true,
},
},
},
ulid.MustNew(6, nil): {
Thanos: metadata.Thanos{
Extensions: map[string]any{
metadata.ParquetMigratedExtensionKey: false,
},
},
},
ulid.MustNew(7, nil): {
Thanos: metadata.Thanos{
Extensions: nil,
},
},
},
check: func(t *testing.T, metas map[ulid.ULID]*metadata.Meta, err error) {
testutil.Equals(t, 2, len(metas))
testutil.Ok(t, err)
testutil.Assert(t, metas[ulid.MustNew(6, nil)] != nil, "Expected block with parquet_migrated=false to remain")
testutil.Assert(t, metas[ulid.MustNew(7, nil)] != nil, "Expected block without extensions to remain")
},
},
{
name: "block with serialized extensions",
metas: map[ulid.ULID]*metadata.Meta{
ulid.MustNew(8, nil): {
Thanos: metadata.Thanos{
Extensions: extensions,
},
},
},
check: func(t *testing.T, metas map[ulid.ULID]*metadata.Meta, err error) {
testutil.Equals(t, 0, len(metas))
testutil.Ok(t, err)
},
},
} {
t.Run(c.name, func(t *testing.T) {
r := prometheus.NewRegistry()
synced := promauto.With(r).NewGaugeVec(
prometheus.GaugeOpts{
Name: "test_synced",
Help: "Test synced metric",
},
[]string{"state"},
)
modified := promauto.With(r).NewGaugeVec(
prometheus.GaugeOpts{
Name: "test_modified",
Help: "Test modified metric",
},
[]string{"state"},
)
ctx := context.Background()
m, err := json.Marshal(c.metas)
testutil.Ok(t, err)
var outmetas map[ulid.ULID]*metadata.Meta
testutil.Ok(t, json.Unmarshal(m, &outmetas))
err = filter.Filter(ctx, outmetas, synced, modified)
c.check(t, outmetas, err)
})
}
}
func TestDeletionMarkFilter_HoldsOntoMarks(t *testing.T) {
ctx := context.Background()
bkt := objstore.NewInMemBucket()
now := time.Now()
f := NewIgnoreDeletionMarkFilter(log.NewNopLogger(), objstore.WithNoopInstr(bkt), 48*time.Hour, 32)
shouldFetch := &metadata.DeletionMark{
ID: ULID(1),
DeletionTime: now.Add(-15 * time.Hour).Unix(),
Version: 1,
}
shouldIgnore := &metadata.DeletionMark{
ID: ULID(2),
DeletionTime: now.Add(-60 * time.Hour).Unix(),
Version: 1,
}
var buf bytes.Buffer
testutil.Ok(t, json.NewEncoder(&buf).Encode(&shouldFetch))
testutil.Ok(t, bkt.Upload(ctx, path.Join(shouldFetch.ID.String(), metadata.DeletionMarkFilename), &buf))
buf.Truncate(0)
md := &metadata.Meta{
Thanos: metadata.Thanos{
Version: 1,
},
}
testutil.Ok(t, json.NewEncoder(&buf).Encode(md))
testutil.Ok(t, bkt.Upload(ctx, path.Join(shouldFetch.ID.String(), "meta.json"), &buf))
testutil.Ok(t, json.NewEncoder(&buf).Encode(&shouldIgnore))
testutil.Ok(t, bkt.Upload(ctx, path.Join(shouldIgnore.ID.String(), metadata.DeletionMarkFilename), &buf))
testutil.Ok(t, bkt.Upload(ctx, path.Join(ULID(3).String(), metadata.DeletionMarkFilename), bytes.NewBufferString("not a valid deletion-mark.json")))
input := map[ulid.ULID]*metadata.Meta{
ULID(1): {},
ULID(2): {},
ULID(3): {},
ULID(4): {},
}
expected := map[ulid.ULID]*metadata.Meta{
ULID(1): {},
ULID(3): {},
ULID(4): {},
}
m := newTestFetcherMetrics()
testutil.Ok(t, f.Filter(ctx, input, m.Synced, nil))
testutil.Equals(t, 1.0, promtest.ToFloat64(m.Synced.WithLabelValues(MarkedForDeletionMeta)))
testutil.Equals(t, expected, input)
testutil.Equals(t, 2, len(f.DeletionMarkBlocks()))
testutil.Ok(t, bkt.Delete(ctx, path.Join(shouldFetch.ID.String(), metadata.DeletionMarkFilename)))
input = map[ulid.ULID]*metadata.Meta{
ULID(1): {},
ULID(2): {},
ULID(3): {},
ULID(4): {},
}
testutil.Ok(t, f.Filter(ctx, input, m.Synced, nil))
testutil.Equals(t, 2, len(f.DeletionMarkBlocks()))
}
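
The behaviour that `TestDeletionMarkFilter_HoldsOntoMarks` exercises can be sketched without any object storage. This is an illustrative simplification under stated assumptions: `retainMarks` and `runDemo` are hypothetical names, block IDs are strings, and a deletion mark is reduced to its Unix timestamp. The idea is that marks seen earlier are kept even if the marker file later disappears, and are dropped only once the block itself is no longer listed.

```go
package main

import (
	"fmt"
	"maps"
)

// retainMarks merges freshly fetched marks into the remembered ones, then
// drops marks for blocks that are no longer present in the listing.
func retainMarks(remembered, fetched map[string]int64, listed map[string]struct{}) map[string]int64 {
	if remembered == nil {
		remembered = make(map[string]int64)
	}
	maps.Copy(remembered, fetched)
	for id := range remembered {
		if _, ok := listed[id]; !ok {
			delete(remembered, id)
		}
	}
	return remembered
}

// runDemo runs three syncs and returns the number of remembered marks after each.
func runDemo() (n1, n2, n3 int) {
	listed := map[string]struct{}{"a": {}, "b": {}, "c": {}}

	// Sync 1: markers for "a" and "b" exist.
	marks := retainMarks(nil, map[string]int64{"a": 100, "b": 200}, listed)
	n1 = len(marks)

	// Sync 2: the marker for "a" vanished, but block "a" is still listed,
	// so its mark is held onto.
	marks = retainMarks(marks, map[string]int64{"b": 200}, listed)
	n2 = len(marks)

	// Sync 3: block "a" is gone from the listing, so its mark is dropped.
	delete(listed, "a")
	marks = retainMarks(marks, map[string]int64{"b": 200}, listed)
	n3 = len(marks)
	return n1, n2, n3
}

func main() {
	fmt.Println(runDemo())
}
```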

Some files were not shown because too many files have changed in this diff.