simulate-proxy increments a single set of metrics on each iteration, and
also randomizes http status codes, leaving counters unchanged across
several collections.
Modify simuilate-proxy to increment all metrics on each iteration,
provide a 90% success rate, ensure a pod does not call itself, and
increase proxy count from 3 to 10.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Prometheus scrape config collects from Conduit proxies, and maps
Kubernetes labels to Prometheus labels, appending "k8s_".
This change keeps the resultant Prometheus labels consistent with their
source Kubernetes labels. For example: "deployment" and
"pod_template_hash".
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Have the controller tell the client whether the service exists, not
just what are available. This way we can implement fallback logic to
alternate service discovery mechanisms for ambigious names.
Signed-off-by: Brian Smith <brian@briansmith.org>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The Prometheus config in the docker-compose environment had fallen
behind the prod setup.
This change updates the docker-compose environment in the following
ways:
- Prometheus config more closely matches prod, based on #583
- simulate-proxy labels matches prod, based on #605
- add Grafana container
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The simulate-proxy script pushes metrics to the telemetry service. This PR modifies the script to expose metrics to a prometheus endpoint. This functionality creates a server that randomly generates response_total, request_totals, response_duration_ms and response_latency_ms. The server reads pod information from a k8s cluster and picks a random namespace to use for all exposed metrics.
Tested out these changes with a locally running prometheus server. I also ran the docker-compose.yml to make sure metrics were being recorded by the prometheus docker container.
fixes#498
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams.
Closes#266
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Follow-up from #315.
Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.
I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.
Closes#265
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The proxy currently stores latency values in an `OrderMap` and reports every observed latency value to the controller's telemetry API since the last report. The telemetry API then sends each individual value to Prometheus. This doesn't scale well when there are a large number of proxies making reports.
I've modified the proxy to use a fixed-size histogram that matches the histogram buckets in Prometheus. Each report now includes an array indicating the histogram bounds, and each response scope contains a set of counts corresponding to each index in the bounds array, indicating the number of times a latency in that bucket was observed. The controller then reports the upper bound of each bucket to Prometheus, and can use the proxy's reported set of bucket bounds so that the observed values will be correct even if the bounds in the control plane are changed independently of those set in the proxy.
I've also modified `simulate-proxy` to generate the new report structure, and added tests in the proxy's telemetry test suite validating the new behaviour.
Currently, all "success"/"failure" classifications in the telemetry API are made based on the `grpc-status` trailer. If the trailer is not present, then a request is assumed to have failed. As we start proxying non-gRPC traffic, the controller needs to also be aware of HTTP status codes, so that non-gRPC requests are not assumed to always fail.
I've modified the telemetry API server to classify requests based on their HTTP status codes when the `grpc-status` trailer is not present.
I've also modified the `simulate-proxy` script to generate fake HTTP/2 traffic without the `grpc-status` trailer.
Closes#196
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
simulate-proxy sends HTTP example data.
Modify this test script to also send TCP example data.
Part of #132
Signed-off-by: Andrew Seigner <andrew@sig.gy>
* Sort imports
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Upgrade k8s.io/client-go to v6.0.0
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Make k8s store initialization blocking with timeout
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Problem:
Simulate proxy would seemingly hang when used.
In simulate-proxy we were using rand.Uint32() to generate Count. This is way too big (in telemetry/server.go we call latencyStat.observe() Count times, so this loop was taking fovever).
Solution:
Use a count of 1 (as the surrounding loop will generate count requests)
Validation:
Script now works without hanging.
We’ve built Conduit from the ground up to be the fastest, lightest,
simplest, and most secure service mesh in the world. It features an
incredibly fast and safe data plane written in Rust, a simple yet
powerful control plane written in Go, and a design that’s focused on
performance, security, and usability. Most importantly, Conduit
incorporates the many lessons we’ve learned from over 18 months of
production service mesh experience with Linkerd.
This repository contains a few tightly-related components:
- `proxy` -- an HTTP/2 proxy written in Rust;
- `controller` -- a control plane written in Go with gRPC;
- `web` -- a UI written in React, served by Go.