linkerd2

Commit Graph

Author	SHA1	Message	Date
Andrew Seigner	1ed4a93b5e	Higher velocity metrics from simulate-proxy (#635 ) simulate-proxy increments a single set of metrics on each iteration, and also randomizes http status codes, leaving counters unchanged across several collections. Modify simulate-proxy to increment all metrics on each iteration, provide a 90% success rate, ensure a pod does not call itself, and increase proxy count from 3 to 10. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-28 13:30:02 -07:00
Andrew Seigner	fe35509406	Clean up Prometheus labels scraped from proxy (#633 ) The Prometheus scrape config collects from Conduit proxies, and maps Kubernetes labels to Prometheus labels, appending "k8s_". This change keeps the resultant Prometheus labels consistent with their source Kubernetes labels. For example: "deployment" and "pod_template_hash". Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-27 15:01:08 -07:00
Brian Smith	7dc21f9588	Add the NoEndpoints message to the Destination API (#564 ) Have the controller tell the client whether the service exists, not just which endpoints are available. This way we can implement fallback logic to alternate service discovery mechanisms for ambiguous names. Signed-off-by: Brian Smith <brian@briansmith.org> Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-03-27 10:45:41 -10:00
Andrew Seigner	12c6531546	Update docker-compose environment to match prod (#609 ) The Prometheus config in the docker-compose environment had fallen behind the prod setup. This change updates the docker-compose environment in the following ways: - Prometheus config more closely matches prod, based on #583 - simulate-proxy labels match prod, based on #605 - add Grafana container Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-23 17:00:39 -07:00
Dennis Adjei-Baah	b90668a0b5	Modify simulate proxy to expose prometheus metrics (#576 ) The simulate-proxy script pushes metrics to the telemetry service. This PR modifies the script to expose metrics to a prometheus endpoint. This functionality creates a server that randomly generates response_total, request_totals, response_duration_ms and response_latency_ms. The server reads pod information from a k8s cluster and picks a random namespace to use for all exposed metrics. Tested out these changes with a locally running prometheus server. I also ran the docker-compose.yml to make sure metrics were being recorded by the prometheus docker container. fixes #498 Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-03-21 16:40:12 -07:00
Eliza Weisman	8bc497a057	Remove unused metrics (#322 ) Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams. Closes #266 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-09 17:14:17 -08:00
Eliza Weisman	458e9d2ac5	Remove per-path metrics from telemetry pipeline (#317 ) Follow-up from #315. Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API. I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes. Closes #265 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-09 14:20:28 -08:00
Eliza Weisman	915f08ac4c	Store proxy latencies in a structure that matches controller histogram (#11 ) The proxy currently stores latency values in an `OrderMap` and reports every observed latency value to the controller's telemetry API since the last report. The telemetry API then sends each individual value to Prometheus. This doesn't scale well when there are a large number of proxies making reports. I've modified the proxy to use a fixed-size histogram that matches the histogram buckets in Prometheus. Each report now includes an array indicating the histogram bounds, and each response scope contains a set of counts corresponding to each index in the bounds array, indicating the number of times a latency in that bucket was observed. The controller then reports the upper bound of each bucket to Prometheus, and can use the proxy's reported set of bucket bounds so that the observed values will be correct even if the bounds in the control plane are changed independently of those set in the proxy. I've also modified `simulate-proxy` to generate the new report structure, and added tests in the proxy's telemetry test suite validating the new behaviour.	2018-02-07 18:02:59 -08:00
Eliza Weisman	9e49054963	Classify non-gRPC status codes for HTTP telemetry (#200 ) Currently, all "success"/"failure" classifications in the telemetry API are made based on the `grpc-status` trailer. If the trailer is not present, then a request is assumed to have failed. As we start proxying non-gRPC traffic, the controller needs to also be aware of HTTP status codes, so that non-gRPC requests are not assumed to always fail. I've modified the telemetry API server to classify requests based on their HTTP status codes when the `grpc-status` trailer is not present. I've also modified the `simulate-proxy` script to generate fake HTTP/2 traffic without the `grpc-status` trailer. Closes #196 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-01-24 10:57:23 -08:00
Andrew Seigner	6f26cf21cf	Add TCP data to simulate-proxy script (#165 ) simulate-proxy sends HTTP example data. Modify this test script to also send TCP example data. Part of #132 Signed-off-by: Andrew Seigner <andrew@sig.gy>	2018-01-17 15:38:10 -08:00
Kevin Lingerfelt	1dc1c00a2a	Upgrade k8s.io/client-go to v6.0.0 (#122 ) * Sort imports Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Upgrade k8s.io/client-go to v6.0.0 Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Make k8s store initialization blocking with timeout Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-11 10:22:37 -08:00
Risha Mars	82f50b5536	Fix simulate-proxy script (#14 ) Problem: Simulate proxy would seemingly hang when used. In simulate-proxy we were using rand.Uint32() to generate Count. This is way too big (in telemetry/server.go we call latencyStat.observe() Count times, so this loop was taking forever). Solution: Use a count of 1 (as the surrounding loop will generate count requests) Validation: Script now works without hanging.	2017-12-08 11:15:05 -08:00
Oliver Gould	b104bd0676	Introducing Conduit, the ultralight service mesh We’ve built Conduit from the ground up to be the fastest, lightest, simplest, and most secure service mesh in the world. It features an incredibly fast and safe data plane written in Rust, a simple yet powerful control plane written in Go, and a design that’s focused on performance, security, and usability. Most importantly, Conduit incorporates the many lessons we’ve learned from over 18 months of production service mesh experience with Linkerd. This repository contains a few tightly-related components: - `proxy` -- an HTTP/2 proxy written in Rust; - `controller` -- a control plane written in Go with gRPC; - `web` -- a UI written in React, served by Go.	2017-12-05 00:24:55 +00:00
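
The classification rule described in commit 9e49054963 (prefer the `grpc-status` trailer, fall back to the HTTP status code when the trailer is absent) might look roughly like the sketch below. The function name `isSuccess`, its signature, and the choice to treat HTTP codes below 500 as successes are assumptions made for illustration; the commit message does not say which HTTP codes count as failures, and this is not the telemetry API's actual code.

```go
package main

import "fmt"

// isSuccess is a hypothetical helper, not the controller's real function.
// grpcStatus is nil when the response carried no grpc-status trailer.
func isSuccess(httpStatus uint32, grpcStatus *uint32) bool {
	if grpcStatus != nil {
		// gRPC defines status code 0 as OK; any other value is an error.
		return *grpcStatus == 0
	}
	// Assumption for this sketch: without a trailer, only 5xx responses
	// are counted as failures.
	return httpStatus < 500
}

func main() {
	grpcOK, grpcErr := uint32(0), uint32(2)

	fmt.Println(isSuccess(200, &grpcOK))  // gRPC response with status OK
	fmt.Println(isSuccess(200, &grpcErr)) // gRPC response with an error status
	fmt.Println(isSuccess(200, nil))      // plain HTTP response, 200
	fmt.Println(isSuccess(503, nil))      // plain HTTP response, 503
}
```

Passing the trailer as a pointer keeps "trailer absent" distinct from "trailer present with value 0", which is the distinction the commit message turns on.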

13 Commits