This PR removes the unused `request_duration_ms` and `response_duration_ms` histogram metrics from the proxy. It also removes them from the `simulate-proxy` script's output and from `docs/proxy-metrics.md`.
Closes #821
The Destination service does not provide ReplicaSet information to the
proxy.
The `pod-template-hash` label approximates selecting over all pods in a
ReplicaSet or ReplicationController. Modify the Destination service to
provide this label to the proxy.
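As a rough illustration of why the label approximates ReplicaSet membership, the sketch below groups pods by their `pod-template-hash` value. The pod dictionaries, the hash values, and the `group_by_template_hash` helper are hypothetical and are not part of the Destination service.

```python
from collections import defaultdict


def group_by_template_hash(pods):
    """Group pod names by the value of their `pod-template-hash` label.

    `pods` is a hypothetical list of dicts with "name" and "labels" keys.
    Pods that lack the label are collected under the key None.
    """
    groups = defaultdict(list)
    for pod in pods:
        groups[pod["labels"].get("pod-template-hash")].append(pod["name"])
    return dict(groups)


# Made-up pods: two share one template hash, a third has a different one.
pods = [
    {"name": "web-1", "labels": {"app": "web", "pod-template-hash": "3476088249"}},
    {"name": "web-2", "labels": {"app": "web", "pod-template-hash": "3476088249"}},
    {"name": "web-3", "labels": {"app": "web", "pod-template-hash": "1268107570"}},
]
groups = group_by_template_hash(pods)
```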
Relates to #508 and #741
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Conduit 0.4.0 overhauls Conduit's telemetry system and improves service discovery
reliability.
* Web UI
  * **New** automatically-configured Grafana dashboards for all Deployments.
* Command-line interface
  * `conduit stat` has been completely rewritten to accept arguments like `kubectl get`.
    The `--to` and `--from` filters can be used to filter traffic by destination and
    source, respectively. `conduit stat` currently can operate on `Namespace` and
    `Deployment` Kubernetes resources. More resource types will be added in the next
    release!
* Proxy (data plane)
  * **New** Prometheus-formatted metrics are now exposed on `:4191/metrics`, including
    rich destination labeling for outbound HTTP requests. The proxy no longer pushes
    metrics to the control plane.
  * The proxy now handles `SIGINT` or `SIGTERM`, gracefully draining requests until all
    are complete or `SIGQUIT` is received.
  * SMTP and MySQL (ports 25 and 3306) are now treated as opaque TCP by default. You
    should no longer have to specify `--skip-outbound-ports` to communicate with such
    services.
  * When the proxy reconnected to the controller, it could continue to send requests to
    old endpoints. Now, when the proxy reconnects to the controller, it properly removes
    invalid endpoints.
  * A bug impacting some HTTP/2 reset scenarios has been fixed.
* Service Discovery
  * Previously, the proxy failed to resolve some domain names that could be misinterpreted
    as a Kubernetes Service name. This has been fixed by extending the _Destination_ API
    with a negative acknowledgement response.
* Control Plane
  * The _Telemetry_ service and associated APIs have been removed.
* Documentation
  * Updated Roadmap
  * Added Prometheus metrics guide
The Destination service used slightly different labels than the
telemetry pipeline expected, specifically, prefixed with `k8s_*`.
Make all Prometheus labels consistent by dropping `k8s_*`. Also rename
`pod_name` to `pod` for consistency with `deployment`, etc. Also update
and reorganize `proxy-metrics.md` to reflect new labelling.
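The renaming can be pictured with the following minimal sketch. It only illustrates the label mapping described above; the `normalize_labels` helper and the sample values are hypothetical, and this is not how the Destination service implements the change.

```python
def normalize_labels(labels):
    """Return a copy of `labels` with the renaming described above applied.

    A leading `k8s_` prefix is dropped and `pod_name` becomes `pod`.
    All other label names pass through unchanged.
    """
    renamed = {}
    for name, value in labels.items():
        if name.startswith("k8s_"):
            name = name[len("k8s_"):]
        if name == "pod_name":
            name = "pod"
        renamed[name] = value
    return renamed


# Sample label set with made-up values.
before = {"k8s_deployment": "web", "k8s_pod_template_hash": "123", "pod_name": "web-1"}
after = normalize_labels(before)
```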
Fixes #655
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Previously we were using the instance label to uniquely identify a pod.
This meant that getting stats by pod name would require extra queries to
Kubernetes to map pod name to instance.
This change adds a pod_name label to metrics at collection time. This
should not affect cardinality as pod_name is invariant with respect to
instance.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This PR adds a `classification` label to proxy response metrics, as @olix0r described in https://github.com/runconduit/conduit/issues/634#issuecomment-376964083. The label is either "success" or "failure", depending on the following rules:
+ **if** the response had a gRPC status code, *then*
  - gRPC status code 0 is considered a success
  - all others are considered failures
+ **else if** the response had an HTTP status code, *then*
  - status codes < 500 are considered successes
  - status codes >= 500 are considered failures
+ **else if** the response stream failed, *then*
  - the response is a failure.
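
The rules above can be sketched as a small function. This is an illustrative model only, not the proxy's implementation; the `classify` name and its parameters are assumptions made for the example.

```python
def classify(grpc_status=None, http_status=None, stream_failed=False):
    """Return "success" or "failure" following the rules listed above.

    All parameter names are hypothetical. `None` means the response did not
    carry that kind of status code.
    """
    if grpc_status is not None:
        # gRPC status code 0 is a success; every other code is a failure.
        return "success" if grpc_status == 0 else "failure"
    if http_status is not None:
        # HTTP status codes below 500 are successes.
        return "success" if http_status < 500 else "failure"
    if stream_failed:
        return "failure"
    # The rules above do not say how to classify any other case.
    return None
```

Note that the gRPC check comes first, so a response carrying both a gRPC status and an HTTP status is classified by its gRPC status alone.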
I've also added end-to-end tests for the classification of HTTP responses (with some work towards classifying gRPC responses as well). Additionally, I've updated `doc/proxy_metrics.md` to reflect the added `classification` label.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The Prometheus scrape config collects from Conduit proxies, and maps
Kubernetes labels to Prometheus labels, prepending "k8s_".
This change keeps the resultant Prometheus labels consistent with their
source Kubernetes labels. For example: "deployment" and
"pod_template_hash".
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The inject code detects the object it is being injected into, and writes
self-identifying information into the CONDUIT_PROMETHEUS_LABELS
environment variable, so that conduit-proxy may read this information
and report it to Prometheus at collection time.
This change puts the self-identifying information directly into
Kubernetes labels, which Prometheus already collects, removing the need
for conduit-proxy to be aware of this information. The resulting label
in Prometheus is recorded in the form `k8s_deployment`.
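As a sketch of the resulting naming, the function below maps a Kubernetes label name to a prefixed Prometheus label name. The `to_prometheus_label` helper is hypothetical, the Kubernetes label name `deployment` is an assumption, and replacing invalid characters with underscores is an assumption about how the scrape pipeline sanitizes names.

```python
import re


def to_prometheus_label(k8s_label, prefix="k8s_"):
    """Map a Kubernetes label name to a Prometheus label name.

    Characters outside [a-zA-Z0-9_] are replaced with "_" (assumed
    sanitization rule), and `prefix` is prepended to the result.
    """
    return prefix + re.sub(r"[^a-zA-Z0-9_]", "_", k8s_label)


deployment_label = to_prometheus_label("deployment")
hash_label = to_prometheus_label("pod-template-hash")
```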
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Markdown files were all originally named "$x/_index.md"; I renamed
them as follows:
```
for x in `ls ~/conduit-site/conduit.io/content`; do
    cp ~/conduit-site/conduit.io/content/$x/_index.md doc/$x.md
done
mv doc/doc.md doc/overview.md
```
When we publish the files on conduit.io we need to do the inverse
transformation to avoid breaking existing links.
The images were embedded using a syntax GitHub doesn't support. Also, the
images were not originally in a subdirectory of docs/.
Use normal Markdown syntax for image embedding, and reference the docs
using relative links to the images/ subdirectory. This way they will show
up in the GitHub UI. When we publish the docs on conduit.io we'll need to
figure out how to deal with this change.
I took the liberty of renaming data-plane.png to dashboard-data-plane.png to
clarify it a bit.
There is no other roadmap so there's no need to qualify this one as
"public." Before it was made public we marked it "public" to emphasize
that it would become public, but that isn't needed now.
Signed-off-by: Brian Smith <brian@briansmith.org>