In addition to dashboards display service health, we need a dashboard to
display health of the Conduit service mesh itself.
This change introduces a conduit-health dashboard. It currently only
displays health metrics for the control plane components. Proxy health
will come later.
Fixes#502
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Update the inject command to set a CONDUIT_PROMETHEUS_LABELS proxy environment variable with the name of the pod spec that the proxy is injected into. This will later be used as a label value when the proxy is exposing metrics.
Fixes: #426
Signed-off-by: Alex Leong <alex@buoyant.io>
The existing telemetry pipeline relies on Prometheus scraping the
Telemetry service, which will soon be removed.
This change configures Prometheus to scrape the conduit proxies directly
for telemetry data, and the control plane components for control-plane
health information. This affects the output of both conduit install
and conduit inject.
Fixes#428, #501
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The existing `conduit dashboard` command supported opening the conduit
dashboard, or displaying the conduit dashboard URL, via a `url` boolean
flag.
Replace the `url` boolean flag with a `show` string flag, with three
modes:
`conduit dashboard --show conduit`: default, open conduit dashboard
`conduit dashboard --show grafana`: open grafana dashboard
`conduit dashboard --show url`: display dashboard URLs
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Successful `conduit check` commands now take into account `[ok]`
and `\n` tokens when constraining line length.
Fixes#554
Signed-off-by: Andy Hume <andyhume@gmail.com>
Existing Grafana configuration contained no dashboards, just a skeleton
for testing.
Introduce two Grafana dashboards:
1) Top Line: Overall health of all Conduit-enabled services
2) Deployment: Health of a specific conduit-enabled deployment
Fixes#500
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Most controller listeners should only bind on localhost
* Use default listening addresses in controller components
* Review feedback
* Revert test_helper change
* Revert use of absolute domains
Signed-off-by: Alex Leong <alex@buoyant.io>
* Temporarily stop trying to support configurable zones in the proxy.
None of the zone configuration is tested and lots of things assume the cluster
zone is `cluster.local`. Further, how exactly the proxy will actually learn the
cluster zone hasn't been decided yet.
Just hard-code the zone as "cluster.local" in the proxy until configurable zones
are fully implemented and tested to be working correctly.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Remove the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting
The way that Kubernetes configures DNS search suffixes has some negative
consequences as some names like "example.com" are ambiguous: depending on
whether there is a service "example" in the "com" namespace, "example.com"
may refer to an external service or an internal service, and this can
fluctuate over time. In recognition of that we added the
CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting, thinking this would
be part of a solution for users to opt out of the unfortunate behavior
if their applications didn't depend on the DNS search suffix feature.
It turns out similar effects can be acheived using a custom dnsConfig,
starting in Kubernetes 1.10 when dnsConfig reaches the beta stability level.
Now any CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN-based seems duplicative.
Further, attempting to support it optionally made the code complex and hard
to read.
Therefore, let's just remove it. If/when somebody actually requests this
functionality then we can add it back, if dnsConfig isn't a valid alternative
for them.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Further hard-code "cluster.local" as the zone, temporarily.
Addresses review feedback.
Signed-off-by: Brian Smith <brian@briansmith.org>
Kubernetes will do multiple DNS lookups for a name like
`proxy-api.conduit.svc.cluster.local` based on the default search settings
in /etc/resolv.conf for each container:
1. proxy-api.conduit.svc.cluster.local.conduit.svc.cluster.local. IN A
2. proxy-api.conduit.svc.cluster.local.svc.cluster.local. IN A
3. proxy-api.conduit.svc.cluster.local.cluster.local. IN A
4. proxy-api.conduit.svc.cluster.local. IN A
We do not need or want this search to be done, so avoid it by making each
name absolute by appending a period so that the first three DNS queries
are skipped for each name.
The case for `localhost` is even worse because we expect that `localhost` will
always resolve to 127.0.0.1 and/or ::1, but this is not guaranteed if the default
search is done:
1. localhost.conduit.svc.cluster.local. IN A
2. localhost.svc.cluster.local. IN A
3. localhost.cluster.local. IN A
4. localhost. IN A
Avoid these unnecessary DNS queries by making each name absolute, so that the
first three DNS queries are skipped for each name.
Signed-off-by: Brian Smith <brian@briansmith.org>
Grafana by default calls out to grafana.com to check for updates. As
user's of Conduit do not have direct control over updating Grafana
directly, this update check is not needed.
Disable Grafana's update check via grafana.ini.
This is also a workaround for #155, root cause of #519.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
When the conduit proxy is injected into the controller pod, we observe controller pod proxy stats show up as an "outbound" deployment for an unrelated upstream deployment. This may cause confusion when monitoring deployments in the service mesh.
This PR filters out this "misleading" stat in the public api whenever the dashboard requests metric information for a specific deployment.
* exclude telemetry generated by the control plane when requesting deployment metrics
fixes#370
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
In the initial review for this code (preceding the creation of the
runconduit/conduit repository), it was noted that this container is not
actually used, so this is actually dead code.
Further, this container actualy causes a minor problem, as it doesn't
implement any retry logic, thus it will sometimes often cause errors to
be logged. See
https://github.com/runconduit/conduit/issues/496#issuecomment-370105328.
Further, this is a "buoyantio/" branded container. IF we actually need
such a container then it should be a Conduit-branded container.
See https://github.com/runconduit/conduit/issues/478 for additional
context.
Signed-off-by: Brian Smith <brian@briansmith.org>
`conduit install` deploys prometheus, but lacks a general-purpose way to
visualize that data.
This change adds a Grafana container to the `conduit install` command. It
includes two sample dashboards, viz and health, in their own respective
source files.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The telemetry service in the controller pod uses a non-fully qualified URL to connect to the prometheus pod in the control plane. This PR changes the URL the telemetry's prometheus URL to be fully qualified to be consistent with other URLs in the control plane. This change was tested in minikube. The logs report no errors and looking at the prometheus dashboard shows that stats are being recorded from all conduit proxies.
fixes#414
Signed-off-by: Dennis Adjei-Baah dennis@buoyant.io
Refactor `conduit install` test into a data-driven test. Then add a test of the actual default output of `conduit install`.
This test is useful to make it clear when we change the default
settings of `conduit install`.
Signed-off-by: Brian Smith <brian@briansmith.org>
The `app` label should be reserved for end-user applications and we
shouldn't use it ourselves. We already have a Conduit-specific label
that is is prefixed with the `conduit.io/` prefix to avoid naming
collisions with users' labels, so just use that one instead.
Signed-off-by: Brian Smith <brian@briansmith.org>
In order to take advantage of the benefits the conduit proxy gives to deployments, this PR injects the conduit proxy into the control plane pod. This helps us lay the groundwork for future work such as TLS, control plane observability etc.
Fixes#311
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* Update go-run to set version equal to root-tag
* Fix inject tests for undefined version change
* Pass inject version explitictly as arg
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remove kubectl dependency, validate k8s server version via api
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remove unused MockKubectl
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remame kubectl.go to version.go
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
When the `inject` command is used on a YAML file that is invalid, it prints out an invalid YAML file with the injected proxy. This may give a false indication to the user that the inject was successful even though the inject command prints out an error message further down the terminal window. This PR fixes#303 and contains a test input and output file that indicates what should be shown.
This PR also fixes#390.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
- reduce row spacing on tables to make them more compact
- Rename TabbedMetricsTable to MetricsTable since it's not tabbed any more
- Format latencies greater than 1000ms as seconds
- Make sidebar collapsible
- poll the /pods endpoint from the sidebar in order to refresh the list of deployments in the autocomplete
- display the conduit namespace in the service mesh details table
- Use floats rather than Col for more responsive layout (fixes#224)
The init container injected by conduit inject rewrites the iptables configuration for its network namespace. This causes havoc when the network namespace isn't restricted to the pod, i.e. when hostNetwork=true.
Skip pods with hostNetwork=true to avoid this problem.
Fixes#366.
Signed-off-by: Brian Smith <brian@briansmith.org>
Refactor `conduit inject` code to make it unit-testable.
Refactor the conduit inject code to make it easier to add unit tests. This work was done by @deebo91 in #365. This is the same PR without the conduit install changes, so that it can land ahead of #365. In particular, this will be used for testing the fix for high-priority bug #366.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Signed-off-by: Brian Smith <brian@briansmith.org>
Conduit has been on Prometheus 1.8.1. Prometheus 2.x promises better
performance.
Upgrade Conduit to Prometheus 2.1.0
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This PR updates the web UI to remove the pod detail page, and to remove the links to that page from pod names in metrics tables. It also removes the `pods` option from `conduit stat`, and the `sourcePod` and `targetPod` fields from the controller API proto's `MetricMetadata` message.
I've updated the `conduit stat` tests to reflect these changes, and manually verified the web UI changes.
Closes#261
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The conduit.io/* k8s labels and annotations we're redundant in some
cases, and not flexible enough in others.
This change modifies the labels in the following ways:
`conduit.io/plane: control` => `conduit.io/controller-component: web`
`conduit.io/controller: conduit` => `conduit.io/controller-ns: conduit`
`conduit.io/plane: data` => (remove, redundant with `conduit.io/controller-ns`)
It also centralizes all k8s labels and annotations into
pkg/k8s/labels.go, and adds tests for the install command.
Part of #201
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Abstract Conduit API client from protobuf interface to add new features
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Consolidate mock api clients
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Add simple implementation of healthcheck for conduit api
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Change NextSteps to FriendlyMessageToUser
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Add grpc check for status on the client
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Add simple server-side check for Conduit API
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Fix feedback from PR
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Add framework for healthcheck in CLI
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Add self-checked for kubectl
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Clear formatting code
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Removed ununsed objects from status
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Removed ununsed parameter
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Ignore errored self checkers
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Make the check error by default
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Log error, format changes
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Add func to rsolve kubectl-like names to canonical names
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Refactor API instantiation
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Make version command testable
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Make get command testable
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Add tests for api utils
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Make stat command testable
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Make tap command testablë
Signed-off-by: Phil Calcado <phil@buoyant.io>