* CLI: change conduit namespace shorthand flag to -c
All of the conduit CLI subcommands accept a --conduit-namespace flag,
indicating the namespace where conduit is running. Some of the
subcommands also provide a --namespace flag, indicating the kubernetes
namespace where a user's application code is running. To prevent
confusion, I'm changing the shorthand flag for the conduit namespace to
-c, and using the -n shorthand when referring to user namespaces.
As part of this change I've also standardized the capitalization of all
of our command line flags, removed the -r shorthand for the install
--registry flag, and made the global --kubeconfig and --api-addr flags
apply to all subcommands.
* Switch flag descriptions from lowercase to Capital
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The new statsummary command accepted friendly k8s names, which worked
for k8s queries, but Prometheus requires a specific key.
Modify the statsummary query to map friendly k8s names to canonical k8s
names when constructing the query. Then during the query, map the
canonical k8s name to a specific Prometheus label.
Fixes#695
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Start implementing new conduit stat summary endpoint.
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.
Current implementation just retrieves requests and mesh/total pod count
(so latency stats are always 0).
Uses API defined in #663
Example queries the stat endpoint will eventually satisfy in #627
This branch includes commits from @klingerf
* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
The Destination service used slightly different labels than the
telemetry pipeline expected, specifically, prefixed with `k8s_*`.
Make all Prometheus labels consistent by dropping `k8s_*`. Also rename
`pod_name` to `pod` for consistency with `deployement`, etc. Also update
and reorganize `proxy-metrics.md` to reflect new labelling.
Fixes#655
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Using a vanilla Grafana Docker image as part of `conduit install`
avoided maintaining a conduit-specific Grafana Docker image, but made
packaging dashboard json files cumbersome.
Roll our own Grafana Docker image, that includes conduit-specific
dashboard json files. This significantly decreases the `conduit install`
output size, and enables dashboard integration in the docker-compose
environment.
Fixes#567
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Grafana dashboards were displaying all proxy-enabled pods, including
conduit controller pods. In the old telemetry pipeline filtering these
out required knowledge of the controller's namespace, which the
dashboards are agnostic to.
This change leverages the new `conduit_io_control_plane_component`
prometheus label to filter out proxy-enabled controller components.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Top-line and Deployment Grafana dashboards relied on the
soon-to-be-removed telemetry pipeline metrics.
Update the Grafana dashboards to query for the new, proxy-based metrics.
Grafana dashboard layouts have not changed.
Depends on #635 to render metrics.
Part of #420.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Previously we were using the instance label to uniquely identify a pod.
This meant that getting stats by pod name would require extra queries to
Kubernetes to map pod name to instance.
This change adds a pod_name label to metrics at collection time. This
should not affect cardinality as pod_name is invariant with respect to
instance.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
this is a known issue with grafana in k8s. grafana/grafana:5.0.4 was just released today. update the repo from 5.0.3 to 5.0.4
fixed issues #582
Signed-off-by: Deshi Xiao <xiaods@gmail.com>
Currently, the CLI docker image copies the entire `controller`
directory, though the CLI only requires a few of its subdirectories.
This causes the CLI's docker cache to be needlessly invalidated when,
for instance, a service implementation changes.
By restricting the copied directories to `controller/{api,public,util}`,
build caching is improved.
* Add tests/utils/scripts for running integration tests
Add a suite of integration tests in the `test/` directory, as well as
utilities for testing in the `testutil/` directory.
You can use the `bin/test-run` script to run the full suite of tests,
and the `bin/test-cleanup` script to cleanup after the tests.
The test/README.md file has more information about running tests.
@pcalcado, @franziskagoltz, and @rmars also contributed to this change.
* Create TEST.md file at the root of the repo
* Update based on review feedback
* Relax external service IP timeout for GKE
* Update TEST.md with more info about different types of test runs
* More updates to TEST.md based on review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The Prometheus scrape config collects from Conduit proxies, and maps
Kubernetes labels to Prometheus labels, appending "k8s_".
This change keeps the resultant Prometheus labels consistent with their
source Kubernetes labels. For example: "deployment" and
"pod_template_hash".
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The inject code detects the object it is being injected into, and writes
self-identifying information into the CONDUIT_PROMETHEUS_LABELS
environment variable, so that conduit-proxy may read this information
and report it to Prometheus at collection time.
This change puts the self-identifying information directly into
Kubernetes labels, which Prometheus already collects, removing the need
for conduit-proxy to be aware of this information. The resulting label
in Prometheus is recorded in the form `k8s_deployment`.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
In addition to dashboards display service health, we need a dashboard to
display health of the Conduit service mesh itself.
This change introduces a conduit-health dashboard. It currently only
displays health metrics for the control plane components. Proxy health
will come later.
Fixes#502
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Update the inject command to set a CONDUIT_PROMETHEUS_LABELS proxy environment variable with the name of the pod spec that the proxy is injected into. This will later be used as a label value when the proxy is exposing metrics.
Fixes: #426
Signed-off-by: Alex Leong <alex@buoyant.io>
The existing telemetry pipeline relies on Prometheus scraping the
Telemetry service, which will soon be removed.
This change configures Prometheus to scrape the conduit proxies directly
for telemetry data, and the control plane components for control-plane
health information. This affects the output of both conduit install
and conduit inject.
Fixes#428, #501
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The existing `conduit dashboard` command supported opening the conduit
dashboard, or displaying the conduit dashboard URL, via a `url` boolean
flag.
Replace the `url` boolean flag with a `show` string flag, with three
modes:
`conduit dashboard --show conduit`: default, open conduit dashboard
`conduit dashboard --show grafana`: open grafana dashboard
`conduit dashboard --show url`: display dashboard URLs
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Successful `conduit check` commands now take into account `[ok]`
and `\n` tokens when constraining line length.
Fixes#554
Signed-off-by: Andy Hume <andyhume@gmail.com>
Existing Grafana configuration contained no dashboards, just a skeleton
for testing.
Introduce two Grafana dashboards:
1) Top Line: Overall health of all Conduit-enabled services
2) Deployment: Health of a specific conduit-enabled deployment
Fixes#500
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This image isn't used. It references its base image using the `latest` tag, which
is wrong; it should have been using the tag that the base image was built with. It
is likely that the last few iterations of this image that we've published have
wrong and useless contents.
With that in mind, just remove the image.
Fixes#578.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Most controller listeners should only bind on localhost
* Use default listening addresses in controller components
* Review feedback
* Revert test_helper change
* Revert use of absolute domains
Signed-off-by: Alex Leong <alex@buoyant.io>
* Temporarily stop trying to support configurable zones in the proxy.
None of the zone configuration is tested and lots of things assume the cluster
zone is `cluster.local`. Further, how exactly the proxy will actually learn the
cluster zone hasn't been decided yet.
Just hard-code the zone as "cluster.local" in the proxy until configurable zones
are fully implemented and tested to be working correctly.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Remove the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting
The way that Kubernetes configures DNS search suffixes has some negative
consequences as some names like "example.com" are ambiguous: depending on
whether there is a service "example" in the "com" namespace, "example.com"
may refer to an external service or an internal service, and this can
fluctuate over time. In recognition of that we added the
CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting, thinking this would
be part of a solution for users to opt out of the unfortunate behavior
if their applications didn't depend on the DNS search suffix feature.
It turns out similar effects can be acheived using a custom dnsConfig,
starting in Kubernetes 1.10 when dnsConfig reaches the beta stability level.
Now any CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN-based seems duplicative.
Further, attempting to support it optionally made the code complex and hard
to read.
Therefore, let's just remove it. If/when somebody actually requests this
functionality then we can add it back, if dnsConfig isn't a valid alternative
for them.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Further hard-code "cluster.local" as the zone, temporarily.
Addresses review feedback.
Signed-off-by: Brian Smith <brian@briansmith.org>
Shortly after conduit is installed in k8s environment. The control plane component that establishes a watch endpoint with k8s run in to networking issues during proxy initialization. During failure, each watcher fails to retry its connection to k8s watch endpoint which leads to timeouts and eventually, multiple controller pod restarts.
This PR adds retry logic to each "watch" enabled package.
fixes#478
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Kubernetes will do multiple DNS lookups for a name like
`proxy-api.conduit.svc.cluster.local` based on the default search settings
in /etc/resolv.conf for each container:
1. proxy-api.conduit.svc.cluster.local.conduit.svc.cluster.local. IN A
2. proxy-api.conduit.svc.cluster.local.svc.cluster.local. IN A
3. proxy-api.conduit.svc.cluster.local.cluster.local. IN A
4. proxy-api.conduit.svc.cluster.local. IN A
We do not need or want this search to be done, so avoid it by making each
name absolute by appending a period so that the first three DNS queries
are skipped for each name.
The case for `localhost` is even worse because we expect that `localhost` will
always resolve to 127.0.0.1 and/or ::1, but this is not guaranteed if the default
search is done:
1. localhost.conduit.svc.cluster.local. IN A
2. localhost.svc.cluster.local. IN A
3. localhost.cluster.local. IN A
4. localhost. IN A
Avoid these unnecessary DNS queries by making each name absolute, so that the
first three DNS queries are skipped for each name.
Signed-off-by: Brian Smith <brian@briansmith.org>
Grafana by default calls out to grafana.com to check for updates. As
user's of Conduit do not have direct control over updating Grafana
directly, this update check is not needed.
Disable Grafana's update check via grafana.ini.
This is also a workaround for #155, root cause of #519.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Add --expected-version flag for conduit check command
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Update build instructions
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
When the conduit proxy is injected into the controller pod, we observe controller pod proxy stats show up as an "outbound" deployment for an unrelated upstream deployment. This may cause confusion when monitoring deployments in the service mesh.
This PR filters out this "misleading" stat in the public api whenever the dashboard requests metric information for a specific deployment.
* exclude telemetry generated by the control plane when requesting deployment metrics
fixes#370
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
In the initial review for this code (preceding the creation of the
runconduit/conduit repository), it was noted that this container is not
actually used, so this is actually dead code.
Further, this container actualy causes a minor problem, as it doesn't
implement any retry logic, thus it will sometimes often cause errors to
be logged. See
https://github.com/runconduit/conduit/issues/496#issuecomment-370105328.
Further, this is a "buoyantio/" branded container. IF we actually need
such a container then it should be a Conduit-branded container.
See https://github.com/runconduit/conduit/issues/478 for additional
context.
Signed-off-by: Brian Smith <brian@briansmith.org>
Kubernetes $KUBECONFIG environment variable is a list of paths to
configuration files, but conduit assumes that it is a single path.
Changes in this commit introduce a straightforward way to discover and
load config file(s).
Complete list of changes:
- Use k8s.io/client-go/tools/clientcmd to deal with kubernetes
configuration file
- rename k8s API and k8s proxy constructors to get rid of redundancy
- remove shell package as it is not needed anymore
Signed-off-by: Igor Zibarev <zibarev.i@gmail.com>
`conduit install` deploys prometheus, but lacks a general-purpose way to
visualize that data.
This change adds a Grafana container to the `conduit install` command. It
includes two sample dashboards, viz and health, in their own respective
source files.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
As part of `conduit check` command, warn the user if they are
running an outdated version of the cli client or the control
plane components.
Fixes#314
Signed-off-by: Andy Hume <andyhume@gmail.com>
The telemetry service in the controller pod uses a non-fully qualified URL to connect to the prometheus pod in the control plane. This PR changes the URL the telemetry's prometheus URL to be fully qualified to be consistent with other URLs in the control plane. This change was tested in minikube. The logs report no errors and looking at the prometheus dashboard shows that stats are being recorded from all conduit proxies.
fixes#414
Signed-off-by: Dennis Adjei-Baah dennis@buoyant.io
Refactor `conduit install` test into a data-driven test. Then add a test of the actual default output of `conduit install`.
This test is useful to make it clear when we change the default
settings of `conduit install`.
Signed-off-by: Brian Smith <brian@briansmith.org>
The control plane is proxied through the Conduit proxy. The Conduit
proxy is based on the base image, and the control plane containers
and the proxy share a networking namespace. This means we don't
need the extra base utilities in the controller images since we can
use the utilties in the proxy image.
This is a step towards building the initial no-networking Conduit CA
pod. Since the Conduit CA will not do any networking of its own, we
networking debugging utilties are not helpful for it. They are
actually an unnecessary risk because they could facilitate the
exfiltration of the private key of the CA. (The Conduit CA pod won't
have the Conduit Proxy injected into it either.)
This also simplifies & slightly speeds up the building of the
controller images. This is a stepping stone towards being able to
build the controller images without `docker build` to improve build
times.
Signed-off-by: Brian Smith <brian@briansmith.org>
The `app` label should be reserved for end-user applications and we
shouldn't use it ourselves. We already have a Conduit-specific label
that is is prefixed with the `conduit.io/` prefix to avoid naming
collisions with users' labels, so just use that one instead.
Signed-off-by: Brian Smith <brian@briansmith.org>
The template used by `conduit install` was hard-coded in install.go.
This change moves the template into its own file, in anticipation of
increasing the template's size and complexity.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
In order to take advantage of the benefits the conduit proxy gives to deployments, this PR injects the conduit proxy into the control plane pod. This helps us lay the groundwork for future work such as TLS, control plane observability etc.
Fixes#311
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* Use Go 1.10.0 to build Go components.
Take advantage of the new build cache in Go 1.10. Future work on improving
build performance will utilize the build cache further.
Signed-off-by: Brian Smith <brian@briansmith.org>
Previously Dockerfile-go-deps was converted from a multi-stage Dockefile
to a single-stage Dockerfile in anticipation of enabling efficient use
of `--cache-from` in CI. However, that resulted in the image ballooning
in size because it contained the Git repo for every package downloaded
by `dep ensure`.
Bring the image back down to the proper size by removing the temporary
files created.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Update go-run to set version equal to root-tag
* Fix inject tests for undefined version change
* Pass inject version explitictly as arg
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remove kubectl dependency, validate k8s server version via api
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remove unused MockKubectl
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remame kubectl.go to version.go
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
When the `inject` command is used on a YAML file that is invalid, it prints out an invalid YAML file with the injected proxy. This may give a false indication to the user that the inject was successful even though the inject command prints out an error message further down the terminal window. This PR fixes#303 and contains a test input and output file that indicates what should be shown.
This PR also fixes#390.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* Run conduit dashboard on ephemeral port by default
* Fix wording on dashboard --port flag
* log.Debug error instead of discarding it
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Refactor `conduit inject` code to eliminate duplicate logic
Previously there was a lot of code repeated once for each type of
object that has a pod spec.
Refactor the code to reduce the amount of duplication there, to make future
changes easier.
Signed-off-by: Brian Smith <brian@briansmith.org>
* CLI: Remove now-unnecessary "enhanced" Kubernetes object types
The "enhanced" types aren't necessary because now the Kuberentes API
implementation has the correct JSON annotations for the InitContainers
field.
Signed-off-by: Brian Smith <brian@briansmith.org>