DaemonSet stats are not currently shown in the cli stat command, web ui
or grafana dashboard. This commit adds daemonset support for stat.
Update stat command's help message to reference daemonsets.
Update the public-api to support stats for daemonsets.
Add tests for stat summary and api.
Add daemonset get/list/watch permissions to the linkerd-controller
cluster role that's created using the install command.
Update golden expectation test files for install command
yaml manifest output.
Update web UI with daemonsets
Update navigation, overview and pages to list daemonsets and the pods
associated to them.
Add daemonset paths to server, and ui apps.
Add grafana dashboard for daemonsets; a clone of the deployment
dashboard.
Update dependencies and dockerfile hashes
Add DaemonSet support to tap and top commands
Fixes of #2006
Signed-off-by: Zak Knill <zrjknill@gmail.com>
This introduces a `linkerd install-sp` command to install service
profiles into the linkerd control plane. It installs one service profile
for each of controller-api, proxy-api, prometheus, and grafana.
Fixes#1901
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Version checks were not validating that the cli version matched the
control plane or data plane versions.
Add checks via the `linkerd check` command to validate the cli is
running the same version as the control and data plane.
Also add types around `channel-version` string parsing and matching. A
consequence being that during development `version.Version` changes from
`undefined` to `dev-undefined`.
Fixes#2076
Depends on linkerd/website#101
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Introduce resource selector and deprecate namespace field for ListPods
* Changes from code review
* Properly deprecate the field
* Do not check for nil
* Fix the mockProm usage
* Protoc changes revert
* Changed from code review
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
The default font in Windows console did not support the Unicode
characters recently added to check and inject commands. Also the color
library the linkerd cli depends on was not being used in a
cross-platform way.
Replace the existing Unicode characters used in `check` and `inject`
with characters available in most fonts, including Windows console.
Similarly replace the spinner used in `check` with one that uses
characters available in most fonts.
Modify `check` and `inject` to use `color.Output` and `color.Error`,
which wrap `os.Stdout` and `os.Stderr`, and perform special
tranformations when on Windows.
Add a `--no-color` option to `linkerd logs`. While stern uses the same
color library that `check`/`inject` use, it is not yet using the
`color.Output` API for Windows support. That issue is tracked at:
https://github.com/wercker/stern/issues/69
Relates to https://github.com/linkerd/linkerd2/pull/2087
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* When injecting, perform an uninject as a first step
Fixes#1970
The fixture `inject_emojivoto_already_injected.input.yml` is no longer
rejected, so I created the corresponding golden file.
Note that we'll still forbid injection over resources already injected
with third party meshes (Istio, Contour), so now we have
`HasExisting3rdPartySidecars()` to detect that.
* Generalize HasExistingSidecars() to cater for both the auto-injector and
* Convert `linkerd uninject` result format to the one used in `linkerd inject`.
* More updates to the uninject reports. Revert changes to the HasExistingSidecars func.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
When debugging control plane issues or issues pertaining to a linkerd proxy, it can be cumbersome to get logs from affected containers quickly.
This PR adds a new `logs` command to the Linkerd CLI to surface log lines from any container within linkerd's control plane. This feature relies heavily on [stern](https://github.com/wercker/stern), which already provides this behavior. This PR integrates this package into the Linkerd CLI to allow users to quickly retrieve logs whenever they run into issues when using Linkerd.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Fixes#1875
This change improves the `linkerd routes` command in a number of important ways:
* The restriction on the type of the `--to` argument is lifted and any resource type can now be used. Try `--to ns/books`, `--to po/webapp-ABCDEF`, `--to au/linkerd.io`, or even `--to svc`.
* All routes for the target will now be populated in the table, even if there are no Prometheus metrics for that route.
* [UNKNOWN] has been renamed to [DEFAULT]
* The `Service/Authority` column will now list `Service` in all cases except for when an authority target is explicitly requested.
```
$ linkerd routes deploy/traffic --to deploy/webapp
ROUTE SERVICE SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
GET / webapp 100.00% 0.5rps 50ms 180ms 196ms
GET /authors/{id} webapp 100.00% 0.5rps 100ms 900ms 980ms
GET /books/{id} webapp 100.00% 0.9rps 38ms 93ms 99ms
POST /authors webapp 100.00% 0.5rps 35ms 48ms 50ms
POST /authors/{id}/delete webapp 100.00% 0.5rps 83ms 180ms 196ms
POST /authors/{id}/edit webapp 0.00% 0.0rps 0ms 0ms 0ms
POST /books webapp 45.16% 2.1rps 75ms 425ms 485ms
POST /books/{id}/delete webapp 100.00% 0.5rps 30ms 90ms 98ms
POST /books/{id}/edit webapp 56.00% 0.8rps 92ms 875ms 975ms
[DEFAULT] webapp 0.00% 0.0rps 0ms 0ms 0ms
```
This is all made possible by a shift in the way we handle the destination resource. When we get a request with a `ToResource`, we use the k8s API to find all Services which include at least one pod belonging to that resource. We then fetch all service profiles for those services and display the routes from those serivce profiles.
This shift in thinking also precipitates a change in the TopRoutes API where we no longer need special cases for `ToAll` (which can be specified by `--to au`) or `ToAuthority` (which can be specified by `--to au/<authority>`) and instead can use a `ToResource` to handle all cases.
Signed-off-by: Alex Leong <alex@buoyant.io>
The outputs of the `check` and `inject` commands did not vary much
between successful and failed executions, and were a bit verbose and
challenging to parse.
Reorganize output of `check` and `inject` commands, to provide more
output when errors occur, and less output when successful.
Specific changes:
`linkerd check`
- visually group checks by category
- introduce `hintURL`'s, to provide doc links when checks fail
- add spinners when retrying, remove additional retry lines
- colored unicode characters to indicate success/warning/failure
`linkerd inject`
- modify default output to mirror `kubectl apply`
- only output non-successful inject reports
- support `--verbose` flag to output all inject reports
Fixes#1471, #1653, #1656, #1739
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The linkerd check command organized the various checks via loosely
coupled category IDs, category names, and checkers themselves, all with
ordering defined by consumers of this code.
This change removes category IDs in favor of category names, groups all
checkers by category, and enforces ordering at the HealthChecker
level.
Part of #1471, depends on #2078.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The set of health checks to be executed were dependent on a combination
of check enums and boolean options.
This change modifies the health checks to be governed strictly by a set
of enums.
Next steps:
- tightly couple category IDs to names
- tightly couple checks to their parent categories
- programmatic control over check ordering
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Add `linkerd uninject` command
uninject.go iterates through the resources annotations, labels,
initContainers and Containers, removing what we know was injected by
linkerd.
The biggest part of this commit is the refactoring of inject.go, to make
it more generic and reusable by uninject.
The idea is that in a following PR this functionality will get reused by
`linkerd inject` to uninject as as preliminary step to injection, as a
solution to #1970.
This was tested successfully on emojivoto with:
```
1) inject:
kubectl get -n emojivoto deployment -o yaml | bin/linkerd inject - |
kubectl apply -f -
2) uninject:
kubectl get -n emojivoto deployment -o yaml | bin/linkerd uninject - |
kubectl apply -f -
```
Also created unit tests for uninject.go. The fixture files from the inject
tests could be reused. But as now the input files act as outputs, they
represent existing resources and required these changes (that didn't
affect inject):
- Rearranged fields in alphabetical order.
- Added fields that are only relevant for existing resources (e.g.
creationTimestamp and status.replicas in StatefulSets)
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Setup port-forwarding for linkerd dashboard command
* Output port-forward logs when --verbose flag is set
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Add validation to CRD for service profiles
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
* Use properties instead of oneof
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
* Proxy grafana requests through web service
* Fix -grafana-addr default, clarify -api-addr flag
* Fix version check in grafana dashboards
* Fix comment typo
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Fix reporting of injected resources
When reporting resources provided as a list, it was picking just one
item from the list and ignoring the rest.
Also improved test for injecting list of resources.
Fixes#1676
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
This is a followup branch from #2023:
- delete `proxy/client.go`, move code to `destination-client`
- move `RenderTapEvent` and stat functions from `util` to `cmd`
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#1910
Add a `--routes` flag to the `top` command which show requests by route instead of by path.
This also includes an overhaul of the live table rendering code to be more flexible and better accommodate the various combinations of columns that may be visible or hidden.
Co-authored-by: Andrew Seigner <siggy@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
* Create new service accounts for linkerd-web and linkerd-grafana. Change 'serviceAccount:' to 'serviceAccountName:'
* Use dynamic namespace name
Signed-off-by: Cody Vandermyn <cody.vandermyn@nordstrom.com>
* Allow input of a volume name for prometheus and grafana
* Make Prometheus and Grafana volume names 'data' by default and disallow user editing via cli flags
* Remove volume name from options
Signed-off-by: Cody Vandermyn <cody.vandermyn@nordstrom.com>
* add securityContext with runAsUser: {{.ProxyUID}} to the various containers in the install template
* Update golden to reflect new additions
* changed to a different user id than the proxy user id
* Added a controller-uid install option
* change the port that the proxy-injector runs
* The initContainers needs to be run as the root user.
* move security contexts to container level
Signed-off-by: Cody Vandermyn <cody.vandermyn@nordstrom.com>
* Add parameter to stats API to skip retrieving Prometheus stats
Used by the dashboard to populate list of resources.
Fixes#1022
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Prometheus queries check results were being ignored
* Refactor verifyPromQueries() to also test when no prometheus queries
should be generated
* Add test for SkipStats=true
Includes adding ability to public.GenStatSummaryResponse to not generate
basicStats
* Fix previous test
If the `linkerd routes` command gets two routes with the same name, it will only display one of them, even if the routes are from different services. This is particularly obvious with the default `[UNKNOWN]` route.
We now display all routes, even if they have the same name.
Signed-off-by: Alex Leong <alex@buoyant.io>
Add support for service profiles created on external (non-service) authorities. For example, this allows you to create a service profile named `linkerd.io` which will apply to calls made to `linkerd.io`.
This is done by changing the `LINKERD2_PROXY_DESTINATION_PROFILE_SUFFIXES` to `.` so that the proxy will attempt to lookup a service profile for any authority. We provide the `--disable-external-profiles` proxy flag to revert this behavior in case it is a problem.
We also refactor the proxy-api implementation of GetProfiles so that it does the profile lookup, regardless of if the authority looks like a Kubernetes service name or not. To simplify this, support for multiple resolves (which was unused) was removed.
Signed-off-by: Alex Leong <alex@buoyant.io>
The proxy-api service _always_ suggests that two meshed pods communicate
via HTTP/2 (i.e. via transparent protocol upgrading, if necessary).
This can complicate debugging and diagnostics at times, so it's
important that we have a way to deploy linkerd without this auto-upgrade
behavior.
This change adds a `-disable-h2-upgrade` flag to the `linkerd install`
command that disables transparent upgrading for the whole cluster.
We rework the routes command so that it can accept any Kubernetes resource, making it act much more similarly to the stat command.
Signed-off-by: Alex Leong <alex@buoyant.io>
We rename path to path_regex in the ServiceProfile CRD to make it clear that this field accepts a regular expression. We also take this opportunity to remove unnecessary line anchors from regular expressions now that these anchors are added in the proxy.
Signed-off-by: Alex Leong <alex@buoyant.io>
Adds an endpoint, at /profiles/new that allows you to input a service name and
namespace, and download a service profile yaml template.
This will enable future work, where we can add more of the yaml customization via
a form in the dashboard, and use that data to help the user configure routes.
Filtering by Kubernetes job was not supported. Also filtering by any unknown
type caused a panic.
Add filtering support by Kubernetes job, with special case mapping `job` to
`k8s_job`, to not conflict with Prometheus' job label.
Fix panic when unknown type specified as a `--from` or `--to` flag.
Fix `job` label from `linkerd-proxy` overwriting Prometheus `job` label at
collection time. This caused all metrics collected by proxy sidecars in
Kubernetes jobs to be collected into an incorrect Prometheus job, rather than
the expected `linkerd-proxy` Prometheus job.
Fix `unsupported resource type` tap error message incorrectly printing the
target resource rather than the destination.
Set `--controller-log-level debug` in `install_test.go` for easier debugging.
Expose `slow-cooker`'s metrics via a k8s service in the tap integration test, to
validate proxy requests with a job as destination.
Fixes#1872
Part of #627
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Adjust proxy, Prometheus, and Grafana probes
High `readinessProbe.initialDelaySeconds` values delayed the controller's
readiness by up to 30s, preventing cli commands from succeeding shortly after
control plane deployment.
Decrease `readinessProbe.initialDelaySeconds` in the proxy, Prometheus, and
Grafana to the default 0s. Also change `linkerd check` controller pod ordering
to: controller, prometheus, web, grafana.
Detailed probe changes:
- proxy
- decrease `readinessProbe.initialDelaySeconds` from 10s to 0s
- prometheus
- decrease `readinessProbe.initialDelaySeconds` from 30s to 0s
- decrease `readinessProbe.timeoutSeconds` from 30s to 1s
- decrease `livenessProbe.timeoutSeconds` from 30s to 1s
- grafana
- decrease `readinessProbe.initialDelaySeconds` from 30s to 0s
- decrease `readinessProbe.timeoutSeconds` from 30s to 1s
- decrease `readinessProbe.failureThreshold` from 10 to 3
- increase `livenessProbe.initialDelaySeconds` from 0s to 30s
Fixes#1804
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The `--open-api` flag is an alternative to the `--template` flag for the `linkerd profile` command. It reads an OpenAPI specification file (also called a swagger file) and uses it to generate a corresponding service profile.
Signed-off-by: Alex Leong <alex@buoyant.io>
This change allows some advised production config to be applied to the install of the control plane.
Currently this runs 3x replicas of the controller and adds some pretty sane requests to each of the components + containers of the control plane.
Fixes#1101
Signed-off-by: Ben Lambert <ben@blam.sh>
Add a routes command which displays per-route stats for services that have service profiles defined.
This change has three parts:
* A new public-api RPC called `TopRoutes` which serves per-route stat data about a service
* An implementation of TopRoutes in the public-api service. This implementation reads per-route data from Prometheus. This is very similar to how the StatSummaries RPC and much of the code was able to be refactored and shared.
* A new CLI command called `routes` which displays the per-route data in a tabular or json format. This is very similar to the `stat` command and much of the code was able to be refactored and shared.
Note that as of the currently targeted proxy version, only outbound route stats are supported so the `--from` flag must be included in order to see data. This restriction will be lifted in an upcoming change once we add support for inbound route stats as well.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Refactor util.BuildResource so it can deal with multiple resources
First step to address #1487: Allow stat summary to query for multiple
resources
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Update the stat cli help text to explain the new multi resource querying ability
Propsal for #1487: Allow stat summary to query for multiple resources
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Allow stat summary to query for multiple resources
Implement this ability by issuing parallel requests to requestStatsFromAPI()
Proposal for #1487
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Update tests as part of multi-resource support in `linkerd stat` (#1487)
- Refactor stat_test.go to reuse the same logic in multiple tests, and
add cases and files for json output.
- Add a couple of cases to api_utils_test.go to test multiple resources
validation.
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* `linkerd stat` called with multiple resources should keep an ordering (#1487)
Add SortedRes holding the order of resources to be followed when
querying `linkerd stat` with multiple resources
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Extra validations for `linkerd stat` with multiple resources (#1487)
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* `linkerd stat` resource grouping, ordering and name prefixing (#1487)
- Group together stats per resource type.
- When more than one resource, prepend name with type.
- Make sure tables always appear in the same order.
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Allow `linkerd stat` to be called with multiple resources
A few final refactorings as per code review.
Fixes#1487
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
We make several changes to the service profile spec to make service profiles more ergonomic and to make them more consistent with the destination profile API.
* Allow multiple fields to be simultaneously set on a RequestMatch or ResponseMatch condition. Doing so is equivalent to combining the fields with an "all" condition.
* Rename "responses" to "response_classes"
* Change "IsSuccess" to "is_failure"
Signed-off-by: Alex Leong <alex@buoyant.io>
A container called `proxy-api` runs in the Linkerd2 controller pod. This container listens on port 8086 and serves the proxy-api but does nothing other than forward gRPC requests to the destination container which listens on port 8089.
We remove the proxy-api container altogether and change the destination container to listen on port 8086 instead of 8089. The result is that clients still use the proxy-api by connecting to `proxy-api.<ns>.svc.cluster.local:8086` but the controller has one fewer containers. This results in a simpler system that is easier to reason about.
Signed-off-by: Alex Leong <alex@buoyant.io>
Service profiles must be named in the form `"<service>.<namespace>"`. This is inconsistent with the fully normalized domain name that the proxy sends to the controller. It also does not permit creating service profiles for non-Kubernetes services.
We switch to requiring that service profiles must be named with the FQDN of their service. For Kubernetes services, this is `"<service>.<namespace>.svc.cluster.local"`.
This change alone is not sufficient for allowing service profile for non-Kubernetes services because the k8s resolver will ignore any DNS names which are not Kubernetes services. Further refactoring of the resolver will be required to allow looking up non-Kubernetes service profiles in Kuberenetes.
Signed-off-by: Alex Leong <alex@buoyant.io>
The existence of an invalid service profile causes `linkerd check` to fail. This means that it is not possible to open the Linkerd dashboard with the `linkerd dashboard` command. While service profile validation is useful, it should not lock users out.
Add the ability to designate health checks as warnings. A failed warning health check will display a warning output in `linkerd check` but will not affect the overall success of the command. Switch the service profile validation to be a warning.
Signed-off-by: Alex Leong <alex@buoyant.io>
Add a new CLI command: `linkerd profile --template` which outputs a sample service profile yaml. Users can edit this sample and then `kubectl apply` it to add a service profile. The sample serves as "documentation by example" of what service profiles may contain.
Example usage:
```bash
linkerd profile -n emojivoto --template web-svc > web-svc-profile.yaml
# edit web-svc-profile.yaml in your favorite editor
kubectl apply -f web-svc-profile.yaml
```
Signed-off-by: Alex Leong <alex@buoyant.io>
We implement the getProfiles method in the destination service. This method returns a stream of destination profiles for a given authority. It does this by looking up the ServiceProfile resource in the controller namespace named `<svc>.<ns>` where `<svc>` is the name of the service and `<ns>` is the namespace of the service.
This PR includes:
* Adding a ServiceProfile Custom Resource Definition to linkerd install
* A watch based implementation of the getProfiles method in the destination service, similar to the implementation of get.
* An update to the destination client script that allows querying the getProfiles method.
Signed-off-by: Alex Leong <alex@buoyant.io>
Added support for json output in `linkerd stat` through a new (-o|--output)=json option.
Fixes#1417
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Make room for columns in `linkerd top`
Make room for columns in `linkerd top`. Columns with data longer than some predetermined minimum length were stepping over each other.
Proposal for #1728
* Removed unneeded truncations
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Add --single-namespace install flag for restricted permissions
* Better formatting in install template
* Mark --single-namespace and --proxy-auto-inject as experimental
* Fix wording of --single-namespace check flag
* Small healthcheck refactor
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Support auto sidecar-injection
1. Add proxy-injector deployment spec to cli/install/template.go
2. Inject the Linkerd CA bundle into the MutatingWebhookConfiguration
during the webhook's start-up process.
3. Add a new handler to the CA controller to create a new secret for the
webhook when a new MutatingWebhookConfiguration is created.
4. Declare a config map to store the proxy and proxy-init container
specs used during the auto-inject process.
5. Ignore namespace and pods that are labeled with
linkerd.io/auto-inject: disabled or linkerd.io/auto-inject: completed
6. Add new flag to `linkerd install` to enable/disable proxy
auto-injection
Proposed implementation for #561.
* Resolve missing packages errors
* Move the auto-inject label to the pod level
* PR review items
* Move proxy-injector to its own deployment
* Ignore pods that already have proxy injected
This ensures the webhook doesn't error out due to proxy that are injected using the command
* PR review items on creating/updating the MWC on-start
* Replace API calls to ConfigMap with file reads
* Fixed post-rebase broken tests
* Don't mutate the auto-inject label
Since we started using healhcheck.HasExistingSidecars() to ensure pods with
existing proxies aren't mutated, we don't need to use the auto-inject label as
an indicator.
This resolves a bug which happens with the kubectl run command where the deployment
is also assigned the auto-inject label. The mutation causes the pod auto-inject
label to not match the deployment label, causing kubectl run to fail.
* Tidy up unit tests
* Include proxy resource requests in sidecar config map
* Fixes to broken YAML in CLI install config
The ignore inbound and outbound ports are changed to string type to
avoid broken YAML caused by the string conversion in the uint slice.
Also, parameterized the proxy bind timeout option in template.go.
Renamed the sidecar config map to
'linkerd-proxy-injector-webhook-config'.
Signed-off-by: ihcsim <ihcsim@gmail.com>
* Added --context flag to specify the context to use to talk to the Kubernetes apiserver
* Fix tests that are failing
* Updated context flag description
Signed-off-by: Darko Radisic <ffd2subroutine@users.noreply.github.com>
Horizontal Pod Autoscaling does not work when container definitions in pods do not all have resource requests, so here's the ability to add CPU + Memory requests to install + inject commands by proving proxy options --proxy-cpu + --proxy-memory
Fixes#1480
Signed-off-by: Ben Lambert <ben@blam.sh>
In linkerd/linkerd2-proxy#99, several proxy configuration variables were
deprecated.
This change updates the CLI to use the updated names to avoid
deprecation warnings during startup.
* Remove wait option and make it a default for check
* Switch the wait default to true
* Wait by default also for dashboard
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
`linkerd inject` was not checking its input for known sidecars and
initContainers.
Modify `linkerd inject` to check for existing sidecars and
initContainers, specifically, Linkerd, Istio, and Contour.
Part of #1516
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#1593
Add a `--hide-sources` flag to `linkerd top`. Setting this removes the source column from the output.
Signed-off-by: Alex Leong <alex@buoyant.io>
linkerd only routes TCP data, but `linkerd inject` does not warn when it
injects into pods with ports set to `protocol: UDP`.
Modify `linkerd inject` to warn when injected into a pod with
`protocol: UDP`. The Linkerd sidecar will still be injected, but the
stderr output will include a warning.
Also add stderr checking on all inject unit tests.
Part of #1516.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
If an input file is un-injectable, existing inject behavior is to simply
output a copy of the input.
Introduce a report, printed to stderr, that communicates the end state
of the inject command. Currently this includes checking for hostNetwork
and unsupported resources.
Malformed YAML documents will continue to cause no YAML output, and return
error code 1.
This change also modifies integration tests to handle stdout and stderr separately.
example outputs...
some pods injected, none with host networking:
```
hostNetwork: pods do not use host networking...............................[ok]
supported: at least one resource injected..................................[ok]
Summary: 4 of 8 YAML document(s) injected
deploy/emoji
deploy/voting
deploy/web
deploy/vote-bot
```
some pods injected, one host networking:
```
hostNetwork: pods do not use host networking...............................[warn] -- deploy/vote-bot uses "hostNetwork: true"
supported: at least one resource injected..................................[ok]
Summary: 3 of 8 YAML document(s) injected
deploy/emoji
deploy/voting
deploy/web
```
no pods injected:
```
hostNetwork: pods do not use host networking...............................[warn] -- deploy/emoji, deploy/voting, deploy/web, deploy/vote-bot use "hostNetwork: true"
supported: at least one resource injected..................................[warn] -- no supported objects found
Summary: 0 of 8 YAML document(s) injected
```
TODO: check for UDP and other init containers
Part of #1516
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The default value for the max-rps argument to the tap and top commands is an overly conservative 1rps. This causes the data to come in very slowly and much data to be discarded. Furthermore, because tap requests are windowed to 10 seconds, this causes long pauses between updates.
We fix this in two ways. Firstly we reduce the window size to 1s so that updates will come in at least once per second, even when the actual RPS of the data path is extremely high. Secondly, we increase the default max-rps parameter from 1 to 100. This allows tap to paint an accurate picture of the data much more quickly and sidesteps some sampling bias that happens when the max-rps is low.
In general, tap events tend to happen in bursts. For example, one request in may trigger one or more requests out. Likewise, a single upstream event may trigger several requests to the tapped pod in quick succession. Sampling bias will occur when the max-rps is less than the actual rps and when the tap event limit subdivides these event bursts (biasing towards the first few events in the burst). The greater the max-rps, the less the effects of this bias.
Fixes#1525
Signed-off-by: Alex Leong <alex@buoyant.io>
Previously, we would tap any resource's pods, regardless of whether the pods
were meshed or not. We can't actually tap non-meshed pods, so I'm adding a check
that will filter out non-meshed pods from the pods that tap watches.
Previous behaviour:
When attempting to hang a non meshed pod, it would establish
a watch on the pods, but then never return any results. In the CLI you could
just cancel it with Ctrl-C. In the web, clicking Stop would send a
WebSocket.close(1000) but wouldn't actually close the connection...
Behaviour after change :
If no pods under the specified resource are meshed, it'll
return an error of no pods being found to tap
Adds basic probes to the linkerd-proxy containers injected by linkerd inject.
- Currently the Readiness and Liveness probes are configured to be the same.
- I haven't supplied a periodSeconds, but the default is 10.
- I also set the initialDelaySeconds to 10, but that might be a bit high.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
Closes#1170.
This branch adds a `-o wide` (or `--output wide`) flag to the Tap CLI.
Passing this flag adds `src_res` and `dst_res` elements to the Tap
output, as described in #1170. These use the metadata labels in the tap
event to describe what Kubernetes resource the source and destination
peers belong to, based on what resource type is being tapped, and fall
back to pods if either peer is not a member of the specified resource
type.
In addition, when the resource type is not `namespace`, `src_ns` and
`dst_ns` elements are added, which show what namespaces the the source
and destination peers are in. For peers which are not in the Kubernetes
cluster, none of these labels are displayed.
The source metadata added in #1434 is used to populate the `src_res` and
`src_ns` fields.
Also, this branch includes some refactoring to how tap output is
formatted.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This an initial implementation of the `linkerd top` command. This command launches an ncurses style tabular view of current requests (using data from tap). Most of the command line arguments are the same as tap and allow selecting the resource to inspect and filtering which requests to view.
Fixes#1283
Signed-off-by: Alex Leong <alex@buoyant.io>
Closes#776.
This branch adds the following validation to the `linkerd stat` command:
* The `--to` and `--from` flags are now mutually exclusive
* The `--to-namespace` and `--from-namespace` commands are also mutually
exclusive.
* The `namespace` resource type conflicts with the `--namespace`,
`--to-namespace`, and `--from-namespace` flags.
Examples:
```
$ bin/go-run cli/main.go stat deploy --to deploy/foo --from deploy/bar
Error: --to and --from flags are mutually exclusive
Usage:
linkerd stat [flags] (RESOURCE)
...
```
```
$ bin/go-run cli/main.go stat deploy --to-namespace foo --from-namespace bar
Error: --to-namespace and --from-namespace flags are mutually exclusive
Usage:
linkerd stat [flags] (RESOURCE)
...
```
```
$ bin/go-run cli/main.go stat namespace foo --namespace bar
Error: --namespace flag is incompatible with namespace resource type
Usage:
linkerd stat [flags] (RESOURCE)
...
```
```
$ bin/go-run cli/main.go stat ns --to-namespace bar
Error: --to-namespace flag is incompatible with namespace resource type
Usage:
linkerd stat [flags] (RESOURCE)
...
```
```
$ bin/go-run cli/main.go stat namespace --from-namespace bar
Error: --from-namespace flag is incompatible with namespace resource type
Usage:
linkerd stat [flags] (RESOURCE)
...
```
```
$ bin/go-run cli/main.go stat ns/foo --from-namespace bar
Error: --from-namespace flag is incompatible with namespace resource type
Usage:
linkerd stat [flags] (RESOURCE)
...
```
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Currently, when a cluster has over 100 pods injected with the Linkerd2
proxy, Prometheus metrics are not collected correctly. This is because
Prometheus appears to be making more concurrent requests than its'
proxy's outbound router cache can handle See issue #1322 for further
details.
This branch introduces a workaround for this issue, by increasing the
outbound router cache capacity to 10000 routes for the Prometheus pod's
proxy only. The router capacity limit of 100 active routes is primarily
due to the limitation of the number of active Destination service
lookups, so increasing the capacity for the Prometheus pod specifically
is probably okay, as the scrape requests are made to IP addresses
directly and therefore will not cause service discovery lookups.
This change was originally implemented and tested in @siggy's PR #1228.
I've rebased his branch onto the current `master`, and updated the code
to reflect the project name change.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Co-authored-by: Andrew Seigner <siggy@buoyant.io>
This change is a simplified implementation of the Builder.Path() and
Visitor().ExpandPathsToFileVisitors() functions used by kubectl to parse files
and directories. The filepath.Walk() function is used to recursively traverse
directories. Every .yaml or .json resource file in the directory is read
into its own io.Reader. All the readers are then passed to the YAMLDecoder in the
InjectYAML() function.
Fixes#1376
Signed-off-by: ihcsim <ihcsim@gmail.com>
Adds a tap endpoint in the web api that communicates with the dashboard
via websockets.
I've moved a bunch of code from the cli tap.go into utils so that the code
can be shared between web and CLI. I think we should consider making the
display more suited to web, but in the short term, reusing the CLI's
rendering of tap events works.
Adds a Tap page in the Web UI that you can use to make tap requests.
The form currently only allows you to enter a resource and namespace,
other filters coming in a follow-up branch.
`ca-bundle-distributor` described the original role of the program but
`ca` ("Certificate Authority") better describes its current role.
Signed-off-by: Brian Smith <brian@briansmith.org>
These files were created with the executable bit set accidentally due
to the way my network file system setup was configured.
Signed-off-by: Brian Smith <brian@briansmith.org>
* update grafana dashboards to remove conduit reference and replace with linkerd instances
* update test install fixtures to reflect changes
Fixes: #1315
Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
The control-plane's `ClusterRole` and `ClusterRoleBinding` objects are
global. Because their names did not vary across multiple control-plane
deployments, it prevented multiple control-planes from coexisting (when
RBAC is enabled).
Modify the `ClusterRole` and `ClusterRoleBinding` objects to include the
control-plane's namespace in their names. Also modify the integration
test to first install two control-planes, and then perform its full
suite of tests, to prevent regression.
Fixes#1292.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
Create a ephemeral, in-memory TLS certificate authority and integrate it into the certificate distributor.
Remove the re-creation of deleted ConfigMaps; this will be added back later in #1248.
Signed-off-by: Brian Smith brian@briansmith.org
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The proxy's metrics are instrumented with a `tls` label that describes
the state of TLS for each connection and associated messges.
This same level of detail is useful to get in `tap` output as well.
This change updates Tap in the following ways:
* `TapEvent` protobuf updated:
* Added `source_meta` field including source labels
* `proxy_direction` enum indicates which proxy server was used.
* The proxy adds a `tls` label to both source and destination meta indicating the state of each peer's connection
* The CLI uses the `proxy_direction` field to determine which `tls` label should be rendered.
If the controller address has a loopback host then don't use TLS to connect
to it. TLS isn't needed for security in that case. In mormal configurations
the proxy isn't terminating TLS for loopback connections anyway.
Signed-off-by: Brian Smith <brian@briansmith.org>
- Return pod uptimes from the GetPods endpoint
- Adds filtering by namespace to api.GetPods
- Adds a --namespace filter to conduit get pods
- Adds pod uptimes to the controller component toolitps on the ServiceMesh page
- Moves the ServiceMesh page back to using /api/pods
Adds the ability to query by a new non-kubernetes resource type, "authorities",
in the StatSummary api.
This includes an extensive refactor of stat_summary.go to deal with non-kubernetes
resource types.
- Add documentation to Resource in the public api so we can use it for authority
- Handle non-k8s resource requests in the StatSummary endpoint
- Rewrite stat summary fetching and parsing to handle non-k8s resources
- keys stat summary metric handling by Resource instead of a generated string
- Adds authority to the CLI
- Adds /authorities to the Web UI
- Adds some more stat integration and unit tests
The `tap` command is prone to panic due to use of `nil` values.
This is because we don't use the safe `Get*()` field accessors
provided by protobuf.
This change fixes several unsafe field access paths.
Fixes#47
* Add TLS support to `conduit inject`.
Add the settings needed to enable TLs when `--tls=optional` is passed on the
commend line. Later the requirement to add `--tls` will be removed.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Add CA certificate bundle distributor to conduit install
* Update ca-distributor to use shared informers
* Only install CA distributor when --enable-tls flag is set
* Only copy CA bundle into namespaces where inject pods have the same controller
* Update API config to only watch pods and configmaps
* Address review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Don't allow the CLI or Web UI to request named resources if --all-namespaces is used.
This follows kubectl, which also does not allow requesting named resources
over all namespaces.
This PR also updates the Web API's behaviour to be in line with the CLI's.
Both will now default to the default namespace if no namespace is specified.
Problem
`conduit stat` would cause a panic for any resource that wasn't in the list
of StatAllResourceTypes
This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files
Solution
Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes
Also adds a unit test and integration test for `conduit stat ns`
The `kubernetes-nodes-cadvisor` Prometheus queries node-level data via
the Kubernetes API server. In some configurations of Kubernetes, namely
minikube and at least one baremetal kubespray cluster, this API call
requires the `get` verb on the `nodes/proxy` resource.
Enable `get` for `nodes/proxy` for the `conduit-prometheus` service
account.
Fixes#912
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
- It would be nice to display container errors in the UI. This PR gets the pod's container
statuses and returns them in the public api
- Also add a terminationMessagePolicy to conduit's inject so that we can capture the
proxy's error messages if it terminates
* Add readiness/liveness checks for third party components
Any possible issues with the third party control plane components can wedge the services.
Take the best practices for prometheus/grafana and add them to our template. See #1116
* Update test fixtures for new output
Previously, in conduit stat all we would just print the map of stat results, which
resulted in the order in which stats were displayed varying between prints.
Fix:
Define an array, k8s.StatAllResourceTypes and use the order in this array to print
the map; ensuring a consistent print order every time the command is run.
* conduit inject: Add flag to set proxy bind timeout (#863)
* fix test
* fix flag to get it working with #909
* Add time parsing
* Use the variable to set the default value
Signed-off-by: Sacha Froment <sfroment42@gmail.com>
- Update the `response_total` prometheus query of the StatSummary endpoint to also
break queries out by a `meshed` label.
- Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic
starting and ending in the mesh
This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that
starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we
add TLS by default for intra mesh requests).
The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will
show up as 0% secured in the web/CLI.
Running `conduit install --api-port xxx` where xxx != 8086 would yield a
broken install.
Fix the install command to correctly propagate the `api-port` flag,
setting it as the serve address in the proxy-api container.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Remove package-scoped vars in cmd package
* Run gofmt on all cmd package files
* Re-add missing Args setting on check command
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Allow the Stat endpoint in the public-api to accept requests for resourceType "all".
Currently, this queries Pods, Deployments, RCs and Services, but can be modified
to query other resources as well.
Both the CLI and web endpoints now work if you set resourceType to all.
e.g. `conduit stat all`
Grafana provides default dashboards for Prometheus and Grafana health.
The community also provides Kubernetes-specific dashboards. Conduit was
not taking advantage of these.
Introduce new Grafana dashboards focused on Grafana, Kubernetes, and
Prometheus health. Tag all Conduit dashboards for easier UI navigation.
Also fix layout in Conduit Health dashboard.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Modify the Stat endpoint to also return the count of failed pods
* Add comments explaining pod count stats
* Rename total pod count to running pod count
This is to support the service mesh overview page, as I'd like to include an indicator of
failed pods there.
The `conduit tap` command is now deprecated.
Replace `conduit tap` with `connduit tapByResource`. Rename tapByResource
to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now
returns Unimplemented.
Fixes#804
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The TapByResource command now has access to destination labels from the
proxy, but was not outputting them on the cli.
Modify the TapByResource output to print the destination pod label,
rather than the ip, when available.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
public-api and and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.
This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Tap command leveraged new cli parsing code, enabling Kubernetes
resources specified as `(TYPE [NAME] | TYPE/NAME)`. The Stat command
did not use this.
Modify the Stat command to use the same cli flag parsing code as Tap.
Remove the to/from-resource flags from Stat.
Fixes#792
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
There have been a number of performance improvements and bug fixes since
v2.1.0.
Bump our Prometheus container to v2.2.1.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The `stat` command did not support `service` as a resource type.
This change adds `service` support to the `stat` command. Specifically:
- as a destination resource on `--to` commands
- as a target resource on `--from` commands
Fixes#805
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The existing `tap` command is being deprecated.
Introduce a `tapByResource` cli command. It supports tapping a Kubernetes
resource or collection of resources, optionally filtered by outbound resources.
This command will eventually replace `tap`.
Part of #778
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This changes the public api to have a new rpc type, `TapByResource`.
This api supersedes the Tap api. `TapByResource` is richer, more closely
reflecting the proxy's capabilities.
The proxy's Tap api is extended to select over destination labels,
corresponding with those returned by the Destination api.
Now both `Tap` and `TapByResource`'s responses may include destination
labels.
This change avoids breaking backwards compatibility by:
* introducing the new `TapByResource` rpc type, opting not to change Tap
* extending the proxy's Match type with a new, optional, `destination_label` field.
* `TapEvent` is extended with a new, optional, `destination_meta`.
* Expose pod stats in CLI, web UI, and Grafana
* Fix js api helpers test
* Add outbound traffic stats to pod dashboard
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The public-api previously only permitted 4 hard-coded time windows:
10s, 1m, 10m, 1h. This was primarily a relic of the recently removed
telemetry system.
Modify the public-api to validate the time string, but allow for any
window size, which is then passed through to Prometheus.
Fixes#686
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
When prometheus queries the proxy for data, these requests are reported
as inbound traffic to the pod. This leads to misleading stats when a pod
otherwise receives little/no traffic.
In order to prevent these requests being proxied, the metrics port is
now added to the default inbound skip-ports list (as is already case for
the tap server).
Fixes#769
* Add namespace as a resource type in public-api
The cli and public-api only supported deployments as a resource type.
This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes
Part of #627
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Refactor filter and groupby label formulation
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Rename stat_summary.go to stat.go in cli
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Update rbac privileges for namespace stats
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remove the telemetry service
The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.
* Fix time window tests
* Remove deprecated controller scrape config
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The new StatSummary endpoint was only providing request volume and
successs rate information.
Add support for retrieving latency stats via StatSummary. Also make
all prometheus calls in parallel, and implement kubernetes test
fixtures.
Fixes#681
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The StatSummary logic was implemented as a method on http_server.
Move the StatSummary logic into grpc_server, for consistency with the
other endpoints.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* CLI: change conduit namespace shorthand flag to -c
All of the conduit CLI subcommands accept a --conduit-namespace flag,
indicating the namespace where conduit is running. Some of the
subcommands also provide a --namespace flag, indicating the kubernetes
namespace where a user's application code is running. To prevent
confusion, I'm changing the shorthand flag for the conduit namespace to
-c, and using the -n shorthand when referring to user namespaces.
As part of this change I've also standardized the capitalization of all
of our command line flags, removed the -r shorthand for the install
--registry flag, and made the global --kubeconfig and --api-addr flags
apply to all subcommands.
* Switch flag descriptions from lowercase to Capital
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The new statsummary command accepted friendly k8s names, which worked
for k8s queries, but Prometheus requires a specific key.
Modify the statsummary query to map friendly k8s names to canonical k8s
names when constructing the query. Then during the query, map the
canonical k8s name to a specific Prometheus label.
Fixes#695
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Start implementing new conduit stat summary endpoint.
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.
Current implementation just retrieves requests and mesh/total pod count
(so latency stats are always 0).
Uses API defined in #663
Example queries the stat endpoint will eventually satisfy in #627
This branch includes commits from @klingerf
* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
The Destination service used slightly different labels than the
telemetry pipeline expected, specifically, prefixed with `k8s_*`.
Make all Prometheus labels consistent by dropping `k8s_*`. Also rename
`pod_name` to `pod` for consistency with `deployement`, etc. Also update
and reorganize `proxy-metrics.md` to reflect new labelling.
Fixes#655
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Using a vanilla Grafana Docker image as part of `conduit install`
avoided maintaining a conduit-specific Grafana Docker image, but made
packaging dashboard json files cumbersome.
Roll our own Grafana Docker image, that includes conduit-specific
dashboard json files. This significantly decreases the `conduit install`
output size, and enables dashboard integration in the docker-compose
environment.
Fixes#567
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Grafana dashboards were displaying all proxy-enabled pods, including
conduit controller pods. In the old telemetry pipeline filtering these
out required knowledge of the controller's namespace, which the
dashboards are agnostic to.
This change leverages the new `conduit_io_control_plane_component`
prometheus label to filter out proxy-enabled controller components.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Top-line and Deployment Grafana dashboards relied on the
soon-to-be-removed telemetry pipeline metrics.
Update the Grafana dashboards to query for the new, proxy-based metrics.
Grafana dashboard layouts have not changed.
Depends on #635 to render metrics.
Part of #420.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Previously we were using the instance label to uniquely identify a pod.
This meant that getting stats by pod name would require extra queries to
Kubernetes to map pod name to instance.
This change adds a pod_name label to metrics at collection time. This
should not affect cardinality as pod_name is invariant with respect to
instance.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
this is a known issue with grafana in k8s. grafana/grafana:5.0.4 was just released today. update the repo from 5.0.3 to 5.0.4
fixed issues #582
Signed-off-by: Deshi Xiao <xiaods@gmail.com>
The Prometheus scrape config collects from Conduit proxies, and maps
Kubernetes labels to Prometheus labels, appending "k8s_".
This change keeps the resultant Prometheus labels consistent with their
source Kubernetes labels. For example: "deployment" and
"pod_template_hash".
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The inject code detects the object it is being injected into, and writes
self-identifying information into the CONDUIT_PROMETHEUS_LABELS
environment variable, so that conduit-proxy may read this information
and report it to Prometheus at collection time.
This change puts the self-identifying information directly into
Kubernetes labels, which Prometheus already collects, removing the need
for conduit-proxy to be aware of this information. The resulting label
in Prometheus is recorded in the form `k8s_deployment`.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
In addition to dashboards display service health, we need a dashboard to
display health of the Conduit service mesh itself.
This change introduces a conduit-health dashboard. It currently only
displays health metrics for the control plane components. Proxy health
will come later.
Fixes#502
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Update the inject command to set a CONDUIT_PROMETHEUS_LABELS proxy environment variable with the name of the pod spec that the proxy is injected into. This will later be used as a label value when the proxy is exposing metrics.
Fixes: #426
Signed-off-by: Alex Leong <alex@buoyant.io>
The existing telemetry pipeline relies on Prometheus scraping the
Telemetry service, which will soon be removed.
This change configures Prometheus to scrape the conduit proxies directly
for telemetry data, and the control plane components for control-plane
health information. This affects the output of both conduit install
and conduit inject.
Fixes#428, #501
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The existing `conduit dashboard` command supported opening the conduit
dashboard, or displaying the conduit dashboard URL, via a `url` boolean
flag.
Replace the `url` boolean flag with a `show` string flag, with three
modes:
`conduit dashboard --show conduit`: default, open conduit dashboard
`conduit dashboard --show grafana`: open grafana dashboard
`conduit dashboard --show url`: display dashboard URLs
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Successful `conduit check` commands now take into account `[ok]`
and `\n` tokens when constraining line length.
Fixes#554
Signed-off-by: Andy Hume <andyhume@gmail.com>
Existing Grafana configuration contained no dashboards, just a skeleton
for testing.
Introduce two Grafana dashboards:
1) Top Line: Overall health of all Conduit-enabled services
2) Deployment: Health of a specific conduit-enabled deployment
Fixes#500
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Most controller listeners should only bind on localhost
* Use default listening addresses in controller components
* Review feedback
* Revert test_helper change
* Revert use of absolute domains
Signed-off-by: Alex Leong <alex@buoyant.io>
* Temporarily stop trying to support configurable zones in the proxy.
None of the zone configuration is tested and lots of things assume the cluster
zone is `cluster.local`. Further, how exactly the proxy will actually learn the
cluster zone hasn't been decided yet.
Just hard-code the zone as "cluster.local" in the proxy until configurable zones
are fully implemented and tested to be working correctly.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Remove the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting
The way that Kubernetes configures DNS search suffixes has some negative
consequences as some names like "example.com" are ambiguous: depending on
whether there is a service "example" in the "com" namespace, "example.com"
may refer to an external service or an internal service, and this can
fluctuate over time. In recognition of that we added the
CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting, thinking this would
be part of a solution for users to opt out of the unfortunate behavior
if their applications didn't depend on the DNS search suffix feature.
It turns out similar effects can be acheived using a custom dnsConfig,
starting in Kubernetes 1.10 when dnsConfig reaches the beta stability level.
Now any CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN-based seems duplicative.
Further, attempting to support it optionally made the code complex and hard
to read.
Therefore, let's just remove it. If/when somebody actually requests this
functionality then we can add it back, if dnsConfig isn't a valid alternative
for them.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Further hard-code "cluster.local" as the zone, temporarily.
Addresses review feedback.
Signed-off-by: Brian Smith <brian@briansmith.org>
Kubernetes will do multiple DNS lookups for a name like
`proxy-api.conduit.svc.cluster.local` based on the default search settings
in /etc/resolv.conf for each container:
1. proxy-api.conduit.svc.cluster.local.conduit.svc.cluster.local. IN A
2. proxy-api.conduit.svc.cluster.local.svc.cluster.local. IN A
3. proxy-api.conduit.svc.cluster.local.cluster.local. IN A
4. proxy-api.conduit.svc.cluster.local. IN A
We do not need or want this search to be done, so avoid it by making each
name absolute by appending a period so that the first three DNS queries
are skipped for each name.
The case for `localhost` is even worse because we expect that `localhost` will
always resolve to 127.0.0.1 and/or ::1, but this is not guaranteed if the default
search is done:
1. localhost.conduit.svc.cluster.local. IN A
2. localhost.svc.cluster.local. IN A
3. localhost.cluster.local. IN A
4. localhost. IN A
Avoid these unnecessary DNS queries by making each name absolute, so that the
first three DNS queries are skipped for each name.
Signed-off-by: Brian Smith <brian@briansmith.org>
Grafana by default calls out to grafana.com to check for updates. As
user's of Conduit do not have direct control over updating Grafana
directly, this update check is not needed.
Disable Grafana's update check via grafana.ini.
This is also a workaround for #155, root cause of #519.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Add --expected-version flag for conduit check command
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Update build instructions
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
When the conduit proxy is injected into the controller pod, we observe controller pod proxy stats show up as an "outbound" deployment for an unrelated upstream deployment. This may cause confusion when monitoring deployments in the service mesh.
This PR filters out this "misleading" stat in the public api whenever the dashboard requests metric information for a specific deployment.
* exclude telemetry generated by the control plane when requesting deployment metrics
fixes#370
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
In the initial review for this code (preceding the creation of the
runconduit/conduit repository), it was noted that this container is not
actually used, so this is actually dead code.
Further, this container actualy causes a minor problem, as it doesn't
implement any retry logic, thus it will sometimes often cause errors to
be logged. See
https://github.com/runconduit/conduit/issues/496#issuecomment-370105328.
Further, this is a "buoyantio/" branded container. IF we actually need
such a container then it should be a Conduit-branded container.
See https://github.com/runconduit/conduit/issues/478 for additional
context.
Signed-off-by: Brian Smith <brian@briansmith.org>
Kubernetes $KUBECONFIG environment variable is a list of paths to
configuration files, but conduit assumes that it is a single path.
Changes in this commit introduce a straightforward way to discover and
load config file(s).
Complete list of changes:
- Use k8s.io/client-go/tools/clientcmd to deal with kubernetes
configuration file
- rename k8s API and k8s proxy constructors to get rid of redundancy
- remove shell package as it is not needed anymore
Signed-off-by: Igor Zibarev <zibarev.i@gmail.com>
`conduit install` deploys prometheus, but lacks a general-purpose way to
visualize that data.
This change adds a Grafana container to the `conduit install` command. It
includes two sample dashboards, viz and health, in their own respective
source files.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
As part of `conduit check` command, warn the user if they are
running an outdated version of the cli client or the control
plane components.
Fixes#314
Signed-off-by: Andy Hume <andyhume@gmail.com>
The telemetry service in the controller pod uses a non-fully qualified URL to connect to the prometheus pod in the control plane. This PR changes the URL the telemetry's prometheus URL to be fully qualified to be consistent with other URLs in the control plane. This change was tested in minikube. The logs report no errors and looking at the prometheus dashboard shows that stats are being recorded from all conduit proxies.
fixes#414
Signed-off-by: Dennis Adjei-Baah dennis@buoyant.io