## What this changes
This fixes an issue in the Jaeger extension's `jaeger-injector` component that
causes an injection error in situations with high pod or namespace churn.
Because it cannot watch namespaces, it relies only on `get`, which appears
to fall behind at a certain point. This surfaces as an error.
For example, about halfway through the `inject` test it fails with the
error:
```
=== RUN TestInjectAutoPod
inject_test.go:430: failed to create pod/inject-pod-test-terminus in namespace linkerd-inject-pod-test for exit status 1: Error from server: error when creating "STDIN": admission webhook "jaeger-injector.linkerd.io" denied the request: namespace "linkerd-inject-pod-test" not found
--- FAIL: TestInjectAutoPod (0.22s)
FAIL
```
Looking at the `jaeger-injector` logs, most of its messages are about the test
namespaces not being found:
```
..
time="2021-01-15T15:34:12Z" level=info msg="received admission review request b2f36a9c-3f88-4abe-bcaa-f63c61cd24c0"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 9f5b229b-1c60-4b24-a020-b66cd201171e"
time="2021-01-15T15:34:12Z" level=error msg="failed to run webhook handler. Reason: namespace \"linkerd-inj-auto-params-test\" not found"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request ae00d63a-1585-46ba-9a75-1f93d40766a8"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 998721eb-5625-4be8-9166-9db834c58f10"
time="2021-01-15T15:34:12Z" level=error msg="failed to run webhook handler. Reason: namespace \"linkerd-inj-auto-params-test\" not found"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 52e4e603-89b1-492b-a69b-dc8ff67d5f26"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 27558a16-5120-4aeb-a0bd-f22a1666b2b1"
time="2021-01-15T15:34:12Z" level=error msg="failed to run webhook handler. Reason: namespace \"linkerd-inj-auto-params-test\" not found"
..
```
Adding the `watch` verb to its cluster role fixes this and these errors no
longer occur.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Currently, `linkerd jaeger check` runs multiple checks, but none of them confirms that the jaeger injector is running.
This commit adds a check that confirms the jaeger injector pod is in the running state.
Fixes #5495
Signed-off-by: Yashvardhan Kukreja <yash.kukreja.98@gmail.com>
The Destination controller can panic due to a nil-deref when
the EndpointSlices API is enabled.
This change updates the controller to properly initialize values
to avoid this segmentation fault.
Fixes #5521
Signed-off-by: Oleg Ozimok <oleg.ozimok@corp.kismia.com>
* viz: add check sub-command
This adds a new `viz check` cmd performing checks for the resources
in linkerd-viz extension. Checks include resource checks and
the health of resources, certs, etc
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Detect default ns for metrics and profile subcommands
Followup to #5485, fixes remaining cases for #5524
Properly detect the default namespace given `kubeConfigPath` and
`kubeContext` for the `metrics`, `identity`, `routes` and `profile` subcommands.
Also gets rid once and for all of the `defaultNamespace` global var.
## edge-21.1.2
This edge release continues the work on decoupling non-core Linkerd components.
Commands that use the viz-extension, i.e. `dashboard`, `edges`, `routes`,
`stat`, `tap` and `top`, are moved to the `viz` sub-command. These commands are still
available under root but are marked deprecated and will be removed in a
later stable release.
This release also upgrades the proxy's dependencies to the
Tokio v1 ecosystem.
* Moved sub-commands that use the viz-extension under `viz`
* Started ignoring pods with status.phase=Succeeded when watching IP addresses
in destination. This is useful for re-use of IPs of terminated pods
* Added support for the bring-your-own-Jaeger use-case by adding
`collector.jaegerAddr` in the jaeger extension.
* Fixed an issue with the generation of working manifests in the
`podAntiAffinity` use-case
* Added support for the modification of proxy resources in the viz
extension through `values.yaml` in Helm and flags in CLI.
* Improved error reporting for port-forward logic with namespace
and pod data, used across dashboard, checks, etc
(thanks @piyushsingariya)
* Added support to disable the rendering of `linkerd-viz` namespace
resource in the viz extension (thanks @nlamirault)
* Made service-profile generation work offline with `--ignore-cluster`
flag (thanks @piyushsingariya)
* Proxy's Tap API is disabled by default and it is enabled only when
`LINKERD2_PROXY_TAP_SVC_NAME` configuration is set. This means that
`LINKERD2_PROXY_TAP_DISABLED` is no longer honored
* Upgraded the proxy's dependencies to Tokio v1 ecosystem
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* viz: make viz cmds available at root
Fixes #5523
This branch makes viz commands that were previously available
under root available in both places, i.e. `linkerd` and
`linkerd viz`.
We also show a deprecation notice when they are run under root, asking
users to run them with the `viz` prefix.
This also updates all the help messages to address these cmds
as `linkerd viz xyz` instead of `linkerd xyz`
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
While he is still overwhelmingly excited about the project, @klingerf
isn't participating in the day-to-day tasks outlined in the updated
GOVERNANCE.md, and therefore requests to be moved to emeritus status.
Signed-off-by: Kevin Ingelman <ki@buoyant.io>
The governance structure documented in `GOVERNANCE.md` is no longer
suitable for the project and doesn't reflect the reality of how changes
are made.
This change proposes an updated, simplified governance structure that
clearly outlines the expectations for maintainers around project
participation and decision making. It is expected that *most*
contributions will not come from maintainers; but we need a core group
of maintainers that are ultimately responsible for technical stewardship
of the project.
* Separate observability API
Closes #5312
This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.
- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.prot`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto` except for the `Version` Stuff.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.
The other changes in the go files are just changes in the imports to point to the new protobufs.
Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`
Users may have an existing Jaeger deployment and want to send traces to it from Linkerd.
We add the `collector.jaegerAddr` value to the Linkerd-Jaeger chart. It configures the address of the jaeger backend that the opencensus collector sends traces to. If left unspecified, the collector will use the jaeger instance in the linkerd-jaeger extension.
To test:
Install Jaeger backend separately:
```
curl https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/examples/simplest.yaml | docker run -i --rm jaegertracing/jaeger-operator:master generate | kubectl apply -n jaeger-test -f -
```
Install Linkerd and Linkerd-jaeger, specifying the existing jaeger backend
```
linkerd install | kubectl apply -f -
linkerd jaeger install --set collector.jaegerAddr='http://my-jaeger-collector.jaeger-test:14268/api/traces' | kubectl apply -f -
```
Install emojivoto and configure it:
```
linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -
kubectl -n emojivoto set env --all deploy OC_AGENT_HOST=collector.linkerd-jaeger:55678
```
View traces in your custom jaeger backend:
```
kubectl -n jaeger-test port-forward svc/my-jaeger-query 16686 &
open http://localhost:16686
```
Signed-off-by: Alex Leong <alex@buoyant.io>
* viz: move sub-cmds using viz extension under viz cmd
Fixes #5327, #5524
This branch moves the following commands under the `linkerd viz`
cmd, as they use the viz extension to perform the job.
- dashboard
- edges
- routes
- stat
- tap
- top
This also creates a new pkg `public-api` which facilitates
interaction and communication with public-api to be used
across extensions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
The `linkerd uninstall` command removes many of the test resources used in CI, but it leaves the test namespaces behind.
Still, the test-cleanup script can be slimmed down considerably by getting rid of the `populate_array` function.
Hence, this commit adds a one-liner, alongside `linkerd uninstall`, that deletes all the test namespaces and their resources instead of relying on the large `populate_array` function.
Fixes #5497
Signed-off-by: Yashvardhan Kukreja <yash.kukreja.98@gmail.com>
Ignore pods with status.phase=Succeeded when watching IP addresses
When a pod terminates successfully, some CNIs will assign its IP address
to newly created pods. This can lead to duplicate pod IPs in the same
Kubernetes cluster.
Filter out pods which are in a Succeeded phase since they are not
routable anymore.
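
The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the Destination controller's actual code: the `Pod` struct, the `PodPhase` type and the `podsByIP` helper are hypothetical stand-ins for the real Kubernetes types and watcher logic.

```go
package main

import "fmt"

// PodPhase mirrors the Kubernetes pod phase strings. It is a local stand-in
// used for illustration, not the client-go type.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
)

// Pod is a hypothetical, trimmed-down pod record.
type Pod struct {
	Name  string
	IP    string
	Phase PodPhase
}

// podsByIP indexes pods by IP address and skips pods in the Succeeded phase,
// so that a reused IP resolves only to pods that are still routable.
func podsByIP(pods []Pod) map[string][]Pod {
	index := make(map[string][]Pod)
	for _, p := range pods {
		if p.Phase == PodSucceeded {
			continue
		}
		index[p.IP] = append(index[p.IP], p)
	}
	return index
}

func main() {
	// A completed job pod and a running pod that was handed the same IP.
	pods := []Pod{
		{Name: "job-1", IP: "10.0.0.5", Phase: PodSucceeded},
		{Name: "web-1", IP: "10.0.0.5", Phase: PodRunning},
	}
	for ip, owners := range podsByIP(pods) {
		fmt.Println(ip, len(owners), owners[0].Name)
	}
}
```

Without the phase check, both pods would be indexed under the same IP and a lookup would be ambiguous.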
Fixes #5394
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
The last viz refactoring removed support for modifying the k8s resources
used by the proxies injected into the control plane components (values
like `tapProxyResources`, `prometheus.proxy.resources`, etc).
This adds them back, using a consistent naming: `tap.proxy.resources`,
`dashboard.proxy.resources`, etc.
Also fixes the tap helm template that was making reference to
`.Values.tapResources` instead of `.Values.tap.resources`.
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* viz: add render golden tests
This branch adds golden tests for the viz install. This would be
useful to track changes in render as more changes are added.
This also moves the common code that is used across extensions
to generate diffs into `testutil` to be able to be used widely.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Subject
Related to issue #5457
Problem
Linkerd only reports the local port and the remote port whenever port-forwarding fails.
Linkerd could print the namespace and pod when port-forwarding fails, instead of reporting only the error state and forcing users to correlate the ports with a pod themselves.
Solution
Linkerd needs to print the namespace and the pod name.
- [x] Add two new string variables namespace and podName in `struct PortForward`
- [x] assign the values to the variables when a new Instance is being created in `func NewPortForward()`
run() function propagates the errors that occurred while port-forwarding
- [x] Format the error being returned by `ForwardPorts()` from client-go using `fmt.Errorf()` and add `namespace` and `podName` as suffix and return error
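
The checklist above might look roughly like this in Go. This is a simplified sketch under stated assumptions: the `forward` field is a hypothetical stand-in for the client-go `ForwardPorts()` call, `errListen` is an invented example error, and the exact message format used in Linkerd may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// errListen is an invented error used to simulate a failing port-forward.
var errListen = errors.New("unable to listen on requested ports")

// PortForward is a reduced, hypothetical version of the struct described
// above. Only the fields relevant to error reporting are shown.
type PortForward struct {
	namespace string
	podName   string
	// forward stands in for the client-go ForwardPorts() call.
	forward func() error
}

// NewPortForward assigns the namespace and pod name when the instance is
// created, so they are available later for error messages.
func NewPortForward(namespace, podName string, forward func() error) *PortForward {
	return &PortForward{namespace: namespace, podName: podName, forward: forward}
}

// run propagates any port-forwarding error, suffixed with the namespace and
// pod name. %w keeps the original error available to errors.Is and errors.As.
func (pf *PortForward) run() error {
	if err := pf.forward(); err != nil {
		return fmt.Errorf("%w for %s/%s", err, pf.namespace, pf.podName)
	}
	return nil
}

func main() {
	// The namespace and pod name below are illustrative values.
	pf := NewPortForward("linkerd", "linkerd-web-abc123", func() error { return errListen })
	fmt.Println(pf.run())
}
```

Wrapping with `%w` rather than `%v` is a design choice in this sketch: callers can still match the underlying error while the message carries the extra context.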
The error is being returned by ForwardPorts() from client-go https://github.com/kubernetes/client-go/blob/master/tools/portforward/portforward.go#L188
Fixes #5457
Signed-off-by: Piyush Singariya <piyushsingariya@gmail.com>
This enables the `helm-upgrade` and `upgrade-stable` integration tests,
that were disabled because the previous versions didn't have ARM
support, but now 2.9 does.
If the namespace is managed by an external tool, the install fails.
Add an option so that Helm does not manage the namespace.
Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com>
Closes #5401
* offline profile generation with --ignore-cluster
* validation added for ignoreCluster and service profile with tap data
Signed-off-by: Piyush Singariya <piyushsingariya@gmail.com>
## What this fixes
When clusters are cleaned up after tests in CI, the `bin/test-cleanup` script is
responsible for clearing the cluster of all testing resources.
Right now this does not work as expected because the script uses the `linkerd`
binary instead of the Linkerd path that is passed in to the `tests` script.
There are cases where different binaries have different uninstall behavior and
the script can complete with an incomplete uninstallation.
## How it fixes
`test-cleanup` now takes a linkerd path argument. This is used to specify the
Linkerd binary that should be used when running in the `uninstall` commands.
This value is passed through from the `tests` invocation which means that in CI,
the same binary is used for running tests as well as cleaning up the cluster.
Additionally, specifying the k8s context has now moved from an argument to the
`--context` flag. This is similar to how `tests` script works because it's not
always required.
## How to use
Shown here:
```
$ bin/test-cleanup -h
Cleanup Linkerd integration tests.

Usage:
    test-cleanup [--context k8s_context] /path/to/linkerd

Examples:
    # Cleanup tests in non-default context
    test-cleanup --context k8s_context /path/to/linkerd

Available Commands:
    --context: use a non-default k8s context
```
## edge-21.1.1
This edge release introduces a new "opaque transport" feature that allows the
proxy to securely transport server-speaks-first and otherwise opaque TCP
traffic. Using the `config.linkerd.io/opaque-ports` annotation on pods and
namespaces, users can configure ports that should skip the proxy's protocol
detection.
Additionally, a new `linkerd-viz` extension has been introduced that separates
the installation of the Grafana, Prometheus, web, and tap components. This
extension closely follows the Jaeger and multicluster extensions; users can
`install` and `uninstall` with the `linkerd viz ..` command as well as configure
for HA with the `--ha` flag.
The `linkerd viz install` command does not have any cli flags to customize the
install directly, but instead follows the Helm way of customization by using
flags such as `set`, `set-string`, `values`, `set-files`.
Finally, a new `/shutdown` admin endpoint that may only be accessed over the
loopback network has been added. This allows batch jobs to gracefully terminate
the proxy on completion. The `linkerd-await` utility can be used to automate
this.
* Added a new `linkerd multicluster check` command to validate that the
`linkerd-multicluster` extension is working correctly
* Fixed description in the `linkerd edges` command (thanks @jsoref!)
* Moved the Grafana, Prometheus, web, and tap components into a new Viz chart,
following the same extension model that multicluster and Jaeger follow
* Introduced a new "opaque transport" feature that allows the proxy to securely
transport server-speaks-first and otherwise opaque TCP traffic
* Removed the check comparing the `ca.crt` field in the identity issuer secret
and the trust anchors in the Linkerd config; these values being different is
not a failure case for the `linkerd check` command (thanks @cypherfox!)
* Removed the Prometheus check from the `linkerd check` command since it now
depends on a component that is installed with the Viz extension
* Fixed error messages thrown by the cert checks in `linkerd check` (thanks
@pradeepnnv!)
* Added PodDisruptionBudgets to the control plane components so that they cannot
be all terminated at the same time during disruptions (thanks @tustvold!)
* Fixed an issue that displayed the wrong `linkerd.io/proxy-version` when it is
overridden by annotations (thanks @mateiidavid!)
* Added support for custom registries in the `linkerd-viz` helm chart (thanks
@jimil749!)
* Renamed `proxy-mutator` to `jaeger-injector` in the `linkerd-jaeger` extension
* Added a new `/shutdown` admin endpoint that may only be accessed over the
loopback network allowing batch jobs to gracefully terminate the proxy on
completion
* Introduced the `linkerd identity` command, used to fetch the TLS certificates
for injected pods (thanks @jimil749)
* Fixed an issue with the CNI plugin where it was incorrectly terminating and
emitting error events (thanks @mhulscher!)
* Re-added support for non-LoadBalancer service types in the
`linkerd-multicluster` extension
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Use the `uninstall` command for the viz and jaeger extensions to ensure clusters
are cleaned up properly after tests
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
## edge-20.12.4
This edge release adds support for the `config.linkerd.io/opaque-ports`
annotation on pods and namespaces, to configure ports that should skip the
proxy's protocol detection. In addition, it adds new CLI commands related to the
`linkerd-jaeger` extension, fixes bugs in the CLI `install` and `upgrade`
commands and Helm charts, and fixes a potential false positive in the proxy's
HTTP protocol detection. Finally, it includes improvements in proxy performance
and memory usage, including an upgrade for the proxy's dependency on the Tokio
async runtime.
* Added support for the `config.linkerd.io/opaque-ports` annotation on pods and
namespaces, to indicate to the proxy that some ports should skip protocol
detection
* Fixed an issue where `linkerd install --ha` failed to honor flags
* Fixed an issue where `linkerd upgrade --ha` can override existing configs
* Added missing label to the `linkerd-config-overrides` secret to avoid breaking
upgrades performed with the help of `kubectl apply --prune`
* Added a missing icon to Jaeger Helm chart
* Added new `linkerd jaeger check` CLI command to validate that the
`linkerd-jaeger` extension is working correctly
* Added new `linkerd jaeger uninstall` CLI command to print the `linkerd-jaeger`
extension's resources so that they can be piped into `kubectl delete`
* Fixed an issue where the `linkerd-cni` daemonset may not be installed on all
intended nodes, due to missing tolerations in the `linkerd-cni` Helm chart
(thanks @rish-onesignal!)
* Fixed an issue where the `tap` APIServer would not refresh its certs
automatically when provided externally—like through cert-manager
* Changed the proxy's cache eviction strategy to reduce memory consumption,
especially for busy HTTP/1.1 clients
* Fixed an issue in the proxy's HTTP protocol detection which could cause false
positives for non-HTTP traffic
* Increased the proxy's default dispatch timeout to 5 seconds to accommodate
connection pools which might open connections without immediately making a
request
* Updated the proxy's Tokio dependency to v0.3
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This branch adds links to the configurable fields list for
each extension's install cmd.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This PR adds `--ha` flag for `viz install` which overrides with
the `values-ha.yaml` of the viz chart. This PR adds these functions
in `pkg/charts` so that the same can be re-used elsewhere.
## Testing
```bash
tarun in dev in on k3d-deep () linkerd2 on tarun/viz-ha-nits [$?] via 🐹 v1.15.4 took 2s
❯ ./bin/go-run cli viz install | grep 1024
tarun in dev in on k3d-deep () linkerd2 on tarun/viz-ha-nits [$?] via 🐹 v1.15.4 took 2s
❯ ./bin/go-run cli viz install --ha | grep 1024
memory: "1024Mi"
tarun in dev in on k3d-deep () linkerd2 on tarun/viz-ha-nits [$?] via 🐹 v1.15.4 took 2s
❯ ./bin/go-run cli viz install --ha --set grafana.resources.memory.limit=1023Mi | grep 1023
memory: "1023Mi"
```
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
As #5307 & #5293 went in during the same time-frame, some of the logic
added in #5307 got lost during the merge. (Oops, sorry!)
The same logic has been added back. The MC refactor PR #5293 moved
all the logic from `multicluster.go` into cmd specific files
whose changes added in #5307 were lost, while the changes added
in `multicluster/values.go` and template files still remained.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* viz: add a retry check for core control-plane pods before install
This commit adds a new check so that `viz install` waits till
the control-plane pods are up. For this to work, the `prometheus`
sub-system check in control-plane self-check has been removed,
as we re-use healthchecks to perform this.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* viz: add a new uninstall command
This adds a new `linkerd viz uninstall` command emitting the resources
with the `linkerd.io/extension=linkerd-viz` label set.
The container-image `ghcr.io/linkerd/cni-plugin:stable-2.9.1` does not contain the `kill` command as an executable. Instead, it is available as a shell built-in. In its current state, Kubernetes emits error events whenever linkerd2-cni pods are terminated because the `kill` command can not be found.
Signed-off-by: Mitch Hulscher <mitch.hulscher@lib.io>
Chart dependencies are added as tarballs under the chart's `chart`
subdirectory. When we move chart dependencies around this can leave
stale dependencies behind, ensuing havoc. This PR removes those deps
before calling `helm dep up`.
This test was never broken. My best guess is that CI was not merging with the
latest `main` as we have recently noticed, so this was an issue that was fixed
by #5458
Closes #5478
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Currently, each new instance of the `Checker` type has to manually
set all the fields with `NewChecker()`, even though most
use-cases are fine with the defaults.
This branch makes this simpler by using the Builder pattern, so
that the users of `Checker` can override the defaults by using
specific field methods when needed. Thus simplifying the code.
This also removes some of the methods that were specific to tests,
and replaces them with the currently used ones.
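
As a rough illustration of the pattern described above, a builder-style `Checker` could look like the sketch below. The field names, defaults and `With...` methods are hypothetical and chosen only for this example. They are not the actual `Checker` API.

```go
package main

import (
	"fmt"
	"time"
)

// defaultRetryWindow is an illustrative default, not a value taken from Linkerd.
const defaultRetryWindow = 5 * time.Minute

// Checker is a hypothetical stand-in for the real type.
type Checker struct {
	namespace   string
	retryWindow time.Duration
	wait        bool
}

// NewChecker returns a Checker populated with defaults, so callers that are
// happy with them do not have to set every field by hand.
func NewChecker() *Checker {
	return &Checker{namespace: "linkerd", retryWindow: defaultRetryWindow}
}

// WithNamespace overrides the default namespace and returns the Checker so
// that calls can be chained.
func (c *Checker) WithNamespace(ns string) *Checker {
	c.namespace = ns
	return c
}

// WithWait overrides the default wait behavior.
func (c *Checker) WithWait(wait bool) *Checker {
	c.wait = wait
	return c
}

func main() {
	// Override only what differs from the defaults.
	c := NewChecker().WithNamespace("linkerd-viz").WithWait(true)
	fmt.Printf("%+v\n", *c)
}
```

Callers that need no overrides simply use `NewChecker()` as-is, which is what keeps the common case short.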
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
CLI: Introduced `identity` command to fetch tls-certificates for a pod (#4459)
Modified and added a new cli command, which initiates a sni-tls session to the proxy's admin port and returns the certificate.
Usage:
- `linkerd identity pod/<pod-name>` : fetches certificate from the specified pod
- `linkerd identity -l app=svc/emoji` : fetches certificate from all pods with label app=svc/emoji
Signed-off-by: Jimil Desai <jimildesai42@gmail.com>
Proxy logs are disabled in tests. This makes it difficult to inspect
proxies after failed tests. This change re-enables the default proxy
logs in tests.
With this new way of chart rendering i.e using helm pkg directly
instead of using our own struct, we no longer need the `Values`
struct to be present, as all the rendering happens through
`map[string]interface{}`
This might be useful in the future when we do validation of values, which
can also be done directly without this, unless we don't want to deal
with conversions
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The name `proxy-mutator` is too generic. In particular, several different linkerd extensions will have mutating webhooks which mutate the proxy sidecar, the MutatingWebhookConfiguration resource is cluster scoped, and each one needs a unique name.
We use the `jaeger-injector` name instead. This gives us a pattern to follow for future webhooks as well (e.g. `tap-injector` etc.)
Signed-off-by: Alex Leong <alex@buoyant.io>
While using k3d v3.0.2 using 3 nodes and installing linkerd in HA I've
seen errors like
```
Error from server: error when creating "STDIN": rpc error: code = Unknown desc = database is locked
```
This doesn't happen on v3.4.0.
However, v3.4.0 brings k8s v1.19 by default, which produces some
warnings in `linkerd check` like:
```
W0106 11:09:39.204081 292603 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
```
That only affects stderr so the tests still pass, but needs to be
addressed in a followup.
The `get-pod` and `port-forward` functions continue to assume
deployments like grafana still live under the `linkerd` namespace.
This expands the definition of those functions to be able to specify the
namespace.
These changes can be solely tested by running `bin/web dev` (follow the
instructions in `BUILD.md` for the preliminaries needed).
Split the image `name` field in `viz/charts/linkerd-viz/values.yaml` into `name` and `registry` to support custom registries. Changed the template files accordingly.
Just like other values, the registry can now be configured via CLI via the `--set-*` flags.
Fixes #5430
Signed-off-by: Jimil Desai <jimildesai42@gmail.com>