The governance structure documented in `GOVERNANCE.md` is no longer
suitable for the project and doesn't reflect the reality of how changes
are made.
This change proposes an updated, simplified governance structure that
clearly outlines the expectations for maintainers around project
participation and decision making. It is expected that *most*
contributions will not come from maintainers, but we need a core group
of maintainers that are ultimately responsible for technical stewardship
of the project.
* Separate observability API
Closes #5312
This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.
- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.proto`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto`, except for the `Version`-related definitions.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.
The other changes in the go files are just changes in the imports to point to the new protobufs.
Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`
Users may have an existing Jaeger deployment and want to send traces to it from Linkerd.
We add the `collector.jaegerAddr` value to the Linkerd-Jaeger chart, which configures the address of the Jaeger backend to which the OpenCensus collector sends traces. If left unspecified, the collector will use the Jaeger instance in the linkerd-jaeger extension.
To test:
Install Jaeger backend separately:
```
curl https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/examples/simplest.yaml | docker run -i --rm jaegertracing/jaeger-operator:master generate | kubectl apply -n jaeger-test -f -
```
Install Linkerd and Linkerd-jaeger, specifying the existing jaeger backend
```
linkerd install | kubectl apply -f -
linkerd jaeger install --set collector.jaegerAddr='http://my-jaeger-collector.jaeger-test:14268/api/traces' | kubectl apply -f -
```
Install emojivoto and configure it:
```
linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -
kubectl -n emojivoto set env --all deploy OC_AGENT_HOST=collector.linkerd-jaeger:55678
```
View traces in your custom jaeger backend:
```
kubectl -n jaeger-test port-forward svc/my-jaeger-query 16686 &
open http://localhost:16686
```
Signed-off-by: Alex Leong <alex@buoyant.io>
* viz: move sub-cmds using viz extension under viz cmd
Fixes #5327, #5524
This branch moves the following commands, under the `linkerd viz`
cmd as they use the viz extension to perform the job.
- dashboard
- edges
- routes
- stat
- tap
- top
This also creates a new pkg `public-api` which facilitates
interaction and communication with the public-api, to be used
across extensions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
The `linkerd uninstall` command is able to remove most of the test resources used in CI, but it still leaves the test namespaces behind.
The `test-cleanup` script can also be simplified considerably by getting rid of the `populate_array` function.
Hence, this commit adds a one-liner, alongside `linkerd uninstall`, that deletes all the test namespaces and their resources, replacing the large `populate_array` function.
Fixes #5497
Signed-off-by: Yashvardhan Kukreja <yash.kukreja.98@gmail.com>
Ignore pods with status.phase=Succeeded when watching IP addresses
When a pod terminates successfully, some CNIs will assign its IP address
to newly created pods. This can lead to duplicate pod IPs in the same
Kubernetes cluster.
Filter out pods which are in a Succeeded phase since they are not
routable anymore.
Fixes #5394
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
The last viz refactoring removed support for modifying the k8s resources
used by the proxies injected into the control plane components (values
like `tapProxyResources`, `prometheus.proxy.resources`, etc).
This adds them back, using a consistent naming: `tap.proxy.resources`,
`dashboard.proxy.resources`, etc.
Also fixes the tap helm template that was making reference to
`.Values.tapResources` instead of `.Values.tap.resources`.
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* viz: add render golden tests
This branch adds golden tests for the viz install. These will be
useful for tracking changes in the rendered output as more changes are added.
This also moves the common code used across extensions
to generate diffs into `testutil`, so it can be used widely.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Subject
Related to issue #5457
Problem
Linkerd only reports the local port and the remote port whenever port-forwarding fails.
Linkerd could print the namespace and pod name when port-forwarding fails, instead of only reporting the ports and forcing users to work out which pod they belong to.
Solution
Linkerd needs to print the namespace and the pod name.
- [x] Add two new string variables namespace and podName in `struct PortForward`
- [x] Assign the values to the variables when a new instance is created in `func NewPortForward()`
The `run()` function propagates the errors that occur while port-forwarding.
- [x] Format the error being returned by `ForwardPorts()` from client-go using `fmt.Errorf()` and add `namespace` and `podName` as suffix and return error
The error is returned by `ForwardPorts()` from client-go: https://github.com/kubernetes/client-go/blob/master/tools/portforward/portforward.go#L188
Fixes #5457
Signed-off-by: Piyush Singariya <piyushsingariya@gmail.com>
This enables the `helm-upgrade` and `upgrade-stable` integration tests,
which were disabled because the previous versions didn't have ARM
support; 2.9 now does.
If the namespace is managed by an external tool, installation fails.
This adds a feature to let Helm skip managing the namespace.
Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com>
Closes #5401
* offline profile generation with --ignore-cluster
* validation added for ignoreCluster and service profile with tap data
Signed-off-by: Piyush Singariya <piyushsingariya@gmail.com>
## What this fixes
When clusters are cleaned up after tests in CI, the `bin/test-cleanup` script is
responsible for clearing the cluster of all testing resources.
Right now this does not work as expected because the script uses the `linkerd`
binary instead of the Linkerd path that is passed in to the `tests` script.
There are cases where different binaries have different uninstall behavior and
the script can complete with an incomplete uninstallation.
## How it fixes
`test-cleanup` now takes a Linkerd path argument. This is used to specify the
Linkerd binary that should be used when running the `uninstall` commands.
This value is passed through from the `tests` invocation which means that in CI,
the same binary is used for running tests as well as cleaning up the cluster.
Additionally, specifying the k8s context has moved from a positional argument to the
`--context` flag. This is similar to how the `tests` script works, because the context is
not always required.
## How to use
Shown here:
```
$ bin/test-cleanup -h
Cleanup Linkerd integration tests.

Usage:
    test-cleanup [--context k8s_context] /path/to/linkerd

Examples:
    # Cleanup tests in non-default context
    test-cleanup --context k8s_context /path/to/linkerd

Available Commands:
    --context: use a non-default k8s context
```
## edge-21.1.1
This edge release introduces a new "opaque transport" feature that allows the
proxy to securely transport server-speaks-first and otherwise opaque TCP
traffic. Using the `config.linkerd.io/opaque-ports` annotation on pods and
namespaces, users can configure ports that should skip the proxy's protocol
detection.
Additionally, a new `linkerd-viz` extension has been introduced that separates
the installation of the Grafana, Prometheus, web, and tap components. This
extension closely follows the Jaeger and multicluster extensions; users can
`install` and `uninstall` with the `linkerd viz ..` command as well as configure
for HA with the `--ha` flag.
The `linkerd viz install` command does not have any cli flags to customize the
install directly, but instead follows the Helm way of customization by using
flags such as `set`, `set-string`, `values`, `set-files`.
Finally, a new `/shutdown` admin endpoint that may only be accessed over the
loopback network has been added. This allows batch jobs to gracefully terminate
the proxy on completion. The `linkerd-await` utility can be used to automate
this.
* Added a new `linkerd multicluster check` command to validate that the
`linkerd-multicluster` extension is working correctly
* Fixed description in the `linkerd edges` command (thanks @jsoref!)
* Moved the Grafana, Prometheus, web, and tap components into a new Viz chart,
following the same extension model that multicluster and Jaeger follow
* Introduced a new "opaque transport" feature that allows the proxy to securely
transport server-speaks-first and otherwise opaque TCP traffic
* Removed the check comparing the `ca.crt` field in the identity issuer secret
and the trust anchors in the Linkerd config; these values being different is
not a failure case for the `linkerd check` command (thanks @cypherfox!)
* Removed the Prometheus check from the `linkerd check` command since it now
depends on a component that is installed with the Viz extension
* Fixed error messages thrown by the cert checks in `linkerd check` (thanks
@pradeepnnv!)
* Added PodDisruptionBudgets to the control plane components so that they cannot
be all terminated at the same time during disruptions (thanks @tustvold!)
* Fixed an issue that displayed the wrong `linkerd.io/proxy-version` when it is
overridden by annotations (thanks @mateiidavid!)
* Added support for custom registries in the `linkerd-viz` helm chart (thanks
@jimil749!)
* Renamed `proxy-mutator` to `jaeger-injector` in the `linkerd-jaeger` extension
* Added a new `/shutdown` admin endpoint that may only be accessed over the
loopback network allowing batch jobs to gracefully terminate the proxy on
completion
* Introduced the `linkerd identity` command, used to fetch the TLS certificates
for injected pods (thanks @jimil749)
* Fixed an issue with the CNI plugin where it was incorrectly terminating and
emitting error events (thanks @mhulscher!)
* Re-added support for non-LoadBalancer service types in the
`linkerd-multicluster` extension
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Use the `uninstall` command for the viz and jaeger extensions to ensure clusters
are cleaned up properly after tests
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
## edge-20.12.4
This edge release adds support for the `config.linkerd.io/opaque-ports`
annotation on pods and namespaces, to configure ports that should skip the
proxy's protocol detection. In addition, it adds new CLI commands related to the
`linkerd-jaeger` extension, fixes bugs in the CLI `install` and `upgrade`
commands and Helm charts, and fixes a potential false positive in the proxy's
HTTP protocol detection. Finally, it includes improvements in proxy performance
and memory usage, including an upgrade for the proxy's dependency on the Tokio
async runtime.
* Added support for the `config.linkerd.io/opaque-ports` annotation on pods and
namespaces, to indicate to the proxy that some ports should skip protocol
detection
* Fixed an issue where `linkerd install --ha` failed to honor flags
* Fixed an issue where `linkerd upgrade --ha` can override existing configs
* Added missing label to the `linkerd-config-overrides` secret to avoid breaking
upgrades performed with the help of `kubectl apply --prune`
* Added a missing icon to Jaeger Helm chart
* Added new `linkerd jaeger check` CLI command to validate that the
`linkerd-jaeger` extension is working correctly
* Added new `linkerd jaeger uninstall` CLI command to print the `linkerd-jaeger`
extension's resources so that they can be piped into `kubectl delete`
* Fixed an issue where the `linkerd-cni` daemonset may not be installed on all
  intended nodes, due to missing tolerations in the `linkerd-cni` Helm chart
(thanks @rish-onesignal!)
* Fixed an issue where the `tap` APIServer would not refresh its certs
automatically when provided externally—like through cert-manager
* Changed the proxy's cache eviction strategy to reduce memory consumption,
especially for busy HTTP/1.1 clients
* Fixed an issue in the proxy's HTTP protocol detection which could cause false
positives for non-HTTP traffic
* Increased the proxy's default dispatch timeout to 5 seconds to accommodate
  connection pools which might open connections without immediately making a
  request
* Updated the proxy's Tokio dependency to v0.3
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This branch adds links to the configurable fields list for
each extension's install cmd.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This PR adds `--ha` flag for `viz install` which overrides with
the `values-ha.yaml` of the viz chart. This PR adds these functions
in `pkg/charts` so that the same can be re-used elsewhere.
## Testing
```bash
❯ ./bin/go-run cli viz install | grep 1024

❯ ./bin/go-run cli viz install --ha | grep 1024
  memory: "1024Mi"

❯ ./bin/go-run cli viz install --ha --set grafana.resources.memory.limit=1023Mi | grep 1023
  memory: "1023Mi"
```
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
As #5307 & #5293 went in during the same time-frame, some of the logic
added in #5307 was lost during the merge.
The same logic has been added back. The MC refactor PR #5293 moved
all the logic from `multicluster.go` into cmd-specific files;
the changes from #5307 to that logic were lost, while the changes
in `multicluster/values.go` and the template files remained.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* viz: add a retry check for core control-plane pods before install
This commit adds a new check so that `viz install` waits until
the control-plane pods are up. For this to work, the `prometheus`
sub-system check has been removed from the control-plane self-check,
as we re-use healthchecks to perform this.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* viz: add a new uninstall command
This adds a new `linkerd viz uninstall` command that emits the resources
with the `linkerd.io/extension=linkerd-viz` label set.
The container-image `ghcr.io/linkerd/cni-plugin:stable-2.9.1` does not contain the `kill` command as an executable. Instead, it is available only as a shell built-in. In its current state, Kubernetes emits error events whenever linkerd2-cni pods are terminated because the `kill` command cannot be found.
Signed-off-by: Mitch Hulscher <mitch.hulscher@lib.io>
Chart dependencies are added as tarballs under the chart's `charts`
subdirectory. When we move chart dependencies around, this can leave
stale dependencies behind, wreaking havoc. This PR removes those deps
before calling `helm dep up`.
This test was never broken. My best guess is that CI was not merging with the
latest `main`, as we have recently noticed, so this was an issue that was fixed
by #5458.
Closes #5478
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Currently, each new instance of the `Checker` type has to manually
set all the fields with `NewChecker()`, even though most
use-cases are fine with the defaults.
This branch makes this simpler by using the Builder pattern, so
that the users of `Checker` can override the defaults by using
specific field methods when needed. Thus simplifying the code.
This also removes some of the methods that were specific to tests,
and replaces them with the currently used ones.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
CLI: Introduced `identity` command to fetch tls-certificates for a pod (#4459)
Modified and added a new cli command, which initiates a sni-tls session to the proxy's admin port and returns the certificate.
Usage:
- `linkerd identity pod/<pod-name>` : fetches certificate from the specified pod
- `linkerd identity -l app=svc/emoji` : fetches certificate from all pods with label app=svc/emoji
Signed-off-by: Jimil Desai <jimildesai42@gmail.com>
Proxy logs are disabled in tests. This makes it difficult to inspect
proxies after failed tests. This change re-enables the default proxy
logs in tests.
With this new way of chart rendering, i.e. using the Helm pkg directly
instead of our own struct, we no longer need the `Values`
struct to be present, as all the rendering happens through
`map[string]interface{}`.
The struct might be useful in the future when we validate values, though
that can also be done directly without it, unless we want to avoid
dealing with conversions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The name `proxy-mutator` is too generic. In particular, several different linkerd extensions will have mutating webhooks which mutate the proxy sidecar, the MutatingWebhookConfiguration resource is cluster scoped, and each one needs a unique name.
We use the `jaeger-injector` name instead. This gives us a pattern to follow for future webhooks as well (e.g. `tap-injector` etc.)
Signed-off-by: Alex Leong <alex@buoyant.io>
While using k3d v3.0.2 with 3 nodes and installing Linkerd in HA, I've
seen errors like
```
Error from server: error when creating "STDIN": rpc error: code = Unknown desc = database is locked
```
Which doesn't happen on v3.4.0.
This brings in k8s v1.19 by default, though, which produces some
warnings in `linkerd check` like:
```
W0106 11:09:39.204081 292603 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
```
That only affects stderr so the tests still pass, but needs to be
addressed in a followup.
The `get-pod` and `port-forward` functions continue to assume
deployments like grafana still live under the `linkerd` namespace.
This expands those functions so that the
namespace can be specified.
These changes can be solely tested by running `bin/web dev` (follow the
instructions in `BUILD.md` for the preliminaries needed).
Split the image `name` field in `viz/charts/linkerd-viz/values.yaml` into `name` and `registry` to support custom registries. Changed the template files accordingly.
Just like other values, the registry can now be configured from the CLI via the `--set-*` flags.
Fixes#5430
Signed-off-by: Jimil Desai <jimildesai42@gmail.com>
### What
When overriding the proxy version using annotations, the respective annotation displays the wrong information (`linkerd.io/proxy-version`). This is a simple fix to display the correct version for the annotation; instead of using the proxy image from the config for the annotation's value, we take it from the overridden values instead.
Based on the discussion from #5338, I understood that when the image is updated, it is reflected in the container image version but not the annotation. Alex's proposed fix seems to work like a charm so I can't really take credit for anything. I have attached below some before/after snippets of the deployments & pods. If there are any additional changes required (or if I misunderstood anything), let me know and I'll gladly get it sorted :)
#### Tests
---
Didn't add any new tests, I built the images and just tested the annotation displays the correct version.
To test:
* I first injected an emojivoto-web deployment, its respective pod had the proxy version set to `dev-...`;
* I then re-injected the same deployment using a different proxy version and restarted the pods, its respective pod displayed the expected annotation value `stable-2.9.0` (whereas before it would have still been `dev-...`)
`Before`
```
# Deployment
apiVersion: apps/v1
kind: Deployment
...
template:
metadata:
annotations:
kubectl.kubernetes.io/restartedAt: "2021-01-04T12:41:47Z"
linkerd.io/inject: enabled
# Pod
apiVersion: v1
kind: Pod
metadata:
annotations:
kubectl.kubernetes.io/restartedAt: "2021-01-04T12:41:47Z"
linkerd.io/created-by: linkerd/proxy-injector dev-8d506317-madavid
linkerd.io/identity-mode: default
linkerd.io/inject: enabled
linkerd.io/proxy-version: dev-8d506317-madavid
```
`After`
```sh
$ linkerd inject --proxy-version stable-2.9.0 - | kubectl apply -f -
# Deployment
apiVersion: apps/v1
kind: Deployment
...
template:
metadata:
annotations:
config.linkerd.io/proxy-version: stable-2.9.0
# Pod
apiVersion: v1
kind: Pod
metadata:
annotations:
config.linkerd.io/proxy-version: stable-2.9.0
kubectl.kubernetes.io/restartedAt: "2021-01-04T12:41:47Z"
linkerd.io/created-by: linkerd/proxy-injector dev-8d506317-madavid
linkerd.io/identity-mode: default
linkerd.io/inject: enabled
linkerd.io/proxy-version: stable-2.9.0
# linkerd.io/proxy-version changed after injection and now matches the config (and the proxy img)
```
Fixes#5338
Signed-off-by: Matei David <matei.david.35@gmail.com>
Currently, the public-api is part of the core control-plane, where
the prom check fails when run before the viz extension is installed.
This change comments out that check. Once the metrics API is moved into
viz, maybe this check can be part of it instead, or directly part of
`linkerd viz check`.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
The destination service now returns `OpaqueTransport` hint when the annotation
matches the resolve target port. This is different from the current behavior
which always sets the hint when a proxy is present.
Closes#5421
This happens by changing the endpoint watcher to set a pod's opaque port
annotation in certain cases. If the pod already has an annotation, then its
value is used. If the pod has no annotation, then it checks the namespace that
the endpoint belongs to; if it finds an annotation on the namespace then it
overrides the pod's annotation value with that.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Currently the CA bundles in the config value `global.IdentityTrustAnchorsPEM` must not contain more than one certificate when the schema type is set to `kubernetes.io/tls`, or the command `linkerd check` will fail.
This change removes the comparison between the trust anchors configured in the linkerd config map and the contents of the `ca.crt` field of the identity issuer K8s secret.
This is an alternative to MR #5396, which I will close as a result of the discussion with @adleong
Fixes#5292
Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>
## What
When the destination service returns a destination profile for an endpoint,
indicate if the endpoint can receive opaque traffic.
## Why
Closes #5400
## How
When translating a pod address to a destination profile, the destination service
checks if the pod is controlled by any linkerd control plane. If it is, it can
set a protocol hint where we indicate that it supports H2 and opaque traffic.
If the pod supports opaque traffic, we need to get the port that it expects
inbound traffic on. We do this by getting the proxy container and reading its
`LINKERD2_PROXY_INBOUND_LISTEN_ADDR` environment variable. If we successfully
parse that into a port, we can set the opaque transport field in the destination
profile.
## Testing
A test has been added to the destination server where a pod has a
`linkerd-proxy` container. We can expect the `OpaqueTransport` field to be set
in the returned destination profile's protocol hint.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* viz: move some components into linkerd-viz
This branch moves the grafana, prometheus, web, and tap components
into a new viz chart, following the same extension model that
multicluster and jaeger follow.
The components in viz are not injected at install time; they
go through the injector. The `viz install` command does not have any
CLI flags to customize the install directly, but instead follows the Helm
way of customization, using flags such as
`set`, `set-string`, `values`, and `set-files`.
**Changes Include**
- Move `grafana`, `prometheus`, `web`, `tap` templates into viz extension.
- Remove all add-on related charts, logic and tests w.r.t CLI & Helm.
- Clean up `linkerd2/values.go` & `linkerd2/values.yaml` to not contain
fields related to viz components.
- Update `linkerd check` Healthchecks to not check for viz components.
- Create a new top level `viz` directory with CLI logic and Helm charts.
- Clean fields in the `viz/Values.yaml` to be in the `<component>.<property>`
model. Ex: `prometheus.resources`, `dashboard.image.tag`, etc so that it is
consistent everywhere.
**Testing**
```bash
# Install the Core Linkerd Installation
./bin/linkerd install | k apply -f -
# Wait for the proxy-injector to be ready
# Install the Viz Extension
./bin/linkerd cli viz install | k apply -f -
# Customized Install
./bin/linkerd cli viz install --set prometheus.enabled=false | k apply -f -
```
What is not included in this PR:
- Move of Controller from core install into the viz extension.
- Simplification and refactoring of the core chart, i.e. removing `.global`, etc.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
We need to test for the presence of the TCP metric labels, not the exact count.
This change removes the count of `1` so that it can match any count.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>