Original description:
> **Subject**
> Add missing helm values for multicluster setup
>
> **Problem**
> When executing this without the linkerd command the two variables are missing and the rendering will generate empty values.
> This produces the following gateway identity, that is also used in the gateway link command to generate the link crd:
>
> ```
> mirror.linkerd.io/gateway-identity: linkerd-gateway.linkerd-multicluster.serviceaccount.identity..
> ```
>
> **Solution**
> Add the values as defaults to the helm chart values.yaml file. If the cli is used they are overwritten by the following parameters:
> * https://github.com/linkerd/linkerd2/blob/main/cli/cmd/multicluster.go#L197
> * https://github.com/linkerd/linkerd2/blob/main/cli/cmd/multicluster.go#L196
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Co-authored-by: Björn Wenzel <bjoern.wenzel@dbschenker.com>
* bin/shellcheck-all was missing some files
`bin/shellcheck-all` identifies what files to check by filtering by the
`text/x-shellscript` mime-type, which only applies to files with a
shebang pointing to bash. We had a number of files with a
`#!/usr/bin/env sh` shebang that (at least in Ubuntu given `sh` points
to `dash`) only exposes a `text/plain` mime-type, thus they were not
being checked.
This fixes that issue by replacing the filter in `bin/shellcheck-all`, using a simple grep over the file shebang instead of using the `file` command.
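The shebang-based filter described above can be sketched as follows. This is an illustration only, not the contents of `bin/shellcheck-all` (which uses grep); the function name `isShellScript` and the list of accepted interpreters are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// isShellScript reports whether the first line of a file is a shebang that
// points at a POSIX-style shell. The accepted interpreter names are an
// assumption for this sketch.
func isShellScript(firstLine string) bool {
	if !strings.HasPrefix(firstLine, "#!") {
		return false
	}
	fields := strings.Fields(strings.TrimPrefix(firstLine, "#!"))
	if len(fields) == 0 {
		return false
	}
	interp := fields[0]
	// "#!/usr/bin/env sh" names the real interpreter in the next field.
	if strings.HasSuffix(interp, "/env") && len(fields) > 1 {
		interp = fields[1]
	}
	// Keep only the base name, e.g. "/bin/bash" becomes "bash".
	if i := strings.LastIndex(interp, "/"); i >= 0 {
		interp = interp[i+1:]
	}
	switch interp {
	case "sh", "bash", "dash":
		return true
	}
	return false
}

func main() {
	lines := []string{"#!/usr/bin/env sh", "#!/bin/bash", "#!/usr/bin/env python3"}
	for _, l := range lines {
		fmt.Printf("%-24s %v\n", l, isShellScript(l))
	}
}
```

Unlike a mime-type check, this decision depends only on the text of the first line, so it behaves the same regardless of what `sh` resolves to on the host.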
This changes the install-pr script to work with k3d.
Additionally, it now only installs the CLI; it no longer installs Linkerd on the
cluster. This was removed because most of the time when installing a Linkerd
version from a PR, some extra installation configuration is required and I was
always commenting out that final part of the script.
`--context` was changed to `--cluster` since we no longer need a context value,
only the name of the cluster into which we are loading the images.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #5257
This branch moves the multicluster charts and CLI-level code to a new
top-level directory. None of the logic is changed.
It also moves some common types into `/pkg` so that they
are accessible to both the main CLI and extensions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This change adds the `set`, `set-string`, `values`, `set-files`,
etc. flags, which are used to override the default values. This is
similar to how Helm works.
This also updates the install workflow to use the Helm v3 pkg
directly for chart loading and generation, without having to use
our own chart type, etc.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Have webhooks refresh their certs automatically
Partially fixes #5272
In 2.9 we introduced the ability to provide the certs for `proxy-injector` and `sp-validator` through external means such as cert-manager, via the new Helm setting `externalSecret`.
However, we forgot to have those services watch for changes in their secrets, so whenever the secrets were rotated the services would fail with a cert error, and the only workaround was to restart those pods to pick up the new secrets.
This addresses that by first abstracting out `FsCredsWatcher` from the identity controller, which now lives under `pkg/tls`.
The webhook's logic in `launcher.go` no longer reads the certs before starting the HTTPS server. That now happens in `server.go`, which, much like the identity controller, receives events from `FsCredsWatcher` and updates `Server.cert`. We're leveraging `http.Server.TLSConfig.GetCertificate`, which lets us provide a function that returns the current cert for every incoming TLS handshake.
### How to test
```bash
# Create some root cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca.crt ca.key \
--profile root-ca --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# configure injector's caBundle to be that root cert
$ cat > linkerd-overrides.yaml << EOF
proxyInjector:
  externalSecret: true
  caBundle: |
    < ca.crt contents>
EOF
# Install linkerd. The injector won't start until we create the secret below
$ bin/linkerd install --controller-log-level debug --config linkerd-overrides.yaml | k apply -f -
# Generate an intermediate cert with a short lifespan
$ step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# Create the secret using that intermediate cert
$ kubectl create secret tls \
linkerd-proxy-injector-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# start following the injector log
$ k -n linkerd logs -f -l linkerd.io/control-plane-component=proxy-injector -c proxy-injector
# Inject emojivoto. The pods should be injected normally
$ bin/linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -
# Wait about 5 minutes and delete a pod
$ k -n emojivoto delete po -l app=emoji-svc
# You'll see it won't be injected, and something like "remote error: tls: bad certificate" will appear in the injector logs.
# Regenerate the intermediate cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# Delete the secret and recreate it
$ k -n linkerd delete secret linkerd-proxy-injector-k8s-tls
$ kubectl create secret tls \
linkerd-proxy-injector-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# Wait a couple of minutes and you'll see some filesystem events in the injector log along with a "Certificate has been updated" entry
# Then delete the pod again and you'll see it gets injected this time
$ k -n emojivoto delete po -l app=emoji-svc
```
This edge release continues the work of decoupling non-core Linkerd components
by moving more tracing-related functionality into the `linkerd-jaeger` extension.
* Continued work on moving tracing functionality from the main control plane
into the `linkerd-jaeger` extension
* Fixed a potential panic in the proxy when looking up a socket's peer address
while under high load
* Added automatic readme generation for charts (thanks @GMarkfjard!)
* Fixed zsh completion for the CLI (thanks @jiraguha!)
* Added support for multicluster gateways of types other than LoadBalancer
(thanks @DaspawnW!)
Signed-off-by: Alex Leong <alex@buoyant.io>
This release updates the proxy's `*ring*` dependency to pick up the
latest changes from BoringSSL.
Additionally, we've audited uses of non-cryptographic random number
generators in the proxy to ensure that each balancer/router initializes
its own RNG state.
---
* Audit uses of SmallRng (linkerd/linkerd2-proxy#757)
* Update *ring* to 0.6.19 (linkerd/linkerd2-proxy#758)
* metrics: Support the Summary metric type (linkerd/linkerd2-proxy#756)
The namespace that Linkerd extensions are installed into is configurable. This can make it difficult to know which extensions are installed and where they are located. We add a `linkerd.io/extension` namespace label to easily enumerate and locate Linkerd extensions. This can be used, for example, to enable certain features only when certain extensions are installed. All new Linkerd extensions should include this namespace label.
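As a sketch of how the label can be used to locate extensions, the following filters a list of namespaces by `linkerd.io/extension`. The `namespace` struct is a simplified stand-in for a Kubernetes Namespace object, and the namespace names and label value in `main` are made up for the example.

```go
package main

import "fmt"

const extensionLabel = "linkerd.io/extension"

// namespace is a minimal stand-in for a Kubernetes Namespace object.
type namespace struct {
	Name   string
	Labels map[string]string
}

// extensionNamespaces maps each extension name (the label value) to the
// namespace that carries the label.
func extensionNamespaces(nss []namespace) map[string]string {
	out := map[string]string{}
	for _, ns := range nss {
		if ext, ok := ns.Labels[extensionLabel]; ok {
			out[ext] = ns.Name
		}
	}
	return out
}

func main() {
	// Example data: names and label values here are assumptions.
	nss := []namespace{
		{Name: "linkerd"},
		{Name: "tracing", Labels: map[string]string{extensionLabel: "linkerd-jaeger"}},
	}
	found := extensionNamespaces(nss)
	if ns, ok := found["linkerd-jaeger"]; ok {
		fmt.Println("jaeger extension installed in namespace:", ns)
	}
}
```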
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add automatic readme generation for charts
The current readmes for each chart are generated
manually and don't contain all the information available.
Utilize helm-docs to automatically fill out the README.md files
for the helm charts by pulling metadata from values.yaml.
Fixes #4156
Co-authored-by: GMarkfjard <gabma047@student.liu.se>
This branch adds the `jaeger dashboard` sub-command, which is used
to view the Jaeger dashboard. It follows the same logic/pattern
as `linkerd-dashboard` and provides the same flags.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This release removes a potential panic: it was assumed that looking up a
socket's peer address was infallible, but in practice this call can
fail when a host is under high load. Now these failures only impact the
connection-level task and not the whole proxy process.
Also, the `process_cpu_seconds_total` metric is now exposed as a float
so that its value may include fractional seconds with 10ms granularity.
---
* io: Make peer_addr fallible (linkerd/linkerd2-proxy#755)
* metrics: Expose process_cpu_seconds_total as a float (linkerd/linkerd2-proxy#754)
* Jaeger injector mutating webhook
Closes #5231. This is based off of the `alex/sep-tracing` branch.
This webhook injects the `LINKERD2_PROXY_TRACE_COLLECTOR_SVC_ADDR`,
`LINKERD2_PROXY_TRACE_COLLECTOR_SVC_NAME` and
`LINKERD2_PROXY_TRACE_ATTRIBUTES_PATH` environment vars into the proxy
spec when a pod is created, as well as the podinfo volume and its mount.
If any of these are found to be present already in the pod spec, it
exits without applying a patch.
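The "skip if already present" check can be sketched like this. It covers only the environment variables (the real webhook also considers the podinfo volume and its mount), and the `envVar` type and `needsPatch` function are hypothetical names for this example.

```go
package main

import "fmt"

// envVar is a minimal stand-in for a container environment variable.
type envVar struct{ Name, Value string }

// traceEnvNames lists the variables the webhook injects into the proxy.
var traceEnvNames = []string{
	"LINKERD2_PROXY_TRACE_COLLECTOR_SVC_ADDR",
	"LINKERD2_PROXY_TRACE_COLLECTOR_SVC_NAME",
	"LINKERD2_PROXY_TRACE_ATTRIBUTES_PATH",
}

// needsPatch returns false as soon as any of the trace variables is already
// set on the proxy container, mirroring the "exit without patching" rule.
func needsPatch(proxyEnv []envVar) bool {
	for _, e := range proxyEnv {
		for _, name := range traceEnvNames {
			if e.Name == name {
				return false
			}
		}
	}
	return true
}

func main() {
	fresh := []envVar{{Name: "LINKERD2_PROXY_LOG", Value: "info"}}
	// The address below is a made-up value for illustration.
	injected := append(fresh, envVar{
		Name:  "LINKERD2_PROXY_TRACE_COLLECTOR_SVC_ADDR",
		Value: "collector.example:55678",
	})
	fmt.Println(needsPatch(fresh), needsPatch(injected))
}
```

Making the check idempotent this way means a second invocation of the webhook on the same pod is harmless.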
The `values.yaml` file has been expanded to include config for this
webhook. In particular, one can define a `namespaceSelector` and/or an
`objectSelector` to filter which pods this webhook acts on.
The config entries in `values.yaml` for `collectorSvcAddr` and
`collectorSvcAccount` can be overridden with the
`config.linkerd.io/trace-collector` and
`config.alpha.linkerd.io/trace-collector-service-account` annotations at
the namespace or pod spec level.
## How to test:
```bash
docker build . -t ghcr.io/linkerd/jaeger-webhook:0.0.1 -f jaeger/proxy-mutator/Dockerfile
k3d image import ghcr.io/linkerd/jaeger-webhook:0.0.1
bin/helm-build
linkerd install
helm install jaeger jaeger/charts/jaeger
linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -
kubectl -n emojivoto get po -l app=emoji-svc -oyaml | grep -A1 TRACE
```
## Reinvocation policy
The webhookconfig resource is configured with `reinvocationPolicy:
IfNeeded` so that if the tracing injector gets triggered before the
proxy injector, it will get triggered a second time after the proxy
injector runs so it can act on the injected proxy. By default this won't
be necessary because the webhooks run in alphabetical order (this is not
documented in k8s docs though) so
`linkerd-proxy-injector-webhook-config` will run before
`linkerd-proxy-mutator-webhook-config`. In order to test the
reinvocation mechanism, you can change the name of the former so it gets
called first.
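To illustrate the ordering argument, this sketch sorts the two webhook configuration names lexically. It assumes, as the text above does, that invocation order follows the alphabetical order of the names, which is an observation and not documented behavior; `invocationOrder` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// invocationOrder returns the names in lexical order, which is the order the
// webhooks are assumed (not guaranteed) to be called in.
func invocationOrder(names []string) []string {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return sorted
}

func main() {
	order := invocationOrder([]string{
		"linkerd-proxy-mutator-webhook-config",
		"linkerd-proxy-injector-webhook-config",
	})
	for i, name := range order {
		fmt.Printf("%d. %s\n", i+1, name)
	}
}
```

The names share the prefix `linkerd-proxy-`, so the comparison is decided by `injector` versus `mutator`, and `i` sorts before `m`.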
I versioned the webhook image as `0.0.1`, but we can decide to align
that with linkerd's main version tag.
This edge release improves the proxy's support for high-traffic workloads. It also
contains the first steps towards decoupling non-core Linkerd components, the
first iteration being a new `linkerd jaeger` sub-command for installing tracing.
Please note this is still a work in progress.
* Addressed some issues reported around clients seeing max-concurrency errors by
increasing the default in-flight request limit to 100K pending requests
* Have the proxy appropriately set `content-type` when synthesizing gRPC error
responses
* Bumped the `proxy-init` image to `v1.3.8` which is based off of
`buster-20201117-slim` to reduce potential security vulnerabilities
* No longer panic in rare cases when `linkerd-config` doesn't have an entry for
`Global` configs (thanks @hodbn!)
* Work in progress: the `/jaeger` directory now contains the charts and commands
for installing the tracing component.
* extension: Add new jaeger binary
This branch adds a new jaeger binary project in the jaeger directory.
This follows the same logic as `linkerd install`. But because the
`linkerd install` VFS logic expects charts to be present in the `/charts`
directory, this command gets its own static pkg to generate its own
VFS for its chart.
This covers only the install part of the command.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes #5230
This PR moves tracing into a jaeger chart with no proxy injection
templates. We still keep the dependency on partials, as we could use
common templates from there, such as resources.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This release addresses some issues reported around clients seeing
max-concurrency errors by increasing the default in-flight request limit
to 100K pending requests.
Additionally, the proxy now sets an appropriate content-type when
synthesizing gRPC error responses.
---
* style: fix some random clippy lints (linkerd/linkerd2-proxy#749)
* errors: Set `content-type` for synthesized grpc errors (linkerd/linkerd2-proxy#750)
* concurrency-limit: Drop permit on readiness (linkerd/linkerd2-proxy#751)
* Increase the default buffer capacity to 100K (linkerd/linkerd2-proxy#752)
* Change default max-in-flight and buffer-capacity (linkerd/linkerd2-proxy#753)
Fixes #4874
This branch upgrades the Helm SDK from v2 to v3 *without any functional
changes*, just replacing types with the newer APIs.
This should not affect our current support for Helm v2, as we did not
change any of the underlying templates (which work with Helm v2). This
works because we did not use any of the APIs that read the Chart
metadata (which are the only ones changed from v2 to v3) and currently
load files manually and pass them into the SDK.
This PR should provide a good starting point for using more of the newer
Helm v3 APIs, including for the upgrade workflow, thus allowing us to make
the Linkerd CLI simpler.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
CLI crashes if linkerd-config contains unexpected values.
Add a safe accessor that initializes an empty Global on the first
access. Refactor all accesses to use the newly introduced accessor using
gopls.
Add test for linkerd-config data without Global.
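The safe-accessor idea can be sketched as below. The `All` and `Global` types and the `GetGlobal` name are simplified, hypothetical stand-ins for the real config types; only the pattern (lazily initializing a nil field on first access) is taken from the description above.

```go
package main

import "fmt"

// Global and All are simplified stand-ins for the linkerd-config types.
type Global struct {
	LinkerdNamespace string
}

type All struct {
	Global *Global
}

// GetGlobal returns the Global section, creating an empty one on first
// access so callers never dereference a nil pointer.
func (a *All) GetGlobal() *Global {
	if a.Global == nil {
		a.Global = &Global{}
	}
	return a.Global
}

func main() {
	cfg := &All{} // config data without a Global entry

	// Direct access (cfg.Global.LinkerdNamespace) would panic here;
	// the accessor returns a zero-valued Global instead.
	fmt.Printf("namespace=%q\n", cfg.GetGlobal().LinkerdNamespace)
}
```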
Fixes #5215
Co-authored-by: Itai Schwartz <yitai27@gmail.com>
Signed-off-by: Hod Bin Noon <bin.noon.hod@gmail.com>
This adds additional tests for the destination service that assert `GetProfile`
behavior when the path is an IP address.
1. Assert that when the path is a cluster IP, the configured service profile is
returned.
2. Assert that when the path is a pod IP, the endpoint field is populated in the
service profile returned.
3. Assert that when the path is not a cluster or pod IP, the default service
profile is returned.
4. Assert that when the path is a pod IP with or without the controller annotation,
the endpoint has or does not have a protocol hint, respectively.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Refactor webhook framework to allow webhooks to define their flags
Pulled the flag parsing logic out of `launcher.go` and moved it into the `Main` methods of the webhooks (under `controller/cmd/proxy-injector/main.go` and `controller/cmd/sp-validator/main.go`), so that individual webhooks can define the flags they want to use.
Also, webhooks are no longer required to have cluster-wide access.
Finally, renamed the type `webhook.handlerFunc` to `webhook.Handler` so it can be exported. This will be used in the upcoming jaeger webhook.
## edge-20.11.4
* Fixed an issue in the destination service where endpoints always included a
protocol hint, regardless of the controller label being present or not
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This fixes an issue where the protocol hint is always set on endpoint responses.
We now check the right value which determines if the pod has the required label.
A test for this has been added to #5266.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This release changes error handling to tear down the server-side
connection when an unexpected error is encountered.
Additionally, the outbound TCP routing stack can now skip redundant
service discovery lookups when profile responses include endpoint
information.
Finally, the cache implementation has been updated to reduce latency by
removing unnecessary buffers.
---
* h2: enable HTTP/2 keepalive PING frames (linkerd/linkerd2-proxy#737)
* actions: Add timeouts to GitHub actions (linkerd/linkerd2-proxy#738)
* outbound: Skip endpoint resolution on profile hint (linkerd/linkerd2-proxy#736)
* Add a FromStr for dns::Name (linkerd/linkerd2-proxy#746)
* outbound: Avoid redundant TCP endpoint resolution (linkerd/linkerd2-proxy#742)
* cache: Make the cache cloneable with RwLock (linkerd/linkerd2-proxy#743)
* http: Teardown serverside connections on error (linkerd/linkerd2-proxy#747)
As discussed in #5228, it is not correct for root and intermediate
certs to have a SAN. This PR updates the check so that it no longer
verifies the intermediate issuer cert against the identity DNS name.
(That verification checks the SAN rather than the CN, because the
`verify` func is meant for leaf certs, not root or intermediate certs.)
This PR also avoids setting a SAN field when generating certs in the
`install` command.
Fixes #5228
Context: #5209
This updates the destination service to set the `Endpoint` field in `GetProfile`
responses.
The `Endpoint` field is only set if the IP maps to a Pod--not a Service.
Additionally in this scenario, the default Service Profile is used as the base
profile so no other significant fields are set.
### Examples
```
# GetProfile for an IP that maps to a Service
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.43.222.0:9090
INFO[0000] fully_qualified_name:"linkerd-prometheus.linkerd.svc.cluster.local" retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"linkerd-prometheus.linkerd.svc.cluster.local.:9090" weight:10000}
```
Before:
```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
```
After:
```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:170524692}} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"deployment" value:"fast-1"} metric_labels:{key:"pod" value:"fast-1-5cc87f64bc-9hx7h"} metric_labels:{key:"pod_template_hash" value:"5cc87f64bc"} metric_labels:{key:"serviceaccount" value:"default"} tls_identity:{dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{}}}
```
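The `ipv4:170524692` value in the output above is the pod IP packed into a 32-bit integer. A small sketch that decodes it, assuming big-endian (network) byte order, recovers the address that was queried; `ipv4FromUint32` is a helper name made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// ipv4FromUint32 unpacks a 32-bit integer into a dotted-quad IPv4 address,
// assuming the most significant byte is the first octet.
func ipv4FromUint32(v uint32) net.IP {
	ip := make(net.IP, net.IPv4len)
	binary.BigEndian.PutUint32(ip, v)
	return ip
}

func main() {
	// 170524692 = 10<<24 + 42<<16 + 0<<8 + 20
	fmt.Println(ipv4FromUint32(170524692)) // prints 10.42.0.20
}
```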
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Consolidate integration tests under k3d
Fixes #5007
Simplified integration tests by moving them all to k3d. Previously everything ran in KinD except for the multicluster tests, which implied some extra complexity in the supporting scripts.
Removed the KinD config files under `test/integration/configs`, as config is now passed as flags into the `k3d` command.
Also renamed `kind_integration.yml` to `integration_tests.yml`.
Test skipping logic under ARM was also simplified.
The rare cases where these tests were useful don't make up for the burden of
maintaining them: different k8s versions change the messages, and
unexpected warnings come up that don't affect the final
convergence of the system.
With this we also revert the indirection added back in #4538 that
fetched unmatched warnings after a test had failed.
This upgrades both the proxy-init image itself, and the go dependency on
proxy-init as a library, which fixes CNI in k3s and any host using
binaries coming from BusyBox, where `nsenter` has an
issue parsing arguments (see rancher/k3s#1434).
The ARM integration tests take a very long time to run for some reason. For example, in the stable-2.9.0 release, they took
38 minutes. Thus, this test needs a longer timeout.
Increase the ARM integration test timeout from 30 minutes to 60 minutes.
Signed-off-by: Alex Leong <alex@buoyant.io>
This edge release reduces memory consumption of Linkerd proxies which maintain
many idle connections (such as Prometheus). It also removes some obsolete
commands from the CLI and allows setting custom annotations on multicluster
gateways.
* Reduced the default idle connection timeout to 5s for outbound clients and
20s for inbound clients to reduce the proxy's memory footprint, especially on
Prometheus instances
* Added support for setting annotations on the multicluster gateway in Helm
which allows setting the load balancer as internal (thanks @shaikatz!)
* Removed the `get` and `logs` commands from the CLI
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #5191
The logs command adds an external dependency that we had to fork to make
it work, and it does not fit within Linkerd's core set of responsibilities.
Hence, it is being removed.
For capabilities like this, the Kubernetes plugin ecosystem has better
and well-maintained tools that can be used.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes #5190
`linkerd get` is not currently used and works only for pods, so it can be
removed, as per the issue. This branch removes the command along with
the associated unit and integration tests.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The default job timeout is 6 hours! This allows runaway builds to
consume our actions resources unnecessarily.
This change limits integration test jobs to 30 minutes. Static checks
are limited to 10 minutes.
* Update BUILD.md with multiarch stuff and some extras
Adds to `BUILD.md` a new section `Publishing images` explaining the
workflow for testing custom builds.
Also updates and gives more precision to the section `Building CLI for
development`.
Finally, a new `Multi-architecture builds` section is added.
This PR also removes `SUPPORTED_ARCHS` from `bin/docker-build-cli` that
is no longer used.
Note I'm leaving some references to Minikube. I might change that in a
separate PR to point to k3d if we manage to migrate the KinD stuff to
k3d.