The namespace that Linkerd extensions are installed into is configurable. This can make it difficult to know which extensions are installed and where they are located. We add a `linkerd.io/extension` namespace label so that Linkerd extensions can be easily enumerated and located. This can be used, for example, to enable certain features only when a given extension is installed. All new Linkerd extensions should include this namespace label.
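As a sketch of how this could be used (the `jaeger` value below is illustrative):
```bash
# List all namespaces that contain a Linkerd extension, or locate a specific one
kubectl get namespaces -l linkerd.io/extension
kubectl get namespaces -l linkerd.io/extension=jaeger
```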
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add automatic readme generation for charts
The current readme for each chart is generated
manually and doesn't contain all of the information available.
Use helm-docs to automatically generate the README.md files
for the Helm charts by pulling metadata from values.yaml.
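For reference, a minimal sketch of the intended workflow, assuming helm-docs is installed locally:
```bash
# Run from the repository root: helm-docs discovers each chart and regenerates
# its README.md from the chart metadata and the values file comments
helm-docs
```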
Fixes #4156
Co-authored-by: GMarkfjard <gabma047@student.liu.se>
This branch adds a `jaeger dashboard` sub-command which is used
to view the Jaeger dashboard. This follows the same logic/pattern
as that of `linkerd dashboard`, and provides the same flags.
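Example usage (the flags mirror `linkerd dashboard`, and the port value is illustrative):
```bash
# Port-forward to the Jaeger UI and open it in the default browser
linkerd jaeger dashboard

# Or just print the URL on a fixed local port without opening a browser
linkerd jaeger dashboard --show url --port 16686
```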
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This release removes a potential panic: it was assumed that looking up a
socket's peer address was infallible, but in practice this call can
fail when a host is under high load. Now these failures only impact the
connection-level task and not the whole proxy process.
Also, the `process_cpu_seconds_total` metric is now exposed as a float
so that its value may include fractional seconds with 10ms granularity.
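A quick way to see the new fractional values on a meshed pod (namespace and workload names are placeholders; 4191 is the proxy's admin port):
```bash
# In one terminal: forward the proxy admin port of a meshed workload
kubectl -n emojivoto port-forward deploy/web 4191
# In another terminal: the counter is now reported with fractional seconds
curl -s localhost:4191/metrics | grep process_cpu_seconds_total
```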
---
* io: Make peer_addr fallible (linkerd/linkerd2-proxy#755)
* metrics: Expose process_cpu_seconds_total as a float (linkerd/linkerd2-proxy#754)
* Jaeger injector mutating webhook
Closes #5231. This is based off of the `alex/sep-tracing` branch.
This webhook injects the `LINKERD2_PROXY_TRACE_COLLECTOR_SVC_ADDR`,
`LINKERD2_PROXY_TRACE_COLLECTOR_SVC_NAME` and
`LINKERD2_PROXY_TRACE_ATTRIBUTES_PATH` environment vars into the proxy
spec when a pod is created, as well as the podinfo volume and its mount.
If any of these are found to be present already in the pod spec, it
exits without applying a patch.
The `values.yaml` file has been expanded to include config for this
webhook. In particular, one can define a `namespaceSelector` and/or an
`objectSelector` to filter which pods this webhook acts on.
The config entries in `values.yaml` for `collectorSvcAddr` and
`collectorSvcAccount` can be overridden with the
`config.linkerd.io/trace-collector` and
`config.alpha.linkerd.io/trace-collector-service-account` annotations at
the namespace or pod spec level.
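For example, the namespace-level overrides could be applied like this (the collector address and service account values are placeholders):
```bash
kubectl annotate namespace emojivoto \
  config.linkerd.io/trace-collector=collector.linkerd-jaeger:55678 \
  config.alpha.linkerd.io/trace-collector-service-account=collector
```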
## How to test:
```bash
docker build . -t ghcr.io/linkerd/jaeger-webhook:0.0.1 -f jaeger/proxy-mutator/Dockerfile
k3d image import ghcr.io/linkerd/jaeger-webhook:0.0.1
bin/helm-build
linkerd install
helm install jaeger jaeger/charts/jaeger
linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -
kubectl -n emojivoto get po -l app=emoji-svc -oyaml | grep -A1 TRACE
```
## Reinvocation policy
The webhookconfig resource is configured with `reinvocationPolicy:
IfNeeded` so that if the tracing injector gets triggered before the
proxy injector, it will get triggered a second time after the proxy
injector runs so it can act on the injected proxy. By default this won't
be necessary because the webhooks run in alphabetical order (this is not
documented in k8s docs though) so
`linkerd-proxy-injector-webhook-config` will run before
`linkerd-proxy-mutator-webhook-config`. In order to test the
reinvocation mechanism, you can rename the former so that it runs after
the latter.
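To confirm the policy is in place after installing, something along these lines should work:
```bash
kubectl get mutatingwebhookconfiguration linkerd-proxy-mutator-webhook-config \
  -o jsonpath='{.webhooks[0].reinvocationPolicy}'
```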
I versioned the webhook image as `0.0.1`, but we can decide to align
that with linkerd's main version tag.
This edge release improves the proxy's support for high-traffic workloads. It also
contains the first steps towards decoupling non-core Linkerd components, the
first iteration being a new `linkerd jaeger` sub-command for installing tracing.
Please note this is still a work in progress.
* Addressed some issues reported around clients seeing max-concurrency errors by
increasing the default in-flight request limit to 100K pending requests
* Have the proxy appropriately set `content-type` when synthesizing gRPC error
responses
* Bumped the `proxy-init` image to `v1.3.8` which is based off of
`buster-20201117-slim` to reduce potential security vulnerabilities
* No longer panic in rare cases when `linkerd-config` doesn't have an entry for
`Global` configs (thanks @hodbn!)
* Work in progress: the `/jaeger` directory now contains the charts and commands
for installing the tracing component.
* extension: Add new jaeger binary
This branch adds a new jaeger binary project in the jaeger directory.
This follows the same logic as that of `linkerd install`. But as
the `linkerd install` VFS logic expects charts to be present in the `/charts`
directory, this command gets its own static pkg to generate its own
VFS for its chart.
This covers only the install part of the command.
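A sketch of the intended usage, mirroring the `linkerd install` pattern:
```bash
# Render the tracing manifests and apply them to the cluster
linkerd jaeger install | kubectl apply -f -
```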
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes #5230
This PR moves tracing into a jaeger chart with no proxy injection
templates. We still keep the dependency on partials, as we can use
common templates (resources, etc.) from there.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This release addresses some issues reported around clients seeing
max-concurrency errors by increasing the default in-flight request limit
to 100K pending requests.
Additionally, the proxy now sets an appropriate content-type when
synthesizing gRPC error responses.
---
* style: fix some random clippy lints (linkerd/linkerd2-proxy#749)
* errors: Set `content-type` for synthesized grpc errors (linkerd/linkerd2-proxy#750)
* concurrency-limit: Drop permit on readiness (linkerd/linkerd2-proxy#751)
* Increase the default buffer capacity to 100K (linkerd/linkerd2-proxy#752)
* Change default max-in-flight and buffer-capacity (linkerd/linkerd2-proxy#753)
Fixes #4874
This branch upgrades the Helm SDK from v2 to v3 *without any functional
changes*, just replacing types with the newer APIs.
This should not affect our current support for Helm v2 as we did not
change any of the underlying templates (which work with Helm v2). This
works because we did not use any of the APIs that read the Chart
metadata (which are the only ones changed from v2 to v3) and currently
manually load files and pass them into the SDK.
This PR should provide a great starting point to adopt more of the newer Helm v3
APIs, including for the upgrade workflow, thus allowing us to make the
Linkerd CLI simpler.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The CLI crashes if linkerd-config contains unexpected values.
Add a safe accessor that initializes an empty Global on the first
access. Refactor all accesses to use the newly introduced accessor
(using gopls).
Add a test for linkerd-config data without Global.
Fixes #5215
Co-authored-by: Itai Schwartz <yitai27@gmail.com>
Signed-off-by: Hod Bin Noon <bin.noon.hod@gmail.com>
This adds additional tests for the destination service that assert `GetProfile`
behavior when the path is an IP address.
1. Assert that when the path is a cluster IP, the configured service profile is
returned.
2. Assert that when the path is a pod IP, the endpoint field is populated in the
service profile returned.
3. Assert that when the path is not a cluster or pod IP, the default service
profile is returned.
4. Assert that when the path is a pod IP, the endpoint has a protocol hint if and
only if the controller annotation is present
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Refactor webhook framework to allow webhooks to define their own flags
Pulled the flag-parsing logic out of `launcher.go` and moved it into the `Main` methods of the webhooks (under `controller/cmd/proxy-injector/main.go` and `controller/cmd/sp-validator/main.go`), so that individual webhooks themselves can define the flags they want to use.
Also no longer require that webhooks have cluster-wide access.
Finally, renamed the type `webhook.handlerFunc` to `webhook.Handler` so it can be exported. This will be used in the upcoming jaeger webhook.
## edge-20.11.4
* Fixed an issue in the destination service where endpoints always included a
protocol hint, regardless of whether the controller label was present
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This fixes an issue where the protocol hint was always set on endpoint responses.
We now check the correct value to determine whether the pod has the required label.
A test for this was added in #5266.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This release changes error handling to tear down the server-side
connection when an unexpected error is encountered.
Additionally, the outbound TCP routing stack can now skip redundant
service discovery lookups when profile responses include endpoint
information.
Finally, the cache implementation has been updated to reduce latency by
removing unnecessary buffers.
---
* h2: enable HTTP/2 keepalive PING frames (linkerd/linkerd2-proxy#737)
* actions: Add timeouts to GitHub actions (linkerd/linkerd2-proxy#738)
* outbound: Skip endpoint resolution on profile hint (linkerd/linkerd2-proxy#736)
* Add a FromStr for dns::Name (linkerd/linkerd2-proxy#746)
* outbound: Avoid redundant TCP endpoint resolution (linkerd/linkerd2-proxy#742)
* cache: Make the cache cloneable with RwLock (linkerd/linkerd2-proxy#743)
* http: Teardown serverside connections on error (linkerd/linkerd2-proxy#747)
As discussed in #5228, it is not correct for root and intermediate
certs to have a SAN. This PR updates the check to not verify the
intermediate issuer cert with the identity DNS name (which checks
against the SAN and not the CN, as the `verify` func is used to verify
leaf certs and not root or intermediate certs). This PR also avoids
setting a SAN field when generating certs in the `install` command.
Fixes #5228
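To double-check a given issuer cert, an openssl inspection along these lines confirms whether a SAN extension was set (the file name is a placeholder):
```bash
openssl x509 -in issuer.crt -noout -text | grep -A1 'Subject Alternative Name'
```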
Context: #5209
This updates the destination service to set the `Endpoint` field in `GetProfile`
responses.
The `Endpoint` field is only set if the IP maps to a Pod--not a Service.
Additionally in this scenario, the default Service Profile is used as the base
profile so no other significant fields are set.
### Examples
```
# GetProfile for an IP that maps to a Service
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.43.222.0:9090
INFO[0000] fully_qualified_name:"linkerd-prometheus.linkerd.svc.cluster.local" retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"linkerd-prometheus.linkerd.svc.cluster.local.:9090" weight:10000}
```
Before:
```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
```
After:
```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:170524692}} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"deployment" value:"fast-1"} metric_labels:{key:"pod" value:"fast-1-5cc87f64bc-9hx7h"} metric_labels:{key:"pod_template_hash" value:"5cc87f64bc"} metric_labels:{key:"serviceaccount" value:"default"} tls_identity:{dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{}}}
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Consolidate integration tests under k3d
Fixes #5007
Simplified the integration tests by moving them all to k3d. Previously things were running in KinD, except for the multicluster tests, which implied some extra complexity in the supporting scripts.
Removed the KinD config files under `test/integration/configs`, as config is now passed as flags into the `k3d` command.
Also renamed `kind_integration.yml` to `integration_tests.yml`
Test skipping logic under ARM was also simplified.
The rare cases where these tests were useful don't make up for the burden of
maintaining them, with different k8s versions changing the messages and
unexpected warnings coming up that didn't affect the final
convergence of the system.
With this we also revert the indirection added back in #4538 that
fetched unmatched warnings after a test had failed.
This upgrades both the proxy-init image itself and the Go dependency on
proxy-init as a library, which fixes CNI in k3s and any host using
binaries coming from BusyBox, where `nsenter` has an
issue parsing arguments (see rancher/k3s#1434).
The ARM integration tests take a very long time to run for some reason. For example, in the stable-2.9.0 release, they took
38 minutes. Thus, these tests need a longer timeout.
Increase the ARM integration test timeout from 30 minutes to 60 minutes.
Signed-off-by: Alex Leong <alex@buoyant.io>
This edge release reduces memory consumption of Linkerd proxies which maintain
many idle connections (such as Prometheus). It also removes some obsolete
commands from the CLI and allows setting custom annotations on multicluster
gateways.
* Reduced the default idle connection timeout to 5s for outbound clients and
20s for inbound clients to reduce the proxy's memory footprint, especially on
Prometheus instances
* Added support for setting annotations on the multicluster gateway in Helm
which allows setting the load balancer as internal (thanks @shaikatz!)
* Removed the `get` and `logs` command from the CLI
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #5191
The logs command adds an external dependency that we forked to make work, but
it does not fit within Linkerd's core set of responsibilities. Hence, it
is being removed.
For capabilities like this, the Kubernetes plugin ecosystem has better
and well-maintained tools that can be used.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes #5190
`linkerd get` is not currently used and works only for pods, so as per the
issue it can be removed. This branch removes the command along
with the associated unit and integration tests.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The default job timeout is 6 hours! This allows runaway builds to
consume our actions resources unnecessarily.
This change limits integration test jobs to 30 minutes. Static checks
are limited to 10 minutes.
* Update BUILD.md with multiarch stuff and some extras
Adds to `BUILD.md` a new section `Publishing images` explaining the
workflow for testing custom builds.
Also updates and gives more precision to the section `Building CLI for
development`.
Finally, a new `Multi-architecture builds` section is added.
This PR also removes `SUPPORTED_ARCHS` from `bin/docker-build-cli`, which
is no longer used.
Note I'm leaving some references to Minikube. I might change that in a
separate PR to point to k3d if we manage to migrate the KinD stuff to
k3d.
This release modifies the default idle timeout to 5s for outbound
clients and 20s for inbound clients. This prevents idle clients from
consuming memory at the cost of performing more discovery resolutions
for periodic but infrequent traffic. This is intended to reduce the
proxy's memory footprint, especially on Prometheus instances.
The proxy's *ring* and rustls dependencies have also been updated.
---
* Update *ring* and rustls dependencies (linkerd/linkerd2-proxy#735)
* http: Configure client connection pools (linkerd/linkerd2-proxy#734)
* Changes for `stable-2.9.0`
Only user-facing items were mentioned. Where previous edge release
notes contained a summary of a change, I preferred using that summary
instead of the more technical bullet point. Given the large list of
items, I separated them into sections for easier digestion. Also, I didn't
repeat the TCP mTLS stuff (nor ARM support) below in the bullet points
as it was already well described in the summary.
## stable-2.9.0
This release extends Linkerd's zero-config mutual TLS (mTLS) support to all TCP
connections, allowing Linkerd to transparently encrypt and authenticate all TCP
connections in the cluster the moment it's installed. It also adds ARM support,
introduces a new multi-core proxy runtime for higher throughput, adds support
for Kubernetes service topologies, and lots, lots more, as described below:
* Proxy
* Performed internal improvements for lower latencies under high concurrency
* Reduced performance impact of logging, especially when the `debug` or
`trace` log levels are disabled
* Improved error handling for DNS errors encountered when discovering control
plane addresses, which can be common during installation, before all
components have been started, allowing linkerd to continue to operate
normally in HA during node outages
* Control Plane
* Added support for [topology-aware service
routing](https://kubernetes.io/docs/concepts/services-networking/service-topology/)
to the Destination controller; when providing service discovery updates to
proxies the Destination controller will now filter endpoints based on the
service's topology preferences
* Added support for the new Kubernetes
[EndpointSlice](https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/)
resource to the Destination controller; Linkerd can be installed with
`--enable-endpoint-slices` flag to use this resource rather than the
Endpoints API in clusters where this new API is supported
* Dashboard
* Added new Spanish translations (please help us translate into your
language!)
* Added new section for exposing multicluster gateway metrics
* CLI
* Renamed the `--addon-config` flag to `--config` to clarify this flag can be
used to set any Helm value
* Added fish shell completions to the `linkerd` command
* Multicluster
* Replaced the single `service-mirror` controller with separate controllers
that will be installed per target cluster through `linkerd multicluster
link`
* Changed the mechanism for mirroring services: instead of relying on
annotations on the target services, now the source cluster should specify
which services from the target cluster should be exported by using a label
selector
* Added support for creating multiple service accounts when installing
multicluster with Helm to allow more granular revocation
* Added a multicluster `unlink` command for removing multicluster links
* Prometheus
* Moved Linkerd's bundled Prometheus into an add-on (enabled by default); this
makes the Linkerd Prometheus more configurable, gives it a separate upgrade
lifecycle from the rest of the control plane, and will allow users to
disable the bundled Prometheus instance
* The long-awaited Bring-Your-Own-Prometheus case has finally been addressed:
added `global.prometheusUrl` to the Helm config to have Linkerd use an
external Prometheus instance instead of the one provided by default
* Added an option to persist data to a volume instead of memory, so that
historical metrics are available when Prometheus is restarted
* The Helm chart can now configure persistent storage and limits
* Other
* Added a new `linkerd.io/inject: ingress` annotation and accompanying
`--ingress` flag to the `inject` command, to configure the proxy to support
service profiles and enable per-route metrics and traffic splits for HTTP
ingress controllers
* Changed the type of the injector and tap API secrets to `kubernetes.io/tls`
so they can be provisioned by cert-manager
* Changed default docker image repository to `ghcr.io` from `gcr.io`; **Users
who pull the images into private repositories should take note of this
change**
* Introduced support for authenticated docker registries
* Simplified the way that Linkerd stores its configuration; configuration is
now stored as Helm values in the `linkerd-config` ConfigMap
* Added support for Helm configuration of per-component proxy resource
requests
This release includes changes from a massive list of contributors. A special
thank-you to everyone who helped make this release possible: --long
list, see file --
* Fixed some bad copypasta
* Apply suggestions from code review
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This edge release supersedes edge-20.10.6 as a release candidate for stable-2.9.0.
* Fixed issue where the `check` command would error when there is no Prometheus
configured
* Fixed recent regression that caused multicluster on EKS to not work properly
* Changed the `check` command to warn instead of error when webhook certificates
are near expiry
* Added the `--ingress` flag to the `inject` command which adds the recently
introduced `linkerd.io/inject: ingress` annotation
* Fixed issue with upgrades where external certs would be fetched and stored
even though this does not happen on fresh installs with externally created
certs
* Fixed issue with upgrades where the issuer cert expiration was being reset
* Removed the `--registry` flag from the `multicluster install` command
* Removed default CPU limits for the proxy and control plane components in HA
mode
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This change updates `FetchExternalIssuerData` to be more like
`FetchIssuerData` and return expiry correctly.
This field is currently not used anywhere and is just done for
consistency purposes.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Per #5165, Kubernetes does not necessarily limit the proxy's access to
cores via `cgroups` when a CPU limit is set. As of #5168, the proxy now
supports a `LINKERD2_PROXY_CORES` environment configuration that
augments CPU detection from the host operating system.
This change modifies the proxy injector to ensure that this environment
is configured from the `Values.proxy.cores` Helm value, the
`config.linkerd.io/proxy-cpu-limit` annotation, and the `--proxy-cpu-limit`
install flag.
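As an illustrative check (the workload name and label are placeholders), the annotation can be added to a pod template and the resulting proxy environment inspected:
```bash
# Add the CPU limit annotation to the pod template so the injector picks it up
kubectl patch deploy web -p \
  '{"spec":{"template":{"metadata":{"annotations":{"config.linkerd.io/proxy-cpu-limit":"1"}}}}}'
# Confirm the injected proxy container received LINKERD2_PROXY_CORES
kubectl get po -l app=web -o yaml | grep -A1 LINKERD2_PROXY_CORES
```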
As discussed in #5167 & #5169, Kubernetes CPU limits are not necessarily
discoverable from within the pod. This means that the control plane
processes may allocate far more threads than can actually be used by the
process given its process limits.
This change removes the default CPU limits for all control plane
components. CPU limits may still be set via Helm configuration.
Now that the proxy can use more than one core, this behavior should be
enabled by default, even in HA mode.
This change modifies the default HA Helm values to unset the CPU limit
for proxy containers.
This release adds support for the LINKERD2_PROXY_CORES environment
variable. When set, the value may limit the proxy's runtime resources
so that it does not allocate a thread per core available from the host
operating system.
---
* inbound: use MakeSwitch for loopback (linkerd/linkerd2-proxy#729)
* buffer: Remove readiness watch (linkerd/linkerd2-proxy#731)
* Allow specifying the number of available cores via the env (linkerd/linkerd2-proxy#733)
After the 2.9 multicluster refactoring, the only workload installed by
`linkerd mc install` is the nginx gateway, whose Docker image is
configured through the flags `--gateway-nginx-image` and
`--gateway-nginx-image-version`. Thus there's no longer a need for the
`--registry` flag, which is, on the other hand, still used by `linkerd mc link`, which deploys the service mirror.
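For illustration (the cluster contexts and registry value are placeholders), the split now looks roughly like this:
```bash
# The gateway image is controlled by the nginx-specific flags on install...
linkerd --context=target multicluster install | kubectl --context=target apply -f -
# ...while --registry remains on link, which renders the per-cluster service mirror
linkerd --context=target multicluster link --cluster-name target --registry ghcr.io/linkerd \
  | kubectl --context=source apply -f -
```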