This edge release reduces memory consumption of Linkerd proxies which maintain
many idle connections (such as Prometheus). It also removes some obsolete
commands from the CLI and allows setting custom annotations on multicluster
gateways.
* Reduced the default idle connection timeout to 5s for outbound clients and
20s for inbound clients to reduce the proxy's memory footprint, especially on
Prometheus instances
* Added support for setting annotations on the multicluster gateway in Helm
which allows setting the load balancer as internal (thanks @shaikatz!)
* Removed the `get` and `logs` commands from the CLI
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #5191
The `logs` command adds an external dependency that we forked to make
work, but it does not fit within Linkerd's core set of responsibilities.
Hence, it is being removed.
For capabilities like this, the Kubernetes plugin ecosystem has better
and well-maintained tools that can be used.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes #5190
`linkerd get` is not currently used and works only for pods, so it can
be removed as per the issue. This branch removes the command along with
the associated unit and integration tests.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The default job timeout is 6 hours! This allows runaway builds to
consume our Actions resources unnecessarily.
This change limits integration test jobs to 30 minutes. Static checks
are limited to 10 minutes.
* Update BUILD.md with multiarch stuff and some extras
Adds to `BUILD.md` a new section `Publishing images` explaining the
workflow for testing custom builds.
Also updates and gives more precision to the section `Building CLI for
development`.
Finally, a new `Multi-architecture builds` section is added.
This PR also removes `SUPPORTED_ARCHS` from `bin/docker-build-cli`,
which is no longer used.
Note I'm leaving some references to Minikube. I might change that in a
separate PR to point to k3d if we manage to migrate the KinD stuff to
k3d.
This release modifies the default idle timeout to 5s for outbound
clients and 20s for inbound clients. This prevents idle clients from
consuming memory at the cost of performing more discovery resolutions
for periodic but infrequent traffic. This is intended to reduce the
proxy's memory footprint, especially on Prometheus instances.
The proxy's *ring* and rustls dependencies have also been updated.
---
* Update *ring* and rustls dependencies (linkerd/linkerd2-proxy#735)
* http: Configure client connection pools (linkerd/linkerd2-proxy#734)
* Changes for `stable-2.9.0`
Only user-facing items are mentioned. Where previous edge release notes
contained a summary of a change, I preferred using that summary instead
of the more technical bullet point. Given the large list of items, I
separated them into sections for easier digestion. Also, I didn't repeat
the TCP mTLS stuff (nor ARM support) below in the bullet points, as they
were already well described in the summary.
## stable-2.9.0
This release extends Linkerd's zero-config mutual TLS (mTLS) support to all TCP
connections, allowing Linkerd to transparently encrypt and authenticate all TCP
connections in the cluster the moment it's installed. It also adds ARM support,
introduces a new multi-core proxy runtime for higher throughput, adds support
for Kubernetes service topologies, and lots, lots more, as described below:
* Proxy
* Performed internal improvements for lower latencies under high concurrency
* Reduced performance impact of logging, especially when the `debug` or
`trace` log levels are disabled
* Improved error handling for DNS errors encountered when discovering control
plane addresses, which can be common during installation, before all
components have been started, allowing Linkerd to continue to operate
normally in HA during node outages
* Control Plane
* Added support for [topology-aware service
routing](https://kubernetes.io/docs/concepts/services-networking/service-topology/)
to the Destination controller; when providing service discovery updates to
proxies the Destination controller will now filter endpoints based on the
service's topology preferences
* Added support for the new Kubernetes
[EndpointSlice](https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/)
resource to the Destination controller; Linkerd can be installed with
`--enable-endpoint-slices` flag to use this resource rather than the
Endpoints API in clusters where this new API is supported
* Dashboard
* Added new Spanish translations (please help us translate into your
language!)
* Added new section for exposing multicluster gateway metrics
* CLI
* Renamed the `--addon-config` flag to `--config` to clarify this flag can be
used to set any Helm value
* Added fish shell completions to the `linkerd` command
* Multicluster
* Replaced the single `service-mirror` controller with separate controllers
that will be installed per target cluster through `linkerd multicluster
link`
* Changed the mechanism for mirroring services: instead of relying on
annotations on the target services, now the source cluster should specify
which services from the target cluster should be exported by using a label
selector
* Added support for creating multiple service accounts when installing
multicluster with Helm to allow more granular revocation
* Added a multicluster `unlink` command for removing multicluster links
* Prometheus
* Moved Linkerd's bundled Prometheus into an add-on (enabled by default); this
makes the Linkerd Prometheus more configurable, gives it a separate upgrade
lifecycle from the rest of the control plane, and will allow users to
disable the bundled Prometheus instance
* The long-awaited Bring-Your-Own-Prometheus case has finally been addressed:
added `global.prometheusUrl` to the Helm config to have Linkerd use an
external Prometheus instance instead of the one provided by default
* Added an option to persist data to a volume instead of memory, so that
historical metrics are available when Prometheus is restarted
* The helm chart can now configure persistent storage and limits
* Other
* Added a new `linkerd.io/inject: ingress` annotation and accompanying
`--ingress` flag to the `inject` command, to configure the proxy to support
service profiles and enable per-route metrics and traffic splits for HTTP
ingress controllers
* Changed the type of the injector and tap API secrets to `kubernetes.io/tls`
so they can be provisioned by cert-manager
* Changed default docker image repository to `ghcr.io` from `gcr.io`; **Users
who pull the images into private repositories should take note of this
change**
* Introduced support for authenticated docker registries
* Simplified the way that Linkerd stores its configuration; configuration is
now stored as Helm values in the `linkerd-config` ConfigMap
* Added support for Helm configuration of per-component proxy resource
requests
This release includes changes from a massive list of contributors. A special
thank-you to everyone who helped make this release possible: --long
list, see file --
* Fixed some bad copypasta
* Apply suggestions from code review
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This edge supersedes edge-20.10.6 as a release candidate for stable-2.9.0.
* Fixed issue where the `check` command would error when there is no Prometheus
configured
* Fixed recent regression that caused multicluster on EKS to not work properly
* Changed the `check` command to warn instead of error when webhook certificates
are near expiry
* Added the `--ingress` flag to the `inject` command which adds the recently
introduced `linkerd.io/inject: ingress` annotation
* Fixed issue with upgrades where external certs would be fetched and stored
even though this does not happen on fresh installs with externally created
certs
* Fixed issue with upgrades where the issuer cert expiration was being reset
* Removed the `--registry` flag from the `multicluster install` command
* Removed default CPU limits for the proxy and control plane components in HA
mode
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This change updates `FetchExternalIssuerData` to be more like
`FetchIssuerData` and return expiry correctly.
This field is currently not used anywhere; this is done purely for
consistency.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Per #5165, Kubernetes does not necessarily limit the proxy's access to
cores via `cgroups` when a CPU limit is set. As of #5168, the proxy now
supports a `LINKERD2_PROXY_CORES` environment configuration that
augments CPU detection from the host operating system.
This change modifies the proxy injector to ensure that this environment
is configured from the `Values.proxy.cores` Helm value, the
`config.linkerd.io/proxy-cpu-limit` annotation, and the `--proxy-cpu-limit`
install flag.
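The rounding involved can be sketched as follows; a minimal Go illustration with a hypothetical helper (the real injector works with Kubernetes resource quantities, not raw strings):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// coresFromCPULimit rounds a Kubernetes CPU quantity (e.g. "1500m" or
// "2") up to a whole number of cores, suitable for an env var such as
// LINKERD2_PROXY_CORES. Illustrative sketch only.
func coresFromCPULimit(limit string) (int, error) {
	if strings.HasSuffix(limit, "m") {
		millis, err := strconv.Atoi(strings.TrimSuffix(limit, "m"))
		if err != nil {
			return 0, err
		}
		return (millis + 999) / 1000, nil // round up to a whole core
	}
	return strconv.Atoi(limit)
}

func main() {
	for _, l := range []string{"1500m", "250m", "2"} {
		c, _ := coresFromCPULimit(l)
		fmt.Printf("%s -> %d\n", l, c)
	}
}
```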
As discussed in #5167 & #5169, Kubernetes CPU limits are not necessarily
discoverable from within the pod. This means that the control plane
processes may allocate far more threads than can actually be used by the
process given its process limits.
This change removes the default CPU limits for all control plane
components. CPU limits may still be set via Helm configuration.
Now that the proxy can use more than one core, this behavior should be
enabled by default, even in HA mode.
This change modifies the default HA helm values to unset the cpu limit
for proxy containers.
This release adds support for the LINKERD2_PROXY_CORES environment
variable. When set, the value may limit the proxy's runtime resources
so that it does not allocate a thread per core available from the host
operating system.
---
* inbound: use MakeSwitch for loopback (linkerd/linkerd2-proxy#729)
* buffer: Remove readiness watch (linkerd/linkerd2-proxy#731)
* Allow specifying the number of available cores via the env (linkerd/linkerd2-proxy#733)
After the 2.9 multicluster refactoring, the only workload installed by
`linkerd mc install` is the nginx gateway, whose docker image is
configured through the flags `--gateway-nginx-image` and
`--gateway-nginx-image-version`. Thus there's no longer any need for the
`--registry` flag, which is, on the other hand, used by `linkerd mc
link`, which deploys the service mirror.
Currently, for legacy upgrades we fetch even externally created certs
and use them during the upgrade, which contradicts the condition at
https://github.com/linkerd/linkerd2/blob/master/cli/cmd/options.go#L550
used during install, and thus causes errors.
Instead, we no longer retrieve them during upgrades, so they do not get
stored into the config and secrets. This seems correct, as we do not
want to store certs in the config and use them during upgrades when they
are created externally.
This touches only the upgrade path, i.e. `fetchIssuers`, and does not
affect the retrieval of external certs for checks, etc.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
With legacy upgrades, we can parse the cert and store the expiry
correctly instead of storing the default value, which could be a problem
when we use that field. Currently, we do not use this field, so it did
not cause any problems.
On installs from the latest edges, this field is correctly set and works
as expected; thus, upgrades also get the right value.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* charts: Do not store .component in linkerd-config
This removes the `.component` fields from `Values.go` and also prevents
them from being emitted into `linkerd-config` by attaching them to a
temporary variable during injection.
This also simplifies the inbound and outbound skip-ports Helm logic and
adds quotes to those values.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* cli: add `--ingress` flag to inject cmd
This PR adds a new inject flag called `--ingress` which, when enabled,
adds a new annotation, i.e. `linkerd.io/inject: ingress`.
In the `--manual` case the annotation is not applied; instead, the
environment variable is set directly.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes #5149
Before:
```
linkerd-webhooks-and-apisvc-tls
-------------------------------
× tap API server has valid cert
certificate will expire on 2020-10-28T20:22:32Z
see https://linkerd.io/checks/#l5d-tap-cert-valid for hints
```
After:
```
linkerd-webhooks-and-apisvc-tls
-------------------------------
√ tap API server has valid cert
‼ tap API server cert is valid for at least 60 days
certificate will expire on 2020-10-28T20:22:32Z
see https://linkerd.io/checks/#l5d-webhook-cert-not-expiring-soon for hints
√ proxy-injector webhook has valid cert
‼ proxy-injector cert is valid for at least 60 days
certificate will expire on 2020-10-29T18:17:03Z
see https://linkerd.io/checks/#l5d-webhook-cert-not-expiring-soon for hints
√ sp-validator webhook has valid cert
‼ sp-validator cert is valid for at least 60 days
certificate will expire on 2020-10-28T20:21:34Z
see https://linkerd.io/checks/#l5d-webhook-cert-not-expiring-soon for hints
```
Signed-off-by: Alex Leong <alex@buoyant.io>
`linkerd mc link` wasn't properly setting the `gatewayAddresses` field
when the address had a `Hostname` field instead of `Ip`, as is the case
for EKS services of type LoadBalancer.
Fixes #5143
The availability of Prometheus is useful for some calls in the
public-api that the check uses. This change updates ListPods in the
public-api to still return the pods even when Prometheus is not
configured.
For a test that exclusively checks Prometheus metrics, we have a gate
which checks if a Prometheus is configured and skips the test otherwise.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Use errors.Is instead of checking underlying err messages
Fixes #5132
This PR replaces the usage of `strings.HasSuffix` with `errors.Is`
wherever error messages are being checked, so that the code is not
affected by changes in the underlying message. Also adds a string
constant for the http2 response-body-closed error.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* docs: Update external prom and grafana readme
Updates `Values.yaml` to clarify reverse proxy configuration with
external Grafana instances.
Also adds `global.prometheusUrl` and `global.grafanaUrl` to the charts'
`README`.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Restrict controlPlaneTracing field only to control plane components
Previously, `global.controlPlaneTracing` was not available during
injection and thus did not affect it.
This commit creates a new method that checks whether controlPlaneTracing
is enabled and applies the defaults if it is. This is done on duplicated
values, ensuring the setting is still propagated into
`linkerd-config`.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This edge supersedes edge-20.10.5 as a release candidate for
stable-2.9.0. It adds a new `linkerd.io/inject: ingress` annotation to
support service profiles and enable per-route metrics and traffic splits
for HTTP ingress controllers.
* Added a new `linkerd.io/inject: ingress` annotation to configure the
proxy to support service profiles and enable per-route metrics and
traffic splits for HTTP ingress controllers
* Reduced performance impact of logging in the proxy, especially when
the `debug` or `trace` log levels are disabled
* Fixed spurious warnings logged by the `linkerd profile` CLI command
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Fixes #5121
* cli: skip emitting warnings in Profile
Whenever the tapDuration completes, a warning occurs that we now skip
emitting. This appears to have changed in the latest versions of the
dependency.
* Use context.withDeadline instead of client.timeout
The usage of `client.Timeout` is not working correctly, causing `W1022
17:20:12.372780 19049 transport.go:260] Unable to cancel request for
promhttp.RoundTripperFunc` to be emitted by the Kubernetes client.
This is fixed by using context.WithDeadline and passing the resulting
context into the http Request.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes #5118
This PR adds a new supported value for the `linkerd.io/inject` annotation. In addition to `enabled` and `disabled`, this annotation may now be set to `ingress`. This functions identically to `enabled`, but also causes the `LINKERD2_PROXY_INGRESS_MODE="true"` environment variable to be set on the proxy, making the proxy operate in ingress mode as described in #5118.
With this set, ingresses are able to properly load service profiles based on the `l5d-dst-override` header.
Signed-off-by: Alex Leong <alex@buoyant.io>
This release adds an 'ingress mode' to support per-request routing for
HTTP ingresses.
Additionally, the performance impact of logging should be reduced,
especially when the proxy log level is not set to `debug` or `trace`.
---
* router: Use NewService instead of MakeService (linkerd/linkerd2-proxy#724)
* outbound: Split TCP stack into dedicated modules (linkerd/linkerd2-proxy#725)
* trace: update `tracing-subscriber` to 0.2.14 (linkerd/linkerd2-proxy#726)
* outbound: Extract HTTP and server modules (linkerd/linkerd2-proxy#727)
* outbound: Introduce 'ingress mode' (linkerd/linkerd2-proxy#728)
* Reduce tracing spans to the debug level (linkerd/linkerd2-proxy#730)
gcr.io has an issue where it's not possible to update multi-arch images
(see eclipse/che#16983 and open-policy-agent/gatekeeper#665).
We now rely on ghcr.io instead, which I verified doesn't have this bug,
so we can stop skipping these pushes.
This used to be triggered only for stable releases, but now that the 2.9
stable release approaches, let's turn it on for the upcoming RCs.
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
Fixes #5098
When setting up multicluster, a target cluster may wish to create multiple service accounts to be used by source clusters' service mirrors. This allows the target cluster to individually revoke access to each of the source clusters. When using the Linkerd CLI, this can be accomplished by running the `linkerd multicluster allow` command multiple times to create multiple service accounts. However, there is no analogous workflow when installing with Helm.
We update the Helm templates to support interpreting the `remoteMirrorServiceAccountName` value as either a single string or a list of strings. In the case where it is a list, we create a service account and associated RBAC for each entry in the list.
Signed-off-by: Alex Leong <alex@buoyant.io>
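The string-or-list interpretation can be sketched in Go; the Helm templates do the analogous type check in template logic, and this helper is purely illustrative:

```go
package main

import "fmt"

// normalizeAccounts interprets a value that may be either a single
// string or a list of strings, mirroring how the templates handle
// remoteMirrorServiceAccountName. Illustrative sketch only.
func normalizeAccounts(v interface{}) []string {
	switch t := v.(type) {
	case string:
		return []string{t}
	case []interface{}:
		out := make([]string, 0, len(t))
		for _, e := range t {
			if s, ok := e.(string); ok {
				out = append(out, s)
			}
		}
		return out
	default:
		return nil
	}
}

func main() {
	// Single account and multiple per-source-cluster accounts.
	fmt.Println(normalizeAccounts("linkerd-service-mirror-remote-access"))
	fmt.Println(normalizeAccounts([]interface{}{"src-a", "src-b"}))
}
```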
Follow-up to #5100
We had both `controllerImageVersion` and `global.controllerImageVersion`
configs, but only the latter was taken into account in the chart
templates, so this change removes all references to the former.
In #5110 the `global.proxy.destinationGetNetworks` configuration is
renamed to `global.clusterNetworks` to better reflect its purpose.
The `config.linkerd.io/proxy-destination-get-networks` annotation allows
this configuration to be overridden per-workload, but there's no real use
case for this. I don't think we want to support this value differing
between pods in a cluster. No good can come of it.
This change removes support for the `proxy-destination-get-networks`
annotation.
In order for the integration tests to run successfully on a dedicated ARM cluster, two small changes are necessary:
* We need to skip the multicluster test since this test uses two separate clusters (source and target)
* We need to properly uninstall the multicluster helm chart during cleanup.
With these changes, I was able to successfully run the integration tests on a dedicated ARM cluster.
Signed-off-by: Alex Leong <alex@buoyant.io>
There is no longer a proxy config `DESTINATION_GET_NETWORKS`. Instead of
reflecting this implementation detail in our values.yaml, this change
renames the variable to the more general `clusterNetworks` to emphasize
its similarity to `clusterDomain` for the purposes of discovery.
The proxy no longer honors DESTINATION_GET variables, as profile lookups
inform when endpoint resolution is performed. Also, there is no longer
a router capacity limit.
As described in #5105, it's not currently possible to set the proxy log
level to `off`. The proxy injector's template does not quote the log
level value, and so the `off` value is handled as `false`. Thanks, YAML.
This change updates the proxy template to use helm's `quote` function
throughout, replacing manually quoted values and fixing the quoting for
the log level value.
We also remove the default logFormat value, as the default is specified
in values.yaml.
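The failure mode and fix can be illustrated with Go's text/template, which Helm builds on; the `quote` function here is a stand-in for Helm's:

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderEnv renders a proxy env entry. Without quoting, a log level of
// `off` would be parsed by YAML as the boolean false; quoting keeps it
// a string. Minimal sketch, not the actual chart template.
func renderEnv(logLevel string) string {
	funcs := template.FuncMap{"quote": func(s string) string {
		return `"` + s + `"`
	}}
	tmpl := template.Must(template.New("env").Funcs(funcs).Parse(
		"- name: LINKERD2_PROXY_LOG\n  value: {{ .LogLevel | quote }}\n"))
	var b strings.Builder
	_ = tmpl.Execute(&b, map[string]string{"LogLevel": logLevel})
	return b.String()
}

func main() {
	fmt.Print(renderEnv("off"))
}
```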
Currently the tracing deployments do not start on clusters where
restricted PodSecurityPolicies are enforced.
This PR adds the subchart's ServiceAccounts to the `linkerd-psp`
RoleBinding, thereby allowing the deployments to be scheduled.
Signed-off-by: Simon Weald <glitchcrab-github@simonweald.com>