This edge release removes TrafficSplits from the Linkerd dashboard and fixes a
number of issues in the policy controller.
* Removed the TrafficSplit page from the Linkerd viz dashboard
* Fixed an issue where the policy controller was not returning the correct
status for non-Service authorities
* Fixed an issue where the policy controller could use large amounts of CPU
when lease API calls failed
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-23.3.1
This edge release continues to build support under the hood for the upcoming
features in 2.13. Also included are several dependency updates and less verbose
logging.
* Removed dependency on the `curlimages/curl` 3rd-party image used to initialize
extension namespaces' metadata (so they are visible to `linkerd check`),
replaced by the new `extension-init` image
* Lowered non-actionable error messages in the Destination log to debug-level
entries to avoid triggering false alarms (thanks @siddharthshubhampal!)
Fixes #9985
When installing extensions via Helm, the `namespace-metadata` job adds the following metadata to its extension namespace:
- `linkerd.io/extension` label to have `linkerd check` identify the extension
- `pod-security.kubernetes.io/enforce` label for Pod Security Admission, whose value depends on whether linkerd-cni is enabled
- for viz only, the `viz.linkerd.io/external-prometheus` annotation, if using an external Prometheus instance
The job uses a `curlimages/curl` docker image and performs those mutations through an inline shell script, which has two downsides:
- Security scanner warnings are sometimes triggered because of outdated binaries in there that we're not using
- The limitations of the shell environment don't allow for a clear and maintainable script
This change replaces the `curlimages/curl` image in the `namespace-metadata` jobs with the new
`linkerd-extension-init` image currently being worked on in
linkerd/linkerd-extension-init#2.
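For reference, the mutations the job performs are roughly equivalent to the following commands (an illustrative sketch for the viz extension; the namespace, label values, and Prometheus URL are assumptions):

```bash
# Label the extension namespace so `linkerd check` can find it
kubectl label ns linkerd-viz linkerd.io/extension=viz
# Set the Pod Security Admission level (value depends on whether linkerd-cni is enabled)
kubectl label ns linkerd-viz pod-security.kubernetes.io/enforce=privileged
# viz only: record the external Prometheus instance, if one is configured
kubectl annotate ns linkerd-viz viz.linkerd.io/external-prometheus=http://prometheus.example.com:9090
```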
This edge release includes a number of fixes and introduces a new CLI command,
`linkerd prune`. The new `prune` command should be used to remove resources
which are no longer part of the Linkerd manifest when doing an upgrade.
Previously, the recommendation was to use `linkerd upgrade` in conjunction with
`kubectl apply --prune`; however, that approach will not remove resources whose
kinds are not part of the input manifest, nor will it detect cluster-scoped
resources. `linkerd prune` (included in all core extensions) should be preferred
over it.
Additionally, this change contains a few fixes from our external contributors,
and a change to the `viz` Helm chart which allows for arbitrary annotations on
`Service` objects. Last but not least, the release contains a few proxy
internal changes to prepare for the new client policy API.
* Added a new `linkerd prune` command to the CLI (including extensions) to
remove resources which are no longer part of Linkerd's manifests
* Introduced new values in the `viz` chart to allow for arbitrary annotations
on the `Service` objects (thanks @sgrzemski!)
* Fixed up a comment in k8s API wrapper (thanks @ductnn!)
* Fixed an issue with EndpointSlice endpoint reconciliation on slice deletion;
when using more than one slice, a `NoEndpoints` event would be sent to the
proxy regardless of the number of endpoints that were still available (thanks
@utay!)
Signed-off-by: Matei David <matei@buoyant.io>
Fixes: #10262
When a resource is removed from the Linkerd manifests from one version to the next, we would like that resource to be removed from the user's cluster as part of the upgrade process. Our current recommendation is to use the `linkerd upgrade` command in conjunction with the `kubectl apply` command and the `--prune` flag to remove resources which are no longer part of the manifest. However, `--prune` has many shortcomings and does not detect resource kinds which are not part of the input manifest, nor does it detect cluster-scoped resources. See https://linkerd.io/2.12/tasks/upgrade/#with-the-linkerd-cli
We add a `linkerd prune` command which locates all Linkerd resources on the cluster which are not part of the Linkerd manifest and prints their metadata so that users can delete them. The recommended upgrade procedure would then be:
```
> linkerd upgrade | kubectl apply -f -
> linkerd prune | kubectl delete -f -
```
Users must take special care to use the desired version of the CLI to run the prune command, since running this command will print all resources on the cluster which are not included in that version.
We also add similar prune commands to each of the `viz`, `multicluster`, and `jaeger` extensions for deleting extension resources which are not in the extension manifest.
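For an extension, the same flow would presumably look like this (using viz as an example):

```
> linkerd viz install | kubectl apply -f -
> linkerd viz prune | kubectl delete -f -
```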
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-23.2.2
This edge release adds the policy status controller which writes the `status`
field to HTTPRoutes when a parent reference Server accepts or rejects the
HTTPRoute. This field is currently not consumed by the policy controller, but
acts as the first step for considering HTTPRoute `status` when serving policy.
Additionally, the destination controller now uses the Kubernetes metadata API
for resources where it only needs to track metadata (Nodes and ReplicaSets).
For all other resources it tracks, it needs more than the metadata, so it
continues to use the full API as before.
* Fixed error message to include the colliding Server in the policy controller's
admission webhook validation
* Updated wording for linkerd-multicluster when it fails to probe a
remote gateway mirror
* Removed unnecessary Namespaces access from the destination controller RBAC
* Added Kubernetes metadata API in the destination controller for watching Nodes
and ReplicaSets
* Fixed QueryParamMatch parsing for HTTPRoutes
* Added the policy status controller which writes the `status` field to
HTTPRoutes when a parent reference Server accepts or rejects it
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Closes #10058
The Helm value `proxy.waitBeforeExitSeconds` introduces a pause after the pod receives the shutdown signal, and it was intended for pods in the data-plane whose main container needs to perform shutdown-time operations that require the network. Linkerd's control-plane pods don't require that*.
Additionally, if such shutdown operations take longer than 30s, then the user needs to set the pod's `terminationGracePeriod` (whose default is 30s) to be greater than `proxy.waitBeforeExitSeconds` to avoid the kubelet killing the pod before the operations complete. We don't expose `terminationGracePeriod` as a parameter for Linkerd's pods, so this scenario results in an error such as this:
```
Exec lifecycle hook ([/bin/sleep 40]) for Container "linkerd-proxy" in Pod "linkerd-destination-9559586c5-g9jns_linkerd(e33e8d02-66ca-42fa-9a7c-0ea45bda814a)" failed - error: command '/bin/sleep 40' exited with 137: , message: ""
```
For these two reasons, this change disables the `proxy.waitBeforeExitSeconds` setting for the linkerd pods, either by overriding it at the template level (for core control-plane pods) or through an annotation (for extension pods).
(*) The Viz and Jaeger extensions don't require the network during shutdown either. The Multicluster extension already exposes a setting for `terminationGracePeriod`, so this change doesn't affect this particular extension.
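As a sketch of the rule of thumb above for a data-plane workload that does need the wait (the deployment name and values are illustrative; the annotation name is taken from Linkerd's graceful-shutdown docs):

```bash
# Give the proxy a 40s pause after SIGTERM (annotation goes on the pod template
# so the injector picks it up)...
kubectl patch deploy/myapp --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"config.alpha.linkerd.io/proxy-wait-before-exit-seconds":"40"}}}}}'
# ...and make sure terminationGracePeriodSeconds (default 30s) outlives that wait:
kubectl patch deploy/myapp --type merge -p \
  '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":60}}}}'
```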
Fixes #10150
When we added PodSecurityAdmission in #9719 (included in edge-23.1.1), we added
the entry `seccompProfile.type=RuntimeDefault` to the containers' SecurityContext.
For PSP to accept that, we need to add the annotation
`seccomp.security.alpha.kubernetes.io/allowedProfileNames:
"runtime/default"` to the PSP resource, which also implies we need
to add the entry `seccompProfile.type=RuntimeDefault` to the pod's
SecurityContext as well, not just the container's.
It also turns out the `namespace-metadata` Jobs used by extensions for
the helm installation method didn't have their ServiceAccount properly
bound to the PSP resource. This resulted in the `helm install` command
failing, and although the extension resources did get deployed, they
were not discoverable by `linkerd check`. This change fixes that
as well, which had been broken since 2.12.0!
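A quick way to verify both levels on a running control-plane pod (label selector and namespace per a default install; the jsonpath expressions are illustrative):

```bash
# Pod-level seccompProfile (added by this change); expect "RuntimeDefault"
kubectl -n linkerd get pod -l linkerd.io/control-plane-component=destination \
  -o jsonpath='{.items[0].spec.securityContext.seccompProfile.type}'
# Container-level seccompProfile (added back in #9719); expect "RuntimeDefault"
kubectl -n linkerd get pod -l linkerd.io/control-plane-component=destination \
  -o jsonpath='{.items[0].spec.containers[0].securityContext.seccompProfile.type}'
```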
This edge release introduces a number of fixes and changes to the proxy.
The proxy has been updated to initialize routes lazily, which means service
profile routes will now only show up in the metrics when a route is used. In
the extensions, old (`ServerAuthorization`) resources have been converted to
`AuthorizationPolicy` -- as part of this change, redundant policy resources
have been cleaned up. A bug in the destination controller that could
potentially lead to stale pods being considered in the load balancer has been
fixed; operations that could previously result in this behavior are now
infallible. Support has been added for `Pod Security Admission`, used instead
of `Pod Security Policy`; as part of this change, some of the extension charts
have been modified to include a `cniEnabled` flag that will impact the policy
used.
Finally, this edge release contains a number of fixes and improvements
from our contributors.
* Converted `ServerAuthorization` resources to `AuthorizationPolicy` resources
in Linkerd extensions
* Removed policy resources bound to admin servers in extensions (previously
these resources were used to authorize probes but now are authorized by
default)
* Added a `resources` field in the linkerd-cni chart (thanks @jcogilvie!)
* Fixed an issue in the CLI where `--identity-external-ca` would set an
incorrect field (thanks @anoxape!)
* Fixed an issue in the destination controller that could result in stale
endpoints when using EndpointSlice objects. Logic that previously resulted in
undefined behavior is now infallible and endpoints will no longer be skipped
during removal
* Added namespace to namespace-metadata resources in Helm (thanks @joebowbeer!)
* Added support for Pod Security Admission (supersedes PSPs); through this
change extensions now have a `cniEnabled` value in their charts that will
directly influence which PSA policy to use
* Changed routes to be initialized lazily. Service Profile routes will no
longer show up in metrics until the route is used (default routes are always
available when no Service Profile is defined for a service)
* Changed the proxy's behavior when traffic splitting so that only services
that are not in failfast are used. This will enable the proxy to manage
failover without external coordination
* Updated tokio (async runtime) in the proxy which should reduce CPU usage,
especially for the proxy's pod-local (i.e. same network namespace)
communication
Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
The Linkerd extension charts use ServerAuthorization resources. AuthorizationPolicies are now the recommended resource to use in place of ServerAuthorizations. We replace all of the ServerAuthorization resources in the Linkerd extension charts with AuthorizationPolicy resources.
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #9364
Since probes are automatically authorized, Linkerd extensions no longer need admin Server resources in order for probes to be authorized. We therefore remove them.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Removed dupe imports
My IDE (vim-gopls) has been complaining for a while, so I decided to take
care of it. Found via
[staticcheck](https://github.com/dominikh/go-tools)
* Add stylecheck to go-lint checks
Closes #9676
This adds the `pod-security.kubernetes.io/enforce` label as described in [Pod Security Admission labels for namespaces](https://kubernetes.io/docs/concepts/security/pod-security-admission/#pod-security-admission-labels-for-namespaces).
PSA gives us three different possible values (policies or modes): [privileged, baseline and restricted](https://kubernetes.io/docs/concepts/security/pod-security-standards/).
For non-CNI mode, the proxy-init container relies on granting the NET_RAW and NET_ADMIN capabilities, which places those pods under the `privileged` policy. On the other hand, for CNI mode we can enforce the `restricted` policy by setting some defaults on the containers' `securityContext`, as done in this PR.
Note this change also adds the `cniEnabled` entry in the `values.yaml` file for all the extension charts, which determines what policy to use.
Final note: this includes the fix from #9717, otherwise an empty gateway UID prevents the pod from being created under the `restricted` policy.
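To see which policy ended up enforced on a namespace, one option is to check the label directly (the namespace name is an example; per the above, expect `privileged` without linkerd-cni and `restricted` with it):

```bash
kubectl get ns linkerd -o yaml | grep pod-security.kubernetes.io/enforce
```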
## How to test
As this is only enforced as of k8s 1.25, here are the instructions to run 1.25 with k3d using Calico as CNI:
```bash
# launch k3d with k8s v1.25, with no flannel CNI
$ k3d cluster create --image='+v1.25' --k3s-arg '--disable=local-storage,metrics-server@server:0' --no-lb --k3s-arg --write-kubeconfig-mode=644 --k3s-arg --flannel-backend=none --k3s-arg --cluster-cidr=192.168.0.0/16 --k3s-arg '--disable=servicelb,traefik@server:0'
# install Calico
$ k apply -f https://k3d.io/v5.1.0/usage/advanced/calico.yaml
# load all the images
$ bin/image-load --k3d proxy controller policy-controller web metrics-api tap cni-plugin jaeger-webhook
# install linkerd-cni
$ bin/go-run cli install-cni|k apply -f -
# install linkerd-crds
$ bin/go-run cli install --crds|k apply -f -
# install linkerd-control-plane in CNI mode
$ bin/go-run cli install --linkerd-cni-enabled|k apply -f -
# Pods should come up without issues. You can also try the viz and jaeger extensions.
# Try removing one of the securityContext entries added in this PR, and the Pod
# won't come up. You should be able to see the PodSecurity error in the associated
# ReplicaSet.
```
To test the multicluster extension using CNI, check this [gist](https://gist.github.com/alpeb/4cbbd5ad87538b9e0d39a29b4e3f02eb) with a patch to run the multicluster integration test with CNI in k8s 1.25.
## edge-22.12.1
This edge release introduces static and dynamic port overrides for CNI eBPF
socket-level load balancing. In certain installations where CNI plugins run in
eBPF mode, socket-level load balancing rewrites packet destinations to port
6443; as with port 443 already, this port is now skipped on control plane
components so that they can communicate with the Kubernetes API before their
proxies are running.
Additionally, a potential panic and false warning have been fixed in the
destination component.
* Updated linkerd-jaeger's collector component to expose port 4318 in order to
support HTTP alongside gRPC (thanks @uralsemih!)
* Introduced the `privileged` configuration which allows the `proxy-init`
container to run as privileged without also running as root
* Fixed a potential panic in the destination component caused by concurrent
writes when dealing with Endpoint updates
* Fixed false warning when looking up HostPort mappings on Pods
* Added static and dynamic port overrides for CNI eBPF to work with socket-level
load balancing
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Added port 4318 to the collector (supporting HTTP alongside the existing gRPC port 4317), and a corresponding Server entry. Also updated the opentelemetry-collector image version.
Linkerd's otel-collector did not expose port 4318, which is needed by frontends such as React.
Signed-off-by: Semih Ural <uralsmh@gmail.com>
* edge-22.11.3 change notes
Besides the notes, this corrects a small point in `RELEASE.md`, and
bumps the proxy-init image tag to `v2.1.0`. Note that the entry under
`go.mod` wasn't bumped because moving it past v2 requires changes to
`linkerd2-proxy-init`'s `go.mod` file, and we're going to drop that
dependency soon anyway. Finally, all the charts got their patch version
bumped, except for `linkerd2-cni` that got its minor bumped because of
the tolerations default change.
## edge-22.11.3
This edge release fixes connection errors to pods using a `hostPort` different
than their `containerPort`. Also the `network-validator` init container improves
its logging, and the `linkerd-cni` DaemonSet now gets deployed in all nodes by
default.
* Fixed `destination` service to properly discover targets using a `hostPort`
different than their `containerPort`, which was causing 502 errors
* Upgraded the `network-validator` with better logging allowing users to
determine whether failures occur as a result of their environment or the tool
itself
* Added default `Exists` toleration to the `linkerd-cni` DaemonSet, allowing it
to be deployed in all nodes by default, regardless of taints
Co-authored-by: Oliver Gould <ver@buoyant.io>
## edge-22.11.2
This edge release introduces the use of the Kubernetes metadata API in the
proxy-injector and tap-injector components. This can reduce the IO and memory
footprint for those components as they now only need to track the metadata for
certain resources, rather than the entire resource itself. Similar changes will
be made for the destination component in an upcoming release.
* Bumped HTTP dependencies to fix a potential deadlock in HTTP/2 clients
* Changed the proxy-injector and tap-injector components to use the metadata API
which should result in less memory consumption
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
* Use metadata API in the proxy and tap injectors
Part of #9485
This adds a new `MetadataAPI` similar to the current `k8s.API` hosting informers, but backed by k8s' `metadatainformer` shared informers, which retrieves only the objects metadata, resulting in less memory consumption by its clients. Currently this is only implemented for the proxy and tap injectors. Usage by the destination controller will be implemented as a follow-up.
## Existing API enhancements
Shared objects and logic required by API and MetadataAPI have been moved to the new `k8s.go`, `api_resource.go` and `prometheus.go` files. That includes the `isValidRSParent()` function whose arg is now more generic.
## Unit tests
`/controller/k8s/api_test.go` now also instantiates a MetadataAPI, used in the augmented `TestGetObjects()` and `TestGetOwnerKindAndName()` tests. The `resources` struct was introduced to capture the common fields among tests and simplify `newMockAPI()`'s signature.
## Other Changes
The injector no longer watches for Pods. It only requires watching workloads that own resources (and also watch namespaces), so Pod is not required.
## Testing Memory Consumption
Install linkerd, inject emojivoto and check the injector memory consumption with `kubectl -n linkerd top pod linkerd-proxy-injector-xxx`. It'll start consuming about 16Mi. Then ramp up emojivoto's `voting` deployment replicas to 2000. After 5 minutes memory will stabilize around 32Mi using the current branch. Using the latest edge, it'll stabilize around 110Mi.
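A rough sketch of that test (the emojivoto manifest URL and deployment names are the upstream defaults and may differ in your environment):

```bash
# Install Linkerd and a meshed workload
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
curl -sL https://run.linkerd.io/emojivoto.yml | linkerd inject - | kubectl apply -f -

# Baseline memory for the injector (expect roughly 16Mi on this branch)
kubectl -n linkerd top pod -l linkerd.io/control-plane-component=proxy-injector

# Scale up to generate churn, wait ~5 minutes, then re-check (expect ~32Mi here
# vs ~110Mi on the latest edge)
kubectl -n emojivoto scale deploy/voting --replicas=2000
kubectl -n linkerd top pod -l linkerd.io/control-plane-component=proxy-injector
```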
## edge-22.11.1
This edge release ships a few fixes in Linkerd's dashboard and the
multicluster extension. Additionally, a regression has been fixed in the CLI
that blocked upgrades from versions older than 2.12.0, due to missing CRDs
(even if the CRDs were present in-cluster). Finally, the release includes
changes to the helm charts to allow for arbitrary (user-provided) labels on
Linkerd workloads.
* Fixed an issue in the CLI where upgrades from any version prior to
stable-2.12.0 would fail when using the `--from-manifest` flag
* Removed un-injectable namespaces, such as kube-system, from unmeshed resource
notifications in the dashboard (thanks @MoSattler!)
* Fixed an issue where the dashboard would respond to requests with 404 due to
wrong root paths in the HTML script (thanks @junnplus!)
* Removed the proxyProtocol field in the multicluster gateway policy; this has
the effect of changing the protocol from 'HTTP/1.1' to 'unknown' (thanks
@psmit!)
* Fixed the multicluster gateway UID when installing through the CLI; prior to
this change the 'runAsUser' field would be empty
* Changed the helm chart for the control plane and all extensions to support
arbitrary labels on resources (thanks @bastienbosser!)
Signed-off-by: Matei David <matei@buoyant.io>
## edge-22.10.3
This edge release adds `network-validator`, a new init container to be used when
CNI is enabled. `network-validator` ensures that local iptables rules are
working as expected. It will validate this before linkerd-proxy starts.
`network-validator` replaces the `noop` container, runs as `nobody`, and drops
all capabilities before starting.
* Validate CNI `iptables` configuration during pod startup
* Fix "cluster networks contains all services" fails with services with no
ClusterIP
* Remove kubectl version check from `linkerd check` (thanks @ziollek!)
* Set `readOnlyRootFilesystem: true` in viz chart (thanks @mikutas!)
* Fix `linkerd multicluster install` by re-adding `pause` container image
in chart
* Fix hardcoded image value in linkerd-viz's namespace-metadata.yml template
(thanks @bastienbosser!)
Signed-off-by: Steve Jenson <stevej@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
Co-authored-by: Matei David <matei@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
This edge release fixes an issue with CNI chaining that was preventing the
Linkerd CNI plugin from working with other CNI plugins such as Cilium. It also
includes several other fixes.
* Updated Grafana dashboards to use variable duration parameter so that they can
be used when Prometheus has a longer scrape interval (thanks @TarekAS)
* Fixed handling of .conf files in the CNI plugin so that the Linkerd CNI plugin
can be used alongside other CNI plugins such as Cilium
* Added a `linkerd diagnostics policy` command to inspect Linkerd policy state
* Added a check that ClusterIP services are in the cluster networks
* Added a noop init container to injected pods when the CNI plugin is enabled
to prevent certain scenarios where a pod can get stuck without an IP address
* Fixed a bug where the `config.linkerd.io/proxy-version` annotation could be empty
Signed-off-by: Alex Leong <alex@buoyant.io>
This edge release fixes some sections of the Viz dashboard appearing blank, and
adds an optional PodMonitor resource to the Helm chart to enable easier
integration with the Prometheus Operator. It also includes many fixes submitted
by our contributors.
* Fixed the dashboard sections Tap, Top, and Routes appearing blank (thanks
@MoSattler!)
* Added an optional PodMonitor resource to the main Helm chart (thanks
@jaygridley!)
* Fixed the CLI's `--api-addr` flag, that was being ignored (thanks @mikutas!)
* Expanded the `linkerd authz` command to display AuthorizationPolicy resources
that target namespaces (thanks @aatarasoff!)
* Fixed the `NotIn` label selector operator in the policy resources, that was
erroneously treated as `In`.
* Fixed warning logic around the "linkerd-viz ClusterRoles exist" and
"linkerd-viz ClusterRoleBindings exist" checks in `linkerd viz check`
* Fixed proxies emitting some duplicate inbound metrics
This release includes several control plane and proxy fixes for
`stable-2.12.0`. In particular, it fixes issues related to control plane
HTTP servers' header read timeouts resulting in decreased controller
success rates, lowers the inbound connection pool idle timeout in the
proxy, and fixes an issue where the jaeger injector would put pods into
an error state when upgrading from stable-2.11.x.
Additionally, this release adds the `linkerd.io/trust-root-sha256`
annotation to all injected workloads allowing predictable comparison of
all workloads' trust anchors via the Kubernetes API.
For Windows users, note that the Linkerd CLI's `nupkg` file for
Chocolatey is once again included in the release assets (it was
previously removed in stable-2.10.0).
* Proxy
* Lowered inbound connection pool idle timeout to 3s
* Control Plane
* Updated AdmissionRegistration API version usage to v1
* Added `linkerd.io/trust-root-sha256` annotation on all injected
workloads to indicate certificate bundle
* Updated fields in `AuthorizationPolicy` and `MeshTLSAuthentication`
to conform to specification (thanks @aatarasoff!)
* Updated the identity controller to not require a
`ClusterRoleBinding` to read all deployment resources
* Increased servers' header read timeouts so they no longer match
default probe and Prometheus scrape intervals
* Helm
* Restored `namespace` field in Linkerd helm charts
* Updated `PodDisruptionBudget` `apiVersion` from `policy/v1beta1` to
`policy/v1` (thanks @Vrx555!)
* Extensions
* Fixed jaeger injector interfering with upgrades to 2.12.x
This release fixes an issue where the jaeger injector would put pods into an
error state when upgrading from stable-2.11.x.
* Updated AdmissionRegistration API version usage to v1
* Fixed jaeger injector interfering with upgrades to 2.12.x
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-22.9.1
This release adds the `linkerd.io/trust-root-sha256` annotation to all injected
workloads allowing predictable comparison of all workloads' trust anchors via
the Kubernetes API.
Additionally, this release lowers the inbound connection pool idle timeout to
3s. This should help avoid socket errors, especially for Kubernetes probes.
* Added `linkerd.io/trust-root-sha256` annotation on all injected workloads
to indicate certificate bundle
* Lowered inbound connection pool idle timeout to 3s
* Restored `namespace` field in Linkerd helm charts
* Updated fields in `AuthorizationPolicy` and `MeshTLSAuthentication` to
conform to specification (thanks @aatarasoff!)
* Updated the identity controller to not require a `ClusterRoleBinding`
to read all deployment resources.
In #6635 (f9f3ebe), we removed the `Namespace` resources from the
linkerd Helm charts. But this change also removed the `namespace` field
from all generated metadata, adding conditional logic to only include it
when being installed via the CLI.
This conditional logic currently causes spurious whitespace in output
YAML. This doesn't cause problems but is aesthetically
inconsistent/distracting.
This change removes the `partials.namespace` helper and instead inlines
the value in our templates. This makes our CLI- and Helm-generated
manifests slightly more consistent and removes needless indirection.
Signed-off-by: Oliver Gould <ver@buoyant.io>
## edge-22.8.3
Increased control plane HTTP servers' read timeouts so that they no longer
match the default probe intervals. This was leading to closed connections
and decreased controller success rate.
Signed-off-by: Jeremy Chase <jeremy.chase@gmail.com>
Closes #9230

#9202 prepped the release candidate for `stable-2.12.0` by removing the `-edge`
suffix and adding the `-rc2` suffix.
This preps the chart versions for the stable release by removing that `-rc2`
suffix.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Prevent proxy injection for the namespace-metadata Helm hook Job pod, as the injected proxy would prevent the Job from terminating. See also #8194, which was closed but the problem still exists.
Fixes #8194
Signed-off-by: Alex Berger <alex-berger@gmx.ch>
Signed-off-by: Alexander Berger <alex.berger@nexxiot.com>
This release is the second release candidate for stable-2.12.0.
At this point the Helm charts can be retrieved from the stable repo:
```
helm repo add linkerd https://helm.linkerd.io/stable
helm repo up
helm install linkerd-crds -n linkerd --create-namespace linkerd/linkerd-crds
helm install linkerd-control-plane \
-n linkerd \
--set-file identityTrustAnchorsPEM=ca.crt \
--set-file identity.issuer.tls.crtPEM=issuer.crt \
--set-file identity.issuer.tls.keyPEM=issuer.key \
linkerd/linkerd-control-plane
```
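The `ca.crt`, `issuer.crt`, and `issuer.key` files referenced above can be generated in the usual way, for example with smallstep's `step` CLI (a sketch, assuming `step` is installed):

```bash
# Trust anchor (root CA)
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca --no-password --insecure
# Issuer certificate and key, signed by the trust anchor
step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \
  --profile intermediate-ca --not-after 8760h --no-password --insecure \
  --ca ca.crt --ca-key ca.key
```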
The following lists all the changes since edge-22.8.2:
* Fixed inheritance of the `linkerd.io/inject` annotation from Namespace to
Workloads when its value is `ingress`
* Added the `config.linkerd.io/default-inbound-policy: all-authenticated`
annotation to linkerd-multicluster’s Gateway deployment so that all clients
are required to be authenticated
* Added a `ReadHeaderTimeout` of 10s to all the Go `http.Server` instances, to
avoid being vulnerable to "slowloris" attacks
* Added check in `linkerd viz check --proxy` to warn in case namespaces have the
`config.linkerd.io/default-inbound-policy: deny` annotation, which would not
authorize scrapes coming from the linkerd-viz Prometheus instance
* Added validation for accepted values for the `--default-inbound-policy` flag
* Fixed invalid URL in the `linkerd install --help` output
* Added `--destination-pod` flag to `linkerd diagnostics endpoints` subcommand
* Added `proxyInit.runAsUser` in `values.yaml` defaulting to non-zero, to
complement the new default `proxyInit.runAsRoot: false` that was recently
changed
* Update the devcontainer to use Node 16
* Update markdownlint-cli2 to v0.5.1
* Update the markdown workflow to use a newer action
* Address various markdown linting issues
* Add a `just markdownlint` recipe
* Publish dev:v26
Signed-off-by: Oliver Gould <ver@buoyant.io>
This release is considered a release candidate for stable-2.12.0 and we
encourage you to try it out! It includes an update to the multicluster extension
which adds support for Kubernetes v1.24 and also updates many CLI commands to
support the new policy resources: AuthorizationPolicy and HTTPRoute.
* Updated linkerd check to allow RSA signed trust anchors (thanks @danibaeyens)
* Fixed some invalid yaml in the viz extension's tap-injector template (thanks @wc-s)
* Added support for AuthorizationPolicy and HttpRoute to viz authz command
* Added support for AuthorizationPolicy and HttpRoute to viz stat
* Added support for policy metadata in linkerd tap
* Fixed an issue where certain control plane components were not restarting as
necessary after a trust root rotation
* Added a ServiceAccount token Secret to the multicluster extension to support
Kubernetes versions >= v1.24
* Fixed an issue where the --default-inbound-policy setting was not being
respected
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-22.8.1
This release introduces default probe authorization. This means that on
clusters that use a default `deny` policy, probes do not have to be explicitly
authorized using policy resources. Additionally, the
`policyController.probeNetworks` Helm value has been added, which allows users
to configure the networks that probes are expected to be performed from.
Additionally, the `linkerd authz` command has been updated to support the policy
resources AuthorizationPolicy and HttpRoute.
Finally, some smaller changes include the ability to disable `linkerd-await` on
control plane components (using the existing `proxy.await` configuration) and
changing the default iptables mode back to `legacy` to support more cluster
environments by default.
* Updated the `linkerd authz` command to support AuthorizationPolicy and
HttpRoute resources
* Changed the `proxy.await` Helm value so that users can now disable
`linkerd-await` on control plane components
* Added default probe authorization, so clusters that use a default `deny`
policy no longer need to explicitly authorize probes
* Added ability to run the Linkerd CNI plugin in non-chained (stand-alone) mode
* Added the `policyController.probeNetworks` Helm value for configuring the
networks that probes are expected to be performed from
* Changed the default iptables mode to `legacy`
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
* edge-22.7.3
This release adds a new `nft` iptables mode, used by default in proxy-init.
When used, firewall configuration will be set up through the `iptables-nft`
binary; this should allow hosts that do not support `iptables-legacy` (such as
RHEL-based environments) to make use of the init container. The older
`iptables-legacy` mode is still supported, but it must be explicitly turned on.
Moreover, this release also replaces the `HTTPRoute` CRD with Linkerd's own
version, and includes a number of fixes and improvements.
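For clusters that still need the older behavior described above, keeping legacy mode is a one-line override at install time (a sketch; the Helm value name `proxyInit.iptablesMode` is an assumption based on this description):

```bash
linkerd install --set proxyInit.iptablesMode=legacy | kubectl apply -f -
```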
* Added a new `iptables-nft` mode for proxy-init. When running in this mode,
the firewall will be configured with the `nft` kernel API; this should allow
users to run the init container on RHEL-family hosts
* Fixed an issue where the proxy-injector would break when using `nodeAffinity`
values for the control plane
* Updated healthcheck to ignore `Terminated` state for pods (thanks
@AgrimPrasad!)
* Replaced the `HTTPRoute` CRD version from `gateway.networking.k8s.io` with a
similar version from the `policy.linkerd.io` API group. While the CRD is
similar, it does not support the `Gateway` type, does not contain the
`backendRefs` fields, and does not support `RequestMirror` and `ExtensionRef`
filter types.
* Updated the default policy controller log level to `info`; the controller
will now emit INFO level logs for some of its dependencies
* Added validation to ensure `HTTPRoute` paths are absolute; relative paths are
not supported by the proxy and the policy controller admission server will
reject any routes that use paths which do not start with `/`
Signed-off-by: Matei David <matei@buoyant.io>
Go 1.18 features a number of important changes, notably removing client
support for defunct TLS versions: https://tip.golang.org/doc/go1.18
This change updates our Go version in CI and development.
Signed-off-by: Oliver Gould <ver@buoyant.io>
Our devcontainer pins versions of all of the tools we need to build &
test the project, but the tool versions used in CI are not necessarily
kept in sync with those in our devcontainer.
This change introduces new variants of our devcontainer image that can
be pre-bundled with Go or Rust tooling (with fairly minimal container
images). Various CI workflows are updated to use the same tooling
versions that are used by our devcontainer, and a CI workflow is added
to ensure that these versions stay in sync. Some workflows are NOT
updated--especially those that invoke `docker`--since the docker
environment is severely limited when running inside of a container.
Furthermore, this change does the following:
* Update shellcheck to v0.8.0;
* Update `bin/shellcheck-all` to exclude irrelevant files (that are not
part of the project);
* Add `helm` and `helm-docs` to the devcontainer;
* Update `helm` to v3.9.1
* Update `helm-docs` to v1.11.0
* Include tools like `just`, `cargo-action-fmt`, and `cargo-nextest` in
our Rust image
* Add a `just` recipe that builds (and optionally publishes) the
appropriate devcontainer images
Signed-off-by: Oliver Gould <ver@buoyant.io>
This release adds support for per-route authorization policy using the
AuthorizationPolicy and HttpRoute resources. It also adds a configurable
shutdown grace period to the proxy which can be used to ensure that proxy
graceful shutdown completes within a certain time, even if there are outstanding
open connections.
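As a sketch of the new shutdown grace period knob (the workload name and duration are illustrative; the annotation is set on the pod template so the injector picks it up):

```bash
kubectl patch deploy/myapp --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"config.linkerd.io/shutdown-grace-period":"30s"}}}}}'
```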
* Removed kube-system exclusions from watchers to fix service discovery for
workloads in the kube-system namespace (thanks @JacobHenner)
* Added annotations to allow Linkerd extension deployments to be evicted by the
autoscaler when necessary
* Added missing port in the Linkerd viz chart documentation (thanks @haswalt)
* Added support for per-route policy by supporting AuthorizationPolicy resources
which target HttpRoute resources
* Fixed the `linkerd check` command crashing when unexpected pods are found in
a Linkerd namespace
* Added a `config.linkerd.io/shutdown-grace-period` annotation to configure the
proxy's maximum grace period for graceful shutdown
Signed-off-by: Alex Leong <alex@buoyant.io>
Closes #8916
When a random Pod (meshed or not) is created in the `linkerd`, `linkerd-viz`, or
`linkerd-jaeger` namespaces, their respective `check` subcommands can fail.
We parse Pod names for their owning Deployment by assuming the Pod name has a
randomized suffix. For example, the `linkerd-destination` Deployment creates the
`linkerd-destination-58c57dd675-7tthr` Pod. We split the name on `-` and take
the first two parts (`["linkerd", "destination"]`); those first two parts make
up the Deployment name.
Now, if a random Pod is created in the namespace with the name `test`, we apply
that same logic but hit a runtime error when trying to get the first two parts
of the split. `test` did not split at all since it contains no `-` and therefore
we error with `slice bounds out of range`.
To fix this, we now use the fact that all Linkerd components have a
`linkerd.io/control-plane-component` or `component` label with a value that is
the owning Deployment. This allows us to avoid any extra parsing logic and just
look at a single label value.
Additionally, some of these checks get all the Pods in a namespace with the
`GetPodsByNamespace` method but we don't always need something so general. In
the places where we are checking specifically for Linkerd components, we can
narrow this further by using the expected LabelSelector such as
`linkerd.io/extension=viz`.
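The label-based lookups the checks now rely on can be reproduced with kubectl (selector values are taken from the description above; namespaces are the defaults):

```bash
# Control-plane components are found via their component label...
kubectl -n linkerd get pods -l linkerd.io/control-plane-component=destination
# ...and extension components via their extension label
kubectl -n linkerd-viz get pods -l linkerd.io/extension=viz
```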
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Fixes: #8308
We add the `cluster-autoscaler.kubernetes.io/safe-to-evict: "true"` annotation to all Linkerd extension deployments. This signals that none of these deployments use persistent storage and they are all eligible for eviction if necessary.
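One way to confirm the annotation is in place on an extension's pods (the namespace and selector are examples):

```bash
kubectl -n linkerd-viz get pod -l linkerd.io/extension=viz -o yaml | grep safe-to-evict
```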
Signed-off-by: Alex Leong <alex@buoyant.io>
This release includes a security improvement. When a user manually specified the
`policyValidator.keyPEM` setting, the value was incorrectly included in the
`linkerd-config` configmap. This means that this private key was erroneously
exposed to service accounts with read access to this configmap. Practically,
this means that the Linkerd `proxy-injector`, `identity`, and `heartbeat` pods
could read this value. This should **not** have exposed this private key to
other unauthorized users unless additional role bindings were added outside of
Linkerd. Nevertheless, we recommend that users who manually set control plane
certificates update the credentials for the policy validator after upgrading
Linkerd.
Additionally, the linkerd-multicluster extension has several fixes related to
fail fast errors during link watch restarts, improper label matching for
mirrored services, and properly cleaning up mirrored endpoints in certain
situations.
Lastly, the proxy can now retry gRPC requests that have responses with a
TRAILERS frame. A fix to reduce redundant load balancer updates should also
result in less connection churn.
* Changed unit tests to use newly introduced `prommatch` package for asserting
expected metrics (thanks @krzysztofdrys!)
* Fixed Docker container runtime check to only run during `linkerd install` rather
than `linkerd check --pre`
* Changed linkerd-multicluster's remote cluster watcher to assume the gateway is
alive when starting, preventing fail fast errors during restarts
(thanks @chenaoxd!)
* Added `matchLabels` and `matchExpressions` to linkerd-multicluster's Link CRD
* Fixed linkerd-multicluster's label selector to properly select resources that
match the expected label value, rather than just the presence of the label
* Fixed linkerd-multicluster's cluster watcher to properly clean up endpoints
belonging to remote headless services that are no longer mirrored
* Added the HttpRoute CRD which will be used by future policy features
* Fixed CNI plugin event processing where file updates could sometimes be
skipped leading to the update not being acknowledged
* Fixed redundant load balancer updates in the proxy that could cause
unnecessary connection churn
* Fixed gRPC request retries for responses that contain a TRAILERS frame
* Fixed the dashboard's `linkerd check` due to missing RBAC for listing pods in
the cluster
* Fixed API check that ensures access to the Server CRD (thanks @aatarasoff!)
* Changed `linkerd authz` to match the labels of pre-fetched Pods rather than
the multiple API calls it was doing—resulting in significant speed-up (thanks
@aatarasoff!)
* Unset `policyValidator.keyPEM` in `linkerd-config` ConfigMap
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
This edge release bumps the minimum supported Kubernetes version from `v1.20`
to `v1.21`, introduces some new changes, and includes a few bug fixes. Most
notably, a bug has been fixed in the proxy's outbound load balancer that could
cause panics, especially when the balancer would process many service discovery
updates in a short period of time. This release also fixes a panic in the
proxy-injector and introduces a change that will include HTTP probe ports in
the proxy's inbound ports configuration, to be used for policy discovery.
* Fixed a bug in the proxy's outbound load balancer that could cause panics
when many discovery updates were processed in short time periods
* Added `runtimeClassName` options to Linkerd's Helm chart (thanks @jtcarnes!)
* Introduced a change in the proxy-injector that will configure the inbound
ports proxy configuration with the pod's probe ports (HTTPGet)
* Added godoc links in the project README file (thanks @spacewander!)
* Increased minimum supported Kubernetes version to `v1.21` from `v1.20`
* Fixed an issue where the proxy-injector would not emit events for resources
that receive annotation patches but are skipped for injection
* Refactored `PublicIPToString` to handle both IPv4 and IPv6 addresses in a
similar behavior (thanks @zhlsunshine!)
* Replaced the usage of branch with tags, and pinned `cosign-installer` action
to `v1` (thanks @saschagrunert!)
* Fixed an issue where the proxy-injector would panic if resources have an
unsupported owner kind
Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
Fixes#8592
Increase the minimum supported kubernetes version from 1.20 to 1.21. This allows us to drop support for batch/v1beta1/CronJob and discovery/v1beta1/EndpointSlices, instead using only v1 of those resources. This fixes the deprecation warnings about these resources printed by the CLI.
Signed-off-by: Alex Leong <alex@buoyant.io>
This edge release fixes an issue where Linkerd injected pods could not be
evicted by Cluster Autoscaler. It also adds the `--crds` flag to `linkerd check`
which validates that the Linkerd CRDs have been installed with the proper
versions.
The previously noisy "cluster networks can be verified" check has been replaced
with one that now verifies each running Pod IP is contained within the current
`clusterNetworks` configuration value.
Additionally, linkerd-viz is no longer required for linkerd-multicluster's
`gateways` command — allowing the `Gateways` API to be marked as deprecated for
2.12.
Finally, several security issues have been patched in the Docker images now that
the builds are pinned only to minor — rather than patch — versions.
* Replaced manual IP address parsing with functions available in the Go standard
library (thanks @zhlsunshine!)
* Removed linkerd-multicluster's `gateways` command dependency on the linkerd-viz
extension
* Fixed issue where Linkerd injected pods were prevented from being evicted by
Cluster Autoscaler
* Added the `dst_target_cluster` metric to linkerd-multicluster's service-mirror
controller probe traffic
* Added the `--crds` flag to `linkerd check` which validates that the Linkerd
CRDs have been installed
* Removed the Docker image's hardcoded patch versions so that builds pick up
patch releases without manual intervention
* Replaced the "cluster networks can be verified check" check with a "cluster
networks contains all pods" check which ensures that all currently running Pod
IPs are contained by the current `clusterNetworks` configuration
* Added IPv6 compatible IP address generation in certain control plane
components that were only generating IPv4 (thanks @zhlsunshine!)
* Deprecated linkerd-viz's `Gateways` API which is no longer used by
linkerd-multicluster
* Added the `promm` package for making programmatic Prometheus assertions in
tests (thanks @krzysztofdrys!)
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>