162 Commits

Author SHA1 Message Date
Alex Leong 76d59ba9c4
edge-23.3.3 (#10592)
This edge release removes TrafficSplits from the Linkerd dashboard and fixes a
number of issues in the policy controller.

* Removed the TrafficSplit page from the Linkerd viz dashboard
* Fixed an issue where the policy controller was not returning the correct
  status for non-Service authorities
* Fixed an issue where the policy controller could use large amounts of CPU
  when lease API calls failed

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-03-21 14:54:38 -07:00
Jeremy Chase 8c9f45ac67
edge-23.3.2 (#10523)
Signed-off-by: Jeremy Chase <jeremy.chase@gmail.com>
Co-authored-by: Oliver Gould <ver@buoyant.io>
2023-03-13 18:08:10 -07:00
Alejandro Pedraza 351288a58f
edge-23.3.1 change notes (#10432)
## edge-23.3.1

This edge release continues to build support under the hood for the upcoming
features in 2.13. Also included are several dependency updates and less verbose
logging.

* Removed dependency on the third-party `curlimages/curl` image used to
  initialize extension namespaces' metadata (so they are visible to
  `linkerd check`), replacing it with the new `extension-init` image
* Lowered non-actionable error messages in the Destination log to debug-level
  entries to avoid triggering false alarms (thanks @siddharthshubhampal!)
2023-03-03 08:21:44 -05:00
Alejandro Pedraza 857f882721
Replace curl script with linkerd-extension-init (#10376)
Fixes #9985

When installing extensions via Helm, the `namespace-metadata` job adds the following metadata to its extension namespace:
- `linkerd.io/extension` label to have `linkerd check` identify the extension
- `pod-security.kubernetes.io/enforce` label for Pod Security Admission, whose value depends on whether linkerd-cni is enabled
- for viz only, the `viz.linkerd.io/external-prometheus` annotation, if using an external Prometheus instance
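As a rough illustration of the metadata listed above, the sketch below computes the labels and annotations for an extension namespace. `namespaceMetadata` is a hypothetical helper, not the job's actual code. It assumes the viz annotation value is the external Prometheus URL, and it picks the PSA level from the Pod Security Standards: containers that add NET_ADMIN or NET_RAW, as proxy-init does without CNI, are only allowed under `privileged`.

```go
package main

import "fmt"

// namespaceMetadata is a hypothetical helper (not Linkerd's code) returning
// the labels and annotations described for an extension namespace.
func namespaceMetadata(extension string, cniEnabled bool, externalPrometheus string) (map[string]string, map[string]string) {
	// Without CNI, proxy-init adds NET_ADMIN/NET_RAW, which only the
	// "privileged" Pod Security Standard allows.
	level := "privileged"
	if cniEnabled {
		level = "restricted"
	}
	labels := map[string]string{
		"linkerd.io/extension":               extension,
		"pod-security.kubernetes.io/enforce": level,
	}
	annotations := map[string]string{}
	// Assumption: only viz records an external Prometheus, keyed by its URL.
	if extension == "viz" && externalPrometheus != "" {
		annotations["viz.linkerd.io/external-prometheus"] = externalPrometheus
	}
	return labels, annotations
}

func main() {
	labels, annotations := namespaceMetadata("viz", true, "http://prometheus.example:9090")
	fmt.Println(labels["pod-security.kubernetes.io/enforce"]) // restricted
	fmt.Println(annotations["viz.linkerd.io/external-prometheus"])
}
```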

The job uses a `curlimages/curl` docker image and performs those mutations through an inline shell script, which has two downsides:
- Security scanners sometimes raise warnings about outdated binaries in the image that the job doesn't even use
- The limitations of the shell environment make it hard to write a clear and maintainable script

This change replaces the `curlimages/curl` image in the `namespace-metadata` jobs with the new
`linkerd-extension-init` image, currently being developed in
linkerd/linkerd-extension-init#2.
2023-03-02 14:42:53 -05:00
Matei David 34cfa674e6
edge-23.2.3 (#10378)
This edge release includes a number of fixes and introduces a new CLI command,
`linkerd prune`. The new `prune` command should be used to remove resources
which are no longer part of the Linkerd manifest when doing an upgrade.
Previously, the recommendation was to use `linkerd upgrade` in conjunction with
`kubectl apply --prune`; however, that does not remove resources whose kinds are
not part of the input manifest, nor does it detect cluster-scoped resources.
`linkerd prune` (included in all core extensions) should be preferred instead.

Additionally, this change contains a few fixes from our external contributors,
and a change to the `viz` Helm chart which allows for arbitrary annotations on
`Service` objects. Last but not least, the release contains a few proxy
internal changes to prepare for the new client policy API.

* Added a new `linkerd prune` command to the CLI (including extensions) to
  remove resources which are no longer part of Linkerd's manifests
* Introduced new values in the `viz` chart to allow for arbitrary annotations
  on the `Service` objects (thanks @sgrzemski!)
* Fixed up a comment in the k8s API wrapper (thanks @ductnn!)
* Fixed an issue with EndpointSlice endpoint reconciliation on slice deletion;
  when using more than one slice, a `NoEndpoints` event would be sent to the
  proxy regardless of the number of endpoints that were still available (thanks
  @utay!)
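A minimal sketch of the corrected reconciliation behavior, assuming a simplified model in which each slice maps to a list of endpoint addresses. The `endpointTracker` type is hypothetical and is not the destination controller's implementation; the point is that deleting one slice should only produce `NoEndpoints` once no slice has endpoints left.

```go
package main

import "fmt"

// endpointTracker is a hypothetical model of a service whose endpoints are
// spread across several EndpointSlices.
type endpointTracker struct {
	slices map[string][]string // slice name -> endpoint addresses
}

func newEndpointTracker() *endpointTracker {
	return &endpointTracker{slices: map[string][]string{}}
}

// update records the current endpoints of one slice.
func (t *endpointTracker) update(slice string, addrs []string) {
	t.slices[slice] = addrs
}

// deleteSlice removes one slice and reports whether a NoEndpoints event
// should be sent: only when no remaining slice has any endpoints.
func (t *endpointTracker) deleteSlice(slice string) (noEndpoints bool) {
	delete(t.slices, slice)
	return t.count() == 0
}

// count returns the total number of endpoints across all slices.
func (t *endpointTracker) count() int {
	n := 0
	for _, addrs := range t.slices {
		n += len(addrs)
	}
	return n
}

func main() {
	t := newEndpointTracker()
	t.update("web-abc", []string{"10.0.0.1:8080"})
	t.update("web-def", []string{"10.0.0.2:8080"})
	fmt.Println(t.deleteSlice("web-abc")) // false: web-def still has an endpoint
	fmt.Println(t.deleteSlice("web-def")) // true: nothing left
}
```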

Signed-off-by: Matei David <matei@buoyant.io>
2023-02-23 15:56:34 +00:00
Alex Leong e9eac4c672
Add prune command to linkerd and to extensions (#10303)
Fixes: #10262 

When a resource is removed from the Linkerd manifests from one version to the next, we would like that resource to be removed from the user's cluster as part of the upgrade process. Our current recommendation is to use the `linkerd upgrade` command in conjunction with the `kubectl apply` command and the `--prune` flag to remove resources which are no longer part of the manifest. However, `--prune` has many shortcomings: it does not detect resource kinds which are not part of the input manifest, nor does it detect cluster-scoped resources. See https://linkerd.io/2.12/tasks/upgrade/#with-the-linkerd-cli

We add a `linkerd prune` command which locates all Linkerd resources on the cluster which are not part of the Linkerd manifest and prints their metadata so that users can delete them.  The recommended upgrade procedure would then be:

```
> linkerd upgrade | kubectl apply -f -
> linkerd prune | kubectl delete -f -
```

Users must take special care to run the prune command with the desired version of the CLI, since it prints all resources on the cluster which are not included in that version's manifest.

We also add similar prune commands to each of the `viz`, `multicluster`, and `jaeger` extensions for deleting extension resources which are not in the extension manifest.
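Conceptually, `linkerd prune` computes a set difference between the Linkerd resources found on the cluster and those in the manifest rendered by the CLI. The sketch below models that under simplifying assumptions: `resourceKey` and `pruneCandidates` are hypothetical, and the real command's discovery of resources (which must include cluster-scoped kinds) is out of scope. It also shows why the CLI version matters: anything absent from that version's manifest becomes a candidate.

```go
package main

import (
	"fmt"
	"sort"
)

// resourceKey is a hypothetical, simplified identity for a Kubernetes object.
type resourceKey struct {
	Kind, Namespace, Name string
}

// pruneCandidates returns the resources found on the cluster that are absent
// from the manifest rendered by the current CLI version.
func pruneCandidates(onCluster, inManifest []resourceKey) []resourceKey {
	want := make(map[resourceKey]bool, len(inManifest))
	for _, r := range inManifest {
		want[r] = true
	}
	var out []resourceKey
	for _, r := range onCluster {
		if !want[r] {
			out = append(out, r)
		}
	}
	// Sort for stable, reviewable output.
	sort.Slice(out, func(i, j int) bool {
		a, b := out[i], out[j]
		if a.Kind != b.Kind {
			return a.Kind < b.Kind
		}
		if a.Namespace != b.Namespace {
			return a.Namespace < b.Namespace
		}
		return a.Name < b.Name
	})
	return out
}

func main() {
	cluster := []resourceKey{
		{"Deployment", "linkerd", "linkerd-destination"},
		{"ClusterRole", "", "linkerd-old-role"}, // cluster-scoped, no longer shipped
	}
	manifest := []resourceKey{{"Deployment", "linkerd", "linkerd-destination"}}
	for _, r := range pruneCandidates(cluster, manifest) {
		fmt.Printf("%s %s/%s\n", r.Kind, r.Namespace, r.Name)
	}
}
```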

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-02-17 10:44:30 -08:00
Kevin Leimkuhler 11ec6a1cfe
Add changes for edge-23.2.2 (#10344)
## edge-23.2.2

This edge release adds the policy status controller which writes the `status`
field to HTTPRoutes when a parent reference Server accepts or rejects the
HTTPRoute. This field is currently not consumed by the policy controller, but
acts as the first step for considering HTTPRoute `status` when serving policy.

Additionally, the destination controller now uses the Kubernetes metadata API
for the resources for which it only needs to track metadata: Nodes and
ReplicaSets. The other resources it tracks require more than metadata, so it
continues to use the full API for those as before.

* Fixed error message to include the colliding Server in the policy controller's
  admission webhook validation
* Updated wording for linkerd-multicluster cluster when it fails to probe a
  remote gateway mirror
* Removed unnecessary Namespaces access from the destination controller RBAC
* Added Kubernetes metadata API in the destination controller for watching Nodes
  and ReplicaSets
* Fixed QueryParamMatch parsing for HTTPRoutes
* Added the policy status controller which writes the `status` field to
  HTTPRoutes when a parent reference Server accepts or rejects it

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2023-02-16 17:26:05 -07:00
Oliver Gould 363e123d79
Update to dev:v39 with Go 1.19 (#10336) 2023-02-16 08:25:42 -08:00
Steve Jenson 3f0d248b96
edge-23.2.1 (#10288)
Signed-off-by: Steve Jenson <stevej@buoyant.io>
2023-02-10 09:26:45 -08:00
Alejandro Pedraza fc7d553683
Don't apply `waitBeforeExitSeconds` to control-plane pods (#10276)
Close #10058

The Helm value `proxy.waitBeforeExitSeconds` introduces a pause after the pod receives the shutdown signal, and it was intended for pods in the data-plane whose main container needs to perform shutdown-time operations that require the network. Linkerd's control-plane pods don't require that*.

Additionally, if such shutdown operations take longer than 30s, then the user needs to set the pod's `terminationGracePeriod` (whose default is 30s) to be greater than `proxy.waitBeforeExitSeconds` to avoid the kubelet killing the pod before the operations complete. We don't expose `terminationGracePeriod` as a parameter for Linkerd's pods, so this scenario results in an error such as this:
```
Exec lifecycle hook ([/bin/sleep 40]) for Container "linkerd-proxy" in Pod "linkerd-destination-9559586c5-g9jns_linkerd(e33e8d02-66ca-42fa-9a7c-0ea45bda814a)" failed - error: command '/bin/sleep 40' exited with 137: , message: ""
```

For these two reasons, this change disables the `proxy.waitBeforeExitSeconds` setting for the linkerd pods, either by overriding it at the template level (for core control-plane pods) or through an annotation (for extension pods).

(*) The Viz and Jaeger extensions don't require the network during shutdown either. The Multicluster extension already exposes a setting for `terminationGracePeriod`, so this change doesn't affect this particular extension.
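The timing problem reduces to simple arithmetic: the preStop sleep runs inside the pod's termination grace period (30s by default), so a `proxy.waitBeforeExitSeconds` of 40 cannot finish before the kubelet kills the container. `preStopFits` is a hypothetical helper for illustration only, and it ignores the time other containers need to stop.

```go
package main

import "fmt"

// defaultTerminationGracePeriodSeconds is the Kubernetes default grace period.
const defaultTerminationGracePeriodSeconds = 30

// preStopFits reports whether a preStop sleep of waitBeforeExitSeconds can
// complete before the grace period elapses. The hook runs inside the grace
// period, so the sleep must be strictly shorter than it.
func preStopFits(waitBeforeExitSeconds, gracePeriodSeconds int) bool {
	return waitBeforeExitSeconds < gracePeriodSeconds
}

func main() {
	// The failing case from the error above: `/bin/sleep 40` with a 30s grace period.
	fmt.Println(preStopFits(40, defaultTerminationGracePeriodSeconds)) // false
	fmt.Println(preStopFits(40, 60))                                   // true
}
```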
2023-02-07 08:50:00 -05:00
Alex Leong 6d0b555e21
edge-23.1.2 (#10210)
Signed-off-by: Alex Leong <alex@buoyant.io>
2023-01-26 18:06:14 -08:00
Alejandro Pedraza cf665ef56c
Fix PSP (#10208)
Fixes #10150

When we added PodSecurityAdmission in #9719 (and included in
edge-23.1.1), we added the entry `seccompProfile.type=RuntimeDefault` to
the containers SecurityContext.

For PSP to accept that, we need to add the annotation
`seccomp.security.alpha.kubernetes.io/allowedProfileNames:
"runtime/default"` to the PSP resource, which in turn means we need
to add the entry `seccompProfile.type=RuntimeDefault` to the pod's
SecurityContext as well, not just the container's.

It also turns out the `namespace-metadata` Jobs used by extensions for
the helm installation method didn't have their ServiceAccount properly
bound to the PSP resource. This resulted in the `helm install` command
failing, and although the extensions' resources did get deployed, they
were not discoverable by `linkerd check`. This change fixes that
as well; it had been broken since 2.12.0!
2023-01-26 16:32:41 -08:00
Matei David 028a68265e
edge-23.1.1 (#10129)
This edge release introduces a number of fixes and changes to the proxy.
The proxy has been updated to initialize routes lazily, which means service
profile routes will now only show up in the metrics when a route is used. In
the extensions, old (`ServerAuthorization`) resources have been converted to
`AuthorizationPolicy` -- as part of this change, redundant policy resources
have been cleaned up. A bug in the destination controller that could
potentially lead to stale pods being considered in the load balancer has been
fixed; operations that could previously result in this behavior are now
infallible. Support has been added for `Pod Security Admission`, used instead
of `Pod Security Policy`, as part of this change, some of the extension charts
have been modified to include a `cniEnabled` flag that will impact the policy
used.

Finally, this edge release contains a number of fixes and improvements
from our contributors.

* Converted `ServerAuthorization` resources to `AuthorizationPolicy` resources
  in Linkerd extensions
* Removed policy resources bound to admin servers in extensions (previously
  these resources were used to authorize probes but now are authorized by
  default)
* Added a `resources` field in the linkerd-cni chart (thanks @jcogilvie!)
* Fixed an issue in the CLI where `--identity-external-ca` would set an
  incorrect field (thanks @anoxape!)
* Fixed an issue in the destination controller that could result in stale
  endpoints when using EndpointSlice objects. Logic that previously resulted in
  undefined behavior is now infallible and endpoints will no longer be skipped
  during removal
* Added namespace to namespace-metadata resources in Helm (thanks @joebowbeer!)
* Added support for Pod Security Admission (supersedes PSPs); through this
  change, extensions now have a `cniEnabled` value in their charts that
  directly determines which PSA policy to use
* Changed routes to be initialized lazily. Service Profile routes will no
  longer show up in metrics until the route is used (default routes are always
  available when no Service Profile is defined for a service)
* Changed the proxy's behavior when traffic splitting so that only services
  that are not in failfast are used. This will enable the proxy to manage
  failover without external coordination
* Updated tokio (async runtime) in the proxy, which should reduce CPU usage,
  especially for pod-local communication (i.e., within the same network
  namespace)
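The failfast change to traffic splitting can be pictured as filtering the split's backends before choosing one. This is a hedged sketch: the `backend` type and `available` function are hypothetical, and the proxy itself is written in Rust with its own balancing logic.

```go
package main

import "fmt"

// backend is a hypothetical traffic-split target: a service with a weight
// and a flag telling whether the proxy currently considers it in failfast.
type backend struct {
	Name     string
	Weight   uint32
	Failfast bool
}

// available drops backends that are in failfast, so traffic is split only
// among the remaining ones. An empty result means no backend can serve.
func available(backends []backend) []backend {
	var out []backend
	for _, b := range backends {
		if !b.Failfast {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	split := []backend{
		{Name: "web-primary", Weight: 90, Failfast: true},
		{Name: "web-secondary", Weight: 10},
	}
	// Only web-secondary remains: failover without external coordination.
	for _, b := range available(split) {
		fmt.Println(b.Name)
	}
}
```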

Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2023-01-13 17:58:42 +00:00
Alex Leong 52fb2c6750
convert ServerAuthorizations to AuthorizationPolicies (#10079)
The Linkerd extension charts use ServerAuthorization resources.  AuthorizationPolicies are now the recommended resource to use in favor of ServerAuthorizations.  We replace all of the ServerAuthorization resources in the Linkerd extension charts with AuthorizationPolicy resources.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-01-11 15:07:15 -08:00
Alex Leong 6cba9afcd1
Remove admin policy resources from extensions (#10073)
Fixes #9364

Since probes are automatically authorized, Linkerd extensions no longer need admin Server resources in order for probes to be authorized.  We therefore remove them.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-01-10 12:47:22 -08:00
Alejandro Pedraza 7428d4aa51
Removed dupe imports (#10049)
* Removed dupe imports

My IDE (vim-gopls) has been complaining for a while, so I decided to take
care of it. Found via
[staticcheck](https://github.com/dominikh/go-tools)

* Add stylecheck to go-lint checks
2023-01-10 14:34:56 -05:00
Joe Bowbeer b8ee97e309
Add ns to namespace-metadata resources (#10043) (#10044)
Closes #10043 

Signed-off-by: Joe Bowbeer <joe.bowbeer@gmail.com>
2023-01-03 11:19:31 -05:00
Alejandro Pedraza faf0ff62f7
Add support for Pod Security Admission (#9719)
Closes #9676

This adds the `pod-security.kubernetes.io/enforce` label as described in [Pod Security Admission labels for namespaces](https://kubernetes.io/docs/concepts/security/pod-security-admission/#pod-security-admission-labels-for-namespaces).

PSA gives us three different possible values (policies or modes): [privileged, baseline and restricted](https://kubernetes.io/docs/concepts/security/pod-security-standards/).

For non-CNI mode, the proxy-init container relies on being granted the NET_RAW and NET_ADMIN capabilities, which places those pods under the `privileged` policy. For CNI mode, on the other hand, we can enforce the `restricted` policy by setting some defaults on the containers' `securityContext`, as done in this PR.

Note that this change also adds the `cniEnabled` entry in the `values.yaml` file for all the extension charts, which determines which policy to use.

Final note: this includes the fix from #9717; otherwise an empty gateway UID prevents the pod from being created under the `restricted` policy.
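For reference, the sketch below checks a container security context against a subset of the Pod Security Standards `restricted` container rules (privilege escalation, non-root, seccomp profile, capabilities). The `securityContext` struct is a simplified, hypothetical stand-in for the Kubernetes type. It only looks at container-level fields (the real rules also accept some of these at the pod level), and it does not claim to list the exact defaults this PR sets.

```go
package main

import "fmt"

// securityContext is a simplified, hypothetical subset of a container's
// securityContext; pointer fields distinguish "unset" from "set".
type securityContext struct {
	AllowPrivilegeEscalation *bool
	RunAsNonRoot             *bool
	RunAsUser                *int64
	SeccompProfileType       string // "", "RuntimeDefault", "Localhost", "Unconfined"
	CapabilitiesDrop         []string
	CapabilitiesAdd          []string
}

// restrictedViolations returns the subset of "restricted" rules that fail.
func restrictedViolations(sc securityContext) []string {
	var v []string
	if sc.AllowPrivilegeEscalation == nil || *sc.AllowPrivilegeEscalation {
		v = append(v, "allowPrivilegeEscalation must be false")
	}
	if sc.RunAsNonRoot == nil || !*sc.RunAsNonRoot {
		v = append(v, "runAsNonRoot must be true")
	}
	if sc.RunAsUser != nil && *sc.RunAsUser == 0 {
		v = append(v, "runAsUser must not be 0")
	}
	if sc.SeccompProfileType != "RuntimeDefault" && sc.SeccompProfileType != "Localhost" {
		v = append(v, "seccompProfile.type must be RuntimeDefault or Localhost")
	}
	dropsAll := false
	for _, c := range sc.CapabilitiesDrop {
		if c == "ALL" {
			dropsAll = true
		}
	}
	if !dropsAll {
		v = append(v, "capabilities must drop ALL")
	}
	// Only NET_BIND_SERVICE may be added back under "restricted".
	for _, c := range sc.CapabilitiesAdd {
		if c != "NET_BIND_SERVICE" {
			v = append(v, "capability "+c+" may not be added")
		}
	}
	return v
}

func main() {
	no, yes, uid := false, true, int64(1000)
	ok := securityContext{
		AllowPrivilegeEscalation: &no,
		RunAsNonRoot:             &yes,
		RunAsUser:                &uid,
		SeccompProfileType:       "RuntimeDefault",
		CapabilitiesDrop:         []string{"ALL"},
	}
	fmt.Println(len(restrictedViolations(ok))) // 0

	// A container adding NET_ADMIN and NET_RAW, as proxy-init does without CNI.
	initCtx := ok
	initCtx.CapabilitiesAdd = []string{"NET_ADMIN", "NET_RAW"}
	fmt.Println(restrictedViolations(initCtx))
}
```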

## How to test

As this is only enforced as of k8s 1.25, here are the instructions to run 1.25 with k3d using Calico as CNI:

```bash
# launch k3d with k8s v1.25, without flannel CNI
$ k3d cluster create --image='+v1.25' --k3s-arg '--disable=local-storage,metrics-server@server:0' --no-lb --k3s-arg --write-kubeconfig-mode=644 --k3s-arg --flannel-backend=none --k3s-arg --cluster-cidr=192.168.0.0/16 --k3s-arg '--disable=servicelb,traefik@server:0'

# install Calico
$ k apply -f https://k3d.io/v5.1.0/usage/advanced/calico.yaml

# load all the images
$ bin/image-load --k3d proxy controller policy-controller web metrics-api tap cni-plugin jaeger-webhook

# install linkerd-cni
$ bin/go-run cli install-cni|k apply -f -

# install linkerd-crds
$ bin/go-run cli install --crds|k apply -f -

# install linkerd-control-plane in CNI mode
$ bin/go-run cli install --linkerd-cni-enabled|k apply -f -

# Pods should come up without issues. You can also try the viz and jaeger extensions.
# Try removing one of the securityContext entries added in this PR, and the Pod
# won't come up. You should be able to see the PodSecurity error in the associated
# ReplicaSet.
```

To test the multicluster extension using CNI, check this [gist](https://gist.github.com/alpeb/4cbbd5ad87538b9e0d39a29b4e3f02eb) with a patch to run the multicluster integration test with CNI in k8s 1.25.
2022-12-19 10:23:46 -05:00
Kevin Leimkuhler cbb4d8f2ac
Add changes for edge-22.12.1 (#9926)
## edge-22.12.1

This edge release introduces static and dynamic port overrides for CNI eBPF
socket-level load balancing. In certain installations when CNI plugins run in
eBPF mode, socket-level load balancing rewrites packet destinations to port
6443; as with 443 already, this port is now skipped as well on control plane
components so that they can communicate with the Kubernetes API before their
proxies are running.

Additionally, a potential panic and false warning have been fixed in the
destination component.

* Updated linkerd-jaeger's collector component to expose port 4318 in order to
  support HTTP alongside gRPC (thanks @uralsemih!)
* Introduced the `privileged` configuration which allows the `proxy-init`
  container to run as privileged without also running as root
* Fixed a potential panic in the destination component caused by concurrent
  writes when dealing with Endpoint updates
* Fixed false warning when looking up HostPort mappings on Pods
* Added static and dynamic port overrides for CNI eBPF to work with socket-level
  load balancing

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-12-02 11:24:05 -07:00
Semih Ural 59da916029
Update otel-collector image version and add port 4318 to otel-collector (#9379)
Added 4318 port to collector (supporting http, alongside the existing gRPC port 4317), and corresponding Server entry. Also updated the opentelemetry-collector image version.

The Linkerd otel-collector did not expose port 4318, which is needed by browser frontends such as React applications.

Signed-off-by: Semih Ural <uralsmh@gmail.com>
2022-11-24 15:41:29 -05:00
Alejandro Pedraza 4ea8ab21dc
edge-22.11.3 change notes (#9884)
* edge-22.11.3 change notes

Besides the notes, this corrects a small point in `RELEASE.md`, and
bumps the proxy-init image tag to `v2.1.0`. Note that the entry under
`go.mod` wasn't bumped because moving it past v2 requires changes on
`linkerd2-proxy-init`'s `go.mod` file, and we're gonna drop that
dependency soon anyways. Finally, all the charts got their patch version
bumped, except for `linkerd2-cni` that got its minor bumped because of
the tolerations default change.

## edge-22.11.3

This edge release fixes connection errors to pods using a `hostPort` different
than their `containerPort`. Also the `network-validator` init container improves
its logging, and the `linkerd-cni` DaemonSet now gets deployed in all nodes by
default.

* Fixed `destination` service to properly discover targets using a `hostPort`
  different than their `containerPort`, which was causing 502 errors
* Upgraded the `network-validator` with better logging allowing users to
  determine whether failures occur as a result of their environment or the tool
  itself
* Added default `Exists` toleration to the `linkerd-cni` DaemonSet, allowing it
  to be deployed in all nodes by default, regardless of taints

Co-authored-by: Oliver Gould <ver@buoyant.io>
2022-11-23 14:35:20 -05:00
Kevin Leimkuhler 74ba03fedd
Add changes for edge-22.11.2 (#9850)
## edge-22.11.2

This edge release introduces the use of the Kubernetes metadata API in the
proxy-injector and tap-injector components. This can reduce the IO and memory
footprint for those components as they now only need to track the metadata for
certain resources, rather than the entire resource itself. Similar changes will
be made for the destination component in an upcoming release.

* Bumped HTTP dependencies to fix a potential deadlock in HTTP/2 clients
* Changed the proxy-injector and tap-injector components to use the metadata API
  which should result in less memory consumption

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-11-17 15:30:03 -07:00
Alejandro Pedraza 4dbb027f48
Use metadata API in the proxy and tap injectors (#9650)
* Use metadata API in the proxy and tap injectors

Part of #9485

This adds a new `MetadataAPI` similar to the current `k8s.API` hosting informers, but backed by k8s' `metadatainformer` shared informers, which retrieve only the objects' metadata, resulting in less memory consumption by its clients. Currently this is only implemented for the proxy and tap injectors. Usage by the destination controller will be implemented as a follow-up.

## Existing API enhancements

Shared objects and logic required by API and MetadataAPI have been moved to the new `k8s.go`, `api_resource.go` and `prometheus.go` files. That includes the `isValidRSParent()` function whose arg is now more generic.

## Unit tests

`/controller/k8s/api_test.go` now also instantiates a MetadataAPI, used in the augmented `TestGetObjects()` and `TestGetOwnerKindAndName()` tests. The `resources` struct was introduced to capture the common fields among tests and simplify `newMockAPI()`'s signature.

## Other Changes

The injector no longer watches Pods. It only needs to watch the workloads that own resources (plus namespaces), so watching Pods is not required.

## Testing Memory Consumption

Install linkerd, inject emojivoto and check the injector memory consumption with `kubectl -n linkerd top pod linkerd-proxy-injector-xxx`. It'll start consuming about 16Mi. Then ramp up emojivoto's `voting` deployment replicas to 2000. After 5 minutes memory will stabilize around 32Mi using the current branch. Using the latest edge, it'll stabilize around 110Mi.
2022-11-16 09:21:39 -05:00
Matei David 77fbe4d4cc
edge-22.11.1 (#9815)
edge-22.11.1

This edge release ships a few fixes in Linkerd's dashboard and the
multicluster extension. Additionally, a regression has been fixed in the CLI
that blocked upgrades from versions older than 2.12.0, due to missing CRDs
(even if the CRDs were present in-cluster). Finally, the release includes
changes to the helm charts to allow for arbitrary (user-provided) labels on
Linkerd workloads.

* Fixed an issue in the CLI where upgrades from any version prior to
  stable-2.12.0 would fail when using the `--from-manifest` flag
* Removed un-injectable namespaces, such as kube-system from unmeshed resource
  notification in the dashboard (thanks @MoSattler!)
* Fixed an issue where the dashboard would respond to requests with 404 due to
  wrong root paths in the HTML script (thanks @junnplus!)
* Removed the proxyProtocol field in the multicluster gateway policy; this has
  the effect of changing the protocol from 'HTTP/1.1' to 'unknown' (thanks
  @psmit!)
* Fixed the multicluster gateway UID when installing through the CLI, prior to
  this change the 'runAsUser' field would be empty
* Changed the helm chart for the control plane and all extensions to support
  arbitrary labels on resources (thanks @bastienbosser!)

Signed-off-by: Matei David <matei@buoyant.io>
2022-11-11 18:32:52 +00:00
bastienbosser f85ed1af0d
Possibility to add additional labels on all resources for linkerd-control-plane helm chart (#9511)
* Possibility to add additional labels on all resources for linkerd-control-plane helm chart

Signed-off-by: BOSSER, Bastien <bastien.bosser@atos.net>
2022-11-03 10:19:54 -05:00
Steve Jenson 2e54727743
Changelog for edge-22.10.3 release (#9712)
## edge-22.10.3

This edge release adds `network-validator`, a new init container to be used when
CNI is enabled. `network-validator` ensures that local iptables rules are
working as expected. It will validate this before linkerd-proxy starts.
`network-validator` replaces the `noop` container, runs as `nobody`, and drops
all capabilities before starting.

* Validate CNI `iptables` configuration during pod startup
* Fix the "cluster networks contains all services" check failing for services
  with no ClusterIP
* Remove kubectl version check from `linkerd check` (thanks @ziollek!)
* Set `readOnlyRootFilesystem: true` in viz chart (thanks @mikutas!)
* Fix `linkerd multicluster install` by re-adding `pause` container image
  in chart
* Fix a hardcoded image value in the linkerd-viz namespace-metadata.yml
  template (thanks @bastienbosser!)

Signed-off-by: Steve Jenson <stevej@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
Co-authored-by: Matei David <matei@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2022-10-27 15:47:23 -05:00
bastienbosser fa1a80f9f5
Linkerd viz have hardcoded image value in namespace metadata.yml template (#9481)
* linkerd-viz have hardcoded image value in namespace-metadata.yml template bug correction

Signed-off-by: BOSSER, Bastien <bastien.bosser@atos.net>
2022-10-12 07:40:15 -05:00
Alex Leong c8a798410a
edge-22.10.2 (#9597)
This edge release fixes an issue with CNI chaining that was preventing the
Linkerd CNI plugin from working with other CNI plugins such as Cilium. It also
includes several other fixes.

* Updated Grafana dashboards to use variable duration parameter so that they can
  be used when Prometheus has a longer scrape interval (thanks @TarekAS)
* Fixed handling of .conf files in the CNI plugin so that the Linkerd CNI plugin
  can be used alongside other CNI plugins such as Cilium
* Added a `linkerd diagnostics policy` command to inspect Linkerd policy state
* Added a check that ClusterIP services are in the cluster networks
* Added a noop init container to injected pods when the CNI plugin is enabled
  to prevent certain scenarios where a pod can get stuck without an IP address
* Fixed a bug where the `config.linkerd.io/proxy-version` annotation could be empty

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-10-11 17:31:48 -07:00
Alejandro Pedraza a5797f7212
edge-22.10.1 change notes (#9553)
This edge release fixes some sections of the Viz dashboard appearing blank, and
adds an optional PodMonitor resource to the Helm chart to enable easier
integration with the Prometheus Operator. It also includes many fixes submitted
by our contributors.

* Fixed the dashboard sections Tap, Top, and Routes appearing blank (thanks
  @MoSattler!)
* Added an optional PodMonitor resource to the main Helm chart (thanks
  @jaygridley!)
* Fixed the CLI's `--api-addr` flag, which was being ignored (thanks @mikutas!)
* Expanded the `linkerd authz` command to display AuthorizationPolicy resources
  that target namespaces (thanks @aatarasoff!)
* Fixed the `NotIn` label selector operator in the policy resources, which was
  erroneously treated as `In`.
* Fixed warning logic around the "linkerd-viz ClusterRoles exist" and
  "linkerd-viz ClusterRoleBindings exist" checks in `linkerd viz check`
* Fixed proxies emitting some duplicate inbound metrics
2022-10-04 16:26:04 -05:00
Eliza Weisman 93dbb8b3e7
stable-2.12.1 (#9453)
This release includes several control plane and proxy fixes for
`stable-2.12.0`. In particular, it fixes issues related to control plane
HTTP servers' header read timeouts resulting in decreased controller
success rates, lowers the inbound connection pool idle timeout in the
proxy, and fixes an issue where the jaeger injector would put pods into
an error state when upgrading from stable-2.11.x.

Additionally, this release adds the `linkerd.io/trust-root-sha256`
annotation to all injected workloads allowing predictable comparison of
all workloads' trust anchors via the Kubernetes API.
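A minimal sketch of how such an annotation enables comparison, assuming the value is the hex-encoded SHA-256 of the trust-anchor PEM bundle (verify this against the injector before relying on exact values). `trustRootDigest` and `sameTrustRoot` are hypothetical helpers; in practice the annotation values would be read from pod metadata through the Kubernetes API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// trustRootDigest hex-encodes the SHA-256 of a PEM trust-anchor bundle.
// Assumption: the digest is taken over the bundle's raw bytes.
func trustRootDigest(pemBundle []byte) string {
	sum := sha256.Sum256(pemBundle)
	return hex.EncodeToString(sum[:])
}

// sameTrustRoot reports whether two workloads' annotations carry the same
// digest; a missing annotation never counts as a match.
func sameTrustRoot(a, b map[string]string) bool {
	const key = "linkerd.io/trust-root-sha256"
	va, okA := a[key]
	vb, okB := b[key]
	return okA && okB && va == vb
}

func main() {
	bundle := []byte("-----BEGIN CERTIFICATE-----\n(placeholder)\n-----END CERTIFICATE-----\n")
	digest := trustRootDigest(bundle)
	fmt.Println(len(digest)) // 64
	podA := map[string]string{"linkerd.io/trust-root-sha256": digest}
	podB := map[string]string{"linkerd.io/trust-root-sha256": digest}
	fmt.Println(sameTrustRoot(podA, podB)) // true
}
```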

For Windows users, note that the Linkerd CLI's `nupkg` file for
Chocolatey is once again included in the release assets (it was
previously removed in stable-2.10.0).

* Proxy
  * Lowered inbound connection pool idle timeout to 3s

* Control Plane
  * Updated AdmissionRegistration API version usage to v1
  * Added `linkerd.io/trust-root-sha256` annotation on all injected
    workloads to indicate the certificate bundle
  * Updated fields in `AuthorizationPolicy` and `MeshTLSAuthentication`
    to conform to specification (thanks @aatarasoff!)
  * Updated the identity controller to not require a
    `ClusterRoleBinding` to read all deployment resources
  * Increased servers' header read timeouts so they no longer match
    default probe and Prometheus scrape intervals

* Helm
  * Restored `namespace` field in Linkerd helm charts
  * Updated `PodDisruptionBudget` `apiVersion` from `policy/v1beta1` to
    `policy/v1` (thanks @Vrx555!)

* Extensions
  * Fixed jaeger injector interfering with upgrades to 2.12.x
2022-09-22 14:01:35 -07:00
Alex Leong 566721c746
edge-22.9.2 (#9432)
This release fixes an issue where the jaeger injector would put pods into an
error state when upgrading from stable-2.11.x.

* Updated AdmissionRegistration API version usage to v1
* Fixed jaeger injector interfering with upgrades to 2.12.x

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-09-20 12:23:02 -07:00
Jeremy Chase ee75526ba7
Add changes for edge-22.9.1 (#9384)
## edge-22.9.1

This release adds the `linkerd.io/trust-root-sha256` annotation to all injected
workloads allowing predictable comparison of all workloads' trust anchors via
the Kubernetes API.

Additionally, this release lowers the inbound connection pool idle timeout to
3s. This should help avoid socket errors, especially for Kubernetes probes.

* Added `linkerd.io/trust-root-sha256` annotation on all injected workloads
  to indicate the certificate bundle
* Lowered inbound connection pool idle timeout to 3s
* Restored `namespace` field in Linkerd helm charts
* Updated fields in `AuthorizationPolicy` and `MeshTLSAuthentication` to
  conform to specification (thanks @aatarasoff!)
* Updated the identity controller to not require a `ClusterRoleBinding`
  to read all deployment resources
2022-09-15 11:04:15 -06:00
Oliver Gould c8348b3ab4
helm: Restore `namespace` field in templates (#9351)
In #6635 (f9f3ebe), we removed the `Namespace` resources from the
linkerd Helm charts. But this change also removed the `namespace` field
from all generated metadata, adding conditional logic to only include it
when being installed via the CLI.

This conditional logic currently causes spurious whitespace in output
YAML. This doesn't cause problems but is aesthetically
inconsistent/distracting.

This change removes the `partials.namespace` helper and instead inlines
the value in our templates. This makes our CLI- and Helm-generated
manifests slightly more consistent and removes needless indirection.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-09-10 10:18:02 -07:00
Jeremy Chase 9f36569230
Add changes for edge-22.8.3 (#9298)
## edge-22.8.3

Increased control plane HTTP servers' read timeouts so that they no longer
match the default probe intervals. This was leading to closed connections
and decreased controller success rate.

Signed-off-by: Jeremy Chase <jeremy.chase@gmail.com>
2022-08-30 15:59:31 -06:00
Kevin Leimkuhler 4b5ab072d6
Prep chart versions for `stable-2.12.0` (#9236)
Closes #9230 

#9202 prepped the release candidate for `stable-2.12.0` by removing the `-edge`
suffix and adding the `-rc2` suffix.

This preps the chart versions for the stable release by removing that `-rc2`
suffix.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-08-23 10:17:38 -06:00
Alexander Berger 064b7b32d9
Prevent proxy-injection for Helm Hook Job Pod (#9194)
Prevent proxy injection for the Helm hook Job Pod of namespace-metadata, as the injected proxy would prevent the Job from terminating. See also #8194, which was closed but the problem still exists.
Fixes #8194
Signed-off-by: Alex Berger alex-berger@gmx.ch

Signed-off-by: Alexander Berger <alex.berger@nexxiot.com>
2022-08-19 13:40:17 -05:00
Alejandro Pedraza 0404c22e9e
Change notes for stable-2.12.0-rc2 (#9202)
This release is the second release candidate for stable-2.12.0.

At this point the Helm charts can be retrieved from the stable repo:

```
helm repo add linkerd https://helm.linkerd.io/stable
helm repo up
helm install linkerd-crds -n linkerd --create-namespace linkerd/linkerd-crds
helm install linkerd-control-plane \
  -n linkerd \
  --set-file identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer.crt \
  --set-file identity.issuer.tls.keyPEM=issuer.key \
  linkerd/linkerd-control-plane
```

The following lists all the changes since edge-22.8.2:

* Fixed inheritance of the `linkerd.io/inject` annotation from Namespace to
  Workloads when its value is `ingress`
* Added the `config.linkerd.io/default-inbound-policy: all-authenticated`
  annotation to linkerd-multicluster’s Gateway deployment so that all clients
  are required to be authenticated
* Added a `ReadHeaderTimeout` of 10s to all the Go `http.Server` instances, to
  avoid being vulnerable to "Slowloris" attacks
* Added a check to `linkerd viz check --proxy` that warns when namespaces have
  the `config.linkerd.io/default-inbound-policy: deny` annotation, which would
  not authorize scrapes coming from the linkerd-viz Prometheus instance
* Added validation for accepted values for the `--default-inbound-policy` flag
* Fixed invalid URL in the `linkerd install --help` output
* Added `--destination-pod` flag to `linkerd diagnostics endpoints` subcommand
* Added `proxyInit.runAsUser` in `values.yaml`, defaulting to a non-zero value,
  to complement the recently changed default `proxyInit.runAsRoot: false`
2022-08-18 19:50:09 -05:00
Oliver Gould 0b094ec142
dev: Update markdownlint-cli2 to v0.5.1 (#9166)
* Update the devcontainer to use Node 16
* Update markdownlint-cli2 to v0.5.1
* Update the markdown workflow to use a newer action
* Address various markdown linting issues
* Add a `just markdownlint` recipe
* Publish dev:v26

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-08-15 12:59:59 -07:00
Alex Leong 5427446de9
edge-22.8.2 (#9138)
This release is considered a release candidate for stable-2.12.0, and we
encourage you to try it out! It includes an update to the multicluster extension
that adds support for Kubernetes v1.24, and it updates many CLI commands to
support the new policy resources: AuthorizationPolicy and HTTPRoute.

* Updated `linkerd check` to allow RSA-signed trust anchors (thanks @danibaeyens)
* Fixed some invalid YAML in the viz extension's tap-injector template (thanks @wc-s)
* Added support for AuthorizationPolicy and HttpRoute to the `linkerd viz authz` command
* Added support for AuthorizationPolicy and HttpRoute to `linkerd viz stat`
* Added support for policy metadata in `linkerd tap`
* Fixed an issue where certain control plane components were not restarting as
  necessary after a trust root rotation
* Added a ServiceAccount token Secret to the multicluster extension to support
  Kubernetes versions >= v1.24
* Fixed an issue where the `--default-inbound-policy` setting was not being
  respected

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-08-11 16:56:21 -07:00
Kevin Leimkuhler ca08b81d41
Add changes for `edge-22.8.1` (#9099)
## edge-22.8.1

This release introduces default probe authorization. This means that on
clusters that use a default `deny` policy, probes do not have to be explicitly
authorized using policy resources. Additionally, the
`policyController.probeNetworks` Helm value has been added, which allows users
to configure the networks that probes are expected to be performed from.

Additionally, the `linkerd authz` command has been updated to support the policy
resources AuthorizationPolicy and HttpRoute.

Finally, some smaller changes include allowing users to disable `linkerd-await`
on control plane components (using the existing `proxy.await` configuration) and
changing the default iptables mode back to `legacy` to support more cluster
environments by default.

* Updated the `linkerd authz` command to support AuthorizationPolicy and
  HttpRoute resources
* Changed the `proxy.await` Helm value so that users can now disable
  `linkerd-await` on control plane components
* Added probe authorization by default, so clusters that use a default `deny`
  policy no longer need to explicitly authorize probes
* Added ability to run the Linkerd CNI plugin in non-chained (stand-alone) mode
* Added the `policyController.probeNetworks` Helm value for configuring the
  networks that probes are expected to be performed from
* Changed the default iptables mode to `legacy`

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-08-05 15:49:53 -06:00
Matei David 26f696daa3
edge-22.7.3 (#9030)
* edge-22.7.3

This release adds a new `nft` iptables mode, used by default in proxy-init.
When used, firewall configuration will be set up through the `iptables-nft`
binary; this should allow hosts that do not support `iptables-legacy` (such as
RHEL-based environments) to make use of the init container. The older
`iptables-legacy` mode is still supported, but it must be explicitly turned on.
Moreover, this release also replaces the `HTTPRoute` CRD with Linkerd's own
version, and includes a number of fixes and improvements.

* Added a new `iptables-nft` mode for proxy-init. When running in this mode,
  the firewall will be configured with the `nft` kernel API; this should allow
  users to run the init container on RHEL-family hosts
* Fixed an issue where the proxy-injector would break when using `nodeAffinity`
  values for the control plane
* Updated healthcheck to ignore `Terminated` state for pods (thanks
  @AgrimPrasad!)
* Replaced the `HTTPRoute` CRD version from `gateway.networking.k8s.io` with a
  similar version from the `policy.linkerd.io` API group. While the CRD is
  similar, it does not support the `Gateway` type, does not contain the
  `backendRefs` field, and does not support the `RequestMirror` and
  `ExtensionRef` filter types.
* Updated the default policy controller log level to `info`; the controller
  will now emit INFO level logs for some of its dependencies
* Added validation to ensure `HTTPRoute` paths are absolute; relative paths are
  not supported by the proxy and the policy controller admission server will
  reject any routes that use paths which do not start with `/`

Signed-off-by: Matei David <matei@buoyant.io>
2022-07-28 18:20:41 +03:00
Oliver Gould 5491aec246
Update Go to 1.18 (#9019)
Go 1.18 features a number of important changes, notably removing client
support for defunct TLS versions: https://tip.golang.org/doc/go1.18

This change updates our Go version in CI and development.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-07-27 16:10:39 -07:00
Oliver Gould 6adcf81625
ci: Use devcontainer tooling in CI (#8925)
Our devcontainers pin versions of all of the tools we need to build &
test the project, but the tools used in CI are not necessarily kept in
sync with those in our devcontainer.

This change introduces new variants of our devcontainer image that can
be pre-bundled with Go or Rust tooling (with fairly minimal container
images). Various CI workflows are updated to use the same tooling
versions that are used by our devcontainer, and a CI workflow is added
to ensure that these versions stay in sync. Some workflows are NOT
updated--especially those that invoke `docker`--since the docker
environment is severely limited when running inside of a container.

Furthermore, this change does the following:

* Update shellcheck to v0.8.0;
* Update `bin/shellcheck-all` to exclude irrelevant files (that are not
  part of the project);
* Add `helm` and `helm-docs` to the devcontainer;
* Update `helm` to v3.9.1
* Update `helm-docs` to v1.11.0
* Include tools like `just`, `cargo-action-fmt`, and `cargo-nextest` in
  our Rust image
* Add a `just` recipe that builds (and optionally publishes) the
  appropriate devcontainer images

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-07-27 09:54:39 -07:00
Alex Leong 692311ee1b
edge-22.7.2 (#8947)
This release adds support for per-route authorization policy using the
AuthorizationPolicy and HttpRoute resources. It also adds a configurable
shutdown grace period to the proxy which can be used to ensure that proxy
graceful shutdown completes within a certain time, even if there are outstanding
open connections.

* Removed kube-system exclusions from watchers to fix service discovery for
  workloads in the kube-system namespace (thanks @JacobHenner)
* Added annotations to allow Linkerd extension deployments to be evicted by the
  autoscaler when necessary
* Added missing port in the Linkerd viz chart documentation (thanks @haswalt)
* Added support for per-route policy by supporting AuthorizationPolicy resources
  which target HttpRoute resources
* Fixed the `linkerd check` command crashing when unexpected pods are found in
  a Linkerd namespace
* Added a `config.linkerd.io/shutdown-grace-period` annotation to configure the
  proxy's maximum grace period for graceful shutdown

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-07-21 14:16:31 -07:00
Kevin Leimkuhler 2442ca07bf
Parse Pod labels for owning Deployment instead of name (#8920)
Closes #8916

When an arbitrary Pod (meshed or not) is created in the `linkerd`, `linkerd-viz`,
or `linkerd-jaeger` namespace, the respective `check` subcommand can fail.

We parse Pod names for their owning Deployment by assuming the Pod name has a
randomized suffix. For example, the `linkerd-destination` Deployment creates the
`linkerd-destination-58c57dd675-7tthr` Pod. We split the name on `-` and take
the first two parts (`["linkerd", "destination"]`); those first two parts make
up the Deployment name.

Now, if an arbitrary Pod named `test` is created in the namespace, we apply
that same logic but hit a runtime error when trying to get the first two parts
of the split. `test` does not split at all since it contains no `-`, and
therefore we error with `slice bounds out of range`.

To fix this, we now use the fact that all Linkerd components have a
`linkerd.io/control-plane-component` or `component` label with a value that is
the owning Deployment. This allows us to avoid any extra parsing logic and just
look at a single label value.

Additionally, some of these checks get all the Pods in a namespace with the
`GetPodsByNamespace` method but we don't always need something so general. In
the places where we are checking specifically for Linkerd components, we can
narrow this further by using the expected LabelSelector such as
`linkerd.io/extension=viz`.
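The failure mode and the label-based fix can be reproduced in a few lines. This is a hypothetical sketch rather than the actual `check` code: `deploymentFromName`, `deploymentFromLabels`, and `safeName` are invented names, and the label values used are only examples.

```go
package main

import (
	"fmt"
	"strings"
)

// deploymentFromName mimics the fragile approach: split the Pod name on "-"
// and keep the first two parts. It panics when there are fewer than two parts.
func deploymentFromName(pod string) string {
	parts := strings.Split(pod, "-")
	return strings.Join(parts[:2], "-")
}

// deploymentFromLabels reads the component label instead, so no name parsing
// is needed. linkerd.io/control-plane-component takes precedence over component.
func deploymentFromLabels(labels map[string]string) (string, bool) {
	for _, key := range []string{"linkerd.io/control-plane-component", "component"} {
		if v, ok := labels[key]; ok {
			return v, true
		}
	}
	return "", false
}

// safeName calls deploymentFromName and converts a panic into an error.
func safeName(pod string) (name string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("parsing %q: %v", pod, r)
		}
	}()
	return deploymentFromName(pod), nil
}

func main() {
	fmt.Println(safeName("linkerd-destination-58c57dd675-7tthr"))
	fmt.Println(safeName("test")) // reports the slice bounds panic as an error
	fmt.Println(deploymentFromLabels(map[string]string{"component": "example"}))
}
```

The label lookup returns a second boolean instead of panicking, so a Pod without either label is simply skipped rather than crashing the check.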

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-07-19 12:14:55 -06:00
Alex Leong 80b2fdbe3f
Allow extension deployments to be evicted by autoscaler (#8828)
Fixes: #8308

We add the `cluster-autoscaler.kubernetes.io/safe-to-evict: "true"` annotation to all Linkerd extension deployments.  This signals that none of these deployments use persistent storage and they are all eligible for eviction if necessary.

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-07-12 10:46:31 -07:00
Kevin Leimkuhler 89c397349c
Add changes for edge-22.7.1 (#8833)
This release includes a security improvement. When a user manually specified the
`policyValidator.keyPEM` setting, the value was incorrectly included in the
`linkerd-config` configmap. This means that this private key was erroneously
exposed to service accounts with read access to this configmap. Practically,
this means that the Linkerd `proxy-injector`, `identity`, and `heartbeat` pods
could read this value. This should **not** have exposed this private key to
other unauthorized users unless additional role bindings were added outside of
Linkerd. Nevertheless, we recommend that users who manually set control plane
certificates update the credentials for the policy validator after upgrading
Linkerd.

Additionally, the linkerd-multicluster extension has several fixes related to
fail-fast errors during link watch restarts, improper label matching for
mirrored services, and properly cleaning up mirrored endpoints in certain
situations.

Lastly, the proxy can now retry gRPC requests that have responses with a
TRAILERS frame. A fix to reduce redundant load balancer updates should also
result in less connection churn.

* Changed unit tests to use newly introduced `prommatch` package for asserting
  expected metrics (thanks @krzysztofdrys!)
* Fixed the Docker container runtime check to only run during `linkerd install`
  rather than `linkerd check --pre`
* Changed linkerd-multicluster's remote cluster watcher to assume the gateway is
  alive when starting, fixing fail-fast errors that occurred during restarts
  (thanks @chenaoxd!)
* Added `matchLabels` and `matchExpressions` to linkerd-multicluster's Link CRD
* Fixed linkerd-multicluster's label selector to properly select resources that
  match the expected label value, rather than just the presence of the label
* Fixed linkerd-multicluster's cluster watcher to properly clean up endpoints
  belonging to remote headless services that are no longer mirrored
* Added the HttpRoute CRD which will be used by future policy features
* Fixed CNI plugin event processing where file updates could sometimes be
  skipped leading to the update not being acknowledged
* Fixed redundant load balancer updates in the proxy that could cause
  unnecessary connection churn
* Fixed gRPC request retries for responses that contain a TRAILERS frame
* Fixed the dashboard's `linkerd check` due to missing RBAC for listing pods in
  the cluster
* Fixed API check that ensures access to the Server CRD (thanks @aatarasoff!)
* Changed `linkerd authz` to match the labels of pre-fetched Pods rather than
  making multiple API calls, resulting in a significant speed-up (thanks
  @aatarasoff!)
* Removed `policyValidator.keyPEM` from the `linkerd-config` ConfigMap
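The label selector fix listed above (matching the expected label value rather than only the label's presence) can be illustrated with plain maps. This is a hypothetical sketch: the functions and the `example.io/exported` key are invented for illustration and are not the service-mirror controller's code.

```go
package main

import "fmt"

// matchesPresence models the buggy behavior: any value of the label matches.
func matchesPresence(labels map[string]string, key string) bool {
	_, ok := labels[key]
	return ok
}

// matchesValue models the fixed behavior: the label must carry the expected value.
func matchesValue(labels map[string]string, key, want string) bool {
	got, ok := labels[key]
	return ok && got == want
}

func main() {
	svc := map[string]string{"example.io/exported": "false"}
	fmt.Println(matchesPresence(svc, "example.io/exported"))      // true
	fmt.Println(matchesValue(svc, "example.io/exported", "true")) // false
}
```

A Service labelled with an explicit `"false"` is selected by the presence-only check, which is exactly the kind of resource a value-based match excludes.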

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-07-10 11:39:56 -06:00
Matei David a49cbb9fe1
edge-22.6.2 (#8706)
This edge release bumps the minimum supported Kubernetes version from `v1.20`
to `v1.21`, introduces some new changes, and includes a few bug fixes. Most
notably, a bug has been fixed in the proxy's outbound load balancer that could
cause panics, especially when the balancer would process many service discovery
updates in a short period of time. This release also fixes a panic in the
proxy-injector and introduces a change that will include HTTP probe ports in
the proxy's inbound ports configuration, to be used for policy discovery.

* Fixed a bug in the proxy's outbound load balancer that could cause panics
  when many discovery updates were processed in short time periods
* Added `runtimeClassName` options to Linkerd's Helm chart (thanks @jtcarnes!)
* Changed the proxy-injector to include the pod's probe ports (HTTPGet) in the
  proxy's inbound ports configuration
* Added godoc links in the project README file (thanks @spacewander!)
* Increased minimum supported Kubernetes version to `v1.21` from `v1.20`
* Fixed an issue where the proxy-injector would not emit events for resources
  that receive annotation patches but are skipped for injection
* Refactored `PublicIPToString` to handle IPv4 and IPv6 addresses consistently
  (thanks @zhlsunshine!)
* Replaced branch references with tags, and pinned the `cosign-installer` action
  to `v1` (thanks @saschagrunert!)
* Fixed an issue where the proxy-injector would panic if resources have an
  unsupported owner kind
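The `PublicIPToString` refactor above is about formatting IPv4 and IPv6 addresses through the same path. The sketch below assumes a hypothetical message shape (a 32-bit IPv4 value, or a 128-bit IPv6 value split into high and low halves); `ipAddr` and `ipToString` are invented names, not the real function or its signature.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// ipAddr is a hypothetical address shape: either a 32-bit IPv4 value or a
// 128-bit IPv6 value split into high and low 64-bit halves.
type ipAddr struct {
	isV4   bool
	v4     uint32
	hi, lo uint64
}

// ipToString renders both families through net/netip, so IPv4 and IPv6
// share the same standard-library formatting.
func ipToString(a ipAddr) string {
	if a.isV4 {
		var b [4]byte
		binary.BigEndian.PutUint32(b[:], a.v4)
		return netip.AddrFrom4(b).String()
	}
	var b [16]byte
	binary.BigEndian.PutUint64(b[:8], a.hi)
	binary.BigEndian.PutUint64(b[8:], a.lo)
	return netip.AddrFrom16(b).String()
}

func main() {
	fmt.Println(ipToString(ipAddr{isV4: true, v4: 0x0A000001}))      // 10.0.0.1
	fmt.Println(ipToString(ipAddr{hi: 0x20010DB800000000, lo: 1})) // 2001:db8::1
}
```

Delegating to `net/netip` (available since Go 1.18) avoids hand-written formatting and gives canonical IPv6 output, including `::` compression.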

Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
2022-06-20 19:25:40 +01:00
Alex Leong b7a0b8adb4
Bump minimum kubernetes version to 1.21 (#8647)
Fixes #8592

Increase the minimum supported Kubernetes version from 1.20 to 1.21. This allows us to drop support for batch/v1beta1/CronJob and discovery/v1beta1/EndpointSlices, instead using only v1 of those resources. This fixes the deprecation warnings about these resources that the CLI printed.

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-06-14 15:15:28 -07:00
Kevin Leimkuhler a24c32e5e7
Add changes for edge-22.6.1 (#8642)
This edge release fixes an issue where Linkerd injected pods could not be
evicted by Cluster Autoscaler. It also adds the `--crds` flag to `linkerd check`
which validates that the Linkerd CRDs have been installed with the proper
versions.

The previously noisy "cluster networks can be verified" check has been replaced
with one that now verifies each running Pod IP is contained within the current
`clusterNetworks` configuration value.

Additionally, linkerd-viz is no longer required for linkerd-multicluster's
`gateways` command, allowing the `Gateways` API to be marked as deprecated for
2.12.

Finally, several security issues have been patched in the Docker images now that
the builds are pinned only to minor versions rather than patch versions.

* Replaced manual IP address parsing with functions available in the Go standard
  library (thanks @zhlsunshine!)
* Removed the dependency of linkerd-multicluster's `gateways` command on the
  linkerd-viz extension
* Fixed issue where Linkerd injected pods were prevented from being evicted by
  Cluster Autoscaler
* Added the `dst_target_cluster` metric to linkerd-multicluster's service-mirror
  controller probe traffic
* Added the `--crds` flag to `linkerd check` which validates that the Linkerd
  CRDs have been installed
* Removed the Docker image's hardcoded patch versions so that builds pick up
  patch releases without manual intervention
* Replaced the "cluster networks can be verified" check with a "cluster
  networks contains all pods" check, which ensures that all currently running
  Pod IPs are contained by the current `clusterNetworks` configuration
* Added IPv6 compatible IP address generation in certain control plane
  components that were only generating IPv4 (thanks @zhlsunshine!)
* Deprecated linkerd-viz's `Gateways` API which is no longer used by
  linkerd-multicluster
* Added the `prommatch` package for making programmatic Prometheus assertions
  in tests (thanks @krzysztofdrys!)
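The "cluster networks contains all pods" check described above reduces to CIDR containment. This is a hypothetical sketch, not the `linkerd check` implementation. It assumes `clusterNetworks` is a comma-separated CIDR list, and `podsOutsideNetworks` is an invented name.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// podsOutsideNetworks returns the Pod IPs that no clusterNetworks CIDR
// contains. clusterNetworks is assumed to be a comma-separated CIDR list.
func podsOutsideNetworks(clusterNetworks string, podIPs []string) ([]string, error) {
	var prefixes []netip.Prefix
	for _, cidr := range strings.Split(clusterNetworks, ",") {
		p, err := netip.ParsePrefix(strings.TrimSpace(cidr))
		if err != nil {
			return nil, fmt.Errorf("invalid cluster network %q: %w", cidr, err)
		}
		prefixes = append(prefixes, p)
	}

	var outside []string
	for _, s := range podIPs {
		ip, err := netip.ParseAddr(s)
		if err != nil {
			return nil, fmt.Errorf("invalid pod IP %q: %w", s, err)
		}
		contained := false
		for _, p := range prefixes {
			if p.Contains(ip) { // false across address families
				contained = true
				break
			}
		}
		if !contained {
			outside = append(outside, s)
		}
	}
	return outside, nil
}

func main() {
	outside, err := podsOutsideNetworks("10.0.0.0/8,192.168.0.0/16",
		[]string{"10.1.2.3", "192.168.4.5", "172.16.0.5"})
	if err != nil {
		panic(err)
	}
	fmt.Println(outside) // [172.16.0.5]
}
```

Returning the offending IPs, rather than a bare boolean, lets a check report exactly which Pods fall outside the configured networks.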

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-06-09 18:26:47 -06:00