The default resource values for `linkerd-init` are not always the right fit. We ship defaults so
that proxy-init does not get in the way of a QoS class of Guaranteed (`linkerd-init` resource
limits and requests cannot be configured in any other way).
Instead of using default values that can be overridden, we can re-use the proxy's configuration
values. For the pod to be QoS Guaranteed, the proxy's values have to be set anyway; if we re-use
the same values for proxy-init, we always request exactly the CPU and memory that is needed.
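As a rough sketch of the idea (the helper and package names below are illustrative, not the injector's actual code), the init container simply inherits the proxy's resource requirements:

```go
package inject

import corev1 "k8s.io/api/core/v1"

// initContainerFor is a hypothetical helper: build linkerd-init with the same
// resource requests and limits as the proxy, so a pod whose proxy is sized for
// QoS Guaranteed stays Guaranteed without a separate proxyInit.resources value.
func initContainerFor(proxy corev1.Container) corev1.Container {
	return corev1.Container{
		Name:      "linkerd-init",
		Resources: proxy.Resources, // reuse the proxy's requests/limits verbatim
	}
}
```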
* `linkerd-init` now defaults to the proxy's values
* when the proxy's resource requests are configured through annotations, the
  configuration now also applies to `linkerd-init`
* Helm chart and docs have been updated to reflect the removed values.
* tests now no longer use `ProxyInit.Resources`
UPGRADE NOTE:
- Deprecates `proxyInit.resources` field in the Helm values.
- It will be a no-op if specified (no hard failures)
Closes #11320
---------
Signed-off-by: Matei David <matei@buoyant.io>
These releases ensure that when IPv6 is enabled, the series of ip6tables commands succeeds. If they fail, the proxy-init/linkerd-cni containers now fail as well instead of ignoring errors.
See linkerd/linkerd2-proxy-init#388
Fixes #11773
Make the proxy's GID configurable via `proxy.gid`, which defaults to `-1`; in that case the GID is not set.
Also added the ability to set the GID for proxy-init and the core and extension controllers.
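A minimal sketch of the behaviour, assuming a hypothetical helper (the actual chart and injector field names may differ):

```go
package inject

import corev1 "k8s.io/api/core/v1"

// applyProxyGID is illustrative: only set the group when the configured GID is
// non-negative; the default of -1 means the GID is left unset.
func applyProxyGID(sc *corev1.SecurityContext, gid int64) {
	if gid >= 0 {
		sc.RunAsGroup = &gid
	}
}
```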
---------
Signed-off-by: Nico Feulner <nico.feulner@gmail.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
Closes #12395
In various parts of the code, only a pod's regular containers were searched when looking for the proxy. When the proxy is injected as a native sidecar (an init container), this resulted in:
- the Destination API's `Get` failing in the presence of opaque ports
- the injector failing to detect already-injected pods
- various CLI issues
This PR is split into the following commits, each addressing one of these issues (a minimal sketch of the shared fix follows the commit list):
a8ebe76e3 - Fix injection check for existing sidecars
44e9625e0 - Fix 'linkerd uninject'
62694965d - Fix 'linkerd version --proxy'
42dbdaddf - Fix 'linkerd identity'
39db823fe - Fix 'linkerd check'
7359f371d - Fix 'linkerd dg proxy-metrics'
f8f73c47c - Fix destination controller
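A minimal sketch of the shared fix, with an illustrative helper name (a native sidecar is declared as an init container, so both lists must be searched):

```go
package util

import corev1 "k8s.io/api/core/v1"

// findProxyContainer looks for the proxy in both regular containers and init
// containers, since a native sidecar is an init container with restartPolicy
// Always rather than an entry in .spec.containers.
func findProxyContainer(pod *corev1.Pod) (*corev1.Container, bool) {
	for i := range pod.Spec.Containers {
		if pod.Spec.Containers[i].Name == "linkerd-proxy" {
			return &pod.Spec.Containers[i], true
		}
	}
	for i := range pod.Spec.InitContainers {
		if pod.Spec.InitContainers[i].Name == "linkerd-proxy" {
			return &pod.Spec.InitContainers[i], true
		}
	}
	return nil, false
}
```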
New releases have been cut for both proxy-init and the CNI plugin. They add a
`GID` feature that allows iptables to skip traffic originating from a process
running under the specified GID. The CNI plugin release also includes a fix for
native sidecar containers.
* Bump proxy-init from `v2.3.0` to `v2.4.0`
* Bump CNI plugin from `v1.4.0` to `v1.5.0`
---------
Signed-off-by: Matei David <matei@buoyant.io>
Fixes: https://github.com/linkerd/linkerd2/issues/12233
When Linkerd is installed in HA mode, `linkerd check` warns if the `admission-webhooks=disabled` annotation is not set on kube-system. But the admission webhooks already exclude kube-system, so the annotation is no longer necessary.
Signed-off-by: Alex Leong <alex@buoyant.io>
We released a new version of the CNI plugin. The chart has been updated
to reference the new version; however, some of the tests and the Go
`version` pkg still reference the old version (v1.2.2). When installing
through the CLI, I noticed that even though the chart value renders an
image for the new repair controller, the image used is still v1.2.2, and
as such the container won't start due to a missing binary.
This change bumps the version to v1.3.0 everywhere.
Signed-off-by: Matei David <matei@buoyant.io>
* Introduce a new check for extension namespace configuration
Linkerd's extension model requires each namespace that "owns" an
extension to be labelled with the extension name. Core extensions in
particular strictly follow this pattern. For example, the namespace viz
is installed in would be labelled with `linkerd.io/extension=viz`.
The extension label is used by the CLI in many different places: it is used
in checks, in uninstalls, and so on. Whenever two namespaces carry the same
label value (e.g. both are registered as the owner of "viz"), behaviour
becomes undefined. Extension checks or uninstalls may or may not work
correctly. These issues are not straightforward to debug, and such
misconfiguration can be introduced for a variety of reasons.
This change adds a new "core" category (`linkerd-extension-checks`) and
a new checker that asserts all extension namespaces are configured
properly. There are two reasons why this has been made a core check:
* Extensions may have their own health checking library. It is hard to
share a common abstraction here without duplicating the logic. For
example, viz imports the healthchecking package whereas the
multicluster extension has its own. A dedicated core check will work
better with all extensions that opt-in to use linkerd's extension
label.
* Being part of the core checks means this is going to run before any of
the other extension checks do which might improve visibility.
The change is straightforward: if an extension value is used for the
label key more than once across the cluster, the check issues a warning
listing the namespaces on which that label key and value pair exists.
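A rough sketch of the duplicate-label check (names are illustrative, not the exact healthcheck code):

```go
package healthcheck

import corev1 "k8s.io/api/core/v1"

// duplicateExtensionNamespaces groups namespaces by the value of the
// linkerd.io/extension label and reports any value that appears on more than
// one namespace, which is the misconfiguration the new check warns about.
func duplicateExtensionNamespaces(namespaces []corev1.Namespace) map[string][]string {
	byExtension := map[string][]string{}
	for _, ns := range namespaces {
		if ext, ok := ns.Labels["linkerd.io/extension"]; ok {
			byExtension[ext] = append(byExtension[ext], ns.Name)
		}
	}
	duplicates := map[string][]string{}
	for ext, names := range byExtension {
		if len(names) > 1 {
			duplicates[ext] = names
		}
	}
	return duplicates
}
```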
This should be followed up with a docs change.
Closes #11509
Signed-off-by: Matei David <matei@buoyant.io>
When the Linkerd CLI is unable to access the internet, it will encounter
a DNS error when trying to discover the latest Linkerd releases from linkerd.io.
This change handles this DNS resolution error explicitly so that users receive
a more informative error message.
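A hedged sketch of the error handling (the helper is illustrative, not the CLI's actual function):

```go
package version

import (
	"errors"
	"fmt"
	"net"
)

// friendlyCheckError surfaces DNS resolution failures with a clearer message
// instead of returning the raw resolver error to the user.
func friendlyCheckError(err error) error {
	var dnsErr *net.DNSError
	if errors.As(err, &dnsErr) {
		return fmt.Errorf("unable to resolve %q; please verify your internet connection and try again", dnsErr.Name)
	}
	return err
}
```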
Fixes #11349
Signed-off-by: Dominik Táskai <dominik.taskai@leannet.eu>
Co-authored-by: Dominik Táskai <dominik.taskai@leannet.eu>
Co-authored-by: Oliver Gould <ver@buoyant.io>
* Bump CNI plugin to v1.2.1
* Bump proxy-init to v2.2.2
Both dependencies include a fix for CVE-2023-2603. Since alpine is used
as the runtime image, there is a security vulnerability detected in the
produced images (due to an issue with libcap). The alpine images have
been bumped to address the CVE.
Signed-off-by: Matei David <matei@buoyant.io>
Adds support for remote discovery to the destination controller.
When the destination controller gets a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` label, this is an indication that the destination controller should discover the endpoints for this service from a remote cluster. The destination controller will look for a remote cluster which has been linked to it (using the `linkerd multicluster link` command) with that name. It will look at the `multicluster.linkerd.io/remote-discovery` label for the service name to look up in that cluster. It then streams back the endpoint data for that remote service.
Since we now have multiple client-go informers for the same resource types (one for the local cluster and one for each linked remote cluster) we add a `cluster` label onto the prometheus metrics for the informers and EndpointWatchers to ensure that each of these components' metrics are correctly tracked and don't overwrite each other.
---------
Signed-off-by: Alex Leong <alex@buoyant.io>
Problem - Currently the Linkerd CNI Helm chart templates have `hostNetwork: true` set, which is unnecessary and less secure.
Solution - Removed `hostNetwork: true` from the linkerd-cni Helm chart templates.
PR Fixes #11141
---------
Signed-off-by: Abhijeet Gaurav <abhijeetdav24aug@gmail.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
Updates the policy-controller to watch `httproute.gateway.networking.k8s.io` resources in addition to watching `httproute.policy.linkerd.io` resources. Routes of either or both types can be returned in policy responses and will be appropriately identified by the `group` field on their metadata. Furthermore we update the Status of these resources to correctly reflect when they are accepted.
We add the `httproute.gateway.networking.k8s.io` CRD to the Linkerd installed CRD list and add the appropriate RBAC to the policy controller so that it may watch these resources.
Signed-off-by: Alex Leong <alex@buoyant.io>
This release stops using the "interface" mode; instead it waits until
another CNI plugin drops a proper network config and then appends the
linkerd CNI config to it. This avoids having pods start before proper
networking is established on the node.
This release of the CNI plugin changes the base runtime Docker image
from `debian:bullseye-slim` to `alpine:3.17.3`.
---
* cni: use `scratch` as the base runtime docker image (linkerd/linkerd2-proxy-init/pull/237)
* cni: change base runtime image from `scratch` to `alpine` (linkerd/linkerd2-proxy-init#238)
Fixed an incompatibility issue with the AWS CNI add-on in EKS that was
preventing pods from acquiring networking after scaling up nodes.
Credits to @frimik for providing a diagnosis and fix, and to @JonKusz for the detailed repro
proxy-init v2.2.1:
* Sanitize `subnets-to-ignore` flag
* Dep bumps
cni-plugin v1.1.0:
* Add support for the `config.linkerd.io/skip-subnets` annotation
* Dep bumps
validator v0.1.2:
* Dep bumps
Also, `linkerd-network-validator` is now released wrapped in a tar file, so this PR also amends `Dockerfile-proxy` to account for that.
The existing `linkerd check` command runs extension checks based on extension namespaces already on-cluster. This approach does not permit running extension checks without cluster-side components.
Introduce "CLI Checks". These extensions run as part of `linkerd check`, if they satisfy the following criteria:
1) executable in PATH
2) prefixed by `linkerd-`
3) supports an `_extension-metadata` subcommand that outputs self-identifying
JSON, for example:
```
$ linkerd-foo _extension-metadata
{
"name": "linkerd-foo",
"checks": "always"
}
```
4) the `name` value from `_extension-metadata` must match the filename, and `checks` must equal `always`
If a CLI Check is found that also would have run as an on-cluster extension check, it is run as a CLI Check only.
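A rough sketch of the discovery logic implied by these criteria (names are illustrative, not the CLI's actual code):

```go
package main

import (
	"encoding/json"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// extensionMetadata mirrors the JSON shape shown above.
type extensionMetadata struct {
	Name   string `json:"name"`
	Checks string `json:"checks"`
}

// discoverCLIChecks scans PATH for linkerd-* executables, invokes their
// `_extension-metadata` subcommand, and keeps the ones whose reported name
// matches the file name and whose checks field is "always".
func discoverCLIChecks() []string {
	var found []string
	for _, dir := range filepath.SplitList(os.Getenv("PATH")) {
		entries, err := os.ReadDir(dir)
		if err != nil {
			continue
		}
		for _, entry := range entries {
			name := entry.Name()
			if entry.IsDir() || !strings.HasPrefix(name, "linkerd-") {
				continue
			}
			out, err := exec.Command(filepath.Join(dir, name), "_extension-metadata").Output()
			if err != nil {
				continue
			}
			var md extensionMetadata
			if json.Unmarshal(out, &md) == nil && md.Name == name && md.Checks == "always" {
				found = append(found, filepath.Join(dir, name))
			}
		}
	}
	return found
}
```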
Fixes #10544
Wind the new linkerd-cni build through the build process. Refactor image, version, and pullPolicy into an Image object.
Signed-off-by: Steve Jenson <stevej@buoyant.io>
* Removed dupe imports
My IDE (vim-gopls) has been complaining for a while, so I decided to take
care of it. Found via
[staticcheck](https://github.com/dominikh/go-tools)
* Add stylecheck to go-lint checks
* Refactor `linkerd check` calls in the integration tests
Extracted logic into the new file `testutil/test_helper_check.go` which exposes the functions `TestCheckPre`, `TestCheck` and `TestCheckProxy`.
`linkerd check --output json` is called so its output is properly captured without the need for golden files.
Besides checking that there are no errors (although warnings are allowed), we check that the expected check categories are returned.
The plan is to leverage this in #9856 when re-enabling the helm-upgrade test.
Closes #9676
This adds the `pod-security.kubernetes.io/enforce` label as described in [Pod Security Admission labels for namespaces](https://kubernetes.io/docs/concepts/security/pod-security-admission/#pod-security-admission-labels-for-namespaces).
PSA gives us three different possible values (policies or modes): [privileged, baseline and restricted](https://kubernetes.io/docs/concepts/security/pod-security-standards/).
For non-CNI mode, the proxy-init container relies on granting the NET_RAW and NET_ADMIN capabilities, which places those pods under the `privileged` policy. OTOH, for CNI mode we can enforce the `restricted` policy by setting some defaults on the containers' `securityContext`, as done in this PR.
Also note this change adds the `cniEnabled` entry in the `values.yaml` file for all the extension charts, which determines what policy to use.
Final note: this includes the fix from #9717; otherwise an empty gateway UID prevents the pod from being created under the `restricted` policy.
## How to test
As this is only enforced as of k8s 1.25, here are the instructions to run 1.25 with k3d using Calico as CNI:
```bash
# launch k3d with k8s v1.25, with no flannel CNI
$ k3d cluster create --image='+v1.25' --k3s-arg '--disable=local-storage,metrics-server@server:0' --no-lb --k3s-arg --write-kubeconfig-mode=644 --k3s-arg --flannel-backend=none --k3s-arg --cluster-cidr=192.168.0.0/16 --k3s-arg '--disable=servicelb,traefik@server:0'
# install Calico
$ k apply -f https://k3d.io/v5.1.0/usage/advanced/calico.yaml
# load all the images
$ bin/image-load --k3d proxy controller policy-controller web metrics-api tap cni-plugin jaeger-webhook
# install linkerd-cni
$ bin/go-run cli install-cni|k apply -f -
# install linkerd-crds
$ bin/go-run cli install --crds|k apply -f -
# install linkerd-control-plane in CNI mode
$ bin/go-run cli install --linkerd-cni-enabled|k apply -f -
# Pods should come up without issues. You can also try the viz and jaeger extensions.
# Try removing one of the securityContext entries added in this PR, and the Pod
# won't come up. You should be able to see the PodSecurity error in the associated
# ReplicaSet.
```
To test the multicluster extension using CNI, check this [gist](https://gist.github.com/alpeb/4cbbd5ad87538b9e0d39a29b4e3f02eb) with a patch to run the multicluster integration test with CNI in k8s 1.25.
* edge-22.11.3 change notes
Besides the notes, this corrects a small point in `RELEASE.md`, and
bumps the proxy-init image tag to `v2.1.0`. Note that the entry under
`go.mod` wasn't bumped because moving it past v2 requires changes to
`linkerd2-proxy-init`'s `go.mod` file, and we're going to drop that
dependency soon anyway. Finally, all the charts got their patch version
bumped, except for `linkerd2-cni`, which got its minor version bumped
because of the tolerations default change.
## edge-22.11.3
This edge release fixes connection errors to pods using a `hostPort` different
than their `containerPort`. Also the `network-validator` init container improves
its logging, and the `linkerd-cni` DaemonSet now gets deployed in all nodes by
default.
* Fixed `destination` service to properly discover targets using a `hostPort`
different than their `containerPort`, which was causing 502 errors
* Upgraded the `network-validator` with better logging allowing users to
determine whether failures occur as a result of their environment or the tool
itself
* Added default `Exists` toleration to the `linkerd-cni` DaemonSet, allowing it
to be deployed in all nodes by default, regardless of taints
Co-authored-by: Oliver Gould <ver@buoyant.io>
The root cause of https://github.com/linkerd/linkerd2/issues/9521 was that there were ClusterIP Services which were not in Linkerd's cluster networks. This meant that Linkerd was not performing discovery when connecting to these services and therefore was not doing mTLS. This issue was difficult to detect and diagnose.
We add a check which verifies that all clusterIP services in the cluster have their clusterIP in the cluster networks. This is very similar to the existing check which verifies that all pods have a podIP in the cluster networks.
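A sketch of the check, assuming the configured cluster networks have already been parsed into CIDRs (names are illustrative, not the exact healthcheck code):

```go
package healthcheck

import (
	"net"

	corev1 "k8s.io/api/core/v1"
)

// servicesOutsideClusterNetworks returns every ClusterIP Service whose IP falls
// outside all of the configured cluster networks, so the user can fix their
// clusterNetworks setting.
func servicesOutsideClusterNetworks(services []corev1.Service, networks []*net.IPNet) []string {
	var offenders []string
	for _, svc := range services {
		ip := net.ParseIP(svc.Spec.ClusterIP)
		if ip == nil {
			// headless ("None") or unset clusterIPs have nothing to verify
			continue
		}
		contained := false
		for _, n := range networks {
			if n.Contains(ip) {
				contained = true
				break
			}
		}
		if !contained {
			offenders = append(offenders, svc.Namespace+"/"+svc.Name)
		}
	}
	return offenders
}
```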
Signed-off-by: Alex Leong <alex@buoyant.io>
* Bump proxy-init to v2.0.0
New release of proxy-init.
Updated:
* Helm values to use v2.0.0 of proxy-init
* Helm docs
* Tests
Note: go dependencies have not been updated since the new version will
break API compatibility with older versions (source files have been
moved, see issue for more details).
Closes #9164
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Oliver Gould <ver@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Oliver Gould <ver@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
* Allows RSA signed trust anchors on linkerd cli (#7771)
Linkerd currently forces using an ECDSA P-256 issuer certificate along with an ECDSA trust anchor. However, it is still cryptographically valid to have an ECDSA P-256 issuer certificate issued by an RSA-signed CA.
`CheckCertAlgoRequirements` checks whether the CA cert uses an ECDSA or RSA 2048/4096 signing algorithm.
Fixes #7771
Signed-off-by: Baeyens, Daniel <daniel.baeyens@gmail.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
* Allow initializing a k8s namespace-scoped API
This allows the `k8s.API` informers to be reused by other projects that don't
necessarily have cluster-wide permissions.
This change bumps the proxy-init version from v1.6.1 to the latest
version, v1.6.2. As part of the new release, proxy-init now adds
net_admin and net_raw sys caps to xtables-nft-multi so that nftables
mode can be used without requiring root privileges.
* Bump go.mod
* Bump version in helm values
* Bump version in misc files
* Bump version in code
Signed-off-by: Matei David <matei@buoyant.io>
Closes #8916
When a random Pod (meshed or not) is created in the `linkerd`, `linkerd-viz`, or
`linkerd-jaeger` namespaces, their respective `check` subcommands can fail.
We parse Pod names for their owning Deployment by assuming the Pod name has a
randomized suffix. For example, the `linkerd-destination` Deployment creates the
`linkerd-destination-58c57dd675-7tthr` Pod. We split the name on `-` and take
the first two parts (`["linkerd", "destination"]`); those first two parts make
up the Deployment name.
Now, if a random Pod is created in the namespace with the name `test`, we apply
that same logic but hit a runtime error when trying to get the first two parts
of the split. `test` did not split at all since it contains no `-` and therefore
we error with `slice bounds out of range`.
To fix this, we now use the fact that all Linkerd components have a
`linkerd.io/control-plane-component` or `component` label with a value that is
the owning Deployment. This allows us to avoid any extra parsing logic and just
look at a single label value.
Additionally, some of these checks get all the Pods in a namespace with the
`GetPodsByNamespace` method but we don't always need something so general. In
the places where we are checking specifically for Linkerd components, we can
narrow this further by using the expected LabelSelector such as
`linkerd.io/extension=viz`.
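A minimal sketch of the label-based lookup (the helper name is illustrative, not the exact code):

```go
package healthcheck

import corev1 "k8s.io/api/core/v1"

// ownerDeployment reads the owning Deployment's name from the component labels
// instead of splitting it out of the Pod name.
func ownerDeployment(pod corev1.Pod) (string, bool) {
	if name, ok := pod.Labels["linkerd.io/control-plane-component"]; ok {
		return name, true
	}
	if name, ok := pod.Labels["component"]; ok {
		return name, true
	}
	return "", false
}
```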
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Release v1.6.1 of proxy-init adds support for iptables-nft. This change
bumps up the proxy-init version used in code, chart values, and golden
files.
* Update go.mod dep
* Update CNI plugin with new opts
* Update proxy-init ref in golden files and chart values
* Update policy controller CI workflow
Signed-off-by: Matei David <matei@buoyant.io>
Fixes #8660
We add the HttpRoute CRD to the CRDs installed with `linkerd install --crds` and `linkerd upgrade --crds`. You can use `--set installHttpRoute=false` to skip installing this CRD.
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #8555
We remove the "cluster networks can be verified" check, which verifies that the podCIDR field exists on nodes, and replace it with a "cluster networks contains all pods" check. This looks at all the pods in the cluster and verifies that each pod's IP is contained in the cluster networks.
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #8372
Add a `--crds` flag to `linkerd check`. This flag causes `linkerd check` to validate that the Linkerd CRDs have been installed, and will wait until the check succeeds. This way, `linkerd check --crds` can be used after `linkerd install --crds` and before `linkerd install` to ensure the CRDs have been installed successfully and to avoid race conditions where `linkerd install` could potentially attempt to create custom resources for which the CRD does not yet exist.
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #8373
We update `CheckCustomResourceDefinitions` so that it not only checks for the existence of the CRDs, but also ensures that they contain the latest version of each CRD. Note that this means that we'll need to keep this list of CRD versions in `CheckCustomResourceDefinitions` in sync with the actual CRD versions in the templates. We also add this check to `linkerd upgrade` when the `--crds` flag is not provided. This means that users who are upgrading will be required to run `linkerd upgrade --crds` first if they don't have the latest version of any of the CRDs.
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
We frequently compare data structures--sometimes very large data
structures--that are difficult to compare visually. This change replaces
uses of `reflect.DeepEqual` with `deep.Equal`. `go-test`'s `deep.Equal`
returns a diff of values that are not equal.
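For illustration, a small (hypothetical) test using `github.com/go-test/deep` showing the pattern:

```go
package example

import (
	"testing"

	"github.com/go-test/deep"
)

// deep.Equal returns a slice of human-readable differences (nil when the values
// match), so a failing test prints exactly which fields diverge instead of a
// bare "not equal".
func TestEndpointsMatch(t *testing.T) {
	expected := map[string][]string{"web": {"10.0.0.1", "10.0.0.2"}}
	actual := map[string][]string{"web": {"10.0.0.1"}}

	if diff := deep.Equal(expected, actual); diff != nil {
		t.Errorf("unexpected endpoints: %v", diff)
	}
}
```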
Signed-off-by: Oliver Gould <ver@buoyant.io>
Fixes: #8173
In order to support having custom resources in the default Linkerd installation, it is necessary to add a separate install step to install CRDs before the core install. The Linkerd Helm charts already accomplish this by having CRDs in a separate chart.
We add this functionality to the CLI by adding a `--crds` flag to `linkerd install` and `linkerd upgrade`, which outputs manifests for the CRDs only, and by removing the CRD manifests when the `--crds` flag is not set. To avoid compounding complexity, we remove the `config` and `control-plane` stages from install/upgrade. The effect of this is that we drop support for splitting up an install by privilege level (cluster admin vs Linkerd admin).
The Linkerd install flow is now always a 2-step process where `linkerd install --crds` must be run first to install CRDs only and then `linkerd install` is run to install everything else. This more closely aligns the CLI install flow with the Helm install flow where the CRDs are a separate chart. Attempting to run `linkerd install` before the CRDs are installed will result in a helpful error message.
Similarly, upgrade is also a 2-step process: `linkerd upgrade --crds` followed by `linkerd upgrade`.
Signed-off-by: Alex Leong <alex@buoyant.io>
In order to restrict pods to run only on arbitrarily chosen nodes, affinities
or tolerations can be used. Currently, Linkerd only supports tolerations,
which are applied to pods and allow them to be scheduled on nodes with
matching "taints".
Certain environments and workflows lean more towards affinity instead of
tolerations to determine preferred or required scheduling. This change
introduces a new "nodeAffinity" field so that users may specify affinity
rules for scheduling Linkerd pods.
Closes #8136
Signed-off-by: Michal Romanowski <michal.rom089@gmail.com>
Closes #8010.
Pods that have `NodeShutdown` status should be skipped during validation as they will not have a running proxy container.
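A hedged sketch of the skip condition, assuming the pod status reason is reported as `NodeShutdown` as described above (the helper name is illustrative):

```go
package validation

import corev1 "k8s.io/api/core/v1"

// skipForNodeShutdown reports whether a pod was terminated by a node shutdown;
// such pods no longer have a running proxy container, so validation ignores them.
func skipForNodeShutdown(pod corev1.Pod) bool {
	return pod.Status.Phase == corev1.PodFailed && pod.Status.Reason == "NodeShutdown"
}
```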
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Closes #7826
This adds the `gosec` and `errcheck` linters to the `golangci` configuration. Most significant lints have been fixed by individual changes, but this enables them by default so that all future changes are caught ahead of time.
A significant number of these lints have been excluded by the various `exclude-rules` added to `.golangci.yml`. These include operations on files that generally do not fail, such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up functions via the `defer` keyword.
Aside from those, there are several other rules added that all have comments explaining why it's okay to ignore the errors that they cover.
Finally, several smaller fixes in the code have been made where it seems necessary to catch errors or at least log them.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
The `WithTimeout` documentation states:
> Canceling this context releases resources associated with it, so code should call cancel as soon as the operations running in this Context complete
We only use the context for calling `c.check`, therefore we can call cancel immediately after `c.check` completes to free resources associated with the timeout. This prevents potentially holding on to (and accumulating) timeout related resources for the entire duration of the loop.
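A sketch of the resulting pattern (the types and names are illustrative, not the actual healthcheck code):

```go
package healthcheck

import (
	"context"
	"time"
)

type checker struct {
	check func(context.Context) error
}

// runChecks scopes the timeout to a single check and cancels as soon as it
// returns, instead of deferring every cancel until the whole loop finishes.
func runChecks(parent context.Context, checkers []checker) []error {
	var errs []error
	for _, c := range checkers {
		ctx, cancel := context.WithTimeout(parent, 30*time.Second)
		err := c.check(ctx)
		cancel() // release the timeout's resources immediately
		if err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}
```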
Signed-off-by: Alex Leong <alex@buoyant.io>