It would be useful to know how prevalent 32-bit ARM deployments are so
that we can determine whether it makes sense to continue supporting this
platform. This change adds an `arch` field to heartbeats indicating the
CPU architecture of the heartbeat container.
Signed-off-by: Oliver Gould <ver@buoyant.io>
Fixes issue described in [this comment](https://github.com/linkerd/linkerd2/issues/9310#issuecomment-1247201646)
Rollback #7382
Should be cherry-picked back into 2.12.1
For 2.12.0, #7382 removed the env vars `_l5d_ns` and `_l5d_trustdomain` from the proxy manifest because they were no longer used anywhere. In particular, the jaeger injector used them when injecting the env var `LINKERD2_PROXY_TAP_SVC_NAME=tap.linkerd-viz.serviceaccount.identity.$(_l5d_ns).$(_l5d_trustdomain)` but then started using values.yaml entries instead of these env vars.
The problem is when upgrading the core control plane (or anything else) to 2.12.0, the 2.11 jaeger extension will still be running and will attempt to inject the old env var into the pods, making reference to `l5d_ns` and `_l5d_trustdomain` which the new proxy container won't offer anymore. This will put the pod in an error state.
This change restores back those env vars. We will be able to remove them at last in 2.13.0, when presumably the jaeger injector would already have already been upgraded to 2.12 by the user.
Replication steps:
```bash
$ curl -sL https://run.linkerd.io/install | LINKERD2_VERSION=stable-2.11.4 sh
$ linkerd install | k apply -f -
$ linkerd jaeger install | k apply -f -
$ linkerd check
$ curl -sL https://run.linkerd.io/install | LINKERD2_VERSION=stable-2.12.0 sh
$ linkerd upgrade --crds | k apply -f -
$ linkerd upgrade | k apply -f -
$ k get po -n linkerd
NAME READY STATUS RESTARTS AGE
linkerd-identity-58544dfd8-jbgkb 2/2 Running 0 2m19s
linkerd-destination-764bf6785b-v8cj6 4/4 Running 0 2m19s
linkerd-proxy-injector-6d4b8c9689-zvxv2 2/2 Running 0 2m19s
linkerd-identity-55bfbf9cd4-4xk9g 0/2 CrashLoopBackOff 1 (5s ago) 32s
linkerd-proxy-injector-5b67589678-mtklx 0/2 CrashLoopBackOff 1 (5s ago) 32s
linkerd-destination-ff9b5f67b-jw8w5 0/4 PostStartHookError 0 (8s ago) 32s
```
The controller's k8s client was using `admissionregistration/v1beta1`
for its MWC shared informer. `v1beta1` was removed in k8s 1.22, and `v1`
was introduced in k8s 1.16:
https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-22
Modify the controller's k8s client to use `admissionregistration/v1` for
its MWC shared informer.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The identity controller requires access to read all deployments. This
isn't necessary.
When these permissions were added in #3600, we incorrectly assumed that
we must pass a whole Deployment resource as a _parent_ when recording
events. The [EventRecorder docs] say:
> 'object' is the object this event is about. Event will make a
> reference--or you may also pass a reference to the object directly.
We can confirm this by reviewing the source for [GetReference]: we can
simply construct an ObjectReference without fetching it from the API.
This change lets us drop unnecessary privileges in the identity
controller.
[EventRecorder docs]: https://pkg.go.dev/k8s.io/client-go/tools/record#EventRecorder
[GetReference]: ab826d2728/tools/reference/ref.go (L38-L45)
Signed-off-by: Oliver Gould <ver@buoyant.io>
The naming of policy API fields uses underscores but the JSON
spec in k8s uses camel case. This leads to nil values while working
with the SharedInformerFactory API.
Signed-off-by: aatarasoff <aatarasoff@gmail.com>
Closes#9312#9118 introduced the `linkerd.io/trust-root-sha256` annotation which is
automatically added to control plane components.
This change ensures that all injected workloads also receive this annotation.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
04a66ba added a `ReadHeaderTimeout` to our HTTP servers (at gosec's
insistence). We chose a fairly arbitrary timeout of 10s. This
configuration causes any connection that has been idle for 10s to be
torn down by the server. Unfortunately, this timeout value matches the
default Kubernetes probe interval and the default linkerd-viz scrape
interval. This can cause probes to race the timeout so that the
connection is healthy from the proxy's point of view and a request is
sent on the connection exactly as the server drops the connection.
These request failures cause controller success rate to appear degraded.
To correct this, this change raises the timeout to 15s so that the
timeout no longer matches the default probe interval.
The proxy's HTTP client is supposed to [retry] requests that encounter
this type of error. We should follow up by doing more research into why
that is not occurring in this situation.
[retry]: https://docs.rs/hyper/0.14.20/hyper/client/struct.Builder.html#method.retry_canceled_requests
Signed-off-by: Oliver Gould <ver@buoyant.io>
In 2.11.x, proxyInit.runAsRoot was true by default, which caused the
proxy-init's runAsUser field to be 0. proxyInit.runAsRoot is now
defaulted to false in 2.12.0, but runAsUser still isn't
configurable, and when following the upgrade instructions
here, helm doesn't change runAsUser and so it conflicts with the new value
for runAsRoot=false, resulting in the pods erroring with this message:
Error: container's runAsUser breaks non-root policy (pod: "linkerd-identity-bc649c5f9-ckqvg_linkerd(fb3416d2-c723-4664-acf1-80a64a734561)", container: linkerd-init)
This PR adds a new default for runAsUser to avoid this issue.
Closes#9141
This introduces the `--destination-pod` flag to the `linkerd diagnostics
endpoints` command which allows users to target a specific destination Pod when
there are multiple running in a cluster.
This can be useful for issues like #8956, where Linkerd HA is installed and
there seem to be stale endpoints in the destination service. Being able to run
this command and identity which destination Pod (if not all) have an incorrect
view of the cluster.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
* Bump proxy-init to v2.0.0
New release of proxy-init.
Updated:
* Helm values to use v2.0.0 of proxy-init
* Helm docs
* Tests
Note: go dependencies have not been updated since the new version will
break API compatibility with older versions (source files have been
moved, see issue for more details).
Closes#9164
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Oliver Gould <ver@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Oliver Gould <ver@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
Newer versions of golangci-lint flag `http.Server` instances that do not
set a `ReadHeaderTimeout` as being vulnerable to "slowloris" attacks,
wherein clients initiate requests that hold connections open
indefinitely.
This change sets a `ReadHeaderTimeout` of 10s. This timeout is fairly
conservative so that clients can eagerly create connections, but is
still constrained enough that these connections won't remain open
indefinitely.
This change also updates kubert to v0.9.1, which instruments a header
read timeout on the policy admission server.
Signed-off-by: Oliver Gould <ver@buoyant.io>
The proxy-init repo is changing its structure and, as such, we want to
minimize cross-repo dependencies from linkerd2 to linkerd2-proxy-init.
(We expect the cni-plugin code to move in a followup change).
This change duplicates the port range parsing utility (about 50 lines,
plus tests). This avoids stray dependencies on linkerd2-proxy-init.
Signed-off-by: Oliver Gould <ver@buoyant.io>
Some hosts may not have 'nft' modules available. Currently, proxy-init
defaults to using 'iptables-nft'; if the host does not have support for
nft modules, the init container will crash, blocking all injected
workloads from starting up.
This change defaults the 'iptablesMode' value to 'legacy'.
* Update linkerd-control-plane/values file default
* Update proxy-init partial to default to 'legacy' when no mode is
specified
* Change expected values in 'pkg/charts/linkerd2/values_test.go' and in
'cli/cmd/install_test'
* Update golden files
Fixes#9053
Signed-off-by: Matei David <matei@buoyant.io>
* Allow initializing a k8s namespace-scoped API
This allows reusing the `k8s.API` informers by other projects that don't
necessarily have cluster-wide permissions.
`json5-to-json` lets us process JSON files like devcontainer.json
safely.
Also add `just` to the go image.
Signed-off-by: Oliver Gould <ver@buoyant.io>
Add go client codegen for HttpRoute. This will be necessary for any of the go controllers (i.e. metrics-api) or go CLI commands to interact with HttpRoute resources in kubernetes.
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
This change introduces a new value to be used at install (or upgrade)
time. The value (`proxyInit.iptablesMode=nft|legacy`) is responsible
for starting the proxy-init container in nft or legacy mode.
By default, the init container will use iptables-nft. When the mode is set to
`nft`, it will instead use iptables-nft. Most modern Linux distributions
support both, but a subset (such as RHEL based families) only support
iptables-nft and nf_tables.
Signed-off-by: Matei David <matei@buoyant.io>
Go 1.18 features a number of important chanages, notably removing client
support for defunct TLS versions: https://tip.golang.org/doc/go1.18
This change updates our Go version in CI and development.
Signed-off-by: Oliver Gould <ver@buoyant.io>
This change bumps the proxy-init version from v1.6.1 to the latest
version, v1.6.2. As part of the new release, proxy-init now adds
net_admin and net_raw sys caps to xtables-nft-multi so that nftables
mode can be used without requiring root privileges.
* Bump go.mod
* Bump version in helm values
* Bump version in misc files
* Bump version in code
Signed-off-by: Matei David <matei@buoyant.io>
Release v1.6.1 of proxy-init adds support for iptables-nft. This change
bumps up the proxy-init version used in code, chart values, and golden
files.
* Update go.mod dep
* Update CNI plugin with new opts
* Update proxy-init ref in golden files and chart values
* Update policy controller CI workflow
Signed-off-by: Matei David <matei@buoyant.io>
Watch events for objects in the kube-system namespace were previously ignored.
In certain situations, this would cause the destination service to return
invalid (outdated) endpoints for services in kube-system - including unmeshed
services.
It [was suggested][1] that kube-system events were ignored to avoid handling
frequent Endpoint updates - specifically from [controllers using Endpoints for
leader elections][2]. As of Kubernetes 1.20, these controllers [default to using
Leases instead of Endpoints for their leader elections][3], obviating the need
to exclude (or filter) updates from kube-system. The exclusions have been
removed accordingly.
[1]: https://github.com/linkerd/linkerd2/pull/4133#issuecomment-594983588
[2]: https://github.com/kubernetes/kubernetes/issues/86286
[3]: https://github.com/kubernetes/kubernetes/pull/94603
Signed-off-by: Jacob Henner <code@ventricle.us>
Fixes#8592
Increase the minimum supported kubernetes version from 1.20 to 1.21. This allows us to drop support for batch/v1beta1/CronJob and discovery/v1beta1/EndpointSlices, instead using only v1 of those resources. This fixes deprecation warnings about these warnings printed by the CLI.
Signed-off-by: Alex Leong <alex@buoyant.io>
The injector configures the proxy with a set of known inbound ports
which are used (by the proxy) to discover inbound server configuration.
The list of ports is derived from the pod's container ports; container
ports may be optional and thus not present. The proxy supports dynamic
discovery of additional ports at runtime but since they are lazy,
additional ports may be dropped or updated long after pod start-up.
To ensure HTTP probes are handled correctly, this change introduces new
functionality to configure the list of inbound ports for the proxy with
any ports targeted by healthcheck probes, as long as they are HTTP, and
even if they are not present in the containerPorts configuration.
This change also introduces additional liveness (or readiness) probes to
the current injector webhook test fixtures in order to assert that
injected pods will always have their healthcheck target ports included
in the proxy's configuration.
Closes#8638
Signed-off-by: Matei David <matei@buoyant.io>
* Fix injector not emitting skip events properly
A resource that cannot be injected -- for a variety of reasons, such as
using the host network, or not mounting a SA token when token projection
is disabled -- may still have an annotation patch if the resource's
endpoints have ports marked as opaque.
When an annotation patch is generated but an injection patch is not, the
injector will not emit an event and consider the resource as "injected",
when in fact it does not have the sidecar. This can make it hard to
investigate issues where resources that are supposed to be injected are
not because the failure is not obvious.
This change refactors and simplifies the logic by emitting an event
whenever a resource is not injected, regardless of whether an annotation
patch is generated for it. This should provide better visibility in case
of failures. Furthermore, the change refactors the code to avoid too
many early returns and make it easier to trace the codepath through the
function. The initial assumption that an annotation patch should not
increment injection skipped admission response has been left intact; in
other words, an annotation patch will not count in the metrics as a
skip, but an event with the skip reason is still emitted.
Fixes#8634
Signed-off-by: Matei David <matei@buoyant.io>
Fixes#8624
When the proxy-injector encounters a resource with an owner ref, it calls `api.GetObjects` to fetch the owner. If the owner is a kind which is not supported by the proxy-injector, we will panic.
We add a condition so that we only attempt to fetch the owner resource if it is a kind we support.
Signed-off-by: Alex Leong <alex@buoyant.io>
Our docker images hardcode a patch version, 1.17.3, which does not
include a variety of important fixes that have been released:
> go1.17.4 (released 2021-12-02) includes fixes to the compiler, linker,
> runtime, and the go/types, net/http, and time packages. See the Go
> 1.17.4 milestone on our issue tracker for details.
> go1.17.5 (released 2021-12-09) includes security fixes to the net/http
> and syscall packages. See the Go 1.17.5 milestone on our issue tracker
> for details.
> go1.17.6 (released 2022-01-06) includes fixes to the compiler, linker,
> runtime, and the crypto/x509, net/http, and reflect packages. See the Go
> 1.17.6 milestone on our issue tracker for details.
> go1.17.7 (released 2022-02-10) includes security fixes to the go
> command, and the crypto/elliptic and math/big packages, as well as bug
> fixes to the compiler, linker, runtime, the go command, and the
> debug/macho, debug/pe, and net/http/httptest packages. See the Go 1.17.7
> milestone on our issue tracker for details.
> go1.17.8 (released 2022-03-03) includes a security fix to the
> regexp/syntax package, as well as bug fixes to the compiler, runtime,
> the go command, and the crypto/x509 and net packages. See the Go 1.17.8
> milestone on our issue tracker for details.
> go1.17.9 (released 2022-04-12) includes security fixes to the
> crypto/elliptic and encoding/pem packages, as well as bug fixes to the
> linker and runtime. See the Go 1.17.9 milestone on our issue tracker for
> details.
> go1.17.10 (released 2022-05-10) includes security fixes to the syscall
> package, as well as bug fixes to the compiler, runtime, and the
> crypto/x509 and net/http/httptest packages. See the Go 1.17.10 milestone
> on our issue tracker for details.
> go1.17.11 (released 2022-06-01) includes security fixes to the
> crypto/rand, crypto/tls, os/exec, and path/filepath packages, as well as
> bug fixes to the crypto/tls package. See the Go 1.17.11 milestone on our
> issue tracker for details.
This changes our container configs to use the latest 1.17 release on
each build so that these patch releases are picked up without manual
intervention.
Signed-off-by: Oliver Gould <ver@buoyant.io>
Fixes#8134
The multicluster probe-gateway service (which is used by the service-mirror controller to send health probes to the remote gateway) has `mirror.linkerd.io/cluster-name` label but it does not have a `mirror.linkerd.io/remote-svc-fq-name` annotation. This makes sense because the probe-gateway service does not correspond to any target service on the remote cluster and instead targets the gateway itself.
We update the logic in the destination controller so that we can still add the `dst_target_cluster` metric label when only the `mirror.linkerd.io/cluster-name` label is present but not the `mirror.linkerd.io/remote-svc-fq-name` annotation. This allows us to have the `dst_target_cluster` metric label for service-mirror controller probe traffic.
Signed-off-by: Alex Leong <alex@buoyant.io>
We frequently compare data structures--sometimes very large data
structures--that are difficult to compare visually. This change replaces
uses of `reflect.DeepEqual` with `deep.Equal`. `go-test`'s `deep.Equal`
returns a diff of values that are not equal.
Signed-off-by: Oliver Gould <ver@buoyant.io>
Opaque ports may be configured through annotations, where the value may
be either as a range (e.g 0 - 255), or as a comma delimited string
("121,122"). When configured as a comma delimited string, our parsing
logic will trim any leading and trailing commas and split the value;
ports will be converted from string to an int counterpart.
If whitespaces are used, such that the value looks similar to "121,
122", the parsing logic will fail -- when attempting to convert the
string into an integer -- with the following error "\" 122\" is not a
valid lower-bound". This can lead to confusion from users whose services
and endpoints function with undefined behaviour.
This change introduces logic to strip any leading or trailing
whitespaces from strings when tokenizing the annotation value; this way,
we are guaranteed not to experience parsing errors when converting
strings. To validate the behaviour, a new (unit) test has been added to
the opaque ports watcher with a multi-opaque-port annotation on a
service, where the value contains a space.
Signed-off-by: Matei David <matei@buoyant.io>
While looking into #8273, I wanted to confirm that the destination
controller uses the default opaque ports configuration for arbitrary
(unmeshed) IPs.
This change adds a test that exercises resolution behavior for external
IPs.
Signed-off-by: Oliver Gould <ver@buoyant.io>
This change updates the repo to use `linkerd2-proxy-api` v0.4.0 and
updates `bin/protoc` to use v3.20 to match the configuration in the
other repo.
The policy-controller builds are updated to use our `bin/protoc` wrapper
so that all builds go through the same toolchain (and to avoid compiling
protoc on each build).
Signed-off-by: Oliver Gould <ver@buoyant.io>
c1a1430d added new policy CRDs: `AuthoriationPolicy`,
`MeshTLSAuthentication` and `NetworkAuthentiction` with a controller
implemented in Rust.
This change adds Go types for these resources so that they may be
accessed from the CLI, etc.
Co-authored-by: Zahari Dichev zaharidichev@gmail.com
Signed-off-by: Zahari Dichev zaharidichev@gmail.com
Signed-off-by: Oliver Gould <ver@buoyant.io>
The destination controller can improperly handle updates by returning a
map reference instead of a new data structure. This breaks diffing logic,
as newly added endpoints appear to pre-exist.
This change ensures that a fresh data structure is used when handling
discovery updates.
Fixes#8143
Signed-off-by: Alex Leong <alex@buoyant.io>
Follow-up to #8087 that allows pprof to be enabled via the `--set
enablePprof=true` flag.
Each control plane components spawns its own admin server, so each of these
received it's own `enable-pprof` flag. When `enablePprof=true`, it is passed
through to each component so that when it launches its admin server, its pprof
endpoints are enabled.
A note on the templating: `-enable-pprof={{.Values.enablePprof | default
false}}`. `false` values are not rendered by Helm so without the `... | default
false}}`, it tries to pass the flag as `-enable-pprof=""` which results in an
error. Inlining this felt better than conditionally passing the flag with
```yaml {{ if .Values.enablePprof -}} -enable-pprof={{.Values.enablePprof}} {{
end -}} ```
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Closes#7881
This makes the rest of the necessary fixes to satisfy the `staticcheck` lint.
The only class of lints that are being skipped are those related to deprecated tap code. There was some discussion on the original change started by @adleong about if this _actually_ deprecated [here](https://github.com/linkerd/linkerd2/pull/3240#discussion_r313634584); it doesn't look like we every came back around to fully removing it but I don't think it should be a blocker for enabling the lint right now.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Closes#7980
A pod is considered `Burstable` instead of `Guaranteed` if there exists at least one container in the pod that specifies CPU/memory limits/requests that do not match.
The `linkerd-init` container falls into this category meaning that even if all other containers in a Pod have matching CPU/memory limits/requests, the Pod will not be considered `Guaranteed` because of `linkerd-init`'s hardcoded values.
This changes the values to match, meaning that `linkerd-init` will not be the culprit container if a Pod is not considered `Guaranteed`. Raising the requests—instead of lowering the limits—felt like the safer option here. This means that the container will now always be guaranteed these amounts _and_ will never use more.
[Docs](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed) explain this in more detail.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Closes#7826
This adds the `gosec` and `errcheck` lints to the `golangci` configuration. Most significant lints have been fixed my individual changes, but this enables them by default so that all future changes are caught ahead of time.
A significant amount of these lints are been exluced by the various `exclude-rules` rules added to `.golangci.yml`. These include operations are files that generally do not fail such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up functions via the `defer` keyword.
Aside from those, there are several other rules added that all have comments explaining why it's okay to ignore the errors that they cover.
Finally, several smaller fixes in the code have been made where it seems necessary to catch errors or at least log them.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Fixes#7904
Allow the `Server` CRD to have the `PodSelector` entry be an empty object, by removing the `omitempty` tag from its go type definition and the `oneof` section in the CRD. No update to the CRD version is required, as this is BC change -- The CRD overriding was tested fine.
Also added some unit tests to confirm podSelector conditions are ANDed, and some minor refactorings in the `Selector` constructors.
Co-authored-by: Oliver Gould <ver@buoyant.io>
[gocritic][gc] helps to enforce some consistency and check for potential
errors. This change applies linting changes and enables gocritic via
golangci-lint.
[gc]: https://github.com/go-critic/go-critic
Signed-off-by: Oliver Gould <ver@buoyant.io>
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection is wrapping-aware.
This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).
[el]: https://github.com/polyfloyd/go-errorlint
Signed-off-by: Oliver Gould <ver@buoyant.io>
When the webhook server decodes a request's JSON payload, it may try to
use a nil value when handling the error. Furthermore, if the JSON
payload has a `nil` `Request` value, it may attempt to dereference the
value.
This change improves the webhook server's error handling to return a
`400 Bad Request` status if either of these cases are encountered.
Signed-off-by: Oliver Gould <ver@buoyant.io>
In several places where we build TLS servers (usually in our webhooks),
we use the default TLS configuration, which enables legacy versions of
TLS.
This change updates these servers to specify a minimum TLS version of
v1.2.
Signed-off-by: Oliver Gould <ver@buoyant.io>