Long ago, the policy-controller image shipped with a distroless base image, but
we have since been able to remove all runtime dependencies and ship with a
scratch image. There's no reason to manage this binary separately from the rest
of the controller.
This change moves the controller/Dockerfile to Dockerfile.controller, and it is
updated to subsume the policy-controller/Dockerfile.
This should *not* impact users, except to reduce the overhead of extra image
pulls.
BREAKING CHANGE: with this change, we no longer ship a separate
policy-controller image.
In a recent change, we introduced the `PatchProducer` abstraction in order to provide a way to add customizable injection logic to the injector. After some further experimentation, it has become clear that there might be better ways to customize the injection logic.
For now, we remove this abstraction as we are not going to be using it in the near future.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* chore(helm)!: change iptables default mode to `nft`
This change sets `nft` as the new default for the `proxyInit.iptablesMode`
and `iptablesMode` values in the linkerd-control-plane and linkerd2-cni
Helm charts. This doesn't imply any change in user-facing behavior.
This was prompted by EKS with k8s 1.33 no longer supporting the iptables
legacy mode.
Further testing on multiple platforms with different k8s versions
revealed nft mode is now broadly supported.
Upgrading via Helm will apply the new default, unless the initial
install explicitly set the legacy mode.
Replace the static FS generated by the vfs package with the go embed directive. These are the same in concept, except that the go embed directive is performed automatically by the go build system at build time instead of requiring a separate `go generate` step. Therefore this allows us to remove our use of `go generate` and avoids needing to manage generated sources.
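For illustration, a minimal sketch of the directive (the package name and path here are illustrative, not the repo's actual layout):
```
// Package static serves chart templates compiled into the binary at build
// time; no `go generate` step is required.
package static

import "embed"

// Templates holds the embedded chart files.
//
//go:embed templates/*
var Templates embed.FS
```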
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR introduces the `PatchProducer` abstraction. This allows us to abstract away the creation of proxy injection patches and to potentially layer multiple patches on top of each other.
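A rough sketch of the shape of the abstraction (the method name and signature are assumptions, not the exact interface):
```
// PatchProducer produces a JSON patch (RFC 6902) to apply during injection.
type PatchProducer interface {
	Produce(conf *ResourceConfig) ([]byte, error)
}

// chainPatches layers multiple producers, collecting their patches in order.
func chainPatches(conf *ResourceConfig, producers ...PatchProducer) ([][]byte, error) {
	var patches [][]byte
	for _, p := range producers {
		patch, err := p.Produce(conf)
		if err != nil {
			return nil, err
		}
		patches = append(patches, patch)
	}
	return patches, nil
}
```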
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Fixes #14176
When a Server resource has an invalid podSelector, it triggers an error in the destination controller's `SetToServerProtocol`, aborting both the processing of any further Servers and the setting of addresses on destination controller responses, leading to failures across the mesh.
One invalid Server should not bring down the mesh. Instead, when encountering an invalid Server, we log an error message and then continue processing any other Servers and continue to set addresses as usual.
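A sketch of the tolerant loop (names are illustrative of the approach, not the exact code):
```
for _, srv := range servers {
	selector, err := metav1.LabelSelectorAsSelector(srv.Spec.PodSelector)
	if err != nil {
		log.Errorf("failed to parse podSelector of Server %s/%s: %s", srv.Namespace, srv.Name, err)
		continue // skip the invalid Server; keep processing the others
	}
	// ...match pods against the selector and set the protocol as usual...
	_ = selector
}
```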
As a more comprehensive solution, we should also update the Server resource validation so that invalid Servers cannot be created and add a Status subresource to Server to reflect the current validity of the Server. These followups are tracked in #14194
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add pod IP via downward API to trace attributes
Provide additional attributes for tracing associations
Modified the helm templates to add the pod IP, and the jaeger injector to add it to the
standard attributes
Deploying should show the IP as a trace attribute.
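Conceptually, the injector side amounts to something like the following sketch; the env var and attribute names are assumptions for illustration, with the value populated via a downward API `fieldRef: status.podIP` in the Helm templates:
```
attrs := map[string]string{}
// Env var name is assumed, not necessarily the one used by the injector.
if podIP := os.Getenv("LINKERD2_PROXY_POD_IP"); podIP != "" {
	attrs["pod_ip"] = podIP // attribute key is illustrative
}
```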
Fixes #13980
Signed-off-by: Justin <justin@sphinxdefense.com>
* fix: Update proxy-injector controller test goldens
These aren't updated automatically by the `go test ./cli/cmd/... --update` command, so they have to be updated manually.
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Justin <justin@sphinxdefense.com>
Signed-off-by: Scott Fleener <scott@buoyant.io>
Co-authored-by: Scott Fleener <scott@buoyant.io>
To make the way the proxy injector produces patches more flexible, we adjust the method signature of `ResourceConfig.GetPodPatch` to accept a `ValueOverrider`. The type of `ValueOverrider` is:
```
func(values *l5dcharts.Values, overrides map[string]string, namedPorts map[string]int32) (*l5dcharts.Values, error)
```
and specifies how overrides (in the form of pod and namespace annotations) get translated into values for the proxy patch template.
The current override behavior, specified in `GetOverriddenValues`, is supplied in all cases, making this a refactor with no functional changes.
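Call sites now pass the override behavior explicitly; a sketch, assuming the package name and argument order:
```
// The existing behavior is supplied as the ValueOverrider; custom injectors
// could substitute their own implementation here.
patch, err := conf.GetPodPatch(true, inject.GetOverriddenValues)
if err != nil {
	return nil, err
}
```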
Signed-off-by: Alex Leong <alex@buoyant.io>
The federated service watcher test has a race condition where we create a cluster store with a set of kubernetes manifests and then immediately begin testing queries to that cluster store. If these queries are executed before the cluster store's informers process the kubernetes manifests, the queries can fail.
In the context of this test, this failure manifests as the read on the updates channel never returning, resulting in test timeouts.
We fix this by waiting for the cluster store to be populated before continuing with the test and issuing queries.
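A sketch of the wait, assuming a `Len()`-style accessor on the cluster store:
```
err := wait.PollUntilContextTimeout(ctx, 10*time.Millisecond, 5*time.Second, true,
	func(ctx context.Context) (bool, error) {
		// Proceed only once the informers have processed the manifests.
		return clusterStore.Len() == expectedClusters, nil
	})
if err != nil {
	t.Fatalf("cluster store was never populated: %s", err)
}
```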
Signed-off-by: Alex Leong <alex@buoyant.io>
The destination controller's cluster store registers a gauge in its constructor. When this constructor is called multiple times (i.e. in tests), this can lead to a panic.
To avoid this panic, this change updates NewClusterStoreWithDecoder to accept a prometheus registry. The NewClusterStore constructor (used by the application's main) continues to use the default registry, but tests now construct their own temporary registries to avoid duplicate registration errors.
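In tests this looks roughly like the following (the other constructor arguments are assumed for illustration):
```
// A fresh registry per test avoids duplicate-registration panics when the
// gauge is registered again by a second construction.
reg := prometheus.NewRegistry()
store, err := watcher.NewClusterStoreWithDecoder(client, namespace, enableEndpointSlices, reg)
if err != nil {
	t.Fatal(err)
}
```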
Depends on https://github.com/linkerd/linkerd2/pull/13801
Adds support for excluding certain labels and annotations from being copied onto mirror and federated services. This makes use of the `excludedLabels` and `excludedAnnotations` fields in the Link resource. These fields take a list of strings which may be literal label/annotation names or they may be group globs of the form `<group>/*` which will match all labels/annotations beginning with `<group>/`. Any matching labels or annotations will not be copied.
We also add corresponding flags to the `mc link` command: `--excluded-labels` and `--excluded-annotations` for setting these fields on the Link resource.
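The matching rule, as a sketch (the function name is illustrative):
```
// excluded reports whether a label/annotation key matches any pattern:
// either a literal name or a group glob of the form "<group>/*".
func excluded(key string, patterns []string) bool {
	for _, p := range patterns {
		if group, ok := strings.CutSuffix(p, "/*"); ok {
			if strings.HasPrefix(key, group+"/") {
				return true
			}
		} else if key == p {
			return true
		}
	}
	return false
}
```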
Linkerd proxies no longer emit `hostname` labels for outbound policy metrics (due to their potential for high cardinality).
This change adds Helm templates and annotations to control this behavior, allowing users to opt-in to these outbound hostname labels.
Signed-off-by: Scott Fleener <scott@buoyant.io>
#13783 moved the service mirror permissions on Links from a Role to a ClusterRole as a side-effect, and this change reverts that by refactoring the Links API to allow consuming a namespace-scoped API more easily.
- We introduce in our `k8s.API` type a field `L5dClient` alongside the broad `Client` one, which is constructed via the new function `NewL5dNamespacedAPI()`.
- In the service-mirror `main.go` we use that constructor to acquire `linksAPI`, which is used to configure the informer for handling Link events in this file.
- `linksAPI` is also passed down to instantiations of `RemoteClusterServiceWatcher`, where it's used for the direct kube-apiserver calls and for retrieving a Lister.
We add a new v1alpha3 resource version to the Link custom resource. This version adds `excludedAnnotations` and `excludedLabels` fields to the spec which will be used to exclude labels and annotations from being copied onto mirror and federated services.
Signed-off-by: Alex Leong <alex@buoyant.io>
Follow-up to #13778, where a new test case was introduced for testing the
debug container annotation, but it didn't account for the new
LINKERD2_PROXY_CORES_MIN environment variable.
Issue #13636 was opened stating that custom debug container annotations
had no effect.
Quick investigation confirmed the issue, and further debugging revealed a
bug where the final Helm chart values were not using the values processed
by the GetOverriddenValues function, which is why the annotations had no
effect for debug containers. This is now fixed.
Added a unit test covering the new code. Manual testing was also done, and
the issue appears to be resolved.
Fixes #13636
Signed-off-by: Vishal Tewatia <tewatiavishal3@gmail.com>
Co-authored-by: Vishal Tewatia <tewatiavishal3@gmail.com>
When the service mirror controller detects the first member of a federated service, it will create the federated service itself and will copy the port definitions from the founding member service. However, from that point on, the port definition in the federated service will never be updated, even if new members that define different ports join the federated service or if the ports in the founding member service are updated.
This leads to a scenario where the federated service’s port definitions become out of date if the member services’ port definitions change. Similarly for labels and annotations.
We update the service mirror controller to look at the createdAt timestamp of each member service's Link resource and to use the member with the oldest Link as the authoritative source of truth for the federated service metadata: labels, annotations, and ports.
In the case where the service with the oldest Link leaves the federated service, its metadata will continue to be present on the federated service until a resync occurs from the member with the next oldest Link (at most 10 minutes).
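A sketch of the selection (the `members` shape is illustrative):
```
// The member whose Link was created first becomes the source of truth for
// the federated service's labels, annotations, and ports.
sort.Slice(members, func(i, j int) bool {
	return members[i].Link.CreationTimestamp.Before(&members[j].Link.CreationTimestamp)
})
authoritative := members[0]
```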
Signed-off-by: Alex Leong <alex@buoyant.io>
The proxy.cores helm value is overly restrictive: it enforces a hard upper
limit. In some scenarios, a fixed limit is not practical: for example, when the
proxy is meshing an application that configures no limits.
This change replaces the proxy.cores value with a new proxy.runtime.workers
structure, with members:
- `minimum`: configures the minimum number of worker threads a proxy may use.
- `maximumCPURatio`: optionally allows the proxy to use a larger
number of CPUs, relative to the number of available CPUs on the node.
So with a minimum of 2 and a ratio of 0.1, a proxy would run 2 worker threads
(the minimum) on an 8-core node, but allocate 10 worker threads on a 96-core
node.
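In other words, the effective worker count behaves roughly like the following (a sketch, assuming the proxy rounds the ratio up):
```
// workers returns max(minimum, ceil(ratio * availableCPUs)).
func workers(minimum int, ratio float64, availableCPUs int) int {
	n := int(math.Ceil(ratio * float64(availableCPUs)))
	if n < minimum {
		return minimum
	}
	return n
}

// workers(2, 0.1, 8)  == 2  (0.8 rounds up to 1, floored at the minimum)
// workers(2, 0.1, 96) == 10
```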
When the `config.linkerd.io/proxy-cpu-limit` is used, that will continue to set
the maximum number of worker threads to a fixed value.
When it is not set, however, the minimum worker pool size is derived from the
`config.linkerd.io/proxy-cpu-request`.
An additional `config.linkerd.io/proxy-cpu-ratio-limit` annotation is introduced
to allow workload-level configuration.
A follow up to https://github.com/linkerd/linkerd2/pull/13699, this default-enables the config option introduced in that PR. Now, all traffic between meshed pods should flow to the proxy's inbound port.
Signed-off-by: Scott Fleener <scott@buoyant.io>
Traffic that is meant for the destination workload can be sent over the opaque transport without issue. However, traffic intended for the proxy itself (metrics scraping, tap) needs to be sent directly to the corresponding proxy port to prevent it from being forwarded to the workload.
This adds in special cases for the admin and control ports, read directly from the environment variables on the pods, that excludes them from being sent over opaque transport.
Signed-off-by: Scott Fleener <scott@buoyant.io>
Non-opaque meshed traffic currently flows over the original destination port, which requires the inbound proxy to do protocol detection.
This adds an option to the destination controller that configures all meshed traffic to flow to the inbound proxy's inbound port. This will allow us to include more session protocol information in the future, obviating the need for inbound protocol detection.
This doesn't do much in the way of testing, since the default behavior should be unchanged. When this default changes, more validation will be done on the behavior here.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* fix(service-mirror): don't restart cluster watch upon Link status updates
Every time there's an update to a Link resource the service mirror restarts the cluster watch after cleaning up any existing worker. We recently introduced a status stanza in Link that gets updated upon every mirroring of a service, which was unnecessarily triggering a cluster watcher restart. For a sufficiently high number of services getting mirrored at once this was causing severe contention on the controller, delaying mirroring up to a halt.
This change fixes the situation by only considering changes in the Link Spec for restarting the cluster watch.
* Lower log level
* Extract the resource event handler functions into a separate file, and add a unit test making sure the add/update/delete functions are called, and that, in particular, the update function is _not_ called when updating a Link status (see the sketch below).
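A sketch of the spec-only comparison in the update handler (the handler shape and Link API version are illustrative):
```
UpdateFunc: func(oldObj, newObj interface{}) {
	oldLink, newLink := oldObj.(*v1alpha3.Link), newObj.(*v1alpha3.Link)
	if reflect.DeepEqual(oldLink.Spec, newLink.Spec) {
		return // status-only update: keep the existing cluster watch
	}
	restartClusterWatch(newLink) // spec changed: restart as before
},
```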
* fix(destination): GetProfile requests targeting pods directly should return endpoint data for running (not necessarily ready) pods
Requiring Pods to pass readiness checks before allowing Pod to Pod communication disrupts communication in e.g. clustered systems which require Pods to communicate with each other prior to establishing ready state and allowing inbound traffic.
Relaxed the requirement and modified the workload watcher to only require that a Pod exists and is in Running phase.
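The relaxed check amounts to (a sketch):
```
// A pod only needs to exist and be Running; readiness no longer gates
// direct pod-to-pod discovery responses.
func podReceivesTraffic(pod *corev1.Pod) bool {
	return pod != nil && pod.Status.Phase == corev1.PodRunning
}
```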
Reproduced the issue with a test setup described in #13247.
Fixes #13247.
---------
Signed-off-by: Tuomo <tjorri@gmail.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
These values are useful as fields for correlating OpenTelemetry traces. A corresponding proxy change will be needed to emit these fields in said traces.
Signed-off-by: Scott Fleener <scott@buoyant.io>
This helps ensure a minimum level of security. The two places this affects are our controller webhook and the linkerd-viz tap API.
The controller requires that kube-api supports TLSv1.3, which it does as of 1.19 (our minimum is currently 1.22). The linkerd-viz tap API is mostly used internally, and is deprecated. It may be worth revisiting if we want to keep it around at all.
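In Go standard library terms, the enforced floor is:
```
cfg := &tls.Config{
	MinVersion: tls.VersionTLS13, // reject TLS 1.2 and below
}
```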
Signed-off-by: Scott Fleener <scott@buoyant.io>
The proxy accepts log filters in the form `target[fields...]=level`, where a
field may include a value match. This leads to log filters like
`linkerd[name="outbound"]=debug`.
When a log filter is configured via annotation or Helm, the proxy-injector fails
to properly quote the log environment variable, leading to a failure to patch
resources properly.
To fix this, this change ensures that the log level is quoted, which properly
escapes any quotes in the filter itself.
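A sketch of the quoting with the standard library:
```
filter := `linkerd[name="outbound"]=debug`
// strconv.Quote escapes the embedded quotes, keeping the rendered
// environment variable (and thus the JSON patch) valid.
quoted := strconv.Quote(filter) // "linkerd[name=\"outbound\"]=debug"
```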
The bin/protoc script is ancient and not useful, especially in light of tools
provided by dev containers.
Furthermore, it includes a reference to an old, gross SourceForge download
page for unzip.
This change removes the unused script.
The linkerd-multicluster extension uses client-go's `unstructured` API to access Link custom resources. This API allowed us to develop quickly without the work of generating typed bindings. However, using the unstructured API is error prone since fields must be accessed by their string name. It is also inconsistent with the rest of the project which uses typed bindings.
We replace the use of the unstructured API for Link resources with generated typed bindings. The client-go APIs are slightly different and client-go does not provide a way to update subresources for typed bindings. Therefore, when updating a Link's status subresource, we use a patch instead of an update.
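The status update then looks roughly like this (the client method chain and field names are assumptions for illustration):
```
patch, err := json.Marshal(map[string]interface{}{
	"status": link.Status,
})
if err != nil {
	return err
}
_, err = client.LinkV1alpha3().Links(namespace).Patch(
	ctx, link.Name, types.MergePatchType, patch,
	metav1.PatchOptions{}, "status", // target the status subresource
)
```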
Signed-off-by: Alex Leong <alex@buoyant.io>
* docker.io/library/golang from 1.22 to 1.23
* gotestsum from 0.4.2 to 1.12.0
* protoc-gen-go from 1.28.1 to 1.35.2
* protoc-gen-go-grpc from 1.2 to 1.5.1
* docker.io/library/rust from 1.76.0 to 1.83.0
* cargo-deny from 0.14.11 to 0.16.3
* cargo-nextest from 0.9.67 to 0.9.85
* cargo-tarpaulin from 0.27.3 to 0.31.3
* just from 1.24.0 to 1.37.0
* yq from 4.33.3 to 4.44.5
* markdownlint-cli2 from 0.10.0 to 0.15.0
* shellcheck from 0.9.0 to 0.10.0
* actionlint from 1.6.26 to 1.7.4
* protoc from 3.20.3 to 29.0
* step from 0.25.2 to 0.28.2
* kubectl from 1.29.2 to 1.31.3
* k3d from 5.6.0 to 5.7.5
* k3s image shas
* helm from 3.14.1 to 3.16.3
* helm-docs from 1.12.0 to 1.14.2
We received a report of a panic:
```
runtime error: invalid memory address or nil pointer dereference
panic({0x1edb860?, 0x37a6050?}
	/usr/local/go/src/runtime/panic.go:785 +0x132
github.com/linkerd/linkerd2/controller/api/destination/watcher.latestUpdated({0xc0006b2d80?, 0xc00051a540?, 0xc0008fa008?})
	/linkerd-build/vendor/github.com/linkerd/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1612 +0x125
github.com/linkerd/linkerd2/controller/api/destination/watcher.(*OpaquePortsWatcher).updateService(0xc0007d5480, {0x21fd160?, 0xc000d71688?}, {0x21fd160, 0xc000d71688})
	/linkerd-build/vendor/github.com/linkerd/linkerd2/controller/api/destination/watcher/opaque_ports_watcher.go:141 +0x68
```
The `latestUpdated` function does not properly handle the case where a time is
omitted from a `ManagedFieldsEntry`:
```
type ManagedFieldsEntry struct {
	// Time is the timestamp of when the ManagedFields entry was added. The
	// timestamp will also be updated if a field is added, the manager
	// changes any of the owned fields value or removes a field. The
	// timestamp does not update when a field is removed from the entry
	// because another manager took it over.
	// +optional
	Time *Time `json:"time,omitempty" protobuf:"bytes,4,opt,name=time"`
}
```
This change adds a check to avoid the nil dereference.
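A sketch of the guard:
```
func latestUpdated(managedFields []metav1.ManagedFieldsEntry) time.Time {
	var latest time.Time
	for _, field := range managedFields {
		if field.Time == nil {
			continue // Time is optional; skip instead of dereferencing nil
		}
		if field.Time.Time.After(latest) {
			latest = field.Time.Time
		}
	}
	return latest
}
```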
Docker builds emit a warning because the casing of 'FROM' and 'as' doesn't match. Fix this everywhere.
Signed-off-by: Derek Brown <6845676+DerekTBrown@users.noreply.github.com>
Adds tests for the federated service watcher that exercise having remote and local clusters join and leave a federated service and ensuring that the correct proxy API updates are emitted.
Signed-off-by: Alex Leong <alex@buoyant.io>
Ensure consistent JSON logging for proxy-injector container
When JSON logging is enabled in the proxy-injector (`controllerLogFormat: json`), some log messages adhere to the JSON format while others do not. This inconsistency creates difficulty in parsing logs, especially in automated workflows. For example:
```
{"level":"info","msg":"received admission review request \"83a0ce4d-ab81-42c9-abe4-e0ade0f926e2\"","time":"2024-10-10T21:06:18Z"}
time="2024-10-10T21:06:18Z" level=info msg="received pod/mypod"
```
Modified the logging implementation in `controller/proxy-injector/webhook.go` to ensure all log messages follow the JSON format consistently. This was achieved by removing the new logrus logger instance that was being created in that file and replacing it with the global logger instance, ensuring all logs respect the `controllerLogFormat` configuration.
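In essence (a sketch):
```
// Before: a locally constructed logger ignored the configured formatter.
logger := logrus.New() // defaults to the text formatter
logger.Info("received pod/mypod")

// After: the package-level logger inherits the JSON formatter configured at
// startup, so every message respects controllerLogFormat.
logrus.Info("received pod/mypod")
```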
Reproduced the issue by enabling JSON logging and observing mixed-format logs when installing the emojivoto sample application. Applied the changes and verified that all logs consistently use the JSON format.
Ran the linkerd check command and confirmed there are no additional warnings or issues.
Tested various scenarios, including pods with and without the injection annotation, to ensure consistent logging behavior.
Fixes #13168
Signed-off-by: Micah See <msee@usc.edu>
In order for proxies to properly reflect the resources used to drive policy
decisions, the proxy API has been updated to include resource metadata on
ServiceProfile responses.
This change updates the profile translator to include ParentRef and ProfileRef
metadata when it is configured.
This change does not set backend or endpoint references.
When the service mirror controller detects a service in the remote cluster which matches the federated service selector (`mirror.linkerd.io/federated=member` by default), it will add that service to the federated service in the local cluster named `<svc>-federated`, creating this service if it does not already exist. To join a service to a federated service, it is added to the `multicluster.linkerd.io/remote-discovery` annotation on the federated service, which contains a comma-separated list of values of the form `<svc>@<cluster>`. When a remote service no longer exists or matches the federated service selector, it is removed from the federated service by removing it from the `multicluster.linkerd.io/remote-discovery` annotation.
We also add a new `local-service-mirror` deployment to the Linkerd-multicluster extension which watches the local cluster for any services which match the federated service selector. Any services in the local cluster which match will be added to the federated service by setting the `multicluster.linkerd.io/local-discovery` annotation on the federated service to the local service name.
Signed-off-by: Alex Leong <alex@buoyant.io>
We add support for federated services to the destination controller by adding a new FederatedServiceWatcher. When the destination controller receives a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` and/or the `multicluster.linkerd.io/local-discovery` annotations, it subscribes to the FederatedServiceWatcher instead of subscribing to the EndpointsWatcher directly. The FederatedServiceWatcher watches the federated service for any changes to these annotations, and maintains the appropriate watches on the local EndpointWatcher and/or remote EndpointWatchers fetched through the ClusterStore.
This means that we will often have multiple EndpointTranslators writing to the same `Get` response stream. In order for a `NoEndpoints` message sent to one EndpointTranslator to not clobber the whole stream, we make a change where `NoEndpoints` messages are no longer sent to the response stream, but are replaced by a `Remove` message containing all of the addresses from that EndpointTranslator. This allows multiple EndpointTranslators to coexist on the same stream.
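A sketch of the translator-level behavior (the types follow the destination API's shape, but the details are illustrative):
```
// Instead of a stream-wide NoEndpoints, emit a Remove listing only this
// translator's addresses, so other translators' endpoints survive.
func (t *endpointTranslator) noEndpoints() error {
	return t.stream.Send(&pb.Update{
		Update: &pb.Update_Remove{
			Remove: &pb.AddrSet{Addrs: t.currentAddresses()},
		},
	})
}
```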
Signed-off-by: Alex Leong <alex@buoyant.io>
Our generated client-go code committed in the repo has diverged from the code generated by the codegen tools.
We bring them back in sync by running bin/update-codegen.sh. This should be a non-functional and non-breaking change.
Signed-off-by: Alex Leong <alex@buoyant.io>
Currently, we don't have a simple way of checking if the endpoint a proxy is discovering is in the same zone or not.
This adds a "zone_locality" metric label to the outbound destination address metrics. Note that this does not increase the cardinality of the related metrics, as this label doesn't vary within an endpoint.
Validated by checking the prometheus metrics on a local cluster and verifying this label appears in the outbound transport metrics.
Signed-off-by: Scott Fleener <scott@buoyant.io>
The default value of 30s is enough for the Linux TCP stack to complete about 7 packet retransmissions; after that, the RTO (retransmission timeout) grows rapidly and there is little point in waiting longer.
Setting TCP_USER_TIMEOUT between the linkerd-proxy and the outside world is enough, since connections to containers in the same pod are more stable and reliable.
Fixes #13023
Signed-off-by: UsingCoding <extendedmoment@outlook.com>
This change adds the org.opencontainers.image.source label to all Dockerfiles.
It allows tools to find the source repository for the produced images.
Signed-off-by: Maxime Brunet <max@brnt.mx>
* build(deps): bump k8s.io/client-go from 0.30.3 to 0.31.0
Bumps [k8s.io/client-go](https://github.com/kubernetes/client-go) from 0.30.3 to 0.31.0.
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kubernetes/client-go/compare/v0.30.3...v0.31.0)
---
updated-dependencies:
- dependency-name: k8s.io/client-go
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* To appease the linter, replaced deprecated workqueue interfaces with their typed alternatives. For the endpoints controller we can instantiate the queue with a concrete event type, but for the service mirror, given the queue can hold different event types, we have to instantiate it with a general one.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
* Dual-stack support for ExternalWorkloads
This changes the `workloadIPs.maxItems` field in the ExternalWorkload CRD from `1` to `2`, to accommodate an IPv4 and IPv6 pair. This is a backwards-compatible change, so there's no need to bump the CRD version.
The control plane already supports this, so this change is mainly about expanding the unit tests to also account for the dual-stack case.