903 Commits

Author SHA1 Message Date
Justin Seiser a14f09a306
Add pod IP via downward API to Trace Attributes (#13981)
* Add pod IP via downward API to Trace Attributes

Provide additional attributes for tracing associations

Modified the helm templates to add the pod IP, and the jaeger injector to add it to the
standard attributes

Deploying should show the ip as a trace attribute

Fixes #13980

Signed-off-by: Justin <justin@sphinxdefense.com>

* fix: Update proxy-injector controller test goldens

These aren't updated automatically by the `go test ./cli/cmd/... --update` command, so they have to be updated manually.

Signed-off-by: Scott Fleener <scott@buoyant.io>

---------

Signed-off-by: Justin <justin@sphinxdefense.com>
Signed-off-by: Scott Fleener <scott@buoyant.io>
Co-authored-by: Scott Fleener <scott@buoyant.io>
2025-05-28 09:00:26 -04:00
Alex Leong db495d6765
refactor(proxy-injector): make injection code take a value overrider as a parameter (#14037)
In order to make the way that proxy injector patches more flexible, we adjust the method signature of `ResourceConfig.GetPodPatch` to accept a `ValueOverrider`.  The type of `ValueOverrider` is:

```
func(values *l5dcharts.Values, overrides map[string]string, namedPorts map[string]int32) (*l5dcharts.Values, error)
``` 

and specifies how overrides (in the form of pod and namespace annotations) get translated into values for the proxy patch template.

The current override behavior, specified in `GetOverriddenValues`, is supplied in all cases, making this a refactor with no functional changes.
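
A minimal sketch of what passing an overrider as a parameter can look like. `Values` here is a hypothetical stand-in for `l5dcharts.Values`, and `logLevelOverrider` and `renderPatch` are illustrative names, not functions from the repository:

```go
package main

import "fmt"

// Values is a hypothetical, reduced stand-in for l5dcharts.Values.
type Values struct {
	ProxyLogLevel string
}

// ValueOverrider has the same shape as the signature quoted above,
// but uses the stand-in Values type.
type ValueOverrider func(values *Values, overrides map[string]string, namedPorts map[string]int32) (*Values, error)

// logLevelOverrider is an illustrative overrider: it copies the input
// values and applies one annotation if it is present.
func logLevelOverrider(values *Values, overrides map[string]string, _ map[string]int32) (*Values, error) {
	if values == nil {
		return nil, fmt.Errorf("values must not be nil")
	}
	out := *values // copy so the caller's values are not mutated
	if lvl, ok := overrides["config.linkerd.io/proxy-log-level"]; ok {
		out.ProxyLogLevel = lvl
	}
	return &out, nil
}

// renderPatch is a hypothetical caller that takes the overrider as a
// parameter instead of hard-coding the override behavior.
func renderPatch(base *Values, annotations map[string]string, override ValueOverrider) (string, error) {
	v, err := override(base, annotations, nil)
	if err != nil {
		return "", err
	}
	return "log=" + v.ProxyLogLevel, nil
}

func main() {
	base := &Values{ProxyLogLevel: "warn"}
	annotations := map[string]string{"config.linkerd.io/proxy-log-level": "debug"}
	patch, err := renderPatch(base, annotations, logLevelOverrider)
	fmt.Println(patch, err)
}
```

Because the caller only depends on the function type, a different translation of annotations into values can be swapped in without touching the patch rendering.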

Signed-off-by: Alex Leong <alex@buoyant.io>
2025-05-23 15:30:58 -07:00
Alex Leong 7edd886c91
test(multicluster): Wait for cluster store to be populated in test
The federated service watcher test has a race condition where we create a cluster store with a set of kubernetes manifests and then immediately begin testing queries to that cluster store.  If these queries are executed before the cluster store's informers process the kubernetes manifests, the queries can fail.

In the context of this test, this failure manifests as the read on the updates channel never returning, resulting in test timeouts.

We fix this by waiting for the cluster store to be populated before continuing with the test and issuing queries.

Signed-off-by: Alex Leong <alex@buoyant.io>
2025-04-11 14:41:56 -07:00
Oliver Gould 6501de61ed
fix(test): avoid duplicate registry errors (#13898)
The destination controller's cluster store registers a gauge in its constructor. When this constructor is called multiple times (i.e. in tests), this can lead to a panic.

To avoid this panic, this change updates NewClusterStoreWithDecoder to accept a Prometheus registry. The NewClusterStore constructor (used by the application's main) continues to use the default registry, but tests now construct their own temporary registries to avoid duplicate registration errors.
2025-04-03 15:05:24 -05:00
Alex Leong e97b51b803
feat(multicluster): Add support for excluding labels and annotations from federated and mirror services (#13802)
Depends on https://github.com/linkerd/linkerd2/pull/13801

Adds support for excluding certain labels and annotations from being copied onto mirror and federated services.  This makes use of the `excludedLabels` and `excludedAnnotations` fields in the Link resource.  These fields take a list of strings, each of which may be a literal label/annotation name or a group glob of the form `<group>/*`, which matches all labels/annotations beginning with `<group>/`.  Any matching labels or annotations will not be copied.
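
The matching rule described above can be sketched as follows. `isExcluded` and `filterMetadata` are hypothetical names for illustration, not the service mirror's actual functions:

```go
package main

import (
	"fmt"
	"strings"
)

// isExcluded reports whether key matches any pattern. A pattern is
// either a literal name or a group glob of the form "<group>/*".
func isExcluded(key string, patterns []string) bool {
	for _, p := range patterns {
		if strings.HasSuffix(p, "/*") {
			// "<group>/*" matches every key beginning with "<group>/".
			if strings.HasPrefix(key, strings.TrimSuffix(p, "*")) {
				return true
			}
			continue
		}
		if key == p {
			return true
		}
	}
	return false
}

// filterMetadata returns a copy of meta without the excluded keys.
func filterMetadata(meta map[string]string, patterns []string) map[string]string {
	out := make(map[string]string, len(meta))
	for k, v := range meta {
		if !isExcluded(k, patterns) {
			out[k] = v
		}
	}
	return out
}

func main() {
	labels := map[string]string{
		"app":              "web",
		"example.com/team": "payments",
		"example.com/tier": "gold",
	}
	// Only "app" survives: the glob removes both example.com/ labels.
	fmt.Println(filterMetadata(labels, []string{"example.com/*"}))
}
```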

We also add corresponding flags to the `mc link` command: `--excluded-labels` and `--excluded-annotations` for setting these fields on the Link resource.
2025-03-26 15:08:09 -05:00
Scott Fleener 838f2fd222
feat(policy): Configure outbound hostname labels in metrics (#13822)
Linkerd proxies no longer emit `hostname` labels for outbound policy metrics (due to the potential for high cardinality).

This change adds Helm templates and annotations to control this behavior, allowing users to opt in to these outbound hostname labels.

Signed-off-by: Scott Fleener <scott@buoyant.io>
2025-03-25 16:39:36 -07:00
Alejandro Pedraza 38ec59128e
refactor(multicluster): revert Link permissions back to Role (#13848)
#13783 moved the service mirror permissions on Links from a Role to a ClusterRole as a side-effect, and this change reverts that by refactoring the Links API to allow consuming a namespace-scoped API more easily.

- We introduce in our `k8s.API` type a field `L5dClient` alongside the broad `Client` one, which is constructed via the new function `NewL5dNamespacedAPI()`.
- In the service-mirror `main.go` we use that constructor to acquire `linksAPI`, which is used to configure the informer for handling Link events in this file.
- `linksAPI` is also passed down to instantiations of `RemoteClusterServiceWatcher`, where it's used for the direct kube-apiserver calls and for retrieving a Lister.
2025-03-22 05:44:35 -05:00
Zahari Dichev f57137b121
fix(dest): fallback to default proxy inbound port when one could not be discovered on an ExternalWorkload (#13840)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2025-03-21 15:25:15 +02:00
Alex Leong 049bc0cb04
feat(multicluster): Add Link v1alpha3 (#13801)
We add a new v1alpha3 resource version to the Link custom resource.  This version adds `excludedAnnotations` and `excludedLabels` fields to the spec which will be used to exclude labels and annotations from being copied onto mirror and federated services.

Signed-off-by: Alex Leong <alex@buoyant.io>
2025-03-19 12:13:16 -07:00
Alejandro Pedraza 03209ddf48
fix(test): add missing env var to debug annotation test (#13825)
Followup to #13778, where a new test case was introduced for testing the
debug container annotation, but didn't account for the new
LINKERD2_PROXY_CORES_MIN environment variable.
2025-03-18 16:38:22 -05:00
vishal tewatia bd577deb54
fix(injector): use annotated values for debug container (#13778)
Issue #13636 was opened stating that custom debug container annotations
had no effect.

Quick investigation confirmed the issue and further debugging revealed a
bug in code where the final values for helm chart were not using values
processed by GetOverriddenValues function and that's why annotations had
no effect for debug containers. This has now been fixed.

Added a unit test covering the new code. Manual testing was also done. The
issue seems to be resolved.

Fixes #13636

Signed-off-by: Vishal Tewatia <tewatiavishal3@gmail.com>
Co-authored-by: Vishal Tewatia <tewatiavishal3@gmail.com>
2025-03-18 14:25:16 -05:00
Alex Leong 227c4c1db1
feat!(multicluster): Federated services take metadata from member with the oldest Link (#13783)
When the service mirror controller detects the first member of a federated service, it will create the federated service itself and will copy the port definitions from the founding member service.  However, from that point on, the port definition in the federated service will never be updated, even if new members that define different ports join the federated service or if the ports in the founding member service are updated.

This leads to a scenario where the federated service’s port definitions become out of date if the member services’ port definitions change.  Similarly for labels and annotations.

We update the service mirror controller to look at the createdAt timestamp of each member service's Link resource and to use the member with the oldest Link as the authoritative source of truth for the federated service metadata: labels, annotations, and ports.

In the case where the service with the oldest Link leaves the federated service, its metadata will continue to be present on the federated service until a resync occurs from the member with the next oldest Link (at most 10 minutes).

Signed-off-by: Alex Leong <alex@buoyant.io>
2025-03-17 16:26:26 -07:00
Oliver Gould cb86d669ea
feat(inject): replace proxy.cores with proxy.runtime.workers (#13767)
The proxy.cores helm value is overly restrictive: it enforces a hard upper
limit. In some scenarios, a fixed limit is not practical: for example, when the
proxy is meshing an application that configures no limits.

This change replaces the proxy.cores value with a new proxy.runtime.workers
structure, with members:

- `minimum`: configures the minimum number of worker threads a proxy may use.
- `maximumCPURatio`: optionally allows the proxy to use a larger
  number of CPUs, relative to the number of available CPUs on the node.

So with a minimum of 2 and a ratio of 0.1, a proxy would run 2 worker threads
(the minimum) running on an 8 core node, but allocate 10 worker threads on a 96
core node.
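
The arithmetic in this example can be reproduced with a short sketch. Rounding the ratio up is an assumption made here (the message only gives the two data points above), and `workerThreads` is a hypothetical helper, not the proxy's implementation:

```go
package main

import (
	"fmt"
	"math"
)

// workerThreads returns the larger of the configured minimum and the
// ratio-derived count. Rounding up is an assumption of this sketch.
func workerThreads(minimum int, maximumCPURatio float64, nodeCPUs int) int {
	byRatio := int(math.Ceil(maximumCPURatio * float64(nodeCPUs)))
	if byRatio > minimum {
		return byRatio
	}
	return minimum
}

func main() {
	fmt.Println(workerThreads(2, 0.1, 8))  // 8-core node: the minimum wins
	fmt.Println(workerThreads(2, 0.1, 96)) // 96-core node: the ratio wins
}
```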

When the `config.linkerd.io/proxy-cpu-limit` is used, that will continue to set
the maximum number of worker threads to a fixed value.

When it is not set, however, the minimum worker pool size is derived from the
`config.linkerd.io/proxy-cpu-request`.

An additional `config.linkerd.io/proxy-cpu-ratio-limit` annotation is introduced
to allow workload-level configuration.
2025-03-17 18:54:20 +00:00
Scott Fleener 5d0275e3f3
feat(dest): Default meshed traffic to inbound proxy port (#13715)
A follow up to https://github.com/linkerd/linkerd2/pull/13699, this default-enables the config option introduced in that PR. Now, all traffic between meshed pods should flow to the proxy's inbound port.

Signed-off-by: Scott Fleener <scott@buoyant.io>
2025-03-11 15:25:40 -07:00
Scott Fleener 05d48f6a52
fix(destination): Do not send admin traffic over opaque transport (#13758)
Traffic that is meant for the destination workload can be sent over the opaque transport without issue. However, traffic intended for the proxy itself (metrics scraping, tap) needs to be sent directly to the corresponding proxy port to prevent it from being forwarded to the workload.

This adds in special cases for the admin and control ports, read directly from the environment variables on the pods, that excludes them from being sent over opaque transport.

Signed-off-by: Scott Fleener <scott@buoyant.io>
2025-03-11 12:15:10 -07:00
Scott Fleener 156bf60ad7
feat(destination): introduce transport-protocol outbound TLS mode (#13699)
Non-opaque meshed traffic currently flows over the original destination port, which requires the inbound proxy to do protocol detection.

This adds an option to the destination controller that configures all meshed traffic to flow to the inbound proxy's inbound port. This will allow us to include more session protocol information in the future, obviating the need for inbound protocol detection.

This doesn't do much in the way of testing, since the default behavior should be unchanged. When this default changes, more validation will be done on the behavior here.

Signed-off-by: Scott Fleener <scott@buoyant.io>
2025-03-05 13:51:21 -08:00
Alejandro Pedraza a726757fb1
fix(service-mirror): don't restart cluster watch upon Link status updates (#13579)
* fix(service-mirror): don't restart cluster watch upon Link status updates

Every time a Link resource is updated, the service mirror cleans up any existing worker and restarts the cluster watch. We recently introduced a status stanza in Link that gets updated every time a service is mirrored, which unnecessarily triggered a cluster watcher restart. With a sufficiently high number of services being mirrored at once, this caused severe contention on the controller, slowing mirroring to a halt.

This change fixes the situation by only considering changes in the Link Spec for restarting the cluster watch.
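
The idea can be sketched as a comparison that ignores the status stanza. The `Link`, `LinkSpec`, and `LinkStatus` types below are reduced, hypothetical stand-ins, and `needsRestart` is an illustrative name:

```go
package main

import (
	"fmt"
	"reflect"
)

// Reduced, hypothetical stand-ins for the Link resource types.
type LinkSpec struct {
	TargetClusterName string
	Selector          map[string]string
}

type LinkStatus struct {
	MirrorServices []string
}

type Link struct {
	Spec   LinkSpec
	Status LinkStatus
}

// needsRestart reports whether an update should restart the cluster
// watch. Only Spec is compared, so status-only updates are ignored.
func needsRestart(old, updated Link) bool {
	return !reflect.DeepEqual(old.Spec, updated.Spec)
}

func main() {
	old := Link{Spec: LinkSpec{TargetClusterName: "east"}}

	statusOnly := old
	statusOnly.Status.MirrorServices = []string{"web"}
	fmt.Println(needsRestart(old, statusOnly)) // status change only

	specChange := old
	specChange.Spec.TargetClusterName = "west"
	fmt.Println(needsRestart(old, specChange)) // spec change
}
```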

* Lower log level

* Extract the resource event handler functions into a separate file, and add unit test making sure the add/update/delete functions are called, and that in particular the update function is _not_ called when updating a Link status.
2025-01-22 12:35:15 -05:00
Tuomo ba8a84c960
fix(destination): GetProfile requests targeting pods directly should return endpoint data for running (not necessarily ready) pods (#13557)
* fix(destination): GetProfile requests targeting pods directly should return endpoint data for running (not necessarily ready) pods

Requiring Pods to pass readiness checks before allowing Pod to Pod communication disrupts communication in e.g. clustered systems which require Pods to communicate with each other prior to establishing ready state and allowing inbound traffic.

Relaxed the requirement and modified the workload watcher to only require that a Pod exists and is in Running phase.

Reproduced the issue with a test setup described in #13247.

Fixes #13247.

---------

Signed-off-by: Tuomo <tjorri@gmail.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2025-01-16 16:55:31 -05:00
Scott Fleener 44e54696f5
Add pod UID and container name to proxy env (#13501)
These values are useful as fields for correlating OpenTelemetry traces. A corresponding proxy change will be needed to emit these fields in said traces.

Signed-off-by: Scott Fleener <scott@buoyant.io>
2024-12-19 09:39:31 -05:00
Scott Fleener 3847f9cf13
Set minimum TLS version to 1.3 (#13500)
This helps ensure a minimum level of security. The two places this affects are our controller webhook and the linkerd-viz tap API.

The controller requires that kube-api supports TLSv1.3, which it does as of 1.19 (our minimum is currently 1.22). The linkerd-viz tap API is mostly used internally, and is deprecated. It may be worth revisiting if we want to keep it around at all.

Signed-off-by: Scott Fleener <scott@buoyant.io>
2024-12-19 09:19:09 -05:00
Alejandro Pedraza 87917e557a
deps: bump proxy-init from v2.4.1 to v2.4.2 (#13487)
https://github.com/linkerd/linkerd2-proxy-init/releases/tag/proxy-init%2Fv2.4.2

This only includes dependencies updates, notably the bump for the Alpine
base image.
2024-12-19 09:15:54 -05:00
Oliver Gould f5b82f25f9
fix(proxy-injector): handle proxy-log-level with quotes (#13480)
The proxy accepts log filters in the form `target[fields...]=level`, where a
field may include a value match. This leads to log filters like
`linkerd[name="outbound"]=debug`.

When a log filter is configured via annotation or Helm, the proxy-injector fails
to properly quote the log environment variable, leading to a failure to patch
resources properly.

To fix this, this change ensures that the log level is quoted, which properly
escapes any quotes in the filter itself.
2024-12-13 12:12:09 -08:00
Oliver Gould 61cc57db33
chore(proxy-injector): reduce test boilerplate (#13479)
The webhook tests include boilerplate that is easy to eliminate.

In preparation for adding more tests, this commit adds helper functions.
2024-12-12 23:39:44 +00:00
Oliver Gould 08a6dba655
chore: remove the bin/protoc script (#13459)
The bin/protoc script is ancient and not useful, especially in light of tools
provided by dev containers.

Furthermore, it includes a reference to an old, gross SourceForge download page
for unzip.

This change removes the unused script.
2024-12-11 16:24:07 -08:00
Alex Leong 396af7c946
refactor(multicluster): Replace use of unstructured API with typed bindings for Link CR (#13420)
The linkerd-multicluster extension uses client-go's `unstructured` API to access Link custom resources.  This API allowed us to develop quickly without the work of generating typed bindings.  However, using the unstructured API is error prone since fields must be accessed by their string name.  It is also inconsistent with the rest of the project, which uses typed bindings.

We replace the use of the unstructured API for Link resources with generated typed bindings.  The client-go APIs are slightly different and client-go does not provide a way to update subresources for typed bindings.  Therefore, when updating a Link's status subresource, we use a patch instead of an update.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-12-10 11:44:19 -08:00
Oliver Gould 17b2692d58
build(deps): bump linkerd/dev from v43 to v44 (#13428)
* docker.io/library/golang from 1.22 to 1.23
* gotestsum from 0.4.2 to 1.12.0
* protoc-gen-go from 1.28.1 to 1.35.2
* protoc-gen-go-grpc from 1.2 to 1.5.1
* docker.io/library/rust from 1.76.0 to 1.83.0
* cargo-deny from 0.14.11 to 0.16.3
* cargo-nextest from 0.9.67 to 0.9.85
* cargo-tarpaulin from 0.27.3 to 0.31.3
* just from 1.24.0 to 1.37.0
* yq from 4.33.3 to 4.44.5
* markdownlint-cli2 from 0.10.0 to 0.15.0
* shellcheck from 0.9.0 to 0.10.0
* actionlint from 1.6.26 to 1.7.4
* protoc from 3.20.3 to 29.0
* step from 0.25.2 to 0.28.2
* kubectl from 1.29.2 to 1.31.3
* k3d from 5.6.0 to 5.7.5
* k3s image shas
* helm from 3.14.1 to 3.16.3
* helm-docs from 1.12.0 to 1.14.2
2024-12-06 11:38:36 -08:00
Oliver Gould 3c91fc64ce
fix(destination): avoid panic on missing managed fields timestamp (#13378)
We received a report of a panic:

    runtime error: invalid memory address or nil pointer dereference

    panic({0x1edb860?, 0x37a6050?}
        /usr/local/go/src/runtime/panic.go:785 +0x132

    github.com/linkerd/linkerd2/controller/api/destination/watcher.latestUpdated({0xc0006b2d80?, 0xc00051a540?, 0xc0008fa008?})
        /linkerd-build/vendor/github.com/linkerd/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1612 +0x125

    github.com/linkerd/linkerd2/controller/api/destination/watcher.(*OpaquePortsWatcher).updateService(0xc0007d5480, {0x21fd160?, 0xc000d71688?}, {0x21fd160, 0xc000d71688})
        /linkerd-build/vendor/github.com/linkerd/linkerd2/controller/api/destination/watcher/opaque_ports_watcher.go:141 +0x68

The `latestUpdated` function does not properly handle the case where a time is
omitted from a `ManagedFieldsEntry`.

    type ManagedFieldsEntry struct {
        // Time is the timestamp of when the ManagedFields entry was added. The
        // timestamp will also be updated if a field is added, the manager
        // changes any of the owned fields value or removes a field. The
        // timestamp does not update when a field is removed from the entry
        // because another manager took it over.
        // +optional
        Time *Time `json:"time,omitempty" protobuf:"bytes,4,opt,name=time"`

This change adds a check to avoid the nil dereference.
2024-11-22 15:21:09 -08:00
Derek Brown 80e444edbd
lint: fix docker build warnings (#13351)
Docker builds emit a warning because the case of 'FROM' and 'as' don't match. Fix this everywhere.

Signed-off-by: Derek Brown <6845676+DerekTBrown@users.noreply.github.com>
2024-11-20 08:44:45 -05:00
Alex Leong 752d1c9ea0
Add tests for federated service watcher (#13329)
Adds tests for the federated service watcher that exercise having remote and local clusters join and leave a federated service and ensuring that the correct proxy API updates are emitted.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-11-19 10:08:50 -08:00
MicahSee 264b67c5fe
Ensure consistent JSON logging for proxy-injector container (#13335)
Ensure consistent JSON logging for proxy-injector container

When JSON logging is enabled in the proxy-injector `controllerLogFormat: json` some log messages adhere to the JSON format while others do not. This inconsistency creates difficulty in parsing logs, especially in automated workflows. For example:

```
{"level":"info","msg":"received admission review request \"83a0ce4d-ab81-42c9-abe4-e0ade0f926e2\"","time":"2024-10-10T21:06:18Z"}
time="2024-10-10T21:06:18Z" level=info msg="received pod/mypod"
```

Modified the logging implementation in the `controller/proxy-injector/webhook.go` to ensure all log messages follow the JSON format consistently. This was achieved by removing a new instance of the logrus logger that was being created in the file and replacing it with the global logger instance, ensuring all logs respect the controllerLogFormat configuration.

Reproduced the issue by enabling JSON logging and observing mixed-format logs when installing the emojivoto sample application. Applied the changes and verified that all logs consistently use the JSON format.
Ran the linkerd check command and confirmed there are no additional warnings or issues.
Tested various scenarios, including pods with and without the injection annotation, to ensure consistent logging behavior.
Fixes #13168

Signed-off-by: Micah See msee@usc.edu
2024-11-18 17:15:24 +00:00
Oliver Gould 5cbe45c86e
feat(destination): set parent and profile references (#13292)
In order for proxies to properly reflect the resources used to drive policy
decisions, the proxy API has been updated to include resource metadata on
ServiceProfile responses.

This change updates the profile translator to include ParentRef and ProfileRef
metadata when it is configured.

This change does not set backend or endpoint references.
2024-11-09 00:11:40 +00:00
Alex Leong 50b6a17e68
Add support for federated services to the service mirror controller (#13269)
When the service mirror controller detects a service in the remote cluster which matches the federated service selector (`mirror.linkerd.io/federated=member` by default), it will add that service to the federated service in the local cluster named `<svc>-federated`, creating this service if it does not already exist.  To join a service to a federated service, it is added to the `multicluster.linkerd.io/remote-discovery` annotation on the federated service, which contains a comma-separated list of values in the form `<svc>@<cluster>`.  When a remote service no longer exists or no longer matches the federated service selector, it is removed from the federated service by removing it from the `multicluster.linkerd.io/remote-discovery` annotation.

We also add a new `local-service-mirror` deployment to the Linkerd-multicluster extension which watches the local cluster for any services which match the federated service selector.  Any services in the local cluster which match will be added to the federated service by setting the `multicluster.linkerd.io/local-discovery` annotation on the federated service to the local service name.
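
The annotation format described above can be parsed with a sketch like the following. `remoteMember` and `parseRemoteDiscovery` are hypothetical names, and trimming whitespace around entries is an assumption of this sketch:

```go
package main

import (
	"fmt"
	"strings"
)

// remoteMember is one "<svc>@<cluster>" entry of the annotation.
type remoteMember struct {
	Service string
	Cluster string
}

// parseRemoteDiscovery splits a comma-separated annotation value into
// its members and rejects entries that are not "<svc>@<cluster>".
func parseRemoteDiscovery(value string) ([]remoteMember, error) {
	var members []remoteMember
	for _, item := range strings.Split(value, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue
		}
		svc, cluster, ok := strings.Cut(item, "@")
		if !ok || svc == "" || cluster == "" {
			return nil, fmt.Errorf("invalid entry %q, want <svc>@<cluster>", item)
		}
		members = append(members, remoteMember{Service: svc, Cluster: cluster})
	}
	return members, nil
}

func main() {
	members, err := parseRemoteDiscovery("web@east,web@west")
	fmt.Println(members, err)
}
```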

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-11-08 09:34:29 -08:00
Alex Leong c66f83e1f1
Add federated service watcher (#13267)
We add support for federated services to the destination controller by adding a new FederatedServiceWatcher.  When the destination controller receives a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` and/or the `multicluster.linkerd.io/local-discovery` annotations, it subscribes to the FederatedServiceWatcher instead of subscribing to the EndpointsWatcher directly.  The FederatedServiceWatcher watches the federated service for any changes to these annotations, and maintains the appropriate watches on the local EndpointWatcher and/or remote EndpointWatchers fetched through the ClusterStore.

This means that we will often have multiple EndpointTranslators writing to the same `Get` response stream.  In order for a `NoEndpoints` message sent to one EndpointTranslator to not clobber the whole stream, we make a change where `NoEndpoints` messages are no longer sent to the response stream, but are replaced by a `Remove` message containing all of the addresses from that EndpointTranslator.  This allows multiple EndpointTranslators to coexist on the same stream.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-11-08 09:34:01 -08:00
Alex Leong bcc563812a
Update generated client-go code (#13167)
Our generated client-go code committed in the repo has diverged from the code generated by the codegen tools.

We bring them back in sync by running bin/update-codegen.sh. This should be a non-functional and non-breaking change.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-10-22 17:08:43 -07:00
Scott Fleener 958cfca666
Export zone locality in outbound destination metrics (#13129)
Currently, we don't have a simple way of checking if the endpoint a proxy is discovering is in the same zone or not.

This adds a "zone_locality" metric label to the outbound destination address metrics. Note that this does not increase the cardinality of the related metrics, as this label doesn't vary within an endpoint.

Validated by checking the prometheus metrics on a local cluster and verifying this label appears in the outbound transport metrics.

Signed-off-by: Scott Fleener <scott@buoyant.io>
2024-10-15 13:43:05 -07:00
Vadim Makerov 005a3a470e
Implement providing configuration for TCP_USER_TIMEOUT to linkerd-proxy (#13024)
A default value of 30s is enough for the Linux TCP stack to complete about 7 packet retransmissions. After about 7 retransmissions the RTO (retransmission timeout) grows rapidly, so it does not make much sense to wait longer.

Setting TCP_USER_TIMEOUT only on connections between linkerd-proxy and the outside world is enough, since connections to containers in the same pod are more stable and reliable.

Fixes #13023

Signed-off-by: UsingCoding <extendedmoment@outlook.com>
2024-10-03 16:08:01 +00:00
Maxime Brunet 260fd19a2a
build: add image source label to all Dockerfiles (#13042)
This change adds the org.opencontainers.image.source label to all Dockerfiles.

It allows tools to find the source repository for the produced images.

Signed-off-by: Maxime Brunet <max@brnt.mx>
2024-09-10 11:25:32 -07:00
dependabot[bot] 4baa94baac
build(deps): bump k8s.io/client-go from 0.30.3 to 0.31.0 (#12958)
* build(deps): bump k8s.io/client-go from 0.30.3 to 0.31.0

Bumps [k8s.io/client-go](https://github.com/kubernetes/client-go) from 0.30.3 to 0.31.0.
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kubernetes/client-go/compare/v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* To appease the linter, replaced deprecated workqueue interfaces with their typed alternatives. For the endpoints controller we can instantiate with . But for the service mirror, given the queue can hold different event types, we have to instantiate with .

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2024-09-04 09:04:04 -05:00
Alejandro Pedraza 567288a060
Dual-stack support for ExternalWorkloads (#12965)
* Dual-stack support for ExternalWorkloads

This changes the `workloadIPs.maxItems` field in the ExternalWorkload CRD from `1` to `2`, to accommodate an IPv4 and IPv6 pair. This is a backwards-compatible change, so there's no need to bump the CRD version.

The control plane already supports this, so this change is mainly about expanding the unit tests to also account for the dual-stack case.
2024-08-30 13:23:56 -05:00
Alejandro Pedraza 332c4efa8c
Only bind to IPv6 addresses when disableIPv6=false (#12938)
## Problem

When the IPv6 stack in Linux is disabled, the proxy will crash at startup.

## Repro

In a Linux machine, disable IPv6 networking through the `net.ipv6.conf.*` sysctl kernel tunables, and restart the system:

- In /etc/sysctl.conf add:
```
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
```

- In /etc/default/grub set:
```
GRUB_CMDLINE_LINUX="ipv6.disable=1"
```

- Don't forget to update grub before rebooting:
```
sudo update-grub
```

In a default k3d cluster, install Linkerd. You should see the following error in any proxy log:

```
thread 'main' panicked at /__w/linkerd2-proxy/linkerd2-proxy/linkerd/app/src/lib.rs:245:14:
Failed to bind inbound listener: Os { code: 97, kind: Uncategorized, message: "Address family not supported by protocol" }
```

## Cause

Even if a k8s cluster didn't support IPv6, we were counting on the nodes having an IPv6 stack, which allowed us to bind the inbound proxy to [::] (although not the outbound proxy to [::1], as seen in GKE). This was the case in the major cloud providers we tested, but it turns out there are folks running nodes with IPv6 disabled, so we have to cater for that case as well.

## Fix

The current change undoes some of the changes from 7cbe2f5ca6 (for the proxy config), 7cbe2f5ca6 (for the policy controller) and 66034099d9 (for linkerd-cni), binding back again to 0.0.0.0 unless `disableIPv6` is false.
2024-08-05 13:29:55 -05:00
Alex Leong 59fbf7b01d
feat(proxy): Disable all header and request logging (#12903)
The Linkerd proxy suppresses all logging of HTTP headers at debug level or higher unless the `proxy.logHTTPHeaders` Helm value is set to `insecure`.  However, even when this value is not set, HTTP headers can still be logged if the log level is set to trace.

We update the log string we use to disable logging of HTTP headers from `linkerd_proxy_http::client[{headers}]=off` to the more general `[{headers}]=off,[{request}]=off`.  This will disable any logging which includes a `headers` or `request` field.  This has the effect of disabling the logging of headers at trace level as well.  As before, these logs can be re-enabled by setting `proxy.logHTTPHeaders=insecure`.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-07-31 11:31:25 -07:00
Alejandro Pedraza 71291fe7bc
Add `accessPolicy` field to Server CRD (#12845)
Followup to #12844

This new field defines the default policy for Servers, i.e. if a request doesn't match the policy associated to a Server then this policy applies. The values are the same as for `proxy.defaultInboundPolicy` and the `config.linkerd.io/default-inbound-policy` annotation (all-unauthenticated, all-authenticated, cluster-authenticated, cluster-unauthenticated, deny), plus a new value "audit". The default is "deny", thus remaining backwards-compatible.

This field is also exposed as an additional printer column.
2024-07-22 09:01:09 -05:00
Matei David f05d1e9e26
feat(helm): default proxy-init resource requests to proxy values (#12741)
Default values for `linkerd-init` (resources allocated) are not always
the right fit. We offer default values to ensure proxy-init does not get
in the way of QOS Guaranteed (`linkerd-init` resource limits and
requests cannot be configured in any other way).

Instead of using default values that can be overridden, we can re-use
the proxy's configuration values. For the pod to be QOS Guaranteed, the
values for the proxy have to be set any way. If we re-use the same
values for proxy-init we can ensure we'll always request the same amount
of CPU and memory as needed.

* `linkerd-init` now defaults to the proxy's values
* when the proxy has an annotation configuration for resource requests,
  it also impacts `linkerd-init`
* Helm chart and docs have been updated to reflect the missing values.
* tests now no longer use `ProxyInit.Resources`

UPGRADE NOTE:
- Deprecates `proxyInit.resources` field in the Helm values.
  - It will be a no-op if specified (no hard failures)

Closes #11320

---------

Signed-off-by: Matei David <matei@buoyant.io>
2024-06-24 12:37:47 +01:00
Alex Leong 35fb2d6d11
feat!: Add config to disable proxy /shutdown admin endpoint (#12705)
The proxy may expose a /shutdown HTTP endpoint on its admin server that may be used by `linkerd-await --shutdown` to trigger proxy shutdown after a process completes. If an application has an SSRF vulnerability, however, an attacker could use this endpoint to trigger proxy shutdown, causing a denial of service. This admin endpoint is only useful with linkerd-await; and this functionality is supplanted by Kubernetes Native Sidecars.

To address this potential issue, this change disables the proxy's /shutdown admin endpoint by default. A Helm value is introduced to support enabling the endpoint cluster-wide, and the `config.linkerd.io/proxy-admin-shutdown: enabled` annotation may be set to enable it on an individual workload.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-06-14 09:55:15 -07:00
Alejandro Pedraza 80a1803c7f
Properly set log level for hickory dependency in proxy (#12722)
Followup to linkerd/linkerd2-proxy#2872 , where we swapped the
trust-dns-resolver with the hickory-resolver dependency. This change
updates the default log level setting for the proxy to account for
that.
2024-06-14 08:32:12 -07:00
Alejandro Pedraza b59149388f
Bump proxy-init to v2.4.1 and cni-plugin to v1.5.1 (#12711)
Those releases ensure that when IPv6 is enabled, the series of ip6tables commands succeed. If they fail, the proxy-init/linkerd-cni containers should fail as well, instead of ignoring errors.

See linkerd/linkerd2-proxy-init#388
2024-06-13 17:15:41 -05:00
Alex Leong e0fe0248d5
Add config to disable HTTP proxy logging (#12665)
Fixes #12620

When the Linkerd proxy log level is set to `debug` or higher, the proxy logs HTTP headers which may contain sensitive information.

While we want to avoid logging sensitive data by default, logging of HTTP headers can be a helpful debugging tool.  Therefore, we add a `proxy.logHTTPHeaders` Helm value which prevents the logging of HTTP headers when set to false.  The default is false, so headers are not logged unless users opt in.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-06-11 17:46:54 -07:00
Alex Leong 3bd01cac9c
add nil check when reading endpoint hostname (thanks @acallejaszu) (#12689)
Fixes #12686

When an endpoint in an EndpointSlice resource does not contain a hostname field, the destination controller can panic while looking for an endpoint with a certain hostname.  This happens when doing a lookup with a pod dns name.

We add a nil check to avoid the panic.
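
A minimal sketch of such a check, with a reduced stand-in for an EndpointSlice endpoint whose hostname is an optional pointer. `endpoint` and `findByHostname` are hypothetical names, not the destination controller's code:

```go
package main

import "fmt"

// endpoint is a reduced stand-in for an EndpointSlice endpoint;
// Hostname is optional and may be nil.
type endpoint struct {
	Hostname  *string
	Addresses []string
}

// findByHostname returns the first endpoint with the given hostname,
// skipping endpoints that have no hostname set.
func findByHostname(endpoints []endpoint, hostname string) (endpoint, bool) {
	for _, ep := range endpoints {
		if ep.Hostname == nil {
			continue // nothing to compare; avoid a nil dereference
		}
		if *ep.Hostname == hostname {
			return ep, true
		}
	}
	return endpoint{}, false
}

func main() {
	name := "web-0"
	endpoints := []endpoint{
		{Addresses: []string{"10.0.0.1"}},                  // no hostname
		{Hostname: &name, Addresses: []string{"10.0.0.2"}}, // named endpoint
	}
	ep, ok := findByHostname(endpoints, "web-0")
	fmt.Println(ep.Addresses, ok)
}
```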

We add such an endpoint to our test fixture to exercise this case.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-06-10 10:45:31 -07:00
Nico Feulner 3d674599b3
make group ID configurable (#11924)
Fixes #11773

Make the proxy's GID configurable via `proxy.gid`, which defaults to `-1`, in which case the GID is not set.
Also added the ability to set the GID for proxy-init and the core and extension controllers.

---------

Signed-off-by: Nico Feulner <nico.feulner@gmail.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2024-05-23 15:54:21 -05:00
dependabot[bot] d42432914d
build(deps): bump google.golang.org/grpc from 1.63.2 to 1.64.0 (#12593)
* build(deps): bump google.golang.org/grpc from 1.63.2 to 1.64.0

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.63.2 to 1.64.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.63.2...v1.64.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

I've replaced all the `grpc.Dial` calls with `grpc.NewClient`. There was one call `grpc.DialContext(ctx,...)` in `viz/tap/api/grpc_server.go` that also got replaced with `grpc.NewClient`, which loses the ability to pass `ctx`, but that doesn't matter: as we're not using `WithBlock(true)`, that context wasn't being accounted for when we were using `DialContext()` anyway.

https://github.com/grpc/grpc-go/blob/v1.64.0/clientconn.go#L216-L242

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2024-05-22 14:40:04 -05:00