* Updated the ExternalWorkload CRD to v1beta1, renaming the `meshTls` field to
  `meshTLS` ([#12098])
* Updated the proxy to address some logging and metrics inconsistencies
([#12099])
The ExternalWorkload resource we introduced has a minor naming
inconsistency: `Tls` in `meshTls` is not fully capitalised. Other resources
(e.g. the authentication resources) capitalise TLS, and Go follows a
similar naming convention.
We fix this in the workload resource by changing the field's name and
bumping the version to `v1beta1`.
Upgrading the control plane version will continue to work without
downtime. However, if an ExternalWorkload resource already exists, the policy
controller will not completely initialise. It will not enter a crashloop
backoff, but it will also not become ready until the resource is edited or
deleted.
Signed-off-by: Matei David <matei@buoyant.io>
The proxy injector's admission request timeout is set to the Kubernetes default
of 10 seconds. If the proxy injector does not write out a response within this
time frame, the API Server applies the `webhookFailurePolicy` configured on the
webhook.
In certain situations, it would help to have the timeout value configurable.
This change introduces a new Helm value for the `proxyInjector` that allows the
webhook config timeout duration to be overridden.
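As a rough sketch of how the override could be expressed (the exact key name is
an assumption; check the chart's values.yaml), keeping in mind that Kubernetes
caps admission webhook timeouts at 30 seconds:
```yaml
# values.yaml sketch; "timeoutSeconds" is an assumed key name
proxyInjector:
  timeoutSeconds: 30
```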
---------
Signed-off-by: Michael Bell <mbell@opentable.com>
Signed-off-by: Michael Bell <mikebell90@users.noreply.github.com>
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Alex Leong <alex@buoyant.io>
In certain cases (e.g. high CPU load), kubelets can be slow to read readiness
and liveness responses. Linkerd is configured with a default timeout of `1s`
for its probes. To prevent injected pod restarts under high load, this change
makes probe timeouts configurable.
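For illustration only, assuming per-probe timeout keys under the proxy scope
(the actual key names may differ), an override might look like:
```yaml
# Assumed structure: raise the injected proxy's probe timeouts from 1s to 3s
proxy:
  livenessProbe:
    timeoutSeconds: 3
  readinessProbe:
    timeoutSeconds: 3
```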
---------
Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Matei David <matei@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
This edge release contains performance and stability improvements to the
Destination controller, and continues stabilizing support for ExternalWorkloads.
* Reduced the load on the Destination controller by only processing Server
updates on workloads affected by the Server ([#12017])
* Changed how the Destination controller reacts to target clusters (in
multicluster pod-to-pod mode) whose Server CRD is outdated: skip them and log
an error instead of panicking ([#12008])
* Improved the leader election of the ExternalWorkloads Endpoints controller to
avoid missing events ([#12021])
* Improved naming of EndpointSlices generated by ExternalWorkloads ([#12016])
* Restricted the number of IPs an ExternalWorkload can have ([#12026])
A Kubernetes pod may be assigned at [most one IP address][pod-docs] for each supported protocol (i.e. IPv6 and IPv4), without the use of specialised CNIs or network configurations. When processing addresses in an endpoint, we will only ever use one address.
ExternalWorkload resources have a generic workloadIPs field that allows any number of addresses to be added. We want the behaviour to be similar to a pod's -- only one address (of each protocol) should be used for routing.
We restrict the CRD server-side validation to allow only one IP address. Since we do not yet support IPv6, this will ensure that two IPv4 addresses will not be declared by the same workload. Once IPv6 support lands, or once we have a dedicated validator, we can relax the CRD validation.
[pod-docs]: https://pkg.go.dev/k8s.io/kubernetes@v1.29.1/pkg/apis/core#PodStatus
### How to test
* Install Linkerd after building the branch (just the crds will do, `linkerd install --crds`).
* Try to apply the following CRD:
```yaml
apiVersion: workload.linkerd.io/v1alpha1
kind: ExternalWorkload
metadata:
  labels:
    app: legacy
  name: external-workload-invalid
  namespace: mixed-env
spec:
  meshTls:
    identity: spiffe://root.linkerd.cluster.local/external-workload-invalid
    serverName: external-workload-invalid.cluster.local
  ports:
    - name: http
      port: 80
      protocol: TCP
  workloadIPs:
    - ip: 172.22.0.5
    - ip: 172.22.0.6
status:
  conditions:
    - lastTransitionTime: "2024-01-24T11:53:43Z"
      message: This workload is alive
      reason: Alive
      status: "True"
      type: Ready
```
* Expect the creation to fail
> The ExternalWorkload "external-workload-invalid" is invalid: spec.workloadIPs: Too many: 2: must have at most 1 items
Signed-off-by: Matei David <matei@buoyant.io>
ExternalWorkload resources require that a status condition has almost all of its
fields set (with the exception of a date field). The original inspiration for
this design was the HTTPRoute object.
When using the resource, it is more practical to treat many of these fields as
optional; it is cumbersome to fill them all out when creating an
ExternalWorkload. We change the settings to be in line with a [Pod] object
instead.
[Pod]:
7d1a2f7a73/core/v1/types.go (L3063-L3084)
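With the relaxed schema, a minimal condition only needs the fields a client
actually cares about. A hedged sketch:
```yaml
# Sketch: most condition fields are now optional, so this is enough
status:
  conditions:
    - type: Ready
      status: "True"
```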
---------
Signed-off-by: Matei David <matei@buoyant.io>
This edge release incrementally improves support for ExternalWorkload resources
throughout the control plane.
Signed-off-by: Alex Leong <alex@buoyant.io>
For mesh expansion, we need to register an ExternalWorkload's service
membership. Service memberships describe which Service objects an
ExternalWorkload is part of (i.e. which service can be used to route
traffic to an external endpoint).
Service membership will allow the control plane to discover
configuration associated with an external endpoint when performing
discovery on a service target.
To build these memberships, we introduce a new controller to the
destination service, responsible for watching Service and
ExternalWorkload objects, and for writing out EndpointSlice objects for
each Service that selects one or more external endpoints.
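As a hedged sketch of the relationship (resource names are illustrative): a
Service whose selector matches an ExternalWorkload's labels makes that workload
a member, and the new controller is responsible for emitting the corresponding
EndpointSlice objects.
```yaml
# Illustrative only: "legacy-app" selects the ExternalWorkload below via its
# labels, so the controller would write EndpointSlice objects for "legacy-app"
# pointing at the workload's IPs.
apiVersion: v1
kind: Service
metadata:
  name: legacy-app
  namespace: mixed-env
spec:
  selector:
    app: legacy
  ports:
    - port: 80
---
apiVersion: workload.linkerd.io/v1alpha1
kind: ExternalWorkload
metadata:
  name: vm-01
  namespace: mixed-env
  labels:
    app: legacy
```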
As a first step, we add a new externalworkload module and a new controller
that watches Services and ExternalWorkloads. In a follow-up change, the
ExternalEndpointManager will additionally perform the necessary reconciliation
by writing EndpointSlice objects.
Since Linkerd's control plane may run in HA, we also add a lease object
that will be used by the manager. When a lease is claimed, a flag is
turned on in the manager to let it know it may perform writes.
A more compact list of changes:
* Add a new externalworkload module
* Add an EndpointsController in the module along with necessary mechanisms to watch resources.
* Add RBAC rules to the destination service:
* Allow policy and destination to read ExternalWorkload objects
* Allow destination to create / update / read Lease objects
---------
Signed-off-by: Matei David <matei@buoyant.io>
ExternalWorkload resources represent configuration associated with a process
(or a group of processes) that is foreign to a Kubernetes cluster. They allow
Linkerd to read, write, and store configuration for mesh expansion. Since VMs
will be able to receive inbound traffic from a variety of resources, the proxy
should be able to dynamically discover inbound authorisation policies.
This change introduces a set of callbacks in the indexer that will apply (or
delete) ExternalWorkload resources. In addition, we ensure that
ExternalWorkloads can be processed in a similar fashion to pods (where
applicable, of course) with respect to server matching and defaulting. To serve
discovery requests for a VM, the policy controller will now also start a
watcher for external workloads and allow requests to reference an
`external_workload` target.
A quick list of changes:
* ExternalWorkloads can now be indexed in the inbound (policy) index.
* Renamed the pod module in the inbound index to be more generic ("workload");
  the module has some re-usable building blocks that we can use for external
  workloads.
* Moved common functions (e.g. building a default inbound server) around to
  share what's already been done without abstracting more or introducing
  generics.
* Changed gRPC target types to a tuple of `(Workload, port)` from a tuple of
  `(String, String, port)`.
* Added RBAC to watch external workloads.
---------
Signed-off-by: Matei David <matei@buoyant.io>
PR #11874 introduced a `proxy.ExperimentalEnv` setting, allowing
arbitrary name+value environment variables on proxies. This name+value
pairing was a subset of k8s' environment variables, specifically, it did
not allow for `valueFrom.configMapKeyRef` and related fields. PR #11908
introduced this pattern in the ControlPlane containers.
Modify `proxy.ExperimentalEnv` to behave identically to k8s' native
`EnvVar`, allowing settings such as:
```
--set proxy.experimentalEnv[0].name=LINKERD2_PROXY_DEFROBINATION
--set proxy.experimentalEnv[0].valueFrom.configMapKeyRef.key=extreme-key
--set proxy.experimentalEnv[0].valueFrom.configMapKeyRef.name=extreme-config
```
Context:
https://github.com/linkerd/linkerd2/pull/11908#issuecomment-1888945793
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Currently, the value put in the `LINKERD2_PROXY_POLICY_WORKLOAD` env var has the format `pod_ns:pod_name`. This PR changes the format of the policy token into a JSON struct, so it can encode the type of workload and not only its location. For now, we add an additional `external_workload` type.
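For illustration only (the JSON field names are assumptions, not the exact wire
format), the env var goes from a plain `ns:name` string to a small JSON document
that can also carry the workload kind:
```yaml
# Hypothetical sketch of the proxy's policy token; field names are assumed.
env:
  - name: LINKERD2_PROXY_POLICY_WORKLOAD
    # before:                    "emojivoto:web-0"
    # after (pod):               '{"ns":"emojivoto","pod":"web-0"}'
    # after (external workload): '{"ns":"mixed-env","external_workload":"vm-01"}'
    value: '{"ns":"emojivoto","pod":"web-0"}'
```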
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
We introduced an ExternalWorkload CRD along with bindings for mesh
expansion. Currently, the CRD allows users to create ExternalWorkload
resources without adding a meshTls strategy.
This change adds more validation restrictions to the CRD definition
(i.e. server-side validation). When a meshTls strategy is used, we
require both identity and serverName to be present. We also mark meshTls
as the only required field in the spec; every ExternalWorkload, regardless
of the direction of its traffic, must have it set.
WorkloadIPs and ports now become optional to allow resources to be
created only to configure outbound discovery (VM to workload)
and inbound policy discovery (VM).
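A hedged sketch of the smallest resource the stricter schema would now accept
(an outbound-discovery-only workload, so no ports or workloadIPs):
```yaml
# Sketch: meshTls, with both identity and serverName, is the only required field
apiVersion: workload.linkerd.io/v1alpha1
kind: ExternalWorkload
metadata:
  name: vm-outbound-only
  namespace: mixed-env
spec:
  meshTls:
    identity: spiffe://root.linkerd.cluster.local/vm-outbound-only
    serverName: vm-outbound-only.cluster.local
```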
---------
Signed-off-by: Matei David <matei@buoyant.io>
This PR adds the ability for a `Server` resource to select over `ExternalWorkload`
resources in addition to `Pods`. For the time being, only one of these selector types
can be specified. This has been realized by incrementing the resource's version
to `v1beta2`.
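A hedged sketch of what this could look like; the selector field name is an
assumption modelled on the existing `podSelector`:
```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: Server
metadata:
  name: legacy-http
  namespace: mixed-env
spec:
  # Assumed field name mirroring podSelector; only one selector type may be set
  externalWorkloadSelector:
    matchLabels:
      app: legacy
  port: http
  proxyProtocol: HTTP/1
```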
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Followup to linkerd/linkerd2-proxy-init#306. Fixes linkerd/linkerd2#11073.
This adds the `reinitialize-pods` container to the `linkerd-cni`
DaemonSet, along with its config in `values.yaml`.
The `linkerd-cni` version is also bumped to include the new binary for this
controller.
This change enables the use of SPIFFE identities in `MeshTLSAuthentication`. To
make that happen, validation of the identities field on the CRD has been moved
to the policy controller's admission webhook. Apart from a clearer expression of
the constraints that a SPIFFE ID needs to meet, this approach allows for richer
error messages. Note that the DNS validation is still based on a regex.
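For example (a sketch; the names and SPIFFE ID are illustrative), a
`MeshTLSAuthentication` can now list a SPIFFE identity alongside DNS-like
identities:
```yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: vm-clients
  namespace: mixed-env
spec:
  identities:
    # SPIFFE IDs are validated by the policy controller's admission webhook
    - spiffe://root.linkerd.cluster.local/vm-01
    # DNS-like identities are still validated with a regex
    - "*.emojivoto.serviceaccount.identity.linkerd.cluster.local"
```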
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
To support mesh expansion, the control plane needs to read configuration
associated with an external instance (i.e. a VM) for the purpose of
service and inbound authorization policy discovery.
This change introduces a new CRD that supports the required
configuration options. The resource supports:
* a list of workload IPs (with a generic format to support ipv4 now and ipv6
in the future)
* a set of mesh TLS settings (SNI and identity)
* a set of ports exposed by the workload
* a set of status conditions
---------
Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
When debugging controller behavior, it may be desirable to run a controller
with additional command-line flags that aren't explicitly referenced in
our values.yaml.
This change adds support for undocumented `experimentalArgs` values that can be
set on the `policyController` and `destinationController` parent scopes.
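A hedged sketch of how this might be used (the flags themselves are
placeholders, not real controller flags):
```yaml
# values.yaml sketch; the flag values below are illustrative only
policyController:
  experimentalArgs:
    - "--some-extra-debug-flag=true"
destinationController:
  experimentalArgs:
    - "--another-experimental-flag=value"
```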
When working with experimental proxy features that are not yet exposed via
control plane APIs, it can be convenient to set additional environment variables
on proxies.
To support this, we add an undocumented `proxy.experimentalEnv` value:
--set proxy.experimentalEnv.LINKERD2_PROXY_DEFROBINATION=extreme
This edge release includes fixes and improvements to the destination
controller's endpoint resolution API.
* Fixed an issue in the control plane where discovery for pod IP addresses could
hang indefinitely ([#11815])
* Updated the proxy to enforce time limits on control plane response streams so
that proxies more naturally distribute load over control plane replicas
([#11837])
* Fixed the policy controller's service metadata responses so that proxy logs
  and metrics have informative values ([#11842])
linkerd/linkerd2-proxy#2587 adds configuration parameters that bound the
lifetime and idle times of control plane streams. This change helps to
mitigate imbalanced control plane replica usage and to generally prevent
scenarios where a stream becomes "stuck," as has been observed when a
control plane replica is unhealthy.
This change adds helm values to control this behavior. Default values
are provided.
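As a sketch only, with assumed key names (consult the chart's values.yaml for
the real ones), the new knobs might be expressed as:
```yaml
# Assumed key names for illustration
proxy:
  control:
    streams:
      idleTimeout: 5m   # close a control plane stream after this much idle time
      lifetime: 1h      # cap the total lifetime of a control plane stream
```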
This edge release contains improvements to the logging and diagnostics of the
destination controller.
* Added a control plane metric to count errors talking to the Kubernetes API
([#11774])
* Fixed an issue causing spurious destination controller error messages for
  profile lookups on unmeshed pods with a port in the default opaque list ([#11550])
[#11774]: https://github.com/linkerd/linkerd2/pull/11774
[#11550]: https://github.com/linkerd/linkerd2/pull/11550
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-23.12.2
This edge release includes a restructuring of the proxy's balancer along with
accompanying new metrics. The new minimum supported Kubernetes version is 1.22.
* Restructured the proxy's balancer ([#11750]): balancer changes may now occur
independently of request processing. Fail-fast circuit breaking is enforced on
the balancer's queue so that requests can't get stuck in a queue indefinitely.
This new balancer is instrumented with new metrics: request (in-queue) latency
histograms, failfast states, discovery updates counts, and balancer endpoint
pool sizes.
* Changed how the policy controller updates HTTPRoute status so that it doesn't
affect statuses from other non-linkerd controllers ([#11705]; fixes [#11659])
[#11750]: https://github.com/linkerd/linkerd2/pull/11750
[#11705]: https://github.com/linkerd/linkerd2/pull/11705
[#11659]: https://github.com/linkerd/linkerd2/pull/11659
New versions of the k8s-openapi crate drop support for Kubernetes 1.21.
Kubernetes v1.21 has been considered EOL by the upstream project since
2022-07-08. Major cloud providers have EOL'd it as well (GKE's current
MSKV is 1.24).
This change updates the MSKV to v1.22. It also updates the max version
in _test-helpers.sh to v1.28.
* Add ability to configure client-go's `QPS` and `Burst` settings
## Problem and Symptoms
When a very large number of proxies request identity in a short period of time (e.g. during large node scaling events), the identity controller will attempt to validate the tokens sent by the proxies at a rate surpassing client-go's default request rate threshold, triggering client-side throttling. This delays the proxies' initialization and can even fail their startup (after a 2m timeout). The identity controller surfaces this through log entries like this:
```
time="2023-11-08T19:50:45Z" level=error msg="error validating token for web.emojivoto.serviceaccount.identity.linkerd.cluster.local: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline"
```
## Solution
Client-go's default `QPS` is 5 and `Burst` is 10. This PR exposes those settings as entries in `values.yaml`, with defaults of 100 and 200 respectively. Note this only applies to the identity controller, as it's the only controller performing direct requests to the `kube-apiserver` in a hot path. The other controllers mostly rely on informers, and direct calls are sporadic.
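A hedged sketch of the override (the key names are assumptions; only the
defaults of 100 and 200 come from this change):
```yaml
# Assumed key names for illustration; applies to the identity controller only
identity:
  kubeAPI:
    clientQPS: 100
    clientBurst: 200
```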
## Observability
The `QPS` and `Burst` settings in use are exposed both as a log entry emitted as soon as the controller starts, and as the new metric gauges `http_client_qps` and `http_client_burst`.
## Testing
You can use the following K6 script, which simulates 6k calls to the `Certify` service during one minute from emojivoto's web pod. Before running this you need to:
- Put the identity.proto and [all the other proto files](https://github.com/linkerd/linkerd2-proxy-api/tree/v0.11.0/proto) in the same directory.
- Edit the [checkRequest](https://github.com/linkerd/linkerd2/blob/edge-23.11.3/pkg/identity/service.go#L266) function to add logging statements that reveal the `token` and `csr` entries you can use here; they will be shown as soon as a web pod starts.
```javascript
import { Client, Stream } from 'k6/experimental/grpc';
import { sleep } from 'k6';
const client = new Client();
client.load(['.'], 'identity.proto');
// This always holds:
// req_num = (1 / req_duration ) * duration * VUs
// Given req_duration (0.5s) test duration (1m) and the target req_num (6k), we
// can solve for the required VUs:
// VUs = req_num * req_duration / duration
// VUs = 6000 * 0.5 / 60 = 50
export const options = {
scenarios: {
identity: {
executor: 'constant-vus',
vus: 50,
duration: '1m',
},
},
};
export default () => {
client.connect('localhost:8080', {
plaintext: true,
});
const stream = new Stream(client, 'io.linkerd.proxy.identity.Identity/Certify');
// Replace with your own token
let token = "ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNkluQjBaV1pUZWtaNWQyVm5OMmxmTTBkV2VUTlhWSFpqTmxwSmJYRmtNMWRSVEhwNVNHWllhUzFaZDNNaWZRLmV5SmhkV1FpT2xzaWFXUmxiblJwZEhrdWJEVmtMbWx2SWwwc0ltVjRjQ0k2TVRjd01EWTRPVFk1TUN3aWFXRjBJam94TnpBd05qQXpNamt3TENKcGMzTWlPaUpvZEhSd2N6b3ZMMnQxWW1WeWJtVjBaWE11WkdWbVlYVnNkQzV6ZG1NdVkyeDFjM1JsY2k1c2IyTmhiQ0lzSW10MVltVnlibVYwWlhNdWFXOGlPbnNpYm1GdFpYTndZV05sSWpvaVpXMXZhbWwyYjNSdklpd2ljRzlrSWpwN0ltNWhiV1VpT2lKM1pXSXRPRFUxT1dJNU4yWTNZeTEwYldJNU5TSXNJblZwWkNJNklqaGlZbUV5WWpsbExXTXdOVGN0TkRnMk1TMWhNalZsTFRjelpEY3dOV1EzWmpoaU1TSjlMQ0p6WlhKMmFXTmxZV05qYjNWdWRDSTZleUp1WVcxbElqb2lkMlZpSWl3aWRXbGtJam9pWm1JelpUQXlNRE10TmpZMU55MDBOMk0xTFRoa09EUXRORGt6WXpBM1lXUTJaak0zSW4xOUxDSnVZbVlpT2pFM01EQTJNRE15T1RBc0luTjFZaUk2SW5ONWMzUmxiVHB6WlhKMmFXTmxZV05qYjNWdWREcGxiVzlxYVhadmRHODZkMlZpSW4wLnlwMzAzZVZkeHhpamxBOG1wVjFObGZKUDB3SC03RmpUQl9PcWJ3NTNPeGU1cnNTcDNNNk96VWR6OFdhYS1hcjNkVVhQR2x2QXRDRVU2RjJUN1lKUFoxVmxxOUFZZTNvV2YwOXUzOWRodUU1ZDhEX21JUl9rWDUxY193am9UcVlORHA5ZzZ4ZFJNcW9reGg3NE9GNXFjaEFhRGtENUJNZVZ6a25kUWZtVVZwME5BdTdDMTZ3UFZWSlFmNlVXRGtnYkI1SW9UQXpxSmcyWlpyNXBBY3F5enJ0WE1rRkhSWmdvYUxVam5sN1FwX0ljWm8yYzJWWk03T2QzRjIwcFZaVzJvejlOdGt3THZoSEhSMkc5WlNJQ3RHRjdhTkYwNVR5ZC1UeU1BVnZMYnM0ZFl1clRYaHNORjhQMVk4RmFuNjE4d0x6ZUVMOUkzS1BJLUctUXRUNHhWdw==";
// Replace with your own CSR
let csr = "MIIBWjCCAQECAQAwRjFEMEIGA1UEAxM7d2ViLmVtb2ppdm90by5zZXJ2aWNlYWNjb3VudC5pZGVudGl0eS5saW5rZXJkLmNsdXN0ZXIubG9jYWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAATKjgVXu6F+WCda3Bbq2ue6m3z6OTMfQ4Vnmekmvirip/XGyi2HbzRzjARnIzGlG8wo4EfeYBtd2MBCb50kP8F8oFkwVwYJKoZIhvcNAQkOMUowSDBGBgNVHREEPzA9gjt3ZWIuZW1vaml2b3RvLnNlcnZpY2VhY2NvdW50LmlkZW50aXR5LmxpbmtlcmQuY2x1c3Rlci5sb2NhbDAKBggqhkjOPQQDAgNHADBEAiAM7aXY8MRs/EOhtPo4+PRHuiNOV+nsmNDv5lvtJt8T+QIgFP5JAq0iq7M6ShRNkRG99ZquJ3L3TtLWMNVTPvqvvUE=";
const data = {
identity: "web.emojivoto.serviceaccount.identity.linkerd.cluster.local",
token: token,
certificate_signing_request: csr,
};
stream.write(data);
// This request takes around 2ms, so this sleep will mostly determine its final duration
sleep(0.5);
};
```
This results in the following report:
```
scenarios: (100.00%) 1 scenario, 50 max VUs, 1m30s max duration (incl. graceful stop):
* identity: 50 looping VUs for 1m0s (gracefulStop: 30s)
data_received................: 6.3 MB 104 kB/s
data_sent....................: 9.4 MB 156 kB/s
grpc_req_duration............: avg=2.14ms min=873.93µs med=1.9ms max=12.89ms p(90)=3.13ms p(95)=3.86ms
grpc_streams.................: 6000 99.355331/s
grpc_streams_msgs_received...: 6000 99.355331/s
grpc_streams_msgs_sent.......: 6000 99.355331/s
iteration_duration...........: avg=503.16ms min=500.8ms med=502.64ms max=532.36ms p(90)=504.05ms p(95)=505.72ms
iterations...................: 6000 99.355331/s
vus..........................: 50 min=50 max=50
vus_max......................: 50 min=50 max=50
running (1m00.4s), 00/50 VUs, 6000 complete and 0 interrupted iterations
```
With the old defaults (QPS=5 and Burst=10), the latencies would be much higher and the number of complete requests much lower.
## edge-23.11.4
This edge release introduces support for native sidecar containers, which are
entering beta support in Kubernetes 1.29. This improves the startup and
shutdown ordering for the proxy relative to other containers, fixing the
long-standing shutdown issue with injected `Job`s. Furthermore, traffic
from other `initContainer`s can now be proxied by Linkerd.
In addition, this edge release includes Helm chart improvements, and
improvements to the multicluster extension.
* Added a new `config.alpha.linkerd.io/proxy-enable-native-sidecar`
annotation and `Proxy.NativeSidecar` Helm option that causes the proxy
container to run as an init-container (thanks @teejaded!) (#11465;
fixes #11461)
* Fixed broken affinity rules for the multicluster `service-mirror` when
running in HA mode (#11609; fixes #11603)
* Added a new check to `linkerd check` that ensures all extension
namespaces are configured properly (#11629; fixes #11509)
* Updated the Prometheus Docker image used by the `linkerd-viz`
extension to v2.48.0, resolving a number of CVEs in older Prometheus
versions (#11633)
* Added `nodeAffinity` to `deployment` templates in the `linkerd-viz`
and `linkerd-jaeger` Helm charts (thanks @naing2victor!) (#11464;
fixes #10680)
* Add native sidecar support
Kubernetes will be providing beta support for native sidecar containers in version 1.29. This feature improves network proxy sidecar compatibility for jobs and initContainers.
Introduce a new annotation `config.alpha.linkerd.io/proxy-enable-native-sidecar` and configuration option `Proxy.NativeSidecar` that causes the proxy container to run as an init-container.
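For example (the annotation value is assumed to be a stringified boolean),
enabling it per workload might look like:
```yaml
# Run the proxy as a native sidecar: an init container with restartPolicy: Always
metadata:
  annotations:
    config.alpha.linkerd.io/proxy-enable-native-sidecar: "true"
```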
Fixes: #11461
Signed-off-by: TJ Miller <millert@us.ibm.com>
This edge release fixes a bug where Linkerd could cause EOF errors during bursts
of TCP connections.
* Fixed a bug where the `linkerd multicluster link` command's
`--gateway-addresses` flag was not respected when a remote gateway exists
([#11564])
* proxy: Increased DEFAULT_OUTBOUND_TCP_QUEUE_CAPACITY to prevent EOF errors
during bursts of TCP connections
[#11564]: https://github.com/linkerd/linkerd2/pull/11564
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-23.11.2
This edge release contains observability improvements and bug fixes to the
Destination controller, and a refinement to the multicluster gateway resolution
logic.
* Fixed an issue where the Destination controller could stop processing service
profile updates, if a proxy subscribed to those updates stops reading them;
this is a followup to the issue [#11491] fixed in [edge-23.10.3] ([#11546])
* In the Destination controller, added informer lag histogram metrics to track
whenever the Kubernetes objects watched by the controller are falling behind
the state in the kube-apiserver ([#11534])
* In the multicluster service mirror, extended the target gateway resolution
logic to take into account all the possible IPs a hostname might resolve to,
rather than just the first one (thanks @MrFreezeex!) ([#11499])
* Added probes to the debug container to appease environments requiring probes
for all containers ([#11308])
[edge-23.10.3]: https://github.com/linkerd/linkerd2/releases/tag/edge-23.10.3
[#11546]: https://github.com/linkerd/linkerd2/pull/11546
[#11534]: https://github.com/linkerd/linkerd2/pull/11534
[#11499]: https://github.com/linkerd/linkerd2/pull/11499
[#11308]: https://github.com/linkerd/linkerd2/pull/11308
Prior to setting an enormous value to disable protocol detection, the field was
meant to be configurable. In the refactor, the annotation name stayed the same
instead of reflecting the change in the contract (i.e. not configurable but
toggled). Additionally, there were two typos in the proxy partials.
Signed-off-by: Matei David <matei@buoyant.io>
This change allows users to configure protocol detection timeout values
(outbound and inbound). Certain environments may find that protocol detection
inhibits debugging and makes it harder to reason about a client's behaviour. In
such cases (and not only) it may be desirable to change the protocol detection
timeout to a value higher than the default 10s.
Through this change, users may configure their timeout values either
with install-time settings or through annotations; this follows our
usual proxy configuration model. The proxy uses different timeout values
for the inbound and outbound stacks (even though they use the same
default value) and this change respects that by adding two separate
fields.
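As a sketch only, with hypothetical annotation names (the real names follow
Linkerd's usual `config.linkerd.io/...` proxy configuration pattern):
```yaml
# Hypothetical annotation names; inbound and outbound are configured separately
metadata:
  annotations:
    config.linkerd.io/proxy-inbound-detect-timeout: "30s"
    config.linkerd.io/proxy-outbound-detect-timeout: "30s"
```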
Signed-off-by: Matei David <matei@buoyant.io>
This edge release includes a fix for the `ServiceProfile` CRD resource schema.
The schema incorrectly required `not` response matches to be arrays, while the
in-cluster validator parsed `not` response matches as objects. In addition, an
issue has been fixed in `linkerd profile`: when used with the `--open-api` flag,
it would not strip trailing slashes when generating a resource from swagger
specifications.
* Fixed an issue where trailing slashes wouldn't be stripped when generating
`ServiceProfile` resources through `linkerd profile --open-api` ([#11519])
* Fixed an issue in the `ServiceProfile` CRD schema. The schema incorrectly
required that a `not` response match should be an array, which the service
profile validator rejected since it expected an object. The schema has been
updated to properly indicate that `not` values should be an object ([#11510];
fixes [#11483])
* Improved logging in the destination controller by adding the client pod's
name to the logging context. This will improve visibility into the messages
sent and received by the control plane from a specific proxy ([#11532])
* Fixed an issue in the destination controller where the metadata API would not
initialize a `Job` informer. The destination controller uses the metadata API
to retrieve `Job` metadata, and relies mostly on informers. Without an
initialized informer, an error message would be logged, and the controller
relied on direct API calls ([#11541]; fixes [#11531])
[#11541]: https://github.com/linkerd/linkerd2/pull/11541
[#11532]: https://github.com/linkerd/linkerd2/pull/11532
[#11531]: https://github.com/linkerd/linkerd2/issues/11531
[#11519]: https://github.com/linkerd/linkerd2/pull/11519
[#11510]: https://github.com/linkerd/linkerd2/pull/11510
[#11483]: https://github.com/linkerd/linkerd2/issues/11483
Signed-off-by: Matei David <matei@buoyant.io>
When the destination controller logs about receiving or sending messages to a data plane proxy, there is no information in the log about which data plane pod it is communicating with. This can make it difficult to diagnose issues which span the data plane and control plane.
We add a `pod` field to the context token that proxies include in requests to the destination controller. We add this pod name to the logging context so that it shows up in log messages. In order to accomplish this, we had to plumb through logging context in a few places where it previously had not been. This gives us a more complete logging context and more information in each log message.
An example log message with this fuller logging context is:
```
time="2023-10-24T00:14:09Z" level=debug msg="Sending destination add: add:{addrs:{addr:{ip:{ipv4:183762990} port:8080} weight:10000 metric_labels:{key:\"control_plane_ns\" value:\"linkerd\"} metric_labels:{key:\"deployment\" value:\"voting\"} metric_labels:{key:\"pod\" value:\"voting-7475cb974c-2crt5\"} metric_labels:{key:\"pod_template_hash\" value:\"7475cb974c\"} metric_labels:{key:\"serviceaccount\" value:\"voting\"} tls_identity:{dns_like_identity:{name:\"voting.emojivoto.serviceaccount.identity.linkerd.cluster.local\"}} protocol_hint:{h2:{}}} metric_labels:{key:\"namespace\" value:\"emojivoto\"} metric_labels:{key:\"service\" value:\"voting-svc\"}}" addr=":8086" component=endpoint-translator context-ns=emojivoto context-pod=web-767f4484fd-wmpvf remote="10.244.0.65:52786" service="voting-svc.emojivoto.svc.cluster.local:8080"
```
Note the `context-pod` field.
Additionally, we have tested this when no pod field is included in the context token (e.g. when handling requests from a pod which does not yet add this field) and confirmed that the `context-pod` log field is empty, but no errors occur.
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #11483
Service profile's response class schema indicates that a `not` response match should be an array. This is incorrect and parsing of the response class will fail if an array is provided.
Update the schema to properly indicate that `not`'s value should be an object.
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-23.10.2
This edge release includes a fix addressing an issue during upgrades for
instances not relying on automated webhook certificate management (such as that
provided by cert-manager).
* Added a `checksum/config` annotation to the destination and proxy injector
deployment manifests, to force restarting those workloads whenever their
webhook secrets change during upgrade (thanks @iAnomaly!) ([#11440])
* Fixed policy controller error when deleting a Gateway API HTTPRoute resource
([#11471])
[#11440]: https://github.com/linkerd/linkerd2/pull/11440
[#11471]: https://github.com/linkerd/linkerd2/pull/11471
Fixes #6940
Added a `checksum/config` annotation to the destination, proxy-injector and tap-injector workloads, whose value is calculated as the SHA256 of the template file containing the TLS cert they depend on. This is necessary so that every time those files change (they get re-generated on every upgrade or config update via `linkerd upgrade`), the workloads are restarted as well.
We had this in place before, but it was dropped by mistake during the 2.12 Helm chart migrations.
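The mechanism is the standard Helm checksum pattern; a sketch (the template path
shown is illustrative, not necessarily the chart's actual file name):
```yaml
# Deployment template sketch: hash the file that renders the webhook TLS secret,
# so the pod template changes (and the workload restarts) whenever the cert does
spec:
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/destination-rbac.yaml") . | sha256sum }}
```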
Signed-off-by: Cameron Boulton <cameron.boulton@calm.com>
This edge release adds additional configurability to Linkerd's viz and
multicluster extensions.
* Added a `podAnnotations` Helm value to allow adding additional annotations to
the Linkerd-Viz Prometheus Deployment ([#11365]) (thanks @cemenson)
* Added `imagePullSecrets` Helm values to the multicluster chart so that it can
be installed in an air-gapped environment. ([#11285]) (thanks @lhaussknecht)
[#11365]: https://github.com/linkerd/linkerd2/issues/11365
[#11285]: https://github.com/linkerd/linkerd2/issues/11285
Signed-off-by: Alex Leong <alex@buoyant.io>