To support an HA mode for the service-mirror component, some form of
synchronization is needed to coordinate the replicas of the
service-mirror controller. Although in practice most of the updates
performed by the replicas are idempotent (and have a benign effect on
correctness), running them uncoordinated has downsides: the resource
usage of setting up multiple watches, log pollution, errors from writes
against out-of-date resources, and increased difficulty in debugging.
This change adds coordination between the replicas through leader
election. To achieve leader election, client-go's `coordination` package
is used. The change refactors the existing code; the previous nested
loops now reside in a closure (to capture the necessary configuration),
and the closure is run when a leader is elected.
Leader election functions as part of a loop: a lease resource is created
(if it does not exist), and the controller blocks until it has acquired
the lease. The loop is terminated only on shutdown from an interrupt
signal. If the lease is lost, it is released, watchers are cleaned up,
and the controller returns to blocking until it acquires the lease once
again.
Shutdown logic has been changed to rely on context cancellation
propagation, so that the watchers can be stopped either by the leader
elector (when the claim is lost) or by the main routine when an
interrupt is handled.
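As a rough illustration of that loop, here is a minimal sketch using client-go's leader-election machinery over a coordination `Lease`; the lease name, namespace, identity, and timing values are placeholders rather than the controller's actual configuration:
```
// Sketch of the leader-election loop described above, using client-go's
// leaderelection package over a coordination Lease. Names and timings are
// illustrative only.
package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	// Cancel the root context on an interrupt; cancellation propagates to the
	// elector and, through it, to the watchers started by the leader.
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer cancel()

	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "service-mirror-write", Namespace: "linkerd"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: os.Getenv("HOSTNAME")},
	}

	// run is the closure holding the previous nested loops; it sets up the
	// watchers and returns when its context is cancelled.
	run := func(ctx context.Context) {
		// ... set up cluster watches here, tied to ctx ...
		<-ctx.Done()
	}

	for {
		// Blocks until the lease is acquired; OnStoppedLeading fires when the
		// claim is lost, after which the loop campaigns for the lease again.
		leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
			Lock:            lock,
			LeaseDuration:   30 * time.Second,
			RenewDeadline:   10 * time.Second,
			RetryPeriod:     2 * time.Second,
			ReleaseOnCancel: true,
			Callbacks: leaderelection.LeaderCallbacks{
				OnStartedLeading: run,
				OnStoppedLeading: func() { log.Println("lease lost; watchers stopped") },
			},
		})
		if ctx.Err() != nil {
			return // interrupted: shut down instead of re-campaigning
		}
	}
}
```
Losing the lease causes `RunOrDie` to return, which mirrors the release-and-re-campaign behaviour described above.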
---------
Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
When using `linkerd-await` as a preStart hook, we need to explicitly pass in the proxy's admin port if it is not the default (4191). While the admin server listener can be bound to an arbitrary port through the `config.linkerd.io/admin-port` configuration annotation, `linkerd-await`'s template is not aware of the override, resulting in start-up errors.
This change adds the override to `linkerd-await` by always using an explicit `--port` argument.
---------
Signed-off-by: jclegras <11457480+jclegras@users.noreply.github.com>
Tap and Top currently expose a limited set of output fields; it would be extremely useful to be able to display arbitrary HTTP/gRPC headers. This change adds a `jsonpath` output option that filters JSON tap events on arbitrary fields.
A new `jsonpath` option is introduced for `linkerd viz tap -o`.
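As a rough sketch of the idea, the snippet below applies a JSONPath expression to a JSON tap event using client-go's `jsonpath` package; the event shape and field names are illustrative rather than tap's exact schema:
```
// Sketch: filtering a JSON tap event with a JSONPath expression via
// client-go's jsonpath package. The event structure below is illustrative.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"

	"k8s.io/client-go/util/jsonpath"
)

func main() {
	event := []byte(`{
	  "source": {"ip": "10.0.0.1"},
	  "responseInitEvent": {
	    "http": {"headers": [{"name": "x-request-id", "valueStr": "abc123"}]}
	  }
	}`)

	var obj interface{}
	if err := json.Unmarshal(event, &obj); err != nil {
		log.Fatal(err)
	}

	// Pull a single header value out of the event.
	jp := jsonpath.New("tap")
	if err := jp.Parse(`{.responseInitEvent.http.headers[0].valueStr}`); err != nil {
		log.Fatal(err)
	}
	if err := jp.Execute(os.Stdout, obj); err != nil {
		log.Fatal(err)
	}
	fmt.Println()
}
```
On the CLI this would look something like `linkerd viz tap deploy/web -o jsonpath='{.responseInitEvent.http.headers}'` (expression shown for illustration).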
Signed-off-by: hiteshwani29 <hiteshwani29@gmail.com>
* Updated release notes for edge-23.6.2
Signed-off-by: Eric Anderson <eric@buoyant.io>
* Updated helm charts and readmes
Signed-off-by: Eric Anderson <eric@buoyant.io>
* Updated release notes for edge release
Signed-off-by: Eric Anderson <eric@buoyant.io>
---------
Signed-off-by: Eric Anderson <eric@buoyant.io>
Updating proxy version on main
PRs #2418 and #2419 add per-route and per-backend request timeouts
configured by the `OutboundPolicies` API to the `MatchedRoute` and
`MatchedBackend` layers in the outbound `ClientPolicy` stack,
respectively. This means that — unlike in the `ServiceProfile` stack —
two separate request timeouts can be configured in `ClientPolicy`
stacks. However, because both the `MatchedRoute` and `MatchedBackend`
layers are in the HTTP logical stack, the errors emitted by both
timeouts will have a `LogicalError` as their most specific error
metadata, meaning that the log messages and `l5d-proxy-error` headers
recorded for these timeouts do not indicate whether the timeout that
failed the request was the route request timeout or the backend request
timeout.
In order to ensure this information is recorded and exposed to the user,
this branch adds two new error wrapper types, one of which enriches an
error with a `RouteRef`'s metadata, and one of which enriches an error
with a `BackendRef`'s metadata. The `MatchedRoute` stack now wraps all
errors with `RouteRef` metadata, and the `MatchedBackend` stack wraps
errors with `BackendRef` metadata. This way, when the route timeout
fails a request, the error includes the route metadata; when the
backend request timeout fails a request, the error includes both the
route and backend metadata.
Adding these new error wrappers has the side benefit of attaching this
metadata to errors returned by filters, allowing users to distinguish
between errors emitted by a filter on a route rule and errors emitted
by a per-backend filter. In addition, any other errors emitted lower in
the stack for requests handled by a client policy stack will now
include this metadata, which is generally useful.
Example errors, taken from a proxy unit test:
backend request:
```
logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: backend service.test.test-svc:666: HTTP response timeout after 1s
```
route request:
```
logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: HTTP response timeout after 2s
```
---
* recover: remove unused `mut` (linkerd/linkerd2-proxy#2425)
* outbound: implement `OutboundPolicies` route request timeouts (linkerd/linkerd2-proxy#2418)
* build(deps): bump tj-actions/changed-files from 35.7.7 to 36.2.1 (linkerd/linkerd2-proxy#2427)
* outbound: implement `OutboundPolicies` backend request timeouts (linkerd/linkerd2-proxy#2419)
* outbound: add backend and route metadata to errors (linkerd/linkerd2-proxy#2428)
Signed-off-by: Eric Anderson <eric@buoyant.io>
fix: supplement the HA flag
Linkerd's HA checks are skipped because the HA field is missing from the config map generated at install time.
This change introduces an HA field in the Helm charts that is persisted to the config map, thereby allowing the checks to run.
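For illustration, here is a minimal sketch of the kind of check this unblocks, assuming the persisted value surfaces as a boolean `highAvailability` field under the `values` key of the `linkerd-config` ConfigMap (both names are assumptions here):
```
// Sketch of a check reading the persisted HA flag from the linkerd-config
// ConfigMap. The "values" key and "highAvailability" field are assumptions.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"sigs.k8s.io/yaml"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	cm, err := client.CoreV1().ConfigMaps("linkerd").Get(context.Background(), "linkerd-config", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	var values struct {
		HighAvailability bool `json:"highAvailability"`
	}
	if err := yaml.Unmarshal([]byte(cm.Data["values"]), &values); err != nil {
		log.Fatal(err)
	}

	if values.HighAvailability {
		fmt.Println("HA mode detected: running HA checks")
	} else {
		fmt.Println("HA mode not enabled: skipping HA checks")
	}
}
```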
---------
Signed-off-by: Takumi Sue <u630868b@alumni.osaka-u.ac.jp>
Add Go client codegen for HttpRoute v1beta3. This will be necessary for any of the Go controllers (e.g. metrics-api) or Go CLI commands to interact with HttpRoute v1beta3 resources in Kubernetes.
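As a hypothetical sketch of what the generated code enables, the snippet below lists HttpRoute resources through a generated clientset; the import path, the `PolicyV1beta3()` accessor, and the namespace are illustrative assumptions rather than the actual generated API:
```
// Hypothetical usage of the generated HttpRoute v1beta3 client. The import
// path and group-version accessor are assumptions for illustration.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"

	// Assumed location of the generated clientset for policy.linkerd.io types.
	policyclient "github.com/linkerd/linkerd2/controller/gen/client/clientset/versioned"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := policyclient.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// List HttpRoutes in a namespace; from here a controller could read the
	// new v1beta3 fields (e.g. per-rule timeouts) off each item's spec.
	routes, err := client.PolicyV1beta3().HTTPRoutes("emojivoto").List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range routes.Items {
		fmt.Println(r.Namespace + "/" + r.Name)
	}
}
```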
Signed-off-by: Kevin Ingelman <ki@buoyant.io>
This adds the ability to pass a builder option through to Docker. This makes building multi-arch images much simpler by using our Kubernetes infrastructure.
It also makes building multi-arch images much faster, since they can be built in parallel and on native hardware.
Co-authored-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
This edge release changes the behavior of the CNI plugin to run exclusively in
"chained mode". Instead of creating its own configuration file, the CNI plugin
will now wait until a `conf` file exists before appending its configuration.
Additionally, this change includes a bug fix for topology aware service
routing.
* Changed CNI plugin installer to always run in 'chained' mode; the plugin will
now wait until another CNI plugin is installed before appending its
configuration
* Added a timeout value to the HttpRoute CRD. The field is currently unused but
will be used eventually to allow for configuration of per-route timeouts
* Fixed a bug where topology-aware routing would not be disabled while a
  service was under load (thanks @MarkSRobinson!)
---------
Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
PR #10969 adds support for the GEP-1742 `timeouts` field to the
HTTPRoute CRD. This branch implements actual support for these fields in
the policy controller. The timeout fields are now read and used to set
the timeout fields added to the proxy-api in
linkerd/linkerd2-proxy-api#243.
In addition, I've added code to ensure that the timeout fields are
parsed correctly when a JSON manifest is deserialized. The current
implementation represents timeouts in the bindings as a Rust
`std::time::Duration` type. `Duration` does implement
`serde::Deserialize` and `serde::Serialize`, but its serialization
implementation attempts to (de)serialize it as a struct consisting of a
number of seconds and a number of subsecond nanoseconds. The timeout
fields are instead supposed to be represented as strings in the Go
standard library's `time.ParseDuration` format. Therefore, I've added a
newtype which wraps the Rust `std::time::Duration` and implements the
same parsing logic as Go. Eventually, I'd like to upstream the
implementation of this to `kube-rs`; see kube-rs/kube#1222 for details.
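For reference, the format in question is the one accepted by Go's `time.ParseDuration`: a sequence of decimal numbers, each with an optional fraction and a unit suffix such as "ms", "s", "m", or "h". A quick illustration of the strings the newtype needs to accept:
```
package main

import (
	"fmt"
	"time"
)

func main() {
	// String forms used by the HTTPRoute timeout fields, which the policy
	// controller's Duration newtype must parse the same way Go does.
	for _, s := range []string{"10s", "1h30m", "500ms", "1.5s"} {
		d, err := time.ParseDuration(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-6s -> %v\n", s, d)
	}

	// A bare number with no unit is rejected, unlike serde's default
	// {secs, nanos} struct representation of std::time::Duration.
	_, err := time.ParseDuration("10")
	fmt.Println(err) // time: missing unit in duration "10"
}
```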
Depends on #10969
Depends on linkerd/linkerd2-proxy-api#243
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This release stops using the "interface" mode; instead, the plugin waits
until another CNI plugin drops a proper network config and then appends
the linkerd CNI config to it. This avoids having pods start before
proper networking is established on the node.
Add a new version to the HttpRoute CRD: `v1beta3`. This version adds a new `timeouts` struct to the http route rule. This mirrors a corresponding new field in the Gateway API, as described in [GEP-1742](https://github.com/kubernetes-sigs/gateway-api/pull/1997). This field is currently unused, but will eventually be read by the policy controller and used to configure timeouts enforced by the proxy.
The diff between v1beta2 and v1beta3 is:
```
timeouts:
  description: "Timeouts defines the timeouts that can be configured
    for an HTTP request. \n Support: Core \n <gateway:experimental>"
  properties:
    backendRequest:
      description: "BackendRequest specifies a timeout for an
        individual request from the gateway to a backend service.
        Typically used in conjunction with automatic retries,
        if supported by an implementation. Default is the value
        of Request timeout. \n Support: Extended"
      format: duration
      type: string
    request:
      description: "Request specifies a timeout for responding
        to client HTTP requests, disabled by default. \n For example,
        the following rule will timeout if a client request is
        taking longer than 10 seconds to complete: \n ``` rules:
        - timeouts: request: 10s backendRefs: ... ``` \n Support:
        Core"
      format: duration
      type: string
  type: object
```
We update the `storage` version of HttpRoute to be v1beta3 but continue to serve all versions. Since this new field is optional, the Kubernetes API will be able to automatically convert between versions.
Signed-off-by: Alex Leong <alex@buoyant.io>