ExternalWorkload resources currently require that a status condition have almost
all of its fields set (with the exception of a date field). The original
inspiration for this design was the HTTPRoute object.
When using the resource, it is more practical to treat many of these fields as
optional; filling them all in when creating an ExternalWorkload is cumbersome.
We change the settings to be in line with a [Pod] object instead.
[Pod]:
7d1a2f7a73/core/v1/types.go (L3063-L3084)
---------
Signed-off-by: Matei David <matei@buoyant.io>
ExternalWorkload resources represent configuration associated with a process
(or a group of processes) that is foreign to a Kubernetes cluster. They allow
Linkerd to read, write, and store configuration for mesh expansion. Since VMs
will be able to receive inbound traffic from a variety of resources, the proxy
should be able to dynamically discover inbound authorisation policies.
This change introduces a set of callbacks in the indexer that apply (or
delete) ExternalWorkload resources. In addition, we ensure that
ExternalWorkloads can be processed in a similar fashion to pods (where
applicable, of course) with respect to server matching and defaulting. To serve
discovery requests for a VM, the policy controller now also starts a
watcher for external workloads and allows requests to reference an
`external_workload` target.
A quick list of changes:
* ExternalWorkloads can now be indexed in the inbound (policy) index.
* Renamed the pod module in the inbound index to be more generic ("workload");
  the module has some re-usable building blocks that we can use for external
  workloads.
* Moved common functions (e.g. building a default inbound server) around to
  share what's already been done without abstracting more or introducing
  generics.
* Changed gRPC target types to a tuple of `(Workload, port)` from a tuple of
  `(String, String, port)`.
* Added RBAC to watch external workloads.
---------
Signed-off-by: Matei David <matei@buoyant.io>
Currently, the value placed in the `LINKERD2_PROXY_POLICY_WORKLOAD` env var has the format `pod_ns:pod_name`. This PR changes the format of the policy token to a JSON struct so that it can encode the type of workload, not only its location. For now, we add an additional `external_workload` type.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
We introduced an ExternalWorkload CRD along with bindings for mesh
expansion. Currently, the CRD allows users to create ExternalWorkload
resources without adding a meshTls strategy.
This change adds further validation restrictions to the CRD definition
(i.e. server-side validation). When a meshTls strategy is used, we
require both identity and serverName to be present. We also mark meshTls
as the only required field in the spec: every ExternalWorkload, regardless
of the direction of its traffic, must have it set.
WorkloadIPs and ports are now optional, allowing resources to be created
solely to configure outbound discovery (VM to workload) or inbound policy
discovery (VM).
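A minimal sketch of a valid ExternalWorkload under these rules (the group/version
and all values are assumptions; field names follow the description above):
```yaml
apiVersion: workload.linkerd.io/v1alpha1   # assumed group/version
kind: ExternalWorkload
metadata:
  name: vm-gateway                         # illustrative name
  namespace: mesh-expansion
spec:
  # meshTls is the only required field; identity and serverName must both
  # be present when it is set.
  meshTls:
    identity: "vm-gateway.mesh-expansion.serviceaccount.identity.linkerd.cluster.local"
    serverName: "vm-gateway.mesh-expansion.cluster.local"
  # workloadIPs and ports are optional and omitted here.
```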
---------
Signed-off-by: Matei David <matei@buoyant.io>
This PR adds the ability for a `Server` resource to select over `ExternalWorkload`
resources in addition to `Pods`. For the time being, only one of these selector types
can be specified. This has been realized by incrementing the version of the resource
to `v1beta2`.
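As a sketch (assuming an `externalWorkloadSelector` field that mirrors the existing
`podSelector`; names and values are illustrative), a v1beta2 Server selecting
ExternalWorkloads might look like:
```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: Server
metadata:
  name: vm-http
  namespace: mesh-expansion
spec:
  # Only one selector type may be specified per Server.
  externalWorkloadSelector:        # assumed field name
    matchLabels:
      app: legacy-vm
  port: 8080
  proxyProtocol: HTTP/1
```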
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
We introduced an ExternalWorkload CRD for mesh expansion. This change
follows up by adding bindings for Rust and Go code.
For Go code:
* We add a new schema and ExternalWorkload types
* We also update the code-gen script to generate informers
* We add a new informer type to our abstractions built on top of
client-go, including a function to check whether a client has access to the
resource.
For Rust code:
* We add ExternalWorkload bindings to the policy controller.
---------
Signed-off-by: Matei David <matei@buoyant.io>
This change enables the use of SPIFFE identities in `MeshTLSAuthentication`.
To make that happen, validation of the identities field on the CRD has been moved
to the policy controller admission webhook. Apart from a clearer expression
of the constraints that a SPIFFE ID needs to meet, this approach allows for
richer error messages. Note that DNS validation is still based on a regex.
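For example (a sketch; names and the SPIFFE ID are illustrative), a
MeshTLSAuthentication can now mix SPIFFE and DNS-style identities:
```yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: allow-spiffe-clients
  namespace: demo
spec:
  identities:
  - "spiffe://example.org/workloads/client"                      # validated by the admission webhook
  - "client.demo.serviceaccount.identity.linkerd.cluster.local"  # DNS form, still validated via regex
```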
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* kube 0.87.1
* k8s-openapi 0.20.0
* kubert 0.21.1
* k8s-gateway-api 0.15
* ring 0.17
Furthermore, the policy controller's metrics endpoint has been updated
to include tokio runtime metrics.
The policy controller sets incorrect backend metadata when (1) there is
no explicit backend reference specified, and (2) a backend reference
crosses namespaces.
This change fixes these backend references so that proxy logs and
metrics have the proper metadata references. Outbound policy tests are
updated to validate this.
New versions of the k8s-openapi crate drop support for Kubernetes 1.21.
Kubernetes v1.21 has been considered EOL by the upstream project since
2022-07-08. Major cloud providers have EOL'd it as well (GKE's current
MSKV is 1.24).
This change updates the MSKV to v1.22. It also updates the max version
in _test-helpers.sh to v1.28.
Fixes #11659
When the policy controller updates the status of an HttpRoute, it currently overrides any existing statuses on that resource.
We update the policy controller to take into account any statuses on the resource which are not controlled by Linkerd. It patches the final status to be the combination of the non-Linkerd statuses and the Linkerd statuses.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add native sidecar support
Kubernetes will provide beta support for native sidecar containers in version 1.29. This feature improves network proxy sidecar compatibility for jobs and initContainers.
Introduce a new annotation, `config.alpha.linkerd.io/proxy-enable-native-sidecar`, and a configuration option, `Proxy.NativeSidecar`, that causes the proxy container to run as an init container.
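For example (a sketch; the pod, image, and the `"true"` value are illustrative
assumptions), a workload opts in via the new annotation:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: native-sidecar-demo
  annotations:
    linkerd.io/inject: enabled
    config.alpha.linkerd.io/proxy-enable-native-sidecar: "true"   # proxy runs as an init container
spec:
  containers:
  - name: app
    image: nginx   # illustrative
```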
Fixes: #11461
Signed-off-by: TJ Miller <millert@us.ibm.com>
* Update dev to v42
* Update Go to 1.21.3
* Update Rust to 1.73.0
* Update the Cargo workspace to use the v2 package resolver
* Update debian from bullseye to bookworm
* Update golangci-lint to 1.55.1
* Disable deprecated linters (deadcode, varcheck)
* Disable goconst linter -- pointless and noisy
* Disable depguard linter -- it requires that all of our Go dependencies be added to allowlists;
* Update K3d to v5.6.0
* Update CI from k3s 1.26 to 1.28
* Update markdownlint-cli2 to 0.10.0
When the policy controller's status index detects the deletion of a gateway API HTTPRoute, it attempts to delete that resource out of its own index. However, we use the wrong kind in the key when deleting. This results in the resource persisting in the index after it has been deleted from the cluster, which causes an error to be logged every 10 seconds when the policy controller attempts to do reconciliation and ensure that the statuses of all HTTPRoutes in its index are correct:
```
2023-10-09T20:53:17.059098Z ERROR status::Controller: linkerd_policy_controller_k8s_status::index: Failed to patch HTTPRoute namespace=linkerd-policy-test-sev0n7 route=GroupKindName { group: "gateway.networking.k8s.io", kind: "HTTPRoute", name: "test" } error=ApiError: httproutes.gateway.networking.k8s.io "test" not found: NotFound (ErrorResponse { status: "Failure", message: "httproutes.gateway.networking.k8s.io \"test\" not found", reason: "NotFound", code: 404 })
```
To fix this, we use the correct kind when deleting from the index.
Signed-off-by: Alex Leong <alex@buoyant.io>
We intermittently see flaky policy integration test failures like:
```
failures:
either
thread 'either' panicked at 'assertion failed: `(left == right)`
left: `7`,
right: `0`: blessed uninjected curl must succeed', policy-test/tests/e2e_server_authorization.rs:293:9
```
This test failure is saying that the curl process returns an exit code of 7 instead of the expected exit code of 0. This exit code indicates that curl failed to establish a connection (see https://everything.curl.dev/usingcurl/returns).
It's unclear why this connection occasionally fails in CI and I have not been able to reproduce this failure locally.
However, by looking at the logic of the integration test, we can see that it creates the `web` Service and the `web` Pod and waits for that pod to become ready before unblocking curl. This means that, theoretically, there could be a race between the test and the Kubernetes endpoints controller: as soon as the `web` pod becomes ready, the endpoints controller updates the Endpoints resource for the `web` Service and, at the same time, our test unblocks the curl command. If the test wins this race, curl may run before the Endpoints resource has been updated.
We add an additional wait condition to the test to wait until the endpoints resource has an endpoint before unblocking curl.
Since I could not reproduce the test failure locally, it is impossible to say if this is actually the cause of the flakiness or if this change fixes it.
Signed-off-by: Alex Leong <alex@buoyant.io>
This branch updates the policy controller's dependency on Kubert to
v0.18, `kube-rs` to v0.85, `k8s-gateway-api` to v0.13, and `k8s-openapi`
to v0.19.
All of these crates depend on `kube-rs` and `k8s-openapi`, so they must
all be updated together in one commit.
The [xRoute Binding KEP](https://gateway-api.sigs.k8s.io/geps/gep-1426/#namespace-boundaries) states that HttpRoutes may be created in either the namespace of their parent Service (producer routes) or in the namespace of the client initiating requests to the service (consumer routes). Linkerd currently only indexes producer routes and ignores consumer routes.
We add support for consumer routes by changing the way that HttpRoutes are indexed. We now index each route by the namespace of its parent service instead of by the namespace of the HttpRoute resource. We then further subdivide the `ServiceRoutes` struct to have a watch per-client-namespace instead of just a single watch. This is because clients from different namespaces will have a different view of the routes for a service.
When an HttpRoute is updated, if it is a producer route, we apply it to the watches for all client namespaces. If it is a consumer route, we apply it only to the watches for that consumer namespace.
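To illustrate (all names are placeholders), a consumer route lives in the client's
namespace and attaches to a Service in the producer's namespace via its parentRef:
```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: books-consumer-route
  namespace: client-ns            # the consumer's namespace
spec:
  parentRefs:
  - group: core
    kind: Service
    name: books                   # parent Service...
    namespace: books-ns           # ...in the producer's namespace
    port: 7002
  rules:
  - backendRefs:
    - name: books
      namespace: books-ns
      port: 7002
```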
We also add API tests for consumer and producer routes.
A few noteworthy changes:
* Because the namespace of the client factors into the lookup, we had to change the discovery target to a type which includes the client namespace.
* Because a service may have routes from different namespaces, the route metadata now needs to track group, kind, name, AND namespace instead of just using the namespace of the service. This means that many uses of the `GroupKindName` type are replaced with a `GroupKindNamespaceName` type.
Signed-off-by: Alex Leong <alex@buoyant.io>
According to the [xRoutes Mesh Binding KEP](https://gateway-api.sigs.k8s.io/geps/gep-1426/#ports), the port in a parent reference is optional:
> By default, a Service attachment applies to all ports in the service. Users may want to attach routes to only a specific port in a Service. To do so, the parentRef.port field should be used.
> If port is set, the implementation MUST associate the route only with that port. If port is not set, the implementation MUST associate the route with all ports defined in the Service.
However, we currently ignore any HttpRoutes which don't have a port specified in the parent ref.
We update the policy controller to apply HttpRoutes which do not specify a port in the parent ref to all ports of the parent service.
We do this by storing these "portless" HttpRoutes in the index and then copying these routes into every port-specific watch for that service.
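For example (names are illustrative), a route whose parentRef omits `port` now
attaches to every port of the parent Service:
```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: all-ports-route
  namespace: booksapp
spec:
  parentRefs:
  - group: core
    kind: Service
    name: books
    # no `port` field: the route is applied to all ports defined on the Service
  rules:
  - backendRefs:
    - name: books
      port: 7002
```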
Signed-off-by: Alex Leong <alex@buoyant.io>
We add support for the RequestHeaderModifier and RequestRedirect HTTP filters. The policy controller reads these filters in any HttpRoute resource that it indexes (both policy.linkerd.io and gateway.networking.k8s.io) and returns them in the outbound policy API. These filters may be added at the route rule level and at the backend level.
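A sketch of a route rule using both filters (names and values are illustrative):
```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: header-and-redirect
  namespace: demo
spec:
  parentRefs:
  - group: core
    kind: Service
    name: web
    port: 80
  rules:
  - filters:
    - type: RequestHeaderModifier
      requestHeaderModifier:
        set:
        - name: x-demo
          value: "1"
    - type: RequestRedirect
      requestRedirect:
        scheme: https
        statusCode: 301
```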
We add outbound api tests for this behavior for both types of HttpRoute.
Incidentally we also fix a flaky test in the outbound api tests where a watch was being recreated partway through a test, leading to a race condition.
Signed-off-by: Alex Leong <alex@buoyant.io>
Updates the policy controller to watch `httproute.gateway.networking.k8s.io` resources in addition to `httproute.policy.linkerd.io` resources. Routes of either or both types can be returned in policy responses and are identified by the `group` field in their metadata. Furthermore, we update the Status of these resources to correctly reflect when they are accepted.
We add the `httproute.gateway.networking.k8s.io` CRD to the Linkerd installed CRD list and add the appropriate RBAC to the policy controller so that it may watch these resources.
Signed-off-by: Alex Leong <alex@buoyant.io>
Add Go client codegen for HttpRoute v1beta3. This will be necessary for any of the Go controllers (i.e. metrics-api) or Go CLI commands to interact with HttpRoute v1beta3 resources in Kubernetes.
Signed-off-by: Kevin Ingelman <ki@buoyant.io>
PR #10969 adds support for the GEP-1742 `timeouts` field to the
HTTPRoute CRD. This branch implements actual support for these fields in
the policy controller. The timeout fields are now read and used to set
the timeout fields added to the proxy-api in
linkerd/linkerd2-proxy-api#243.
In addition, I've added code to ensure that the timeout fields are
parsed correctly when a JSON manifest is deserialized. The current
implementation represents timeouts in the bindings as a Rust
`std::time::Duration` type. `Duration` does implement
`serde::Deserialize` and `serde::Serialize`, but its serialization
implementation attempts to (de)serialize it as a struct consisting of a
number of seconds and a number of subsecond nanoseconds. The timeout
fields are instead supposed to be represented as strings in the Go
standard library's `time.ParseDuration` format. Therefore, I've added a
newtype which wraps the Rust `std::time::Duration` and implements the
same parsing logic as Go. Eventually, I'd like to upstream the
implementation of this to `kube-rs`; see kube-rs/kube#1222 for details.
Depends on #10969
Depends on linkerd/linkerd2-proxy-api#243
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Add a new version to the HttpRoute CRD: `v1beta3`. This version adds a new `timeouts` struct to the http route rule. This mirrors a corresponding new field in the Gateway API, as described in [GEP-1742](https://github.com/kubernetes-sigs/gateway-api/pull/1997). This field is currently unused, but will eventually be read by the policy controller and used to configure timeouts enforced by the proxy.
The diff between v1beta2 and v1beta3 is:
```
timeouts:
  description: "Timeouts defines the timeouts that can be configured
    for an HTTP request. \n Support: Core \n <gateway:experimental>"
  properties:
    backendRequest:
      description: "BackendRequest specifies a timeout for an
        individual request from the gateway to a backend service.
        Typically used in conjunction with automatic retries,
        if supported by an implementation. Default is the value
        of Request timeout. \n Support: Extended"
      format: duration
      type: string
    request:
      description: "Request specifies a timeout for responding
        to client HTTP requests, disabled by default. \n For example,
        the following rule will timeout if a client request is
        taking longer than 10 seconds to complete: \n ``` rules:
        - timeouts: request: 10s backendRefs: ... ``` \n Support:
        Core"
      format: duration
      type: string
  type: object
```
We update the `storage` version of HttpRoute to be v1beta3 but continue to serve all versions. Since this new field is optional, the Kubernetes API will be able to automatically convert between versions.
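For reference, a v1beta3 rule using the new field might look like this (names are
illustrative; durations use the Go `time.ParseDuration` string format):
```yaml
apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  name: timeout-demo
  namespace: demo
spec:
  parentRefs:
  - group: core
    kind: Service
    name: web
    port: 80
  rules:
  - timeouts:
      request: 10s          # overall timeout for responding to the client
      backendRequest: 3s    # timeout for an individual request to the backend
    backendRefs:
    - name: web
      port: 80
```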
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #10877
Linkerd reads the list of container readiness and liveness probes for a pod in order to generate authorizations which allow probes by default. However, when reading the `path` field of a probe, we interpret it literally rather than parsing it as a URI. This means that any non-path parts of the URI (such as query parameters) are matched against the path of a probe request, causing these authorizations to fail.
Instead, we now parse this field as a URI and use only the path part for path matching. Invalid URIs are skipped and a warning is logged.
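For example (an illustrative pod), a probe path carrying query parameters is now
matched by its path component only:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: app
    image: nginx   # illustrative
    readinessProbe:
      httpGet:
        # Only "/ready" is used for authorization path matching; the
        # query string is no longer matched literally.
        path: /ready?full=true
        port: 8080
```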
Signed-off-by: Alex Leong <alex@buoyant.io>
When the `namespace` field of a `backend_ref` of an `HttpRoute` is set, Linkerd ignores this field and instead assumes that the backend is in the same namespace as the parent `Service`.
To properly handle the case where the backend is in a different namespace from the parent `Service`, we change the way that service metadata is stored in the policy controller outbound index. Instead of keeping a separate service metadata map per namespace, we maintain one global service metadata map which is shared between all namespaces using an RwLock. This allows us to make the two necessary changes:
1. When validating the existence of a backend service, we now look for it in the appropriate namespace instead of the Service's namespace
2. When constructing the backend authority, we use the appropriate namespace instead of the Service's namespace
We also add an API test for this situation.
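For example (names and version are placeholders), a route whose backend lives in a
different namespace than its parent Service:
```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: cross-namespace-backend
  namespace: frontend-ns
spec:
  parentRefs:
  - group: core
    kind: Service
    name: frontend
    port: 8080
  rules:
  - backendRefs:
    - name: backend
      namespace: other-ns    # previously ignored; now used for lookup and authority
      port: 8080
```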
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #10782
Added the `minItems: 1` constraint to `spec.identities` and `spec.identityRefs`. This is a backwards-compatible change, so bumping the CRD version is not required; besides, current CRs not abiding by this constraint would be broken anyway.
```bash
$ cat << EOF | k apply -f -
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: "test"
spec:
  identities: []
EOF
The MeshTLSAuthentication "test" is invalid: spec.identities: Invalid value: 0: spec.identities in body should have at least 1 items
```
Also refactored the MeshTLSAuthentication index reset loop so that it does not stop processing items when one of them fails.
Fixes #10762
The Linkerd control plane chart contains a Lease resource which is used by the Policy controller to do leader election. ArgoCD considers Leases to be runtime resources and will not deploy them. This means that Linkerd will not work for users of ArgoCD.
We remove the policy-controller-write Lease resource from the Helm chart and instead have the policy controller create this resource at startup. We create it with an `Apply` patch with `resourceVersion="0"`. This ensures that the Lease resource will only be created if it does not already exist and that if there are multiple replicas of the policy controller starting up at once, only one of them will create the Lease resource.
We also set the `linkerd-destination` Deployment as the owner reference of the Lease resource. This means that when the `linkerd-destination` Deployment is deleted (for example, when Linkerd is uninstalled) then the Lease will be garbage collected by Kubernetes.
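A sketch of the resulting Lease (field values are illustrative; the controller
applies it with `resourceVersion: "0"` so it is only created when absent):
```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: policy-controller-write
  namespace: linkerd
  ownerReferences:
  - apiVersion: apps/v1
    kind: Deployment
    name: linkerd-destination
    uid: 00000000-0000-0000-0000-000000000000   # the Deployment's UID, filled in at runtime
spec:
  leaseDurationSeconds: 30                      # illustrative
```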
Signed-off-by: Alex Leong <alex@buoyant.io>
The provisional [xRoutes GEP](https://gateway-api.sigs.k8s.io/geps/gep-1426/#allowed-service-types) says that implementations should support ClusterIP Service types as a parentRef (with or without selectors) but should not support headless Services, since they do not provide "frontend" functionality (e.g. an IP or DNS hostname that consumers may use). It is unclear to me whether this extends to backendRefs, but I would assume so (since there are no endpoints to send traffic to).
Additionally, ExternalName Service types should not be allowed at all, either as backendRefs or parentRefs, since they pose a security threat. Last but not least, LoadBalancer and NodePort types are apparently supported since they still provision a cluster IP.
We update the policy-controller to only accept a parent Service if it has a ClusterIP.
Signed-off-by: Alex Leong <alex@buoyant.io>
The policy controller currently serves hardcoded configuration values for
failure accrual parameters. This change adds support for discovering this
configuration from annotations on Service objects.
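A hedged sketch of what this looks like on a Service (the annotation names and
values below are assumptions for illustration; the change itself only states
that configuration is read from Service annotations):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: demo
  annotations:
    # Hypothetical annotation names for consecutive-failure accrual:
    balancer.linkerd.io/failure-accrual: "consecutive"
    balancer.linkerd.io/failure-accrual-consecutive-max-failures: "7"
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```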
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Alex Leong <alex@buoyant.io>
We enable kubert's metrics feature which allows us to create a prometheus metrics endpoint on the policy controller's admin server. By default, only process metrics are surfaced.
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
A route may have two conditions in a parent status: a condition that
states whether it has been accepted by the parent, and a condition that
states whether all backend references -- that traffic matched by the
route is sent to -- have resolved successfully. Currently, the policy
controller does not support the latter.
This change introduces support for checking and setting a backendRef-specific
condition. A successful condition (ResolvedRefs = True) is met
when all backend references point to a supported type and that type
exists in the cluster. Currently, only Service objects are supported. A
nonexistent object or an unsupported kind invalidates the entire
condition; the particular reason is reflected in the condition's
message.
Since statuses are set on a route's parents, the same condition will
apply to _all_ parents in a route (since there is no way to elicit
different backends for different parents).
If a route does not have any backend references, then the parent
reference type is used instead. As such, any parents that are not Services
automatically get an invalid backend condition (an exception to the rule in
the previous paragraph, where a condition is shared by all parents).
When the parent is a supported kind (i.e. a Service), we needn't check its
existence since the parent condition already reflects that.
---
Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
When an HTTPRoute references a backend service which does not exist, the outbound policy API returns a FailureInjector filter and no backend. However, since the proxy still needs to do queuing for this synthetic backend, we still need to send a queue config to the proxy.
We populate the queue config for this backend, even when it does not exist.
As an unrelated but nearby fix, we also populate metadata for service backends.
Signed-off-by: Alex Leong <alex@buoyant.io>
We update the outbound policy API to ignore any HTTPRoute if it does not have an Accepted status of True. Additionally, we remove spurious logging from both the inbound and outbound index when they ignore an HTTPRoute which does not have parents which are relevant to that API. This means that, for example, an HTTPRoute that has a Service as a parent will no longer trigger log warnings from the inbound index and vice versa.
Signed-off-by: Alex Leong <alex@buoyant.io>
We add support for setting the status field on HTTPRoute resources which have a Service as a parent_ref. We also make a number of simplifications to the logic of how statuses are determined. For each parent ref which is a Server or a Service, we add a status. This status will either be "accepted" or "no matching parent", depending on whether that parent resource exists in the cluster. No status condition is generated for parents of other kinds.
Signed-off-by: Alex Leong <alex@buoyant.io>