When the destination controller logs about receiving or sending messages to a data plane proxy, there is no information in the log about which data plane pod it is communicating with. This can make it difficult to diagnose issues which span the data plane and control plane.
We add a `pod` field to the context token that proxies include in requests to the destination controller. We add this pod name to the logging context so that it shows up in log messages. To accomplish this, we had to plumb the logging context through a few places where it previously had not been passed. This gives us a more complete logging context and more information in each log message.
An example log message with this fuller logging context is:
```
time="2023-10-24T00:14:09Z" level=debug msg="Sending destination add: add:{addrs:{addr:{ip:{ipv4:183762990} port:8080} weight:10000 metric_labels:{key:\"control_plane_ns\" value:\"linkerd\"} metric_labels:{key:\"deployment\" value:\"voting\"} metric_labels:{key:\"pod\" value:\"voting-7475cb974c-2crt5\"} metric_labels:{key:\"pod_template_hash\" value:\"7475cb974c\"} metric_labels:{key:\"serviceaccount\" value:\"voting\"} tls_identity:{dns_like_identity:{name:\"voting.emojivoto.serviceaccount.identity.linkerd.cluster.local\"}} protocol_hint:{h2:{}}} metric_labels:{key:\"namespace\" value:\"emojivoto\"} metric_labels:{key:\"service\" value:\"voting-svc\"}}" addr=":8086" component=endpoint-translator context-ns=emojivoto context-pod=web-767f4484fd-wmpvf remote="10.244.0.65:52786" service="voting-svc.emojivoto.svc.cluster.local:8080"
```
Note the `context-pod` field.
Additionally, we have tested this when no pod field is included in the context token (e.g. when handling requests from a pod which does not yet add this field) and confirmed that the `context-pod` log field is empty, but no errors occur.
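As a rough illustration of where the pod name ends up, here is a minimal sketch (not the controller's actual code) of parsing a context token and attaching it to a logging context; the token shape and field names here are assumptions.
```go
package main

import (
	"encoding/json"

	log "github.com/sirupsen/logrus"
)

// contextToken mirrors the idea of a JSON context token sent by the proxy;
// the exact fields are assumptions for illustration.
type contextToken struct {
	Ns  string `json:"ns"`
	Pod string `json:"pod"`
}

// loggerForToken builds a log entry whose context-ns and context-pod fields
// come from the token. An older proxy that omits the pod field simply yields
// an empty context-pod, with no error.
func loggerForToken(raw string) *log.Entry {
	var tok contextToken
	_ = json.Unmarshal([]byte(raw), &tok)
	return log.WithFields(log.Fields{
		"context-ns":  tok.Ns,
		"context-pod": tok.Pod,
	})
}

func main() {
	logger := loggerForToken(`{"ns":"emojivoto","pod":"web-767f4484fd-wmpvf"}`)
	logger.Info("Sending destination add")
}
```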
Signed-off-by: Alex Leong <alex@buoyant.io>
The destination controller indexes Pods by their primary PodIP and their
primary HostIP (when applicable); and it indexes Services by their
primary ClusterIP.
In preparation for dual-stack (IPv6) support, this change updates the
destination controller indexers to consume all IPs for these resources.
Tests are updated to demonstrate a dual-stack setup (where these
resources have a primary IPv4 address and a secondary IPv6 address).
While exercising these tests, I had to replace the brittle host:port
parsing logic we use in the server with Go's standard
`net.SplitHostPort` utility.
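For reference, a small standalone example of the parsing this switches to; unlike naive string splitting, `net.SplitHostPort` also handles bracketed IPv6 literals, which matters for the dual-stack work.
```go
package main

import (
	"fmt"
	"net"
)

func main() {
	for _, addr := range []string{"10.244.0.1:8080", "[2001:db8::1]:8080"} {
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("host=%s port=%s\n", host, port)
	}
}
```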
When a gRPC client of the destination.Get API initiates a request but then doesn't read from that stream, the HTTP2 stream flow control window fills up and eventually exerts backpressure on the destination controller. This manifests as calls to `Send` on the stream blocking. Since `Send` is called synchronously from the client-go informer callback (by way of the endpoint translator), this blocks the informer callback and prevents all further informer callbacks from firing. This causes the destination controller to stop sending updates to any of its clients.
We add a queue in the endpoint translator so that when it gets an update from the informer callback, that update is queued and we avoid potentially blocking the informer callback. Each endpoint translator spawns a goroutine to process this queue and call `Send`. If there is not capacity in this queue (e.g. because a client has stopped reading and we are experiencing backpressure) then we terminate the stream.
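A minimal sketch of that queueing pattern (illustrative names, not the actual endpoint translator types): the informer callback enqueues onto a bounded channel and a dedicated goroutine drains it, so a slow client causes the stream to be terminated rather than the callback to block.
```go
package destination

import "sync"

// update stands in for an endpoint update produced by an informer callback.
type update struct{ msg string }

type translator struct {
	updates  chan update
	stop     chan struct{} // closed to terminate the stream
	stopOnce sync.Once
	send     func(update) // ultimately calls Send on the gRPC stream; may block
}

func newTranslator(capacity int, send func(update)) *translator {
	t := &translator{
		updates: make(chan update, capacity),
		stop:    make(chan struct{}),
		send:    send,
	}
	go t.run()
	return t
}

// Enqueue is called from the informer callback and must never block: if the
// queue is full (the client has stopped reading), the stream is terminated
// instead of stalling the informer.
func (t *translator) Enqueue(u update) {
	select {
	case t.updates <- u:
	default:
		t.stopOnce.Do(func() { close(t.stop) })
	}
}

func (t *translator) run() {
	for {
		select {
		case u := <-t.updates:
			t.send(u)
		case <-t.stop:
			return
		}
	}
}
```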
Signed-off-by: Alex Leong <alex@buoyant.io>
Followup to https://github.com/linkerd/linkerd2/pull/11334#issuecomment-1736093592
This extends the test introduced in #11334 to exercise upgrading a
Server associated with a pod's HostPort, and observing how the stream
updates the OpaqueProtocol field.
Helper functions were refactored a bit to allow retrieving the
l5dCRDClientSet used when building the fake API.
Followup to #11328
Implements a new pod watcher, instantiated alongside the other ones in the Destination server. It also watches Servers and carries all the logic from ServerWatcher, which has now been decommissioned.
The `CreateAddress()` function has been moved into a method of the PodWatcher, because we now call it on every update, given that the pod associated with an ip:port might change and we need to regenerate the Address object. That function also takes care of capturing opaque protocol info from associated Servers, which is not new and had some logic that was duplicated in the now defunct ServerWatcher. `getAnnotatedOpaquePorts()` was also moved for similar reasons.
Other things to note about PodWatcher:
- It publishes a new pair of metrics `ip_port_subscribers` and `ip_port_updates` leveraging the framework in `prometheus.go`.
- The complexity in `updatePod()` comes from only sending stream updates when the pod's readiness changes, to avoid sending duplicate messages on every pod lifecycle event (see the sketch below).
Finally, since endpointProfileTranslator's `endpoint` (*pb.WeightedAddr) is no longer a static object, the `Update()` function now receives an Address that allows it to rebuild the endpoint on the fly (and so `createEndpoint()` was converted into a method of endpointProfileTranslator).
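A rough sketch of the readiness check mentioned above, assuming a helper that inspects the `PodReady` condition; this is illustrative, not the actual PodWatcher code.
```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// isReady reports whether the pod's PodReady condition is true.
func isReady(pod *corev1.Pod) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}

// shouldPublish reports whether an update for oldPod -> newPod should be
// sent to subscribers: only when readiness flips.
func shouldPublish(oldPod, newPod *corev1.Pod) bool {
	return isReady(oldPod) != isReady(newPod)
}

func main() {
	ready := &corev1.Pod{Status: corev1.PodStatus{Conditions: []corev1.PodCondition{
		{Type: corev1.PodReady, Status: corev1.ConditionTrue},
	}}}
	notReady := &corev1.Pod{}
	fmt.Println(shouldPublish(notReady, ready)) // true: readiness changed
	fmt.Println(shouldPublish(ready, ready))    // false: no change, no update sent
}
```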
* Bump CNI plugin to v1.2.1
* Bump proxy-init to v2.2.2
Both dependencies include a fix for CVE-2023-2603. Since alpine is used
as the runtime image, there is a security vulnerability detected in the
produced images (due to an issue with libcap). The alpine images have
been bumped to address the CVE.
Signed-off-by: Matei David <matei@buoyant.io>
* stopgap fix for hostport staleness
## Problem
When there's a pod with a `hostPort` entry, `GetProfile` requests
targeting the host's IP and that `hostPort` return an endpoint profile
with that pod's IP and `containerPort`. If that pod vanishes and another
one in that same host with that same `hostPort` comes up, the existing
`GetProfile` streams won't get updated with the new pod information
(metadata, identity, protocol).
That breaks the connectivity of the client proxy relying on that stream.
## Partial Solution
It should be less surprising for those `GetProfile` requests to return
an endpoint profile with the same host IP and port requested, and leave
to the cluster's CNI to perform the translation to the corresponding pod
IP and `containerPort`.
This PR performs that change, while continuing to return the
corresponding pod's information alongside.
If the pod associated with that host IP and port changes, the client proxy
won't lose connectivity, but the pod's information won't get updated
(that'll be fixed in a separate PR).
A new unit test validating this has been added, which will be expanded
to validate the changed pod information when that gets implemented.
## Details of Change
- We no longer do the HostPort->ContainerPort conversion, so the
`getPortForPod` function was dropped.
- The `getPodByIp` function is now split into two: `getPodByPodIP`
and `getPodByHostIP`, the latter being called only if the former
doesn't return anything (see the sketch after this list).
- The `createAddress` function is now simplified in that it just uses
the passed IP to build the address. The passed IP will depend on which
of the two functions just mentioned returned the pod (host IP or pod
IP).
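A hedged sketch of the two-step lookup described in the list above; the indexer interface and method names are assumptions, not the real destination server code.
```go
package destination

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// podIndex stands in for the informer indexers used by the destination
// server; the method names are assumptions for illustration.
type podIndex interface {
	PodsByPodIP(ip string) ([]*corev1.Pod, error)
	PodsByHostIP(ip string) ([]*corev1.Pod, error)
}

// getPod resolves an IP to a pod, preferring a pod-IP match and falling back
// to a host-IP match only when the former finds nothing.
func getPod(idx podIndex, ip string) (*corev1.Pod, error) {
	pods, err := idx.PodsByPodIP(ip)
	if err != nil {
		return nil, err
	}
	if len(pods) == 1 {
		return pods[0], nil
	}
	if len(pods) > 1 {
		return nil, fmt.Errorf("multiple pods found for pod IP %s", ip)
	}
	pods, err = idx.PodsByHostIP(ip)
	if err != nil {
		return nil, err
	}
	if len(pods) > 0 {
		return pods[0], nil
	}
	return nil, nil // no pod found; treat as an unmeshed or cluster-external address
}
```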
When a service has its internal traffic policy set to "local", we will perform filtering to only return local endpoints, as per the ForZone hints in the endpoints. However, ForZone calculations do not take resources from remote clusters into account, so this type of filtering is not appropriate for remote discovery services.
We explicitly ignore any internal traffic policy when doing remote discovery.
Signed-off-by: Alex Leong <alex@buoyant.io>
Similar to #11246, the destination controller was complaining about
trying to register dupe metrics for the api cache gauges, when a given
target cluster got re-linked. This change unregisters the gauges for the
target cluster when said cluster is removed.
Supersedes #11252
We add a `cluster_store_size` gauge to the destination controller to track the number of entries in the remote discovery cluster store. If this is ever different from the number of cluster credentials secrets in the linkerd namespace, this indicates that there is a problem with a link that needs to be investigated further.
Signed-off-by: Alex Leong <alex@buoyant.io>
Whenever the service mirror's main loop was triggered again, the following warnings were generated:
```
time="2023-08-14T20:16:29Z" level=warning msg="failed to register Prometheus gauge Desc{fqName: \"service_cache_size\", help: \"Number of items in the client-go service cache\", constLabels: {cluster=\"remote\"}, variableLabels: []}: duplicate metrics collector registration attempted"
time="2023-08-14T20:16:29Z" level=warning msg="failed to register Prometheus gauge Desc{fqName: \"endpoints_cache_size\", help: \"Number of items in the client-go endpoints cache\", constLabels: {cluster=\"remote\"}, variableLabels: []}: duplicate metrics collector registration attempted"
```
To fix this, the cluster watcher's `Stop()` method now unregisters the Prometheus cache metrics associated with the cluster's client API.
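A minimal sketch (not the actual service-mirror code) of registering the per-cluster gauges and unregistering them in `Stop()`, so that re-linking a cluster does not trip the duplicate-registration warning shown above.
```go
package watcher

import "github.com/prometheus/client_golang/prometheus"

type clusterWatcher struct {
	serviceCacheSize   prometheus.Gauge
	endpointsCacheSize prometheus.Gauge
}

func newClusterWatcher(cluster string) *clusterWatcher {
	labels := prometheus.Labels{"cluster": cluster}
	cw := &clusterWatcher{
		serviceCacheSize: prometheus.NewGauge(prometheus.GaugeOpts{
			Name:        "service_cache_size",
			Help:        "Number of items in the client-go service cache",
			ConstLabels: labels,
		}),
		endpointsCacheSize: prometheus.NewGauge(prometheus.GaugeOpts{
			Name:        "endpoints_cache_size",
			Help:        "Number of items in the client-go endpoints cache",
			ConstLabels: labels,
		}),
	}
	prometheus.MustRegister(cw.serviceCacheSize, cw.endpointsCacheSize)
	return cw
}

// Stop unregisters the cluster-scoped collectors so a later re-link can
// register them again without a duplicate-registration error.
func (cw *clusterWatcher) Stop() {
	prometheus.Unregister(cw.serviceCacheSize)
	prometheus.Unregister(cw.endpointsCacheSize)
}
```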
Adds support for remote discovery to the destination controller.
When the destination controller gets a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` label, this is an indication that the destination controller should discover the endpoints for this service from a remote cluster. The destination controller will look for a remote cluster which has been linked to it (using the `linkerd multicluster link` command) with that name. It will look at the `multicluster.linkerd.io/remote-discovery` label for the service name to look up in that cluster. It then streams back the endpoint data for that remote service.
Since we now have multiple client-go informers for the same resource types (one for the local cluster and one for each linked remote cluster) we add a `cluster` label onto the prometheus metrics for the informers and EndpointWatchers to ensure that each of these components' metrics are correctly tracked and don't overwrite each other.
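A rough sketch of the dispatch described above, under assumed names: when the requested Service carries the remote-discovery label, the `Get` handler resolves a watcher for that linked cluster from the cluster store instead of using the local-cluster watcher.
```go
package destination

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

const remoteDiscoveryLabel = "multicluster.linkerd.io/remote-discovery"

// endpointsWatcher stands in for the per-cluster watcher type.
type endpointsWatcher interface{}

// clusterStore stands in for the store keyed by linked-cluster name.
type clusterStore interface {
	Get(cluster string) (endpointsWatcher, bool)
}

// watcherFor picks the watcher to serve a Get request for svc: the local one
// by default, or the remote cluster's watcher when the label is present.
func watcherFor(svc *corev1.Service, local endpointsWatcher, store clusterStore) (endpointsWatcher, error) {
	cluster, ok := svc.Labels[remoteDiscoveryLabel]
	if !ok {
		return local, nil
	}
	remote, ok := store.Get(cluster)
	if !ok {
		return nil, fmt.Errorf("cluster %q is not linked", cluster)
	}
	return remote, nil
}
```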
---------
Signed-off-by: Alex Leong <alex@buoyant.io>
There were a few improvements we could have made to a recent change that
added a ClusterStore concept to the destination service. This PR
introduces those small improvements:
* Sync dynamically created clients in separate goroutines
* Refactor metadata API creation
* Remove redundant check in cluster_store_test
* Create a new runtime schema every time a fake metadata API client is
created, to avoid racy behaviour.
Signed-off-by: Matei David <matei@buoyant.io>
Fixes #10764
Removed the `server_port_subscribers` gauge, as it wasn't distinguishing
amongst different pods, and the numbers of subscribers for different pods
were conflicting with one another when updating the metric (see more details
[here](https://github.com/linkerd/linkerd2/issues/10764#issuecomment-1650835823)).
Besides carrying an invalid value, this was generating the warning
`unable to delete server_port_subscribers metric with labels`
The metric was replaced with the `server_port_subscribes` and
`server_port_unsubscribes` counters, which track the overall number of
subscribes and unsubscribes to the particular pod port.
🌮 to @adleong for the diagnosis and the fix!
Currently, the control plane does not support indexing and discovering resources across cluster boundaries. In a multicluster set-up, it is desirable to have access to endpoint data (and by extension, any configuration associated with that endpoint) to support pod-to-pod communication. Linkerd's destination service should be extended to support reading discovery information from multiple sources (i.e. API Servers).
This change introduces an `EndpointsWatcher` cache. On creation, the cache registers a pair of event handlers:
* One handler to read `multicluster` secrets that embed kubeconfig files. For each such secret, the cache creates a corresponding `EndpointsWatcher` to read remote discovery information.
* Another handler to evict entries from the cache when a `multicluster` secret is deleted (see the sketch below).
Additionally, a set of tests have been added that assert the behaviour of the cache when secrets are created and/or deleted. The cache will be used by the destination service to do look-ups for services that belong to another cluster, and that are running in a "remote discovery" mode.
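A condensed sketch of the handler pair described above (not the actual cache implementation); the keying and helper details are assumptions.
```go
package clusterstore

import (
	"sync"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

type endpointsWatcher struct{}

type ClusterStore struct {
	mu       sync.RWMutex
	watchers map[string]*endpointsWatcher
}

// registerHandlers wires the add/delete handlers onto the secrets informer.
func (cs *ClusterStore) registerHandlers(secrets cache.SharedIndexInformer) error {
	_, err := secrets.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			secret, ok := obj.(*corev1.Secret)
			if !ok {
				return
			}
			// Build a remote EndpointsWatcher from the kubeconfig embedded in
			// the secret (elided here) and index it by the linked cluster's
			// name (taken from the secret name for brevity).
			cs.mu.Lock()
			cs.watchers[secret.Name] = &endpointsWatcher{}
			cs.mu.Unlock()
		},
		DeleteFunc: func(obj interface{}) {
			secret, ok := obj.(*corev1.Secret)
			if !ok {
				return
			}
			// Evict the cached watcher when the credentials secret goes away.
			cs.mu.Lock()
			delete(cs.watchers, secret.Name)
			cs.mu.Unlock()
		},
	})
	return err
}
```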
---------
Signed-off-by: Matei David <matei@buoyant.io>
Fixes #11163
The `servicePublisher.updateServer` function will iterate through all registered listeners and update them. However, a nil listener may temporarily be in the list of listeners if an unsubscribe is in progress. This results in a nil pointer dereference.
All functions which result in updating the listeners must therefore be protected by the mutex so that we don't try to act on the list of listeners while it is being modified.
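A minimal sketch of that locking rule, with illustrative names: subscribe, unsubscribe, and `updateServer` all take the same mutex, so an in-progress unsubscribe can never expose a nil or half-removed entry to the update path.
```go
package watcher

import "sync"

type listener interface {
	Update(msg string)
}

type servicePublisher struct {
	mu        sync.Mutex
	listeners []listener
}

func (sp *servicePublisher) subscribe(l listener) {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	sp.listeners = append(sp.listeners, l)
}

func (sp *servicePublisher) unsubscribe(l listener) {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	for i, existing := range sp.listeners {
		if existing == l {
			sp.listeners = append(sp.listeners[:i], sp.listeners[i+1:]...)
			return
		}
	}
}

// updateServer iterates the listeners under the same lock, so the slice can
// never be mutated out from under it.
func (sp *servicePublisher) updateServer(msg string) {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	for _, l := range sp.listeners {
		l.Update(msg)
	}
}
```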
Signed-off-by: Alex Leong <alex@buoyant.io>
In #11008 I added Go support for the `timeouts` field in the
`HTTPRouteRule` struct. That used Go's built-in `time.Duration` type,
but based on my reading of kubernetes-sigs/gateway-api#2013, we should
instead be using apimachinery's `metav1.Duration` type.
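For illustration, a sketch of the type swap (not the actual HTTPRouteRule definition): `metav1.Duration` serializes to the duration strings Kubernetes APIs expect (e.g. "10s"), whereas plain `time.Duration` marshals to integer nanoseconds.
```go
package v1beta3

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// HTTPRouteTimeouts is an illustrative stand-in for the timeouts struct; the
// field names here are assumptions.
type HTTPRouteTimeouts struct {
	// Request bounds the total time for a request; serialized as a duration
	// string such as "10s".
	Request *metav1.Duration `json:"request,omitempty"`
	// BackendRequest bounds a single attempt to a backend.
	BackendRequest *metav1.Duration `json:"backendRequest,omitempty"`
}
```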
Signed-off-by: Kevin Ingelman <ki@buoyant.io>
When using `linkerd-await` as a preStart hook, we need to explicitly pass in the proxy's admin port if it is not the default (4191). While the admin server listener can be bound to any arbitrary port using `config.linkerd.io/admin-port` as a configuration annotation, `linkerd-await`'s template is not aware of the override resulting in start-up errors.
This change adds the override to `linkerd-await` by always using an explicit `--port` argument.
---------
Signed-off-by: jclegras <11457480+jclegras@users.noreply.github.com>
Add go client codegen for HttpRoute v1beta3. This will be necessary for any of the go controllers (i.e. metrics-api) or go CLI commands to interact with HttpRoute v1beta3 resources in kubernetes.
Signed-off-by: Kevin Ingelman <ki@buoyant.io>
Add support for enabling and disabling topology aware routing when hints are added/removed.
The testing setup is quite involved because there are so many moving parts:
1) Set up a service which is layered over several availability zones.
1a) The best way to do this is one Service object, with 3 ReplicaSets explicitly forced to use a specific AZ each.
2) Add the `service.kubernetes.io/topology-aware-hints: Auto` annotation to the Service object.
3) Use a load tester like k6 to send meaningful traffic to your service, but only in one AZ.
4) Scale up your ReplicaSets until k8s adds Hints to your EndpointSlices.
5) Observe that traffic shifts to only hit pods in one AZ.
6) Turn down the ReplicaSet count until K8s removes the hints from your EndpointSlices.
7) Observe that traffic shifts back to all pods across all AZs.
The proxy caches results in-memory, both for inbound and outbound
service (and policy) discovery. While the proxy's default values are
great in most cases, certain client configurations may require
overrides. The proxy supports overriding the default values; however, it
currently does not offer an easy way for users to configure them.
This PR introduces two new values in Linkerd's control plane chart. The
values control the inbound and outbound cache discovery idle timeout --
the amount of time a result will be kept in the cache if unused. Setting
this value will change the configuration for all injected proxies, but
not for the control plane.
---------
Signed-off-by: Matei David <matei@buoyant.io>
Currently, the proxy injector will expand lists of opaque port ranges
into lists of individual port numbers. This is because the proxy has
historically not accepted port ranges in the
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` environment
variable. However, when very large ranges are used, the size of the
injected manifest can be quite large, since each individual port number
in a range must be listed separately.
Proxy PR linkerd/linkerd2-proxy#2395 changed the proxy to accept ranges
as well as individual port numbers in the opaque ports environment
variable, and this change was included in the latest proxy release
(v2.200.0). This means that the proxy-injector no longer needs to expand
large port ranges into individual port numbers, and can now simply
forward the list of ranges to the proxy. This branch changes the proxy
injector to do this, resolving issues with manifest size due to large
port ranges.
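A small sketch contrasting the two behaviors: the old expansion of a range into individual ports versus passing the range through unchanged. The rendering below is illustrative of the manifest-size problem, not the injector's actual templating code.
```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandRange is the old behavior: every port in the range is listed.
func expandRange(start, end int) string {
	ports := make([]string, 0, end-start+1)
	for p := start; p <= end; p++ {
		ports = append(ports, strconv.Itoa(p))
	}
	return strings.Join(ports, ",")
}

func main() {
	// Old behavior: "5000,5001,...,5999", thousands of characters for a
	// single range of 1000 ports.
	fmt.Println(len(expandRange(5000, 5999)), "characters")
	// New behavior: the range is forwarded as-is, now that the proxy accepts
	// ranges in the environment variable.
	fmt.Println(`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION="5000-5999"`)
}
```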
Closes #9803
Fixes #10764
`GetProfile` streams create a `server_port_subscribers` gauge that tracks the number of listeners interested in a given Server. Because of an oversight, the gauge was only being registered once a second listener was added; for just one listener the gauge was absent. But whenever the `GetProfile` stream ended, the gauge was deleted, which resulted in this error if it wasn't registered to begin with:
```
level=warning msg="unable to delete server_port_subscribers metric with labels map[name:voting namespace:emojivoto port:4191]" addr=":8086" component=server
```
One can check that the gauge wasn't being created by installing viz and emojivoto, and checking the following returns empty:
```bash
$ linkerd diagnostics controller-metrics | grep server_port_subscribers
```
After this fix, one can see the metric getting populated:
```bash
$ linkerd diagnostics controller-metrics | grep server_port_subscribers
# HELP server_port_subscribers Number of subscribers to Server changes associated with a pod's port.
# TYPE server_port_subscribers gauge
server_port_subscribers{name="emoji",namespace="emojivoto",port="4191"} 1
server_port_subscribers{name="linkerd",namespace="linkerd",port="4191"} 1
server_port_subscribers{name="linkerd",namespace="linkerd",port="9990"} 1
server_port_subscribers{name="linkerd",namespace="linkerd",port="9995"} 1
server_port_subscribers{name="linkerd",namespace="linkerd",port="9996"} 1
server_port_subscribers{name="linkerd",namespace="linkerd",port="9997"} 1
server_port_subscribers{name="metrics",namespace="linkerd-viz",port="4191"} 1
server_port_subscribers{name="metrics",namespace="linkerd-viz",port="9995"} 1
server_port_subscribers{name="tap",namespace="linkerd-viz",port="4191"} 1
server_port_subscribers{name="tap",namespace="linkerd-viz",port="9995"} 1
server_port_subscribers{name="tap",namespace="linkerd-viz",port="9998"} 1
server_port_subscribers{name="vote",namespace="emojivoto",port="4191"} 1
server_port_subscribers{name="voting",namespace="emojivoto",port="4191"} 1
server_port_subscribers{name="web",namespace="emojivoto",port="4191"} 1
server_port_subscribers{name="web",namespace="linkerd-viz",port="4191"} 1
server_port_subscribers{name="web",namespace="linkerd-viz",port="9994"} 1
```
And when scaling down the voting deployment, one can see how the metric with `name="voting"` is removed.
* add `trust_dns=error` to default proxy log level
Since upstream has yet to release a version with PR
bluejekyll/trust-dns#1881, this commit changes the proxy's default log
level to silence warnings from `trust_dns_proto` that are generally
spurious.
Closes #10123.
The outbound proxy handles endpoints with the `opaque_transport` flag by
opening a direct connection to the inbound proxy's inbound listener
port, and sending a ProtoBuf `TransportHeader` including the target port
of the originating outbound connection and an (optional)
`SessionProtocol` describing the protocol used on that connection.
Currently, outbound proxies initiating direct connections will *always*
send `SessionProtocol` values communicating the protocol as understood
by the outbound proxy. However, this is not always the desired behavior.
Direct connections with `TransportHeader`s are used in two cases: for
gateway connections, and for ports which are marked as opaque. When the
inbound port is marked as opaque, the presence of a `SessionProtocol`
tells the inbound proxy to handle that connection as the indicated
protocol, which results in incorrect behavior when the inbound proxy's
ServerPolicy configures the target port as opaque (see #9888).
Therefore, the `Destination` proxy API has been updated to add a new
`ProtocolHint`, `Opaque`, which indicates that an outbound proxy should
_not_ send a `SessionProtocol` when initiating a direct connection, even
if the outbound proxy handled the connection as HTTP. This hint was
added to the proxy API in linkerd/linkerd2-proxy-api#197, and released
in `linkerd2-proxy-api` v0.8.0.
This branch updates the Destination controller's dependency on
`linkerd2-proxy-api` to v0.8.0, and changes the controller to send an
`Opaque` protocol hint when the target port is marked as opaque on the
destination pod. This should override the `H2` protocol hint that is
added when the destination is meshed. I've also added a new test for
this behavior.
Fixes#9888 (along with linkerd/linkerd2-proxy#2209, which changes the
proxy to actually handle the `Opaque` protocol hint).
proxy-init v2.2.1:
* Sanitize `subnets-to-ignore` flag
* Dep bumps
cni-plugin v1.1.0:
* Add support for the `config.linkerd.io/skip-subnets` annotation
* Dep bumps
validator v0.1.2:
* Dep bumps
Also, `linkerd-network-validator` is now released wrapped in a tar file, so this PR also amends `Dockerfile-proxy` to account for that.
At the time of making this commit, the package `github.com/ghodss/yaml`
is no longer actively maintained.
`sigs.k8s.io/yaml` is a permanent fork of `ghodss/yaml` and is actively
maintained by Kubernetes SIG.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
The GetProfile API endpoint does not behave as expected: when a profile
watch is established, the API server starts two separate profile
watches: a primary watch in the client's namespace and a fallback watch
ignoring the client's namespace. These watches race to send data back to
the client. If the backup watch updates first, it may be sent to clients
before being corrected by a subsequent update. If the primary watch
updates with an empty value, the default profile may be served before
being corrected by an update to the backup watch.
From the proxy's perspective, we'd much prefer that the API provide a
single authoritative response when possible. It avoids needless
corrective work being distributed across the system on every watch
initiation.
To fix this, we modify the fallbackProfileListener to behave
predictably: it only emits updates once both its primary and fallback
listeners have been updated. This avoids emitting updates based on a
partial understanding of the cluster state.
Furthermore, the opaquePortsAdaptor is updated to avoid synthesizing a
default serviceprofile (surprising behavior) and, instead, this
defaulting logic is moved into a dedicated defaultProfileListener
helper. A dedupProfileListener is added to squelch obviously redundant
updates.
Finally, this newfound predictability allows us to simplify the API's
tests. Many of the API tests are not clear in what they test and
sometimes make assertions about the "incorrect" profile updates.
Before changing any GetProfile behavior, this change splits the API
handler into some smaller scopes. This helps to clarify control flow and
reduce nested contexts. This change also adds relevant fields to log
contexts to improve diagnostics.
When processing a `delete` event for an EndpointSlice, regardless of the outcome (whether there are still addresses/endpoints alive or whether we have no endpoints left), we make a call to `noEndpoints` on the subscriber. This sends a `Destination.Get` update to the listeners to advertise a `NoEndpoints` event.
If there are still addresses left, `NoEndpoints { exists: true }` will be sent (to signal the service still exists). Until an update is registered (i.e. a new add), there will be no available endpoints; this is incorrect, since other EndpointSlices may exist. This change fixes the problem by handling `noEndpoints` in a more specialized way for EndpointSlices.
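A hedged sketch of that more specialized handling, with illustrative names: addresses are tracked across all of a service's EndpointSlices, and `NoEndpoints` is only advertised once none remain.
```go
package watcher

type Address struct{}

type listener interface {
	NoEndpoints(exists bool)
}

type portPublisher struct {
	// addresses currently known for the service, aggregated across all of
	// its EndpointSlices, keyed by a per-endpoint ID.
	addresses map[string]Address
	listeners []listener
}

// deleteEndpointSlice removes the slice's addresses and only advertises
// NoEndpoints when no addresses from any remaining slice are left.
func (pp *portPublisher) deleteEndpointSlice(sliceAddressIDs []string) {
	for _, id := range sliceAddressIDs {
		delete(pp.addresses, id)
	}
	if len(pp.addresses) == 0 {
		for _, l := range pp.listeners {
			l.NoEndpoints(true) // service still exists, but has no endpoints
		}
	}
}
```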
Signed-off-by: Yannick Utard <yannickutard@gmail.com>
Fixes #9986
After reviewing the k8s API calls in Destination, it was concluded we
could only swap out the calls to the Node and RS resources to use the
metadata API, as all the other resources (Endpoints, EndpointSlices,
Services, Pod, ServiceProfiles, Server) required fields other than those
found in their metadata section.
This also required completing the `NewFakeAPI` implementation by adding
the missing annotations and labels entries.
## Testing Memory Consumption
The gains here aren't as big as in #9650. In order to test this we need
to push hard and create 4000 RS:
``` bash
for i in {0..4000}; do kubectl create deployment test-pod-$i --image=nginx; done
```
In edge-23.2.1 the destination pod's memory consumption goes from 40Mi
to 160Mi after all the RS were created. With this change, it went from
37Mi to 140Mi.
Fixes #10205
The Server watcher uses `*corev1.Pod` in the key of its subscriber map, which is potentially brittle since it's not obvious what exactly should count as equality for full pod resources.
More robust is to use a unique per-pod identifier in the key: the pod's name and namespace. We therefore move the pod resource into the value type of the map, since we still need it when we receive a Server update so that we can iterate through all subscriptions and find pods which match the Server's pod selector.
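A minimal sketch of the keying change, with illustrative type names: the key is the pod's namespace/name plus port, and the pod object moves into the value so it is still available for matching Server pod selectors.
```go
package watcher

import corev1 "k8s.io/api/core/v1"

type listener interface{}

// podPort is the map key: a stable per-pod identifier plus the port.
type podPort struct {
	namespace string
	name      string
	port      uint32
}

// podSubscription is the map value: the pod object is kept here so Server
// updates can still be matched against the Server's pod selector.
type podSubscription struct {
	pod       *corev1.Pod
	listeners []listener
}

type serverWatcher struct {
	subscriptions map[podPort]*podSubscription
}

func (sw *serverWatcher) subscribe(pod *corev1.Pod, port uint32, l listener) {
	key := podPort{namespace: pod.Namespace, name: pod.Name, port: port}
	sub, ok := sw.subscriptions[key]
	if !ok {
		sub = &podSubscription{pod: pod}
		sw.subscriptions[key] = sub
	}
	sub.listeners = append(sub.listeners, l)
}
```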
Signed-off-by: Alex Leong <alex@buoyant.io>
(This came out during the k8s api calls review for #9650)
In 5dc662ae9 inheritance of opaque ports annotation from namespaces to
pods was removed, but we didn't remove the associated RBAC.
* Add metrics to server and service watchers
Closes #10202 and completes #2204
As a followup to #10201, I'm adding the following metric in `server_watcher.go`:
- `server_port_subscribers`: This tracks the number of subscribers to changes to Servers associated with a port in a pod. The metric's labels identify the namespace and name of the pod, and its targeted port.
Additionally, `opaque_ports.go` was missing metrics as well. I added `service_subscribers` which tracks the number of subscribers to a given Service, labeled by the Service's namespace and name.
`opaque_ports.go` was also leaking the subscriber's map key, so that got fixed as well.
Closes #10130
https://kubernetes.io/docs/concepts/services-networking/service-traffic-policy/
1. Update the endpoints watcher to include an additional field `localTrafficPolicy`,
   set to true when `.spec.internalTrafficPolicy` is set to `Local`.
2. Update the endpoints translator to filter addresses by node when `localTrafficPolicy`
   is set to true (see the sketch below). Topology Aware Hints are not used when
   `service.spec.internalTrafficPolicy` is set to `Local`.
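A hedged sketch of the filtering in point 2, with illustrative names for the endpoints translator's types: when the local traffic policy is in effect, only addresses on the client's node are kept and topology hints are ignored.
```go
package destination

type address struct {
	IP       string
	NodeName string
}

type addressSet struct {
	Addresses          map[string]address
	LocalTrafficPolicy bool
}

// filterAddresses keeps only same-node addresses when the local traffic
// policy is in effect; otherwise it returns the set unchanged (topology-hint
// filtering, not shown, applies only in that branch).
func filterAddresses(set addressSet, clientNode string) map[string]address {
	if !set.LocalTrafficPolicy {
		return set.Addresses
	}
	filtered := make(map[string]address, len(set.Addresses))
	for id, addr := range set.Addresses {
		if addr.NodeName == clientNode {
			filtered[id] = addr
		}
	}
	return filtered
}
```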
Signed-off-by: Yu Cao <yc185050@ncr.com>
Fixes #10150
When we added PodSecurityAdmission in #9719 (and included in
edge-23.1.1), we added the entry `seccompProfile.type=RuntimeDefault` to
the containers SecurityContext.
For PSP to accept that, we need to add the annotation
`seccomp.security.alpha.kubernetes.io/allowedProfileNames:
"runtime/default"` to the PSP resource, which also implies we need to
add the entry `seccompProfile.type=RuntimeDefault` to the pod's
SecurityContext as well, not just the container's.
It also turns out the `namespace-metadata` Jobs used by extensions for
the helm installation method didn't have their ServiceAccount properly
bound to the PSP resource. This resulted in the `helm install` command
failing, and although the extension resources did get deployed, they
were not discoverable by `linkerd check`. This change fixes that as
well; it has been broken since 2.12.0!
Fixes #8270
When a listener unsubscribes from port updates in Servers, we were
removing the listener from the `ServerWatcher.subscriptions` map, leaving
the map's key (`podPort`, which holds the pod object and port) with an
empty value. In clusters where there's a lot of pod churn, those keys
with empty values were accumulating, so this change cleans that
up.
The repro (basically constantly rolling emojivoto) is described in
#9947.
A followup will be up shortly adding metrics to track these
subscriptions, along with similar missing metrics from other parts of Destination.
Fixes https://github.com/linkerd/linkerd2/issues/10036
The Linkerd control plane components written in Go serve liveness and readiness probe endpoints on their admin server. However, the admin server is not started until the k8s informer caches are synced, which can take a long time on large clusters. This means that liveness checks can time out, causing the controller to be restarted.
We start the admin server before attempting to sync caches so that we can respond to liveness checks immediately. We fail readiness probes until the caches are synced.
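A minimal sketch of this pattern, assuming illustrative handler paths and port: the admin server starts immediately so liveness succeeds, while readiness keeps failing until the caches have synced.
```go
package main

import (
	"net/http"
	"sync/atomic"
)

func main() {
	var ready atomic.Bool

	mux := http.NewServeMux()
	mux.HandleFunc("/live", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK) // liveness: the process is up
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, _ *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.Error(w, "caches not synced", http.StatusServiceUnavailable)
	})

	// Serve the admin endpoints before waiting for the caches.
	go func() { _ = http.ListenAndServe(":9996", mux) }()

	waitForCacheSync() // blocks, possibly for a long time on large clusters
	ready.Store(true)

	select {} // run the controller (elided)
}

func waitForCacheSync() {}
```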
Signed-off-by: Alex Leong <alex@buoyant.io>
* build(deps): bump sigs.k8s.io/gateway-api from 0.5.1 to 0.6.0
Bumps [sigs.k8s.io/gateway-api](https://github.com/kubernetes-sigs/gateway-api) from 0.5.1 to 0.6.0.
- [Release notes](https://github.com/kubernetes-sigs/gateway-api/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api/blob/main/CHANGELOG.md)
- [Commits](https://github.com/kubernetes-sigs/gateway-api/compare/v0.5.1...v0.6.0)
---
updated-dependencies:
- dependency-name: sigs.k8s.io/gateway-api
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* Account for possible errors returned from `AddEventHandler`
In v0.26.0 client-go's `AddEventHandler` method for informers started
returning a registration handle (that we ignore) and an error that we
now surface.
* client-go v0.26.0 removed the openstack plugin
* Temporary changes to trigger tests in k8s 1.21
- Adds an innocuous change to integration.yml so that all tests get
triggered
- Hard-code k8s version in `k3d cluster create` invocation to v1.21
* Revert "Temporary changes to trigger tests in k8s 1.21"
This reverts commit 3e1fdd0e5e.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
* Removed dupe imports
My IDE (vim-gopls) has been complaining for a while, so I decided to take
care of it. Found via
[staticcheck](https://github.com/dominikh/go-tools)
* Add stylecheck to go-lint checks
Fixes #10003
When endpoints are removed from an EndpointSlice resource, the destination controller builds a list of addresses to remove. However, if any of the removed endpoints have a Pod as their targetRef, we will attempt to fetch that pod to build the address to remove. If that pod has already been removed from the informer cache, this will fail and the endpoint will be skipped in the list of endpoints to be removed. This results in stale endpoints being stuck in the address set and never being removed.
We update the endpoint watcher to construct only a list of endpoint IDs for endpoints to remove, rather than fetching the entire pod object. Since we no longer attempt to fetch the pod, this operation is now infallible and endpoints will no longer be skipped during removal.
We also add a `TestEndpointSliceScaleDown` test to exercise this.
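A rough sketch of the infallible removal path, with assumed types: the IDs to delete are derived purely from the EndpointSlice, so no pod lookup (which can fail once the pod is gone) is required.
```go
package watcher

import discovery "k8s.io/api/discovery/v1"

// endpointID identifies an endpoint without needing the pod object.
type endpointID struct {
	namespace string
	name      string // targetRef name, or the address when no targetRef exists
}

// removedIDs builds the IDs to delete from the address set; it cannot fail,
// unlike a lookup of the (possibly already deleted) pod objects.
func removedIDs(slice *discovery.EndpointSlice) []endpointID {
	ids := make([]endpointID, 0, len(slice.Endpoints))
	for _, ep := range slice.Endpoints {
		if ep.TargetRef != nil && ep.TargetRef.Kind == "Pod" {
			ids = append(ids, endpointID{namespace: ep.TargetRef.Namespace, name: ep.TargetRef.Name})
			continue
		}
		for _, addr := range ep.Addresses {
			ids = append(ids, endpointID{namespace: slice.Namespace, name: addr})
		}
	}
	return ids
}
```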
Signed-off-by: Alex Leong <alex@buoyant.io>
Closes #9676
This adds the `pod-security.kubernetes.io/enforce` label as described in [Pod Security Admission labels for namespaces](https://kubernetes.io/docs/concepts/security/pod-security-admission/#pod-security-admission-labels-for-namespaces).
PSA gives us three different possible values (policies or modes): [privileged, baseline and restricted](https://kubernetes.io/docs/concepts/security/pod-security-standards/).
For non-CNI mode, the proxy-init container relies on being granted the NET_RAW and NET_ADMIN capabilities, which places those pods under the `privileged` policy. OTOH for CNI mode we can enforce the `restricted` policy, by setting some defaults on the containers' `securityContext` as done in this PR.
Note this change also adds the `cniEnabled` entry in the `values.yaml` file for all the extension charts, which determines what policy to use.
Final note: this includes the fix from #9717, otherwise an empty gateway UID prevents the pod from being created under the `restricted` policy.
## How to test
As this is only enforced as of k8s 1.25, here are the instructions to run 1.25 with k3d using Calico as CNI:
```bash
# launch k3d with k8s v1.25, with no flannel CNI
$ k3d cluster create --image='+v1.25' --k3s-arg '--disable=local-storage,metrics-server@server:0' --no-lb --k3s-arg --write-kubeconfig-mode=644 --k3s-arg --flannel-backend=none --k3s-arg --cluster-cidr=192.168.0.0/16 --k3s-arg '--disable=servicelb,traefik@server:0'
# install Calico
$ k apply -f https://k3d.io/v5.1.0/usage/advanced/calico.yaml
# load all the images
$ bin/image-load --k3d proxy controller policy-controller web metrics-api tap cni-plugin jaeger-webhook
# install linkerd-cni
$ bin/go-run cli install-cni|k apply -f -
# install linkerd-crds
$ bin/go-run cli install --crds|k apply -f -
# install linkerd-control-plane in CNI mode
$ bin/go-run cli install --linkerd-cni-enabled|k apply -f -
# Pods should come up without issues. You can also try the viz and jaeger extensions.
# Try removing one of the securityContext entries added in this PR, and the Pod
# won't come up. You should be able to see the PodSecurity error in the associated
# ReplicaSet.
```
To test the multicluster extension using CNI, check this [gist](https://gist.github.com/alpeb/4cbbd5ad87538b9e0d39a29b4e3f02eb) with a patch to run the multicluster integration test with CNI in k8s 1.25.
When performing the HostPort mapping introduced in #9819, the `containsIP` function iterates through the pod IPs searching for a match against `targetIP` using `ip.String()`, but that returns something like `&PodIP{IP: xxx}`. Fixed that to just use `ip.IP`, and also completed the test fixtures to include both `PodIP` and `PodIPs` in the pod manifests.
Note this wasn't affecting the end result, it was just producing an extra warning as shown below, that this change eliminates:
```bash
$ go test -v ./controller/api/destination/... -run TestGetProfiles
=== RUN TestGetProfiles
...
=== RUN TestGetProfiles/Return_profile_with_endpoint_when_using_pod_DNS
time="2022-11-29T09:38:48-05:00" level=info msg="waiting for caches to sync"
time="2022-11-29T09:38:49-05:00" level=info msg="caches synced"
time="2022-11-29T09:38:49-05:00" level=warning msg="unable to find container port as host (172.17.13.15) matches neither PodIP nor HostIP (&Pod{ObjectMeta:{pod-0 ns 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[linkerd.io/control-plane-ns:linkerd] map[] [] [] []},Spec:PodSpec{Volumes:[]Volume{},Containers:[]Container{},RestartPolicy:,TerminationGracePeriodSeconds:nil,ActiveDeadlineSeconds:nil,DNSPolicy:,NodeSelector:map[string]string{},ServiceAccountName:,DeprecatedServiceAccount:,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:nil,ImagePullSecrets:[]LocalObjectReference{},Hostname:,Subdomain:,Affinity:nil,SchedulerName:,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{},HostAliases:[]HostAlias{},PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:nil,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,HostUsers:nil,},Status:PodStatus{Phase:Running,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:172.17.13.15,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},})" test=TestGetProfiles/Return_profile_with_endpoint_when_using_pod_DNS
```
Fixes #9896
The maps in `endpointTranslator` weren't being guarded against
concurrent access, so we're adding locks at the `Add` and `Remove`
methods. Also these functions ultimately call the `SendMsg` method on
the gRPC `stream`, which is not
["thread-safe"](https://github.com/grpc/grpc-go/blob/master/stream.go#L122-L126),
so we're guarding against other problems as well.
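A minimal sketch of the guard, with illustrative names: a mutex serializes `Add` and `Remove`, and therefore the `SendMsg` calls they ultimately make on the stream.
```go
package destination

import "sync"

type endpointTranslator struct {
	mu        sync.Mutex
	available map[string]struct{} // stands in for the translator's address maps
	send      func(msg string) error
}

func (et *endpointTranslator) Add(addrs ...string) {
	et.mu.Lock()
	defer et.mu.Unlock()
	for _, a := range addrs {
		et.available[a] = struct{}{}
	}
	_ = et.send("add") // ultimately calls stream.SendMsg; must not race
}

func (et *endpointTranslator) Remove(addrs ...string) {
	et.mu.Lock()
	defer et.mu.Unlock()
	for _, a := range addrs {
		delete(et.available, a)
	}
	_ = et.send("remove")
}
```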
A new unit test `TestConcurrency` was added that failed in the following
ways before this fix:
When running the test with the `-race` flag, we immediately get the data race warning:
```bash
$ go test ./controller/api/destination/... -run TestConcurrency -race
time="2022-11-25T16:48:52-05:00" level=info msg="waiting for caches to sync"
time="2022-11-25T16:48:52-05:00" level=info msg="caches synced"
==================
WARNING: DATA RACE
Read at 0x00c0000c0040 by goroutine 161:
github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Add()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:80 +0x29c
github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency.func1()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:338 +0x92
Previous write at 0x00c0000c0040 by goroutine 162:
github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).sendFilteredUpdate()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:95 +0x66
github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Add()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:83 +0x330
github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency.func1()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:338 +0x92
Goroutine 161 (running) created at:
github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:336 +0x6f
testing.tRunner()
/usr/local/go/src/testing/testing.go:1439 +0x213
testing.(*T).Run.func1()
/usr/local/go/src/testing/testing.go:1486 +0x47
Goroutine 162 (running) created at:
github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:336 +0x6f
testing.tRunner()
/usr/local/go/src/testing/testing.go:1439 +0x213
testing.(*T).Run.func1()
/usr/local/go/src/testing/testing.go:1486 +0x47
```
If run without the `-race` flag, we get the `concurrent map writes` panic reported in #9896:
```bash
$ go test ./controller/api/destination/... -run TestConcurrency -count=1
time="2022-11-25T16:53:25-05:00" level=info msg="waiting for caches to sync"
time="2022-11-25T16:53:25-05:00" level=info msg="caches synced"
fatal error: concurrent map writes
goroutine 187 [running]:
runtime.throw({0x1b57bc4?, 0x500000000000000?})
/usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc00013dc80 sp=0xc00013dc50 pc=0x43a5b1
runtime.mapassign(0xc00013dec8?, 0x2?, 0x0?)
/usr/local/go/src/runtime/map.go:595 +0x4d6 fp=0xc00013dd00 sp=0xc00013dc80 pc=0x4113b6
github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Add(...)
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:80
github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency.func1()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:338 +0x1a8 fp=0xc00013dfe0 sp=0xc00013dd00 pc=0x16d1da8
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00013dfe8 sp=0xc00013dfe0 pc=0x46d721
created by github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:336 +0x3c
```
* edge-22.11.3 change notes
Besides the notes, this corrects a small point in `RELEASE.md`, and
bumps the proxy-init image tag to `v2.1.0`. Note that the entry under
`go.mod` wasn't bumped because moving it past v2 requires changes on
`linkerd2-proxy-init`'s `go.mod` file, and we're gonna drop that
dependency soon anyways. Finally, all the charts got their patch version
bumped, except for `linkerd2-cni` that got its minor bumped because of
the tolerations default change.
## edge-22.11.3
This edge release fixes connection errors to pods using a `hostPort` different
than their `containerPort`. Also the `network-validator` init container improves
its logging, and the `linkerd-cni` DaemonSet now gets deployed in all nodes by
default.
* Fixed `destination` service to properly discover targets using a `hostPort`
different than their `containerPort`, which was causing 502 errors
* Upgraded the `network-validator` with better logging allowing users to
determine whether failures occur as a result of their environment or the tool
itself
* Added default `Exists` toleration to the `linkerd-cni` DaemonSet, allowing it
to be deployed in all nodes by default, regardless of taints
Co-authored-by: Oliver Gould <ver@buoyant.io>