Commit Graph

14 Commits

Alex Leong 72ff2f787f
Add flag to enable namespace creation in the service mirror controller (#13137)
When the service mirror controller attempts to mirror a remote service in a namespace that does not exist in the local cluster, it skips mirroring that service since there is no local namespace to put the service in.

We make this behavior configurable by adding a link value called `enableNamespaceCreation`.  When set to true, the service mirror controller will create namespaces as necessary to mirror services if those namespaces don't already exist locally.  When set to false (which is the default), the current behavior is preserved where mirroring of the service will be skipped if the local namespace does not already exist.

Namespace creation can be enabled as follows:

```
linkerd --context east multicluster link --cluster-name=east --set enableNamespaceCreation=true  | kubectl --context=west apply -f -
```

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-10-07 12:24:47 -07:00
Alejandro Pedraza 7b2b01d539
Unregister prom gauges when recycling cluster watcher (#11875)
Unregister prom gauges when recycling cluster watcher

Fixes #11839

When in `restartClusterWatcher` we fail to connect to the target cluster
for whatever reason, the function gets called again 10s later, and tries
to register the same prometheus metrics without unregistering them
first, which generates warnings.

The problem lies in `NewRemoteClusterServiceWatcher`, which instantiates
the remote kube-api client and registers the metrics, returning a nil
object if the client can't connect. `cleanupWorkers` at the beginning of
`restartClusterWatcher` won't unregister those metrics because of that
nil object.

To fix this, gauges are unregistered on error.
2024-01-05 18:07:13 -08:00
Matei David 4f569ae5c8
Allow clusters to be linked without a gateway (#11226)
When a cluster has been installed without a gateway, it cannot be linked against, unless a load balancer service is used as an override. The service-mirror is tightly coupled with the notion of gateways. However, a gateway is not strictly necessary when clusters operate in a flat network.

As part of this change, `linkerd multicluster link` has been changed to allow clusters without gateways to be linked against. When a cluster does not have a gateway, all services _must_ be exported in `remote-discovery` mode, otherwise routing wouldn't work.

In addition, when a cluster does not have a gateway, linking against it will not create a probe service (since there is nothing to probe). Lastly, a check has been modified to ignore checking replicated endpoints when a service is in remote-discovery mode (to avoid false positives).
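For illustration, exporting a service in remote-discovery mode might look like this (the label name and value follow the Linkerd flat-network documentation; treat the exact value as an assumption here, and note that linking without a gateway uses `linkerd multicluster link --gateway=false`):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: emojivoto
  labels:
    # Export in remote-discovery mode; when the target cluster has no
    # gateway, this is the only mode in which routing works.
    mirror.linkerd.io/exported: remote-discovery
```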

Signed-off-by: Matei David <matei@buoyant.io>
2023-08-15 12:12:37 -07:00
Alex Leong 368b63866d
Add support for remote discovery (#11224)
Adds support for remote discovery to the destination controller.

When the destination controller gets a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` label, this is an indication that the destination controller should discover the endpoints for this service from a remote cluster.  The destination controller will look for a remote cluster which has been linked to it (using the `linkerd multicluster link` command) with that name.  It will look at the `multicluster.linkerd.io/remote-discovery` label for the service name to look up in that cluster.  It then streams back the endpoint data for that remote service.

Since we now have multiple client-go informers for the same resource types (one for the local cluster and one for each linked remote cluster) we add a `cluster` label onto the prometheus metrics for the informers and EndpointWatchers to ensure that each of these components' metrics are correctly tracked and don't overwrite each other.

---------

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-08-11 09:31:45 -07:00
Matei David d0e837d9ce
Add HA mode for service-mirror (#11047)
In certain scenarios, the service-mirror may act as a single point of
failure. Linkerd's multicluster extension supports an `--ha` mode to
increase reliability by adding more replicas, however, it is currently
supported only in the gateway.

To avoid the service-mirror as a single point of failure, this change
introduces an `--ha` flag for `linkerd multicluster link`. The HA flag
will use a set of value overrides that will:

* Configure the service-mirror with affinity and PDB policies to ensure replicas
  are spread across hosts to protect against (in)voluntary disruptions;
* Configure the service-mirror to run with more than 3 replicas;
* Configure the service-mirror deployment's rolling strategy to ensure
  at least one replica is available.

Additionally, with the introduction of leader election, `linkerd mc
gateways` displays redundant information since metrics are collected
from each pod. This change adds a small lookup table of currently lease
claimants. Metrics are extracted only for claimants.
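A hypothetical set of overrides in this spirit (field names are illustrative, not the chart's literal values):

```yaml
# Illustrative --ha overrides for the service-mirror deployment:
# spread replicas across hosts and keep at least one available.
replicas: 3
podDisruptionBudget:
  maxUnavailable: 1
deploymentStrategy:
  rollingUpdate:
    maxUnavailable: 1
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            component: linkerd-service-mirror
```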

---------

Signed-off-by: Matei David <matei@buoyant.io>
2023-07-17 09:24:59 +01:00
Matei David 958d7983ac
Add leader election to the service-mirror controller (#11046)
In order to support an HA mode for the service-mirror component, some
form of synchronization should be used to coordinate between replicas of
the service-mirror controller. Although in practice most of the updates
done by the replicas are idempotent (and have benign effects on
correctness), there are some downsides, such as: resource usage
implications from setting up multiple watches, log pollution, errors
associated with writes on resources that are out of date, and increased
difficulty in debugging.

This change adds coordination between the replicas through leader
election. To achieve leader election, client-go's `coordination` package
is used. The change refactors the existing code; the previous nested
loops now reside in a closure (to capture the necessary configuration),
and the closure is run when a leader is elected.

Leader election functions as part of a loop: a lease resource is created
(if it does not exist), and the controller blocks until it has acquired
the lease. The loop is terminated only on shutdown from an interrupt
signal. If the lease is lost, it is released, watchers are cleaned-up,
and the controller returns to blocking until it acquires the lease once
again.

Shutdown logic has been changed to rely on context cancellation
propagation so that the watchers may be ended either by the leader
elector (when claim is lost) or by the main routine when an interrupt is
handled.

---------

Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2023-06-29 11:27:40 +01:00
Alex Leong b0778bb2ea
Readiness checks fail until caches are synced (#10166)
Fixes https://github.com/linkerd/linkerd2/issues/10036

The Linkerd control plane components written in go serve liveness and readiness probe endpoints on their admin server.  However, the admin server is not started until k8s informer caches are synced, which can take a long time on large clusters.  This means that liveness checks can time out causing the controller to be restarted.

We start the admin server before attempting to sync caches so that we can respond to liveness checks immediately.  We fail readiness probes until the caches are synced.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-01-25 11:43:09 -08:00
Kevin Leimkuhler 388f14f48f
allow pprof to be configurable via helm flags (#8090)
Follow-up to #8087 that allows pprof to be enabled via the `--set
enablePprof=true` flag.

Each control plane component spawns its own admin server, so each of these
receives its own `enable-pprof` flag. When `enablePprof=true`, it is passed
through to each component so that when it launches its admin server, its pprof
endpoints are enabled.

A note on the templating: `-enable-pprof={{.Values.enablePprof | default
false}}`. `false` values are not rendered by Helm so without the `... | default
false}}`, it tries to pass the flag as `-enable-pprof=""` which results in an
error. Inlining this felt better than conditionally passing the flag with

```yaml
{{ if .Values.enablePprof -}}
-enable-pprof={{.Values.enablePprof}}
{{ end -}}
```

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-03-22 14:31:04 -06:00
Kevin Leimkuhler 9829f486a0
empty mirror service endpoints when gateway is down (#8022)
Closes linkerd/linkerd-failover#63

When the gateway in a target cluster is down, the endpoints for mirror services on a source cluster do not reflect that the target cluster will be unreachable. To fix this, the endpoints for mirror services should transition their addresses to not ready.

Currently, every connected cluster has a probe worker and a cluster watcher. The probe worker is responsible for probing the gateway on the target cluster, and the cluster watcher is responsible for watching for events on the target cluster and making necessary changes on the source cluster. The probe worker and cluster watcher do not communicate in any way.

This change introduces the ability for the probe worker to update the cluster watcher when the gateway has a change in liveness. When a gateway's liveness status changes, the cluster watcher is updated and triggers an endpoint repair. If the gateway is now down, the endpoint repair transitions all endpoints for mirror services to not ready. If the gateway is alive, the endpoint repair will ensure that all endpoints for mirror services point to the ready gateway address.

There is a possibility that a gateway is unreachable but the local cluster can still connect to the remote cluster. When this is the case, we still honor the probe's liveness status. This means for example that if a remote service is created when the gateway is down, the local cluster may still see that update (via the cluster watcher), but the endpoints that it creates for the mirror service are not ready.
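The interaction can be modeled with a stdlib-only sketch (the real probe worker and cluster watcher are more involved; names here are illustrative):

```go
package main

import "fmt"

// address models an endpoint address for a mirror service.
type address struct {
	ip    string
	ready bool
}

// clusterWatcher tracks mirror-service endpoints for one target cluster.
type clusterWatcher struct {
	gatewayAlive bool
	endpoints    map[string][]address // mirror service -> addresses
}

// onGatewayLiveness is what the probe worker calls when the gateway's
// liveness changes; it triggers an endpoint repair.
func (w *clusterWatcher) onGatewayLiveness(alive bool) {
	w.gatewayAlive = alive
	w.repairEndpoints()
}

// repairEndpoints transitions every mirror-service address to match the
// gateway's liveness: not ready when the gateway is down, ready when up.
func (w *clusterWatcher) repairEndpoints() {
	for _, addrs := range w.endpoints {
		for i := range addrs {
			addrs[i].ready = w.gatewayAlive
		}
	}
}

// addMirror adds endpoints for a newly mirrored service. Even if the
// remote service appears while the gateway is down, the addresses are
// created not-ready: the probe's liveness status is honored.
func (w *clusterWatcher) addMirror(svc, gatewayIP string) {
	w.endpoints[svc] = []address{{ip: gatewayIP, ready: w.gatewayAlive}}
}

func main() {
	w := &clusterWatcher{gatewayAlive: true, endpoints: map[string][]address{}}
	w.addMirror("web-west", "203.0.113.10")
	fmt.Println(w.endpoints["web-west"][0].ready) // true

	w.onGatewayLiveness(false) // probe worker: gateway went down
	fmt.Println(w.endpoints["web-west"][0].ready) // false

	w.addMirror("api-west", "203.0.113.10")       // created while down
	fmt.Println(w.endpoints["api-west"][0].ready) // false
}
```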

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-03-14 09:27:26 -06:00
Kevin Leimkuhler 67bcd8f642
Add `gosec` and `errcheck` lints (#7954)
Closes #7826

This adds the `gosec` and `errcheck` lints to the `golangci` configuration. Most significant lints have been fixed by individual changes, but this enables them by default so that all future changes are caught ahead of time.

A significant number of these lints have been excluded by the various `exclude-rules` entries added to `.golangci.yml`. These include operations on files that generally do not fail, such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up functions via the `defer` keyword.

Aside from those, there are several other rules added that all have comments explaining why it's okay to ignore the errors that they cover.

Finally, several smaller fixes in the code have been made where it seems necessary to catch errors or at least log them.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-03-03 10:09:51 -07:00
Oliver Gould f5876c2a98
go: Enable `errorlint` checking (#7885)
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection is wrapping-aware.

This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).

[el]: https://github.com/polyfloyd/go-errorlint

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-16 18:32:19 -07:00
Stepan Rabotkin 5e6a1b5508
Graceful shutdown for admin server (#6817)
* Graceful shutdown for admin server

Signed-off-by: Stepan Rabotkin <epicstyt@gmail.com>
2021-09-07 10:50:31 -05:00
Matei David e1e7b1d280
Introduce support for StatefulSets across multicluster (#6090)
The purpose of this PR is to mirror StatefulSets in a multicluster setting. Currently, it isn't possible to communicate with a specific pod in a StatefulSet across clusters without manually creating clusterIP services for each pod backing the StatefulSet in the target cluster.

After some brainstorming, we decided that one way to solve this problem is to have the Service Mirror component create a "root" headless service in our source cluster along with clusterIP services (one for each pod backing the StatefulSet in the target cluster). The idea here is that each individual clusterIP service will also have an Endpoints object whose only
host is the Gateway IP -- this is the way mirrored services are constructed in a multicluster environment. The Endpoints object for the root service will contain pairs of hostnames and IP addresses; each hostname maps to the name of a pod in the StatefulSet, and its IP corresponds to the clusterIP service that the Service Mirror would create in the source cluster.

To exemplify, assume a StatefulSet `foo` in a target cluster `west` with 2 pods (foo-0, foo-1). In the source cluster `east`, we create a headless root service `foo-west` and 2 services (`foo-0-west`, `foo-1-west`) whose Endpoints point to the Gateway IP. `foo-west`'s Endpoints will contain an AddressSet with two hosts:
```yaml
# foo-west Endpoints
  - hostname: foo-0
    ip: <clusterIP of foo-0-west>
  - hostname: foo-1
    ip: <clusterIP of foo-1-west>
```

By making these changes, we solve the concerns associated with manually creating these services since the Service Mirror would reconcile, create, and delete the clusterIP services (as opposed to requiring any interaction from the end user). Furthermore, by having a "root" headless service we can also configure DNS -- for an end user, there wouldn't be any difference in addressing a specific pod in the StatefulSet as far as syntax goes (i.e the host `foo-0.foo-west.default.svc.cluster.local` would point to the pod foo-0 in cluster west).

Closes #5162
2021-07-29 14:12:20 -06:00
Tarun Pothulapati 72a0ca974d
extension: Separate multicluster chart and binary (#5293)
Fixes #5257

This branch moves mc charts and cli-level code to a new
top-level directory. None of the logic is changed.

Also, moves some common types into `/pkg` so that they
are accessible both to the main cli and extensions.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-12-04 16:36:10 -08:00