Fixes #9986
After reviewing the k8s API calls in Destination, it was concluded we
could only swap out the calls to the Node and RS resources to use the
metadata API, as all the other resources (Endpoints, EndpointSlices,
Services, Pod, ServiceProfiles, Server) required fields other than those
found in their metadata section.
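For illustration, a metadata-only informer in client-go looks roughly like this (names and wiring are assumptions for this sketch, not the actual Destination code):
```go
package sketch

import (
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/metadata"
	"k8s.io/client-go/metadata/metadatainformer"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

// newRSMetadataInformer watches ReplicaSets through the metadata API, so only
// ObjectMeta (labels, annotations, owner refs) is decoded and cached. Dropping
// the rest of the object is what reduces memory usage.
func newRSMetadataInformer(cfg *rest.Config) (cache.SharedIndexInformer, error) {
	client, err := metadata.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	factory := metadatainformer.NewSharedInformerFactory(client, 10*time.Minute)
	gvr := appsv1.SchemeGroupVersion.WithResource("replicasets")
	return factory.ForResource(gvr).Informer(), nil
}
```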
This also required completing the `NewFakeAPI` implementation by adding
the missing annotations and labels entries.
## Testing Memory Consumption
The gains here aren't as big as in #9650. In order to test this we need
to push hard and create 4000 RS:
``` bash
for i in {0..4000}; do kubectl create deployment test-pod-$i --image=nginx; done
```
In edge-23.2.1 the destination pod's memory consumption goes from 40Mi
to 160Mi after all the RS were created. With this change, it went from
37Mi to 140Mi.
(This came out during the k8s api calls review for #9650)
In 5dc662ae9, inheritance of the opaque-ports annotation from namespaces to
pods was removed, but we didn't remove the associated RBAC.
Fixes https://github.com/linkerd/linkerd2/issues/10036
The Linkerd control plane components written in Go serve liveness and readiness probe endpoints on their admin server. However, the admin server is not started until the k8s informer caches are synced, which can take a long time on large clusters. This means that liveness checks can time out, causing the controller to be restarted.
We start the admin server before attempting to sync caches so that we can respond to liveness checks immediately. We fail readiness probes until the caches are synced.
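A minimal sketch of that ordering, assuming hypothetical handler paths and wiring (not the actual admin server code):
```go
package sketch

import (
	"log"
	"net/http"
	"sync/atomic"

	"k8s.io/client-go/tools/cache"
)

// serveAdmin starts the admin server immediately so liveness probes succeed,
// and only reports ready once the informer caches have synced.
func serveAdmin(stop <-chan struct{}, synced ...cache.InformerSynced) {
	var ready atomic.Bool

	mux := http.NewServeMux()
	mux.HandleFunc("/live", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, _ *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.Error(w, "caches not synced", http.StatusServiceUnavailable)
	})

	go func() {
		// The admin server is up before any cache sync, so liveness checks are
		// answered right away even on large clusters.
		log.Fatal(http.ListenAndServe(":9990", mux))
	}()

	if !cache.WaitForCacheSync(stop, synced...) {
		log.Fatal("failed to sync informer caches")
	}
	ready.Store(true)
}
```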
Signed-off-by: Alex Leong <alex@buoyant.io>
It would be useful to know how prevalent 32-bit ARM deployments are so
that we can determine whether it makes sense to continue supporting this
platform. This change adds an `arch` field to heartbeats indicating the
CPU architecture of the heartbeat container.
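For reference, the architecture value can come straight from the Go runtime; the exact wiring into the heartbeat request below is an assumption for illustration:
```go
package sketch

import (
	"net/url"
	"runtime"
)

// heartbeatValues adds the container's CPU architecture (e.g. "amd64",
// "arm64", or "arm" for 32-bit ARM) to the heartbeat query parameters.
func heartbeatValues() url.Values {
	v := url.Values{}
	v.Set("arch", runtime.GOARCH)
	return v
}
```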
Signed-off-by: Oliver Gould <ver@buoyant.io>
The identity controller requires access to read all deployments. This
isn't necessary.
When these permissions were added in #3600, we incorrectly assumed that
we must pass a whole Deployment resource as a _parent_ when recording
events. The [EventRecorder docs] say:
> 'object' is the object this event is about. Event will make a
> reference--or you may also pass a reference to the object directly.
We can confirm this by reviewing the source for [GetReference]: we can
simply construct an ObjectReference without fetching it from the API.
This change lets us drop unnecessary privileges in the identity
controller.
[EventRecorder docs]: https://pkg.go.dev/k8s.io/client-go/tools/record#EventRecorder
[GetReference]: ab826d2728/tools/reference/ref.go (L38-L45)
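A sketch of what that looks like (the recorder, names, and event reason here are placeholders for illustration):
```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// recordForDeployment emits an event against a Deployment parent without ever
// fetching the Deployment from the API: an ObjectReference is enough, since
// ObjectReference implements runtime.Object.
func recordForDeployment(recorder record.EventRecorder, ns, name string) {
	ref := &corev1.ObjectReference{
		APIVersion: "apps/v1",
		Kind:       "Deployment",
		Namespace:  ns,
		Name:       name,
	}
	recorder.Event(ref, corev1.EventTypeNormal, "IssuedLeafCertificate", "issued certificate")
}
```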
Signed-off-by: Oliver Gould <ver@buoyant.io>
Follow-up to #8087 that allows pprof to be enabled via the `--set
enablePprof=true` flag.
Each control plane component spawns its own admin server, so each of them
receives its own `enable-pprof` flag. When `enablePprof=true`, it is passed
through to each component so that when it launches its admin server, its pprof
endpoints are enabled.
A note on the templating: `-enable-pprof={{.Values.enablePprof | default false}}`.
`false` values are not rendered by Helm, so without the `... | default false}}` it
tries to pass the flag as `-enable-pprof=""`, which results in an error. Inlining
this felt better than conditionally passing the flag with:
```yaml
{{ if .Values.enablePprof -}}
-enable-pprof={{.Values.enablePprof}}
{{ end -}}
```
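For illustration, the per-component flag can gate pprof registration roughly like this (assumed names, using the standard `net/http/pprof` handlers):
```go
package sketch

import (
	"flag"
	"net/http"
	"net/http/pprof"
)

// newAdminMux registers the pprof endpoints only when -enable-pprof is set.
func newAdminMux() *http.ServeMux {
	enablePprof := flag.Bool("enable-pprof", false, "Enable pprof endpoints on the admin server")
	flag.Parse()

	mux := http.NewServeMux()
	if *enablePprof {
		mux.HandleFunc("/debug/pprof/", pprof.Index)
		mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
		mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
		mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	}
	return mux
}
```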
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Closes #7826
This adds the `gosec` and `errcheck` linters to the `golangci` configuration. The most significant lints have been fixed by individual changes, but this enables them by default so that all future changes are caught ahead of time.
A significant number of these lints have been excluded by the various `exclude-rules` added to `.golangci.yml`. These include operations on files that generally do not fail, such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up functions via the `defer` keyword.
Aside from those, several other rules were added, each with a comment explaining why it's okay to ignore the errors that they cover.
Finally, several smaller fixes have been made in the code where it seemed necessary to catch errors or at least log them.
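As an illustrative example of that last kind of fix, errcheck nudges deferred cleanup toward logging the error instead of silently discarding it (logger choice and helper name assumed):
```go
package sketch

import (
	"io"
	"os"

	log "github.com/sirupsen/logrus"
)

// readConfig logs (rather than ignores) errors from the deferred Close, which
// is the kind of small fix errcheck prompts.
func readConfig(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer func() {
		if err := f.Close(); err != nil {
			log.Warnf("failed to close %s: %s", path, err)
		}
	}()
	return io.ReadAll(f)
}
```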
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
[gocritic][gc] helps to enforce some consistency and check for potential
errors. This change applies linting changes and enables gocritic via
golangci-lint.
[gc]: https://github.com/go-critic/go-critic
Signed-off-by: Oliver Gould <ver@buoyant.io>
* Remove the `proxy.disableIdentity` config
Fixes #7724
Also:
- Removed the `linkerd.io/identity-mode` annotation.
- Removed the `config.linkerd.io/disable-identity` annotation.
- Removed the `linkerd.proxy.validation` template partial, which only
made sense when `proxy.disableIdentity` was `true`.
- TestInjectManualParams now requires hitting the cluster to retrieve the trust root.
Fixes #6584 #6620 #7405
# Namespace Removal
With this change, the `namespace.yaml` template is rendered only for CLI installs and not Helm, and likewise the `namespace:` entry in the namespace-level objects (using a new `partials.namespace` helper).
The `installNamespace` and `namespace` entries in `values.yaml` have been removed.
In the templates where the namespace is required, we moved from `.Values.namespace` to `.Release.Namespace`, which is filled in automatically by Helm. For the CLI, `install.go` now explicitly defines the contents of the `Release` map alongside `Values`.
The proxy-injector has a new `linkerd-namespace` argument given the namespace is no longer persisted in the `linkerd-config` ConfigMap, so it has to be passed in. To pass it further down to `injector.Inject()` without modifying the `Handler` signature, a closure was used.
------------
Update: Merged-in #6638: Similar changes for the `linkerd-viz` chart:
Stop rendering `namespace.yaml` in the `linkerd-viz` chart.
The additional change here is the addition of the `namespace-metadata.yaml` template (and its RBAC), _not_ rendered in CLI installs, which is a Helm `post-install` hook consisting of a Job that executes a script adding the required annotations and labels to the viz namespace using a PATCH request against kube-api. The script first checks whether the namespace already has annotations/labels entries; if it doesn't, it adds extra ops to that patch.
---------
Update: Merged-in the approved #6643, #6665 and #6669 which address the `linkerd2-cni`, `linkerd-multicluster` and `linkerd-jaeger` charts.
Additional changes from what's already mentioned above:
- Removes the install-namespace option from `linkerd install-cni`, which isn't found in `linkerd install` or `linkerd viz install` anyway, and would add some complexity to support.
- Added a dependency on the `partials` chart to the `linkerd-multicluster-link` chart, so that we can tap into the `partials.namespace` helper.
- We no longer have the restriction that the multicluster objects must live in a separate namespace from linkerd. It's still good practice, and that's the default for the CLI install, but I removed that validation.
Finally, as a side-effect, the `linkerd mc allow` subcommand was fixed; it has been broken for a while apparently:
```console
$ linkerd mc allow --service-account-name foobar
Error: template: linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml:16:7: executing "linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml" at <include "partials.annotations.created-by" $>: error calling include: template: no template "partials.annotations.created-by" associated with template "gotpl"
```
---------
Update: see helm/helm#5465 describing the current best-practice
# Core Helm Charts Split
This removes the `linkerd2` chart, and replaces it with the `linkerd-crds` and `linkerd-control-plane` charts. Note that the viz and other extension charts are not concerned by this change.
Also note the original `values.yaml` file has been split into both charts accordingly.
### UX
```console
$ helm install linkerd-crds --namespace linkerd --create-namespace linkerd/linkerd-crds
...
# certs.yaml should contain identityTrustAnchorsPEM and the identity issuer values
$ helm install linkerd-control-plane --namespace linkerd -f certs.yaml linkerd/linkerd-control-plane
```
### Upgrade
As explained in #6635, this is a breaking change. Users will have to uninstall the `linkerd2` chart and install these two, and eventually roll out the proxies (they should continue to work during the transition anyway).
### CLI
The CLI install/upgrade code was updated to be able to pick the templates from these new charts, but the CLI UX remains identical to before.
### Other changes
- The `linkerd-crds` and `linkerd-control-plane` charts now carry a version scheme independent of linkerd's own versioning, as explained in #7405.
- These charts are Helm v3, which is reflected in the `Chart.yaml` entries and in the removal of the `requirements.yaml` files.
- In the integration tests, replaced the `helm-chart` arg with `helm-charts` containing the path `./charts`, used to build the paths for both charts.
### Followups
- Now it's possible to add a `ServiceProfile` instance for Destination in the `linkerd-control-plane` chart.
Now that SMI functionality is being fully moved into the
[linkerd-smi](www.github.com/linkerd/linkerd-smi) extension, we can
stop supporting it by default.
This means that the `destination` component will stop reacting
to `TrafficSplit` objects. When `linkerd-smi` is installed,
it converts `TrafficSplit` objects into `ServiceProfiles`
that the destination component can understand and react to accordingly.
Also, whenever a `ServiceProfile` with traffic splitting is associated
with a service, the same information (i.e. splits and weights) is also
surfaced through the UI (in the new `services` tab) and the `viz` cmd,
so we are not really losing any UI functionality here.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
EndpointSlices are enabled by default in our Kubernetes minimum version of 1.20. Thus we can change the default behavior of the destination controller to use EndpointSlices instead of Endpoints. This unblocks any functionality which is specific to EndpointSlices such as topology aware hints.
Signed-off-by: Alex Leong <alex@buoyant.io>
This change ensures that if a Server exists with `proxyProtocol: opaque` that selects an endpoint backed by a pod, that destination requests for that pod reflect the fact that it handles opaque traffic.
Currently, the only way that opaque traffic is honored in the destination service is if the pod has the `config.linkerd.io/opaque-ports` annotation. With the introduction of Servers though, users can set `server.Spec.ProxyProtocol: opaque` to indicate that if a Server selects a pod, then traffic to that pod's `server.Spec.Port` should be opaque. Currently, the destination service does not take this into account.
There is an existing change up that _also_ adds this functionality; it takes a different approach by creating a policy server client for each endpoint that a destination has. For `Get` requests on a service, the number of clients scales with the number of endpoints that back that service.
This change fixes that issue by instead creating a Server watch in the endpoint watcher and sending updates through to the endpoint translator.
The two primary scenarios to consider are
### A `Get` request for some service is streaming when a Server is created/updated/deleted
When a Server is created or updated, the endpoint watcher iterates through its endpoint watches (`servicePublisher` -> `portPublisher`) and if it selects any of those endpoints, the port publisher sends an update if the Server has marked that port as opaque.
When a Server is deleted, the endpoint watcher once again iterates through its endpoint watches and deletes the address set's `OpaquePodPorts` field—ensuring that updates have been cleared of Server overrides.
### A `Get` request for some service happens after a Server is created
When a `Get` request occurs (or new endpoints are added—they both take the same path), we must check if any of those endpoints are selected by some existing Server. If so, we have to take that into account when creating the address set.
This part of the change gives me a little concern as we first must get all the Servers on the cluster and then create a set of _all_ the pod-backed endpoints that they select in order to determine if any of these _new_ endpoints are selected.
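The per-endpoint selection check itself boils down to something like this (a sketch with simplified types; the real Server spec uses the policy API types):
```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// serverMarksPortOpaque reports whether a Server's pod selector matches the
// pod and the Server marks that pod's port as opaque. The selector, protocol,
// and port arguments are simplified stand-ins for the Server spec fields.
func serverMarksPortOpaque(podSelector *metav1.LabelSelector, proxyProtocol string, serverPort int32, pod *corev1.Pod, port int32) (bool, error) {
	selector, err := metav1.LabelSelectorAsSelector(podSelector)
	if err != nil {
		return false, err
	}
	if !selector.Matches(labels.Set(pod.Labels)) {
		return false, nil
	}
	return serverPort == port && proxyProtocol == "opaque", nil
}
```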
## Testing
Right now this can be tested by starting up the destination service locally and running `Get` requests on a service that has endpoints selected by a Server
**app.yaml**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
  labels:
    app: pod
spec:
  containers:
  - name: app
    image: nginx
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: svc
spec:
  selector:
    app: pod
  ports:
  - name: http
    port: 80
---
apiVersion: policy.linkerd.io/v1alpha1
kind: Server
metadata:
  name: srv
  labels:
    policy: srv
spec:
  podSelector:
    matchLabels:
      app: pod
  port: 80
  proxyProtocol: HTTP/1
```
```bash
$ go run controller/script/destination-client/main.go -path svc.default.svc.cluster.local:80
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #6452
We add a `linkerd-identity-trust-roots` ConfigMap which contains the configured trust root bundle. The proxy template partial is modified so that core control plane components load this bundle from the configmap through the downward API.
The identity controller is updated to mount this new configmap as a volume and read the trust root bundle at startup.
Similarly, the proxy-injector also mounts this new configmap. For each pod it injects, it reads the trust root bundle file and sets it on the injected pod.
Signed-off-by: Alex Leong <alex@buoyant.io>
We emit a Kubernetes event from the identity controller when successfully issuing a leaf certificate. The events include the identity, expiry, and a hash of the certificate.
Signed-off-by: Alex Leong <alex@buoyant.io>
* destination: Remove support for IP Queries in `Get` API
Fixes #5246
This PR updates the destination to report an error when `Get`
is called for IP queries. As the issue mentions, the proxies
are not using this API anymore, and removing it helps to simplify
and remove unnecessary logic.
This removes the relevant `IPWatcher` logic, along with its
unit tests.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Remove the `linkerd-controller` pod
Now that we got rid of the `Version` API (#6000) and the destination API forwarding business in `linkerd-controller` (#5993), we can get rid of the `linkerd-controller` pod.
## Removals
- Deleted everything under `/controller/api/public` and `/controller/cmd/public-api`.
- Moved `/controller/api/public/test_helper.go` to `/controller/api/destination/test_helper.go` because those are really utils for destination testing. I also extracted from there the prometheus mock structs and put them under `/pkg/prometheus/test_helper.go`, which is now used by both the `linkerd diagnostics endpoints` and the `metrics-api` tests, removing some duplication.
- Deleted the `controller.yaml` and `controller-rbac.yaml` helm templates along with the `publicAPIResources` and `publicAPIProxyResources` helm values.
## Health checks
- Removed the `can initialize the client` check given such client is no longer needed. The `linkerd-api` section was left with only the check `control pods are ready`, so I moved that under the `linkerd-existence` section and got rid of the `linkerd-api` section altogether.
- In that same `linkerd-existence` section, got rid of the `controller pod is running` check.
## Other changes
- Fixed the Control Plane section of the dashboard, taking into account the disappearance of `linkerd-controller` and, previously, of `linkerd-sp-validator`.
* Removed Destination's `Get` API from the public-api
This is the first step towards removing the `linkerd-controller` pod. It deals with removing the Destination `Get` HTTP and gRPC endpoints it exposes, which only the `linkerd diagnostics endpoints` command consumes.
Removed all references to Destination in the public-api, including all the gRPC-to-http-to-gRPC forwardings:
- Removed the `Get` method from the public-api gRPC server that forwarded the connection from the controller pod to the destination pod. Clients should now connect directly to the destination service via gRPC.
- Likewise, removed the destination boilerplate in the public-api http server (and its `main.go`) that served the `Get` destination endpoint and forwarded it into the gRPC server.
- Finally, removed the destination boilerplate from the public-api's `client.go` that created a client connecting to the http API.
* destination: pass opaque-ports through cmd flag
Fixes #5817
Currently, default opaque ports are stored in two places, i.e.
`Values.yaml` and `opaqueports/defaults.go`. As these
ports are used only in destination, we can instead pass these
values as a cmd flag for the destination component from `Values.yaml`
and remove `defaultPorts` in `defaults.go`.
This means that if users override the `opaquePorts` field in
`Values.yaml`, that change is propagated both to injection and to
discovery, as expected.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Currently the identity controller is the only component that receives the CA certificate / trust anchors as option `-identity-trust-anchors-pem` instead of an env var.
This prevents users from letting it read the trust anchors from a Secret that is managed by e.g. cert-manager.
This PR uses an env var instead of the option to provide the trust anchors. For most helm chart users this doesn't change anything. However using kustomize the helm output manifest can now be adjusted (again) so that the certificate is loaded from a ConfigMap or Secret like in [this example](https://github.com/mgoltzsche/khelm/tree/master/example/kpt/linkerd) which aims to produce a static manifest to make the installation/update more declarative and support GitOps workflows.
This PR does not provide chart options/values to specify Secrets upfront - it would introduce dependencies to other operators.
Relates to #3843, see https://github.com/linkerd/linkerd2/issues/3843#issuecomment-775516217
Fixes #3321
Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
Getting information about node topology currently queries the k8s API directly.
In an environment with high traffic and a high number of pods, the
k8s API server can become overwhelmed or start throttling requests.
This PR introduces a node informer to resolve the bottleneck and
fetch node information asynchronously.
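A minimal sketch of the informer-backed lookup (assumed wiring, not the exact code in this PR):
```go
package sketch

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	corelisters "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/tools/cache"
)

// newNodeLister builds a node informer and returns its lister, so topology
// lookups are served from a local cache instead of hitting the API server.
func newNodeLister(client kubernetes.Interface, stop <-chan struct{}) (corelisters.NodeLister, error) {
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	nodes := factory.Core().V1().Nodes()
	factory.Start(stop)
	if !cache.WaitForCacheSync(stop, nodes.Informer().HasSynced) {
		return nil, fmt.Errorf("failed to sync node informer")
	}
	return nodes.Lister(), nil
}

// topologyZone reads the zone label from a cached node object.
func topologyZone(node *corev1.Node) string {
	return node.Labels["topology.kubernetes.io/zone"]
}
```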
Fixes #5684
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
Closes #5545.
This change moves all tap and tap-injector code into the viz directory.
The tap and tap-injector components now also use a new tap image—separating
these components from the controller image that they are currently part of. This
means the controller image has removed all its build dependencies related to
tap.
Finally, the tap Protobuf has been separated from the metrics-api and moved into
its own `.proto` file and gen directory. This introduces a clear split between
the metrics-api and tap Protobuf.
There is no change in behavior for the `viz tap` command.
### Reviewing
#### Docker images
All the bin directory scripts should be updated to build and load the tap image.
All the CI workflows should be updated to build and push the tap image.
#### Controller and pkg directories
This is primarily deletions. Most of the deleted code in this directory is now
in the tap directory of the Viz extension.
#### viz/tap
This is the location that all the tap related code now lives in. New files are
mostly moved from the controller and pkg directories. Imports have all been
updated to point at the right locations and Protobuf.
The Protobuf here is taken from metrics-api and contains all tap-related
Protobuf.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it is still used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.
* Added chart templates for new viz linkerd-metrics-api pod
* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.
* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).
* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api`, moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.
* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.
* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.
* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
- Updated `endpoints.go` according to new API interface name.
- Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
- `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
- Added `metrics-api` to list of docker images to build in actions workflows.
- In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).
* Add retry to 'tap API service is running' check
* mc check shouldn't err when viz is not available. Also properly set the logger in multicluster/cmd/root.go so that it displays messages when --verbose is used
## What this changes
This adds a tap-injector component to the `linkerd-viz` extension which is
responsible for adding the tap service name environment variable to the Linkerd
proxy container.
If a pod does not have a Linkerd proxy, no action is taken. If tap is disabled
via annotation on the pod or the namespace, no action is taken.
This also removes the environment variable for explicitly disabling tap. Tap
status for a proxy is now determined only by the presence or absence of the tap
service name environment variable.
Closes #5326
## How it changes
### tap-injector
The tap-injector component determines if `LINKERD2_PROXY_TAP_SVC_NAME` should be
added to a pod's Linkerd proxy container environment. If the pod satisfies the
following, it is added:
- The pod has a Linkerd proxy container
- The pod has not already been mutated
- Tap is not disabled via annotation on the pod or the pod's namespace
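When those conditions hold, the mutation amounts to roughly the following (a sketch; the real injector emits a JSON patch, and the container name check and value are assumptions):
```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
)

// injectTapSvcName appends the tap service name to the proxy container's
// environment; this just shows the intent of the webhook's mutation.
func injectTapSvcName(pod *corev1.Pod, tapSvcName string) {
	for i, c := range pod.Spec.Containers {
		if c.Name != "linkerd-proxy" {
			continue
		}
		pod.Spec.Containers[i].Env = append(c.Env, corev1.EnvVar{
			Name:  "LINKERD2_PROXY_TAP_SVC_NAME",
			Value: tapSvcName,
		})
	}
}
```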
### LINKERD2_PROXY_TAP_DISABLED
Now that tap is an extension of Linkerd and not a core component, it no longer
made sense to explicitly enable or disable tap through this Linkerd proxy
environment variable. The status of tap is now determined only by whether the
tap-injector adds the `LINKERD2_PROXY_TAP_SVC_NAME` environment
variable.
### controller image
The tap-injector has been added to the controller image's set of startup
commands, which determine what the image will do in the cluster.
As a follow-up, I think splitting out the `tap` and `tap-injector` commands from
the controller image into a linkerd-viz image (or something like that) makes
sense.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
## Summary
This changes the destination service to start indicating whether a profile is an
opaque protocol or not.
Currently, profiles returned by the destination service are built by chaining
together updates coming from watching Profile and Traffic Split updates.
With this change, we now also watch updates to Opaque Port annotations on pods
and namespaces; if an update occurs this is now included in building a profile
update and is sent to the client.
## Details
Watching updates to Profiles and Traffic Splits is straightforward: we watch
those resources, and if an update occurs on one associated with a service we care
about, then the update is passed through.
For Opaque Ports this is a little different because it is an annotation on pods
or namespaces. To account for this, we watch the endpoints that we should care
about.
### When host is a Pod IP
When getting the profile for a Pod IP, we check for the opaque ports annotation
on the pod and the pod's namespace. If one is found and the requested port is in
the annotation, we indicate that the profile is an opaque protocol.
We do not subscribe for updates to this pod IP. The only update we really care
about is if the pod is deleted and this is already handled by the proxy.
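The pod-IP check boils down to something like this (a sketch; the helper name is hypothetical and port-range syntax is ignored for brevity):
```go
package sketch

import (
	"strconv"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// isOpaquePort checks the opaque-ports annotation on the pod (a similar check
// could fall back to its namespace) for the requested port.
func isOpaquePort(pod *corev1.Pod, port uint32) bool {
	annotation := pod.Annotations["config.linkerd.io/opaque-ports"]
	for _, p := range strings.Split(annotation, ",") {
		if strings.TrimSpace(p) == strconv.FormatUint(uint64(port), 10) {
			return true
		}
	}
	return false
}
```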
### When host is a Service
When getting the profile for a Service, we subscribe for updates to the
endpoints of that service. For any ports set in the opaque ports annotation on
any of the pods, we check if the requested port is present.
Since the endpoints for a service can be added and removed, we do subscribe for
updates to the endpoints of the service.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Follow-up to #5282; fixes #5272 in its totality.
This follows the same pattern as the injector/sp-validator webhooks, leveraging `FsCredsWatcher` to watch for changes in the cert files.
To reuse code from the webhooks, we moved `updateCert()` to `creds_watcher.go`, and `run()` as well (which now is called `ProcessEvents()`).
The `TestNewAPIServer` test in `apiserver_test.go` was removed as it really was just testing two things: (1) that `apiServerAuth` doesn't error, which is already covered in the following test, and (2) that the standard library call `net.Listen("tcp", addr)` doesn't error, which we're not interested in testing here.
## How to test
To test that the injector/sp-validator functionality is still correct, you can refer to #5282
The steps below are similar, but focused towards the tap component:
```bash
# Create some root cert
$ step certificate create linkerd-tap.linkerd.svc ca.crt ca.key --profile root-ca --no-password --insecure
# configure tap's caBundle to be that root cert
$ cat > linkerd-overrides.yml << EOF
tap:
  externalSecret: true
  caBundle: |
    < ca.crt contents>
EOF
# Install linkerd
$ bin/linkerd install --config linkerd-overrides.yml | k apply -f -
# Generate an intermediary cert with a short lifespan
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc
# Create the secret using that intermediate cert
$ kubectl create secret tls \
linkerd-tap-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# Roll out the tap pod so it picks up the new secret
$ k -n linkerd rollout restart deploy/linkerd-tap
# Tap should work
$ bin/linkerd tap -n linkerd deploy/linkerd-web
req id=0:0 proxy=in src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :method=GET :authority=10.42.0.11:9994 :path=/metrics
rsp id=0:0 proxy=in src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :status=200 latency=1779µs
end id=0:0 proxy=in src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true duration=65µs response-length=1709B
# Wait 5 minutes and rollout tap again
$ k -n linkerd rollout restart deploy/linkerd-tap
# You'll see in the logs that the cert expired:
$ k -n linkerd logs -f deploy/linkerd-tap tap
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45866: remote error: tls: bad certificate
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45870: remote error: tls: bad certificate
# Recreate the secret
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc
$ k -n linkerd delete secret linkerd-tap-k8s-tls
$ kubectl create secret tls \
linkerd-tap-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# Wait a few moments and you'll see the certs got reloaded and tap is working again
time="2020-12-15T16:03:42Z" level=info msg="Updated certificate" addr=":8089" component=apiserver
```
Fixes #5257
This branch moves mc charts and CLI-level code to a new
top-level directory. None of the logic is changed.
Also, moves some common types into `/pkg` so that they
are accessible both to the main cli and extensions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Have webhooks refresh their certs automatically
Fixes partially #5272
In 2.9 we introduced the ability to provide the certs for `proxy-injector` and `sp-validator` through some external means like cert-manager, via the new helm setting `externalSecret`.
However, we forgot to have those services watch for changes in their secrets, so whenever they were rotated they would fail with a cert error, the only workaround being to restart those pods to pick up the new secrets.
This addresses that by first abstracting out `FsCredsWatcher` from the identity controller, which now lives under `pkg/tls`.
The webhook's logic in `launcher.go` no longer reads the certs before starting the https server; that moved into `server.go`, which, in a similar way to identity, receives events from `FsCredsWatcher` and updates `Server.cert`. We're leveraging `http.Server.TLSConfig.GetCertificate`, which allows us to provide a function that returns the current cert for every incoming request.
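A reduced sketch of that pattern (type and variable names assumed):
```go
package sketch

import (
	"crypto/tls"
	"net/http"
	"sync"
)

// certStore holds the most recently loaded certificate; FsCredsWatcher events
// would call Update when the files on disk change.
type certStore struct {
	mu   sync.RWMutex
	cert *tls.Certificate
}

func (s *certStore) Update(cert *tls.Certificate) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cert = cert
}

// newServer returns an HTTPS server that resolves the certificate on every
// handshake, so rotated certs are picked up without a restart.
func newServer(addr string, store *certStore) *http.Server {
	return &http.Server{
		Addr: addr,
		TLSConfig: &tls.Config{
			GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
				store.mu.RLock()
				defer store.mu.RUnlock()
				return store.cert, nil
			},
		},
	}
}
```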
### How to test
```bash
# Create some root cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca.crt ca.key \
--profile root-ca --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# configure injector's caBundle to be that root cert
$ cat > linkerd-overrides.yaml << EOF
proxyInjector:
  externalSecret: true
  caBundle: |
    < ca.crt contents>
EOF
# Install linkerd. The injector won't start until we create the secret below
$ bin/linkerd install --controller-log-level debug --config linkerd-overrides.yaml | k apply -f -
# Generate an intermediary cert with a short lifespan
step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# Create the secret using that intermediate cert
$ kubectl create secret tls \
linkerd-proxy-injector-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# start following the injector log
$ k -n linkerd logs -f -l linkerd.io/control-plane-component=proxy-injector -c proxy-injector
# Inject emojivoto. The pods should be injected normally
$ bin/linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -
# Wait about 5 minutes and delete a pod
$ k -n emojivoto delete po -l app=emoji-svc
# You'll see it won't be injected, and something like "remote error: tls: bad certificate" will appear in the injector logs.
# Regenerate the intermediate cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# Delete the secret and recreate it
$ k -n linkerd delete secret linkerd-proxy-injector-k8s-tls
$ kubectl create secret tls \
linkerd-proxy-injector-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# Wait a couple of minutes and you'll see some filesystem events in the injector log along with a "Certificate has been updated" entry
# Then delete the pod again and you'll see it gets injected this time
$ k -n emojivoto delete po -l app=emoji-svc
```
* Refactor webhook framework to allow webhook define their flags
Pulled the flag parsing logic out of `launcher.go` and moved it into the `Main` methods of the webhooks (under `controller/cmd/proxy-injector/main.go` and `controller/cmd/sp-validator/main.go`), so that individual webhooks themselves can define the flags they want to use.
Also no longer require that webhooks have cluster-wide access.
Finally, renamed the type `webhook.handlerFunc` to `webhook.Handler` so it can be exported. This will be used in the upcoming jaeger webhook.
* Remove dependency of linkerd-config for most control plane components
This PR removes the dependency on `linkerd-config` from most control
plane components by passing all of that information through CLI
flags. As most of these components require only a couple of flags, passing
them as flags is more helpful, as updates to the flags trigger a
rollout, unlike a ConfigMap update.
This does not update the proxy-injector, as it needs a lot more data
and mounting `linkerd-config` is better there.
Fixes #4191 #4993
This bumps Kubernetes client-go to the latest v0.19.2 (we had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5
This consists of the following changes:
- Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD.
- Bump all k8s related dependencies to v0.19.2
- Generate CRD types, client code using the latest k8s.io/code-generator
- Use context.Context as the first argument, in all code paths that touch the k8s client-go interface
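For example, client-go calls that previously took only a name and options now take a context first (shown below with assumed helper names):
```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getPod shows the client-go v0.19 signature: context.Context is now the
// first argument on API calls.
func getPod(ctx context.Context, client kubernetes.Interface, ns, name string) (*corev1.Pod, error) {
	return client.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
}
```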
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
When the service-mirror component can't reach the target's k8s API, the goroutine blocks and can't be unblocked.
This was happening specifically in the case of the multicluster integration test (still to be pushed), where the source and target clusters are created in quick succession and the target's API service doesn't always have time to be exposed before being requested by the service mirror.
The fix consists of no longer having restartClusterWatcher be side-effecting; it now returns an error instead. If that error is not nil, the link watcher is stopped and reset after 10 seconds.
All of the code for the service mirror controller lives in the `linkerd/linkerd2/controller/cmd` package. It is typical for control plane components to only have a `main.go` entrypoint in the cmd package. This can sometimes make it hard to find the service mirror code since I wouldn't expect it to be in the cmd package.
We move the majority of the code to a dedicated controller package, leaving only main.go in the cmd package. This is purely organizational; no behavior change is expected.
Signed-off-by: Alex Leong <alex@buoyant.io>
## What/How
@adleong pointed out in #4780 that when enabling slices during an upgrade, the new value does not persist in the `linkerd-config` ConfigMap. I took a closer look and it seems that we were never overwriting the values in case they were different.
* To fix this, I added an if block when validating and building the upgrade options -- if the current flag value differs from what we have in the ConfigMap, then change the ConfigMap value.
* When doing so, I made sure to check that if the cluster does not support `EndpointSlices` yet the flag is set to true, we will error out. This is done similarly (largely copy & pasted) to what's in the install part.
* Additionally, I have noticed that the helm ConfigMap template stored the flag value under `enableEndpointSlices` field name. I assume this was not changed in the initial PR to reflect the changes made in the protocol buffer. The API (and thus the CLI) uses the field name `endpointSliceEnabled` instead. I have changed the config template so that helm installations will use the same field, which can then be used in the destination service or other components that may implement slice support in the future.
Signed-off-by: Matei David <matei.david.35@gmail.com>
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).
The misspellings have been reported at aaf440489e (commitcomment-41423663)
The action reports that the changes in this PR would make it happy: 5b82c6c5ca
Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Fixes #4774
When a service mirror controller is unable to connect to the target cluster's API, the service mirror controller crashes with the error that it has failed to sync caches. This error lacks the necessary detail to debug the situation. Unfortunately, client-go does not surface more useful information about why the caches failed to sync.
To make this more debuggable we do a couple things:
1. When creating the target cluster api client, we eagerly issue a server version check to test the connection. If the connection fails, the service-mirror-controller logs now look like this:
```
time="2020-07-30T23:53:31Z" level=info msg="Got updated link broken: {Name:broken Namespace:linkerd-multicluster TargetClusterName:broken TargetClusterDomain:cluster.local TargetClusterLinkerdNamespace:linkerd ClusterCredentialsSecret:cluster-credentials-broken GatewayAddress:35.230.81.215 GatewayPort:4143 GatewayIdentity:linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local ProbeSpec:ProbeSpec: {path: /health, port: 4181, period: 3s} Selector:{MatchLabels:map[] MatchExpressions:[{Key:mirror.linkerd.io/exported Operator:Exists Values:[]}]}}"
time="2020-07-30T23:54:01Z" level=error msg="Unable to create cluster watcher: cannot connect to api for target cluster remote: Get \"https://36.199.152.138/version?timeout=32s\": dial tcp 36.199.152.138:443: i/o timeout"
```
This error also no longer causes the service mirror controller to crash. Updating the Link resource will cause the service mirror controller to reload the credentials and try again.
2. We rearrange the checks in `linkerd check --multicluster` to perform the target API connectivity checks before the service mirror controller checks. This means that we can validate the target cluster API connection even if the service mirror controller is not healthy. We also add a server version check here to quickly determine if the connection is healthy. Sample check output:
```
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
* broken
W0730 16:52:05.620806 36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc
× remote cluster access credentials are valid
* failed to connect to API for cluster: [broken]: Get "https://36.199.152.138/version?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
see https://linkerd.io/checks/#l5d-smc-target-clusters-access for hints
W0730 16:52:35.645499 36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc
× clusters share trust anchors
Problematic clusters:
* broken: unable to fetch anchors: Get "https://36.199.152.138/api/v1/namespaces/linkerd/configmaps/linkerd-config?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
see https://linkerd.io/checks/#l5d-multicluster-clusters-share-anchors for hints
√ service mirror controller has required permissions
* broken
√ service mirror controllers are running
* broken
× all gateway mirrors are healthy
wrong number of (0) gateway metrics entries for probe-gateway-broken.linkerd-multicluster
see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints
√ all mirror services have endpoints
‼ all mirror services are part of a Link
mirror service voting-svc-gke.emojivoto is not part of any Link
see https://linkerd.io/checks/#l5d-multicluster-orphaned-services for hints
```
Some logs from the underlying go network libraries sneak into the output which is kinda gross but I don't think it interferes too much with being able to understand what's going on.
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #4707
In order to remove a multicluster link, we add a `linkerd multicluster unlink` command which produces the yaml necessary to delete all of the resources associated with a `linkerd multicluster link`. These are:
* the link resource
* the service mirror controller deployment
* the service mirror controller's RBAC
* the probe gateway mirror for this link
* all mirror services for this link
This command follows the same pattern as the `linkerd uninstall` command in that its output is expected to be piped to `kubectl delete`. The typical usage of this command is:
```
linkerd --context=source multicluster unlink --cluster-name=foo | kubectl --context=source delete -f -
```
This change also fixes the shutdown lifecycle of the service mirror controller by properly having it listen for the shutdown signal and exit its main loop.
A few alternative designs were considered:
I investigated using owner references as suggested [here](https://github.com/linkerd/linkerd2/issues/4707#issuecomment-653494591) but it turns out that owner references must refer to resources in the same namespace (or to cluster scoped resources). This was not feasible here because a service mirror controller can create mirror services in many different namespaces.
I also considered having the service mirror controller delete the mirror services that it created during its own shutdown. However, this could lead to scenarios where the controller is killed before it finishes deleting the services that it created. It seemed more reliable to have all the deletions happen from `kubectl delete`. Since this is the case, we avoid having the service mirror controller delete mirror services, even when the link is deleted, to avoid the race condition where the controller and CLI both attempt to delete the same mirror services and one of them fails with a potentially alarming error message.
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR moves the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in https://github.com/linkerd/rfc/pull/31. For fuller context, please see that RFC.
Basic multicluster functionality works here including:
* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveness, and probe latencies.
* The `linkerd check` multicluster checks have been updated for the new architecture. Several checks have been rendered obsolete by the new architecture and have been removed.
The following are known issues requiring further work:
* the service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror. It does not yet support configuring a label selector.
* an unlink command is needed for removing multicluster links: see https://github.com/linkerd/linkerd2/issues/4707
* an mc uninstall command is needed for uninstalling the multicluster addon: see https://github.com/linkerd/linkerd2/issues/4708
Signed-off-by: Alex Leong <alex@buoyant.io>
* Removes/Relaxes prometheus related checks
Now that prometheus is an add-on, there can be cases where prometheus is
disabled, in which case the check should show a warning but not fail. This
decouples the tight dependency.
This changes the following checks:
- Removes serviceAccount and pod checks in the CLI.
- Relaxes `linkerd-api` checks to only check for prometheus access when
the URL is not empty. This should work seamlessly with external
prometheus as that URL will be passed and it performs the same
check.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
EndpointSlices have been made opt-in due to their experimental nature. This PR
introduces a new install flag 'enableEndpointSlices' that will allow adopters to
specify in their cli install or helm install step whether they would like to
use endpointslices as a resource in the destination service, instead of the
endpoints k8s resource.
Signed-off-by: Matei David <matei.david.35@gmail.com>
Introduce support for the EndpointSlice k8s resource (k8s v1.16+) in the destination service.
Through this PR, in the EndpointsWatcher, there will be a dedicated informer for EndpointSlice;
the informer cannot run at the same time as the Endpoints resource informer. The main difference
is that EndpointSlices have a one-to-many relationship with a service; they provide better performance,
dual-stack addresses, and more. EndpointSlice support also implies service topology and other k8s-related features.
Validated and tested manually, as well as with dedicated unit tests.
Closes #4501
Signed-off-by: Matei David <matei.david.35@gmail.com>
Fixes #4582
When a target cluster gateway is exposed as a hostname rather than with a fixed IP address, the service mirror controller fails to create mirror services and gateway mirrors for that gateway. This is because we only look at the IP field of the gateway service.
We make two changes to address this problem:
First, when extracting the gateway spec from a gateway that has a hostname instead of an IP address, we do a DNS lookup to resolve that hostname into an IP address to use in the mirror service endpoints and gateway mirror endpoints.
Second, we schedule a repair job on a regular interval (1 minute) to update these endpoint objects. This has the effect of re-resolving the DNS names every minute to pick up any changes in DNS resolution.
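The resolution step is essentially the following (a sketch; the surrounding repair loop and endpoints construction are omitted and names are assumed):
```go
package sketch

import (
	"fmt"
	"net"

	corev1 "k8s.io/api/core/v1"
)

// resolveGatewayAddresses turns the gateway's hostname into endpoint
// addresses; the periodic repair job would re-run this to pick up DNS changes.
func resolveGatewayAddresses(hostname string) ([]corev1.EndpointAddress, error) {
	ips, err := net.LookupIP(hostname)
	if err != nil {
		return nil, fmt.Errorf("failed to resolve gateway hostname %s: %w", hostname, err)
	}
	addrs := make([]corev1.EndpointAddress, 0, len(ips))
	for _, ip := range ips {
		addrs = append(addrs, corev1.EndpointAddress{IP: ip.String()})
	}
	return addrs, nil
}
```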
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR just modifies the log levels on the probe and cluster watchers
to emit in INFO what they would emit in DEBUG. I think it makes sense
as we need that information to track problems. The only difference is
that when probing gateways we only log if the probe attempt was
unsuccessful.
Fix #4546
When the identity annotation on a gateway service is updated, this change is not propagated to the mirror gateway endpoints object.
This is because the annotations are updated on the wrong object and the changes are lost.
Signed-off-by: Alex Leong <alex@buoyant.io>