* viz: add data-plane and prometheus healthchecks
Fixes #5325
This branch adds the remaining healthchecks for the viz extension,
i.e.:
- Data-plane metrics check in Prometheus
- `--proxy` mode, which also checks for tap injections based
on annotations.
For this, the following changes were needed:
- Made `Category.ID` public so that categories can be toggled via
`--proxy`
- Made the tap env key a field so that it can be re-used in
checks
- Simplified `viz.NewHealthChecker` by removing the need to
pass category IDs; callers now use
`hc.appendCategories` directly to add the
required categories. This is possible by dividing `vizCategories`
into separate functions.
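A minimal sketch of the caller-side pattern this enables (all names are illustrative, not the exact linkerd2 API):
```go
package main

import "fmt"

// Category and HealthChecker are simplified stand-ins for the
// healthcheck types described above.
type Category struct{ ID string }

type HealthChecker struct{ categories []Category }

// appendCategories lets each caller compose exactly the categories it
// needs, instead of passing category IDs into a constructor.
func (hc *HealthChecker) appendCategories(cats ...Category) {
	hc.categories = append(hc.categories, cats...)
}

// vizCategories split into separate functions, as described above.
func vizDataPlaneCategory() Category { return Category{ID: "linkerd-viz-data-plane"} }
func vizTapCategory() Category       { return Category{ID: "linkerd-viz-tap"} }

func main() {
	proxy := true // e.g. set by the --proxy flag
	hc := &HealthChecker{}
	hc.appendCategories(vizDataPlaneCategory())
	if proxy {
		hc.appendCategories(vizTapCategory())
	}
	fmt.Println(hc.categories)
}
```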
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
*Closes #5484*
### Changes
---
*Overview*:
* Update golden files and make necessary spec changes
* Update test files for viz
* Add v1 to healthcheck and uninstall
* Fix link-crd clusterDomain field validation
- To update to v1, I had to change the CRD schemas to be version-based (i.e. each version has to declare its own schema). I noticed an error in the link-crd (`targetClusterDomain` was `targetDomainName`). Also, `additionalPrinterColumns` is now a version-dependent field.
- For `admissionregistration` resources I had to add an additional `admissionReviewVersions` field -- I included `v1` and `v1beta1`.
- In `healthcheck.go` and `resources.go` (used by `uninstall`) I had to make some changes to the client-go versions (i.e. from `v1beta1` to `v1` for admissionreg and apiextension) so that we don't see any warning messages when uninstalling or when we do any install checks.
I tested against different CLI and k8s versions to have a bit more confidence in the changes (in addition to the automated tests); I hope the cases below are enough, if not let me know and I can test further.
### Tests
Linkerd local build CLI + k8s 1.19+
`install/check/mc-check/mc-install/mc-link/viz-install/viz-check/uninstall/`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2+k3s1", GitCommit:"1d4adb0301b9a63ceec8cabb11b309e061f43d5f", GitTreeState:"clean", BuildDate:"2021-01-14T23:52:37Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
$ bin/linkerd version
Client version: git-b0fd2ec8
Server version: unavailable
$ bin/linkerd install | kubectl apply -f -
- no errors, no version warnings -
$ bin/linkerd check --expected-version git-b0fd2ec8
Status check results are :tick:
# MC
$ bin/linkerd mc install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd mc check
Status check results are :tick:
$ bin/linkerd mc link foo | k apply -f - # test crd creation
# had a validation error here because the schema had targetDomainName instead of targetClusterDomain
# changed, rebuilt cli, re-installed mc, tried command again
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created
...
# VIZ
$ bin/linkerd viz install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd viz check
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd uninstall | k delete -f -
- no errors, no version warnings -
```
Linkerd local build CLI + k8s 1.17
`check-pre/install/mc-check/mc-install/mc-link/viz-install/viz-check`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.17-rc1+k3s1", GitCommit:"e8c9484078bc59f2cd04f4018b095407758073f5", GitTreeState:"clean", BuildDate:"2021-01-14T06:20:56Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
$ bin/linkerd version
Client version: git-3d2d4df1 # made changes to link-crd after prev test case
Server version: unavailable
$ bin/linkerd check --pre --expected-version git-3d2d4df1
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd check --expected-version git-3d2d4df1
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd mc install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd mc check
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd mc link --cluster-name foo | k apply -f -
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created
# VIZ
$ bin/linkerd viz install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd viz check
- no errors, no version warnings -
- hangs indefinitely after the 'linkerd-viz can talk to Kubernetes' check
```
Linkerd edge (21.1.3) CLI + k8s 1.17 (already installed)
`check`
```
$ linkerd version
Client version: edge-21.1.3
Server version: git-3d2d4df1
$ linkerd check
- no errors -
- warnings: mismatch between cli & control plane, control plane not up to date (both expected) -
Status check results are :tick:
```
Linkerd stable (2.9.2) CLI + k8s 1.17 (already installed)
`check/uninstall`
```
$ linkerd version
Client version: stable-2.9.2
Server version: git-3d2d4df1
$ linkerd check
× control plane ClusterRoles exist
missing ClusterRoles: linkerd-linkerd-tap
see https://linkerd.io/checks/#l5d-existence-cr for hints
Status check results are ×
# viz wasn't installed, hence the error, installing viz didn't help since
# the res is named `viz-tap` now
# moving to uninstall
$ linkerd uninstall | k delete -f -
- no warnings, no errors -
```
_Note_: I used `go test ./cli/cmd/... --generate` which is why there are so many changes 😨
Signed-off-by: Matei David <matei.david.35@gmail.com>
This branch cleans up some unnecessary logic, making the check
logic similar to that of other extensions, namely viz.
Includes the following cleanups:
- Remove the `namespace` flag in the jaeger CLI, make the
namespace-fetching logic dynamic, and use it in check and dashboard.
- Use `hc.KubeAPIClient` instead of creating our own in jaeger check.
- Move injection checks up before we run the readiness checks
This change adds a new extension namespace existence check for
jaeger.
Also updates integration tests to run the check commands.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it is still used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.
* Added chart templates for new viz linkerd-metrics-api pod
* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz's healthcheck can now be used to retrieve viz API clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface (see the sketch below).
- Refactored the data plane checks so they don't rely on calling `ListPods`.
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz API client through viz's healthcheck.
* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TODO: have it automatically discover viz and pull Prometheus' endpoint (#5352).
* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.
* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.
* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.
* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
- Updated `endpoints.go` according to new API interface name.
- Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
- `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
- Added `metrics-api` to list of docker images to build in actions workflows.
- In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).
* Add retry to 'tap API service is running' check
* mc check shouldn't err when viz is not available. Also properly set the logger in `multicluster/cmd/root.go` so that it displays messages when `--verbose` is used
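The `healthcheck.Runner` abstraction mentioned above might look roughly like this (a sketch; the method set is an assumption, not the exact interface):
```go
package main

import "fmt"

// CheckObserver receives individual check results.
type CheckObserver func(result string)

// Runner abstracts the core and viz healthcheckers so commands can
// drive either one uniformly.
type Runner interface {
	RunChecks(observer CheckObserver) bool
}

// coreChecker stands in for pkg/healthcheck.
type coreChecker struct{}

func (coreChecker) RunChecks(obs CheckObserver) bool {
	obs("linkerd-existence: ok")
	return true
}

// vizChecker stands in for viz/pkg/healthcheck: it wraps the core
// checker while adding viz-specific state and checks.
type vizChecker struct{ core Runner }

func (v vizChecker) RunChecks(obs CheckObserver) bool {
	ok := v.core.RunChecks(obs)
	obs("linkerd-viz: ok")
	return ok
}

func main() {
	var hc Runner = vizChecker{core: coreChecker{}}
	hc.RunChecks(func(r string) { fmt.Println(r) })
}
```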
## What this changes
This adds a tap-injector component to the `linkerd-viz` extension which is
responsible for adding the tap service name environment variable to the Linkerd
proxy container.
If a pod does not have a Linkerd proxy, no action is taken. If tap is disabled
via annotation on the pod or the namespace, no action is taken.
This also removes the environment variable for explicitly disabling tap.
Tap status for a proxy is now determined only by the presence or
absence of the tap service name environment variable.
Closes #5326
## How it changes
### tap-injector
The tap-injector component determines if `LINKERD2_PROXY_TAP_SVC_NAME` should be
added to a pod's Linkerd proxy container environment. If the pod satisfies the
following, it is added:
- The pod has a Linkerd proxy container
- The pod has not already been mutated
- Tap is not disabled via annotation on the pod or the pod's namespace
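A minimal sketch of that decision, assuming a hypothetical `tap.linkerd.io/disabled` annotation key (the real key may differ):
```go
package main

import "fmt"

const tapDisabledAnnotation = "tap.linkerd.io/disabled" // assumed key

// workload captures just the inputs the injector needs to decide
// whether to add LINKERD2_PROXY_TAP_SVC_NAME to the proxy container.
type workload struct {
	hasProxyContainer bool
	alreadyMutated    bool
	podAnnotations    map[string]string
	nsAnnotations     map[string]string
}

func shouldAddTapEnv(w workload) bool {
	if !w.hasProxyContainer || w.alreadyMutated {
		return false
	}
	if w.podAnnotations[tapDisabledAnnotation] == "true" ||
		w.nsAnnotations[tapDisabledAnnotation] == "true" {
		return false
	}
	return true
}

func main() {
	w := workload{hasProxyContainer: true}
	fmt.Println(shouldAddTapEnv(w)) // true: patch in the tap env var
}
```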
### LINKERD2_PROXY_TAP_DISABLED
Now that tap is an extension of Linkerd and not a core component, it no longer
made sense to explicitly enable or disable tap through this Linkerd proxy
environment variable. The status of tap is now determined only by whether the
tap-injector adds or does not add the `LINKERD2_PROXY_TAP_SVC_NAME` environment
variable.
### controller image
The tap-injector has been added to the controller image's set of startup
commands, which determine what the container will do in the cluster.
As a follow-up, I think splitting out the `tap` and `tap-injector` commands from
the controller image into a linkerd-viz image (or something like that) makes
sense.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Currently, the linkerd jaeger check runs multiple checks, but it doesn't have a check confirming that the jaeger injector is running.
This commit adds the required check to confirm the running state of the jaeger injector pod.
Fixes #5495
Signed-off-by: Yashvardhan Kukreja <yash.kukreja.98@gmail.com>
* viz: add check sub-command
This adds a new `viz check` cmd performing checks for the resources
in the linkerd-viz extension. Checks include resource existence and
the health of resources, certs, etc.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Separate observability API
Closes #5312
This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.
- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.proto`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto` except for the `Version` stuff.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.
The other changes in the go files are just changes in the imports to point to the new protobufs.
Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`
* viz: move sub-cmds using viz extension under viz cmd
Fixes #5327, #5524
This branch moves the following commands, under the `linkerd viz`
cmd as they use the viz extension to perform the job.
- dashboard
- edges
- routes
- stat
- tap
- top
This also creates a new pkg `public-api` which facilitates
interaction and communication with the public-api, to be used
across extensions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
The last viz refactoring removed support for modifying the k8s resources
used by the proxies injected into the control plane components (values
like `tapProxyResources`, `prometheus.proxy.resources`, etc).
This adds them back, using a consistent naming: `tap.proxy.resources`,
`dashboard.proxy.resources`, etc.
Also fixes the tap helm template that was making reference to
`.Values.tapResources` instead of `.Values.tap.resources`.
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* viz: add render golden tests
This branch adds golden tests for the viz install. This would be
useful to track changes in render as more changes are added.
This also moves the common code that is used across extensions
to generate diffs into `testutil`, so that it can be used widely.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Subject
Related to issue #5457
Problem
Linkerd only reports the local port and the remote port whenever port-forwarding fails.
Linkerd could instead print the namespace and pod name when port-forwarding fails, rather than forcing users to collate that information themselves.
Solution
Linkerd needs to print the namespace and the pod name.
- [x] Add two new string variables, `namespace` and `podName`, in `struct PortForward`
- [x] Assign the values to these variables when a new instance is created in `func NewPortForward()`
- [x] Format the error returned by `ForwardPorts()` from client-go using `fmt.Errorf()`, adding `namespace` and `podName` as a suffix, and return it (the `run()` function propagates the errors that occurred while port-forwarding)
The error is returned by `ForwardPorts()` from client-go: https://github.com/kubernetes/client-go/blob/master/tools/portforward/portforward.go#L188
Fixes #5457
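A minimal sketch of the error wrapping this describes (field names follow the checklist; the `forward` func stands in for client-go's `ForwardPorts()`):
```go
package main

import (
	"errors"
	"fmt"
)

// PortForward mirrors the struct described above; only the fields
// relevant to error reporting are shown.
type PortForward struct {
	namespace string
	podName   string
	forward   func() error // stand-in for client-go's ForwardPorts()
}

// run propagates port-forwarding errors, annotated with namespace/pod.
func (pf *PortForward) run() error {
	if err := pf.forward(); err != nil {
		return fmt.Errorf("%w for %s/%s", err, pf.namespace, pf.podName)
	}
	return nil
}

func main() {
	pf := &PortForward{
		namespace: "linkerd",
		podName:   "linkerd-web-abc123",
		forward:   func() error { return errors.New("connection refused") },
	}
	fmt.Println(pf.run()) // connection refused for linkerd/linkerd-web-abc123
}
```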
Signed-off-by: Piyush Singariya <piyushsingariya@gmail.com>
This PR adds an `--ha` flag for `viz install`, which overlays the
viz chart's `values-ha.yaml`. This PR adds these functions
in `pkg/charts` so that the same can be re-used elsewhere.
## Testing
```bash
❯ ./bin/go-run cli viz install | grep 1024
❯ ./bin/go-run cli viz install --ha | grep 1024
memory: "1024Mi"
❯ ./bin/go-run cli viz install --ha --set grafana.resources.memory.limit=1023Mi | grep 1023
memory: "1023Mi"
```
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* viz: add a retry check for core control-plane pods before install
This commit adds a new check so that `viz install` waits until
the control-plane pods are up. For this to work, the `prometheus`
sub-system check has been removed from the control-plane self-check,
as we re-use healthchecks to perform this.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Currently, each new instance of the `Checker` type has to manually
set all the fields via `NewChecker()`, even though most
use-cases are fine with the defaults.
This branch makes this simpler by using the builder pattern, so
that users of `Checker` can override the defaults with
specific field methods when needed, thus simplifying the code.
This also removes some of the methods that were specific to tests,
and replaces them with the currently used ones.
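A minimal sketch of the resulting builder-style API (method names are illustrative, not the exact linkerd2 code):
```go
package main

import (
	"fmt"
	"time"
)

type Checker struct {
	description   string
	hintAnchor    string
	retryDeadline time.Time
	fatal         bool
}

// NewChecker returns a Checker with sensible defaults; most call
// sites need nothing more.
func NewChecker(description string) *Checker {
	return &Checker{description: description}
}

func (c *Checker) WithHintAnchor(anchor string) *Checker {
	c.hintAnchor = anchor
	return c
}

func (c *Checker) WithRetryDeadline(t time.Time) *Checker {
	c.retryDeadline = t
	return c
}

func (c *Checker) Fatal() *Checker {
	c.fatal = true
	return c
}

func main() {
	// Callers override only what they need.
	c := NewChecker("control plane pods are ready").
		WithHintAnchor("l5d-api-control-ready").
		WithRetryDeadline(time.Now().Add(5 * time.Minute)).
		Fatal()
	fmt.Println(c.description, c.fatal)
}
```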
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The name `proxy-mutator` is too generic: several different linkerd extensions will have mutating webhooks which mutate the proxy sidecar, and since the MutatingWebhookConfiguration resource is cluster-scoped, each one needs a unique name.
We use the `jaeger-injector` name instead. This gives us a pattern to follow for future webhooks as well (e.g. `tap-injector` etc.)
Signed-off-by: Alex Leong <alex@buoyant.io>
### What
When overriding the proxy version using annotations, the respective annotation displays the wrong information (`linkerd.io/proxy-version`). This is a simple fix to display the correct version for the annotation; instead of using the proxy image from the config for the annotation's value, we take it from the overridden values instead.
Based on the discussion from #5338 I understood that when the image is updated it is reflected in the container image version but not the annotation. Alex's proposed fix seems to work like a charm so I can't really take credit for anything. I have attached below some before/after snippets of the deployments & pods. If there are any additional changes required (or if I misunderstood anything) let me know and I'll gladly get it sorted :)
#### Tests
---
Didn't add any new tests; I built the images and just verified that the annotation displays the correct version.
To test:
* I first injected an emojivoto-web deployment, its respective pod had the proxy version set to `dev-...`;
* I then re-injected the same deployment using a different proxy version and restarted the pods, its respective pod displayed the expected annotation value `stable-2.9.0` (whereas before it would have still been `dev-...`)
`Before`
```
# Deployment
apiVersion: apps/v1
kind: Deployment
...
template:
metadata:
annotations:
kubectl.kubernetes.io/restartedAt: "2021-01-04T12:41:47Z"
linkerd.io/inject: enabled
# Pod
apiVersion: v1
kind: Pod
metadata:
annotations:
kubectl.kubernetes.io/restartedAt: "2021-01-04T12:41:47Z"
linkerd.io/created-by: linkerd/proxy-injector dev-8d506317-madavid
linkerd.io/identity-mode: default
linkerd.io/inject: enabled
linkerd.io/proxy-version: dev-8d506317-madavid
```
`After`
```sh
$ linkerd inject --proxy-version stable-2.9.0 - | kubectl apply -f -
# Deployment
apiVersion: apps/v1
kind: Deployment
...
template:
metadata:
annotations:
config.linkerd.io/proxy-version: stable-2.9.0
# Pod
apiVersion: v1
kind: Pod
metadata:
annotations:
config.linkerd.io/proxy-version: stable-2.9.0
kubectl.kubernetes.io/restartedAt: "2021-01-04T12:41:47Z"
linkerd.io/created-by: linkerd/proxy-injector dev-8d506317-madavid
linkerd.io/identity-mode: default
linkerd.io/inject: enabled
linkerd.io/proxy-version: stable-2.9.0
# linkerd.io/proxy-version changed after injection and now matches the config (and the proxy img)
```
Fixes #5338
Signed-off-by: Matei David <matei.david.35@gmail.com>
Currently, the CA bundle in the config value `global.IdentityTrustAnchorsPEM` must not contain more than one certificate when the secret type is set to `kubernetes.io/tls`, or the `linkerd check` command will fail.
This change removes the comparison between the trust anchors configured in the linkerd config map and the contents of the `ca.crt` field of the identity issuer K8s secret.
This is an alternative to MR #5396, which I will close as a result of the discussion with @adleong
Fixes #5292
Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>
* viz: move some components into linkerd-viz
This branch moves the grafana, prometheus, web, and tap components
into a new viz chart, following the same extension model that
multicluster and jaeger follow.
The components in viz are not injected at install time, and
will go through the injector. `viz install` does not have any
CLI flags to customize the install directly, but instead follows the Helm
way of customization using flags such as
`set`, `set-string`, `values`, and `set-files`.
**Changes Include**
- Move `grafana`, `prometheus`, `web`, `tap` templates into viz extension.
- Remove all add-on related charts, logic and tests w.r.t. CLI & Helm.
- Clean up `linkerd2/values.go` & `linkerd2/values.yaml` to not contain
fields related to viz components.
- Update `linkerd check` Healthchecks to not check for viz components.
- Create a new top level `viz` directory with CLI logic and Helm charts.
- Clean up fields in `viz/Values.yaml` to follow the `<component>.<property>`
model, e.g. `prometheus.resources`, `dashboard.image.tag`, etc., so that it is
consistent everywhere.
**Testing**
```bash
# Install the Core Linkerd Installation
./bin/linkerd install | k apply -f -
# Wait for the proxy-injector to be ready
# Install the Viz Extension
./bin/linkerd cli viz install | k apply -f -
# Customized Install
./bin/linkerd cli viz install --set prometheus.enabled=false | k apply -f -
```
What is not included in this PR:
- Move of Controller from core install into the viz extension.
- Simplification and refactoring of the core chart, i.e. removing `.global`, etc.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
## What
This change moves the `linkerd check --multicluster` functionality under its
own multicluster subcommand: `linkerd multicluster check`.
There should be no functional changes as a result of this change. `linkerd
check` no longer checks for anything multicluster related and the
`--multicluster` flag has been removed.
## Why
Closes #5208
The bulk of these changes are moving all the multicluster checks from
`pkg/healthcheck` into the multicluster package.
Doing this completely separates it from core Linkerd. It still uses
`pkg/healthcheck` when possible, but anything that is used only by `multicluster
check` has been moved.
**Note that the `kubernetes-api` and `linkerd-existence` checks are run.**
These checks are required for setting up the Linkerd health checker. They set
the health checker's `kubeAPI`, `linkerdConfig`, and `apiClient` fields.
These could be set manually so that the only check the user sees is
`linkerd-multicluster`, but I chose not to do this.
If any of the setting functions errors, it would just tell the user to run
`linkerd check` and ensure the installation is correct. I find the user error
handling to be better by including these required checks since they should be
run in the first place.
## How to test
Installing Linkerd and multicluster should result in a basic check output:
```
$ bin/linkerd install |kubectl apply -f -
..
$ bin/linkerd check
..
$ bin/linkerd multicluster install |kubectl apply -f -
..
$ bin/linkerd multicluster check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-multicluster
--------------------
√ Link CRD exists
Status check results are √
```
After linking a cluster:
```
$ bin/linkerd multicluster check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
* k3d-y
√ remote cluster access credentials are valid
* k3d-y
√ clusters share trust anchors
* k3d-y
√ service mirror controller has required permissions
* k3d-y
√ service mirror controllers are running
* k3d-y
× all gateway mirrors are healthy
probe-gateway-k3d-y.linkerd-multicluster mirrored from cluster [k3d-y] has no endpoints
see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints
Status check results are ×
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
## Summary
This changes the destination service to start indicating whether a profile is an
opaque protocol or not.
Currently, profiles returned by the destination service are built by chaining
together updates coming from watching Profile and Traffic Split updates.
With this change, we now also watch updates to Opaque Port annotations on pods
and namespaces; if an update occurs this is now included in building a profile
update and is sent to the client.
## Details
Watching updates to Profiles and Traffic Splits is straightforward: we watch
those resources, and if an update occurs on one associated with a service we
care about, then the update is passed through.
For Opaque Ports this is a little different because it is an annotation on pods
or namespaces. To account for this, we watch the endpoints that we should care
about.
### When host is a Pod IP
When getting the profile for a Pod IP, we check for the opaque ports annotation
on the pod and the pod's namespace. If one is found, we mark the
profile as an opaque protocol if the requested port is in the annotation.
We do not subscribe for updates to this pod IP. The only update we really care
about is if the pod is deleted and this is already handled by the proxy.
### When host is a Service
When getting the profile for a Service, we subscribe for updates to the
endpoints of that service. For any ports set in the opaque ports annotation on
any of the pods, we check if the requested port is present.
Since the endpoints for a service can be added and removed, we do subscribe for
updates to the endpoints of the service.
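For illustration, a minimal sketch of the port-membership check, assuming the annotation value is a comma-separated port list (the real parser may handle more formats):
```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isOpaque reports whether port appears in an opaque-ports annotation
// value such as "25,587,3306".
func isOpaque(annotation string, port int) bool {
	for _, p := range strings.Split(annotation, ",") {
		if n, err := strconv.Atoi(strings.TrimSpace(p)); err == nil && n == port {
			return true
		}
	}
	return false
}

func main() {
	// e.g. the value of an opaque-ports annotation on a pod or namespace
	fmt.Println(isOpaque("25,587,3306", 3306)) // true
	fmt.Println(isOpaque("25,587,3306", 8080)) // false
}
```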
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #5385
## The problems
- `linkerd install --ha` isn't honoring flags
- `linkerd upgrade --ha` is overriding existing configs silently or failing with an error
- *Upgrading HA instances from before 2.9 to version 2.9.1 results in configs being overridden silently, or the upgrade fails with an error*
## The cause
The change in #5358 attempted to fix `linkerd install --ha`, which was only applying some of the `values-ha.yaml` defaults, by calling `charts.NewValues(true)` and merging that with the values built from `values.yaml` overridden by the flags. It turns out the `charts.NewValues()` implementation was itself merging against `values.yaml`, and as a result any flag was getting overridden by its default.
This also happened when doing `linkerd upgrade --ha` on an existing instance, which could result in silently overriding settings, or it could fail loudly, for example when upgrading a setup that has an external issuer (in this case the issuer cert can't be read during upgrade and an error occurs, as described in #5385).
Finally, doing `linkerd upgrade` (no --ha flag) on an HA install from before 2.9 results in configs getting overridden as well (silently or with an error), because in order to generate the `linkerd-config-overrides` secret, the original install flags are retrieved from `linkerd-config` via the `loadStoredValuesLegacy()` function, which then effectively ends up performing a `linkerd upgrade` with all the flags used for `linkerd install` and falls into the same trap as above.
## The fix
In `values.go` the faulty merging logic is no longer used, so now `NewValues()` only returns the default values from `values.yaml` and doesn't require an argument anymore. It calls `readDefaults()`, which now only returns the appropriate values depending on whether we're in HA mode or not.
There's a new function `MergeHAValues()` that merges `values-ha.yaml` into the current values (it doesn't look into `values.yaml` anymore), which is only used when processing the `--ha` flag in `options.go`.
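A rough sketch of the corrected ordering, using a generic map overlay (not the actual linkerd2 code): defaults first, then the HA overlay, then user flags last so they can never be clobbered:
```go
package main

import "fmt"

// merge recursively overlays src onto dst (a minimal stand-in for
// Helm-style value merging).
func merge(dst, src map[string]interface{}) {
	for k, v := range src {
		if sv, ok := v.(map[string]interface{}); ok {
			if dv, ok := dst[k].(map[string]interface{}); ok {
				merge(dv, sv)
				continue
			}
		}
		dst[k] = v
	}
}

func main() {
	defaults := map[string]interface{}{"logLevel": "info", "replicas": 1} // values.yaml
	haOverlay := map[string]interface{}{"replicas": 3}                    // values-ha.yaml
	flags := map[string]interface{}{"logLevel": "debug"}                  // user flags

	merge(defaults, haOverlay) // --ha overlay
	merge(defaults, flags)     // flags applied last, fixing the bug described above
	fmt.Println(defaults)      // map[logLevel:debug replicas:3]
}
```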
## How to test
To replicate the issue, try setting a custom value and check that it's not applied:
```bash
linkerd install --ha --controller-log-level debug | grep log.level
- -log-level=info
```
## Followup
This wasn't caught because we don't have HA integration tests. Now that our test infra is based on k3d, it should be easy to make such a test using a cluster with multiple nodes. Either that or issuing `linkerd install --ha` with additional configs and compare against a golden file.
Followup to #5282; fixes #5272 in its totality.
This follows the same pattern as the injector/sp-validator webhooks, leveraging `FsCredsWatcher` to watch for changes in the cert files.
To reuse code from the webhooks, we moved `updateCert()` to `creds_watcher.go`, and `run()` as well (which now is called `ProcessEvents()`).
The `TestNewAPIServer` test in `apiserver_test.go` was removed as it really was just testing two things: (1) that `apiServerAuth` doesn't error which is already covered in the following test, and (2) that the golib call `net.Listen("tcp", addr)` doesn't error, which we're not interested in testing here.
## How to test
To test that the injector/sp-validator functionality is still correct, you can refer to #5282
The steps below are similar, but focused towards the tap component:
```bash
# Create some root cert
$ step certificate create linkerd-tap.linkerd.svc ca.crt ca.key --profile root-ca --no-password --insecure
# configure tap's caBundle to be that root cert
$ cat > linkerd-overrides.yml << EOF
tap:
externalSecret: true
caBundle: |
< ca.crt contents>
EOF
# Install linkerd
$ bin/linkerd install --config linkerd-overrides.yml | k apply -f -
# Generate an intermediate cert with a short lifespan
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc
# Create the secret using that intermediate cert
$ kubectl create secret tls \
linkerd-tap-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# Rollout the tap pod for it to pick the new secret
$ k -n linkerd rollout restart deploy/linkerd-tap
# Tap should work
$ bin/linkerd tap -n linkerd deploy/linkerd-web
req id=0:0 proxy=in src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :method=GET :authority=10.42.0.11:9994 :path=/metrics
rsp id=0:0 proxy=in src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :status=200 latency=1779µs
end id=0:0 proxy=in src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true duration=65µs response-length=1709B
# Wait 5 minutes and rollout tap again
$ k -n linkerd rollout restart deploy/linkerd-tap
# You'll see in the logs that the cert expired:
$ k -n linkerd logs -f deploy/linkerd-tap tap
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45866: remote error: tls: bad certificate
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45870: remote error: tls: bad certificate
# Recreate the secret
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc
$ k -n linkerd delete secret linkerd-tap-k8s-tls
$ kubectl create secret tls \
linkerd-tap-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# Wait a few moments and you'll see the certs got reloaded and tap is working again
time="2020-12-15T16:03:42Z" level=info msg="Updated certificate" addr=":8089" component=apiserver
```
* jaeger: add check sub command
This adds a new `linkerd jaeger check` command that runs checks for the
jaeger extension, similar to the `linkerd check` cmd.
As jaeger is a separate package, it was a bit complex to make this work,
as not all types and fields from the healthcheck pkg are public; helper
funcs were used to mitigate this.
This has the following changes:
- Adds a new `check.go` file under the jaeger extension pkg
- Moves some commonly needed funcs and types from `cli/cmd/check.go`
and `pkg/healthcheck/health.go` into
`pkg/healthcheck/healthcheck_output.go`.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Add a `linkerd jaeger uninstall` command which prints the linkerd-jaeger extension resources so that they can be deleted. This is similar to the `linkerd uninstall` command.
```
> bin/linkerd jaeger uninstall | k delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-jaeger-linkerd-jaeger-proxy-mutator" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-jaeger-linkerd-jaeger-proxy-mutator" deleted
mutatingwebhookconfiguration.admissionregistration.k8s.io "linkerd-proxy-mutator-webhook-config" deleted
namespace "linkerd-jaeger" deleted
```
Signed-off-by: Alex Leong <alex@buoyant.io>
* upgrades: make webhooks restart if TLS creds are updated
Fixes #5231
Currently, we do not re-use the TLS certs during upgrades, which
means that the secrets are updated while the webhooks are still
paired with the older ones, causing the webhook requests to fail.
This can be solved by restarting the webhooks whenever there
is a change in the certs. This is done by storing the hash
of the `*-rbac` file, which contains the secrets, so that the
pod templates change whenever the certs are updated, forcing
a restart.
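A minimal sketch of the mechanism, assuming a checksum-style pod-template annotation (names are illustrative):
```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func main() {
	// Stand-in for the rendered *-rbac manifest, which embeds the
	// webhook's TLS secret.
	rbacManifest := []byte("kind: Secret\ndata:\n  tls.crt: ...")

	// Any change to the certs changes this hash...
	sum := sha256.Sum256(rbacManifest)

	// ...and stamping it onto the pod template changes the template,
	// which makes Kubernetes roll the webhook pods.
	annotations := map[string]string{
		"checksum/rbac": hex.EncodeToString(sum[:]),
	}
	fmt.Println(annotations)
}
```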
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* `linkerd install --ha` was only partially applying HA config
Fixes #5342
`values-ha.yml` contains the specific config for HA, but only the proxy
resources and controller replicas settings were applied. This PR adds
EnablePodAntiAffinity, WebhookFailurePolicy and all the resource settings
for the other CP pods.
Also the `--controller-replicas` flag is moved after the HA flags so it
can override the HA settings.
Finally, some comments no longer relevant were removed.
## How to test
Perform `linkerd install --ha` and make sure the values in
`values-ha.yml` are propagated correctly in the produced yaml.
## 2.9.1
After merging to `main`, this should be cherry-picked into the
`release/stable-2.9` branch.
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Now that tracing has been split out of the main control plane and into the linkerd-jaeger extension, we remove references to tracing from the main control plane including:
* removing the tracing components from the main control plane chart
* removing the tracing injection logic from the main proxy injector and inject CLI (these will be added back into the new injector in the linkerd-jaeger extension)
* removing tracing related checks (these will be added back into `linkerd jaeger check`)
* removing related tests
We also update the `--control-plane-tracing` flag to configure the control plane components to send traces to the linkerd-jaeger extension. To make sure this works even when the linkerd-jaeger extension is installed in a non-default namespace, we also add a `--control-plane-tracing-namespace` flag which can be used to change the namespace that the control plane components send traces to.
Note that for now, only the control plane components send traces; the proxies in the control plane do not. This is because the linkerd-jaeger injector is not yet available. However, this change adds the appropriate namespace annotations to the control plane namespace to configure the proxies to send traces to the linkerd-jaeger extension once the linkerd-jaeger injector is available.
I tested this by doing the following:
1. bin/linkerd install | kubectl apply -f -
1. bin/helm install jaeger jaeger/charts/jaeger
1. bin/linkerd upgrade --control-plane-tracing=true | kubectl apply -f -
1. kubectl -n linkerd-jaeger port-forward svc/jaeger 16686
1. open http://localhost:16686
1. see traces from the linkerd control plane
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #5257
This branch moves the mc charts and CLI-level code to a new
top-level directory. None of the logic is changed.
Also, moves some common types into `/pkg` so that they
are accessible both to the main cli and extensions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This change adds the `set`, `set-string`, `values`, `set-files`,
etc. flags, which are used to override the default values. This is
similar to Helm.
This also updates the install workflow to directly use Helm v3
pkg for chart loading and generation, without having to use
our chart type, etc.
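Roughly, driving the Helm v3 SDK directly looks like this (a sketch with an assumed chart path and override key, not the exact install code):
```go
package main

import (
	"fmt"

	"helm.sh/helm/v3/pkg/chart/loader"
	"helm.sh/helm/v3/pkg/chartutil"
	"helm.sh/helm/v3/pkg/engine"
)

func main() {
	// Load the chart and coalesce user overrides with its values.yaml.
	chart, err := loader.Load("charts/linkerd2") // assumed path
	if err != nil {
		panic(err)
	}
	overrides := map[string]interface{}{
		"prometheus": map[string]interface{}{"enabled": false},
	}
	vals, err := chartutil.CoalesceValues(chart, overrides)
	if err != nil {
		panic(err)
	}
	renderVals, err := chartutil.ToRenderValues(chart, vals,
		chartutil.ReleaseOptions{Name: "linkerd", Namespace: "linkerd"}, nil)
	if err != nil {
		panic(err)
	}
	// Render all templates in-process, as the install workflow does.
	manifests, err := engine.Render(chart, renderVals)
	if err != nil {
		panic(err)
	}
	for name := range manifests {
		fmt.Println(name)
	}
}
```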
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Have webhooks refresh their certs automatically
Partially fixes #5272
In 2.9 we introduced the ability to provide the certs for `proxy-injector` and `sp-validator` through some external means like cert-manager, via the new helm setting `externalSecret`.
We forgot however to have those services watch changes in their secrets, so whenever they were rotated they would fail with a cert error, with the only workaround being to restart those pods to pick the new secrets.
This addresses that by first abstracting out `FsCredsWatcher` from the identity controller, which now lives under `pkg/tls`.
The webhook's logic in `launcher.go` no longer reads the certs before starting the https server; that moved into `server.go`, which, in a similar way to identity, receives events from `FsCredsWatcher` and updates `Server.cert`. We're leveraging `http.Server.TLSConfig.GetCertificate`, which allows us to provide a function that returns the current cert for every incoming request.
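A minimal sketch of the `GetCertificate` approach (simplified; the real server wires the update path to `FsCredsWatcher` events):
```go
package main

import (
	"crypto/tls"
	"net/http"
	"sync"
)

// certStore holds the current serving cert; updateCert is what the
// creds watcher would call when the files on disk change.
type certStore struct {
	mu   sync.RWMutex
	cert *tls.Certificate
}

func (s *certStore) updateCert(c *tls.Certificate) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cert = c
}

func (s *certStore) getCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.cert, nil
}

func main() {
	store := &certStore{} // updateCert would be called on rotation
	srv := &http.Server{
		Addr: ":8443",
		TLSConfig: &tls.Config{
			// Consulted on every TLS handshake, so a rotated cert is
			// picked up without restarting the server.
			GetCertificate: store.getCertificate,
		},
	}
	// Cert/key paths are empty because GetCertificate supplies them.
	_ = srv.ListenAndServeTLS("", "")
}
```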
### How to test
```bash
# Create some root cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca.crt ca.key \
--profile root-ca --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# configure injector's caBundle to be that root cert
$ cat > linkerd-overrides.yaml << EOF
proxyInjector:
externalSecret: true
caBundle: |
< ca.crt contents>
EOF
# Install linkerd. The injector won't start until we create the secret below
$ bin/linkerd install --controller-log-level debug --config linkerd-overrides.yaml | k apply -f -
# Generate an intermediate cert with a short lifespan
step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# Create the secret using that intermediate cert
$ kubectl create secret tls \
linkerd-proxy-injector-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# start following the injector log
$ k -n linkerd logs -f -l linkerd.io/control-plane-component=proxy-injector -c proxy-injector
# Inject emojivoto. The pods should be injected normally
$ bin/linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -
# Wait about 5 minutes and delete a pod
$ k -n emojivoto delete po -l app=emoji-svc
# You'll see it won't be injected, and something like "remote error: tls: bad certificate" will appear in the injector logs.
# Regenerate the intermediate cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# Delete the secret and recreate it
$ k -n linkerd delete secret linkerd-proxy-injector-k8s-tls
$ kubectl create secret tls \
linkerd-proxy-injector-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# Wait a couple of minutes and you'll see some filesystem events in the injector log along with a "Certificate has been updated" entry
# Then delete the pod again and you'll see it gets injected this time
$ k -n emojivoto delete po -l app=emoji-svc
```
* extension: Add new jaeger binary
This branch adds a new jaeger binary project in the jaeger directory.
This follows the same logic as that of `linkerd install`. But as the
`linkerd install` VFS logic expects charts to be present in the `/charts`
directory, this command gets its own static pkg to generate its own
VFS for its chart.
This covers only the install part of the command.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes #4874
This branch upgrades the Helm SDK from v2 to v3 *without any functional
changes*, just replacing types with newer APIs.
This should not affect our current support for Helm v2 as we did not
change any of the underlying templates (which work with Helm v2). This
works because we did not use any of the APIs that read the Chart
metadata (which are the only ones changed from v2 to v3) and currently
manually load files and pass them into the SDK.
This PR provides a good starting point for adopting more of the newer
Helm v3 APIs, including for the upgrade workflow, allowing us to make
the Linkerd CLI simpler.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The CLI crashes if linkerd-config contains unexpected values.
Add a safe accessor that initializes an empty Global on first
access. Refactor all accesses to use the newly introduced accessor
(using gopls).
Add a test for linkerd-config data without Global.
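A minimal sketch of such a nil-safe accessor (type and field names are illustrative):
```go
package main

import "fmt"

type Global struct {
	ClusterDomain string
}

type Values struct {
	global *Global
}

// GetGlobal initializes an empty Global on first access, so callers
// never dereference nil when linkerd-config has unexpected contents.
func (v *Values) GetGlobal() *Global {
	if v.global == nil {
		v.global = &Global{}
	}
	return v.global
}

func main() {
	v := &Values{} // e.g. parsed from a linkerd-config missing Global
	fmt.Println(v.GetGlobal().ClusterDomain == "") // true, and no panic
}
```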
Fixes #5215
Co-authored-by: Itai Schwartz <yitai27@gmail.com>
Signed-off-by: Hod Bin Noon <bin.noon.hod@gmail.com>
As discussed in #5228, it is not correct for root and intermediate
certs to have a SAN. This PR updates the check to not verify the
intermediate issuer cert with the identity DNS name (which checks
against SAN, not CN, as the `verify` func is used to verify leaf certs,
not root and intermediate certs). This PR also avoids setting a SAN
field when generating certs in the `install` command.
Fixes #5228
This upgrades both the proxy-init image itself, and the go dependency on
proxy-init as a library, which fixes CNI in k3s and any host using
binaries coming from BusyBox, where `nsenter` has an
issue parsing arguments (see rancher/k3s#1434).
This change updates `FetchExternalIssuerData` to be more like
`FetchIssuerData` and return the expiry correctly.
This field is currently not used anywhere and is just done for
consistency purposes.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Per #5165, Kubernetes does not necessarily limit the proxy's access to
cores via `cgroups` when a CPU limit is set. As of #5168, the proxy now
supports a `LINKERD2_PROXY_CORES` environment configuration that
augments CPU detection from the host operating system.
This change modifies the proxy injector to ensure that this environment
is configured from the `Values.proxy.cores` Helm value, the
`config.linkerd.io/proxy-cpu-limit` annotation, and the `--proxy-cpu-limit`
install flag.
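For example, deriving the core count from a CPU limit might look like this (a sketch; the exact rounding used by the injector may differ):
```go
package main

import (
	"fmt"
	"math"

	"k8s.io/apimachinery/pkg/api/resource"
)

// coresFromLimit converts a Kubernetes CPU limit into a whole number
// of cores suitable for LINKERD2_PROXY_CORES.
func coresFromLimit(limit string) int64 {
	q := resource.MustParse(limit)
	return int64(math.Ceil(float64(q.MilliValue()) / 1000.0))
}

func main() {
	fmt.Println(coresFromLimit("1500m")) // 2
	fmt.Println(coresFromLimit("250m"))  // 1
}
```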
As discussed in #5167 & #5169, Kubernetes CPU limits are not necessarily
discoverable from within the pod. This means that the control plane
processes may allocate far more threads than can actually be used by the
process given its process limits.
This change removes the default CPU limits for all control plane
components. CPU limits may still be set via Helm configuration.
Now that the proxy can use more than one core, this behavior should be
enabled by default, even in HA mode.
This change modifies the default HA helm values to unset the cpu limit
for proxy containers.
With legacy upgrades, we can parse the cert and store the expiry
correctly instead of storing it as the default value, which could be a
problem when we use that field. Currently, we do not use this field and
hence it did not cause any problems.
On installs from the latest edges, this field is correctly set and works
as expected; thus, upgrades also have the right value.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* charts: Do not store .component in linkerd-config
This removes the `.component` fields from `Values.go` and also prevents them from being emitted into `linkerd-config` by attaching them into a temporary variable during injection.
This also simplifies the inbound and outbound skip-ports Helm logic and adds quotes to them.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes #5149
Before:
```
linkerd-webhooks-and-apisvc-tls
-------------------------------
× tap API server has valid cert
certificate will expire on 2020-10-28T20:22:32Z
see https://linkerd.io/checks/#l5d-tap-cert-valid for hints
```
After:
```
linkerd-webhooks-and-apisvc-tls
-------------------------------
√ tap API server has valid cert
‼ tap API server cert is valid for at least 60 days
certificate will expire on 2020-10-28T20:22:32Z
see https://linkerd.io/checks/#l5d-webhook-cert-not-expiring-soon for hints
√ proxy-injector webhook has valid cert
‼ proxy-injector cert is valid for at least 60 days
certificate will expire on 2020-10-29T18:17:03Z
see https://linkerd.io/checks/#l5d-webhook-cert-not-expiring-soon for hints
√ sp-validator webhook has valid cert
‼ sp-validator cert is valid for at least 60 days
certificate will expire on 2020-10-28T20:21:34Z
see https://linkerd.io/checks/#l5d-webhook-cert-not-expiring-soon for hints
```
Signed-off-by: Alex Leong <alex@buoyant.io>