linkerd2

Commit Graph

Author	SHA1	Message	Date
Alex Leong	da194f5dc3	Warn when webhook certificates near expiry (#5155 ) Fixes #5149 Before: ``` linkerd-webhooks-and-apisvc-tls ------------------------------- × tap API server has valid cert certificate will expire on 2020-10-28T20:22:32Z see https://linkerd.io/checks/#l5d-tap-cert-valid for hints ``` After: ``` linkerd-webhooks-and-apisvc-tls ------------------------------- √ tap API server has valid cert ‼ tap API server cert is valid for at least 60 days certificate will expire on 2020-10-28T20:22:32Z see https://linkerd.io/checks/#l5d-webhook-cert-not-expiring-soon for hints √ proxy-injector webhook has valid cert ‼ proxy-injector cert is valid for at least 60 days certificate will expire on 2020-10-29T18:17:03Z see https://linkerd.io/checks/#l5d-webhook-cert-not-expiring-soon for hints √ sp-validator webhook has valid cert ‼ sp-validator cert is valid for at least 60 days certificate will expire on 2020-10-28T20:21:34Z see https://linkerd.io/checks/#l5d-webhook-cert-not-expiring-soon for hints ``` Signed-off-by: Alex Leong <alex@buoyant.io>	2020-10-30 11:48:51 -07:00
Tarun Pothulapati	4c106e9c08	cli: make check return SkipError when there is no prometheus configured (#5150 ) Fixes #5143 The availability of prometheus is useful for some calls in public-api that the check uses. This change updates the ListPods in public-api to still return the pods even when prometheus is not configured. For a test that exclusively checks for prometheus metrics, we have a gate which checks if a prometheus is configured and skips it otherwise. Signed-off-by: Tarun Pothulapati tarunpothulapati@outlook.com	2020-10-29 19:57:11 +05:30
Alejandro Pedraza	177669b377	Remove code refs to controllerImageVersion (#5119 ) Followup to #5100 We had both `controllerImageVersion` and `global.controllerImageVersion` configs, but only the latter was taken into account in the chart templates, so this change removes all of its references.	2020-10-21 13:40:25 -05:00
Oliver Gould	84b1a826bd	Replace global.proxy.destinationGetNetworks with global.clusterNetworks (#5110 ) There is no longer a proxy config `DESTINATION_GET_NETWORKS`. Instead of reflecting this implementation in our values.yaml, this changes this variable to the more general `clusterNetworks` to emphasize its similarity to `clusterDomain` for the purposes of discovery.	2020-10-20 19:05:31 -07:00
Alex Leong	9701f1944e	Stop rendering addon config (#5078 ) The linkerd-addon-config is no longer used and can be safely removed. Signed-off-by: Alex Leong <alex@buoyant.io>	2020-10-16 11:07:51 -07:00
Tarun Pothulapati	2a5e7dba62	Handle grafana add-on config repair (#5059 ) * Handle grafana add-on config repair Fixes #5014 In Grafana Add-On, Default fields i.e `grafana.image.name`, `grafana.name` have been removed from `linkerd-config-addons` after `2.8.1`. Only overridden values are stored in `linkerd-config-addons` as of now. Hence, `grafana.image.name` has to be removed from `linkerd-config-addons` unless it is overridden so that updates to it can take place, especially the move from `gcr` to `ghcr`. This also removes the `grafana.name` field if it is set to the default, as it has been removed. This problem will not occur again even if we update default values, as default values are not stored in `linkerd-config-addons` anymore for all add-ons. Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-10-13 13:12:49 -07:00
Tarun Pothulapati	faf77798f0	Update check to use new linkerd-config.values (#5023 ) This branch updates the check functionality to read the new `linkerd-config.values` which contains the full Values struct showing the current state of the Linkerd installation. (being added in #5020 ) This is done by adding a new `FetchCurrentConfiguration` which first tries to get the latest, and if that fails, falls back to the older `linkerd-config` protobuf format. Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-10-01 11:19:25 -07:00
Lutz Behnke	de098cd52d	make api service secrets compatible to cert manager (#4737 ) Currently the secrets for the proxy-injector, sp-validator webhooks and tap API service are using the Opaque secret type and linkerd-specific field names. This makes it impossible to use cert-manager (https://github.com/jetstack/cert-manager) to provision and rotate the secrets for these services. This change converts the secrets defined in the linkerd2 helm charts and the controller to use the kubernetes.io/tls format instead. This format is used for secrets generated by cert-manager. Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>	2020-09-29 09:17:09 -05:00
Tarun Pothulapati	d0caaa86c4	Bump k8s client-go to v0.19.2 (#5002 ) Fixes #4191 #4993 This bumps Kubernetes client-go to the latest v0.19.2 (We had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5 This consists of the following changes: - Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD. - Bump all k8s related dependencies to v0.19.2 - Generate CRD types, client code using the latest k8s.io/code-generator - Use context.Context as the first argument, in all code paths that touch the k8s client-go interface Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-09-28 12:45:18 -05:00
Tarun Pothulapati	ecce5b91f6	tests: Add Calico CNI deep integration tests (#4952 ) * tests: Add new CNI deep integration tests Fixes #3944 This PR adds a new test, called cni-calico-deep which installs the Linkerd CNI plugin on top of a cluster with Calico and performs the current integration tests on top, thus validating various Linkerd features when CNI is enabled. For Calico to work, special config is required for kind which is at `cni-calico.yaml` This is different from the CNI integration tests that we run in cloud integration which performs the CNI level integration tests. Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-09-23 19:58:28 +05:30
Tarun Pothulapati	f75b9fe374	tracing: Move default values into addon-chart (#4951 ) * tracing: Move default values into chart This branch updates the tracing add-on's values into their own chart's values.yaml (just like grafana and prometheus). This prevents them from being saved into `linkerd-config-addons` where only the overridden values are stored. Thus allowing us to change the defaults. This also - Updates the check command to fall back to default values, if there are no overridden name fields. - Updates jaeger to `1.19.2` Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-09-15 15:19:25 -05:00
Alejandro Pedraza	ccf027c051	Push docker images to ghcr.io instead of gcr.io (#4953 ) * Push docker images to ghcr.io instead of gcr.io The `cloud_integration.yml` and `release.yml` workflows were modified to log into ghcr.io, and remove the `Configure gcloud` step which is no longer necessary. Note that besides the changes to cloud_integration.yml and release.yml, there was a change to the upgrade-stable integration test so that we do linkerd upgrade --addon-overwrite to reset the addons settings because in stable-2.8.1 the Grafana image was pegged to gcr.io/linkerd-io/grafana in linkerd-config-addons. This will need to be mentioned in the 2.9 upgrade notes. Also the egress integration test has a debug container that now is pegged to the edge-20.9.2 tag. Besides that, the other changes are just a global search and replace (s/gcr.io\/linkerd-io/ghcr.io\/linkerd/).	2020-09-10 15:16:24 -05:00
Zahari Dichev	084bb678c7	Perform TLS checks on injector, sp validator and tap (#4924 ) * Check sp-validator,proxy-injector and tap certs Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-09-10 11:21:23 -05:00
Alex Leong	33ddd4e357	Use correct component name in multicluster checks (#4921 ) The multicluster checks make sure that the correct resources exist for each service mirror controller. When looking up these resources, it uses the `linkerd.io/control-plane-component=linkerd-service-mirror` label selector. However, these resources have the label `linkerd.io/control-plane-component=service-mirror`. This causes the resource lookup to fail to find the resource and the check spuriously fails. ``` × service mirror controller has required permissions missing ServiceAccounts: linkerd-service-mirror-self missing ClusterRoles: linkerd-service-mirror-access-local-resources-self missing ClusterRoleBindings: linkerd-service-mirror-access-local-resources-self missing Roles: linkerd-service-mirror-read-remote-creds-self missing RoleBindings: linkerd-service-mirror-read-remote-creds-self see https://linkerd.io/checks/#l5d-multicluster-source-rbac-correct for hints \| * no service mirror controller deployment for Link self ``` Instead, use the correct label selector when looking up these resources. Signed-off-by: Alex Leong <alex@buoyant.io>	2020-08-31 13:40:53 -07:00
Tarun Pothulapati	c9c5d97405	Remove SMI-Metrics charts and commands (#4843 ) Fixes #4790 This PR removes both the SMI-Metrics templates along with the experimental sub-commands. This also removes pkg `smi-metrics` as there is no direct use of it without the commands. Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-08-24 14:35:33 -07:00
Zahari Dichev	c25f0a3af5	Trigger kube-system HA check based on webhook failure policy (#4861 ) This PR changes the HA check that verifies that the `config.linkerd.io/admission-webhooks=disabled` label is present on kube-system to be enabled only when the failure policy for the proxy injector webhook is set to `Fail`. This allows users to skip this check in cases when the label is removed because the namespace is managed by the cloud provider like in the case described in #4754 Fix #4754 Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-08-17 13:56:03 +03:00
Josh Soref	72aadb540f	Spelling (#4872 ) This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling). The misspellings have been reported at `aaf440489e (commitcomment-41423663)` The action reports that the changes in this PR would make it happy: `5b82c6c5ca` Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately. Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>	2020-08-12 21:59:50 -07:00
Alejandro Pedraza	4876a94ed0	Update proxy-init version to v1.3.6 (#4850 ) Supersedes #4846 Bump proxy-init to v1.3.6, containing CNI fixes and support for multi-arch builds. #4846 included this in v1.3.5 but proxy.golang.org refused to update the modified SHA	2020-08-11 11:54:00 -05:00
Tarun Pothulapati	7e5804d1cf	grafana: move default values into values file (#4755 ) This PR moves default values into add-on specific values.yaml thus allowing us to update default values as they would not be present in linkerd-config-addons cm. Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-08-06 13:57:28 -07:00
Alex Leong	024a35a3d3	Move multicluster API connectivity checks earlier (#4819 ) Fixes #4774 When a service mirror controller is unable to connect to the target cluster's API, the service mirror controller crashes with the error that it has failed to sync caches. This error lacks the necessary detail to debug the situation. Unfortunately, client-go does not surface more useful information about why the caches failed to sync. To make this more debuggable we do a couple things: 1. When creating the target cluster api client, we eagerly issue a server version check to test the connection. If the connection fails, the service-mirror-controller logs now look like this: ``` time="2020-07-30T23:53:31Z" level=info msg="Got updated link broken: {Name:broken Namespace:linkerd-multicluster TargetClusterName:broken TargetClusterDomain:cluster.local TargetClusterLinkerdNamespace:linkerd ClusterCredentialsSecret:cluster-credentials-broken GatewayAddress:35.230.81.215 GatewayPort:4143 GatewayIdentity:linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local ProbeSpec:ProbeSpec: {path: /health, port: 4181, period: 3s} Selector:{MatchLabels:map[] MatchExpressions:[{Key:mirror.linkerd.io/exported Operator:Exists Values:[]}]}}" time="2020-07-30T23:54:01Z" level=error msg="Unable to create cluster watcher: cannot connect to api for target cluster remote: Get \"https://36.199.152.138/version?timeout=32s\": dial tcp 36.199.152.138:443: i/o timeout" ``` This error also no longer causes the service mirror controller to crash. Updating the Link resource will cause the service mirror controller to reload the credentials and try again. 2. We rearrange the checks in `linkerd check --multicluster` to perform the target API connectivity checks before the service mirror controller checks. This means that we can validate the target cluster API connection even if the service mirror controller is not healthy. 
We also add a server version check here to quickly determine if the connection is healthy. Sample check output: ``` linkerd-multicluster -------------------- √ Link CRD exists √ Link resources are valid * broken W0730 16:52:05.620806 36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc × remote cluster access credentials are valid * failed to connect to API for cluster: [broken]: Get "https://36.199.152.138/version?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) see https://linkerd.io/checks/#l5d-smc-target-clusters-access for hints W0730 16:52:35.645499 36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc × clusters share trust anchors Problematic clusters: * broken: unable to fetch anchors: Get "https://36.199.152.138/api/v1/namespaces/linkerd/configmaps/linkerd-config?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) see https://linkerd.io/checks/#l5d-multicluster-clusters-share-anchors for hints √ service mirror controller has required permissions * broken √ service mirror controllers are running * broken × all gateway mirrors are healthy wrong number of (0) gateway metrics entries for probe-gateway-broken.linkerd-multicluster see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints √ all mirror services have endpoints ‼ all mirror services are part of a Link mirror service voting-svc-gke.emojivoto is not part of any Link see https://linkerd.io/checks/#l5d-multicluster-orphaned-services for hints ``` Some logs from the underlying go network libraries sneak into the output which is kinda gross but I don't think it interferes too much with being able to understand what's going on. Signed-off-by: Alex Leong <alex@buoyant.io>	2020-08-05 11:48:23 -07:00
cpretzer	670caaf8ff	Update to proxy-init v1.3.4 (#4815 ) Signed-off-by: Charles Pretzer <charles@buoyant.io>	2020-07-30 15:58:58 -05:00
Alex Leong	a1543b33e3	Add support for service-mirror selectors (#4795 ) * Add selector support Signed-off-by: Alex Leong <alex@buoyant.io> * Removed unused labels Signed-off-by: Alex Leong <alex@buoyant.io>	2020-07-30 10:07:14 -07:00
Alejandro Pedraza	2aea2221ed	Fixed `linkerd check` not finding Prometheus (#4797 ) * Fixed `linkerd check` not finding Prometheus ## The Problem `linkerd check` run right after install is failing because it can't find the Prometheus Pod. ## The Cause The "control plane pods are ready" check used to verify the existence of all the control plane pods, blocking until all the pods were ready. Since #4724, Prometheus is no longer included in that check because it's checked separately as an add-on. An unintended consequence is that when the ensuing "control plane self-check" is triggered, Prometheus might not be ready yet and the check fails because it doesn't do retries. ## The Fix The "control plane self-check" uses a gRPC call (it's the only check that does that) and those weren't designed with retries in mind. This PR adds retry functionality to the `runCheckRPC()` function, making sure the final output remains the same It also temporarily disables the `upgrade-edge` integration test because after installing edge-20.7.4 `linkerd check` will fail because of this.	2020-07-27 11:54:03 -05:00
Alex Leong	d540e16c8b	Make service mirror controller per target cluster (#4710 ) This PR moves the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in https://github.com/linkerd/rfc/pull/31. For fuller context, please see that RFC. Basic multicluster functionality works here including: * `linkerd mc install` installs the Link CRD but not any service mirror controllers * `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link * The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints. * The `linkerd mc gateways` command lists all linked target clusters, their liveliness, and probe latencies. * The `linkerd check` multicluster checks have been updated for the new architecture. Several checks have been rendered obsolete by the new architecture and have been removed. The following are known issues requiring further work: * the service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror. It does not yet support configuring a label selector. * an unlink command is needed for removing multicluster links: see https://github.com/linkerd/linkerd2/issues/4707 * an mc uninstall command is needed for uninstalling the multicluster addon: see https://github.com/linkerd/linkerd2/issues/4708 Signed-off-by: Alex Leong <alex@buoyant.io>	2020-07-23 14:32:50 -07:00
Tarun Pothulapati	986e0d4627	prometheus: add add-on checks (#4756 ) As linkerd-prometheus is optional now, the checks are also separated and should only work when the prometheus add-on is installed. This is done by re-using the add-on check code.	2020-07-23 18:03:24 +05:30
Tarun Pothulapati	b7e9507174	Remove/Relax prometheus related checks (#4724 ) * Removes/Relaxes prometheus related checks Now that prometheus is an add-on, there can be cases where prometheus is disabled, in which case the check should show a warning but not fail. This decouples the tight dependency. This changes the following checks: - Removes serviceAccount and pod checks in the CLI. - Relaxes `linkerd-api` checks to only check for prometheus access when the URL is not empty. This should work seamlessly with external prometheus as that URL will be passed and it performs the same check. Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-07-20 14:24:00 -07:00
Tarun Pothulapati	2a099cb496	Move Prometheus as an Add-On (#4362 ) This moves Prometheus to an add-on, thus making it optional but enabled by default. This also makes `linkerd-prometheus` more configurable, and allows it to have its own life-cycle for upgrades, configuration, etc. This work will be followed by documentation that helps users configure an existing Prometheus to work with Linkerd. Changes Include: - moving prometheus manifests into a separate chart at `charts/add-ons/prometheus`, and adding it as a dependency to `linkerd2` - implement the `addOn` interface to support the same with CLI. - include configuration in `linkerd-config-addons` User Facing Changes: The default install experience does not change much, but users who have already configured Prometheus differently would need to apply the same using the new configuration fields present in the chart README	2020-07-09 23:29:03 +05:30
Zahari Dichev	73010149ce	Do not treat evicted pods as failed in healthchecks (#4732 ) When a k8s pod is evicted its Phase is set to Failed and the reason is set to Evicted. Because in the ListPods method of the public API we only transmit the phase and treat it as Status, the healthchecks assume such evicted data plane pods to be failed. Since this check is retryable, the result is that linkerd check --proxy appears to hang when there are evicted pods. As @adleong correctly pointed out here, the presence of evicted pods is not something that should make the checks fail. This change modifies the public API to set the Pod.Status to "Evicted" for evicted pods. The healthchecks are also modified to not treat evicted pods as error cases. Fix #4690 Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-07-09 14:22:27 +03:00
Suraj Deshmukh	d7dbe9cbff	Fix spelling mistakes using codespell (#4700 ) The misspellings were found with the following command and then fixed: ``` codespell --skip CHANGES.md,.git,go.sum,\ controller/cmd/service-mirror/events_formatting.go,\ controller/cmd/service-mirror/cluster_watcher_test_util.go,\ SECURITY_AUDIT.pdf,.gcp.json.enc,web/app/img/favicon.png \ --ignore-words-list=aks,uint,ans,files\' --check-filenames \ --check-hidden ``` Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>	2020-07-07 17:07:22 -05:00
Zahari Dichev	5a2f326bb5	Surface scheduling errors on retry (#4683 ) Currently linkerd check appears to hang on HA installations where there are pods that are unschedulable. In reality it is just waiting on a condition that might never become true without showing any useful information (i.e. which pods are not scheduled). This change sets `surfaceErrorOnRetry: true` so the user gets feedback about which conditions are not yet met instead of simply being shown waiting for check to complete. Fix #4680 Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-06-30 18:14:21 +03:00
Zahari Dichev	51c48694d4	Make unschedulable pods check warning only (#4675 ) Currently commands that need access to the public api are executing the `LinkerdControlPlaneExistenceChecks`. This set of checks includes one that specifically checks that there are no unschedulable pods. In fact, in order to run commands like stat and edge we do not need to meet that requirement. This change relaxes this by making the no unschedulable pods check warning only. Fixes #3940 Signed-off-by: Zahari Dichev zaharidichev@gmail.com	2020-06-30 16:55:17 +03:00
Zahari Dichev	7c98e89bdc	Make `service mirror controller is running check` retry (#4650 ) This PR makes the service mirror controller is running retry on failure. This brings the check in line with the rest of the checks that verify that certain Linkerd components are running. It is especially useful in integration tests when we want to wait for the service mirror component to be initialized for a certain amount of time before we simply fail the linkerd check command Fix #4642 Signed-off-by: Zahari Dichev zaharidichev@gmail.com	2020-06-22 20:33:43 +03:00
Tarun Pothulapati	4219955bdb	multicluster: checks for misconfigured mirror services (#4552 ) Fixes #4541 This PR adds the following checks - if a mirrored service has endpoints. (This includes gateway mirrors too). - if an exported service is referencing a gateway that does not exist. Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> Signed-off-by: Alex Leong <alex@buoyant.io> Co-authored-by: Alex Leong <alex@buoyant.io>	2020-06-08 15:29:34 -07:00
Kevin Leimkuhler	d7f84e6c7b	Change help text to use source/target terminology in service-mirror and healthchecks (#4524 ) Change terminology from local/remote to source/target in service-mirror and healthchecks help text. This does not change any variable, function, struct, or field names since testing is still improving Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>	2020-06-02 15:21:52 -04:00
Alex Leong	33bd81692a	Add list of successful gateways in multicluster check (#4516 ) Fixes #4478 We add some additional output text when the "all remote cluster gateways are alive" check succeeds to list the gateways that have been detected as alive. In order to do this, we have added an `VerboseSuccess` error type. Even though this type implements the `error` interface, it represents a success which contains additional information to be printed. Sample output when dead gateways are detected: ``` [...] √ service mirror controller can access remote clusters × all remote cluster gateways are alive Some gateways are not alive: * cluster: [gke], gateway: [linkerd-multicluster/linkerd-gateway] see https://linkerd.io/checks/#l5d-multicluster-remote-gateways-alive for hints √ clusters share trust anchors ``` Sample output when all gateways are alive: ``` [...] √ service mirror controller can access remote clusters √ all remote cluster gateways are alive * cluster: [gke], gateway: [linkerd-multicluster/linkerd-gateway] √ clusters share trust anchors ``` Signed-off-by: Alex Leong <alex@buoyant.io>	2020-06-01 13:57:13 -07:00
Alex Leong	16d2d4bf81	Add multicluster daisy chain check (#4483 ) A mirror-service is one that has been created by the mirror service controller and resolves to a gateway in another cluster. If a mirror service is exported (and thus mirrored into another cluster) this creates a "daisy chain" where requests can come in to the cluster through the local gateway and be immediately sent out of the cluster to a remote gateway. If the remote gateway is in the source cluster, this can create an infinite loop. Similarly, if an exported service routes to a mirror service by a traffic split, the same daisy chain effect occurs. One example where this can come up is with multicluster fail-over. If both clusters simultaneously fail-over even a portion of their traffic, a loop is created. We add a check that detects either of the above conditions and warns of the existence of a daisy chain. Signed-off-by: Alex Leong <alex@buoyant.io>	2020-06-01 12:10:59 -07:00
Tarun Pothulapati	a8158dbeac	Add HealthChecks for Tracing Add-On (#4407 ) Adds health-checks for tracing add-on, along with a refactor to have safe casts. Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-05-26 22:10:23 +05:30
Tarun Pothulapati	555fb14403	separate multi-cluster checks and run after add-ons (#4468 )	2020-05-26 12:07:03 +05:30
Alex Leong	acacf2e023	Add --close-wait-timeout inject flag (#4409 ) Depends on https://github.com/linkerd/linkerd2-proxy-init/pull/10 Fixes #4276 We add a `--close-wait-timeout` inject flag which configures the proxy-init container to run with `privileged: true` and to set `nf_conntrack_tcp_timeout_close_wait`. Signed-off-by: Alex Leong <alex@buoyant.io>	2020-05-21 14:14:14 -07:00
Zahari Dichev	3a3e407848	Tweak check hint anchors (#4449 ) Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-05-20 23:17:51 +03:00
Zahari Dichev	31e33d18d3	Enable service mirroring to work in private networks (#4440 ) This change creates a gateway proxy for every gateway. This enables the probe worker to leverage the destination service functionality in order to discover the identity of the gateway. Fix #4411 Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-05-20 19:48:36 +03:00
Zahari Dichev	6574f124a7	Restrict Service mirror RBACs (#4426 ) This PR introduces a few changes that were requested after a bit of service mirror reviewing. - we restrict the RBACs so the service mirror controller cannot read secrets in all namespaces but only in the one that it is installed in - we unify the namespace namings so all multicluster resources are installed in `linkerd-multicluster` on both clusters - fixed checks to account for changes Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-05-20 17:08:01 +03:00
Tarun Pothulapati	e91dbda287	Add health checks for grafana add-on (#4321 ) * Add health checks for grafana add-on Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update testCheck command and fixes Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * fix checkContainersRunnning function Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * linting fix Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update test golden files Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * use hc.ControlPlanePods instead of k8s API Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * use hc.controlPLanePods directly Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * remove unnecessary comments Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * proper comments Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update pod checks to use retries Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * add values key check Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-05-14 23:18:43 +05:30
Tarun Pothulapati	45ccc24a89	Move grafana templates into a separate sub-chart as a add-on (#4320 ) * adds grafana manifests as a sub-chart - moves grafana templates into its own chart - implement add-on interface Grafana struct - also add relevant conditions for grafana Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * remove redundant grafana fields in Values Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update golden files Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * fix values issue Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * remove extra grafanaImage value Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * add add-on upgrade tests Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * fix golden file tests Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * add grafana field to linkerd-config-addons Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * Don't apply nil configuration Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update golden files Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * make checks relaxed for grafana Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update test to not test on grafana Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update TestServiceAccountsMatch to contain extra members Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * replace map[string]interface{} with Grafana for better readability Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update golden files Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-05-11 22:22:14 +05:30
Zahari Dichev	3008f1f87f	Add check for validating that remote clusters share the same trust an… (#4311 ) Add check for validating that remote clusters share the same trust anchors Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-05-11 09:59:15 +03:00
Zahari Dichev	4e82ba8878	Multicluster checks (#4279 ) Multicluster checks Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-05-05 10:19:38 +03:00
Alejandro Pedraza	2cd48bc488	Go test failure message wrappers to create GH Annotations (#4292 ) * Go test failure message wrappers to create GH Annotations First part of #4176 ## Problem Failures in go tests need to be properly formatted as Github annotations so that we can fetch them through Github's API for aggregation and analysis. ## Solution A wrapper for error messages has been created in `testutil/annotations.go`. The idea is that instead of throwing test failures like this: ```go t.Failf("error retrieving data;\nExpected: %#v\nActual: %#v", expected, actual) ``` We'd throw them like this: ```go testutil.AnnotationFatalf("error retrieving data", "error retrieving data;\nExpected: %#v\nActual: %#v", expected, actual) ``` That will continue reporting the error as before (when using `go test` or another test runner), but as a side-effect it will also send to stdout something like: ``` ::error file=pkg/inject_test.go,line=133::error retrieving data ``` Which becomes a GH annotation, visible in the CI run summary screen. The first string arg is used to have the GH annotation be a generic error message that can be aggregated and counted across multiple test runs. If `testutil.Fatalf(str, args...)` is called instead, the original error message will be used. Note that the output will be produced only when the env var `GH_ANNOTATION` is set (which it will be when tests are triggered from a Github Actions workflow). Besides `testutil/annotation.go` and its accompanying unit test file, other changes were made in other tests as examples, the plan being that in a further PR _all_ the tests will use these wrappers.	2020-05-01 16:16:06 -05:00
Alex Leong	e962bf1968	Improve proxy version diagnostics (#4244 ) It can be difficult to know which versions of the proxy are running in your cluster, especially when you have pods running at multiple different proxy versions. We add two pieces of CLI functionality to assist with this: The `linkerd check --proxy` command will now list all data plane pods which are not up-to-date rather than just printing the first one it encounters: ``` ‼ data plane is up-to-date Some data plane pods are not running the current version: * default/books-84958fff5-95j75 (git-ca760bdd) * default/authors-57c6dc9b47-djldq (git-ca760bdd) * default/traffic-85f58ccb66-vxr49 (git-ca760bdd) * default/release-name-smi-metrics-899c68958-5ctpz (git-ca760bdd) * default/webapp-6975dc796f-2ngh4 (git-ca760bdd) * default/webapp-6975dc796f-z4bc4 (git-ca760bdd) * emojivoto/voting-54ffc5787d-wj6cp (git-ca760bdd) * emojivoto/vote-bot-7b54d6999b-57srw (git-ca760bdd) * emojivoto/emoji-5cb99f85d8-5bhvm (git-ca760bdd) * emojivoto/web-7988674b8b-zfvvm (git-ca760bdd) * default/webapp-6975dc796f-d2fbc (git-ca760bdd) * default/curl (git-7f6bbc73) see https://linkerd.io/checks/#l5d-data-plane-version for hints ``` The `linkerd version` command now supports a `--proxy` flag which will list all proxy versions running in the cluster and the number of pods running each version: ``` linkerd version --proxy Client version: dev-7b9d475f-alex Server version: edge-20.4.1 Proxy versions: edge-20.4.1 (10 pods) git-ca760bdd (11 pods) git-7f6bbc73 (1 pods) ``` Signed-off-by: Alex Leong <alex@buoyant.io>	2020-04-16 11:28:19 -07:00
Alex Leong	7b9d475ffc	Gate SMI-Metrics behind an install flag (#4240 ) This change adds a `--smi-metrics` install flag which controls if the SMI-metrics controller and associated RBAC and APIService resources are installed. The flag defaults to false and is hidden. We plan to remove this flag or default it to true if and when the SMI-Metrics integration graduates from experimental. Signed-off-by: Alex Leong <alex@buoyant.io>	2020-04-09 14:34:08 -07:00
Alejandro Pedraza	573060bacc	New test for checking SA lists are synced (#4201 ) Followup to #4193 This is to verify that the list of SA installed, as well as the list of SA in the linkerd-psp RoleBinding match the list of expected SA defined in `healthcheck.go`.	2020-03-26 12:54:31 -05:00

181 Commits