The multicluster checks make sure that the correct resources exist for each service mirror controller. When looking up these resources, the check uses the `linkerd.io/control-plane-component=linkerd-service-mirror` label selector. However, these resources are actually labeled `linkerd.io/control-plane-component=service-mirror`, so the lookup finds nothing and the check spuriously fails:
```
× service mirror controller has required permissions
missing ServiceAccounts: linkerd-service-mirror-self
missing ClusterRoles: linkerd-service-mirror-access-local-resources-self
missing ClusterRoleBindings: linkerd-service-mirror-access-local-resources-self
missing Roles: linkerd-service-mirror-read-remote-creds-self
missing RoleBindings: linkerd-service-mirror-read-remote-creds-self
see https://linkerd.io/checks/#l5d-multicluster-source-rbac-correct for hints
| * no service mirror controller deployment for Link self
```
Instead, use the correct label selector when looking up these resources.
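For illustration, here is a minimal client-go sketch of looking up the service mirror controller's ServiceAccounts with the corrected selector, assuming a recent client-go; the function name and namespace are illustrative, not the actual check code:
```
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// listServiceMirrorServiceAccounts lists the service mirror controller's
// ServiceAccounts using the label that is actually set on those resources.
func listServiceMirrorServiceAccounts(ctx context.Context, client kubernetes.Interface, ns string) error {
	opts := metav1.ListOptions{
		// The resources carry "service-mirror", not "linkerd-service-mirror".
		LabelSelector: "linkerd.io/control-plane-component=service-mirror",
	}
	sas, err := client.CoreV1().ServiceAccounts(ns).List(ctx, opts)
	if err != nil {
		return err
	}
	for _, sa := range sas.Items {
		fmt.Println(sa.Name)
	}
	return nil
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	if err := listServiceMirrorServiceAccounts(context.Background(), client, "linkerd-multicluster"); err != nil {
		panic(err)
	}
}
```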
Signed-off-by: Alex Leong <alex@buoyant.io>
All of the code for the service mirror controller lives in the `linkerd/linkerd2/controller/cmd` package. It is typical for control plane components to only have a `main.go` entrypoint in the cmd package. This can sometimes make it hard to find the service mirror code since I wouldn't expect it to be in the cmd package.
We move the majority of the code to a dedicated controller package, leaving only main.go in the cmd package. This is purely organizational; no behavior change is expected.
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-20.8.4
* Fixed a problem causing the `enable-endpoint-slices` flag to not be persisted
when set via `linkerd upgrade` (thanks @Matei207!)
* Removed SMI-Metrics templates and experimental sub-commands
* Use `--frozen-lockfile` to avoid accidental update of dashboard JS
dependencies in CI (thanks @tharun208!)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Fixes #4790
This PR removes both the SMI-Metrics templates and the experimental sub-commands. It also removes the `smi-metrics` package, as there is no direct use of it without the commands.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
## What/How
@adleong pointed out in #4780 that when enabling slices during an upgrade, the new value does not persist in the `linkerd-config` ConfigMap. I took a closer look and it seems that we were never overwriting the values in case they were different.
* To fix this, I added an if block when validating and building the upgrade options -- if the current flag value differs from what we have in the ConfigMap, then change the ConfigMap value (see the sketch after this list).
* When doing so, I made sure to check that if the cluster does not support `EndpointSlices` but the flag is set to true, we error out. This mirrors (largely copy & paste) what the install path does.
* Additionally, I noticed that the Helm ConfigMap template stored the flag value under the `enableEndpointSlices` field name. I assume this was not updated in the initial PR to reflect the changes made in the protocol buffer; the API (and thus the CLI) uses the field name `endpointSliceEnabled` instead. I have changed the config template so that Helm installations use the same field, which can then be used in the destination service or other components that may implement slice support in the future.
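A minimal Go sketch of the flag reconciliation described above; the type, field, and function names are illustrative stand-ins, not the actual upgrade code:
```
package upgrade

import "errors"

// GlobalConfig is an illustrative stand-in for the relevant part of the
// linkerd-config ConfigMap; the real type lives in the config protobufs.
type GlobalConfig struct {
	EndpointSliceEnabled bool
}

// reconcileEndpointSlices shows the shape of the fix: if the flag was set and
// differs from the stored value, overwrite it, erroring out when the cluster
// cannot support EndpointSlices.
func reconcileEndpointSlices(flagSet, flagValue, clusterSupportsSlices bool, cfg *GlobalConfig) error {
	if !flagSet {
		return nil // keep whatever is already persisted
	}
	if flagValue && !clusterSupportsSlices {
		return errors.New("--enable-endpoint-slices=true requires EndpointSlice support in the cluster")
	}
	if cfg.EndpointSliceEnabled != flagValue {
		cfg.EndpointSliceEnabled = flagValue // persist the new value
	}
	return nil
}
```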
Signed-off-by: Matei David <matei.david.35@gmail.com>
Unit tests in CI are run using `yarn install`, which can silently update dependencies. This is fixed by passing `--frozen-lockfile` in the CI workflow.
Fixes #3838
Signed-off-by: Tharun <rajendrantharun@live.com>
This edge release adds support for [topology-aware service routing][1]
to the Destination controller. When providing service discovery updates
to proxies, the Destination controller will now filter endpoints based
on the service's topology preferences. Additionally, this release
includes bug fixes for the `linkerd check` CLI command and web
dashboard.
* CLI
* `linkerd check` will no longer warn about a looser webhook failure
policy in HA mode
* Controller
* Added support for [topology-aware service routing][1] to the
Destination controller (thanks @Matei207)
* Changed the Destination controller to always return destination
overrides for service profiles when no traffic split is present
* Web UI
* Fixed Tap `Authority` dropdown not being populated (thanks to
@tharun208!)
[1]: https://kubernetes.io/docs/concepts/services-networking/service-topology/
The Tap component calls fetch metrics with `skip_stats`, and the authority resource type is not sent, so the Authority dropdown is not populated. Added a separate call to get metrics for authorities.
Fixes #4697
Signed-off-by: Tharun <rajendrantharun@live.com>
## Motivation
#4879
## Solution
When no traffic split exists for services, return a single destination override
with a weight of 100%.
Using the destination client on a new linkerd installation, this results in the
following output for `linkerd-identity` service:
```
❯ go run controller/script/destination-client/main.go -method getProfile -path linkerd-identity.linkerd.svc.cluster.local:8080
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"linkerd-identity.linkerd.svc.cluster.local.:8080" weight:100000}
INFO[0000]
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
[Link to RFC](https://github.com/linkerd/rfc/pull/23)
### What
---
* PR that puts together all past pieces of the puzzle to deliver topology-aware service routing, as specified in the [Kubernetes docs](https://kubernetes.io/docs/concepts/services-networking/service-topology/) but with a much better load balancing algorithm and all the coolness of linkerd :)
* The first piece of this PR is focused on adding topology metadata: topology preference for services and topology `<k,v>` pairs for endpoints.
* The second piece of this PR puts together the new context format and fetching the source node topology metadata in order to allow for endpoints filtering.
* The final part is doing the filtering -- passing all of the metadata to the listener and on every `Add` filtering endpoints based on the topology preference of the service, topology `<k,v>` pairs of endpoints and topology of the source (again `<k,v>` pairs).
### How
---
* **Collecting metadata**:
- Services do not have values for topology keys -- the topological keys defined in a service's spec only dictate locality preference for routing. As such, I decided to store them in an array, taken exactly as they are found in the service spec, which ensures we respect the preference order.
- For EndpointSlices, we are using a map -- an EndpointSlice has locality information in the form of a `<k,v>` pair, where the key is a topological key (similar to what's listed in the service) and the value is the locality information -- e.g. `hostname: minikube`. For each address we now have a map of topology values, which gets populated when we translate the endpoints to an address set. Because normal Endpoints do not have any topology information, we create each address with an empty map, which is subsequently populated ONLY for slices in the `endpointSliceToAddressSet` function.
* **Filtering endpoints**:
- This was a tricky part and filled me with doubts. I think there are a few ways to do this, but this is how I "envisioned" it. First, `endpoint_translator.go` should be the one to do the filtering; this means that on subscription, we need to feed all of the relevant metadata to the listener. To do this, I created a new function `AddTopologyFilter` as part of the listener interface.
- To complement the `AddTopologyFilter` function, I created a new `TopologyFilter` struct in `endpoints_watcher.go`. I then embedded this structure in all listeners that implement the interface. The structure holds the source topology (source node), a boolean to tell if slices are activated in case we need to double check (or write tests for the function) and the service preference. We create the filter on Subscription -- we have access to the k8s client here as well as the service, so it's the best point to collect all of this data together. Addresses all have their own topology added to them so they do not have to be collected by the filter.
- When we add a new set of addresses, we check to see if slices are enabled -- chances are that if slices are enabled, service topology might be too. This lets us skip this step if the latest version is not adopted. Prior to sending an `Add` we filter the endpoints -- if a preference is registered by the filter we strictly enforce it, otherwise nothing changes (a simplified sketch of this filtering follows below).
And that's pretty much it.
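To make the filtering step concrete, here is a minimal Go sketch under assumed names; `Address` and `filterByTopology` are illustrative stand-ins for the watcher types, and the fallback behavior is simplified rather than an exact copy of the implementation:
```
package watcher

// Address is an illustrative stand-in for the watcher's address type; the
// real one carries IP, port, identity, and more.
type Address struct {
	IP       string
	Topology map[string]string // e.g. {"kubernetes.io/hostname": "minikube"}
}

// filterByTopology keeps only the addresses matching the first topology
// preference that the source node can satisfy; "*" matches everything.
func filterByTopology(addrs []Address, preferences []string, sourceNode map[string]string) []Address {
	for _, key := range preferences {
		if key == "*" {
			return addrs
		}
		sourceValue, ok := sourceNode[key]
		if !ok {
			continue
		}
		var matched []Address
		for _, a := range addrs {
			if a.Topology[key] == sourceValue {
				matched = append(matched, a)
			}
		}
		if len(matched) > 0 {
			return matched // strictly enforce the first satisfiable preference
		}
	}
	// No preference could be satisfied; this sketch falls back to the
	// unfiltered set (the real implementation may behave differently here).
	return addrs
}
```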
Signed-off-by: Matei David <matei.david.35@gmail.com>
Add a static check that ensures the generated files from the proto definitions have not changed.
Fixes #4669
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Removed usage of `GITCOOKIE_SH`, which was a script stored in a secret
to authenticate requests against googlesource.com, to avoid hitting
rate limits when pulling go dependencies from that source. Now that we
use go modules, deps are pulled from http://proxy.golang.org/ and this
is no longer needed.
This PR changes the HA check that verifies that the `config.linkerd.io/admission-webhooks=disabled` label is present on kube-system so that it is enabled only when the failure policy for the proxy injector webhook is set to `Fail`. This allows users to skip the check when the label has been removed because the namespace is managed by the cloud provider, as in the case described in #4754.
Fixes #4754
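A rough sketch of how such a gate can be expressed with client-go, assuming the v1beta1 admissionregistration API of the time and an illustrative webhook config name; this is not the actual healthcheck code:
```
package healthcheck

import (
	"context"

	admissionregistrationv1beta1 "k8s.io/api/admissionregistration/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// shouldCheckKubeSystemLabel returns true only when the proxy injector's
// failure policy is Fail, i.e. only when a missing
// config.linkerd.io/admission-webhooks=disabled label on kube-system could
// actually break the cluster. The webhook config name is an assumption.
func shouldCheckKubeSystemLabel(ctx context.Context, client kubernetes.Interface) (bool, error) {
	webhook, err := client.AdmissionregistrationV1beta1().
		MutatingWebhookConfigurations().
		Get(ctx, "linkerd-proxy-injector-webhook-config", metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	for _, wh := range webhook.Webhooks {
		if wh.FailurePolicy != nil && *wh.FailurePolicy == admissionregistrationv1beta1.Fail {
			return true, nil
		}
	}
	return false, nil
}
```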
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
## Motivation
These changes came up when testing mock identity. I found it useful for the
destination client to print the identity of endpoints.
```
❯ go run controller/script/destination-client/main.go -method get -path h1.test.example.com:8080
INFO[0000] Add:
INFO[0000] labels: map[concrete:h1.test.example.com:8080]
INFO[0000] - 127.0.0.1:4143
INFO[0000] - labels: map[addr:127.0.0.1:4143 h2:false]
INFO[0000] - protocol hint: UNKNOWN
INFO[0000] - identity: dns_like_identity:{name:"foo.ns1.serviceaccount.identity.linkerd.cluster.local"}
INFO[0000]
```
I also fixed a log line in proxy-identity that used the wrong value for the CSR path.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).
The misspellings have been reported at aaf440489e (commitcomment-41423663)
The action reports that the changes in this PR would make it happy: 5b82c6c5ca
Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Add PriceKinetics, part of GVC Australia, to the adopters list.
Signed-off-by: Steve Gray <steve.gray@ladbrokes.com.au>
Co-authored-by: Steve Gray <steve.gray@ladbrokes.com.au>
Fixes #4708
Adds a `linkerd multicluster uninstall` command which outputs the manifests required to uninstall the multicluster components. This command first checks that no links exist and advises that any links must be removed with `linkerd multicluster unlink` before proceeding. Typical usage is:
```
linkerd multicluster uninstall | kubectl delete -f -
```
Signed-off-by: Alex Leong <alex@buoyant.io>
When the Link CRD does not exist, multicluster checks in `linkerd check` will be skipped. The `--multicluster` flag is intended to force these checks on, but was being ignored.
We update the options to force the multicluster checks on when the `--multicluster` flag is used, as intended.
Now when `linkerd check --multicluster` is run on a cluster without the multicluster support installed, it gives the following output:
```
linkerd-multicluster
--------------------
× Link CRD exists
multicluster.linkerd.io/Link CRD is missing: the server could not find the requested resource
see https://linkerd.io/checks/#l5d-multicluster-link-crd-exists for hints
Status check results are ×
```
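A minimal Go sketch of the intended flag behavior; the option type and category name below are illustrative stand-ins, not the actual check options:
```
package cmd

// checkOptions is an illustrative stand-in for the real check options; the
// point is simply that the flag forces the multicluster category on rather
// than relying on auto-detection of the Link CRD.
type checkOptions struct {
	multicluster bool
}

func (o *checkOptions) checkCategories(detected []string) []string {
	categories := append([]string{}, detected...)
	if o.multicluster && !contains(categories, "linkerd-multicluster") {
		categories = append(categories, "linkerd-multicluster")
	}
	return categories
}

func contains(xs []string, s string) bool {
	for _, x := range xs {
		if x == s {
			return true
		}
	}
	return false
}
```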
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #4511
Add the `linkerd.io/control-plane-component: gateway` label to the multicluster gateway. Change the value of `linkerd.io/control-plane-component` from `linkerd-service-mirror` to `service-mirror` for the service mirror controller.
These changes are for consistency and should not result in any change in functionality.
Signed-off-by: Alex Leong <alex@buoyant.io>
This edge adds multi-arch support to Linkerd! Our docker images and CLI now
support the amd64, arm64, and arm architectures.
* Multicluster
* Added a multicluster unlink command for removing multicluster links
* Improved multicluster checks to be more informative when the remote API is
not reachable
* Proxy
* Enabled a multi-threaded runtime to substantially improve latency especially
when the proxy is serving requests for many concurrent connections
* Other
* Fixed an issue where the debug sidecar image was missing during upgrades
(thanks @javaducky!)
* Updated all control plane and proxy container images to be multi-arch
to support amd64, arm64, and arm (thanks @aliariff!)
* Fixed an issue where check was failing when DisableHeartBeat was set to true
(thanks @mvaal!)
Supersedes #4846
Bump proxy-init to v1.3.6, containing CNI fixes and support for
multi-arch builds.
#4846 included this in v1.3.5, but proxy.golang.org refused to update the modified SHA.
The upgrade tests were failing due to hardcoded certificates which had expired. Additionally, these tests contained large swaths of yaml that made it very difficult to understand the semantics of each test case and even more difficult to maintain.
We greatly improve the readability and maintainability of these tests by using a slightly different approach. Each test follows this basic structure:
* Render an install manifest
* Initialize a fake k8s client with the install manifest (and sometimes additional manifests)
* Render an upgrade manifest
* Parse the manifests as yaml tree structures
* Perform a structured diff on the yaml tree structures and look for expected and unexpected differences
The install manifests are generated dynamically using the regular install flow. This means that we no longer need large sections of hardcoded yaml in the tests themselves. Additionally, we now assess the output by doing a structured diff against the install manifest. This means that we no longer need golden files with explicit expected output.
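As a rough illustration of the structured-diff approach, here is a minimal Go sketch; `parseManifest` and `manifestsDiffer` are hypothetical helpers built on `sigs.k8s.io/yaml`, not the actual test code, which walks the trees and asserts on the expected differences:
```
package upgradetest

import (
	"reflect"
	"strings"

	"sigs.k8s.io/yaml"
)

// parseManifest splits a rendered manifest into per-resource yaml trees.
// This is a simplified sketch; the real tests parse more carefully.
func parseManifest(manifest string) ([]map[string]interface{}, error) {
	var resources []map[string]interface{}
	for _, doc := range strings.Split(manifest, "\n---\n") {
		if strings.TrimSpace(doc) == "" {
			continue
		}
		var tree map[string]interface{}
		if err := yaml.Unmarshal([]byte(doc), &tree); err != nil {
			return nil, err
		}
		resources = append(resources, tree)
	}
	return resources, nil
}

// manifestsDiffer reports whether the install and upgrade manifests differ as
// structured trees; the real tests go further and check that only the
// expected differences are present.
func manifestsDiffer(install, upgrade string) (bool, error) {
	a, err := parseManifest(install)
	if err != nil {
		return false, err
	}
	b, err := parseManifest(upgrade)
	if err != nil {
		return false, err
	}
	return !reflect.DeepEqual(a, b), nil
}
```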
All test cases were preserved except for the following:
* Any test cases related to multiphase install (config/control plane) were not replicated. This flow doesn't follow the same pattern as the tests above because the install and upgrade manifests are not expected to be the same or similar. I also felt that these tests were lower priority because the multiphase install/upgrade feature does not seem to be very popular and is a potential candidate for deprecation.
* Any tests involving upgrading from a very old config were not replicated. The code to generate these old style configs is no longer present in the codebase so in order to test this case, we would need to resort to hardcoded install manifests. These tests also seemed low priority to me because Linkerd versions that used the old config are now over 1 year old so it may no longer be critical that we support upgrading from them. We generally recommend that users upgrading from an old version of Linkerd do so by upgrading through each major version rather than directly to the latest.
Signed-off-by: Alex Leong <alex@buoyant.io>
* When releasing, build and upload the amd64, arm64 and arm architectures builds for the CLI
* Refactored `Dockerfile-bin` so it has separate stages for single and multi arch builds. The latter stage is only used for releases.
Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
This PR moves default values into add-on specific values.yaml files, allowing us to update default values later, since they are no longer persisted in the linkerd-config-addons ConfigMap.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This release enables a multi-threaded runtime. Previously, the proxy
would only ever use a single thread for data plane processing; now, when
the proxy is allocated more than 1 CPU share, the proxy allocates a
thread per available CPU. This has shown substantial latency
improvements in benchmarks, especially when the proxy is serving
requests for many concurrent connections.
---
* Add a `multicore` feature flag (linkerd/linkerd2-proxy#611)
* Add `multicore` to default features (linkerd/linkerd2-proxy#612)
* admin: add an endpoint to dump spawned Tokio tasks (linkerd/linkerd2-proxy#595)
* trace: roll `tracing` and `tracing-subscriber` dependencies (linkerd/linkerd2-proxy#615)
* stack: Add NewService::into_make_service (linkerd/linkerd2-proxy#618)
* trace: tweak tracing & test support for the multithreaded runtime (linkerd/linkerd2-proxy#616)
* Make FailFast cloneable (linkerd/linkerd2-proxy#617)
* Move HTTP detection & server into linkerd2_proxy_http (linkerd/linkerd2-proxy#619)
* Mark tap integration tests as flakey (linkerd/linkerd2-proxy#621)
* Introduce a SkipDetect layer to preempt detection (linkerd/linkerd2-proxy#620)
Fixes #4774
When a service mirror controller is unable to connect to the target cluster's API, the service mirror controller crashes with the error that it has failed to sync caches. This error lacks the necessary detail to debug the situation. Unfortunately, client-go does not surface more useful information about why the caches failed to sync.
To make this more debuggable we do a couple things:
1. When creating the target cluster api client, we eagerly issue a server version check to test the connection (a simplified sketch of this check appears after the sample output below). If the connection fails, the service-mirror-controller logs now look like this:
```
time="2020-07-30T23:53:31Z" level=info msg="Got updated link broken: {Name:broken Namespace:linkerd-multicluster TargetClusterName:broken TargetClusterDomain:cluster.local TargetClusterLinkerdNamespace:linkerd ClusterCredentialsSecret:cluster-credentials-broken GatewayAddress:35.230.81.215 GatewayPort:4143 GatewayIdentity:linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local ProbeSpec:ProbeSpec: {path: /health, port: 4181, period: 3s} Selector:{MatchLabels:map[] MatchExpressions:[{Key:mirror.linkerd.io/exported Operator:Exists Values:[]}]}}"
time="2020-07-30T23:54:01Z" level=error msg="Unable to create cluster watcher: cannot connect to api for target cluster remote: Get \"https://36.199.152.138/version?timeout=32s\": dial tcp 36.199.152.138:443: i/o timeout"
```
This error also no longer causes the service mirror controller to crash. Updating the Link resource will cause the service mirror controller to reload the credentials and try again.
2. We rearrange the checks in `linkerd check --multicluster` to perform the target API connectivity checks before the service mirror controller checks. This means that we can validate the target cluster API connection even if the service mirror controller is not healthy. We also add a server version check here to quickly determine if the connection is healthy. Sample check output:
```
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
* broken
W0730 16:52:05.620806 36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc
× remote cluster access credentials are valid
* failed to connect to API for cluster: [broken]: Get "https://36.199.152.138/version?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
see https://linkerd.io/checks/#l5d-smc-target-clusters-access for hints
W0730 16:52:35.645499 36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc
× clusters share trust anchors
Problematic clusters:
* broken: unable to fetch anchors: Get "https://36.199.152.138/api/v1/namespaces/linkerd/configmaps/linkerd-config?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
see https://linkerd.io/checks/#l5d-multicluster-clusters-share-anchors for hints
√ service mirror controller has required permissions
* broken
√ service mirror controllers are running
* broken
× all gateway mirrors are healthy
wrong number of (0) gateway metrics entries for probe-gateway-broken.linkerd-multicluster
see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints
√ all mirror services have endpoints
‼ all mirror services are part of a Link
mirror service voting-svc-gke.emojivoto is not part of any Link
see https://linkerd.io/checks/#l5d-multicluster-orphaned-services for hints
```
Some logs from the underlying go network libraries sneak into the output which is kinda gross but I don't think it interferes too much with being able to understand what's going on.
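A minimal Go sketch of the eager connectivity check described in step 1 above, using client-go's discovery client; the function and package names are assumptions, not the actual service mirror code:
```
package servicemirror

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newVerifiedClient builds a client for the target cluster and eagerly calls
// the version endpoint so that connectivity problems surface as a clear error
// instead of a generic "failed to sync caches" crash.
func newVerifiedClient(cfg *rest.Config, clusterName string) (kubernetes.Interface, error) {
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	if _, err := client.Discovery().ServerVersion(); err != nil {
		return nil, fmt.Errorf("cannot connect to api for target cluster %s: %w", clusterName, err)
	}
	return client, nil
}
```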
Signed-off-by: Alex Leong <alex@buoyant.io>
Some installations upgrading from versions prior to 2.7.x may be missing the debug image name and version. This fix ensures that default values are in place for this scenario and additionally upgrades the debug image version along with the control plane version.
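A minimal sketch of the defaulting described above; the type, field names, and default image are illustrative assumptions, not the actual upgrade code:
```
package upgrade

// ProxyConfig is an illustrative stand-in for the relevant proxy config
// fields; the real type lives in the linkerd2 config protobufs.
type ProxyConfig struct {
	DebugImage        string
	DebugImageVersion string
}

// ensureDebugImageDefaults fills in missing debug-image settings for installs
// that pre-date 2.7.x and pins the debug image version to the control plane
// version being upgraded to. The default image name here is an assumption.
func ensureDebugImageDefaults(cfg *ProxyConfig, controlPlaneVersion string) {
	if cfg.DebugImage == "" {
		cfg.DebugImage = "gcr.io/linkerd-io/debug" // assumed default image
	}
	cfg.DebugImageVersion = controlPlaneVersion
}
```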
Signed-off-by: Paul Balogh <javaducky@gmail.com>
Build ARM docker images in the release workflow.
# Changes:
- Add new env keys `DOCKER_MULTIARCH` and `DOCKER_PUSH`. When set, the workflow will build multi-arch images and push them to the registry. See https://github.com/docker/buildx/issues/59 for why they must be pushed to the registry.
- Using `crazy-max/ghaction-docker-buildx` is necessary as it is already configured for cross-compilation (using QEMU), so we can just use it instead of setting it up manually.
- With `buildx`, the automatic platform arguments are now available in the global scope. (See: https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope)
# Follow-up:
- Releasing the CLI binary for the ARM architectures. The docker images resulting from these changes are already built for ARM; we still need further adjustments, such as retrieving those binaries and naming them correctly as part of the GitHub Release artifacts.
Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
The job started failing consistently today with:
```
##[error]devblackops/github-action-psscriptanalyzer/v2/action.yml (Line:
30, Col: 9): Unexpected value ''
##[error]devblackops/github-action-psscriptanalyzer/v2/action.yml (Line:
30, Col: 9): Unexpected value ''
##[error]System.ArgumentException: Unexpected type 'NullToken'
encountered while reading 'outputs'. The type 'MappingToken' was
expected.
```
It seems something changed in GitHub today that is clashing with the `devblackops/github-action-psscriptanalyzer` action.
I've raised devblackops/github-action-psscriptanalyzer#12
Fixes #4707
In order to remove a multicluster link, we add a `linkerd multicluster unlink` command which produces the yaml necessary to delete all of the resources associated with a `linkerd multicluster link`. These are:
* the link resource
* the service mirror controller deployment
* the service mirror controller's RBAC
* the probe gateway mirror for this link
* all mirror services for this link
This command follows the same pattern as the `linkerd uninstall` command in that its output is expected to be piped to `kubectl delete`. The typical usage of this command is:
```
linkerd --context=source multicluster unlink --cluster-name=foo | kubectl --context=source delete -f -
```
This change also fixes the shutdown lifecycle of the service mirror controller by properly having it listen for the shutdown signal and exit its main loop.
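A minimal sketch of the shutdown wiring described above, using the standard library's signal handling; the event channel is a placeholder, not the actual controller loop:
```
package main

import (
	"os"
	"os/signal"
	"syscall"
)

func main() {
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, os.Interrupt, syscall.SIGTERM)

	// Illustrative main loop: process events until a shutdown signal arrives,
	// then return so cleanup can run and the process exits cleanly.
	events := make(chan struct{})
	for {
		select {
		case <-events:
			// handle a cluster event (placeholder)
		case <-stop:
			// exit the main loop on SIGINT/SIGTERM
			return
		}
	}
}
```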
A few alternative designs were considered:
I investigated using owner references as suggested [here](https://github.com/linkerd/linkerd2/issues/4707#issuecomment-653494591) but it turns out that owner references must refer to resources in the same namespace (or to cluster scoped resources). This was not feasible here because a service mirror controller can create mirror services in many different namespaces.
I also considered having the service mirror controller delete the mirror services that it created during its own shutdown. However, this could lead to scenarios where the controller is killed before it finishes deleting the services that it created. It seemed more reliable to have all the deletions happen from `kubectl delete`. Since this is the case, we avoid having the service mirror controller delete mirror services, even when the link is deleted, to avoid the race condition where the controller and CLI both attempt to delete the same mirror services and one of them fails with a potentially alarming error message.
Signed-off-by: Alex Leong <alex@buoyant.io>