* multicluster: make service mirror honour `requeueLimit`
Fixes #5374
Currently, whenever the `gatewayAddress` changes, the service mirror
component keeps trying to repair endpoints (`repairEndpoints` is invoked
every `repairPeriod`). That behavior is fine and expected, but because the
service mirror does not honour `requeueLimit`, it keeps requeuing the same
event and retries with no limit.
The condition that we use to limit requeues,
`if (rcsw.eventsQueue.NumRequeues(event) < rcsw.requeueLimit)`, does
not work for the following reason:
- For the queue to actually track requeues, events must be re-added with
  `AddRateLimited`; only then does `NumRequeues` return the real number of
  requeues for a specific event.
This change updates the requeuing logic to use `AddRateLimited` instead
of `Add`.
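To make the retry/give-up flow concrete, here is a minimal sketch of the
pattern using client-go's `workqueue` package; the `processEvent` handler
and the way `requeueLimit` is wired in are illustrative, not the service
mirror's exact code.
```go
package main

import (
	"fmt"
	"log"

	"k8s.io/client-go/util/workqueue"
)

const requeueLimit = 3

// processEvent stands in for the service mirror's real event handling.
func processEvent(event interface{}) error {
	return fmt.Errorf("resolving gateway for %v failed", event)
}

func processNextEvent(queue workqueue.RateLimitingInterface) bool {
	event, shutdown := queue.Get()
	if shutdown {
		return false
	}
	defer queue.Done(event)

	err := processEvent(event)
	if err == nil {
		// Success: drop this item's requeue history.
		queue.Forget(event)
		return true
	}

	if queue.NumRequeues(event) < requeueLimit {
		log.Printf("Requeues: %d, Limit: %d for event %v", queue.NumRequeues(event), requeueLimit, event)
		// AddRateLimited (unlike Add) increments the per-item requeue
		// counter, so NumRequeues reflects how many times this event failed.
		queue.AddRateLimited(event)
		return true
	}

	// Limit reached: give up and reset the counter so a later occurrence
	// of the same event starts counting from zero again.
	log.Printf("Error processing %v (giving up): %v", event, err)
	queue.Forget(event)
	return true
}

func main() {
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	queue.Add("RepairEndpoints")
	processNextEvent(queue)
}
```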
After these changes, the service mirror logs look like this:
```bash
time="2021-03-30T16:52:31Z" level=info msg="Received: OnAddCalled: {svc: Service: {name: grafana, namespace: linkerd-viz, annotations: [[linkerd.io/created-by=linkerd/helm git-0e2ecd7b]], labels [[linkerd.io/extension=viz]]}}" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 0, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
```
As seen above, `RepairEndpoints` is triggered every `repairPeriod`, which
is 1 minute by default. Whenever a failure happens it is retried, but the
failures are now tracked and the event is given up once it reaches the
`requeueLimit`, which is 3 by default.
This also fixes the requeuing logic for all types of events,
not just `RepairEndpoints`.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This reverts commit f9ab867cbc which renamed the
multicluster label name from `mirror.linkerd.io` to `multicluster.linkerd.io`.
While this change was made to follow the naming of other extensions, it
complicates the multicluster upgrade process due to the secret creation.
`mirror.linkerd.io` is not an important enough label to warrant the rename,
and keeping it allows a smoother upgrade process for `stable-2.10.x`.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This change introduces an opaque ports annotation watcher that sends
destination profile updates when a service's opaque ports annotation
changes.
The user-facing change introduced by this is that the opaque ports annotation
is now required on services when using the multicluster extension. This is
because the service mirror creates mirrored services in the source cluster,
and destination lookups in the source cluster need to discover that the
workloads in the target cluster use opaque protocols.
### Why
Closes #5650
### How
The destination server now has a new opaque ports annotation watcher. When a
client subscribes to updates for a service name or cluster IP, the `GetProfile`
method creates a profile translator stack that passes updates through resource
adaptors such as the traffic split adaptor, the service profile adaptor, and
now the opaque ports adaptor.
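For illustration, a heavily simplified sketch of what such an adaptor might
look like is below; the `Profile`, `ProfileUpdateListener`, and
`opaquePortsAdaptor` types are hypothetical stand-ins rather than the
destination service's actual interfaces, and only the
`config.linkerd.io/opaque-ports` annotation key is taken from Linkerd itself.
```go
// Hypothetical, simplified adaptor used only to illustrate the flow.
package adaptor

import (
	"strconv"
	"strings"
)

const opaquePortsAnnotation = "config.linkerd.io/opaque-ports"

// Profile is a stand-in for the destination profile sent to clients.
type Profile struct {
	FullyQualifiedName string
	OpaqueProtocol     bool
}

// ProfileUpdateListener receives translated profile updates.
type ProfileUpdateListener interface {
	Update(profile *Profile)
}

// opaquePortsAdaptor sits in the translator stack for a single GetProfile
// subscription (service name or cluster IP plus a port). Whenever the
// service's annotation changes, it re-emits the profile with
// OpaqueProtocol recomputed for the subscribed port.
type opaquePortsAdaptor struct {
	listener ProfileUpdateListener
	port     uint32
	profile  Profile
}

// UpdateService is called when the watched Service's annotations change.
// A removed or empty annotation results in OpaqueProtocol being false.
func (a *opaquePortsAdaptor) UpdateService(annotations map[string]string) {
	opaque := false
	for _, p := range strings.Split(annotations[opaquePortsAnnotation], ",") {
		n, err := strconv.ParseUint(strings.TrimSpace(p), 10, 32)
		if err == nil && uint32(n) == a.port {
			opaque = true
			break
		}
	}
	a.profile.OpaqueProtocol = opaque
	a.listener.Update(&a.profile)
}
```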
When the annotation on a service changes, the update is passed through to the
client where the `opaque_protocol` field will either be set to true or false.
A few scenarios to consider are:
- If the annotation is removed from the service, the client should receive
an update with no opaque ports set.
- If the service is deleted, the stream stays open so the client should
receive an update with no opaque ports set.
- If the service has the annotation added, the client should receive that
update.
### Testing
Unit tests have been added to the watcher as well as the destination server.
An integration test has been added that tests the opaque port annotation on a
service.
For manual testing, using the destination server scripts is easiest:
```bash
# install Linkerd
# start the destination server
$ go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
# Create a service or namespace with the annotation and inject it
# get the destination profile for that service and observe the opaque protocol field
$ go run controller/script/destination-client/main.go -method getProfile -path test-svc.default.svc.cluster.local:8080
INFO[0000] fully_qualified_name:"terminus-svc.default.svc.cluster.local" opaque_protocol:true retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"terminus-svc.default.svc.cluster.local.:8080" weight:10000}
INFO[0000]
INFO[0000] fully_qualified_name:"terminus-svc.default.svc.cluster.local" opaque_protocol:true retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"terminus-svc.default.svc.cluster.local.:8080" weight:10000}
INFO[0000]
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This renames the multicluster annotation prefix from `mirror.linkerd.io` to
`multicluster.linkerd.io` in order to reflect other extension naming patterns.
Additionally, it moves labels used only in the multicluster extension into
their own labels file, again to reflect other extensions.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Don't swallow error when MC gateway hostname can't be resolved
Ref #5343
When none of the gateway addresses is resolvable, propagate the error as
a retryable error so it gets retried and logged. Don't create the
mirrored resources if there's no success after the retries.
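A sketch of how such a retryable error can be propagated is below, assuming
an error wrapper shaped like the `Inner errors:` log lines earlier in this
note; the `RetryableError` type and the `resolveGatewayAddresses` helper are
illustrative, not the exact service mirror code.
```go
// Sketch of propagating gateway resolution failures as a retryable error.
package mirror

import (
	"fmt"
	"net"
	"strings"
)

// RetryableError aggregates errors that should cause the originating
// event to be requeued rather than dropped.
type RetryableError struct {
	Inner []error
}

func (e RetryableError) Error() string {
	msgs := make([]string, 0, len(e.Inner))
	for _, err := range e.Inner {
		msgs = append(msgs, err.Error())
	}
	return fmt.Sprintf("Inner errors:\n\t%s", strings.Join(msgs, "\n\t"))
}

// resolveGatewayAddresses returns the resolved IPs, or a RetryableError
// when none of the gateway hostnames can be resolved, so the caller
// requeues the event instead of silently creating the mirrored resources.
func resolveGatewayAddresses(addresses []string) ([]net.IP, error) {
	var ips []net.IP
	var errs []error
	for _, addr := range addresses {
		resolved, err := net.LookupIP(addr)
		if err != nil {
			errs = append(errs, fmt.Errorf("Error resolving '%s': %w", addr, err))
			continue
		}
		ips = append(ips, resolved...)
	}
	if len(ips) == 0 && len(errs) > 0 {
		return nil, RetryableError{Inner: errs}
	}
	return ips, nil
}
```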
Fixes #5257
This branch moves the multicluster charts and CLI-level code to a new
top-level directory. None of the logic is changed.
It also moves some common types into `/pkg` so that they
are accessible to both the main CLI and extensions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>