linkerd2

Tarun Pothulapati 7ab8255855 multicluster: make service mirror honour `requeueLimit` (#5969)

Fixes #5374

Currently, whenever the `gatewayAddress` changes, the service mirror component keeps trying to `repairEndpoints` (which is invoked every `repairPeriod`). This behavior is fine and expected, but because the service mirror does not currently honor `requeueLimit`, it keeps requeuing the same event and retrying with no limit.

The condition that we use to limit requeues, `if (rcsw.eventsQueue.NumRequeues(event) < rcsw.requeueLimit)`, does not work for the following reason:

- For this queue to actually track requeues, `AddRateLimited` has to be used instead; only then does `NumRequeues` return the actual number of requeues for a specific event.

This change updates the requeuing logic to use `AddRateLimited` instead of `Add`.

After these changes, the logs in the service mirror are as follows:

```bash
time="2021-03-30T16:52:31Z" level=info msg="Received: OnAddCalled: {svc: Service: {name: grafana, namespace: linkerd-viz, annotations: [[linkerd.io/created-by=linkerd/helm git-0e2ecd7b]], labels [[linkerd.io/extension=viz]]}}" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 0, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
```

As seen, `RepairEndpoints` is called every `repairPeriod`, which is 1 minute by default. Whenever a failure happens, it is retried, but now the failures are tracked and the event is given up on once it reaches the `requeueLimit`, which is 3 by default.

This also fixes the requeuing logic for all types of events, not just `repairEndpoints`.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

2021-04-01 11:16:57 +05:30
..
cluster_watcher.go	multicluster: make service mirror honour `requeueLimit` (#5969)	2021-04-01 11:16:57 +05:30
cluster_watcher_mirroring_test.go	extension: Separate multicluster chart and binary (#5293)	2020-12-04 16:36:10 -08:00
cluster_watcher_test_util.go	Revert "Rename multicluster annotation prefix and move when possible (#5771)" (#5813)	2021-02-24 12:54:52 -05:00
events_formatting.go	extension: Separate multicluster chart and binary (#5293)	2020-12-04 16:36:10 -08:00
jittered_ticker.go	extension: Separate multicluster chart and binary (#5293)	2020-12-04 16:36:10 -08:00
metrics.go	extension: Separate multicluster chart and binary (#5293)	2020-12-04 16:36:10 -08:00
probe_worker.go	extension: Separate multicluster chart and binary (#5293)	2020-12-04 16:36:10 -08:00