mirror of https://github.com/linkerd/linkerd2.git
* multicluster: make service mirror honour `requeueLimit`

Fixes #5374

Currently, whenever the `gatewayAddress` changes, the service mirror component keeps trying to `repairEndpoints` (which is invoked every `repairPeriod`). This behavior is fine and expected, but because the service mirror does not currently honor `requeueLimit`, it keeps requeuing the same event and retrying with no limit.

The condition that we use to limit requeues, `if (rcsw.eventsQueue.NumRequeues(event) < rcsw.requeueLimit)`, does not work for the following reason:

- For this queue to actually track requeues, `AddRateLimited` has to be used instead of `Add`; only then does `NumRequeues` return the actual number of requeues for a specific event.

This change updates the requeuing logic to use `AddRateLimited` instead of `Add`.

After these changes, the logs in the service mirror are as follows:

```bash
time="2021-03-30T16:52:31Z" level=info msg="Received: OnAddCalled: {svc: Service: {name: grafana, namespace: linkerd-viz, annotations: [[linkerd.io/created-by=linkerd/helm git-0e2ecd7b]], labels [[linkerd.io/extension=viz]]}}" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 0, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
```

As seen, `RepairEndpoints` is called every `repairPeriod`, which is 1 minute by default.
Whenever a failure happens, it is retried, but failures are now tracked and the event is abandoned once it reaches the `requeueLimit`, which is 3 by default.

This also fixes the requeuing logic for all types of events, not just `repairEndpoints`.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> |
||
|---|---|---|
| .. | ||
| cluster_watcher.go | ||
| cluster_watcher_mirroring_test.go | ||
| cluster_watcher_test_util.go | ||
| events_formatting.go | ||
| jittered_ticker.go | ||
| metrics.go | ||
| probe_worker.go | ||
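
The commit message above turns on one detail: a requeue counter only advances when items are re-added through `AddRateLimited`, so a limit checked with `NumRequeues` never trips if plain `Add` is used. The following is a minimal sketch of that behavior using only the Go standard library. It is not the linkerd2 code and not the client-go workqueue; `countingQueue`, `drain`, and `maxAttempts` are hypothetical names invented for illustration, the backoff delay is omitted, and the `Forget` call on giving up is an assumption that matches the counter returning to 0 in the log.

```go
package main

import "fmt"

// countingQueue is a hypothetical, single-goroutine stand-in for a
// rate-limited work queue. It models only the requeue bookkeeping.
type countingQueue struct {
	items    []string
	requeues map[string]int
}

func newCountingQueue() *countingQueue {
	return &countingQueue{requeues: make(map[string]int)}
}

// Add enqueues an item without recording a retry.
func (q *countingQueue) Add(item string) {
	q.items = append(q.items, item)
}

// AddRateLimited enqueues an item and records one retry for it.
// A real rate-limited queue would also delay the item; that is omitted here.
func (q *countingQueue) AddRateLimited(item string) {
	q.requeues[item]++
	q.items = append(q.items, item)
}

// NumRequeues reports how many retries were recorded for the item.
func (q *countingQueue) NumRequeues(item string) int {
	return q.requeues[item]
}

// Forget resets the retry counter for the item.
func (q *countingQueue) Forget(item string) {
	delete(q.requeues, item)
}

// Get removes and returns the oldest item; ok is false when the queue is empty.
func (q *countingQueue) Get() (item string, ok bool) {
	if len(q.items) == 0 {
		return "", false
	}
	item = q.items[0]
	q.items = q.items[1:]
	return item, true
}

// drain processes items with a handler that always fails, requeuing through
// the supplied function while the counter is below limit. It returns the
// number of attempts made. maxAttempts is a safety cap so that the variant
// that never gives up still terminates.
func drain(q *countingQueue, requeue func(string), limit, maxAttempts int) int {
	attempts := 0
	for attempts < maxAttempts {
		item, ok := q.Get()
		if !ok {
			break
		}
		attempts++
		if q.NumRequeues(item) < limit {
			requeue(item) // will retry
		} else {
			q.Forget(item) // giving up; the counter starts from 0 next time
		}
	}
	return attempts
}

func main() {
	const limit, maxAttempts = 3, 20

	// Requeuing with Add: the counter stays at 0, so the limit never applies.
	unbounded := newCountingQueue()
	unbounded.Add("RepairEndpoints")
	fmt.Println("attempts with Add:", drain(unbounded, unbounded.Add, limit, maxAttempts))

	// Requeuing with AddRateLimited: the counter advances and the limit applies.
	bounded := newCountingQueue()
	bounded.Add("RepairEndpoints")
	fmt.Println("attempts with AddRateLimited:", drain(bounded, bounded.AddRateLimited, limit, maxAttempts))
}
```

In this model the `Add` variant stops only at the safety cap (20 attempts), while the `AddRateLimited` variant makes 4 attempts: one initial try plus three retries, which lines up with the `Requeues: 0` through `Requeues: 3` entries in the second cycle of the log.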