linkerd2/controller
Alejandro Pedraza d9e9013cd9
Fix external-prometheus integration test flakyness (#6575)
Another attempt at fixing #6511

Even after #6524, we continued experiencing discrepancies on the
linkerd-edges integration test. The problem ended up being the external
prometheus instance not being injected. The injector logs revealed this:

```console
2021-07-29T13:57:10.2497460Z time="2021-07-29T13:54:15Z" level=info msg="caches synced"
2021-07-29T13:57:10.2498191Z time="2021-07-29T13:54:15Z" level=info msg="starting admin server on :9995"
2021-07-29T13:57:10.2498935Z time="2021-07-29T13:54:15Z" level=info msg="listening at :8443"
2021-07-29T13:57:10.2499945Z time="2021-07-29T13:54:18Z" level=info msg="received admission review request 2b7b4970-db40-4bda-895b-bb2e95e98265"
2021-07-29T13:57:10.2511751Z time="2021-07-29T13:54:18Z" level=debug msg="admission request: &AdmissionRequest{UID:2b7b4970-db40-4bda-895b-bb2e95e98265,Kind:/v1, Kind=Service,Resource:{ v1 services},SubResource:,Name:metrics-api,Namespace:linkerd-viz...
```

Usually one expects the webhook server to start first ("listening at
:8443") and then the admin server, but in this case it happened the
other way around. The admin server serves the readiness probe, so k8s
was signaled that the injector was ready before it could listen to
webhook requests, and given the WebhookFailurePolicy is Ignore by
default, sometimes this was causing for the prometheus pod creation
event to get missed, and we see in the log above that it starts by
processing the pods that are created afterwards, which are the viz ones.

In this fix we start first the webhook server, then block on the syncing
of the k8s API, which should give enough time for the webhook to be up,
and finally we start the admin server.
2021-07-29 13:29:31 -05:00
..
api Remove core dependency on viz (#6497) 2021-07-19 14:28:45 -07:00
cmd Read trust roots from configmap (#6455) 2021-07-28 13:23:15 -07:00
gen Bump google.golang.org/protobuf from 1.27.0 to 1.27.1 (#6409) 2021-07-01 14:50:04 -06:00
heartbeat Bump helm.sh/helm/v3 from 3.4.1 to 3.6.1 (#6286) 2021-06-18 09:34:29 -06:00
identity Have webhooks refresh their certs automatically (#5282) 2020-12-04 16:25:59 -05:00
k8s Fixes unit tests indeterminism (#6496) 2021-07-19 12:42:45 +05:30
proxy-injector Read trust roots from configmap (#6455) 2021-07-28 13:23:15 -07:00
script/destination-client Print identity in destination client and fix proxy-identity log line (#4873) 2020-08-13 13:49:55 -07:00
sp-validator Fix sp-validator not setting requestUID (#5866) 2021-03-04 09:57:57 -08:00
webhook Fix external-prometheus integration test flakyness (#6575) 2021-07-29 13:29:31 -05:00
Dockerfile Remove core dependency on viz (#6497) 2021-07-19 14:28:45 -07:00