The purpose of this PR is to mirror StatefulSets in a multicluster setting. Currently, it isn't possible to communicate with a specific pod in a StatefulSet across clusters without manually creating clusterIP services for each pod backing the StatefulSet in the target cluster.
After some brainstorming, we decided that one way to solve this problem is to have the Service Mirror component create a "root" headless service in the source cluster, along with one clusterIP service for each pod backing the StatefulSet in the target cluster. Each of these clusterIP services has an Endpoints object whose only
host is the Gateway IP -- this is how mirrored services are already constructed in a multicluster environment. The Endpoints object for the root service contains pairs of hostnames and IP addresses: each hostname maps to the name of a pod in the StatefulSet, and its IP corresponds to the clusterIP service that the Service Mirror creates in the source cluster.
To illustrate, assume a StatefulSet `foo` in a target cluster `west` with two pods (`foo-0`, `foo-1`). In the source cluster `east`, we create a headless root service `foo-west` and two services (`foo-0-west`, `foo-1-west`) whose Endpoints point to the Gateway IP. `foo-west`'s Endpoints will contain an AddressSet with two hosts:
```yaml
# foo-west Endpoints
- hostname: foo-0
ip: <clusterIP of foo-0-west>
- hostname: foo-1
ip: <clusterIP of foo-1-west>
```
By making these changes, we address the concerns associated with manually creating these services, since the Service Mirror reconciles, creates, and deletes the clusterIP services (as opposed to requiring any interaction from the end user). Furthermore, by having a "root" headless service we can also configure DNS -- for an end user, there wouldn't be any difference in addressing a specific pod in the StatefulSet as far as syntax goes (i.e. the host `foo-0.foo-west.default.svc.cluster.local` would point to the pod `foo-0` in cluster `west`).
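As a sketch of the mapping above, the root service's Endpoints can be thought of as being assembled from a pod-name to mirror-service-clusterIP mapping. The types and clusterIPs below are illustrative stand-ins, not the real `k8s.io/api/core/v1` structures or actual Service Mirror code:

```go
package main

import (
	"fmt"
	"sort"
)

// EndpointAddress mirrors the shape of a Kubernetes Endpoints address
// (hostname + IP). It is a simplified stand-in for the real API type.
type EndpointAddress struct {
	Hostname string
	IP       string
}

// rootEndpoints builds the address list for the "root" headless service:
// one entry per target pod, pointing at the clusterIP of the per-pod
// mirror service created in the source cluster.
func rootEndpoints(mirrorClusterIPs map[string]string) []EndpointAddress {
	addrs := make([]EndpointAddress, 0, len(mirrorClusterIPs))
	for hostname, ip := range mirrorClusterIPs {
		addrs = append(addrs, EndpointAddress{Hostname: hostname, IP: ip})
	}
	// Sort for a stable, readable ordering.
	sort.Slice(addrs, func(i, j int) bool { return addrs[i].Hostname < addrs[j].Hostname })
	return addrs
}

func main() {
	// Hypothetical clusterIPs assigned to the per-pod mirror services.
	mirrors := map[string]string{
		"foo-0": "10.96.13.37", // clusterIP of foo-0-west
		"foo-1": "10.96.13.38", // clusterIP of foo-1-west
	}
	for _, a := range rootEndpoints(mirrors) {
		fmt.Printf("%s -> %s\n", a.Hostname, a.IP)
	}
}
```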
Closes #5162
Another attempt at fixing #6511
Even after #6524, we continued to see discrepancies in the
`linkerd-edges` integration test. The problem turned out to be the
external Prometheus instance not being injected. The injector logs
revealed this:
```console
2021-07-29T13:57:10.2497460Z time="2021-07-29T13:54:15Z" level=info msg="caches synced"
2021-07-29T13:57:10.2498191Z time="2021-07-29T13:54:15Z" level=info msg="starting admin server on :9995"
2021-07-29T13:57:10.2498935Z time="2021-07-29T13:54:15Z" level=info msg="listening at :8443"
2021-07-29T13:57:10.2499945Z time="2021-07-29T13:54:18Z" level=info msg="received admission review request 2b7b4970-db40-4bda-895b-bb2e95e98265"
2021-07-29T13:57:10.2511751Z time="2021-07-29T13:54:18Z" level=debug msg="admission request: &AdmissionRequest{UID:2b7b4970-db40-4bda-895b-bb2e95e98265,Kind:/v1, Kind=Service,Resource:{ v1 services},SubResource:,Name:metrics-api,Namespace:linkerd-viz...
```
Usually one expects the webhook server to start first ("listening at
:8443") and then the admin server, but in this case it happened the
other way around. The admin server serves the readiness probe, so k8s
was signaled that the injector was ready before it could listen for
webhook requests. Given that the WebhookFailurePolicy is `Ignore` by
default, this sometimes caused the prometheus pod creation event to be
missed, and we see in the log above that the injector starts by
processing the pods that are created afterwards, which are the viz ones.
In this fix we start the webhook server first, then block on the syncing
of the k8s API (which should give the webhook enough time to come up),
and finally start the admin server.
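The corrected ordering can be sketched roughly as follows. The function and listeners are illustrative stand-ins for the injector's actual webhook and admin servers, not its real code:

```go
package main

import (
	"fmt"
	"net"
)

// start sketches the fixed startup order: bind the webhook listener
// first, then block on informer cache sync, and only then start the
// admin server that backs the readiness probe.
func start(syncCaches func()) (webhook, admin net.Listener, err error) {
	// 1. The webhook listener is bound first, so admission requests can
	//    be served as soon as the pod is marked ready.
	webhook, err = net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, nil, err
	}
	// 2. Block until the k8s API caches are synced.
	syncCaches()
	// 3. Only now expose the admin server (readiness probe), so readiness
	//    cannot be reported before the webhook is listening.
	admin, err = net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		webhook.Close()
		return nil, nil, err
	}
	return webhook, admin, nil
}

func main() {
	w, a, err := start(func() { fmt.Println("caches synced") })
	if err != nil {
		panic(err)
	}
	defer w.Close()
	defer a.Close()
	fmt.Println("webhook, then admin, started")
}
```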
Fixes #6452
We add a `linkerd-identity-trust-roots` ConfigMap which contains the configured trust root bundle. The proxy template partial is modified so that core control plane components load this bundle from the configmap through the downward API.
The identity controller is updated to mount this new configmap as a volume and read the trust root bundle at startup.
Similarly, the proxy-injector also mounts this new configmap. For each pod it injects, it reads the trust root bundle file and sets it on the injected pod.
Signed-off-by: Alex Leong <alex@buoyant.io>
Followup to #6496
A test was added in #6496 in `healthcheck_test.go` to ensure the
`validateControlPlanePods` function was fully covered on every run, but
it was actually bailing out early because the `linkerd-destination` and
`linkerd-identity` pods were not included in the list of pods passed.
## edge-21.7.4
This release continues to focus on dependency updates. It also adds the
`l5d-proxy-error` informational header to distinguish proxy-generated errors
from application-generated errors.
* Updated several project dependencies
* Added a new `l5d-proxy-error` header on responses that allows proxy-generated
  error responses to be distinguished from application-generated error responses.
* Removed support for configuring HTTP/2 keepalives via the proxy.
Configuring this setting would sometimes cause conflicts with Go gRPC servers
and clients
* Added a new `target_addr` label to `*_tcp_accept_errors` metrics to improve
diagnostics, especially for TLS detection timeouts
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This release features a change to gateway proxies to support endpoint
targets. Previously, only logical services were supported as gateway
targets.
The proxy now sets an informational header, `l5d-proxy-error`, when the
proxy encounters an internal error. This allows proxy-generated error
responses to be distinguished from application-generated error
responses.
HTTP/2 keepalives are no longer configured by the proxy. This resolves
conflicts with some Go gRPC clients & servers (as described in
linkerd/linkerd2#5988).
Finally, the `*_tcp_accept_errors` metrics now include a `target_addr`
label. This improves diagnostics, especially for TLS detection timeouts.
---
* metrics: Fix metrics test code allowing incomplete matches (linkerd/linkerd2-proxy#1146)
* Add `map_stack` to inbound & outbound builders (linkerd/linkerd2-proxy#1148)
* Update linkerd2-proxy-api to v0.2.0 (linkerd/linkerd2-proxy#1152)
* Remove HTTP/2 keepalive configuration (linkerd/linkerd2-proxy#1149)
* metrics: add target_addr label to TCP accept error metrics (linkerd/linkerd2-proxy#1118)
* build(deps): bump codecov/codecov-action from 1.5.2 to 2.0.1 (linkerd/linkerd2-proxy#1153)
* build(deps): bump tokio from 1.8.1 to 1.8.2 (linkerd/linkerd2-proxy#1155)
* app: Set the `l5d-proxy-error` header on synthesized responses (linkerd/linkerd2-proxy#1119)
* Handle profile endpoint in Gateway outbound stack (linkerd/linkerd2-proxy#1157)
* inbound: Reorganize server into smaller stacks (linkerd/linkerd2-proxy#1156)
* error: Replace `Never` with `std::convert::Infallible` (linkerd/linkerd2-proxy#1158)
Increase container security by making the root file system of the cni
install plugin read-only.
Change the temporary directory used in the cni install script, add a
writable EmptyDir volume, and enable the readOnlyFileSystem securityContext
in the cni plugin helm chart.
Tested this by building the container image of the cni plugin and
installed the chart onto a cluster. Logs looked the same as before this
change.
Fixes #6468
Signed-off-by: Gerald Pape <gerald@giantswarm.io>
Fixes #6511
In the `external-prometheus-deep` integration test, make sure we wait for the rollout of the external prometheus instance before proceeding.
Also remove the special logic around `if TestHelper.ExternalPrometheus()` in the helm-related tests because we know we're using the embedded linkerd prometheus instance in those tests.
Fixes #5589
The core control plane has a dependency on the viz package in order to use the `BuildResource` function. This "backwards" dependency means that the viz source code needs to be included in core docker-builds and is bad for code hygiene.
We move the `BuildResource` function into the viz package. In `cli/cmd/metrics.go` we replace a call to `BuildResource` with a call directly to `CanonicalResourceNameFromFriendlyName`.
Signed-off-by: Alex Leong <alex@buoyant.io>
Ingress version `extensions/v1beta1` will disappear in k8s 1.22, so
this upgrades the Ingress used in the tracing integration test to
`networking.k8s.io/v1`, along with other required Ingress changes.
The other disappearing APIs in k8s 1.22 (MutatingWebhookConfig, etc)
have already been taken care of.
We were getting sporadic coverage differences on `controller/k8s/test_helper.go` and `pkg/healthcheck/healthcheck_test.go` on pushes unrelated to those files.
For the former, the problem was in tests in `controller/k8s/api_test.go` that compared slices of pods and services by sorting them. The `Sort` interface was implemented through the methods in `test_helper.go`. That sorting is apparently nondeterministic at the Go library level, in that the `Swap` method is not always called, which impacted the coverage report. The fix consists of comparing those slices item by item, without needing to sort beforehand.
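The order-insensitive comparison can be sketched as a multiset check over names, with no `Sort`/`Swap` code paths involved. This is a simplified stand-in; the actual tests compare pod and service objects:

```go
package main

import "fmt"

// itemsEqual reports whether two slices contain the same items
// regardless of order, without sorting. A count map gives a multiset
// comparison, so no sort.Interface methods influence coverage.
func itemsEqual(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[string]int, len(a))
	for _, x := range a {
		counts[x]++
	}
	for _, y := range b {
		counts[y]--
		if counts[y] < 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(itemsEqual([]string{"pod-a", "pod-b"}, []string{"pod-b", "pod-a"}))
}
```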
As for `healthcheck_test.go`, `validateControlPlanePods()` in `healthcheck.go` short-circuits on the first pod that has all its containers ready. The unit tests iterate over maps, which is not deterministic, so sometimes the short-circuiting meant the `!container.Ready` block was never covered, affecting the coverage report. This is fixed by adding a new small test that makes sure that block is covered.
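A minimal sketch of the idea behind the added test, with stand-in types for the healthcheck internals (the real code lives in `validateControlPlanePods`):

```go
package main

import "fmt"

// Container is a simplified stand-in for the container status the
// healthcheck inspects.
type Container struct {
	Name  string
	Ready bool
}

// firstUnready mimics the readiness walk: it returns the name of the
// first container that is not ready, or "" when all are ready.
func firstUnready(containers []Container) string {
	for _, c := range containers {
		if !c.Ready {
			// This is the branch that was sometimes skipped when map
			// iteration happened to short-circuit on an all-ready pod.
			return c.Name
		}
	}
	return ""
}

func main() {
	// A fixture with a deliberately unready container guarantees the
	// !Ready branch is exercised on every run, independent of map order.
	fmt.Println(firstUnready([]Container{{"proxy", true}, {"destination", false}}))
}
```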
Fixes #6479
Add the lock threads action to lock closed issues and PRs after
30 days of inactivity. This action runs on a cron to check for
issues or PRs that should be locked.
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-21.7.3
This edge release introduces several changes around metrics. ReplicaSets are now
a supported resource and metrics can be associated with them. A new metric has
been added which counts proxy errors encountered before a protocol can be
detected. Finally, the request errors metric has been split into separate
inbound and outbound directions.
* Fixed printing `check --pre` command usage if it fails after being unable to
connect to Kubernetes (thanks @rdileep13!)
* Updated the default skip and opaque ports to match that which is listed in the
[documentation](https://linkerd.io/2.10/features/protocol-detection/#configuring-protocol-detection)
* Added the `LINKERD2_PROXY_INBOUND_PORTS` environment variable during proxy
injection which will be used by ongoing policy changes
* Added client-go cache size metrics to the `diagnostics controller-metrics`
command
* Added validation that the certificate provided by an external issuer is a CA
(thanks @rumanzo!)
* Added metrics support for ReplicaSets
* Replaced the `request_errors_total` metric with two new metrics:
`inbound_http_errors_total` and `outbound_http_errors_total`
* Introduced the `inbound_tcp_accept_errors_total` and
`outbound_tcp_accept_errors_total` metrics which count proxy errors
encountered before a protocol can be detected
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>