This helps ensure a minimum level of security. The two places this affects is our controller webhook and linkerd-viz tap API.
The controller requires that kube-api supports TLSv1.3, which it does as of 1.19 (our minimum is currently 1.22). The linkerd-viz tap API is mostly used internally, and is deprecated. It may be worth revisiting if we want to keep it around at all.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* Add ability to configure client-go's `QPS` and `Burst` settings
## Problem and Symptoms
When having a very large number of proxies request identity in a short period of time (e.g. during large node scaling events), the identity controller will attempt to validate the tokens sent by the proxies at a rate surpassing client-go's the default request rate threshold, triggering client-side throttling, which will delay the proxies initialization, and even failing their startup (after a 2m timeout). The identity controller will surface this through log entries like this:
```
time="2023-11-08T19:50:45Z" level=error msg="error validating token for web.emojivoto.serviceaccount.identity.linkerd.cluster.local: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline"
```
## Solution
Client-go's default `QPS` is 5 and `Burst` is 10. This PR exposes those settings as entries in `values.yaml` with defaults of 100 and 200 respectively. Note this only applies to the identity controller, as it's the only controller performing direct requests to the `kube-apiserver` in a hot path. The other controllers mostly rely in informers, and direct calls are sporadic.
## Observability
The `QPS` and `Burst` settings used are exposed both as a log entry as soon as the controller starts, and as in the new metric gauges `http_client_qps` and `http_client_burst`
## Testing
You can use the following K6 script, which simulates 6k calls to the `Certify` service during one minute from emojivoto's web pod. Before running this you need to:
- Put the identity.proto and [all the other proto files](https://github.com/linkerd/linkerd2-proxy-api/tree/v0.11.0/proto) in the same directory.
- Edit the [checkRequest](https://github.com/linkerd/linkerd2/blob/edge-23.11.3/pkg/identity/service.go#L266) function and add logging statements to figure the `token` and `csr` entries you can use here, that will be shown as soon as a web pod starts.
```javascript
import { Client, Stream } from 'k6/experimental/grpc';
import { sleep } from 'k6';
const client = new Client();
client.load(['.'], 'identity.proto');
// This always holds:
// req_num = (1 / req_duration ) * duration * VUs
// Given req_duration (0.5s) test duration (1m) and the target req_num (6k), we
// can solve for the required VUs:
// VUs = req_num * req_duration / duration
// VUs = 6000 * 0.5 / 60 = 50
export const options = {
scenarios: {
identity: {
executor: 'constant-vus',
vus: 50,
duration: '1m',
},
},
};
export default () => {
client.connect('localhost:8080', {
plaintext: true,
});
const stream = new Stream(client, 'io.linkerd.proxy.identity.Identity/Certify');
// Replace with your own token
let token = "ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNkluQjBaV1pUZWtaNWQyVm5OMmxmTTBkV2VUTlhWSFpqTmxwSmJYRmtNMWRSVEhwNVNHWllhUzFaZDNNaWZRLmV5SmhkV1FpT2xzaWFXUmxiblJwZEhrdWJEVmtMbWx2SWwwc0ltVjRjQ0k2TVRjd01EWTRPVFk1TUN3aWFXRjBJam94TnpBd05qQXpNamt3TENKcGMzTWlPaUpvZEhSd2N6b3ZMMnQxWW1WeWJtVjBaWE11WkdWbVlYVnNkQzV6ZG1NdVkyeDFjM1JsY2k1c2IyTmhiQ0lzSW10MVltVnlibVYwWlhNdWFXOGlPbnNpYm1GdFpYTndZV05sSWpvaVpXMXZhbWwyYjNSdklpd2ljRzlrSWpwN0ltNWhiV1VpT2lKM1pXSXRPRFUxT1dJNU4yWTNZeTEwYldJNU5TSXNJblZwWkNJNklqaGlZbUV5WWpsbExXTXdOVGN0TkRnMk1TMWhNalZsTFRjelpEY3dOV1EzWmpoaU1TSjlMQ0p6WlhKMmFXTmxZV05qYjNWdWRDSTZleUp1WVcxbElqb2lkMlZpSWl3aWRXbGtJam9pWm1JelpUQXlNRE10TmpZMU55MDBOMk0xTFRoa09EUXRORGt6WXpBM1lXUTJaak0zSW4xOUxDSnVZbVlpT2pFM01EQTJNRE15T1RBc0luTjFZaUk2SW5ONWMzUmxiVHB6WlhKMmFXTmxZV05qYjNWdWREcGxiVzlxYVhadmRHODZkMlZpSW4wLnlwMzAzZVZkeHhpamxBOG1wVjFObGZKUDB3SC03RmpUQl9PcWJ3NTNPeGU1cnNTcDNNNk96VWR6OFdhYS1hcjNkVVhQR2x2QXRDRVU2RjJUN1lKUFoxVmxxOUFZZTNvV2YwOXUzOWRodUU1ZDhEX21JUl9rWDUxY193am9UcVlORHA5ZzZ4ZFJNcW9reGg3NE9GNXFjaEFhRGtENUJNZVZ6a25kUWZtVVZwME5BdTdDMTZ3UFZWSlFmNlVXRGtnYkI1SW9UQXpxSmcyWlpyNXBBY3F5enJ0WE1rRkhSWmdvYUxVam5sN1FwX0ljWm8yYzJWWk03T2QzRjIwcFZaVzJvejlOdGt3THZoSEhSMkc5WlNJQ3RHRjdhTkYwNVR5ZC1UeU1BVnZMYnM0ZFl1clRYaHNORjhQMVk4RmFuNjE4d0x6ZUVMOUkzS1BJLUctUXRUNHhWdw==";
// Replace with your own CSR
let csr = "MIIBWjCCAQECAQAwRjFEMEIGA1UEAxM7d2ViLmVtb2ppdm90by5zZXJ2aWNlYWNjb3VudC5pZGVudGl0eS5saW5rZXJkLmNsdXN0ZXIubG9jYWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAATKjgVXu6F+WCda3Bbq2ue6m3z6OTMfQ4Vnmekmvirip/XGyi2HbzRzjARnIzGlG8wo4EfeYBtd2MBCb50kP8F8oFkwVwYJKoZIhvcNAQkOMUowSDBGBgNVHREEPzA9gjt3ZWIuZW1vaml2b3RvLnNlcnZpY2VhY2NvdW50LmlkZW50aXR5LmxpbmtlcmQuY2x1c3Rlci5sb2NhbDAKBggqhkjOPQQDAgNHADBEAiAM7aXY8MRs/EOhtPo4+PRHuiNOV+nsmNDv5lvtJt8T+QIgFP5JAq0iq7M6ShRNkRG99ZquJ3L3TtLWMNVTPvqvvUE=";
const data = {
identity: "web.emojivoto.serviceaccount.identity.linkerd.cluster.local",
token: token,
certificate_signing_request: csr,
};
stream.write(data);
// This request takes around 2ms, so this sleep will mostly determine its final duration
sleep(0.5);
};
```
This results in the following report:
```
scenarios: (100.00%) 1 scenario, 50 max VUs, 1m30s max duration (incl. graceful stop):
* identity: 50 looping VUs for 1m0s (gracefulStop: 30s)
data_received................: 6.3 MB 104 kB/s
data_sent....................: 9.4 MB 156 kB/s
grpc_req_duration............: avg=2.14ms min=873.93µs med=1.9ms max=12.89ms p(90)=3.13ms p(95)=3.86ms
grpc_streams.................: 6000 99.355331/s
grpc_streams_msgs_received...: 6000 99.355331/s
grpc_streams_msgs_sent.......: 6000 99.355331/s
iteration_duration...........: avg=503.16ms min=500.8ms med=502.64ms max=532.36ms p(90)=504.05ms p(95)=505.72ms
iterations...................: 6000 99.355331/s
vus..........................: 50 min=50 max=50
vus_max......................: 50 min=50 max=50
running (1m00.4s), 00/50 VUs, 6000 complete and 0 interrupted iterations
```
With the old defaults (QPS=5 and Burst=10), the latencies would be much higher and number of complete requests much lower.
* Add native sidecar support
Kubernetes will be providing beta support for native sidecar containers in version 1.29. This feature improves network proxy sidecar compatibility for jobs and initContainers.
Introduce a new annotation config.alpha.linkerd.io/proxy-enable-native-sidecar and configuration option Proxy.NativeSidecar that causes the proxy container to run as an init-container.
Fixes: #11461
Signed-off-by: TJ Miller <millert@us.ibm.com>
Adds support for remote discovery to the destination controller.
When the destination controller gets a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` label, this is an indication that the destination controller should discover the endpoints for this service from a remote cluster. The destination controller will look for a remote cluster which has been linked to it (using the `linkerd multicluster link` command) with that name. It will look at the `multicluster.linkerd.io/remote-discovery` label for the service name to look up in that cluster. It then streams back the endpoint data for that remote service.
Since we now have multiple client-go informers for the same resource types (one for the local cluster and one for each linked remote cluster) we add a `cluster` label onto the prometheus metrics for the informers and EndpointWatchers to ensure that each of these components' metrics are correctly tracked and don't overwrite each other.
---------
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes https://github.com/linkerd/linkerd2/issues/10036
The Linkerd control plane components written in go serve liveness and readiness probes endpoint on their admin server. However, the admin server is not started until k8s informer caches are synced, which can take a long time on large clusters. This means that liveness checks can time out causing the controller to be restarted.
We start the admin server before attempting to sync caches so that we can respond to liveness checks immediately. We fail readiness probes until the caches are synced.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Removed dupe imports
My IDE (vim-gopls) has been complaining for a while, so I decided to take
care of it. Found via
[staticcheck](https://github.com/dominikh/go-tools)
* Add stylecheck to go-lint checks
* Use metadata API in the proxy and tap injectors
Part of #9485
This adds a new `MetadataAPI` similar to the current `k8s.API` hosting informers, but backed by k8s' `metadatainformer` shared informers, which retrieves only the objects metadata, resulting in less memory consumption by its clients. Currently this is only implemented for the proxy and tap injectors. Usage by the destination controller will be implemented as a follow-up.
## Existing API enhancements
Shared objects and logic required by API and MetadataAPI have been moved to the new `k8s.go`, `api_resource.go` and `prometheus.go` files. That includes the `isValidRSParent()` function whose arg is now more generic.
## Unit tests
`/controller/k8s/api_test.go` now also instantiates a MetadataAPI, used in the augmented `TestGetObjects()` and `TestGetOwnerKindAndName()` tests. The `resources` struct was introduced to capture the common fields among tests and simplify `newMockAPI()`'s signature.
## Other Changes
The injector no longer watches for Pods. It only requires watching workloads that own resources (and also watch namespaces), so Pod is not required.
## Testing Memory Consumption
Install linkerd, inject emojivoto and check the injector memory consumption with `kubectl -n linkerd top pod linkerd-proxy-injector-xxx`. It'll start consuming about 16Mi. Then ramp up emojivoto's `voting` deployment replicas to 2000. After 5 minutes memory will stabilize around 32Mi using the current branch. Using the latest edge, it'll stabilize around 110Mi.
04a66ba added a `ReadHeaderTimeout` to our HTTP servers (at gosec's
insistence). We chose a fairly arbitrary timeout of 10s. This
configuration causes any connection that has been idle for 10s to be
torn down by the server. Unfortunately, this timeout value matches the
default Kubernetes probe interval and the default linkerd-viz scrape
interval. This can cause probes to race the timeout so that the
connection is healthy from the proxy's point of view and a request is
sent on the connection exactly as the server drops the connection.
These request failures cause controller success rate to appear degraded.
To correct this, this change raises the timeout to 15s so that the
timeout no longer matches the default probe interval.
The proxy's HTTP client is supposed to [retry] requests that encounter
this type of error. We should follow up by doing more research into why
that is not occurring in this situation.
[retry]: https://docs.rs/hyper/0.14.20/hyper/client/struct.Builder.html#method.retry_canceled_requests
Signed-off-by: Oliver Gould <ver@buoyant.io>
Newer versions of golangci-lint flag `http.Server` instances that do not
set a `ReadHeaderTimeout` as being vulnerable to "slowloris" attacks,
wherein clients initiate requests that hold connections open
indefinitely.
This change sets a `ReadHeaderTimeout` of 10s. This timeout is fairly
conservative so that clients can eagerly create connections, but is
still constrained enough that these connections won't remain open
indefinitely.
This change also updates kubert to v0.9.1, which instruments a header
read timeout on the policy admission server.
Signed-off-by: Oliver Gould <ver@buoyant.io>
Follow-up to #8087 that allows pprof to be enabled via the `--set
enablePprof=true` flag.
Each control plane components spawns its own admin server, so each of these
received it's own `enable-pprof` flag. When `enablePprof=true`, it is passed
through to each component so that when it launches its admin server, its pprof
endpoints are enabled.
A note on the templating: `-enable-pprof={{.Values.enablePprof | default
false}}`. `false` values are not rendered by Helm so without the `... | default
false}}`, it tries to pass the flag as `-enable-pprof=""` which results in an
error. Inlining this felt better than conditionally passing the flag with
```yaml {{ if .Values.enablePprof -}} -enable-pprof={{.Values.enablePprof}} {{
end -}} ```
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Closes#7826
This adds the `gosec` and `errcheck` lints to the `golangci` configuration. Most significant lints have been fixed my individual changes, but this enables them by default so that all future changes are caught ahead of time.
A significant amount of these lints are been exluced by the various `exclude-rules` rules added to `.golangci.yml`. These include operations are files that generally do not fail such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up functions via the `defer` keyword.
Aside from those, there are several other rules added that all have comments explaining why it's okay to ignore the errors that they cover.
Finally, several smaller fixes in the code have been made where it seems necessary to catch errors or at least log them.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
[gocritic][gc] helps to enforce some consistency and check for potential
errors. This change applies linting changes and enables gocritic via
golangci-lint.
[gc]: https://github.com/go-critic/go-critic
Signed-off-by: Oliver Gould <ver@buoyant.io>
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection is wrapping-aware.
This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).
[el]: https://github.com/polyfloyd/go-errorlint
Signed-off-by: Oliver Gould <ver@buoyant.io>
When the webhook server decodes a request's JSON payload, it may try to
use a nil value when handling the error. Furthermore, if the JSON
payload has a `nil` `Request` value, it may attempt to dereference the
value.
This change improves the webhook server's error handling to return a
`400 Bad Request` status if either of these cases are encountered.
Signed-off-by: Oliver Gould <ver@buoyant.io>
In several places where we build TLS servers (usually in our webhooks),
we use the default TLS configuration, which enables legacy versions of
TLS.
This change updates these servers to specify a minimum TLS version of
v1.2.
Signed-off-by: Oliver Gould <ver@buoyant.io>
Another attempt at fixing #6511
Even after #6524, we continued experiencing discrepancies on the
linkerd-edges integration test. The problem ended up being the external
prometheus instance not being injected. The injector logs revealed this:
```console
2021-07-29T13:57:10.2497460Z time="2021-07-29T13:54:15Z" level=info msg="caches synced"
2021-07-29T13:57:10.2498191Z time="2021-07-29T13:54:15Z" level=info msg="starting admin server on :9995"
2021-07-29T13:57:10.2498935Z time="2021-07-29T13:54:15Z" level=info msg="listening at :8443"
2021-07-29T13:57:10.2499945Z time="2021-07-29T13:54:18Z" level=info msg="received admission review request 2b7b4970-db40-4bda-895b-bb2e95e98265"
2021-07-29T13:57:10.2511751Z time="2021-07-29T13:54:18Z" level=debug msg="admission request: &AdmissionRequest{UID:2b7b4970-db40-4bda-895b-bb2e95e98265,Kind:/v1, Kind=Service,Resource:{ v1 services},SubResource:,Name:metrics-api,Namespace:linkerd-viz...
```
Usually one expects the webhook server to start first ("listening at
:8443") and then the admin server, but in this case it happened the
other way around. The admin server serves the readiness probe, so k8s
was signaled that the injector was ready before it could listen to
webhook requests, and given the WebhookFailurePolicy is Ignore by
default, sometimes this was causing for the prometheus pod creation
event to get missed, and we see in the log above that it starts by
processing the pods that are created afterwards, which are the viz ones.
In this fix we start first the webhook server, then block on the syncing
of the k8s API, which should give enough time for the webhook to be up,
and finally we start the admin server.
## What this changes
This adds a tap-injector component to the `linkerd-viz` extension which is
responsible for adding the tap service name environment variable to the Linkerd
proxy container.
If a pod does not have a Linkerd proxy, no action is taken. If tap is disabled
via annotation on the pod or the namespace, no action is taken.
This also removes the environment variable for explicitly disabling tap through
an environment variable. Tap status for a proxy is now determined only be the
presence or absence of the tap service name environment variable.
Closes#5326
## How it changes
### tap-injector
The tap-injector component determines if `LINKERD2_PROXY_TAP_SVC_NAME` should be
added to a pod's Linkerd proxy container environment. If the pod satisfies the
following, it is added:
- The pod has a Linkerd proxy container
- The pod has not already been mutated
- Tap is not disabled via annotation on the pod or the pod's namespace
### LINKERD2_PROXY_TAP_DISABLED
Now that tap is an extension of Linkerd and not a core component, it no longer
made sense to explicitly enable or disable tap through this Linkerd proxy
environment variable. The status of tap is now determined only be if the
tap-injector adds or does not add the `LINKERD2_PROXY_TAP_SVC_NAME` environment
variable.
### controller image
The tap-injector has been added to the controller image's several startup
commands which determines what it will do in the cluster.
As a follow-up, I think splitting out the `tap` and `tap-injector` commands from
the controller image into a linkerd-viz image (or something like that) makes
sense.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Followup to #5282, fixes#5272 in its totality.
This follows the same pattern as the injector/sp-validator webhooks, leveraging `FsCredsWatcher` to watch for changes in the cert files.
To reuse code from the webhooks, we moved `updateCert()` to `creds_watcher.go`, and `run()` as well (which now is called `ProcessEvents()`).
The `TestNewAPIServer` test in `apiserver_test.go` was removed as it really was just testing two things: (1) that `apiServerAuth` doesn't error which is already covered in the following test, and (2) that the golib call `net.Listen("tcp", addr)` doesn't error, which we're not interested in testing here.
## How to test
To test that the injector/sp-validator functionality is still correct, you can refer to #5282
The steps below are similar, but focused towards the tap component:
```bash
# Create some root cert
$ step certificate create linkerd-tap.linkerd.svc ca.crt ca.key --profile root-ca --no-password --insecure
# configure tap's caBundle to be that root cert
$ cat > linkerd-overrides.yml << EOF
tap:
externalSecret: true
caBundle: |
< ca.crt contents>
EOF
# Install linkerd
$ bin/linkerd install --config linkerd-overrides.yml | k apply -f -
# Generate an intermediatery cert with short lifespan
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc
# Create the secret using that intermediate cert
$ kubectl create secret tls \
linkerd-tap-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# Rollout the tap pod for it to pick the new secret
$ k -n linkerd rollout restart deploy/linkerd-tap
# Tap should work
$ bin/linkerd tap -n linkerd deploy/linkerd-web
req id=0:0 proxy=in src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :method=GET :authority=10.42.0.11:9994 :path=/metrics
rsp id=0:0 proxy=in src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :status=200 latency=1779µs
end id=0:0 proxy=in src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true duration=65µs response-length=1709B
# Wait 5 minutes and rollout tap again
$ k -n linkerd rollout restart deploy/linkerd-tap
# You'll see in the logs that the cert expired:
$ k -n linkerd logs -f deploy/linkerd-tap tap
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45866: remote error: tls: bad certificate
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45870: remote error: tls: bad certificate
# Recreate the secret
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc
$ k -n linkerd delete secret linkerd-tap-k8s-tls
$ kubectl create secret tls \
linkerd-tap-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# Wait a few moments and you'll see the certs got reloaded and tap is working again
time="2020-12-15T16:03:42Z" level=info msg="Updated certificate" addr=":8089" component=apiserver
```
* Have webhooks refresh their certs automatically
Fixes partially #5272
In 2.9 we introduced the ability for providing the certs for `proxy-injector` and `sp-validator` through some external means like cert-manager, through the new helm setting `externalSecret`.
We forgot however to have those services watch changes in their secrets, so whenever they were rotated they would fail with a cert error, with the only workaround being to restart those pods to pick the new secrets.
This addresses that by first abstracting out `FsCredsWatcher` from the identity controller, which now lives under `pkg/tls`.
The webhook's logic in `launcher.go` no longer reads the certs before starting the https server, moving that instead into `server.go` which in a similar way as identity will receive events from `FsCredsWatcher` and update `Server.cert`. We're leveraging `http.Server.TLSConfig.GetCertificate` which allows us to provide a function that will return the current cert for every incoming request.
### How to test
```bash
# Create some root cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca.crt ca.key \
--profile root-ca --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# configure injector's caBundle to be that root cert
$ cat > linkerd-overrides.yaml << EOF
proxyInjector:
externalSecret: true
caBundle: |
< ca.crt contents>
EOF
# Install linkerd. The injector won't start untill we create the secret below
$ bin/linkerd install --controller-log-level debug --config linkerd-overrides.yaml | k apply -f -
# Generate an intermediatery cert with short lifespan
step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# Create the secret using that intermediate cert
$ kubectl create secret tls \
linkerd-proxy-injector-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# start following the injector log
$ k -n linkerd logs -f -l linkerd.io/control-plane-component=proxy-injector -c proxy-injector
# Inject emojivoto. The pods should be injected normally
$ bin/linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -
# Wait about 5 minutes and delete a pod
$ k -n emojivoto delete po -l app=emoji-svc
# You'll see it won't be injected, and something like "remote error: tls: bad certificate" will appear in the injector logs.
# Regenerate the intermediate cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc
# Delete the secret and recreate it
$ k -n linkerd delete secret linkerd-proxy-injector-k8s-tls
$ kubectl create secret tls \
linkerd-proxy-injector-k8s-tls \
--cert=ca-int.crt \
--key=ca-int.key \
--namespace=linkerd
# Wait a couple of minutes and you'll see some filesystem events in the injector log along with a "Certificate has been updated" entry
# Then delete the pod again and you'll see it gets injected this time
$ k -n emojivoto delete po -l app=emoji-svc
```
* Refactor webhook framework to allow webhook define their flags
Pulled out of `launcher.go` the flag parsing logic and moved it into the `Main` methods of the webhooks (under `controller/cmd/proxy.injector/main.go` and `controller/cmd/sp-validator/main.go`), so that individual webhooks themselves can define the flags they want to use.
Also no longer require that webhooks have cluster-wide access.
Finally, renamed the type `webhook.handlerFunc` to `webhook.Handler` so it can be exported. This will be used in the upcoming jaeger webhook.
Fixes#4191#4993
This bumps Kubernetes client-go to the latest v0.19.2 (We had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5
This consists of the following changes:
- Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD.
- Bump all k8s related dependencies to v0.19.2
- Generate CRD types, client code using the latest k8s.io/code-generator
- Use context.Context as the first argument, in all code paths that touch the k8s client-go interface
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This PR introduces a service mirroring component that is responsible for watching remote clusters and mirroring their services locally.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
The controller Docker image included 7 Go binaries (destination,
heartbeat, identity, proxy-injector, public-api, sp-validator, tap),
each roughly 35MB, with similar dependencies.
Change each controller binary into subcommands of a single `controller`
binary, decreasing the controller Docker image size from 315MB to 38MB.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Have the proxy-injector emit events upon injection/skipping injection
Fixes#3253
Have the proxy-injector emit an event whenever a injection happens, or
when injection is skipped for some reason (also added that reason into
the proxy-injector logs). The level is associated to the parent workload
(it can't be associated to the pod because at this point the pod hasn't
been persisted).
The event recorder was setup at the `webhook/server.go` level and passed
to the proxy-injector's `Inject` function. The sp-validator thus also
has access to the event recorder, but for now it's not using it.
Related changes:
- Refactored `api.GetOwnerKindAndName()` to have it return a more
generic object.
- Refactored `report.Injectable()` to also have it return the reason why
a workload is not injectable.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Update helm charts to include webhooks config and TLS secret
* Update the webhooks to read the secret cert and key
* Update webhooks to not recreate config on restart
* Ensure upgrade preserve existing secrets
* Revert the change to rename the webhook configs
The renaming change breaks upgrade, where the new webhook configs conflict with
the existing ones. The older resources aren't deleted during upgrade because
they are dynamically created.
* Make the secret volume read-only
* Remove unnecessary exported getter functions
* Remove obsolete mwc and vwc templates
Signed-off-by: Ivan Sim <ivan@buoyant.io>
Add validation webhook for service profiles
Fixes#2075
Todo in a follow-up PRs: remove the SP check from the CLI check.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>