Commit Graph

185 Commits

Author SHA1 Message Date
Matei David 407df01ec3
chore(controller): Remove stream concurrency limits (#12598)
Our gRPC servers use the default gRPC server configuration, which
limits the number of concurrent streams to 100. Since the controllers
run with proxies, this imposes a hard scaling limit on the number of
watches an application can have.

This change updates our gRPC server configuration to clear the default
concurrency limit, allowing the server to handle as many streams as
possible.
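
A rough sketch of what clearing the cap looks like with grpc-go (the option value here is an assumption, not necessarily the one used in this change):

```go
import (
	"math"

	"google.golang.org/grpc"
)

// Setting MaxConcurrentStreams to MaxUint32 effectively removes the
// default per-connection limit of 100 concurrent HTTP/2 streams.
func newGRPCServer() *grpc.Server {
	return grpc.NewServer(grpc.MaxConcurrentStreams(math.MaxUint32))
}
```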

Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
2024-05-15 18:07:15 +01:00
Alejandro Pedraza 137eac9df3
Add IPv6 support for the destination controller (#12428)
Services in dual-stack mode result in the creation of two EndpointSlices, one for each IP family. Before this change, the Get Destination API would nondeterministically return the addresses from either of those EndpointSlices, depending on which one was processed last by the controller, because they would overwrite each other.

As part of the ongoing effort to support IPv6/dual-stack networks, this change fixes that behavior giving preference to IPv6 addresses whenever a service exposes both families.
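
A minimal sketch of the preference rule, using the standard EndpointSlice address types (the controller's actual selection happens inside its watchers):

```go
import discoveryv1 "k8s.io/api/discovery/v1"

// preferredAddressType prefers IPv6 whenever both families are exposed.
func preferredAddressType(families []discoveryv1.AddressType) discoveryv1.AddressType {
	for _, f := range families {
		if f == discoveryv1.AddressTypeIPv6 {
			return discoveryv1.AddressTypeIPv6
		}
	}
	return discoveryv1.AddressTypeIPv4
}
```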

There is a new set of unit tests in server_ipv6_test.go, and the TestEndpointTranslatorForPods tests gain a couple of new cases to test the interaction with zone filtering.
Also, the server unit tests were updated to segregate the tests and resources dealing with the IPv4/IPv6/dual-stack cases.
2024-05-02 14:39:05 -05:00
Oliver Gould aef8a02426
feat(destination): Add meshed HTTP/2 keep-alive settings (#12504)
This commit adds destination controller configuration that enables default
keep-alives for meshed HTTP/2 clients.

This is accomplished by encoding the raw protobuf message structure into the
helm values, and then encoding that as JSON in the destination controller's
command-line options. This allows operators to set any supported HTTP/2 client
configuration without having to modify the destination controller.
2024-04-30 19:35:30 +00:00
Hirotaka Tagawa / wafuwafu13 9a5284f453
controller: Stop logging errors on shutdown (#12167)
When a controller is shut down, the admin server fails. This failure is logged
as an error, even when the shutdown was graceful.

This change updates the shutdown behavior to log more appropriately.
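
A minimal sketch of the pattern, assuming a net/http-based admin server and a logrus-style logger:

```go
import (
	"errors"
	"net/http"

	log "github.com/sirupsen/logrus"
)

func serveAdmin(server *http.Server) {
	// http.ErrServerClosed is returned on graceful shutdown and should not
	// be reported as an error.
	err := server.ListenAndServe()
	if err != nil && !errors.Is(err, http.ErrServerClosed) {
		log.Errorf("admin server failed: %s", err)
		return
	}
	log.Info("admin server shut down gracefully")
}
```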

Signed-off-by: wafuwafu13 <jaruwafu@gmail.com>
2024-03-22 09:49:37 -07:00
Oliver Gould 2ab76b64c6
destination: Rename zone weighting flag to ext-endpoint-zone-weights (#12090) 2024-02-16 09:06:56 -05:00
Matei David 9c902dc6b4
Add an endpoints reconciler component for external workloads (#11948)
We introduced an endpoints controller that will be responsible for
managing EndpointSlices for services that select external workloads. As
a follow-up, we introduce the reconciler component of the controller,
which will be responsible for doing the writes and diffing.

Additionally, the controller is wired-up in the destination service's
main routine and will start if endpoint slice support is enabled.

---------

Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-24 16:55:16 +00:00
Zahari Dichev 027d49a9a6
discovery: handle endpoint slices from ExternalWorkload (#11939)
This alters the endpoint slices watcher to handle slices that reference ExternalWorkloads.

Testing
Add the following resources: 

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
addressType: IPv4
metadata:
  name: my-external-workload
  namespace: mixed-env
  labels:
    kubernetes.io/service-name: test-1
endpoints:
- addresses:
  - 172.21.0.5
  conditions:
    ready: true
    serving: true
    terminating: false
  targetRef:
    kind: ExternalWorkload
    name: my-external-workload
ports:
- port: 8080
  name: http
---
apiVersion: workload.linkerd.io/v1alpha1
kind: ExternalWorkload
metadata:
  name: my-external-workload
  namespace: mixed-env
  labels:
    app: test
spec:
  meshTls:
    identity: "test"
    serverName: "test"
  workloadIPs:
  - ip: 172.21.0.5
  ports:
  - port: 8080
    name: http
---
apiVersion: v1
kind: Service
metadata:
  name: test-1
  namespace: mixed-env
spec:
  selector:
    app: test
  type: ClusterIP
  ports:
  - name: http
    port: 8080
    targetPort: 8080
    protocol: TCP

```

Observe endpoints:
```
linkerd dg endpoints test-1.mixed-env.svc.cluster.local:8080
```

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-17 15:43:20 -08:00
Alex Leong a92d17dbe9
Fix error for profile lookups on unmeshed pods with port in default opaque list (#11550)
When we do a `GetProfile` lookup for an unmeshed pod, we set the `weightedAddr.ProtocolHint` to an empty value `&pb.ProtocolHint{}` to indicate that the address is unmeshed and has no protocol hint.  However, when the looked up port is in the default opaque list, we erroneously check if `weightedAddr.ProtocolHint != nil` to determine if we should attempt to get the inbound listen port for that pod.  Since `&pb.ProtocolHint{} != nil`, we attempt to get the inbound listen port for the unmeshed pod.  This results in an error, preventing any valid `GetProfile` responses from being returned.

We update the initialization logic for `weightedAddr.ProtocolHint` to only create a struct when a protocol hint is present and to leave it as `nil` if the pod is unmeshed.
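
A hedged sketch of the corrected initialization (only `pb.ProtocolHint` comes from the description above; the surrounding names are illustrative):

```go
// Only allocate a hint when the pod is meshed, so that a nil check is
// meaningful for unmeshed pods.
var hint *pb.ProtocolHint
if podIsMeshed {
	hint = &pb.ProtocolHint{}
}
weightedAddr.ProtocolHint = hint

// The inbound-listen-port lookup is then gated on the nil check:
if weightedAddr.ProtocolHint != nil {
	// safe: only meshed pods reach this point
}
```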

We add a simple unit test for this behavior as well.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-12-20 13:56:49 -08:00
Oliver Gould 5e558ae3e5
destination: Add optional experimental endpoint weighting (#11795)
This change adds a runtime flag to the destination controller,
--experimental-endpoint-zone-weights=true, that causes endpoints in the
local zone to receive higher weights. This feature is disabled by
default, since the weight value is not honored by proxies. No helm
configuration is exposed yet, either.

This weighting is instrumented in the endpoint translator. Tests are
added to confirm that the behavior is feature-gated.
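
A rough sketch of the gating described above; the weight values are assumptions, not the controller's actual numbers:

```go
// endpointWeight boosts endpoints in the local zone only when the
// experimental flag is enabled.
func endpointWeight(zoneWeightsEnabled bool, endpointZone, localZone string) uint32 {
	const defaultWeight = 10000 // assumed baseline
	if zoneWeightsEnabled && endpointZone != "" && endpointZone == localZone {
		return defaultWeight * 10 // favor local-zone endpoints
	}
	return defaultWeight
}
```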

Additionally, this PR adds the "zone" metric label to endpoint metadata
responses.
2023-12-20 13:11:30 -08:00
Alejandro Pedraza 2d716299a1
Add ability to configure client-go's `QPS` and `Burst` settings (#11644)
* Add ability to configure client-go's `QPS` and `Burst` settings

## Problem and Symptoms

When a very large number of proxies request identity in a short period of time (e.g. during large node scaling events), the identity controller will attempt to validate the tokens sent by the proxies at a rate surpassing client-go's default request rate threshold, triggering client-side throttling. This delays the proxies' initialization and can even fail their startup (after a 2m timeout). The identity controller will surface this through log entries like this:

```
time="2023-11-08T19:50:45Z" level=error msg="error validating token for web.emojivoto.serviceaccount.identity.linkerd.cluster.local: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline"
```

## Solution

Client-go's default `QPS` is 5 and `Burst` is 10. This PR exposes those settings as entries in `values.yaml` with defaults of 100 and 200 respectively. Note this only applies to the identity controller, as it's the only controller performing direct requests to the `kube-apiserver` in a hot path. The other controllers mostly rely on informers, and direct calls are sporadic.
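
A minimal sketch of how such settings map onto client-go (the 100/200 values match the new defaults; the helm plumbing is omitted):

```go
import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func newK8sClient() (kubernetes.Interface, error) {
	config, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	config.QPS = 100   // client-go's default is 5
	config.Burst = 200 // client-go's default is 10
	return kubernetes.NewForConfig(config)
}
```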

## Observability

The `QPS` and `Burst` settings used are exposed both in a log entry as soon as the controller starts, and in the new metric gauges `http_client_qps` and `http_client_burst`.

## Testing

You can use the following K6 script, which simulates 6k calls to the `Certify` service during one minute from emojivoto's web pod. Before running this you need to:

- Put the identity.proto and [all the other proto files](https://github.com/linkerd/linkerd2-proxy-api/tree/v0.11.0/proto) in the same directory.
- Edit the [checkRequest](https://github.com/linkerd/linkerd2/blob/edge-23.11.3/pkg/identity/service.go#L266) function and add logging statements to figure out the `token` and `csr` entries you can use here, which will be shown as soon as a web pod starts.

```javascript
import { Client, Stream } from 'k6/experimental/grpc';
import { sleep } from 'k6';

const client = new Client();
client.load(['.'], 'identity.proto');

// This always holds:
// req_num = (1 / req_duration ) * duration * VUs
// Given req_duration (0.5s) test duration (1m) and the target req_num (6k), we
// can solve for the required VUs:
// VUs = req_num * req_duration / duration
// VUs = 6000 * 0.5 / 60 = 50
export const options = {
  scenarios: {
    identity: {
      executor: 'constant-vus',
      vus: 50,
      duration: '1m',
    },
  },
};

export default () => {
  client.connect('localhost:8080', {
    plaintext: true,
  });

  const stream = new Stream(client, 'io.linkerd.proxy.identity.Identity/Certify');

  // Replace with your own token
  let token = "ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNkluQjBaV1pUZWtaNWQyVm5OMmxmTTBkV2VUTlhWSFpqTmxwSmJYRmtNMWRSVEhwNVNHWllhUzFaZDNNaWZRLmV5SmhkV1FpT2xzaWFXUmxiblJwZEhrdWJEVmtMbWx2SWwwc0ltVjRjQ0k2TVRjd01EWTRPVFk1TUN3aWFXRjBJam94TnpBd05qQXpNamt3TENKcGMzTWlPaUpvZEhSd2N6b3ZMMnQxWW1WeWJtVjBaWE11WkdWbVlYVnNkQzV6ZG1NdVkyeDFjM1JsY2k1c2IyTmhiQ0lzSW10MVltVnlibVYwWlhNdWFXOGlPbnNpYm1GdFpYTndZV05sSWpvaVpXMXZhbWwyYjNSdklpd2ljRzlrSWpwN0ltNWhiV1VpT2lKM1pXSXRPRFUxT1dJNU4yWTNZeTEwYldJNU5TSXNJblZwWkNJNklqaGlZbUV5WWpsbExXTXdOVGN0TkRnMk1TMWhNalZsTFRjelpEY3dOV1EzWmpoaU1TSjlMQ0p6WlhKMmFXTmxZV05qYjNWdWRDSTZleUp1WVcxbElqb2lkMlZpSWl3aWRXbGtJam9pWm1JelpUQXlNRE10TmpZMU55MDBOMk0xTFRoa09EUXRORGt6WXpBM1lXUTJaak0zSW4xOUxDSnVZbVlpT2pFM01EQTJNRE15T1RBc0luTjFZaUk2SW5ONWMzUmxiVHB6WlhKMmFXTmxZV05qYjNWdWREcGxiVzlxYVhadmRHODZkMlZpSW4wLnlwMzAzZVZkeHhpamxBOG1wVjFObGZKUDB3SC03RmpUQl9PcWJ3NTNPeGU1cnNTcDNNNk96VWR6OFdhYS1hcjNkVVhQR2x2QXRDRVU2RjJUN1lKUFoxVmxxOUFZZTNvV2YwOXUzOWRodUU1ZDhEX21JUl9rWDUxY193am9UcVlORHA5ZzZ4ZFJNcW9reGg3NE9GNXFjaEFhRGtENUJNZVZ6a25kUWZtVVZwME5BdTdDMTZ3UFZWSlFmNlVXRGtnYkI1SW9UQXpxSmcyWlpyNXBBY3F5enJ0WE1rRkhSWmdvYUxVam5sN1FwX0ljWm8yYzJWWk03T2QzRjIwcFZaVzJvejlOdGt3THZoSEhSMkc5WlNJQ3RHRjdhTkYwNVR5ZC1UeU1BVnZMYnM0ZFl1clRYaHNORjhQMVk4RmFuNjE4d0x6ZUVMOUkzS1BJLUctUXRUNHhWdw==";
  // Replace with your own CSR
  let csr = "MIIBWjCCAQECAQAwRjFEMEIGA1UEAxM7d2ViLmVtb2ppdm90by5zZXJ2aWNlYWNjb3VudC5pZGVudGl0eS5saW5rZXJkLmNsdXN0ZXIubG9jYWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAATKjgVXu6F+WCda3Bbq2ue6m3z6OTMfQ4Vnmekmvirip/XGyi2HbzRzjARnIzGlG8wo4EfeYBtd2MBCb50kP8F8oFkwVwYJKoZIhvcNAQkOMUowSDBGBgNVHREEPzA9gjt3ZWIuZW1vaml2b3RvLnNlcnZpY2VhY2NvdW50LmlkZW50aXR5LmxpbmtlcmQuY2x1c3Rlci5sb2NhbDAKBggqhkjOPQQDAgNHADBEAiAM7aXY8MRs/EOhtPo4+PRHuiNOV+nsmNDv5lvtJt8T+QIgFP5JAq0iq7M6ShRNkRG99ZquJ3L3TtLWMNVTPvqvvUE=";

  const data = {
    identity:                     "web.emojivoto.serviceaccount.identity.linkerd.cluster.local",
    token:                        token,
    certificate_signing_request:  csr,
  };
  stream.write(data);

  // This request takes around 2ms, so this sleep will mostly determine its final duration
  sleep(0.5);
};
```

This results in the following report:

```
scenarios: (100.00%) 1 scenario, 50 max VUs, 1m30s max duration (incl. graceful stop):
           * identity: 50 looping VUs for 1m0s (gracefulStop: 30s)

     data_received................: 6.3 MB 104 kB/s
     data_sent....................: 9.4 MB 156 kB/s
     grpc_req_duration............: avg=2.14ms   min=873.93µs med=1.9ms    max=12.89ms  p(90)=3.13ms   p(95)=3.86ms
     grpc_streams.................: 6000   99.355331/s
     grpc_streams_msgs_received...: 6000   99.355331/s
     grpc_streams_msgs_sent.......: 6000   99.355331/s
     iteration_duration...........: avg=503.16ms min=500.8ms  med=502.64ms max=532.36ms p(90)=504.05ms p(95)=505.72ms
     iterations...................: 6000   99.355331/s
     vus..........................: 50     min=50      max=50
     vus_max......................: 50     min=50      max=50

running (1m00.4s), 00/50 VUs, 6000 complete and 0 interrupted iterations
```

With the old defaults (QPS=5 and Burst=10), the latencies would be much higher and number of complete requests much lower.
2023-11-28 15:25:05 -05:00
Alex Leong 9f210a02f9
Destination controller should initialize jobs metadata api (#11541)
The destination controller uses the metadata API to retrieve Job metadata, but does not properly initialize the metadata API to sync jobs. This results in error messages in the log and a fallback to using direct API calls instead of the informer cache.

We properly initialize the jobs informer so that this works as expected.
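
A hedged sketch of registering and syncing a Job metadata informer with client-go's metadata informer factory (the linkerd-specific wiring is omitted):

```go
import (
	"context"
	"time"

	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/client-go/metadata"
	"k8s.io/client-go/metadata/metadatainformer"
	"k8s.io/client-go/tools/cache"
)

func syncJobMetadata(ctx context.Context, client metadata.Interface) bool {
	factory := metadatainformer.NewSharedInformerFactory(client, 10*time.Minute)
	jobs := factory.ForResource(batchv1.SchemeGroupVersion.WithResource("jobs")).Informer()
	factory.Start(ctx.Done())
	// Without a registered, synced informer, lookups fall back to direct
	// API calls, which is the behavior this change avoids.
	return cache.WaitForCacheSync(ctx.Done(), jobs.HasSynced)
}
```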

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-10-27 16:18:33 +01:00
Alex Leong 368b63866d
Add support for remote discovery (#11224)
Adds support for remote discovery to the destination controller.

When the destination controller gets a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` label, this is an indication that the destination controller should discover the endpoints for this service from a remote cluster.  The destination controller will look for a remote cluster which has been linked to it (using the `linkerd multicluster link` command) with that name.  It will look at the `multicluster.linkerd.io/remote-discovery` label for the service name to look up in that cluster.  It then streams back the endpoint data for that remote service.

Since we now have multiple client-go informers for the same resource types (one for the local cluster and one for each linked remote cluster) we add a `cluster` label onto the prometheus metrics for the informers and EndpointWatchers to ensure that each of these components' metrics are correctly tracked and don't overwrite each other.

---------

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-08-11 09:31:45 -07:00
Alejandro Pedraza 4a84f2cb32
Implement the k8s metadata API in the Destination controller (#10326)
Fixes #9986

After reviewing the k8s API calls in Destination, it was concluded we
could only swap out the calls to the Node and RS resources to use the
metadata API, as all the other resources (Endpoints, EndpointSlices,
Services, Pod, ServiceProfiles, Server) required fields other than those
found in their metadata section.

This also required completing the `NewFakeAPI` implementation by adding
the missing annotations and labels entries.

## Testing Memory Consumption

The gains here aren't as big as in #9650. In order to test this we need
to push hard and create 4000 RS:

``` bash
for i in {0..4000}; do kubectl create deployment test-pod-$i --image=nginx; done
```

In edge-23.2.1 the destination pod's memory consumption goes from 40Mi
to 160Mi after all the RS were created. With this change, it went from
37Mi to 140Mi.
2023-02-13 17:30:07 -05:00
Alejandro Pedraza aaca63be1f
Remove unneeded `namespaces` access in Destination (#10324)
(This came out during the k8s api calls review for #9650)

In 5dc662ae9 inheritance of opaque ports annotation from namespaces to
pods was removed, but we didn't remove the associated RBAC.
2023-02-13 16:13:37 -05:00
Alex Leong b0778bb2ea
Readiness checks fail until caches are synced (#10166)
Fixes https://github.com/linkerd/linkerd2/issues/10036

The Linkerd control plane components written in Go serve liveness and readiness probe endpoints on their admin server.  However, the admin server is not started until k8s informer caches are synced, which can take a long time on large clusters.  This means that liveness checks can time out, causing the controller to be restarted.

We start the admin server before attempting to sync caches so that we can respond to liveness checks immediately.  We fail readiness probes until the caches are synced.
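
A minimal sketch of the ordering, assuming the standard library HTTP server and illustrative endpoint paths:

```go
import (
	"net/http"
	"sync/atomic"
)

func runAdmin(ready *atomic.Bool) {
	mux := http.NewServeMux()
	// Liveness succeeds as soon as the server is listening.
	mux.HandleFunc("/live", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness fails until the informer caches have synced.
	mux.HandleFunc("/ready", func(w http.ResponseWriter, _ *http.Request) {
		if !ready.Load() {
			http.Error(w, "caches not synced", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	go http.ListenAndServe(":9990", mux)
	// ...elsewhere: wait for cache sync, then ready.Store(true)
}
```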

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-01-25 11:43:09 -08:00
Alejandro Pedraza e6fa5a7156
Replace usage of io/ioutil package (#9613)
`io/ioutil` has been deprecated since go 1.16 and the linter started to
complain about it.
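
The typical mechanical replacements look like this (a sketch; actual call sites vary):

```go
import "os"

func rewriteExample(path string, data []byte) error {
	// os.ReadFile and os.WriteFile replace their io/ioutil counterparts,
	// which have been deprecated since Go 1.16.
	if _, err := os.ReadFile(path); err != nil { // was ioutil.ReadFile
		return err
	}
	return os.WriteFile(path, data, 0o644) // was ioutil.WriteFile
}
```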
2022-10-13 12:10:58 -05:00
Oliver Gould fbe92fab40
heartbeat: Include the CPU architecture in reports (#9589)
It would be useful to know how prevalent 32-bit ARM deployments are so
that we can determine whether it makes sense to continue supporting this
platform. This change adds an `arch` field to heartbeats indicating the
CPU architecture of the heartbeat container.
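
A minimal sketch, assuming the heartbeat reports its values as URL query parameters (the exact wiring is linkerd-specific):

```go
import (
	"net/url"
	"runtime"
)

func heartbeatValues() url.Values {
	v := url.Values{}
	// runtime.GOARCH reports the CPU architecture the binary was built
	// for, e.g. "amd64", "arm64", or "arm".
	v.Set("arch", runtime.GOARCH)
	return v
}
```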

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-10-10 12:31:05 -07:00
Oliver Gould b9ecbcb521
Remove needless RBAC on the identity controller (#9368)
The identity controller requires access to read all deployments. This
isn't necessary.

When these permissions were added in #3600, we incorrectly assumed that
we must pass a whole Deployment resource as a _parent_ when recording
events. The [EventRecorder docs] say:

> 'object' is the object this event is about. Event will make a
> reference--or you may also pass a reference to the object directly.

We can confirm this by reviewing the source for [GetReference]: we can
simply construct an ObjectReference without fetching it from the API.

This change lets us drop unnecessary privileges in the identity
controller.

[EventRecorder docs]: https://pkg.go.dev/k8s.io/client-go/tools/record#EventRecorder
[GetReference]: ab826d2728/tools/reference/ref.go (L38-L45)
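
A sketch of recording an event against a locally constructed ObjectReference, without fetching the Deployment from the API (the names here are illustrative):

```go
import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

func recordIssued(recorder record.EventRecorder, ns, name string) {
	// The reference is built directly; no read access to Deployments is needed.
	ref := &corev1.ObjectReference{
		Kind:      "Deployment",
		Namespace: ns,
		Name:      name,
	}
	recorder.Event(ref, corev1.EventTypeNormal, "IssuedLeafCertificate", "issued leaf certificate")
}
```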

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-09-13 12:36:14 -07:00
Kevin Leimkuhler 388f14f48f
allow pprof to be configurable via helm flags (#8090)
Follow-up to #8087 that allows pprof to be enabled via the `--set
enablePprof=true` flag.

Each control plane component spawns its own admin server, so each of these
receives its own `enable-pprof` flag. When `enablePprof=true`, it is passed
through to each component so that when it launches its admin server, its pprof
endpoints are enabled.

A note on the templating: `-enable-pprof={{.Values.enablePprof | default
false}}`. `false` values are not rendered by Helm so without the `... | default
false}}`, it tries to pass the flag as `-enable-pprof=""` which results in an
error. Inlining this felt better than conditionally passing the flag with

```yaml
{{ if .Values.enablePprof -}}
-enable-pprof={{.Values.enablePprof}}
{{ end -}}
```

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-03-22 14:31:04 -06:00
Kevin Leimkuhler 67bcd8f642
Add `gosec` and `errcheck` lints (#7954)
Closes #7826

This adds the `gosec` and `errcheck` lints to the `golangci` configuration. Most significant lints have been fixed by individual changes, but this enables them by default so that all future changes are caught ahead of time.

A significant number of these lints have been excluded by the various `exclude-rules` rules added to `.golangci.yml`. These include operations on files that generally do not fail, such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up functions via the `defer` keyword.

Aside from those, there are several other rules added that all have comments explaining why it's okay to ignore the errors that they cover.

Finally, several smaller fixes in the code have been made where it seems necessary to catch errors or at least log them.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-03-03 10:09:51 -07:00
Oliver Gould 425a43def5
Enable gocritic linting (#7906)
[gocritic][gc] helps to enforce some consistency and check for potential
errors. This change applies linting changes and enables gocritic via
golangci-lint.

[gc]: https://github.com/go-critic/go-critic

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-17 22:45:25 +00:00
Alejandro Pedraza 68b63269d9
Remove the `proxy.disableIdentity` config (#7729)
* Remove the `proxy.disableIdentity` config

Fixes #7724

Also:
- Removed the `linkerd.io/identity-mode` annotation.
- Removed the `config.linkerd.io/disable-identity` annotation.
- Removed the `linkerd.proxy.validation` template partial, which only
  made sense when `proxy.disableIdentity` was `true`.
- TestInjectManualParams now requires hitting the cluster to retrieve the
  trust root.
2022-01-31 10:17:10 -05:00
Alejandro Pedraza f9f3ebefa9
Remove namespace from charts and split them into `linkerd-crd` and `linkerd-control-plane` (#6635)
Fixes #6584 #6620 #7405

# Namespace Removal

With this change, the `namespace.yaml` template is rendered only for CLI installs and not Helm, and likewise the `namespace:` entry in the namespace-level objects (using a new `partials.namespace` helper).

The `installNamespace` and `namespace` entries in `values.yaml` have been removed.

Then, in the templates where the namespace is required, we moved from `.Values.namespace` to `.Release.Namespace`, which is filled in automatically by Helm. For the CLI, `install.go` now explicitly defines the contents of the `Release` map alongside `Values`.

The proxy-injector has a new `linkerd-namespace` argument given the namespace is no longer persisted in the `linkerd-config` ConfigMap, so it has to be passed in. To pass it further down to `injector.Inject()` without modifying the `Handler` signature, a closure was used.

------------
Update: Merged-in #6638: Similar changes for the `linkerd-viz` chart:

Stop rendering `namespace.yaml` in the `linkerd-viz` chart.

The additional change here is the addition of the `namespace-metadata.yaml` template (and its RBAC), _not_ rendered in CLI installs, which is a Helm `post-install` hook, consisting of a Job that executes a script adding the required annotations and labels to the viz namespace using a PATCH request against kube-api. The script first checks whether the namespace lacks annotations/labels entries, in which case it has to add extra ops to that patch.

---------
Update: Merged-in the approved #6643, #6665 and #6669 which address the `linkerd2-cni`, `linkerd-multicluster` and `linkerd-jaeger` charts. 

Additional changes from what's already mentioned above:
- Removes the install-namespace option from `linkerd install-cni`, which isn't found in `linkerd install` or `linkerd viz install` anyway, and would add some complexity to support.
- Added a dependency on the `partials` chart to the `linkerd-multicluster-link` chart, so that we can tap on the `partials.namespace` helper.
- We no longer have the restriction that the multicluster objects live in a separate namespace from linkerd. It's still good practice, and that's the default for the CLI install, but I removed that validation.


Finally, as a side-effect, the `linkerd mc allow` subcommand was fixed; it has been broken for a while apparently:

```console
$ linkerd mc allow --service-account-name foobar
Error: template: linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml:16:7: executing "linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml" at <include "partials.annotations.created-by" $>: error calling include: template: no template "partials.annotations.created-by" associated with template "gotpl"
```
---------
Update: see helm/helm#5465 describing the current best-practice

# Core Helm Charts Split

This removes the `linkerd2` chart, and replaces it with the `linkerd-crds` and `linkerd-control-plane` charts. Note that the viz and other extension charts are not concerned by this change.

Also note the original `values.yaml` file has been split into both charts accordingly.

### UX

```console
$ helm install linkerd-crds --namespace linkerd --create-namespace linkerd/linkerd-crds
...
# certs.yaml should contain identityTrustAnchorsPEM and the identity issuer values
$ helm install linkerd-control-plane --namespace linkerd -f certs.yaml linkerd/linkerd-control-plane
```

### Upgrade

As explained in #6635, this is a breaking change. Users will have to uninstall the `linkerd2` chart and install these two, and eventually roll out the proxies (they should continue to work during the transition anyway).

### CLI

The CLI install/upgrade code was updated to be able to pick the templates from these new charts, but the CLI UX remains identical as before.

### Other changes

- The `linkerd-crds` and `linkerd-control-plane` charts now carry a version scheme independent of linkerd's own versioning, as explained in #7405.
- These charts are Helm v3, which is reflected in the `Chart.yaml` entries and in the removal of the `requirements.yaml` files.
- In the integration tests, replaced the `helm-chart` arg with `helm-charts` containing the path `./charts`, used to build the paths for both charts.

### Followups

- Now it's possible to add a `ServiceProfile` instance for Destination in the `linkerd-control-plane` chart.
2021-12-10 15:53:08 -05:00
Tarun Pothulapati 4170b49b33
smi: remove default functionality in linkerd (#7334)
Now that SMI functionality is being fully moved into the
[linkerd-smi](www.github.com/linkerd/linkerd-smi) extension, we can
stop supporting its functionality by default.

This means that the `destination` component will stop reacting
to `TrafficSplit` objects. When `linkerd-smi` is installed,
it converts `TrafficSplit` objects to `ServiceProfiles`
that the destination component can understand and react to accordingly.

Also, whenever a `ServiceProfile` with traffic splitting is associated
with a service, the same information (i.e. splits and weights) is also
surfaced through the `UI` (in the new `services` tab) and the `viz` CLI,
so we are not really losing any UI functionality here.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-12-03 12:07:30 +05:30
Alex Leong 18bfd0bd9e
default enableEndpointSlices to true (#7216)
EndpointSlices are enabled by default in our Kubernetes minimum version of 1.20.  Thus we can change the default behavior of the destination controller to use EndpointSlices instead of Endpoints.  This unblocks any functionality which is specific to EndpointSlices such as topology aware hints.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-12-01 15:56:44 -08:00
Kevin Leimkuhler 01cbe616f1
Honor Server `proxyProtocol` in destination service `Get` with policy CRD APIs (#7184)
This change ensures that if a Server exists with `proxyProtocol: opaque` that selects an endpoint backed by a pod, that destination requests for that pod reflect the fact that it handles opaque traffic.

Currently, the only way that opaque traffic is honored in the destination service is if the pod has the `config.linkerd.io/opaque-ports` annotation. With the introduction of Servers though, users can set `server.Spec.ProxyProtocol: opaque` to indicate that if a Server selects a pod, then traffic to that pod's `server.Spec.Port` should be opaque. Currently, the destination service does not take this into account.

There is an existing change up that _also_ adds this functionality; it takes a different approach by creating a policy server client for each endpoint that a destination has. For `Get` requests on a service, the number of clients scales with the number of endpoints that back that service.

This change fixes that issue by instead creating a Server watch in the endpoint watcher and sending updates through to the endpoint translator.

The two primary scenarios to consider are

### A `Get` request for some service is streaming when a Server is created/updated/deleted
When a Server is created or updated, the endpoint watcher iterates through its endpoint watches (`servicePublisher` -> `portPublisher`) and if it selects any of those endpoints, the port publisher sends an update if the Server has marked that port as opaque.

When a Server is deleted, the endpoint watcher once again iterates through its endpoint watches and deletes the address set's `OpaquePodPorts` field—ensuring that updates have been cleared of Server overrides.

### A `Get` request for some service happens after a Server is created
When a `Get` request occurs (or new endpoints are added—they both take the same path), we must check if any of those endpoints are selected by some existing Server. If so, we have to take that into account when creating the address set.

This part of the change gives me a little concern as we first must get all the Servers on the cluster and then create a set of _all_ the pod-backed endpoints that they select in order to determine if any of these _new_ endpoints are selected.

## Testing
Right now this can be tested by starting up the destination service locally and running `Get` requests on a service that has endpoints selected by a Server

**app.yaml**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
  labels:
    app: pod
spec:
  containers:
  - name: app
    image: nginx
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: svc
spec:
  selector:
    app: pod
  ports:
  - name: http
    port: 80
---
apiVersion: policy.linkerd.io/v1alpha1
kind: Server
metadata:
  name: srv
  labels:
    policy: srv
spec:
  podSelector:
    matchLabels:
      app: pod
  port: 80
  proxyProtocol: HTTP/1
```

```bash
$ go run controller/script/destination-client/main.go -path svc.default.svc.cluster.local:80
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-11-23 20:35:53 -07:00
Alejandro Pedraza e4a7c3b9b6
Log Warn instead of Err when prom not found in heartbeat (#7029)
Fixes #7013
2021-10-06 09:46:10 -05:00
Stepan Rabotkin 5e6a1b5508
Graceful shutdown for admin server (#6817)
* Graceful shutdown for admin server

Signed-off-by: Stepan Rabotkin <epicstyt@gmail.com>
2021-09-07 10:50:31 -05:00
Alex Leong ca1077bb08
Read trust roots from configmap (#6455)
Fixes #6452 

We add a `linkerd-identity-trust-roots` ConfigMap which contains the configured trust root bundle.  The proxy template partial is modified so that core control plane components load this bundle from the configmap through the downward API.

The identity controller is updated to mount this new configmap as a volume and read the trust root bundle at startup.

Similarly, the proxy-injector also mounts this new configmap.  For each pod it injects, it reads the trust root bundle file and sets it on the injected pod.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-07-28 13:23:15 -07:00
Alex Leong 9a1468328c
Emit event when issuing leaf certificate (#6364)
We emit a Kubernetes event from the identity controller when successfully issuing a leaf certificate.  The events include the identity, expiry, and a hash of the certificate.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-06-25 11:19:16 -07:00
Tarun Pothulapati fac28ff8a7
destination: Remove support for IP Queries in `Get` API (#6018)
* destination: Remove support for IP Queries in `Get` API

Fixes #5246

This PR updates the destination to report an error when `Get`
is called for IP queries. As the issue mentions, the proxies
are not using this API anymore, and this helps to simplify and
remove unnecessary logic.

This removes the relevant `IPWatcher` logic, along with
unit tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-04-21 12:40:40 +05:30
Alejandro Pedraza 6980e45e1d
Remove the `linkerd-controller` pod (#6039)
* Remove the `linkerd-controller` pod

Now that we got rid of the `Version` API (#6000) and the destination API forwarding business in `linkerd-controller` (#5993), we can get rid of the `linkerd-controller` pod.

## Removals

- Deleted everything under `/controller/api/public` and `/controller/cmd/public-api`.
- Moved `/controller/api/public/test_helper.go` to `/controller/api/destination/test_helper.go` because those are really utils for destination testing. I also extracted from there the prometheus mock structs and put that under `/pkg/prometheus/test_helper.go`, which is now used by both the `linkerd diagnostics endpoints` and the `metrics-api` tests, removing some duplication.
- Deleted the `controller.yaml` and `controller-rbac.yaml` helm templates along with the `publicAPIResources` and `publicAPIProxyResources` helm values.

## Health checks

- Removed the `can initialize the client` check given such client is no longer needed. The `linkerd-api` section was left with only the check `control pods are ready`, so I moved that under the `linkerd-existence` section and got rid of the `linkerd-api` section altogether.
- In that same `linkerd-existence` section, got rid of the `controller pod is running` check.

## Other changes

- Fixed the Control Plane section of the dashboard, taking into account the disappearance of `linkerd-controller` and, previously, of `linkerd-sp-validator`.
2021-04-19 09:57:45 -05:00
Alejandro Pedraza b22b5849e3
Removed Destination's `Get` API from the public-api (#5993)
* Removed Destination's `Get` API from the public-api

This is the first step towards removing the `linkerd-controller` pod. It deals with removing the Destination `Get` http and gRPC endpoint it exposes, that only the `linkerd diagnostics endpoints` is consuming.

Removed all references to Destination in the public-api, including all the gRPC-to-http-to-gRPC forwardings:
- Removed the `Get` method from the public-api gRPC server that forwarded the connection from the controller pod to the destination pod. Clients should now connect directly to the destination service via gRPC.
- Likewise, removed the destination boilerplate in the public-api http server (and its `main.go`) that served the `Get` destination endpoint and forwarded it into the gRPC server.
- Finally, removed the destination boilerplate from the public-api's `client.go` that created a client connecting to the http API.
2021-04-16 09:04:17 -05:00
Tarun Pothulapati 5c1a375a51
destination: pass opaque-ports through cmd flag (#5829)
* destination: pass opaque-ports through cmd flag

Fixes #5817

Currently, default opaque ports are stored in two places, i.e.
`Values.yaml` and also at `opaqueports/defaults.go`. As these
ports are used only in destination, we can instead pass these
values as a cmd flag for the destination component from Values.yaml
and remove defaultPorts in `defaults.go`.

This means that if users override `Values.yaml`'s opaquePorts
field, that change is propagated both for injection and also
discovery, as expected.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-03-01 16:00:20 +05:30
Max Goltzsche a8e5bff21f
Provide CA cert as env var to identity controller. (#5690)
Currently the identity controller is the only component that receives the CA certificate / trust anchors as option `-identity-trust-anchors-pem` instead of an env var.
This prevents letting it read the trust anchors from a Secret that is managed by e.g. cert-manager.

This PR uses an env var instead of the option to provide the trust anchors. For most helm chart users this doesn't change anything. However using kustomize the helm output manifest can now be adjusted (again) so that the certificate is loaded from a ConfigMap or Secret like in [this example](https://github.com/mgoltzsche/khelm/tree/master/example/kpt/linkerd) which aims to produce a static manifest to make the installation/update more declarative and support GitOps workflows.

This PR does not provide chart options/values to specify Secrets upfront - it would introduce dependencies to other operators.

Relates to #3843, see https://github.com/linkerd/linkerd2/issues/3843#issuecomment-775516217

Fixes #3321

Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
2021-02-22 10:30:43 -05:00
Filip Petkovski 73f9fb3518
Use a shared informer when getting node topology (#5722)
Getting information about node topology queries the k8s api directly.
In an environment with high traffic and high number of pods, the
k8s api server can become overwhelmed or start throttling requests.

This MR introduces a node informer to resolve the bottleneck and
fetch node information asynchronously.
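
A minimal sketch of the informer-based lookup (the zone label key is the standard Kubernetes topology label):

```go
import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func nodeZone(client kubernetes.Interface, nodeName string, stop <-chan struct{}) (string, error) {
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	nodes := factory.Core().V1().Nodes()
	factory.Start(stop)
	cache.WaitForCacheSync(stop, nodes.Informer().HasSynced)

	// Reads now hit the informer cache rather than the API server.
	node, err := nodes.Lister().Get(nodeName)
	if err != nil {
		return "", err
	}
	return node.Labels["topology.kubernetes.io/zone"], nil
}
```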

Fixes #5684

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-02-12 11:05:38 -05:00
Kevin Leimkuhler 75fcc9d623
Move tap from core into Viz extension (#5651)
Closes #5545.

This change moves all tap and tap-injector code into the viz directory. 

The tap and tap-injector components now also use a new tap image—separating
these components from the controller image that they are currently part of. This
means the controller image has removed all its build dependencies related to
tap.

Finally, the tap Protobuf has been separated from the metrics-api and moved into
its own `.proto` file and gen directory. This introduces a clear split between
metrics-api and tap Protobuf.

There is no change in behavior for the `viz tap` command.

### Reviewing

#### Docker images

All the bin directory scripts should be updated to build and load the tap image.
All the CI workflows should be updated to build and push the tap image.

#### Controller and pkg directories

This is primarily deletions. Most of the deleted code in this directory is now
in the tap directory of the Viz extension.

#### viz/tap

This is the location that all the tap related code now lives in. New files are
mostly moved from the controller and pkg directories. Imports have all been
updated to point at the right locations and Protobuf.

The Protobuf here is taken from metrics-api and contains all tap-related
Protobuf.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-09 12:43:21 -05:00
Alejandro Pedraza 8ac5360041
Extract from public-api all the Prometheus dependencies, and moves things into a new viz component 'linkerd-metrics-api' (#5560)
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it remains being used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.

* Added chart templates for new viz linkerd-metrics-api pod

* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.

* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).

* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.

* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.

* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.

* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
  - Updated `endpoints.go` according to new API interface name.
  - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
  - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
  - Added `metrics-api` to list of docker images to build in actions workflows.
  - In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).

* Add retry to 'tap API service is running' check

* mc check shouldn't err when viz is not available. Also properly set the logger in multicluster/cmd/root.go so that it displays messages when --verbose is used
2021-01-21 18:26:38 -05:00
Kevin Leimkuhler e7f2a3fba3
viz: add tap-injector (#5540)
## What this changes

This adds a tap-injector component to the `linkerd-viz` extension which is
responsible for adding the tap service name environment variable to the Linkerd
proxy container.

If a pod does not have a Linkerd proxy, no action is taken. If tap is disabled
via annotation on the pod or the namespace, no action is taken.

This also removes the environment variable that explicitly disabled tap. Tap
status for a proxy is now determined only by the presence or absence of the tap
service name environment variable.

Closes #5326

## How it changes

### tap-injector

The tap-injector component determines if `LINKERD2_PROXY_TAP_SVC_NAME` should be
added to a pod's Linkerd proxy container environment. If the pod satisfies the
following, it is added:

- The pod has a Linkerd proxy container
- The pod has not already been mutated
- Tap is not disabled via annotation on the pod or the pod's namespace

### LINKERD2_PROXY_TAP_DISABLED

Now that tap is an extension of Linkerd and not a core component, it no longer
makes sense to explicitly enable or disable tap through this Linkerd proxy
environment variable. The status of tap is now determined only by whether the
tap-injector adds the `LINKERD2_PROXY_TAP_SVC_NAME` environment variable.

### controller image

The tap-injector has been added to the controller image's set of startup
commands, which determine what the controller will do in the cluster.

As a follow-up, I think splitting out the `tap` and `tap-injector` commands from
the controller image into a linkerd-viz image (or something like that) makes
sense.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-21 11:24:08 -05:00
Kevin Leimkuhler 7c0843a823
Add opaque ports to destination service updates (#5294)
## Summary

This changes the destination service to start indicating whether a profile is an
opaque protocol or not.

Currently, profiles returned by the destination service are built by chaining
together updates coming from watching Profile and Traffic Split updates.

With this change, we now also watch updates to Opaque Port annotations on pods
and namespaces; if an update occurs this is now included in building a profile
update and is sent to the client.

## Details

Watching updates to Profiles and Traffic Splits is straightforward--we watch
those resources and if an update occurs on one associated to a service we care
about then the update is passed through.

For Opaque Ports this is a little different because it is an annotation on pods
or namespaces. To account for this, we watch the endpoints that we should care
about.

### When host is a Pod IP

When getting the profile for a Pod IP, we check for the opaque ports annotation
on the pod and the pod's namespace. If one is found, we'll indicate if the
profile is an opaque protocol if the requested port is in the annotation.

We do not subscribe for updates to this pod IP. The only update we really care
about is if the pod is deleted and this is already handled by the proxy.
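
A hedged sketch of that check (the real code uses linkerd's port-parsing helpers and also handles port ranges):

```go
import (
	"strconv"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

func isOpaque(pod *corev1.Pod, ns *corev1.Namespace, port uint32) bool {
	// The pod annotation takes precedence; fall back to the namespace.
	ann, ok := pod.Annotations["config.linkerd.io/opaque-ports"]
	if !ok {
		ann = ns.Annotations["config.linkerd.io/opaque-ports"]
	}
	want := strconv.FormatUint(uint64(port), 10)
	for _, p := range strings.Split(ann, ",") {
		if strings.TrimSpace(p) == want {
			return true
		}
	}
	return false
}
```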

### When host is a Service

When getting the profile for a Service, we subscribe for updates to the
endpoints of that service. For any ports set in the opaque ports annotation on
any of the pods, we check if the requested port is present.

Since the endpoints for a service can be added and removed, we do subscribe for
updates to the endpoints of the service.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-18 12:38:59 -05:00
Alejandro Pedraza 578d4a19e9
Have the tap APIServer refresh its cert automatically (#5388)
Followup to #5282, fixes #5272 in its totality.

This follows the same pattern as the injector/sp-validator webhooks, leveraging `FsCredsWatcher` to watch for changes in the cert files.

To reuse code from the webhooks, we moved `updateCert()` to `creds_watcher.go`, and `run()` as well (which now is called `ProcessEvents()`).

The `TestNewAPIServer` test in `apiserver_test.go` was removed as it really was just testing two things: (1) that `apiServerAuth` doesn't error, which is already covered in the following test, and (2) that the golib call `net.Listen("tcp", addr)` doesn't error, which we're not interested in testing here.

## How to test

To test that the injector/sp-validator functionality is still correct, you can refer to #5282

The steps below are similar, but focused towards the tap component:

```bash
# Create some root cert
$ step certificate create linkerd-tap.linkerd.svc ca.crt ca.key   --profile root-ca --no-password --insecure

# configure tap's caBundle to be that root cert
$ cat > linkerd-overrides.yml << EOF
tap:
  externalSecret: true
  caBundle: |
    < ca.crt contents>
EOF

# Install linkerd
$ bin/linkerd install --config linkerd-overrides.yml | k apply -f -

# Generate an intermediary cert with a short lifespan
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc

# Create the secret using that intermediate cert
$ kubectl create secret tls \
  linkerd-tap-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# Rollout the tap pod for it to pick the new secret
$ k -n linkerd rollout restart deploy/linkerd-tap

# Tap should work
$ bin/linkerd tap -n linkerd deploy/linkerd-web
req id=0:0 proxy=in  src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :method=GET :authority=10.42.0.11:9994 :path=/metrics
rsp id=0:0 proxy=in  src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :status=200 latency=1779µs
end id=0:0 proxy=in  src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true duration=65µs response-length=1709B

# Wait 5 minutes and rollout tap again
$ k -n linkerd rollout restart deploy/linkerd-tap

# You'll see in the logs that the cert expired:
$ k -n linkerd logs -f deploy/linkerd-tap tap
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45866: remote error: tls: bad certificate
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45870: remote error: tls: bad certificate

# Recreate the secret
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc
$ k -n linkerd delete secret linkerd-tap-k8s-tls
$ kubectl create secret tls \
  linkerd-tap-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# Wait a few moments and you'll see the certs got reloaded and tap is working again
time="2020-12-15T16:03:42Z" level=info msg="Updated certificate" addr=":8089" component=apiserver
```
2020-12-16 17:46:14 -05:00
Tarun Pothulapati 72a0ca974d
extension: Separate multicluster chart and binary (#5293)
Fixes #5257

This branch moves the mc charts and CLI-level code to a new
top-level directory. None of the logic is changed.

Also, moves some common types into `/pkg` so that they
are accessible both to the main cli and extensions.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-12-04 16:36:10 -08:00
Alejandro Pedraza 4c634a3816
Have webhooks refresh their certs automatically (#5282)
* Have webhooks refresh their certs automatically

Fixes partially #5272

In 2.9 we introduced the ability to provide the certs for `proxy-injector` and `sp-validator` through some external means like cert-manager, through the new helm setting `externalSecret`.
We forgot however to have those services watch for changes in their secrets, so whenever they were rotated they would fail with a cert error, the only workaround being to restart those pods to pick up the new secrets.

This addresses that by first abstracting out `FsCredsWatcher` from the identity controller, which now lives under `pkg/tls`.

The webhook's logic in `launcher.go` no longer reads the certs before starting the https server; that moved into `server.go`, which, in a similar way to identity, will receive events from `FsCredsWatcher` and update `Server.cert`. We're leveraging `http.Server.TLSConfig.GetCertificate`, which allows us to provide a function that will return the current cert for every incoming request.
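
A simplified sketch of that hook (linkerd's actual wiring goes through `FsCredsWatcher`; the holder type here is illustrative):

```go
import (
	"crypto/tls"
	"net/http"
	"sync"
)

type certHolder struct {
	mu   sync.RWMutex
	cert *tls.Certificate
}

// Set is called from the filesystem-event handler when the cert rotates.
func (c *certHolder) Set(cert *tls.Certificate) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cert = cert
}

func newTLSServer(addr string, c *certHolder) *http.Server {
	return &http.Server{
		Addr: addr,
		TLSConfig: &tls.Config{
			// Every handshake asks for the current certificate, so a
			// rotated cert takes effect without restarting the server.
			GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
				c.mu.RLock()
				defer c.mu.RUnlock()
				return c.cert, nil
			},
		},
	}
}
```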

### How to test

```bash
# Create some root cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca.crt ca.key \
  --profile root-ca --no-password --insecure --san linkerd-proxy-injector.linkerd.svc

# configure injector's caBundle to be that root cert
$ cat > linkerd-overrides.yaml << EOF
proxyInjector:
  externalSecret: true
  caBundle: |
    < ca.crt contents>
EOF

# Install linkerd. The injector won't start until we create the secret below
$ bin/linkerd install --controller-log-level debug --config linkerd-overrides.yaml | k apply -f -

# Generate an intermediary cert with a short lifespan
$ step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc

# Create the secret using that intermediate cert
$ kubectl create secret tls \
  linkerd-proxy-injector-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# start following the injector log
$ k -n linkerd logs -f -l linkerd.io/control-plane-component=proxy-injector -c proxy-injector

# Inject emojivoto. The pods should be injected normally
$ bin/linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -

# Wait about 5 minutes and delete a pod
$ k -n emojivoto delete po -l app=emoji-svc

# You'll see it won't be injected, and something like "remote error: tls: bad certificate" will appear in the injector logs.

# Regenerate the intermediate cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc

# Delete the secret and recreate it
$ k -n linkerd delete secret linkerd-proxy-injector-k8s-tls
$ kubectl create secret tls \
  linkerd-proxy-injector-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# Wait a couple of minutes and you'll see some filesystem events in the injector log along with a "Certificate has been updated" entry
# Then delete the pod again and you'll see it gets injected this time
$ k -n emojivoto delete po -l app=emoji-svc

```
2020-12-04 16:25:59 -05:00
Alejandro Pedraza 4687dc52aa
Refactor webhook framework to allow webhooks to define their flags (#5256)
* Refactor webhook framework to allow webhooks to define their flags

Pulled the flag-parsing logic out of `launcher.go` and moved it into the `Main` methods of the webhooks (under `controller/cmd/proxy-injector/main.go` and `controller/cmd/sp-validator/main.go`), so that individual webhooks themselves can define the flags they want to use.

Also no longer require that webhooks have cluster-wide access.

Finally, renamed the type `webhook.handlerFunc` to `webhook.Handler` so it can be exported. This will be used in the upcoming jaeger webhook.
2020-11-23 10:40:30 -05:00
Tarun Pothulapati 5e774aaf05
Remove dependency of linkerd-config for control plane components (#4915)
* Remove dependency of linkerd-config for most control plane components

This PR removes the dependency of most control plane components on
`linkerd-config` by passing all that information through CLI
flags. As most of these components require only a couple of flags,
passing them as flags is more helpful, since updates to the flags
trigger a rollout, unlike a ConfigMap update.

This does not update the proxy-injector as it needs a lot more data
and mounting `linkerd-config` is better.
2020-10-06 22:19:18 +05:30
Tarun Pothulapati d0caaa86c4
Bump k8s client-go to v0.19.2 (#5002)
Fixes #4191 #4993

This bumps Kubernetes client-go to the latest v0.19.2 (we had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5

This consists of the following changes:

- Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD.
- Bump all k8s related dependencies to v0.19.2
- Generate CRD types, client code using the latest k8s.io/code-generator
- Use context.Context as the first argument, in all code paths that touch the k8s client-go interface

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-09-28 12:45:18 -05:00
Alejandro Pedraza b30d35f46a
Reset service-mirror component when target's k8s API is unreachable (#4996)
When the service-mirror component can't reach the target's k8s API, the goroutine blocks and it can't be unblocked.

This was happening specifically in the case of the multicluster integration test (still to be pushed), where the source and target clusters are created in quick succession and the target's API service doesn't always have time to be exposed before being requested by the service mirror.

The fix consists of no longer having restartClusterWatcher be side-effecting and instead having it return an error. If that error is not nil, the link watcher is stopped and reset after 10 seconds.
2020-09-25 11:00:28 -05:00
Alex Leong 9d3cf6ee4d
Move most service-mirror code out of cmd package (#4901)
All of the code for the service mirror controller lives in the `linkerd/linkerd2/controller/cmd` package.  It is typical for control plane components to only have a `main.go` entrypoint in the cmd package.  This can sometimes make it hard to find the service mirror code since I wouldn't expect it to be in the cmd package.

We move the majority of the code to a dedicated controller package, leaving only main.go in the cmd package.  This is purely organizational; no behavior change is expected.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-08-27 14:17:18 -07:00
Matei David 7ed904f31d
Enable endpoint slices when upgrading through CLI (#4864)
## What/How
@adleong  pointed out in #4780 that when enabling slices during an upgrade, the new value does not persist in the `linkerd-config` ConfigMap. I took a closer look and it seems that we were never overwriting the values in case they were different.

* To fix this, I added an if block when validating and building the upgrade options -- if the current flag value differs from what we have in the ConfigMap, then change the ConfigMap value.
* When doing so, I made sure to check that if the cluster does not support `EndpointSlices` yet the flag is set to true, we will error out. This is done similarly (copy&paste, similarly) to what's in the install part.
* Additionally, I have noticed that the helm ConfigMap template stored the flag value under `enableEndpointSlices` field name. I assume this was not changed in the initial PR to reflect the changes made in the protocol buffer. The API (and thus the CLI) uses the field name `endpointSliceEnabled` instead. I have changed the config template so that helm installations will use the same field, which can then be used in the destination service or other components that may implement slice support in the future.

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-08-24 14:34:50 -07:00
Josh Soref 72aadb540f
Spelling (#4872)
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).

The misspellings have been reported at aaf440489e (commitcomment-41423663)

The action reports that the changes in this PR would make it happy: 5b82c6c5ca

Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-08-12 21:59:50 -07:00