Commit Graph

184 Commits

Author SHA1 Message Date
Alejandro Pedraza 9ac1caaf1b
Add `additionalEnv` helm settings (#12080)
Add `additionalEnv` helm settings to the proxy and controller manifests
alongside the existing `experimentalEnv` ones.
2024-02-15 14:26:45 -05:00
jan-kantert af402a35ff
Introduce Helm configuration for probe timeout and delays (#11458)
In certain cases (e.g. high CPU load) kubelets can be slow to read readiness
and liveness responses. Linkerd is configured with a default timeout of `1s`
for its probes. To prevent injected pod restarts under high load, this
change makes probe timeouts configurable.

---------

Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Matei David <matei@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2024-02-08 18:05:53 +00:00
Andrew Seigner b9546af08f
helm: Use k8s `EnvVar` for `proxy.ExperimentalEnv` (#11923)
PR #11874 introduced a `proxy.ExperimentalEnv` setting, allowing
arbitrary name+value environment variables on proxies. This name+value
pairing was a subset of k8s' environment variables, specifically, it did
not allow for `valueFrom.configMapKeyRef` and related fields. PR #11908
introduced this pattern in the ControlPlane containers.

Modify `proxy.ExperimentalEnv` to behave identically to k8s' native
`EnvVar`, allowing settings such as:
```
--set proxy.experimentalEnv[0].name=LINKERD2_PROXY_DEFROBINATION
--set proxy.experimentalEnv[0].valueFrom.configMapKeyRef.key=extreme-key
--set proxy.experimentalEnv[0].valueFrom.configMapKeyRef.name=extreme-config
```

Context:
https://github.com/linkerd/linkerd2/pull/11908#issuecomment-1888945793

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2024-01-15 10:03:11 +00:00
Alejandro Pedraza 55d1049b73
Add cni-repair-controller to linkerd-cni DaemonSet (#11699)
Followup to linkerd/linkerd2-proxy-init#306
Fixes linkerd/linkerd2#11073

This adds the `reinitialize-pods` container to the `linkerd-cni`
DaemonSet, along with its config in `values.yaml`.

Also the `linkerd-cni`'s version is bumped, to contain the new binary
for this controller.
2024-01-05 09:28:43 -08:00
Oliver Gould 3a2d164a5d
helm: Add proxy.experimentalEnv settings (#11874)
When working with experimental proxy features that are not yet exposed via
control plane APIs, it can be convenient to set additional environment variables
on proxies.

To support this, we add an undocumented `proxy.experimentalEnv` value:

    --set proxy.experimentalEnv.LINKERD2_PROXY_DEFROBINATION=extreme
2024-01-04 10:28:16 +00:00
Oliver Gould 04f2ce511a
inject: Configure proxy stream lifetime limits (#11837)
linkerd/linkerd2-proxy#2587 adds configuration parameters that bound the
lifetime and idle times of control plane streams. This change helps to
mitigate imbalanced control plane replica usage and to generally prevent
scenarios where a stream becomes "stuck," as has been observed when a
control plane replica is unhealthy.

This change adds helm values to control this behavior. Default values
are provided.
2023-12-27 16:24:33 -08:00
Alejandro Pedraza 2d716299a1
Add ability to configure client-go's `QPS` and `Burst` settings (#11644)
* Add ability to configure client-go's `QPS` and `Burst` settings

## Problem and Symptoms

When a very large number of proxies request identity in a short period of time (e.g. during large node scaling events), the identity controller attempts to validate the tokens sent by the proxies at a rate that exceeds client-go's default request rate threshold. This triggers client-side throttling, which delays proxy initialization and can even cause startup to fail (after a 2m timeout). The identity controller surfaces this through log entries like this:

```
time="2023-11-08T19:50:45Z" level=error msg="error validating token for web.emojivoto.serviceaccount.identity.linkerd.cluster.local: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline"
```

## Solution

Client-go's default `QPS` is 5 and `Burst` is 10. This PR exposes those settings as entries in `values.yaml` with defaults of 100 and 200 respectively. Note this only applies to the identity controller, as it's the only controller performing direct requests to the `kube-apiserver` in a hot path. The other controllers mostly rely on informers, and direct calls are sporadic.

## Observability

The `QPS` and `Burst` settings in use are exposed both as a log entry as soon as the controller starts, and as the new metric gauges `http_client_qps` and `http_client_burst`.

## Testing

You can use the following K6 script, which simulates 6k calls to the `Certify` service during one minute from emojivoto's web pod. Before running this you need to:

- Put the identity.proto and [all the other proto files](https://github.com/linkerd/linkerd2-proxy-api/tree/v0.11.0/proto) in the same directory.
- Edit the [checkRequest](https://github.com/linkerd/linkerd2/blob/edge-23.11.3/pkg/identity/service.go#L266) function and add logging statements to figure out the `token` and `csr` entries you can use here; they will be logged as soon as a web pod starts.

```javascript
import { Client, Stream } from 'k6/experimental/grpc';
import { sleep } from 'k6';

const client = new Client();
client.load(['.'], 'identity.proto');

// This always holds:
// req_num = (1 / req_duration ) * duration * VUs
// Given req_duration (0.5s) test duration (1m) and the target req_num (6k), we
// can solve for the required VUs:
// VUs = req_num * req_duration / duration
// VUs = 6000 * 0.5 / 60 = 50
export const options = {
  scenarios: {
    identity: {
      executor: 'constant-vus',
      vus: 50,
      duration: '1m',
    },
  },
};

export default () => {
  client.connect('localhost:8080', {
    plaintext: true,
  });

  const stream = new Stream(client, 'io.linkerd.proxy.identity.Identity/Certify');

  // Replace with your own token
  let token = "ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNkluQjBaV1pUZWtaNWQyVm5OMmxmTTBkV2VUTlhWSFpqTmxwSmJYRmtNMWRSVEhwNVNHWllhUzFaZDNNaWZRLmV5SmhkV1FpT2xzaWFXUmxiblJwZEhrdWJEVmtMbWx2SWwwc0ltVjRjQ0k2TVRjd01EWTRPVFk1TUN3aWFXRjBJam94TnpBd05qQXpNamt3TENKcGMzTWlPaUpvZEhSd2N6b3ZMMnQxWW1WeWJtVjBaWE11WkdWbVlYVnNkQzV6ZG1NdVkyeDFjM1JsY2k1c2IyTmhiQ0lzSW10MVltVnlibVYwWlhNdWFXOGlPbnNpYm1GdFpYTndZV05sSWpvaVpXMXZhbWwyYjNSdklpd2ljRzlrSWpwN0ltNWhiV1VpT2lKM1pXSXRPRFUxT1dJNU4yWTNZeTEwYldJNU5TSXNJblZwWkNJNklqaGlZbUV5WWpsbExXTXdOVGN0TkRnMk1TMWhNalZsTFRjelpEY3dOV1EzWmpoaU1TSjlMQ0p6WlhKMmFXTmxZV05qYjNWdWRDSTZleUp1WVcxbElqb2lkMlZpSWl3aWRXbGtJam9pWm1JelpUQXlNRE10TmpZMU55MDBOMk0xTFRoa09EUXRORGt6WXpBM1lXUTJaak0zSW4xOUxDSnVZbVlpT2pFM01EQTJNRE15T1RBc0luTjFZaUk2SW5ONWMzUmxiVHB6WlhKMmFXTmxZV05qYjNWdWREcGxiVzlxYVhadmRHODZkMlZpSW4wLnlwMzAzZVZkeHhpamxBOG1wVjFObGZKUDB3SC03RmpUQl9PcWJ3NTNPeGU1cnNTcDNNNk96VWR6OFdhYS1hcjNkVVhQR2x2QXRDRVU2RjJUN1lKUFoxVmxxOUFZZTNvV2YwOXUzOWRodUU1ZDhEX21JUl9rWDUxY193am9UcVlORHA5ZzZ4ZFJNcW9reGg3NE9GNXFjaEFhRGtENUJNZVZ6a25kUWZtVVZwME5BdTdDMTZ3UFZWSlFmNlVXRGtnYkI1SW9UQXpxSmcyWlpyNXBBY3F5enJ0WE1rRkhSWmdvYUxVam5sN1FwX0ljWm8yYzJWWk03T2QzRjIwcFZaVzJvejlOdGt3THZoSEhSMkc5WlNJQ3RHRjdhTkYwNVR5ZC1UeU1BVnZMYnM0ZFl1clRYaHNORjhQMVk4RmFuNjE4d0x6ZUVMOUkzS1BJLUctUXRUNHhWdw==";
  // Replace with your own CSR
  let csr = "MIIBWjCCAQECAQAwRjFEMEIGA1UEAxM7d2ViLmVtb2ppdm90by5zZXJ2aWNlYWNjb3VudC5pZGVudGl0eS5saW5rZXJkLmNsdXN0ZXIubG9jYWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAATKjgVXu6F+WCda3Bbq2ue6m3z6OTMfQ4Vnmekmvirip/XGyi2HbzRzjARnIzGlG8wo4EfeYBtd2MBCb50kP8F8oFkwVwYJKoZIhvcNAQkOMUowSDBGBgNVHREEPzA9gjt3ZWIuZW1vaml2b3RvLnNlcnZpY2VhY2NvdW50LmlkZW50aXR5LmxpbmtlcmQuY2x1c3Rlci5sb2NhbDAKBggqhkjOPQQDAgNHADBEAiAM7aXY8MRs/EOhtPo4+PRHuiNOV+nsmNDv5lvtJt8T+QIgFP5JAq0iq7M6ShRNkRG99ZquJ3L3TtLWMNVTPvqvvUE=";

  const data = {
    identity: "web.emojivoto.serviceaccount.identity.linkerd.cluster.local",
    token: token,
    certificate_signing_request: csr,
  };
  stream.write(data);

  // This request takes around 2ms, so this sleep will mostly determine its final duration
  sleep(0.5);
};
```

This results in the following report:

```
scenarios: (100.00%) 1 scenario, 50 max VUs, 1m30s max duration (incl. graceful stop):
           * identity: 50 looping VUs for 1m0s (gracefulStop: 30s)

     data_received................: 6.3 MB 104 kB/s
     data_sent....................: 9.4 MB 156 kB/s
     grpc_req_duration............: avg=2.14ms   min=873.93µs med=1.9ms    max=12.89ms  p(90)=3.13ms   p(95)=3.86ms
     grpc_streams.................: 6000   99.355331/s
     grpc_streams_msgs_received...: 6000   99.355331/s
     grpc_streams_msgs_sent.......: 6000   99.355331/s
     iteration_duration...........: avg=503.16ms min=500.8ms  med=502.64ms max=532.36ms p(90)=504.05ms p(95)=505.72ms
     iterations...................: 6000   99.355331/s
     vus..........................: 50     min=50      max=50
     vus_max......................: 50     min=50      max=50

running (1m00.4s), 00/50 VUs, 6000 complete and 0 interrupted iterations
```

With the old defaults (QPS=5 and Burst=10), the latencies would be much higher and number of complete requests much lower.
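As a rough illustration of why the old defaults cannot keep up with this load, the sketch below estimates how long a token-bucket limiter would need to admit the test's 6000 requests. It is a simplified model (a bucket that starts full, refills at a constant rate, and sees all requests queued at once), not client-go's actual limiter; `drainTime` is a hypothetical helper introduced only for this example.

```go
package main

import (
	"fmt"
	"time"
)

// drainTime estimates how long a token bucket with the given refill rate (qps)
// and capacity (burst) needs to admit n requests that are all queued at once.
// Simplified model: the bucket starts full and refills at a constant rate.
func drainTime(n int, qps float64, burst int) time.Duration {
	if n <= burst {
		return 0
	}
	seconds := float64(n-burst) / qps
	return time.Duration(seconds * float64(time.Second))
}

func main() {
	const requests = 6000 // calls issued by the K6 script above

	fmt.Println("QPS=5, Burst=10:   ", drainTime(requests, 5, 10))
	fmt.Println("QPS=100, Burst=200:", drainTime(requests, 100, 200))
}
```

Under this model the old defaults need roughly 20 minutes to admit all 6000 requests, far beyond the 2m startup timeout, while the new defaults finish in about a minute.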
2023-11-28 15:25:05 -05:00
TJ Miller 1b37e1989f
Add native sidecar support (#11465)
* Add native sidecar support

Kubernetes will be providing beta support for native sidecar containers in version 1.29.  This feature improves network proxy sidecar compatibility for jobs and initContainers.

Introduce a new annotation config.alpha.linkerd.io/proxy-enable-native-sidecar and configuration option Proxy.NativeSidecar that causes the proxy container to run as an init-container.

Fixes: #11461

Signed-off-by: TJ Miller <millert@us.ibm.com>
2023-11-22 12:23:24 -05:00
Matei David 1e6a019b31
Introduce configurable values for protocol detection (#11536)
This change allows users to configure protocol detection timeout values
(outbound and inbound). Certain environments may find that protocol
detection inhibits debugging and makes it harder to reason with a
client's behaviour. In such cases (and not only) it may be desirable to
change the default protocol detection timeout to a higher value than the
default 10s.

Through this change, users may configure their timeout values either
with install-time settings or through annotations; this follows our
usual proxy configuration model. The proxy uses different timeout values
for the inbound and outbound stacks (even though they use the same
default value) and this change respects that by adding two separate
fields.

Signed-off-by: Matei David <matei@buoyant.io>
2023-11-02 14:03:50 +00:00
Alex Leong fe9532b1cf
re-enable go format checking in CI (#11363)
A git related "dubious ownership" error was preventing the go format action from running in CI. As a result of go formatting not getting checked in CI, some go formatting drift has been introduced.

Add the appropriate git config command to resolve dubious ownership so that go format checking is run in CI.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-09-14 13:13:30 -07:00
Alex Leong 748d99e6e8
Add networkValidator.enableSecurityContext value (#11064)
Certain environments are incompatible with the security context used by the network validator init container.

We add an option for disabling the network validator's security context so that such environments can provide their own.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-07-14 09:15:35 -07:00
Takumi Sue 1bb3ff6cdd
fix: supplement the HA flag (#11011)
fix: supplement the HA flag

Linkerd checks are skipped for HA because the field is missing from the configmap generated during install time.

This change introduces an HA field in the helm charts that will be persisted, thereby allowing checks to run.

---------

Signed-off-by: Takumi Sue <u630868b@alumni.osaka-u.ac.jp>
2023-06-15 13:40:09 +01:00
Matei David 6bea77d89b
Add cache configuration annotation support (#10871)
The proxy caches discovery results in-memory. Linkerd supports
overriding the default eviction timeout for cached discovery results
through install (i.e. helm) values. However, it is currently not
possible to configure timeouts on a workload-per-workload basis, or to
configure the values after Linkerd has been installed (or upgraded).

This change adds support for annotation based configuration. Workloads
and namespaces now support two new configuration annotations that will
override the install values when specified.

Additionally, a typo has been fixed on the internal type representation.
This is not a breaking change since the type itself is not exposed to
users and is parsed correctly in the values.yaml file (or CLI)


Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
2023-05-10 16:27:37 +01:00
Matei David 38c186be41
Introduce discovery cache timeout values (#10831)
The proxy caches results in-memory, both for inbound and outbound
service (and policy) discovery. While the proxy's default values are
great in most cases, certain client configurations may require
overrides. The proxy supports overriding the default values, however, it
currently does not offer an easy way for users to configure them.

This PR introduces two new values in Linkerd's control plane chart. The
values control the inbound and outbound cache discovery idle timeout --
the amount of time a result will be kept in the cache if unused. Setting
this value will change the configuration for all injected proxies, but
not for the control plane.


---------

Signed-off-by: Matei David <matei@buoyant.io>
2023-05-05 14:33:34 +01:00
Amit Kumar d26c324e76
added --set flag to install-cni plugin (#10633)
This PR adds support for the `--set` flag to the linkerd cni-plugin installation command.
It also updates the test file for cni-plugin install
and fixes a bug in pkg/chart/charts.go for the resources template.
Fixes #9917

* Allow supporting all flags and values

This leverages `chartutil.CoalesceValues` in order to merge the values provided through regular flags, and the ones provided via `--set` flags. As that function consumes maps, I introduced the `ToMap` method function on the cni `Values` struct (a copy of the same function from the core linkerd `Values` struct) to convert the struct backing the regular flags into a map.

And for the `RenderCNI` method to be able to deal with value maps instead of yaml, the `charts.Chart` struct now distinguishes between `Values` (a map) and `RawValues` (YAML).

This allowed removing the `parseYAMLValue` function and avoid having to deal with individual entries in `buildValues()`, and we no longer need the `valuesOverrides` field in the `cniPluginOptions` struct.
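The merge described above can be sketched with plain maps. This is a hand-rolled stand-in, not `chartutil.CoalesceValues` itself, and it assumes that values given via `--set` take precedence over those from regular flags; `mergeValues` and the `image` keys are hypothetical names used only for illustration.

```go
package main

import "fmt"

// mergeValues returns a copy of base with every key from overrides layered on
// top. Nested maps are merged recursively; any other value in overrides wins.
func mergeValues(base, overrides map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range overrides {
		baseMap, baseIsMap := out[k].(map[string]interface{})
		overMap, overIsMap := v.(map[string]interface{})
		if baseIsMap && overIsMap {
			out[k] = mergeValues(baseMap, overMap)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	// Values derived from regular flags (hypothetical keys, for illustration).
	fromFlags := map[string]interface{}{
		"useWaitFlag": false,
		"image":       map[string]interface{}{"name": "cni-plugin", "pullPolicy": "IfNotPresent"},
	}
	// Values parsed from --set flags (assumed to take precedence).
	fromSet := map[string]interface{}{
		"useWaitFlag": true,
		"image":       map[string]interface{}{"pullPolicy": "Always"},
	}

	fmt.Println(mergeValues(fromFlags, fromSet))
}
```

Working on maps rather than on individual struct fields is what lets a setting without a dedicated flag pass through untouched.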

## Tests

```bash
# Testing regular flag
$ bin/go-run cli install-cni --use-wait-flag | grep use.wait.flag
        "use-wait-flag": true

# Testing using --set
$ bin/go-run cli install-cni --set useWaitFlag=true | grep use.wait.flag
        "use-wait-flag": true

# Testing using --set on a setting that has no regular flag
$ bin/go-run cli install-cni --set enablePSP=true | grep PodSecurityPolicy
kind: PodSecurityPolicy
```

---------

Signed-off-by: amit-62 <kramit6662@gmail.com>
Co-authored-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
Co-authored-by: Matei David <matei.david.35@gmail.com>
2023-04-20 09:34:06 -05:00
Eliza Weisman 83e9c45bd1
add `trust_dns=error` to default proxy log level (#10774)
* add `trust_dns=error` to default proxy log level

Since upstream has yet to release a version with PR
bluejekyll/trust-dns#1881, this commit changes the proxy's default log
level to silence warnings from `trust_dns_proto` that are generally
spurious.

Closes #10123.
2023-04-20 09:29:56 -05:00
Eng Zer Jun 27703ab900
Replace `github.com/ghodss/yaml` with `sigs.k8s.io/yaml` (#10610)
At the time of making this commit, the package `github.com/ghodss/yaml`
is no longer actively maintained.

`sigs.k8s.io/yaml` is a permanent fork of `ghodss/yaml` and is actively
maintained by Kubernetes SIG.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2023-03-24 09:20:55 -05:00
Steve Jenson 44424466c1
linkerd-cni: add new release to the build (#10209)
wind the new linkerd-cni build through the build. refactor image, version, and pullPolicy into an Image object.

Signed-off-by: Steve Jenson <stevej@buoyant.io>
2023-02-08 13:54:35 -08:00
anoxape 3855aa2371
Correct `identity.issuer.externalCA` to `identity.externalCA` (#10071)
Helm chart has `identity.externalCA` value.
CLI code sets `identity.issuer.externalCA` and fails to produce the desired configuration. This change aligns everything to `identity.externalCA`.

Signed-off-by: Dmitry Mikhaylov <anoxape@gmail.com>
2023-01-03 11:37:30 -08:00
Matei David 35cecb50e1
Add static and dynamic port overrides for CNI ebpf (#9841)
When CNI plugins run in ebpf mode, they may rewrite the packet
destination when doing socket-level load balancing (i.e in the
`connect()` call). In these cases, skipping `443` on the outbound side
for control plane components becomes redundant; the packet is re-written
to target the actual Kubernetes API Server backend (which typically
listens on port `6443`, but may be overridden when the cluster is
created).

This change adds port `6443` to the list of skipped ports for control
plane components. On the linkerd-cni plugin side, the ports are
non-configurable. Whenever a pod with the control plane component label
is handled by the plugin, we look up the `kubernetes` service in the
default namespace and append the port values (of both ClusterIP and
backend) to the list.

On the initContainer side, we make this value configurable in Helm and
provide a sensible default (`443,6443`). Users may override this value
if the ports do not correspond to what they have in their cluster. In
the CLI, if no override is given, we look up the service in the same way
that we do for linkerd-cni; if failures are encountered we fall back to
the default list of ports from the values file.
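
The precedence described for the CLI (explicit override, then cluster lookup, then the default list) might look roughly like the following. This is only a sketch: `resolveSkipPorts`, `errLookup`, and the lookup callbacks are hypothetical and do not reflect the actual CLI code.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultSkipPorts mirrors the default list mentioned above.
const defaultSkipPorts = "443,6443"

// errLookup stands in for any failure while querying the cluster.
var errLookup = errors.New("kubernetes service lookup failed")

// resolveSkipPorts picks the outbound ports to skip for control plane
// components: an explicit override wins, then the result of the cluster
// lookup, and finally the default list if the lookup fails or is empty.
func resolveSkipPorts(override string, lookup func() (string, error)) string {
	if override != "" {
		return override
	}
	ports, err := lookup()
	if err != nil || ports == "" {
		return defaultSkipPorts
	}
	return ports
}

func main() {
	failing := func() (string, error) { return "", errLookup }
	working := func() (string, error) { return "443,7443", nil }

	fmt.Println(resolveSkipPorts("8443", working)) // override wins
	fmt.Println(resolveSkipPorts("", working))     // looked-up ports
	fmt.Println(resolveSkipPorts("", failing))     // default list
}
```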

Closes #9817

Signed-off-by: Matei David <matei@buoyant.io>
2022-11-30 09:45:25 +00:00
Matei David d45f7331f3
Introduce value to run proxy-init as privileged (#9873)
This change aims to solve two distinct issues that have cropped up in
the proxy-init configuration.

First, it decouples `allowPrivilegeEscalation` from running proxy-init
as root. At the moment, whenever the container is run as root, privilege
escalation is also allowed. In more restrictive environments, this will
prevent the pod from coming up (e.g security policies may complain about
`allowPrivilegeEscalation=true`). Worth noting that privilege escalation
is not necessary in many scenarios since the capabilities are passed to
the iptables child process at build time.

Second, it introduces a new `privileged` value that will allow users to
run the proxy-init container without any restrictions (meaning all
capabilities are inherited). This is essentially the same as mapping
root on host to root in the container. This value may solve issues in
distributions that run security enhanced linux, since iptables will be
able to load kernel modules that it may otherwise not be able to load
(privileged mode allows the container nearly the same privileges as
processes running outside of a container on a host, this further allows
the container to set configurations in AppArmor or SELinux).

Privileged mode is independent from running the container as root. This
gives users more control over the security context in proxy-init. The
value may still be used with `runAsRoot: false`.

Fixes #9718

Signed-off-by: Matei David <matei@buoyant.io>
2022-11-25 10:58:51 +00:00
Steve Jenson a83bad9ccb
Adds a default Exists toleration to linkerd-cni (#9789)
2022-11-22 15:26:20 -05:00
Steve Jenson 309e8d1210
Validate CNI configurations during pod startup (#9678)
When users use CNI, we want to ensure that network rewriting inside the pod is set up before allowing linkerd to start. When rewriting isn't happening, we want to exit with a clear error message and enough information in the container log for the administrator to either file a bug report with us or fix their configuration.

This change adds a validator initContainer to all injected workloads, when linkerd is installed with "cniEnabled=true". The validator replaces the noop init container, and will prevent pods from starting up if iptables is not configured.

Part of #8120

Signed-off-by: Steve Jenson <stevej@buoyant.io>
2022-10-26 11:14:45 +01:00
Jeremy Chase 32b4ac4f3a
Populate empty proxy-version annotation (#9382)
Addresses: #9311 

* Set injected `proxy-version` annotation to `values.LinkerdVersion` when image version is empty.
* Set `Proxy.Image.Version` consistently between CLI and Helm

Tested when installed via CLI:

```
$ k get po -o yaml -n emojivoto | grep proxy-version
      linkerd.io/proxy-version: dev-0911ad92-jchase
      linkerd.io/proxy-version: dev-0911ad92-jchase
      linkerd.io/proxy-version: dev-0911ad92-jchase
      linkerd.io/proxy-version: dev-0911ad92-jchase
```

Untested when installed via Helm.

Signed-off-by: Jeremy Chase <jeremy.chase@gmail.com>
Co-authored-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-10-11 13:05:59 -06:00
Alex Leong 5cb6755ebe
Add noop init container when the cni plugin is enabled (#9504)
Add a "noop" init container which uses the proxy image and runs `/bin/sleep 0` to injected pods.  This init container is only added when the linkerd-cni-plugin is enabled.  The idea here is that by running an init container, we trigger kubernetes to update the pod status.  In particular, this ensures that the pod status IP is populated, which is necessary in certain cases where other CNIs such as Calico are involved.

Therefore, this may fix https://github.com/linkerd/linkerd2/issues/9310, but I don't have a reproduction and therefore am not able to verify.

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-10-11 11:31:45 -07:00
Martin Odstrčilík 89c5729264
Add PodMonitor resources to the Helm chart (#9113)
Add PodMonitor resources to the Helm chart

With an external Prometheus setup installed using prometheus-operator the Prometheus instance scraping can be configured using Service/PodMonitor resources.

By adding PodMonitor resource into Linkerd Helm chart we can mimic the configuration of bundled Prometheus, see https://github.com/linkerd/linkerd2/blob/main/viz/charts/linkerd-viz/templates/prometheus.yaml#L47-L151, that comes with linkerd-viz extension. The PodMonitor resources are based on https://github.com/linkerd/website/issues/853#issuecomment-913234295 which are proven to be working. The only problem we face is that bundled Grafana charts will need to look at different jobs when querying metrics.

When enabled via the `podMonitor.enabled` value in the Helm chart, the PodMonitor resources should be installed alongside Linkerd, and Linkerd metrics should then be present in Prometheus.

Fixes #6596

Signed-off-by: Martin Odstrcilik <martin.odstrcilik@gmail.com>
2022-10-04 06:19:23 -05:00
Alejandro Pedraza b65364704b
Add config proxyInit.runAsUser to facilitate 2.11.x->2.12.0 upgrade (#9201)
In 2.11.x, proxyInit.runAsRoot was true by default, which caused the
proxy-init's runAsUser field to be 0. proxyInit.runAsRoot is now
defaulted to false in 2.12.0, but runAsUser still isn't
configurable, and when following the upgrade instructions
here, helm doesn't change runAsUser and so it conflicts with the new value
for runAsRoot=false, resulting in the pods erroring with this message:
Error: container's runAsUser breaks non-root policy (pod: "linkerd-identity-bc649c5f9-ckqvg_linkerd(fb3416d2-c723-4664-acf1-80a64a734561)", container: linkerd-init)

This PR adds a new default for runAsUser to avoid this issue.
2022-08-19 09:07:13 -05:00
Eliza Weisman f6c6ff965c
inject: fix --default-inbound-policy not setting annotation (#9197)
Depends on #9195

Currently, `linkerd inject --default-inbound-policy` does not set the
`config.linkerd.io/default-inbound-policy` annotation on the injected
resource(s).

The `inject` command does _try_ to set that annotation if it's set in
the `Values` generated by `proxyFlagSet`:
14d1dbb3b7/cli/cmd/inject.go (L485-L487)

...but, the flag in the proxy `FlagSet` doesn't set
`Values.Proxy.DefaultInboundPolicy`, it sets
`Values.PolicyController.DefaultAllowPolicy`:
7c5e3aaf40/cli/cmd/options.go (L375-L379)

This is because the flag set is shared across `linkerd inject` and
`linkerd install` subcommands, and in `linkerd install`, we want to set
the default policy for the whole cluster by configuring the policy
controller. In `linkerd inject`, though, we want to add the annotation
to the injected pods only.

This branch fixes this issue by changing the flag so that it sets the
`Values.Proxy.DefaultInboundPolicy` instead of the
`Values.PolicyController.DefaultAllowPolicy` value. In `linkerd
install`, we then set `Values.PolicyController.DefaultAllowPolicy` based
on the value of `Values.Proxy.DefaultInboundPolicy`, while in `inject`,
we will now actually add the annotation.

This branch is based on PR #9195, which adds validation to reject
invalid values for `--default-inbound-policy`, rather than on `main`.
This is because the validation code added in that PR had to be moved
around a bit, since it now needs to validate the
`Values.Proxy.DefaultInboundPolicy` value rather than the
`Values.PolicyController.DefaultAllowPolicy` value. I thought using
#9195 as a base branch was better than basing this on `main` and then
having to resolve merge conflicts later. When that PR merges, this can 
be rebased onto `main`.

Fixes #9168
2022-08-18 17:16:27 -07:00
Matei David e4f7788c14
Change default iptables mode to legacy (#9097)
Some hosts may not have 'nft' modules available. Currently, proxy-init
defaults to using 'iptables-nft'; if the host does not have support for
nft modules, the init container will crash, blocking all injected
workloads from starting up.

This change defaults the 'iptablesMode' value to 'legacy'.

* Update linkerd-control-plane/values file default
* Update proxy-init partial to default to 'legacy' when no mode is
  specified
* Change expected values in 'pkg/charts/linkerd2/values_test.go' and in
  'cli/cmd/install_test'
* Update golden files

Fixes #9053

Signed-off-by: Matei David <matei@buoyant.io>
2022-08-05 10:45:29 -06:00
Kevin Leimkuhler c6693a5ae3
Add `policyController.probeNetworks` configuration value (#9091)
Closes #8945 

This adds the `policyController.probeNetworks` configuration value so that users
can configure the networks from which probes are expected to be performed.

By default, we allow all networks (`0.0.0.0/0`). Additionally, this value
differs from `clusterNetworks` in that it is a list of networks, and thus we
have to join the values in the Helm templating.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-08-05 10:43:22 -06:00
Matei David 9dd51d3897
Add `iptablesMode` flag to proxy-init (#8887)
This change introduces a new value to be used at install (or upgrade)
time. The value (`proxyInit.iptablesMode=nft|legacy`) is responsible
for starting the proxy-init container in nft or legacy mode.

By default, the init container will use iptables-legacy. When the mode is set to
`nft`, it will instead use iptables-nft. Most modern Linux distributions
support both, but a subset (such as RHEL based families) only support
iptables-nft and nf_tables.

Signed-off-by: Matei David <matei@buoyant.io>
2022-07-27 21:45:19 -07:00
Matei David e6c263fd3d
Change `nodeAffinity` type in charts pkg (#8992)
Linkerd supports nodeAffinity for its control plane components. In code,
nodeAffinity is expressed as a map with string keys and string values.
In practice, the types do not appear to be correct, however.

The proxy-injector will, for example, read the install values from a
ConfigMap and unmarshal them into a Values struct. Unmarshalling fails
due to a type error: error unmarshaling JSON: while decoding JSON: json:
cannot unmarshal object into Go struct field Values.nodeAffinity of type
string. While the control plane will be deployed with the correct
affinity values, any subsequent injections will fail as a result of the
incorrect types.

To fix the issue, the type has been changed from map[string]string, to
map[string]interface{}. While the keys are preserved as strings, values
should be arbitrary in this case (since we do not have a concrete
representation of the affinity object).
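
The type error can be reproduced with the standard library alone. The sketch below uses `encoding/json` directly rather than the project's own unmarshalling path, and the `oldValues`/`newValues` structs and the sample affinity object are trimmed-down, hypothetical stand-ins for the real `Values` struct.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldValues models the previous field type: string keys and string values.
type oldValues struct {
	NodeAffinity map[string]string `json:"nodeAffinity"`
}

// newValues models the fixed field type: string keys, arbitrary values.
type newValues struct {
	NodeAffinity map[string]interface{} `json:"nodeAffinity"`
}

// sampleValues is a hypothetical, trimmed-down affinity object.
const sampleValues = `{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {"matchExpressions": [{"key": "kubernetes.io/os", "operator": "In", "values": ["linux"]}]}
      ]
    }
  }
}`

// decodeOld fails whenever a nodeAffinity value is not a plain string.
func decodeOld(data string) error {
	var v oldValues
	return json.Unmarshal([]byte(data), &v)
}

// decodeNew accepts nested objects as nodeAffinity values.
func decodeNew(data string) error {
	var v newValues
	return json.Unmarshal([]byte(data), &v)
}

func main() {
	fmt.Println("map[string]string:     ", decodeOld(sampleValues))
	fmt.Println("map[string]interface{}:", decodeNew(sampleValues))
}
```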

Signed-off-by: Matei David <matei@buoyant.io>
2022-07-27 13:47:29 +03:00
Oliver Gould b0712ebdf6
policy: Enable controller logs at INFO level (#8958)
Dependencies like `kubert` may emit INFO level logs that are useful to
see (e.g., when the serviceaccount has insufficient RBAC). This change
updates the default policy controller log level to simply be `info`.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-07-22 11:09:17 -07:00
Eliza Weisman 85e5ab3b38
inject: add `config.linkerd.io/shutdown-grace-period` annotation (#8923)
PR linkerd/linkerd2-proxy#1815 added support for a
`LINKERD2_PROXY_SHUTDOWN_GRACE_PERIOD` environment variable that
configures the proxy's maximum grace period for graceful shutdown. This
is intended to ensure that if a proxy is shut down, it will eventually
terminate in a relatively timely manner, even if some stubborn
connections don't close gracefully.

This branch adds support for a `config.linkerd.io/shutdown-grace-period`
annotation that can be used to override the default grace period
duration.

Hopefully I've added this everywhere it needs to be added --- please let
me know if I've missed anything!
2022-07-19 14:43:38 -07:00
Alex Leong 893fa78671
Split HA functionality into multiple configurable values (#8445)
Some autoscalers, namely Karpenter, don't allow podAntiAffinity and the enablePodAntiAffinity flag is
currently overloaded with other HA requirements. This commit splits out the PDB and updateStrategy
configuration into separate value inputs.

Fixes #8062

Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Evan Hines <evan@firebolt.io>
2022-05-10 09:49:58 -07:00
Oliver Gould fa8ddb4801
Use go-test/deep for comparisons in tests (#8427)
We frequently compare data structures--sometimes very large data
structures--that are difficult to compare visually. This change replaces
uses of `reflect.DeepEqual` with `deep.Equal`. `go-test`'s `deep.Equal`
returns a diff of values that are not equal.
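
To illustrate the difference in feedback, the sketch below contrasts `reflect.DeepEqual`, which only reports whether two values match, with a comparison that lists what differs. `diffMaps` is a hypothetical, much-simplified stand-in that handles only flat string maps; it is not `deep.Equal`, and the map keys are made up for the example.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// diffMaps reports every key whose value differs between got and want.
// A key missing on one side is shown with an empty string on that side.
func diffMaps(got, want map[string]string) []string {
	keys := map[string]struct{}{}
	for k := range got {
		keys[k] = struct{}{}
	}
	for k := range want {
		keys[k] = struct{}{}
	}

	var diffs []string
	for k := range keys {
		g, gotOK := got[k]
		w, wantOK := want[k]
		if gotOK != wantOK || g != w {
			diffs = append(diffs, fmt.Sprintf("%s: %q != %q", k, g, w))
		}
	}
	sort.Strings(diffs) // stable output regardless of map iteration order
	return diffs
}

func main() {
	got := map[string]string{"iptablesMode": "nft", "logLevel": "info"}
	want := map[string]string{"iptablesMode": "legacy", "logLevel": "info"}

	// A bare boolean says nothing about where the values diverge.
	fmt.Println("reflect.DeepEqual:", reflect.DeepEqual(got, want))

	// A diff points straight at the mismatching entry.
	for _, d := range diffMaps(got, want) {
		fmt.Println("diff:", d)
	}
}
```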

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-05-05 09:31:07 -07:00
Alex Leong 6762dd28ac
Add --crds flag to install/upgrade and remove config/control-plane stages (#8251)
Fixes: #8173 

In order to support having custom resources in the default Linkerd installation, it is necessary to add a separate install step to install CRDs before the core install.  The Linkerd Helm charts already accomplish this by having CRDs in a separate chart.

We add this functionality to the CLI by adding a `--crds` flag to `linkerd install` and `linkerd upgrade` which outputs manifests for the CRDs only and remove the CRD manifests when the `--crds` flag is not set.  To avoid a compounding of complexity, we remove the `config` and `control-plane` stages from install/upgrade.  The effect of this is that we drop support for splitting up an install by privilege level (cluster admin vs Linkerd admin).

The Linkerd install flow is now always a 2-step process where `linkerd install --crds` must be run first to install CRDs only and then `linkerd install` is run to install everything else.  This more closely aligns the CLI install flow with the Helm install flow where the CRDs are a separate chart.  Attempting to run `linkerd install` before the CRDs are installed will result in a helpful error message.

Similarly, upgrade is also a 2-step process of `linkerd upgrade --crds` followed by `linkerd upgrade`.

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-04-28 09:36:14 -07:00
Michał Romanowski 88b8da50d2
Introduce node affinity support for linkerd pods (#8137)
In order to restrict pods to run only on arbitrarily chosen nodes, affinities
or tolerations can be used. Currently, Linkerd only supports tolerations,
which are applied to pods and allow them to be scheduled on nodes with
matching "taints".

Certain environments and workflows lean more towards affinity instead of
tolerations to determine preferred or required scheduling. This change
introduces a new "nodeAffinity" field so that users may specify affinity
rules for scheduling Linkerd pods.

Closes #8136

Signed-off-by: Michal Romanowski <michal.rom089@gmail.com>
2022-04-15 11:24:16 +01:00
Kevin Leimkuhler 3222778191
Match linkerd-init CPU/memory requests/limits (#7989)
Closes #7980 

A pod is considered `Burstable` instead of `Guaranteed` if there exists at least one container in the pod that specifies CPU/memory limits/requests that do not match.

The `linkerd-init` container falls into this category meaning that even if all other containers in a Pod have matching CPU/memory limits/requests, the Pod will not be considered `Guaranteed` because of `linkerd-init`'s hardcoded values.

This changes the values to match, meaning that `linkerd-init` will not be the culprit container if a Pod is not considered `Guaranteed`. Raising the requests—instead of lowering the limits—felt like the safer option here. This means that the container will now always be guaranteed these amounts _and_ will never use more.
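
The rule described above can be sketched roughly as follows. The `resources` type and `isGuaranteed` function are hypothetical names for this example, and the quantities are illustrative rather than the actual `linkerd-init` values. Real Kubernetes compares parsed quantities and defaults unset requests to the limits; this sketch just compares raw strings.

```go
package main

import "fmt"

// resources is a hypothetical, simplified view of one container's
// resource settings. An empty string means "not set".
type resources struct {
	cpuRequest, cpuLimit       string
	memoryRequest, memoryLimit string
}

// isGuaranteed reports whether every container sets CPU and memory
// limits and has requests equal to those limits. A single container
// that does not match is enough to lose the Guaranteed class.
func isGuaranteed(containers []resources) bool {
	if len(containers) == 0 {
		return false
	}
	for _, c := range containers {
		if c.cpuLimit == "" || c.memoryLimit == "" {
			return false
		}
		if c.cpuRequest != c.cpuLimit || c.memoryRequest != c.memoryLimit {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative quantities only.
	app := resources{"500m", "500m", "128Mi", "128Mi"}
	initMismatched := resources{"10m", "100m", "10Mi", "50Mi"}
	initMatched := resources{"100m", "100m", "50Mi", "50Mi"}

	fmt.Println(isGuaranteed([]resources{app, initMismatched}))
	fmt.Println(isGuaranteed([]resources{app, initMatched}))
}
```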

[Docs](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed) explain this in more detail.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-03-08 15:30:03 -07:00
Oliver Gould 425a43def5
Enable gocritic linting (#7906)
[gocritic][gc] helps to enforce some consistency and check for potential
errors. This change applies linting changes and enables gocritic via
golangci-lint.

[gc]: https://github.com/go-critic/go-critic

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-17 22:45:25 +00:00
Matei David 0d59864033
Remove usage of controllerImageVersion values field (#7883)
This change removes the unused `controllerImageVersion` field, first
from the tests, and then from the actual chart values structure. Note
that at this point in time, it is impossible to use
`--controller-image-version` through Helm, yet it still seems to be
working for the CLI.

* We configure the charts to use `linkerdVersionValue` instead of
  `controlPlaneImageVersion` (or default to it where appropriate).
* We add the stringslicevar flag (i.e. `--set`) to the flagset we use in
  upgrade tests. This means instead of testing value overrides through a
  dedicated flag, we can now make use of `--set` in upgrade tests. We
  first set the linkerdVersionValue in the install option and then
  override the policy controller image version and the linkerd
  controller image version to test flags work as expected.
* We remove hardcoded values from healthcheck test.
* We remove field from chart values struct.

Signed-off-by: Matei David <matei@buoyant.io>
2022-02-17 15:19:08 +00:00
Oliver Gould f5876c2a98
go: Enable `errorlint` checking (#7885)
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection is wrapping-aware.

This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).

[el]: https://github.com/polyfloyd/go-errorlint

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-16 18:32:19 -07:00
Alejandro Pedraza 539bcced34
Fix HA race when installing through Helm (#7718)
* Fix HA race when installing through Helm

Fixes #7699

The problem didn't affect 2.11, only latest edges since the Helm charts
got split into `linkerd-crds` and `linkerd-control-plane` and we stopped
creating the linkerd namespace.

With the surrendering of the creation of the namespace, we can no longer
guarantee the existence of the `config.linkerd.io/admission-webhooks`
label, so this PR creates an `objectSelector` for the injector that
filters-out control-plane components, based on the existence of the
`linkerd.io/control-plane-component` label.

Given we still want the multicluster components to be injected, we had
to be rename its `linkerd.io/control-plane-component` label to
`component`, following the same convention used by the other extensions.
The corresponding Prometheus rule for scraping the service mirrors was
updated accordingly.

A similar filter was added for the linkerd-cni DaemonSet.

Also, now that the `kubernetes.io/metadata.name` is prevalent, we're
also using it to filter out the kube-system and cert-manager namespaces.
The former namespace was already mentioned in the docs; the latter is
also included to avoid having races with cert-manager-cainjector which
can be used to provision the injector's cert.
2022-02-02 11:27:20 -05:00
Alejandro Pedraza 68b63269d9
Remove the `proxy.disableIdentity` config (#7729)
* Remove the `proxy.disableIdentity` config

Fixes #7724

Also:
- Removed the `linkerd.io/identity-mode` annotation.
- Removed the `config.linkerd.io/disable-identity` annotation.
- Removed the `linkerd.proxy.validation` template partial, which only
  made sense when `proxy.disableIdentity` was `true`.
- TestInjectManualParams now needs to hit the cluster to retrieve the
  trust root.
2022-01-31 10:17:10 -05:00
Eliza Weisman 9e9c9457ae
inject: support `config.linkerd.io/access-log` annotation (#7689)
With #7661, the proxy supports a `LINKERD2_PROXY_ACCESS_LOG`
configuration with the values `apache` or `json`. This configuration
causes the proxy to emit access logs to stderr. This branch makes it
possible for users to enable access logging by adding an annotation,
`config.linkerd.io/access-log`, that tells the proxy injector to set
this environment variable.

I've also added some tests to ensure that the annotation and the
environment variable are set correctly. I tried to follow the existing
tests as examples of how we do this, but please let me know if I've
overlooked anything!

Closes #7662 #1913

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2022-01-24 14:02:19 -08:00
Michael Lin 99f3e087e1
Introduce annotation to skip subnets (#7631)
The goal is to support configuring the
`--subnets-to-ignore` flag in proxy-init

This change adds a new annotation `/skip-subnets` which
takes a comma-separated list of valid CIDR.
The argument will map to the `--subnets-to-ignore`
flag in the proxy-init initContainer.

Fixes #6758

Signed-off-by: Michael Lin <mlzc@hey.com>
2022-01-20 16:53:59 +00:00
Alejandro Pedraza 67dfebb259
Stop shipping grafana-based image (#7567)
* Stop shipping grafana-based image

Fixes #6045 #7358

With this change we stop building a Grafana-based image preloaded with the Linkerd Grafana dashboards.

Instead, we'll recommend that users install Grafana themselves, and we provide a file `grafana/values.yaml` with a default config that points to all the same Grafana dashboards we had, which are now hosted in https://grafana.com/orgs/linkerd/dashboards .

The new file `grafana/README.md` contains instructions for installing the official Grafana Helm chart, and mentions other available methods.

The `grafana.enabled` flag has been removed, and `grafanaUrl` has been moved to `grafana.url`. This will help consolidate other Grafana settings that might emerge, in particular when #7429 gets addressed.

## Dashboards definitions changes

The dashboard definitions under `grafana/dashboards` (which should be kept in sync with what's published in https://grafana.com/orgs/linkerd/dashboards), got updated, adding the `__inputs`, `__elements` and `__requires` entries at the beginning, that were required in order to be published.
2022-01-11 14:47:40 -05:00
Brian Dunnigan a8dbe4d1e0
Adding support for injecting Webhook CA bundles with cert-manager CA Injector (#7353) (#7354)
* Adding support for injecting Webhook CA bundles with cert-manager CA Injector (#7353)

Currently, users need to pass in the caBundle when doing a helm/CLI install. If the user is already using cert-manager to generate webhook certs, they can use the cert-manager CA injector to populate the caBundle for the Webhooks.

Adding `injectCaFrom` and `injectCaFromSecret` options to every webhook, alongside every `caBundle` option, lets users add the `cert-manager.io/inject-ca-from` or `cert-manager.io/inject-ca-from-secret` annotations to the Webhooks. These annotations specify the Certificate or Secret to pull the CA from, which accomplishes CA bundle injection.

Signed-off-by: Brian Dunnigan <bdun1013dev@gmail.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2022-01-03 14:28:30 -05:00
Alejandro Pedraza f9f3ebefa9
Remove namespace from charts and split them into `linkerd-crds` and `linkerd-control-plane` (#6635)
Fixes #6584 #6620 #7405

# Namespace Removal

With this change, the `namespace.yaml` template is rendered only for CLI installs and not Helm, and likewise the `namespace:` entry in the namespace-level objects (using a new `partials.namespace` helper).

The `installNamespace` and `namespace` entries in `values.yaml` have been removed.

In the templates where the namespace is required, we moved from `.Values.namespace` to `.Release.Namespace`, which is filled in automatically by Helm. For the CLI, `install.go` now explicitly defines the contents of the `Release` map alongside `Values`.

The proxy-injector has a new `linkerd-namespace` argument given the namespace is no longer persisted in the `linkerd-config` ConfigMap, so it has to be passed in. To pass it further down to `injector.Inject()` without modifying the `Handler` signature, a closure was used.

------------
Update: Merged-in #6638: Similar changes for the `linkerd-viz` chart:

Stop rendering `namespace.yaml` in the `linkerd-viz` chart.

The additional change here is the addition of the `namespace-metadata.yaml` template (and its RBAC), _not_ rendered in CLI installs, which is a Helm `post-install` hook consisting of a Job that executes a script adding the required annotations and labels to the viz namespace using a PATCH request against kube-api. The script first checks whether the namespace already has `annotations` and `labels` entries; if either is missing, it has to add extra ops to the patch to create it.

---------
Update: Merged-in the approved #6643, #6665 and #6669 which address the `linkerd2-cni`, `linkerd-multicluster` and `linkerd-jaeger` charts. 

Additional changes from what's already mentioned above:
- Removes the install-namespace option from `linkerd install-cni`, which isn't found in `linkerd install` nor `linkerd viz install` anyways, and it would add some complexity to support.
- Added a dependency on the `partials` chart to the `linkerd-multicluster-link` chart, so that we can tap on the `partials.namespace` helper.
- We no longer have the restriction that the multicluster objects must live in a separate namespace from linkerd. It's still good practice, and that's the default for the CLI install, but I removed that validation.


Finally, as a side-effect, the `linkerd mc allow` subcommand was fixed; it has been broken for a while apparently:

```console
$ linkerd mc allow --service-account-name foobar
Error: template: linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml:16:7: executing "linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml" at <include "partials.annotations.created-by" $>: error calling include: template: no template "partials.annotations.created-by" associated with template "gotpl"
```
---------
Update: see helm/helm#5465 describing the current best-practice

# Core Helm Charts Split

This removes the `linkerd2` chart, and replaces it with the `linkerd-crds` and `linkerd-control-plane` charts. Note that the viz and other extension charts are not concerned by this change.

Also note the original `values.yaml` file has been split into both charts accordingly.

### UX

```console
$ helm install linkerd-crds --namespace linkerd --create-namespace linkerd/linkerd-crds
...
# certs.yaml should contain identityTrustAnchorsPEM and the identity issuer values
$ helm install linkerd-control-plane --namespace linkerd -f certs.yaml linkerd/linkerd-control-plane
```

### Upgrade

As explained in #6635, this is a breaking change. Users will have to uninstall the `linkerd2` chart and install these two, and eventually roll out the proxies (they should continue to work during the transition anyway).

### CLI

The CLI install/upgrade code was updated to be able to pick the templates from these new charts, but the CLI UX remains identical as before.

### Other changes

- The `linkerd-crds` and `linkerd-control-plane` charts now carry a version scheme independent of linkerd's own versioning, as explained in #7405.
- These charts are Helm v3, which is reflected in the `Chart.yaml` entries and in the removal of the `requirements.yaml` files.
- In the integration tests, replaced the `helm-chart` arg with `helm-charts` containing the path `./charts`, used to build the paths for both charts.

### Followups

- Now it's possible to add a `ServiceProfile` instance for Destination in the `linkerd-control-plane` chart.
2021-12-10 15:53:08 -05:00
Kevin Leimkuhler e54061b61f
Remove old build constraints (#7392)
#7371 upgraded the Go version which included using the new formats for [build constraints](https://pkg.go.dev/cmd/go#hdr-Build_constraints). This removes the old ones that are no longer used.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2021-12-08 14:36:24 -07:00