Commit Graph

2526 Commits

Author SHA1 Message Date
Kevin Leimkuhler b830efdad7
Add OpaqueTransport field to destination protocol hints (#5421)
## What

When the destination service returns a destination profile for an endpoint,
indicate if the endpoint can receive opaque traffic.

## Why

Closes #5400

## How

When translating a pod address to a destination profile, the destination service
checks if the pod is controlled by any linkerd control plane. If it is, it can
set a protocol hint where we indicate that it supports H2 and opaque traffic.

If the pod supports opaque traffic, we need to get the port that it expects
inbound traffic on. We do this by getting the proxy container and reading its
`LINKERD2_PROXY_INBOUND_LISTEN_ADDR` environment variable. If we successfully
parse that into a port, we can set the opaque transport field in the destination
profile.
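
As a rough illustration of the port lookup described above, the sketch below is written in Python for brevity rather than the Go used by the destination service. The pod is modeled as a plain dict and `inbound_port` is a hypothetical helper; only the `linkerd-proxy` container name and the `LINKERD2_PROXY_INBOUND_LISTEN_ADDR` variable come from this change.

```python
from typing import Optional

PROXY_CONTAINER = "linkerd-proxy"
LISTEN_ADDR_ENV = "LINKERD2_PROXY_INBOUND_LISTEN_ADDR"


def inbound_port(pod: dict) -> Optional[int]:
    """Return the proxy's inbound port, or None if it cannot be determined."""
    for container in pod.get("containers", []):
        if container.get("name") != PROXY_CONTAINER:
            continue
        for env in container.get("env", []):
            if env.get("name") != LISTEN_ADDR_ENV:
                continue
            # The value is assumed to look like "host:port"; split on the last colon.
            _, sep, port = env.get("value", "").rpartition(":")
            if sep and port.isdigit() and 0 < int(port) <= 65535:
                return int(port)
            return None
    return None


# Hypothetical pod with an injected proxy container.
pod = {
    "containers": [
        {
            "name": "linkerd-proxy",
            "env": [{"name": LISTEN_ADDR_ENV, "value": "0.0.0.0:4143"}],
        }
    ]
}
print(inbound_port(pod))  # 4143
```

When the helper returns `None` (no proxy container, no variable, or an unparsable value), the opaque transport field would simply be left unset.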

## Testing

A test has been added to the destination server where a pod has a
`linkerd-proxy` container. We can expect the `OpaqueTransport` field to be set
in the returned destination profile's protocol hint.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-23 11:06:39 -05:00
Tarun Pothulapati 2087c95dd8
viz: move some components into linkerd-viz (#5340)
* viz: move some components into linkerd-viz

This branch moves the grafana, prometheus, web, and tap components
into a new viz chart, following the same extension model that
multi-cluster and jaeger follow.

The components in viz are not injected during install time, and
will go through the injector. `viz install` does not have any
CLI flags to customize the install directly; instead, it follows the Helm
way of customization by using flags such as
`set`, `set-string`, `values`, `set-files`.

**Changes Include**
- Move `grafana`, `prometheus`, `web`, `tap` templates into viz extension.
- Remove all add-on related charts, logic and tests w.r.t CLI & Helm.
- Clean up `linkerd2/values.go` & `linkerd2/values.yaml` to not contain
 fields related to viz components.
- Update `linkerd check` Healthchecks to not check for viz components.
- Create a new top level `viz` directory with CLI logic and Helm charts.
- Clean fields in the `viz/Values.yaml` to be in the `<component>.<property>`
model. Ex: `prometheus.resources`, `dashboard.image.tag`, etc so that it is
consistent everywhere.

**Testing**

```bash
# Install the Core Linkerd Installation
./bin/linkerd install | k apply -f -

# Wait for the proxy-injector to be ready
# Install the Viz Extension
./bin/linkerd cli viz install | k apply -f -

# Customized Install
./bin/linkerd cli viz install --set prometheus.enabled=false | k apply -f -
```
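
To make the Helm-style override above more concrete, here is a simplified sketch (in Python, not the Go code of the CLI) of how a `key.path=value` expression can be turned into nested values and merged over chart defaults. `parse_set` and `deep_merge` are hypothetical helpers, and the real Helm parser handles many more cases (lists, escaping, numbers) than this does.

```python
def parse_set(expr: str) -> dict:
    """Turn "a.b=c" into {"a": {"b": "c"}}, converting true/false to booleans."""
    path, sep, raw = expr.partition("=")
    if not sep or not path:
        raise ValueError(f"expected key=value, got {expr!r}")
    value = {"true": True, "false": False}.get(raw, raw)
    nested = value
    # Wrap the value from the innermost key outwards.
    for key in reversed(path.split(".")):
        nested = {key: nested}
    return nested


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override merged in; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical chart defaults following the <component>.<property> model.
defaults = {"prometheus": {"enabled": True, "resources": {}}}
values = deep_merge(defaults, parse_set("prometheus.enabled=false"))
print(values)  # {'prometheus': {'enabled': False, 'resources': {}}}
```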

What is not included in this PR:
- Move of Controller from core install into the viz extension.
- Simplification and refactoring of the core chart i.e removing `.global`, etc.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-12-23 20:17:31 +05:30
Josh Soref 84a9fc9b53
Fix description to match command (#5431)
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-12-22 15:18:51 -08:00
Kevin Leimkuhler 2c78cf9255
Remove count from opaque ports tcp metric (#5422)
We need to test for the presence of the TCP metric labels, not the exact count.
This change removes the count of `1` so that it can match any count.
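
A small sketch of the difference, assuming the test matches metric lines with a pattern. The metric name and labels below are made up for illustration and are not taken from the test itself.

```python
import re

# Hypothetical metric line in the Prometheus text exposition style.
line = 'tcp_open_total{direction="inbound",peer="src",tls="true"} 3'
labels = 'tcp_open_total{direction="inbound",peer="src",tls="true"}'

# Before: the pattern pinned the count to exactly 1.
exact = re.compile(re.escape(labels) + r" 1$")
# After: any non-negative integer count is accepted.
any_count = re.compile(re.escape(labels) + r" \d+$")

print(bool(exact.match(line)))      # False
print(bool(any_count.match(line)))  # True
```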

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-22 12:10:05 -05:00
Kevin Leimkuhler f6c8d27d83
Add multicluster check command (#5410)
## What

This change moves the `linkerd check --multicluster` functionality under its
own multicluster subcommand: `linkerd multicluster check`.

There should be no functional changes as a result of this change. `linkerd
check` no longer checks for anything multicluster related and the
`--multicluster` flag has been removed.

## Why

Closes #5208

The bulk of these changes are moving all the multicluster checks from
`pkg/healthcheck` into the multicluster package.

Doing this completely separates it from core Linkerd. It still uses
`pkg/healthcheck` when possible, but anything that is used only by `multicluster
check` has been moved.

**Note that the `kubernetes-api` and `linkerd-existence` checks are run.**

These checks are required for setting up the Linkerd health checker. They set
the health checker's `kubeAPI`, `linkerdConfig`, and `apiClient` fields.

These could be set manually so that the only check the user sees is
`linkerd-multicluster`, but I chose not to do this.

If any of the setting functions errors, it would just tell the user to run
`linkerd check` and ensure the installation is correct. I find the user error
handling to be better by including these required checks since they should be
run in the first place.

## How to test

Installing Linkerd and multicluster should result in a basic check output:

```
$ bin/linkerd install |kubectl apply -f -
..
$ bin/linkerd check
..
$ bin/linkerd multicluster install |kubectl apply -f -
..
$ bin/linkerd multicluster check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-multicluster
--------------------
√ Link CRD exists


Status check results are √
```

After linking a cluster:

```
$ bin/linkerd multicluster check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
        * k3d-y
√ remote cluster access credentials are valid
        * k3d-y
√ clusters share trust anchors
        * k3d-y
√ service mirror controller has required permissions
        * k3d-y
√ service mirror controllers are running
        * k3d-y
× all gateway mirrors are healthy
        probe-gateway-k3d-y.linkerd-multicluster mirrored from cluster [k3d-y] has no endpoints
    see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints

Status check results are ×
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-21 15:50:17 -05:00
Alejandro Pedraza 557f3a9f60
Remove tracing from linkerd's helm chart requirements.lock (#5411)
This prevents `bin/helm-build` from returning a lint error
2020-12-21 10:40:25 -05:00
Eliza Weisman da3195f2ef
add release notes for edge-20.12.4 (#5409)
This edge release adds support for the `config.linkerd.io/opaque-ports`
annotation on pods and namespaces, to configure ports that should skip
the proxy's protocol detection. In addition, it adds new CLI commands
related to the `linkerd-jaeger` extension, fixes bugs in the CLI
`install` and `upgrade` commands and Helm charts, and fixes a potential
false positive in the proxy's HTTP protocol detection. Finally, it
includes improvements in proxy performance and memory usage, including
an upgrade for the proxy's dependency on the Tokio async runtime.

* Added support for the `config.linkerd.io/opaque-ports` annotation on
  pods and namespaces, to indicate to the proxy that some ports should
  skip protocol detection
* Fixed an issue where `linkerd install --ha` failed to honor flags
* Fixed an issue where `linkerd upgrade --ha` can override existing
  configs
* Added missing label to the `linkerd-config-overrides` secret to avoid 
  breaking upgrades performed with the help of `kubectl apply --prune`
* Added a missing icon to Jaeger Helm chart
* Added new `linkerd jaeger check` CLI command to validate that the
  `linkerd-jaeger` extension is working correctly
* Added new `linkerd jaeger uninstall` CLI command to print the
  `linkerd-jaeger` extension's resources so that they can be piped into
  `kubectl delete`
* Fixed an issue where the `linkerd-cni` daemonset may not be installed
  on all intended nodes, due to missing tolerations to the `linkerd-cni`
  Helm chart (thanks @rish-onesignal!)
* Fixed an issue where the `tap` APIServer would not refresh its certs
  automatically when provided externally—like through cert-manager
* Changed the proxy's cache eviction strategy to reduce memory
  consumption, especially for busy HTTP/1.1 clients
* Fixed an issue in the proxy's HTTP protocol detection which could
  cause false positives for non-HTTP traffic
* Increased the proxy's default dispatch timeout to 5 seconds to
  accommodate connection pools which might open connections without
  immediately making a request
* Updated the proxy's Tokio dependency to v0.3
2020-12-18 13:04:01 -08:00
Alejandro Pedraza 7471752fc4
Add missing label to linkerd-config-overrides secret (#5407)
Fixes #5385 (second bug in there)

Added missing label `linkerd.io/control-plane-ns=linkerd` that all the
control plane resources must have, that is passed to `kubectl apply
--prune`
2020-12-18 13:47:19 -05:00
Kevin Leimkuhler 7c0843a823
Add opaque ports to destination service updates (#5294)
## Summary

This changes the destination service to start indicating whether a profile is an
opaque protocol or not.

Currently, profiles returned by the destination service are built by chaining
together updates coming from watching Profile and Traffic Split updates.

With this change, we now also watch updates to Opaque Port annotations on pods
and namespaces; if an update occurs this is now included in building a profile
update and is sent to the client.

## Details

Watching updates to Profiles and Traffic Splits is straightforward--we watch
those resources and if an update occurs on one associated to a service we care
about then the update is passed through.

For Opaque Ports this is a little different because it is an annotation on pods
or namespaces. To account for this, we watch the endpoints that we should care
about.

### When host is a Pod IP

When getting the profile for a Pod IP, we check for the opaque ports annotation
on the pod and the pod's namespace. If one is found, we'll indicate if the
profile is an opaque protocol if the requested port is in the annotation.

We do not subscribe for updates to this pod IP. The only update we really care
about is if the pod is deleted and this is already handled by the proxy.

### When host is a Service

When getting the profile for a Service, we subscribe for updates to the
endpoints of that service. For any ports set in the opaque ports annotation on
any of the pods, we check if the requested port is present.

Since the endpoints for a service can be added and removed, we do subscribe for
updates to the endpoints of the service.
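
The two lookups above can be sketched as follows. This is Python pseudocode-made-runnable, not the Go implementation. It assumes the `config.linkerd.io/opaque-ports` annotation holds a comma-separated list of ports and that a pod-level annotation takes precedence over the namespace-level one; both are assumptions, and the helper names are hypothetical.

```python
from typing import Iterable, Set, Tuple

OPAQUE_PORTS_ANNOTATION = "config.linkerd.io/opaque-ports"


def parse_ports(value: str) -> Set[int]:
    """Parse a comma-separated port list such as "3306,5432"."""
    parts = (part.strip() for part in value.split(","))
    return {int(p) for p in parts if p.isdigit()}


def opaque_ports(pod_annotations: dict, ns_annotations: dict) -> Set[int]:
    # Assumption: the pod annotation wins over the namespace annotation.
    for annotations in (pod_annotations, ns_annotations):
        if OPAQUE_PORTS_ANNOTATION in annotations:
            return parse_ports(annotations[OPAQUE_PORTS_ANNOTATION])
    return set()


def pod_is_opaque(port: int, pod_annotations: dict, ns_annotations: dict) -> bool:
    """Host is a pod IP: check the requested port once, no subscription."""
    return port in opaque_ports(pod_annotations, ns_annotations)


def service_is_opaque(port: int, endpoints: Iterable[Tuple[dict, dict]]) -> bool:
    """Host is a service: re-evaluated whenever the endpoint set changes."""
    return any(pod_is_opaque(port, pod, ns) for pod, ns in endpoints)


ns = {OPAQUE_PORTS_ANNOTATION: "3306"}
print(pod_is_opaque(3306, {}, ns))  # True
print(pod_is_opaque(3306, {OPAQUE_PORTS_ANNOTATION: "5432"}, ns))  # False
print(service_is_opaque(5432, [({}, ns), ({OPAQUE_PORTS_ANNOTATION: "5432"}, ns)]))  # True
```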

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-18 12:38:59 -05:00
Alejandro Pedraza d661054795
Fix CLI install/upgrade overriding settings in HA (#5399)
Fixes #5385

## The problems

- `linkerd install --ha` isn't honoring flags
- `linkerd upgrade --ha` is overriding existing configs silently or failing with an error
- *Upgrading HA instances from before 2.9 to version 2.9.1 results in configs being overridden silently, or the upgrade fails with an error*

## The cause

The change in #5358 attempted to fix `linkerd install --ha`, which was only applying some of the `values-ha.yaml` defaults, by calling `charts.NewValues(true)` and merging that with the values built from `values.yaml` overridden by the flags. It turns out the `charts.NewValues()` implementation was itself merging against `values.yaml`, and as a result any flag was getting overridden by its default.

This also happened when doing `linkerd upgrade --ha` on an existing instance, which could result in silently overriding settings, or it could fail loudly, for example when upgrading a setup that has an external issuer (in this case the issuer cert can't be read during the upgrade and an error occurs, as described in #5385).

Finally, doing `linkerd upgrade` (no `--ha` flag) on an HA install from before 2.9 results in configs getting overridden as well (silently or with an error). In order to generate the `linkerd-config-overrides` secret, the original install flags are retrieved from `linkerd-config` via the `loadStoredValuesLegacy()` function, which effectively ends up performing a `linkerd upgrade` with all the flags used for `linkerd install` and falls into the same trap as above.

## The fix

In `values.go` the faulty merging logic is not used anymore, so now `NewValues()` only returns the default values from `values.yaml` and doesn't require an argument anymore. It calls `readDefaults()` which now only returns the appropriate values depending on whether we're on HA or not.
There's a new function `MergeHAValues()` that merges `values-ha.yaml` into the current values (it doesn't look into `values.yaml` anymore), which is only used when processing the `--ha` flag in `options.go`.
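
The merge-order problem and its fix can be illustrated with plain dicts. This is a Python sketch rather than the Go code in `values.go`: the value keys are made up, and applying the flags after the HA values is an assumption about the intended order.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override merged in; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical stand-ins for values.yaml and values-ha.yaml.
DEFAULTS = {"controllerLogLevel": "info", "controllerReplicas": 1}
HA_OVERRIDES = {"controllerReplicas": 3}


def buggy_install(flags: dict) -> dict:
    values = deep_merge(DEFAULTS, flags)
    # The "HA values" were already merged against the defaults, so merging
    # them last lets every default clobber the matching flag.
    ha_values = deep_merge(DEFAULTS, HA_OVERRIDES)
    return deep_merge(values, ha_values)


def fixed_install(flags: dict, ha: bool) -> dict:
    values = dict(DEFAULTS)
    if ha:
        # Only the HA deltas are merged into the current values.
        values = deep_merge(values, HA_OVERRIDES)
    # Assumed order: user flags are applied last, so they win.
    return deep_merge(values, flags)


flags = {"controllerLogLevel": "debug"}
print(buggy_install(flags)["controllerLogLevel"])           # info
print(fixed_install(flags, ha=True)["controllerLogLevel"])  # debug
print(fixed_install(flags, ha=True)["controllerReplicas"])  # 3
```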

## How to test

To replicate the issue try setting a custom setting and check it's not applied:
```bash
linkerd install --ha --controller-log-level debug | grep log.level
- -log-level=info
```

## Followup

This wasn't caught because we don't have HA integration tests. Now that our test infra is based on k3d, it should be easy to make such a test using a cluster with multiple nodes. Either that or issuing `linkerd install --ha` with additional configs and compare against a golden file.
2020-12-18 12:11:52 -05:00
Alejandro Pedraza 0666824d4e
Add missing icon entry to jaeger chart (#5397)
* Add missing icon entry to jaeger chart

This is required for `helm lint` to pass. Its absence was what caused
the last CI edge release to fail and so we had to manually upload the
charts.
2020-12-17 13:32:44 -05:00
rish-onesignal 146cd1301d
Add missing tolerations in linkerd-cni helm chart (#5368) (#5369)
The linkerd-cni helm chart is missing tolerations on the daemonset. This
prevents the linkerd-cni daemonset from being installed on all intended
nodes.

We use the same template partial as used in the main linkerd helm chart
to add tolerations if specified to the linkerd-cni daemonset spec.

Fixes #5368

Signed-off-by: Rishabh Jain <rishabh@onesignal.com>
2020-12-17 10:12:31 -05:00
Alejandro Pedraza 578d4a19e9
Have the tap APIServer refresh its cert automatically (#5388)
Followup to #5282, fixes #5272 in its totality.

This follows the same pattern as the injector/sp-validator webhooks, leveraging `FsCredsWatcher` to watch for changes in the cert files.

To reuse code from the webhooks, we moved `updateCert()` to `creds_watcher.go`, and `run()` as well (which now is called `ProcessEvents()`).

The `TestNewAPIServer` test in `apiserver_test.go` was removed as it really was just testing two things: (1) that `apiServerAuth` doesn't error which is already covered in the following test, and (2) that the golib call `net.Listen("tcp", addr)` doesn't error, which we're not interested in testing here.

## How to test

To test that the injector/sp-validator functionality is still correct, you can refer to #5282

The steps below are similar, but focused towards the tap component:

```bash
# Create some root cert
$ step certificate create linkerd-tap.linkerd.svc ca.crt ca.key   --profile root-ca --no-password --insecure

# configure tap's caBundle to be that root cert
$ cat > linkerd-overrides.yml << EOF
tap:
  externalSecret: true
  caBundle: |
    < ca.crt contents>
EOF

# Install linkerd
$ bin/linkerd install --config linkerd-overrides.yml | k apply -f -

# Generate an intermediate cert with short lifespan
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc

# Create the secret using that intermediate cert
$ kubectl create secret tls \
  linkerd-tap-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# Rollout the tap pod for it to pick the new secret
$ k -n linkerd rollout restart deploy/linkerd-tap

# Tap should work
$ bin/linkerd tap -n linkerd deploy/linkerd-web
req id=0:0 proxy=in  src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :method=GET :authority=10.42.0.11:9994 :path=/metrics
rsp id=0:0 proxy=in  src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :status=200 latency=1779µs
end id=0:0 proxy=in  src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true duration=65µs response-length=1709B

# Wait 5 minutes and rollout tap again
$ k -n linkerd rollout restart deploy/linkerd-tap

# You'll see in the logs that the cert expired:
$ k -n linkerd logs -f deploy/linkerd-tap tap
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45866: remote error: tls: bad certificate
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45870: remote error: tls: bad certificate

# Recreate the secret
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc
$ k -n linkerd delete secret linkerd-tap-k8s-tls
$ kubectl create secret tls \
  linkerd-tap-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# Wait a few moments and you'll see the certs got reloaded and tap is working again
time="2020-12-15T16:03:42Z" level=info msg="Updated certificate" addr=":8089" component=apiserver
```
2020-12-16 17:46:14 -05:00
Tarun Pothulapati 589f36c4c2
jaeger: add check sub command (#5295)
* jaeger: add check sub command

This adds a new `linkerd jaeger check` command that runs checks for the
jaeger extension, similar to the `linkerd check` command.
As jaeger is a separate package, it was a bit complex to make this work,
because not all types and fields from the healthcheck pkg are public; helper
funcs were used to work around this.

This has the following changes:

- Adds a new `check.go` file under the jaeger extension pkg
- Moves some commonly needed funcs and types from `cli/cmd/check.go`
  and `pkg/healthcheck/health.go` into
  `pkg/healthcheck/healthcheck_output.go`.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-12-17 00:26:34 +05:30
Eirik 168f36605d
Added Altinn (#5390)
2020-12-15 14:50:44 -08:00
Oliver Gould d67e13cf17
proxy: v2.125.0 (#5392)
This release features a change to the proxy's cache eviction strategy to
ensure that clients (and their load balancers) are reused by new
outbound connections. This can dramatically reduce memory consumption,
especially for busy HTTP/1.1 clients.

Also, the proxy's HTTP detection scheme has been made more robust.
Previously, the proxy would perform only a single read to determine
whether a TCP stream was HTTP, which could lead to false positives. Now,
the proxy reads until at least the first newline, which is what the HTTP
parser actually needs to make a proper determination. With this, the
default dispatch timeouts have been increased to 5s to accommodate
connection pools that may not issue an immediate request.
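
The idea of reading until the first newline, instead of deciding after a single read, can be sketched as below. This is a Python illustration only; the proxy is written in Rust and its actual detection logic is not shown in this log. The method list, the `max_bytes` cap, and the function name are assumptions, and the sketch only considers HTTP/1.x request lines.

```python
import io

HTTP_METHODS = {
    b"GET", b"HEAD", b"POST", b"PUT", b"DELETE",
    b"CONNECT", b"OPTIONS", b"TRACE", b"PATCH",
}


def looks_like_http1(stream, chunk_size: int = 16, max_bytes: int = 8192) -> bool:
    """Buffer reads until the first newline, then inspect the request line."""
    buf = b""
    while b"\n" not in buf and len(buf) < max_bytes:
        chunk = stream.read(chunk_size)
        if not chunk:  # peer closed before sending a full line
            break
        buf += chunk
    if b"\n" not in buf:
        return False
    line = buf.split(b"\n", 1)[0].rstrip(b"\r")
    parts = line.split(b" ")
    return (
        len(parts) == 3
        and parts[0] in HTTP_METHODS
        and parts[2].startswith(b"HTTP/1.")
    )


# A 4-byte first read ("GET ") is not enough to decide; the loop keeps reading.
request = io.BytesIO(b"GET /metrics HTTP/1.1\r\nHost: example\r\n\r\n")
print(looks_like_http1(request, chunk_size=4))  # True

# Non-HTTP traffic that merely starts with "GET" is rejected once the line is complete.
print(looks_like_http1(io.BytesIO(b"GET is not http\n")))  # False
```

Because a client may open a connection without sending anything right away, a detector like this has to wait for data, which is why the dispatch timeout matters.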

Furthermore, this release includes an upgrade to Tokio v0.3 and its
associated ecosystem.

---

* update buffers to use Tokio 0.3 MPSC channels (linkerd/linkerd2-proxy#759)
* Update the proxy to use Tokio 0.3  (linkerd/linkerd2-proxy#732)
* Rename DetectHttp to NewServeHttp (linkerd/linkerd2-proxy#760)
* http: more consistent names for body types (linkerd/linkerd2-proxy#761)
* io: simplify the `Io` trait (linkerd/linkerd2-proxy#762)
* trace: nicer traces in tests, clean up trace configuration (linkerd/linkerd2-proxy#766)
* Ensure that services are held as long they are being used (linkerd/linkerd2-proxy#767)
* outbound: add stack tests for http (linkerd/linkerd2-proxy#765)
* cache: Ensure that actively held services are not evicted (linkerd/linkerd2-proxy#768)
* cache: Only spawn a single task per cache entry (linkerd/linkerd2-proxy#770)
* test: make integration tests shut up (linkerd/linkerd2-proxy#771)
* metrics: Add support for microsecond counters (linkerd/linkerd2-proxy#772)
* Add a protocol label to stack metrics (linkerd/linkerd2-proxy#773)
* detect: Make protocol detection more robust (linkerd/linkerd2-proxy#744)
2020-12-15 14:35:52 -08:00
Alex Leong 74950e9407
Add jaeger uninstall command (#5353)
Add a `linkerd jaeger uninstall` command which prints the linkerd-jaeger extension resources so that they can be deleted.  This is similar to the `linkerd uninstall` command.

```
> bin/linkerd jaeger uninstall | k delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-jaeger-linkerd-jaeger-proxy-mutator" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-jaeger-linkerd-jaeger-proxy-mutator" deleted
mutatingwebhookconfiguration.admissionregistration.k8s.io "linkerd-proxy-mutator-webhook-config" deleted
namespace "linkerd-jaeger" deleted
```

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-12-14 15:48:44 -08:00
Kevin Leimkuhler ce9d1335d1
Add changes for edge-20.12.3 (#5383)
## edge-20.12.3

This edge release is functionally the same as `edge-20.12.2`. It fixes an issue
that prevented the release build from occurring.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-14 13:29:40 -05:00
Kevin Leimkuhler dd837be375
Build jaeger-webhook in release CI (#5381)
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-14 11:42:51 -05:00
dependabot[bot] 5ceaf29bac
build(deps): bump ini from 1.3.5 to 1.3.7 in /web/app (#5370)
Bumps [ini](https://github.com/isaacs/ini) from 1.3.5 to 1.3.7.
- [Release notes](https://github.com/isaacs/ini/releases)
- [Commits](https://github.com/isaacs/ini/compare/v1.3.5...v1.3.7)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2020-12-14 09:39:30 -05:00
Kevin Leimkuhler b72b2d14b9
Add changes for edge-20.12.2 (#5367)
## edge-20.12.2

* Fixed an issue where the `proxy-injector` and `sp-validator` did not refresh
  their certs automatically when provided externally—like through cert-manager
* Added support for overrides flags to the `jaeger install` command to allow
  setting Helm values when installing the Linkerd-jaeger extension
* Added missing Helm values to the multicluster chart (thanks @DaspawnW!)
* Moved tracing functionality to the `linkerd-jaeger` extension
* Fixed various issues in developer shell scripts (thanks @joakimr-axis!)
* Fixed an issue where `install --ha` was only partially applying the high
  availability config
* Updated RBAC API versions in the CNI chart (thanks @glitchcrab!)
* Fixed an issue where TLS credentials are changed during upgrades, but the
  Linkerd webhooks would not restart, leaving them to use older credentials and
  fail requests
* Stopped publishing the multicluster link chart as its primary use case is in
  the `multicluster link` command and not being installed through Helm
* Added service mirror error logs for when the multicluster gateway's hostname
  cannot be resolved.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-11 11:06:51 -05:00
Alejandro Pedraza 131e270d5a
Don't swallow error when MC gateway hostname can't be resolved (#5362)
* Don't swallow error when MC gateway hostname can't be resolved

Ref #5343

When none of the gateway addresses is resolvable, propagate the error as
a retryable error so it gets retried and logged. Don't create the
mirrored resources if there's no success after the retries.
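
A minimal sketch of the behavior described above, in Python rather than the Go service mirror code. `RetryableError` and `resolve_gateway` are hypothetical names, and the resolver is injectable so the example does not need network access.

```python
import socket
from typing import Callable, Iterable, List


class RetryableError(Exception):
    """Signals that the caller should requeue the work and try again."""


def resolve_gateway(
    addresses: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Resolve gateway addresses; fail with a retryable error if none resolves."""
    resolved, errors = [], []
    for addr in addresses:
        try:
            resolved.append(resolve(addr))
        except OSError as err:  # socket.gaierror is a subclass of OSError
            errors.append(f"{addr}: {err}")
    if not resolved:
        # Surface the failure instead of swallowing it, so it is logged and retried.
        raise RetryableError("no gateway address could be resolved: " + "; ".join(errors))
    return resolved


def fake_resolver(addr: str) -> str:
    # Stand-in for DNS: only one hypothetical hostname resolves.
    if addr == "gateway.example":
        return "10.0.0.1"
    raise OSError("name not known")


print(resolve_gateway(["missing.example", "gateway.example"], resolve=fake_resolver))  # ['10.0.0.1']
try:
    resolve_gateway(["missing.example"], resolve=fake_resolver)
except RetryableError as err:
    print("retry:", err)
```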
2020-12-11 09:58:44 -05:00
Alejandro Pedraza 02b456087d
Stop publishing the linkerd2-multicluster-link chart (#5365)
Closes #5348

That chart generates the service mirror resources and related RBAC, but
doesn't generate the credentials secret nor the Link CR which require
go-client logic not available from sheer Helm templates.

This PR stops publishing that chart, and adds a comment to its README
about it.
2020-12-11 08:55:50 -05:00
Alejandro Pedraza 7ddef6dbeb
Clarification for collectorSvcAccount and collectorSvcAddr in jaeger's values.yaml (#5366)
Moved the `collectorSvcAccount` and `collectorSvcAddr` values in
`values.yaml` under the `webhook` section, given it's the injector that
will make use of that, and to not confuse with the SA and address for
the collector that is provided by default (the injector could point to a
different collector than that one).
2020-12-11 08:55:20 -05:00
Tarun Pothulapati c19cfd71a1
upgrades: make webhooks restart if TLS creds are updated (#5349)
* upgrades: make webhooks restart if TLS creds are updated

Fixes #5231

Currently, we do not re-use the TLS certs during upgrades, which
means that the secrets are updated while the webhooks are still
paired with the older ones, causing the webhook requests to fail.

This can be solved by making webhooks be restarted whenever there
is a change in the certs. This can be performed by storing the hash
of the `*-rbac` file, which contains the secrets, thus making the
pod templates change whenever there is an update to the certs thus
making restarts required.
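
The checksum approach can be sketched like this. It is a Python illustration, not the Helm template used in the chart; the annotation key and the rendered content are made up.

```python
import hashlib


def checksum(rendered: str) -> str:
    """Hash of the rendered file that contains the secrets."""
    return hashlib.sha256(rendered.encode("utf-8")).hexdigest()


def pod_template_annotations(rendered_rbac: str) -> dict:
    # Hypothetical annotation key; any change in the hash changes the pod
    # template, which makes Kubernetes roll out new pods.
    return {"checksum/config": checksum(rendered_rbac)}


old = pod_template_annotations("tls.crt: OLD\n")
new = pod_template_annotations("tls.crt: NEW\n")
print(old != new)  # True
print(old == pod_template_annotations("tls.crt: OLD\n"))  # True
```

If the certs are unchanged between upgrades, the hash and therefore the pod template stay the same, so no restart is triggered.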

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-12-10 11:56:53 -05:00
Simon Weald cae4add8d0
Update RBAC API versions to avoid deprecations (#5332)
When testing the `linkerd2-cni` chart with `ct`, it flags up usage
of some deprecated apiVersions.

This PR aligns the RBAC API group across all resources in the chart.

---

Signed-off-by: Simon Weald <glitchcrab-github@simonweald.com>
2020-12-09 15:56:25 -05:00
Alejandro Pedraza 35612ae268
`linkerd install --ha` was only partially applying HA config (#5358)
* `linkerd install --ha` was only partially applying HA config

Fixes #5342

`values-ha.yml` contains the specific config for HA, but only the proxy
resources and controller replicas settings were applied. This PR adds
`EnablePodAntiAffinity`, `WebhookFailurePolicy` and all the resource settings
for the other CP pods.

Also the `--controller-replicas` flag is moved after the HA flags so it
can override the HA settings.

Finally, some comments no longer relevant were removed.

## How to test

Perform `linkerd install --ha` and make sure the values in
`values-ha.yml` are propagated correctly in the produced yaml.

## 2.9.1

After merging to `main`, this should be cherry-picked into the
`release/stable-2.9` branch.

Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-09 15:23:37 -05:00
Joakim Roubert 377b38f0bf
bin/protoc-diff: Don't assume Debian and don't install unzip (#5347)
- Do unzip check but don't install; leave installation to user
- Move unzip check to bin/protoc that actually uses unzip
- Make sure the protoc scripts can be called from any directory

Fixes #5337

Signed-off-by: Joakim Roubert <joakim.roubert@axis.com>
2020-12-09 09:12:38 -05:00
Alex Leong cdc57d1af0
Use linkerd-jaeger extension for control plane tracing (#5299)
Now that tracing has been split out of the main control plane and into the linkerd-jaeger extension, we remove references to tracing from the main control plane including:

* removing the tracing components from the main control plane chart
* removing the tracing injection logic from the main proxy injector and inject CLI (these will be added back into the new injector in the linkerd-jaeger extension)
* removing tracing related checks (these will be added back into `linkerd jaeger check`)
* removing related tests

We also update the `--control-plane-tracing` flag to configure the control plane components to send traces to the linkerd-jaeger extension.  To make sure this works even when the linkerd-jaeger extension is installed in a non-default namespace, we also add a `--control-plane-tracing-namespace` flag which can be used to change the namespace that the control plane components send traces to.

Note that for now, only the control plane components send traces; the proxies in the control plane do not.  This is because the linkerd-jaeger injector is not yet available.  However, this change adds the appropriate namespace annotations to the control plane namespace to configure the proxies to send traces to the linkerd-jaeger extension once the linkerd-jaeger injector is available.

I tested this by doing the following:

1. bin/linkerd install | kubectl apply -f -
1. bin/helm install jaeger jaeger/charts/jaeger
1. bin/linkerd upgrade --control-plane-tracing=true | kubectl apply -f -
1. kubectl -n linkerd-jaeger port-forward svc/jaeger 16686
1. open http://localhost:16686
1. see traces from the linkerd control plane

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-12-08 14:34:26 -08:00
Kevin Leimkuhler 15dc97c70e
add some missing helm values for multicluster setup (#5346)
Original description:

> **Subject**
> Add missing helm values for multicluster setup
> 
> **Problem**
> When executing this without the linkerd command the two variables are missing and the rendering will generate empty values.
> This produces the following gateway identity, that is also used in the gateway link command to generate the link crd:
> 
> ```
> mirror.linkerd.io/gateway-identity: linkerd-gateway.linkerd-multicluster.serviceaccount.identity..
> ```
> 
> **Solution**
> Add the values as defaults to the helm chart values.yaml file. If the cli is used they are overwritten by the following parameters:
> * https://github.com/linkerd/linkerd2/blob/main/cli/cmd/multicluster.go#L197
> * https://github.com/linkerd/linkerd2/blob/main/cli/cmd/multicluster.go#L196

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Co-authored-by: Björn Wenzel <bjoern.wenzel@dbschenker.com>
2020-12-08 10:27:16 -05:00
Alejandro Pedraza 48666a7673
bin/shellcheck-all was missing some files (#5335)
* bin/shellcheck-all was missing some files

`bin/shellcheck-all` identifies what files to check by filtering by the
`text/x-shellscript` mime-type, which only applies to files with a
shebang pointing to bash. We had a number of files with a
`#!/usr/bin/env sh` shebang that (at least in Ubuntu given `sh` points
to `dash`) only exposes a `text/plain` mime-type, thus they were not
being checked.

This fixes that issue by replacing the filter in `bin/shellcheck-all`, using a simple grep over the file shebang instead of using the `file` command.
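
For illustration, a shebang-based filter could look like the sketch below. It is written in Python rather than as the grep used in `bin/shellcheck-all`, and the exact pattern is an assumption.

```python
import re

# Assumed pattern: a shebang naming sh or bash, either directly or via env.
SHEBANG = re.compile(r"^#!\s*(/usr/bin/env\s+|\S*/)(ba)?sh\b")


def is_shell_script(first_line: str) -> bool:
    """Decide from the first line alone, independent of the reported mime-type."""
    return SHEBANG.match(first_line) is not None


for line in ("#!/usr/bin/env sh", "#!/bin/bash", "#!/usr/bin/env python3"):
    print(line, is_shell_script(line))
# #!/usr/bin/env sh True
# #!/bin/bash True
# #!/usr/bin/env python3 False
```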
2020-12-08 09:30:52 -05:00
Chris Downs d2dc87a0bc
Adding Mythical Games to adopters (#5345)
Signed-off-by: Chris Downs <downs@mythical.games>
2020-12-07 14:31:19 -08:00
Kevin Leimkuhler a456d03621
Change script to work with k3d and install only CLI (#5333)
This changes the install-pr script to work with k3d.

Additionally, it now only installs the CLI; it no longer installs Linkerd on the
cluster. This was removed because most of the time when installing a Linkerd
version from a PR, some extra installation configuration is required and I was
always commenting out that final part of the script.

`--context` was changed to `--cluster` since we no longer need a context value,
only the name of the cluster into which we are loading the images.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-07 11:35:01 -05:00
Tarun Pothulapati 72a0ca974d
extension: Separate multicluster chart and binary (#5293)
Fixes #5257

This branch moves mc charts and cli level code to a new
top level directory. None of the logic is changed.

Also, moves some common types into `/pkg` so that they
are accessible both to the main cli and extensions.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-12-04 16:36:10 -08:00
Tarun Pothulapati 47a49e5ac5
jaeger: Add support for override flags (#5304)
This change adds the `set`, `set-string`, `values`, `set-files`,
etc. flags, which are used to override the default values. This is
similar to how Helm works.

This also updates the install workflow to directly use Helm v3
pkg for chart loading and generation, without having to use
our chart type, etc.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-12-04 16:35:39 -08:00
Alejandro Pedraza 4c634a3816
Have webhooks refresh their certs automatically (#5282)
* Have webhooks refresh their certs automatically

Fixes partially #5272

In 2.9 we introduced the ability to provide the certs for `proxy-injector` and `sp-validator` through external means such as cert-manager, via the new Helm setting `externalSecret`.
However, we forgot to have those services watch for changes in their secrets, so whenever the secrets were rotated the services would fail with a cert error, and the only workaround was to restart those pods to pick up the new secrets.

This addresses that by first abstracting out `FsCredsWatcher` from the identity controller, which now lives under `pkg/tls`.

The webhook's logic in `launcher.go` no longer reads the certs before starting the HTTPS server. That now happens in `server.go` which, like the identity controller, receives events from `FsCredsWatcher` and updates `Server.cert`. We're leveraging `http.Server.TLSConfig.GetCertificate`, which allows us to provide a function that returns the current cert for every incoming request.

### How to test

```bash
# Create some root cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca.crt ca.key \
  --profile root-ca --no-password --insecure --san linkerd-proxy-injector.linkerd.svc

# configure injector's caBundle to be that root cert
$ cat > linkerd-overrides.yaml << EOF
proxyInjector:
  externalSecret: true
  caBundle: |
    < ca.crt contents>
EOF

# Install linkerd. The injector won't start until we create the secret below
$ bin/linkerd install --controller-log-level debug --config linkerd-overrides.yaml | k apply -f -

# Generate an intermediate cert with a short lifespan
$ step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc

# Create the secret using that intermediate cert
$ kubectl create secret tls \
  linkerd-proxy-injector-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# start following the injector log
$ k -n linkerd logs -f -l linkerd.io/control-plane-component=proxy-injector -c proxy-injector

# Inject emojivoto. The pods should be injected normally
$ bin/linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -

# Wait about 5 minutes and delete a pod
$ k -n emojivoto delete po -l app=emoji-svc

# You'll see it won't be injected, and something like "remote error: tls: bad certificate" will appear in the injector logs.

# Regenerate the intermediate cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc

# Delete the secret and recreate it
$ k -n linkerd delete secret linkerd-proxy-injector-k8s-tls
$ kubectl create secret tls \
  linkerd-proxy-injector-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# Wait a couple of minutes and you'll see some filesystem events in the injector log along with a "Certificate has been updated" entry
# Then delete the pod again and you'll see it gets injected this time
$ k -n emojivoto delete po -l app=emoji-svc

```
2020-12-04 16:25:59 -05:00
Alex Leong 8ad546b302
edge-20.12.1 (#5324)
This edge release continues the work of decoupling non-core Linkerd components
by moving more tracing related functionality into the Linkerd-jaeger extension.

* Continued work on moving tracing functionality from the main control plane
  into the `linkerd-jaeger` extension
* Fixed a potential panic in the proxy when looking up a socket's peer address
  while under high load
* Added automatic readme generation for charts (thanks @GMarkfjard!)
* Fixed zsh completion for the CLI (thanks @jiraguha!)
* Added support for multicluster gateways of types other than LoadBalancer
  (thanks @DaspawnW!)

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-12-03 14:35:57 -08:00
Kevin Leimkuhler 5c39c9b44d
Fix namespace flag in install help text (#5322)
The shorthand flag for `--linkerd-namespace` is `-L` not `-l`.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-03 14:33:15 -05:00
Oliver Gould 13c3aa9062
proxy: v2.124.0 (#5323)
This release updates the proxy's `*ring*` dependency to pick up the
latest changes from BoringSSL.

Additionally, we've audited uses of non-cryptographic random number
generators in the proxy to ensure that each balancer/router initializes
its own RNG state.

---

* Audit uses of SmallRng (linkerd/linkerd2-proxy#757)
* Update *ring* to 0.6.19 (linkerd/linkerd2-proxy#758)
* metrics: Support the Summary metric type (linkerd/linkerd2-proxy#756)
2020-12-03 11:28:23 -08:00
Björn Wenzel 0ee18eb168
Allow Multicluster Service to be non LoadBalancer ServiceType (#5307)
Signed-off-by: Björn Wenzel <bjoern.wenzel@dbschenker.com>
2020-12-03 13:03:49 -05:00
Alex Leong 86d6b46e04
Add linkerd.io/extension label (#5311)
The namespace that Linkerd extensions are installed into is configurable.  This can make it difficult to know which extensions are installed and where they are located.  We add a `linkerd.io/extension` namespace label to easily enumerate and locate Linkerd extensions.  This can be used, for example, to enable certain features only when certain extensions are installed.  All new Linkerd extensions should include this namespace label.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-12-02 13:17:06 -08:00
jiraguha ed1dff5366
Fix CLI zsh completion (#5285)
Signed-off-by: jpiraguha <jiraguha@gmail.com>
2020-12-02 15:37:57 -05:00
Alejandro Pedraza 94574d4003
Add automatic readme generation for charts (#5316)
* Add automatic readme generation for charts

The current readme for each chart is generated
manually and doesn't contain all the information available.

Utilize helm-docs to automatically fill out the README.md files
for the Helm charts by pulling metadata from `values.yaml`.

Fixes #4156

Co-authored-by: GMarkfjard <gabma047@student.liu.se>
2020-12-02 14:37:45 -05:00
Tarun Pothulapati f5f5da0e7e
extension: add jaeger dashboard sub-command (#5291)
This branch adds the `jaeger dashboard` sub-command, which is used
to view the jaeger dashboard. It follows the same logic and pattern
as `linkerd-dashboard`, and provides the same flags.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-12-01 10:56:18 -08:00
Oliver Gould 83241fef20
proxy: v2.123.0 (#5301)
This release removes a potential panic: it was assumed that looking up a
socket's peer address was infallible, but in practice this call can
fail when a host is under high load. Now these failures only impact the
connection-level task and not the whole proxy process.

Also, the `process_cpu_seconds_total` metric is now exposed as a float
so that its value may include fractional seconds with 10ms granularity.
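
As a rough illustration of why the float matters: the sketch below assumes CPU time is read in clock ticks at 100 ticks per second (a common value on Linux, and the source of the 10ms granularity). The `cpuSeconds` helper is hypothetical and is not the proxy's code.

```go
package main

import "fmt"

// cpuSeconds converts CPU time measured in clock ticks into seconds,
// keeping the fractional part instead of truncating to whole seconds.
func cpuSeconds(ticks, ticksPerSecond uint64) float64 {
	return float64(ticks) / float64(ticksPerSecond)
}

func main() {
	// ASSUMPTION: 100 ticks per second, i.e. one tick every 10ms.
	const ticksPerSecond = 100
	ticks := uint64(1234)

	fmt.Println(ticks / ticksPerSecond)            // integer seconds: prints 12
	fmt.Println(cpuSeconds(ticks, ticksPerSecond)) // float seconds: prints 12.34
}
```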

---

* io: Make peer_addr fallible (linkerd/linkerd2-proxy#755)
* metrics: Expose process_cpu_seconds_total as a float (linkerd/linkerd2-proxy#754)
2020-11-30 17:14:03 -08:00
Alejandro Pedraza 6fb35b0af7
Jaeger injector mutating webhook (#5276)
* Jaeger injector mutating webhook

Closes #5231. This is based off of the `alex/sep-tracing` branch.

This webhook injects the `LINKERD2_PROXY_TRACE_COLLECTOR_SVC_ADDR`,
`LINKERD2_PROXY_TRACE_COLLECTOR_SVC_NAME` and
`LINKERD2_PROXY_TRACE_ATTRIBUTES_PATH` environment vars into the proxy
spec when a pod is created, as well as the podinfo volume and its mount.
If any of these are found to be present already in the pod spec, it
exits without applying a patch.
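
The "already present" check described above could look roughly like this. It is a simplified sketch: `envVar` and `container` are stand-ins for the Kubernetes API types, `alreadyInjected` is a hypothetical name, and the matching check for the podinfo volume and its mount is omitted.

```go
package main

import "fmt"

// envVar and container are simplified stand-ins for the Kubernetes types.
type envVar struct{ Name, Value string }

type container struct {
	Name string
	Env  []envVar
}

// traceEnvNames lists the variables the webhook injects, as named above.
var traceEnvNames = []string{
	"LINKERD2_PROXY_TRACE_COLLECTOR_SVC_ADDR",
	"LINKERD2_PROXY_TRACE_COLLECTOR_SVC_NAME",
	"LINKERD2_PROXY_TRACE_ATTRIBUTES_PATH",
}

// alreadyInjected reports whether any of the tracing variables is already
// set on the proxy container, in which case no patch should be produced.
func alreadyInjected(proxy container) bool {
	for _, e := range proxy.Env {
		for _, name := range traceEnvNames {
			if e.Name == name {
				return true
			}
		}
	}
	return false
}

func main() {
	proxy := container{Name: "linkerd-proxy"}
	fmt.Println(alreadyInjected(proxy)) // nothing set yet, so a patch would be built

	// The collector address below is a made-up value for illustration.
	proxy.Env = append(proxy.Env, envVar{
		Name:  "LINKERD2_PROXY_TRACE_COLLECTOR_SVC_ADDR",
		Value: "collector.example:55678",
	})
	fmt.Println(alreadyInjected(proxy)) // now present, so the webhook would exit early
}
```

Checking before patching keeps the webhook idempotent, which matters when it is reinvoked after another mutating webhook has run.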

The `values.yaml` file has been expanded to include config for this
webhook. In particular, one can define a `namespaceSelector` and/or an
`objectSelector` to filter which pods this webhook will act on.

The config entries in `values.yaml` for `collectorSvcAddr` and
`collectorSvcAccount` can be overridden with the
`config.linkerd.io/trace-collector` and
`config.alpha.linkerd.io/trace-collector-service-account` annotations at
the namespace or pod spec level.
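
One way to picture the override lookup is the sketch below. Note the assumption: it lets a pod annotation win over a namespace annotation, which in turn wins over the chart default. That precedence is not stated above, and the `resolve` helper and the sample values are hypothetical.

```go
package main

import "fmt"

const (
	collectorAnnotation  = "config.linkerd.io/trace-collector"
	svcAccountAnnotation = "config.alpha.linkerd.io/trace-collector-service-account"
)

// resolve returns the first non-empty value among the pod annotation, the
// namespace annotation, and the chart default.
// ASSUMPTION: pod-level annotations take precedence over namespace-level ones.
func resolve(key string, podAnn, nsAnn map[string]string, def string) string {
	if v := podAnn[key]; v != "" {
		return v
	}
	if v := nsAnn[key]; v != "" {
		return v
	}
	return def
}

func main() {
	// Sample values are made up for illustration.
	nsAnn := map[string]string{collectorAnnotation: "ns-collector.tracing:55678"}
	podAnn := map[string]string{}

	// Overridden at the namespace level.
	fmt.Println(resolve(collectorAnnotation, podAnn, nsAnn, "default-collector:55678"))
	// No annotation anywhere, so the chart default applies.
	fmt.Println(resolve(svcAccountAnnotation, podAnn, nsAnn, "default-sa"))
}
```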

## How to test:
```bash
docker build . -t ghcr.io/linkerd/jaeger-webhook:0.0.1 -f jaeger/proxy-mutator/Dockerfile
k3d image import ghcr.io/linkerd/jaeger-webhook:0.0.1
bin/helm-build
linkerd install
helm install jaeger jaeger/charts/jaeger
linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -
kubectl -n emojivoto get po -l app=emoji-svc -oyaml | grep -A1 TRACE
```

## Reinvocation policy
The webhookconfig resource is configured with `reinvocationPolicy:
IfNeeded` so that if the tracing injector gets triggered before the
proxy injector, it will get triggered a second time after the proxy
injector runs so it can act on the injected proxy. By default this won't
be necessary because the webhooks run in alphabetical order (this is not
documented in k8s docs though) so
`linkerd-proxy-injector-webhook-config` will run before
`linkerd-proxy-mutator-webhook-config`. In order to test the
reinvocation mechanism, you can change the name of the former so it gets
called first.

I versioned the webhook image as `0.0.1`, but we can decide to align
that with linkerd's main version tag.
2020-11-27 12:25:28 -05:00
Alejandro Pedraza 62e208b99f
Release notes for edge-20.11.5 (#5284)
This edge release improves the proxy's support for high-traffic workloads. It also
contains the first steps towards decoupling non-core Linkerd components, the
first iteration being a new `linkerd jaeger` sub-command for installing tracing.
Please note this is still a work in progress.

* Addressed some issues reported around clients seeing max-concurrency errors by
  increasing the default in-flight request limit to 100K pending requests
* Have the proxy appropriately set `content-type` when synthesizing gRPC error
  responses
* Bumped the `proxy-init` image to `v1.3.8` which is based off of
  `buster-20201117-slim` to reduce potential security vulnerabilities
* No longer panic in rare cases when `linkerd-config` doesn't have an entry for
  `Global` configs (thanks @hodbn!)
* Work in progress: the `/jaeger` directory now contains the charts and commands
  for installing the tracing component.
2020-11-27 07:35:00 -08:00
Alejandro Pedraza 9cbfb08a38
Bump proxy-init to v1.3.8 (#5283)
2020-11-27 09:07:34 -05:00
Tarun Pothulapati e7f4c31257
extension: Add new jaeger binary (#5278)
* extension: Add new jaeger binary

This branch adds a new jaeger binary project in the jaeger directory.
It follows the same logic as `linkerd install`. However, because the
`linkerd install` VFS logic expects charts to be present in the `/charts`
directory, this command gets its own static pkg to generate its own
VFS for its chart.

This covers only the install part of the command.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-11-25 20:10:35 +05:30
Alex Leong 0f20b0572e
tracing: new jaeger independent helm chart (#5275)
Fixes #5230

This PR moves tracing into a jaeger chart with no proxy injection
templates. We still keep the dependency on partials, as we could use
common templates like resources, etc from there.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-11-24 09:45:16 -08:00