Commit Graph

2627 Commits

Author SHA1 Message Date
Alejandro Pedraza a04b30d2ab
Simplify SelfCheck API (#5665)
Fixes #5575

Now that only viz makes use of the `SelfCheck` api, merged the `healthcheck.proto` into `viz.proto`.

Also removed the "checkRPC" functionality that was used for handling multiple API responses and was only used by `SelfCheck`, because the extra complexity was not granted. Revert to use the plain vanilla "check" by just concatenating error responses.

## Success Output

```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
√ viz extension self-check
```

## Failure Examples

Failure when viz fails to connect to the k8s api:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
    Error calling the Kubernetes API: someerror
    see https://linkerd.io/checks/#l5d-api-control-api for hints

Status check results are ×
```

Failure when viz fails to connect to Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
    Error calling Prometheus from the control plane: someerror
    see https://linkerd.io/checks/#l5d-api-control-api for hints

Status check results are ×
```

Failure when viz fails to connect to both the k8s api and Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
    Error calling the Kubernetes API: someerror
    Error calling Prometheus from the control plane: someerror
    see https://linkerd.io/checks/#l5d-api-control-api for hints

Status check results are ×
```
2021-02-05 10:13:45 -05:00
Tarun Pothulapati 704ed00a49
viz: make checks aware of prom and grafana being optional (#5627)
* viz: make checks aware of prom and grafana being optional

Fixes #5618

Currently, The linkerd-viz checks fail whenever external
Prometheus is being used as those checks are not aware of
Prometheus and grafana being optional.

This commit fixes this by making the Prometheus and Grafana
as separate checks which are not fatal and these checks
can also be made dynamic and be ran only if those
components are available.

This commit also adds some of the missing resources checks,
especially that of the new `metrics-api` into viz checks

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-02-05 11:26:44 +05:30
Alex Leong c6536996f7
Add linkerd repair command (#5636)
In the stable-2.9.0, stable-2.9.1, and stable-2.9.2 releases, the `linkerd-config-overrides` secret is missing the `linkerd.io/control-plane-ns` label.  This means that if a `linkerd upgrade` is performed to one of these versions using the `--prune` flag, then the secret will be deleted.  Missing this secret will prevent any further upgrades.

We add a `linkerd repair` command which recreates the `linkerd-config-overrides` secret by fetching the installed values from the `linkerd-config` configmap and then re-populating the redacted identity values from the `linkerd-identity-issuer` secret.

Usage:

```bash
linkerd repair | kubectl apply -f -
```

To test:
```
# Set Linkerd version to stable-2.8.0
> linkerd install | kubectl apply -f -
# Set Linkerd version to stable-2.9.1
> linkerd upgrade | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
# Set Linkerd version to stable-2.9.2
> linkerd upgrade | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
(Command fails)
# Set Linkerd version to HEAD
> linkerd repair | kubectl apply -f -
# Set Linkerd version to stable-2.9.2
> linkerd upgrade | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
(Command succeeds)
> linkerd check
```

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-02-04 16:47:04 -08:00
Alejandro Pedraza 59eca5bb82
Use full-length actions SHAs in CI (#5668)
Github now requires that actions pined to a SHA use the full-length SHA,
otherwise CI throws a warning like:
```
actions/checkout@722adc6 looks like the shortened version of a commit
SHA. Referencing actions by the short SHA will be disabled soon. Please
see
https://docs.github.com/en/actions/learn-github-actions/security-hardening-for-github-actions#using-third-party-actions.
```
2021-02-04 16:51:53 -05:00
Alejandro Pedraza 565f32d429
## edge-21.2.1 (#5667)
This edge release continues improving the proxy's diagnostics and also avoids
timing out when the HTTP protocol detection fails. Additionally, old resource
versions were upgraded to avoid warnings in k8s v1.19. Finally, it comes with
lots of CLI improvements detailed below.

* Improved the proxy's diagnostic metrics to help us get better insights into
  services that are in fail-fast
* Improved the proxy's HTTP protocol detection to prevent timeout errors
* Upgraded CRD and webhook config resources to get rid of warnings in k8s v1.19
  (thanks @mateiidavid!)
* Added viz components into the Linkerd Health Grafana charts
* Had the tap injector add a `viz.linkerd.io/tap-enabled` annotation when
  injecting a pod, which allowed providing clearer feedback for the `linkerd
  tap` command
* Had the jaeger injector add a `jaeger.linkerd.io/tracing-enabled` annotation
  when injecting a pod, which also allowed providing better feedback for the
  `linkerd jaeger check` command
* Improved the `linkerd uninstall` command so it fails gracefully when there
  still are injected resources in the cluster (a `--force` flag was provided
  too)
* Moved the `linkerd profile --tap` functionality into a new command `linkerd
  viz profile --tap`, given tap now belongs to the viz extension
* Expanded the `linkerd viz check` command to include data-plane checks
* Cleaned-up YAML in templates that was incompatible with SOPS (thanks
  @tkms0106!)
2021-02-04 15:10:26 -05:00
Alejandro Pedraza 7cd5bf9e10
bin/test-cleanup reordering (#5666)
After #5642 `linkerd uninstall` will fail if there still are injected
pods in the cluster. So uninstall should happen last.
2021-02-04 14:01:01 -05:00
Alejandro Pedraza b8ed799372
Include viz components in Prom scrapes, fix Linkerd Health charts (#5656)
* Include viz components in Prom scrapes, fix Linkerd Health charts

Fixes #5429

Expanded the `linkerd-controller` Prometheus scraping config so it also includes the `linkerd-viz` namespace. Also simplified the first relabelling config there removing the `_meta_kubernetes_pod_label_linkerd_io_control_plane_component` source label that wasn't serving any purpose. Just by its own, that extra scraping now allows having non-empty Go charts at the bottom of the `Linkerd Health` charts for the viz components.

Additionally, the `namespace-viz` variable was added into `health.json` which then is leveraged in the queries for the `Control-Plane Traffic` and `Control-Plane TCP Metrics` charts to include the viz pods.

Finally in that same file the queries for the `Data-Plane Telemetry` section were simplified by removing the filter on the `control_plane_ns` label which was redundant.
2021-02-04 09:40:23 -05:00
William Morgan 511ba69f5c
add initial steering committee members (#5640)
* add initial steering committee members

Signed-off-by: William Morgan <william@buoyant.io>

Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2021-02-04 08:25:09 -06:00
Kevin Leimkuhler 228d8e9e95
Add tracing enabled annotation (#5643)
This change adds the `jaeger.linkerd.io/tracing-enabled` annotation which is
automatically added by the Jaeger extension's `jaeger-injector`.

All pods that receive this annotation have also had the required environment
variables and volume/volume mounts add by the injector.

The purpose of this annotation is that it will allow `jaeger check` to check for
the presence of this annotation instead of needing to look at the proxy
containers directly. If this annotation is not present on pods, `jaeger check`
can warn users that tracing is not configured for those pods. This is similar to
`viz check` warning users that tap is not configured—recenlty added in #5602.

Closes #5632

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-03 14:05:15 -05:00
Tarun Pothulapati b521091ca7
cli: make `linkerd uninstall` fail when injected pods are present (#5642)
* cli: make `linkerd uninstall` fail when injected pods are present

Fixes #5622

This PR updates the `linkerd uninstall` cmd to check if there
are any injected pods and fails if there are any. This also
provides `--force` flag to skip this check.

pods from namespaces with prefix `linkerd` are skipped
so as to not error out for control-plane and extension
pods.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-02-03 12:31:30 +05:30
Oliver Gould 10411d02bb
proxy: v2.131.0 (#5657)
This release changes HTTP protocol detection to prevent timeout errors
in two ways:

1. HTTP detection no longer blocks until a newline is read. We've
   reverted to relying on a single read to make a determination.
2. Detection timeouts are no longer terminal. When a timeout is
   encountered, we continue forwarding the connection as an opaque TCP
   connection.

These changes may lead to false-negatives--we may fail to detect some
HTTP streams--but it should prevent many avoidable detection errors.

This release also makes improvements for multicluster gateways,
improving caching so that profile lookups are only performed
once per target service.

Diagnostic `stack_*` metrics have been moved so that they track
underlying services, ignoring fail-fast. This should help us get better
insights into services that are in failfast.

Finally, the opencensus exporter has been improved to ensure that trace
events are flushed if the trace buffer is not filled within a timeout.

---

* actions: Update actions to use full SHAs (linkerd/linkerd2-proxy#885)
* http: Parameterize normalize_ur::DefaultAuthority (linkerd/linkerd2-proxy#886)
* http: Parameterize the HTTP server (linkerd/linkerd2-proxy#887)
* opencensus: rewrite span exporter using async/await (linkerd/linkerd2-proxy#789)
* Update http::Insert to use `Param` (linkerd/linkerd2-proxy#889)
* Update crate dependencies (linkerd/linkerd2-proxy#892)
* stack: Make the router fallible (linkerd/linkerd2-proxy#888)
* Track stack metrics within failfast (linkerd/linkerd2-proxy#891)
* outbound: Avoid building balancers when no concrete name (linkerd/linkerd2-proxy#890)
* inbound: Cache HTTP gateways per destination (linkerd/linkerd2-proxy#893)
* Reorganize the gateway crate (linkerd/linkerd2-proxy#897)
* Bias HTTP detection towards availability (linkerd/linkerd2-proxy#894)
* inbound: Use ALPN to determine transport header (linkerd/linkerd2-proxy#895)
* detect: Return unknown protocol on detection timeout (linkerd/linkerd2-proxy#896)
* Extract protocol detection into the gateway crate (linkerd/linkerd2-proxy#898)
2021-02-02 18:08:22 -08:00
Kevin Leimkuhler df0ce24b12
viz: add viz profile command (#5621)
## What this changes

This adds a `viz profile` command that outputs a service profile based off tap
data. It is identical—but fixes—the current `profile --tap` command.

Additionally, it removes the `--tap` flag from the `profile` command since this
depends on the Viz extension being installed in order to tap a service.

## Why

The `profile --tap` command is currently broken since it depends on the Viz
extension being installed, but the `profile` command is part of the core
install.

Closes #5613

Unblocks #5545

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-01 19:02:46 -05:00
Simon Weald 590d152566
add Giant Swarm to adopters (#5645)
This PR adds Giant Swarm as an adopter 🥳

---

Signed-off-by: Simon Weald <glitchcrab-github@simonweald.com>
2021-02-01 16:33:45 -05:00
Tarun Pothulapati cd2e911be3
viz: add data-plane and prometheus healthchecks (#5602)
* viz: add data-plane and prometheus healthchecks

Fixes #5325

This branch adds the remaining healthchecks for the viz extension
i.e

- Data-plane metrics check in Prometheus
- `--proxy` mode which also checks for tap injections based
  on annotations.

For this, The following changes were needed
- Category.ID is made public so that --proxy toggleness can be
allowed
- Made tap env key as a field so that it can be re-used for
checks

simplify viz.NewHealthChecker by removing the need to
 pass categoryIDs and instead using
hc.appendCategories directly at the caller to add the
required categories. This is possible by dividing the vizCategories
into separate functions

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-02-01 23:01:13 +05:30
Takumi Sue 77add64860
Remove extra three dashes from helm templates (#5628)
(Background information)
In our company we are checking the sops-encrypted Linkerd manifest into GitHub repository,
and I came across the following problem.

---

Three dashes mean the start of the YAML document (or the end of the
directive).
https://yaml.org/spec/1.2/spec.html#id2800132

If there are only comments between `---`, the document is empty.
Assume the file which include an empty document at the top of itself.

```yaml
---
# foo
---
apiVersion: v1
kind: Namespace
metadata:
  name: foo
---
# bar
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
```

When we encrypt and decrypt it with [sops](https://github.com/mozilla/sops), the empty document will be
converted to `{}`.

```yaml
{}
---
apiVersion: v1
kind: Namespace
metadata:
    name: foo
---
apiVersion: v1
kind: Namespace
metadata:
    name: bar
```

It is invalid as k8s manifest ([apiVersion not set, kind not set]).

```
error validating data: [apiVersion not set, kind not set]
```

---

I'm afraid that it's sops's problem (at least partly), but anyhow this modification is enough harmless I think.
Thank you.

Signed-off-by: Takumi Sue <u630868b@alumni.osaka-u.ac.jp>
2021-02-01 10:51:34 -05:00
Matei David 0ce9e84a94
Introduce V1 to CRDs and Mutating Hooks (#5603)
*Closes #5484*
 ### Changes
---
*Overview*:
 * Update golden files and make necessary spec changes
 * Update test files for viz
 * Add v1 to healthcheck and uninstall
 * Fix link-crd clusterDomain field validation

- To update to v1, I had to change crd schemas to be version-based (i.e each version has to declare its own schema). I noticed an error in the link-crd (`targetClusterDomain` was `targetDomainName`). Also, additionalPrinterColumns are also version-dependent as a field now.

- For `admissionregistration` resources I had to add an additional `admissionReviewVersions` field -- I included `v1` and `v1beta1`.

- In `healthcheck.go` and `resources.go` (used by `uninstall`) I had to make some changes to the client-go versions (i.e from `v1beta1` to `v1` for admissionreg and apiextension) so that we don't see any warning messages when uninstalling or when we do any install checks. 

I tested again different cli and k8s versions to have a bit more confidence in the changes (in addition to automated tests), hope the cases below will be enough, if not let me know and I can test further.

### Tests

Linkerd local build CLI + k8s 1.19+
`install/check/mc-check/mc-install/mc-link/viz-install/viz-check/uninstall/`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2+k3s1", GitCommit:"1d4adb0301b9a63ceec8cabb11b309e061f43d5f", GitTreeState:"clean", BuildDate:"2021-01-14T23:52:37Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

$ bin/linkerd version
Client version: git-b0fd2ec8
Server version: unavailable

$ bin/linkerd install | kubectl apply -f -
- no errors, no version warnings - 

$ bin/linkerd check --expected-version git-b0fd2ec8
Status check results are :tick:

# MC

$ bin/linkerd mc install | k apply -f - 
- no erros, no version warnings - 

$ bin/linkerd mc check
Status check results are :tick:

$ bin/linkerd mc link foo | k apply -f -   # test crd creation
# had a validation error here because the schema had targetDomainName instead of targetClusterDomain
# changed, rebuilt cli, re-installed mc, tried command again
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created
...

# VIZ
$ bin/linkerd viz install | k apply -f - 
- no errors, no version warnings - 

$ bin/linkerd viz check 
- no errors, no version warnings - 
Status check results are :tick:

$ bin/linkerd uninstall | k delete -f -
- no errors, no version warnings - 
```

Linkerd local build CLI + k8s 1.17
`check-pre/install/mc-check/mc-install/mc-link/viz-install/viz-check`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.17-rc1+k3s1", GitCommit:"e8c9484078bc59f2cd04f4018b095407758073f5", GitTreeState:"clean", BuildDate:"2021-01-14T06:20:56Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

$ bin/linkerd version
Client version: git-3d2d4df1 # made changes to link-crd after prev test case
Server version: unavailable

$ bin/linkerd check --pre --expected-version git-3d2d4df1
- no errors, no version warnings -
Status check results are :tick:

$ bin/linkerd install | k apply -f -
- no errors, no version warnings -

$ bin/linkerd check --expected-version git-3d2d4df1
- no errors, no version warnings - 
Status check results are :tick:

$ bin/linkerd mc install | k apply -f -
- no errors, no version warnings - 

$ bin/linkerd mc check 
- no errors, no version warnings - 
Status check results are :tick:

$ bin/linkerd mc link --cluster-name foo | k apply -f -
bin/linkerd mc link --cluster-name foo | k apply -f -
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created

# VIZ

$ bin/linkerd viz install | k apply -f - 
- no errors, no version warnings - 

$ bin/linkerd viz check
- no errors, no version warnings -
- hangs up indefinitely after linkerd-viz can talk to Kubernetes
```

Linkerd edge (21.1.3) CLI + k8s 1.17 (already installed)
`check`
```
$ linkerd version
Client version: edge-21.1.3
Server version: git-3d2d4df1

$ linkerd check
- no errors -
- warnings: mismatch between cli & control plane, control plane not up to date (both expected) -
Status check results are :tick:
```

Linkerd stable (2.9.2) CLI + k8s 1.17 (already installed)
`check/uninstall`
```
$ linkerd version
Client version: stable-2.9.2
Server version: git-3d2d4df1

$ linkerd check
× control plane ClusterRoles exist
    missing ClusterRoles: linkerd-linkerd-tap
    see https://linkerd.io/checks/#l5d-existence-cr for hints

Status check results are ×

# viz wasn't installed, hence the error, installing viz didn't help since
# the res is named `viz-tap` now
# moving to uninstall

$ linkerd uninstall | k delete -f -
- no warnings, no errors - 
```

_Note_: I used `go test ./cli/cmd/... --generate` which is why there are so many changes 😨 

Signed-off-by: Matei David <matei.david.35@gmail.com>
2021-02-01 09:18:13 -05:00
Kevin Leimkuhler 964a190069
viz: only tap pods that have tap explicitly enabled (#5608)
## What this changes

This allows the tap controller to inform `tap` users when pods either have tap
disabled or tap is not enabled yet.

## Why

When a user taps a resource that has not been admitted by the Viz extension's
`tap-injector`, tap is not explicitly disabled but it is also not enabled.
Therefore, the `tap` command hangs and provides no feedback to the user.

Closes #5544

## How

A new `viz.linkerd.io/tap-enabled` annotation is introduced which is
automatically added by the Viz extension's `tap-injector`. This annotation is
added to a pod when it is able to be tapped; this means that the pod and the
pod's namespace do not have the `config.linkerd.io/disable-tap` annotation
added.

When a user attempts to tap a resource, the tap controller now looks for this
new annotation; if the annotation is present on the pod then that pod is
tappable.

If the annotation is not present or tap is explicitly disabled, an error is
returned.

## UI changes

Multiple errors can now occur when trying to tap a resource:

1. There are no pods for the resource.
2. There are pods for the resource, but tap is disabled via pod or namespace
   annotation.
3. There are pods for the resource, but tap is not yet enabled because the
   `tap-injector` did not admit the resource.

Errors are now handled as shown below:

Tap is disabled:

```
❯ bin/linkerd viz tap deploy/test
Error: no pods to tap for deployment/test
pods found with tap disabled via the config.linkerd.io/disable-tap annotation
```

Tap is not enabled:

```
❯ bin/linkerd viz tap deploy/test
Error: no pods to tap for deployment/test
pods found with tap not enabled; try restarting resource so that it can be injected
```

There are a mix of pods with tap disabled or tap not enabled:

```
❯ bin/linkerd viz tap deploy/test
Error: no pods to tap for deployment/test
pods found with tap disabled via the config.linkerd.io/disable-tap annotation
pods found with tap not enabled; try restarting resource so that it can be injected
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-28 17:37:45 -05:00
Alex Leong 07a2f07471
edge-21.1.4 (#5633)
This edge release continues to polish the Linkerd extension model and improves
the robustness of the opaque transport.

* Improved the consistency of behavior of the `check` commands between
  Linkerd extensions
* Fixed an issue where Linkerd extension commands could be run before the
  extension was fully installed
* Renamed some extension Helm charts for consistency:
  * jaeger -> linkerd-jaeger
  * linkerd2-multicluster -> linkerd-multicluster
  * linkerd2-multicluster-link -> linkerd-multicluster-link
* Fixed an issue that could cause the inbound proxy to fail meshed HTTP/1
  requests from older proxies (from the stable-2.8.x vintage)
* Changed opaque-port transport to be advertised via ALPN so that new proxies
  will not initiate opaque-transport connections to proxies from prior
  edge releases
* Added inbound proxy transport metrics with `tls="passhtru"` when forwarding
  non-mesh TLS connections
* Thanks to @hs0210 for adding new unit tests!

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-01-28 11:28:07 -08:00
Oliver Gould fdd36357d5
proxy: v2.130.1 (#5625)
First and foremost, this release fixes an issue that could cause the
inbound proxy to fail meshed HTTP/1 requests from older proxies (from
the stable-2.8.x vintage).

Additionally, this release changes how opaque-port transport works, in
preparation for TCP multicluster functionality: now servers must
advertise support for the transport header via ALPN. Clients will only
send a transport header when the server advertises support for ALPN.
This means that new proxies will not initiate opaque-transport
connections to proxies from prior edge releases.

Finally, inbound proxies may now report transport metrics with
`tls="passhtru"` when forwarding non-mesh TLS connections.

---

* metrics: add `target_addr` labels to HTTP metrics (linkerd/linkerd2-proxy#866)
* inbound: Handle direct connections with a dedicated stack (linkerd/linkerd2-proxy#863)
* inbound: Avoid HTTP detection when a transport header is present (linkerd/linkerd2-proxy#867)
* Update tokio to v1.1.0 (linkerd/linkerd2-proxy#870)
* admin: stackify admin server (linkerd/linkerd2-proxy#868)
* tls: Report SNI values for non-Linkerd TLS (linkerd/linkerd2-proxy#869)
* admin: Record transport & HTTP metrics (linkerd/linkerd2-proxy#871)
* test: Disable tracing-subscriber by default (linkerd/linkerd2-proxy#873)
* inbound: Split stack into modules (linkerd/linkerd2-proxy#872)
* Improve diagnostics around the SwitchReady module (linkerd/linkerd2-proxy#875)
* Use TLS ALPN to negotiate transport header support (linkerd/linkerd2-proxy#874)
* stack: Introduce the Param trait (linkerd/linkerd2-proxy#876)
* transport-header: Encode session protocol (linkerd/linkerd2-proxy#877)
* transport: Add a ConnectAddr parameter type (linkerd/linkerd2-proxy#879)
* profiles: Use a LogicalAddr param type (linkerd/linkerd2-proxy#878)
* stack: Replace switch with Filter and NewEither (linkerd/linkerd2-proxy#880)
* inbound: normalize URIs after downgrading to HTTP/1 (linkerd/linkerd2-proxy#881)
2021-01-28 08:01:51 -08:00
Hu Shuai 5e3d5190c3
Add unit test for pkg/healthcheck/sidecar.go (#5609)
Signed-off-by: Hu Shuai <hus.fnst@cn.fujitsu.com>
2021-01-27 16:56:14 -05:00
William Morgan c79d36d63a
add STEERING.md (#5607)
* add STEERING.md

Signed-off-by: William Morgan <william@buoyant.io>
2021-01-27 09:39:32 -06:00
Alex Leong dd8e5fc5bc
Rename extension charts to linkerd-* (#5552)
For consistency we rename the extension charts to a common naming scheme:

linkerd-viz -> linkerd-viz (unchanged)
jaeger -> linkerd-jaeger
linkerd2-multicluster -> linkerd-multicluster
linkerd2-multicluster-link -> linkerd-multicluster-link

We also make the chart files and chart readmes a bit more uniform.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-01-26 16:20:49 -08:00
Tarun Pothulapati 9756b3f8f1
extensions: make subcmds check/wait for respective extensions (#5566)
* extensions: make subcmds check/wait for respective extensions

This commit updates the extension subcmds to check and wait
for the respective extensions to be up before running them.

The same healthcheck pkg and respective extension checks
 are used to at the check/wait logic.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-01-26 23:01:25 +05:30
Dennis Adjei-Baah ae2b3499b0
Tweak error message in web script (#5596)
The `bin/web run` script sets up a local environment for linkerd
dashboard development. This script port-forwards an existing linkerd
controller and a grafana instance in a local kubernetes cluster. When
running the command with just the linkerd control plane  and no
linkerd viz extension the error message is shown below.
```
'Controller is not running. Have you installed Linkerd?'
```

This error message is a little misleading because the controller is
installed when running this after `linkerd install`. The issue here is
that the script checks for a Grafana instance but the error message it
displays when it can't find a Grafana pod is that the controller isn't
install. The error message should instead notify the developer that
Linkerd Viz is not installed.

This change modifies the error message so it is more clear.

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2021-01-26 11:21:44 -06:00
Andrew Seigner 700b4c5cb5
Move @siggy to emeritus maintainer (#5597)
Emeritus (adj): having retired but allowed to retain their title as an
honor

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2021-01-23 07:26:47 -08:00
Alex Leong 964ce11559
Update generated serviceprofile code (#5580)
I ran `bin/update-codegen.sh` to update the generated code to include the opaque ports in the generated deepcopy function for service profiles.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-01-22 14:34:49 -08:00
Tarun Pothulapati 4f0601e632
jaeger: cli and check logic cleanup (#5564)
This branch cleans up some of the unnecessary logic that is not
needed and thus making the check logic similar to that of other
extensions, namely viz.

Includes the following cleanups:

- Remove `namespace` flag in jaeger CLI and make the fetching logic
dynamic and use it in check and dashboard.
- Use `hc.KubeAPIClient` instead of creating our own in jaeger check.
- Move injection checks up before we run the readiness checks

This change adds a new extension namespace exist check for
jaeger.

Also, Updates integration tests to run the check commands.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-01-22 23:31:35 +05:30
cpretzer fcb71de428
Changes for edge-21.1.3 (#5590)
* Changes for edge-21.1.3
Signed-off-by: Charles Pretzer <charles@buoyant.io>
2021-01-21 17:06:10 -08:00
Alejandro Pedraza 8ac5360041
Extract from public-api all the Prometheus dependencies, and moves things into a new viz component 'linkerd-metrics-api' (#5560)
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it remains being used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.

* Added chart templates for new viz linkerd-metrics-api pod

* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.

* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).

* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompaigning http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.

* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.

* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.

* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
  - Updated `endpoints.go` according to new API interface name.
  - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
  - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
  - Added `metrics-api` to list of docker images to build in actions workflows.
  - In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).

* Add retry to 'tap API service is running' check

* mc check shouldn't err when viz is not available. Also properly set the log in multicluster/cmd/root.go so that it properly displays messages when --verbose is used
2021-01-21 18:26:38 -05:00
Tarun Pothulapati 288fbefe02
viz: cleanup helm values.yaml (#5546)
* viz: cleanup helm values.yaml

This branch fixes some nits around naming of default variables
i.e replace the usage of global with default.

Renames globalLogLevel to defaultLogLevel and globalUID to
defaultUID along with some chart README updates.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-01-22 00:48:16 +05:30
Tarun Pothulapati a95efe2db1
tests: update newly added golden tests (#5588)
#5507 added new golden tests but missed some updates from other PRs
that got merged meanwhile.

This branch updates those golden tests with those changes

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-01-21 23:47:36 +05:30
Tarun Pothulapati d0d2e0ea7a
cli: add helm customization flags to core install (#5507)
* cli: add helm customization flags to core install

Fixes #5506

This branch adds helm way of customization through
 `set`, `set-string`, `values`, `set-files` flags for
`linkerd install` cmd along with unit tests.

For this to work, the helm v3 engine rendering helpers
had to be used instead of our own wrapper type.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-01-21 22:49:50 +05:30
Andrew Seigner 9c80d4d2a1
Fix `linkerd metrics` resource selector (#5567)
The `linkerd metrics` command was selecting pods based on owner resource
names. If multiple owners existed with the same name (for example
`sts/web`, `deploy/web`), additional pods would be incorrectly included
in the output.

Fix the pod selector code to validate pods have owner references to the
given workload/owner.

Before:
```
$ linkerd metrics -n emojivoto deploy/web|grep POD
  # POD web-0 (1 of 3)
  # POD web-d9ffd684f-gnbcx (2 of 3)
  # POD web-fs6l7 (3 of 3)
```

After:
```
$ bin/go-run cli metrics -n emojivoto deploy/web|grep POD
  # POD web-d9ffd684f-gnbcx (1 of 1)
```

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2021-01-21 11:57:27 -05:00
Matei David c63fbdf0e4
Introduce OpenAPIV3 validation for CRDs (#5573)
* Introduce OpenAPIV3 validation for CRDs

* Add validation to link crd
* Add validation to sp using kube-gen
* Add openapiv3 under schema fields in specific versions
* Modify fields to rid spec of yaml errors
* Add top level validation for all three CRDs

Signed-off-by: Matei David <matei.david.35@gmail.com>
2021-01-21 11:56:28 -05:00
Naseem 2cc96d4ab9
fix alertmanagers casing (one word) (#5377)
fixes #5371

Signed-off-by: naseemkullah <naseem@transit.app>
2021-01-21 11:55:24 -05:00
Kevin Leimkuhler e7f2a3fba3
viz: add tap-injector (#5540)
## What this changes

This adds a tap-injector component to the `linkerd-viz` extension which is
responsible for adding the tap service name environment variable to the Linkerd
proxy container.

If a pod does not have a Linkerd proxy, no action is taken. If tap is disabled
via annotation on the pod or the namespace, no action is taken.

This also removes the environment variable for explicitly disabling tap through
an environment variable. Tap status for a proxy is now determined only be the
presence or absence of the tap service name environment variable.

Closes #5326

## How it changes

### tap-injector

The tap-injector component determines if `LINKERD2_PROXY_TAP_SVC_NAME` should be
added to a pod's Linkerd proxy container environment. If the pod satisfies the
following, it is added:

- The pod has a Linkerd proxy container
- The pod has not already been mutated
- Tap is not disabled via annotation on the pod or the pod's namespace

### LINKERD2_PROXY_TAP_DISABLED

Now that tap is an extension of Linkerd and not a core component, it no longer
made sense to explicitly enable or disable tap through this Linkerd proxy
environment variable. The status of tap is now determined only be if the
tap-injector adds or does not add the `LINKERD2_PROXY_TAP_SVC_NAME` environment
variable.

### controller image

The tap-injector has been added to the controller image's several startup
commands which determines what it will do in the cluster.

As a follow-up, I think splitting out the `tap` and `tap-injector` commands from
the controller image into a linkerd-viz image (or something like that) makes
sense.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-21 11:24:08 -05:00
Oliver Gould 6f954c3823
proxy: v2.129.0 (#5581)
This release improves diagnostics about the proxy's failfast state:

* Warnings are now emitted when the failfast state is entered;
* The "max concurrency exhausted" gRPC message has been changed to
  more-clearly indicate a failfast state error; and
* Failfast recovery has been made more robust, ensuring that a service
  can recover indepenently of new requests being received.

Furthermore, metric labeling has been improved:

* TCP server metrics are now annotated with the original `target_addr`;
* The `tls` label is now set to true for inbound TLS connections that
  lack a client ID. This is mostly helpful to clarify inbound metrics on
  the `identity` controller;
* Outbound `tls` metrics could be reported incorrectly when a proxy was
  configured to not use identity. This has been corrected.

Finally, socket-level errors now include a _client_ or _server_ prefix
to indicate which side of the proxy encountered the error.

---

* stack: remove `map_response` (linkerd/linkerd2-proxy#835)
* replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842)
* tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845)
* replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839)
* integration: improve tracing in tests (linkerd/linkerd2-proxy#846)
* service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844)
* inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843)
* Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847)
* Update tonic to v0.4 (linkerd/linkerd2-proxy#849)
* failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848)
* Update the base docker image (linkerd/linkerd2-proxy#850)
* stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851)
* Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858)
* tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859)
* Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853)
* metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857)
* metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856)
* Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854)
* Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852)
* test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860)
* tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855)
* Update to tower v0.4.4 (linkerd/linkerd2-proxy#864)
* Update cargo dependencies (linkerd/linkerd2-proxy#865)
* metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861)
* outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862)

---

The opaque-ports test has been updated to reflect proxy metrics changes.
2021-01-21 06:52:38 -08:00
Oliver Gould d2ae5a8117
build: Remove the DOCKER_TRACE environment variable (#5583)
Our build scripts hide docker's output by default and only pass through
output when DOCKER_TRACE is set. Practically everyone else tends to use
DOCKER_TRACE=1 persistently. And, recently, GitHub Actions stopped
working with `/dev/stderr`

This change removes the DOCKER_TRACE environment variable so that output
is always emitted as it would when invoking docker directly.
2021-01-20 22:09:47 -08:00
Hu Shuai 08439f1f6e
Add unit test for pkg/charts/charts.go (#5565)
Add tests for MergeMap

Signed-off-by: Hu Shuai <hus.fnst@cn.fujitsu.com>
2021-01-20 09:55:01 -05:00
Tarun Pothulapati 3b755e5c1d
multicluster: add helm customization flags for install (#5534)
* multicluster: add helm customization flags

This branch updates the multicluster install flow to use the
helm engine directly instead of our own chart wrapper. This
also adds the helm customization flags.

```bash
tarun in dev in on  k3d-deep (default) linkerd2 on  tarun/mc-helm-flags [$+?] via  v1.15.4
 ./bin/go-run cli mc install --set namespace=l5d-mc | grep l5d-mc
github.com/linkerd/linkerd2/multicluster/cmd
github.com/linkerd/linkerd2/cli/cmd
  name: l5d-mc
  namespace: l5d-mc
  namespace: l5d-mc
  namespace: l5d-mc
    mirror.linkerd.io/gateway-identity: linkerd-gateway.l5d-mc.serviceaccount.identity.linkerd.cluster.local
  namespace: l5d-mc
  namespace: l5d-mc
  namespace: l5d-mc
  namespace: l5d-mc
  namespace: l5d-mc
```

* add customization flags even for link cmd

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-01-20 11:29:42 +05:30
Hu Shuai 37472c566f
Fix typos. (#5563)
Fix spelling: accomodate
Fix spelling: conenctions

Signed-off-by: Hu Shuai <hus.fnst@cn.fujitsu.com>
2021-01-19 15:28:57 -08:00
Risha Mars 1a5a8c0cf2
Move @rmars to emeritus maintainer (#5562)
Signed-off-by: Risha Mars <mars@buoyant.io>
2021-01-19 13:54:05 -08:00
Kevin Leimkuhler eb9b264d65
Add watch to jaeger-injector (#5548)
## What this changes

This fixes an issue in the Jaeger extension's `jaeger-injector` component that
causes an injection error in situations with high pod or namespace churn.

Because it cannot watch namespaces, it relies only off of `get` and this appears
to fall behind at a certain point. This surfaces as an error.

For example, in the `inject` test about half way through it errors with the
error:

```
=== RUN   TestInjectAutoPod
    inject_test.go:430: failed to create pod/inject-pod-test-terminus in namespace linkerd-inject-pod-test for exit status 1: Error from server: error when creating "STDIN": admission webhook "jaeger-injector.linkerd.io" denied the request: namespace "linkerd-inject-pod-test" not found
--- FAIL: TestInjectAutoPod (0.22s)
FAIL
```

Looking at the `jaeger-injector` logs, most of it's messages are about the test
namespaces not being created:

```
..
time="2021-01-15T15:34:12Z" level=info msg="received admission review request b2f36a9c-3f88-4abe-bcaa-f63c61cd24c0"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 9f5b229b-1c60-4b24-a020-b66cd201171e"
time="2021-01-15T15:34:12Z" level=error msg="failed to run webhook handler. Reason: namespace \"linkerd-inj-auto-params-test\" not found"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request ae00d63a-1585-46ba-9a75-1f93d40766a8"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 998721eb-5625-4be8-9166-9db834c58f10"
time="2021-01-15T15:34:12Z" level=error msg="failed to run webhook handler. Reason: namespace \"linkerd-inj-auto-params-test\" not found"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 52e4e603-89b1-492b-a69b-dc8ff67d5f26"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 27558a16-5120-4aeb-a0bd-f22a1666b2b1"
time="2021-01-15T15:34:12Z" level=error msg="failed to run webhook handler. Reason: namespace \"linkerd-inj-auto-params-test\" not found"
..
```

Adding the `watch` verb to it's cluster role fixes this and these errors no
longer occur.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-19 09:42:28 -05:00
Yashvardhan Kukreja b67bbe157b
add jaeger check: to confirm whether the jaeger injector pod is in running state or not (#5528)
Currently, the linkerd jaeger check runs multiple checks but it doesn't have a check to confirm the state of the jaeger injector to be running.

This commit adds that required check to confirm the running state of the jaeger injector pod.

Fixes #5495

Signed-off-by: Yashvardhan Kukreja <yash.kukreja.98@gmail.com>
2021-01-19 08:35:16 +05:30
Oleh Ozimok c416e78261
destination: Fix crash when EndpointSlices are enabled (#5543)
The Destination controller can panic due to a nil-deref when
the EndpointSlices API is enabled.

This change updates the controller to properly initialize values
to avoid this segmentation fault.

Fixes #5521

Signed-off-by: Oleg Ozimok <oleg.ozimok@corp.kismia.com>
2021-01-15 12:52:11 -08:00
Tarun Pothulapati 0a2f1f3a26
viz: add check sub-command (#5496)
* viz: add check sub-command

This adds a new `viz check` cmd performing checks for the resources
in linkerd-viz extension. Checks include resource checks and
the health of resources, certs, etc

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-01-15 15:31:45 -05:00
Eugene Formanenko 535a36af7c
Add log-format flag to control plane components (#5537)
Fixes #5536

Signed-off-by: Eugene Formanenko <mo4islona@gmail.com>
2021-01-15 10:51:32 -05:00
Alejandro Pedraza d7e4f901e6
Detect default ns for metrics, identity, routes and profile subcommands (#5530)
* Detect default ns for metrics and profile subcommands

Followup to #5485, fixes remaining cases for #5524

Properly detect the default namespace given `kubeConfigPath` and
`kubeContext` for the `metrics`, `identity`, `routes` and `profile` subcommands.

Also gets rid once and for all of the `defaultNamespace` global var.
2021-01-15 08:51:26 -05:00
Alejandro Pedraza 3365e98f13
Have 'bin/test-cleanup' clean the viz helm release (#5542)
This is needed for the tests in the ARM box to pass.
2021-01-15 00:40:37 +05:30
Tarun Pothulapati 536bdf245c
Add changes for edge-21.1.2 (#5538)
## edge-21.1.2

This edge release continues the work on decoupling non-core Linkerd components.
Commands that use the viz-extension i.e, `dashboard`, `edges`, `routes`,
`stat`, `tap` and `top` are moved to the `viz` sub-command. These commands are still
available under root but are marked deprecated and will be removed in a
later stable release.

This release also features proxy's dependencies upgrade to the
Tokio v1 ecosystem.

* Moved sub-commands that use the viz-extension under `viz`
* Started ignoring pods with status.phase=Succeeded when watching IP addresses
  in destination. This is useful for re-use of IPs of terminated pods
* Support Bring your own Jaeger use-case by adding `collector.jaegerAddr` in
  the jaeger extension.
* Fixed an issue with the generation of working manifests in the
  `podAntiAffinity` use-case
* Added support for the modification of proxy resources in the viz
  extension through `values.yaml` in Helm and flags in CLI.
* Improved error reporting for port-forward logic with namespace
  and pod data, used across dashboard, checks, etc
  (thanks @piyushsingariya)
* Added support to disable the rendering of `linkerd-viz` namespace
  resource in the viz extension (thanks @nlamirault)
* Made service-profile generation work offline with `--ignore-cluster`
  flag (thanks @piyushsingariya)
* Proxy's Tap API is disabled by default and it is enabled only when
  `LINKERD2_PROXY_TAP_SVC_NAME` configuration is set. This means that
  `LINKERD2_PROXY_TAP_DISABLED` is no longer honored
* Upgraded the proxy's dependencies to Tokio v1 ecosystem

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-01-14 22:16:39 +05:30