Fixes #5575
Now that only viz makes use of the `SelfCheck` API, the `healthcheck.proto` file has been merged into `viz.proto`.
Also removed the "checkRPC" functionality that was used for handling multiple API responses; it was only used by `SelfCheck`, and the extra complexity was not warranted. We revert to the plain vanilla "check" and simply concatenate error responses.
## Success Output
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
√ viz extension self-check
```
## Failure Examples
Failure when viz fails to connect to the k8s api:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling the Kubernetes API: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
Failure when viz fails to connect to Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling Prometheus from the control plane: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
Failure when viz fails to connect to both the k8s api and Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling the Kubernetes API: someerror
Error calling Prometheus from the control plane: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
* viz: make checks aware of prom and grafana being optional
Fixes #5618
Currently, the linkerd-viz checks fail whenever an external
Prometheus is used, because those checks are not aware that
Prometheus and Grafana are optional.
This commit fixes this by making the Prometheus and Grafana
checks separate, non-fatal checks. These checks can also be
made dynamic, running only when those components are available.
This commit also adds some of the missing resource checks,
especially those for the new `metrics-api`, to the viz checks.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
In the stable-2.9.0, stable-2.9.1, and stable-2.9.2 releases, the `linkerd-config-overrides` secret is missing the `linkerd.io/control-plane-ns` label. This means that if a `linkerd upgrade` is performed to one of these versions using the `--prune` flag, the secret will be deleted. Without this secret, any further upgrades will fail.
We add a `linkerd repair` command which recreates the `linkerd-config-overrides` secret by fetching the installed values from the `linkerd-config` configmap and then re-populating the redacted identity values from the `linkerd-identity-issuer` secret.
Usage:
```bash
linkerd repair | kubectl apply -f -
```
To test:
```
# Set Linkerd version to stable-2.8.0
> linkerd install | kubectl apply -f -
# Set Linkerd version to stable-2.9.1
> linkerd upgrade | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
# Set Linkerd version to stable-2.9.2
> linkerd upgrade | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
(Command fails)
# Set Linkerd version to HEAD
> linkerd repair | kubectl apply -f -
# Set Linkerd version to stable-2.9.2
> linkerd upgrade | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
(Command succeeds)
> linkerd check
```
Signed-off-by: Alex Leong <alex@buoyant.io>
This edge release continues improving the proxy's diagnostics and also avoids
timing out when the HTTP protocol detection fails. Additionally, old resource
versions were upgraded to avoid warnings in k8s v1.19. Finally, it comes with
lots of CLI improvements detailed below.
* Improved the proxy's diagnostic metrics to help us get better insights into
services that are in fail-fast
* Improved the proxy's HTTP protocol detection to prevent timeout errors
* Upgraded CRD and webhook config resources to get rid of warnings in k8s v1.19
(thanks @mateiidavid!)
* Added viz components into the Linkerd Health Grafana charts
* Had the tap injector add a `viz.linkerd.io/tap-enabled` annotation when
injecting a pod, which allowed providing clearer feedback for the `linkerd
tap` command
* Had the jaeger injector add a `jaeger.linkerd.io/tracing-enabled` annotation
when injecting a pod, which also allowed providing better feedback for the
`linkerd jaeger check` command
* Improved the `linkerd uninstall` command so it fails gracefully when there
still are injected resources in the cluster (a `--force` flag was provided
too)
* Moved the `linkerd profile --tap` functionality into a new command `linkerd
viz profile --tap`, given tap now belongs to the viz extension
* Expanded the `linkerd viz check` command to include data-plane checks
* Cleaned-up YAML in templates that was incompatible with SOPS (thanks
@tkms0106!)
* Include viz components in Prom scrapes, fix Linkerd Health charts
Fixes #5429
Expanded the `linkerd-controller` Prometheus scraping config so it also includes the `linkerd-viz` namespace. Also simplified the first relabeling config there, removing the `__meta_kubernetes_pod_label_linkerd_io_control_plane_component` source label that wasn't serving any purpose. On its own, that extra scraping already allows having non-empty Go charts at the bottom of the `Linkerd Health` charts for the viz components.
Additionally, the `namespace-viz` variable was added to `health.json`, which is then used in the queries for the `Control-Plane Traffic` and `Control-Plane TCP Metrics` charts to include the viz pods.
Finally in that same file the queries for the `Data-Plane Telemetry` section were simplified by removing the filter on the `control_plane_ns` label which was redundant.
This change adds the `jaeger.linkerd.io/tracing-enabled` annotation which is
automatically added by the Jaeger extension's `jaeger-injector`.
All pods that receive this annotation have also had the required environment
variables and volumes/volume mounts added by the injector.
The purpose of this annotation is that it will allow `jaeger check` to check for
the presence of this annotation instead of needing to look at the proxy
containers directly. If this annotation is not present on pods, `jaeger check`
can warn users that tracing is not configured for those pods. This is similar to
`viz check` warning users that tap is not configured, recently added in #5602.
Closes #5632
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* cli: make `linkerd uninstall` fail when injected pods are present
Fixes #5622
This PR updates the `linkerd uninstall` command to check whether
any injected pods remain, and to fail if there are. It also
provides a `--force` flag to skip this check.
Pods in namespaces with the `linkerd` prefix are skipped
so as not to error out on control-plane and extension
pods.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This release changes HTTP protocol detection to prevent timeout errors
in two ways:
1. HTTP detection no longer blocks until a newline is read. We've
reverted to relying on a single read to make a determination.
2. Detection timeouts are no longer terminal. When a timeout is
encountered, we continue forwarding the connection as an opaque TCP
connection.
These changes may lead to false negatives (we may fail to detect some
HTTP streams), but they should prevent many avoidable detection errors.
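The detection behavior described above can be roughly illustrated with a Python sketch (not the actual Rust proxy code; the `detect_protocol` helper and the byte prefixes it checks are simplifications for illustration):

```python
# Illustrative sketch: classify a connection from a single initial read,
# treating a read timeout as "opaque" rather than as a terminal error.
def detect_protocol(initial_bytes):
    """Best-effort HTTP detection from one read; never fails on ambiguity."""
    if initial_bytes is None:            # the read timed out
        return "opaque"                  # keep forwarding as plain TCP
    if initial_bytes.startswith(b"PRI * HTTP/2.0"):
        return "http/2"                  # HTTP/2 connection preface
    # A single read that looks like an HTTP/1 request line is enough;
    # we no longer block waiting for a newline.
    methods = (b"GET ", b"HEAD ", b"POST ", b"PUT ",
               b"DELETE ", b"OPTIONS ", b"PATCH ")
    if initial_bytes.startswith(methods):
        return "http/1"
    return "opaque"                      # e.g. a TLS ClientHello
```

The key design point is that every outcome is a forwarding decision; no path returns an error to the client.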
This release also makes improvements for multicluster gateways,
improving caching so that profile lookups are only performed
once per target service.
Diagnostic `stack_*` metrics have been moved so that they track
underlying services, ignoring fail-fast. This should help us get better
insights into services that are in fail-fast.
Finally, the opencensus exporter has been improved to ensure that trace
events are flushed if the trace buffer is not filled within a timeout.
---
* actions: Update actions to use full SHAs (linkerd/linkerd2-proxy#885)
* http: Parameterize normalize_uri::DefaultAuthority (linkerd/linkerd2-proxy#886)
* http: Parameterize the HTTP server (linkerd/linkerd2-proxy#887)
* opencensus: rewrite span exporter using async/await (linkerd/linkerd2-proxy#789)
* Update http::Insert to use `Param` (linkerd/linkerd2-proxy#889)
* Update crate dependencies (linkerd/linkerd2-proxy#892)
* stack: Make the router fallible (linkerd/linkerd2-proxy#888)
* Track stack metrics within failfast (linkerd/linkerd2-proxy#891)
* outbound: Avoid building balancers when no concrete name (linkerd/linkerd2-proxy#890)
* inbound: Cache HTTP gateways per destination (linkerd/linkerd2-proxy#893)
* Reorganize the gateway crate (linkerd/linkerd2-proxy#897)
* Bias HTTP detection towards availability (linkerd/linkerd2-proxy#894)
* inbound: Use ALPN to determine transport header (linkerd/linkerd2-proxy#895)
* detect: Return unknown protocol on detection timeout (linkerd/linkerd2-proxy#896)
* Extract protocol detection into the gateway crate (linkerd/linkerd2-proxy#898)
## What this changes
This adds a `viz profile` command that outputs a service profile based off tap
data. It is identical to the current `profile --tap` command, but fixes it.
Additionally, it removes the `--tap` flag from the `profile` command since this
depends on the Viz extension being installed in order to tap a service.
## Why
The `profile --tap` command is currently broken since it depends on the Viz
extension being installed, but the `profile` command is part of the core
install.
Closes #5613
Unblocks #5545
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* viz: add data-plane and prometheus healthchecks
Fixes #5325
This branch adds the remaining healthchecks for the viz extension,
i.e.:
- Data-plane metrics check in Prometheus
- `--proxy` mode, which also checks for tap injection based
on annotations.
For this, the following changes were needed:
- `Category.ID` is made public so that `--proxy` can toggle categories
- The tap env key is made a field so that it can be re-used for
checks
We also simplify `viz.NewHealthChecker` by removing the need to
pass category IDs; instead, callers use `hc.appendCategories`
directly to add the required categories. This is made possible
by dividing the vizCategories into separate functions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
(Background information)
In our company we check the sops-encrypted Linkerd manifest into a GitHub repository,
and I came across the following problem.
---
Three dashes mark the start of a YAML document (or the end of the
directives).
https://yaml.org/spec/1.2/spec.html#id2800132
If there are only comments between `---`, the document is empty.
Consider a file that includes an empty document at the top:
```yaml
---
# foo
---
apiVersion: v1
kind: Namespace
metadata:
  name: foo
---
# bar
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
```
When we encrypt and decrypt it with [sops](https://github.com/mozilla/sops), the empty document will be
converted to `{}`.
```yaml
{}
---
apiVersion: v1
kind: Namespace
metadata:
  name: foo
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
```
This is invalid as a k8s manifest:
```
error validating data: [apiVersion not set, kind not set]
```
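The idea behind the fix can be sketched with a stdlib-only Python helper (hypothetical; the actual change simply removes the comment-only leading documents from the templates):

```python
# Illustrative, stdlib-only sketch: drop YAML documents that contain only
# comments or blank lines, so tools like sops never see an empty document.
def drop_empty_docs(stream: str) -> str:
    docs, current = [], []
    for line in stream.splitlines():
        if line.strip() == "---":       # document separator
            docs.append(current)
            current = []
        else:
            current.append(line)
    docs.append(current)

    def is_empty(doc):
        # A doc is "empty" if every line is blank or a comment.
        return all(not l.strip() or l.lstrip().startswith("#") for l in doc)

    kept = [d for d in docs if not is_empty(d)]
    return "\n---\n".join("\n".join(d) for d in kept)
```

Running this over the first example above would emit only the two `Namespace` documents, with no leading empty document for sops to rewrite as `{}`.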
---
This may be (at least partly) a sops problem, but in any case this modification is harmless.
Thank you.
Signed-off-by: Takumi Sue <u630868b@alumni.osaka-u.ac.jp>
*Closes #5484*
### Changes
---
*Overview*:
* Update golden files and make necessary spec changes
* Update test files for viz
* Add v1 to healthcheck and uninstall
* Fix link-crd clusterDomain field validation
- To update to v1, I had to change the CRD schemas to be version-based (i.e. each version has to declare its own schema). I noticed an error in the link-crd (`targetClusterDomain` was `targetDomainName`). `additionalPrinterColumns` is also a version-dependent field now.
- For `admissionregistration` resources I had to add an additional `admissionReviewVersions` field; I included `v1` and `v1beta1`.
- In `healthcheck.go` and `resources.go` (used by `uninstall`) I had to make some changes to the client-go versions (i.e. from `v1beta1` to `v1` for admissionreg and apiextension) so that we don't see any warning messages when uninstalling or when we run any install checks.
I tested against different CLI and k8s versions to have a bit more confidence in the changes (in addition to automated tests); I hope the cases below are enough. If not, let me know and I can test further.
### Tests
Linkerd local build CLI + k8s 1.19+
`install/check/mc-check/mc-install/mc-link/viz-install/viz-check/uninstall/`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2+k3s1", GitCommit:"1d4adb0301b9a63ceec8cabb11b309e061f43d5f", GitTreeState:"clean", BuildDate:"2021-01-14T23:52:37Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
$ bin/linkerd version
Client version: git-b0fd2ec8
Server version: unavailable
$ bin/linkerd install | kubectl apply -f -
- no errors, no version warnings -
$ bin/linkerd check --expected-version git-b0fd2ec8
Status check results are :tick:
# MC
$ bin/linkerd mc install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd mc check
Status check results are :tick:
$ bin/linkerd mc link foo | k apply -f - # test crd creation
# had a validation error here because the schema had targetDomainName instead of targetClusterDomain
# changed, rebuilt cli, re-installed mc, tried command again
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created
...
# VIZ
$ bin/linkerd viz install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd viz check
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd uninstall | k delete -f -
- no errors, no version warnings -
```
Linkerd local build CLI + k8s 1.17
`check-pre/install/mc-check/mc-install/mc-link/viz-install/viz-check`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.17-rc1+k3s1", GitCommit:"e8c9484078bc59f2cd04f4018b095407758073f5", GitTreeState:"clean", BuildDate:"2021-01-14T06:20:56Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
$ bin/linkerd version
Client version: git-3d2d4df1 # made changes to link-crd after prev test case
Server version: unavailable
$ bin/linkerd check --pre --expected-version git-3d2d4df1
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd check --expected-version git-3d2d4df1
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd mc install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd mc check
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd mc link --cluster-name foo | k apply -f -
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created
# VIZ
$ bin/linkerd viz install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd viz check
- no errors, no version warnings -
- hangs up indefinitely after linkerd-viz can talk to Kubernetes
```
Linkerd edge (21.1.3) CLI + k8s 1.17 (already installed)
`check`
```
$ linkerd version
Client version: edge-21.1.3
Server version: git-3d2d4df1
$ linkerd check
- no errors -
- warnings: mismatch between cli & control plane, control plane not up to date (both expected) -
Status check results are :tick:
```
Linkerd stable (2.9.2) CLI + k8s 1.17 (already installed)
`check/uninstall`
```
$ linkerd version
Client version: stable-2.9.2
Server version: git-3d2d4df1
$ linkerd check
× control plane ClusterRoles exist
missing ClusterRoles: linkerd-linkerd-tap
see https://linkerd.io/checks/#l5d-existence-cr for hints
Status check results are ×
# viz wasn't installed, hence the error, installing viz didn't help since
# the res is named `viz-tap` now
# moving to uninstall
$ linkerd uninstall | k delete -f -
- no warnings, no errors -
```
_Note_: I used `go test ./cli/cmd/... --generate` which is why there are so many changes 😨
Signed-off-by: Matei David <matei.david.35@gmail.com>
## What this changes
This allows the tap controller to inform `tap` users when pods either have tap
disabled or tap is not enabled yet.
## Why
When a user taps a resource that has not been admitted by the Viz extension's
`tap-injector`, tap is not explicitly disabled but it is also not enabled.
Therefore, the `tap` command hangs and provides no feedback to the user.
Closes #5544
## How
A new `viz.linkerd.io/tap-enabled` annotation is introduced which is
automatically added by the Viz extension's `tap-injector`. This annotation is
added to a pod when it is able to be tapped; this means that the pod and the
pod's namespace do not have the `config.linkerd.io/disable-tap` annotation
added.
When a user attempts to tap a resource, the tap controller now looks for this
new annotation; if the annotation is present on the pod then that pod is
tappable.
If the annotation is not present or tap is explicitly disabled, an error is
returned.
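The controller's decision described above can be sketched as follows (illustrative Python; the real implementation is in the Go tap controller, and `tap_status` is a hypothetical helper):

```python
# Illustrative sketch of the tap controller's decision, using the annotation
# names from the description above. Disable-tap is presence-based.
TAP_ENABLED = "viz.linkerd.io/tap-enabled"
TAP_DISABLED = "config.linkerd.io/disable-tap"

def tap_status(pod_annotations, ns_annotations):
    if TAP_DISABLED in pod_annotations or TAP_DISABLED in ns_annotations:
        return "disabled"     # explicitly disabled via pod or namespace
    if TAP_ENABLED in pod_annotations:
        return "tappable"     # the tap-injector admitted this pod
    return "not-enabled"      # injector never saw the pod; suggest a restart
```

The "disabled" and "not-enabled" outcomes correspond to the two distinct error messages shown in the UI changes below.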
## UI changes
Multiple errors can now occur when trying to tap a resource:
1. There are no pods for the resource.
2. There are pods for the resource, but tap is disabled via pod or namespace
annotation.
3. There are pods for the resource, but tap is not yet enabled because the
`tap-injector` did not admit the resource.
Errors are now handled as shown below:
Tap is disabled:
```
❯ bin/linkerd viz tap deploy/test
Error: no pods to tap for deployment/test
pods found with tap disabled via the config.linkerd.io/disable-tap annotation
```
Tap is not enabled:
```
❯ bin/linkerd viz tap deploy/test
Error: no pods to tap for deployment/test
pods found with tap not enabled; try restarting resource so that it can be injected
```
There are a mix of pods with tap disabled or tap not enabled:
```
❯ bin/linkerd viz tap deploy/test
Error: no pods to tap for deployment/test
pods found with tap disabled via the config.linkerd.io/disable-tap annotation
pods found with tap not enabled; try restarting resource so that it can be injected
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This edge release continues to polish the Linkerd extension model and improves
the robustness of the opaque transport.
* Improved the consistency of behavior of the `check` commands between
Linkerd extensions
* Fixed an issue where Linkerd extension commands could be run before the
extension was fully installed
* Renamed some extension Helm charts for consistency:
* jaeger -> linkerd-jaeger
* linkerd2-multicluster -> linkerd-multicluster
* linkerd2-multicluster-link -> linkerd-multicluster-link
* Fixed an issue that could cause the inbound proxy to fail meshed HTTP/1
requests from older proxies (from the stable-2.8.x vintage)
* Changed opaque-port transport to be advertised via ALPN so that new proxies
will not initiate opaque-transport connections to proxies from prior
edge releases
* Added inbound proxy transport metrics with `tls="passthru"` when forwarding
non-mesh TLS connections
* Thanks to @hs0210 for adding new unit tests!
Signed-off-by: Alex Leong <alex@buoyant.io>
First and foremost, this release fixes an issue that could cause the
inbound proxy to fail meshed HTTP/1 requests from older proxies (from
the stable-2.8.x vintage).
Additionally, this release changes how opaque-port transport works, in
preparation for TCP multicluster functionality: now servers must
advertise support for the transport header via ALPN. Clients will only
send a transport header when the server advertises support for ALPN.
This means that new proxies will not initiate opaque-transport
connections to proxies from prior edge releases.
Finally, inbound proxies may now report transport metrics with
`tls="passthru"` when forwarding non-mesh TLS connections.
---
* metrics: add `target_addr` labels to HTTP metrics (linkerd/linkerd2-proxy#866)
* inbound: Handle direct connections with a dedicated stack (linkerd/linkerd2-proxy#863)
* inbound: Avoid HTTP detection when a transport header is present (linkerd/linkerd2-proxy#867)
* Update tokio to v1.1.0 (linkerd/linkerd2-proxy#870)
* admin: stackify admin server (linkerd/linkerd2-proxy#868)
* tls: Report SNI values for non-Linkerd TLS (linkerd/linkerd2-proxy#869)
* admin: Record transport & HTTP metrics (linkerd/linkerd2-proxy#871)
* test: Disable tracing-subscriber by default (linkerd/linkerd2-proxy#873)
* inbound: Split stack into modules (linkerd/linkerd2-proxy#872)
* Improve diagnostics around the SwitchReady module (linkerd/linkerd2-proxy#875)
* Use TLS ALPN to negotiate transport header support (linkerd/linkerd2-proxy#874)
* stack: Introduce the Param trait (linkerd/linkerd2-proxy#876)
* transport-header: Encode session protocol (linkerd/linkerd2-proxy#877)
* transport: Add a ConnectAddr parameter type (linkerd/linkerd2-proxy#879)
* profiles: Use a LogicalAddr param type (linkerd/linkerd2-proxy#878)
* stack: Replace switch with Filter and NewEither (linkerd/linkerd2-proxy#880)
* inbound: normalize URIs after downgrading to HTTP/1 (linkerd/linkerd2-proxy#881)
For consistency we rename the extension charts to a common naming scheme:
linkerd-viz -> linkerd-viz (unchanged)
jaeger -> linkerd-jaeger
linkerd2-multicluster -> linkerd-multicluster
linkerd2-multicluster-link -> linkerd-multicluster-link
We also make the chart files and chart readmes a bit more uniform.
Signed-off-by: Alex Leong <alex@buoyant.io>
* extensions: make subcmds check/wait for respective extensions
This commit updates the extension subcmds to check and wait
for the respective extensions to be up before running them.
The same healthcheck pkg and the respective extension checks
are used to add the check/wait logic.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The `bin/web run` script sets up a local environment for linkerd
dashboard development. This script port-forwards an existing linkerd
controller and a grafana instance in a local kubernetes cluster. When
running the command with just the linkerd control plane and no
linkerd viz extension, the error message below is shown.
```
'Controller is not running. Have you installed Linkerd?'
```
This error message is a little misleading because the controller is in fact
installed after running `linkerd install`. The issue here is
that the script checks for a Grafana instance, but the error message it
displays when it can't find a Grafana pod says that the controller isn't
installed. The error message should instead notify the developer that
Linkerd Viz is not installed.
This change modifies the error message so it is more clear.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
I ran `bin/update-codegen.sh` to update the generated code to include the opaque ports in the generated deepcopy function for service profiles.
Signed-off-by: Alex Leong <alex@buoyant.io>
This branch cleans up some unnecessary logic, making the check
logic similar to that of other extensions, namely viz.
Includes the following cleanups:
- Remove `namespace` flag in jaeger CLI and make the fetching logic
dynamic and use it in check and dashboard.
- Use `hc.KubeAPIClient` instead of creating our own in jaeger check.
- Move injection checks up before we run the readiness checks
This change adds a new check that the extension namespace exists for
jaeger.
It also updates the integration tests to run the check commands.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common`, as it is still used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.
* Added chart templates for new viz linkerd-metrics-api pod
* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.
* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).
* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api`, moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.
* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.
* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.
* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
- Updated `endpoints.go` according to new API interface name.
- Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
- `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
- Added `metrics-api` to list of docker images to build in actions workflows.
- In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).
* Add retry to 'tap API service is running' check
* mc check shouldn't err when viz is not available. Also properly set up the logger in `multicluster/cmd/root.go` so that it displays messages when `--verbose` is used
* viz: cleanup helm values.yaml
This branch fixes some nits around the naming of default variables,
i.e. it replaces the usage of `global` with `default`.
It renames `globalLogLevel` to `defaultLogLevel` and `globalUID` to
`defaultUID`, along with some chart README updates.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
#5507 added new golden tests but missed some updates from other PRs
that were merged in the meantime.
This branch updates those golden tests with those changes.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* cli: add helm customization flags to core install
Fixes #5506
This branch adds Helm-style customization through the
`--set`, `--set-string`, `--values`, and `--set-file` flags for
the `linkerd install` cmd, along with unit tests.
For this to work, the Helm v3 engine's rendering helpers
had to be used instead of our own wrapper type.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The `linkerd metrics` command was selecting pods based on owner resource
names. If multiple owners existed with the same name (for example
`sts/web`, `deploy/web`), additional pods would be incorrectly included
in the output.
Fix the pod selector code to validate pods have owner references to the
given workload/owner.
Before:
```
$ linkerd metrics -n emojivoto deploy/web|grep POD
# POD web-0 (1 of 3)
# POD web-d9ffd684f-gnbcx (2 of 3)
# POD web-fs6l7 (3 of 3)
```
After:
```
$ bin/go-run cli metrics -n emojivoto deploy/web|grep POD
# POD web-d9ffd684f-gnbcx (1 of 1)
```
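The fix's selection logic can be sketched as follows (illustrative Python; the real code is Go and, for Deployments, also resolves the intermediate ReplicaSet, which this sketch ignores):

```python
# Illustrative sketch: select pods by their ownerReferences rather than by
# owner name alone, so pods of sts/web are not returned for deploy/web.
def pods_for_owner(pods, owner_kind, owner_name):
    """Keep only pods whose ownerReferences actually point at the owner."""
    matched = []
    for pod in pods:
        for ref in pod.get("ownerReferences", []):
            if ref["kind"] == owner_kind and ref["name"] == owner_name:
                matched.append(pod["name"])
                break
    return matched
```

Matching on the reference's kind as well as its name is what excludes the `web-0` (StatefulSet) and `web-fs6l7` pods from the deployment's output above.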
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Introduce OpenAPIV3 validation for CRDs
* Add validation to link crd
* Add validation to sp using kube-gen
* Add openapiv3 under schema fields in specific versions
* Modify fields to rid spec of yaml errors
* Add top level validation for all three CRDs
Signed-off-by: Matei David <matei.david.35@gmail.com>
## What this changes
This adds a tap-injector component to the `linkerd-viz` extension which is
responsible for adding the tap service name environment variable to the Linkerd
proxy container.
If a pod does not have a Linkerd proxy, no action is taken. If tap is disabled
via annotation on the pod or the namespace, no action is taken.
This also removes the environment variable for explicitly disabling tap.
Tap status for a proxy is now determined only by the presence or absence
of the tap service name environment variable.
Closes#5326
## How it changes
### tap-injector
The tap-injector component determines if `LINKERD2_PROXY_TAP_SVC_NAME` should be
added to a pod's Linkerd proxy container environment. If the pod satisfies the
following, it is added:
- The pod has a Linkerd proxy container
- The pod has not already been mutated
- Tap is not disabled via annotation on the pod or the pod's namespace
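The three conditions above can be sketched as follows (illustrative Python; the real injector is a Go admission webhook, and the field shapes here are simplified):

```python
# Illustrative sketch of the tap-injector's decision, using the annotation
# and env var names from the description above. Field shapes are simplified.
DISABLE_TAP = "config.linkerd.io/disable-tap"
TAP_SVC_ENV = "LINKERD2_PROXY_TAP_SVC_NAME"

def should_add_tap_env(pod, ns_annotations):
    containers = pod.get("containers", [])
    # 1. The pod has a Linkerd proxy container.
    has_proxy = any(c["name"] == "linkerd-proxy" for c in containers)
    # 2. The pod has not already been mutated.
    already_mutated = any(TAP_SVC_ENV in c.get("env", {}) for c in containers)
    # 3. Tap is not disabled via pod or namespace annotation.
    disabled = (DISABLE_TAP in pod.get("annotations", {})
                or DISABLE_TAP in ns_annotations)
    return has_proxy and not already_mutated and not disabled
```

Only when all three conditions hold does the injector add the env var; otherwise it takes no action, matching the behavior described above.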
### LINKERD2_PROXY_TAP_DISABLED
Now that tap is an extension of Linkerd and not a core component, it no longer
made sense to explicitly enable or disable tap through this Linkerd proxy
environment variable. The status of tap is now determined only by whether the
tap-injector adds the `LINKERD2_PROXY_TAP_SVC_NAME` environment
variable.
### controller image
The tap-injector has been added as one of the controller image's several startup
commands, which determine what the image does in the cluster.
As a follow-up, I think splitting out the `tap` and `tap-injector` commands from
the controller image into a linkerd-viz image (or something like that) makes
sense.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This release improves diagnostics about the proxy's failfast state:
* Warnings are now emitted when the failfast state is entered;
* The "max concurrency exhausted" gRPC message has been changed to
more clearly indicate a failfast state error; and
* Failfast recovery has been made more robust, ensuring that a service
can recover independently of new requests being received.
Furthermore, metric labeling has been improved:
* TCP server metrics are now annotated with the original `target_addr`;
* The `tls` label is now set to true for inbound TLS connections that
lack a client ID. This is mostly helpful to clarify inbound metrics on
the `identity` controller;
* Outbound `tls` metrics could be reported incorrectly when a proxy was
configured to not use identity. This has been corrected.
Finally, socket-level errors now include a _client_ or _server_ prefix
to indicate which side of the proxy encountered the error.
---
* stack: remove `map_response` (linkerd/linkerd2-proxy#835)
* replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842)
* tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845)
* replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839)
* integration: improve tracing in tests (linkerd/linkerd2-proxy#846)
* service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844)
* inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843)
* Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847)
* Update tonic to v0.4 (linkerd/linkerd2-proxy#849)
* failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848)
* Update the base docker image (linkerd/linkerd2-proxy#850)
* stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851)
* Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858)
* tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859)
* Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853)
* metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857)
* metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856)
* Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854)
* Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852)
* test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860)
* tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855)
* Update to tower v0.4.4 (linkerd/linkerd2-proxy#864)
* Update cargo dependencies (linkerd/linkerd2-proxy#865)
* metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861)
* outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862)
---
The opaque-ports test has been updated to reflect proxy metrics changes.
Our build scripts hide docker's output by default and only pass it through
when DOCKER_TRACE is set. In practice, nearly everyone sets
DOCKER_TRACE=1 persistently. Moreover, GitHub Actions recently stopped
working with `/dev/stderr`.
This change removes the DOCKER_TRACE environment variable so that output
is always emitted, as it would be when invoking docker directly.
* multicluster: add helm customization flags
This branch updates the multicluster install flow to use the
helm engine directly instead of our own chart wrapper. This
also adds the helm customization flags.
```bash
$ ./bin/go-run cli mc install --set namespace=l5d-mc | grep l5d-mc
github.com/linkerd/linkerd2/multicluster/cmd
github.com/linkerd/linkerd2/cli/cmd
name: l5d-mc
namespace: l5d-mc
namespace: l5d-mc
namespace: l5d-mc
mirror.linkerd.io/gateway-identity: linkerd-gateway.l5d-mc.serviceaccount.identity.linkerd.cluster.local
namespace: l5d-mc
namespace: l5d-mc
namespace: l5d-mc
namespace: l5d-mc
namespace: l5d-mc
```
* add customization flags even for link cmd
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
## What this changes
This fixes an issue in the Jaeger extension's `jaeger-injector` component that
causes an injection error in situations with high pod or namespace churn.
Because it cannot watch namespaces, it relies only on `get`, and this appears
to fall behind at a certain point, surfacing as an error.
For example, about halfway through the `inject` test it fails with:
```
=== RUN TestInjectAutoPod
inject_test.go:430: failed to create pod/inject-pod-test-terminus in namespace linkerd-inject-pod-test for exit status 1: Error from server: error when creating "STDIN": admission webhook "jaeger-injector.linkerd.io" denied the request: namespace "linkerd-inject-pod-test" not found
--- FAIL: TestInjectAutoPod (0.22s)
FAIL
```
Looking at the `jaeger-injector` logs, most of its messages are about the test
namespaces not being created:
```
..
time="2021-01-15T15:34:12Z" level=info msg="received admission review request b2f36a9c-3f88-4abe-bcaa-f63c61cd24c0"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 9f5b229b-1c60-4b24-a020-b66cd201171e"
time="2021-01-15T15:34:12Z" level=error msg="failed to run webhook handler. Reason: namespace \"linkerd-inj-auto-params-test\" not found"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request ae00d63a-1585-46ba-9a75-1f93d40766a8"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 998721eb-5625-4be8-9166-9db834c58f10"
time="2021-01-15T15:34:12Z" level=error msg="failed to run webhook handler. Reason: namespace \"linkerd-inj-auto-params-test\" not found"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 52e4e603-89b1-492b-a69b-dc8ff67d5f26"
time="2021-01-15T15:34:12Z" level=info msg="received admission review request 27558a16-5120-4aeb-a0bd-f22a1666b2b1"
time="2021-01-15T15:34:12Z" level=error msg="failed to run webhook handler. Reason: namespace \"linkerd-inj-auto-params-test\" not found"
..
```
Adding the `watch` verb to its cluster role fixes this, and these errors no
longer occur.
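As a rough illustration, the fix amounts to granting the injector's `ClusterRole` the `watch` verb on namespaces. A fragment along these lines (the resource names here are illustrative, not copied from the actual chart):

```yaml
# Illustrative RBAC fragment only; metadata names are assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: linkerd-jaeger-injector
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list", "watch"]  # "watch" keeps the injector's cache current
```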
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Currently, `linkerd jaeger check` runs multiple checks, but none of them confirms that the jaeger injector is running.
This commit adds a check that confirms the jaeger injector pod is in a running state.
Fixes #5495
Signed-off-by: Yashvardhan Kukreja <yash.kukreja.98@gmail.com>
The Destination controller can panic due to a nil-deref when
the EndpointSlices API is enabled.
This change updates the controller to properly initialize values
to avoid this segmentation fault.
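As an illustrative sketch of this class of fix (the names below are hypothetical, not the actual Destination controller code), initializing maps at construction time avoids dereferencing a nil field later:

```go
package main

import "fmt"

// endpointSliceWatcher is a hypothetical stand-in for a controller struct
// whose map field would be nil unless explicitly initialized.
type endpointSliceWatcher struct {
	addresses map[string]string
}

// newEndpointSliceWatcher initializes all fields up front; writing to a nil
// map would otherwise panic at runtime.
func newEndpointSliceWatcher() *endpointSliceWatcher {
	return &endpointSliceWatcher{addresses: make(map[string]string)}
}

func main() {
	w := newEndpointSliceWatcher()
	w.addresses["10.0.0.1"] = "pod-a" // safe: the map was initialized
	fmt.Println(len(w.addresses))
}
```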
Fixes #5521
Signed-off-by: Oleg Ozimok <oleg.ozimok@corp.kismia.com>
* viz: add check sub-command
This adds a new `viz check` command that performs checks for the resources
in the linkerd-viz extension, including their existence and health,
certificates, etc.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Detect default ns for metrics and profile subcommands
Follow-up to #5485, fixes remaining cases for #5524
Properly detect the default namespace given `kubeConfigPath` and
`kubeContext` for the `metrics`, `identity`, `routes` and `profile` subcommands.
Also gets rid once and for all of the `defaultNamespace` global var.
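A minimal sketch of that resolution logic, using a simplified stand-in for the kubeconfig model (the real CLI goes through client-go's `clientcmd` loader; all names below are hypothetical):

```go
package main

import "fmt"

// kubeContext is a simplified model of a kubeconfig context entry.
type kubeContext struct {
	name      string
	namespace string
}

// resolveNamespace returns the namespace of the selected context,
// falling back to "default" when the context sets none.
func resolveNamespace(contexts []kubeContext, current string) string {
	for _, c := range contexts {
		if c.name == current && c.namespace != "" {
			return c.namespace
		}
	}
	return "default"
}

func main() {
	ctxs := []kubeContext{
		{name: "dev", namespace: "linkerd"},
		{name: "prod"}, // no namespace set in this context
	}
	fmt.Println(resolveNamespace(ctxs, "dev"))
	fmt.Println(resolveNamespace(ctxs, "prod"))
}
```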
## edge-21.1.2
This edge release continues the work on decoupling non-core Linkerd components.
Commands that use the viz extension, i.e. `dashboard`, `edges`, `routes`,
`stat`, `tap` and `top`, have been moved under the `viz` sub-command. They are
still available at the root but are marked deprecated and will be removed in a
later stable release.
This release also upgrades the proxy's dependencies to the Tokio v1
ecosystem.
* Moved sub-commands that use the viz-extension under `viz`
* Started ignoring pods with status.phase=Succeeded when watching IP addresses
in destination. This allows the IPs of terminated pods to be reused
* Added support for the bring-your-own-Jaeger use case by adding
`collector.jaegerAddr` to the jaeger extension
* Fixed an issue with the generation of working manifests in the
`podAntiAffinity` use-case
* Added support for the modification of proxy resources in the viz
extension through `values.yaml` in Helm and flags in CLI.
* Improved error reporting for port-forward logic with namespace
and pod data, used across dashboard, checks, etc
(thanks @piyushsingariya)
* Added support to disable the rendering of `linkerd-viz` namespace
resource in the viz extension (thanks @nlamirault)
* Made service-profile generation work offline with `--ignore-cluster`
flag (thanks @piyushsingariya)
* The proxy's Tap API is now disabled by default and is enabled only when the
`LINKERD2_PROXY_TAP_SVC_NAME` configuration is set. This means that
`LINKERD2_PROXY_TAP_DISABLED` is no longer honored
* Upgraded the proxy's dependencies to Tokio v1 ecosystem
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>