* multicluster: make service mirror honour `requeueLimit`
Fixes #5374
Currently, whenever the `gatewayAddress` changes, the service mirror
component keeps trying to repair endpoints (`RepairEndpoints` is invoked
every `repairPeriod`). That behavior is fine and expected, but because
the service mirror does not honor `requeueLimit`, it keeps requeuing the
same event and retries with no limit.
The condition we use to limit requeues,
`if (rcsw.eventsQueue.NumRequeues(event) < rcsw.requeueLimit)`, does
not work for the following reason:
- For the queue to actually track requeues, `AddRateLimited` has to be
  used instead; only then does `NumRequeues` return the real number of
  requeues for a specific event.
This change updates the requeuing logic to use `AddRateLimited` instead
of `Add`.
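For illustration, here is a minimal sketch of the difference using client-go's `workqueue` package (the event is a placeholder string and `requeueLimit` is hard-coded; this is not the actual service mirror code):
```go
package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	const requeueLimit = 3
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	event := "RepairEndpoints" // placeholder for the real event type

	for {
		if queue.NumRequeues(event) < requeueLimit {
			fmt.Printf("Requeues: %d, Limit: %d (will retry)\n", queue.NumRequeues(event), requeueLimit)
			// AddRateLimited records the requeue with the rate limiter, so
			// NumRequeues grows; plain Add would leave it at 0 forever and
			// the limit check above would never trip.
			queue.AddRateLimited(event)
		} else {
			fmt.Println("giving up")
			queue.Forget(event) // reset the per-item requeue counter
			break
		}
	}
}
```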
After these changes, the service mirror logs look as follows:
```bash
time="2021-03-30T16:52:31Z" level=info msg="Received: OnAddCalled: {svc: Service: {name: grafana, namespace: linkerd-viz, annotations: [[linkerd.io/created-by=linkerd/helm git-0e2ecd7b]], labels [[linkerd.io/extension=viz]]}}" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 0, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
```
As seen, `RepairEndpoints` is called every `repairPeriod`, which is
1 minute by default. Whenever a failure happens it is retried, but the
failures are now tracked and the event is given up on once it reaches
the `requeueLimit`, which is 3 by default.
This also fixes the requeuing logic for all types of events, not just
`RepairEndpoints`.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Currently, no `NOTES` are printed out after an extension is installed
through Helm, as we do for the core chart. This updates the viz and
jaeger charts to include them, along with instructions to view the
dashboard.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Move CP check after the readiness check
Moved the `can initialize client` and `can query the control plane API`
checks from the `linkerd-existence` section to the `linkerd-api` section
because they require the `linkerd-controller` pod to not just be "Running"
but to actually be ready.
This was causing `linkerd check` to show some port-forwarding warnings
when run right after install.
This also allowed getting rid of the `CheckPublicAPIClientOrExit` function
and using `CheckPublicAPIClientOrRetryOrExit` directly (better naming
punted for later), which was refactored so it always runs the
`linkerd-api` checks before retrieving the client.
Other changes:
- Temporarily disabled `upgrade-edge` test because the latest edge has this readiness check issue
- Have the upgrade tests do proper pruning (stolen from @Pothulapati's #5673 😉)
- Added missing label to tap SA (fixes #5850)
- Complete tap-injector Service selector
* Remove linkerd prefix from extension resources
This change removes the `linkerd-` prefix on all non-cluster resources
in the jaeger and viz linkerd extensions. Removing the prefix makes all
linkerd extensions consistent in their naming.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
As described in https://github.com/linkerd/linkerd2/pull/5692, this PR adds support for CLI extensions.
Calling `linkerd foo` (if `foo` is not an existing Linkerd command) will now search the current PATH for an executable named `linkerd-foo` and invoke it with the current arguments.
* All arguments and flags will be passed to the extension command
* The Linkerd command itself will not process any flags
* To simplify parsing, flags are not allowed before the extension name
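As a rough illustration of the dispatch, here is a hedged sketch (the `runExtension` helper is hypothetical, not the actual Linkerd implementation):
```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// runExtension searches the PATH for "linkerd-<name>" and, if found,
// invokes it with all remaining arguments and flags passed through untouched.
func runExtension(name string, args []string) error {
	path, err := exec.LookPath("linkerd-" + name)
	if err != nil {
		return fmt.Errorf("unknown command %q for \"linkerd\"", name)
	}
	cmd := exec.Command(path, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: linkerd <extension> [args...]")
		os.Exit(1)
	}
	// e.g. `linkerd foo install` -> exec `linkerd-foo install`
	if err := runExtension(os.Args[1], os.Args[2:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```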
e.g. with an executable called `linkerd-foo` on my PATH:
```console
> linkerd foo install
Welcome to Linkerd foo!
Got: install
> linkerd foo --context=prod install
Welcome to Linkerd foo!
Got: --context=prod install
> linkerd --context=prod foo install
Cannot accept flags before Linkerd extension name
> linkerd bar install
Error: unknown command "bar" for "linkerd"
Run 'linkerd --help' for usage.
```
We also update `linkerd check` to invoke `linkerd <extension> check` for each extension found installed on the current cluster. A check warning is emitted if the extension command is not found on the path.
e.g. with both `linkerd.io/extension=foo` and `linkerd.io/extension=bar` extensions installed on the cluster:
```console
> linkerd check
[...]
Linkerd extensions checks
=========================
Welcome to Linkerd foo!
Got: check --as-group=[] --cni-namespace=linkerd-cni --help=false --linkerd-cni-enabled=false --linkerd-namespace=linkerd --output=table --pre=false --proxy=false --verbose=false --wait=5m0s
linkerd-bar
-----------
‼ Linkerd extension command linkerd-bar exists
Status check results are ‼
```
Signed-off-by: Alex Leong <alex@buoyant.io>
This reverts commit f9ab867cbc which renamed the
multicluster label name from `mirror.linkerd.io` to `multicluster.linkerd.io`.
While this change was made to follow similar namings in other extensions, it
complicates the multicluster upgrade process due to the secret creation.
`mirror.linkerd.io` is not an important enough label to warrant changing, and
keeping it will allow a smoother upgrade process for `stable-2.10.x`.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This change introduces an opaque ports annotation watcher that will send
destination profile updates when a service has its opaque ports annotation
change.
The user facing change introduced by this is that the opaque ports annotation is
now required on services when using the multicluster extension. This is because
the service mirror will create mirrored services in the source cluster, and
destination lookups in the source cluster need to discover that the workloads in
the target cluster are opaque protocols.
### Why
Closes #5650
### How
The destination server now has a new opaque ports annotation watcher. When a
client subscribes to updates for a service name or cluster IP, the `GetProfile`
method creates a profile translator stack that passes updates through resource
adaptors such as: traffic split adaptor, service profile adaptor, and now opaque
ports adaptor.
When the annotation on a service changes, the update is passed through to the
client where the `opaque_protocol` field will either be set to true or false.
A few scenarios to consider are:
- If the annotation is removed from the service, the client should receive
an update with no opaque ports set.
- If the service is deleted, the stream stays open so the client should
receive an update with no opaque ports set.
- If the service has the annotation added, the client should receive that
update.
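As a rough sketch of how those scenarios play out when reading the annotation (the helper and its parsing are illustrative only; the annotation key assumed here is Linkerd's `config.linkerd.io/opaque-ports`):
```go
package main

import (
	"fmt"
	"strconv"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// opaquePorts is an illustrative helper: a deleted service (nil) or a
// service without the annotation yields an empty set, i.e. an update with
// no opaque ports; otherwise the comma-separated port list is parsed.
func opaquePorts(svc *corev1.Service) map[uint32]struct{} {
	ports := map[uint32]struct{}{}
	if svc == nil {
		return ports // service deleted: update with no opaque ports
	}
	raw, ok := svc.Annotations["config.linkerd.io/opaque-ports"]
	if !ok {
		return ports // annotation removed: update with no opaque ports
	}
	for _, p := range strings.Split(raw, ",") {
		if port, err := strconv.ParseUint(strings.TrimSpace(p), 10, 32); err == nil {
			ports[uint32(port)] = struct{}{}
		}
	}
	return ports
}

func main() {
	svc := &corev1.Service{}
	svc.Annotations = map[string]string{"config.linkerd.io/opaque-ports": "25,3306"}
	fmt.Println(opaquePorts(svc)) // map[25:{} 3306:{}]
}
```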
### Testing
Unit tests have been added for the watcher as well as the destination server.
An integration test has been added that tests the opaque port annotation on a
service.
For manual testing, using the destination server scripts is easiest:
```
# install Linkerd
# start the destination server
$ go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
# Create a service or namespace with the annotation and inject it
# get the destination profile for that service and observe the opaque protocol field
$ go run controller/script/destination-client/main.go -method getProfile -path test-svc.default.svc.cluster.local:8080
INFO[0000] fully_qualified_name:"terminus-svc.default.svc.cluster.local" opaque_protocol:true retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"terminus-svc.default.svc.cluster.local.:8080" weight:10000}
INFO[0000]
INFO[0000] fully_qualified_name:"terminus-svc.default.svc.cluster.local" opaque_protocol:true retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"terminus-svc.default.svc.cluster.local.:8080" weight:10000}
INFO[0000]
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Problem: the CLI prints command usage for multiple commands in case of API errors
Solution: print the error and then exit using `os.Exit(1)` to avoid Cobra printing the usage
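A minimal sketch of the pattern (the command and `callAPI` here are placeholders, not the actual Linkerd command code):
```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// callAPI stands in for a control plane API call that may fail.
func callAPI() error { return fmt.Errorf("API error") }

func main() {
	cmd := &cobra.Command{
		Use: "example",
		RunE: func(cmd *cobra.Command, args []string) error {
			if err := callAPI(); err != nil {
				// Print the error ourselves and exit immediately so Cobra
				// never gets a chance to echo the command usage.
				fmt.Fprintln(os.Stderr, err)
				os.Exit(1)
			}
			return nil
		},
	}
	if err := cmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```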
Closes #5058
Signed-off-by: Piyush Singariya <piyushsingariya@gmail.com>
Fixes #5574 and supersedes #5660
- Removed from all the `values.yaml` files all those "do not edit" entries for annotation/label names, hard-coding them in the templates instead.
- The `values.go` files got simplified as a result.
- The `created-by` annotation was also refactored into a reusable partial. This means we had to add a `partials` dependency to multicluster.
This renames the multicluster annotation prefix from `mirror.linkerd.io` to
`multicluster.linkerd.io` in order to reflect other extension naming patterns.
Additionally, it moves labels only used in the Multicluster extension into their
own labels file—again to reflect other extensions.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* cli: make jaeger and multicluster installs wait for core cp
This PR updates the jaeger and multicluster installs to wait
for the core control-plane to be up before moving to the rendering
logic. This prevents these components from being installed before
the injector is up and running correctly.
`--skip-checks` has been added to jaeger to skip these checks. The same
has not been added to `multicluster`, as its install fails when no core
control plane is present.
This PR also cleans up an extra core control-plane check that we had for
`viz install`.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
We've created a custom domain, `cr.l5d.io`, that redirects to `ghcr.io`
(using `scarf.sh`). This custom domain allows us to swap the underlying
container registry without impacting users. It also provides us with
important metrics about container usage, without collecting PII like IP
addresses.
This change updates our Helm charts and CLIs to reference this custom
domain. The integration test workflow now refers to the new domain,
while the release workflow continues to use the `ghcr.io/linkerd` registry
for the purpose of publishing images.
Problem
If the main Linkerd control plane has been uninstalled, it is no longer possible to uninstall the multicluster extension.
```
$ bin/linkerd mc uninstall | k delete -f -
Error: you need Linkerd to be installed in order to install multicluster addons
Usage:
linkerd multicluster uninstall [flags]
```
Solution
Fetch resources with the label `linkerd.io/extension: linkerd-multicluster` and delete them
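A hedged sketch of the idea, limited to namespaces for brevity (the helper name and scope are illustrative; the real command also covers the other labeled resource kinds):
```go
package uninstall

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteLabeledNamespaces removes every namespace carrying the extension
// label; this works even when the core control plane is already gone.
func deleteLabeledNamespaces(ctx context.Context, client kubernetes.Interface) error {
	opts := metav1.ListOptions{LabelSelector: "linkerd.io/extension=linkerd-multicluster"}
	namespaces, err := client.CoreV1().Namespaces().List(ctx, opts)
	if err != nil {
		return err
	}
	for _, ns := range namespaces.Items {
		if err := client.CoreV1().Namespaces().Delete(ctx, ns.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```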
Closes #5624
Signed-off-by: Piyush Singariya <piyushsingariya@gmail.com>
* values: removal of .global field
Fixes #5425
With the new extension model, we no longer need the `Global` field, as we
don't rely on chart dependencies anymore. This helps us further clean up
Values and makes configuration simpler.
To make upgrades and usage of the new CLI with older configs work, we add
a new method called `config.RemoveGlobalFieldIfPresent` that is used in
the upgrade and `FetchCurrentConfiguration` paths to remove the global
field and attach its child nodes if global is present. This is verified by
the older `TestFetchCurrentConfiguration` test that has the global field.
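A hedged sketch of the idea behind `config.RemoveGlobalFieldIfPresent`, using a plain values map (the real helper may operate on the stored configuration differently, so this is illustrative only):
```go
package main

import "fmt"

// removeGlobalField lifts the children of a "global" key to the top level
// of the values map and drops the key itself, so configs written by older
// versions keep working with the new flattened layout.
func removeGlobalField(values map[string]interface{}) {
	global, ok := values["global"].(map[string]interface{})
	if !ok {
		return // no global field present: nothing to do
	}
	for k, v := range global {
		if _, exists := values[k]; !exists {
			values[k] = v // attach child node at the top level
		}
	}
	delete(values, "global")
}

func main() {
	values := map[string]interface{}{
		"global": map[string]interface{}{"clusterDomain": "cluster.local"},
		"other":  true,
	}
	removeGlobalField(values)
	fmt.Println(values) // map[clusterDomain:cluster.local other:true]
}
```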
We also don't yet remove `.global` in some Helm stable-upgrade tests, so
that the initial install works.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Run extension checks when linkerd check is invoked
This change allows the `linkerd check` command to also run any known
linkerd extension commands that have been installed in the cluster. It
does this by first querying for any namespace that has the label selector
`linkerd.io/extension` and then running the subcommands for either
`jaeger`, `multicluster`, or `viz`. This change runs basic namespace
healthchecks for extensions that aren't part of the Linkerd extension suite.
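As a rough, hedged sketch of that flow (the helper, binary path handling, and error handling are illustrative, not the actual implementation):
```go
package check

import (
	"context"
	"os"
	"os/exec"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// runExtensionChecks finds namespaces labeled as extensions and runs
// `linkerd <extension> check` for each one found.
func runExtensionChecks(ctx context.Context, client kubernetes.Interface, linkerdBin string) error {
	namespaces, err := client.CoreV1().Namespaces().List(ctx, metav1.ListOptions{
		LabelSelector: "linkerd.io/extension",
	})
	if err != nil {
		return err
	}
	for _, ns := range namespaces.Items {
		ext := ns.Labels["linkerd.io/extension"]
		cmd := exec.Command(linkerdBin, ext, "check")
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			return err
		}
	}
	return nil
}
```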
Fixes #5233
(Background information)
In our company we check the sops-encrypted Linkerd manifest into a GitHub
repository, and I came across the following problem.
---
Three dashes mark the start of a YAML document (or the end of the
directives).
https://yaml.org/spec/1.2/spec.html#id2800132
If there are only comments between `---` markers, the document is empty.
Consider a file that includes an empty document at the top:
```yaml
---
# foo
---
apiVersion: v1
kind: Namespace
metadata:
  name: foo
---
# bar
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
```
When we encrypt and decrypt it with [sops](https://github.com/mozilla/sops), the empty document will be
converted to `{}`.
```yaml
{}
---
apiVersion: v1
kind: Namespace
metadata:
  name: foo
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
```
It is invalid as a k8s manifest ([apiVersion not set, kind not set]):
```
error validating data: [apiVersion not set, kind not set]
```
---
I suspect this is (at least partly) sops's problem, but in any case this modification is harmless enough, I think.
Thank you.
Signed-off-by: Takumi Sue <u630868b@alumni.osaka-u.ac.jp>
*Closes #5484*
### Changes
---
*Overview*:
* Update golden files and make necessary spec changes
* Update test files for viz
* Add v1 to healthcheck and uninstall
* Fix link-crd clusterDomain field validation
- To update to v1, I had to change the CRD schemas to be version-based (i.e. each version has to declare its own schema). I noticed an error in the link-crd (`targetClusterDomain` was `targetDomainName`). Additionally, `additionalPrinterColumns` is now a version-dependent field.
- For `admissionregistration` resources I had to add an additional `admissionReviewVersions` field -- I included `v1` and `v1beta1`.
- In `healthcheck.go` and `resources.go` (used by `uninstall`) I had to make some changes to the client-go versions (i.e from `v1beta1` to `v1` for admissionreg and apiextension) so that we don't see any warning messages when uninstalling or when we do any install checks.
I tested against different CLI and k8s versions to have a bit more confidence in the changes (in addition to the automated tests); hope the cases below are enough, if not let me know and I can test further.
### Tests
Linkerd local build CLI + k8s 1.19+
`install/check/mc-check/mc-install/mc-link/viz-install/viz-check/uninstall/`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2+k3s1", GitCommit:"1d4adb0301b9a63ceec8cabb11b309e061f43d5f", GitTreeState:"clean", BuildDate:"2021-01-14T23:52:37Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
$ bin/linkerd version
Client version: git-b0fd2ec8
Server version: unavailable
$ bin/linkerd install | kubectl apply -f -
- no errors, no version warnings -
$ bin/linkerd check --expected-version git-b0fd2ec8
Status check results are :tick:
# MC
$ bin/linkerd mc install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd mc check
Status check results are :tick:
$ bin/linkerd mc link foo | k apply -f - # test crd creation
# had a validation error here because the schema had targetDomainName instead of targetClusterDomain
# changed, rebuilt cli, re-installed mc, tried command again
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created
...
# VIZ
$ bin/linkerd viz install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd viz check
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd uninstall | k delete -f -
- no errors, no version warnings -
```
Linkerd local build CLI + k8s 1.17
`check-pre/install/mc-check/mc-install/mc-link/viz-install/viz-check`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.17-rc1+k3s1", GitCommit:"e8c9484078bc59f2cd04f4018b095407758073f5", GitTreeState:"clean", BuildDate:"2021-01-14T06:20:56Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
$ bin/linkerd version
Client version: git-3d2d4df1 # made changes to link-crd after prev test case
Server version: unavailable
$ bin/linkerd check --pre --expected-version git-3d2d4df1
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd check --expected-version git-3d2d4df1
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd mc install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd mc check
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd mc link --cluster-name foo | k apply -f -
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created
# VIZ
$ bin/linkerd viz install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd viz check
- no errors, no version warnings -
- hangs up indefinitely after linkerd-viz can talk to Kubernetes
```
Linkerd edge (21.1.3) CLI + k8s 1.17 (already installed)
`check`
```
$ linkerd version
Client version: edge-21.1.3
Server version: git-3d2d4df1
$ linkerd check
- no errors -
- warnings: mismatch between cli & control plane, control plane not up to date (both expected) -
Status check results are :tick:
```
Linkerd stable (2.9.2) CLI + k8s 1.17 (already installed)
`check/uninstall`
```
$ linkerd version
Client version: stable-2.9.2
Server version: git-3d2d4df1
$ linkerd check
× control plane ClusterRoles exist
missing ClusterRoles: linkerd-linkerd-tap
see https://linkerd.io/checks/#l5d-existence-cr for hints
Status check results are ×
# viz wasn't installed, hence the error, installing viz didn't help since
# the res is named `viz-tap` now
# moving to uninstall
$ linkerd uninstall | k delete -f -
- no warnings, no errors -
```
_Note_: I used `go test ./cli/cmd/... --generate` which is why there are so many changes 😨
Signed-off-by: Matei David <matei.david.35@gmail.com>
For consistency we rename the extension charts to a common naming scheme:
- linkerd-viz -> linkerd-viz (unchanged)
- jaeger -> linkerd-jaeger
- linkerd2-multicluster -> linkerd-multicluster
- linkerd2-multicluster-link -> linkerd-multicluster-link
We also make the chart files and chart READMEs a bit more uniform.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it is still used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.
* Added chart templates for new viz linkerd-metrics-api pod
* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.
* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).
* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.
* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.
* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.
* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
- Updated `endpoints.go` according to new API interface name.
- Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
- `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
- Added `metrics-api` to list of docker images to build in actions workflows.
- In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).
* Add retry to 'tap API service is running' check
* mc check shouldn't err when viz is not available. Also properly set up the logger in multicluster/cmd/root.go so that it displays messages when --verbose is used
* Introduce OpenAPIV3 validation for CRDs
* Add validation to link crd
* Add validation to sp using kube-gen
* Add openapiv3 under schema fields in specific versions
* Modify fields to rid spec of yaml errors
* Add top level validation for all three CRDs
Signed-off-by: Matei David <matei.david.35@gmail.com>
* multicluster: add helm customization flags
This branch updates the multicluster install flow to use the
helm engine directly instead of our own chart wrapper. This
also adds the helm customization flags.
```bash
tarun in dev in on k3d-deep (default) linkerd2 on tarun/mc-helm-flags [$+?] via v1.15.4
./bin/go-run cli mc install --set namespace=l5d-mc | grep l5d-mc
github.com/linkerd/linkerd2/multicluster/cmd
github.com/linkerd/linkerd2/cli/cmd
name: l5d-mc
namespace: l5d-mc
namespace: l5d-mc
namespace: l5d-mc
mirror.linkerd.io/gateway-identity: linkerd-gateway.l5d-mc.serviceaccount.identity.linkerd.cluster.local
namespace: l5d-mc
namespace: l5d-mc
namespace: l5d-mc
namespace: l5d-mc
namespace: l5d-mc
```
* add customization flags even for link cmd
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Separate observability API
Closes #5312
This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.
- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.proto`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto`, except for the `Version` stuff.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.
The other changes in the go files are just changes in the imports to point to the new protobufs.
Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`
As #5307 and #5293 landed in the same time frame, some of the logic
added in #5307 got lost during the merge (oops, sorry!).
The same logic has been added back. The MC refactor PR #5293 moved all
the logic from `multicluster.go` into command-specific files, losing the
changes added in #5307, while the changes added in
`multicluster/values.go` and the template files still remained.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Currently, each new instance of the `Checker` type has to manually set
all the fields with `NewChecker()`, even though most use-cases are fine
with the defaults.
This branch makes this simpler by using the builder pattern, so that
users of `Checker` can override the defaults with specific field methods
when needed, thus simplifying the code.
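For illustration, a minimal sketch of what such a builder-style API looks like (the field names and methods here are illustrative, not the exact `Checker` API):
```go
package main

import "time"

// Checker holds the settings for a single health check.
type Checker struct {
	description   string
	fatal         bool
	retryDeadline time.Time
}

// NewChecker returns a Checker with sensible defaults (non-fatal, no retry).
func NewChecker(description string) *Checker {
	return &Checker{description: description}
}

// Fatal marks the check as fatal; returning the receiver allows chaining.
func (c *Checker) Fatal() *Checker {
	c.fatal = true
	return c
}

// WithRetryDeadline sets how long the check may be retried.
func (c *Checker) WithRetryDeadline(t time.Time) *Checker {
	c.retryDeadline = t
	return c
}

func main() {
	// Callers only override what they need:
	_ = NewChecker("control plane pods are ready").
		Fatal().
		WithRetryDeadline(time.Now().Add(5 * time.Minute))
}
```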
This also removes some of the methods that were specific to tests,
and replaces them with the currently used ones.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
## What
This change moves the `linkerd check --multicluster` functionality under its
own multicluster subcommand: `linkerd multicluster check`.
There should be no functional changes as a result of this change. `linkerd
check` no longer checks for anything multicluster related and the
`--multicluster` flag has been removed.
## Why
Closes #5208
The bulk of these changes are moving all the multicluster checks from
`pkg/healthcheck` into the multicluster package.
Doing this completely separates it from core Linkerd. It still uses
`pkg/healthcheck` when possible, but anything that is used only by `multicluster
check` has been moved.
**Note that the `kubernetes-api` and `linkerd-existence` checks are run.**
These checks are required for setting up the Linkerd health checker. They set
the health checker's `kubeAPI`, `linkerdConfig`, and `apiClient` fields.
These could be set manually so that the only check the user sees is
`linkerd-multicluster`, but I chose not to do this.
If any of the setting functions errors, it would just tell the user to run
`linkerd check` and ensure the installation is correct. I find the user error
handling to be better by including these required checks since they should be
run in the first place.
## How to test
Installing Linkerd and multicluster should result in a basic check output:
```
$ bin/linkerd install |kubectl apply -f -
..
$ bin/linkerd check
..
$ bin/linkerd multicluster install |kubectl apply -f -
..
$ bin/linkerd multicluster check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-multicluster
--------------------
√ Link CRD exists
Status check results are √
```
After linking a cluster:
```
$ bin/linkerd multicluster check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
* k3d-y
√ remote cluster access credentials are valid
* k3d-y
√ clusters share trust anchors
* k3d-y
√ service mirror controller has required permissions
* k3d-y
√ service mirror controllers are running
* k3d-y
× all gateway mirrors are healthy
probe-gateway-k3d-y.linkerd-multicluster mirrored from cluster [k3d-y] has no endpoints
see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints
Status check results are ×
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Don't swallow error when MC gateway hostname can't be resolved
Ref #5343
When none of the gateway addresses is resolvable, propagate the error as
a retryable error so it gets retried and logged. Don't create the
mirrored resources if there's no success after the retries.
Closes #5348
That chart generates the service mirror resources and related RBAC, but
doesn't generate the credentials secret nor the Link CR which require
go-client logic not available from sheer Helm templates.
This PR stops publishing that chart, and adds a comment to its README
about it.
Original description:
> **Subject**
> Add missing helm values for multicluster setup
>
> **Problem**
> When executing this without the linkerd command the two variables are missing and the rendering will generate empty values.
> This produces the following gateway identity, that is also used in the gateway link command to generate the link crd:
>
> ```
> mirror.linkerd.io/gateway-identity: linkerd-gateway.linkerd-multicluster.serviceaccount.identity..
> ```
>
> **Solution**
> Add the values as defaults to the helm chart values.yaml file. If the cli is used they are overwritten by the following parameters:
> * https://github.com/linkerd/linkerd2/blob/main/cli/cmd/multicluster.go#L197
> * https://github.com/linkerd/linkerd2/blob/main/cli/cmd/multicluster.go#L196
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Co-authored-by: Björn Wenzel <bjoern.wenzel@dbschenker.com>
Fixes #5257
This branch moves the mc charts and CLI-level code to a new top-level
directory. None of the logic is changed.
It also moves some common types into `/pkg` so that they are accessible
both to the main CLI and to extensions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>