Commit Graph

2319 Commits

Author SHA1 Message Date
Joakim Roubert 903fb0fcad
Fix quotes in shellscripts (#4406)
- Add quotes where missing, to handle whitespace & co.
- Use single quotes for non-expansion strings.
- Fix quotes where the current ones would cause errors.

Signed-off-by: Joakim Roubert <joakim.roubert@axis.com>
2020-06-02 16:44:38 -04:00
Alex Leong 5635f7377f
Fix uname flags for darwin in bin/lint (#4490)
The version of `uname` on Darwin doesn't support the `-o` flag, resulting in an error message when running the `bin/lint` script. 

We add an if-branch to short-circuit the `uname -o` call if running on Darwin.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-02 13:02:07 -07:00
Kevin Leimkuhler d7f84e6c7b
Change help text to use source/target terminology in service-mirror and healthchecks (#4524)
Change terminology from local/remote to source/target in service-mirror and
healthchecks help text.

This does not change any variable, function, struct, or field names since
testing is still improving.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-06-02 15:21:52 -04:00
Kevin Leimkuhler 8f6186f9ae
Change help text to use source/target terminology in multicluster CLI (#4523)
Change terminology from local/remote to source/target in `multicluster` CLI help
text.

This does not change any variable, function, struct, or field names since
testing is still improving.

Relevant issue: #4480

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-06-02 12:33:18 -04:00
Oliver Gould d5a6e1a424
Add projector to adopters (#4529)
* Update ADOPTERS.md

Signed-off-by: Jeremy Gordon <jeremy.gordon@gmail.com>
2020-06-02 09:10:32 -05:00
Alex Leong 91a067c924
Rename gateway ports (#4526)
* Rename gateway ports

Signed-off-by: Alex Leong <alex@buoyant.io>

* fmt

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-02 09:08:23 +03:00
Kevin Leimkuhler b4804a0bb5
Format fix (#4525)
Fixes CI failures

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-06-01 18:51:00 -04:00
Alejandro Pedraza e607fc9247
Fetch logs/events when integration test fails, not only for install tests (#4522)
* Fetch logs/events when integration test fails, not only for install tests

## Motivation

Mainly to know what caused containers to not start (or to restart), like in #4285

## Implementation

Followup to #4410, where we fetched unexpected logs/events when a test failed in `test/install_test.go`; now we're expanding that behavior to every integration test.

For that, we replace in each `TestMain()`:

```go
os.Exit(m.Run())
```

with

```go
os.Exit(testutil.Run(m, TestHelper, true))
```

where `testutil.Run()` executes the tests and fetches the logs/events if the tests failed.

Also extracted the log/event fetching and matching into its own separate file.
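
The run-then-fetch behavior described above can be sketched as follows. This is a minimal illustration, not the actual `testutil.Run` implementation: the `runner` interface, `runAndCollect`, and `fakeSuite` are hypothetical names, and the real function takes a test helper rather than a callback.

```go
package main

import "fmt"

// runner is the subset of *testing.M this sketch needs; *testing.M satisfies it.
type runner interface {
	Run() int
}

// runAndCollect is a hypothetical stand-in for testutil.Run: it runs the
// suite and calls fetch to gather logs/events only when the exit code is
// non-zero, then returns that exit code for os.Exit.
func runAndCollect(m runner, fetch func()) int {
	code := m.Run()
	if code != 0 {
		fetch()
	}
	return code
}

// fakeSuite simulates a test suite that exits with a fixed code.
type fakeSuite struct{ code int }

func (f fakeSuite) Run() int { return f.code }

func main() {
	fetched := false
	code := runAndCollect(fakeSuite{code: 1}, func() { fetched = true })
	fmt.Println(code, fetched)
}
```

Accepting an interface instead of `*testing.M` keeps the sketch testable without a real test binary.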

* Appease linter

* For external_issuer_integration_tests controlPlaninstalled wasn't being set
2020-06-01 16:48:55 -05:00
Zahari Dichev 6c3922a7f1
Probe manager simplification (#4510)
There are a few notable things happening in this PR: 

- the probe manager has been decoupled from the cluster_watcher. Now its only responsibility is to watch for mirrored gateways being created and to probe them. This means that probes are initiated for all gateways no matter whether there are mirrored services being paired
- the number of paired services is derived from the existing services in the cluster rather than being published as a metric by the prober
- there are no events being exchanged between the cluster watcher and the probe manager

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-01 14:41:29 -07:00
Alejandro Pedraza 571626d524
CI: properly report errors from commands (#4514)
Failures in `bin/_test-run` from commands different than `go test`
aren't currently properly reported, in part because CI's bash default is
to have `set -e` which terminates the script and just outputs
`##[error]Process completed with exit code 2.` like
[here](https://github.com/linkerd/linkerd2/pull/4496/checks?check_run_id=720720352#step:14:116)

```
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
× no unschedulable pods
    linkerd-controller-6c77c7ffb8-w8wh5: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
    linkerd-destination-6767d88f7f-rcnbq: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
    linkerd-grafana-76c76fcfb9-pdhfb: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
    linkerd-identity-5bcf97d6c8-q6rll: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
    linkerd-prometheus-6b95c56b44-hd9m6: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
    linkerd-proxy-injector-58d794ff9-jf7cj: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
    linkerd-sp-validator-6c5f999bfb-qg252: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
    linkerd-tap-6fdf84fc65-6txvr: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
    linkerd-web-8484fbd867-nm8z2: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
    see https://linkerd.io/checks/#l5d-existence-unschedulable-pods for hints

Status check results are ×
[error]Process completed with exit code 2.
```

I've made the following changes to `bin/_test-run` to generate better
messages and Github annotations when an error occurs:

- Unset `set -e` so that errors don't immediately exit the script and
don't allow us to properly format the errors.
- Removed many of the `exit_on_err` calls after go test calls because
those output enough information already (they were not being used
anyways in CI because of `set -e`). And instead have `run_test` exit
upon a `go test` error.
- Added `exit_on_err` calls right after non-`go-test` commands to
properly report their failure.
- Refactored the `exit_on_err` function so that it generates a Github
error annotation upon failure.
- Removed `trap` in `install_stable`, since the OS should be able to
handle GC for stuff under `/tmp`.

Also, I've changed the exit 2 code from `linkerd check` when it fails,
to exit code 1.
2020-06-01 15:57:33 -05:00
Alex Leong 33bd81692a
Add list of successful gateways in multicluster check (#4516)
Fixes #4478 

We add some additional output text when the "all remote cluster gateways are alive" check succeeds to list the gateways that have been detected as alive.  In order to do this, we have added a `VerboseSuccess` error type.  Even though this type implements the `error` interface, it represents a success which contains additional information to be printed.

Sample output when dead gateways are detected:

```
[...]
√ service mirror controller can access remote clusters
× all remote cluster gateways are alive
    Some gateways are not alive:
	* cluster: [gke], gateway: [linkerd-multicluster/linkerd-gateway]
    see https://linkerd.io/checks/#l5d-multicluster-remote-gateways-alive for hints
√ clusters share trust anchors
```

Sample output when all gateways are alive:

```
[...]
√ service mirror controller can access remote clusters
√ all remote cluster gateways are alive
	* cluster: [gke], gateway: [linkerd-multicluster/linkerd-gateway]
√ clusters share trust anchors
```

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-01 13:57:13 -07:00
Mayank Shah 2f710f48c0
multicluster: normalize nginx configmap naming (#4508)
For the Edge-20.5.6 release notes: Mention under the Helm section that the user might want to manually remove the `nginx-configuration` configmap that is left over after this upgrade.

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2020-06-01 14:55:53 -05:00
Alex Leong 16d2d4bf81
Add multicluster daisy chain check (#4483)
A mirror-service is one that has been created by the mirror service controller and resolves to a gateway in another cluster.  If a mirror service is exported (and thus mirrored into another cluster) this creates a "daisy chain" where requests can come in to the cluster through the local gateway and be immediately sent out of the cluster to a remote gateway.  If the remote gateway is in the source cluster, this can create an infinite loop.

Similarly, if an exported service routes to a mirror service by a traffic split, the same daisy chain effect occurs.

One example where this can come up is with multicluster fail-over.  If both clusters simultaneously fail-over even a portion of their traffic, a loop is created.

We add a check that detects either of the above conditions and warns of the existence of a daisy chain.
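
The two conditions can be sketched as below. The `service` struct and `daisyChains` function are hypothetical simplifications; the real check works on Kubernetes services and traffic split resources rather than on these fields.

```go
package main

import "fmt"

// service is a hypothetical, simplified view of a Kubernetes service.
type service struct {
	name     string
	mirror   bool     // created by the service mirror controller
	exported bool     // mirrored into another cluster
	backends []string // traffic split backends, by service name
}

// daisyChains returns the names of exported services that are mirror
// services themselves or that route to a mirror service via a traffic split.
func daisyChains(services []service) []string {
	mirrors := make(map[string]bool)
	for _, s := range services {
		if s.mirror {
			mirrors[s.name] = true
		}
	}
	var found []string
	for _, s := range services {
		if !s.exported {
			continue
		}
		if s.mirror {
			found = append(found, s.name)
			continue
		}
		for _, b := range s.backends {
			if mirrors[b] {
				found = append(found, s.name)
				break
			}
		}
	}
	return found
}

func main() {
	services := []service{
		{name: "web-gke", mirror: true, exported: true},
		{name: "api", exported: true, backends: []string{"api-gke"}},
		{name: "api-gke", mirror: true},
		{name: "db", exported: true},
	}
	fmt.Println(daisyChains(services))
}
```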

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-01 12:10:59 -07:00
Alex Leong 015d352f34
Fix array handling in bin/fmt (#4489)
Quoting the list of directories passed to `goimports` was causing the list to be interpreted as a single argument which was stopping `bin/fmt` from working.

Instead, use `read` to split the list of directories into an array.

Also fix up incorrect formatting that has crept in while `bin/fmt` has been broken.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-01 12:10:24 -07:00
cpretzer fb18295430
changes for edge-20.5.5 (#4504)
* changes for edge-20.5.5

Signed-off-by: Charles Pretzer <charles@buoyant.io>
2020-05-28 14:49:45 -07:00
Kevin Leimkuhler 8f5ff8d973
Wait for KinD nodes to be ready in CI (#4488)
* Wait for all nodes to be ready in CI

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-05-28 13:56:09 -07:00
Alejandro Pedraza 9a02e0d300
Multicluster Helm templates nits (#4494)
Followup to #4466

Fixed var name in multicluster's chart README.md, and removed duped
namespace yaml in `service-mirror.yaml`
2020-05-28 09:48:51 +03:00
Zahari Dichev 7b46682841
Add allow and link commands (#4466)
This change adds `allow` and `link` commands, effectively enabling a cluster to have more than one set of credentials that allow it to be mirrored.

Fix #4461

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

Co-authored-by: Alex Leong <alex@buoyant.io>
2020-05-27 14:30:55 -07:00
Alejandro Pedraza d4cdd956f5
Use bash shebang instead of sh in bin/root-tag (#4487)
In #4436 `head_root_tag()` was changed to replace `sed` with a
bash-native substitution. This assumes bash is our shell, which is the
case in `bin/_tag.sh` but not in `bin/root-tag` which calls it, and
which has a `sh` shebang that in Ubuntu points to dash instead of bash,
which breaks with the new bash-native substitution. Ergo, I'm making
the bash shebang explicit in this file.
2020-05-27 15:33:54 -05:00
Alejandro Pedraza 1844fd573b
Unhide multicluster command (#4486)
Unhide multicluster command
2020-05-27 14:22:23 -05:00
Tarun Pothulapati cd8ef3880b
Remove proxy.image.version check in templates (#4432)
This check seems redundant, as the values are being populated early. To make the template files cleaner, this is being removed.
2020-05-27 20:32:54 +05:30
Kevin Leimkuhler 4879f07334
cli: rename cluster cli command to multicluster (#4484)
This is @psinghal20's changes in #4462 which is currently failing CI.

Fixes #4456

Description from the original PR:

> This pr renames the `cluster` command in CLI to `multicluster` command. It
> also adds a shorthand `mc` for easy use.
>
> Fixes #4456
>
> Signed-off-by: psinghal20 <psinghal20@gmail.com>

The CI failure doesn't seem to be related to this change, but has only been seen
on forks. Opening this from a non-fork for now to continue investigating.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Co-authored-by: psinghal20 <psinghal20@gmail.com>
2020-05-27 10:39:52 +03:00
Alejandro Pedraza de5b22ffba
Flaky tests: when installation test fails, fetch logs and events (#4410)
* When installation test fails, fetch logs and events

Re #4371

When a test fails in `./test/install_test.go`, trigger the `TestLogs`
and `TestEvents` tests in a separate process in order to output any
unexpected logs/events that might have caused the initial test failure.

For instance, currently we're sporadically experiencing pod restarts.
Instead of ignoring them, this might help provide us with the real
underlying cause.
2020-05-26 16:41:31 -05:00
Arthur Silva Sens bfedcd5485
Added documentation for alpha cli command (#4412)
Added comments to document several methods and structs in the cmd package, based on GoDoc guidelines. Focus on the alpha CLI command.

Signed-off-by: arthursens <arthursens2005@gmail.com>
2020-05-26 13:59:56 -07:00
Tarun Pothulapati a8158dbeac
Add HealthChecks for Tracing Add-On (#4407)
Adds health-checks for tracing add-on, along with a refactor to have safe casts.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-05-26 22:10:23 +05:30
Alex Leong 8b04a657e0
Fix typo in release workflow (#4475)
This should fix the warning in the release action: https://github.com/linkerd/linkerd2/actions/runs/111938670

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-05-26 09:27:25 -07:00
Tarun Pothulapati 555fb14403
separate multi-cluster checks and run after add-ons (#4468)
2020-05-26 12:07:03 +05:30
Oliver Gould 2b8df8076d
proxy: v2.98.0 (#4470)
In some ingress setups, the proxy could be tricked into looping requests
through the outbound proxy. We now detect these loops and fail these
requests with a 502, saving your precious CPU.

---

* outbound: Prevent loops (linkerd/linkerd2-proxy#525)
2020-05-22 09:29:00 -07:00
Zahari Dichev 8fb0ea608a
Skip services that are mirrors of remote ones (#4460)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-22 09:24:59 +03:00
Alex Leong 05b9e4c7d7
edge-20.5.4 (#4463)
* CLI
  * Fixed the display of the meshed pod column for non-selector services in
    `linkerd stat` output
  * Added an `addon-overwrite` upgrade flag which allows users to overwrite the
    existing addon config rather than merging into it
  * Added a `--close-wait-timeout` inject flag which sets the 
    `nf_conntrack_tcp_timeout_close_wait` property which can be used to mitigate
    connection issues with applications that hold half-closed sockets
* Controller
  * Restricted the service-mirror's RBAC permissions so that it no longer is
    able to read secrets in all namespaces
  * Moved many multicluster components into the `linkerd-multicluster` namespace
    by default
  * Added multicluster gateway mirror services to allow multicluster liveness
    probes to work in private networks
  * Fixed an issue where multicluster gateway mirror services could be
    incorrectly deleted during a resync
* Internal
  * Fixed many style issues in build scripts (thanks @joakimr-axis!)
* Helm
  * Added `global.grafanaUrl` variable to allow using an existing Grafana
    installation

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-05-21 16:45:20 -07:00
Kevin Leimkuhler 2e1eb9e2ec
Use bin/kind in CI scripts (#4464)
Create kind clusters using bin script instead of GitHub action

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-05-21 16:22:23 -07:00
Zahari Dichev f7f70690fb
Fix resync bug + service selection annotations (#4453)
This PR addresses two problems: 

- when a resync happens (or the mirror controller is restarted) we incorrectly classify the remote gateway as a mirrored service that is not mirrored anymore and we delete it
- when updating services due to a gateway update, we need to select only the services for the particular cluster

The latter fixes #4451
2020-05-21 14:15:13 -07:00
Alex Leong acacf2e023
Add --close-wait-timeout inject flag (#4409)
Depends on https://github.com/linkerd/linkerd2-proxy-init/pull/10

Fixes #4276 

We add a `--close-wait-timeout` inject flag which configures the proxy-init container to run with `privileged: true` and to set `nf_conntrack_tcp_timeout_close_wait`. 

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-05-21 14:14:14 -07:00
Tarun Pothulapati 0c53760094
update golden files with new grafana.image field format (#4455)
2020-05-21 23:05:04 +05:30
Tarun Pothulapati bd60c90e5d
Add addon-overwrite flag (#4377)
provide an `addon-overwrite` flag for upgrades to skip `linkerd-config-addons` and use `--addon-overwrite` if passed or defaults
2020-05-21 21:01:41 +05:30
Tarun Pothulapati 3473db32f8
use "/" as the FS is virtualised (#4443)
replacing `filepath.Join` in the install path in the CLI, as the fs is virtualized
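
A minimal sketch of why this matters: `filepath.Join` uses the OS-specific separator (a backslash on Windows), while a virtual file system is typically keyed by forward-slash paths. One option, assumed here and not necessarily what this change does, is the standard `path` package, which always joins with "/". The `templatePath` helper is hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// templatePath builds a key into a virtual file system. It is a hypothetical
// helper: path.Join always uses "/" regardless of the host OS.
func templatePath(elem ...string) string {
	return path.Join(elem...)
}

func main() {
	fmt.Println(templatePath("templates", "namespace.yaml"))
}
```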
2020-05-21 10:25:14 +05:30
Joakim Roubert 6b36934143
markdownlint: Use /bin/sh instead of /bin/bash (#4447)
The nice and clean markdownlint scripts use no bash-specific
functionality. Hence they could be run with /bin/sh instead. On e.g.
Debian-based systems /bin/sh is dash which has 1/10 of bash's footprint.

Signed-off-by: Joakim Roubert <joakimr@axis.com>
2020-05-20 16:36:53 -07:00
Joakim Roubert 5c104ebec6
Run shellcheck for all shell scripts in repository (#4441)
* Run shellcheck for all shell scripts in repository

Update the shellcheck command in static_checks.yml to not only scan the
contents of ./bin, but search for all files with mimetype
text/x-shellscript and feed them to shellcheck.

Certainly, this is a tad more time consuming than just scanning one
directory, but still a quite fast thing to do while it prevents any
new scripts to fly under the radar.

(Also, there is no need to exclude *.nuspec or *.ps1 from the find
command as they do not have the text/x-shellscript mimetype.)

Change-Id: I7433d231e8a315df65c03ee8765914e782057343
Signed-off-by: Joakim Roubert <joakimr@axis.com>

* Updates after review comment

Move shellcheck of all scripts to own script that is then called by
static_checks.yml as suggested by @kleimkuhler.
Also updated sources for helm-build and kind-load so that the
new shellcheck-all script can be called from any directory.

Change-Id: I9e82230459cb843c4143ec979c93060f424baed8
Signed-off-by: Joakim Roubert <joakim.roubert@axis.com>
2020-05-20 14:08:45 -07:00
Zahari Dichev 3a3e407848
Tweak check hint anchors (#4449)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-20 23:17:51 +03:00
Alejandro Pedraza 301429ea9b
Bump KinD to 0.8.1 (#4445)
* Bump KinD to 0.8.1

This brings us K8s 1.18, which is in theory passing all the integration
tests. Currently the tracing one is failing just because of the quay.io
downtime, that hosts the nginx-ingress image.

Re #4382
2020-05-20 14:46:05 -05:00
Alex Leong 9cd4557644
Properly show the meshed count for non-selector services (#4446)
When viewing the output of `linkerd stat` for services which do not have a selector (such as services created by the service-mirror, for example) the meshed count column shows the total number which exist, even though the service actually selects no pods at all.

We update the StatSummary implementation to account for services which have no selector.

Additionally, we update the logic of the `--unmeshed` flag.  When the `--unmeshed` flag is not set, we typically skip rows for unmeshed resources because those resources would have no stats.  This is not appropriate to do when the `--from` flag is also set because in this case, metrics are not collected on the target resource but are instead collected on the client-side.  This means that stats can be present, even for unmeshed resources and these resources should still be displayed, even if the `--unmeshed` flag is not set.
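
The row-filtering rule described above can be sketched as a single predicate. The `includeRow` function and its parameters are hypothetical names for illustration; the real StatSummary code is structured differently.

```go
package main

import "fmt"

// includeRow reports whether a resource row should be displayed.
// Meshed resources are always shown. Unmeshed resources are shown when
// --unmeshed is set, or when --from is set, because in that case stats are
// collected on the client side and may exist for unmeshed targets.
func includeRow(meshedPods int, unmeshedFlag, fromFlag bool) bool {
	if meshedPods > 0 {
		return true
	}
	return unmeshedFlag || fromFlag
}

func main() {
	fmt.Println(includeRow(0, false, false))
	fmt.Println(includeRow(0, false, true))
	fmt.Println(includeRow(0, true, false))
	fmt.Println(includeRow(3, false, false))
}
```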

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-05-20 10:08:27 -07:00
Tarun Pothulapati be664571c1
Separate grafana image tag in template (#4395)
Separates grafana image field into image.name, image.version and also moves controllerImageVersion to global
2020-05-20 22:27:19 +05:30
Joakim Roubert 960ce556ba
bin/_log.sh: Add shebang to please shellcheck (#4437)
Signed-off-by: Joakim Roubert <joakimr@axis.com>
2020-05-20 09:55:51 -07:00
Zahari Dichev 31e33d18d3
Enable service mirroring to work in private networks (#4440)
This change creates a gateway proxy for every gateway. This enables the probe worker to leverage the destination service functionality in order to discover the identity of the gateway.

Fix #4411

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-20 19:48:36 +03:00
Joakim Roubert de1b5d5a81
install-cni.sh: Fix shellcheck issues (#4405)
Where cat and echo are actually not needed, they have been removed.

Signed-off-by: Joakim Roubert <joakim.roubert@axis.com>
2020-05-20 09:29:14 -07:00
Zahari Dichev 6574f124a7
Restrict Service mirror RBACs (#4426)
This PR introduces a few changes that were requested after a bit of service mirror reviewing.

- we restrict the RBACs so the service mirror controller cannot read secrets in all namespaces but only in the one that it is installed in
- we unify the namespace namings so all multicluster resources are installed in `linkerd-multicluster` on both clusters
- fixed checks to account for changes

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-20 17:08:01 +03:00
Joakim Roubert ef67cbed38
bin/lint: Fix shellcheck issue (#4434)
Delete variable `os` that is not used. The golangci-lint downloader script does its own extensive platform lookup before downloading the selected binary.

Signed-off-by: Joakim Roubert <joakimr@axis.com>
2020-05-19 23:23:25 -07:00
Kevin Leimkuhler d99c1486ba
Lint all markdown files in CI (#4402)
## Motivation

linkerd/rfc#22

## Solution

Use the [markdown-lint-action](https://github.com/marketplace/actions/markdown-linting-action) to lint all `.md` files for all pull requests
and pushes to master.

This action uses the default rules outlined in [markdownlint
package](https://github.com/DavidAnson/markdownlint/blob/master/doc/Rules.md).

The additional rules that are added are explained below:
- Ignore line length lints for code blocks
- Ignore line length lints for tables
- Allow duplicate sub-headers in sibling headers (e.g. allowing multiple ##
  Significant headers in `CHANGES.md` as long as they are part of separate
  release headers)

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-05-19 23:03:50 -07:00
Joakim Roubert 30ba9a1261
bin/fmt: Fix shellcheck issue (#4438)
Signed-off-by: Joakim Roubert <joakimr@axis.com>
2020-05-19 14:49:28 -07:00
Joakim Roubert 6f1654a65d
bin/_tag.sh: Fix shellcheck issues (#4436)
Signed-off-by: Joakim Roubert <joakimr@axis.com>
2020-05-19 14:49:07 -07:00