Commit Graph

54 Commits

Author SHA1 Message Date
Alejandro Pedraza 59eca5bb82
Use full-length actions SHAs in CI (#5668)
GitHub now requires that actions pinned to a SHA use the full-length SHA,
otherwise CI throws a warning like:
```
actions/checkout@722adc6 looks like the shortened version of a commit
SHA. Referencing actions by the short SHA will be disabled soon. Please
see
https://docs.github.com/en/actions/learn-github-actions/security-hardening-for-github-actions#using-third-party-actions.
```
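
For illustration, a minimal sketch of a pinned step; the SHA below is a placeholder, not an actual `actions/checkout` release commit:

```yaml
steps:
  - name: Checkout
    # Pin to the full 40-character commit SHA rather than a short SHA or a
    # mutable tag; the trailing comment records the human-readable version.
    uses: actions/checkout@1234567890abcdef1234567890abcdef12345678 # v2.x (placeholder)
```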
2021-02-04 16:51:53 -05:00
Alejandro Pedraza 8ac5360041
Extract from public-api all the Prometheus dependencies, and move things into a new viz component 'linkerd-metrics-api' (#5560)
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it is still used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.

* Added chart templates for new viz linkerd-metrics-api pod

* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz's healthcheck can now be used to retrieve viz API clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.

* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TODO: have it automatically discover viz and pull Prometheus' endpoint (#5352).

* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying HTTP server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.

* Added linkerd-metrics-api bootstrap files. At the same time, the Prometheus parameters and other no-longer-relevant bits were stripped out of the public-api's `main.go` file.

* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.

* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
  - Updated `endpoints.go` according to new API interface name.
  - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
  - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
  - Added `metrics-api` to list of docker images to build in actions workflows.
  - In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).

* Add retry to 'tap API service is running' check

* mc check shouldn't err when viz is not available. Also properly set the logger in multicluster/cmd/root.go so that messages are displayed when --verbose is used
2021-01-21 18:26:38 -05:00
Matei David c63fbdf0e4
Introduce OpenAPIV3 validation for CRDs (#5573)
* Introduce OpenAPIV3 validation for CRDs

* Add validation to link crd
* Add validation to sp using kube-gen
* Add openapiv3 under schema fields in specific versions
* Modify fields to rid spec of yaml errors
* Add top level validation for all three CRDs
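
As a rough sketch of the mechanism (this is not the exact Link, ServiceProfile, or TrafficSplit schema), a per-version `openAPIV3Schema` block on a CRD looks like:

```yaml
# Illustrative only; the field list is a simplified subset.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: links.multicluster.linkerd.io
spec:
  group: multicluster.linkerd.io
  scope: Namespaced
  names:
    kind: Link
    plural: links
    singular: link
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                targetClusterName:
                  type: string
```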

Signed-off-by: Matei David <matei.david.35@gmail.com>
2021-01-21 11:56:28 -05:00
Kevin Leimkuhler 308a1f3ff3
Use linkerd path in test-cleanup (#5498)
## What this fixes

When clusters are cleaned up after tests in CI, the `bin/test-cleanup` script is
responsible for clearing the cluster of all testing resources.

Right now this does not work as expected because the script uses the `linkerd`
binary instead of the Linkerd path that is passed in to the `tests` script.

There are cases where different binaries have different uninstall behavior and
the script can complete with an incomplete uninstallation.

## How it fixes

`test-cleanup` now takes a linkerd path argument. This is used to specify the
Linkerd binary that should be used when running the `uninstall` commands.

This value is passed through from the `tests` invocation which means that in CI,
the same binary is used for running tests as well as cleaning up the cluster.

Additionally, specifying the k8s context has now moved from an argument to the
`--context` flag. This is similar to how the `tests` script works because it's not
always required.

## How to use

Shown here:

```
$ bin/test-cleanup -h
Cleanup Linkerd integration tests.

Usage:
    test-cleanup [--context k8s_context] /path/to/linkerd

Examples:
    # Cleanup tests in non-default context
    test-cleanup --context k8s_context /path/to/linkerd

Available Commands:
    --context: use a non-default k8s context
```

## edge-21.1.1

This edge release introduces a new "opaque transport" feature that allows the
proxy to securely transport server-speaks-first and otherwise opaque TCP
traffic. Using the `config.linkerd.io/opaque-ports` annotation on pods and
namespaces, users can configure ports that should skip the proxy's protocol
detection.
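
As a sketch (port value illustrative), the annotation on a namespace looks like:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: mysql
  annotations:
    # Skip protocol detection and treat traffic on port 3306 as opaque TCP.
    config.linkerd.io/opaque-ports: "3306"
```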

Additionally, a new `linkerd-viz` extension has been introduced that separates
the installation of the Grafana, Prometheus, web, and tap components. This
extension closely follows the Jaeger and multicluster extensions; users can
`install` and `uninstall` with the `linkerd viz ..` command as well as configure
for HA with the `--ha` flag.

The `linkerd viz install` command does not have any CLI flags to customize the
install directly, but instead follows the Helm way of customization by using
flags such as `set`, `set-string`, `values`, and `set-files`.

Finally, a new `/shutdown` admin endpoint that may only be accessed over the
loopback network has been added. This allows batch jobs to gracefully terminate
the proxy on completion. The `linkerd-await` utility can be used to automate
this.

* Added a new `linkerd multicluster check` command to validate that the
  `linkerd-multicluster` extension is working correctly
* Fixed description in the `linkerd edges` command (thanks @jsoref!)
* Moved the Grafana, Prometheus, web, and tap components into a new Viz chart,
  following the same extension model that multicluster and Jaeger follow
* Introduced a new "opaque transport" feature that allows the proxy to securely
  transport server-speaks-first and otherwise opaque TCP traffic
* Removed the check comparing the `ca.crt` field in the identity issuer secret
  and the trust anchors in the Linkerd config; these values being different is
  not a failure case for the `linkerd check` command (thanks @cypherfox!)
* Removed the Prometheus check from the `linkerd check` command since it now
  depends on a component that is installed with the Viz extension
* Fixed error messages thrown by the cert checks in `linkerd check` (thanks
  @pradeepnnv!)
* Added PodDisruptionBudgets to the control plane components so that they cannot
  all be terminated at the same time during disruptions (thanks @tustvold!)
* Fixed an issue that displayed the wrong `linkerd.io/proxy-version` when it is
  overridden by annotations (thanks @mateiidavid!)
* Added support for custom registries in the `linkerd-viz` helm chart (thanks
  @jimil749!)
* Renamed `proxy-mutator` to `jaeger-injector` in the `linkerd-jaeger` extension
* Added a new `/shutdown` admin endpoint that may only be accessed over the
  loopback network allowing batch jobs to gracefully terminate the proxy on
  completion
* Introduced the `linkerd identity` command, used to fetch the TLS certificates
  for injected pods (thanks @jimil749)
* Fixed an issue with the CNI plugin where it was incorrectly terminating and
  emitting error events (thanks @mhulscher!)
* Re-added support for non-LoadBalancer service types in the
  `linkerd-multicluster` extension

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-08 15:24:14 -05:00
Kevin Leimkuhler dd837be375
Build jaeger-webhook in release CI (#5381)
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-14 11:42:51 -05:00
Alejandro Pedraza deca7ede08
Consolidate integration tests under k3d (#5245)
* Consolidate integration tests under k3d

Fixes #5007

Simplified integration tests by moving all to k3d. Previously things were running in Kind, except for the multicluster tests, which implied some extra complexity in the supporting scripts.

Removed the KinD config files under `test/integration/configs`, as config is now passed as flags into the `k3d` command.

Also renamed `kind_integration.yml` to `integration_tests.yml`

Test skipping logic under ARM was also simplified.
2020-11-18 14:33:16 -05:00
Alex Leong 1a91f6b0df
Increase ARM integration test timeout (#5222)
The ARM integration tests take a very long time to run for some reason.  For example, in the stable-2.9.0 release, they took 
38 minutes.  Thus, this test needs a longer timeout.

Increase the ARM integration test timeout from 30 minutes to 60 minutes.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-11-12 13:57:28 -08:00
Oliver Gould 440a201997
actions: Limit job runtime to <= 30 minutes (#5216)
The default job timeout is 6 hours! This allows runaway builds to
consume our actions resources unnecessarily.

This change limits integration test jobs to 30 minutes. Static checks
are limited to 10 minutes.
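
A sketch of how the limit is expressed on a job (job name and step illustrative):

```yaml
jobs:
  integration_tests:
    runs-on: ubuntu-latest
    # Cap the job so a runaway build cannot hold a runner for the default 6 hours.
    timeout-minutes: 30
    steps:
      - name: Run integration tests
        run: bin/tests "$PWD/bin/linkerd"
```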
2020-11-12 08:30:02 -08:00
Alex Leong 15bd95ee1d
Trigger ARM int tests for edge releases as well (#5073) (#5120)
Used to be triggered only for stable releases, but now that 2.9 stable
approaches let's turn it on for the upcoming RCs.

Signed-off-by: Alex Leong <alex@buoyant.io>

Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2020-10-21 14:29:13 -07:00
Alex Leong 827646a3e1
Revert "Trigger ARM int tests for edge releases as well" (#5087)
This reverts commit 85cbcb4a85.

We disable the ARM integration tests for now until we have more confidence in them.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-10-15 09:28:48 -07:00
Alex Leong fbc405d5b4
Fix incorrect usage of --skip-kind-create flag (#5084)
The release workflow uses the `-skip-kind-create` flag when the flag is actually called `-skip-cluster-create`.  This causes the workflow to fail.

We correct the flag name.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-10-14 11:30:27 -07:00
Alejandro Pedraza 865e140be9
Trigger ARM int tests for edge releases as well (#5073)
Used to be triggered only for stable releases, but now that 2.9 stable
approaches let's turn it on for the upcoming RCs.
2020-10-14 10:54:07 -05:00
Alejandro Pedraza 3af25fa886
Fix how env vars are set in CI (#5054)
Replaced `set-env` directives with environment files, as explained
[here](https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/)

This gets rid of warnings of the sort:
```
The `set-env` command is deprecated and will be disabled soon. Please
upgrade to using Environment Files. For more information see:
https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/
```
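
A minimal sketch of the replacement pattern (step name and value illustrative):

```yaml
steps:
  - name: Set tag
    # Old, deprecated form:  echo "::set-env name=TAG::some-value"
    # New form appends the variable to the environment file instead:
    run: echo "TAG=some-value" >> "$GITHUB_ENV"
```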
2020-10-09 19:24:41 -07:00
Alejandro Pedraza e1772ae183
Fixed releases.yaml by pulling images directly from ghcr.io (#5035)
Previously, `releases.yaml` was trying to load images into the kind
clusters, but that failed because those images were already in `ghcr.io`
and not in the local docker cache; that failure, however, was masked.
Unmasking it revealed some flaws that this change addresses:

- In `bin/_test_helpers` (used by `bin/tests`), modified the `images`
arg to accept `docker(default)|archive|skip`, for determining how to
load the images into the cluster (if loading them at all)
- In `bin/image-load`, changed arg `images` to `archive` which is more
descriptive.
- Have `kind_integration.yml` call `bin/tests --images archive`.
- Have `release.yml` call `bin/tests --images skip`.
2020-10-02 08:05:17 -05:00
Tarun Pothulapati ecce5b91f6
tests: Add Calico CNI deep integration tests (#4952)
* tests: Add new CNI deep integration tests

Fixes #3944

This PR adds a new test, called cni-calico-deep, which installs the Linkerd CNI
plugin on top of a cluster with Calico and runs the current integration tests on top, thus
validating various Linkerd features when CNI is enabled. For Calico
to work, special config is required for kind, which lives in `cni-calico.yaml`.

This is different from the CNI integration tests that we run in
cloud integration, which only perform the CNI-level integration tests.
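
A sketch of the kind of cluster config this requires (whether `cni-calico.yaml` matches this exactly is an assumption):

```yaml
# Illustrative kind config: disable the default CNI so Calico can be installed,
# with the Linkerd CNI plugin layered on top of it.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "192.168.0.0/16"
```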

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-09-23 19:58:28 +05:30
Alejandro Pedraza d6bcd1e906
Only run the ARM integration tests for stable releases (#4986) 2020-09-21 09:12:00 -05:00
Alejandro Pedraza 68582c5f5b
Do not run cloud integration tests in CI (#4969)
* Do not run cloud integration tests in CI

Closes #4963

Removed the `./.github/workflows/cloud_integration.yml` workflow, and
removed the `cloud_integration_tests` job from the `./.github/workflows/release.yml` workflow.
2020-09-16 09:36:04 -05:00
Alejandro Pedraza ccf027c051
Push docker images to ghcr.io instead of gcr.io (#4953)
* Push docker images to ghcr.io instead of gcr.io

The `cloud_integration.yml` and `release.yml` workflows were modified to
log into ghcr.io, and remove the `Configure gcloud` step which is no
longer necessary.
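
A sketch of the login step (secret name is a placeholder; the actual workflow may use a dedicated login action instead of a raw `docker login`):

```yaml
steps:
  - name: Log in to ghcr.io
    # GH_REGISTRY_TOKEN is a hypothetical name for whatever credential the workflow uses.
    run: |
      echo "${{ secrets.GH_REGISTRY_TOKEN }}" | docker login ghcr.io \
        -u "${{ github.actor }}" --password-stdin
```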

Note that besides the changes to `cloud_integration.yml` and `release.yml`, there was a change to the upgrade-stable integration test so that we run `linkerd upgrade --addon-overwrite` to reset the add-on settings, because in stable-2.8.1 the Grafana image was pegged to `gcr.io/linkerd-io/grafana` in `linkerd-config-addons`. This will need to be mentioned in the 2.9 upgrade notes.

Also, the egress integration test has a debug container that is now pegged to the edge-20.9.2 tag.

Besides that, the other changes are just a global search and replace (s/gcr.io\/linkerd-io/ghcr.io\/linkerd/).
2020-09-10 15:16:24 -05:00
Ali Ariff 5186383c81
Add ARM64 Integration Test (#4897)
* Add ARM64 Integration Test

Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
2020-08-28 10:38:40 -07:00
Alejandro Pedraza ac2bfb387b
Remove no longer needed GITCOOKIE_SH in CI (#4881)
Removed usage of `GITCOOKIE_SH`, which was a script stored in a secret
to authenticate requests against googlesource.com, to avoid hitting
rate limits when pulling go dependencies from that source. Now that we
use go modules, deps are pulled from http://proxy.golang.org/ and this
is no longer needed.
2020-08-17 12:36:07 -07:00
Ali Ariff ae8bb0e26e
Release ARM CLI artifacts (#4841)
* When releasing, build and upload the amd64, arm64 and arm architecture builds of the CLI
* Refactored `Dockerfile-bin` so it has separate stages for single and multi arch builds. The latter stage is only used for releases.

Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
2020-08-11 09:25:58 -05:00
Ali Ariff 61d7dedd98
Build ARM docker images (#4794)
Build ARM docker images in the release workflow.

# Changes:
- Add new env keys `DOCKER_MULTIARCH` and `DOCKER_PUSH`. When set, multi-arch images will be built and pushed to the registry. See https://github.com/docker/buildx/issues/59 for why they must be pushed to the registry.
- Usage of `crazy-max/ghaction-docker-buildx` is necessary as it is already configured with the ability to perform cross-compilation (using QEMU), so we can just use it instead of setting it up manually.
- Usage of `buildx` now makes the automatic platform args available in the global scope. (See: https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope)

# Follow-up:
- Releasing the CLI binary for the ARM architectures. The docker images resulting from these changes are already built for ARM; we still need further adjustments to retrieve those binaries and name them correctly as part of the GitHub Release artifacts.
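
A sketch of the buildx invocation this enables (image name, tag, and platform list illustrative):

```yaml
env:
  DOCKER_MULTIARCH: 1
  DOCKER_PUSH: 1
steps:
  - name: Build and push multi-arch image
    # Multi-arch images must be pushed rather than loaded locally
    # (see the buildx issue linked above).
    run: |
      docker buildx build \
        --platform linux/amd64,linux/arm64,linux/arm/v7 \
        --push \
        -t ghcr.io/linkerd/proxy:example-tag .
```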

Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
2020-08-05 11:14:01 -07:00
Alejandro Pedraza a1be60aea1
Reenable `upgrade-edge` integration test (#4821)
Followup to #4797

That test was temporarily disabled until the prometheus check in
`linkerd check` got fixed in #4797 and made it into edge-20.7.5
2020-07-31 12:11:32 -05:00
Alejandro Pedraza 2aea2221ed
Fixed `linkerd check` not finding Prometheus (#4797)
* Fixed `linkerd check` not finding Prometheus

## The Problem

`linkerd check` run right after install is failing because it can't find the Prometheus Pod.

## The Cause

The "control plane pods are ready" check used to verify the existence of all the control plane pods, blocking until all the pods were ready.

Since #4724, Prometheus is no longer included in that check because it's checked separately as an add-on. An unintended consequence is that when the ensuing "control plane self-check" is triggered, Prometheus might not be ready yet and the check fails because it doesn't do retries.

## The Fix

The "control plane self-check" uses a gRPC call (it's the only check that does that) and those weren't designed with retries in mind.

This PR adds retry functionality to the `runCheckRPC()` function, making sure the final output remains the same

It also temporarily disables the `upgrade-edge` integration test because after installing edge-20.7.4 `linkerd check` will fail because of this.
2020-07-27 11:54:03 -05:00
Alejandro Pedraza 5e789ba152
Migrate CI to docker buildx and other improvements (#4765)
* Migrate CI to docker buildx and other improvements

## Motivation
- Improve build times in forks, especially when rerunning builds because of some flaky test.
- Start using `docker buildx` to pave the way for multiplatform builds.

## Performance improvements
These timings were taken for the `kind_integration.yml` workflow when we merged and reran the lodash bump PR (#4762).

Before these improvements:
- when merging: `24:18`
- when rerunning after merge (docker cache warm): `19:00`
- when running the same changes in a fork (no docker cache): `32:15`

After these improvements:
- when merging: `25:38`
- when rerunning after merge (docker cache warm): `19:25`
- when running the same changes in a fork (docker cache warm): `19:25`

As explained below, non-forks and forks now use the same cache, so the important takeaway is that forks will always start with a warm cache and we'll no longer see long build times like the `32:15` above.
The downside is a slight increase in the build times for non-forks (up to a little more than a minute, depending on the case).

## Build containers in parallel
The `docker_build` job in the `kind_integration.yml`, `cloud_integration.yml` and `release.yml` workflows relied on running `bin/docker-build` which builds all the containers in sequence. Now each container is built in parallel using a matrix strategy.
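
A sketch of that matrix (target names and script names illustrative):

```yaml
jobs:
  docker_build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Abbreviated, illustrative list; each image now builds in its own parallel job.
        target: [controller, proxy, web, cni-plugin]
    steps:
      - name: Build ${{ matrix.target }} image
        run: bin/docker-build-${{ matrix.target }}
```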

## New caching strategy
CI now uses `docker buildx` for building the container images, which allows using an external cache source for builds, a location in the filesystem in this case. That location gets cached using actions/cache, using the key `{{ runner.os }}-buildx-${{ matrix.target }}-${{ env.TAG }}` and the restore key `${{ runner.os }}-buildx-${{ matrix.target }}-`.

For example when building the `web` container, its image and all the intermediary layers get cached under the key `Linux-buildx-web-git-abc0123`. When that has been cached in the `main` branch, that cache will be available to all the child branches, including forks. If a new branch in a fork asks for a key like `Linux-buildx-web-git-def456`, the key won't be found during the first CI run, but the system falls back to the key `Linux-buildx-web-git-abc0123` from `main` and so the build will start with a warm cache (more info about how keys are matched in the [actions/cache docs](https://docs.github.com/en/actions/configuring-and-managing-workflows/caching-dependencies-to-speed-up-workflows#matching-a-cache-key)).
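
A sketch of the caching arrangement, following the keys described above (step layout and cache path are assumptions):

```yaml
steps:
  - name: Cache buildx layers
    uses: actions/cache@v2 # pinned to a full-length SHA in the real workflows
    with:
      path: /tmp/buildx-cache
      key: ${{ runner.os }}-buildx-${{ matrix.target }}-${{ env.TAG }}
      restore-keys: |
        ${{ runner.os }}-buildx-${{ matrix.target }}-
  - name: Build with external cache
    run: |
      docker buildx build \
        --cache-from type=local,src=/tmp/buildx-cache \
        --cache-to type=local,dest=/tmp/buildx-cache \
        -t ghcr.io/linkerd/${{ matrix.target }}:${{ env.TAG }} .
```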

## Packet host no longer needed
To benefit from the warm caches both in non-forks and forks as just explained, we had to stop doing the builds in Packet; now everything runs in the GitHub runner VMs.
As a result there's no longer separate logic for non-forks and forks in the workflow files; `kind_integration.yml` was greatly simplified but `cloud_integration.yml` and `release.yml` got a little bigger in order to use the actions artifacts as a repository for the images built. This bloat will be fixed when support for [composite actions](https://github.com/actions/runner/blob/users/ethanchewy/compositeADR/docs/adrs/0549-composite-run-steps.md) lands in GitHub.

## Local builds
You are still able to run `bin/docker-build` or any of the `docker-build.*` scripts. To make use of buildx, run those same scripts with the env var `DOCKER_BUILDKIT=1` set. Using buildx requires having it installed, as instructed [here](https://github.com/docker/buildx).

## Other
- A new script `bin/docker-cache-prune` is used to remove unused images from the cache. Without that the cache grows constantly and we can rapidly hit the 5GB limit (when the limit is attained the oldest entries get evicted).
- The `go-deps` dockerfile base image was changed from `golang:1.14.2` (ubuntu based) to `golang:1.14.2-alpine`, also to conserve cache space.

# Addressed separately in #4875:

Got rid of the `go-deps` image and instead added something similar on top of all the Dockerfiles dealing with `go`, as a first stage for those Dockerfiles. That continues to serve as a way to pre-populate go's build cache, which speeds up the builds in the subsequent stages. That build should in theory be rebuilt automatically only when `go.mod` or `go.sum` change, and now we don't require running `bin/update-go-deps-shas`. That script was removed along with all the logic elsewhere that used it, including the `go_dependencies` job in the `static_checks.yml` github workflow.

The list of modules preinstalled was moved from `Dockerfile-go-deps` to a new script `bin/install-deps`. I couldn't find a way to generate that list dynamically, so whenever a slow-to-compile dependency is found, we have to make sure it's included in that list.

Although this simplifies the dev workflow, note that the real motivation behind this was a limitation in buildx's `docker-container` driver that forbids us from depending on images that haven't been pushed to a registry, so we have to resort to building the dependencies as a first stage in the Dockerfiles.
2020-07-22 14:27:45 -05:00
Andrew Seigner 8773416496
Fix build status badge in README (#4769)
The GitHub Actions build status badge was referencing an old workflow
named `CI`, which always shows red, and is no longer used.

Fix the build status badge to reference the `Release` workflow.

Also slightly reformat the matrix build yaml, as the list has grown a
bit.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2020-07-20 17:18:12 -07:00
Alejandro Pedraza 873bd61324
Helm integration deep tests (#4728)
This creates a new integration test target that launches the deep suite,
using a linkerd instance installed through Helm.

I've added a `global.proxyInit.ignoreInboundPorts=1234,5678` override
during install and enhanced the injection test to catch problems like
what we saw in #4679.
2020-07-10 14:48:49 -05:00
Alejandro Pedraza a30213b709
Increased node count for the GKE tests in release.yml (#4748)
Followup to #4746. Tests were running out of resources.
2020-07-10 11:05:12 -05:00
Alejandro Pedraza 9908b2b8b2
Re-enable custom domain integration test (#4722)
The function triggering the test for k8s custom cluster domain was
misnamed, and thus the test wasn't being run.

This also adds some extra error handling to catch this and other
potential issues.
2020-07-07 16:27:46 -05:00
Tarun Pothulapati cf34a14985
Add a Windows Linkerd cli Test (#4653)
This PR adds a new CLI test to check that installation YAMLs are correctly
generated even on Windows. This is important because of the file path
differences between Windows and Linux; if any code uses the wrong path
format, the chart generation commands might fail on Windows.

This creates a separate workflow for both release and integration.

Also, all the existing integration tests are moved into
/tests/integration to separate them from /test/cli, as this test does not fall
under the integration tests category.
2020-07-02 23:13:57 +05:30
Kevin Leimkuhler 29bcb57de4
Update release workflow kind integration tests (#4668)
## Description

As discussed [here](https://github.com/linkerd/linkerd2/pull/4653#discussion_r445543061), the `kind_integration` job of the release workflow was not kept in sync with the changes made in #4593.

Until GitHub actions can reuse yaml for separate workflows, these sections are supposed to be kept in sync.

This would be an issue if we had tried doing a release since #4593 merged, but that has not happened yet.

## Changes

This updates the release workflow `kind_integration` job to use the new test interface, mainly removing cluster creation and image loading as necessary prerequisites.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-06-25 13:01:04 -04:00
Kevin Leimkuhler 4372ed56dd
Isolate tests by cluster and make run interface simpler (#4593)
## Summary

Change the default behavior of integration tests to be isolated by cluster.
Additionally, make running one or all tests easier than the current process.

These changes are explained more in the [Testing
RFC](https://github.com/linkerd/rfc/blob/master/design/0004-isolated-integration-tests.md)

## Changes

This is a script used only by Linkerd developers, but there are a lot of useful
usage examples and explanations in the `bin/tests --help` output:

```
Run Linkerd integration tests.

Optionally specify one of the following tests: [upgrade helm helm-upgrade uninstall deep external-issuer]

Usage:
    tests [--images] [--images-host ssh://linkerd-docker] [--name test-name] [--skip-kind-create] /path/to/linkerd

Examples:
    # Run all tests in isolated clusters
    tests /path/to/linkerd

    # Run single test in isolated clusters
    tests --name test-name /path/to/linkerd

    # Skip KinD cluster creation and run all tests in default cluster context
    tests --skip-kind-create /path/to/linkerd

    # Load images from tar files located under the 'image-archives' directory
    # Note: This is primarily for CI
    tests --images /path/to/linkerd

    # Retrieve images from a remote docker instance and then load them into KinD
    # Note: This is primarily for CI
    tests --images --images-host ssh://linkerd-docker /path/to/linkerd

Available Commands:
    --name: the argument to this option is the specific test to run
    --skip-kind-create: skip KinD cluster creation step and run tests in an existing cluster.
    --images: (Primarily for CI) use 'kind load image-archive' to load the images from local .tar files in the current directory.
    --images-host: (Primarily for CI) the argument to this option is used as the remote docker instance from which images are first retrieved (using 'docker save') to be then loaded into KinD. This command requires --images.
```

### Run all tests

Old:

```bash
bin/test-run $PWD/bin/linkerd
```

New:

```bash
bin/tests $PWD/bin/linkerd
```

### Run single test (upgrade for example):

Old:

```bash
. bin/_test-run.sh
init_test_run $PWD/bin/linkerd
upgrade_integration_tests
```

New:

```bash
bin/tests --name upgrade $PWD/bin/linkerd
```

### Run tests in isolated KinD clusters

Old: Not possible without running single tests in newly created clusters
manually

New:

```bash
bin/tests $PWD/bin/linkerd
```

### Run tests in isolated namespaces on an existing cluster

Old:

```bash
bin/test-run $PWD/bin/linkerd
```

New:

```bash
bin/tests --skip-kind-create $PWD/bin/linkerd
```

## CI

`kind_integration` has been updated so that it does not create a KinD cluster as
part of its test setup.

`cloud_integration` passes the `--skip-kind-create` flag so that the tests are
run serially in a non-KinD cluster.


Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-06-24 17:06:29 -04:00
Alejandro Pedraza d842a97cb2
Update CI and docs to reference `main` branch (#4662)
Files changed:

```
.github/PULL_REQUEST_TEMPLATE.md
.github/workflows/cloud_integration.yml
.github/workflows/kind_integration.yml
.github/workflows/release.yml
.github/workflows/static_checks.yml
.github/workflows/unit_tests.yml
BUILD.md
CONTRIBUTING.md
bin/test-scale
bin/win/linkerd.nuspec
```
2020-06-24 12:39:22 -07:00
Alejandro Pedraza ba420f2fac
Fix release workflow - avoid downloading choco package in edges (#4638)
Added guard against trying to download choco package when not doing a
stable release.
2020-06-18 16:02:53 -05:00
Alejandro Pedraza 2696ea94dd
Fix release workflow dependencies on choco_pack (#4635)
The `choco_pack` job only runs for stable tags. In order for jobs that
depend on it to still run on non-stable tags, we need to move the tag check from the
`choco_pack` job level down into its steps.
2020-06-18 13:50:19 -05:00
Kevin Leimkuhler b0765c4361
Add integration test for upgrading from edge (#4557)
This adds an integration test for upgrading from the latest edge to the current
build.

Closes #4471

Signed-off-by: Kevin Leimkuhler kevin@kleimkuhler.com
2020-06-16 09:18:52 -07:00
Alejandro Pedraza d10ed2aa5e
CI steps for Chocolatey package - take 2 (#4536)
* CI steps for Chocolatey package - take 2

Followup to #4205, supersedes #4205

This adds:

- A new `psscript-analyzer` job in the `static_checks.yml`
workflow for linting the Chocolatey PowerShell script.
- A new `choco_pack` job in the `release.yml` workflow for
updating the Chocolatey spec file and generating the
package. This is only triggered for stable releases. It requires
a windows runner in order to run the choco tooling (in theory
it should have worked on a linux runner but in practice it
didn't).
- The `Create release` step was updated to upload the generated package,
if present.
- The source file path in `bin/win/linkerd.nuspec` was updated
to make this work.

* Name the nupkg file consistently with the other release assets
2020-06-15 16:42:50 -05:00
Kevin Leimkuhler 8f5ff8d973
Wait for KinD nodes to be ready in CI (#4488)
* Wait for all nodes to be ready in CI

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-05-28 13:56:09 -07:00
Alex Leong 8b04a657e0
Fix typo in release workflow (#4475)
This should fix the warning in the release action: https://github.com/linkerd/linkerd2/actions/runs/111938670

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-05-26 09:27:25 -07:00
Kevin Leimkuhler 2e1eb9e2ec
Use bin/kind in CI scripts (#4464)
Create kind clusters using the bin script instead of a GitHub action

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-05-21 16:22:23 -07:00
Alejandro Pedraza 301429ea9b
Bump KinD to 0.8.1 (#4445)
* Bump KinD to 0.8.1

This brings us K8s 1.18, which is in theory passing all the integration
tests. Currently the tracing one is failing just because of the downtime
at quay.io, which hosts the nginx-ingress image.

Re #4382
2020-05-20 14:46:05 -05:00
Alejandro Pedraza 8b0122bf94
Refactor CNI integration tests to use annotations functions (#4363)
Followup to #4341

Replaced all the `t.Error`/`t.Fatal` calls in the integration tests with the
new functions defined in `testutil/annotations.go` as described in #4292,
in order for the errors to produce Github annotations.

This piece takes care of the CNI integration test suite.

This also enables the annotations for these and the general integration
tests, by setting the `GH_ANNOTATIONS` environment variable in the
workflows whose flakiness we're interested in catching: Kind
integration, Cloud integration and Release.

Re #4176
2020-05-14 12:13:07 -05:00
Alejandro Pedraza 9e9f3bb1e2
Not use preemptible nodes in cloud and release workflows (#4315)
Temporarily set `preemptible: false` for cloud and release workflows to
see if that's the cause of nodes getting killed.
2020-04-30 18:05:24 -05:00
Kevin Leimkuhler c8bca67306
Update integration tests to use kind 0.7.0 (#4282)
I missed these integration test version bumps in #4280 

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-04-23 13:53:00 -07:00
Alejandro Pedraza 322ba5fd2f
`linkerd uninstall` errors when attempting to delete PSP (#4234)
* Bug in `linkerd uninstall` when attempting to delete PSP

We were using a wrong apiVersion for PSP in `linkerd uninstall`'s
output, which prevented that resource from being removed:

```
$ linkerd uninstall | kubectl delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-controller"
deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-destination"
deleted
...
mutatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-proxy-injector-webhook-config" deleted
validatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-sp-validator-webhook-config" deleted
namespace "linkerd" deleted
error: unable to recognize "uninstall.yml": no matches for kind
"PodSecurityPolicy" in version "extensions/v1beta1"

$ kubectl get psp -oname
podsecuritypolicy.policy/linkerd-linkerd-control-plane
```
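
A sketch of the corrected resource header, assuming the fix points the PSP at the `policy/v1beta1` API group that the cluster actually serves:

```yaml
# Sketch of the corrected manifest header emitted by `linkerd uninstall`.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: linkerd-linkerd-control-plane
```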

I've also replaced the uninstall integration test with a new separate
suite that performs the installation, waits for it to be ready,
uninstalls, and then confirms `linkerd check --pre` returns as expected.
2020-04-07 11:01:11 -05:00
Alejandro Pedraza be558b6869
Don't include git SHA in cloud_integration_tests names - part 2 (#4231)
Followup to #4230

I forgot to make the same change to the `release.yml` workflow
2020-04-02 19:43:27 -05:00
Alejandro Pedraza 22f1606b73
Extract common logic in scripts and CI to load images into KinD (#4212)
Fixes #4206 Followup to #4167

Extract common logic to load images into KinD, from `bin/kind-load`, `bin/install-pr`, `.github/workflows/kind_integration.yml` and `.github/workflows/release.yml`.

Besides removing the duplication, `bin/kind-load` will benefit in performance by having each image be loaded in parallel.

```
Load into KinD the images for Linkerd's proxy, controller, web, grafana, debug and cni-plugin.

Usage:
    bin/kind-load [--images] [--images-host ssh://linkerd-docker]

Examples:

    # Load images from the local docker instance
    bin/kind-load

    # Load images from tar files located in the current directory
    bin/kind-load --images

    # Retrieve images from a remote docker instance and then load them into KinD
    bin/kind-load --images --images-host ssh://linkerd-docker

Available Commands:
    --images: use 'kind load image-archive' to load the images from local .tar files in the current directory.
    --images-host: the argument to this option is used as the remote docker instance from which images are first retrieved
                   (using 'docker save') to be then loaded into KinD. This command requires --images.
```
2020-03-30 16:28:28 -05:00
cpretzer abbf4a4e60
Set the published version check to sleep for 30 seconds (#4143)
Signed-off-by: Charles Pretzer <charles@buoyant.io>
2020-03-06 12:27:21 -08:00
Alejandro Pedraza 578a2d1960
CI: Adjustments to the release job (#4129)
Extracted the logic to pull the latest release notes out of
`bin/create-release-tag` into `bin/_release.sh` so that it can be reused
in the `release.yml` workflow, which needs to use that inside
`gh_release` when creating the GitHub release in order to have prettier
markup release notes instead of a plaintext message pulled out of the tag
message.
The new extracted function also receives an optional argument with the
name of the file to put the release notes into, because the `body_path`
parameter in `softprops/action-gh-release` doesn't work with dynamic
vars.

Finally, the `website_publish` job will now only launch once the `gh_release`
job has succeeded.
2020-03-05 09:03:30 -05:00
Alejandro Pedraza a65f76ed22
Use SHAs instead of tags when referring to GH Actions libs (#4114)
When adding an action we can quickly vet it and pin it to a SHA, whereas
if we use a tag, the third party can change the code and retag it without us noticing.
2020-03-02 15:03:24 -05:00