This adds an integration test for upgrading from the latest edge to the current
build.
Closes #4471
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
- Add quotes where missing, to handle whitespace and the like.
- Use single quotes for strings where no expansion is intended.
- Fix quotes where the current ones would cause errors (examples of all three below).
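Illustrative examples of these three conventions (paths and values are hypothetical):
```bash
# 1. Quote expansions so paths with whitespace survive word splitting.
src_dir="/tmp/some dir"
cp "$src_dir"/input.txt "$src_dir"/copy.txt
# 2. Single quotes for literal strings where no expansion is intended.
greeting='no $expansion here'
printf '%s\n' "$greeting"
```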
Signed-off-by: Joakim Roubert <joakim.roubert@axis.com>
Failures in `bin/_test-run` from commands other than `go test`
aren't currently reported properly, in part because CI's bash defaults
to `set -e`, which terminates the script and outputs only
`##[error]Process completed with exit code 2.`, as seen
[here](https://github.com/linkerd/linkerd2/pull/4496/checks?check_run_id=720720352#step:14:116):
```
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
× no unschedulable pods
linkerd-controller-6c77c7ffb8-w8wh5: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-destination-6767d88f7f-rcnbq: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-grafana-76c76fcfb9-pdhfb: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-identity-5bcf97d6c8-q6rll: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-prometheus-6b95c56b44-hd9m6: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-proxy-injector-58d794ff9-jf7cj: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-sp-validator-6c5f999bfb-qg252: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-tap-6fdf84fc65-6txvr: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-web-8484fbd867-nm8z2: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
see https://linkerd.io/checks/#l5d-existence-unschedulable-pods for hints
Status check results are ×
[error]Process completed with exit code 2.
```
I've made the following changes to `bin/_test-run` to generate better
messages and GitHub annotations when an error occurs:
- Unset `set -e` so that errors no longer exit the script immediately,
  before we get a chance to format them properly.
- Removed many of the `exit_on_err` calls after `go test` invocations,
  because those already output enough information (and they were not
  being used in CI anyway because of `set -e`). Instead, `run_test` now
  exits upon a `go test` error.
- Added `exit_on_err` calls right after non-`go test` commands to
  properly report their failures.
- Refactored the `exit_on_err` function so that it generates a GitHub
  error annotation upon failure (see the sketch after this list).
- Removed the `trap` in `install_stable`, since the OS can handle
  cleaning up files under `/tmp`.
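A minimal sketch of the refactored helper, assuming a `GH_ANNOTATION` environment variable toggles annotation output (names are illustrative, not the exact script):
```bash
# Report the last command's failure, as a GitHub annotation when enabled.
exit_on_err() {
    exit_code=$?
    if [ $exit_code -ne 0 ]; then
        if [ -n "${GH_ANNOTATION:-}" ]; then
            printf '::error::%s\n' "$1"    # rendered as an annotation by GitHub
        else
            printf 'Error: %s\n' "$1" >&2
        fi
        exit $exit_code
    fi
}

# Usage right after a non-`go test` command:
helm lint charts/linkerd2
exit_on_err 'helm lint failed'
```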
Also, I've changed `linkerd check`'s exit code on failure from 2 to 1.
## Motivation
As mentioned in the [Testing RFC](https://github.com/linkerd/rfc/blob/master/design/0003-isolated-integration-tests.md#constraints):
> The integration test setup checks require that certain conditions are
> satisfied by the given cluster. A surprising condition is that no
> pre-existing Linkerd installation resource may exist; if it does then it is
> deleted.
## Solution
`init_test_run`, which runs before the integration tests start, now exits the
script if any Linkerd resources exist on the cluster.
Example bad path:
```
Checking the linkerd binary...[ok]
Checking if there is a Kubernetes cluster available...[ok]
Checking if Linkerd resources exist on cluster...
Linkerd resources exist on cluster:
pod/hello-6b6b5d644d-xrnhn
pod/hello-slow-cooker-h8xn2
pod/world-fc8f457b7-gj7wq
pod/gateway-676fd64cb9-j57k6
pod/hello-c767bf764-cbdqh
pod/hello-slow-cooker-fqmxr
pod/slow-cooker-ftxdx
pod/t1-855c678bdd-vdg96
pod/t2-76989f94d4-d5fv8
pod/t3-75c8877797-hfwgc
pod/world-6784d4f65c-cn6vl
replicaset.apps/gateway-676fd64cb9
replicaset.apps/hello-c767bf764
replicaset.apps/t1-855c678bdd
replicaset.apps/t2-76989f94d4
replicaset.apps/t3-75c8877797
replicaset.apps/world-6784d4f65c
job.batch/hello-slow-cooker
job.batch/slow-cooker
Help:
Run [/home/kevin/Projects/linkerd/linkerd2/bin/test-cleanup]
Specify a cluster context [/home/kevin/Projects/linkerd/linkerd2/bin/test-run /home/kevin/Projects/linkerd/linkerd2/target/cli/linux/linkerd [l5d-integration] [context]]
exit
```
Example good path:
```
Checking the linkerd binary...[ok]
Checking if there is a Kubernetes cluster available...[ok]
Checking if Linkerd resources exist on cluster...[ok]
```
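A minimal sketch of the resource check behind these messages (the label selector and resource list are assumptions; the real `init_test_run` may differ):
```bash
# Fail fast if anything from a previous Linkerd install or test run remains.
resources=$(kubectl get pods,replicasets,jobs --all-namespaces -oname \
    -l linkerd.io/control-plane-ns 2>/dev/null)
if [ -n "$resources" ]; then
    printf 'Linkerd resources exist on cluster:\n%s\n' "$resources"
    exit 1
fi
printf 'Checking if Linkerd resources exist on cluster...[ok]\n'
```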
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Upgraded to Helm v3.2.1 from v2.16.1, getting rid of Tiller and making
other simplifications.
Note that the version placeholder in the `values.yaml` files had to be
changed from `{version}` to `linkerdVersionValue` because the former
confuses Helm v3 (it parses as a YAML flow mapping rather than a plain
string).
* Increase timeout for Helm cleanup in integration tests
Tests were failing sporadically while waiting for the Helm namespace to
get cleaned up. I verified that it does get cleaned up, but it sometimes
takes longer.
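For instance, the wait could look like this (namespace name and timeout are illustrative, not the actual test code):
```bash
# Block until the Helm test namespace is fully terminated, with more headroom.
kubectl wait namespace/helm-test --for=delete --timeout=2m
```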
* Bug in `linkerd uninstall` when attempting to delete the PSP
We were using the wrong apiVersion for the PSP in `linkerd uninstall`'s
output, which kept that resource from being removed:
```
$ linkerd uninstall | kubectl delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-controller"
deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-destination"
deleted
...
mutatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-proxy-injector-webhook-config" deleted
validatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-sp-validator-webhook-config" deleted
namespace "linkerd" deleted
error: unable to recognize "uninstall.yml": no matches for kind
"PodSecurityPolicy" in version "extensions/v1beta1"
$ kubectl get psp -oname
podsecuritypolicy.policy/linkerd-linkerd-control-plane
```
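The PodSecurityPolicy kind is served under `policy/v1beta1` rather than `extensions/v1beta1`; until the emitted manifest uses that apiVersion, the leftover PSP can be deleted directly (resource name taken from the output above):
```bash
# Remove the PSP that the uninstall manifest failed to match.
kubectl delete podsecuritypolicy linkerd-linkerd-control-plane
```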
I've also replaced the uninstall integration test with a new separate
suite that performs the installation, waits for it to be ready,
uninstalls, and then confirms `linkerd check --pre` returns as expected.
Have the preliminary setup for the Helm integration tests use
`bin/helm-build` instead of calling `helm dependency update` directly.
This allows testing `bin/helm-build` itself, and also lints the linkerd2
and linkerd2-cni charts (the latter lint call is also being added in this
PR).
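A rough sketch of what that wrapper covers (chart paths are assumptions; see `bin/helm-build` for the real logic):
```bash
# Lint both charts, then resolve the linkerd2 chart's dependencies.
helm lint charts/linkerd2-cni
helm lint charts/linkerd2
helm dependency update charts/linkerd2
```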
In light of the breaking changes we are introducing to the Helm chart and the convoluted upgrade process (see linkerd/website#647), an integration test can be quite helpful. It simply installs the latest stable release via `helm install` and then upgrades to the current head of the branch.
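An outline of what the test does (release name, chart location, and TLS value keys are assumptions, not the actual harness):
```bash
# Install the latest stable chart, then upgrade to the locally built one.
helm repo add linkerd https://helm.linkerd.io/stable
helm repo update
helm install linkerd2 linkerd/linkerd2 \
    --set-file global.identityTrustAnchorsPEM=ca.crt \
    --set-file identity.issuer.tls.crtPEM=issuer.crt \
    --set-file identity.issuer.tls.keyPEM=issuer.key
helm upgrade linkerd2 charts/linkerd2 \
    --set-file global.identityTrustAnchorsPEM=ca.crt \
    --set-file identity.issuer.tls.crtPEM=issuer.crt \
    --set-file identity.issuer.tls.keyPEM=issuer.key
```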
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Fix whitespace path handling in non-docker (build) scripts
Handling of paths containing whitespace was not fully implemented; this
patch adds the missing pieces. Also, use bash only where bash-specific
functionality is actually needed.
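The recurring whitespace-safe pattern looks like this (a representative snippet, not a literal diff from the patch):
```bash
# Resolve the script's own directory even when the path contains spaces.
bindir=$( cd "${0%/*}" && pwd )
# Always quote expansions and forward arguments as "$@".
"$bindir"/docker-build "$@"
```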
Signed-off-by: Joakim Roubert <joakimr@axis.com>
- Added a cleanup step at the end of all integration tests.
- Disabled external_issuer_integration_tests in cloud_tests due to a
  namespace issue. Running this via `kind` tests is sufficient for now.
- Set a flaky test to `Skip`; relates to #3332.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Fix auto-injection of stand-alone pods and integration test reporting
When creating an Event upon auto-injection (#3316), we try to fetch the
parent object to associate the event with it. If the parent doesn't
exist (as in the case of stand-alone pods), the event isn't created. I
had missed handling one code path where that parent was expected.
This also adds a new integration test that I verified fails before this
fix.
Finally, I removed from `_test-run.sh` some `|| exit_code=$?` fragments
that were preventing the whole suite from reporting failure whenever one
of the tests in `/tests` failed.
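To illustrate the problem (simplified):
```bash
# Before: the `||` arm swallows the failure, and since exit_code was never
# checked afterwards, the suite kept going and reported success.
go test ./test/... || exit_code=$?
# After: the non-zero status is no longer masked.
go test ./test/...
```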
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
The integration tests under `/test` were run separately via l5d-bot,
lacking the feedback and job management provided by CI.
Enable integration tests in CI, via a docker build and kind clusters
executed on a remote DOCKER_HOST.
CI runs are now broken into two stages, run serially. Each stage is
composed of jobs run in parallel:
- Setup stage
  - Validate go deps
  - Remote docker build
  - Kind cluster setup (deep)
  - Kind cluster setup (upgrade)
  - Kind cluster setup (helm)
- Test stage
  - Go unit tests
  - Node.js unit tests
  - Kind integration tests (deep)
  - Kind integration tests (upgrade)
  - Kind integration tests (helm)
This PR also modifies `bin/test-run.sh` to always set `--failfast` for
Go tests.
Also introduce `bin/docker` and `bin/kubectl` scripts, to ensure
cacheable, pinned executables in CI.
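A sketch of the wrapper idea behind `bin/kubectl` (pinned version, download URL, and cache location are assumptions):
```bash
#!/bin/sh
# Download a pinned kubectl once, cache it next to the script, then exec it.
set -eu
version=v1.17.0
bindir=$( cd "${0%/*}" && pwd )
kubectlbin="$bindir/.kubectl-$version"
if [ ! -x "$kubectlbin" ]; then
    curl -sfL -o "$kubectlbin" \
        "https://storage.googleapis.com/kubernetes-release/release/$version/bin/linux/amd64/kubectl"
    chmod +x "$kubectlbin"
fi
exec "$kubectlbin" "$@"
```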
The existing integration tests for master merges and docker pushes,
running against GKE, remain in place.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The `bin/test-run` script executed the upgrade, helm, and deep
integration tests in series, but was structured in a way that did not
permit running these tests individually.
Move most of the logic from `bin/test-run` to a supporting library,
`bin/test-run.sh`, which provides the ability to execute integration
tests individually. `bin/test-run`'s behavior is unchanged: it continues
to run the upgrade, helm, and deep integration tests in series.
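With the logic in a library, an individual suite can be invoked directly, along these lines (function names other than `init_test_run` are assumptions):
```bash
# Source the shared test logic and run just one suite.
. bin/test-run.sh
init_test_run "$PWD/target/cli/linux/linkerd"
helm_integration_tests   # hypothetical per-suite entry point
```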
Signed-off-by: Andrew Seigner <siggy@buoyant.io>