* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common`, as it is still used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.
* Added chart templates for new viz linkerd-metrics-api pod
* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields, which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz's healthcheck can now be used to retrieve viz API clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is handling the command logic. This command also now retrieves its viz API client through viz's healthcheck.
* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).
* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api`, moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying HTTP server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.
* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.
* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.
* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
- Updated `endpoints.go` according to new API interface name.
- Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
- `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
- Added `metrics-api` to list of docker images to build in actions workflows.
- In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).
* Add retry to 'tap API service is running' check
* `mc check` shouldn't err when viz is not available. Also properly set up the logger in `multicluster/cmd/root.go` so that it displays messages when `--verbose` is used
* Introduce OpenAPIV3 validation for CRDs
* Add validation to link crd
* Add validation to sp using kube-gen
* Add openapiv3 under schema fields in specific versions
* Modify fields to rid spec of yaml errors
* Add top level validation for all three CRDs
Signed-off-by: Matei David <matei.david.35@gmail.com>
## What this fixes
When clusters are cleaned up after tests in CI, the `bin/test-cleanup` script is
responsible for clearing the cluster of all testing resources.
Right now this does not work as expected because the script uses the `linkerd`
binary instead of the Linkerd path that is passed in to the `tests` script.
There are cases where different binaries have different uninstall behavior and
the script can complete with an incomplete uninstallation.
## How it fixes
`test-cleanup` now takes a linkerd path argument. This is used to specify the
Linkerd binary that should be used when running the `uninstall` commands.
This value is passed through from the `tests` invocation which means that in CI,
the same binary is used for running tests as well as cleaning up the cluster.
Additionally, specifying the k8s context has now moved from an argument to the
`--context` flag. This is similar to how the `tests` script works, since the
context is not always required.
## How to use
Shown here:
```
$ bin/test-cleanup -h

Cleanup Linkerd integration tests.

Usage:
    test-cleanup [--context k8s_context] /path/to/linkerd

Examples:

    # Cleanup tests in non-default context
    test-cleanup --context k8s_context /path/to/linkerd

Available Commands:
    --context: use a non-default k8s context
```
## edge-21.1.1
This edge release introduces a new "opaque transport" feature that allows the
proxy to securely transport server-speaks-first and otherwise opaque TCP
traffic. Using the `config.linkerd.io/opaque-ports` annotation on pods and
namespaces, users can configure ports that should skip the proxy's protocol
detection.
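For example, marking ports as opaque could look like this (a minimal sketch; the
namespace name and port numbers are placeholders, and the annotation name is the
one mentioned above):
```bash
# Ask the proxy to skip protocol detection for ports 3306 and 4444 in this namespace
kubectl annotate namespace my-app config.linkerd.io/opaque-ports=3306,4444
```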
Additionally, a new `linkerd-viz` extension has been introduced that separates
the installation of the Grafana, Prometheus, web, and tap components. This
extension closely follows the Jaeger and multicluster extensions; users can
`install` and `uninstall` with the `linkerd viz ..` command as well as configure
for HA with the `--ha` flag.
The `linkerd viz install` command does not have any CLI flags to customize the
install directly; instead it follows the Helm way of customization, using flags
such as `--set`, `--set-string`, `--values`, and `--set-file`.
Finally, a new `/shutdown` admin endpoint that may only be accessed over the
loopback network has been added. This allows batch jobs to gracefully terminate
the proxy on completion. The `linkerd-await` utility can be used to automate
this.
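As a rough sketch, a batch job could trigger the shutdown itself once its work is
done (assuming the proxy's default admin port 4191 and that the endpoint accepts a
POST; `linkerd-await` automates this pattern):
```bash
# Run the job's work, then ask the sidecar proxy to shut down over loopback
/path/to/batch-job && curl -sX POST http://127.0.0.1:4191/shutdown
```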
* Added a new `linkerd multicluster check` command to validate that the
`linkerd-multicluster` extension is working correctly
* Fixed description in the `linkerd edges` command (thanks @jsoref!)
* Moved the Grafana, Prometheus, web, and tap components into a new Viz chart,
following the same extension model that multicluster and Jaeger follow
* Introduced a new "opaque transport" feature that allows the proxy to securely
transport server-speaks-first and otherwise opaque TCP traffic
* Removed the check comparing the `ca.crt` field in the identity issuer secret
and the trust anchors in the Linkerd config; these values being different is
not a failure case for the `linkerd check` command (thanks @cypherfox!)
* Removed the Prometheus check from the `linkerd check` command since it now
depends on a component that is installed with the Viz extension
* Fixed error messages thrown by the cert checks in `linkerd check` (thanks
@pradeepnnv!)
* Added PodDisruptionBudgets to the control plane components so that they cannot
all be terminated at the same time during disruptions (thanks @tustvold!)
* Fixed an issue that displayed the wrong `linkerd.io/proxy-version` when it is
overridden by annotations (thanks @mateiidavid!)
* Added support for custom registries in the `linkerd-viz` helm chart (thanks
@jimil749!)
* Renamed `proxy-mutator` to `jaeger-injector` in the `linkerd-jaeger` extension
* Added a new `/shutdown` admin endpoint that may only be accessed over the
loopback network allowing batch jobs to gracefully terminate the proxy on
completion
* Introduced the `linkerd identity` command, used to fetch the TLS certificates
for injected pods (thanks @jimil749)
* Fixed an issue with the CNI plugin where it was incorrectly terminating and
emitting error events (thanks @mhulscher!)
* Re-added support for non-LoadBalancer service types in the
`linkerd-multicluster` extension
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
- Do unzip check but don't install; leave installation to user
- Move unzip check to bin/protoc that actually uses unzip
- Make sure the protoc scripts can be called from any directory
Fixes #5337
Signed-off-by: Joakim Roubert <joakim.roubert@axis.com>
Now that tracing has been split out of the main control plane and into the linkerd-jaeger extension, we remove references to tracing from the main control plane including:
* removing the tracing components from the main control plane chart
* removing the tracing injection logic from the main proxy injector and inject CLI (these will be added back into the new injector in the linkerd-jaeger extension)
* removing tracing related checks (these will be added back into `linkerd jaeger check`)
* removing related tests
We also update the `--control-plane-tracing` flag to configure the control plane components to send traces to the linkerd-jaeger extension. To make sure this works even when the linkerd-jaeger extension is installed in a non-default namespace, we also add a `--control-plane-tracing-namespace` flag which can be used to change the namespace that the control plane components send traces to.
Note that for now, only the control plane components send traces; the proxies in the control plane do not. This is because the linkerd-jaeger injector is not yet available. However, this change adds the appropriate namespace annotations to the control plane namespace to configure the proxies to send traces to the linkerd-jaeger extension once the linkerd-jaeger injector is available.
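As a sketch, installing with control plane tracing pointed at a Jaeger extension
in a custom namespace would look something like this (the namespace name is an
example):
```bash
# Configure the control plane components to send traces to the extension in "my-jaeger"
bin/linkerd install --control-plane-tracing --control-plane-tracing-namespace my-jaeger \
  | kubectl apply -f -
```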
I tested this by doing the following:
1. bin/linkerd install | kubectl apply -f -
1. bin/helm install jaeger jaeger/charts/jaeger
1. bin/linkerd upgrade --control-plane-tracing=true | kubectl apply -f -
1. kubectl -n linkerd-jaeger port-forward svc/jaeger 16686
1. open http://localhost:16686
1. see traces from the linkerd control plane
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add automatic readme generation for charts
The current readme for each chart is generated
manually and doesn't contain all the information available.
Utilize helm-docs to automatically fill out the README.md files
for the Helm charts by pulling metadata from values.yml.
Fixes #4156
Co-authored-by: GMarkfjard <gabma047@student.liu.se>
* Consolidate integration tests under k3d
Fixes #5007
Simplified integration tests by moving all to k3d. Previously things were running in Kind, except for the multicluster tests, which implied some extra complexity in the supporting scripts.
Removed the KinD config files under `test/integration/configs`, as config is now passed as flags into the `k3d` command.
Also renamed `kind_integration.yml` to `integration_tests.yml`
Test skipping logic under ARM was also simplified.
The ARM integration tests take a very long time to run for some reason. For example, in the stable-2.9.0 release, they took
38 minutes. Thus, they need a longer timeout.
Increase the ARM integration test timeout from 30 minutes to 60 minutes.
Signed-off-by: Alex Leong <alex@buoyant.io>
The default job timeout is 6 hours! This allows runaway builds to
consume our actions resources unnecessarily.
This change limits integration test jobs to 30 minutes. Static checks
are limited to 10 minutes.
This used to be triggered only for stable releases, but now that the 2.9 stable
release approaches, let's turn it on for the upcoming RCs.
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
This reverts commit 85cbcb4a85.
We disable the ARM integration tests for now until we have more confidence in them.
Signed-off-by: Alex Leong <alex@buoyant.io>
The release workflow uses the `-skip-kind-create` flag when the flag is actually called `-skip-cluster-create`. This causes the workflow to fail.
We correct the flag name.
Signed-off-by: Alex Leong <alex@buoyant.io>
Previously, `releases.yaml` was trying to load images into the kind
clusters, but that failed because those images were already in `ghcr.io`
and not in the local docker cache; that failure, however, was masked.
Unmasking that failure revealed some flaws that this change addresses:
- In `bin/_test-helpers.sh` (used by `bin/tests`), modified the `images`
arg to accept `docker` (the default), `archive`, or `skip`, determining how to
load the images into the cluster (if loading them at all).
- In `bin/image-load`, changed arg `images` to `archive` which is more
descriptive.
- Have `kind_integration.yml` call `bin/tests --images archive`.
- Have `release.yml` call `bin/tests --images skip`.
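For example, the resulting CI invocations look roughly like this (a sketch; paths
are placeholders):
```bash
# kind_integration.yml: load images into the cluster from local .tar archives
bin/tests --images archive /path/to/linkerd

# release.yml: the images are already in the registry, so skip loading them
bin/tests --images skip /path/to/linkerd
```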
* Add support for k3d in integration tests
KinD doesn't support LoadBalancer services out of the box. Support can be added with some additional work, but it seems the solutions are not cross-platform.
k3d, on the other hand, provides this out of the box, so we'll be using k3d clusters for the multicluster integration test.
The current change sets the ground by generalizing some of the integration tests operations that were hard-coded to KinD.
- Added `bin/k3d` to wrap the setup and running of a pinned version of `k3d`.
- Refactored `bin/_test-helpers.sh` to account for tests to be run in either KinD or k3d.
- Renamed `bin/kind-load` to `bin/image-load` and made it more generic, so it can load images into both KinD (the default) and k3d. Also got rid of the no longer used `--images-host` option.
- Added a placeholder for the new `multicluster` test in the lists in `bin/_test-helpers.sh`. It starts by setting up two k3d clusters.
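As a sketch, that placeholder boils down to something like this (cluster names are
assumptions):
```bash
# Create the two clusters used by the multicluster test via the pinned wrapper
bin/k3d cluster create source
bin/k3d cluster create target
```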
* Refactor handling of the `--multicluster` flag in integration tests (#4995)
Followup to #4994, based off of that branch (`alpeb/k3d-tests`).
This is more preliminary work previous to the more complete multicluster integration test.
- Removed the `--multicluster` flag from all the tests we had in `bin/_test-helpers.sh`, so only the new "multicluster" integration test will make use of that. Also got rid of the `TestUninstallMulticluster()` test in `install_test.go` so that the multicluster resources stay around; they're needed for the more complete multicluster test that will be implemented in a follow-up PR.
- Added "multicluster" to the list of tests in the `kind_integration.yml` workflow.
- For now, this new "multicluster" test in `run_multicluster_test()` is just running the install tests (`test/integration/install_test.go`) with the `--multicluster` flag.
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* tests: Add new CNI deep integration tests
Fixes #3944
This PR adds a new test, called cni-calico-deep, which installs the Linkerd CNI
plugin on top of a cluster with Calico and runs the current integration tests on
top of it, thus validating various Linkerd features when CNI is enabled. For
Calico to work, kind requires special config, which lives at `cni-calico.yaml`.
This is different from the CNI-level integration tests that we run in the cloud
integration workflow.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Do not run cloud integration tests in CI
Closes #4963
Removed the `./.github/workflows/cloud_integration.yml` workflow, and
removed the `cloud_integration_tests` job from the `./.github/workflows/release.yml` workflow.
* Push docker images to ghcr.io instead of gcr.io
The `cloud_integration.yml` and `release.yml` workflows were modified to
log into ghcr.io, and remove the `Configure gcloud` step which is no
longer necessary.
Note that besides the changes to `cloud_integration.yml` and `release.yml`, there was a change to the upgrade-stable integration test so that we run `linkerd upgrade --addon-overwrite` to reset the add-on settings, because in stable-2.8.1 the Grafana image was pegged to `gcr.io/linkerd-io/grafana` in `linkerd-config-addons`. This will need to be mentioned in the 2.9 upgrade notes.
Also, the egress integration test has a debug container that is now pegged to the edge-20.9.2 tag.
Besides that, the other changes are just a global search and replace (s/gcr.io\/linkerd-io/ghcr.io\/linkerd/).
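That replacement can be sketched as a one-liner (assuming GNU sed and a clean
working tree):
```bash
# Swap the old gcr.io registry for ghcr.io across the repo
git grep -l 'gcr.io/linkerd-io' | xargs sed -i 's|gcr\.io/linkerd-io|ghcr.io/linkerd|g'
```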
Updating only the Go version to 1.15 makes upgrades from older versions fail,
as the identity certs generated by those versions lack a field that Go 1.15 expects.
This PR upgrades the cert generation code to include that field,
allowing us to move to Go 1.15 in later versions of Linkerd.
Unit tests in CI are run using `yarn install`, so dependencies may get
updated in the process.
This is fixed by passing `--frozen-lockfile` in the CI workflow.
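The CI step is essentially the following (a minimal sketch):
```bash
# Install exactly what yarn.lock specifies; fail instead of updating dependencies
yarn install --frozen-lockfile
```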
Fixes #3838
Signed-off-by: Tharun <rajendrantharun@live.com>
Add a static check that ensures the generated files from the proto definitions have not changed.
Fix #4669
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Removed usage of `GITCOOKIE_SH`, which was a script stored in a secret
to authenticate requests against googlesource.com, to avoid hitting
rate limits when pulling go dependencies from that source. Now that we
use go modules, deps are pulled from http://proxy.golang.org/ and this
is no longer needed.
* When releasing, build and upload the CLI binaries for the amd64, arm64 and arm architectures
* Refactored `Dockerfile-bin` so it has separate stages for single- and multi-arch builds. The latter stage is only used for releases.
Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
Build ARM docker images in the release workflow.
# Changes:
- Added new env keys `DOCKER_MULTIARCH` and `DOCKER_PUSH`. When set, multi-arch images are built and pushed to the registry (see the sketch after this list). See https://github.com/docker/buildx/issues/59 for why they must be pushed to the registry.
- Using `crazy-max/ghaction-docker-buildx` is necessary as it comes pre-configured for cross-compilation (using QEMU), so we can just use it instead of setting that up manually.
- `buildx` now makes the automatic platform args available in the global scope. (See: https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope)
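A sketch of how the release workflow might invoke the build with these keys set
(the script name follows the existing `bin/docker-build` convention; the exact
wiring is an assumption):
```bash
# Build multi-arch images and push them to the registry; buildx can't keep
# multi-arch manifests in the local docker cache, hence the push
DOCKER_MULTIARCH=1 DOCKER_PUSH=1 bin/docker-build
```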
# Follow-up:
- Releasing the CLI binary file for the ARM architecture. The docker images resulting from these changes are already built for ARM; we still need further adjustments, such as retrieving those binaries and naming them correctly as part of the GitHub Release artifacts.
Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
The job started failing consistently today with:
```
##[error]devblackops/github-action-psscriptanalyzer/v2/action.yml (Line:
30, Col: 9): Unexpected value ''
##[error]devblackops/github-action-psscriptanalyzer/v2/action.yml (Line:
30, Col: 9): Unexpected value ''
##[error]System.ArgumentException: Unexpected type 'NullToken'
encountered while reading 'outputs'. The type 'MappingToken' was
expected.
```
It seems something changed in GitHub today that is clashing
with the `devblackops/github-action-psscriptanalyzer` action.
I've raised devblackops/github-action-psscriptanalyzer#12
* Fixed `linkerd check` not finding Prometheus
## The Problem
`linkerd check` run right after install is failing because it can't find the Prometheus Pod.
## The Cause
The "control plane pods are ready" check used to verify the existence of all the control plane pods, blocking until all the pods were ready.
Since #4724, Prometheus is no longer included in that check because it's checked separately as an add-on. An unintended consequence is that when the ensuing "control plane self-check" is triggered, Prometheus might not be ready yet and the check fails because it doesn't do retries.
## The Fix
The "control plane self-check" uses a gRPC call (it's the only check that does that) and those weren't designed with retries in mind.
This PR adds retry functionality to the `runCheckRPC()` function, making sure the final output remains the same.
It also temporarily disables the `upgrade-edge` integration test, since after installing edge-20.7.4 `linkerd check` will fail due to this issue.
* Migrate CI to docker buildx and other improvements
## Motivation
- Improve build times in forks, especially when rerunning builds because of some flaky test.
- Start using `docker buildx` to pave the way for multiplatform builds.
## Performance improvements
These timings were taken for the `kind_integration.yml` workflow when we merged and reran the lodash bump PR (#4762):
Before these improvements:
- when merging: `24:18`
- when rerunning after merge (docker cache warm): `19:00`
- when running the same changes in a fork (no docker cache): `32:15`
After these improvements:
- when merging: `25:38`
- when rerunning after merge (docker cache warm): `19:25`
- when running the same changes in a fork (docker cache warm): `19:25`
As explained below, non-forks and forks now use the same cache, so the important takeaway is that forks will always start with a warm cache and we'll no longer see long build times like the `32:15` above.
The downside is a slight increase in the build times for non-forks (up to a little more than a minute, depending on the case).
## Build containers in parallel
The `docker_build` job in the `kind_integration.yml`, `cloud_integration.yml` and `release.yml` workflows relied on running `bin/docker-build` which builds all the containers in sequence. Now each container is built in parallel using a matrix strategy.
## New caching strategy
CI now uses `docker buildx` for building the container images, which allows using an external cache source for builds, a location in the filesystem in this case. That location gets cached using actions/cache, using the key `{{ runner.os }}-buildx-${{ matrix.target }}-${{ env.TAG }}` and the restore key `${{ runner.os }}-buildx-${{ matrix.target }}-`.
For example when building the `web` container, its image and all the intermediary layers get cached under the key `Linux-buildx-web-git-abc0123`. When that has been cached in the `main` branch, that cache will be available to all the child branches, including forks. If a new branch in a fork asks for a key like `Linux-buildx-web-git-def456`, the key won't be found during the first CI run, but the system falls back to the key `Linux-buildx-web-git-abc0123` from `main` and so the build will start with a warm cache (more info about how keys are matched in the [actions/cache docs](https://docs.github.com/en/actions/configuring-and-managing-workflows/caching-dependencies-to-speed-up-workflows#matching-a-cache-key)).
## Packet host no longer needed
To benefit from the warm caches both in non-forks and forks like just explained, we're required to ditch doing the builds in Packet and now everything runs in the github runners VMs.
As a result there's no longer separate logic for non-forks and forks in the workflow files; `kind_integration.yml` was greatly simplified but `cloud_integration.yml` and `release.yml` got a little bigger in order to use the actions artifacts as a repository for the images built. This bloat will be fixed when support for [composite actions](https://github.com/actions/runner/blob/users/ethanchewy/compositeADR/docs/adrs/0549-composite-run-steps.md) lands in github.
## Local builds
You can still run `bin/docker-build` or any of the `docker-build.*` scripts. To make use of buildx, run those same scripts with the env var `DOCKER_BUILDKIT=1` set. Using buildx requires having it installed, as instructed [here](https://github.com/docker/buildx).
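For example (assuming buildx is installed; the per-image script name just follows
the `docker-build.*` pattern mentioned above):
```bash
# Build all images locally using buildx/BuildKit
DOCKER_BUILDKIT=1 bin/docker-build

# Or build a single image, e.g. the web container
DOCKER_BUILDKIT=1 bin/docker-build-web
```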
## Other
- A new script `bin/docker-cache-prune` is used to remove unused images from the cache. Without that the cache grows constantly and we can rapidly hit the 5GB limit (when the limit is attained the oldest entries get evicted).
- The `go-deps` dockerfile base image was changed from `golang:1.14.2` (Debian-based) to `golang:1.14.2-alpine`, also to conserve cache space.
# Addressed separately in #4875:
Got rid of the `go-deps` image and instead added something similar on top of all the Dockerfiles dealing with `go`, as a first stage for those Dockerfiles. That continues to serve as a way to pre-populate go's build cache, which speeds up the builds in the subsequent stages. That build should in theory be rebuilt automatically only when `go.mod` or `go.sum` change, and now we don't require running `bin/update-go-deps-shas`. That script was removed along with all the logic elsewhere that used it, including the `go_dependencies` job in the `static_checks.yml` github workflow.
The list of modules preinstalled was moved from `Dockerfile-go-deps` to a new script `bin/install-deps`. I couldn't find a way to generate that list dynamically, so whenever a slow-to-compile dependency is found, we have to make sure it's included in that list.
Although this simplifies the dev workflow, note that the real motivation behind this was a limitation in buildx's `docker-container` driver that forbids us from depending on images that haven't been pushed to a registry, so we have to resort to building the dependencies as a first stage in the Dockerfiles.
The GitHub Actions build status badge was referencing an old workflow
named `CI`, which always shows red, and is no longer used.
Fix the build status badge to reference the `Release` workflow.
Also slightly reformat the matrix build yaml, as the list has grown a
bit.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This creates a new integration test target that launches the deep suite,
using a linkerd instance installed through Helm.
I've added a `global.proxyInit.ignoreInboundPorts=1234,5678` override
during install and enhanced the injection test to catch problems like
what we saw in #4679.
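A sketch of that Helm-based install (the chart path and release name are
assumptions, and the identity certificates the chart requires are elided):
```bash
# Install Linkerd through Helm with the extra proxy-init override used by the test
helm install linkerd2 charts/linkerd2 \
  --set global.proxyInit.ignoreInboundPorts="1234\,5678"
```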
The deep integration tests started failing on GKE.
Originally, this was thought to be a cleanup issue, but we have not cleaned up
deep integration tests in the past. We install Linkerd once, and then run all
the tests serially.
Given that it's been a while since we've run the full deep tests on GKE, we may
just need more resources when running them now.
This increases the node count of the GKE cluster that we run on from 1 to 2.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
The function triggering the test for k8s custom cluster domain was
misnamed, and thus the test wasn't being run.
This also adds some extra error handling to catch this and other
potential issues.
This PR adds a new CLI test to check that installation YAMLs are correctly
generated on Windows as well. This is important because of the file path
differences between Windows and Linux; any code using the wrong path format
might cause the chart generation commands to fail on Windows.
This creates a separate workflow for both release and integration.
Also, all the existing integration tests are moved into
/tests/integration to separate them from /test/cli, as this test does not fall
under the integration tests category.
## Description
As discussed [here](https://github.com/linkerd/linkerd2/pull/4653#discussion_r445543061), the `kind_integration` job of the release workflow was not kept in sync with the changes made in #4593.
Until GitHub actions can reuse yaml for separate workflows, these sections are supposed to be kept in sync.
This would be an issue if we had tried doing a release since #4593 merged, but that has not happened yet.
## Changes
This updates the release workflow `kind_integration` job to use the new test interface, mainly removing cluster creation and image loading as necessary prerequisites.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
## Summary
Change the default behavior of integration tests to be isolated by cluster.
Additionally, make running one or all tests easier than the current process.
These changes are explained more in the [Testing
RFC](https://github.com/linkerd/rfc/blob/master/design/0004-isolated-integration-tests.md)
## Changes
This is a script used only by Linkerd developers, but there are a lot of useful
usage examples and explanations in the `bin/tests --help` output:
```
Run Linkerd integration tests.
Optionally specify one of the following tests: [upgrade helm helm-upgrade uninstall deep external-issuer]
Usage:
tests [--images] [--images-host ssh://linkerd-docker] [--name test-name] [--skip-kind-create] /path/to/linkerd
Examples:
# Run all tests in isolated clusters
tests /path/to/linkerd
# Run single test in isolated clusters
tests --name test-name /path/to/linkerd
# Skip KinD cluster creation and run all tests in default cluster context
tests --skip-kind-create /path/to/linkerd
# Load images from tar files located under the 'image-archives' directory
# Note: This is primarily for CI
tests --images /path/to/linkerd
# Retrieve images from a remote docker instance and then load them into KinD
# Note: This is primarily for CI
tests --images --images-host ssh://linkerd-docker /path/to/linkerd
Available Commands:
--name: the argument to this option is the specific test to run
--skip-kind-create: skip KinD cluster creation step and run tests in an existing cluster.
--images: (Primarily for CI) use 'kind load image-archive' to load the images from local .tar files in the current directory.
--images-host: (Primarily for CI) the argument to this option is used as the remote docker instance from which images are first retrieved (using 'docker save') to be then loaded into KinD. This command requires --images.
```
### Run all tests
Old:
```bash
bin/test-run $PWD/bin/linkerd
```
New:
```bash
bin/tests $PWD/bin/linkerd
```
### Run single test (upgrade for example):
Current:
```bash
. bin/_test-run.sh
init_test_run $PWD/bin/linkerd
upgrade_integration_tests
```
New:
```bash
bin/tests --name upgrade $PWD/bin/linkerd
```
### Run tests in isolated KinD clusters
Current: Not possible without running single tests in newly created clusters
manually
New:
```bash
bin/tests $PWD/bin/linkerd
```
### Run tests in isolated namespaces on an existing cluster
Old:
```bash
bin/test-run $PWD/bin/linkerd
```
New:
```bash
bin/tests --skip-kind-create $PWD/bin/linkerd
```
## CI
`kind_integration` has been updated so that it does not create a KinD cluster as
part of its test setup.
`cloud_integration` passes the `--skip-kind-create` flag so that the tests are
run serially in a non-KinD cluster.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
I should review all changes to the top-level project documents.
CODEOWNERS is misconfigured, however, so that I am required to review
changes to all files named README.md, which isn't intended.
This change ensures that my review is only required on these files in
the root of the repository.
The /cni-plugin directory has additional review requirements; however,
its Dockerfile changes each time `go.mod` is updated. It was not
intended to require this extra review on these routine changes.
This change updates CODEOWNERS to make all maintainers owners of
`cni-plugin/Dockerfile`.
The `choco_pack` job only runs for stable tags. In order for jobs that
depend on it to run on non-stable tags, we need to move this tag check from the
`choco_pack` job level down into its steps.
This adds an integration test for upgrading from the latest edge to the current
build.
Closes #4471
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* CI steps for Chocolatey package - take 2
Followup to #4205, supersedes #4205
This adds:
- A new `psscript-analyzer` job in the `static_checks.yml`
workflow for linting the Chocolatey PowerShell script.
- A new `choco_pack` job in the `release.yml` workflow for
updating the Chocolatey spec file and generating the
package. This is only triggered for stable releases. It requires
a Windows runner in order to run the choco tooling (in theory
it should have worked on a Linux runner, but in practice it
didn't).
- The `Create release` step was updated to upload the generated package,
if present.
- The source file path in `bin/win/linkerd.nuspec` was updated
to make this work.
* Name the nupkg file consistently with the other release assets