Commit Graph

139 Commits

Author SHA1 Message Date
Alejandro Pedraza e6fa5a7156
Replace usage of io/ioutil package (#9613)
`io/ioutil` has been deprecated since go 1.16 and the linter started to
complain about it.
2022-10-13 12:10:58 -05:00
Eliza Weisman f6c6ff965c
inject: fix --default-inbound-policy not setting annotation (#9197)
Depends on #9195

Currently, `linkerd inject --default-inbound-policy` does not set the
`config.linkerd.io/default-inbound-policy` annotation on the injected
resource(s).

The `inject` command does _try_ to set that annotation if it's set in
the `Values` generated by `proxyFlagSet`:
14d1dbb3b7/cli/cmd/inject.go (L485-L487)

...but, the flag in the proxy `FlagSet` doesn't set
`Values.Proxy.DefaultInboundPolicy`, it sets
`Values.PolicyController.DefaultAllowPolicy`:
7c5e3aaf40/cli/cmd/options.go (L375-L379)

This is because the flag set is shared across `linkerd inject` and
`linkerd install` subcommands, and in `linkerd install`, we want to set
the default policy for the whole cluster by configuring the policy
controller. In `linkerd inject`, though, we want to add the annotation
to the injected pods only.

This branch fixes this issue by changing the flag so that it sets the
`Values.Proxy.DefaultInboundPolicy` instead of the
`Values.PolicyController.DefaultAllowPolicy` value. In `linkerd
install`, we then set `Values.PolicyController.DefaultAllowPolicy` based
on the value of `Values.Proxy.DefaultInboundPolicy`, while in `inject`,
we will now actually add the annotation.

This branch is based on PR #9195, which adds validation to reject
invalid values for `--default-inbound-policy`, rather than on `main`.
This is because the validation code added in that PR had to be moved
around a bit, since it now needs to validate the
`Values.Proxy.DefaultInboundPolicy` value rather than the
`Values.PolicyController.DefaultAllowPolicy` value. I thought using
#9195 as a base branch was better than basing this on `main` and then
having to resolve merge conflicts later. When that PR merges, this can 
be rebased onto `main`.

Fixes #9168
2022-08-18 17:16:27 -07:00
Oliver Gould c3d327f9c7
ci: Unify integration test workflows (#8964)
The policy integration tests builds some of the same container images
that are built in the integration test workflow. We can avoid this
duplication while still making the policy test execution optional.

This change moves `integration_tests.yml` and `policy_controller.yml`
into `integration.yml`. The policy test workflows now re-use some of the
`just` recipes used for development. This change also seperates the
tests that need Viz components so that, eventually, they can be executed
more selectively.

Furthermore, this change updates the docker-build action to take a tag
as a parameter, rather than *setting* a TAG environment variable. This
change tries to simplify the use of environment variables by setting a
single tag output in both the integration and release workflows.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-07-25 15:51:09 -07:00
Eliza Weisman 85e5ab3b38
inject: add `config.linkerd.io/shutdown-grace-period` annotation (#8923)
PR linkerd/linkerd2-proxy#1815 added support for a
`LINKERD2_PROXY_SHUTDOWN_GRACE_PERIOD` environment variable that
configures the proxy's maximum grace period for graceful shutdown. This
is intended to ensure that if a proxy is shut down, it will eventually
terminate in a relatively timely manner, even if some stubborn
connections don't close gracefully.

This branch adds support for a `config.linkerd.io/shutdown-grace-period`
annotation that can be used to override the default grace period
duration.

Hopefully I've added this everywhere it needs to be added --- please let
me know if I've missed anything!
2022-07-19 14:43:38 -07:00
Krzysztof Dryś 35722363f7
Use prommatch everywhere (#8674)
Introduce `prommatch.Suite` to simplify checking against multiple matchers. Also, added labels for a common scenario of `target_addr`.
2022-06-21 21:48:51 -06:00
Krzysztof Dryś 58650b655e
Add promm package (#8547)
Package promm is used for making assertions on prometheus metrics.

Fixes #8250 

I believe this improves the readability of tests. While the new assertions use much more LOC (basically plenty vs one line regex), this allows developers to understand better what is actually being tested. 

It allows for better test granularity. For example, it is easier to include more details in "has metrics" assertions, while having checks in "has no such metrics" assertions (for example look at `HasNoOutboundHTTPRequest`).

Actually, I was able to spot a bug in opaque port tests. Previously tests were passing only because we were looking for a very specific metrics.

If this gets approved, my plan is to use `metrictest` in more places and replace all assertions based on the regular expressions.

Signed-off-by: Krzysztof Dryś <krzysztofdrys@gmail.com>
2022-06-07 15:48:27 -06:00
Oliver Gould 33c1d610ad
test: Diff structured YAML when possible (#8432)
When we compare generated manifests against fixtures, we do a simple
string comparison to compare output. The diffed data can be pretty hard
to understand.

This change adds a new test helper, `DiffTestYAML` that parses strings
as arbitrary YAML data structures and uses `deep.Equal` to generate a
diff of the datastructures.

Now, when a test fails, we'll get output like:

```
install_test.go:244: YAML mismatches install_output.golden:
	slice[32].map[spec].map[template].map[spec].map[containers].slice[3].map[image]: PolicyControllerImageName:PolicyControllerVersion != SomeOtherImage:PolicyControllerVersion
```

While testing this, it became apparent that several of our generated
golden files were not actually valid YAML, due to the `LinkerdVersion`
value being unset. This has been fixed.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-05-10 08:40:29 -07:00
AdamKorcz 5610d6b6fa
Fuzzing: Move fuzzers upstream (#7419)
Move fuzzers from downstream into Linkerd

Signed-off-by: AdamKorcz <adam@adalogics.com>
Co-authored-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-05-05 13:01:00 -06:00
Matei David 61b75509da
Refactor multicluster test install (#8139)
This change continues the work from #7403 by refactoring the
multicluster tests in order to install components programatically.

As part of this change, we now generate certificates (a CA and a shared
issuer) in code, and add a few utilities to manage different Kubernetes
contexts; a few examples are `KubectlApplyWithContext` and a function to
re-initialise the clientset with an arbitrary context.

Few bits and pieces have also been changed as I went through this, such
as applying entire files as opposed to reading manifests in memory
before piping them to kubectl.

Some other changes:
* remove logic from test runner script that set-up multicluster
* add a more rigurous check test after linking source to target cluster
* remove `target1`, `source` and `target_statefulset` tests
* consolidated previous tests in one file

Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
Co-authored-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-04-20 10:59:12 +01:00
Kevin Leimkuhler 67bcd8f642
Add `gosec` and `errcheck` lints (#7954)
Closes #7826

This adds the `gosec` and `errcheck` lints to the `golangci` configuration. Most significant lints have been fixed my individual changes, but this enables them by default so that all future changes are caught ahead of time.

A significant amount of these lints are been exluced by the various `exclude-rules` rules added to `.golangci.yml`. These include operations are files that generally do not fail such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up functions via the `defer` keyword.

Aside from those, there are several other rules added that all have comments explaining why it's okay to ignore the errors that they cover.

Finally, several smaller fixes in the code have been made where it seems necessary to catch errors or at least log them.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-03-03 10:09:51 -07:00
Matei David 044ffaa45e
Move `upgrade` integration tests in their own files (#7934)
This change hoists all of the upgrade logic into dedicated files.

Upgrade tests are now done programatically (at the expense of repeating a lot of code between upgrade-edge and upgrade-stable). This (in theory) will remove the dependency on an external test-runner and special arguments for the TestHelper. Upgrade tests can now be run through go's runner, provided a Kubernetes cluster has been provisioned: `go test ./test/integration/upgrade-<release-channel>/...`

Signed-off-by: Matei David <matei@buoyant.io>
2022-02-25 15:12:09 +00:00
Alex Leong 2a0084c6ac
Replace time.After with time.NewTimer to avoid memory leaks (#7956)
The timer created by calling `time.After` is not cleaned up until the timer fires.  Repeated calls to `time.After` in a loop create multiple timers which can accumulate if the loop runs faster than the timeout.

We replace `time.After` with `time.NewTimer` and explicitly call `Stop` when we are finished to clean up the timer.

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-02-24 15:34:52 -08:00
Oliver Gould 425a43def5
Enable gocritic linting (#7906)
[gocritic][gc] helps to enforce some consistency and check for potential
errors. This change applies linting changes and enables gocritic via
golangci-lint.

[gc]: https://github.com/go-critic/go-critic

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-17 22:45:25 +00:00
Oliver Gould f5876c2a98
go: Enable `errorlint` checking (#7885)
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection is wrapping-aware.

This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).

[el]: https://github.com/polyfloyd/go-errorlint

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-16 18:32:19 -07:00
Oliver Gould dcb0636ac1
Skip viz policy test on missing success rate (#7892)
When the flakey policy test can't produce a success rate, let's just
mark the test as skipped instead of failing CI.

Relates to #7590

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-15 16:45:37 -08:00
Matei David e46f7b4be2
Allow integration tests to run in parallel (#7773)
Go's test runner (`go test`) can be non-deterministic with the order in
which it runs the tests. Tests in Go seem to be always
run in parallel, but the specifics here differ depending on the
available CPU.

We can take advantage of parallelism here to get better timing on our
tests, however, we need to block the start of each test until the
control plane (or extension) pods are ready. In each `TestMain`, we
block until the pods are ready.

Signed-off-by: Matei David <matei@buoyant.io>
2022-02-07 15:21:57 +00:00
Matei David a0e2d12516
Isolate integration tests into suites (#7721)
This change re-works how tests are organised, by splitting each
integration test based on its functionality. Each "suite", is a
directory that compromises tests that will be specific to that suite;
deep tests will test the control plane, viz tests will target the
monitoring stack, and so on.

As part of this change, aside from moving the tests, the GH workflow
file has also been changed to remove old test names, and add (or
replace) new ones where appropriate.

To have a fully functioning viz suite, a new flag has been added to the
TestHelper to discern between tests that make use of the monitoring
stack and tests that do not. The external suite also has a duplicated
`edges` test so we can make sure interactions with external prometheus
don't break.

Signed-off-by: Matei David <matei@buoyant.io>
2022-02-02 17:19:08 +00:00
Alejandro Pedraza 68b63269d9
Remove the `proxy.disableIdentity` config (#7729)
* Remove the `proxy.disableIdentity` config

Fixes #7724

Also:
- Removed the `linkerd.io/identity-mode` annotation.
- Removed the `config.linkerd.io/disable-identity` annotation.
- Removed the `linkerd.proxy.validation` template partial, which only
  made sense when `proxy.disableIdentity` was `true`.
- TestInjectManualParams now requires to hit the cluster to retrieve the
  trust root.
2022-01-31 10:17:10 -05:00
Michael Lin 99f3e087e1
Introduce annotation to skip subnets (#7631)
The goal is to support configuring the
`--subnets-to-ignore` flag in proxy-init

This change adds a new annotation `/skip-subnets` which
takes a comma-separated list of valid CIDR.
The argument will map to the `--subnets-to-ignore`
flag in the proxy-init initContainer.

Fixes #6758

Signed-off-by: Michael Lin <mlzc@hey.com>
2022-01-20 16:53:59 +00:00
Alejandro Pedraza 67dfebb259
Stop shipping grafana-based image (#7567)
* Stop shipping grafana-based image

Fixes #6045 #7358

With this change we stop building a Grafana-based image preloaded with the Linkerd Grafana dashboards.

Instead, we'll recommend users to install Grafana by themselves, and we provide a file `grafana/values.yaml` with a default config that points to all the same Grafana dashboards we had, which are now hosted in https://grafana.com/orgs/linkerd/dashboards .

The new file `grafana/README.md` contains instructions for installing the official Grafana Helm chart, and mentions other available methods.

The `grafana.enabled` flag has been removed, and `grafanaUrl` has been moved to `grafana.url`. This will help consolidating other grafana settings that might emerge, in particular when #7429 gets addressed.

## Dashboards definitions changes

The dashboard definitions under `grafana/dashboards` (which should be kept in sync with what's published in https://grafana.com/orgs/linkerd/dashboards), got updated, adding the `__inputs`, `__elements` and `__requires` entries at the beginning, that were required in order to be published.
2022-01-11 14:47:40 -05:00
Alejandro Pedraza f9f3ebefa9
Remove namespace from charts and split them into `linkerd-crd` and `linkerd-control-plane` (#6635)
Fixes #6584 #6620 #7405

# Namespace Removal

With this change, the `namespace.yaml` template is rendered only for CLI installs and not Helm, and likewise the `namespace:` entry in the namespace-level objects (using a new `partials.namespace` helper).

The `installNamespace` and `namespace` entries in `values.yaml` have been removed.

There in the templates where the namespace is required, we moved from `.Values.namespace` to `.Release.Namespace` which is filled-in automatically by Helm. For the CLI, `install.go` now explicitly defines the contents of the `Release` map alongside `Values`.

The proxy-injector has a new `linkerd-namespace` argument given the namespace is no longer persisted in the `linkerd-config` ConfigMap, so it has to be passed in. To pass it further down to `injector.Inject()` without modifying the `Handler` signature, a closure was used.

------------
Update: Merged-in #6638: Similar changes for the `linkerd-viz` chart:

Stop rendering `namespace.yaml` in the `linkerd-viz` chart.

The additional change here is the addition of the `namespace-metadata.yaml` template (and its RBAC), _not_ rendered in CLI installs, which is a Helm `post-install` hook, consisting on a Job that executes a script adding the required annotations and labels to the viz namespace using a PATCH request against kube-api. The script first checks if the namespace doesn't already have an annotations/labels entries, in which case it has to add extra ops in that patch.

---------
Update: Merged-in the approved #6643, #6665 and #6669 which address the `linkerd2-cni`, `linkerd-multicluster` and `linkerd-jaeger` charts. 

Additional changes from what's already mentioned above:
- Removes the install-namespace option from `linkerd install-cni`, which isn't found in `linkerd install` nor `linkerd viz install` anyways, and it would add some complexity to support.
- Added a dependency on the `partials` chart to the `linkerd-multicluster-link` chart, so that we can tap on the `partials.namespace` helper.
- We don't have any more the restriction on having the muticluster objects live in a separate namespace than linkerd. It's still good practice, and that's the default for the CLI install, but I removed that validation.


Finally, as a side-effect, the `linkerd mc allow` subcommand was fixed; it has been broken for a while apparently:

```console
$ linkerd mc allow --service-account-name foobar
Error: template: linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml:16:7: executing "linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml" at <include "partials.annotations.created-by" $>: error calling include: template: no template "partials.annotations.created-by" associated with template "gotpl"
```
---------
Update: see helm/helm#5465 describing the current best-practice

# Core Helm Charts Split

This removes the `linkerd2` chart, and replaces it with the `linkerd-crds` and `linkerd-control-plane` charts. Note that the viz and other extension charts are not concerned by this change.

Also note the original `values.yaml` file has been split into both charts accordingly.

### UX

```console
$ helm install linkerd-crds --namespace linkerd --create-namespace linkerd/linkerd-crds
...
# certs.yaml should contain identityTrustAnchorsPEM and the identity issuer values
$ helm install linkerd-control-plane --namespace linkerd -f certs.yaml linkerd/linkerd-control-plane
```

### Upgrade

As explained in #6635, this is a breaking change. Users will have to uninstall the `linkerd2` chart and install these two, and eventually rollout the proxies (they should continue to work during the transition anyway).

### CLI

The CLI install/upgrade code was updated to be able to pick the templates from these new charts, but the CLI UX remains identical as before.

### Other changes

- The `linkerd-crds` and `linkerd-control-plane` charts now carry a version scheme independent of linkerd's own versioning, as explained in #7405.
- These charts are Helm v3, which is reflected in the `Chart.yaml` entries and in the removal of the `requirements.yaml` files.
- In the integration tests, replaced the `helm-chart` arg with `helm-charts` containing the path `./charts`, used to build the paths for both charts.

### Followups

- Now it's possible to add a `ServiceProfile` instance for Destination in the `linkerd-control-plane` chart.
2021-12-10 15:53:08 -05:00
Michael Lin 0e39017807
Improve ephemeral-storage resource config (#7218)
Fixes #3307

Add support for annotations `config.linkerd.io/proxy-ephemeral-storage-limit` and `config.linkerd.io/proxy-ephemeral-storage-request`

Signed-off-by: Michael Lin <mlzc@hey.com>
2021-11-05 14:04:57 -05:00
Alejandro Pedraza e8aa640085
New integration test default-policy-deny (#6931)
Closes #6813, followup to #6846

This runs `./test/integration/install_test.go`, installing linkerd with
`--set policyController.defaultAllowPolicy=deny` and subsequently
installing viz, making sure everything is alright.
2021-09-23 07:54:53 -05:00
Tarun Pothulapati 174c50a235
integrationtests: check for stat output of policy resources (#6890)
This PR adds a new `policy` integration tests suite
to check the output of for the `srv` and `saz` commands and
make sure they have the expected format.

This is majorly needed to make sure things work as expected
with stat cmd output without having to do manual testing.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-09-17 16:37:44 +05:30
Tarun Pothulapati 45478b6db8
viz: support `stat` on new policy resources (#6785)
Fixes #6733

As policy resources provide a grouping, statistics summaries should
also be allowed on these groupings which are useful to the user. Them
being port specific provide a great way to break down these metrics
further.

This PR adds support for policy resources i.e `server` and `serverauthorization`
 on the `stat` command.

## Changes

This adds a new path in the `stat_summary.go` file to handle policy
objects. I tried to see if we could re-use some of the other paths
but some of the labels seems to differ and hence a different path
had to be created. We can try to refactor and merge them though.

We support both request and TCP metrics for the `server` resource
while only the former with `serverauthorization` resources
as metrics are generated in this manner.

This also adds these policy objects into the `k8s` package to
make them as known resources.

For both the policy resources, `--from` doesn't work as these
metrics are not exposed from outbound, and there is no way to
query about the client workload from the inbound metrics. `--to`
is supported to get metrics specifically for a destination workload.
(just like on a service)

## Testing

```bash
> curl -sL https://run.linkerd.io/emojivoto.yml | linkerd inject --proxy-log-level debug - | kubectl apply -f -

> kubectl apply -f 897de1a8d5/emojivoto-policy.yml


# Initial values
on  kind-kind  linkerd2 on 🌱 taru [📦📝🤷‍] via 🐼 v1.16.7 via  via ❄️  impure (shell)
➜ ./bin/go-run cli viz stat srv -A -owide                                                                                                         ~/work/linkerd2
NAMESPACE   NAME          UNAUTHORIZED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emojivoto   emoji-grpc          0.0rps   100.00%   1.8rps           1ms           1ms           3ms          1         188.6B/s         2072.9B/s
emojivoto   prom                0.0rps         -        -             -             -             -          -                -                 -
emojivoto   voting-grpc         0.0rps    80.70%   0.9rps           1ms           2ms           3ms          1          91.4B/s           52.7B/s
emojivoto   web-http            0.0rps    90.68%   2.0rps           2ms          10ms          28ms          1         153.7B/s         4509.4B/s

# After changing the `emoji-grpc` authz
on  kind-kind  linkerd2 on 🌱 taru [📦📝🤷‍] via 🐼 v1.16.7 via  via ❄️  impure (shell) took 2s
➜ ./bin/go-run cli viz stat srv -A -owide                                                                                                         ~/work/linkerd2
NAMESPACE   NAME          UNAUTHORIZED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emojivoto   emoji-grpc          0.3rps   100.00%   1.1rps           0ms           0ms           0ms          1         156.5B/s         1282.4B/s
emojivoto   prom                0.0rps         -        -             -             -             -          -                -                 -
emojivoto   voting-grpc         0.0rps    87.88%   0.6rps           0ms           0ms           0ms          1          53.5B/s           31.5B/s
emojivoto   web-http            0.0rps    61.18%   1.4rps           1ms           2ms           2ms          1         110.2B/s         2195.7B/s

# after changing the `web-http` authz

on  kind-kind  linkerd2 on 🌱 taru [📦📝🤷‍] via 🐼 v1.16.7 via  via ❄️  impure (shell)
➜ ./bin/go-run cli viz stat srv -A -owide                                                                                                         ~/work/linkerd2
NAMESPACE   NAME          UNAUTHORIZED   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emojivoto   emoji-grpc          0.0rps         -     -             -             -             -          -                -                 -
emojivoto   prom                0.0rps         -     -             -             -             -          -                -                 -
emojivoto   voting-grpc         0.0rps         -     -             -             -             -          -                -                 -
emojivoto   web-http            1.0rps         -     -             -             -             -          -                -                 -

> linkerd  viz stat srv/emoji-grpc -n emojivoto -owide
NAME         SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emoji-grpc        100.00%   2.0rps           1ms           1ms           1ms          1         199.9B/s         2208.0B/s

> linkerd  viz stat srv/web-http -n emojivoto -owide
NAME      SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
web-http         94.02%   1.9rps           4ms           9ms          10ms          1         152.7B/s         4505.9B/s

> linkerd  viz stat srv -n emojivoto -o wide                                                      
NAME          MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emoji-grpc         -   100.00%   2.0rps           1ms           1ms           1ms          1         201.6B/s         2209.8B/s
prom               -         -        -             -             -             -          -                -                 -
voting-grpc        -    86.21%   1.0rps           1ms           1ms           1ms          1          98.3B/s           55.9B/s
web-http           -    91.67%   2.0rps           3ms           8ms          10ms          1         157.7B/s         4600.3B/s


> linkerd  viz stat serverauthorization/web-public -n emojivoto
NAME       MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99  
web-http        -    89.83%   2.0rps           3ms           9ms          10ms

> linkerd viz stat saz -n emojivoto
NAME          AUTHORIZATION     MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
emoji-grpc    emoji-grpc             -   100.00%   2.0rps           1ms           1ms           1ms
prom          prom-prometheus        -         -        -             -             -             -
voting-grpc   voting-grpc            -    89.83%   1.0rps           1ms           1ms           1ms
web-http      web-public             -    94.96%   2.0rps           1ms           5ms           9ms

> linkerd viz stat saz/web-public -n emojivoto                                                 
NAME       AUTHORIZATION   MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
web-http   web-public           -    90.00%   2.0rps           1ms           5ms           9ms
```

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-09-15 10:59:36 +05:30
Alex Leong 9ed5a3cb3f
Add sleep binary to proxy image (#6734)
Fixes #6723

We add the sleep binary to the proxy image so that the waitBeforeExitSeconds will work.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-08-25 08:56:20 -07:00
LiuDui 5133af63e8
remove some hard-coded and duplicate constants (#6675)
Remove some hard-coded and duplicate constants

Signed-off-by: liudui <1693291525@qq.com>
2021-08-16 08:47:37 -04:00
Alejandro Pedraza e1e3f99b37
Upon rollout timeouts in integration tests, show events (#6529)
Sometimes deployments deadlines are reached in integrations tests when
waiting for rollouts, with no additional explanation.

E.g.
https://github.com/linkerd/linkerd2/runs/3123265801?check_suite_focus=true#step:8:161

This change outputs the events related to the deployment, for better
insight.
2021-07-21 14:46:17 -05:00
Josh Soref 0be792fadc
Spelling (#6215)
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).

The misspellings have been reported at 0d56327e6f (commitcomment-51603624)

The action reports that the changes in this PR would make it happy: 03a9c310aa

Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2021-06-07 15:16:59 -06:00
Kevin Leimkuhler 945eaf57e9
Only check proxy version of running pods (#6212)
This changes the data plane and control plane checks to only consider pods that
are in a running state.

This is currently an issue in scenarios when upgrading and an old pod is still
terminating while the upgraded pod is already running. Running the `check`
command will report that the old pod has an unexpected proxy version even though
it is terminating; it should not warn in this case.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-06-03 10:59:07 -04:00
Oliver Gould da6d8e5272
Update Go to 1.16.4 (#6170)
Go 1.16.4 includes a fix for a denial-of-service in net/http: golang/go#45710

Go's error file-line formatting changed in 1.16.3, so this change
updates tests to only do suffix matching on these error strings.
2021-05-24 11:57:46 -07:00
Kevin Leimkuhler 8c66aafe7b
Increase all testing timeouts to 60 minutes (#6134) 2021-05-17 16:49:57 -04:00
Tarun Pothulapati 8db6398442
checks: add proxy checks for core cp and extension pods (#5673)
* checks: add proxy checks for core cp and extension pods

Fixes #5623

This PR adds proxy checks for control-plane and extension pods
when the respective checks are ran. This can make sure proxies
are working correctly and are able to communicate.

Currently, The following checks are added:

- proxy status checks
- proxy certificate checks
- proxy version checks

These are the same data-plane proxy checks that were already
present.

As these checks result in errors in most cases under integration
tests as there are latest versions online. This is fixed by templating
the check golden files and checking for the known error.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-04-22 11:39:52 +05:30
Alejandro Pedraza 6980e45e1d
Remove the `linkerd-controller` pod (#6039)
* Remove the `linkerd-controller` pod

Now that we got rid of the `Version` API (#6000) and the destination API forwarding business in `linkerd-controller` (#5993), we can get rid of the `linkerd-controller` pod.

## Removals

- Deleted everything under `/controller/api/public` and `/controller/cmd/public-api`.
- Moved `/controller/api/public/test_helper.go` to `/controller/api/destination/test_helper.go` because those are really utils for destination testing. I also extracted from there the prometheus mock structs and put that under `/pkg/prometheus/test_helper.go`, which is now by both the `linkerd diagnostics endpoints` and the `metrics-api` tests, removing some duplication.
- Deleted the `controller.yaml` and `controller-rbac.yaml` helm templates along with the `publicAPIResources` and `publicAPIProxyResources` helm values.

## Health checks

- Removed the `can initialize the client` check given such client is no longer needed. The `linkerd-api` section was left with only the check `control pods are ready`, so I moved that under the `linkerd-existence` section and got rid of the `linkerd-api` section altogether.
- In that same `linkerd-existence` section, got rid of the `controller pod is running` check.

## Other changes

- Fixed the Control Plane section of the dashboard, taking account the disappearance of `linkerd-controller` and previously, of `linkerd-sp-validator`.
2021-04-19 09:57:45 -05:00
Alejandro Pedraza 6b8c17eef1
Move `sp-validator` into the `destination` pod (#5936)
* Move `sp-validator` into the `destination` pod

Fixes #5195

The webhook secrets remain the same, as do the `profileValidator` settings in `values.yaml`, so this doesn't pose any upgrading challenges.
2021-04-16 08:04:11 -05:00
Alejandro Pedraza fae96c56d2
In the `helm-upgrade` int test, install the proper viz chart version (#6001)
The `helm-upgrade` test was installing the current viz chart version from `viz/charts/linkerd-viz` instead of the latest stable published one, before attempting to perform the helm upgrade.

As seen below, we're passing `--helm-chart` and `helm-stable-chart` as parameters, but only `--viz-helm-chart`

```console
Test script: [install_test.go] Params: [--helm-path=/home/alpeb/src/pr/version2/linkerd2/bin/helm --helm-chart=/home/alpeb/src/pr/version2/linkerd2/charts/linkerd2 --viz-helm-chart=/home/alpeb/src/pr/version2/linkerd2/viz/charts/linkerd-viz --helm-stable-chart=linkerd/linkerd2 --helm-release=helm-test --upgrade-helm-from-version=stable-2.10.0]
```

So this PR adds a new `--viz-helm-stable-chart` parameter that instructs which version to install before performing the upgrade.

This is affecting #6000 which has changes in the viz chart that are going undetected and are failing the `helm-upgrade` test.
2021-04-08 13:39:33 -05:00
Kevin Leimkuhler a11012819c
Add opaque ports namespace inheritance to pods (#5941)
### What

When a namespace has the opaque ports annotation, pods and services should
inherit it if they do not have one themselves. Currently, services do this but
pods do not. This can lead to surprising behavior where services are correctly
marked as opaque, but pods are not.

This changes the proxy-injector so that it now passes down the opaque ports
annotation to pods from their namespace if they do not have their own annotation
set. Closes #5736.

### How

The proxy-injector webhook receives admission requests for pods and services.
Regardless of the resource kind, it now checks if the resource should inherit
the opaque ports annotation from its namespace. It should inherit it if the
namespace has the annotation but the resource does not.

If the resource should inherit the annotation, the webhook creates an annotation
patch which is only responsible for adding the opaque ports annotation.

After generating the annotation patch, it checks if the resource is injectable.
From here there are a few scenarios:

1. If no annotation patch was created and the resource is not injectable, then
   admit the request with no changes. Examples of this are services with no OP
   annotation and inject-disabled pods with no OP annotation.
2. If the resource is a pod and it is injectable, create a patch that includes
   the proxy and proxy-init containers—as well as any other annotations and
   labels.
3. The above two scenarios lead to a patch being generated at this point, so no
   matter the resource the patch is returned.

### UI changes

Resources are now reported to either be "injected", "skipped", or "annotated".

The first pass at this PR worked around the fact that injection reports consider
services and namespaces injectable. This is not accurate because they don't have
pod templates that could be injected; they can however be annotated.

To fix this, an injection report now considers resources "annotatable" and uses
this to clean up some logic in the `inject` command, as well as avoid a more
complex proxy-injector webhook.

What's cool about this is it fixes some `inject` command output that would label
resources as "injected" when they were not even mutated. For example, namespaces
were always reported as being injected even if annotations were not added. Now,
it will properly report that a namespace has been "annotated" or "skipped".

### Tests

For testing, unit tests and integration tests have been added. Manual testing
can be done by installing linkerd with `debug` controller log levels, and
tailing the proxy-injector's app container when creating pods or services.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-03-29 19:41:15 -04:00
Alejandro Pedraza 9a191fbd7b
Get rid of `CheckDeployments()` in integration tests (#5942)
`CheckDeployments()` verified that some deployment had the appropriate
number of replicas in the Ready state. `CheckPods()` does the same, plus
checking if there were restarts (a single restart returns
`RestartCountError` which only triggers a warning on the calling side,
more restarts trigger a regular error). We were always calling the
former followed by a call to the latter which is superfluous, so we're
getting rid of the latter.

Also, the `testutil.DeploySpec` struct had a `Containers` field for
checking the name of containers, but that wasn't always checked and
didn't really represent a potential error that wouldn't be clearly
manifested otherwise (like in golden files), so that was simplified as
well.
2021-03-23 13:53:38 -05:00
Alejandro Pedraza 711124159a
Fix helm-upgrade and upgrade-stable integration tests (#5891)
* Fix helm-upgrade integration test

Update `install_test.go` now that the upgrade test is done from 2.10.
This also implied installing viz right after core.

Refactored `HelmInstallPlain()` into `HelmCmdPlain()` to work for both
helm install and upgrade.

* Add expected heartbeat config entry to the upgrade-stable test, and remove testCheckCommand arg (for lint)
2021-03-12 08:20:45 -05:00
Alejandro Pedraza af5aa70e68
Helm integration tests improvements, and other flakiness fixes to CI (#5857)
- Get rid of all the custom settings passed through `--set` during `helm install`, and instead let the defaults mechanisms in the templates to  kick-in
- Before installing `linkerd-viz` through Helm, wait on _all_ the core components to be ready (this is what might have been causing the restarts seen in CI)
- Show full output when `linkerd jaeger check` fails, and do some cleanup before triggering the tracing tests. But then I decided to temporarily disable that test till we figure out what's the deal.
2021-03-02 20:11:00 -05:00
Alejandro Pedraza 571f505b6b
Move CP check after the readiness check (#5848)
* Move CP check after the readiness check

Moved the `can initialize client` and `can query the control plane API`
checks from the `linkerd-existence` section to the `linkerd-api` because
they required the `linkerd-controller` pod to not just be "Running" but
actually be ready.

This was causing `linkerd check` to show some port-forwarding warnings
when ran right after install.

This also allowed getting rid of the `CheckPublicAPIClientOrExit` function
and directly use `CheckPublicAPIClientOrRetryOrExit` (better naming
punted for later) which was refactored so it always runs the
`linkerd-api` checks before retrieving the client.

Other changes:

- Temporarily disabled `upgrade-edge` test because the latest edge has this readiness check issue
- Have the upgrade tests do proper pruning (stolen for @Pothulapati's #5673 😉 )
- Added missing label to tap SA (fixes #5850)
- Complete tap-injector Service selector
2021-03-01 19:47:25 -05:00
Dennis Adjei-Baah 15d1809bd0
Remove linkerd prefix from extension resources (#5803)
* Remove linkerd prefix from extension resources

This change removes the `linkerd-` prefix on all non-cluster resources
in the jaeger and viz linkerd extensions. Removing the prefix makes all
linkerd extensions consistent in their naming.

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2021-02-25 11:01:31 -05:00
Tarun Pothulapati ed65a96f43
tests: enable external-prometheus-deep with cleanup logic (#5779)
This PR enables the temporarily disabled external-prometheus-deep
integration test. This also fixes the clean up issue by essentially
moving the external-prometheus resources into `external-prometheus`
which has the relevant annotation required to be deleted by
`bin/test-cleanup`

This PR also updates the test resource label to be `test.linkerd.io/is-test-data-plane` from
`linkerd.io/is-test-data-plane` to prevent `linkerd inject` from removing it.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-02-22 22:23:31 +05:30
Tarun Pothulapati ac67a2c5d7
tests: add exnternal-prometheus integration test (#5720)
* tests: add exnternal-prometheus integration test

Fixes #5659

Though most control-plane components dont differntiate on default
vs external prometheus, There have been issues w.r.t CLI and external
prometheus i.e check, etc.

This PR adds a e2e deep integration tests w.r.t external pronmetheus
thereby running viz and non-viz cmds integration tests on a linkerd-viz
ewith external prometheus instance

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-02-16 23:20:58 +05:30
Tarun Pothulapati a393c42536
values: removal of .global field (#5699)
* values: removal of .global field

Fixes #5425

With the new extension model, We no longer need `Global` field
as we don't rely on chart dependencies anymore. This helps us
further cleanup Values, and make configuration more simpler.

To make upgrades and the usage of new CLI with older config work,
We add a new method called `config.RemoveGlobalFieldIfPresent` that
is used in the upgrade and `FetchCurrentConfiguration` paths to remove
global field and attach its child nodes if global is present. This is verified
by the `TestFetchCurrentConfiguration`'s older test that has the global
field.

We also don't yet remove .global in some helm stable-upgrade tests for
the initial install to work.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-02-11 23:38:34 +05:30
Dennis Adjei-Baah e4069b47e0
Run extension checks when linkerd check is invoked (#5647)
* Run extension checks when linkerd check is invoked

This change allows the linkerd check command to also run any known
linkerd extension commands that have been installed in the cluster. It
does this by first querying for any namespace that has the label
selector `linkerd.io/extension` and then runs the subcommands for either
`jaeger`, `multicluster` or `viz`. This change runs basic namespace
healthchecks for extensions that aren't part of the Linkerd extension suite.

Fixes #5233
2021-02-11 10:50:16 -06:00
Mayank Shah 96e078421c
CLI: Remove the `--disable-tap` flag from inject (#5671)
Fixes https://github.com/linkerd/linkerd2/issues/5664

- Remove `--disable-flag` from `inject`
-  Move `config.linkerd.io/disable-tap` to `viz.linkerd.io/disable-tap`

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2021-02-11 10:01:53 -05:00
Alex Leong dd8e5fc5bc
Rename extension charts to linkerd-* (#5552)
For consistency we rename the extension charts to a common naming scheme:

linkerd-viz -> linkerd-viz (unchanged)
jaeger -> linkerd-jaeger
linkerd2-multicluster -> linkerd-multicluster
linkerd2-multicluster-link -> linkerd-multicluster-link

We also make the chart files and chart readmes a bit more uniform.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-01-26 16:20:49 -08:00
Kevin Leimkuhler e7f2a3fba3
viz: add tap-injector (#5540)
## What this changes

This adds a tap-injector component to the `linkerd-viz` extension which is
responsible for adding the tap service name environment variable to the Linkerd
proxy container.

If a pod does not have a Linkerd proxy, no action is taken. If tap is disabled
via annotation on the pod or the namespace, no action is taken.

This also removes the environment variable for explicitly disabling tap through
an environment variable. Tap status for a proxy is now determined only be the
presence or absence of the tap service name environment variable.

Closes #5326

## How it changes

### tap-injector

The tap-injector component determines if `LINKERD2_PROXY_TAP_SVC_NAME` should be
added to a pod's Linkerd proxy container environment. If the pod satisfies the
following, it is added:

- The pod has a Linkerd proxy container
- The pod has not already been mutated
- Tap is not disabled via annotation on the pod or the pod's namespace

### LINKERD2_PROXY_TAP_DISABLED

Now that tap is an extension of Linkerd and not a core component, it no longer
made sense to explicitly enable or disable tap through this Linkerd proxy
environment variable. The status of tap is now determined only be if the
tap-injector adds or does not add the `LINKERD2_PROXY_TAP_SVC_NAME` environment
variable.

### controller image

The tap-injector has been added to the controller image's several startup
commands which determines what it will do in the cluster.

As a follow-up, I think splitting out the `tap` and `tap-injector` commands from
the controller image into a linkerd-viz image (or something like that) makes
sense.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-21 11:24:08 -05:00
Tarun Pothulapati 4c3d002501
viz: move sub-cmds using viz extension under viz cmd (#5485)
* viz: move sub-cmds using viz extension under viz cmd

Fixes #5327 , #5524 

This branch moves the following commands, under the `linkerd viz`
cmd as they use the viz extension to perform the job.

- dashboard
- edges
- routes
- stat
- tap
- top

This also creates a new pkg `public-api` which fecilitates
interaction and communication with public-api to be used
across extensions.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
2021-01-13 12:11:25 +05:30