Commit Graph

198 Commits

Author SHA1 Message Date
Alejandro Pedraza 29459a50e6 Removed 'no invalid service profiles' from linkerd check test fixtures (#3724)
Followup to #3718
2019-11-14 10:52:38 -08:00
Zahari Dichev 2d224302de
Add integration test for external issuer and cert rotation flows (#3709)
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-11-14 06:58:32 +02:00
Zahari Dichev a6ff442789
Traffic split integration test (#3649)
* Traffic split integration test

Signed-off-by: zaharidichev <zaharidichev@gmail.com>

* Address comments

Signed-off-by: zaharidichev <zaharidichev@gmail.com>

* Display placeholder when there is no basic stats data

Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-11-13 21:14:34 +02:00
Alejandro Pedraza 4b6254b52e
Replaced `uuid` with `uid` from linkerd-config resource (#3694)
* Replaced `uuid` with `uid` from linkerd-config resource

Fixes #3621

Removed the old `uuid` for identifying linkerd installations, and
replaced it with the `uid` property from the `linkerd-config` ConfigMap.

I tested that this `uid` remains the same by updating the config and
also upgrading linkerd, using both the CLI and Helm.

Note that this required granting `linkerd-web` RBAC access to the
`linkerd-config` Config.

I also added an integration test to verify the stability of the uid.
2019-11-13 13:56:01 -05:00
Alex Leong 5da1a0723d
Disable edges integration test (#3707)
The edges integration test can fail when more edges are added to the Linked namespace due to https://github.com/linkerd/linkerd2/issues/3706.  We disable this test until that issue can be resolved.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-11-13 10:04:40 -08:00
Zahari Dichev 6e67f1a8ca
Modify knownEventWarningsRegex regex to catch ipv6 error events (#3699)
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-11-13 09:38:01 +02:00
Zahari Dichev 038900c27e Remove destination container from controller (#3661)
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-11-08 14:40:25 -08:00
Zahari Dichev 7dd5dfc2ba
Check health of meshed apps before and after linkerd upgrade (#3641)
* Check stats of deployed app before and after linkerd upgrade to ensure nothing broke

Signed-off-by: zaharidichev <zaharidichev@gmail.com>

* Address naming remarks

Signed-off-by: zaharidichev <zaharidichev@gmail.com>

* Improve application health checking

Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-11-07 20:48:12 +02:00
Zahari Dichev 1bb9d66757 Integration test for custom cluster domain (#3660)
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-11-04 14:49:52 -08:00
Zahari Dichev a8170bd634
Add preinstall checks for deletion and creation of secrets (#3639)
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-10-31 18:01:03 +02:00
Alex Leong befea4aff6
Add direct edges integration test (#3603)
Add an integration test which exercises the behavior when one meshed pod connects to another meshed pod by pod ip address.

The current behavior is that the Linkerd proxy will not do any lookup against the destination service for this kind of connection and will proxy directly to the SO_ORIG_DST.  This means that it will not have the identity metadata necessary to TLS the connection, and the connection will not be present in the `linkerd edges` command output.  This test validates that behavior.

The purpose of this test is to set the stage for future work which will allow the Linkerd proxy to TLS this type of connection and display it in `linkerd edges`.  The assertions in this test will be updated as part of that work.

This test will be run as part of the integration test suite.  It can also be run directly:

```
go test --failfast --mod=readonly test/install_test.go   --linkerd=(pwd)"/bin/linkerd" --k8s-context="$CTX" --integration-tests
go test -v --mod=readonly test/edges/edges_test.go  --linkerd=(pwd)"/bin/linkerd" --k8s-context="$CTX" --integration-tests
```

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-10-30 10:48:03 -07:00
Ivan Sim ff69c29f5e
Add missing package to proxy Dockerfile (#3583)
* Add missing package to proxy Dockerfile
* Fix failing 'check' integration test
* Trim whitespaces in certs comparison.

Without this change, the integration test would fail because the trust anchor
stored in the linkerd-config config map generated by the Helm renderer is
stripped of the line breaks. See charts/linkerd2/templates/_config.tpl

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-10-15 15:51:26 -07:00
Ivan Sim cf69dedf9c
Re-add the destination container to the controller spec (#3540)
* Re-add the destination container to the controller spec

This fix is necessary to avoid data plane downtime during an upgrade to
stable-2.6. All existing older proxies will continue to send requests to
this destination container, until the data plane is restarted.

On restart, the new pods will start forwarding their requests to the new
linkerd-dst service.

* Use the 2.6 destination service fqdn
* Fixed unit tests
* Fix integration test failure

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-10-08 10:49:40 -07:00
Alejandro Pedraza c21ceb5b4e
Add integration tests for linkerd endpoints and edges (#3491)
* Add integration tests for linkerd endpoints and edges

Fixes #3477 and #3478
2019-10-01 15:46:27 -05:00
Alejandro Pedraza 6568929028
Add --disable-heartbeat flag for linkerd install|upgrade (#3439)
Fixes #278

Add `linkerd install|upgrade --disable-heartbeat` flag, and have
`linkerd check` check for the heartbeat's SA only if it's enabled.

Also added those flags into the `linkerd upgrade -h` examples.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-09-25 15:53:36 -05:00
Alejandro Pedraza 1653f88651
Put the destination controller into its own deployment (#3407)
* Put the destination controller into its own deployment

Fixes #3268

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-09-18 13:41:06 -05:00
Andrew Seigner a5a6e8ff9f
Fix integration test event regex matching (#3416)
The integration tests check for known k8s events using a regex. This
regex included an incorrect pattern that prepended a failure reason and
object, rather than simply the event message we were trying to match on.
This resulted in failures such as:
https://github.com/linkerd/linkerd2/runs/217872818#step:6:476

Fix the regex to only check for the event message. Also explicitly
differentiate reason, object, and message in the log output.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-10 13:24:22 -07:00
Andrew Seigner 9bb7b6f119
Make KillPodSandbox regex match broader (#3409)
We're getting flakey `KillPodSandbox` events in the integration tests:
https://github.com/linkerd/linkerd2/runs/216505657#step:6:427
This is despite adding a regex for these events in #3380.

Modify the KillPodSandbox event regex to match on a broader set of
strings.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-09 11:54:14 -07:00
Bruno M. Custódio 8fec756395 Add '--address' flag to 'linkerd dashboard'. (#3274)
Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
2019-09-05 10:56:10 -07:00
Andrew Seigner e51af8c8a9
Add known KillPod k8s event to integration test (#3380)
FailedKillPod events were causing integration tests to fail:
https://github.com/linkerd/linkerd2/runs/212313175#step:6:409

Add FailedKillPod as a known event. Example:
https://play.golang.org/p/WV52tyZgijW

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-04 14:00:16 -07:00
Andrew Seigner a8481b721a
GitHub Actions, kind, integration test logs fixes (#3372)
PR #3339 introduced a GitHub Actions CI workflow. Booting 6 clusters
simultaneously (3x Github Actions + 3x Travis) exhibits some transient
failures.

Implement fixes in GitHub Actions and integration tests to address kind
cluster creation and testing:
- Retry kind cluster creation once.
- Retry log reading from integration k8s clusters once.
- Add kind cluster creation debug logging.
- Add a GitHub Actions status badge to top of `README.md`.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-04 12:44:27 -07:00
Alejandro Pedraza acbab93ca8
Add support for k8s 1.16 (#3364)
Fixes #3356

1.16 removes some api groups that were already deprecated. From k8s blog
post (https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/):

```
- PodSecurityPolicy: will no longer be served from extensions/v1beta1 in
v1.16.
    Migrate to the policy/v1beta1 API, available since v1.10. Existing
    persisted data can be retrieved/updated via the policy/v1beta1 API.
- DaemonSet, Deployment, StatefulSet, and ReplicaSet: will no longer be
served from extensions/v1beta1, apps/v1beta1, or apps/v1beta2 in v1.16.
    Migrate to the apps/v1 API, available since v1.9. Existing persisted
    data can be retrieved/updated via the apps/v1 API.
```

Previous PRs had already made this change at the Helm templates level,
but we still needed to do it at the API calls and tests.

The integration tests ran fine for k8s 1.12 and 1.15. They fail on 1.16
because the upgrade integration test tries to install linkerd 2.5 which is not
compatible with 1.16.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-09-04 09:59:55 -05:00
Alejandro Pedraza 368d16f23c
Fix auto-injecting pods and integration tests reporting (#3335)
* Fix auto-injecting pods and integration tests reporting

When creating an Event when auto-injection occurs (#3316) we try to
fetch the parent object to associate the event to it. If the parent
doesn't exist (like in the case of stand-alone pods) the event isn't
created. I had missed dealing with one part where that parent was
expected.

This also adds a new integration test that I verified fails before this
fix.

Finally, I removed from `_test-run.sh` some `|| exit_code=$?` that was
preventing the whole suite to report failure whenever one of the tests
in `/tests` failed.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-28 15:04:20 -05:00
Andrew Seigner 956d1bff06
Update warning event regex for integration test (#3336)
Kubernetes was generating events for failed readiness probes that did
not quite match the expected events regex in the install integration
test:
https://travis-ci.org/linkerd/linkerd2/jobs/577642724#L647

Update the readiness probe regex to handle these variations in events:
https://play.golang.org/p/OVGJkFNN-XA

Relates to CI failure in #3333.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-28 10:19:40 -07:00
Andrew Seigner 419e9052ff
Fix flakey upgrade integration test (#3329)
The `linkerd upgrade` integration test compares the output from two
commands:
- `linkerd upgrade control-plane`
- `linkerd upgrade control-plane --from-manifests`

The output of these commands include the heartbeat cronjob schedule,
which is generated based on the current time.

Modify the upgrade integration test to retry the manifest comparison one
time, assuming that `linkerd upgrade control-plane` should not take more
than one minute to execute.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-28 09:41:09 -07:00
Alejandro Pedraza 9ee98d35be
Stop ignoring client-go log entries (#3315)
* Stop ignoring client-go log entries

Pipe klog output into logrus. Not doing this avoids us from seeing
client-go log entries, for some reason I don't understand.

To enable, `--controller-log-level` must be `debug`.

This was discovered while trying to debug sending events for #3253.

I added an integration test that fails when this piping is not in place.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-26 15:46:31 -05:00
Ivan Sim 954a45f751
Fix broken unit and integration tests (#3303)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-08-21 18:52:19 -07:00
Alejandro Pedraza fd5fc07db1
Fix integration test (#3266)
Followup to #3194

The namespace was too long for l5d-bot:

```
    inject_test.go:117: failed to create
    l5d-integration-auto-git-9688d9ba-inject-namespace-override-test
    namespace: Namespace
    "l5d-integration-auto-git-9688d9ba-inject-namespace-override-test"
    is invalid: metadata.name: Invalid value:
    "l5d-integration-auto-git-9688d9ba-inject-namespace-override-test":
    must be no more than 63 characters
```

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-15 09:13:36 -05:00
Tarun Pothulapati 242566ac7c Check for Namespace level config override annotations (#3194)
* Check for Namespace level config override annotations
* Add unit tests for namespace level config overrides
* add integration test for namespace level config override
* use different namespace for override tests
* check resource requests for integration tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-08-14 21:01:44 -07:00
Alejandro Pedraza 4e65ed1e6a Update integration test with new Helm values struct (#3246)
Followup to #3229

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-13 19:00:55 -07:00
Alejandro Pedraza d64a2f3689
Add integration test for `helm install` (#3223)
Ref #3143

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-13 09:14:32 -05:00
Alejandro Pedraza 3ae653ae92
Refactor proxy injection to use Helm charts (#3200)
* Refactor proxy injection to use Helm charts

Fixes #3128

A new chart `/charts/patch` was created, that generates the JSON patch
payload that is to be returned to the k8s API when doing the injection
through the proxy injector, and it's also leveraged by the `linkerd
inject --manual` CLI.

The VFS was used by `linkerd install` to access the old chart under
`/chart`. Now the proxy injection also uses the Helm charts to generate
the JSON patch (see above) so we've moved the VFS from `cli/static` to a
new common place under `/pkg/charts/static`, and the new root for the VFS is
now `/charts`.

`linkerd install` hasn't yet migrated to use the new charts (that'll
happen in #3127), so the only change in that regard was the creation of
`/charts/chart` which is a symlink pointing to `/chart` that
`install.go` now uses, so that the VFS contains both the old and new
charts, as a temporary measure.

You can see that `/bin/Dockerfile-bin`, `/controller/Dockerfile` and
`/bin/build-cli-bin` do now `go generate` pointing to the new location
(and the `go generate` annotation was moved from `/cli/main.go` to
`pkg/charts/static/templates.go`).

The symlink trick doesn't work when building the binaries through
Docker, so `/bin/Dockerfile-bin` replaces the symlink with an actual
copy of `/chart`.

Also note that in `/controller/Dockerfile` we now need to include the
`prod` tag in `go install` like we do in `/bin/Dockerfile-bin` so that
the proxy injector does use the VFS instead of the local file system.

- The common logic to parse a chart has been moved from `install.go` to
`/pkg/charts/util.go`.
- The special ENV var in the proxy for "outbound router capacity" that
only applies to the Prometheus pod is now handled directly in the proxy
partial and all the associated go code could be removed.
- The `patch.go` lib for generating the JSON patch in go along
with its tests `patch_test.go` are no longer needed.
- Lots of functions in `/pkg/inject/inject.go` got removed/simplified
with their logic being moved into the charts themselves. As a
consequence lots of things in `inject_test.go` became irrelevant.
- Moved `template-values.go` from `/pkg/inject` to `pkg/charts` as that
contains the go structs representation of the chart variables that
will be leveraged in #3127.

Don't forget to run `/bin/helm.sh` whenever you make changes to charts
;-)

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-07 17:32:37 -05:00
Andrew Seigner a59c1dd32d
Introduce tap APIService, update `linkerd tap` (#3167)
The Tap Service enabled tapping of any meshed pod, regardless of user
privilege.

This change introduces a new Tap APIService. Kubernetes provides
authentication and authorization of Tap requests, and then forwards
requests to a new Tap APIServer, which implements a Kubernetes
aggregated APIServer. The Tap APIServer authenticates the client TLS
from Kubernetes, and authorizes the user via a SubjectAccessReview.

This change also modifies the `linkerd tap` command to make requests
against the new APIService.

The Tap APIService implements these Kubernetes-style endpoints:
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/tap
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/:res/:name/tap
GET  /apis
GET  /apis/tap.linkerd.io
GET  /apis/tap.linkerd.io/v1alpha1
GET  /healthz
GET  /healthz/log
GET  /healthz/ping
GET  /metrics
GET  /openapi/v2
GET  /version

Users authorize to the new `tap.linkerd.io/v1alpha1` via RBAC. Only the
`watch` verb is supported. Access is also available via subresources
such as `deployments/tap` and `pods/tap`.

This change introduces the following resources into the default Linkerd
install:
- Global
  - APIService/v1alpha1.tap.linkerd.io
  - ClusterRoleBinding/linkerd-linkerd-tap-auth-delegator
- `linkerd` namespace:
  - Secret/linkerd-tap-tls
- `kube-system` namespace:
  - RoleBinding/linkerd-linkerd-tap-auth-reader

Tasks not covered by this PR:
- `linkerd top`
- `linkerd dashboard`
- `linkerd profile --tap`
- removal of the unauthenticated tap controller

Fixes #2725, #3162, #3172

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-01 14:02:45 -07:00
Alex Leong ab7226cbcd
Return invalid argument for external name services (#3120)
Fixes https://github.com/linkerd/linkerd2/issues/2800#issuecomment-513740498

When the Linkerd proxy sends a query for a Kubernetes external name service to the destination service, the destination service returns `NoEndpoints: exists=false` because an external name service has no endpoints resource.  Due to a change in the proxy's fallback logic, this no longer causes the proxy to fallback to either DNS or SO_ORIG_DST and instead fails the request.  The net effect is that Linkerd fails all requests to external name services.

We change the destination service to instead return `InvalidArgument` for external name services.  This causes the proxy to fallback to SO_ORIG_DST instead of failing the request.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-07-29 16:31:22 -07:00
Andrew Seigner 065dd3ec9d
Add "can create cronjobs" to linkerd check (#3133)
PR #3056 introduced a cluster heartbeat cronjob to the Linkerd
installation. This implies the user installing Linkerd requires the
privileges to create CronJobs.

Update `linkerd check` to validate the user has privileges necessary to
create CronJobs.

Fixes #3057

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-07-26 13:09:41 -07:00
Andrew Seigner 64ed8e4a74
Introduce Cluster Heartbeat cronjob (#3056)
`linkerd check`, the web dashboard, and Grafana all perform version
checks to validate Linkerd is up to date. It's common for users to
seldom execute these codepaths. This makes it difficult to identify what
versions of Linkerd are currently in use and what environments it is
being run in, which helps prioritize testing and backports.

Introduce a `heartbeat` CronJob to the default Linkerd install. The
cronjob executes every 24 hours, starting from 5 minutes after
`linkerd install` is run.

Example check URL:
https://versioncheck.linkerd.io/version.json?
  install-time=1562761177&
  k8s-version=v1.15.0&
  meshed-pods=8&
  rps=3&
  source=heartbeat&
  uuid=cc4bb700-3314-426a-9f0f-ec588b9df020&
  version=git-b97ee9f7

Fixes #2961

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-07-23 17:12:30 -07:00
Alex Leong d6ef9ea460
Update ServiceProfile CRD to version v1alpha2 and remove validation (#3078)
The openAPIV3Schema validation in the ServiceProfiles CRD is very limited in what it can validate and is obviated by more sophisticated validation done by the validating admission controller.  Therefore, we would like to remove the openAPIV3Schema validation to reduce the size and complexity of the CRD object.

To do so, we must also bump the version of the ServiceProfile custom resource from v1alpha1 to v1alpha2.  This ensures that when the controller is upgraded, it will attempt to watch the v1alpha2 resource.  If it cannot (because, for example, the controller pod started before the ServiceProfile CRD was updated and therefore the v1alpha2 version does not exist) then it will go into a crash loop backoff until it can.  This essentially means that the controller will wait for the CRD to be upgraded to include v1alpha2 before it will start.  

Bumping the version is necessary because if we did not, it would be possible for the controller to start before the CRD is updated (removing the validation).  In this case, when the CRD is edited, the controller will lose its list watch on ServiceProfiles and will stop getting updates.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-07-23 11:46:31 -07:00
Ivan Sim 36681218ba
Revert "Increase the retry duration in the post-upgrade 'check' integration test (#2944)" (#3091)
This reverts commit 60c58c1f85.

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-07-18 09:25:27 -07:00
Alejandro Pedraza 68f2f694e3
Improve object cleanup when integration tests fail (#3080)
Integration tests may fail and leave behind namespaces that following
builds aren't able to clean up because the git sha is being included in
the namespace name, and the following builds don't know about those
shas.

This modifies the `test-cleanup` script to delete based on object labels
instead of relying on the objects names, now that after 2.4 all the
control plane components are labeled. Note that this will also remove
non-testing linkerd namespaces, but we were already kinda doing that
partially because we were removing the cluster-level resources (CRDs,
webhook configs, clusterroles, clusterrolebindings, psp).

`test-cleanup` no longer receives a namespace name as an argument.

The data plane namespaces aren't labeled though, so I've added the
`linkerd.io/is-test-data-plane` label for them in
`CreateNamespaceIfNotExists()`, and making sure all tests that need a
data plaine explicitly call that method instead of creating the
namespace as a side-effect in `KubectlApply()`.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-07-12 15:01:10 -05:00
Jonathan Juares Beber 2dcbde08b3 Show pod status more clearly (#1967) (#2989)
During operations with `linkerd stat` sometimes it's not clear the actual
pod status.

This commit introduces a method, to the `k8s`package, getting the pod status,
based on [`kubectl` logic](33a3e325f7/pkg/printers/internalversion/printers.go (L558-L640))
to expose the `STATUS` column for pods . Also, it changes the stat command
on the` cli` package adding a column when the resource type is a Pod.

Fixes #1967

Signed-off-by: Jonathan Juares Beber <jonathanbeber@gmail.com>
2019-07-10 12:44:44 -07:00
Andrew Seigner 5d0746ff91
Add NET_RAW to `linkerd check --pre` (#3055)
`linkerd check --pre` validates that PSPs provide `NET_ADMIN`, but was
not validating `NET_RAW`, despite `NET_RAW` being required by Linkerd's
proxy-init container since #2969.

Introduce a `has NET_RAW capability` check to `linkerd check --pre`.

Fixes #3054

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-07-10 20:28:49 +02:00
Andrew Seigner 7c87fd4498
Make ReplicaSet check more explicit. (#3017)
The `linkerd check` for healthy ReplicaSets had a generic
`control plane components ready` description, and a hint anchor to
`l5d-existence-psp`. While a ReplicaSet failure could definitely occur
due to psp, that hintAnchor was already in use by the "control plane
PodSecurityPolicies exist" check.

Rename the `control plane components ready` check to
`control plane replica sets are ready`, and the hintAnchor from
`l5d-existence-psp` to `l5d-existence-replicasets`.

Relates to https://github.com/linkerd/website/issues/372.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-07-02 20:02:08 +02:00
Ivan Sim 866fe6fa5e
Introduce global resources checks to install and multi-stage install (#2987)
* Introduce new checks to determine existence of global resources and the
'linkerd-config' config map.
* Update pre-check to check for existence of global resources

This ensures that multiple control planes can't be installed into
different namespaces.

* Update integration test clean-up script to delete psp and crd

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-06-27 09:59:12 -07:00
Andrew Seigner 2528e3d62d
Make NET_ADMIN check a warning, add PSP check (#2958)
`linkerd check` validates whether PSP's exist, and if the caller has the
`NET_ADMIN` capability. This check was previously failing if `NET_ADMIN`
was not found, even in the case where the PSP admission controller was
not running. Related, `linkerd install` now includes a PSP, so
`linkerd check` should also validate that the caller can create PSP's.

Modify the `NET_ADMIN` check to warn, but not fail, if PSP's are found
but the caller does not have `NET_ADMIN`. Update the warning message to
mention that this is only a problem if the PSP admission controller is
running (and will only be a problem during injection, since #2920
handles control plane installation by adding its own PSP).

Also introduce a check to validate the caller can create PSP's.

Fixes #2884, #2849

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-06-20 17:58:26 +02:00
Ivan Sim e2e976cce9
Add `NET_RAW` capability to the proxy-init container (#2969)
Also, update control plane PSP to match linkerd/website#94

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-06-19 19:34:37 -07:00
Ivan Sim 60c58c1f85
Increase the retry duration in the post-upgrade 'check' integration test (#2944)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-06-14 12:23:18 -07:00
Alejandro Pedraza 7fc6c195ad
Set MWC and VWC failure policy to 'fail' in HA mode only (#2943)
Fixes #2927

Also moved `TestInstallSP` after `TestCheckPostInstall` so we're sure
the validating webhook is ready before installing a service profile.

Signed-off-by: Alejandro Pedraza Borrero <alejandro@buoyant.io>
2019-06-14 11:50:59 -05:00
Ivan Sim c4354ab1a4
Fix inject integration test failure (#2928)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-06-11 15:08:37 -07:00
Alex Leong 06a69f69c5
Refactor destination service (#2786)
This is a major refactor of the destination service.  The goals of this refactor are to simplify the code for improved maintainability.  In particular:

* Remove the "resolver" interfaces.  These were a holdover from when our decision tree was more complex about how to handle different kinds of authorities.  The current implementation only accepts fully qualified kubernetes service names and thus this was an unnecessary level of indirection.
* Moved the endpoints and profile watchers into their own package for a more clear separation of concerns.  These watchers deal only in Kubernetes primitives and are agnostic to how they are used.  This allows a cleaner layering when we use them from our gRPC service.
* Renamed the "listener" types to "translator" to make it more clear that the function of these structs is to translate kubernetes updates from the watcher to gRPC messages.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-06-04 15:01:16 -07:00
Alejandro Pedraza 74ca92ea25
Split proxy-init into separate repo (#2824)
Split proxy-init into separate repo

Fixes #2563

The new repo is https://github.com/linkerd/linkerd2-proxy-init, and I
tagged the latest there `v1.0.0`.

Here, I've removed the `/proxy-init` dir and pinned the injected
proxy-init version to `v1.0.0` in the injector code and tests.

`/cni-plugin` depends on proxy-init, so I updated the import paths
there, and could verify CNI is still working (there is some flakiness
but unrelated to this PR).

For consistency, I added a `--init-image-version` flag to `linkerd
inject` along with its corresponding override config annotation.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-06-03 16:24:05 -05:00
Alejandro Pedraza b384995397
Known warnings should hold just the message part, not the reason nor the involved object name (#2865)
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-05-29 17:13:31 -05:00
Dennis Adjei-Baah b9cc66c6c8
fixes roleRef name in linkerd-tap rbac (#2845)
* fixes roleRef name in linkerd-tap rbac
2019-05-28 10:05:01 -07:00
Andrew Seigner bd4c2788fa Modify upgrade integration test to upgrade stable (#2835)
In #2679 we introduced an upgrade integration test. At the time we only
supported upgrading from a recent edge. Since that PR, a stable build
was released supporting upgrade.

Modify the upgrade integration test to upgrade from the latest stable
rather than latest edge. This fulfills the original intent of #2669.

Also add some known k8s event warnings.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-05-20 13:36:07 -07:00
Dennis Adjei-Baah a0fa1dff59
Move tap service into its own pod. (#2773)
* Split tap into its own pod in the control plane

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2019-05-15 16:28:44 -05:00
mg b965d0d30e Introduce pre-install healthcheck for clock skew (#2803)
* Adding pre-install check for clock skew
* Fixing lint error - time.Since
* Update test data for clock skew check
* Incorporating code review comments
* Additional fix - clock skew test

Signed-off-by: Matej Gera <matejgera@gmail.com>
2019-05-13 10:14:38 -07:00
Alejandro Pedraza 065c221858
Support for resources opting-out of tap (#2807)
Support for resources opting out of tap

Implements the `linkerd inject --disable-tap` flag (although hidden pending #2811) and the config override annotation `config.linkerd.io/disable-tap`.
Fixes #2778

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-05-10 14:17:23 -05:00
Alejandro Pedraza f4ab9d6f9e
Integration test for k8s events generated during install (#2805)
Integration test for k8s events generated during install

Fixes #2713

I did make sure a scenario like the one described in #2964 is caught.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-05-10 09:09:54 -05:00
Alejandro Pedraza 2ae0daca9f
Add `linkerd inject --manual` test into install_test.go (#2791)
Add `linkerd inject --manual` test into install_test.go

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-05-08 11:13:30 -05:00
Ivan Sim 00a94be073
Change the service profile assertion conditions (#2785)
With the server configured to response with a failure of 50%, the test first
checks to ensure the actual success rate is less than 100%. Then the
service profile is edited to perform retries. The test then checks to
ensure the effective success rate is at least 95%.

This is (hopefully) more reliable than changing the test to perform waits and
retries until there is a difference between effective success rate and actual
success rate and compare them.

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-05-03 08:37:48 -07:00
Andrew Seigner 66494591e0
Multi-stage check support (#2765)
Add support for `linkerd check config`. Validates the existence of the
Linkerd Namespace, ClusterRoles, ClusterRoleBindings, ServiceAccounts,
and CustomResourceDefitions.

Part of #2337

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-30 17:17:59 +01:00
Andrew Seigner 85a7c885d4
Add known proxy-injector log warning to tests (#2760)
PR #2737 introduced a warning in the proxy-injector when owner ref
lookups failed due to not having up-to-date ReplicaSet information. That
warning may occur during integration tests, causing a failure.

Add the warning as a known controller log message. The warning will be
printed as a skipped test, allowing the integration tests to pass.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-30 13:38:59 +01:00
Ivan Sim 714035fee9
Define default resource spec for proxy-init init container (#2763)
Fixes #2750 

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-04-29 11:41:05 -07:00
Andrew Seigner 15ffd86cf1
Introduce multi-stage upgrade (#2723)
`linkerd install` supports a 2-stage install process, `linkerd upgrade`
did not.

Add 2-stage support for `linkerd upgrade`. Also exercise multi-stage
functionality during upgrade integration tests.

Part of #2337

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-25 14:29:52 -07:00
Alejandro Pedraza 53bb7c47f6
Make the auto-injector required and removed proxy-auto-inject flag (#2733)
Make the auto-injector required and removed proxy-auto-inject flag

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-04-24 13:06:51 -05:00
Alejandro Pedraza 62d9a80894
New `linkerd inject` default and manual modes (#2721)
Fixes #2720 and 2711 

This changes the default behavior of `linkerd inject` to not inject the
proxy but just the `linkerd.io/inject: enabled` annotation for the
auto-injector to pick it up (regardless of any namespace annotation).

A new `--manual` mode was added, which behaves as before, injecting
the proxy in the command output.

The unit tests are running with `--manual` to avoid any changes in the
fixtures.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-04-24 09:05:27 -05:00
Alejandro Pedraza c56766a923
Add config.linkerd.io/disable-identity annotation (#2717)
Add config.linkerd.io/disable-identity annotation

First part of #2540

We'll tackle support for `--disable-identity` in `linkerd install` in a
separate commit.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-04-19 19:04:49 -04:00
Andrew Seigner ebbb55cf6b
Report known logging errors in test output (#2724)
The integration tests check container logs for errors. When an error is
encountered that matches a list of expected errors, it was hidden, and
the test passed.

Modify the integration tests to report known errors in logs via
`t.Skipf`.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-19 14:22:35 -07:00
Gaurav Kumar 392cab80fa Add check for unschedulable pods and psp issues (#2510)
Fixes #2465 

* Add check for unschedulable pods and psp issues (#2465)
* Return error reason and message on pod or node failure

Signed-off-by: Gaurav Kumar <gaurav.kumar9825@gmail.com>
2019-04-19 11:42:22 -07:00
Ivan Sim 8d13084f94
Split the `linkerd-version` CLI flag into `control-plane-version` and `proxy-version` (#2702)
* The 'linkerd-version' CLI flag is renamed to 'control-plane-version'
* Add version field to proxy config
* Add the control plane version to the global config
* Unit test for init image version
* Use more specific control plane and proxy versions in unit tests

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-04-19 11:35:20 -07:00
Dennis Adjei-Baah be614656bb
add service profile integration tests for service profile metrics (#2685)
* add integration tests for retryable requests

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2019-04-18 11:01:49 -07:00
Andrew Seigner 8323e104fb
Introduce upgrade --from-manifests flag (#2697)
The `linkerd upgrade` command read the control-plane's config from
Kubernetes, which required the environment to be configured to connect
to the appropriate k8s cluster.

Intrdouce a `linkerd upgrade --from-manifests` flag, allowing the user
to feed the output of `linkerd install` into the upgrade command.

Fixes #2629

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-17 13:32:21 -07:00
Ivan Sim 1c0f147718
Integration test for the 'upgrade' command (#2679)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-04-11 19:37:50 -07:00
Ivan Sim 4e19827457
Allow identity to be disabled during inject on existing cluster (#2686)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-04-11 13:37:06 -07:00
Andrew Seigner 46d1490f98
Skip proxy logging errors in integration tests (#2647)
The list of known proxy log errors has been growing, and causing regular
ci failures.

Skip proxy logging errors. The tests will continue to run and report
unexpected errors, but this will not fail the tests (and ci). Also break
out the controller log errors separately, and continue to fail on those.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-10 16:43:56 -07:00
Dennis Adjei-Baah c166b1db55
Service profiles integration tests (#2638)
adds integration tests for service profiles that test profile generation through the tap and Open API flags.

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2019-04-09 10:16:26 -07:00
Alejandro Pedraza edb225069c
Add validation webhook for service profiles (#2623)
Add validation webhook for service profiles

Fixes #2075

Todo in a follow-up PRs: remove the SP check from the CLI check.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-04-05 16:10:47 -05:00
Andrew Seigner 2f80add17a
Introduce inject integration tests (#2616)
This change introduces integration tests for `linkerd inject`. The tests
perform CLI injection, with and without params, and validates the
output, including annotations.

Also add some known errors in logs to `install_test.go`.

TODO:
- deploy uninjected and injected resources to a default and
  auto-injected cluster
- test creation and update

Part of #2459

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-05 11:42:49 -07:00
harsh jain 976bc40345 Fixes #2607: Remove TLS from stat (#2613)
Removes the TLS percentages from the stat command in the CLI.
2019-04-04 10:37:42 -07:00
Andrew Seigner a3bba0e143
Fix logging regex to handle any addr (#2619)
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-02 12:47:38 -07:00
Andrew Seigner a21c7be9f5
Fix ci logging errors (#2617)
Introduce some additional known logging errors to the integration tests

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-02 11:24:30 -07:00
Andrew Seigner b454f8fbc1
Introduce auto inject integration tests (#2595)
The integration tests were not exercising proxy auto inject.

Introduce a `--proxy-auto-inject` flag to `install_test.go`, which
now exercises install, check, and smoke test deploy for both manual and
auto injected use cases.

Part of #2569

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-01 10:32:56 -07:00
Andrew Seigner 9eab0e28a6
Introduce ServiceProfile integration tests (#2588)
The existing integration tests were not validating ServiceProfile
functionality.

Introduce ServiceProfile integration tests that:
- install control-plane ServiceProfiles via `linkerd install-sp`
- install smoke-test ServiceProfiles via `linkerd profile --proto`
- validate well-formed ServiceProfiles via `linkerd check`
- validate `linkerd routes` returns expected output

Fixes #2520

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-03-29 10:52:54 -07:00
Alejandro Pedraza e711ab7b15
Update tap integration test to account for always-on mTLS (#2589)
Out of all the integration tests (egress, get, stat, tap and
install_test) only in stat and tap do meshed (proxy-to-proxy) connections take
place, which we can test are 100% TLS.

For stat, #2537 already added such check for connections with the
Prometheus pod (connections to other pods are not meshed, apparently).

This commit adds such check for tap.

Fixes #2519
2019-03-29 12:28:47 -05:00
Risha Mars eda36e3258
Always show TCP open connections in the CLI (#2533)
Allow the TCP CONNECTIONS column to be shown on all stat queries in the CLI.
This column will now be called TCP_CONN for brevity.
Read/Write bytes will still only be shown on -o wide or -o json
2019-03-27 13:34:28 -07:00
Oliver Gould da0330743f
Provide peer Identities via the Destination API (#2537)
This change reintroduces identity hinting to the destination service.
The Get endpoint includes identities for pods that are injected with an
identity-mode of "default" and have the same linkerd control plane.

A `serviceaccount` label is now also added to destination response
metadata so that it's accessible in prometheus and tap.
2019-03-22 09:19:14 -07:00
Oliver Gould 34ea302a32
inject: Configure proxies to enable Identity (#2536)
This change adds a new `linkerd2-proxy-identity` binary to the `proxy`
container image as well as a `linkerd2-proxy-run` entrypoint script.

The inject process now sets environment variables on pods to support
identity, including identity names for the destination and identity
services.

As the proxy starts, the identity helper creates a key and CSR in a
tmpfs. As the proxy starts, it reads these files, as well as a
serviceaccount token, and provisions a certificate from controller.
The proxy's /ready endpoint will not succeed until a certificate has
been provisioned.

The proxy will not participate in identity with services other than the
controllers until the Destination controller is modified to provide
identities via discovery.
2019-03-21 18:39:05 -07:00
Kevin Lingerfelt 0d4eb02835
Add identity pod to check, web, and integration tests (#2529)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2019-03-19 20:49:31 -07:00
Oliver Gould 91c5f07650
proxy: Upgrade to identity-capable proxy (#2524)
The new proxy has changed its configuration as follows:

- `LISTENER` urls are now `LISTEN_ADDR` addresses;
- `CONTROL_URL` is now `DESTINATION_SVC_ADDR`;
- `*_NAMESPACE` vars are no longer needed;
- The `PROXY_ID` is now the `DESTINATION_CONTEXT`;
- The "metrics" port is now the "admin" port, since it serves more than
  just metrics;
- A readiness probe now checks a dedicated /ready endpoint eagerly.

Identity injection is **NOT** configured by this branch.
2019-03-19 14:20:39 -07:00
Oliver Gould 81f645da66
Remove `--tls=optional` and `linkerd-ca` (#2515)
The proxy's TLS implementation has changed to use a new _Identity_ controller.

In preparation for this, the `--tls=optional` CLI flag has been removed
from install and inject; and the `ca` controller has been deleted. Metrics
and UI treatments for TLS have **not** been removed, as they will continue to
be valuable for the new Identity system.

With the removal of the old identity scheme, the Destination service's proxy
ID field is now set with an opaque string (e.g. `ns:emojivoto`) to enable
locality awareness.
2019-03-18 17:40:31 -07:00
Andrew Seigner e5d2460792
Remove single namespace functionality (#2474)
linkerd/linkerd2#1721 introduced a `--single-namespace` install flag,
enabling the control-plane to function within a single namespace. With
the introduction of ServiceProfiles, and upcoming identity changes, this
single namespace mode of operation is becoming less viable.

This change removes the `--single-namespace` install flag, and all
underlying support. The control-plane must have cluster-wide access to
operate.

A few related changes:
- Remove `--single-namespace` from `linkerd check`, this motivates
  combining some check categories, as we can always assume cluster-wide
  requirements.
- Simplify the `k8s.ResourceAuthz` API, as callers no longer need to
  make a decision based on cluster-wide vs. namespace-wide access.
  Components either have access, or they error out.
- Modify the web dashboard to always assume ServiceProfiles are enabled.

Reverts #1721
Part of #2337

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-03-12 00:17:22 -07:00
Andrew Seigner 38288b0688
README and test updates (#2467)
Add another known log error to the integration tests.
Also bump README copyright to 2019

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-03-07 14:36:41 -08:00
Andrew Seigner a3d84eae7f
Add more known log errors to integration tests (#2457)
Relates to #2414, #2452

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-03-06 12:58:00 -08:00
Andrew Seigner 756a1312fd
Add more known log errors to integration tests (#2452)
linkerd/linkerd2#2414 introduced integration tests to ensure logs did
not contain unexpected errors. Additional errors are not being caught,
causing ci to fail.

This change adds more known log errors to the log regex.

Also temporarily enable integration tests in ci for this PR.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-03-06 10:16:48 -08:00
Andrew Seigner d850b02b66
Introduce logging and restart integration tests (#2414)
The integration tests deploy complete Linkerd environments into
Kubernetes, but do not check if the components are logging errors or
restarting.

Introduce integration tests to validation that all expected
control-plane containers (including `linkerd-proxy` and `linkerd-init`)
are found, logging no errors, and not restarting.

Fixes #2348

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-03-05 19:49:38 -08:00
Kevin Lingerfelt 0dcd69c465
Re-add pre-install permission checks (#2451)
* Re-add pre-install permission checks
* Fix ordering in check.go

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2019-03-05 19:17:21 -08:00
Andrew Seigner d90fa16727
Introduce NET_ADMIN cli check (#2421)
The `linkerd-init` container requires the NET_ADMIN capability to modify
iptables. The `linkerd check` command was not verifying this.

Introduce a `has NET_ADMIN capability` check, which does the following:
1) Lists all available PodSecurityPolicies, if none found, returns
success
2) For each PodSecurityPolicy, validate one exists that:
    - the user has `use` access AND
    - provides `*` or `NET_ADMIN` capability

A couple limitations to this approach:
- It is testing whether the user running `linkerd check` has NET_ADMIN,
  but during installation time it will be the `linkerd-init` pod that
  requires NET_ADMIN.
- It assumes the presense of PodSecurityPolicies in the cluster means
  the PodSecurityPolicy admission controller is installed. If the
  admission controller is not installed, but PSPs exists that restrict
  NET_ADMIN, `linkerd check` will incorrectly report the user does not
  have that capability.

This PR also fixes the `can create CustomResourceDefinitions` check to
not specify a namespace when doing a `create` check, as CRDs are
cluster-wide.

Fixes #1732

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-03-05 17:19:11 -08:00
Yan 4cd1f99e89 Check kubectl version as part of checks (#2358)
Fixes #2354

Signed-off-by: Yan Babitski <yan.babitski@gmail.com>
2019-03-01 10:03:59 -08:00
Andrew Seigner 10d9b7e493
Revert integration test check wait (#2400)
linkerd/linkerd2#2360 modified the `linkerd check --wait` param from `0`
to `1m`. Waiting on a check command causes spinner control characters in
the output, making output validation non-trivial.

Instead, revert the wait param back to `0`, and use
`TestHelper.RetryFor`.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-02-26 16:37:29 -08:00
Oliver Gould f7435800da
lint: Enable scopelint (#2364)
[scopelint][scopelint] detects a nasty reference-scoping issue in loops.

[scopelint]: https://github.com/kyoh86/scopelint
2019-02-24 08:59:51 -08:00
Andrew Seigner e300309af5
Increase integration test timeouts (#2360)
The integration tests occasionally timeout in ci when talking to
Kubernetes and Linkerd:
https://travis-ci.org/linkerd/linkerd2/jobs/497300669#L972
https://travis-ci.org/linkerd/linkerd2/jobs/497329339#L7284

Increase `linkerd check --wait` from `0` to `30s`.
Increase `HTTPGetURL` timeout from 30s to 1 minute.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-02-23 13:54:02 -08:00