* Fix event creation when auto-injecting stand-alone pods, and integration test failure reporting
When an Event is created upon auto-injection (#3316), we try to fetch the
parent object so we can associate the Event with it. If the parent doesn't
exist (as with stand-alone pods), the Event isn't created. I had missed
handling one code path where that parent was expected.
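For illustration only, here is a minimal Go sketch of the guard this implies. The names (`recordInjectionEvent`, the `parent`/`podName` parameters) are hypothetical and not the injector's actual code; the point is that when there is no parent workload, recording is skipped rather than attempted against a missing object.
```go
package injector

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"

	log "github.com/sirupsen/logrus"
)

// recordInjectionEvent attaches an "Injected" Event to the pod's parent
// workload, but skips event creation entirely when no parent exists, as is
// the case for stand-alone pods.
func recordInjectionEvent(recorder record.EventRecorder, parent runtime.Object, podName string) {
	if parent == nil {
		// Stand-alone pod: there is no owner (Deployment, ReplicaSet, ...)
		// to associate the Event with, so simply skip recording.
		log.Infof("no parent object for pod %s; skipping event creation", podName)
		return
	}
	recorder.Event(parent, corev1.EventTypeNormal, "Injected", "Linkerd sidecar proxy injected")
}
```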
This also adds a new integration test that I verified fails before this
fix.
Finally, I removed from `_test-run.sh` some `|| exit_code=$?` occurrences
that were preventing the whole suite from reporting failure whenever one of
the tests in `/tests` failed.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
The `linkerd upgrade` integration test compares the output from two
commands:
- `linkerd upgrade control-plane`
- `linkerd upgrade control-plane --from-manifests`
The output of these commands includes the heartbeat cronjob schedule, which
is generated from the current time; if the two commands run in different
minutes, the rendered schedules differ and the comparison fails.
Modify the upgrade integration test to retry the manifest comparison once,
on the assumption that `linkerd upgrade control-plane` should not take more
than one minute to execute.
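As a rough sketch of the retry idea (the helper name and `render` callback are hypothetical, not the test's actual code): run both renders, and if the outputs differ, re-run the whole comparison exactly once before declaring failure.
```go
package test

// manifestsMatchWithRetry renders the two manifests and compares them,
// retrying the entire comparison once on mismatch, assuming the only
// legitimate difference is the minute-granular heartbeat schedule.
func manifestsMatchWithRetry(render func() (fromCluster, fromManifests string, err error)) (bool, error) {
	for attempt := 0; attempt < 2; attempt++ {
		fromCluster, fromManifests, err := render()
		if err != nil {
			return false, err
		}
		if fromCluster == fromManifests {
			return true, nil
		}
	}
	return false, nil
}
```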
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Stop ignoring client-go log entries
Pipe klog output into logrus. Without this, client-go log entries are never
surfaced, for a reason I don't yet understand.
To enable, `--controller-log-level` must be set to `debug`.
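A minimal sketch of this kind of wiring follows, assuming the `k8s.io/klog` package used by client-go at the time; the `initKlog` helper name is hypothetical and the exact flag handling in the controller may differ.
```go
package flags

import (
	"flag"

	log "github.com/sirupsen/logrus"
	"k8s.io/klog"
)

// initKlog routes client-go's klog output through logrus so its entries show
// up in the controller logs, mirroring `--controller-log-level debug`.
func initKlog(level log.Level) {
	if level < log.DebugLevel {
		return
	}
	klogFlags := flag.NewFlagSet("klog", flag.ExitOnError)
	klog.InitFlags(klogFlags)
	// klog writes to stderr by default; disable that so SetOutput takes effect.
	klogFlags.Set("logtostderr", "false")
	klog.SetOutput(log.StandardLogger().Writer())
}
```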
This was discovered while trying to debug sending events for #3253.
I added an integration test that fails when this piping is not in place.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
Follow-up to #3194
The namespace name was too long for l5d-bot:
```
inject_test.go:117: failed to create
l5d-integration-auto-git-9688d9ba-inject-namespace-override-test
namespace: Namespace
"l5d-integration-auto-git-9688d9ba-inject-namespace-override-test"
is invalid: metadata.name: Invalid value:
"l5d-integration-auto-git-9688d9ba-inject-namespace-override-test":
must be no more than 63 characters
```
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Check for namespace-level config override annotations (see the sketch below)
* Add unit tests for namespace-level config overrides
* Add integration test for namespace-level config override
* Use a different namespace for override tests
* Check resource requests in integration tests
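A small sketch of the precedence this implies (the `getOverride` helper is hypothetical, and the assumption that a workload-level `config.linkerd.io/*` annotation wins over the namespace-level one is mine, not stated above):
```go
package inject

// getOverride returns the value of a config override annotation, preferring
// the workload's own annotations and falling back to the namespace's.
// e.g. getOverride("config.linkerd.io/proxy-log-level", podAnnotations, nsAnnotations)
func getOverride(key string, workloadAnnotations, nsAnnotations map[string]string) (string, bool) {
	if v, ok := workloadAnnotations[key]; ok {
		return v, true
	}
	if v, ok := nsAnnotations[key]; ok {
		return v, true
	}
	return "", false
}
```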
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Refactor proxy injection to use Helm charts
Fixes #3128
A new chart, `/charts/patch`, was created that generates the JSON patch
payload returned to the k8s API when injection is performed by the proxy
injector; it's also leveraged by the `linkerd inject --manual` CLI.
`linkerd install` used the VFS to access the old chart under `/chart`. Now
that proxy injection also uses the Helm charts to generate the JSON patch
(see above), we've moved the VFS from `cli/static` to a new common location
under `/pkg/charts/static`, and the VFS root is now `/charts`.
`linkerd install` hasn't yet migrated to use the new charts (that'll
happen in #3127), so the only change in that regard was the creation of
`/charts/chart` which is a symlink pointing to `/chart` that
`install.go` now uses, so that the VFS contains both the old and new
charts, as a temporary measure.
You can see that `/bin/Dockerfile-bin`, `/controller/Dockerfile` and
`/bin/build-cli-bin` now run `go generate` against the new location
(and the `go generate` annotation was moved from `/cli/main.go` to
`pkg/charts/static/templates.go`).
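For context, a sketch of the kind of generator `go generate` would invoke, assuming vfsgen-style embedding (the generator file name and output path here are assumptions, not the repo's exact layout): it embeds `/charts` into a Go file compiled only under the `prod` build tag, so binaries built with `-tags prod` read charts from the VFS instead of the local disk.
```go
// +build ignore

package main

import (
	"log"
	"net/http"

	"github.com/shurcooL/vfsgen"
)

func main() {
	// Embed the charts directory into a generated Go source file that is
	// only compiled when the `prod` build tag is set.
	err := vfsgen.Generate(http.Dir("charts"), vfsgen.Options{
		Filename:     "pkg/charts/static/generated_templates.gogen.go", // hypothetical output path
		PackageName:  "static",
		BuildTags:    "prod",
		VariableName: "Templates",
	})
	if err != nil {
		log.Fatalln(err)
	}
}
```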
The symlink trick doesn't work when building the binaries through
Docker, so `/bin/Dockerfile-bin` replaces the symlink with an actual
copy of `/chart`.
Also note that in `/controller/Dockerfile` we now need to include the
`prod` tag in `go install`, as we do in `/bin/Dockerfile-bin`, so that
the proxy injector uses the VFS instead of the local file system.
- The common logic to parse a chart has been moved from `install.go` to
`/pkg/charts/util.go`.
- The special ENV var in the proxy for "outbound router capacity" that
only applies to the Prometheus pod is now handled directly in the proxy
partial and all the associated go code could be removed.
- The `patch.go` lib for generating the JSON patch in go along
with its tests `patch_test.go` are no longer needed.
- Lots of functions in `/pkg/inject/inject.go` got removed/simplified
with their logic being moved into the charts themselves. As a
consequence lots of things in `inject_test.go` became irrelevant.
- Moved `template-values.go` from `/pkg/inject` to `pkg/charts` as that
contains the go structs representation of the chart variables that
will be leveraged in #3127.
Don't forget to run `/bin/helm.sh` whenever you make changes to charts
;-)
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
The Tap Service enabled tapping of any meshed pod, regardless of user
privilege.
This change introduces a new Tap APIService. Kubernetes provides
authentication and authorization of Tap requests, and then forwards
requests to a new Tap APIServer, which implements a Kubernetes
aggregated APIServer. The Tap APIServer authenticates the client TLS
connection from Kubernetes and authorizes the user via a SubjectAccessReview.
This change also modifies the `linkerd tap` command to make requests
against the new APIService.
The Tap APIService implements these Kubernetes-style endpoints:
```
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/tap
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/:res/:name/tap
GET  /apis
GET  /apis/tap.linkerd.io
GET  /apis/tap.linkerd.io/v1alpha1
GET  /healthz
GET  /healthz/log
GET  /healthz/ping
GET  /metrics
GET  /openapi/v2
GET  /version
```
Users are authorized against the new `tap.linkerd.io/v1alpha1` API group via
RBAC. Only the `watch` verb is supported. Access is also available via
subresources such as `deployments/tap` and `pods/tap`.
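A sketch of the kind of SubjectAccessReview such an authorization implies (hypothetical helper, written against a recent client-go API shape, not the APIServer's actual code): can this user `watch` the `tap` subresource of the requested resource in the requested namespace?
```go
package tap

import (
	"context"
	"fmt"

	authv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// authorizeTap asks the Kubernetes API whether `user` may watch the `tap`
// subresource of the given resource (e.g. "deployments" or "pods").
func authorizeTap(ctx context.Context, k8sAPI kubernetes.Interface, user, namespace, resource, name string) error {
	sar := &authv1.SubjectAccessReview{
		Spec: authv1.SubjectAccessReviewSpec{
			User: user,
			ResourceAttributes: &authv1.ResourceAttributes{
				Group:       "tap.linkerd.io",
				Verb:        "watch",
				Namespace:   namespace,
				Resource:    resource,
				Subresource: "tap",
				Name:        name,
			},
		},
	}
	rsp, err := k8sAPI.AuthorizationV1().SubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
	if err != nil {
		return err
	}
	if !rsp.Status.Allowed {
		return fmt.Errorf("tap not authorized: %s", rsp.Status.Reason)
	}
	return nil
}
```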
This change introduces the following resources into the default Linkerd
install:
- Global:
  - APIService/v1alpha1.tap.linkerd.io
  - ClusterRoleBinding/linkerd-linkerd-tap-auth-delegator
- `linkerd` namespace:
  - Secret/linkerd-tap-tls
- `kube-system` namespace:
  - RoleBinding/linkerd-linkerd-tap-auth-reader
Tasks not covered by this PR:
- `linkerd top`
- `linkerd dashboard`
- `linkerd profile --tap`
- removal of the unauthenticated tap controller
Fixes #2725, #3162, #3172
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes https://github.com/linkerd/linkerd2/issues/2800#issuecomment-513740498
When the Linkerd proxy sends a query for a Kubernetes ExternalName service to the destination service, the destination service returns `NoEndpoints: exists=false` because an ExternalName service has no Endpoints resource. Due to a change in the proxy's fallback logic, this no longer causes the proxy to fall back to either DNS or SO_ORIG_DST; instead, the request fails. The net effect is that Linkerd fails all requests to ExternalName services.
We change the destination service to instead return `InvalidArgument` for ExternalName services. This causes the proxy to fall back to SO_ORIG_DST instead of failing the request.
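A minimal sketch of the behavioral change (assumed helper shape, not the destination service's actual code): detect the ExternalName service type and return a gRPC `InvalidArgument` status rather than a `NoEndpoints{exists: false}` response.
```go
package destination

import (
	corev1 "k8s.io/api/core/v1"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// checkServiceType returns InvalidArgument for ExternalName services so the
// proxy falls back to SO_ORIG_DST instead of failing the request.
func checkServiceType(svc *corev1.Service) error {
	if svc.Spec.Type == corev1.ServiceTypeExternalName {
		return status.Errorf(codes.InvalidArgument,
			"ExternalName services are not supported by the destination service: %s", svc.Name)
	}
	return nil
}
```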
Signed-off-by: Alex Leong <alex@buoyant.io>
PR #3056 introduced a cluster heartbeat cronjob to the Linkerd
installation. This implies the user installing Linkerd requires the
privileges to create CronJobs.
Update `linkerd check` to validate the user has privileges necessary to
create CronJobs.
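A sketch of such a privilege check (hypothetical helper, written against a recent client-go API shape; CronJobs were `batch/v1beta1` at the time): ask the API server whether the current user may create CronJobs in the install namespace.
```go
package healthcheck

import (
	"context"
	"fmt"

	authv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// canCreateCronJobs performs a SelfSubjectAccessReview for creating CronJobs.
func canCreateCronJobs(ctx context.Context, k8sAPI kubernetes.Interface, namespace string) error {
	ssar := &authv1.SelfSubjectAccessReview{
		Spec: authv1.SelfSubjectAccessReviewSpec{
			ResourceAttributes: &authv1.ResourceAttributes{
				Namespace: namespace,
				Verb:      "create",
				Group:     "batch",
				Version:   "v1beta1",
				Resource:  "cronjobs",
			},
		},
	}
	rsp, err := k8sAPI.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, ssar, metav1.CreateOptions{})
	if err != nil {
		return err
	}
	if !rsp.Status.Allowed {
		return fmt.Errorf("missing permission to create CronJobs: %s", rsp.Status.Reason)
	}
	return nil
}
```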
Fixes #3057
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
`linkerd check`, the web dashboard, and Grafana all perform version
checks to validate Linkerd is up to date. Users often execute these
codepaths only rarely, which makes it difficult to identify which versions
of Linkerd are currently in use and in what environments they run,
information that helps prioritize testing and backports.
Introduce a `heartbeat` CronJob to the default Linkerd install. The
cronjob executes every 24 hours, starting from 5 minutes after
`linkerd install` is run.
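A sketch of how such a schedule can be derived at install time; the helper name and exact format string are assumptions rather than the real template's code:
```go
package install

import (
	"fmt"
	"time"
)

// heartbeatSchedule builds a daily cron schedule that first fires roughly
// five minutes after install time (minute-and-hour granularity).
func heartbeatSchedule(now time.Time) string {
	t := now.Add(5 * time.Minute)
	return fmt.Sprintf("%d %d * * *", t.Minute(), t.Hour())
}
```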
Example check URL:
```
https://versioncheck.linkerd.io/version.json?
install-time=1562761177&
k8s-version=v1.15.0&
meshed-pods=8&
rps=3&
source=heartbeat&
uuid=cc4bb700-3314-426a-9f0f-ec588b9df020&
version=git-b97ee9f7
```
Fixes #2961
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The openAPIV3Schema validation in the ServiceProfiles CRD is very limited in what it can validate and is obviated by more sophisticated validation done by the validating admission controller. Therefore, we would like to remove the openAPIV3Schema validation to reduce the size and complexity of the CRD object.
To do so, we must also bump the version of the ServiceProfile custom resource from v1alpha1 to v1alpha2. This ensures that when the controller is upgraded, it will attempt to watch the v1alpha2 resource. If it cannot (because, for example, the controller pod started before the ServiceProfile CRD was updated and therefore the v1alpha2 version does not exist) then it will go into a crash loop backoff until it can. This essentially means that the controller will wait for the CRD to be upgraded to include v1alpha2 before it will start.
Bumping the version is necessary because if we did not, it would be possible for the controller to start before the CRD is updated (removing the validation). In this case, when the CRD is edited, the controller will lose its list watch on ServiceProfiles and will stop getting updates.
Signed-off-by: Alex Leong <alex@buoyant.io>
Integration tests may fail and leave behind namespaces that subsequent
builds can't clean up, because the git SHA is included in the namespace name
and subsequent builds don't know about those SHAs.
This modifies the `test-cleanup` script to delete based on object labels
instead of relying on object names, now that (as of 2.4) all the control
plane components are labeled. Note that this will also remove non-testing
Linkerd namespaces, but we were already partially doing that by removing the
cluster-level resources (CRDs, webhook configs, clusterroles,
clusterrolebindings, psp).
`test-cleanup` no longer receives a namespace name as an argument.
The data plane namespaces aren't labeled though, so I've added the
`linkerd.io/is-test-data-plane` label to them in
`CreateNamespaceIfNotExists()`, and made sure all tests that need a data
plane explicitly call that method instead of creating the namespace as a
side effect of `KubectlApply()`.
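A sketch of the labeling described above (assumed helper shape, written against a recent client-go API, not the test framework's actual code); with the label in place, cleanup can delete test data-plane namespaces by label selector rather than by name.
```go
package testutil

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	kerrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createNamespaceIfNotExists creates a namespace labeled so that the
// test-cleanup script can find and delete it later.
func createNamespaceIfNotExists(ctx context.Context, k8sAPI kubernetes.Interface, name string) error {
	ns := &corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{
			Name:   name,
			Labels: map[string]string{"linkerd.io/is-test-data-plane": "true"},
		},
	}
	_, err := k8sAPI.CoreV1().Namespaces().Create(ctx, ns, metav1.CreateOptions{})
	if kerrors.IsAlreadyExists(err) {
		return nil
	}
	return err
}
```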
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
During operations with `linkerd stat` the actual pod status is sometimes
unclear.
This commit introduces a method in the `k8s` package that derives the pod
status, based on [`kubectl`'s logic](33a3e325f7/pkg/printers/internalversion/printers.go (L558-L640)),
and uses it to expose a `STATUS` column for pods. It also changes the stat
command in the `cli` package to add that column when the resource type is a
Pod.
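A condensed sketch of that kubectl-style derivation (the `podStatus` name is illustrative; the real logic handles more edge cases such as deletion and init containers):
```go
package k8s

import corev1 "k8s.io/api/core/v1"

// podStatus prefers pod.Status.Reason, then surfaces the reason of any
// waiting or terminated container, falling back to the pod phase.
func podStatus(pod corev1.Pod) string {
	status := string(pod.Status.Phase)
	if pod.Status.Reason != "" {
		status = pod.Status.Reason
	}
	for _, cs := range pod.Status.ContainerStatuses {
		switch {
		case cs.State.Waiting != nil && cs.State.Waiting.Reason != "":
			status = cs.State.Waiting.Reason
		case cs.State.Terminated != nil && cs.State.Terminated.Reason != "":
			status = cs.State.Terminated.Reason
		}
	}
	return status
}
```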
Fixes #1967
Signed-off-by: Jonathan Juares Beber <jonathanbeber@gmail.com>
`linkerd check --pre` validates that PSPs provide `NET_ADMIN`, but was
not validating `NET_RAW`, despite `NET_RAW` being required by Linkerd's
proxy-init container since #2969.
Introduce a `has NET_RAW capability` check to `linkerd check --pre`.
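A simplified sketch of what such a capability check could look like (hypothetical helper; the real check also handles `NET_ADMIN` and whether the PSP admission controller is running at all):
```go
package healthcheck

import (
	corev1 "k8s.io/api/core/v1"
	policyv1beta1 "k8s.io/api/policy/v1beta1"
)

// hasCapability reports whether at least one PodSecurityPolicy allows the
// given capability, e.g. hasCapability(psps, "NET_RAW").
func hasCapability(psps []policyv1beta1.PodSecurityPolicy, capability corev1.Capability) bool {
	for _, psp := range psps {
		for _, c := range psp.Spec.AllowedCapabilities {
			if c == capability || c == "*" {
				return true
			}
		}
	}
	return false
}
```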
Fixes #3054
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The `linkerd check` for healthy ReplicaSets had a generic
`control plane components ready` description, and a hint anchor to
`l5d-existence-psp`. While a ReplicaSet failure could definitely occur
due to PSPs, that hintAnchor was already in use by the "control plane
PodSecurityPolicies exist" check.
Rename the `control plane components ready` check to
`control plane replica sets are ready`, and change the hintAnchor from
`l5d-existence-psp` to `l5d-existence-replicasets`.
Relates to https://github.com/linkerd/website/issues/372.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Introduce new checks to determine existence of global resources and the
'linkerd-config' config map.
* Update pre-check to check for existence of global resources
This ensures that multiple control planes can't be installed into
different namespaces.
* Update integration test clean-up script to delete psp and crd
Signed-off-by: Ivan Sim <ivan@buoyant.io>
`linkerd check` validates whether PSPs exist, and whether the caller has the
`NET_ADMIN` capability. This check was previously failing if `NET_ADMIN`
was not found, even in the case where the PSP admission controller was
not running. Relatedly, `linkerd install` now includes a PSP, so
`linkerd check` should also validate that the caller can create PSPs.
Modify the `NET_ADMIN` check to warn, but not fail, if PSPs are found
but the caller does not have `NET_ADMIN`. Update the warning message to
mention that this is only a problem if the PSP admission controller is
running (and will only be a problem during injection, since #2920
handles control plane installation by adding its own PSP).
Also introduce a check to validate that the caller can create PSPs.
Fixes #2884, #2849
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes #2927
Also moved `TestInstallSP` after `TestCheckPostInstall` so we're sure
the validating webhook is ready before installing a service profile.
Signed-off-by: Alejandro Pedraza Borrero <alejandro@buoyant.io>
This is a major refactor of the destination service. The goal of this refactor is to simplify the code for improved maintainability. In particular:
* Remove the "resolver" interfaces. These were a holdover from when our decision tree was more complex about how to handle different kinds of authorities. The current implementation only accepts fully qualified Kubernetes service names, so this was an unnecessary level of indirection.
* Moved the endpoints and profile watchers into their own package for a clearer separation of concerns. These watchers deal only in Kubernetes primitives and are agnostic to how they are used. This allows a cleaner layering when we use them from our gRPC service.
* Renamed the "listener" types to "translator" to make it clearer that the function of these structs is to translate Kubernetes updates from the watcher into gRPC messages (a rough sketch of this layering follows).
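Hypothetical shapes only, to illustrate the layering described above; the actual interface and type names in the refactor may differ:
```go
package destination

// Address is a placeholder for a resolved endpoint (IP, port, owning pod, ...).
type Address struct {
	IP   string
	Port uint32
}

// EndpointUpdateListener is what a watcher notifies: it deals purely in
// Kubernetes-derived primitives and knows nothing about gRPC.
type EndpointUpdateListener interface {
	Add(addresses []Address)    // endpoints that became ready
	Remove(addresses []Address) // endpoints that went away
	NoEndpoints(exists bool)    // service exists but currently has no endpoints
}

// An "endpoint translator" would implement EndpointUpdateListener and convert
// each update into a destination-API gRPC message for a single subscriber.
```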
Signed-off-by: Alex Leong <alex@buoyant.io>
Split proxy-init into separate repo
Fixes #2563
The new repo is https://github.com/linkerd/linkerd2-proxy-init, and I
tagged the latest there `v1.0.0`.
Here, I've removed the `/proxy-init` dir and pinned the injected
proxy-init version to `v1.0.0` in the injector code and tests.
`/cni-plugin` depends on proxy-init, so I updated the import paths
there, and could verify CNI is still working (there is some flakiness
but unrelated to this PR).
For consistency, I added a `--init-image-version` flag to `linkerd
inject` along with its corresponding override config annotation.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
In #2679 we introduced an upgrade integration test. At the time we only
supported upgrading from a recent edge. Since that PR, a stable build
was released supporting upgrade.
Modify the upgrade integration test to upgrade from the latest stable
rather than latest edge. This fulfills the original intent of #2669.
Also add some known k8s event warnings.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Support for resources opting out of tap
Implements the `linkerd inject --disable-tap` flag (although hidden pending #2811) and the config override annotation `config.linkerd.io/disable-tap`.
Fixes #2778
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
Integration test for k8s events generated during install
Fixes #2713
I did make sure a scenario like the one described in #2964 is caught.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
With the server configured to respond with a 50% failure rate, the test
first checks to ensure the actual success rate is less than 100%. Then the
service profile is edited to enable retries. The test then checks to
ensure the effective success rate is at least 95%.
This is (hopefully) more reliable than having the test wait and retry until
the effective success rate and actual success rate diverge before comparing
them.
Signed-off-by: Ivan Sim <ivan@buoyant.io>
Add support for `linkerd check config`. Validates the existence of the
Linkerd Namespace, ClusterRoles, ClusterRoleBindings, ServiceAccounts,
and CustomResourceDefinitions.
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
PR #2737 introduced a warning in the proxy-injector when owner ref
lookups failed due to not having up-to-date ReplicaSet information. That
warning may occur during integration tests, causing a failure.
Add the warning as a known controller log message. The warning will be
printed as a skipped test, allowing the integration tests to pass.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
`linkerd install` supports a 2-stage install process, `linkerd upgrade`
did not.
Add 2-stage support for `linkerd upgrade`. Also exercise multi-stage
functionality during upgrade integration tests.
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes #2720 and #2711
This changes the default behavior of `linkerd inject`: instead of injecting
the proxy, it now only adds the `linkerd.io/inject: enabled` annotation for
the auto-injector to pick up (regardless of any namespace annotation).
A new `--manual` mode was added, which behaves as before, injecting
the proxy in the command output.
The unit tests are running with `--manual` to avoid any changes in the
fixtures.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
Add config.linkerd.io/disable-identity annotation
First part of #2540
We'll tackle support for `--disable-identity` in `linkerd install` in a
separate commit.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
The integration tests check container logs for errors. When an error is
encountered that matches a list of expected errors, it was hidden, and
the test passed.
Modify the integration tests to report known errors in logs via
`t.Skipf`.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes #2465
* Add check for unschedulable pods and psp issues (#2465)
* Return error reason and message on pod or node failure
Signed-off-by: Gaurav Kumar <gaurav.kumar9825@gmail.com>
* The 'linkerd-version' CLI flag is renamed to 'control-plane-version'
* Add version field to proxy config
* Add the control plane version to the global config
* Unit test for init image version
* Use more specific control plane and proxy versions in unit tests
Signed-off-by: Ivan Sim <ivan@buoyant.io>
The `linkerd upgrade` command read the control-plane's config from
Kubernetes, which required the environment to be configured to connect
to the appropriate k8s cluster.
Introduce a `linkerd upgrade --from-manifests` flag, allowing the user
to feed the output of `linkerd install` into the upgrade command.
Fixes #2629
Signed-off-by: Andrew Seigner <siggy@buoyant.io>