This change adds a new `linkerd2-proxy-identity` binary to the `proxy`
container image as well as a `linkerd2-proxy-run` entrypoint script.
The inject process now sets environment variables on pods to support
identity, including identity names for the destination and identity
services.
As the proxy starts, the identity helper creates a key and CSR in a
tmpfs. As the proxy starts, it reads these files, as well as a
serviceaccount token, and provisions a certificate from controller.
The proxy's /ready endpoint will not succeed until a certificate has
been provisioned.
The proxy will not participate in identity with services other than the
controllers until the Destination controller is modified to provide
identities via discovery.
The new proxy has changed its configuration as follows:
- `LISTENER` urls are now `LISTEN_ADDR` addresses;
- `CONTROL_URL` is now `DESTINATION_SVC_ADDR`;
- `*_NAMESPACE` vars are no longer needed;
- The `PROXY_ID` is now the `DESTINATION_CONTEXT`;
- The "metrics" port is now the "admin" port, since it serves more than
just metrics;
- A readiness probe now checks a dedicated /ready endpoint eagerly.
Identity injection is **NOT** configured by this branch.
The proxy's TLS implementation has changed to use a new _Identity_ controller.
In preparation for this, the `--tls=optional` CLI flag has been removed
from install and inject; and the `ca` controller has been deleted. Metrics
and UI treatments for TLS have **not** been removed, as they will continue to
be valuable for the new Identity system.
With the removal of the old identity scheme, the Destination service's proxy
ID field is now set with an opaque string (e.g. `ns:emojivoto`) to enable
locality awareness.
linkerd/linkerd2#1721 introduced a `--single-namespace` install flag,
enabling the control-plane to function within a single namespace. With
the introduction of ServiceProfiles, and upcoming identity changes, this
single namespace mode of operation is becoming less viable.
This change removes the `--single-namespace` install flag, and all
underlying support. The control-plane must have cluster-wide access to
operate.
A few related changes:
- Remove `--single-namespace` from `linkerd check`, this motivates
combining some check categories, as we can always assume cluster-wide
requirements.
- Simplify the `k8s.ResourceAuthz` API, as callers no longer need to
make a decision based on cluster-wide vs. namespace-wide access.
Components either have access, or they error out.
- Modify the web dashboard to always assume ServiceProfiles are enabled.
Reverts #1721
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
linkerd/linkerd2#2414 introduced integration tests to ensure logs did
not contain unexpected errors. Additional errors are not being caught,
causing ci to fail.
This change adds more known log errors to the log regex.
Also temporarily enable integration tests in ci for this PR.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The integration tests deploy complete Linkerd environments into
Kubernetes, but do not check if the components are logging errors or
restarting.
Introduce integration tests to validation that all expected
control-plane containers (including `linkerd-proxy` and `linkerd-init`)
are found, logging no errors, and not restarting.
Fixes#2348
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The `linkerd-init` container requires the NET_ADMIN capability to modify
iptables. The `linkerd check` command was not verifying this.
Introduce a `has NET_ADMIN capability` check, which does the following:
1) Lists all available PodSecurityPolicies, if none found, returns
success
2) For each PodSecurityPolicy, validate one exists that:
- the user has `use` access AND
- provides `*` or `NET_ADMIN` capability
A couple limitations to this approach:
- It is testing whether the user running `linkerd check` has NET_ADMIN,
but during installation time it will be the `linkerd-init` pod that
requires NET_ADMIN.
- It assumes the presense of PodSecurityPolicies in the cluster means
the PodSecurityPolicy admission controller is installed. If the
admission controller is not installed, but PSPs exists that restrict
NET_ADMIN, `linkerd check` will incorrectly report the user does not
have that capability.
This PR also fixes the `can create CustomResourceDefinitions` check to
not specify a namespace when doing a `create` check, as CRDs are
cluster-wide.
Fixes#1732
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
linkerd/linkerd2#2360 modified the `linkerd check --wait` param from `0`
to `1m`. Waiting on a check command causes spinner control characters in
the output, making output validation non-trivial.
Instead, revert the wait param back to `0`, and use
`TestHelper.RetryFor`.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Hint URLs should display for all failed checks in `linkerd check`, but
were not displaying for RPC checks.
Fix `runCheckRPC` to pass along the hintAnchor to the check result.
Also rename the second `can query the control plane API` to
`control plane self-check`, as there were two checks with that name.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The httpbin responses recently started returning `url` fields starting
with `https`, regardless of the protocol used in the request.
This change modifies the egress integration test to always expect
`https` in the `url` response field.
This is a workaround until #2316 is implemented.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Up until now, the proxy-api controller service has been the sole service
that the proxy communicates with, implementing the majoriry of the API
defined in the `linkerd2-proxy-api` repo. But this is about to change:
linkerd/linkerd2-proxy-api#25 introduces a new Identity service; and
this service must be served outside of the existing proxy-api service
in the linkerd-controller deployment (so that it may run under a
distinct service account).
With this change, the "proxy-api" name becomes less descriptive. It's no
longer "the service that serves the API for the proxy," it's "the
service that serves the Destination API to the proxy." Therefore, it
seems best to bite the bullet and rename this to be the "destination"
service (i.e. because it only serves the
`io.linkerd.proxy.destination.Destination` service).
Co-authored-by: Kevin Lingerfelt <kl@buoyant.io>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Version checks were not validating that the cli version matched the
control plane or data plane versions.
Add checks via the `linkerd check` command to validate the cli is
running the same version as the control and data plane.
Also add types around `channel-version` string parsing and matching. A
consequence being that during development `version.Version` changes from
`undefined` to `dev-undefined`.
Fixes#2076
Depends on linkerd/website#101
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The default font in Windows console did not support the Unicode
characters recently added to check and inject commands. Also the color
library the linkerd cli depends on was not being used in a
cross-platform way.
Replace the existing Unicode characters used in `check` and `inject`
with characters available in most fonts, including Windows console.
Similarly replace the spinner used in `check` with one that uses
characters available in most fonts.
Modify `check` and `inject` to use `color.Output` and `color.Error`,
which wrap `os.Stdout` and `os.Stderr`, and perform special
tranformations when on Windows.
Add a `--no-color` option to `linkerd logs`. While stern uses the same
color library that `check`/`inject` use, it is not yet using the
`color.Output` API for Windows support. That issue is tracked at:
https://github.com/wercker/stern/issues/69
Relates to https://github.com/linkerd/linkerd2/pull/2087
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The outputs of the `check` and `inject` commands did not vary much
between successful and failed executions, and were a bit verbose and
challenging to parse.
Reorganize output of `check` and `inject` commands, to provide more
output when errors occur, and less output when successful.
Specific changes:
`linkerd check`
- visually group checks by category
- introduce `hintURL`'s, to provide doc links when checks fail
- add spinners when retrying, remove additional retry lines
- colored unicode characters to indicate success/warning/failure
`linkerd inject`
- modify default output to mirror `kubectl apply`
- only output non-successful inject reports
- support `--verbose` flag to output all inject reports
Fixes#1471, #1653, #1656, #1739
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The set of health checks to be executed were dependent on a combination
of check enums and boolean options.
This change modifies the health checks to be governed strictly by a set
of enums.
Next steps:
- tightly couple category IDs to names
- tightly couple checks to their parent categories
- programmatic control over check ordering
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Setup port-forwarding for linkerd dashboard command
* Output port-forward logs when --verbose flag is set
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Filtering by Kubernetes job was not supported. Also filtering by any unknown
type caused a panic.
Add filtering support by Kubernetes job, with special case mapping `job` to
`k8s_job`, to not conflict with Prometheus' job label.
Fix panic when unknown type specified as a `--from` or `--to` flag.
Fix `job` label from `linkerd-proxy` overwriting Prometheus `job` label at
collection time. This caused all metrics collected by proxy sidecars in
Kubernetes jobs to be collected into an incorrect Prometheus job, rather than
the expected `linkerd-proxy` Prometheus job.
Fix `unsupported resource type` tap error message incorrectly printing the
target resource rather than the destination.
Set `--controller-log-level debug` in `install_test.go` for easier debugging.
Expose `slow-cooker`'s metrics via a k8s service in the tap integration test, to
validate proxy requests with a job as destination.
Fixes#1872
Part of #627
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
When running integration tests in a Kubernetes cluster that sometimes takes a little longer to get pods ready, the integration tests fail tests too early because most tests have a retry timeout of 30 seconds.
This PR bumps up this retry timeout for `TestInstall` to 3 minutes. This gives the test enough time to download any new docker images that it needs to complete succesfully and also reduces the need to have large timeout values for subsequent tests. This PR also refactors `CheckPods` to check that all containers in a pods for a deployment are in a`Ready` state. This helps also helps in ensuring that all docker images have been downloaded and the pods are in a good state.
Tests were run on the community cluster and all were successful.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
This PR adds a check before the TestCheckProxy executes the CLI command to make sure the control plane has installed all its pods. This avoids the situation where the test would fail because the pods are retrying the data plane check. The PR's base is #1835 but will switch to master once that PR merges.
Relates to PR #1835Fixes#1733
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Fix broken docker build by moving Service Profile conversion and validation into `/pkg`.
Fix broken integration test by adding service profile validation output to `check`'s expected output.
Testing done:
* `gotest -v ./...`
* `bin/docker-build`
* `bin/test-run (pwd)/bin/linkerd`
Signed-off-by: Alex Leong <alex@buoyant.io>
Add checks to `linkerd check --pre` to verify that the user has permission to create:
* namespaces
* serviceaccounts
* clusterroles
* clusterrolebindings
* services
* deployments
* configmaps
Signed-off-by: Alex Leong <alex@buoyant.io>
The `linkerd check` command was not validating whether data plane
proxies were successfully reporting metrics to Prometheus.
Introduce a new check that validates data plane proxies are found in
Prometheus. This is made possible via the existing `ListPods` endpoint
in the public API, which includes an `Added` field, indicating a pod's
metrics were found in Prometheus.
Fixes#1517
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
`linkerd inject` was not checking its input for known sidecars and
initContainers.
Modify `linkerd inject` to check for existing sidecars and
initContainers, specifically, Linkerd, Istio, and Contour.
Part of #1516
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
linkerd only routes TCP data, but `linkerd inject` does not warn when it
injects into pods with ports set to `protocol: UDP`.
Modify `linkerd inject` to warn when injected into a pod with
`protocol: UDP`. The Linkerd sidecar will still be injected, but the
stderr output will include a warning.
Also add stderr checking on all inject unit tests.
Part of #1516.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
If an input file is un-injectable, existing inject behavior is to simply
output a copy of the input.
Introduce a report, printed to stderr, that communicates the end state
of the inject command. Currently this includes checking for hostNetwork
and unsupported resources.
Malformed YAML documents will continue to cause no YAML output, and return
error code 1.
This change also modifies integration tests to handle stdout and stderr separately.
example outputs...
some pods injected, none with host networking:
```
hostNetwork: pods do not use host networking...............................[ok]
supported: at least one resource injected..................................[ok]
Summary: 4 of 8 YAML document(s) injected
deploy/emoji
deploy/voting
deploy/web
deploy/vote-bot
```
some pods injected, one host networking:
```
hostNetwork: pods do not use host networking...............................[warn] -- deploy/vote-bot uses "hostNetwork: true"
supported: at least one resource injected..................................[ok]
Summary: 3 of 8 YAML document(s) injected
deploy/emoji
deploy/voting
deploy/web
```
no pods injected:
```
hostNetwork: pods do not use host networking...............................[warn] -- deploy/emoji, deploy/voting, deploy/web, deploy/vote-bot use "hostNetwork: true"
supported: at least one resource injected..................................[warn] -- no supported objects found
Summary: 0 of 8 YAML document(s) injected
```
TODO: check for UDP and other init containers
Part of #1516
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
`ca-bundle-distributor` described the original role of the program but
`ca` ("Certificate Authority") better describes its current role.
Signed-off-by: Brian Smith <brian@briansmith.org>
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
- Return pod uptimes from the GetPods endpoint
- Adds filtering by namespace to api.GetPods
- Adds a --namespace filter to conduit get pods
- Adds pod uptimes to the controller component toolitps on the ServiceMesh page
- Moves the ServiceMesh page back to using /api/pods
Adds the ability to query by a new non-kubernetes resource type, "authorities",
in the StatSummary api.
This includes an extensive refactor of stat_summary.go to deal with non-kubernetes
resource types.
- Add documentation to Resource in the public api so we can use it for authority
- Handle non-k8s resource requests in the StatSummary endpoint
- Rewrite stat summary fetching and parsing to handle non-k8s resources
- keys stat summary metric handling by Resource instead of a generated string
- Adds authority to the CLI
- Adds /authorities to the Web UI
- Adds some more stat integration and unit tests
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Problem
`conduit stat` would cause a panic for any resource that wasn't in the list
of StatAllResourceTypes
This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files
Solution
Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes
Also adds a unit test and integration test for `conduit stat ns`
* Add integration test for conduit controller stats
* Update test for new SECURED column in stat output
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The ListPods endpoint's logic resides in the telemetry service, which is
going away.
Move ListPods logic into public-api, use new k8s informer APIs.
Fixes#694
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Add tests/utils/scripts for running integration tests
Add a suite of integration tests in the `test/` directory, as well as
utilities for testing in the `testutil/` directory.
You can use the `bin/test-run` script to run the full suite of tests,
and the `bin/test-cleanup` script to cleanup after the tests.
The test/README.md file has more information about running tests.
@pcalcado, @franziskagoltz, and @rmars also contributed to this change.
* Create TEST.md file at the root of the repo
* Update based on review feedback
* Relax external service IP timeout for GKE
* Update TEST.md with more info about different types of test runs
* More updates to TEST.md based on review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>