## edge-19.8.7
* CLI
* Added a global `--cluster-domain` flag to `linkerd install` to allow
installing Linkerd into a Kubernetes cluster that uses a base domain other
than `cluster.local.` (thanks @arminbuerkle!)
* Web UI
* Fixed an issue that caused unnecessary Prometheus queries, reducing load on
Prometheus
* Control Plane
* Added Kubernetes events (and log lines) when the proxt injector injects a
deployment, and when injection is skipped
* Proxy
* Changed the proxy to require the `LINKERD2_PROXY_DESTINATION_SVC_ADDR`
environment variable when starting up
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
* Avoid the dashboard requesting stats when not needed
Create an alternative to `urlsForResource` called
`urlsForResourceNoStats` that makes use of the `skip_stats` parameter in
the stats API (created in #1871) that doesn't query Prometheus when not needed.
When testing using the dashboard looking at the linkerd namespace,
queries per second went down from 2874 to 2756, a 4% decrease.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Fix auto-injecting pods and integration tests reporting
When creating an Event when auto-injection occurs (#3316) we try to
fetch the parent object to associate the event to it. If the parent
doesn't exist (like in the case of stand-alone pods) the event isn't
created. I had missed dealing with one part where that parent was
expected.
This also adds a new integration test that I verified fails before this
fix.
Finally, I removed from `_test-run.sh` some `|| exit_code=$?` that was
preventing the whole suite to report failure whenever one of the tests
in `/tests` failed.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
The `linkerd upgrade` integration test compares the output from two
commands:
- `linkerd upgrade control-plane`
- `linkerd upgrade control-plane --from-manifests`
The output of these commands include the heartbeat cronjob schedule,
which is generated based on the current time.
Modify the upgrade integration test to retry the manifest comparison one
time, assuming that `linkerd upgrade control-plane` should not take more
than one minute to execute.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Set custom cluster domain in GetServiceProfileFor
* Set custom cluster domain in tap server
Move fetching cluster domain for tap server to cmd main
* Handle fetchting cluster domain errors separately
* Use custom cluster domain for traffic split adaptor
Signed-off-by: Armin Buerkle <armin.buerkle@alfatraining.de>
The Linkerd Community Code of Conduct lives in the wiki:
https://github.com/linkerd/linkerd/wiki/Linkerd-code-of-conduct
Per Github's Community Profile checklist
(https://github.com/linkerd/linkerd2/community), it should live at the
root of the repo.
Copy the contents of the wiki to a markdown file in the root of the
repo. Once merged, we will modify the wiki to point to the repo.
Also update README.md to indicate k8s 1.12+ is required.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Stop ignoring client-go log entries
Pipe klog output into logrus. Not doing this avoids us from seeing
client-go log entries, for some reason I don't understand.
To enable, `--controller-log-level` must be `debug`.
This was discovered while trying to debug sending events for #3253.
I added an integration test that fails when this piping is not in place.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
The integration tests under `/test` were run separately via l5d-bot,
lacking the feedback and job management provided by ci.
Enable integration tests in ci, via a docker build and kind clusters
executed on a remote DOCKER_HOST.
CI runs are now broken into two stages, run serially. Each stage is
composed of jobs run in parallel:
- Setup stage
- Validate go deps
- Remote docker build
- Kind cluster setup (deep)
- Kind cluster setup (upgrade)
- Kind cluster setup (helm)
- Test stage
- Go unit tests
- Node.js unit tests
- Kind integration tests (deep)
- Kind integration tests (upgrade)
- Kind integration tests (helm)
This PR also modifies `bin/test-run.sh` to always set `--failfast` for
Go tests.
Also introduce `bin/docker` and `bin/kubectl` scripts, to ensure
cacheable, pinned executables in ci.
The existing integration tests for master merges and docker pushes,
running against GKE, remain in place.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Have the proxy-injector emit events upon injection/skipping injection
Fixes#3253
Have the proxy-injector emit an event whenever a injection happens, or
when injection is skipped for some reason (also added that reason into
the proxy-injector logs). The level is associated to the parent workload
(it can't be associated to the pod because at this point the pod hasn't
been persisted).
The event recorder was setup at the `webhook/server.go` level and passed
to the proxy-injector's `Inject` function. The sp-validator thus also
has access to the event recorder, but for now it's not using it.
Related changes:
- Refactored `api.GetOwnerKindAndName()` to have it return a more
generic object.
- Refactored `report.Injectable()` to also have it return the reason why
a workload is not injectable.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
GitHub's community guidelines recommend a pull request template, the repo was
lacking one.
Introduce a `PULL_REQUEST_TEMPLATE.md` file.
Once merged, the
[Community profile checklist](https://github.com/linkerd/linkerd2/community)
should indicate the repo now provides a pull request template.
Fixes#3321
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#3052.
Adds a unit test for the edges API endpoint. To maintain a consistent order for
testing, the returned rows in api/public/edges.go are now sorted.
The `bin/test-run` script executed upgrade, helm, and deep integration
test in series, but was structured in a way that did not permit running
these tests individually.
Move most of the logic from `bin/test-run` to a supporting library,
`bin/test-run.sh`, which will provide the ability to execute integration
tests individually. `bin/test-run`'s behavior is unchanged, it continues
to run upgrade, helm, and deep integration tests in series.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
`bin/helm` and `bin/protoc` were downloading their binaries into
`./target`, while `bin/lint` was downloading to the root of the repo.
Also travis was caching `./target`, which could become problematic if
that part of the test script relied on `target/cli/linux/linkerd`.
Standardize helm, kind, lint, and protoc to all download into
`./target/bin`, and modify travis to strictly cache that subdirectory.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
In preparation for #3242, the destination controller will need to
support a broader set of valid authorities including IP addresses.
This change modifies the destination controller's authority-parsing code
so that the is-this-a-kubernete-service-name decision is decoupled from
parsing of authorities into their consituent parts.
The `Get` API now explicitly handles IP address names, though it
currently fails all such resolutions.
* Rename template-values.go
* Define new constructor of charts.Values type
* Move all Helm values related code to the pkg/charts package
* Bump dependency
* Use '/' in filepath to remain compatible with VFS requirement
* Add unit test to verify Helm YAML output
* Alejandro's feedback
* Add unit test for Helm YAML validation (HA)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
* Always use forward-slash when interacting with the VFS
Fixes#3283
Our VFS implementation relies on `net.http.FileSystem` which always
expects `/` regardless of the OS.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
### Summary
As an initial attempt to secure the connection from clients to the gRPC tap
server on the tap Pod, the tap `addr` only listened on localhost.
As @adleong pointed out #3257, this was not actually secure because the inbound
proxy would establish a connection to localhost anyways.
This change removes the gRPC tap server listener and changes `TapByResource`
requests to interface with the server object directly.
From this, we know that all `TapByResourceRequests` have gone through the tap
APIServer and thus authorized by RBAC.
### Details
[NewAPIServer](ef90e0184f/controller/tap/apiserver.go (L25-L26)) now takes a [GRPCTapServer](f6362dfa80/controller/tap/server.go (L33-L34)) instead of a `pb.TapClient` so that
`TapByResource` requests can interact directly with the [TapByResource](f6362dfa80/controller/tap/server.go (L49-L50)) method.
`GRPCTapServer.TapByResource` now makes a private [grpcTapServer](ef90e0184f/controller/tap/handlers.go (L373-L374)) that satisfies
the [tap.TapServer](https://godoc.org/github.com/linkerd/linkerd2/controller/gen/controller/tap#TapServer) interface. Because this interface is satisfied, we can interact
with the tap server methods without spawning an additional listener.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Followup to #3194
The namespace was too long for l5d-bot:
```
inject_test.go:117: failed to create
l5d-integration-auto-git-9688d9ba-inject-namespace-override-test
namespace: Namespace
"l5d-integration-auto-git-9688d9ba-inject-namespace-override-test"
is invalid: metadata.name: Invalid value:
"l5d-integration-auto-git-9688d9ba-inject-namespace-override-test":
must be no more than 63 characters
```
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Check for Namespace level config override annotations
* Add unit tests for namespace level config overrides
* add integration test for namespace level config override
* use different namespace for override tests
* check resource requests for integration tests
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Wait for `helm delete` to finish in integration test
Followup to #3251
In `helm_cleanup` block till the linkerd namespace has been deleted
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
This PR adds a test for trafficsplits to stat_summary_test.go.
Because the test requires a consistent order for returned rows, trafficsplit
rows in stat_summary.go are now sorted by apex + leaf name before being
returned.
### Summary
After the addition of the tap APIServer, all the logic related to tap in the public API no longer needs to be there. The servers and clients that are created but not used, as well as all the old testing infrastrucure related to tap can be removed.
This deprecates TapByResource and therefore required an update to the protobuf files with `bin/protoc-go.sh`. While the change to deprecate this method was extremely small, a lot of protobuf fils were updated in the process. These changes to the code and protobuf files should probably remain coupled since `TapByResource` is officially deprecated in the public API, but a majority of the additions/deletions are related to those files.
This draft passes `go test` as well as a local run of the integration tests.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
PR #3217 re-introduced container metrics collection to
linkerd-prometheus. This enabled linkerd-heartbeat to collect mem and
cpu metrics at the container-level.
Add container cpu and mem metrics to heartbeat requests. For each of
(destination, prometheus, linkerd-proxy), collect maximum memory and p95
cpu.
Concretely, this introduces 7 new query params to heartbeat requests:
- p99-handle-us
- max-mem-linkerd-proxy
- max-mem-destination
- max-mem-prometheus
- p95-cpu-linkerd-proxy
- p95-cpu-destination
- p95-cpu-prometheus
Part of #2961
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
PR #3247 introduced additional helm cleanup in `bin/test-cleanup`.
During the integration tests, `bin/test-cleanup` is called prior to
`helm_cleanup` in `bin/test-run`. This causes `helm_cleanup` to fail, as
resources have already been deleted by `bin/test-cleanup`, and the
integration tests fail with `FAIL: error cleaning up Helm`.
Modify the integration tests to first call `helm_cleanup` prior to
calling `bin/test-cleanup`.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This PR adds `trafficsplit` as a supported resource for the `linkerd stat` command. Users can type `linkerd stat ts` to see the apex and leaf services of their trafficsplits, as well as metrics for those leaf services.
When helm integration tests fail, `bin/test-run` exits prior to calling
`helm_cleanup`, leaving behind a helm namespace and clusterrolebinding.
Update `bin/test-cleanup` to delete any remaining helm resources.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>