Commit Graph

1682 Commits

Author SHA1 Message Date
Eliza Weisman 0914a668f6
add changelog for edge-19.8.7 (#3348)
## edge-19.8.7

* CLI
  * Added a global `--cluster-domain` flag to `linkerd install` to allow
    installing Linkerd into a Kubernetes cluster that uses a base domain other
    than `cluster.local.` (thanks @arminbuerkle!)
* Web UI
  * Fixed an issue that caused unnecessary Prometheus queries, reducing load on
    Prometheus
* Control Plane
  * Added Kubernetes events (and log lines) when the proxy injector injects a
    deployment, and when injection is skipped
* Proxy
  * Changed the proxy to require the `LINKERD2_PROXY_DESTINATION_SVC_ADDR`
    environment variable when starting up

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2019-08-29 14:03:19 -07:00
Eliza Weisman 96e8ed0165
proxy: Update proxy to 9a84914 (#3347)
* cargo: Set authors to Linkerd Developers (linkerd/linkerd2-proxy#322)
* Update Rust to 1.37.0 (linkerd/linkerd2-proxy#324)
* Update url crate to 1.7.2 (linkerd/linkerd2-proxy#327)
* config: Make destination service configuration required (linkerd/linkerd2-proxy#325)
* make: Add test-lib target (linkerd/linkerd2-proxy#329)
* fallback: Split fallback into dedicated crate (linkerd/linkerd2-proxy#326)
* update to latest rustls, webpki, and ring
2019-08-29 12:00:20 -07:00
Alejandro Pedraza fd248d3755
Undo refactoring from #3316 (#3331)
Thus fixing `linkerd edges` and the dashboard topology graph

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-29 13:37:54 -05:00
Alejandro Pedraza 5d7499dc84
Avoid the dashboard requesting stats when not needed (#3338)
* Avoid the dashboard requesting stats when not needed

Create an alternative to `urlsForResource` called
`urlsForResourceNoStats` that makes use of the `skip_stats` parameter in
the stats API (created in #1871) that doesn't query Prometheus when not needed.

When testing using the dashboard looking at the linkerd namespace,
queries per second went down from 2874 to 2756, a 4% decrease.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-29 05:52:44 -05:00
Alejandro Pedraza 368d16f23c
Fix auto-injecting pods and integration tests reporting (#3335)
* Fix auto-injecting pods and integration tests reporting

When creating an Event when auto-injection occurs (#3316) we try to
fetch the parent object to associate the event to it. If the parent
doesn't exist (like in the case of stand-alone pods) the event isn't
created. I had missed dealing with one part where that parent was
expected.

This also adds a new integration test that I verified fails before this
fix.

Finally, I removed from `_test-run.sh` some `|| exit_code=$?` that was
preventing the whole suite from reporting failure whenever one of the tests
in `/tests` failed.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-28 15:04:20 -05:00
Andrew Seigner 956d1bff06
Update warning event regex for integration test (#3336)
Kubernetes was generating events for failed readiness probes that did
not quite match the expected events regex in the install integration
test:
https://travis-ci.org/linkerd/linkerd2/jobs/577642724#L647

Update the readiness probe regex to handle these variations in events:
https://play.golang.org/p/OVGJkFNN-XA

Relates to CI failure in #3333.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-28 10:19:40 -07:00
Andrew Seigner 419e9052ff
Fix flakey upgrade integration test (#3329)
The `linkerd upgrade` integration test compares the output from two
commands:
- `linkerd upgrade control-plane`
- `linkerd upgrade control-plane --from-manifests`

The output of these commands includes the heartbeat cronjob schedule,
which is generated based on the current time.

Modify the upgrade integration test to retry the manifest comparison one
time, assuming that `linkerd upgrade control-plane` should not take more
than one minute to execute.
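
A minimal sketch of the retry-once idea described above, not the integration test's actual code. `compareWithRetry` and `flakyRender` are hypothetical names; `flakyRender` simulates a renderer whose time-derived schedule rolls over to the next minute between the first two calls.

```go
package main

import "fmt"

// compareWithRetry renders both outputs and reports whether they match,
// allowing up to `retries` additional attempts after a mismatch.
// Hypothetical helper for illustration only.
func compareWithRetry(renderA, renderB func() string, retries int) bool {
	for attempt := 0; attempt <= retries; attempt++ {
		if renderA() == renderB() {
			return true
		}
	}
	return false
}

// flakyRender returns a renderer that embeds a simulated clock minute in its
// output. The minute changes between the first and second call, then stays
// stable, mimicking a schedule generated from the current time.
func flakyRender() func() string {
	minutes := []int{14, 15, 15, 15}
	i := 0
	return func() string {
		m := minutes[i%len(minutes)]
		i++
		return fmt.Sprintf("schedule: %d * * * *", m)
	}
}

func main() {
	r := flakyRender()
	fmt.Println("no retry: ", compareWithRetry(r, r, 0))

	r = flakyRender()
	fmt.Println("one retry:", compareWithRetry(r, r, 1))
}
```

A single retry is enough here only under the stated assumption that one command run takes less than a minute, so two consecutive mismatches cannot both be caused by the clock.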

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-28 09:41:09 -07:00
陈谭军 981f5bc85d fix-up spelling mistake (#3328)
Signed-off-by: chentanjun <2799194073@qq.com>
2019-08-27 10:24:53 -07:00
arminbuerkle 5c38f38a02 Allow custom cluster domains in remaining backends (#3278)
* Set custom cluster domain in GetServiceProfileFor
* Set custom cluster domain in tap server
Move fetching cluster domain for tap server to cmd main
* Handle fetching cluster domain errors separately
* Use custom cluster domain for traffic split adaptor

Signed-off-by: Armin Buerkle <armin.buerkle@alfatraining.de>
2019-08-27 10:01:36 -07:00
Andrew Seigner 204e68ffe9
Move Code of Conduct from wiki to repo (#3320)
The Linkerd Community Code of Conduct lives in the wiki:
https://github.com/linkerd/linkerd/wiki/Linkerd-code-of-conduct
Per Github's Community Profile checklist
(https://github.com/linkerd/linkerd2/community), it should live at the
root of the repo.

Copy the contents of the wiki to a markdown file in the root of the
repo. Once merged, we will modify the wiki to point to the repo.

Also update README.md to indicate k8s 1.12+ is required.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-26 14:15:58 -07:00
Alejandro Pedraza 9ee98d35be
Stop ignoring client-go log entries (#3315)
* Stop ignoring client-go log entries

Pipe klog output into logrus. Not doing this prevents us from seeing
client-go log entries, for some reason I don't understand.

To enable, `--controller-log-level` must be `debug`.

This was discovered while trying to debug sending events for #3253.

I added an integration test that fails when this piping is not in place.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-26 15:46:31 -05:00
Andrew Seigner ea27e0ca0e
Introduce integration tests into all ci runs (#3293)
The integration tests under `/test` were run separately via l5d-bot,
lacking the feedback and job management provided by ci.

Enable integration tests in ci, via a docker build and kind clusters
executed on a remote DOCKER_HOST.

CI runs are now broken into two stages, run serially. Each stage is
composed of jobs run in parallel:
- Setup stage
  - Validate go deps
  - Remote docker build
  - Kind cluster setup (deep)
  - Kind cluster setup (upgrade)
  - Kind cluster setup (helm)
- Test stage
  - Go unit tests
  - Node.js unit tests
  - Kind integration tests (deep)
  - Kind integration tests (upgrade)
  - Kind integration tests (helm)

This PR also modifies `bin/test-run.sh` to always set `--failfast` for
Go tests.

Also introduce `bin/docker` and `bin/kubectl` scripts, to ensure
cacheable, pinned executables in ci.

The existing integration tests for master merges and docker pushes,
running against GKE, remain in place.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-26 11:41:17 -07:00
Alejandro Pedraza 02efb46e45
Have the proxy-injector emit events upon injection/skipping injection (#3316)
* Have the proxy-injector emit events upon injection/skipping injection

Fixes #3253

Have the proxy-injector emit an event whenever an injection happens, or
when injection is skipped for some reason (also added that reason into
the proxy-injector logs). The level is associated to the parent workload
(it can't be associated to the pod because at this point the pod hasn't
been persisted).

The event recorder was set up at the `webhook/server.go` level and passed
to the proxy-injector's `Inject` function. The sp-validator thus also
has access to the event recorder, but for now it's not using it.

Related changes:

- Refactored `api.GetOwnerKindAndName()` to have it return a more
generic object.
- Refactored `report.Injectable()` to also have it return the reason why
a workload is not injectable.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-26 13:34:36 -05:00
Andrew Seigner 9575512fb3
Introduce Pull Request Template (#3322)
GitHub's community guidelines recommend a pull request template, and the repo
was lacking one.

Introduce a `PULL_REQUEST_TEMPLATE.md` file.

Once merged, the
[Community profile checklist](https://github.com/linkerd/linkerd2/community)
should indicate the repo now provides a pull request template.

Fixes #3321

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-26 11:02:09 -07:00
cpretzer f4608f2bcd
remove cluster domain from release notes because it is WIP (#3323)
* add information about cluster domain internal work

Signed-off-by: Charles Pretzer <charles@buoyant.io>
2019-08-23 17:59:54 -07:00
cpretzer 569f08811f
Release notes for edge-19.8.6 (#3318)
* Release notes for edge-19.8.6

Signed-off-by: Charles Pretzer <charles@buoyant.io>
2019-08-23 13:37:03 -07:00
Carol A. Scott 089836842a
Add unit test for edges API endpoint (#3306)
Fixes #3052.

Adds a unit test for the edges API endpoint. To maintain a consistent order for
testing, the returned rows in api/public/edges.go are now sorted.
2019-08-23 09:28:02 -07:00
Andrew Seigner 653ec8c5b7
Refactor bin/test-run for running tests separately (#3304)
The `bin/test-run` script executed upgrade, helm, and deep integration
tests in series, but was structured in a way that did not permit running
these tests individually.

Move most of the logic from `bin/test-run` to a supporting library,
`bin/test-run.sh`, which will provide the ability to execute integration
tests individually. `bin/test-run`'s behavior is unchanged: it continues
to run upgrade, helm, and deep integration tests in series.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-22 11:05:06 -07:00
Andrew Seigner 4e058bfea2
Introduce bin/kind, move executables to target/bin (#3289)
`bin/helm` and `bin/protoc` were downloading their binaries into
`./target`, while `bin/lint` was downloading to the root of the repo.
Also travis was caching `./target`, which could become problematic if
that part of the test script relied on `target/cli/linux/linkerd`.

Standardize helm, kind, lint, and protoc to all download into
`./target/bin`, and modify travis to strictly cache that subdirectory.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-21 19:49:21 -07:00
Ivan Sim 954a45f751
Fix broken unit and integration tests (#3303)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-08-21 18:52:19 -07:00
Pascal Bourque b65207213e Added a "Linkerd Namespace" Grafana dashboard (#3301)
Closes #3299

Signed-off-by: Pascal Bourque <pascal@studyo.co>
2019-08-21 17:30:38 -07:00
arminbuerkle e7d303e03f Add LINKERD2_PROXY_DESTINATION_GET_SUFFIXES (#3277)
* Fix missing `clusterDomain` in render RenderTapOutputProfile
* Add LINKERD2_PROXY_DESTINATION_GET_SUFFIXES env variable

Signed-off-by: Armin Buerkle <armin.buerkle@alfatraining.de>
2019-08-21 14:28:30 -07:00
Oliver Gould cb276032f5
Require go 1.12.9 for controller builds (#3297)
Netflix recently announced a security advisory that identified several
Denial of Service attack vectors that can affect server implementations
of the HTTP/2 protocol, and has issued eight CVEs. [1]

Go is affected by two of the vulnerabilities (CVE-2019-9512 and
CVE-2019-9514) and so Linkerd components that serve HTTP/2 traffic are
also affected. [2]

These vulnerabilities allow untrusted clients to allocate an unlimited
amount of memory, until the server crashes. The Kubernetes Product
Security Committee has assigned this set of vulnerabilities with a CVSS
score of 7.5. [3]

[1] https://github.com/Netflix/security-bulletins/blob/master/advisories/third-party/2019-002.md
[2] https://golang.org/doc/devel/release.html#go1.12
[3] https://www.first.org/cvss/calculator/3.0#CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
2019-08-21 10:03:29 -07:00
Guangming Wang 70d85d2065 Cleanup: fix some typos in code comment (#3296)
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-21 09:40:43 -07:00
Oliver Gould ee79d5d324
destination: Reorganize authority-parsing (#3244)
In preparation for #3242, the destination controller will need to
support a broader set of valid authorities including IP addresses.

This change modifies the destination controller's authority-parsing code
so that the is-this-a-kubernetes-service-name decision is decoupled from
parsing of authorities into their constituent parts.

The `Get` API now explicitly handles IP address names, though it
currently fails all such resolutions.
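
A rough sketch of what decoupling parsing from classification can look like, using only the Go standard library. The `authority` type, `parseAuthority`, and `isIP` are hypothetical names for illustration and are not the destination controller's actual code. The sketch treats any `net.SplitHostPort` failure as "no port present" and does not handle bare bracketed IPv6 literals.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// authority is a hypothetical parsed form of "host[:port]".
type authority struct {
	Host string
	Port int
	IP   net.IP // non-nil when Host is an IP literal
}

// parseAuthority splits an authority into its parts without deciding whether
// the host names a Kubernetes service. defaultPort applies when no port is given.
func parseAuthority(s string, defaultPort int) (authority, error) {
	host, portStr, err := net.SplitHostPort(s)
	if err != nil {
		// Simplifying assumption: an error means the input is a bare host.
		host, portStr = s, strconv.Itoa(defaultPort)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil || port < 1 || port > 65535 {
		return authority{}, fmt.Errorf("invalid port in %q", s)
	}
	return authority{Host: host, Port: port, IP: net.ParseIP(host)}, nil
}

// isIP is the separate classification step, applied after parsing.
func (a authority) isIP() bool { return a.IP != nil }

func main() {
	for _, s := range []string{"web.emojivoto.svc.cluster.local:8080", "10.1.2.3"} {
		a, err := parseAuthority(s, 80)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(a.Host, a.Port, a.isIP())
	}
}
```

With parsing isolated like this, a caller can decide per request how to treat IP literals versus service names without touching the parser.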
2019-08-21 07:19:42 -07:00
Ivan Sim 183e42e4cd
Merge the CLI 'installValues' type with Helm 'Values' type (#3291)
* Rename template-values.go
* Define new constructor of charts.Values type
* Move all Helm values related code to the pkg/charts package
* Bump dependency
* Use '/' in filepath to remain compatible with VFS requirement
* Add unit test to verify Helm YAML output
* Alejandro's feedback
* Add unit test for Helm YAML validation (HA)

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-08-20 19:26:38 -07:00
Andrew Seigner f6e8d3a7ae
Add release notes for stable-2.5.0 (#3294)
Relates to:
- https://github.com/linkerd/website/pull/470
- https://github.com/linkerd/website/pull/475

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-20 14:19:45 -07:00
Andrew Seigner d4cd8add3a
Add changes for `edge-19.8.5` (#3285)
Depends on #3286

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-19 14:15:25 -07:00
Oliver Gould e3c3e928dd
proxy: Update proxy to master (#3286)
* Split utilities into sub-crates (linkerd/linkerd2-proxy#306)
* tests: Update to Rust 2018 (linkerd/linkerd2-proxy#311)
* app: Split modules from inbound and outbound (linkerd/linkerd2-proxy#312)
* Introduce linkerd2-proxy-core (linkerd/linkerd2-proxy#313)
* travis: `make clean` after tests (linkerd/linkerd2-proxy#315)
* core: Formalize the listen/serve API (linkerd/linkerd2-proxy#314)
* Move inbound and outbound stacks from app::main (linkerd/linkerd2-proxy#316)
* core: Split resolve traits into core (linkerd/linkerd2-proxy#317)
* Split linkerd2-proxy-resolve (linkerd/linkerd2-proxy#318)
* classify: Assume success on missing grpc-status (linkerd/linkerd2-proxy#319)

Fixes #3281
2019-08-19 13:27:11 -07:00
Carol A. Scott bc8fef7ba9 Sorting the expected response for trafficsplit rows so it is always in consistent row order (#3280)
2019-08-19 10:10:26 -07:00
Alejandro Pedraza 99ddc66461
Always use forward-slash when interacting with the VFS (#3284)
* Always use forward-slash when interacting with the VFS

Fixes #3283

Our VFS implementation relies on `http.FileSystem` (from `net/http`), which always
expects `/` regardless of the OS.
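
As an illustration of the general technique (not the exact change in this commit), Go's `path` package always joins with `/`, whereas `path/filepath` uses the OS-specific separator. The `vfsPath` helper and the file names below are made up for the example.

```go
package main

import (
	"fmt"
	"path"
)

// vfsPath builds a path for an http.FileSystem-style virtual filesystem.
// path.Join always uses "/", unlike path/filepath.Join, which uses the
// OS-specific separator (a backslash on Windows).
func vfsPath(elem ...string) string {
	return path.Join(elem...)
}

func main() {
	// Hypothetical directory and file names.
	fmt.Println(vfsPath("templates", "namespace.yaml")) // templates/namespace.yaml
}
```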

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-19 11:10:21 -05:00
Andrew Seigner f9c956b91e
Add changes for `edge-19.8.4` (#3272)
Depends on #3276

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-16 14:03:45 -07:00
Kevin Leimkuhler c9c41e2e8a
Remove gRPC tap server listener from controller (#3276)
### Summary

As an initial attempt to secure the connection from clients to the gRPC tap
server on the tap Pod, the tap `addr` only listened on localhost.

As @adleong pointed out in #3257, this was not actually secure because the inbound
proxy would establish a connection to localhost anyways.

This change removes the gRPC tap server listener and changes `TapByResource`
requests to interface with the server object directly.

From this, we know that all `TapByResourceRequests` have gone through the tap
APIServer and are thus authorized by RBAC.

### Details

[NewAPIServer](ef90e0184f/controller/tap/apiserver.go (L25-L26)) now takes a [GRPCTapServer](f6362dfa80/controller/tap/server.go (L33-L34)) instead of a `pb.TapClient` so that
`TapByResource` requests can interact directly with the [TapByResource](f6362dfa80/controller/tap/server.go (L49-L50)) method.

`GRPCTapServer.TapByResource` now makes a private [grpcTapServer](ef90e0184f/controller/tap/handlers.go (L373-L374)) that satisfies
the [tap.TapServer](https://godoc.org/github.com/linkerd/linkerd2/controller/gen/controller/tap#TapServer) interface. Because this interface is satisfied, we can interact
with the tap server methods without spawning an additional listener.
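
The pattern described above, calling a server implementation through its interface instead of dialing a listener, can be sketched as follows. All types here are simplified, hypothetical stand-ins: the real `TapByResource` is a streaming gRPC method with a different signature.

```go
package main

import "fmt"

// tapServer is a hypothetical stand-in for a generated gRPC server interface.
type tapServer interface {
	TapByResource(resource string) (string, error)
}

// grpcTapServer holds the tap logic. Illustrative only.
type grpcTapServer struct{}

func (grpcTapServer) TapByResource(resource string) (string, error) {
	if resource == "" {
		return "", fmt.Errorf("empty resource")
	}
	return "tapping " + resource, nil
}

// apiServer holds the server value itself rather than a network client, so
// requests reach the tap logic through a plain method call with no listener.
type apiServer struct {
	tap tapServer
}

func (a apiServer) handle(resource string) (string, error) {
	// Authorization would happen here before delegating (omitted in this sketch).
	return a.tap.TapByResource(resource)
}

func main() {
	s := apiServer{tap: grpcTapServer{}}
	out, err := s.handle("deploy/web")
	fmt.Println(out, err)
}
```

Because nothing listens on a socket, the only way to reach the tap logic in this sketch is through `handle`, which is where the authorization check would sit.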

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-08-16 16:38:50 -04:00
Alejandro Pedraza 6567206b53
Update CNI integration tests (#3273)
Followup to #3066

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-15 20:19:29 -05:00
Alejandro Pedraza 7aaff9e0f4
Document Helm dev workflow (#3269)
* Document Helm dev workflow

Fixes #3199

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-15 17:57:09 -05:00
cpretzer 4e92064f3b
Add a flag to install-cni command to configure iptables wait flag (#3066)
Signed-off-by: Charles Pretzer <charles@buoyant.io>
2019-08-15 12:58:18 -07:00
Andrew Seigner a213343978
Add changes for `edge-19.8.3` (#3265)
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-15 10:38:14 -07:00
Alejandro Pedraza fd5fc07db1
Fix integration test (#3266)
Followup to #3194

The namespace was too long for l5d-bot:

```
    inject_test.go:117: failed to create
    l5d-integration-auto-git-9688d9ba-inject-namespace-override-test
    namespace: Namespace
    "l5d-integration-auto-git-9688d9ba-inject-namespace-override-test"
    is invalid: metadata.name: Invalid value:
    "l5d-integration-auto-git-9688d9ba-inject-namespace-override-test":
    must be no more than 63 characters
```
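
A length check like the one below would have caught this before the namespace was created. It is a sketch only: `checkNamespaceLen` is a hypothetical helper, it enforces just the 63-character limit quoted in the error, and real Kubernetes validation also restricts the allowed characters.

```go
package main

import "fmt"

// maxNamespaceLen is the 63-character limit quoted in the error above.
const maxNamespaceLen = 63

// checkNamespaceLen returns an error when name exceeds the limit.
// It checks length only, not the character set.
func checkNamespaceLen(name string) error {
	if len(name) > maxNamespaceLen {
		return fmt.Errorf("%q is %d characters, must be no more than %d",
			name, len(name), maxNamespaceLen)
	}
	return nil
}

func main() {
	// The namespace from the failing test run.
	name := "l5d-integration-auto-git-9688d9ba-inject-namespace-override-test"
	fmt.Println(checkNamespaceLen(name))
}
```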

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-15 09:13:36 -05:00
Tarun Pothulapati 242566ac7c Check for Namespace level config override annotations (#3194)
* Check for Namespace level config override annotations
* Add unit tests for namespace level config overrides
* add integration test for namespace level config override
* use different namespace for override tests
* check resource requests for integration tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-08-14 21:01:44 -07:00
Alejandro Pedraza 879650cef9
Wait for `helm delete` to finish in integration test (#3259)
* Wait for `helm delete` to finish in integration test

Followup to #3251

In `helm_cleanup`, block until the linkerd namespace has been deleted.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-14 19:15:34 -05:00
Ivan Sim e52afc1197
Update the Helm build script (#3248)
* Update Helm build script to pin the Helm CLI version
* Update Linkerd version in the Helm values file

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-08-14 16:04:56 -07:00
Carol A. Scott 9c62b65c6a
Adding trafficsplit test to stat_summary_test.go (#3252)
This PR adds a test for trafficsplits to stat_summary_test.go.

Because the test requires a consistent order for returned rows, trafficsplit
rows in stat_summary.go are now sorted by apex + leaf name before being
returned.
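
A minimal sketch of such a deterministic ordering, assuming "apex + leaf name" means sorting by apex first and then by leaf. `tsRow` is a hypothetical, pared-down stand-in for the real row type.

```go
package main

import (
	"fmt"
	"sort"
)

// tsRow is a hypothetical stand-in for a trafficsplit stat row.
type tsRow struct {
	Apex string
	Leaf string
}

// sortRows orders rows by apex name, then leaf name, so callers and tests
// always see the same order.
func sortRows(rows []tsRow) {
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Apex != rows[j].Apex {
			return rows[i].Apex < rows[j].Apex
		}
		return rows[i].Leaf < rows[j].Leaf
	})
}

func main() {
	rows := []tsRow{
		{Apex: "web", Leaf: "web-v2"},
		{Apex: "api", Leaf: "api-v1"},
		{Apex: "web", Leaf: "web-v1"},
	}
	sortRows(rows)
	fmt.Println(rows)
}
```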
2019-08-14 14:48:46 -07:00
Kevin Leimkuhler cc3c53fa73
Remove tap from public API and associated test infrastructure (#3240)
### Summary

After the addition of the tap APIServer, all the logic related to tap in the public API no longer needs to be there. The servers and clients that are created but not used, as well as all the old testing infrastructure related to tap, can be removed.

This deprecates TapByResource and therefore required an update to the protobuf files with `bin/protoc-go.sh`. While the change to deprecate this method was extremely small, a lot of protobuf files were updated in the process. These changes to the code and protobuf files should probably remain coupled since `TapByResource` is officially deprecated in the public API, but a majority of the additions/deletions are related to those files.

This draft passes `go test` as well as a local run of the integration tests.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-08-14 17:27:37 -04:00
Oliver Gould 8ef4104c95
proxy: Update proxy to 6910d717 (#3254)
* logging: format log records consistently (linkerd/linkerd2-proxy#310)
2019-08-14 13:34:15 -07:00
Andrew Seigner 3b55e2e87d
Add container cpu and mem to heartbeat requests (#3238)
PR #3217 re-introduced container metrics collection to
linkerd-prometheus. This enabled linkerd-heartbeat to collect mem and
cpu metrics at the container-level.

Add container cpu and mem metrics to heartbeat requests. For each of
(destination, prometheus, linkerd-proxy), collect maximum memory and p95
cpu.

Concretely, this introduces 7 new query params to heartbeat requests:
- p99-handle-us
- max-mem-linkerd-proxy
- max-mem-destination
- max-mem-prometheus
- p95-cpu-linkerd-proxy
- p95-cpu-destination
- p95-cpu-prometheus
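
For illustration, the seven parameters above could be encoded into a query string with Go's `net/url` as sketched below. The `buildQuery` helper and the placeholder values are assumptions; how the heartbeat actually computes and sends these values is not shown in this commit message.

```go
package main

import (
	"fmt"
	"net/url"
)

// heartbeatParams lists the seven query parameter names from the list above.
var heartbeatParams = []string{
	"p99-handle-us",
	"max-mem-linkerd-proxy",
	"max-mem-destination",
	"max-mem-prometheus",
	"p95-cpu-linkerd-proxy",
	"p95-cpu-destination",
	"p95-cpu-prometheus",
}

// buildQuery encodes the known parameters present in values and ignores any
// other keys. The values themselves are placeholders supplied by the caller.
func buildQuery(values map[string]string) string {
	q := url.Values{}
	for _, name := range heartbeatParams {
		if v, ok := values[name]; ok {
			q.Set(name, v)
		}
	}
	return q.Encode() // Encode sorts parameters by key
}

func main() {
	// Placeholder numbers, not real measurements.
	fmt.Println(buildQuery(map[string]string{
		"p99-handle-us":      "50000",
		"max-mem-prometheus": "123456789",
	}))
}
```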

Part of #2961

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-14 12:04:08 -07:00
Andrew Seigner 6c0ee2475b
Cleanup helm before full test-cleanup (#3251)
PR #3247 introduced additional helm cleanup in `bin/test-cleanup`.
During the integration tests, `bin/test-cleanup` is called prior to
`helm_cleanup` in `bin/test-run`. This causes `helm_cleanup` to fail, as
resources have already been deleted by `bin/test-cleanup`, and the
integration tests fail with `FAIL: error cleaning up Helm`.

Modify the integration tests to first call `helm_cleanup` prior to
calling `bin/test-cleanup`.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-14 10:47:17 -07:00
Carol A. Scott 00437709eb
Add trafficsplit metrics to CLI (#3176)
This PR adds `trafficsplit` as a supported resource for the `linkerd stat` command. Users can type `linkerd stat ts` to see the apex and leaf services of their trafficsplits, as well as metrics for those leaf services.
2019-08-14 10:30:57 -07:00
Andrew Seigner 9826cbdfe0
Label and cleanup helm after integration tests (#3247)
When helm integration tests fail, `bin/test-run` exits prior to calling
`helm_cleanup`, leaving behind a helm namespace and clusterrolebinding.

Update `bin/test-cleanup` to delete any remaining helm resources.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-14 09:30:48 -07:00
Alena Varkockova 12966c2b6e Remove unused code (#3250)
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
2019-08-14 09:29:02 -05:00
Alejandro Pedraza 4e65ed1e6a Update integration test with new Helm values struct (#3246)
Followup to #3229

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-13 19:00:55 -07:00