Commit Graph

1915 Commits

Author SHA1 Message Date
Alejandro Pedraza 578a2d1960
CI: Adjustments to the release job (#4129)
Extracted the logic to pull the latest release notes, out of
`bin/create-release-tag` into `bin/_release.sh` so that it can be reused
in the `release.yml` workflow, which needs to use that inside
`gh_release` when creating the github release in order to have prettier
markup release notes instead of a plaintext message pulled out of the tag
message.
The new extracted function also receives an optional argument with the
name of the file to put the release notes into, because the `body_path`
parameter in `softprops/action-gh-release` doesn't work with dynamic
vars.

Finally, the `website_publish` job now launches only after `gh_release`
has succeeded.
2020-03-05 09:03:30 -05:00
Zahari Dichev 72fc94b03c
Service mirroring tests (#4115)
Unit tests that exercise most of the code in cluster_watcher.go. Essentially the whole cluster mirroring machinery can be thought of as a function that takes remote cluster state, local cluster state, and modification events, and as a result either modifies local cluster state or issues new events onto the queue. This is what these tests are trying to model. I think this covers a lot of the logic there. Any suggestions for other edge cases are welcome.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-03-04 20:17:21 +02:00
Andrew Seigner 2d17d6253d
Fix namespace-scoped Grafana links (#4119)
The `/namespaces` page in the web dashboard was rendering broken Grafana
links, containing an extra `var-namespace=` param, for example:
```
/grafana/dashboard/db/linkerd-namespace?var-namespace=&var-namespace=emojivoto
```

Root cause was the `GrafanaLink` component taking both `resource` and
`namespace` properties, but not special-casing when
`resource === 'namespace' && namespace === ''`.

Modify the `GrafanaLink` component to omit the `var-namespace` param
when a `namespace` property is not provided.
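
The rule can be sketched in shell for illustration. This is not the `GrafanaLink` component itself (which is JavaScript); `grafana_link` and its argument order are hypothetical, and the URL layout is assumed from the example link above.

```shell
#!/bin/sh
# Hypothetical sketch of the link-building rule; not the real component.
# Usage: grafana_link <resource> <name> [namespace]
grafana_link() {
  resource=$1
  name=$2
  namespace=${3:-}
  query=""
  # Only emit var-namespace when a namespace was actually provided.
  if [ -n "$namespace" ]; then
    query="var-namespace=$namespace&"
  fi
  printf '/grafana/dashboard/db/linkerd-%s?%svar-%s=%s\n' \
    "$resource" "$query" "$resource" "$name"
}

grafana_link namespace emojivoto
```

With no namespace argument, the link carries a single `var-namespace=emojivoto` param instead of the empty duplicate.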

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2020-03-02 13:20:40 -08:00
Andrew Seigner a37316a336
Introduce `bin/shellcheck`, add to ci (#4118)
PR #4117 was root-caused with the help of `shellcheck`.

This change introduces a `bin/shellcheck` script, and adds it to CI. In
CI, many checks are disabled to allow it to pass. This will at least
prevent introduction of new classes of shell issue, and should motivate
re-enabling more checks over time.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2020-03-02 13:18:08 -08:00
Andrew Seigner b52dc35587
Fix `bin/fetch-proxy` on Linux (#4117)
`bin/fetch-proxy` was failing on Linux:

```bash
$ bin/fetch-proxy
linkerd2-proxy-v2.87.0/
linkerd2-proxy-v2.87.0/LICENSE
linkerd2-proxy-v2.87.0/bin/
linkerd2-proxy-v2.87.0/bin/linkerd2-proxy
bin/fetch-proxy: 31: [: Linux: unexpected operator
/home/siggy/code/linkerd2/target/proxy/linkerd2-proxy-v2.87.0
```

Also in CI:
https://github.com/linkerd/linkerd2/runs/473746447?check_suite_focus=true#step:5:32

Unfortunately `bin/fetch-proxy` still returned a zero exit status, because
`set -e` does not apply to commands that are part of `if` statements.
From https://ss64.com/bash/set.html:
```
-e  Exit immediately if a simple command exits with a non-zero status, unless
    the command that fails is part of an until or  while loop, part of an
    if statement, part of a && or || list, or if the command's return status
    is being inverted using !.  -o errexit
```
Fortunately when the `if` command failed, it fell through to the `else` clause
for Linux, and copied `linkerd-proxy` successfully.
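
The `set -e` behavior described above can be reproduced in isolation. This is a minimal sketch, not code from `bin/fetch-proxy`: `demo_errexit` is a hypothetical name, and `false` stands in for the failing `[` test.

```shell
#!/bin/sh
# Hypothetical demo, not taken from bin/fetch-proxy.
demo_errexit() {
  (
    set -e
    # `false` stands in for the failing `[` test. Because it is the
    # condition of an `if`, errexit does not abort the subshell.
    if false; then
      echo "then branch"
    else
      echo "else branch"
    fi
    echo "still running"
  )
}

demo_errexit
```

The subshell falls through to the `else` branch and keeps going, which is why the script still exited with status zero.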

Root cause was a `==` instead of `=`. `shellcheck` confirms, and also
recommends quoting:

```bash
$ shellcheck bin/fetch-proxy

In bin/fetch-proxy line 31:
if [ $(uname) == "Darwin" ]; then
     ^-- SC2046: Quote this to prevent word splitting.
              ^-- SC2039: In POSIX sh, == in place of = is undefined.
```

Apply `shellcheck` recommendations.
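
A sketch of what the corrected comparison could look like, assuming the check only needs to tell macOS apart from everything else. `is_darwin` is a hypothetical helper, not the actual contents of `bin/fetch-proxy`.

```shell
#!/bin/sh
# Hypothetical helper; the real script's structure may differ.
# The optional argument exists only so the check can be exercised on any OS.
is_darwin() {
  os=${1:-$(uname)}
  # POSIX `=` instead of `==` (SC2039), with the expansion quoted (SC2046).
  [ "$os" = "Darwin" ]
}

if is_darwin; then
  echo "running on macOS"
else
  echo "not running on macOS"
fi
```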

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2020-03-02 12:33:20 -08:00
Alejandro Pedraza a65f76ed22
Use SHAs instead of tags when referring to GH Actions libs (#4114)
When adding an action we can quickly vet it and pin it to a SHA. If we
use a tag instead, the 3rd party can change the code and retag it without us noticing.
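
One way to spot unpinned references is to check that the part after `@` is a full 40-character commit SHA. This is only a sketch: `is_pinned_to_sha` and the `example/action` name are hypothetical, and nothing like it is claimed to exist in this repo.

```shell
#!/bin/sh
# Hypothetical check, not part of the repo's CI.
# Succeeds when the reference after "@" is a 40-character lowercase hex SHA.
is_pinned_to_sha() {
  ref=${1##*@}
  printf '%s\n' "$ref" | grep -Eq '^[0-9a-f]{40}$'
}

if is_pinned_to_sha "example/action@v1"; then
  echo "pinned"
else
  echo "not pinned: example/action@v1"
fi
```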
2020-03-02 15:03:24 -05:00
Zahari Dichev edd7fd203d
Service Mirroring Component (#4028)
This PR introduces a service mirroring component that is responsible for watching remote clusters and mirroring their services locally.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-03-02 21:16:08 +02:00
Alex Leong 71d6a00faa
Include SMI metrics as part of Linkerd install (#4109)
Adds the SMI metrics API to the Linkerd install flow.  This installs the SMI metrics controller deployment, the SMI metrics ApiService object, and supporting RBAC, and config resources.

This is the first step toward having Linkerd consume the SMI metrics API in the CLI and web dashboard.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-03-02 10:11:16 -08:00
arminbuerkle 65eae40b6a
Remove envoy, contour restrictions (#4092)
* Remove envoy, contour restrictions

Signed-off-by: Armin Buerkle <armin.buerkle@alfatraining.de>
2020-03-02 09:18:51 -05:00
jpresky 67aa71b89b
Update ADOPTERS.md (#4121)
dco
Signed-off-by: Jacob Presky <jacob.presky@commonbond.co>
2020-02-29 11:23:50 -08:00
Christy Jacob 8111e54606
Check for extension server certificate (#4062)
* Check Extension api server Authentication
* Added Checks and tests for extension api-server authentication
* Fixed Failing Static Checks
* Updated the golden file

Signed-off-by: Christy Jacob <christyjacob4@gmail.com>
2020-02-28 13:39:02 -08:00
Kevin Leimkuhler 44f1078498
Fix `fetch-proxy` script on macos (#4112)
`sha256sum` is not installed by default. Use `openssl dgst -sha256` instead.
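
A sketch of a portable digest helper along these lines. `sha256_hex` is a hypothetical name and the `sha256sum` fallback is my own addition, so this is not the exact change made to the script.

```shell
#!/bin/sh
# Hypothetical helper, not the actual change made to bin/fetch-proxy.
# Reads data on stdin and prints its SHA-256 digest as lowercase hex.
sha256_hex() {
  if command -v openssl >/dev/null 2>&1; then
    # The prefix of the output varies between versions, but the
    # digest is always the last field.
    openssl dgst -sha256 | awk '{print $NF}'
  else
    # Fallback for hosts without openssl; output is "<hex>  -".
    sha256sum | awk '{print $1}'
  fi
}

printf 'abc' | sha256_hex
```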
2020-02-27 17:03:02 -08:00
Kevin Leimkuhler 42349d6280
Add changes for edge-20.2.3 (#4113)
## edge-20.2.3

This release introduces the first optional add-on `tracing`, added through the
new add-on model!

The existing optional `tracing` components Jaeger and OpenCensus can now be
installed as add-on components.

There will be more information to come about the new add-on model, but please
refer to the details of [#3955](https://github.com/linkerd/linkerd2/pull/3955) for how to get started.

* CLI
  * Added the `linkerd diagnostics` command to get metrics only from the
    control plane, excluding metrics from the data plane proxies (thanks
    @srv-twry!)
  * Added the `linkerd install --prometheus-image` option for installing a
    custom Prometheus image (thanks @christyjacob4!)
  * Fixed an issue with `linkerd upgrade` where changes to the `Namespace`
    object were ignored (thanks @supra08!)
* Controller
  * Added the `tracing` add-on which installs Jaeger and OpenCensus as add-on
    components (thanks @Pothulapati!!)
* Proxy
  * Increased the inbound router's default capacity from 100 to 10k to
    accommodate environments that have a high cardinality of virtual hosts
    served by a single pod
* Web UI
  * Fixed styling in the CallToAction banner (thanks @aliariff!)
2020-02-27 13:29:40 -08:00
Kevin Leimkuhler e37cb3b932
Add success message for tag script (#4111)
This adds a message after running the `create-release-script` that I intended to
add as part of the initial PR. Example output:

```
❯ bin/create-release-tag $TAG tag created and signed.

tag: edge-93.1.1

To push tag, run:
    git push origin edge-93.1.1
```
2020-02-27 10:03:41 -08:00
Oliver Gould 1c127c4902
proxy: v2.87.0 (#4110)
This release comprises many internal changes that are not expected to
have any user-facing impact.

There is one user-facing change: the inbound router's default capacity
has been increased from 100 to 10K to accommodate environments that have
a high cardinality of virtual hosts served by a single pod.

---

* fallback: Operate on Services instead of Layers (linkerd/linkerd2-proxy#432)
* internal: Extract a service-profile crate (linkerd/linkerd2-proxy#433)
* Increase inbound router capacity default to 10000 (linkerd/linkerd2-proxy#434)
* Upgrade to Rust 1.41 (linkerd/linkerd2-proxy#437)
* cleanup: Remove various cruft (linkerd/linkerd2-proxy#438)
* Generalize router::Make as stack::NewService (linkerd/linkerd2-proxy#435)
* integration: Make the test controller more realistic (linkerd/linkerd2-proxy#436)
* trace-context: Remove unnecessary MakeService (linkerd/linkerd2-proxy#439)
* Split the `stack-tracing` crate from `app-core` (linkerd/linkerd2-proxy#440)
* stack: Introduce the Proxy trait (linkerd/linkerd2-proxy#441)
* timeout: Do not synthesize HTTP response (linkerd/linkerd2-proxy#442)
* addr: Avoid trailing dots in authorities (linkerd/linkerd2-proxy#446)
* outbound: Relax type constraints in require_identity_on_endpoint (linkerd/linkerd2-proxy#447)
* Cleanup transport::Connect & http::Client types (linkerd/linkerd2-proxy#443)
* app: Use locks with controller clients (linkerd/linkerd2-proxy#448)
2020-02-27 07:26:26 -08:00
Alejandro Pedraza fa4db2d7a9
Fixed flaky integration test for ExternalIssuer (#4108)
Fixes #4105

On my local machine, `linkerd stat` was not returning traffic until
the 17th try or so, which explains why the 20s timeout was a bit too
close to the limit and this test was failing sometimes. So I increased
the timeout to 40s and I'm also adding stderr to the error message.
2020-02-27 10:10:19 -05:00
Ali Ariff 3505c913d8
Add spacing in CallToAction banner (#4095)
* Add spacing in CallToAction banner

The call to action banner in the control plane page is missing some spacing.

The CSS is defined but not yet used. So the solution is to add the class name to the corresponding banner.

After it's merged the banner will have more space.

Fixes #3690

* Remove unused css

Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
2020-02-26 19:53:26 -08:00
Alejandro Pedraza 43e64e4818
Added last release job to verify version was published in the website (#4106)
Added the `website_publish_check` job, triggered upon edge/stable tag pushes,
that verifies that the install script at the website has indeed been
updated with the corresponding edge/stable version. It performs the
check every 5 seconds, 10 times, before giving up and failing the build
run. Tested fine in my fork 👍
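
The polling described above boils down to a bounded retry loop. A minimal sketch, assuming a generic `retry` helper (a hypothetical name); the actual job may be written differently.

```shell
#!/bin/sh
# Hypothetical helper, not the actual workflow step.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are used up.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

With a hypothetical `version_is_published` check, the schedule from this PR would be `retry 10 5 version_is_published "$TAG"`.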
2020-02-26 15:03:45 -05:00
Tarun Pothulapati 948dc22a34
Tracing Add-on For Linkerd (#3955)
* Moves Common templates needed to partials

As add-ons re-use the partials helm chart, all the templates needed by multiple charts should be present in partials
This commit also updates the helm tests
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add tracing add-on helm chart

Tracing sub-chart includes open-census and jaeger components as a sub-chart which can be enabled as needed

* Updated Install path to also install add-ons

This includes new interface for add-ons to implement, with example tracing implementation

* Updates Linkerd install path to also install add-ons

Changes include:
 - Adds an optional Linkerd Values configmap which stores add-on configuration when add-ons are present.
 - Updates Linkerd install path to check for add-ons and render their sub-charts.
 - Adds an install Option called config, which is used to pass configuration for add-ons.
 - Uses a fork of mergo, to over-write default Values with the Values struct generated from config.

* Updates the upgrade path about add-ons.

Upgrade path now checks for the linkerd-values cm, and overwrites the default values with it, if present.
It then checks the config option, for any further overwrites

* Refactor linkerd-values and re-update tests
also adds relevant nil checks
* Refactor code to fix linting issues
* Fixes an error with linkerd-config global values

Also refactors the linkerd-values cm to work the same with helm

* Fix a nil pointer issue for tests
* Updated Tracing add-on chart meta-data
Also introduced a defaultGetFiles method for add-ons

* Add add-on/charts to gitignore
* refactor gitignore for chart deps
* Moves sub-charts to /charts directly
* Refactor linkerd values cm
* Add comment in linkerd-values
* remove extra controlplanetracing flag
* Support Stages deployment for add-ons along with tests
* linting fix
* update tracing rbac
* Removes the need for add-on Interface
- Uses helm loading capabilities to get info about add-ons
- Uses reflection to not have to unnecessarily add checks for each add-on type

* disable tracing flag
* Remove dep on forked mergo
- Re-use merge from helm

* Re-use helm's merge
* Override the chartDir path during tests
* add error check
* Updated the dependency iteration code

Currently, the charts directory will not have the deps in the repo. So, the code is updated to read the dependencies from requirements.yaml
and use that info to read templates from the relevant add-ons directory.

* Hard Code add-ons name
* Remove struct details for add-ons

- As we don't use fields of an add-on struct, we don't need them to be typed. Instead we can just use the `enabled` flag using reflection
- Users can just use map[string]interface{} as the add-on type.

* update unit tests
* linting fix
* Rename flag to addon-config
* Use Chart loading logic
- This code uses chart loading to read the files and keep in a vfs.
- Once we have those files read we will then use them for generation of sub-charts.

* Go fmt fix
* Update the linkerd-values cm to use second level field
* Add relevant unit tests for mergeRaw
* linting fix
* Move addon tests to a new file
* Fix golden files
* remove addon install unit test
* Refactor sub-chart load logic
* Add install tracing unit test
* golden file update for tracing install
* Update golden files to reflect another pr changes
* Move addon-config flag to recordFlagSet
* add relevant tracing enabled checks
* linting fix

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-02-26 10:15:04 -08:00
Kevin Leimkuhler ae880f0e33
Create linkerd/website dispatch event on release (#4100)
## Motivation

A goal of the release automation project is to automate the website publish
that publishes a new install script that uses the new release version.

linkerd/website#668 removed the hard coded versions from the repo and moved
the version update into the `make publish` command.

That workflow now needs to be triggered by a release in `linkerd2`.

## Solution

Once `kind_integration_tests` and `cloud_integration_test` pass, a job runs
using the [repository-dispatch](https://github.com/marketplace/actions/repository-dispatch) action to create a repository dispatch event in
`linkerd/website`.

This dispatch event will (current PR: linkerd/website#670) trigger the
publish workflow.

## Testing

Tested in my fork [here](https://github.com/kleimkuhler/linkerd2/actions/runs/45165789)

## Additional steps needed

A new `RELEASE_TOKEN` secret needs to be added to this repo. It should be the
personal access token of [l5d-bot](https://github.com/l5d-bot) with `repo` access.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-02-26 09:42:23 -08:00
Alejandro Pedraza c355ae8ff0
New CI job to automatically generate the Github release (#4094)
* New CI job to automatically generate the Github release

Fixes #4083

New `gh_release` job in the `release.yml` for creating the release in
Github and uploading the CLI binaries for each platform, along with
their checksum files.

This job only gets triggered upon successful docker images building and
pushing, and kind and cloud integration tests passing.

The Helm chart deploying job gets now triggered upon the success of this
new job.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2020-02-26 09:05:55 -05:00
Supratik Das d9956f3b35
Update control-plane-namespace label (#4061)
* Update control-plane-namespace label

Upgrade command ignores changes to the namespace object

Add linkerd.io/control-plane-ns=linkerd label to the control-plane namespace

Fixes #3958

* Add controlPlaneNamespace label to namespace.yaml
* Modify tests for updated controlPlaneNamespace label
* Fix faulty values.yaml value
* Localize reference for controlPlaneNamespace label in kubernetes_helper.go

Signed-off-by: Supratik Das <rick.das08@gmail.com>
2020-02-24 12:57:28 -08:00
Alejandro Pedraza 2ad141d27a
Exclude changes on markup files to trigger CI runs in master (#4084)
Fixes #4082

This tested fine in a fork, under various scenarios combining:
- Modify markup file in root dir
- Modify markup file in subdir
- Modify non-markup file
- In master
- In a PR
2020-02-24 13:50:35 -05:00
Christy Jacob f9b940e89d
Support for custom prometheus registry (#4041)
* feat: added prometheus Registry Option for install command
* chore: draft commit
* Draft for custom prometheus image
* Support for custom prometheus image

This PR adds support to override the default prometheus image name and use custom image names in private repositories

* Added default Prometheus Image from values.yaml

The default can be overridden by the argument given in installOptions

* chore: fixed failing check
* Fixed failing check
* Updated the tests as per the new flag
* Air-gapped installation for prometheus-image
* Air Gapped installation for Prometheus Image
* Added regex for prometheus repository/image cli option

Signed-off-by: Christy Jacob <christyjacob4@gmail.com>
2020-02-24 09:59:29 -08:00
Saurav Tiwary 1c19e314b7
Linkerd CLI command to get control plane diagnostics (#4050)
* CLI command to fetch control plane metrics
Fixes #3116
* Add GetResonse method to return http GET response
* Implemented timeouts using waitgroups
* Refactor metrics command by extracting common code to metrics_diagnostics_util
* Refactor diagnostics to remove code duplication
* Update portforward_test for NewContainerMetricsForward function
* Lint code
* Incorporate Alex's suggestions
* Lint code
* fix minor errors
* Add unit test for getAllContainersWithPort
* Update metrics and diagnostics to store results in a buffer and print once
* Incorporate Ivan's suggestions
* consistent error handling inside diagnostics
* add coloring for the output
* spawn goroutines for each pod instead of each container
* switch back to unbuffered channel
* remove coloring in the output
* Add a long description of the command

Signed-off-by: Saurav Tiwary <srv.twry@gmail.com>
2020-02-24 09:09:54 -08:00
Kevin Leimkuhler ab4a13ab52
Add minimal release workflow (#4090)
## Motivation

A release workflow will be the only triggered workflow on `push.tags` events.

As a first step in automating the release process, it should assert that
integration tests pass once the docker images have been tagged.

Both KinD and cloud integration tests should run since they have different
sets of integration tests that they are responsible for running.

It then needs to run the `chart_deploy` job.

## Testing

This has been fully tested with a release tag push on my fork. The run can be
found [here](https://github.com/kleimkuhler/linkerd2/actions/runs/42664128)

It properly failed on `chart_deploy` because I did not want to push a test tag
helm chart.

## Solution

This workflow will:

- Build the docker images on the Packet Host
- Tag the docker images with the release tag
- Run KinD integration tests
- Run cloud integration tests
- Run `chart_deploy`

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-02-22 17:13:48 -08:00
Kevin Leimkuhler 4aac6445c4
Add script to create release tag (#4091)
## Motivation

Creating a release tag is a manual process that is prone to error by the
release responsible member.

Additionally, the automated release project will require that a message is
included that is a copy of the recent `CHANGES.md` changes.

These steps can be scripted so that the member will just need to run a release
script.

## Solution

A `bin/create-release-tag` script will:
- Take a `$TAG` argument (maybe can remove this in the future) to use as the
  tag name
- Pull out the top section of `CHANGES.md` to use as the commit message
- Create a tag with `$TAG` name and release changes as the message
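
Pulling out the top section of `CHANGES.md` can be sketched with `awk`. This assumes each release section starts with a `## ` heading, as in the edge-20.2.3 notes; `extract_latest_notes` is a hypothetical name and not necessarily how the script does it.

```shell
#!/bin/sh
# Hypothetical helper, not the actual bin/create-release-tag logic.
# Prints the first "## " section of the given file, stopping before
# the second "## " heading.
extract_latest_notes() {
  awk '/^## /{n++} n==1{print} n>1{exit}' "$1"
}

if [ -f CHANGES.md ]; then
  extract_latest_notes CHANGES.md
fi
```

The output could then be passed as the tag message, for example through `git tag -F`.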

## Example

```
$ TAG="edge-20.2.3"
$ bin/create-release-tag $TAG
$ git push origin $TAG
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-02-22 16:30:17 -08:00
Alejandro Pedraza ea523a46b0
Fixed shellcheck warnings on bin/helm-build (#4080)
Followup to #4058

```
$ shellcheck -x bin/helm-build; echo $?
0
```
2020-02-21 09:51:21 -05:00
Alejandro Pedraza 9b64f0dc94
Reuse bin/helm-build in Helm integration tests (#4088)
Have the preliminary setup for the Helm integration tests use
`bin/helm-build` instead of directly calling `helm dependency update`.
This allows testing `bin/helm-build` itself, and also lints the linkerd2
and linkerd2-cni charts (the latter lint call is being added as well in this
PR).
2020-02-21 09:26:10 -05:00
Alex Leong c891b22632
Remove extraneous return (#4089)
Remove extraneous `return` which was missed in https://github.com/linkerd/linkerd2/pull/4007

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-02-20 16:12:49 -08:00
Alejandro Pedraza 8c12f03af8
Update gcloud configs for the chart_deploy job - Part 2 (#4087)
2020-02-20 18:10:40 -05:00
Alejandro Pedraza 262fe578b6
Update gcloud configs for the chart_deploy job (#4086)
2020-02-20 17:00:52 -05:00
Alejandro Pedraza 395ca102d5
Edge-20.2.2 release notes (#4077)
* Edge-20.2.2 release notes
2020-02-20 14:21:30 -05:00
Supratik Das 42efc1da01
Improve kubectl apply format by removing misplaced message (#4053)
* Improve kubectl apply format by removing misplaced message

Fixes #2956

Also separate stderr messages with a new line

Signed-off-by: Supratik Das <rick.das08@gmail.com>
2020-02-20 10:36:36 -05:00
Mayank Shah 7cff974a79
cli: handle panic caused by `linkerd metrics` port-forward failure (#4007)
* cli: handle `linkerd metrics` port-forward gracefully

- add return for routine in func `Init()` in case of error
- add return from func `getMetrics()` if error from `portforward.Init()`

* Remove select block at pkg/k8s/portforward.go

- It is now the caller's responsibility to call pf.Stop()

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2020-02-19 21:44:37 -08:00
Alejandro Pedraza df2011dbb2
CI: Upgrades the gcloud action to fix GKE clusters teardown issue (#4074)
Ref linkerd/linkerd2-action-gcloud#1
2020-02-19 17:24:59 -05:00
Oliver Gould dc451208d4
proxy: v2.86.0 (#4075)
This release includes the results from continued profiling & performance
analysis. In addition to modifying internals to prevent unwarranted
memory growth, we've introduced new metrics to aid in debugging and
diagnostics: a new `request_errors_total` metric exposes the number of
requests that receive synthesized responses due to proxy errors; and a
suite of `stack_*` metrics expose proxy internals that can help us
identify unexpected behavior.

---

* trace: update `tracing-subscriber` dependency to 0.2.1 (linkerd/linkerd2-proxy#426)
* Reimplement the Lock middleware with tokio::sync (linkerd/linkerd2-proxy#427)
* Add the request_errors_total metric (linkerd/linkerd2-proxy#417)
* Expose the number of service instances in the proxy (linkerd/linkerd2-proxy#428)
* concurrency-limit: Share a limit across Services (linkerd/linkerd2-proxy#429)
* profiling: add benchmark and profiling scripts (linkerd/linkerd2-proxy#406)
* http-box: Box HTTP payloads via middleware (linkerd/linkerd2-proxy#430)
* lock: Generalize to protect a guarded value (linkerd/linkerd2-proxy#431)
2020-02-19 14:24:47 -08:00
Sanni Michael aa1f200dde
Allow custom host set from helm values (#4054)
* Allow custom host set from helm values #3961

Fixes #3961

Signed-off-by: Sanni Michael Tomiwa <sannimichaelse@gmail.com>
2020-02-19 09:50:11 -05:00
Kohsheen Tiku dea8b8c547
Support for configuring service profile retries(x-linkerd-retryable) via openapi spec (#4052)
* Retry policy is manually written in yaml file and patched it into the service profile

Added support for configuring service profile retries(x-linkerd-retryable) via openapi spec

Signed-off-by: Kohsheen Tiku <kohsheen.t@gmail.com>
2020-02-18 13:07:07 -08:00
Christian Hüning 67f83cf51a
Switched figo to finleap (#4056)
Due to company mergers we're now finleap connect and not figo anymore. Therefore I'd like to update the URL and company name here.

Signed-off-by: Christian Hüning <christian.huening@finleap.com>
2020-02-18 12:09:45 -08:00
Alejandro Pedraza 77af716ab2
bin/helm-build automatically updates version in values.yaml (#4058)
* bin/helm-build automatically updates version in values.yaml

Have the Helm charts building script (`bin/helm-build`) update the
linkerd version in the `values.yaml` files according to the tagged
version, thus removing the need of doing this manually on every release.

This is akin to the update we do in `version.go` at CLI build time.

Note that `shellcheck` is issuing some warnings about this script, but
that's on code that was already there, so that will be handled in a
follow-up PR.
2020-02-18 11:19:58 -05:00
Mayank Shah 3c3a4a5f5d
cli: Add label selector flag for `stat` (#4040)
* Update `linkerd-namespace` shorthand to `L`
* Add --selector (-l) flag for `stat`

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2020-02-17 13:40:07 -05:00
Kohsheen Tiku 19806e3626
Scroll functionality for linkerd top deploy/linkerd-web (#4011)
* Table obtained from linkerd top is not scrollable.

Added scroll functionality for the table.

Fixes #2558

Signed-off-by: Kohsheen Tiku <kohsheen.t@gmail.com>
2020-02-17 11:17:43 -05:00
Zahari Dichev 6fa9407318
Ensure we get the correct type out of Informer Deletion events (#4034)
Ensure we get what we expect when receiving DELETE events from the k8s Informer api

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-02-15 10:15:24 +02:00
Zahari Dichev 3538944d03
Unify trust anchors terminology (#4047)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-02-15 10:12:46 +02:00
Kevin Leimkuhler c31284e6db
Separate single Actions workflow into multiple workflows (#4039)
Depends on #4033

## Motivation

If any job fails in the current GH Actions workflow, a re-run on the same
commit SHA requires re-running *all* jobs--regardless if the job already
passed in the previous run.

This can be problematic when dealing with flakiness in the integration tests.

If a test fails due to flakiness in `cloud_integration_tests`, all the unit
tests, static checks, and `kind_integration_tests` are re-run which leads to
lots of waiting and dealing with the possibility of flakiness in earlier jobs.

With this change, individual workflows can now be re-run without triggering
all other jobs to complete again first.

## Solution

`workflow.yml` is now split into:
- `static_checks.yml`
- `unit_tests.yml`
- `kind_integration.yml`
- `cloud_integration.yml`

### Workflows

`static_checks.yml` performs checks related to dependencies, linting, and
formatting.

`unit_tests.yml` performs the Go and JS unit tests.

`kind_integration.yml` builds the images (on Packet or the GH Action VM) and
runs the integration tests on a KinD cluster. This workflow continues to run
for **all** PRs and pushes to `master` and tags.

`cloud_integration.yml` builds the images only on Packet. This is because
forked repositories do not need to trigger this workflow. It then creates a
unique GKE cluster and runs the integration tests on the cluster.

### The actual flow of work..

A forked repository or non-forked repository opening a PR triggers:
- `static_checks`
- `unit_tests`
- `kind_integration_tests`

These workflows all run in parallel and are individually re-runnable.

A push to `master` or tags triggers:
- `static_checks`
- `unit_tests`
- `kind_integration_tests`
- `cloud_integration_tests`

These workflows also all run in parallel, including the `docker_build` step of
both integration test workflows. This has not conflicted in testing as it
takes place on the same Packet host and just utilizes docker layer caching
well.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-02-13 09:11:30 -08:00
Mayank Shah c1b683147a
Update identity to make certs more diagnosable (#3990)
Update identity controller to make issuer certificates diagnosable if
cert validity is causing error

    - Add expiry time in identity log message
    - Add current time in identity log message
    - Emit k8s event with appropriate message


Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2020-02-13 11:21:41 +02:00
Kevin Leimkuhler a460ada166
Run all PRs on GH Actions VMs (#4033)
Run all PRs on GH Actions VMs

## Motivation

Currently all pushes to master branch, tags, and Linkerd org member PRs run
the `kind_integration_host` job on the same Packet host.

This means that parallel jobs spin up KinD clusters with a unique name and
sandbox the tests so that they do not clash.

This is problematic for a few reasons:
* There is a limit on the number of jobs we can run in parallel due to
  resource constraints.
* Workflow cancellation and re-runs conflict when the cancelled run deletes
  its namespaces and the running one expects them to be present.
* There has been an observed flakiness with running multiple KinD clusters
  resulting in inconsistent timeouts and docker errors.

## Solution

This change moves all KinD integration testing to GH Actions VMs. This is
currently what forked repository workflows do.

There is no longer a `docker_pull` job, as its responsibilities have been moved
into one of the `kind_integration_tests` steps.

The renamed `kind_integration_tests` job is responsible for **all** PR
workflows and has steps specific to forked and non-forked repositories.

### Non-forked repository PRs

The Packet host is still used for building docker images as leveraging docker
layer caching is still valuable--a build can be as fast as 30 seconds compared
to ~12 minutes.

Loading the docker images into the KinD cluster on the GH Action VM is done by
saving the Packet host docker images as image archives, and loading those
directly into the local KinD cluster.
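
A rough sketch of the save-and-load step, assuming the `docker` client is pointed at the Packet host (for example through `DOCKER_HOST`) and `kind` targets the local cluster. `load_image_into_kind`, `run`, and the `DRY_RUN` switch are hypothetical names for illustration, not the workflow's actual steps.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual workflow steps.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: load_image_into_kind <image> <kind-cluster-name>
load_image_into_kind() {
  image=$1
  cluster=$2
  # Derive an archive name such as "proxy_dev.tar" from "repo/proxy:dev".
  archive="$(basename "$image" | tr ':' '_').tar"
  run docker save --output "$archive" "$image"
  run kind load image-archive "$archive" --name "$cluster"
}
```

Running it with `DRY_RUN=1` prints the two commands without needing Docker or KinD installed.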

### Forked repository PRs

`docker_build` has been sped up slightly by sending `docker save` processes to
the background.
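
Backgrounding the saves follows the usual `&` plus `wait` pattern. A minimal sketch: `save_one` is a hypothetical stand-in for a real `docker save` call and only writes a placeholder file.

```shell
#!/bin/sh
# Hypothetical sketch; save_one stands in for a `docker save` invocation.
save_one() {
  printf 'archive for %s\n' "$1" > "$2/$1.tar"
}

# Usage: save_all_in_background <output-dir> <name>...
save_all_in_background() {
  outdir=$1
  shift
  for name in "$@"; do
    # Start each save without waiting for the previous one.
    save_one "$name" "$outdir" &
  done
  # Block until every background job has finished.
  wait
}
```

Note that a bare `wait` discards the exit statuses of the background jobs, so a failed save would have to be detected another way.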

Docker layer caching cannot be leveraged since there are no SSH secrets
available, so the `artifact-upload`/`artifact-download` actions introduced in
#TODO are still used.

### Cleanup

This PR also includes some general cleanup such as:
* Some job names have been renamed to better reflect their purpose or match
  the current naming pattern.
* Environment variables are set earlier in jobs as a separate step if they are
  currently exported multiple times.
* Indentation was really bothering me because it switches back and forth
  throughout the workflow file, so lists are now indented.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-02-12 14:38:05 -08:00
Alex Leong ec51434eb9
Show traffic split metrics from sources in all namespaces (#3967)
Fixes #3562 

When a pod in one namespace sends traffic to a service which is the apex of a traffic split in another namespace, that traffic is not displayed in the `linkerd stat trafficsplit` output.  This is because when we do a Prometheus query for traffic to the traffic split, we supply a Prometheus label selector to only select traffic sources in the namespace of the traffic split.

Since any pod in any namespace can send traffic to the apex service of a traffic split, we must look at all possible sources of traffic, not just the ones in the same namespace.

Before:

```
$ bin/linkerd stat ts
NAME           APEX     LEAF       WEIGHT   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
webapp-split   webapp   webapp       900m         -     -             -             -             -
webapp-split   webapp   webapp-2     100m         -     -             -             -             -
```

After:

```
$ bin/linkerd stat ts
NAME           APEX     LEAF       WEIGHT   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
webapp-split   webapp   webapp       900m    80.00%   1.4rps          31ms          99ms        2530ms
webapp-split   webapp   webapp-2     100m    60.00%   0.2rps          35ms          93ms          99ms
```

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-02-12 09:21:59 -08:00
Alejandro Pedraza 7584f88b69
Integration test flakiness: endpoint version mismatch warning (#4020)
Updated regex for ignoring version mismatch warning events. It was only
applied for '-*upgrade' namespaces.

It is safe to ignore such warnings because the endpoint controller
retries when that happens, and if after many retries it still can't then
a different warning is thrown which is _not_ whitelisted and will make
the build fail.
https://github.com/kubernetes/kubernetes/blob/v1.16.6/pkg/controller/endpoint/endpoints_controller.go#L334-L348

This PR also removes logging matches on expected warnings, to avoid
cluttering the CI log.
2020-02-11 13:58:11 -05:00