Commit Graph

195 Commits

Author SHA1 Message Date
Tarun Pothulapati e91dbda287
Add health checks for grafana add-on (#4321)
* Add health checks for grafana add-on

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update testCheck command and fixes

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix checkContainersRunnning function

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* linting fix

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update test golden files

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* use hc.ControlPlanePods instead of k8s API

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* use hc.controlPLanePods directly

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* remove unnecessary comments

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* proper comments

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update pod checks to use retries

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add values key check

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-05-14 23:18:43 +05:30
Alejandro Pedraza d0d97e9426
Upgrade to Helm v3 (#4373)
Upgraded to Helm v3.2.1 from v2.16.1, getting rid of Tiller and making
other simplifications.

Note that the version placeholder in the `values.yaml` files had to be
changed from `{version}` to `linkerdVersionValue` because the former
confuses Helm v3.
2020-05-14 12:11:47 -05:00
Kevin Leimkuhler dc5ca1a754
Check that ActualSuccess is greater than 0 in ServiceProfiles test (#4384)
#4217 suggests a retries integration test, but this is already tested as part
of the ServiceProfiles test.

In order to fix this issue, an extra check has been added to the assertion of
the `ActualSuccess` value. It now asserts the value is both greater than 0 and
less than 100.
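
The bounds check described above could look roughly like the sketch below. The helper name `checkActualSuccess` and the `float64` type are assumptions for illustration, not taken from the test code.

```go
package main

import "fmt"

// checkActualSuccess is a hypothetical helper mirroring the assertion
// described above: the value must be strictly between 0 and 100.
func checkActualSuccess(actualSuccess float64) error {
	if actualSuccess <= 0 || actualSuccess >= 100 {
		return fmt.Errorf("expected ActualSuccess to be greater than 0 and less than 100, got %.2f", actualSuccess)
	}
	return nil
}

func main() {
	// 0 and 100 are rejected; anything in between passes.
	for _, v := range []float64{0, 42.5, 100} {
		fmt.Printf("%.1f -> %v\n", v, checkActualSuccess(v))
	}
}
```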

Closes #4217

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-05-14 09:56:39 -07:00
Tarun Pothulapati 45ccc24a89
Move grafana templates into a separate sub-chart as an add-on (#4320)
* adds grafana manifests as a sub-chart

- moves grafana templates into its own chart
- implement add-on interface Grafana struct
- also add relevant conditions for grafana

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* remove redundant grafana fields in Values

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update golden files

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix values issue

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* remove extra grafanaImage value

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add add-on upgrade tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix golden file tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add grafana field to linkerd-config-addons

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* Don't apply nil configuration

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update golden files

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* make checks relaxed for grafana

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update test to not test on grafana

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update TestServiceAccountsMatch to contain extra members

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* replace map[string]interface{} with Grafana for better readability

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update golden files

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-05-11 22:22:14 +05:30
Alejandro Pedraza f62a2e6ee4
Refactor integration tests to use annotations functions (#4341)
* Refactor integration tests to use annotations functions

First part of #4176

Replaced all the `t.Error`/`t.Fatal` calls in the integration tests with the
new functions defined in `testutil/annotations.go` as described in #4292,
in order for the errors to produce GitHub annotations.

Most of these calls now have two strings: one containing a generic error
message and another with a more specific message. The former is what
will be aggregated and seen in the CI reports at
[linkerd2-ci-metrics](https://github.com/linkerd/linkerd2-ci-metrics).

Other changes:

- Improved the annotation generator in `annotations.go` so that the
  message includes the name of the test.
- When a failure from `RetryFor` occurs, log the original timeout so
  we can consider incrementing it when the failure is persistent.
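
As a rough illustration of the last point, a retry helper can carry the original timeout into its failure message. The function `retryFor` below is a hypothetical stand-in, not the actual `RetryFor` signature, and it puts the timeout into the returned error rather than a log line.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryFor is a hypothetical stand-in for the RetryFor helper mentioned
// above. It calls fn until it succeeds or the timeout would be exceeded,
// and reports the original timeout in the final error.
func retryFor(timeout, interval time.Duration, fn func() error) error {
	deadline := time.Now().Add(timeout)
	var err error
	for {
		if err = fn(); err == nil {
			return nil
		}
		// Give up if the next attempt would start after the deadline.
		if time.Now().Add(interval).After(deadline) {
			return fmt.Errorf("timed out after %s: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

// demoRetry runs retryFor against a function that always fails and
// returns the resulting error message.
func demoRetry() string {
	err := retryFor(20*time.Millisecond, 5*time.Millisecond, func() error {
		return errors.New("pod not ready")
	})
	return err.Error()
}

func main() {
	fmt.Println(demoRetry())
}
```

Seeing the timeout next to the underlying error makes it easier to judge whether a persistent failure needs a larger value.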
2020-05-08 08:41:42 -05:00
Alejandro Pedraza 1a2eaf29dc
Flaky tests: increase timeout for 'linkerd edges' (#4353)
The `linkerd edges` test was flaky, so this gives it more slack to
succeed.
2020-05-07 18:24:32 -05:00
Alejandro Pedraza 0b7c8f76f9
Flaky tests: increase timeout for 'kubectl wait' (#4354)
Sometimes for no clear reason pods are taking their time to become
available. The `kubectl wait --for=condition=available` command in
`inject_test.go` is failing sporadically because of this.

e.g. in
https://github.com/linkerd/linkerd2/runs/652159504?check_suite_focus=true#step:14:56

I could reproduce this and even though I couldn't see any errors in the logs
or events, I could confirm how long it's taking for the pod to come up:

```
$ k -n l5d-integration-inject-test describe po inject-test-terminus-enabled
...
Events:
  Type    Reason     Age    From                                               Message
  ----    ------     ----   ----                                               -------
  Normal  Scheduled  7m12s  default-scheduler                                  Successfully assigned l5d-integration-inject-test/inject-test-terminus-enabled-96fd5f5dc-5qlpb to gke-alpeb-dev-default-pool-b94ca25c-h84p
  Normal  Pulled     6m55s  kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p  Container image "gcr.io/linkerd-io/proxy-init:v1.3.2" already present on machine
  Normal  Created    6m54s  kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p  Created container linkerd-init
  Normal  Started    6m47s  kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p  Started container linkerd-init
  Normal  Pulled     6m28s  kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p  Container image "buoyantio/bb:v0.0.5" already present on machine
  Normal  Created    6m27s  kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p  Created container bb-terminus
  Normal  Started    6m27s  kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p  Started container bb-terminus
  Normal  Pulled     6m27s  kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p  Container image "gcr.io/linkerd-io/proxy:git-2a95d373" already present on machine
  Normal  Created    6m27s  kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p  Created container linkerd-proxy
  Normal  Started    6m27s  kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p  Started container linkerd-proxy
```

Here the pod took 45s to start!
2020-05-07 18:23:44 -05:00
Alejandro Pedraza 6855bf9480
Flaky tests: Updated ignored error regex for cloud integration test (#4352)
Updated rule in list of ignored k8s warning events to make it more
generic and to account for this failure:
```
error killing pod: failed to "KillPodSandbox" for
"756c8333-1d4d-4f42-bc2d-bd99eb8b4c94" with KillPodSandboxError: "rpc
error: code = Unknown desc = networkPlugin cni failed to teardown pod
\"_\" network: operation Delete is not supported on
WorkloadEndpoint(default/gke--testing--git--2d2fd3f1--default--pool--b9cfce6d--tgcn-cni-bd3ca37ee6fc3a05bafa26ce71faa05279ce08de02462040300786cb7e046b38-eth0)"
```

That happened here:
https://github.com/linkerd/linkerd2/runs/653622248?check_suite_focus=true#step:6:27
2020-05-07 18:22:31 -05:00
Matei David 6b9aaac9d6
Add Kubeconfig context namespace to cli commands' options (#4197) (#4291)
When using cli commands that work on namespaced resources in the cluster, the default namespace used by the cli is hardcoded to the default Kubernetes namespace (i.e. 'default'). This update will allow cli commands that operate on namespaced resources to automatically infer what the name of the default namespace is, by taking the relevant default from the currently used Kubeconfig context. In short, this allows the omission of the -n flag in commands such as linkerd metrics, when working with resources that belong to a namespace that is set as default in the currently active context.
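
A minimal sketch of the resulting lookup order, assuming an explicit `-n` value wins over the context namespace, which in turn wins over `default`. The helper `resolveNamespace` is hypothetical and the precedence is an assumption, not taken from the cli code.

```go
package main

import "fmt"

// resolveNamespace is a hypothetical helper. Assumed precedence:
// explicit -n flag, then the namespace set on the active kubeconfig
// context, then the Kubernetes "default" namespace.
func resolveNamespace(flagValue, contextNamespace string) string {
	if flagValue != "" {
		return flagValue
	}
	if contextNamespace != "" {
		return contextNamespace
	}
	return "default"
}

func main() {
	fmt.Println(resolveNamespace("", "emojivoto"))        // context namespace is used
	fmt.Println(resolveNamespace("linkerd", "emojivoto")) // explicit flag wins
	fmt.Println(resolveNamespace("", ""))                 // falls back to "default"
}
```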

Validation was done manually by setting the default namespace of the currently used context, as well as through two integration tests that target the tap and get command respectively.

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-05-04 16:21:05 -05:00
Alex Leong 40b921508f
Inject LINKERD2_PROXY_DESTINATION_GET_NETWORKS proxy variable (#4300)
Fixes #3807

By setting the LINKERD2_PROXY_DESTINATION_GET_NETWORKS environment variable, we configure the Linkerd proxy to do destination lookups for authorities which are IP addresses in the private network range.  This allows us to get destination metadata including identity for HTTP requests which target an IP address in the cluster, Prometheus metrics scrape requests, for example.
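
To make the "private network range" condition concrete, the sketch below checks whether an authority is an IP literal inside a set of CIDR ranges. The RFC 1918 list and the helper `inNetworks` are assumptions for illustration. The value actually injected into the variable is not stated here, and the proxy itself is not written in Go.

```go
package main

import (
	"fmt"
	"net"
)

// privateNetworks lists the RFC 1918 ranges. Treating these as the
// configured networks is an assumption made for this example.
var privateNetworks = []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"}

// inNetworks reports whether addr is an IP literal that falls inside one
// of the given CIDR ranges. Hostnames and malformed input return false.
func inNetworks(addr string, cidrs []string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	for _, c := range cidrs {
		_, network, err := net.ParseCIDR(c)
		if err != nil {
			continue // skip malformed ranges
		}
		if network.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	for _, a := range []string{"10.1.2.3", "8.8.8.8", "web.default.svc"} {
		fmt.Println(a, inNetworks(a, privateNetworks))
	}
}
```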

This change allowed us to update the "direct edges" test which ensures that the edges command produces correct output for traffic which is addressed directly to a pod IP.

We also re-enabled the "linkerd stat" integration tests which had been disabled while the destination service did not yet support these types of IP queries.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-30 11:22:24 -07:00
Tarun Pothulapati 2b1cbc6fc1
charts: Using downwardAPI to mount labels to the proxy container (#4199)
* use downward API to mount labels to the proxy container as a volume
* add namespace as a label to the pod
* add a trace inject test
* add downwardAPI for controlplaneTracing
* add controlPlaneTracing condition to volumeMounts
* update add-ons to have workload-ns
* add workload-ns label to control-plane components

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-04-22 10:33:51 -05:00
Alex Leong 5d3862c120
Use /live for liveness probe (#4270)
Fixes #3984

We use the new `/live` admin endpoint in the Linkerd proxy for liveness probes instead of the `/metrics` endpoint.  This endpoint returns a much smaller payload.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-17 14:53:32 -07:00
Kevin Leimkuhler 0d235694af
Add `unmeshed` flag to stat command (#4254)
## Motivation

Introduces an `unmeshed` flag to the `stat` command so that users can opt in
to viewing unmeshed resources in the `stat` output.

This changes the existing behavior of the `stat` command such that unmeshed
resources no longer render by default in the output.

Before:

```
❯ bin/linkerd stat -A deploy
NAMESPACE     NAME                     MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN
kube-system   coredns                     0/1         -        -             -             -             -          -
kube-system   local-path-provisioner      0/1         -        -             -             -             -          -
kube-system   metrics-server              0/1         -        -             -             -             -          -
kube-system   traefik                     0/1         -        -             -             -             -          -
linkerd       linkerd-controller          1/1   100.00%   0.3rps           1ms           2ms           2ms          2
linkerd       linkerd-destination         1/1   100.00%   0.3rps           1ms           1ms           1ms         11
...
```

After:

```
❯ bin/linkerd stat -A deploy
NAMESPACE   NAME                     MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN
linkerd     linkerd-controller          1/1   100.00%   0.3rps           1ms           1ms           1ms          2
linkerd     linkerd-destination         1/1   100.00%   0.3rps           1ms           2ms           2ms         13
...
```

Closes #3871

## Solution

Using the meshed pod count in the stat response, resources with a count of `0`
are not rendered in the table.
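
The filtering step can be sketched as follows. The `row` type and the function names are hypothetical and only illustrate skipping rows whose meshed pod count is `0` unless the flag is set.

```go
package main

import "fmt"

// row is a hypothetical, simplified view of one line of stat output.
type row struct {
	Name       string
	MeshedPods int
	TotalPods  int
}

// visibleRows drops rows with no meshed pods unless showUnmeshed is set.
func visibleRows(rows []row, showUnmeshed bool) []row {
	var out []row
	for _, r := range rows {
		if r.MeshedPods == 0 && !showUnmeshed {
			continue
		}
		out = append(out, r)
	}
	return out
}

// sampleRows returns a small data set based on the output shown above.
func sampleRows() []row {
	return []row{
		{Name: "coredns", MeshedPods: 0, TotalPods: 1},
		{Name: "linkerd-controller", MeshedPods: 1, TotalPods: 1},
		{Name: "linkerd-destination", MeshedPods: 1, TotalPods: 1},
	}
}

func main() {
	fmt.Println(len(visibleRows(sampleRows(), false))) // meshed rows only
	fmt.Println(len(visibleRows(sampleRows(), true)))  // all rows
}
```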

The `-l`/`--selector` flag does not work for all resource types, so applying a
default label does not solve this problem. While it works for pods, it does
not work for deployments as the `linkerd.io/inject` is an annotation that
cannot be selected on.

I did not think a shorthand flag was necessary for this. I do not think users
will commonly pass this flag to the `stat` command, and I didn't think adding
an additional short flag such as `u` was necessary.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-04-14 20:24:29 -07:00
Alex Leong 7b9d475ffc
Gate SMI-Metrics behind an install flag (#4240)
This change adds a `--smi-metrics` install flag which controls if the SMI-metrics controller and associated RBAC and APIService resources are installed.  The flag defaults to false and is hidden.

We plan to remove this flag or default it to true if and when the SMI-Metrics integration graduates from experimental.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-09 14:34:08 -07:00
Alejandro Pedraza 322ba5fd2f
`linkerd uninstall` errors when attempting to delete PSP (#4234)
* Bug in `linkerd uninstall` when attempting to delete PSP

We were using the wrong apiVersion for PSP in `linkerd uninstall`'s
output, which prevented that resource from being removed:

```
$ linkerd uninstall | kubectl delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-controller"
deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-destination"
deleted
...
mutatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-proxy-injector-webhook-config" deleted
validatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-sp-validator-webhook-config" deleted
namespace "linkerd" deleted
error: unable to recognize "uninstall.yml": no matches for kind
"PodSecurityPolicy" in version "extensions/v1beta1"

$ kubectl get psp -oname
podsecuritypolicy.policy/linkerd-linkerd-control-plane
```

I've also replaced the uninstall integration test with a new separate
suite that performs the installation, waits for it to be ready,
uninstalls, and then confirms `linkerd check --pre` returns as expected.
2020-04-07 11:01:11 -05:00
Matei David fee70c064b
Add uninstall cmd functionality to cli (#3622) (#4200)
Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-04-02 12:35:39 -05:00
Alejandro Pedraza 573060bacc
New test for checking SA lists are synced (#4201)
Followup to #4193

This is to verify that the list of SA installed, as well as the list of
SA in the linkerd-psp RoleBinding match the list of expected SA defined
in `healthcheck.go`.
2020-03-26 12:54:31 -05:00
Alejandro Pedraza d6c588f683
Add missing SAs to linkerd check (#4194)
* Add missing SAs to linkerd check

This adds the service accounts `linkerd-destination` and
`linkerd-smi-metrics` that were missing from the "control plane
ServiceAccounts exist" check.
2020-03-24 12:50:54 -05:00
Alex Leong 71d6a00faa
Include SMI metrics as part of Linkerd install (#4109)
Adds the SMI metrics API to the Linkerd install flow.  This installs the SMI metrics controller deployment, the SMI metrics ApiService object, and supporting RBAC, and config resources.

This is the first step toward having Linkerd consume the SMI metrics API in the CLI and web dashboard.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-03-02 10:11:16 -08:00
Christy Jacob 8111e54606
Check for extension server certificate (#4062)
* Check Extension api server Authentication
* Added Checks and tests for extension api-server authentication
* Fixed Failing Static Checks
* Updated the golden file

Signed-off-by: Christy Jacob <christyjacob4@gmail.com>
2020-02-28 13:39:02 -08:00
Alejandro Pedraza fa4db2d7a9
Fixed flaky integration test for ExternalIssuer (#4108)
Fixes #4105

On my local machine, `linkerd stat` was not returning traffic until
the 17th try or so, which explains why the 20s timeout was a bit too
close to the limit and this test was failing sometimes. So I increased
the timeout to 40s and I'm also adding stderr to the error message.
2020-02-27 10:10:19 -05:00
Zahari Dichev 3538944d03
Unify trust anchors terminology (#4047)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-02-15 10:12:46 +02:00
Alejandro Pedraza 7584f88b69
Integration test flakiness: endpoint version mismatch warning (#4020)
Updated regex for ignoring version mismatch warning events. It was only
applied for '-*upgrade' namespaces.

It is safe to ignore such warnings because the endpoint controller
retries when that happens, and if after many retries it still can't then
a different warning is thrown which is _not_ whitelisted and will make
the build fail.
https://github.com/kubernetes/kubernetes/blob/v1.16.6/pkg/controller/endpoint/endpoints_controller.go#L334-L348

This PR also removes logging matches on expected warnings, to avoid
cluttering the CI log.
2020-02-11 13:58:11 -05:00
Alex Leong 41d58c0905
Remove dependency on httpbin in egress integration test (#3987)
* Use linkerd.io for egress test instead of httpbin

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-02-07 19:35:51 -05:00
Alejandro Pedraza f78bef4ffd
Address external_issuer_test.go flakiness (#4018)
From time to time we get this CI error when testing the external issuer
mechanism:
```
Test script: [external_issuer_test.go] Params:
[--linkerd-namespace=l5d-integration-external-issuer
--external-issuer=true]
--- FAIL: TestExternalIssuer (33.61s)
    external_issuer_test.go:89: Received error while ensuring test app
    works (before cert rotation): Error stripping header and trailing
    newline; full output:
    FAIL
```
https://github.com/alpeb/linkerd2/runs/428273855?check_suite_focus=true#step:6:526

This is caused by the "backend" pod not receiving traffic from
"slow-cooker" in a timely manner.
After those pods are deployed we're only checking that "backend" is
ready, but not "slow-cooker", so this change adds that check.

I'm also removing the `TestHelper.CheckDeployment` call because it's
redundant, since the preceding `TestHelper.CheckPods` is already checking
that the deployment has all the specified replicas ready.
2020-02-07 19:33:32 -05:00
Zahari Dichev 5cd3655b1e
Update helm overrides to match stable ones (#4025)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-02-07 09:37:18 -05:00
Zahari Dichev c609564dc8
Add helm upgrade integration test (#3976)
In light of the breaking changes we are introducing to the Helm chart and the convoluted upgrade process (see linkerd/website#647) an integration test can be quite helpful. This simply installs latest stable through helm install and then upgrades to the current head of the branch.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-02-04 08:27:46 +02:00
Anantha Krishnan 7f026c96f6
Added check for TapAPI service (#3689)
Added check for TapAPI service

Fixes #3462
Added a checker using `kube-aggregator` client

Signed-off-by: Ananthakrishnan <kannan4mi3@gmail.com>
2020-01-27 20:07:07 +02:00
Kevin Leimkuhler 8c9498def2
Temporarily fix flaky integration test (#3968)
*From the comment disabling the test*:

#2316

The response from `http://httpbin.org/get` is non-deterministic--returning
either `http://..` or `https://..` for GET requests. As #2316 mentions, this
test should not have an external dependency on this endpoint. As a workaround
for edge-20.1.3, temporarily disable this test and re-enable it with one that has
reliable behavior.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-01-24 10:32:10 -08:00
Kevin Leimkuhler 53baecb382
Changes for edge-20.1.3 (#3966)
## edge-20.1.3

* CLI
  * Introduced `linkerd check --pre --linkerd-cni-enabled`, used when the CNI
    plugin is used, to check it has been properly installed before proceeding
    with the control plane installation
  * Added support for the `--as-group` flag so that users can impersonate
    groups for Kubernetes operations (thanks @mayankshah160!)
* Controller
  * Fixed an issue where an override of the Docker registry was not being
    applied to debug containers (thanks @javaducky!)
  * Added check for the Subject Alternate Name attributes to the API server
    when access restrictions have been enabled (thanks @javaducky!)
  * Added support for arbitrary pod labels so that users can leverage the
    Linkerd provided Prometheus instance to scrape for their own labels
    (thanks @daxmc99!)
  * Fixed an issue with CNI config parsing

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-01-23 16:55:21 -08:00
Zahari Dichev e30b9a9c69
Add checks for CNI plugin (#3903)
As part of the effort to remove the "experimental" label from the CNI plugin, this PR introduces cni checks to `linkerd check`

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-01-17 12:11:19 +02:00
Alex Leong cabe2a13e6
Add distributed tracing integration test (#3920)
This integration test roughly follows the [Linkerd guide to distributed tracing](https://linkerd.io/2019/10/07/a-guide-to-distributed-tracing-with-linkerd/).

We deploy the tracing components (oc-collector and jaeger), emojivoto, and nginx as an ingress to do span initiation. We then watch the jaeger API and check that a trace is eventually created that includes spans from all of the data plane components: nginx, linkerd-proxy, web, voting, and emoji.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-01-16 10:57:15 -08:00
Zahari Dichev 0ee409eaa3
Fix inject integration tests failing due to wrong golden files (#3923)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-01-14 12:47:16 -05:00
Alex Leong 93a81dce97
Change default proxy log level to "warn,linkerd=info" (#3908)
Fixes #3901 

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-01-09 14:22:06 -08:00
Oliver Gould d3d8d855f0
proxy: v2.83.0-experimental (#3897)
This is an experimental release that includes large changes to the
proxy's request buffering and backpressure infrastructure.

Please exercise caution before deploying this proxy version into mission
critical environments.
2020-01-09 14:12:46 -08:00
Zahari Dichev b4266c93de
Ensure proxy cert does not exceed the lifetime of the certs in the trust chain (#3893)
Fixes a problem where the identity service can issue a certificate that has a lifetime longer than that of the issuer certificate. This was causing the proxies to end up using an invalid TLS certificate. This fix ensures that the lifetime of the issued certificate is not greater than the smallest lifetime of the certs in the issuer cert trust chain.
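
The clamping rule can be sketched as taking the earliest expiry among the requested certificate and every cert in the issuer chain. The helper `clampNotAfter` and the sample times are hypothetical, not the identity service's code.

```go
package main

import (
	"fmt"
	"time"
)

// clampNotAfter is a hypothetical helper: it returns the earlier of the
// requested expiry and the earliest expiry found in the issuer chain.
func clampNotAfter(requested time.Time, chainNotAfter []time.Time) time.Time {
	out := requested
	for _, t := range chainNotAfter {
		if t.Before(out) {
			out = t
		}
	}
	return out
}

// demoClamp requests a 24h lifetime against a chain whose shortest-lived
// cert expires in 12h, and returns the lifetime that is actually granted.
func demoClamp() string {
	now := time.Date(2020, 1, 9, 0, 0, 0, 0, time.UTC)
	requested := now.Add(24 * time.Hour)
	chain := []time.Time{now.Add(12 * time.Hour), now.Add(48 * time.Hour)}
	return clampNotAfter(requested, chain).Sub(now).String()
}

func main() {
	fmt.Println(demoClamp())
}
```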

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-01-09 09:52:29 +02:00
Tarun Pothulapati eac06b973c
Move common values to global (#3839)
* move values to global in template

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update inject and cli

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update unit tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix linting issues

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* remove controllerImageVersion from global

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* move identity out of global

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update var name and comments

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update bin and helm tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update helm readme

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix proxy config

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix proxy config indentation

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* more linting issues

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* remove unnecessary lines

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-01-06 14:31:41 -08:00
Alejandro Pedraza 6f8574a633
Add event regex to ignore in integration test (#3884)
We were ignoring events like
```
MountVolume.SetUp failed for volume .* : couldn't propagate object cache: timed out waiting for the condition
```

but as of k8s 1.16 those got replaced by more precise messages, like
```
MountVolume.SetUp failed for volume "linkerd-identity-token-cm4fn" :failed to sync secret cache: timed out waiting for the condition
MountVolume.SetUp failed for volume "prometheus-config" : failed to sync configmap cache: timed out waiting for the condition
```

This was causing sporadic CI test failures like
[here](https://github.com/linkerd/linkerd2/runs/368424822#step:7:562)

So I'm including another regex for that.
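
For illustration, a single pattern covering both the old and the new messages might look like the sketch below. The regex and the helper `isIgnored` are assumptions written for this example, not the expressions used in the test suite.

```go
package main

import (
	"fmt"
	"regexp"
)

// ignoredEvent is an illustrative pattern, not the one in the repository.
// It tolerates the optional space after the colon seen in the messages above.
var ignoredEvent = regexp.MustCompile(
	`MountVolume\.SetUp failed for volume .* :\s*` +
		`(couldn't propagate object cache|failed to sync (secret|configmap) cache)` +
		`: timed out waiting for the condition`)

// isIgnored reports whether a warning event message matches the pattern.
func isIgnored(msg string) bool {
	return ignoredEvent.MatchString(msg)
}

func main() {
	fmt.Println(isIgnored(`MountVolume.SetUp failed for volume "prometheus-config" : failed to sync configmap cache: timed out waiting for the condition`))
	fmt.Println(isIgnored(`Back-off restarting failed container`))
}
```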

Re: 96c41f8a1e
2020-01-06 14:22:15 -05:00
Alejandro Pedraza 4abd778558
Don't hide stderr in integration tests (#3855)
In various integration tests we're not showing stderr when a failure
happens, thus hiding some possibly useful debugging info.
E.g. in the latest CI failures, commands like `linkerd update` were
failing with no visible reason why.
2019-12-20 09:27:18 -05:00
Alex Leong 03762cc526
Support pod ip and service cluster ip lookups in the destination service (#3595)
Fixes #3444 
Fixes #3443 

## Background and Behavior

This change adds support for the destination service to resolve Get requests which contain a service clusterIP or pod ip as the `Path` parameter.  It returns the stream of endpoints, just as if `Get` had been called with the service's authority.  This lays the groundwork for allowing the proxy to TLS TCP connections by allowing the proxy to do destination lookups for the SO_ORIG_DST of tcp connections.  When that ip address corresponds to a service cluster ip or pod ip, the destination service will return the endpoints stream, including the pod metadata required to establish identity.

Prior to this change, attempting to look up an ip address in the destination service would result in a `InvalidArgument` error.

Updating the `GetProfile` method to support ip address lookups is out of scope and attempts to look up an ip address with the `GetProfile` method will result in `InvalidArgument`.

## Implementation

We do this by creating an `IPWatcher` which wraps the `EndpointsWatcher` and supports lookups by ip. `IPWatcher` maintains a mapping of clusterIPs to service ids and translates subscriptions to an IP address into a subscription to the service id using the underlying `EndpointsWatcher`.
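
The translation step can be pictured with the heavily simplified sketch below. The types `serviceID`, `endpointsWatcher` and `ipWatcher` are hypothetical stand-ins that only model the clusterIP-to-service mapping. Pod IP handling and the real watcher interfaces are left out.

```go
package main

import "fmt"

// serviceID identifies a Kubernetes service (hypothetical type).
type serviceID struct{ Namespace, Name string }

// endpointsWatcher stands in for the underlying watcher and simply counts
// subscriptions per service.
type endpointsWatcher struct{ subs map[serviceID]int }

func (w *endpointsWatcher) subscribe(id serviceID) { w.subs[id]++ }

// ipWatcher wraps endpointsWatcher and resolves cluster IPs to service ids.
type ipWatcher struct {
	byClusterIP map[string]serviceID
	endpoints   *endpointsWatcher
}

// subscribe translates an IP subscription into a service subscription.
func (w *ipWatcher) subscribe(ip string) error {
	id, ok := w.byClusterIP[ip]
	if !ok {
		return fmt.Errorf("no service with cluster IP %s", ip)
	}
	w.endpoints.subscribe(id)
	return nil
}

// demoIPWatcher subscribes once to a known IP and once to an unknown one.
func demoIPWatcher() string {
	web := serviceID{Namespace: "default", Name: "web"}
	ew := &endpointsWatcher{subs: map[serviceID]int{}}
	iw := &ipWatcher{byClusterIP: map[string]serviceID{"10.0.0.5": web}, endpoints: ew}
	_ = iw.subscribe("10.0.0.5")
	err := iw.subscribe("10.0.0.9")
	return fmt.Sprintf("%d %v", ew.subs[web], err != nil)
}

func main() {
	fmt.Println(demoIPWatcher())
}
```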

Since the service name is no longer always infer-able directly from the input parameters, we restructure `EndpointTranslator` and `PodSet` so that we propagate the service name from the endpoints API response.

## Testing

This can be tested by running the destination service locally, using the current kube context to connect to a Kubernetes cluster:

```
go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```

Then lookups can be issued using the destination client:

```
go run controller/script/destination-client/main.go -path 192.168.54.78:80 -method get -addr localhost:8086
```

Service cluster ips and pod ips can be used as the `path` argument.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-12-19 09:25:12 -08:00
Sergio C. Arteaga 56c8a1429f
Increase the comprehensiveness of check --pre (#3701)
* Increase the comprehensiveness of check --pre

Closes #3224

Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com>
2019-12-18 13:27:32 -05:00
Zahari Dichev f88b55e36e
Tls certs checks (#3813)
* Added checks for cert correctness
* Add warning checks for approaching expiration
* Add unit tests
* Improve unit tests
* Address comments
* Address more comments
* Prevent upgrade from breaking proxies when issuer cert is overwritten (#3821)
* Address more comments
* Add gate to upgrade cmd that checks that all proxies roots work with the identity issuer that we are updating to
* Address comments
* Enable use of upgrade to modify both roots and issuer at the same time

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2019-12-16 14:49:32 -08:00
Tarun Pothulapati 2f492a77fb
Switch to Smaller-Case in Linkerd2 and Partials Charts (#3823)
* update linkerd2, partials charts
* support install and inject workflow
* update helm docs
* update comments in values
* update helm tests
* update comments in test

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-12-13 14:48:07 -05:00
Alejandro Pedraza 2a4c71760d
Enable cert rotation test to work with dynamic namespaces, take two (#3795)
* Enable cert rotation test to work with dynamic namespaces

This PR adds support for dynamic cert generation when running the cert rotation integration tests. This avoids baking the namespace into the certificate CN, which allows us to run these tests on the clouds.

The tests in #3775 were failing because the second secret holding the issuer cert replacement was a leaf cert and not a root/intermediary cert capable of signing the CSRs. This is what the replacement cert looked like:

```bash
$ k -n l5d-integration-external-issuer get secrets linkerd-identity-issuer-new -ojson | jq '.data|.["tls.crt"]' | tr -d '"' | base64 -d | step certificate inspect -
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 2 (0x2)
    Signature Algorithm: ECDSA-SHA256
        Issuer: CN=identity.l5d-integration-external-issuer.cluster.local
        Validity
            Not Before: Dec 6 19:16:08 2019 UTC
            Not After : Dec 5 19:16:28 2020 UTC
        Subject: CN=identity.l5d-integration-external-issuer.cluster.local
        Subject Public Key Info:
            Public Key Algorithm: ECDSA
                Public-Key: (256 bit)
                X:
                    93:d5:fa:f8:d1:44:4f:9a:8c:aa:0c:9e:4f:98:a3:
                    8d:28:d9:cc:f2:74:4c:5f:76:14:52:47:b9:fb:c9:
                    a3:33
                Y:
                    d2:04:74:95:2e:b4:78:28:94:8a:90:b2:fb:66:1b:
                    e7:60:e5:02:48:d2:02:0e:4d:9e:4f:6f:e9:0a:d9:
                    22:78
                Curve: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Alternative Name:
                DNS:identity.l5d-integration-external-issuer.cluster.local

    Signature Algorithm: ECDSA-SHA256
         30:46:02:21:00:f6:93:2f:10:ba:eb:be:bf:77:1a:2d:68:e6:
         04:17:a4:b4:2a:05:80:f7:c5:f7:37:82:7b:b7:9c:a1:66:6a:
         e1:02:21:00:b3:65:06:37:49:06:1e:13:98:7c:cf:f9:71:ce:
         5a:55:de:f6:1b:83:85:b0:a8:88:b7:cf:21:d1:16:f2:10:f9
```
For it to be a root/intermediate cert it should have had `CA:TRUE` under the `X509v3 extensions` section.
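
A quick way to express that condition with Go's standard `crypto/x509` types is sketched below. The helper `canSignCSRs` is hypothetical, and the certificates in `demoCA` are bare struct literals built for the example rather than parsed from the secret.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// canSignCSRs is a hypothetical helper: it reports whether the parsed
// certificate carries a valid basic constraints extension with CA:TRUE.
func canSignCSRs(cert *x509.Certificate) bool {
	return cert.BasicConstraintsValid && cert.IsCA
}

// demoCA compares a leaf-style cert (like the one dumped above) with a
// CA-style cert and returns both results.
func demoCA() string {
	leaf := &x509.Certificate{
		KeyUsage: x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
	}
	ca := &x509.Certificate{
		BasicConstraintsValid: true,
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	return fmt.Sprintf("%v %v", canSignCSRs(leaf), canSignCSRs(ca))
}

func main() {
	fmt.Println(demoCA())
}
```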

Why did the test pass sometimes? When it did pass for me, I could see in the linkerd-identity proxy logs something like:
```
ERR! [   320.964592s] linkerd2_proxy_identity::certify Received invalid ceritficate: invalid certificate: UnknownIssuer
```
so the cert retrieved from identity still was invalid but for some reason the proxy, sometimes, keeps on going despite that. And when one would delete the linkerd-identity pod, its proxy wouldn't come up at all, also showing that error.

With the changes from this branch, we no longer see that error in the logs and after deleting the linkerd-identity pod it comes back gracefully.
2019-12-11 15:50:06 -05:00
Zahari Dichev 6faf64e49f
Revert "Enable cert rotation test to work with dynamic namespaces (#3775)" (#3787)
This reverts commit 0e45b9c03d.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2019-12-05 15:33:22 -05:00
Zahari Dichev 0e45b9c03d
Enable cert rotation test to work with dynamic namespaces (#3775)
This PR adds support for dynamic cert generation when running the cert rotation integration tests. This avoids baking the namespace into the certificate CN, which allows us to run these tests on the clouds.

* Enable cert rotation test to work with dynamic namespaces

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

* Address comments

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

* Address further comments

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2019-12-05 10:08:01 +02:00
Alex Leong ecb55bb1a3
Disable TestCliStatForLinkerdNamespace integration test (#3727)
https://github.com/linkerd/linkerd2/pull/3693 caused the proxy to start resolving private IP addresses with the destination service.  However, the destination service does not support IP lookups and returns failures for these lookups.  This negatively affects the destination service success rate and can cause this test to fail.  We disable this test for now until the destination service supports IP lookups.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-11-14 13:13:10 -08:00
Alejandro Pedraza 29459a50e6
Removed 'no invalid service profiles' from linkerd check test fixtures (#3724)
Followup to #3718
2019-11-14 10:52:38 -08:00
Zahari Dichev 2d224302de
Add integration test for external issuer and cert rotation flows (#3709)
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-11-14 06:58:32 +02:00
Zahari Dichev a6ff442789
Traffic split integration test (#3649)
* Traffic split integration test

Signed-off-by: zaharidichev <zaharidichev@gmail.com>

* Address comments

Signed-off-by: zaharidichev <zaharidichev@gmail.com>

* Display placeholder when there is no basic stats data

Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-11-13 21:14:34 +02:00