linkerd2

Commit Graph

Author	SHA1	Message	Date
Zahari Dichev	904f146558	Multicluster install integration test (#4540 ) This PR adds multicluster components to the integration tests. The existing tests have been modified to pass the `--multicluster` flag so that the entire integration test suite runs with multicluster components. Currently, the upgrade tests do not have multicluster components installed, but this will be done in a follow-up PR. Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-06-24 14:32:22 -04:00
Lutz Behnke	846d2f11d4	Add support for Helm configuration of per-component proxy resources requests and limits (#4226 ) Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>	2020-06-24 12:54:27 -05:00
Mayank Shah	7f29717a64	Refactor helper functions from `inject` integration tests (#4644 ) move `applyPatch` `useTestImageTag`, `validateInject``getProxyContainers` as global functions to be used!	2020-06-22 23:15:52 +05:30
Alejandro Pedraza	27b2838c52	Post-2.8.0 integration test cleanup (#4641 ) * Post-2.8.0 integration test cleanup We had some code for testing upgrades from pre-2.8.0 stables that took care of creating the non-existent `linkerd-smi-metrics` SA, which is no longer necessary. I also had missed many spots in test/install_test.go from #4623	2020-06-22 09:09:04 -05:00
Alejandro Pedraza	c8c5980d63	Integration tests: Warn (instead of erroring) upon pod restarts (#4623 ) * Integration tests: Warn (instead of erroring) upon pod restarts Fixes #4595 Don't have integration tests fail whenever a pod is detected to have restarted just once. For now we'll be just logging this out and creating a warning annotation for it.	2020-06-18 06:08:05 -05:00
Kevin Leimkuhler	f6bd722e2c	Fix install-pr script (#4610 ) * Fix install-pr script * Add image-archives path to commands to use the files Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com> Signed-off-by: Charles Pretzer <charles@buoyant.io> Co-authored-by: Charles Pretzer <charles@buoyant.io>	2020-06-17 21:32:01 -07:00
Oliver Gould	bb01b94e8a	Pin bb protobuf to a stable sha (#4619 ) The bb repo does not have a master branch anymore. We should just pin the SHA to avoid branches and changes all together.	2020-06-16 18:16:55 -07:00
Kevin Leimkuhler	b0765c4361	Add integration test for upgrading from edge (#4557 ) This adds an integration test for upgrading from the latest edge to the current build. Closes #4471 Signed-off-by: Kevin Leimkuhler kevin@kleimkuhler.com	2020-06-16 09:18:52 -07:00
Alejandro Pedraza	c0afb443d2	Fix mechanism to fetch logs/events upon test failures (#4538 ) Followup to #4522 This removes the `controlPlaneInstalled` var in `bin/install_test.go` that flagged whether the control plane was already present in the series of tests, whose intention was to avoid fetching the logs/events when the CP wasn't yet there. That was done under the assumption `TestMain()` would feed that flag to the runner for each individual test function, but it turns out `TestMain()` only runs once per test file, and so `controlPlaneInstalled` remained with its initial value `false`. So now logs/events are fetched always, even if the control plane is not there. If the CP is absent and we try fetching, we only see a `didn't find any client-go entries` message.	2020-06-04 09:11:30 -05:00
Alejandro Pedraza	ed5482ac3b	Fixed prom route in linkerd service profile, and some extra cleanup (#4493 ) * Fixes #4305 Fixed SP route for `POST /api/v1/query`: ``` $ bin/linkerd routes -n linkerd deploy/linkerd-prometheus ROUTE SERVICE SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 GET /api/v1/query_range linkerd-prometheus 100.00% 3.9rps 1ms 2ms 2ms GET /api/v1/series linkerd-prometheus 100.00% 1.1rps 1ms 1ms 1ms POST /api/v1/query linkerd-prometheus 100.00% 3.1rps 1ms 17ms 19ms [DEFAULT] linkerd-prometheus - - - - - ``` Also added one missing route for `linkerd-grafana`, realizing afterwards there are many other ones missing, but not really worth adding them all. I also removed the routes in `linkerd-controller` for the tap routes given that's no longer handled in that service. And the tap service SP was also removed alltogether since nothing was getting reported.	2020-06-03 12:53:50 -05:00
Oliver Gould	7cc5e5c646	multicluster: Use the proxy as an HTTP gateway (#4528 ) This change modifies the linkerd-gateway component to use the inbound proxy, rather than nginx, for gateway. This allows us to detect loops and propagate identity through the gateway. This change also cleans up port naming to `mc-gateway` and `mc-probe` to resolve conflicts with Kubernetes validation. --- * proxy: v2.99.0 The proxy can now operate as gateway, routing requests from its inbound proxy to the outbound proxy, without passing the requests to a local application. This supports Linkerd's multicluster feature by adding a `Forwarded` header to propagate the original client identity and assist in loop detection. --- * Add loop detection to inbound & TCP forwarding (linkerd/linkerd2-proxy#527) * Test loop detection (linkerd/linkerd2-proxy#532) * fallback: Unwrap errors recursively (linkerd/linkerd2-proxy#534) * app: Split inbound/outbound constructors into components (linkerd/linkerd2-proxy#533) * Introduce a gateway between inbound and outbound (linkerd/linkerd2-proxy#540) * gateway: Add a Forwarded header (linkerd/linkerd2-proxy#544) * gateway: Return errors instead of responses (linkerd/linkerd2-proxy#547) * Fail requests that loop through the gateway (linkerd/linkerd2-proxy#545) * inject: Support config.linkerd.io/enable-gateway This change introduces a new annotation, config.linkerd.io/enable-gateway, that, when set, enables the proxy to act as a gateway, routing all traffic targetting the inbound listener through the outbound proxy. This also removes the nginx default listener and gateway port of 4180, instead using 4143 (the inbound port). * proxy: v2.100.0 This change modifies the inbound gateway caching so that requests may be routed to multiple leaves of a traffic split. --- * inbound: Do not cache gateway services (linkerd/linkerd2-proxy#549)	2020-06-02 19:37:14 -07:00
Alejandro Pedraza	e607fc9247	Fetch logs/events when integration test fails, not only for install tests (#4522 ) * Fetch logs/events when integration test fails, not only for install tests ## Motivation Mainly to know what caused containers to not start (or to restart), like in #4285 ## Implementation Followup to #4410, where we fetched unexpected logs/events when a test failed in `test/install_test.go`; now we're expanding that behavior to every integration test. For that, we replace in each `TestMain()`: ```go os.Exit(m.Run()) ``` with ```go os.Exit(testutil.Run(m, TestHelper, true)) ``` where `testutil.Run()` executes the tests and fetches the logs/events if the tests failed. Also extracted the log/event fetching and matching into its own separate file. * Appease linter * For external_issuer_integration_tests controlPlaninstalled wasn't being set	2020-06-01 16:48:55 -05:00
Alejandro Pedraza	de5b22ffba	Flaky tests: when installation test fails, fetch logs and events (#4410 ) * When installation test fails, fetch logs and events Re #4371 When a test fails in `./test/install_test.go`, trigger the `TestLogs` and `TestEvents` tests in a separate process in order to output any unexpected logs/events that might have caused the initial test failure. For instance, currently we're sporadically experiencing pod restarts. Instead of ignoring them, this might help provide us with the real underlying cause.	2020-05-26 16:41:31 -05:00
Tarun Pothulapati	a8158dbeac	Add HealthChecks for Tracing Add-On (#4407 ) Adds health-checks for tracing add-on, along with a refactor to have safe casts. Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-05-26 22:10:23 +05:30
Tarun Pothulapati	be664571c1	Separate grafana image tag in template (#4395 ) Separates grafana image field into image.name, image.version and also moves controllerImageVersion to global	2020-05-20 22:27:19 +05:30
Tarun Pothulapati	e91dbda287	Add health checks for grafana add-on (#4321 ) * Add health checks for grafana add-on Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update testCheck command and fixes Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * fix checkContainersRunnning function Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * linting fix Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update test golden files Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * use hc.ControlPlanePods instead of k8s API Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * use hc.controlPLanePods directly Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * remove unnecessary comments Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * proper comments Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update pod checks to use retries Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * add values key check Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-05-14 23:18:43 +05:30
Alejandro Pedraza	d0d97e9426	Upgrade to Helm v3 (#4373 ) Upgraded to Helm v3.2.1 from v2.16.1, getting rid of Tiller and making other simplifications. Note that the version placeholder in the `values.yaml` files had to be changed from `{version}` to `linkerdVersionValue` because the former confuses Helm v3.	2020-05-14 12:11:47 -05:00
Kevin Leimkuhler	dc5ca1a754	Check that ActualSuccess is greater than 0 in ServiceProfiles test (#4384 ) #4217 suggests a retries integration test, but this is already tested as part of the ServiceProfiles test. In order to fix this issue, an extra check has been added to the assertion of the `ActualSuccess` value. It now asserts the value is both greater than 0 and less than 100. Closes #4217 Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>	2020-05-14 09:56:39 -07:00
Tarun Pothulapati	45ccc24a89	Move grafana templates into a separate sub-chart as a add-on (#4320 ) * adds grafana manifests as a sub-chart - moves grafana templates into its own chart - implement add-on interface Grafana struct - also add relevant conditions for grafana Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * remove redundant grafana fields in Values Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update golden files Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * fix values issue Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * remove extra grafanaImage value Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * add add-on upgrade tests Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * fix golden file tests Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * add grafana field to linkerd-config-addons Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * Don't apply nil configuration Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update golden files Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * make checks relaxed for grafana Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update test to not test on grafana Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update TestServiceAccountsMatch to contain extra members Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * replace map[string]interface{} with Grafana for better readability Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update golden files Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-05-11 22:22:14 +05:30
Alejandro Pedraza	f62a2e6ee4	Refactor integration tests to use annotations functions (#4341 ) * Refactor integration tests to use annotations functions First part of #4176 Replaced all the `t.Error`/`t.Fatal` calls in the integration with the new functions defined in `testutil/annotations.go` as described in #4292, in order for the errors to produce Github annotations. Most of these calls have now two strings: one containing a generic error message and another with a more specific message. The former is what will be aggregated and seen in the CI reports at [linkerd2-ci-metrics](https://github.com/linkerd/linkerd2-ci-metrics). Other changes: - Improved the annotation generator in `annotations.go` so that the message includes the name of the test. - When a failure from `RetryFor` occurs, log the original timeout so we can consider incrementing it when the failure is persistent.	2020-05-08 08:41:42 -05:00
Alejandro Pedraza	1a2eaf29dc	Flaky tests: increase timeout for 'linkerd edges' (#4353 ) The `linkerd edges` test was being flaky, so gave more slack for it to succeed.	2020-05-07 18:24:32 -05:00
Alejandro Pedraza	0b7c8f76f9	Flaky tests: increase timeout for 'kubectl wait' (#4354 ) Sometimes for no clear reason pods are taking their time to become available. The `kubectl wait --for=condition=available` command in `inject_test.go` is failing sporadically because of this. e.g in https://github.com/linkerd/linkerd2/runs/652159504?check_suite_focus=true#step:14:56 I could reproduce this and even though I couldn't see any errors in the logs or events, I could confirm how long it's taking for the pod to come up: ``` $ k -n l5d-integration-inject-test describe po inject-test-terminus-enabled ... Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 7m12s default-scheduler Successfully assigned l5d-integration-inject-test/inject-test-terminus-enabled-96fd5f5dc-5qlpb to gke-alpeb-dev-default-pool-b94ca25c-h84p Normal Pulled 6m55s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Container image "gcr.io/linkerd-io/proxy-init:v1.3.2" already present on machine Normal Created 6m54s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Created container linkerd-init Normal Started 6m47s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Started container linkerd-init Normal Pulled 6m28s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Container image "buoyantio/bb:v0.0.5" already present on machine Normal Created 6m27s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Created container bb-terminus Normal Started 6m27s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Started container bb-terminus Normal Pulled 6m27s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Container image "gcr.io/linkerd-io/proxy:git-2a95d373" already present on machine Normal Created 6m27s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Created container linkerd-proxy Normal Started 6m27s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Started container linkerd-proxy ``` here the pod took 45s to start!	2020-05-07 18:23:44 -05:00
Alejandro Pedraza	6855bf9480	Flaky tests: Updated ignored error regex for cloud integration test (#4352 ) Updated rule in list of ignored k8s warning events to make it more generic and to account for this failure: ``` error killing pod: failed to "KillPodSandbox" for "756c8333-1d4d-4f42-bc2d-bd99eb8b4c94" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"_\" network: operation Delete is not supported on WorkloadEndpoint(default/gke--testing--git--2d2fd3f1--default--pool--b9cfce6d--tgcn-cni-bd3ca37ee6fc3a05bafa26ce71faa05279ce08de02462040300786cb7e046b38-eth0)" ``` That happened here: https://github.com/linkerd/linkerd2/runs/653622248?check_suite_focus=true#step:6:27	2020-05-07 18:22:31 -05:00
Matei David	6b9aaac9d6	Add Kubeconfig contex namespace to cli commands' options (#4197 ) (#4291 ) When using cli commands that work on namespaced resources in the cluster, the default namespace used by the cli is hardcoded to the default Kubernetes namespace (i.e 'default'). This update will allow cli commands that operate on namespaced resources to automatically infer what the name of the default namespace is, by taking the relevant default from the currently used Kubeconfig context. In short, this allows the omission of the -n flag in commands such as linkerd metrics, when working with resources that belong to a namespace that is set as default in the currently active context. Validation was done manually by setting the default namespace of the currently used context, as well as through two integration tests that target the tap and get command respectively. Signed-off-by: Matei David <matei.david.35@gmail.com>	2020-05-04 16:21:05 -05:00
Alex Leong	40b921508f	Inject LINKERD2_PROXY_DESTINATION_GET_NETWORKS proxy variable (#4300 ) Fixes #3807 By setting the LINKERD2_PROXY_DESTINATION_GET_NETWORKS environment variable, we configure the Linkerd proxy to do destination lookups for authorities which are IP addresses in the private network range. This allows us to get destination metadata including identity for HTTP requests which target an IP address in the cluster, Prometheus metrics scrape requests, for example. This change allowed us to update the "direct edges" test which ensures that the edges command produces correct output for traffic which is addressed directly to a pod IP. We also re-enabled the "linkerd stat" integration tests which had been disabled while the destination service did not yet support these types of IP queries. Signed-off-by: Alex Leong <alex@buoyant.io>	2020-04-30 11:22:24 -07:00
Tarun Pothulapati	2b1cbc6fc1	charts: Using downwardAPI to mount labels to the proxy container (#4199 ) * use downward API to mount labels to the proxy container as a volume * add namespace as a label to the pod * add a trace inject test * add downwardAPi for controlplaneTracing * add controlPlaneTracing condition to volumeMounts * update add-ons to have workload-ns * add workload-ns label to control-plane components Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-04-22 10:33:51 -05:00
Alex Leong	5d3862c120	Use /live for liveness probe (#4270 ) Fixes #3984 We use the new `/live` admin endpoint in the Linkerd proxy for liveness probes instead of the `/metrics` endpoint. This endpoint returns a much smaller payload. Signed-off-by: Alex Leong <alex@buoyant.io>	2020-04-17 14:53:32 -07:00
Kevin Leimkuhler	0d235694af	Add `unmeshed` flag to stat command (#4254 ) ## Motivation Introduces an `unmeshed` flag to the `stat` command so that users can opt-in to viewing unmeshed resources in the `stat` output. This changes the existing behavior of the `stat` command such that unmeshed resources no longer render by default in the output. Before: ``` ❯ bin/linkerd stat -A deploy NAMESPACE NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN kube-system coredns 0/1 - - - - - - kube-system local-path-provisioner 0/1 - - - - - - kube-system metrics-server 0/1 - - - - - - kube-system traefik 0/1 - - - - - - linkerd linkerd-controller 1/1 100.00% 0.3rps 1ms 2ms 2ms 2 linkerd linkerd-destination 1/1 100.00% 0.3rps 1ms 1ms 1ms 11 ... ``` After: ``` ❯ bin/linkerd stat -A deploy NAMESPACE NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN linkerd linkerd-controller 1/1 100.00% 0.3rps 1ms 1ms 1ms 2 linkerd linkerd-destination 1/1 100.00% 0.3rps 1ms 2ms 2ms 13 ... ``` Closes #3871 ## Solution Using the meshed pod count in the stat response, resources with a count of `0` are not rendered in the table. The `-l`/`--selector` flag do not work for all resource types, so applying a default label does not solve this problem. While it works for pods, it does not work for deployments as the `linkerd.io/inject` is an annotation that cannot be selected on. I did not think a shorthand flag was necessary for this. I do not think users will commonly pass this flag to the `stat` command, and I didn't think adding an additional short flag such as `u` was necessary. Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>	2020-04-14 20:24:29 -07:00
Alex Leong	7b9d475ffc	Gate SMI-Metrics behind an install flag (#4240 ) This change adds a `--smi-metrics` install flag which controls if the SMI-metrics controller and associated RBAC and APIService resources are installed. The flag defaults to false and is hidden. We plan to remove this flag or default it to true if and when the SMI-Metrics integration graduates from experimental. Signed-off-by: Alex Leong <alex@buoyant.io>	2020-04-09 14:34:08 -07:00
Alejandro Pedraza	322ba5fd2f	`linkerd uninstall` errors when attempting to delete PSP (#4234 ) * Bug in `linkerd uninstall` when attempting to delete PSP We were using a wrong apiVersion for PSP in `linkerd uninstall`'s output, which avoids removing that resource: ``` $ linkerd uninstall \| kubectl delete -f - clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-controller" deleted clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-destination" deleted ... mutatingwebhookconfiguration.admissionregistration.k8s.io "linkerd-proxy-injector-webhook-config" deleted validatingwebhookconfiguration.admissionregistration.k8s.io "linkerd-sp-validator-webhook-config" deleted namespace "linkerd" deleted error: unable to recognize "uninstall.yml": no matches for kind "PodSecurityPolicy" in version "extensions/v1beta1" $ kubectl get psp -oname podsecuritypolicy.policy/linkerd-linkerd-control-plane ``` I've also replaced the uninstall integration test with a new separate suite that performs the installation, waits for it to be ready, uninstalls, and then confirms `linkerd check --pre` returns as expected.	2020-04-07 11:01:11 -05:00
Matei David	fee70c064b	Add uninstall cmd functionality to cli (#3622 ) (#4200 ) Signed-off-by: Matei David <matei.david.35@gmail.com>	2020-04-02 12:35:39 -05:00
Alejandro Pedraza	573060bacc	New test for checking SA lists are synced (#4201 ) Followup to #4193 This is to verify that the list of SA installed, as well as the list of SA in the linkerd-psp RoleBinding match the list of expected SA defined in `healthcheck.go`.	2020-03-26 12:54:31 -05:00
Alejandro Pedraza	d6c588f683	Add missing SAs to linkerd check (#4194 ) * Add missing SAs to linkerd check This adds the service accounts `linkerd-destination` and `linkerd-smi-metrics` that were missing from the "control plane ServiceAccounts exist" check.	2020-03-24 12:50:54 -05:00
Alex Leong	71d6a00faa	Include SMI metrics as part of Linkerd install (#4109 ) Adds the SMI metrics API to the Linkerd install flow. This installs the SMI metrics controller deployment, the SMI metrics ApiService object, and supporting RBAC, and config resources. This is the first step toward having Linkerd consume the SMI metrics API in the CLI and web dashboard. Signed-off-by: Alex Leong <alex@buoyant.io>	2020-03-02 10:11:16 -08:00
Christy Jacob	8111e54606	Check for extension server certificate (#4062 ) * Check Extension api server Authentication * Added Checks and tests for extension api-server authentication * Fixed Failing Static Checks * Updated the golden file Signed-off-by: Christy Jacob <christyjacob4@gmail.com>	2020-02-28 13:39:02 -08:00
Alejandro Pedraza	fa4db2d7a9	Fixed flaky integration test for ExternalIssuer (#4108 ) Fixes #4105 In my local machine, `linkerd stat` was not returning traffic up until the 17th try or so. Which explains why the 20s timeout was a bit too close to the limit and this test was failing sometimes. So I increased the timeout up to 40s and I'm also adding stderr to the error message.	2020-02-27 10:10:19 -05:00
Zahari Dichev	3538944d03	Unify trust anchors terminology (#4047 ) Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-02-15 10:12:46 +02:00
Alejandro Pedraza	7584f88b69	Integration test flakiness: endpoint version mismatch warning (#4020 ) Updated regex for ignoring version mismatch warning events. It was only applied for '-*upgrade' namespaces. It is safe to ignore such warnings because the endpoint controller retries when that happens, and if after many retries it still can't then a different warning is thrown which is _not_ whitelisted and will make the build fail. https://github.com/kubernetes/kubernetes/blob/v1.16.6/pkg/controller/endpoint/endpoints_controller.go#L334-L348 This PR also removes logging matches on expected warnings, to avoid cluttering the CI log.	2020-02-11 13:58:11 -05:00
Alex Leong	41d58c0905	Remove dependency on httpbin in egress integration test (#3987 ) * Use linkerd.io for egress test instead of httpbin Signed-off-by: Alex Leong <alex@buoyant.io>	2020-02-07 19:35:51 -05:00
Alejandro Pedraza	f78bef4ffd	Address external_issuer_test.go flakiness (#4018 ) From time to time we get this CI error when testing the external issuer mechanism: ``` Test script: [external_issuer_test.go] Params: [--linkerd-namespace=l5d-integration-external-issuer --external-issuer=true] --- FAIL: TestExternalIssuer (33.61s) external_issuer_test.go:89: Received error while ensuring test app works (before cert rotation): Error stripping header and trailing newline; full output: FAIL ``` https://github.com/alpeb/linkerd2/runs/428273855?check_suite_focus=true#step:6:526 This is caused by the "backend" pod not receiving traffic from "slow-cooker" in a timely manner. After those pods are deployed we're only checking that "backend" is ready, but not "slow-cooker", so this change adds that check. I'm also removing the `TestHelper.CheckDeployment` call because it's redundant, since the preceeding `TestHelper.CheckPods` is already checking that the deployment has all the specified replicas ready.	2020-02-07 19:33:32 -05:00
Zahari Dichev	5cd3655b1e	Update helm overrides to match stable ones (#4025 ) Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-02-07 09:37:18 -05:00
Zahari Dichev	c609564dc8	Add helm upgrade integration test (#3976 ) In light of the breaking changes we are introducing to the Helm chart and the convoluted upgrade process (see linkerd/website#647) an integration test can be quite helpful. This simply installs latest stable through helm install and then upgrades to the current head of the branch. Signed-off-by: Zahari Dichev zaharidichev@gmail.com	2020-02-04 08:27:46 +02:00
Anantha Krishnan	7f026c96f6	Added check for TapAPI service (#3689 ) Added check for TapAPI service Fixes #3462 Added a checker using `kube-aggregator` client Signed-off-by: Ananthakrishnan <kannan4mi3@gmail.com>	2020-01-27 20:07:07 +02:00
Kevin Leimkuhler	8c9498def2	Temporarily fix flaky integration test (#3968 ) From the comment disabling the test: #2316 The response from `http://httpbin.org/get` is non-deterministic--returning either `http://..` or `https://..` for GET requests. As #2316 mentions, this test should not have an external dependency on this endpoint. As a workaround for edge-20.1.3, temporarily disable this test and renable with one that has reliable behavior. Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>	2020-01-24 10:32:10 -08:00
Kevin Leimkuhler	53baecb382	Changes for edge-20.1.3 (#3966 ) ## edge-20.1.3 * CLI * Introduced `linkerd check --pre --linkerd-cni-enabled`, used when the CNI plugin is used, to check it has been properly installed before proceeding with the control plane installation * Added support for the `--as-group` flag so that users can impersonate groups for Kubernetes operations (thanks @mayankshah160!) * Controller * Fixed an issue where an override of the Docker registry was not being applied to debug containers (thanks @javaducky!) * Added check for the Subject Alternate Name attributes to the API server when access restrictions have been enabled (thanks @javaducky!) * Added support for arbitrary pod labels so that users can leverage the Linkerd provided Prometheus instance to scrape for their own labels (thanks @daxmc99!) * Fixed an issue with CNI config parsing Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>	2020-01-23 16:55:21 -08:00
Zahari Dichev	e30b9a9c69	Add checks for CNI plugin (#3903 ) As part of the effort to remove the "experimental" label from the CNI plugin, this PR introduces cni checks to `linkerd check` Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-01-17 12:11:19 +02:00
Alex Leong	cabe2a13e6	Add distributed tracing integration test (#3920 ) This integration test roughly follows the [Linkerd guide to distributed tracing](https://linkerd.io/2019/10/07/a-guide-to-distributed-tracing-with-linkerd/). We deploy the tracing components (oc-collector and jaeger), emojivoto, and nginx as an ingress to do span initiation. We then watch the jaeger API and check that a trace is eventually created that includes traces from all of the data plane components: nginx, linkerd-proxy, web, voting, and emoji. Signed-off-by: Alex Leong <alex@buoyant.io>	2020-01-16 10:57:15 -08:00
Zahari Dichev	0ee409eaa3	Fix inject integration tests failing due to wrong golden files (#3923 ) Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-01-14 12:47:16 -05:00
Alex Leong	93a81dce97	Change default proxy log level to "warn,linkerd=info" (#3908 ) Fixes #3901 Signed-off-by: Alex Leong <alex@buoyant.io>	2020-01-09 14:22:06 -08:00
Oliver Gould	d3d8d855f0	proxy: v2.83.0-experimental (#3897 ) This is an experimental release that includes large changes to the proxy's request buffering and backpressure infrastructure. Please exercise caution before deploying this proxy version into mission critical environments.	2020-01-09 14:12:46 -08:00
Zahari Dichev	b4266c93de	Ensure proxy cert does not exceed the lifetime of the certs in the trust chain (#3893 ) Fixes a problem where the identitiy serice can issue a certificate that has a lifetime larger than the issuer certificate. This was causing the proxies to end up using an invalid TLS certificate. This fix ensures that the lifetime of the issued certificate is not greater than the smallest lifetime of the certs in the issuer cert trust chain. Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2020-01-09 09:52:29 +02:00
Tarun Pothulapati	eac06b973c	Move common values to global (#3839 ) * move values to global in template Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update inject and cli Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update unit tests Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * fix linting issues Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * remote controllerImageVersion from global Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * move identity out of global Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update var name and comments Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update bin and helm tests Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * update helm readme Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * fix proxy config Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * fix proxy config indentation Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * more linting issues Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com> * remove unnecessary lines Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2020-01-06 14:31:41 -08:00
Alejandro Pedraza	6f8574a633	Add event regex to ignore in integration test (#3884 ) We were ignoring events like ``` MountVolume.SetUp failed for volume .* : couldn't propagate object cache: timed out waiting for the condition ``` but as k8s 1.16 those got replaced by more precise messages, like ``` MountVolume.SetUp failed for volume "linkerd-identity-token-cm4fn" :failed to sync secret cache: timed out waiting for the condition MountVolume.SetUp failed for volume "prometheus-config" : failed to sync configmap cache: timed out waiting for the condition ``` This was causing sporadic CI test failures like [here](https://github.com/linkerd/linkerd2/runs/368424822#step:7:562) So I'm including another regex for that. Re: `96c41f8a1e`	2020-01-06 14:22:15 -05:00
Alejandro Pedraza	4abd778558	Don't hide stderr in integration tests (#3855 ) In various integration tests we're not showing stderr when a failure happens, thus hiding some possibly useful debugging info. E.g. in the latest CI failures, commands like `linkerd update` were failing with no visible reason why.	2019-12-20 09:27:18 -05:00
Alex Leong	03762cc526	Support pod ip and service cluster ip lookups in the destination service (#3595 ) Fixes #3444 Fixes #3443 ## Background and Behavior This change adds support for the destination service to resolve Get requests which contain a service clusterIP or pod ip as the `Path` parameter. It returns the stream of endpoints, just as if `Get` had been called with the service's authority. This lays the groundwork for allowing the proxy to TLS TCP connections by allowing the proxy to do destination lookups for the SO_ORIG_DST of tcp connections. When that ip address corresponds to a service cluster ip or pod ip, the destination service will return the endpoints stream, including the pod metadata required to establish identity. Prior to this change, attempting to look up an ip address in the destination service would result in a `InvalidArgument` error. Updating the `GetProfile` method to support ip address lookups is out of scope and attempts to look up an ip address with the `GetProfile` method will result in `InvalidArgument`. ## Implementation We do this by creating a `IPWatcher` which wraps the `EndpointsWatcher` and supports lookups by ip. `IPWatcher` maintains a mapping up clusterIPs to service ids and translates subscriptions to an IP address into a subscription to the service id using the underlying `EndpointsWatcher`. Since the service name is no longer always infer-able directly from the input parameters, we restructure `EndpointTranslator` and `PodSet` so that we propagate the service name from the endpoints API response. ## Testing This can be tested by running the destination service locally, using the current kube context to connect to a Kubernetes cluster: ``` go run controller/cmd/main.go destination -kubeconfig ~/.kube/config ``` Then lookups can be issued using the destination client: ``` go run controller/script/destination-client/main.go -path 192.168.54.78:80 -method get -addr localhost:8086 ``` Service cluster ips and pod ips can be used as the `path` argument. Signed-off-by: Alex Leong <alex@buoyant.io>	2019-12-19 09:25:12 -08:00
Sergio C. Arteaga	56c8a1429f	Increase the comprehensiveness of check --pre (#3701 ) * Increase the comprehensiveness of check --pre Closes #3224 Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com>	2019-12-18 13:27:32 -05:00
Zahari Dichev	f88b55e36e	Tls certs checks (#3813 ) * Added checks for cert correctness * Add warning checks for approaching expiration * Add unit tests * Improve unit tests * Address comments * Address more comments * Prevent upgrade from breaking proxies when issuer cert is overwritten (#3821) * Address more comments * Add gate to upgrade cmd that checks that all proxies roots work with the identitiy issuer that we are updating to * Address comments * Enable use of upgarde to modify both roots and issuer at the same time Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2019-12-16 14:49:32 -08:00
Tarun Pothulapati	2f492a77fb	Switch to Smaller-Case in Linkerd2 and Partials Charts (#3823 ) * update linkerd2, partials charts * support install and inject workflow * update helm docs * update comments in values * update helm tests * update comments in test Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2019-12-13 14:48:07 -05:00
Alejandro Pedraza	2a4c71760d	Enable cert rotation test to work with dynamic namespaces, take two (#3795 ) * Enable cert rotation test to work with dynamic namespaces This PR adds support for dynamic cert generation when running the cert rotation intergration tests. This allows to avoid baking in the namespace in the certificate CN, thereby allowing us to run these tests on the clouds. The tests in #3775 were failing because the second secret holding the issuer cert replacement was a leaf cert and not a root/intermediary cert capable of signing the CSRs. This is how the replacement cert looked like: ```bash $ k -n l5d-integration-external-issuer get secrets linkerd-identity-issuer-new -ojson \| jq '.data\|.["tls.crt"]' \| tr -d '"' \| base64 -d \| step certificate inspect - Certificate: Data: Version: 3 (0x2) Serial Number: 2 (0x2) Signature Algorithm: ECDSA-SHA256 Issuer: CN=identity.l5d-integration-external-issuer.cluster.local Validity Not Before: Dec 6 19:16:08 2019 UTC Not After : Dec 5 19:16:28 2020 UTC Subject: CN=identity.l5d-integration-external-issuer.cluster.local Subject Public Key Info: Public Key Algorithm: ECDSA Public-Key: (256 bit) X: 93:d5:fa:f8:d1:44:4f:9a:8c:aa:0c:9e:4f:98:a3: 8d:28:d9:cc:f2:74:4c:5f:76:14:52:47:b9:fb:c9: a3:33 Y: d2:04:74:95:2e:b4:78:28:94:8a:90:b2:fb:66:1b: e7:60:e5:02:48:d2:02:0e:4d:9e:4f:6f:e9:0a:d9: 22:78 Curve: P-256 X509v3 extensions: X509v3 Key Usage: critical Digital Signature, Key Encipherment X509v3 Extended Key Usage: TLS Web Server Authentication, TLS Web Client Authentication X509v3 Subject Alternative Name: DNS:identity.l5d-integration-external-issuer.cluster.local Signature Algorithm: ECDSA-SHA256 30:46:02:21:00:f6:93:2f:10:ba:eb:be:bf:77:1a:2d:68:e6: 04:17:a4:b4:2a:05:80:f7:c5:f7:37:82:7b:b7:9c:a1:66:6a: e1:02:21:00:b3:65:06:37:49:06:1e:13:98:7c:cf:f9:71:ce: 5a:55:de:f6:1b:83:85:b0:a8:88:b7:cf:21:d1:16:f2:10:f9 ``` For it to be a root/intermediate cert it should have had `CA:TRUE` under the `X509v3 extensions` section. Why did the test pass sometimes? When it did pass for me, I could see in the linkerd-identity proxy logs something like: ``` ERR! [ 320.964592s] linkerd2_proxy_identity::certify Received invalid ceritficate: invalid certificate: UnknownIssuer ``` so the cert retrieved from identity still was invalid but for some reason the proxy, sometimes, keeps on going despite that. And when one would delete the linkerd-identity pod, its proxy wouldn't come up at all, also showing that error. With the changes from this branch, we no longer see that error in the logs and after deleting the linkerd-identity pod it comes back gracefully.	2019-12-11 15:50:06 -05:00
Zahari Dichev	6faf64e49f	Revert "Enable cert rotation test to work with dynamic namespaces (#3775 )" (#3787 ) This reverts commit `0e45b9c03d`. Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2019-12-05 15:33:22 -05:00
Zahari Dichev	0e45b9c03d	Enable cert rotation test to work with dynamic namespaces (#3775 ) This PR adds support for dynamic cert generation when running the cert rotation intergration tests. This allows to avoid baking in the namespace in the certificate CN, thereby allowing us to run these tests on the clouds. * Enable cert rotation test to work with dynamic namespaces Signed-off-by: Zahari Dichev <zaharidichev@gmail.com> * Address comments Signed-off-by: Zahari Dichev <zaharidichev@gmail.com> * Address further comments Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>	2019-12-05 10:08:01 +02:00
Alex Leong	ecb55bb1a3	Disable TestCliStatForLinkerdNamespace integration test (#3727 ) https://github.com/linkerd/linkerd2/pull/3693 caused the proxy to start resolving private IP addresses with the destination service. However, the destination service does not support IP lookups and returns failures for these lookups. This negatively affects the destination service success rate and can cause this test to fail. We disable this test for now until the destination service supports IP lookups. Signed-off-by: Alex Leong <alex@buoyant.io>	2019-11-14 13:13:10 -08:00
Alejandro Pedraza	29459a50e6	Removed 'no invalid service profiles' from linkerd check test fixtures (#3724 ) Followup to #3718	2019-11-14 10:52:38 -08:00
Zahari Dichev	2d224302de	Add integration test for external issuer and cert rotation flows (#3709 ) Signed-off-by: zaharidichev <zaharidichev@gmail.com>	2019-11-14 06:58:32 +02:00
Zahari Dichev	a6ff442789	Traffic split integration test (#3649 ) * Traffic split integration test Signed-off-by: zaharidichev <zaharidichev@gmail.com> * Address comments Signed-off-by: zaharidichev <zaharidichev@gmail.com> * Display placeholder when there is no basic stats data Signed-off-by: zaharidichev <zaharidichev@gmail.com>	2019-11-13 21:14:34 +02:00
Alejandro Pedraza	4b6254b52e	Replaced `uuid` with `uid` from linkerd-config resource (#3694 ) * Replaced `uuid` with `uid` from linkerd-config resource Fixes #3621 Removed the old `uuid` for identifying linkerd installations, and replaced it with the `uid` property from the `linkerd-config` ConfigMap. I tested that this `uid` remains the same by updating the config and also upgrading linkerd, using both the CLI and Helm. Note that this required granting `linkerd-web` RBAC access to the `linkerd-config` Config. I also added an integration test to verify the stability of the uid.	2019-11-13 13:56:01 -05:00
Alex Leong	5da1a0723d	Disable edges integration test (#3707 ) The edges integration test can fail when more edges are added to the Linked namespace due to https://github.com/linkerd/linkerd2/issues/3706. We disable this test until that issue can be resolved. Signed-off-by: Alex Leong <alex@buoyant.io>	2019-11-13 10:04:40 -08:00
Zahari Dichev	6e67f1a8ca	Modify knownEventWarningsRegex regex to catch ipv6 error events (#3699 ) Signed-off-by: zaharidichev <zaharidichev@gmail.com>	2019-11-13 09:38:01 +02:00
Zahari Dichev	038900c27e	Remove destination container from controller (#3661 ) Signed-off-by: zaharidichev <zaharidichev@gmail.com>	2019-11-08 14:40:25 -08:00
Zahari Dichev	7dd5dfc2ba	Check health of meshed apps before and after linkerd upgrade (#3641 ) * Check stats of deployed app before and after linkerd upgrade to ensure nothing broke Signed-off-by: zaharidichev <zaharidichev@gmail.com> * Address naming remarks Signed-off-by: zaharidichev <zaharidichev@gmail.com> * Improve application health checking Signed-off-by: zaharidichev <zaharidichev@gmail.com>	2019-11-07 20:48:12 +02:00
Zahari Dichev	1bb9d66757	Integration test for custom cluster domain (#3660 ) Signed-off-by: zaharidichev <zaharidichev@gmail.com>	2019-11-04 14:49:52 -08:00
Zahari Dichev	a8170bd634	Add preinstall checks for deletion and creation of secrets (#3639 ) Signed-off-by: zaharidichev <zaharidichev@gmail.com>	2019-10-31 18:01:03 +02:00
Alex Leong	befea4aff6	Add direct edges integration test (#3603 ) Add an integration test which exercises the behavior when one meshed pod connects to another meshed pod by pod ip address. The current behavior is that the Linkerd proxy will not do any lookup against the destination service for this kind of connection and will proxy directly to the SO_ORIG_DST. This means that it will not have the identity metadata necessary to TLS the connection, and the connection will not be present in the `linkerd edges` command output. This test validates that behavior. The purpose of this test is to set the stage for future work which will allow the Linkerd proxy to TLS this type of connection and display it in `linkerd edges`. The assertions in this test will be updated as part of that work. This test will be run as part of the integration test suite. It can also be run directly: ``` go test --failfast --mod=readonly test/install_test.go --linkerd=(pwd)"/bin/linkerd" --k8s-context="$CTX" --integration-tests go test -v --mod=readonly test/edges/edges_test.go --linkerd=(pwd)"/bin/linkerd" --k8s-context="$CTX" --integration-tests ``` Signed-off-by: Alex Leong <alex@buoyant.io>	2019-10-30 10:48:03 -07:00
Ivan Sim	ff69c29f5e	Add missing package to proxy Dockerfile (#3583 ) * Add missing package to proxy Dockerfile * Fix failing 'check' integration test * Trim whitespaces in certs comparison. Without this change, the integration test would fail because the trust anchor stored in the linkerd-config config map generated by the Helm renderer is stripped of the line breaks. See charts/linkerd2/templates/_config.tpl Signed-off-by: Ivan Sim <ivan@buoyant.io>	2019-10-15 15:51:26 -07:00
Ivan Sim	cf69dedf9c	Re-add the destination container to the controller spec (#3540 ) * Re-add the destination container to the controller spec This fix is necessary to avoid data plane downtime during an upgrade to stable-2.6. All existing older proxies will continue to send requests to this destination container, until the data plane is restarted. On restart, the new pods will start forwarding their requests to the new linkerd-dst service. * Use the 2.6 destination service fqdn * Fixed unit tests * Fix integration test failure Signed-off-by: Ivan Sim <ivan@buoyant.io>	2019-10-08 10:49:40 -07:00
Alejandro Pedraza	c21ceb5b4e	Add integration tests for linkerd endpoints and edges (#3491 ) * Add integration tests for linkerd endpoints and edges Fixes #3477 and #3478	2019-10-01 15:46:27 -05:00
Alejandro Pedraza	6568929028	Add --disable-heartbeat flag for linkerd install\|upgrade (#3439 ) Fixes #278 Add `linkerd install\|upgrade --disable-heartbeat` flag, and have `linkerd check` check for the heartbeat's SA only if it's enabled. Also added those flags into the `linkerd upgrade -h` examples. Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>	2019-09-25 15:53:36 -05:00
Alejandro Pedraza	1653f88651	Put the destination controller into its own deployment (#3407 ) * Put the destination controller into its own deployment Fixes #3268 Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>	2019-09-18 13:41:06 -05:00
Andrew Seigner	a5a6e8ff9f	Fix integration test event regex matching (#3416 ) The integration tests check for known k8s events using a regex. This regex included an incorrect pattern that prepended a failure reason and object, rather than simply the event message we were trying to match on. This resulted in failures such as: https://github.com/linkerd/linkerd2/runs/217872818#step:6:476 Fix the regex to only check for the event message. Also explicitly differentiate reason, object, and message in the log output. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2019-09-10 13:24:22 -07:00
Andrew Seigner	9bb7b6f119	Make KillPodSandbox regex match broader (#3409 ) We're getting flakey `KillPodSandbox` events in the integration tests: https://github.com/linkerd/linkerd2/runs/216505657#step:6:427 This is despite adding a regex for these events in #3380. Modify the KillPodSandbox event regex to match on a broader set of strings. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2019-09-09 11:54:14 -07:00
Bruno M. Custódio	8fec756395	Add '--address' flag to 'linkerd dashboard'. (#3274 ) Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>	2019-09-05 10:56:10 -07:00
Andrew Seigner	e51af8c8a9	Add known KillPod k8s event to integration test (#3380 ) FailedKillPod events were causing integration tests to fail: https://github.com/linkerd/linkerd2/runs/212313175#step:6:409 Add FailedKillPod as a known event. Example: https://play.golang.org/p/WV52tyZgijW Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2019-09-04 14:00:16 -07:00
Andrew Seigner	a8481b721a	GitHub Actions, kind, integration test logs fixes (#3372 ) PR #3339 introduced a GitHub Actions CI workflow. Booting 6 clusters simultaneously (3x Github Actions + 3x Travis) exhibits some transient failures. Implement fixes in GitHub Actions and integration tests to address kind cluster creation and testing: - Retry kind cluster creation once. - Retry log reading from integration k8s clusters once. - Add kind cluster creation debug logging. - Add a GitHub Actions status badge to top of `README.md`. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2019-09-04 12:44:27 -07:00
Alejandro Pedraza	acbab93ca8	Add support for k8s 1.16 (#3364 ) Fixes #3356 1.16 removes some api groups that were already deprecated. From k8s blog post (https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/): ``` - PodSecurityPolicy: will no longer be served from extensions/v1beta1 in v1.16. Migrate to the policy/v1beta1 API, available since v1.10. Existing persisted data can be retrieved/updated via the policy/v1beta1 API. - DaemonSet, Deployment, StatefulSet, and ReplicaSet: will no longer be served from extensions/v1beta1, apps/v1beta1, or apps/v1beta2 in v1.16. Migrate to the apps/v1 API, available since v1.9. Existing persisted data can be retrieved/updated via the apps/v1 API. ``` Previous PRs had already made this change at the Helm templates level, but we still needed to do it at the API calls and tests. The integration tests ran fine for k8s 1.12 and 1.15. They fail on 1.16 because the upgrade integration test tries to install linkerd 2.5 which is not compatible with 1.16. Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>	2019-09-04 09:59:55 -05:00
Alejandro Pedraza	368d16f23c	Fix auto-injecting pods and integration tests reporting (#3335 ) * Fix auto-injecting pods and integration tests reporting When creating an Event when auto-injection occurs (#3316) we try to fetch the parent object to associate the event to it. If the parent doesn't exist (like in the case of stand-alone pods) the event isn't created. I had missed dealing with one part where that parent was expected. This also adds a new integration test that I verified fails before this fix. Finally, I removed from `_test-run.sh` some `\|\| exit_code=$?` that was preventing the whole suite to report failure whenever one of the tests in `/tests` failed. Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>	2019-08-28 15:04:20 -05:00
Andrew Seigner	956d1bff06	Update warning event regex for integration test (#3336 ) Kubernetes was generating events for failed readiness probes that did not quite match the expected events regex in the install integration test: https://travis-ci.org/linkerd/linkerd2/jobs/577642724#L647 Update the readiness probe regex to handle these variations in events: https://play.golang.org/p/OVGJkFNN-XA Relates to CI failure in #3333. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2019-08-28 10:19:40 -07:00
Andrew Seigner	419e9052ff	Fix flakey upgrade integration test (#3329 ) The `linkerd upgrade` integration test compares the output from two commands: - `linkerd upgrade control-plane` - `linkerd upgrade control-plane --from-manifests` The output of these commands include the heartbeat cronjob schedule, which is generated based on the current time. Modify the upgrade integration test to retry the manifest comparison one time, assuming that `linkerd upgrade control-plane` should not take more than one minute to execute. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2019-08-28 09:41:09 -07:00
Alejandro Pedraza	9ee98d35be	Stop ignoring client-go log entries (#3315 ) * Stop ignoring client-go log entries Pipe klog output into logrus. Not doing this avoids us from seeing client-go log entries, for some reason I don't understand. To enable, `--controller-log-level` must be `debug`. This was discovered while trying to debug sending events for #3253. I added an integration test that fails when this piping is not in place. Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>	2019-08-26 15:46:31 -05:00
Ivan Sim	954a45f751	Fix broken unit and integration tests (#3303 ) Signed-off-by: Ivan Sim <ivan@buoyant.io>	2019-08-21 18:52:19 -07:00
Alejandro Pedraza	fd5fc07db1	Fix integration test (#3266 ) Followup to #3194 The namespace was too long for l5d-bot: ``` inject_test.go:117: failed to create l5d-integration-auto-git-9688d9ba-inject-namespace-override-test namespace: Namespace "l5d-integration-auto-git-9688d9ba-inject-namespace-override-test" is invalid: metadata.name: Invalid value: "l5d-integration-auto-git-9688d9ba-inject-namespace-override-test": must be no more than 63 characters ``` Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>	2019-08-15 09:13:36 -05:00
Tarun Pothulapati	242566ac7c	Check for Namespace level config override annotations (#3194 ) * Check for Namespace level config override annotations * Add unit tests for namespace level config overrides * add integration test for namespace level config override * use different namespace for override tests * check resource requests for integration tests Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>	2019-08-14 21:01:44 -07:00
Alejandro Pedraza	4e65ed1e6a	Update integration test with new Helm values struct (#3246 ) Followup to #3229 Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>	2019-08-13 19:00:55 -07:00
Alejandro Pedraza	d64a2f3689	Add integration test for `helm install` (#3223 ) Ref #3143 Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>	2019-08-13 09:14:32 -05:00
Alejandro Pedraza	3ae653ae92	Refactor proxy injection to use Helm charts (#3200 ) * Refactor proxy injection to use Helm charts Fixes #3128 A new chart `/charts/patch` was created, that generates the JSON patch payload that is to be returned to the k8s API when doing the injection through the proxy injector, and it's also leveraged by the `linkerd inject --manual` CLI. The VFS was used by `linkerd install` to access the old chart under `/chart`. Now the proxy injection also uses the Helm charts to generate the JSON patch (see above) so we've moved the VFS from `cli/static` to a new common place under `/pkg/charts/static`, and the new root for the VFS is now `/charts`. `linkerd install` hasn't yet migrated to use the new charts (that'll happen in #3127), so the only change in that regard was the creation of `/charts/chart` which is a symlink pointing to `/chart` that `install.go` now uses, so that the VFS contains both the old and new charts, as a temporary measure. You can see that `/bin/Dockerfile-bin`, `/controller/Dockerfile` and `/bin/build-cli-bin` do now `go generate` pointing to the new location (and the `go generate` annotation was moved from `/cli/main.go` to `pkg/charts/static/templates.go`). The symlink trick doesn't work when building the binaries through Docker, so `/bin/Dockerfile-bin` replaces the symlink with an actual copy of `/chart`. Also note that in `/controller/Dockerfile` we now need to include the `prod` tag in `go install` like we do in `/bin/Dockerfile-bin` so that the proxy injector does use the VFS instead of the local file system. - The common logic to parse a chart has been moved from `install.go` to `/pkg/charts/util.go`. - The special ENV var in the proxy for "outbound router capacity" that only applies to the Prometheus pod is now handled directly in the proxy partial and all the associated go code could be removed. - The `patch.go` lib for generating the JSON patch in go along with its tests `patch_test.go` are no longer needed. - Lots of functions in `/pkg/inject/inject.go` got removed/simplified with their logic being moved into the charts themselves. As a consequence lots of things in `inject_test.go` became irrelevant. - Moved `template-values.go` from `/pkg/inject` to `pkg/charts` as that contains the go structs representation of the chart variables that will be leveraged in #3127. Don't forget to run `/bin/helm.sh` whenever you make changes to charts ;-) Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>	2019-08-07 17:32:37 -05:00
Andrew Seigner	a59c1dd32d	Introduce tap APIService, update `linkerd tap` (#3167 ) The Tap Service enabled tapping of any meshed pod, regardless of user privilege. This change introduces a new Tap APIService. Kubernetes provides authentication and authorization of Tap requests, and then forwards requests to a new Tap APIServer, which implements a Kubernetes aggregated APIServer. The Tap APIServer authenticates the client TLS from Kubernetes, and authorizes the user via a SubjectAccessReview. This change also modifies the `linkerd tap` command to make requests against the new APIService. The Tap APIService implements these Kubernetes-style endpoints: POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/tap POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/:res/:name/tap GET /apis GET /apis/tap.linkerd.io GET /apis/tap.linkerd.io/v1alpha1 GET /healthz GET /healthz/log GET /healthz/ping GET /metrics GET /openapi/v2 GET /version Users authorize to the new `tap.linkerd.io/v1alpha1` via RBAC. Only the `watch` verb is supported. Access is also available via subresources such as `deployments/tap` and `pods/tap`. This change introduces the following resources into the default Linkerd install: - Global - APIService/v1alpha1.tap.linkerd.io - ClusterRoleBinding/linkerd-linkerd-tap-auth-delegator - `linkerd` namespace: - Secret/linkerd-tap-tls - `kube-system` namespace: - RoleBinding/linkerd-linkerd-tap-auth-reader Tasks not covered by this PR: - `linkerd top` - `linkerd dashboard` - `linkerd profile --tap` - removal of the unauthenticated tap controller Fixes #2725, #3162, #3172 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2019-08-01 14:02:45 -07:00
Alex Leong	ab7226cbcd	Return invalid argument for external name services (#3120 ) Fixes https://github.com/linkerd/linkerd2/issues/2800#issuecomment-513740498 When the Linkerd proxy sends a query for a Kubernetes external name service to the destination service, the destination service returns `NoEndpoints: exists=false` because an external name service has no endpoints resource. Due to a change in the proxy's fallback logic, this no longer causes the proxy to fallback to either DNS or SO_ORIG_DST and instead fails the request. The net effect is that Linkerd fails all requests to external name services. We change the destination service to instead return `InvalidArgument` for external name services. This causes the proxy to fallback to SO_ORIG_DST instead of failing the request. Signed-off-by: Alex Leong <alex@buoyant.io>	2019-07-29 16:31:22 -07:00
Andrew Seigner	065dd3ec9d	Add "can create cronjobs" to linkerd check (#3133 ) PR #3056 introduced a cluster heartbeat cronjob to the Linkerd installation. This implies the user installing Linkerd requires the privileges to create CronJobs. Update `linkerd check` to validate the user has privileges necessary to create CronJobs. Fixes #3057 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2019-07-26 13:09:41 -07:00
Andrew Seigner	64ed8e4a74	Introduce Cluster Heartbeat cronjob (#3056 ) `linkerd check`, the web dashboard, and Grafana all perform version checks to validate Linkerd is up to date. It's common for users to seldom execute these codepaths. This makes it difficult to identify what versions of Linkerd are currently in use and what environments it is being run in, which helps prioritize testing and backports. Introduce a `heartbeat` CronJob to the default Linkerd install. The cronjob executes every 24 hours, starting from 5 minutes after `linkerd install` is run. Example check URL: https://versioncheck.linkerd.io/version.json? install-time=1562761177& k8s-version=v1.15.0& meshed-pods=8& rps=3& source=heartbeat& uuid=cc4bb700-3314-426a-9f0f-ec588b9df020& version=git-b97ee9f7 Fixes #2961 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2019-07-23 17:12:30 -07:00
Alex Leong	d6ef9ea460	Update ServiceProfile CRD to version v1alpha2 and remove validation (#3078 ) The openAPIV3Schema validation in the ServiceProfiles CRD is very limited in what it can validate and is obviated by more sophisticated validation done by the validating admission controller. Therefore, we would like to remove the openAPIV3Schema validation to reduce the size and complexity of the CRD object. To do so, we must also bump the version of the ServiceProfile custom resource from v1alpha1 to v1alpha2. This ensures that when the controller is upgraded, it will attempt to watch the v1alpha2 resource. If it cannot (because, for example, the controller pod started before the ServiceProfile CRD was updated and therefore the v1alpha2 version does not exist) then it will go into a crash loop backoff until it can. This essentially means that the controller will wait for the CRD to be upgraded to include v1alpha2 before it will start. Bumping the version is necessary because if we did not, it would be possible for the controller to start before the CRD is updated (removing the validation). In this case, when the CRD is edited, the controller will lose its list watch on ServiceProfiles and will stop getting updates. Signed-off-by: Alex Leong <alex@buoyant.io>	2019-07-23 11:46:31 -07:00
Ivan Sim	36681218ba	Revert "Increase the retry duration in the post-upgrade 'check' integration test (#2944 )" (#3091 ) This reverts commit `60c58c1f85`. Signed-off-by: Ivan Sim <ivan@buoyant.io>	2019-07-18 09:25:27 -07:00

1 2 3 4 5 ...

260 Commits