Commit Graph

549 Commits

Author SHA1 Message Date
Zahari Dichev 7f3d872930
Add destination-get-networks option (#4608)
In #4585 we are observing an issue where a loop is encountered when using nginx ingress. The problem is that the outbound proxy does a dst lookup on the IP address which happens to be the very same address the ingress is listening on.

In order to avoid situations like that this PR introduces a way to modify the set of networks for which the proxy shall do IP based discovery. The change introduces a helm flag `.Values.global.proxy.destinationGetNetworks` that can be used to modify this value. There are two ways a user can affect the this setting: 


- setting the `destinationGetNetworks` field in values during a Helm install, which changes the default on all injected pods
- using an annotation ` config.linkerd.io/proxy-destination-get-networks` for injected workloads to override this value

Note that this setting cannot be tweaked through the `install` or `inject` command

Fix: #4585

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-18 20:07:47 +03:00
Alex Leong 755538b84a
Resolve gateway hostnames into IP addresses (#4588)
Fixes #4582 

When a target cluster gateway is exposed as a hostname rather than with a fixed IP address, the service mirror controller fails to create mirror services and gateway mirrors for that gateway.  This is because we only look at the IP field of the gateway service.

We make two changes to address this problem:
 
First, when extracting the gateway spec from a gateway that has a hostname instead of an IP address, we do a DNS lookup to resolve that hostname into an IP address to use in the mirror service endpoints and gateway mirror endpoints.

Second, we schedule a repair job on a regular (1 minute) to update these endpoint objects.  This has the effect of re-resolving the DNS names every minute to pick up any changes in DNS resolution.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-15 10:33:49 -07:00
Zahari Dichev f01bcfe722
Tweak service-mirror log levels (#4562)
This PR just modifies the log levels on the probe and cluster watchers
to emit in INFO what they would emit in DEBUG. I think it makes sense
as we need that information to track problems. The only difference is
that when probing gateways we only log if the probe attempt was
unsuccessful.

Fix #4546
2020-06-05 13:12:36 -07:00
Zahari Dichev 3365455e45
Fix mc labels (#4560)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-05 19:36:09 +03:00
Zahari Dichev b6b95455aa
Fix load balancer missing ip race condition (#4554)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-05 19:35:47 +03:00
Alex Leong cffa07ddba
Update gateway identity on gateway mirror endpoints (#4559)
When the identity annotation on a gateway service is updated, this change is not propagated to the mirror gateway endpoints object.

This is because the annotations are updated on the wrong object and the changes are lost.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-05 09:21:35 -07:00
Alex Leong 0f84ff61db
Update gateway mirror ports (#4551)
* Update gateway mirror spec when remote gateway changes

Signed-off-by: Alex Leong <alex@buoyant.io>

* Only update ports

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-04 17:25:46 +03:00
Kevin Leimkuhler 8a932ac905
Change text to use source/target terminology in events and metrics (#4527)
Change terminology from local/remote to source/target in events and metrics.

This does not change any variable, function, struct, or field names since
testing is still improving

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-06-03 15:02:39 -04:00
Oliver Gould 7cc5e5c646
multicluster: Use the proxy as an HTTP gateway (#4528)
This change modifies the linkerd-gateway component to use the inbound
proxy, rather than nginx, for gateway. This allows us to detect loops and
propagate identity through the gateway.

This change also cleans up port naming to `mc-gateway` and `mc-probe`
to resolve conflicts with Kubernetes validation.

---

* proxy: v2.99.0

The proxy can now operate as gateway, routing requests from its inbound
proxy to the outbound proxy, without passing the requests to a local
application. This supports Linkerd's multicluster feature by adding a
`Forwarded` header to propagate the original client identity and assist
in loop detection.

---

* Add loop detection to inbound & TCP forwarding (linkerd/linkerd2-proxy#527)
* Test loop detection (linkerd/linkerd2-proxy#532)
* fallback: Unwrap errors recursively (linkerd/linkerd2-proxy#534)
* app: Split inbound/outbound constructors into components (linkerd/linkerd2-proxy#533)
* Introduce a gateway between inbound and outbound (linkerd/linkerd2-proxy#540)
* gateway: Add a Forwarded header (linkerd/linkerd2-proxy#544)
* gateway: Return errors instead of responses (linkerd/linkerd2-proxy#547)
* Fail requests that loop through the gateway (linkerd/linkerd2-proxy#545)

* inject: Support config.linkerd.io/enable-gateway

This change introduces a new annotation,
config.linkerd.io/enable-gateway, that, when set, enables the proxy to
act as a gateway, routing all traffic targetting the inbound listener
through the outbound proxy.

This also removes the nginx default listener and gateway port of 4180,
instead using 4143 (the inbound port).

* proxy: v2.100.0

This change modifies the inbound gateway caching so that requests may be
routed to multiple leaves of a traffic split.

---

* inbound: Do not cache gateway services (linkerd/linkerd2-proxy#549)
2020-06-02 19:37:14 -07:00
Kevin Leimkuhler d7f84e6c7b
Change help text to use source/target terminology in service-mirror and healthchecks (#4524)
Change terminology from local/remote to source/target in service-mirror and
healthchecks help text.

This does not change any variable, function, struct, or field names since
testing is still improving

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-06-02 15:21:52 -04:00
Alex Leong 91a067c924
Rename gateway ports (#4526)
* Rename gateway ports

Signed-off-by: Alex Leong <alex@buoyant.io>

* fmt

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-02 09:08:23 +03:00
Kevin Leimkuhler b4804a0bb5
Format fix (#4525)
Fixes CI failures

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-06-01 18:51:00 -04:00
Zahari Dichev 6c3922a7f1
Probe manager simplification (#4510)
There are a few notable things happening in this PR: 

- the probe manager has been decoupled from the cluster_watcher. Now its only responsibility is to watch for mirrored gateways beeing created and to probe them. This means that probes are initiated for all gateways no matter whether there are mirrored services being paired
- the number of paired services is derived from the existing services in the cluster rather than being published as a metric by the prober
- there are no events being exchanged between the cluster watcher and the probe manager

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-01 14:41:29 -07:00
Zahari Dichev f7f70690fb
Fix resync bug + service selection annotations (#4453)
THis PR addresses two problems: 

- when a resync happens (or the mirror controller is restarted) we incorrectly classify the remote gateway as a mirrored service that is not mirrored anymore and we delete it
- when updating services due to a gateway update, we need to select only the services for the particular cluster

The latter fixes #4451
2020-05-21 14:15:13 -07:00
Alex Leong acacf2e023
Add --close-wait-timeout inject flag (#4409)
Depends on https://github.com/linkerd/linkerd2-proxy-init/pull/10

Fixes #4276 

We add a `--close-wait-timeout` inject flag which configures the proxy-init container to run with `privileged: true` and to set `nf_conntrack_tcp_timeout_close_wait`. 

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-05-21 14:14:14 -07:00
Alex Leong 9cd4557644
Properly show the meshed count for non-selector services (#4446)
When viewing the output of `linkerd stat` for services which do not have a selector (such as services created by the service-mirror, for example) the meshed count column shows the total number which exist, even though the service actually selects no pods at all.

We update the StatSummary implementation to account for services which have no selector.

Additionally, we update the logic of the `--unmeshed` flag.  When the `--unmeshed` flag is not set, we typically skip rows for unmeshed resources because those resources would have no stats.  This is not appropriate to do when the `--from` flag is also set because in this case, metrics are not collected on the target resource but are instead collected on the client-side.  This means that stats can be present, even for unmeshed resources and these resources should still be displayed, even if the `--unmeshed` flag is not set.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-05-20 10:08:27 -07:00
Zahari Dichev 31e33d18d3
Enable service mirroring to work in private networks (#4440)
This change creates a gateway proxy for every gateway. This enables the probe worker to leverage the destination service functionality in order to discover the identity of the gateway.

Fix #4411

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-20 19:48:36 +03:00
Zahari Dichev 6574f124a7
Restrict Service mirror RBACs (#4426)
This PR introduces a few changes that were requested after a bit of service mirror reviewing.

- we restrict the RBACs so the service mirror controller cannot read secrets in all namespaces but only in the one that it is installed in
- we unify the namespace namings so all multicluster resources are installedi n `linkerd-multicluster` on both clusters
- fixed checks to account for changes

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-20 17:08:01 +03:00
Zahari Dichev 4176580a0f
Threadsafe buffering listener (#4359)
* Add thread safety to watcher tests

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-14 20:45:41 +03:00
Zahari Dichev 115bab9868
Fix gateway update problems (#4388)
* Fix gateway update problems

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-14 10:59:30 -05:00
Zahari Dichev ef1a2c2b10
Multicluster dashboard for traffic metrics (#4178)
This change adds labels to endpoints that target remote services. It also adds a Grafana dashboard that can be used to monitor multicluster traffic.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-14 17:48:27 +03:00
Zahari Dichev fd59ce532d
Add better logging to service mirror controller (#4361)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-11 10:30:16 +03:00
Zahari Dichev edd9b654a7
Make gateway require TLS for incoming requests (#4339)
Make gateway require TLS for incoming requests

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-11 10:07:48 +03:00
Alex Leong 8fbaa3ef9b
Don't send NoEndpoints during pod updates for ip watches (#4338)
When the proxy has an IP watch on a pod and the destination controller gets a pod update event, the destination controller sends a NoEndpoints message to all listeners followed by an Add with the new pod state.  This can result in the proxy's load balancer being briefly empty and could result in failing requests in the period.  

Since consecutive Add events with the same address will override each other, we can simply send the Adds without needing to clear the previous state with a NoEndpoints message.
2020-05-07 16:10:17 -07:00
Zahari Dichev 4e82ba8878
Multicluster checks (#4279)
Multicluster checks

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-05 10:19:38 +03:00
Zahari Dichev cd04b94bb9
Probe manager events emission tests (#4312)
Probe manager events emission tests

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-05 08:57:05 +03:00
Alex Leong 40b921508f
Inject LINKERD2_PROXY_DESTINATION_GET_NETWORKS proxy variable (#4300)
Fixes #3807

By setting the LINKERD2_PROXY_DESTINATION_GET_NETWORKS environment variable, we configure the Linkerd proxy to do destination lookups for authorities which are IP addresses in the private network range.  This allows us to get destination metadata including identity for HTTP requests which target an IP address in the cluster, Prometheus metrics scrape requests, for example.

This change allowed us to update the "direct edges" test which ensures that the edges command produces correct output for traffic which is addressed directly to a pod IP.

We also re-enabled the "linkerd stat" integration tests which had been disabled while the destination service did not yet support these types of IP queries.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-30 11:22:24 -07:00
Zahari Dichev 5149152ef3
Multicluster gateway and remote setup command (#4265)
Add multicluster gateway and setup command

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-04-29 20:33:23 +03:00
Zahari Dichev 17dacf5548
Add gateways command, allowing the retrieval of gateway stats (#4241)
Add gateways command, allowing the retrieval of gateway stats

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-04-27 13:55:01 +03:00
Zahari Dichev 09262ebd72
Add liveliness checks and metrics for multicluster gateway (#4233)
Add liveliness checks for gateway

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-04-27 13:06:58 +03:00
Tarun Pothulapati 2b1cbc6fc1
charts: Using downwardAPI to mount labels to the proxy container (#4199)
* use downward API to mount labels to the proxy container as a volume
* add namespace as a label to the pod
* add a trace inject test
* add downwardAPi for controlplaneTracing
* add controlPlaneTracing condition to volumeMounts
* update add-ons to have workload-ns
* add workload-ns label to control-plane components

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-04-22 10:33:51 -05:00
Alex Leong 9bf54d36ed
Upgrade to go 1.14.2 (#4278)
Upgrade Linkerd's base docker image to use go 1.14.2 in order to stay modern.

The only code change required was to update a test which was checking the error message of a `crypto/x509.CertificateInvalidError`.  The error message of this error changed between go versions.  We update the test to not check for the specific error string so that this test passes regardless of go version.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-20 17:14:51 -07:00
Alex Leong 5d3862c120
Use /live for liveness probe (#4270)
Fixes #3984

We use the new `/live` admin endpoint in the Linkerd proxy for liveness probes instead of the `/metrics` endpoint.  This endpoint returns a much smaller payload.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-17 14:53:32 -07:00
Kevin Leimkuhler b6aad75b35
Add `operationID` field to tap openapi response (#4245)
This fixes an issue users are experiencing when upgrading from from Linkerd
2.6 to 2.7 and use the [kubernetes-external-secrets]() project.

The change introduced by #3700 resulted in the tap service showing up in the
`/openapi/v2` API response. I confirmed this with a local build.

A dependency within the project expects the `operationID` field to be present
in the swagger definition. It is optional as stated in the
[spec](https://swagger.io/docs/specification/paths-and-operations/). It's
purpose is to identify an operation and should be unique.

This change adds that field to tap service swagger spec. While this can be
fixed in the KES dependency, it certainly does not hurt to add and other
libraries may similarly expect this field.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-04-15 09:41:06 -07:00
Zahari Dichev 26c14d3c66
Detect changes in addresses when getting updates in endpoints watcher (#4104)
Detect changes in addresses when getting updates in endpoints watcher

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-04-10 11:42:39 +03:00
Alex Leong d8eebee4f7
Upgrade to client-go 0.17.4 and smi-sdk-go 0.3.0 (#4221)
Here we upgrade our dependencies on client-go to 0.17.4 and smi-sdk-go to 0.3.0.  Since smi-sdk-go uses client-go 0.17.4, these upgrades must be performed simultaneously.

This also requires simultaneously upgrading our dependency on linkerd/stern to a SHA which also uses client-go 0.17.4.  This keeps all of our transitive dependencies synchronized on one version of client-go.

This ALSO requires updating our codegen scripts to use the 0.17.4 version of code-generator and running it to generate 0.17.4 compatible generated code.  I took this opportunity to update our code generation script to properly use the version of code-generater from `go.mod` rather than a hardcoded SHA.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-01 10:07:23 -07:00
Zahari Dichev 10ecd8889e
Set auth override (#4160)
Set AuthOverride when present on endpoints annotation

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-03-25 10:56:36 +02:00
Mayank Shah 963b9b049a
Add kubectl-style label selectors (#4120)
* Update tap, routes and top commands to support label selectors

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2020-03-20 10:45:06 -05:00
Alejandro Pedraza 8f79e07ee2
Bump proxy-init to v1.3.2 (#4170)
* Bump proxy-init to v1.3.2

Bumped `proxy-init` version to v1.3.2, fixing an issue with `go.mod`
(linkerd/linkerd2-proxy-init#9).
This is a non-user-facing fix.
2020-03-17 14:49:25 -05:00
Kevin Leimkuhler 10db65bcb3
Update linkerd/stern to fix go.mod parsing (#4173)
## Motivation

I noticed the Go language server stopped working in VS Code and narrowed it
down to `go build ./...` failing with the following:

```
❯ go build ./...
go: github.com/linkerd/stern@v0.0.0-20190907020106-201e8ccdff9c: parsing go.mod: go.mod:3: usage: go 1.23
```

This change updates `linkerd/stern` version with changes made in
linkerd/stern#3 to fix this issue.

This does not depend on #4170, but it is also needed in order to completely
fix `go build ./...`
2020-03-17 11:16:18 -07:00
Zahari Dichev 2db307ee91
Remove target port requirement in port resolution (#4174)
This change removes the target port requirement when resolving ports in the dst service. Based on the comments, it seems that we need to have a target port defined in the port spec in order to resolve to the port in the Endpoints. In reality if target port is note defined when creating the service, k8s will set the port and the target port to the same value. Seems to me that checking for the targetPort to be different than 0, is a no-op.

Signed-off-by: Zahari Dichev zaharidichev@gmail.com
2020-03-16 23:04:08 +02:00
Zahari Dichev caf4e61daf
Enable identitiy on endpoints not associated with pods (#4134)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-03-09 20:55:57 +02:00
Zahari Dichev 72fc94b03c
Service mirroring tests (#4115)
Unit tests that exercise most of the code in cluster_watcher.go. Essentially the whole cluster mirroring machinary can be tought of as a function that takes remote cluster state, local cluster state, and modification events and as a result it either modifies local cluster state or issues new events onto the queue. This is what these tests are trying to model. I think this covers a lot of the logic there. Any suggestions for other edge cases are welcome.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-03-04 20:17:21 +02:00
Zahari Dichev edd7fd203d
Service Mirroring Component (#4028)
This PR introduces a service mirroring component that is responsible for watching remote clusters and mirroring their services locally.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-03-02 21:16:08 +02:00
Christy Jacob 8111e54606
Check for extension server certificate (#4062)
* Check Extension api server Authentication
* Added Checks and tests for extension api-server authentication
* Fixed Failing Static Checks
* Updated the golden file

Signed-off-by: Christy Jacob <christyjacob4@gmail.com>
2020-02-28 13:39:02 -08:00
Mayank Shah 3c3a4a5f5d
cli: Add label selector flag for `stat` (#4040)
* Update `linkerd-namespace` shorthand to `L`
* Add --selector (-l) flag for `stat`

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2020-02-17 13:40:07 -05:00
Zahari Dichev 6fa9407318
Ensure we get the correct type out of Informer Deletion events (#4034)
Ensure we get what we expect when receiving DELETE events from the k8s Informer api

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-02-15 10:15:24 +02:00
Alex Leong ec51434eb9
Show traffic split metrics from sources in all namespaces (#3967)
Fixes #3562 

When a pod in one namespace sends traffic to a service which is the apex of a traffic split in another namespace, that traffic is not displayed in the `linkerd stat trafficsplit` output.  This is because when we do a Prometheus query for traffic to the traffic split, we supply a Prometheus label selector to only select traffic sources in the namespace of the traffic split.

Since any pod in any namespace can send traffic to the apex service of a traffic split, we must look at all possible sources of traffic, not just the ones in the same namespace.

Before:

```
$ bin/linkerd stat ts
NAME           APEX     LEAF       WEIGHT   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
webapp-split   webapp   webapp       900m         -     -             -             -             -
webapp-split   webapp   webapp-2     100m         -     -             -             -             -
```

After:

```
$ bin/linkerd stat ts
NAME           APEX     LEAF       WEIGHT   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
webapp-split   webapp   webapp       900m    80.00%   1.4rps          31ms          99ms        2530ms
webapp-split   webapp   webapp-2     100m    60.00%   0.2rps          35ms          93ms          99ms
```

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-02-12 09:21:59 -08:00
Alejandro Pedraza 3ba66f6f9d
Fix flakey TestGetProfiles (#3965)
Fixes #3332

Fixes the very rare test failure
```
--- FAIL: TestGetProfiles (0.33s)
    --- FAIL: TestGetProfiles/Returns_server_profile (0.11s)
            server_test.go:228: Expected 1 or 2 updates but got 3:
            [retry_budget:<retry_ratio:0.2 min_retries_per_second:10
            ttl:<seconds:10 > >  routes:<condition:<path:<regex:"/a/b/c"
            > > metrics_labels:<key:"route" value:"route1" >
            timeout:<seconds:10 > > retry_budget:<retry_ratio:0.2
            min_retries_per_second:10 ttl:<seconds:10 > >
            routes:<condition:<path:<regex:"/a/b/c" > >
            metrics_labels:<key:"route" value:"route1" >
            timeout:<seconds:10 > > retry_budget:<retry_ratio:0.2
            min_retries_per_second:10 ttl:<seconds:10 > > ]
            FAIL
            FAIL  github.com/linkerd/linkerd2/controller/api/destination
            0.624s
```
that occurs when a third unexpected stream update occurs, when the fake
API takes more time to notify its listeners about the resources created.

For all the nasty details check #3332
2020-02-07 19:43:29 -05:00
Dax McDonald 76d3285247
Use correct go module file syntax (#4021)
The correct syntax for the go module file is
go MAJOR.MINOR

Signed-off-by: Dax McDonald <dax@rancher.com>
2020-02-07 07:58:54 -08:00
Alejandro Pedraza afb93cddc8
Use `t.Name()` instead of `t.Name` in tests (#3970)
Use `t.Name()` instead of `t.Name` when retrieving the name of tests.
This was causing an error to be added in the log:
```
output: logrus_error="can not add field \"test\"
```

Followup to
[comment](https://github.com/linkerd/linkerd2/pull/3965#discussion_r370387990)
2020-01-27 09:17:19 -05:00
Kevin Leimkuhler 53baecb382
Changes for edge-20.1.3 (#3966)
## edge-20.1.3

* CLI
  * Introduced `linkerd check --pre --linkerd-cni-enabled`, used when the CNI
    plugin is used, to check it has been properly installed before proceeding
    with the control plane installation
  * Added support for the `--as-group` flag so that users can impersonate
    groups for Kubernetes operations (thanks @mayankshah160!)
* Controller
  * Fixed an issue where an override of the Docker registry was not being
    applied to debug containers (thanks @javaducky!)
  * Added check for the Subject Alternate Name attributes to the API server
    when access restrictions have been enabled (thanks @javaducky!)
  * Added support for arbitrary pod labels so that users can leverage the
    Linkerd provided Prometheus instance to scrape for their own labels
    (thanks @daxmc99!)
  * Fixed an issue with CNI config parsing

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-01-23 16:55:21 -08:00
Zahari Dichev a9d38189fb Fix CNI config parsing (#3953)
This PR addreses the problem introduced after #3766.

Fixes #3941 

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-01-23 09:55:04 -08:00
Mayank Shah 60ac0d5527 Add `as-group` CLI flag (#3952)
Add CLI flag --as-group that can impersonate group for k8s operations

Signed-off-by: Mayank Shah mayankshah1614@gmail.com
2020-01-22 16:38:31 +02:00
Paul Balogh b5e39bcbf7 Utilize Common Name or Subject Alternate Name for access checks (#3459) (#3949)
Subject
Utilize Common Name or Subject Alternate Name for access checks (#3459)

Problem
When access restrictions to API server have been enabled with the requestheader-allowed-names configuration, only the Common Name of the requestor certificate is being checked. This check should include the use of Subject Alternate Name attributes.

Solution
API server will now check the SAN attributes (DNS Names, Email Addresses, IP Addresses, and URIs) when determining accessibility for allowed names.

Fixes issue #3459

Signed-off-by: Paul Balogh <javaducky@gmail.com>
2020-01-22 08:58:19 +02:00
Paul Balogh dabee12b93 Fix issue for debug containers when using custom Docker registry (#3873)
**Subject**
Fixes bug where override of Docker registry was not being applied to debug containers (#3851)

**Problem**
Overrides for Docker registry are not being applied to debug containers and provide no means to correct the image.

**Solution**
This update expands the `data.proxy` configuration section within the Linkerd `ConfigMap` to maintain the overridden image name for debug containers at _install_-time similar to handling of the `proxy` and `proxyInit` images.

This change also enables the further override option of the registry for debug containers at _inject_-time given utilization of the `--registry` CLI option.

**Validation**
Several new unit tests have been created to confirm functionality.  In addition, the following workflows were run through:

### Standard Workflow with Custom Registry
This workflow installs Linkerd control plane based upon a custom registry, then injecting the debug sidecar into a service.

* Start with a k8s instance having no Linkerd installation
* Build all images locally using `bin/docker-build`
* Create custom tags (using same version) for generated images, e.g. `docker tag gcr.io/linkerd-io/debug:git-a4ebecb6 javaducky.com/linkerd-io/debug:git-a4ebecb6`
* Install Linkerd with registry override `bin/linkerd install --registry=javaducky.com/linkerd-io | kubectl apply -f -`
* Once Linkerd has been fully initialized, you should be able to confirm that the `linkerd-config` ConfigMap now contains the debug image name, pull policy, and version within the `data.proxy` section
* Request injection of the debug image into an available container.  I used the Emojivoto voting service as described in https://linkerd.io/2/tasks/using-the-debug-container/ as `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar - | kubectl apply -f -`
* Once the deployment creates a new pod for the service, inspection should show that the container now includes the "linkerd-debug" container name based on the applicable override image seen previously within the ConfigMap
* Debugging can also be verified by viewing debug container logs as `kubectl -n emojivoto logs deploy/voting linkerd-debug -f`
* Modifying the `config.linkerd.io/enable-debug-sidecar` annotation, setting to “false”, should show that the pod will be recreated no longer running the debug container.

### Overriding the Custom Registry Override at Injection
This builds upon the “Standard Workflow with Custom Registry” by overriding the Docker registry utilized for the debug container at the time of injection.

* “Clean” the Emojivoto voting service by removing any Linkerd annotations from the deployment
* Request injection similar to before, except provide the `--registry` option as in `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar --registry=gcr.io/linkerd-io - | kubectl apply -f -`
* Inspection of the deployment config should now show the override annotation for `config.linkerd.io/debug-image` having the debug container from the new registry.  Viewing the running pod should show that the `linkerd-debug` container was injected and running the correct image.  Of note, the proxy and proxy-init images are still running the “original” override images.
* As before, modifying the `config.linkerd.io/enable-debug-sidecar` annotation setting to “false”, should show that the pod will be recreated no longer running the debug container.

### Standard Workflow with Default Registry
This workflow is the typical workflow which utilizes the standard Linkerd image registry.

* Uninstall the Linkerd control plane using `bin/linkerd install --ignore-cluster | kubectl delete -f -` as described at https://linkerd.io/2/tasks/uninstall/
* Clean the Emojivoto environment using `curl -sL https://run.linkerd.io/emojivoto.yml | kubectl delete -f -` then reinstall using `curl -sL https://run.linkerd.io/emojivoto.yml | kubectl apply -f -`
* Perform standard Linkerd installation as `bin/linkerd install | kubectl apply -f -`
* Once Linkerd has been fully initialized, you should be able to confirm that the `linkerd-config` ConfigMap references the default debug image of `gcr.io/linkerd-io/debug` within the `data.proxy` section
* Request injection of the debug image into an available container as `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar - | kubectl apply -f -`
* Debugging can also be verified by viewing debug container logs as `kubectl -n emojivoto logs deploy/voting linkerd-debug -f`
* Modifying the `config.linkerd.io/enable-debug-sidecar` annotation, setting to “false”, should show that the pod will be recreated no longer running the debug container.

### Overriding the Default Registry at Injection
This workflow builds upon the “Standard Workflow with Default Registry” by overriding the Docker registry utilized for the debug container at the time of injection.

* “Clean” the Emojivoto voting service by removing any Linkerd annotations from the deployment
* Request injection similar to before, except provide the `--registry` option as in `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar --registry=javaducky.com/linkerd-io - | kubectl apply -f -`
* Inspection of the deployment config should now show the override annotation for `config.linkerd.io/debug-image` having the debug container from the new registry.  Viewing the running pod should show that the `linkerd-debug` container was injected and running the correct image.  Of note, the proxy and proxy-init images are still running the “original” override images.
* As before, modifying the `config.linkerd.io/enable-debug-sidecar` annotation setting to “false”, should show that the pod will be recreated no longer running the debug container.

Fixes issue #3851 

Signed-off-by: Paul Balogh javaducky@gmail.com
2020-01-17 10:18:03 -08:00
Mayank Shah b94e03a8a6 Remove empty fields from generated configs (#3886)
Fixes
- https://github.com/linkerd/linkerd2/issues/2962
- https://github.com/linkerd/linkerd2/issues/2545

### Problem
Field omissions for workload objects are not respected while marshaling to JSON.

### Solution
After digging a bit into the code, I came to realize that while marshaling, workload objects have empty structs as values for various fields which would rather be omitted. As of now, the standard library`encoding/json` does not support zero values of structs with the `omitemty` tag. The relevant issue can be found [here](https://github.com/golang/go/issues/11939). To tackle this problem, the object declaration should have _pointer-to-struct_ as a field type instead of _struct_ itself. However, this approach would be out of scope as the workload object declaration is handled by the k8s library.

I was able to find a drop-in replacement for the `encoding/json` library which supports zero value of structs with the `omitempty` tag. It can be found [here](https://github.com/clarketm/json). I have made use of this library to implement a simple filter like functionality to remove empty tags once a YAML with empty tags is generated, hence leaving the previously existing methods unaffected

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2020-01-13 10:02:24 -08:00
Alex Leong 93a81dce97
Change default proxy log level to "warn,linkerd=info" (#3908)
Fixes #3901 

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-01-09 14:22:06 -08:00
Paul Balogh 2cd2ecfa30 Enable mixed configuration of skip-[inbound|outbound]-ports (#3766)
* Enable mixed configuration of skip-[inbound|outbound]-ports using port numbers and ranges (#3752)
* included tests for generated output given proxy-ignore configuration options
* renamed "validate" method to "parseAndValidate" given mutation
* updated documentation to denote inclusiveness of ranges
* Updates for expansion of ignored inbound and outbound port ranges to be handled by the proxy-init rather than CLI (#3766)

This change maintains the configured ports and ranges as strings rather than unsigned integers, while still providing validation at the command layer.

* Bump versions for proxy-init to v1.3.0

Signed-off-by: Paul Balogh <javaducky@gmail.com>
2019-12-20 09:32:13 -05:00
Alex Leong 03762cc526
Support pod ip and service cluster ip lookups in the destination service (#3595)
Fixes #3444 
Fixes #3443 

## Background and Behavior

This change adds support for the destination service to resolve Get requests which contain a service clusterIP or pod ip as the `Path` parameter.  It returns the stream of endpoints, just as if `Get` had been called with the service's authority.  This lays the groundwork for allowing the proxy to TLS TCP connections by allowing the proxy to do destination lookups for the SO_ORIG_DST of tcp connections.  When that ip address corresponds to a service cluster ip or pod ip, the destination service will return the endpoints stream, including the pod metadata required to establish identity.

Prior to this change, attempting to look up an ip address in the destination service would result in a `InvalidArgument` error.

Updating the `GetProfile` method to support ip address lookups is out of scope and attempts to look up an ip address with the `GetProfile` method will result in `InvalidArgument`.

## Implementation

We do this by creating a `IPWatcher` which wraps the `EndpointsWatcher` and supports lookups by ip.   `IPWatcher` maintains a mapping up clusterIPs to service ids and translates subscriptions to an IP address into a subscription to the service id using the underlying `EndpointsWatcher`.

Since the service name is no longer always infer-able directly from the input parameters, we restructure `EndpointTranslator` and `PodSet` so that we propagate the service name from the endpoints API response.

## Testing

This can be tested by running the destination service locally, using the current kube context to connect to a Kubernetes cluster:

```
go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```

Then lookups can be issued using the destination client:

```
go run controller/script/destination-client/main.go -path 192.168.54.78:80 -method get -addr localhost:8086
```

Service cluster ips and pod ips can be used as the `path` argument.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-12-19 09:25:12 -08:00
Sergio C. Arteaga a1141fc507 Cache StatSummary responses in dashboard web server (#3769)
Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com>
2019-12-17 09:15:00 -05:00
Dax McDonald 3088f404ce Upgrade prometheus to v1.2.1 (#3541)
Signed-off-by: Dax McDonald <dax@rancher.com>
2019-12-11 15:26:16 -08:00
Sergio C. Arteaga cee8e3d0ae Add CronJobs and ReplicaSets to dashboard and CLI (#3687)
This PR adds support for CronJobs and ReplicaSets to `linkerd inject`, the web
dashboard and CLI. It adds a new Grafana dashboard for each kind of resource. 

Closes #3614 
Closes #3630 
Closes #3584 
Closes #3585

Signed-off-by: Sergio Castaño Arteaga tegioz@icloud.com
Signed-off-by: Cintia Sanchez Garcia cynthiasg@icloud.com
2019-12-11 10:02:37 -08:00
Zahari Dichev e5f75a8c3d
Add validation to ensure stat time window is at least 15s (#3720)
* Add stat time window minimum of 10s

Signed-off-by: zaharidichev <zaharidichev@gmail.com>

* Address comments

Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-12-04 08:12:01 +02:00
Alejandro Pedraza cf9fa0a8c9
Removed calico logutils dependency, incompatible with go 1.13 (#3763)
* Removed calico logutils dependency, incompatible with go 1.13

Fixes #1153

Removed dependency on
`github.com/projectcalico/libcalico-go/lib/logutils` because it has
problems with go modules, as described in
projectcalico/libcalico-go#1153

Not a big deal since it was only used for modifying the plugin's log
format.
2019-11-29 09:19:11 -05:00
Zahari Dichev b83c3a2137
Add Responses to path items to satisfy kube apiserver (#3700)
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-11-15 20:41:50 +02:00
Alex Leong 0026103362 Unit and integration test fixups (#3730)
- Added cleanup step at the end of all integration tests.
- Disable external_issuer_integration_tests in cloud_tests due to
  namespace issue. Running this via `kind` tests is sufficient for now.
- Set a flakey test to `Skip`, relates to #3332.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-11-15 03:40:42 -08:00
Alejandro Pedraza 4b6254b52e
Replaced `uuid` with `uid` from linkerd-config resource (#3694)
* Replaced `uuid` with `uid` from linkerd-config resource

Fixes #3621

Removed the old `uuid` for identifying linkerd installations, and
replaced it with the `uid` property from the `linkerd-config` ConfigMap.

I tested that this `uid` remains the same by updating the config and
also upgrading linkerd, using both the CLI and Helm.

Note that this required granting `linkerd-web` RBAC access to the
`linkerd-config` Config.

I also added an integration test to verify the stability of the uid.
2019-11-13 13:56:01 -05:00
Alejandro Pedraza 3324966702
Upgrade go to 1.13.4 (#3702)
Fixes #3566

As explained in #3566, as of go 1.13 there's a strict check that ensures a dependency's timestamp matches it's sha (as declared in go.mod). Our smi-sdk dependency has a problem with that that got resolved later on, but more work would be required to upgrade that dependency. In the meantime a quick pair of replace statements at the bottom of go.mod fix the issue.
2019-11-13 12:54:36 -05:00
Tarun Pothulapati f18e27b115 use appsv1 api in identity (#3682)
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-11-06 15:06:09 -08:00
Alejandro Pedraza 0e8958cd07
Fixed bad identity string for target pod in tap (#3675)
* Fixed bad identity string for target pod in tap

Fixes #3506

Was using the cluster domain instead of the trust domain, which results
in an error when those domains differ.
2019-11-05 15:57:41 -05:00
Alejandro Pedraza 8cf4494e78
Add proxy-injector-injections count to heartbeat (#3655)
Fixes #3059
2019-10-31 11:09:00 -05:00
Alejandro Pedraza d3d8266c63
If tap source IP matches many running pods then only show the IP (#3513)
* If tap source IP matches many running pods then only show the IP

When an unmeshed source ip matched more than one running pod, tap was
showing the names for all those pods, even though the didn't necessary
originate the connection. This could be reproduced when using pod
network add-on such as Calico.

With this change, if a node matches, return it, otherwise we proceed to look for a matching pod. If exactly one running pod matches we return it. Otherwise we return just the IP.

Fixes #3103
2019-10-25 12:38:11 -05:00
Zahari Dichev 0017f9a60a Cert manager support (#3600)
* Add support for --identity-issuer-mode flag to install cmd
* Change flag to be a bool
* Read correct data form identity when external issuer is used
* Add ability for identity service to dynamically reload certs
* Fix failing tests
* Minor refactor
* Load trust anchors from identity issuer secret
* Make identity service actually watch for issuer certs updates
* Add some testing around cmd line identity options validation
* Add tests ensuring that identity service loads issuer
* Take into account external-issuer flag during upgrade + tests
* Fix failing upgrade test
* Address initial review feedback
* Address further review feedback on cli and helm
* Do not persist --identity-external-issuer
* Some improvements to identitiy service
* Bring back persistane of external issuer flag
* Address more feedback
* Update dockerfiles shas
* Publishing k8s events on issuer certs rotation
* Ensure --ignore-cluster+external issuer is not supported
* Update go-deps shas
* Transition to identity issuer scheme based configuration
* Use k8s consts for secret file names

Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-10-24 13:15:14 -07:00
Andrew Seigner 0f9ea553d2 Add APIService fake clientset support (#3569)
The `linkerd upgrade --from-manifests` command supports reading the
manifest output via `linkerd install`. PR #3167 introduced a tap
APIService object into `linkerd install`, but the manifest-reading code
in fake.go was never updated to support this new object kind.

Update the fake clientset code to support APIService objects.

Fixes #3559

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-10-21 12:12:19 -07:00
Tarun Pothulapati f3deee01b6 Trace Control plane Components with OC (#3495)
* add trace flags and initialisation
* add ocgrpc handler to newgrpc
* add ochttp handler to linkerd web
* add flags to linkerd web
* add ochttp handler to prometheus handler initialisation
* add ochttp clients for components
* add span for prometheus query
* update godep sha
* fix reviews
* better commenting
* add err checking
* remove sampling
* add check in main
* move to pkg/trace

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-10-18 12:19:13 -07:00
Alex Leong 3dcff52b9f
Switch from using golangci fmt to using goimports (#3555)
CI currently enforcing formatting rules by using the fmt linter of golang-ci-lint which is invoked from the bin/lint script.  However it doesn't seem possible to use golang-ci-lint as a formatter, only as a linter which checks formatting.  This means any formatter used by your IDE or invoked manually may or may not use the same formatting rules as golang-ci-lint depending on which formatter you use and which specific revision of that formatter you use.  

In this change we stop using golang-ci-lint for format checking.  We introduce `tools.go` and add goimports to the `go.mod` and `go.sum` files.  This allows everyone to easily get the same revision of goimports by running `go install -mod=readonly golang.org/x/tools/cmd/goimports` from inside of the project.  We add a step in the CI workflow that uses goimports via the `bin/fmt` script to check formatting.

Some shell gymnastics were required in the `bin/fmt` script to work around some limitations of `goimports`:
* goimports does not have a built-in mechanism for excluding directories, and we need to exclude the vendor director as well as the generated Go sources
* goimports returns a 0 exit code, even when formatting errors are detected

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-10-16 13:56:11 -07:00
Johannes Hansen f880e71fcd The linkerd proxy does not work with headless services (#3470)
* The linkerd proxy does not work with headless services (i.e. endpoints not referencing a pod).

Changed endpoints_watcher to also return endpoints with no targetref.

Fixes #3308

Signed-off-by: Johannes Hansen <johannesh1980@gmail.com>

* Fix panic in endpoint_translator

Signed-off-by: Johannes Hansen <johannesh1980@gmail.com>
2019-10-15 14:56:41 -07:00
Alex Leong ef54d18bb7
Fallback to defaults when config cannot be loaded (#3530)
When running the destination controller locally, the Linkerd config files which are typically mounted from a configmap are not available.  To facilitate local development, we fall back to default values in this case instead of failing to start up.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-10-15 14:47:42 -07:00
Guangming Wang c59e9cf500 move t.Fatalf out of goroutine in server_test.go (#3490)
Subject
t.Fataf should not be called in goroutine

Problem

Solution
move t.Fatalf into testing func instead of its goroutine

Validation
unit test passed on my env

Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-10-15 14:22:10 -07:00
Alejandro Pedraza 3de35ccc58
Remove Discovery service leftovers (#3500)
Followup to #2990, which refactored `linkerd endpoints` to use the
`Destination.Get` API instead of the `Discovery.Endpoints` API, leaving
the Discovery with no implented methods. This PR removes all the Discovery
code leftovers.

Fixes #3499
2019-10-15 11:20:21 -05:00
Ivan Sim cf69dedf9c
Re-add the destination container to the controller spec (#3540)
* Re-add the destination container to the controller spec

This fix is necessary to avoid data plane downtime during an upgrade to
stable-2.6. All existing older proxies will continue to send requests to
this destination container, until the data plane is restarted.

On restart, the new pods will start forwarding their requests to the new
linkerd-dst service.

* Use the 2.6 destination service fqdn
* Fixed unit tests
* Fix integration test failure

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-10-08 10:49:40 -07:00
Kevin Leimkuhler a3a240e0ef
Add TapEvent headers and trailers to the tap protobuf (#3410)
### Motivation

In order to expose arbitrary headers through tap, headers and trailers should be
read from the linkerd2-proxy-api `TapEvent`s and set in the public `TapEvent`s.
This change should have no user facing changes as it just prepares the events
for JSON output in linkerd/linkerd2#3390

### Solution

The public API has been updated with a headers field for
`TapEvent_Http_RequestInit_` and `TapEvent_Http_ResponseInit_`, and trailers
field for `TapEvent_Http_ResponseEnd_`.

These values are set by reading the corresponding fields off of the proxy's tap
events.

The proto changes are equivalent to the proto changes proposed in
linkerd/linkerd2-proxy-api#33

Closes #3262

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-09-29 09:54:37 -07:00
Ivan Sim 9f21c8b481
Introduce Tracing Annotations (#3481)
* Add the tracing environment variables to the proxy spec
* Add tracing event
* Remove unnecessary CLI change
* Update log message
* Handle single segment service name
* Use default service account if not provided

The injector doesn't read the defaults from the values.yaml

* Remove references to conf.workload.ownerRef in log messages

This nested field isn't always set.

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-09-26 16:07:30 -07:00
Alex Leong 4799baa8e2
Revert "Trace Control Plane components using OC (#3461)" (#3484)
This reverts commit edd3b1f6d4.

This is a temporary revert of #3461 while we sort out some details of how this should configured and how it should interact with configuring a trace collector on the Linkerd proxy.  We will reintroduce this change once the config plan is straightened out.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-09-26 11:56:44 -07:00
Tarun Pothulapati edd3b1f6d4 Trace Control Plane components using OC (#3461)
* add exporter config for all components

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add cmd flags wrt tracing

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add ochttp tracing to web server

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add flags to the tap deployment

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add trace flags to install and upgrade command

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add linkerd prefix to svc names

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add ochttp trasport to API Internal Client

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix goimport linting errors

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add ochttp handler to tap http server

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* review and fix tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update test values

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* use common template

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* use Initialize

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix sample flag

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add verbose info reg flags

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-09-26 08:11:48 -07:00
Tarun Pothulapati 139c64132d Make Identity use GRPC Server with Prom Metrics (#3457)
* make identity use grpc server with prom metrics

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* linting fix

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-09-23 08:17:41 -07:00
Tarun Pothulapati 49d39e5a12 Instrumenting Proxy-Injector (#3354)
* add proxy injection prometheus counters

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* formatted injection reasons

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update proxy injection report tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* keep the structure, and add global ownerKind

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* increase request count, when owner is nil

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add readable reasons using map

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix linting issues

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add proxy config override annotations as labels

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* remove space for machine reasons

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* use correct proxy image override annotation

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add annotation_at label to prom metrics

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* refactor disablebyannotation function

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-09-20 09:46:57 -07:00
Kevin Leimkuhler c62c90870e
Add JSON output to tap command (#3434)
Replaces #3411 

### Motivation

It is a little tough to filter/read the current tap output. As headers are being
added to tap, the output is starting to get difficult to consume. Take a peek at
#3262 for an example. It would be nice to have some more machine readable output
that can be sliced and diced with tools such as jq.

### Solution

A new output option has been added to the `linkerd tap` command that returns the
JSON encoding of tap events.

The default output is line oriented; `-o wide` appends the request's target
resource type to the tap line oriented tap events.

In order display certain values in a more human readable form, a tap event
display struct has been introduced. This struct maps public API `TapEvent`s
directly to a private `tapEvent`. This struct offers a flatter JSON structure
than the protobuf JSON rendering. It also can format certain field--such as
addresses--better than the JSON protobuf marshaler.

Closes #3390

**Default**:
```
➜  linkerd2 git:(kleimkuhler/tap-json-output) linkerd -n linkerd tap deploy/linkerd-web
req id=5:0 proxy=in  src=10.1.6.146:36976 dst=10.1.6.148:9994 tls=not_provided_by_remote :method=GET :authority=10.1.6.148:9994 :path=/metrics
rsp id=5:0 proxy=in  src=10.1.6.146:36976 dst=10.1.6.148:9994 tls=not_provided_by_remote :status=200 latency=3366µs
end id=5:0 proxy=in  src=10.1.6.146:36976 dst=10.1.6.148:9994 tls=not_provided_by_remote duration=132µs response-length=1505B
```

**Wide**:
```
➜  linkerd2 git:(kleimkuhler/tap-json-output) linkerd -n linkerd tap deploy/linkerd-web -o wide
req id=6:0 proxy=in  src=10.1.0.1:35394 dst=10.1.6.148:9994 tls=not_provided_by_remote :method=GET :authority=10.1.6.148:9994 :path=/ping dst_res=deploy/linkerd-web dst_ns=linkerd
rsp id=6:0 proxy=in  src=10.1.0.1:35394 dst=10.1.6.148:9994 tls=not_provided_by_remote :status=200 latency=1442µs dst_res=deploy/linkerd-web dst_ns=linkerd
end id=6:0 proxy=in  src=10.1.0.1:35394 dst=10.1.6.148:9994 tls=not_provided_by_remote duration=88µs response-length=5B dst_res=deploy/linkerd-web dst_ns=linkerd
```

**JSON**:
*Edit: Flattened `Method` and `Scheme` formatting*
```
{
  "source": {
    "ip": "10.138.0.28",
    "port": 47078,
    "metadata": {
      "daemonset": "ip-masq-agent",
      "namespace": "kube-system",
      "pod": "ip-masq-agent-4d5s9",
      "serviceaccount": "ip-masq-agent",
      "tls": "not_provided_by_remote"
    }
  },
  "destination": {
    "ip": "10.60.1.49",
    "port": 9994,
    "metadata": {
      "control_plane_ns": "linkerd",
      "deployment": "linkerd-web",
      "namespace": "linkerd",
      "pod": "linkerd-web-6988999458-c6wpw",
      "pod_template_hash": "6988999458",
      "serviceaccount": "linkerd-web"
    }
  },
  "routeMeta": null,
  "proxyDirection": "INBOUND",
  "requestInitEvent": {
    "id": {
      "base": 0,
      "stream": 0
    },
    "method": "GET",
    "scheme": "",
    "authority": "10.60.1.49:9994",
    "path": "/ready"
  }
}
{
  "source": {
    "ip": "10.138.0.28",
    "port": 47078,
    "metadata": {
      "daemonset": "calico-node",
      "namespace": "kube-system",
      "pod": "calico-node-bbrjq",
      "serviceaccount": "calico-sa",
      "tls": "not_provided_by_remote"
    }
  },
  "destination": {
    "ip": "10.60.1.49",
    "port": 9994,
    "metadata": {
      "control_plane_ns": "linkerd",
      "deployment": "linkerd-web",
      "namespace": "linkerd",
      "pod": "linkerd-web-6988999458-c6wpw",
      "pod_template_hash": "6988999458",
      "serviceaccount": "linkerd-web"
    }
  },
  "routeMeta": null,
  "proxyDirection": "INBOUND",
  "responseInitEvent": {
    "id": {
      "base": 0,
      "stream": 0
    },
    "sinceRequestInit": {
      "nanos": 644820
    },
    "httpStatus": 200
  }
}
{
  "source": {
    "ip": "10.138.0.28",
    "port": 47078,
    "metadata": {
      "deployment": "calico-typha",
      "namespace": "kube-system",
      "pod": "calico-typha-59cb487c49-8247r",
      "pod_template_hash": "59cb487c49",
      "serviceaccount": "calico-sa",
      "tls": "not_provided_by_remote"
    }
  },
  "destination": {
    "ip": "10.60.1.49",
    "port": 9994,
    "metadata": {
      "control_plane_ns": "linkerd",
      "deployment": "linkerd-web",
      "namespace": "linkerd",
      "pod": "linkerd-web-6988999458-c6wpw",
      "pod_template_hash": "6988999458",
      "serviceaccount": "linkerd-web"
    }
  },
  "routeMeta": null,
  "proxyDirection": "INBOUND",
  "responseEndEvent": {
    "id": {
      "base": 0,
      "stream": 0
    },
    "sinceRequestInit": {
      "nanos": 790898
    },
    "sinceResponseInit": {
      "nanos": 146078
    },
    "responseBytes": 3,
    "grpcStatusCode": 0
  }
}
```

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-09-19 09:34:49 -07:00
Alejandro Pedraza 30ecddb965
Fix injector timeout under high load (#3442)
* Fix injector timeout under high load

Fixes #3358

When retrieving a pod owner, we were hitting the k8s API directly because
at injection time the informer might not have been informed about the
existence of the parent object.
Under a large number of injection requests this ended up in the k8s API requests
being throttled, the proxy-injector getting blocked and the webhook requests
timing out.

Now we'll hit the shared informer first, and hit the k8s API only when
the informer doesn't return anything. After a few injection requests for
the same owner, the informer should have been updated.

Testing:

Scaling an emoji deployment to 1000 replicas, and after waiting for a
couple of minutes:

Before:
```bash
# a portion of the pods doesn't get injected
$ kubectl-n emojivoto get po | grep ./1 | wc -l
109

kubectl -n kube-system logs -f kube-apiserver-minikube | grep
failing.*timeout
.... (lots of errors)
```

After:
```bash
# all the pods get injected
$ kubectl -n emojivoto get po | grep ./1 | wc -l
0

kubectl -n kube-system logs -f kube-apiserver-minikube | grep
failing.*timeout
```
2019-09-18 17:58:38 -05:00
Andrew Seigner c5a85e587c
Update to client-go v12.0.0, forked stern (#3387)
The repo depended on an old version of client-go. It also depended on
stern, which itself depended on an old version of client-go, making
client-go upgrade non-trivial.

Update the repo to client-go v12.0.0, and also replace stern with a
fork.

This fork of stern includes the following changes:
- updated to use Go Modules
- updated to use client-go v12.0.0
- fixed log line interleaving:
  - https://github.com/wercker/stern/issues/96
  - based on:
    - 8723308e46

Fixes #3382

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-10 11:04:29 -07:00
Andrew Seigner 7f59caa7fc
Bump proxy-init to 1.2.0 (#3397)
Pulls in latest proxy-init:
https://github.com/linkerd/linkerd2-proxy-init/releases/tag/v1.2.0

This also bumps a dependency on cobra, which provides more complete zsh
completion.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-09 09:06:14 -07:00
Andrew Seigner d773a47dd3
Shrink controller Docker image from 315MB to 38MB (#3378)
The controller Docker image included 7 Go binaries (destination,
heartbeat, identity, proxy-injector, public-api, sp-validator, tap),
each roughly 35MB, with similar dependencies.

Change each controller binary into subcommands of a single `controller`
binary, decreasing the controller Docker image size from 315MB to 38MB.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-05 11:44:03 -07:00
Bruno M. Custódio 8fec756395 Add '--address' flag to 'linkerd dashboard'. (#3274)
Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
2019-09-05 10:56:10 -07:00
Alejandro Pedraza acbab93ca8
Add support for k8s 1.16 (#3364)
Fixes #3356

1.16 removes some api groups that were already deprecated. From k8s blog
post (https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/):

```
- PodSecurityPolicy: will no longer be served from extensions/v1beta1 in
v1.16.
    Migrate to the policy/v1beta1 API, available since v1.10. Existing
    persisted data can be retrieved/updated via the policy/v1beta1 API.
- DaemonSet, Deployment, StatefulSet, and ReplicaSet: will no longer be
served from extensions/v1beta1, apps/v1beta1, or apps/v1beta2 in v1.16.
    Migrate to the apps/v1 API, available since v1.9. Existing persisted
    data can be retrieved/updated via the apps/v1 API.
```

Previous PRs had already made this change at the Helm templates level,
but we still needed to do it at the API calls and tests.

The integration tests ran fine for k8s 1.12 and 1.15. They fail on 1.16
because the upgrade integration test tries to install linkerd 2.5 which is not
compatible with 1.16.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-09-04 09:59:55 -05:00
Andrew Seigner 90c547576d
Remove broken thrift dependency (#3370)
The repo depended on a (recently broken) thrift package:

```
github.com/linkerd/linkerd2
 -> contrib.go.opencensus.io/exporter/ocagent@v0.2.0
  -> go.opencensus.io@v0.17.0
   -> git.apache.org/thrift.git@v0.0.0-20180902110319-2566ecd5d999
```
... via this line in `controller/k8s`:

```go
_ "k8s.io/client-go/plugin/pkg/client/auth"
```

...which created a dependency on go.opencensus.io:

```bash
$ go mod why go.opencensus.io
...
github.com/linkerd/linkerd2/controller/k8s
k8s.io/client-go/plugin/pkg/client/auth
k8s.io/client-go/plugin/pkg/client/auth/azure
github.com/Azure/go-autorest/autorest
github.com/Azure/go-autorest/tracing
contrib.go.opencensus.io/exporter/ocagent
go.opencensus.io
```

Bump contrib.go.opencensus.io/exporter/ocagent from `v0.2.0` to
`v0.6.0`, creating this new dependency chain:

```
github.com/linkerd/linkerd2
 -> contrib.go.opencensus.io/exporter/ocagent@v0.6.0
  -> google.golang.org/api@v0.7.0
   -> go.opencensus.io@v0.21.0
```

Bumping our go.opencensus.io dependency from `v0.17.0` to `v0.21.0`
pulls in this commit:
ed3a3f0bf0 (diff-37aff102a57d3d7b797f152915a6dc16)

...which removes our dependency on github.com/apache/thrift

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-03 16:22:43 -07:00
陈谭军 e281fb3410 fix-up grammar (#3351)
Signed-off-by: chentanjun <2799194073@qq.com>
2019-08-30 08:09:36 -07:00
Alejandro Pedraza fd248d3755
Undo refactoring from #3316 (#3331)
Thus fixing `linkerd edges` and the dashboard topology graph

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-29 13:37:54 -05:00
Alejandro Pedraza 5d7499dc84
Avoid the dashboard requesting stats when not needed (#3338)
* Avoid the dashboard requesting stats when not needed

Create an alternative to `urlsForResource` called
`urlsForResourceNoStats` that makes use of the `skip_stats` parameter in
the stats API (created in #1871) that doesn't query Prometheus when not needed.

When testing using the dashboard looking at the linkerd namespace,
queries per second went down from 2874 to 2756, a 4% decrease.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-29 05:52:44 -05:00
Alejandro Pedraza 368d16f23c
Fix auto-injecting pods and integration tests reporting (#3335)
* Fix auto-injecting pods and integration tests reporting

When creating an Event when auto-injection occurs (#3316) we try to
fetch the parent object to associate the event to it. If the parent
doesn't exist (like in the case of stand-alone pods) the event isn't
created. I had missed dealing with one part where that parent was
expected.

This also adds a new integration test that I verified fails before this
fix.

Finally, I removed from `_test-run.sh` some `|| exit_code=$?` that was
preventing the whole suite to report failure whenever one of the tests
in `/tests` failed.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-28 15:04:20 -05:00