Commit Graph

419 Commits

Author SHA1 Message Date
Alejandro Pedraza 3de35ccc58
Remove Discovery service leftovers (#3500)
Followup to #2990, which refactored `linkerd endpoints` to use the
`Destination.Get` API instead of the `Discovery.Endpoints` API, leaving
the Discovery with no implented methods. This PR removes all the Discovery
code leftovers.

Fixes #3499
2019-10-15 11:20:21 -05:00
Ivan Sim cf69dedf9c
Re-add the destination container to the controller spec (#3540)
* Re-add the destination container to the controller spec

This fix is necessary to avoid data plane downtime during an upgrade to
stable-2.6. All existing older proxies will continue to send requests to
this destination container, until the data plane is restarted.

On restart, the new pods will start forwarding their requests to the new
linkerd-dst service.

* Use the 2.6 destination service fqdn
* Fixed unit tests
* Fix integration test failure

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-10-08 10:49:40 -07:00
Kevin Leimkuhler a3a240e0ef
Add TapEvent headers and trailers to the tap protobuf (#3410)
### Motivation

In order to expose arbitrary headers through tap, headers and trailers should be
read from the linkerd2-proxy-api `TapEvent`s and set in the public `TapEvent`s.
This change should have no user facing changes as it just prepares the events
for JSON output in linkerd/linkerd2#3390

### Solution

The public API has been updated with a headers field for
`TapEvent_Http_RequestInit_` and `TapEvent_Http_ResponseInit_`, and trailers
field for `TapEvent_Http_ResponseEnd_`.

These values are set by reading the corresponding fields off of the proxy's tap
events.

The proto changes are equivalent to the proto changes proposed in
linkerd/linkerd2-proxy-api#33

Closes #3262

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-09-29 09:54:37 -07:00
Ivan Sim 9f21c8b481
Introduce Tracing Annotations (#3481)
* Add the tracing environment variables to the proxy spec
* Add tracing event
* Remove unnecessary CLI change
* Update log message
* Handle single segment service name
* Use default service account if not provided

The injector doesn't read the defaults from the values.yaml

* Remove references to conf.workload.ownerRef in log messages

This nested field isn't always set.

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-09-26 16:07:30 -07:00
Alex Leong 4799baa8e2
Revert "Trace Control Plane components using OC (#3461)" (#3484)
This reverts commit edd3b1f6d4.

This is a temporary revert of #3461 while we sort out some details of how this should configured and how it should interact with configuring a trace collector on the Linkerd proxy.  We will reintroduce this change once the config plan is straightened out.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-09-26 11:56:44 -07:00
Tarun Pothulapati edd3b1f6d4 Trace Control Plane components using OC (#3461)
* add exporter config for all components

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add cmd flags wrt tracing

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add ochttp tracing to web server

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add flags to the tap deployment

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add trace flags to install and upgrade command

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add linkerd prefix to svc names

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add ochttp trasport to API Internal Client

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix goimport linting errors

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add ochttp handler to tap http server

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* review and fix tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update test values

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* use common template

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* use Initialize

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix sample flag

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add verbose info reg flags

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-09-26 08:11:48 -07:00
Tarun Pothulapati 139c64132d Make Identity use GRPC Server with Prom Metrics (#3457)
* make identity use grpc server with prom metrics

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* linting fix

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-09-23 08:17:41 -07:00
Tarun Pothulapati 49d39e5a12 Instrumenting Proxy-Injector (#3354)
* add proxy injection prometheus counters

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* formatted injection reasons

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* update proxy injection report tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* keep the structure, and add global ownerKind

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* increase request count, when owner is nil

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add readable reasons using map

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* fix linting issues

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add proxy config override annotations as labels

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* remove space for machine reasons

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* use correct proxy image override annotation

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* add annotation_at label to prom metrics

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* refactor disablebyannotation function

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-09-20 09:46:57 -07:00
Kevin Leimkuhler c62c90870e
Add JSON output to tap command (#3434)
Replaces #3411 

### Motivation

It is a little tough to filter/read the current tap output. As headers are being
added to tap, the output is starting to get difficult to consume. Take a peek at
#3262 for an example. It would be nice to have some more machine readable output
that can be sliced and diced with tools such as jq.

### Solution

A new output option has been added to the `linkerd tap` command that returns the
JSON encoding of tap events.

The default output is line oriented; `-o wide` appends the request's target
resource type to the tap line oriented tap events.

In order display certain values in a more human readable form, a tap event
display struct has been introduced. This struct maps public API `TapEvent`s
directly to a private `tapEvent`. This struct offers a flatter JSON structure
than the protobuf JSON rendering. It also can format certain field--such as
addresses--better than the JSON protobuf marshaler.

Closes #3390

**Default**:
```
➜  linkerd2 git:(kleimkuhler/tap-json-output) linkerd -n linkerd tap deploy/linkerd-web
req id=5:0 proxy=in  src=10.1.6.146:36976 dst=10.1.6.148:9994 tls=not_provided_by_remote :method=GET :authority=10.1.6.148:9994 :path=/metrics
rsp id=5:0 proxy=in  src=10.1.6.146:36976 dst=10.1.6.148:9994 tls=not_provided_by_remote :status=200 latency=3366µs
end id=5:0 proxy=in  src=10.1.6.146:36976 dst=10.1.6.148:9994 tls=not_provided_by_remote duration=132µs response-length=1505B
```

**Wide**:
```
➜  linkerd2 git:(kleimkuhler/tap-json-output) linkerd -n linkerd tap deploy/linkerd-web -o wide
req id=6:0 proxy=in  src=10.1.0.1:35394 dst=10.1.6.148:9994 tls=not_provided_by_remote :method=GET :authority=10.1.6.148:9994 :path=/ping dst_res=deploy/linkerd-web dst_ns=linkerd
rsp id=6:0 proxy=in  src=10.1.0.1:35394 dst=10.1.6.148:9994 tls=not_provided_by_remote :status=200 latency=1442µs dst_res=deploy/linkerd-web dst_ns=linkerd
end id=6:0 proxy=in  src=10.1.0.1:35394 dst=10.1.6.148:9994 tls=not_provided_by_remote duration=88µs response-length=5B dst_res=deploy/linkerd-web dst_ns=linkerd
```

**JSON**:
*Edit: Flattened `Method` and `Scheme` formatting*
```
{
  "source": {
    "ip": "10.138.0.28",
    "port": 47078,
    "metadata": {
      "daemonset": "ip-masq-agent",
      "namespace": "kube-system",
      "pod": "ip-masq-agent-4d5s9",
      "serviceaccount": "ip-masq-agent",
      "tls": "not_provided_by_remote"
    }
  },
  "destination": {
    "ip": "10.60.1.49",
    "port": 9994,
    "metadata": {
      "control_plane_ns": "linkerd",
      "deployment": "linkerd-web",
      "namespace": "linkerd",
      "pod": "linkerd-web-6988999458-c6wpw",
      "pod_template_hash": "6988999458",
      "serviceaccount": "linkerd-web"
    }
  },
  "routeMeta": null,
  "proxyDirection": "INBOUND",
  "requestInitEvent": {
    "id": {
      "base": 0,
      "stream": 0
    },
    "method": "GET",
    "scheme": "",
    "authority": "10.60.1.49:9994",
    "path": "/ready"
  }
}
{
  "source": {
    "ip": "10.138.0.28",
    "port": 47078,
    "metadata": {
      "daemonset": "calico-node",
      "namespace": "kube-system",
      "pod": "calico-node-bbrjq",
      "serviceaccount": "calico-sa",
      "tls": "not_provided_by_remote"
    }
  },
  "destination": {
    "ip": "10.60.1.49",
    "port": 9994,
    "metadata": {
      "control_plane_ns": "linkerd",
      "deployment": "linkerd-web",
      "namespace": "linkerd",
      "pod": "linkerd-web-6988999458-c6wpw",
      "pod_template_hash": "6988999458",
      "serviceaccount": "linkerd-web"
    }
  },
  "routeMeta": null,
  "proxyDirection": "INBOUND",
  "responseInitEvent": {
    "id": {
      "base": 0,
      "stream": 0
    },
    "sinceRequestInit": {
      "nanos": 644820
    },
    "httpStatus": 200
  }
}
{
  "source": {
    "ip": "10.138.0.28",
    "port": 47078,
    "metadata": {
      "deployment": "calico-typha",
      "namespace": "kube-system",
      "pod": "calico-typha-59cb487c49-8247r",
      "pod_template_hash": "59cb487c49",
      "serviceaccount": "calico-sa",
      "tls": "not_provided_by_remote"
    }
  },
  "destination": {
    "ip": "10.60.1.49",
    "port": 9994,
    "metadata": {
      "control_plane_ns": "linkerd",
      "deployment": "linkerd-web",
      "namespace": "linkerd",
      "pod": "linkerd-web-6988999458-c6wpw",
      "pod_template_hash": "6988999458",
      "serviceaccount": "linkerd-web"
    }
  },
  "routeMeta": null,
  "proxyDirection": "INBOUND",
  "responseEndEvent": {
    "id": {
      "base": 0,
      "stream": 0
    },
    "sinceRequestInit": {
      "nanos": 790898
    },
    "sinceResponseInit": {
      "nanos": 146078
    },
    "responseBytes": 3,
    "grpcStatusCode": 0
  }
}
```

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-09-19 09:34:49 -07:00
Alejandro Pedraza 30ecddb965
Fix injector timeout under high load (#3442)
* Fix injector timeout under high load

Fixes #3358

When retrieving a pod owner, we were hitting the k8s API directly because
at injection time the informer might not have been informed about the
existence of the parent object.
Under a large number of injection requests this ended up in the k8s API requests
being throttled, the proxy-injector getting blocked and the webhook requests
timing out.

Now we'll hit the shared informer first, and hit the k8s API only when
the informer doesn't return anything. After a few injection requests for
the same owner, the informer should have been updated.

Testing:

Scaling an emoji deployment to 1000 replicas, and after waiting for a
couple of minutes:

Before:
```bash
# a portion of the pods doesn't get injected
$ kubectl-n emojivoto get po | grep ./1 | wc -l
109

kubectl -n kube-system logs -f kube-apiserver-minikube | grep
failing.*timeout
.... (lots of errors)
```

After:
```bash
# all the pods get injected
$ kubectl -n emojivoto get po | grep ./1 | wc -l
0

kubectl -n kube-system logs -f kube-apiserver-minikube | grep
failing.*timeout
```
2019-09-18 17:58:38 -05:00
Andrew Seigner c5a85e587c
Update to client-go v12.0.0, forked stern (#3387)
The repo depended on an old version of client-go. It also depended on
stern, which itself depended on an old version of client-go, making
client-go upgrade non-trivial.

Update the repo to client-go v12.0.0, and also replace stern with a
fork.

This fork of stern includes the following changes:
- updated to use Go Modules
- updated to use client-go v12.0.0
- fixed log line interleaving:
  - https://github.com/wercker/stern/issues/96
  - based on:
    - 8723308e46

Fixes #3382

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-10 11:04:29 -07:00
Andrew Seigner 7f59caa7fc
Bump proxy-init to 1.2.0 (#3397)
Pulls in latest proxy-init:
https://github.com/linkerd/linkerd2-proxy-init/releases/tag/v1.2.0

This also bumps a dependency on cobra, which provides more complete zsh
completion.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-09 09:06:14 -07:00
Andrew Seigner d773a47dd3
Shrink controller Docker image from 315MB to 38MB (#3378)
The controller Docker image included 7 Go binaries (destination,
heartbeat, identity, proxy-injector, public-api, sp-validator, tap),
each roughly 35MB, with similar dependencies.

Change each controller binary into subcommands of a single `controller`
binary, decreasing the controller Docker image size from 315MB to 38MB.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-05 11:44:03 -07:00
Bruno M. Custódio 8fec756395 Add '--address' flag to 'linkerd dashboard'. (#3274)
Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
2019-09-05 10:56:10 -07:00
Alejandro Pedraza acbab93ca8
Add support for k8s 1.16 (#3364)
Fixes #3356

1.16 removes some api groups that were already deprecated. From k8s blog
post (https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/):

```
- PodSecurityPolicy: will no longer be served from extensions/v1beta1 in
v1.16.
    Migrate to the policy/v1beta1 API, available since v1.10. Existing
    persisted data can be retrieved/updated via the policy/v1beta1 API.
- DaemonSet, Deployment, StatefulSet, and ReplicaSet: will no longer be
served from extensions/v1beta1, apps/v1beta1, or apps/v1beta2 in v1.16.
    Migrate to the apps/v1 API, available since v1.9. Existing persisted
    data can be retrieved/updated via the apps/v1 API.
```

Previous PRs had already made this change at the Helm templates level,
but we still needed to do it at the API calls and tests.

The integration tests ran fine for k8s 1.12 and 1.15. They fail on 1.16
because the upgrade integration test tries to install linkerd 2.5 which is not
compatible with 1.16.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-09-04 09:59:55 -05:00
Andrew Seigner 90c547576d
Remove broken thrift dependency (#3370)
The repo depended on a (recently broken) thrift package:

```
github.com/linkerd/linkerd2
 -> contrib.go.opencensus.io/exporter/ocagent@v0.2.0
  -> go.opencensus.io@v0.17.0
   -> git.apache.org/thrift.git@v0.0.0-20180902110319-2566ecd5d999
```
... via this line in `controller/k8s`:

```go
_ "k8s.io/client-go/plugin/pkg/client/auth"
```

...which created a dependency on go.opencensus.io:

```bash
$ go mod why go.opencensus.io
...
github.com/linkerd/linkerd2/controller/k8s
k8s.io/client-go/plugin/pkg/client/auth
k8s.io/client-go/plugin/pkg/client/auth/azure
github.com/Azure/go-autorest/autorest
github.com/Azure/go-autorest/tracing
contrib.go.opencensus.io/exporter/ocagent
go.opencensus.io
```

Bump contrib.go.opencensus.io/exporter/ocagent from `v0.2.0` to
`v0.6.0`, creating this new dependency chain:

```
github.com/linkerd/linkerd2
 -> contrib.go.opencensus.io/exporter/ocagent@v0.6.0
  -> google.golang.org/api@v0.7.0
   -> go.opencensus.io@v0.21.0
```

Bumping our go.opencensus.io dependency from `v0.17.0` to `v0.21.0`
pulls in this commit:
ed3a3f0bf0 (diff-37aff102a57d3d7b797f152915a6dc16)

...which removes our dependency on github.com/apache/thrift

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-09-03 16:22:43 -07:00
陈谭军 e281fb3410 fix-up grammar (#3351)
Signed-off-by: chentanjun <2799194073@qq.com>
2019-08-30 08:09:36 -07:00
Alejandro Pedraza fd248d3755
Undo refactoring from #3316 (#3331)
Thus fixing `linkerd edges` and the dashboard topology graph

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-29 13:37:54 -05:00
Alejandro Pedraza 5d7499dc84
Avoid the dashboard requesting stats when not needed (#3338)
* Avoid the dashboard requesting stats when not needed

Create an alternative to `urlsForResource` called
`urlsForResourceNoStats` that makes use of the `skip_stats` parameter in
the stats API (created in #1871) that doesn't query Prometheus when not needed.

When testing using the dashboard looking at the linkerd namespace,
queries per second went down from 2874 to 2756, a 4% decrease.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-29 05:52:44 -05:00
Alejandro Pedraza 368d16f23c
Fix auto-injecting pods and integration tests reporting (#3335)
* Fix auto-injecting pods and integration tests reporting

When creating an Event when auto-injection occurs (#3316) we try to
fetch the parent object to associate the event to it. If the parent
doesn't exist (like in the case of stand-alone pods) the event isn't
created. I had missed dealing with one part where that parent was
expected.

This also adds a new integration test that I verified fails before this
fix.

Finally, I removed from `_test-run.sh` some `|| exit_code=$?` that was
preventing the whole suite to report failure whenever one of the tests
in `/tests` failed.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-28 15:04:20 -05:00
arminbuerkle 5c38f38a02 Allow custom cluster domains in remaining backends (#3278)
* Set custom cluster domain in GetServiceProfileFor
* Set custom cluster domain in tap server
Move fetching cluster domain for tap server to cmd main
* Handle fetchting cluster domain errors separately
* Use custom cluster domain for traffic split adaptor

Signed-off-by: Armin Buerkle <armin.buerkle@alfatraining.de>
2019-08-27 10:01:36 -07:00
Alejandro Pedraza 02efb46e45
Have the proxy-injector emit events upon injection/skipping injection (#3316)
* Have the proxy-injector emit events upon injection/skipping injection

Fixes #3253

Have the proxy-injector emit an event whenever a injection happens, or
when injection is skipped for some reason (also added that reason into
the proxy-injector logs). The level is associated to the parent workload
(it can't be associated to the pod because at this point the pod hasn't
been persisted).

The event recorder was setup at the `webhook/server.go` level and passed
to the proxy-injector's `Inject` function. The sp-validator thus also
has access to the event recorder, but for now it's not using it.

Related changes:

- Refactored `api.GetOwnerKindAndName()` to have it return a more
generic object.
- Refactored `report.Injectable()` to also have it return the reason why
a workload is not injectable.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-26 13:34:36 -05:00
Carol A. Scott 089836842a
Add unit test for edges API endpoint (#3306)
Fixes #3052.

Adds a unit test for the edges API endpoint. To maintain a consistent order for
testing, the returned rows in api/public/edges.go are now sorted.
2019-08-23 09:28:02 -07:00
arminbuerkle e7d303e03f Add LINKERD2_PROXY_DESTINATION_GET_SUFFIXES (#3277)
* Fix missing `clusterDomain` in render RenderTapOutputProfile
* Add LINKERD2_PROXY_DESTINATION_GET_SUFFIXES env variable

Signed-off-by: Armin Buerkle <armin.buerkle@alfatraining.de>
2019-08-21 14:28:30 -07:00
Oliver Gould cb276032f5
Require go 1.12.9 for controller builds (#3297)
Netflix recently announced a security advisory that identified several
Denial of Service attack vectors that can affect server implementations
of the HTTP/2 protocol, and has issued eight CVEs. [1]

Go is affected by two of the vulnerabilities (CVE-2019-9512 and
CVE-2019-9514) and so Linkerd components that serve HTTP/2 traffic are
also affected. [2]

These vulnerabilities allow untrusted clients to allocate an unlimited
amount of memory, until the server crashes. The Kubernetes Product
Security Committee has assigned this set of vulnerabilities with a CVSS
score of 7.5. [3]

[1] https://github.com/Netflix/security-bulletins/blob/master/advisories/third-party/2019-002.md
[2] https://golang.org/doc/devel/release.html#go1.12
[3] https://www.first.org/cvss/calculator/3.0#CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
2019-08-21 10:03:29 -07:00
Guangming Wang 70d85d2065 Cleanup: fix some typos in code comment (#3296)
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-21 09:40:43 -07:00
Oliver Gould ee79d5d324
destination: Reorganize authority-parsing (#3244)
In preparation for #3242, the destination controller will need to
support a broader set of valid authorities including IP addresses.

This change modifies the destination controller's authority-parsing code
so that the is-this-a-kubernete-service-name decision is decoupled from
parsing of authorities into their consituent parts.

The `Get` API now explicitly handles IP address names, though it
currently fails all such resolutions.
2019-08-21 07:19:42 -07:00
Ivan Sim 183e42e4cd
Merge the CLI 'installValues' type with Helm 'Values' type (#3291)
* Rename template-values.go
* Define new constructor of charts.Values type
* Move all Helm values related code to the pkg/charts package
* Bump dependency
* Use '/' in filepath to remain compatible with VFS requirement
* Add unit test to verify Helm YAML output
* Alejandro's feedback
* Add unit test for Helm YAML validation (HA)

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-08-20 19:26:38 -07:00
Carol A. Scott bc8fef7ba9 Sorting the expected response for trafficsplit rows so it is always in consistent row order (#3280) 2019-08-19 10:10:26 -07:00
Kevin Leimkuhler c9c41e2e8a
Remove gRPC tap server listener from controller (#3276)
### Summary

As an initial attempt to secure the connection from clients to the gRPC tap
server on the tap Pod, the tap `addr` only listened on localhost.

As @adleong pointed out #3257, this was not actually secure because the inbound
proxy would establish a connection to localhost anyways.

This change removes the gRPC tap server listener and changes `TapByResource`
requests to interface with the server object directly.

From this, we know that all `TapByResourceRequests` have gone through the tap
APIServer and thus authorized by RBAC.

### Details

[NewAPIServer](ef90e0184f/controller/tap/apiserver.go (L25-L26)) now takes a [GRPCTapServer](f6362dfa80/controller/tap/server.go (L33-L34)) instead of a `pb.TapClient` so that
`TapByResource` requests can interact directly with the [TapByResource](f6362dfa80/controller/tap/server.go (L49-L50)) method.

`GRPCTapServer.TapByResource` now makes a private [grpcTapServer](ef90e0184f/controller/tap/handlers.go (L373-L374)) that satisfies
the [tap.TapServer](https://godoc.org/github.com/linkerd/linkerd2/controller/gen/controller/tap#TapServer) interface. Because this interface is satisfied, we can interact
with the tap server methods without spawning an additional listener.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-08-16 16:38:50 -04:00
cpretzer 4e92064f3b
Add a flag to install-cni command to configure iptables wait flag (#3066)
Signed-off-by: Charles Pretzer <charles@buoyant.io>
2019-08-15 12:58:18 -07:00
Carol A. Scott 9c62b65c6a
Adding trafficsplit test to stat_summary_test.go (#3252)
This PR adds a test for trafficsplits to stat_summary_test.go.

Because the test requires a consistent order for returned rows, trafficsplit
rows in stat_summary.go are now sorted by apex + leaf name before being
returned.
2019-08-14 14:48:46 -07:00
Kevin Leimkuhler cc3c53fa73
Remove tap from public API and associated test infrastructure (#3240)
### Summary

After the addition of the tap APIServer, all the logic related to tap in the public API no longer needs to be there. The servers and clients that are created but not used, as well as all the old testing infrastrucure related to tap can be removed.

This deprecates TapByResource and therefore required an update to the protobuf files with `bin/protoc-go.sh`. While the change to deprecate this method was extremely small, a lot of protobuf fils were updated in the process. These changes to the code and protobuf files should probably remain coupled since `TapByResource` is officially deprecated in the public API, but a majority of the additions/deletions are related to those files.

This draft passes `go test` as well as a local run of the integration tests.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-08-14 17:27:37 -04:00
Andrew Seigner 3b55e2e87d
Add container cpu and mem to heartbeat requests (#3238)
PR #3217 re-introduced container metrics collection to
linkerd-prometheus. This enabled linkerd-heartbeat to collect mem and
cpu metrics at the container-level.

Add container cpu and mem metrics to heartbeat requests. For each of
(destination, prometheus, linkerd-proxy), collect maximum memory and p95
cpu.

Concretely, this introduces 7 new query params to heartbeat requests:
- p99-handle-us
- max-mem-linkerd-proxy
- max-mem-destination
- max-mem-prometheus
- p95-cpu-linkerd-proxy
- p95-cpu-destination
- p95-cpu-prometheus

Part of #2961

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-14 12:04:08 -07:00
Carol A. Scott 00437709eb
Add trafficsplit metrics to CLI (#3176)
This PR adds `trafficsplit` as a supported resource for the `linkerd stat` command. Users can type `linkerd stat ts` to see the apex and leaf services of their trafficsplits, as well as metrics for those leaf services.
2019-08-14 10:30:57 -07:00
Alex Leong 98b6b9e9ba
Check in gen deps (#3245)
Go dependencies which are only used by generated code had not previously been checked into the repo.  Because `go generate` does not respect the `-mod=readonly` flag, running `bin/linkerd` will add these dependencies and dirty the local repo.  This can interfere with the way version tags are generated.

To avoid this, we simply check these deps in.

Note that running `go mod tidy` will remove these again.  Thus, it is not recommended to run `go mod tidy`. 

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-08-13 17:02:52 -07:00
Kevin Leimkuhler f0bd82f831
Update protobuf files (#3241)
### Summary

Changes from `bin/protoc-go.sh`

An existing [draft PR](https://github.com/linkerd/linkerd2/pull/3240) has a majority of its changes related to protobuf file
updates. In order to separate these changes out into more related components,
this PR updates the generated protobuf files so that #3240 can be rebased off
this and have a more manageable diff.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-08-13 16:58:49 -04:00
Kevin Leimkuhler 5d7662fd90
Update web server to use tap APIService (#3208)
### Motivation

PR #3167 introduced the tap APIService and migrated `linkerd tap` to use it.
Subsequent PRs (#3186 and #3187) updated `linkerd top` and `linkerd profile
--tap` to use the tap APIService. This PR moves the web's Go server to now also
use the tap APIService instead of the public API. It also ensures an error
banner is shown to the user when unauthorized taps fail via `linkerd top`
command in *Overview* and *Top*, and `linkerd tap` command in *Tap*.

### Details

The majority of these changes are focused around piping through the HTTP error
that occurs and making sure the error banner generated displays the error
message explaining to view the tap RBAC docs.

`httpError` is now public (`HTTPError`) and the error message generated is short
enough to fit in a control frame (explained [here](https://github.com/linkerd/linkerd2/blob/kleimkuhler%2Fweb-tap-apiserver/web/srv/api_handlers.go#L173-L175)).

### Testing

The error we are testing for only occurs when the linkerd-web service account is
not authorzied to tap resources. Unforutnately that is not the case on Docker
For Mac (assuming that is what you use locally), so you'll need to test on a
different cluster. I chose a GKE cluster made through the GKE console--not made
through cluster-utils because it adds cluster-admin.

Checkout the branch locally and `bin/docker-build` or `ares-build` if you have
it setup. It should produce a linkerd with the version `git-04e61786`. I have
already pushed the dependent components, so you won't need to `bin/docker-push
git-04e61786`.

Install linkerd on this GKE cluster and try to run `tap` or `top` commands via
the web. You should see the following errors:

### Tap

![web-tap-unauthorized](https://user-images.githubusercontent.com/4572153/62661243-51464900-b925-11e9-907b-29d7ca3f815d.png)

### Top

![web-top-unauthorized](https://user-images.githubusercontent.com/4572153/62661308-894d8c00-b925-11e9-9498-6c9d38b371f6.png)

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-08-08 10:18:32 -07:00
Andrew Seigner f98bc27a38
Fix invalid `l5d-require-id` for some tap requests (#3210)
PR #3154 introduced an `l5d-require-id` header to Tap requests. That
header string was constructed based on the TapByResourceRequest, which
includes 3 notable fields (type, name, namespace). For namespace-level
requests (via commands like `linkerd tap ns linkerd`), type ==
`namespace`, name == `linkerd`, and namespace == "". This special casing
for namespace-level requests yielded invalid `l5d-require-id` headers,
for example: `pd-sa..serviceaccount.identity.linkerd.cluster.local`.

Fix `l5d-require-id` string generation to account for namespace-level
requests. The bulk of this change is tap unit test updates to validate
the fix.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-08 09:42:11 -07:00
Alejandro Pedraza 3ae653ae92
Refactor proxy injection to use Helm charts (#3200)
* Refactor proxy injection to use Helm charts

Fixes #3128

A new chart `/charts/patch` was created, that generates the JSON patch
payload that is to be returned to the k8s API when doing the injection
through the proxy injector, and it's also leveraged by the `linkerd
inject --manual` CLI.

The VFS was used by `linkerd install` to access the old chart under
`/chart`. Now the proxy injection also uses the Helm charts to generate
the JSON patch (see above) so we've moved the VFS from `cli/static` to a
new common place under `/pkg/charts/static`, and the new root for the VFS is
now `/charts`.

`linkerd install` hasn't yet migrated to use the new charts (that'll
happen in #3127), so the only change in that regard was the creation of
`/charts/chart` which is a symlink pointing to `/chart` that
`install.go` now uses, so that the VFS contains both the old and new
charts, as a temporary measure.

You can see that `/bin/Dockerfile-bin`, `/controller/Dockerfile` and
`/bin/build-cli-bin` do now `go generate` pointing to the new location
(and the `go generate` annotation was moved from `/cli/main.go` to
`pkg/charts/static/templates.go`).

The symlink trick doesn't work when building the binaries through
Docker, so `/bin/Dockerfile-bin` replaces the symlink with an actual
copy of `/chart`.

Also note that in `/controller/Dockerfile` we now need to include the
`prod` tag in `go install` like we do in `/bin/Dockerfile-bin` so that
the proxy injector does use the VFS instead of the local file system.

- The common logic to parse a chart has been moved from `install.go` to
`/pkg/charts/util.go`.
- The special ENV var in the proxy for "outbound router capacity" that
only applies to the Prometheus pod is now handled directly in the proxy
partial and all the associated go code could be removed.
- The `patch.go` lib for generating the JSON patch in go along
with its tests `patch_test.go` are no longer needed.
- Lots of functions in `/pkg/inject/inject.go` got removed/simplified
with their logic being moved into the charts themselves. As a
consequence lots of things in `inject_test.go` became irrelevant.
- Moved `template-values.go` from `/pkg/inject` to `pkg/charts` as that
contains the go structs representation of the chart variables that
will be leveraged in #3127.

Don't forget to run `/bin/helm.sh` whenever you make changes to charts
;-)

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-08-07 17:32:37 -05:00
Andrew Seigner c253805042
Update tap authz error with doc URL (#3196)
When the Tap APIServer's SubjectAccessReview fails, return an error
message pointing the user to: https://linkerd.io/tap-rbac

Depends on https://github.com/linkerd/website/pull/450
Part of #3191

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-06 08:56:41 -07:00
Andrew Seigner 09fbea7165
Fix Tap resource/subresource APIGroupList (#3192)
The `/apis/tap.linkerd.io/v1alpha1` endpoint on the Tap APIServer
included tap resource/subresource pairs, such as `deployments/tap` and
`pods/tap`, but did not include the parent resources, such as
`deployments` and `pods`. This broke commands like `kubectl auth can-i`,
which expect the parent resource to exist.

Introduce parent resources for all tap-able subresources. Concretely, it
fixes this class of command:
```
$ kubectl auth can-i watch deployments.tap.linkerd.io/linkerd-grafana \
  --subresource=tap -n linkerd --as siggy@buoyant.io
Warning: the server doesn't have a resource type 'deployments' in group 'tap.linkerd.io'
no - no RBAC policy matched
```
Fixed:
```
$ kubectl auth can-i watch deployments.tap.linkerd.io/linkerd-grafana \
  --subresource=tap -n linkerd --as siggy@buoyant.io
yes
```

Additionally, when SubjectAccessReviews fail, return a 403 rather than
500.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-05 10:36:48 -07:00
Andrew Seigner a59c1dd32d
Introduce tap APIService, update `linkerd tap` (#3167)
The Tap Service enabled tapping of any meshed pod, regardless of user
privilege.

This change introduces a new Tap APIService. Kubernetes provides
authentication and authorization of Tap requests, and then forwards
requests to a new Tap APIServer, which implements a Kubernetes
aggregated APIServer. The Tap APIServer authenticates the client TLS
from Kubernetes, and authorizes the user via a SubjectAccessReview.

This change also modifies the `linkerd tap` command to make requests
against the new APIService.

The Tap APIService implements these Kubernetes-style endpoints:
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/tap
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/:res/:name/tap
GET  /apis
GET  /apis/tap.linkerd.io
GET  /apis/tap.linkerd.io/v1alpha1
GET  /healthz
GET  /healthz/log
GET  /healthz/ping
GET  /metrics
GET  /openapi/v2
GET  /version

Users authorize to the new `tap.linkerd.io/v1alpha1` via RBAC. Only the
`watch` verb is supported. Access is also available via subresources
such as `deployments/tap` and `pods/tap`.

This change introduces the following resources into the default Linkerd
install:
- Global
  - APIService/v1alpha1.tap.linkerd.io
  - ClusterRoleBinding/linkerd-linkerd-tap-auth-delegator
- `linkerd` namespace:
  - Secret/linkerd-tap-tls
- `kube-system` namespace:
  - RoleBinding/linkerd-linkerd-tap-auth-reader

Tasks not covered by this PR:
- `linkerd top`
- `linkerd dashboard`
- `linkerd profile --tap`
- removal of the unauthenticated tap controller

Fixes #2725, #3162, #3172

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-01 14:02:45 -07:00
Andrew Seigner 9a672dd5a9
Introduce `linkerd --as` flag for impersonation (#3173)
Similar to `kubectl --as`, global flag across all linkerd subcommands
which sets a `ImpersonationConfig` in the Kubernetes API config.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-07-31 16:05:33 -07:00
Kevin Leimkuhler 6cc52c3363
Add `l5d-require-id` header to Tap requests (#3154)
### Summary

In order for Pods' tap servers to start authorizing tap clients, the tap
controller must open TLS connections so that it can identity itself to the
server.

This change introduces the use of `l5d-require-id` header on outbound tap
requests.

### Details

When tap requests are made by the tap controller, the `Authority` header is an
IP address. The proxy does not attempt to do service discovery on such requests
and therefore the connection is over plaintext. By introducing the
`l5d-require-id` header the proxy can require a server name on the connection.
This allows the tap controller to identity itself as the client making tap
requests. The name value for the header can be made from the Pod Spec and tap
request, so the change is rather minimal.

#### Proxy Changes

* Update h2 to v0.1.26
* Properly fall back in the dst_router (linkerd/linkerd2-proxy#291)

### Testing

Unit tests for the header have not been added mainly because [no test
infrastructure currently exists](065c221858/controller/tap/server_test.go (L241)) to mock proxy requests. After talking with
@siggy a little about this, it makes to do in a separate change at some point
when behavior like this cannot be reliably tested through integration tests
either.

Integration tests do test this well, and will continue to do once
linkerd/linkerd2-proxy#290 lands.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-07-30 11:17:52 -07:00
Alex Leong ab7226cbcd
Return invalid argument for external name services (#3120)
Fixes https://github.com/linkerd/linkerd2/issues/2800#issuecomment-513740498

When the Linkerd proxy sends a query for a Kubernetes external name service to the destination service, the destination service returns `NoEndpoints: exists=false` because an external name service has no endpoints resource.  Due to a change in the proxy's fallback logic, this no longer causes the proxy to fallback to either DNS or SO_ORIG_DST and instead fails the request.  The net effect is that Linkerd fails all requests to external name services.

We change the destination service to instead return `InvalidArgument` for external name services.  This causes the proxy to fallback to SO_ORIG_DST instead of failing the request.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-07-29 16:31:22 -07:00
Alejandro Pedraza 8c07223f3b
Remove unused argument (#3149)
Removed unused argument in the `GetPatch()` function in
`pkg/inject/inject.go`

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
2019-07-26 11:39:25 -05:00
Andrew Seigner 51b33ad53c
Fix nil pointer dereference in endpoints watcher (#3147)
The destination service's endpoints watcher assumed every `Endpoints`
object contained a `TargetRef`. This field is optional, and in cases
such as the default `ep/kubernetes` object, `TargetRef` is nil, causing
a nil pointer dereference.

Fix endpoints watcher to check for `TargetRef` prior to dereferencing.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-07-25 17:11:56 -07:00
Andrew Seigner 18b74aa8a8
Introduce Go modules support (#2481)
The repo relied on `dep` for managing Go dependencies. Go 1.11 shipped
with Go modules support. Go 1.13 will be released in August 2019 with
module support enabled by default, deprecating GOPATH.

This change replaces `dep` with Go modules for dependency management.
All scripts, including Docker builds and ci, should work without any dev
environment changes.

To execute `go` commands directly during development, do one of the
following:
1. clone this repo outside of `GOPATH`; or
2. run `export GO111MODULE=on`

Summary of changes:
- Docker build scripts and ci set `-mod=readonly`, to ensure
  dependencies defined in `go.mod` are exactly what is used for the
  builds.
- Dependency updates to `go.mod` are accomplished by running
 `go build` and `go test` directly.
- `bin/go-run`, `bin/build-cli-bin`, and `bin/test-run` set
  `GO111MODULE=on`, permitting usage inside and outside of GOPATH.
- `gcr.io/linkerd-io/go-deps` tags hashed from `go.mod`.
- `bin/update-codegen.sh` still requires running from GOPATH,
  instructions added to BUILD.md.

Fixes #1488

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-07-25 14:41:38 -07:00
Alex Leong 3c4a0e4381
Make authorities in destination overrides absolute (#3137)
Fixes #3136 

When the destination service sends a destination profile with a traffic split to the proxy, the override destination authorities are absolute but do no contain a trailing dot.  e.g. "bar.ns.svc.cluster.local:80".  However, NameAddrs which have undergone canonicalization in the proxy will include the trailing dot.  When a traffic split includes the apex service as one of the overrides, the original apex NameAddr will have the trailing dot and the override will not.  Since these two NameAddrs are not identical, they will go into two distinct slots in the proxy's concrete dst router.  This will cause two services to be created for the same destination which will cause the stats clobbering described in the linked issue.

We change the destination service to always return absolute dst overrides including the trailing dot.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-07-24 17:08:40 -07:00