This PR adds a test for trafficsplits to stat_summary_test.go.
Because the test requires a consistent order for returned rows, trafficsplit
rows in stat_summary.go are now sorted by apex + leaf name before being
returned.
### Summary
After the addition of the tap APIServer, all the logic related to tap in the public API no longer needs to be there. The servers and clients that are created but not used, as well as all the old testing infrastrucure related to tap can be removed.
This deprecates TapByResource and therefore required an update to the protobuf files with `bin/protoc-go.sh`. While the change to deprecate this method was extremely small, a lot of protobuf fils were updated in the process. These changes to the code and protobuf files should probably remain coupled since `TapByResource` is officially deprecated in the public API, but a majority of the additions/deletions are related to those files.
This draft passes `go test` as well as a local run of the integration tests.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
This PR adds `trafficsplit` as a supported resource for the `linkerd stat` command. Users can type `linkerd stat ts` to see the apex and leaf services of their trafficsplits, as well as metrics for those leaf services.
The Tap Service enabled tapping of any meshed pod, regardless of user
privilege.
This change introduces a new Tap APIService. Kubernetes provides
authentication and authorization of Tap requests, and then forwards
requests to a new Tap APIServer, which implements a Kubernetes
aggregated APIServer. The Tap APIServer authenticates the client TLS
from Kubernetes, and authorizes the user via a SubjectAccessReview.
This change also modifies the `linkerd tap` command to make requests
against the new APIService.
The Tap APIService implements these Kubernetes-style endpoints:
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/tap
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/:res/:name/tap
GET /apis
GET /apis/tap.linkerd.io
GET /apis/tap.linkerd.io/v1alpha1
GET /healthz
GET /healthz/log
GET /healthz/ping
GET /metrics
GET /openapi/v2
GET /version
Users authorize to the new `tap.linkerd.io/v1alpha1` via RBAC. Only the
`watch` verb is supported. Access is also available via subresources
such as `deployments/tap` and `pods/tap`.
This change introduces the following resources into the default Linkerd
install:
- Global
- APIService/v1alpha1.tap.linkerd.io
- ClusterRoleBinding/linkerd-linkerd-tap-auth-delegator
- `linkerd` namespace:
- Secret/linkerd-tap-tls
- `kube-system` namespace:
- RoleBinding/linkerd-linkerd-tap-auth-reader
Tasks not covered by this PR:
- `linkerd top`
- `linkerd dashboard`
- `linkerd profile --tap`
- removal of the unauthenticated tap controller
Fixes#2725, #3162, #3172
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
`linkerd check`, the web dashboard, and Grafana all perform version
checks to validate Linkerd is up to date. It's common for users to
seldom execute these codepaths. This makes it difficult to identify what
versions of Linkerd are currently in use and what environments it is
being run in, which helps prioritize testing and backports.
Introduce a `heartbeat` CronJob to the default Linkerd install. The
cronjob executes every 24 hours, starting from 5 minutes after
`linkerd install` is run.
Example check URL:
https://versioncheck.linkerd.io/version.json?
install-time=1562761177&
k8s-version=v1.15.0&
meshed-pods=8&
rps=3&
source=heartbeat&
uuid=cc4bb700-3314-426a-9f0f-ec588b9df020&
version=git-b97ee9f7
Fixes#2961
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The openAPIV3Schema validation in the ServiceProfiles CRD is very limited in what it can validate and is obviated by more sophisticated validation done by the validating admission controller. Therefore, we would like to remove the openAPIV3Schema validation to reduce the size and complexity of the CRD object.
To do so, we must also bump the version of the ServiceProfile custom resource from v1alpha1 to v1alpha2. This ensures that when the controller is upgraded, it will attempt to watch the v1alpha2 resource. If it cannot (because, for example, the controller pod started before the ServiceProfile CRD was updated and therefore the v1alpha2 version does not exist) then it will go into a crash loop backoff until it can. This essentially means that the controller will wait for the CRD to be upgraded to include v1alpha2 before it will start.
Bumping the version is necessary because if we did not, it would be possible for the controller to start before the CRD is updated (removing the validation). In this case, when the CRD is edited, the controller will lose its list watch on ServiceProfiles and will stop getting updates.
Signed-off-by: Alex Leong <alex@buoyant.io>
When waiting for controller pods to be created or become ready, `linkerd check` doesn't offer any hints as to whether there has been an error (such as an ImagePullBackoff).
We add pod status to the output to make this more immediately obvious.
Fixes#2877
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR improves the CLI output for `linkerd edges` to reflect the latest API
changes.
Source and destination namespaces for each edge are now shown by default. The
`MSG` column has been replaced with `Secured` and contains a green checkmark or
the reason for no identity. A new `-o wide` flag shows the identity of client
and server if known.
During operations with `linkerd stat` sometimes it's not clear the actual
pod status.
This commit introduces a method, to the `k8s`package, getting the pod status,
based on [`kubectl` logic](33a3e325f7/pkg/printers/internalversion/printers.go (L558-L640))
to expose the `STATUS` column for pods . Also, it changes the stat command
on the` cli` package adding a column when the resource type is a Pod.
Fixes#1967
Signed-off-by: Jonathan Juares Beber <jonathanbeber@gmail.com>
When getting pods for specific kubernetes resources, the usage of just
labels, as a selector, generates wrong outputs. Once, two resources can use
the same label selector and manage distinct pods, a new mechanism to check
pods for a given resource it's needed. More details on #2932.
This commit introduces a verification through the pod owner references
`UID`s, comparing with the given resource's. Additional logic is needed
when handling `Deployments` since it creates a `ReplicaSet` and this last
one is the actual pod's owner. No verification is done in case of
`Services`.
Signed-off-by: Jonathan Juares Beber <jonathanbeber@gmail.com>
* Have `linkerd endpoints` use `Destination.Get`
Fixes#2885
We're refactoring `linkerd endpoints` so it hits
directly the `Destination.Get` endpoint, instead of relying on the
Discovery service.
For that, I've created a new `client.go` for Destination and added it to
the `APIClient` interface.
I've also added a `destinationClient` struct that mimics `tapClient`,
and whose common logic has been moved into `stream_client.go`.
Analogously, I added a `destinationServer` struct that mimics
`tapServer`.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
This PR fixes a bug in the edges command where if src_resources from two
different namespaces sent requests to the same dst_resource, the original
src_identity was overwritten.
* Simplify port-forwarding code
Simplifies the establishment of a port-forwarding by moving the common
logic into `PortForward.Init()`
Stemmed from this
[comment](https://github.com/linkerd/linkerd2/pull/2937#discussion_r295078800)
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Have `GetOwnerKindAndName` be able to skip the cache
Refactored `GetOwnerKindAndName` so it can optionally skip the
shared informer cache and instead hit the k8s API directly.
Useful for the proxy injector, when the pod's replicaset got just
created and might not be in ready in the cache yet.
Fixes#2738
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
Adds a check to Prometheus `edges` queries to verify that data for the requested
resource type exists. Previously, if Prometheus could not find request data for the
requested resource type, it would skip that label and still return data for
other labels in the `by` clause, leading to an incorrect response.
This change adds an endpoint to the public API to allow us to query Prometheus for edge data, in order to display identity information for connections between Linkerd proxies. This PR only includes changes to the controller and protobuf.
Private k8s clusters, such as the private GKE clusters offered by Google
Cloud, cannot be reached through the current API proxy method.
This commit uses the port forwarding feature already developed.
Also modify dashboard command to not fall back to ephemeral port.
Signed-off-by: Jack Price <jackprice@outlook.com>
The new proxy has changed its configuration as follows:
- `LISTENER` urls are now `LISTEN_ADDR` addresses;
- `CONTROL_URL` is now `DESTINATION_SVC_ADDR`;
- `*_NAMESPACE` vars are no longer needed;
- The `PROXY_ID` is now the `DESTINATION_CONTEXT`;
- The "metrics" port is now the "admin" port, since it serves more than
just metrics;
- A readiness probe now checks a dedicated /ready endpoint eagerly.
Identity injection is **NOT** configured by this branch.
## Problem
When an object has no previous route metrics, we do not generate a table for
that object.
The reasoning behind this was for reducing output of the following command:
```
$ linkerd routes deploy --to deploy/foo
```
For each deployment object, if it has no previous traffic to `deploy/foo`, then
a table would not be generated for it.
However, the behavior we see with that indicates there is an error even when a
Service Profile is installed:
```
$ linkerd routes deploy deploy/foo
Error: No Service Profiles found for selected resources
```
## Solution
Always generate a stat table for the queried resource object.
## Validation
I deployed [booksapp](https://github.com/buoyantIO/booksapp) with the `traffic`
deployment removed and Service Profiles installed.
Without the fix, `linkerd routes deploy/webapp` displays an error because there
has been no traffic to `deploy/webapp` without the `traffic` deployment.
With the fix, the following output is generated:
```
ROUTE SERVICE SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
GET / webapp 0.00% 0.0rps 0ms 0ms 0ms
GET /authors/{id} webapp 0.00% 0.0rps 0ms 0ms 0ms
GET /books/{id} webapp 0.00% 0.0rps 0ms 0ms 0ms
POST /authors webapp 0.00% 0.0rps 0ms 0ms 0ms
POST /authors/{id}/delete webapp 0.00% 0.0rps 0ms 0ms 0ms
POST /authors/{id}/edit webapp 0.00% 0.0rps 0ms 0ms 0ms
POST /books webapp 0.00% 0.0rps 0ms 0ms 0ms
POST /books/{id}/delete webapp 0.00% 0.0rps 0ms 0ms 0ms
POST /books/{id}/edit webapp 0.00% 0.0rps 0ms 0ms 0ms
[DEFAULT] webapp 0.00% 0.0rps 0ms 0ms 0ms
```
Closes#2328
Signed-off-by: Kevin Leimkuhler <kevinl@buoyant.io>
linkerd/linkerd2#2428 modified SelfSubjectAccessReview behavior to no
longer paper-over failed ServiceProfile checks, assuming that
ServiceProfiles will be required going forward. There was a lingering
ServiceProfile check in the web's startup that started failing due to
this change, as the web component does not have (and should not need)
ServiceProfile access. The check was originally implemented to inform
the web component whether to expect "single namespace" mode or
ServiceProfile support.
Modify the web's initialization to always expect ServiceProfile support.
Also remove single namespace integration test
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
linkerd/linkerd2#2349 removed the `--single-namespace` flag, in favor of
runtime detection of cluster vs. namespace access, and also
ServiceProfile availability. This maintained control-plane support for
running in these two states.
This change requires control-plane components have cluster-wide
Kubernetes API access and ServiceProfile availability, and will error
out if not. Once #2349 merges, stage 1 install will be a requirement for
a successful stage 2 install.
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
We were depending on an untagged version of prometheus/client_golang
from Feb 2018.
This bumps our dependency to v0.9.2, from Dec 2018.
Also, this is a prerequisite to #1488.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The control-plane components relied on a `--single-namespace` param,
passed from `linkerd install` into each individual component, to
determine which namespaces they were authorized to access, and whether
to support ServiceProfiles. This command-line flag was redundant given
the authorization rules encoded in the parent `linkerd install` output,
via [Cluster]Role[Binding]s.
Modify the control-plane components to query Kubernetes at startup to
determine which namespaces they are authorized to access, and whether
ServiceProfile support is available. This allows removal of the
`--single-namespace` flag on the components.
Also update `bin/test-cleanup` to cleanup the ServiceProfile CRD.
TODO:
- Remove `--single-namespace` flag on `linkerd install`, part of #2164
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
goconst finds repeated strings that could be replaced by a constant:
https://github.com/jgautheron/goconst
Part of #217
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Adds a flag, tcp_stats to the StatSummary request, which queries prometheus for TCP stats.
This branch returns TCP stats at /api/tps-reports when this flag is true.
TCP stats are now displayed on the Resource Detail pages.
The current queried TCP stats are:
tcp_open_connections
tcp_read_bytes_total
tcp_write_bytes_total
Up until now, the proxy-api controller service has been the sole service
that the proxy communicates with, implementing the majoriry of the API
defined in the `linkerd2-proxy-api` repo. But this is about to change:
linkerd/linkerd2-proxy-api#25 introduces a new Identity service; and
this service must be served outside of the existing proxy-api service
in the linkerd-controller deployment (so that it may run under a
distinct service account).
With this change, the "proxy-api" name becomes less descriptive. It's no
longer "the service that serves the API for the proxy," it's "the
service that serves the Destination API to the proxy." Therefore, it
seems best to bite the bullet and rename this to be the "destination"
service (i.e. because it only serves the
`io.linkerd.proxy.destination.Destination` service).
Co-authored-by: Kevin Lingerfelt <kl@buoyant.io>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
`golangci-lint` performs numerous checks on Go code, including golint,
ineffassign, govet, and gofmt.
This change modifies `bin/lint` to use `golangci-lint`, and replaces
usage of golint and govet.
Also perform a one-time gofmt cleanup:
- `gofmt -s -w controller/`
- `gofmt -s -w pkg/`
Part of #217
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#2077
When looking up service profiles, Linkerd always looks for the service profile objects in the Linkerd control namespace. This is limiting because service owners who wish to create service profiles may not have write access to the Linkerd control namespace.
Instead, we have the control plane look for the service profile in both the client namespace (as read from the proxy's `proxy_id` field from the GetProfiles request and from the service's namespace. If a service profile exists in both namespaces, the client namespace takes priority. In this way, clients may override the behavior dictated by the service.
Signed-off-by: Alex Leong <alex@buoyant.io>
The Proxy API service lacked introspection of its internal state.
Introduce a new gRPC Discovery API, implemented by two servers:
1) Proxy API Server: returns a snapshot of discovery state
2) Public API Server: pass-through to the Proxy API Server
Also wire up a new `linkerd endpoints` command.
Fixes#2165
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
DaemonSet stats are not currently shown in the cli stat command, web ui
or grafana dashboard. This commit adds daemonset support for stat.
Update stat command's help message to reference daemonsets.
Update the public-api to support stats for daemonsets.
Add tests for stat summary and api.
Add daemonset get/list/watch permissions to the linkerd-controller
cluster role that's created using the install command.
Update golden expectation test files for install command
yaml manifest output.
Update web UI with daemonsets
Update navigation, overview and pages to list daemonsets and the pods
associated to them.
Add daemonset paths to server, and ui apps.
Add grafana dashboard for daemonsets; a clone of the deployment
dashboard.
Update dependencies and dockerfile hashes
Add DaemonSet support to tap and top commands
Fixes of #2006
Signed-off-by: Zak Knill <zrjknill@gmail.com>
Fixes#2119
When Linkerd is installed in single-namespace mode, the public-api container panics when it attempts to access watch service profiles.
In single-namespace mode, we no longer watch service profiles and return an informative error when the TopRoutes API is called.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Introduce resource selector and deprecate namespace field for ListPods
* Changes from code review
* Properly deprecate the field
* Do not check for nil
* Fix the mockProm usage
* Protoc changes revert
* Changed from code review
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
Fixes#1875
This change improves the `linkerd routes` command in a number of important ways:
* The restriction on the type of the `--to` argument is lifted and any resource type can now be used. Try `--to ns/books`, `--to po/webapp-ABCDEF`, `--to au/linkerd.io`, or even `--to svc`.
* All routes for the target will now be populated in the table, even if there are no Prometheus metrics for that route.
* [UNKNOWN] has been renamed to [DEFAULT]
* The `Service/Authority` column will now list `Service` in all cases except for when an authority target is explicitly requested.
```
$ linkerd routes deploy/traffic --to deploy/webapp
ROUTE SERVICE SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
GET / webapp 100.00% 0.5rps 50ms 180ms 196ms
GET /authors/{id} webapp 100.00% 0.5rps 100ms 900ms 980ms
GET /books/{id} webapp 100.00% 0.9rps 38ms 93ms 99ms
POST /authors webapp 100.00% 0.5rps 35ms 48ms 50ms
POST /authors/{id}/delete webapp 100.00% 0.5rps 83ms 180ms 196ms
POST /authors/{id}/edit webapp 0.00% 0.0rps 0ms 0ms 0ms
POST /books webapp 45.16% 2.1rps 75ms 425ms 485ms
POST /books/{id}/delete webapp 100.00% 0.5rps 30ms 90ms 98ms
POST /books/{id}/edit webapp 56.00% 0.8rps 92ms 875ms 975ms
[DEFAULT] webapp 0.00% 0.0rps 0ms 0ms 0ms
```
This is all made possible by a shift in the way we handle the destination resource. When we get a request with a `ToResource`, we use the k8s API to find all Services which include at least one pod belonging to that resource. We then fetch all service profiles for those services and display the routes from those serivce profiles.
This shift in thinking also precipitates a change in the TopRoutes API where we no longer need special cases for `ToAll` (which can be specified by `--to au`) or `ToAuthority` (which can be specified by `--to au/<authority>`) and instead can use a `ToResource` to handle all cases.
Signed-off-by: Alex Leong <alex@buoyant.io>
Commit 1: Enable lint check for comments
Part of #217. Follow up from #1982 and #2018.
A subsequent commit will fix the ci failure.
Commit 2: Address all comment-related linter errors.
This change addresses all comment-related linter errors by doing the
following:
- Add comments to exported symbols
- Make some exported symbols private
- Recommend via TODOs that some exported symbols should should move or
be removed
This PR does not:
- Modify, move, or remove any code
- Modify existing comments
Signed-off-by: Andrew Seigner <siggy@buoyant.io>