This PR introduces a service mirroring component that is responsible for watching remote clusters and mirroring their services locally.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Followup to #2990, which refactored `linkerd endpoints` to use the
`Destination.Get` API instead of the `Discovery.Endpoints` API, leaving
the Discovery with no implented methods. This PR removes all the Discovery
code leftovers.
Fixes#3499
* Set custom cluster domain in GetServiceProfileFor
* Set custom cluster domain in tap server
Move fetching cluster domain for tap server to cmd main
* Handle fetchting cluster domain errors separately
* Use custom cluster domain for traffic split adaptor
Signed-off-by: Armin Buerkle <armin.buerkle@alfatraining.de>
Fixes#3052.
Adds a unit test for the edges API endpoint. To maintain a consistent order for
testing, the returned rows in api/public/edges.go are now sorted.
### Summary
After the addition of the tap APIServer, all the logic related to tap in the public API no longer needs to be there. The servers and clients that are created but not used, as well as all the old testing infrastrucure related to tap can be removed.
This deprecates TapByResource and therefore required an update to the protobuf files with `bin/protoc-go.sh`. While the change to deprecate this method was extremely small, a lot of protobuf fils were updated in the process. These changes to the code and protobuf files should probably remain coupled since `TapByResource` is officially deprecated in the public API, but a majority of the additions/deletions are related to those files.
This draft passes `go test` as well as a local run of the integration tests.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
This PR adds `trafficsplit` as a supported resource for the `linkerd stat` command. Users can type `linkerd stat ts` to see the apex and leaf services of their trafficsplits, as well as metrics for those leaf services.
`linkerd check`, the web dashboard, and Grafana all perform version
checks to validate Linkerd is up to date. It's common for users to
seldom execute these codepaths. This makes it difficult to identify what
versions of Linkerd are currently in use and what environments it is
being run in, which helps prioritize testing and backports.
Introduce a `heartbeat` CronJob to the default Linkerd install. The
cronjob executes every 24 hours, starting from 5 minutes after
`linkerd install` is run.
Example check URL:
https://versioncheck.linkerd.io/version.json?
install-time=1562761177&
k8s-version=v1.15.0&
meshed-pods=8&
rps=3&
source=heartbeat&
uuid=cc4bb700-3314-426a-9f0f-ec588b9df020&
version=git-b97ee9f7
Fixes#2961
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This PR improves the CLI output for `linkerd edges` to reflect the latest API
changes.
Source and destination namespaces for each edge are now shown by default. The
`MSG` column has been replaced with `Secured` and contains a green checkmark or
the reason for no identity. A new `-o wide` flag shows the identity of client
and server if known.
During operations with `linkerd stat` sometimes it's not clear the actual
pod status.
This commit introduces a method, to the `k8s`package, getting the pod status,
based on [`kubectl` logic](33a3e325f7/pkg/printers/internalversion/printers.go (L558-L640))
to expose the `STATUS` column for pods . Also, it changes the stat command
on the` cli` package adding a column when the resource type is a Pod.
Fixes#1967
Signed-off-by: Jonathan Juares Beber <jonathanbeber@gmail.com>
* Have `linkerd endpoints` use `Destination.Get`
Fixes#2885
We're refactoring `linkerd endpoints` so it hits
directly the `Destination.Get` endpoint, instead of relying on the
Discovery service.
For that, I've created a new `client.go` for Destination and added it to
the `APIClient` interface.
I've also added a `destinationClient` struct that mimics `tapClient`,
and whose common logic has been moved into `stream_client.go`.
Analogously, I added a `destinationServer` struct that mimics
`tapServer`.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
This change adds an endpoint to the public API to allow us to query Prometheus for edge data, in order to display identity information for connections between Linkerd proxies. This PR only includes changes to the controller and protobuf.
linkerd/linkerd2#2349 removed the `--single-namespace` flag, in favor of
runtime detection of cluster vs. namespace access, and also
ServiceProfile availability. This maintained control-plane support for
running in these two states.
This change requires control-plane components have cluster-wide
Kubernetes API access and ServiceProfile availability, and will error
out if not. Once #2349 merges, stage 1 install will be a requirement for
a successful stage 2 install.
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
We were depending on an untagged version of prometheus/client_golang
from Feb 2018.
This bumps our dependency to v0.9.2, from Dec 2018.
Also, this is a prerequisite to #1488.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The control-plane components relied on a `--single-namespace` param,
passed from `linkerd install` into each individual component, to
determine which namespaces they were authorized to access, and whether
to support ServiceProfiles. This command-line flag was redundant given
the authorization rules encoded in the parent `linkerd install` output,
via [Cluster]Role[Binding]s.
Modify the control-plane components to query Kubernetes at startup to
determine which namespaces they are authorized to access, and whether
ServiceProfile support is available. This allows removal of the
`--single-namespace` flag on the components.
Also update `bin/test-cleanup` to cleanup the ServiceProfile CRD.
TODO:
- Remove `--single-namespace` flag on `linkerd install`, part of #2164
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Adds a flag, tcp_stats to the StatSummary request, which queries prometheus for TCP stats.
This branch returns TCP stats at /api/tps-reports when this flag is true.
TCP stats are now displayed on the Resource Detail pages.
The current queried TCP stats are:
tcp_open_connections
tcp_read_bytes_total
tcp_write_bytes_total
`golangci-lint` performs numerous checks on Go code, including golint,
ineffassign, govet, and gofmt.
This change modifies `bin/lint` to use `golangci-lint`, and replaces
usage of golint and govet.
Also perform a one-time gofmt cleanup:
- `gofmt -s -w controller/`
- `gofmt -s -w pkg/`
Part of #217
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Proxy API service lacked introspection of its internal state.
Introduce a new gRPC Discovery API, implemented by two servers:
1) Proxy API Server: returns a snapshot of discovery state
2) Public API Server: pass-through to the Proxy API Server
Also wire up a new `linkerd endpoints` command.
Fixes#2165
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#2119
When Linkerd is installed in single-namespace mode, the public-api container panics when it attempts to access watch service profiles.
In single-namespace mode, we no longer watch service profiles and return an informative error when the TopRoutes API is called.
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes#1875
This change improves the `linkerd routes` command in a number of important ways:
* The restriction on the type of the `--to` argument is lifted and any resource type can now be used. Try `--to ns/books`, `--to po/webapp-ABCDEF`, `--to au/linkerd.io`, or even `--to svc`.
* All routes for the target will now be populated in the table, even if there are no Prometheus metrics for that route.
* [UNKNOWN] has been renamed to [DEFAULT]
* The `Service/Authority` column will now list `Service` in all cases except for when an authority target is explicitly requested.
```
$ linkerd routes deploy/traffic --to deploy/webapp
ROUTE SERVICE SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
GET / webapp 100.00% 0.5rps 50ms 180ms 196ms
GET /authors/{id} webapp 100.00% 0.5rps 100ms 900ms 980ms
GET /books/{id} webapp 100.00% 0.9rps 38ms 93ms 99ms
POST /authors webapp 100.00% 0.5rps 35ms 48ms 50ms
POST /authors/{id}/delete webapp 100.00% 0.5rps 83ms 180ms 196ms
POST /authors/{id}/edit webapp 0.00% 0.0rps 0ms 0ms 0ms
POST /books webapp 45.16% 2.1rps 75ms 425ms 485ms
POST /books/{id}/delete webapp 100.00% 0.5rps 30ms 90ms 98ms
POST /books/{id}/edit webapp 56.00% 0.8rps 92ms 875ms 975ms
[DEFAULT] webapp 0.00% 0.0rps 0ms 0ms 0ms
```
This is all made possible by a shift in the way we handle the destination resource. When we get a request with a `ToResource`, we use the k8s API to find all Services which include at least one pod belonging to that resource. We then fetch all service profiles for those services and display the routes from those serivce profiles.
This shift in thinking also precipitates a change in the TopRoutes API where we no longer need special cases for `ToAll` (which can be specified by `--to au`) or `ToAuthority` (which can be specified by `--to au/<authority>`) and instead can use a `ToResource` to handle all cases.
Signed-off-by: Alex Leong <alex@buoyant.io>
Commit 1: Enable lint check for comments
Part of #217. Follow up from #1982 and #2018.
A subsequent commit will fix the ci failure.
Commit 2: Address all comment-related linter errors.
This change addresses all comment-related linter errors by doing the
following:
- Add comments to exported symbols
- Make some exported symbols private
- Recommend via TODOs that some exported symbols should should move or
be removed
This PR does not:
- Modify, move, or remove any code
- Modify existing comments
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Add parameter to stats API to skip retrieving Prometheus stats
Used by the dashboard to populate list of resources.
Fixes#1022
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Prometheus queries check results were being ignored
* Refactor verifyPromQueries() to also test when no prometheus queries
should be generated
* Add test for SkipStats=true
Includes adding ability to public.GenStatSummaryResponse to not generate
basicStats
* Fix previous test
We rework the routes command so that it can accept any Kubernetes resource, making it act much more similarly to the stat command.
Signed-off-by: Alex Leong <alex@buoyant.io>
Add a barebones ListServices endpoint, in support of autocomplete for services.
As we develop service profiles, this endpoint could probably be used to describe
more aspects of services (like, if there were some way to check whether a
service profile was enabled or not).
Accessible from the web UI via http://localhost:8084/api/services
Add a routes command which displays per-route stats for services that have service profiles defined.
This change has three parts:
* A new public-api RPC called `TopRoutes` which serves per-route stat data about a service
* An implementation of TopRoutes in the public-api service. This implementation reads per-route data from Prometheus. This is very similar to how the StatSummaries RPC and much of the code was able to be refactored and shared.
* A new CLI command called `routes` which displays the per-route data in a tabular or json format. This is very similar to the `stat` command and much of the code was able to be refactored and shared.
Note that as of the currently targeted proxy version, only outbound route stats are supported so the `--from` flag must be included in order to see data. This restriction will be lifted in an upcoming change once we add support for inbound route stats as well.
Signed-off-by: Alex Leong <alex@buoyant.io>
Added support for json output in `linkerd stat` through a new (-o|--output)=json option.
Fixes#1417
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
- Return pod uptimes from the GetPods endpoint
- Adds filtering by namespace to api.GetPods
- Adds a --namespace filter to conduit get pods
- Adds pod uptimes to the controller component toolitps on the ServiceMesh page
- Moves the ServiceMesh page back to using /api/pods
Adds the ability to query by a new non-kubernetes resource type, "authorities",
in the StatSummary api.
This includes an extensive refactor of stat_summary.go to deal with non-kubernetes
resource types.
- Add documentation to Resource in the public api so we can use it for authority
- Handle non-k8s resource requests in the StatSummary endpoint
- Rewrite stat summary fetching and parsing to handle non-k8s resources
- keys stat summary metric handling by Resource instead of a generated string
- Adds authority to the CLI
- Adds /authorities to the Web UI
- Adds some more stat integration and unit tests
Problem
`conduit stat` would cause a panic for any resource that wasn't in the list
of StatAllResourceTypes
This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files
Solution
Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes
Also adds a unit test and integration test for `conduit stat ns`
* Fix bug where we were dropping parts of the StatSummaryRequest
* Add tests for prometheus query strings and for failed cases
Problem
In #928 I rewrote the stat api to handle 'all' as a resource type. To query for all resource types,
we would copy the Resource, LabelSelector and TimeWindow of the original request, and then
go through all the resource types and set Resource.Type for each resource we wanted to get.
The bug is that while we copy over some fields of the original request, we didn't copy over all
of them - namely Resource.Name and the Outbound resource. So the Stat endpoint would
ignore any --to or --from flags, and would ignore requests for a specific named resource.
Solution
Copy over all fields from the request.
I've also added tests for this case. In this process I've refactored the stat_summary_test code
to make it a bit easier to read/use.
The existing `tap` command is being deprecated.
Introduce a `tapByResource` cli command. It supports tapping a Kubernetes
resource or collection of resources, optionally filtered by outbound resources.
This command will eventually replace `tap`.
Part of #778
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This changes the public api to have a new rpc type, `TapByResource`.
This api supersedes the Tap api. `TapByResource` is richer, more closely
reflecting the proxy's capabilities.
The proxy's Tap api is extended to select over destination labels,
corresponding with those returned by the Destination api.
Now both `Tap` and `TapByResource`'s responses may include destination
labels.
This change avoids breaking backwards compatibility by:
* introducing the new `TapByResource` rpc type, opting not to change Tap
* extending the proxy's Match type with a new, optional, `destination_label` field.
* `TapEvent` is extended with a new, optional, `destination_meta`.
* Remove the telemetry service
The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.
* Fix time window tests
* Remove deprecated controller scrape config
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The success rate calculation relies on the `classification` label, but
was incorrectly specifying `fail` rather than `failure`.
Fix public api to specify `failure`. Also re-org public api tests for
easier Kubernetes and Prometheus mocking.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Define a new telemetry Stat API
Proposal definition for a new Stat API, for the purposes of satisfying the queries proposed in #627.
StatSummary will replace Stat once implemented and the original Stat deleted.