The control-plane components relied on a `--single-namespace` param,
passed from `linkerd install` into each individual component, to
determine which namespaces they were authorized to access, and whether
to support ServiceProfiles. This command-line flag was redundant given
the authorization rules encoded in the parent `linkerd install` output,
via [Cluster]Role[Binding]s.
Modify the control-plane components to query Kubernetes at startup to
determine which namespaces they are authorized to access, and whether
ServiceProfile support is available. This allows removal of the
`--single-namespace` flag on the components.
Also update `bin/test-cleanup` to cleanup the ServiceProfile CRD.
TODO:
- Remove `--single-namespace` flag on `linkerd install`, part of #2164
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Proxy API service lacked introspection of its internal state.
Introduce a new gRPC Discovery API, implemented by two servers:
1) Proxy API Server: returns a snapshot of discovery state
2) Public API Server: pass-through to the Proxy API Server
Also wire up a new `linkerd endpoints` command.
Fixes#2165
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#2119
When Linkerd is installed in single-namespace mode, the public-api container panics when it attempts to access watch service profiles.
In single-namespace mode, we no longer watch service profiles and return an informative error when the TopRoutes API is called.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Introduce resource selector and deprecate namespace field for ListPods
* Changes from code review
* Properly deprecate the field
* Do not check for nil
* Fix the mockProm usage
* Protoc changes revert
* Changed from code review
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
Commit 1: Enable lint check for comments
Part of #217. Follow up from #1982 and #2018.
A subsequent commit will fix the ci failure.
Commit 2: Address all comment-related linter errors.
This change addresses all comment-related linter errors by doing the
following:
- Add comments to exported symbols
- Make some exported symbols private
- Recommend via TODOs that some exported symbols should should move or
be removed
This PR does not:
- Modify, move, or remove any code
- Modify existing comments
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Add a barebones ListServices endpoint, in support of autocomplete for services.
As we develop service profiles, this endpoint could probably be used to describe
more aspects of services (like, if there were some way to check whether a
service profile was enabled or not).
Accessible from the web UI via http://localhost:8084/api/services
* Use ListPods always for data plane HC
* Missing changes in grpc_server.go
* Address review comments
* Read proxy version from spec
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
* Update ant to 3.7.2
* Add autocomplete of namespaces/resources to Tap in web ui
* Add form fields for authority/path/method/rps/scheme
* Add the ability to clear error messages to the error banner
* Add error listener to ws object
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
- Return pod uptimes from the GetPods endpoint
- Adds filtering by namespace to api.GetPods
- Adds a --namespace filter to conduit get pods
- Adds pod uptimes to the controller component toolitps on the ServiceMesh page
- Moves the ServiceMesh page back to using /api/pods
The `conduit tap` command is now deprecated.
Replace `conduit tap` with `connduit tapByResource`. Rename tapByResource
to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now
returns Unimplemented.
Fixes#804
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
public-api and and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.
This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The `stat` command did not support `service` as a resource type.
This change adds `service` support to the `stat` command. Specifically:
- as a destination resource on `--to` commands
- as a target resource on `--from` commands
Fixes#805
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This changes the public api to have a new rpc type, `TapByResource`.
This api supersedes the Tap api. `TapByResource` is richer, more closely
reflecting the proxy's capabilities.
The proxy's Tap api is extended to select over destination labels,
corresponding with those returned by the Destination api.
Now both `Tap` and `TapByResource`'s responses may include destination
labels.
This change avoids breaking backwards compatibility by:
* introducing the new `TapByResource` rpc type, opting not to change Tap
* extending the proxy's Match type with a new, optional, `destination_label` field.
* `TapEvent` is extended with a new, optional, `destination_meta`.
* Add namespace as a resource type in public-api
The cli and public-api only supported deployments as a resource type.
This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes
Part of #627
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Refactor filter and groupby label formulation
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Rename stat_summary.go to stat.go in cli
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Update rbac privileges for namespace stats
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Conduit was relying on apps/v1 to Deployment and ReplicaSet APIs.
apps/v1 is not available on Kubernetes 1.8. This prevented the
public-api from starting.
Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API
cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on
a test cluster.
Fixes#761
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Remove the telemetry service
The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.
* Fix time window tests
* Remove deprecated controller scrape config
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The ListPods endpoint's logic resides in the telemetry service, which is
going away.
Move ListPods logic into public-api, use new k8s informer APIs.
Fixes#694
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Switch public API to use cached k8s resources
* Move shared informer code to separate goroutine
* Fix spelling issue
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The StatSummary logic was implemented as a method on http_server.
Move the StatSummary logic into grpc_server, for consistency with the
other endpoints.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Start implementing new conduit stat summary endpoint.
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.
Current implementation just retrieves requests and mesh/total pod count
(so latency stats are always 0).
Uses API defined in #663
Example queries the stat endpoint will eventually satisfy in #627
This branch includes commits from @klingerf
* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
* Define a new telemetry Stat API
Proposal definition for a new Stat API, for the purposes of satisfying the queries proposed in #627.
StatSummary will replace Stat once implemented and the original Stat deleted.
When the conduit proxy is injected into the controller pod, we observe controller pod proxy stats show up as an "outbound" deployment for an unrelated upstream deployment. This may cause confusion when monitoring deployments in the service mesh.
This PR filters out this "misleading" stat in the public api whenever the dashboard requests metric information for a specific deployment.
* exclude telemetry generated by the control plane when requesting deployment metrics
fixes#370
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
In PR #298 we moved time window parsing (10s => (time.now - 10s,
time.now) down the stack to immediately before the query. This had the
unintended effect of creating parallel latency quantile requests with
slightly different timestamps.
This change parses the time window prior to latency quantile fan out,
ensuring all requests have the same timestamp.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
PR #298 moved summary (non-timeseries) requests to Prometheus' Query
endpoint, with no timestamp provided. This Query endpoint returns a
single data point with whatever timestamp was provided in the request.
In the absense of a timestamp, it uses current server time. This causes
the Public API to return discreet data points with slightly different
timestamps, which is unexpected behavior.
Modify the Public API -> Telemetry -> Prometheus request path to always
require a timestamp for single data point requests.
Fixes#340
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Public APIs stat endpoint copies a slice of values to a slice of
pointers prior to gRPC response. Go's range clause re-uses the same
pointer for each iteration of the loop, causing a slice of {1,2,3}
becoming {3,3,3}.
Fix the range loop to directly reference pointers in the slice of
values, ignoring the range variable. Also add tests to catch this case.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
All requests from the public API service to the Telemetry service were
done serially. In some cases a single request to the public API's Stat
endpoint resulted in 5 serial requests to the Telemetry service.
Make all requests from the Public API to Telemetry concurrent.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Part of #299
Follow-up from #315.
Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.
I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.
Closes#265
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Prometheus queries from the Telemetry service were taking seconds or 10s
of seconds.
Optimize these queries:
- Move all summary queries requiring a single point data off of Prometheus'
QueryRange() endpoint, onto Query()
- Set `defaultVectorRange` to 30s, and also use it regardless of time
window
Also add tests for grpc_server and telemetry server
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#260
This PR updates the web UI to remove the pod detail page, and to remove the links to that page from pod names in metrics tables. It also removes the `pods` option from `conduit stat`, and the `sourcePod` and `targetPod` fields from the controller API proto's `MetricMetadata` message.
I've updated the `conduit stat` tests to reflect these changes, and manually verified the web UI changes.
Closes#261
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
* Set conduit version to match conduit docker tags
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remove --skip-inbound-ports for emojivoto
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Rename git_sha => git_sha_head
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Switch to using the go linker for setting the version
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Log conduit version when go servers start
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Cleanup conduit script
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Add --short flag to head sha command
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Set CONDUIT_VERSION in docker-compose env
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Previously, running `$conduit tap` would return a `Unexpected EOF` error when the server wasn't available. This was due to a few problems with the way we were handling errors all the way down the tap server. This change fixes that and cleans some of the protobuf-over-HTTP code.
- first step towards #49
- closes#106
* Move healthcheck proto to separate file, use throughout
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remove Check message from healthcheck.proto
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Standardize healthcheck protobuf import name
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Abstract Conduit API client from protobuf interface to add new features
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Consolidate mock api clients
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Add simple implementation of healthcheck for conduit api
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Change NextSteps to FriendlyMessageToUser
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Add grpc check for status on the client
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Add simple server-side check for Conduit API
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Fix feedback from PR
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Add support for path stats in cli and web api
The cli stat command supports grouping by pod and deployment. With this
change, it will also support grouping by path, in order to facilitate a
summary stats per individual endpoint.
* Right-align numeric columns in stat output
We’ve built Conduit from the ground up to be the fastest, lightest,
simplest, and most secure service mesh in the world. It features an
incredibly fast and safe data plane written in Rust, a simple yet
powerful control plane written in Go, and a design that’s focused on
performance, security, and usability. Most importantly, Conduit
incorporates the many lessons we’ve learned from over 18 months of
production service mesh experience with Linkerd.
This repository contains a few tightly-related components:
- `proxy` -- an HTTP/2 proxy written in Rust;
- `controller` -- a control plane written in Go with gRPC;
- `web` -- a UI written in React, served by Go.