linkerd/linkerd2#2349 removed the `--single-namespace` flag, in favor of
runtime detection of cluster vs. namespace access, and also
ServiceProfile availability. This maintained control-plane support for
running in these two states.
This change requires control-plane components have cluster-wide
Kubernetes API access and ServiceProfile availability, and will error
out if not. Once #2349 merges, stage 1 install will be a requirement for
a successful stage 2 install.
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The control-plane components relied on a `--single-namespace` param,
passed from `linkerd install` into each individual component, to
determine which namespaces they were authorized to access, and whether
to support ServiceProfiles. This command-line flag was redundant given
the authorization rules encoded in the parent `linkerd install` output,
via [Cluster]Role[Binding]s.
Modify the control-plane components to query Kubernetes at startup to
determine which namespaces they are authorized to access, and whether
ServiceProfile support is available. This allows removal of the
`--single-namespace` flag on the components.
Also update `bin/test-cleanup` to cleanup the ServiceProfile CRD.
TODO:
- Remove `--single-namespace` flag on `linkerd install`, part of #2164
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Up until now, the proxy-api controller service has been the sole service
that the proxy communicates with, implementing the majoriry of the API
defined in the `linkerd2-proxy-api` repo. But this is about to change:
linkerd/linkerd2-proxy-api#25 introduces a new Identity service; and
this service must be served outside of the existing proxy-api service
in the linkerd-controller deployment (so that it may run under a
distinct service account).
With this change, the "proxy-api" name becomes less descriptive. It's no
longer "the service that serves the API for the proxy," it's "the
service that serves the Destination API to the proxy." Therefore, it
seems best to bite the bullet and rename this to be the "destination"
service (i.e. because it only serves the
`io.linkerd.proxy.destination.Destination` service).
Co-authored-by: Kevin Lingerfelt <kl@buoyant.io>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
`golangci-lint` performs numerous checks on Go code, including golint,
ineffassign, govet, and gofmt.
This change modifies `bin/lint` to use `golangci-lint`, and replaces
usage of golint and govet.
Also perform a one-time gofmt cleanup:
- `gofmt -s -w controller/`
- `gofmt -s -w pkg/`
Part of #217
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Proxy API service lacked introspection of its internal state.
Introduce a new gRPC Discovery API, implemented by two servers:
1) Proxy API Server: returns a snapshot of discovery state
2) Public API Server: pass-through to the Proxy API Server
Also wire up a new `linkerd endpoints` command.
Fixes#2165
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#2119
When Linkerd is installed in single-namespace mode, the public-api container panics when it attempts to access watch service profiles.
In single-namespace mode, we no longer watch service profiles and return an informative error when the TopRoutes API is called.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Introduce resource selector and deprecate namespace field for ListPods
* Changes from code review
* Properly deprecate the field
* Do not check for nil
* Fix the mockProm usage
* Protoc changes revert
* Changed from code review
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
Commit 1: Enable lint check for comments
Part of #217. Follow up from #1982 and #2018.
A subsequent commit will fix the ci failure.
Commit 2: Address all comment-related linter errors.
This change addresses all comment-related linter errors by doing the
following:
- Add comments to exported symbols
- Make some exported symbols private
- Recommend via TODOs that some exported symbols should should move or
be removed
This PR does not:
- Modify, move, or remove any code
- Modify existing comments
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Add a barebones ListServices endpoint, in support of autocomplete for services.
As we develop service profiles, this endpoint could probably be used to describe
more aspects of services (like, if there were some way to check whether a
service profile was enabled or not).
Accessible from the web UI via http://localhost:8084/api/services
Updates to the Kubernetes utility code in `/controller/k8s` to support interacting with ServiceProfiles.
This makes use of the code generated client added in #1752
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
- Return pod uptimes from the GetPods endpoint
- Adds filtering by namespace to api.GetPods
- Adds a --namespace filter to conduit get pods
- Adds pod uptimes to the controller component toolitps on the ServiceMesh page
- Moves the ServiceMesh page back to using /api/pods
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
public-api and and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.
This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The TapByResource endpoint was previously a stub.
Implement end-to-end tapByResource functionality, with support for
specifying any kubernetes resource(s) as target and destination.
Fixes#803, #49
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The `stat` command did not support `service` as a resource type.
This change adds `service` support to the `stat` command. Specifically:
- as a destination resource on `--to` commands
- as a target resource on `--from` commands
Fixes#805
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Add namespace as a resource type in public-api
The cli and public-api only supported deployments as a resource type.
This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes
Part of #627
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Refactor filter and groupby label formulation
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Rename stat_summary.go to stat.go in cli
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Update rbac privileges for namespace stats
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Conduit was relying on apps/v1 to Deployment and ReplicaSet APIs.
apps/v1 is not available on Kubernetes 1.8. This prevented the
public-api from starting.
Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API
cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on
a test cluster.
Fixes#761
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Remove the telemetry service
The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.
* Fix time window tests
* Remove deprecated controller scrape config
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The ListPods endpoint's logic resides in the telemetry service, which is
going away.
Move ListPods logic into public-api, use new k8s informer APIs.
Fixes#694
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Switch public API to use cached k8s resources
* Move shared informer code to separate goroutine
* Fix spelling issue
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The success rate calculation relies on the `classification` label, but
was incorrectly specifying `fail` rather than `failure`.
Fix public api to specify `failure`. Also re-org public api tests for
easier Kubernetes and Prometheus mocking.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The StatSummary logic was implemented as a method on http_server.
Move the StatSummary logic into grpc_server, for consistency with the
other endpoints.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
When the conduit proxy is injected into the controller pod, we observe controller pod proxy stats show up as an "outbound" deployment for an unrelated upstream deployment. This may cause confusion when monitoring deployments in the service mesh.
This PR filters out this "misleading" stat in the public api whenever the dashboard requests metric information for a specific deployment.
* exclude telemetry generated by the control plane when requesting deployment metrics
fixes#370
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
In PR #298 we moved time window parsing (10s => (time.now - 10s,
time.now) down the stack to immediately before the query. This had the
unintended effect of creating parallel latency quantile requests with
slightly different timestamps.
This change parses the time window prior to latency quantile fan out,
ensuring all requests have the same timestamp.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
PR #298 moved summary (non-timeseries) requests to Prometheus' Query
endpoint, with no timestamp provided. This Query endpoint returns a
single data point with whatever timestamp was provided in the request.
In the absense of a timestamp, it uses current server time. This causes
the Public API to return discreet data points with slightly different
timestamps, which is unexpected behavior.
Modify the Public API -> Telemetry -> Prometheus request path to always
require a timestamp for single data point requests.
Fixes#340
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Public APIs stat endpoint copies a slice of values to a slice of
pointers prior to gRPC response. Go's range clause re-uses the same
pointer for each iteration of the loop, causing a slice of {1,2,3}
becoming {3,3,3}.
Fix the range loop to directly reference pointers in the slice of
values, ignoring the range variable. Also add tests to catch this case.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Follow-up from #315.
Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.
I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.
Closes#265
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Prometheus queries from the Telemetry service were taking seconds or 10s
of seconds.
Optimize these queries:
- Move all summary queries requiring a single point data off of Prometheus'
QueryRange() endpoint, onto Query()
- Set `defaultVectorRange` to 30s, and also use it regardless of time
window
Also add tests for grpc_server and telemetry server
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#260