Added support for json output in `linkerd stat` through a new (-o|--output)=json option.
Fixes#1417
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
Updates to the Kubernetes utility code in `/controller/k8s` to support interacting with ServiceProfiles.
This makes use of the code generated client added in #1752
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
* Fix bug where we were using dst_authorities as a group by instead of authorities
* Add test to make sure we don't dst_authorities
Previously, we were only checking to make sure we didn't add
dst_authorities in the query labels in promDstQueryLabels but we
weren't checking the groupBy labels in promDstGroupByLabelNames -
this caused us to try to query for dst_authorities when a --from
query was sent. There are no dst_authorities, so there would be no
named results.
I realized that our stat summary expectation checker would only check the actual
proto responses against the expectations if the expectations were non-empty.
Problem
If we expected empty results and the api returned actual results, we never actually
check those results against the expectations.
The bug can be reproduced by replacing any nonzero metric we expect in
expectedResponse with expectedResponse: genEmptyResponse()
The tests on master will still pass.
Solution
Remove this line and ensure we get the expected number of stat tables.
Adds the ability to query by a new non-kubernetes resource type, "authorities",
in the StatSummary api.
This includes an extensive refactor of stat_summary.go to deal with non-kubernetes
resource types.
- Add documentation to Resource in the public api so we can use it for authority
- Handle non-k8s resource requests in the StatSummary endpoint
- Rewrite stat summary fetching and parsing to handle non-k8s resources
- keys stat summary metric handling by Resource instead of a generated string
- Adds authority to the CLI
- Adds /authorities to the Web UI
- Adds some more stat integration and unit tests
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Problem
`conduit stat` would cause a panic for any resource that wasn't in the list
of StatAllResourceTypes
This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files
Solution
Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes
Also adds a unit test and integration test for `conduit stat ns`
- It would be nice to display container errors in the UI. This PR gets the pod's container
statuses and returns them in the public api
- Also add a terminationMessagePolicy to conduit's inject so that we can capture the
proxy's error messages if it terminates
Both the conduit stat command and web UI are showing failed and completed pods.
This change filters out those pods before returning the result to the client.
Fixes#1010
Signed-off-by: Ivan Sim <ihcsim@gmail.com>
- Update the `response_total` prometheus query of the StatSummary endpoint to also
break queries out by a `meshed` label.
- Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic
starting and ending in the mesh
This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that
starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we
add TLS by default for intra mesh requests).
The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will
show up as 0% secured in the web/CLI.
The StatSummary endpoint was dereferencing
StatSummaryRequest.Selector.Resource, causing a panic when it received
an empty request.
Fix StatSummary to use the nil-friendly
StatSummaryRequest.GetSelector().GetResource() methods, and add a test
to validate.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Fix bug where we were dropping parts of the StatSummaryRequest
* Add tests for prometheus query strings and for failed cases
Problem
In #928 I rewrote the stat api to handle 'all' as a resource type. To query for all resource types,
we would copy the Resource, LabelSelector and TimeWindow of the original request, and then
go through all the resource types and set Resource.Type for each resource we wanted to get.
The bug is that while we copy over some fields of the original request, we didn't copy over all
of them - namely Resource.Name and the Outbound resource. So the Stat endpoint would
ignore any --to or --from flags, and would ignore requests for a specific named resource.
Solution
Copy over all fields from the request.
I've also added tests for this case. In this process I've refactored the stat_summary_test code
to make it a bit easier to read/use.
Allow the Stat endpoint in the public-api to accept requests for resourceType "all".
Currently, this queries Pods, Deployments, RCs and Services, but can be modified
to query other resources as well.
Both the CLI and web endpoints now work if you set resourceType to all.
e.g. `conduit stat all`
* Modify the Stat endpoint to also return the count of failed pods
* Add comments explaining pod count stats
* Rename total pod count to running pod count
This is to support the service mesh overview page, as I'd like to include an indicator of
failed pods there.
public-api and and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.
This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The `stat` command did not support `service` as a resource type.
This change adds `service` support to the `stat` command. Specifically:
- as a destination resource on `--to` commands
- as a target resource on `--from` commands
Fixes#805
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The public-api previously only permitted 4 hard-coded time windows:
10s, 1m, 10m, 1h. This was primarily a relic of the recently removed
telemetry system.
Modify the public-api to validate the time string, but allow for any
window size, which is then passed through to Prometheus.
Fixes#686
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Add namespace as a resource type in public-api
The cli and public-api only supported deployments as a resource type.
This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes
Part of #627
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Refactor filter and groupby label formulation
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Rename stat_summary.go to stat.go in cli
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Update rbac privileges for namespace stats
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Conduit was relying on apps/v1 to Deployment and ReplicaSet APIs.
apps/v1 is not available on Kubernetes 1.8. This prevented the
public-api from starting.
Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API
cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on
a test cluster.
Fixes#761
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Remove the telemetry service
The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.
* Fix time window tests
* Remove deprecated controller scrape config
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The ListPods endpoint's logic resides in the telemetry service, which is
going away.
Move ListPods logic into public-api, use new k8s informer APIs.
Fixes#694
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The new StatSummary endpoint was only providing request volume and
successs rate information.
Add support for retrieving latency stats via StatSummary. Also make
all prometheus calls in parallel, and implement kubernetes test
fixtures.
Fixes#681
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Switch public API to use cached k8s resources
* Move shared informer code to separate goroutine
* Fix spelling issue
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The success rate calculation relies on the `classification` label, but
was incorrectly specifying `fail` rather than `failure`.
Fix public api to specify `failure`. Also re-org public api tests for
easier Kubernetes and Prometheus mocking.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The StatSummary logic was implemented as a method on http_server.
Move the StatSummary logic into grpc_server, for consistency with the
other endpoints.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The new statsummary command accepted friendly k8s names, which worked
for k8s queries, but Prometheus requires a specific key.
Modify the statsummary query to map friendly k8s names to canonical k8s
names when constructing the query. Then during the query, map the
canonical k8s name to a specific Prometheus label.
Fixes#695
Signed-off-by: Andrew Seigner <siggy@buoyant.io>