linkerd/linkerd2#2349 removed the `--single-namespace` flag, in favor of
runtime detection of cluster vs. namespace access, and also
ServiceProfile availability. This maintained control-plane support for
running in these two states.
This change requires control-plane components have cluster-wide
Kubernetes API access and ServiceProfile availability, and will error
out if not. Once #2349 merges, stage 1 install will be a requirement for
a successful stage 2 install.
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The control-plane components relied on a `--single-namespace` param,
passed from `linkerd install` into each individual component, to
determine which namespaces they were authorized to access, and whether
to support ServiceProfiles. This command-line flag was redundant given
the authorization rules encoded in the parent `linkerd install` output,
via [Cluster]Role[Binding]s.
Modify the control-plane components to query Kubernetes at startup to
determine which namespaces they are authorized to access, and whether
ServiceProfile support is available. This allows removal of the
`--single-namespace` flag on the components.
Also update `bin/test-cleanup` to cleanup the ServiceProfile CRD.
TODO:
- Remove `--single-namespace` flag on `linkerd install`, part of #2164
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
goconst finds repeated strings that could be replaced by a constant:
https://github.com/jgautheron/goconst
Part of #217
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Adds a flag, tcp_stats to the StatSummary request, which queries prometheus for TCP stats.
This branch returns TCP stats at /api/tps-reports when this flag is true.
TCP stats are now displayed on the Resource Detail pages.
The current queried TCP stats are:
tcp_open_connections
tcp_read_bytes_total
tcp_write_bytes_total
`golangci-lint` performs numerous checks on Go code, including golint,
ineffassign, govet, and gofmt.
This change modifies `bin/lint` to use `golangci-lint`, and replaces
usage of golint and govet.
Also perform a one-time gofmt cleanup:
- `gofmt -s -w controller/`
- `gofmt -s -w pkg/`
Part of #217
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Proxy API service lacked introspection of its internal state.
Introduce a new gRPC Discovery API, implemented by two servers:
1) Proxy API Server: returns a snapshot of discovery state
2) Public API Server: pass-through to the Proxy API Server
Also wire up a new `linkerd endpoints` command.
Fixes#2165
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
DaemonSet stats are not currently shown in the cli stat command, web ui
or grafana dashboard. This commit adds daemonset support for stat.
Update stat command's help message to reference daemonsets.
Update the public-api to support stats for daemonsets.
Add tests for stat summary and api.
Add daemonset get/list/watch permissions to the linkerd-controller
cluster role that's created using the install command.
Update golden expectation test files for install command
yaml manifest output.
Update web UI with daemonsets
Update navigation, overview and pages to list daemonsets and the pods
associated to them.
Add daemonset paths to server, and ui apps.
Add grafana dashboard for daemonsets; a clone of the deployment
dashboard.
Update dependencies and dockerfile hashes
Add DaemonSet support to tap and top commands
Fixes of #2006
Signed-off-by: Zak Knill <zrjknill@gmail.com>
Fixes#2119
When Linkerd is installed in single-namespace mode, the public-api container panics when it attempts to access watch service profiles.
In single-namespace mode, we no longer watch service profiles and return an informative error when the TopRoutes API is called.
Signed-off-by: Alex Leong <alex@buoyant.io>
Commit 1: Enable lint check for comments
Part of #217. Follow up from #1982 and #2018.
A subsequent commit will fix the ci failure.
Commit 2: Address all comment-related linter errors.
This change addresses all comment-related linter errors by doing the
following:
- Add comments to exported symbols
- Make some exported symbols private
- Recommend via TODOs that some exported symbols should should move or
be removed
This PR does not:
- Modify, move, or remove any code
- Modify existing comments
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Add parameter to stats API to skip retrieving Prometheus stats
Used by the dashboard to populate list of resources.
Fixes#1022
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Prometheus queries check results were being ignored
* Refactor verifyPromQueries() to also test when no prometheus queries
should be generated
* Add test for SkipStats=true
Includes adding ability to public.GenStatSummaryResponse to not generate
basicStats
* Fix previous test
Add a routes command which displays per-route stats for services that have service profiles defined.
This change has three parts:
* A new public-api RPC called `TopRoutes` which serves per-route stat data about a service
* An implementation of TopRoutes in the public-api service. This implementation reads per-route data from Prometheus. This is very similar to how the StatSummaries RPC and much of the code was able to be refactored and shared.
* A new CLI command called `routes` which displays the per-route data in a tabular or json format. This is very similar to the `stat` command and much of the code was able to be refactored and shared.
Note that as of the currently targeted proxy version, only outbound route stats are supported so the `--from` flag must be included in order to see data. This restriction will be lifted in an upcoming change once we add support for inbound route stats as well.
Signed-off-by: Alex Leong <alex@buoyant.io>
# Problem
When we add a `--from` query to `linkerd stat au` we get more rows than if we would have just run `linkerd stat au`.
Adding a `--from` causes an extra row to be added, and the named authority to be ignored (this is the result we would have expected when running `linkerd stat au -n emojivoto --from deploy/web`).
# Solution
Destination query labels are now appended to `labels` so that those labels can be filtered on.
# Validation
Tests have been updated to reflect the expected expected destination labels now appended in `--from` queries.
Fixes#1766
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Added support for json output in `linkerd stat` through a new (-o|--output)=json option.
Fixes#1417
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
Updates to the Kubernetes utility code in `/controller/k8s` to support interacting with ServiceProfiles.
This makes use of the code generated client added in #1752
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
* Fix bug where we were using dst_authorities as a group by instead of authorities
* Add test to make sure we don't dst_authorities
Previously, we were only checking to make sure we didn't add
dst_authorities in the query labels in promDstQueryLabels but we
weren't checking the groupBy labels in promDstGroupByLabelNames -
this caused us to try to query for dst_authorities when a --from
query was sent. There are no dst_authorities, so there would be no
named results.
I realized that our stat summary expectation checker would only check the actual
proto responses against the expectations if the expectations were non-empty.
Problem
If we expected empty results and the api returned actual results, we never actually
check those results against the expectations.
The bug can be reproduced by replacing any nonzero metric we expect in
expectedResponse with expectedResponse: genEmptyResponse()
The tests on master will still pass.
Solution
Remove this line and ensure we get the expected number of stat tables.
Adds the ability to query by a new non-kubernetes resource type, "authorities",
in the StatSummary api.
This includes an extensive refactor of stat_summary.go to deal with non-kubernetes
resource types.
- Add documentation to Resource in the public api so we can use it for authority
- Handle non-k8s resource requests in the StatSummary endpoint
- Rewrite stat summary fetching and parsing to handle non-k8s resources
- keys stat summary metric handling by Resource instead of a generated string
- Adds authority to the CLI
- Adds /authorities to the Web UI
- Adds some more stat integration and unit tests
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Problem
`conduit stat` would cause a panic for any resource that wasn't in the list
of StatAllResourceTypes
This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files
Solution
Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes
Also adds a unit test and integration test for `conduit stat ns`
- It would be nice to display container errors in the UI. This PR gets the pod's container
statuses and returns them in the public api
- Also add a terminationMessagePolicy to conduit's inject so that we can capture the
proxy's error messages if it terminates
Both the conduit stat command and web UI are showing failed and completed pods.
This change filters out those pods before returning the result to the client.
Fixes#1010
Signed-off-by: Ivan Sim <ihcsim@gmail.com>
- Update the `response_total` prometheus query of the StatSummary endpoint to also
break queries out by a `meshed` label.
- Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic
starting and ending in the mesh
This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that
starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we
add TLS by default for intra mesh requests).
The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will
show up as 0% secured in the web/CLI.
The StatSummary endpoint was dereferencing
StatSummaryRequest.Selector.Resource, causing a panic when it received
an empty request.
Fix StatSummary to use the nil-friendly
StatSummaryRequest.GetSelector().GetResource() methods, and add a test
to validate.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Fix bug where we were dropping parts of the StatSummaryRequest
* Add tests for prometheus query strings and for failed cases
Problem
In #928 I rewrote the stat api to handle 'all' as a resource type. To query for all resource types,
we would copy the Resource, LabelSelector and TimeWindow of the original request, and then
go through all the resource types and set Resource.Type for each resource we wanted to get.
The bug is that while we copy over some fields of the original request, we didn't copy over all
of them - namely Resource.Name and the Outbound resource. So the Stat endpoint would
ignore any --to or --from flags, and would ignore requests for a specific named resource.
Solution
Copy over all fields from the request.
I've also added tests for this case. In this process I've refactored the stat_summary_test code
to make it a bit easier to read/use.
Allow the Stat endpoint in the public-api to accept requests for resourceType "all".
Currently, this queries Pods, Deployments, RCs and Services, but can be modified
to query other resources as well.
Both the CLI and web endpoints now work if you set resourceType to all.
e.g. `conduit stat all`
* Modify the Stat endpoint to also return the count of failed pods
* Add comments explaining pod count stats
* Rename total pod count to running pod count
This is to support the service mesh overview page, as I'd like to include an indicator of
failed pods there.
public-api and and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.
This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The `stat` command did not support `service` as a resource type.
This change adds `service` support to the `stat` command. Specifically:
- as a destination resource on `--to` commands
- as a target resource on `--from` commands
Fixes#805
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The public-api previously only permitted 4 hard-coded time windows:
10s, 1m, 10m, 1h. This was primarily a relic of the recently removed
telemetry system.
Modify the public-api to validate the time string, but allow for any
window size, which is then passed through to Prometheus.
Fixes#686
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Add namespace as a resource type in public-api
The cli and public-api only supported deployments as a resource type.
This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes
Part of #627
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Refactor filter and groupby label formulation
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Rename stat_summary.go to stat.go in cli
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Update rbac privileges for namespace stats
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Conduit was relying on apps/v1 to Deployment and ReplicaSet APIs.
apps/v1 is not available on Kubernetes 1.8. This prevented the
public-api from starting.
Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API
cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on
a test cluster.
Fixes#761
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Remove the telemetry service
The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.
* Fix time window tests
* Remove deprecated controller scrape config
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The ListPods endpoint's logic resides in the telemetry service, which is
going away.
Move ListPods logic into public-api, use new k8s informer APIs.
Fixes#694
Signed-off-by: Andrew Seigner <siggy@buoyant.io>