Closes#7881
This makes the rest of the necessary fixes to satisfy the `staticcheck` lint.
The only class of lints that are being skipped are those related to deprecated tap code. There was some discussion on the original change started by @adleong about if this _actually_ deprecated [here](https://github.com/linkerd/linkerd2/pull/3240#discussion_r313634584); it doesn't look like we every came back around to fully removing it but I don't think it should be a blocker for enabling the lint right now.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
As part of #7881 it is recommended we upgrade to use `protojson` instead of `jsonPb`. This required a few additional changes in the code since the signature of `Marshal` changed so I've split this out into a separate change.
`Marshal` now returns the resulting bytes instead of taking an interface to write those bytes into. This means we have to handle the error separately and then write the JSON into the given writer.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Closes#7826
This adds the `gosec` and `errcheck` lints to the `golangci` configuration. Most significant lints have been fixed my individual changes, but this enables them by default so that all future changes are caught ahead of time.
A significant amount of these lints are been exluced by the various `exclude-rules` rules added to `.golangci.yml`. These include operations are files that generally do not fail such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up functions via the `defer` keyword.
Aside from those, there are several other rules added that all have comments explaining why it's okay to ignore the errors that they cover.
Finally, several smaller fixes in the code have been made where it seems necessary to catch errors or at least log them.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection is wrapping-aware.
This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).
[el]: https://github.com/polyfloyd/go-errorlint
Signed-off-by: Oliver Gould <ver@buoyant.io>
Now, that SMI functionality is fully being moved into the
[linkerd-smi](www.github.com/linkerd/linkerd-smi) extension, we can
stop supporting its functionality by default.
This means that the `destination` component will stop reacting
to the `TrafficSplit` objects. When `linkerd-smi` is installed,
It does the conversion of `TrafficSplit` objects to `ServiceProfiles`
that destination components can understand, and will react accordingly.
Also, Whenever a `ServiceProfile` with traffic splitting is associated
with a service, the same information (i.e splits and weights) is also
surfaced through the `UI` (in the new `services` tab) and the `viz cmd`.
So, We are not really loosing any UI functionality here.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
When running `linkerd check -o short` there can still be formatting issues:
- When there are no core warnings but there are extension warnings, a newline is printed at the start of the output
- When there are warnings, there is no newline printed between the warnings and the result
Below you can see the extra newline (before `Linkerd extensions checks`) and the lack of a newline on line before `Status check results ...`.
Old:
```shell
$ linkerd check -o short
Linkerd extensions checks
=========================
linkerd-viz
-----------
...
Status check results are √
```
New:
```shell
$ linkerd check -o short
Linkerd extensions checks
=========================
linkerd-viz
-----------
...
Status check results are √
```
---
This fixes the above issues by moving the newline printing to the end of a category—which right now is Core and Extension.
If there is no output for either, then no newline is printed. This results in no stray newlines when running in short output and there are no warnings.
```shell
$ linkerd check -o short
Status check results are √
```
If there is output for a category, then the category handles the newline printing itself meaning we don't need to track if a newline needs to be printed _before_ a category or _before_ the results.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
* Add services to web dashboard
Adds services as a new screnn on a web dashboard. Services screen is
implemented as a separate func, because it is different from
pods/jobs/etc.
Services view will not have pods or edges section because this
information is not available.
Signed-off-by: Krzysztof Dryś <krzysztofdrys@gmail.com>
This fixes this error seen in the dashboard:
```
api/extensions?extension_name=multicluster:1 Failed to load resource: the server responded with a status of 500 (Internal Server Error)
```
The issue was #6684 had changed the shape of the error returned by
`GetNamespaceWithExtensionLabel` and the `handleGetExtensions` function
wasn't adapted to it.
Introduce new table in the control plane view of the dashboard to display installed extensions.
Signed-off-by: Sanni Michael <sannimichaelse@gmail.com>
* Removed `Version` API from the public-api
This is a sibling PR to #5993, and it's the second step towards removing the `linkerd-controller` pod.
This one deals with a replacement for the `Version` API, fetching instead the `linkerd-config` CM and retrieving the `LinkerdVersion` value.
## Changes to the public-api
- Removal of the `publicPb.ApiClient` entry from the `Client` interface
- Removal of the `publicPb.ApiServer` entry from the `Server` interface
- Removal of the `Version` and related methods from `client.go`, `grpc_server.go` and `http_server.go`
## Changes to `linkerd version`
- Removal of all references to the public API.
- Call `healthcheck.GetServerVersion` to retrieve the version
## Changes to `linkerd check`
- Removal of the "can query the control API" check from the "linkerd-api" section
- Addition of a new "can retrieve the control plane version" check under the "control-plane-version" section
## Changes to `linkerd-web`
- The version is now retrieved from the `linkerd-config` CM instead of a public-API call.
- Removal of all references to the public API.
- Removal of the `data-go-version` global attribute on the dashboard, which wasn't being used.
## Other changes
- Added `ValuesFromConfigMap` function in `values.go` to convert the `linkerd-config` CM into a `*Values` struct instance
- Removal of the `public` protobuf
- Refactor 'linkerd repair' to use the refactored 'healthcheck.GetServerVersion()' function
* checks: make hintBaseURL configurable
Fixes#5740
Currently, When external binaries use the `healthcheck` pkg, They
can't really override the `hintBaseURL` variable which is used
to set the baseURL of the troubleshooting doc (in which hintAnchor
is used as a anchor).
This PR updates that by adding a new `hintBaseURL` field to
`healthcheck.Category`. This field has been added to `Category`
instead of `Checker` so as to reduce unecessary redundancy as
the chances of setting a different baseURL for checkers under
the same category is very low.
This PR also updates the `runChecks` logic to automatically fallback
onto the linkerd troubleshooting doc as the baseURL so that all
the current checks don't have to set this field. This can be done
otherwise (i.e removing the fallback logic and make all categories
specify the URL) if there is a problem.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Hide the "Gateway" sidebar link
This commit hides the "Gateway" sidebar link in the dashboard if the
`linkerd-multicluster` extension is not installed. If a user happens to navigate to
the Gateway page anyway, we display a CTA (Call to Action) that informs
the user that they would need to run the multicluster install command.
This change includes a new endpoint in the dashboard server; `GET
/api/extensions`. This endpoint returns the namespace an extension
is installed in when passing in extension name. The dashboard uses
this endpoint to detect whether it needs to hide the navigation link
and whether to display the CTA.
Fixes#5330
Closes#5545.
This change moves all tap and tap-injector code into the viz directory.
The tap and tap-injector components now also use a new tap image—separating
these components from the controller image that they are currently part of. This
means the controller image has removed all its build dependencies related to
tap.
Finally, the tap Protobuf has been separated from the metrics-api and moved into
it's own `.proto` file and gen directory. This introduces a clear split between
metrics-api and tap Protobuf.
There is no change in behavior for the `viz tap` command.
### Reviewing
#### Docker images
All the bin directory scripts should be updated to build and load the tap image.
All the CI workflows should be updated to build and push the tap image.
#### Controller and pkg directories
This is primarily deletions. Most of the deleted code in this directory is now
in the tap directory of the Viz extension.
#### viz/tap
This is the location that all the tap related code now lives in. New files are
mostly moved from the controller and pkg directories. Imports have all been
updated to point at the right locations and Protobuf.
The Protobuf here is taken from metrics-api and contains all tap-related
Protobuf.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it remains being used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.
* Added chart templates for new viz linkerd-metrics-api pod
* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.
* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).
* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompaigning http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.
* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.
* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.
* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
- Updated `endpoints.go` according to new API interface name.
- Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
- `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
- Added `metrics-api` to list of docker images to build in actions workflows.
- In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).
* Add retry to 'tap API service is running' check
* mc check shouldn't err when viz is not available. Also properly set the log in multicluster/cmd/root.go so that it properly displays messages when --verbose is used
* Separate observability API
Closes#5312
This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.
- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.prot`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto` except for the `Version` Stuff.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.
The other changes in the go files are just changes in the imports to point to the new protobufs.
Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`
Fixes#5121
* cli: skip emitting warnings in Profile
Whenever the tapDuration gets completed, there is a warning occured
which we do not emit. This looks like it has been changed in the latest
versions of the dependency.
* Use context.withDeadline instead of client.timeout
The usage of `client.Timeout` is not working correctly causing `W1022
17:20:12.372780 19049 transport.go:260] Unable to cancel request for
promhttp.RoundTripperFunc` to be emitted by the Kubernetes Client.
This is fixed by using context.WithDeadline and passing that into the
http Request.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes#4191#4993
This bumps Kubernetes client-go to the latest v0.19.2 (We had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5
This consists of the following changes:
- Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD.
- Bump all k8s related dependencies to v0.19.2
- Generate CRD types, client code using the latest k8s.io/code-generator
- Use context.Context as the first argument, in all code paths that touch the k8s client-go interface
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Introduce multicluster gateway api handler in web api server
* Added MetricsUtil for Gateway metrics
* Added gateway api helper
* Added Gateway Component
Updated metricsTable component to support gateway metrics
Added handler for gateway
Fixes#4601
Signed-off-by: Tharun <rajendrantharun@live.com>
This PR allows the dashboard to query for a resource's definition in YAML
format, if the boolean `queryForDefinition` in the `ResourceDetail` component is
set to true.
This change to the web API and the dashboard component was made for a future
redesigned dashboard detail page. At present, `queryForDefinition` is set to
false and there is no visible change to the user with this PR.
Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com> Signed-off-by: Cintia
Sanchez Garcia <cynthiasg@icloud.com>
`linkerd check` can now be run from the dashboard in the `/controlplane` view.
Once the check results are received, they are displayed in a modal in a similar
style to the CLI output.
Closes#3613
### Motivation
PR #3167 introduced the tap APIService and migrated `linkerd tap` to use it.
Subsequent PRs (#3186 and #3187) updated `linkerd top` and `linkerd profile
--tap` to use the tap APIService. This PR moves the web's Go server to now also
use the tap APIService instead of the public API. It also ensures an error
banner is shown to the user when unauthorized taps fail via `linkerd top`
command in *Overview* and *Top*, and `linkerd tap` command in *Tap*.
### Details
The majority of these changes are focused around piping through the HTTP error
that occurs and making sure the error banner generated displays the error
message explaining to view the tap RBAC docs.
`httpError` is now public (`HTTPError`) and the error message generated is short
enough to fit in a control frame (explained [here](https://github.com/linkerd/linkerd2/blob/kleimkuhler%2Fweb-tap-apiserver/web/srv/api_handlers.go#L173-L175)).
### Testing
The error we are testing for only occurs when the linkerd-web service account is
not authorzied to tap resources. Unforutnately that is not the case on Docker
For Mac (assuming that is what you use locally), so you'll need to test on a
different cluster. I chose a GKE cluster made through the GKE console--not made
through cluster-utils because it adds cluster-admin.
Checkout the branch locally and `bin/docker-build` or `ares-build` if you have
it setup. It should produce a linkerd with the version `git-04e61786`. I have
already pushed the dependent components, so you won't need to `bin/docker-push
git-04e61786`.
Install linkerd on this GKE cluster and try to run `tap` or `top` commands via
the web. You should see the following errors:
### Tap

### Top

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Adds an Edges table to the resource detail view that shows the source,
destination name and identity for proxied connections to and from the resource
shown.
linkerd/linkerd2#2365 introduced the goconst linter and fixes, but additional lint
errors had been introduced to master.
This change fixes the one remaining goconst issue.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Adds a flag, tcp_stats to the StatSummary request, which queries prometheus for TCP stats.
This branch returns TCP stats at /api/tps-reports when this flag is true.
TCP stats are now displayed on the Resource Detail pages.
The current queried TCP stats are:
tcp_open_connections
tcp_read_bytes_total
tcp_write_bytes_total
gosimple is a Go linter that specializes in simplifying code
Also fix one spelling error in `cred_test.go`
Part of #217
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
In #2195 we introduced `linkerd endpoints` on the CLI. I would like similar
information to be on the web.
This PR adds an api endpoint at `/api/endpoints`, and introduces a new debugging
pagethat shows a table of endpoints, available at `/debug`
The recent routes API changes caused the Top Routes tab to stop working, as it
wasn't looking for the changed structure of the response. This PR updates that
page to accept the new API response.
This PR also adds to fields to the Top Routes query form, so that the equivalent
of linkerd routes deploy --to deploy/authors will work in the dashboard.
* Introduce resource selector and deprecate namespace field for ListPods
* Changes from code review
* Properly deprecate the field
* Do not check for nil
* Fix the mockProm usage
* Protoc changes revert
* Changed from code review
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
* Add parameter to stats API to skip retrieving Prometheus stats
Used by the dashboard to populate list of resources.
Fixes#1022
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Prometheus queries check results were being ignored
* Refactor verifyPromQueries() to also test when no prometheus queries
should be generated
* Add test for SkipStats=true
Includes adding ability to public.GenStatSummaryResponse to not generate
basicStats
* Fix previous test
We rework the routes command so that it can accept any Kubernetes resource, making it act much more similarly to the stat command.
Signed-off-by: Alex Leong <alex@buoyant.io>
Adds a (currently not displayed in sidebar, but available at /routes) page to
mirror the current functionality of `linkerd routes <service>`. So far, this is just a
barebones form and table, but it works.
Adds a /api/routes path and handler to the api to receive TopRoutes requests from the web.
Add a barebones ListServices endpoint, in support of autocomplete for services.
As we develop service profiles, this endpoint could probably be used to describe
more aspects of services (like, if there were some way to check whether a
service profile was enabled or not).
Accessible from the web UI via http://localhost:8084/api/services
Add a routes command which displays per-route stats for services that have service profiles defined.
This change has three parts:
* A new public-api RPC called `TopRoutes` which serves per-route stat data about a service
* An implementation of TopRoutes in the public-api service. This implementation reads per-route data from Prometheus. This is very similar to how the StatSummaries RPC and much of the code was able to be refactored and shared.
* A new CLI command called `routes` which displays the per-route data in a tabular or json format. This is very similar to the `stat` command and much of the code was able to be refactored and shared.
Note that as of the currently targeted proxy version, only outbound route stats are supported so the `--from` flag must be included in order to see data. This restriction will be lifted in an upcoming change once we add support for inbound route stats as well.
Signed-off-by: Alex Leong <alex@buoyant.io>
When a resource has no tap events being streamed to the Tap UI, and a user hits the "Stop" button in the Tap page, the tap stream is left open due to the WebSocket connection not being closed.
It looks like the web server's tap client that is created to stream events from the tap server blocks the main request thread in the web server. This causes the web server to stop receiving any subsequent close frames from the UI i.e. when the "Stop" button is clicked.
This PR moves the tapClient initialization code to a separate goroutine, specifically, the goroutine that reads tap events from the incoming grpc tap stream. This allows the main thread to continue reading messages from the WebSocket connection and allow it to receive close frames.
fixes#1665
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Previously, WebSocket error messages would appear with the first
couple characters cut off. I've fixed this by using ws.WriteControl
instead of ws.WriteMessage to write errors, as gorilla does in
their example app.
- Use writeControl to write error messages to the client
- Stop the spinner if there is an error present
Increase the MaxRps on the tap server to 100 RPS.
The max RPS for tap/top was increased in for the CLI #1531, but we were
still manually setting this to 1 RPS in the Web UI and Web server.
Remove the pervasive setting of MaxRps to 1 in the web frontend and server
* Make use of the Web UI to render tap events in a table
- Return JSON tap events instead of the command line output
- Experiment with a different way of rendering the EventList
- changed the default width back to 100% of the screen because this
table does not look great squished
Adds a tap endpoint in the web api that communicates with the dashboard
via websockets.
I've moved a bunch of code from the cli tap.go into utils so that the code
can be shared between web and CLI. I think we should consider making the
display more suited to web, but in the short term, reusing the CLI's
rendering of tap events works.
Adds a Tap page in the Web UI that you can use to make tap requests.
The form currently only allows you to enter a resource and namespace,
other filters coming in a follow-up branch.
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
- Return pod uptimes from the GetPods endpoint
- Adds filtering by namespace to api.GetPods
- Adds a --namespace filter to conduit get pods
- Adds pod uptimes to the controller component toolitps on the ServiceMesh page
- Moves the ServiceMesh page back to using /api/pods
Don't allow the CLI or Web UI to request named resources if --all-namespaces is used.
This follows kubectl, which also does not allow requesting named resources
over all namespaces.
This PR also updates the Web API's behaviour to be in line with the CLI's.
Both will now default to the default namespace if no namespace is specified.
* Add namespace as a resource type in public-api
The cli and public-api only supported deployments as a resource type.
This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes
Part of #627
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Refactor filter and groupby label formulation
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Rename stat_summary.go to stat.go in cli
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Update rbac privileges for namespace stats
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>