35 Commits

Author SHA1 Message Date
Scott Fleener 3847f9cf13
Set minimum TLS version to 1.3 (#13500)
This helps ensure a minimum level of security. The two places this affects are our controller webhook and the linkerd-viz tap API.

The controller requires that kube-api supports TLSv1.3, which it does as of 1.19 (our minimum is currently 1.22). The linkerd-viz tap API is mostly used internally and is deprecated; it may be worth revisiting whether we want to keep it around at all.

Signed-off-by: Scott Fleener <scott@buoyant.io>
2024-12-19 09:19:09 -05:00
Oliver Gould 9bd16f3b3b
chore: update Go code for new lints (#13437)
Before updating our dev image with a new version of golangci-lint, this change
updates our Go code to satisfy new lints.

No functional changes.
2024-12-06 07:14:17 -08:00
dependabot[bot] d42432914d
build(deps): bump google.golang.org/grpc from 1.63.2 to 1.64.0 (#12593)
* build(deps): bump google.golang.org/grpc from 1.63.2 to 1.64.0

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.63.2 to 1.64.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.63.2...v1.64.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

I've replaced all the `grpc.Dial` calls with `grpc.NewClient`. There was one `grpc.DialContext(ctx,...)` call in `viz/tap/api/grpc_server.go` that was also replaced with `grpc.NewClient`. This loses the ability to pass `ctx`, but that doesn't matter: since we're not using `WithBlock(true)`, that context wasn't being taken into account when we were using `DialContext()` anyway.

https://github.com/grpc/grpc-go/blob/v1.64.0/clientconn.go#L216-L242

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2024-05-22 14:40:04 -05:00
Matei David 407df01ec3
chore(controller): Remove stream concurrency limits (#12598)
Our gRPC servers use the default gRPC server configuration, which
limits the number of concurrent streams to 100. Since the controllers
run with proxies, this provides a hard scaling limit for the number of
watches an application can have.

This change updates our gRPC server configuration to clear the default
concurrency limit, allowing the server to handle as many streams as
possible.

Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
2024-05-15 18:07:15 +01:00
Alejandro Pedraza b697e285a0
Refactor IPv4-only functions to also work for IPv6 (#12303)
The main change here is the refactoring of the address functions in `addr.go` that support the Destination controller and Viz's Tap controller. Some of those functions only worked for IPv4, so this change refactored them to make them IP family agnostic.
This enabled adding (and fixing) IPv6 unit tests as detailed in the following sections.
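To make "IP family agnostic" concrete, the sketch below converts an address of either family to and from a hypothetical wire type that stores IPv4 as a `uint32` and IPv6 as two `uint64` halves. The `ipAddress` type and both helpers are illustrative assumptions, not the actual functions in `addr.go`; the point is that the code branches once on `netip.Addr.Is4()` instead of assuming a 4-byte address.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// ipAddress is a hypothetical wire representation: exactly one of the two
// forms is meaningful, selected by isV6.
type ipAddress struct {
	isV6      bool
	ipv4      uint32
	ipv6First uint64 // high 64 bits
	ipv6Last  uint64 // low 64 bits
}

// toIPAddress parses an IPv4 or IPv6 literal into an ipAddress.
func toIPAddress(s string) (ipAddress, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return ipAddress{}, err
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 (::ffff:a.b.c.d) as IPv4
	if addr.Is4() {
		b := addr.As4()
		return ipAddress{ipv4: binary.BigEndian.Uint32(b[:])}, nil
	}
	b := addr.As16()
	return ipAddress{
		isV6:      true,
		ipv6First: binary.BigEndian.Uint64(b[:8]),
		ipv6Last:  binary.BigEndian.Uint64(b[8:]),
	}, nil
}

// String renders the address back to text for either family.
func (a ipAddress) String() string {
	if !a.isV6 {
		var b [4]byte
		binary.BigEndian.PutUint32(b[:], a.ipv4)
		return netip.AddrFrom4(b).String()
	}
	var b [16]byte
	binary.BigEndian.PutUint64(b[:8], a.ipv6First)
	binary.BigEndian.PutUint64(b[8:], a.ipv6Last)
	return netip.AddrFrom16(b).String()
}

func main() {
	for _, s := range []string{"10.42.0.7", "fd00:10:244::7", "::ffff:10.42.0.7"} {
		a, err := toIPAddress(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s (IPv6: %t)\n", s, a, a.isV6)
	}
}
```

Note that the zero value of this `ipAddress` renders as `0.0.0.0`, so a test comparing actual and expected addresses should also assert that an address was actually set.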

Other changes:

- The `ProxyAddressesToString()` function was no longer used, so it got removed.
- The `ProxyIPToString()` function was only used by the destination-client script, so that got stripped out.

## `addr_test.go`

We added IPv6 cases to each test; these would have failed previously.

## `endpoint_translator_test.go`

One of the test pods (pod3) was changed to have an IPv6 address. Without the other changes in this PR those tests would still have passed, but only because, when comparing actual IPs with expected ones, we weren't checking whether both were zero. So here we added checks against that.

## `server_test.go`

As above, we added checks against empty IPs. And in the mocked resources in `test_util.go` we added an IPv6 EndpointSlice.
2024-03-22 07:20:52 -05:00
Andrew Seigner ff25a71d90
Remove shortnames from Tap API resources (#11816)
The Tap API resource shortnames were colliding with existing Kubernetes
resources (e.g. `po`, `deploy`, etc.), causing warnings from kubectl
v1.29.0+.

Remove the shortnames from the Tap APIService handlers.

To validate:
```bash
bin/k3d cluster create

# install latest edge
curl https://run.linkerd.io/install-edge | sh
linkerd install --crds | kubectl apply -f -
linkerd install        | kubectl apply -f -
linkerd check
linkerd viz install    | kubectl apply -f -
linkerd check

# observe shortnames
kubectl api-resources --api-group=tap.linkerd.io

# with kubectl v1.29.0+, observe "Warning: short name..."
kubectl get po

# replace tap image
TAP_IMAGE=$(bin/docker-build-tap)
bin/k3d image load $TAP_IMAGE
kubectl -n linkerd-viz set image deploy/tap tap=$TAP_IMAGE

# verify shortnames are no longer present
kubectl api-resources --api-group=tap.linkerd.io

# with kubectl v1.29.0+, observe no warning
kubectl get po
```

Fixes linkerd/linkerd2#11784

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2024-01-04 06:37:36 -05:00
Alex Leong 368b63866d
Add support for remote discovery (#11224)
Adds support for remote discovery to the destination controller.

When the destination controller gets a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` label, this is an indication that the destination controller should discover the endpoints for this service from a remote cluster.  The destination controller will look for a remote cluster which has been linked to it (using the `linkerd multicluster link` command) with that name.  It will look at the `multicluster.linkerd.io/remote-discovery` label for the service name to look up in that cluster.  It then streams back the endpoint data for that remote service.

Since we now have multiple client-go informers for the same resource types (one for the local cluster and one for each linked remote cluster) we add a `cluster` label onto the prometheus metrics for the informers and EndpointWatchers to ensure that each of these components' metrics are correctly tracked and don't overwrite each other.

---------

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-08-11 09:31:45 -07:00
Ryan Hristovski 5d8e5e0959
Add support for customizable ignore headers list in Linkerd Tap (#10443)
Currently, Linkerd Tap displays all headers by default, which can contain irrelevant or sensitive information that users may want to exclude from being displayed. Therefore, there is a need for a feature that allows users to set their own ignore headers list, so that they can choose which headers to exclude from the output of Linkerd Tap on the dashboard.

This commit adds a new helm value, `tap.ignoredHeaders`, to the Linkerd-viz chart. This value allows users to specify a comma-separated list of headers to be ignored by Linkerd Tap. The default list includes commonly irrelevant and sensitive headers such as authorization tokens and cookies.

Once merged, every comma-separated value in `tap.ignoredHeaders` will be removed from the headers shown in the Tap Dashboard when expanding a request's headers.

Fixes #10389

Signed-off-by: Ryan Hristovski <ryan.hristovski@docker.com>
2023-03-30 16:16:16 -07:00
cui fliter 8c6de42210
all: fix some comments (#10387)
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-03-01 11:47:02 +00:00
Oliver Gould 363e123d79
Update to dev:v39 with Go 1.19 (#10336)
2023-02-16 08:25:42 -08:00
Alex Leong b0778bb2ea
Readiness checks fail until caches are synced (#10166)
Fixes https://github.com/linkerd/linkerd2/issues/10036

The Linkerd control plane components written in Go serve liveness and readiness probe endpoints on their admin server. However, the admin server is not started until the k8s informer caches are synced, which can take a long time on large clusters. This means that liveness checks can time out, causing the controller to be restarted.

We now start the admin server before attempting to sync caches so that we can respond to liveness checks immediately. Readiness probes fail until the caches are synced.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-01-25 11:43:09 -08:00
Alejandro Pedraza 7428d4aa51
Removed dupe imports (#10049)
* Removed dupe imports

My IDE (vim-gopls) has been complaining for a while, so I decided to take
care of it. Found via
[staticcheck](https://github.com/dominikh/go-tools)

* Add stylecheck to go-lint checks
2023-01-10 14:34:56 -05:00
Oliver Gould 54d2bcb0ec
controller: Increase HTTP ReadHeaderTimeout to 15s (#9272)
04a66ba added a `ReadHeaderTimeout` to our HTTP servers (at gosec's
insistence). We chose a fairly arbitrary timeout of 10s. This
configuration causes any connection that has been idle for 10s to be
torn down by the server. Unfortunately, this timeout value matches the
default Kubernetes probe interval and the default linkerd-viz scrape
interval. This can cause probes to race the timeout so that the
connection is healthy from the proxy's point of view and a request is
sent on the connection exactly as the server drops the connection.
These request failures cause controller success rate to appear degraded.

To correct this, this change raises the timeout to 15s so that the
timeout no longer matches the default probe interval.

The proxy's HTTP client is supposed to [retry] requests that encounter
this type of error. We should follow up by doing more research into why
that is not occurring in this situation.

[retry]: https://docs.rs/hyper/0.14.20/hyper/client/struct.Builder.html#method.retry_canceled_requests

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-08-26 13:33:38 -07:00
Oliver Gould 04a66bacea
Set a header read timeout on HTTP servers (#9181)
Newer versions of golangci-lint flag `http.Server` instances that do not
set a `ReadHeaderTimeout` as being vulnerable to "slowloris" attacks,
wherein clients initiate requests that hold connections open
indefinitely.

This change sets a `ReadHeaderTimeout` of 10s. This timeout is fairly
conservative so that clients can eagerly create connections, but is
still constrained enough that these connections won't remain open
indefinitely.

This change also updates kubert to v0.9.1, which instruments a header
read timeout on the policy admission server.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-08-16 11:10:23 -07:00
Steve Zhang f8413f805c
Generate ipv4 and ipv6 compat URL address for Linkerd2 (#8598)
Introduce change to generate dual-stack compatible URL addresses. This change refactors
some of the code to use Go's standard net library to generate IPv4 and IPv6 compatible
addresses.

The main components affected by this change:

* multicluster probe worker
* port-forwarding package
* viz tap api
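The core of this kind of change is letting the standard library add the brackets an IPv6 literal needs in a URL. A minimal sketch, assuming a plain `http://` URL built from an IP and a port; `httpURL` is a hypothetical helper, not a function from the affected packages.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// httpURL builds an http:// URL for an IP literal of either family.
// net.JoinHostPort wraps IPv6 literals in brackets.
func httpURL(ip string, port int, path string) string {
	u := url.URL{
		Scheme: "http",
		Host:   net.JoinHostPort(ip, strconv.Itoa(port)),
		Path:   path,
	}
	return u.String()
}

func main() {
	fmt.Println(httpURL("10.0.0.5", 8080, "/ready")) // http://10.0.0.5:8080/ready
	fmt.Println(httpURL("fd00::5", 8080, "/ready"))  // http://[fd00::5]:8080/ready

	// Naive formatting yields an ambiguous host for IPv6.
	fmt.Printf("http://%s:%d/ready\n", "fd00::5", 8080) // http://fd00::5:8080/ready
}
```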

Signed-off-by: zhlsunshine <huailong.zhang@intel.com>
2022-06-06 12:19:47 +01:00
Oliver Gould fa8ddb4801
Use go-test/deep for comparisons in tests (#8427)
We frequently compare data structures--sometimes very large data
structures--that are difficult to compare visually. This change replaces
uses of `reflect.DeepEqual` with `deep.Equal`. `go-test`'s `deep.Equal`
returns a diff of values that are not equal.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-05-05 09:31:07 -07:00
Kevin Leimkuhler 388f14f48f
allow pprof to be configurable via helm flags (#8090)
Follow-up to #8087 that allows pprof to be enabled via the `--set
enablePprof=true` flag.

Each control plane component spawns its own admin server, so each of these
receives its own `enable-pprof` flag. When `enablePprof=true`, it is passed
through to each component so that its pprof endpoints are enabled when it
launches its admin server.

A note on the templating: `-enable-pprof={{.Values.enablePprof | default false}}`.
`false` values are not rendered by Helm, so without the `... | default false}}`,
it tries to pass the flag as `-enable-pprof=""`, which results in an error.
Inlining this felt better than conditionally passing the flag with:

```yaml
{{ if .Values.enablePprof -}}
-enable-pprof={{.Values.enablePprof}}
{{ end -}}
```
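Assuming a component parses its flags with Go's standard `flag` package (an assumption; the components' actual flag handling is not shown here), the sketch below shows why an empty value is a hard error for a boolean flag, and how an admin server might register pprof handlers only when the flag is set. `parseEnablePprof`, `newAdminMux`, and `hasRoute` are hypothetical. Importing `net/http/pprof` also registers handlers on `http.DefaultServeMux` as a side effect, so the sketch uses its own mux to keep the endpoints opt-in.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"net/http"
	"net/http/pprof"
)

// parseEnablePprof parses a boolean -enable-pprof flag from args.
func parseEnablePprof(args []string) (bool, error) {
	fs := flag.NewFlagSet("component", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output quiet in this sketch
	enable := fs.Bool("enable-pprof", false, "enable pprof endpoints on the admin server")
	if err := fs.Parse(args); err != nil {
		return false, err // e.g. `invalid boolean value "" for -enable-pprof`
	}
	return *enable, nil
}

// newAdminMux registers the pprof handlers on a dedicated mux only when enabled.
func newAdminMux(enablePprof bool) *http.ServeMux {
	mux := http.NewServeMux()
	if enablePprof {
		mux.HandleFunc("/debug/pprof/", pprof.Index)
		mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
		mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
		mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
		mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	}
	return mux
}

// hasRoute reports whether mux has a handler registered for path.
func hasRoute(mux *http.ServeMux, path string) bool {
	req, err := http.NewRequest(http.MethodGet, path, nil)
	if err != nil {
		return false
	}
	_, pattern := mux.Handler(req)
	return pattern != ""
}

func main() {
	for _, args := range [][]string{{"-enable-pprof=true"}, {"-enable-pprof=false"}, {"-enable-pprof="}} {
		enabled, err := parseEnablePprof(args)
		if err != nil {
			fmt.Printf("%v: error: %v\n", args, err)
			continue
		}
		fmt.Printf("%v: pprof routes registered: %t\n", args, hasRoute(newAdminMux(enabled), "/debug/pprof/"))
	}
}
```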

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-03-22 14:31:04 -06:00
Oliver Gould 619cd2641e
tap: Quote logged errors for CWE-117 (#8044)
When the Tap controller logs errors, it can potentially log error
messages that include data that originates from the network. This makes
these logs susceptible to log forgery ([CWE-117]).

This change updates log formatting to use `%q` so that newlines, etc., are
escaped.

[CWE-117]: https://cwe.mitre.org/data/definitions/117.html

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-03-14 07:25:21 -07:00
Kevin Leimkuhler fc2032fb8e
enable `staticcheck` (#8037)
Closes #7881 

This makes the rest of the necessary fixes to satisfy the `staticcheck` lint.

The only lints being skipped are those related to deprecated tap code. There was some discussion on the original change started by @adleong about whether this was _actually_ deprecated [here](https://github.com/linkerd/linkerd2/pull/3240#discussion_r313634584); it doesn't look like we ever came back around to fully removing it, but I don't think that should block enabling the lint right now.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-03-10 15:43:35 -08:00
Kevin Leimkuhler 67bcd8f642
Add `gosec` and `errcheck` lints (#7954)
Closes #7826

This adds the `gosec` and `errcheck` lints to the `golangci` configuration. Most significant lints have been fixed by individual changes, but this enables them by default so that all future changes are caught ahead of time.

A significant number of these lints have been excluded by the various `exclude-rules` added to `.golangci.yml`. These include operations on files that generally do not fail, such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up functions via the `defer` keyword.

Aside from those, there are several other rules added that all have comments explaining why it's okay to ignore the errors that they cover.

Finally, several smaller fixes in the code have been made where it seems necessary to catch errors or at least log them.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-03-03 10:09:51 -07:00
Oliver Gould f5876c2a98
go: Enable `errorlint` checking (#7885)
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection is wrapping-aware.

This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).

[el]: https://github.com/polyfloyd/go-errorlint

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-16 18:32:19 -07:00
Oliver Gould e03f6182f4
Require use of at least TLS v1.2 (#7837)
In several places where we build TLS servers (usually in our webhooks),
we use the default TLS configuration, which enables legacy versions of
TLS.

This change updates these servers to specify a minimum TLS version of
v1.2.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-07 19:13:02 -08:00
Oliver Gould a24aa51639
tap: Fix potential log forgery in debug logging (#7819)
Our `SubjectAccessReview` debug logging can include newlines in
user-provided data. This change ensures these values are quoted/escaped.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-07 15:06:52 -08:00
Oliver Gould f93ed670a9
tap: Fix typo in filename (#7820)
The file named `sever.go` should be named `server.go`.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-07 13:45:18 -08:00
Oliver Gould ab70db014c
Fix log forgery issues in production-facing code (#7664)
CodeQL has caught several instances where we may be susceptible to [log
forgery][cql].

This change ensures that we strip newlines from log messages that
include potentially user-supplied strings. Several redundant error logs
are removed--we should generally not log an error when returning an
error. Errors should be logged where they are handled.

This change also properly escapes URL paths when constructing them from
protobuf messages.
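Both fixes can be sketched with the standard library. `stripNewlines` and `podPath` are hypothetical helpers, and the `/namespaces/{ns}/pods/{name}` layout is an assumption made for the example. The point is that newlines are removed before a user-supplied value reaches a log line, and that each path segment is escaped with `url.PathEscape` so a value containing `/` cannot change the structure of the path.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
	"strings"
)

// newlineStripper removes CR and LF characters.
var newlineStripper = strings.NewReplacer("\n", "", "\r", "")

// stripNewlines makes a user-supplied value safe to embed in a single log line.
func stripNewlines(s string) string { return newlineStripper.Replace(s) }

// podPath builds a URL path from message fields, escaping each segment.
// The path layout is hypothetical.
func podPath(namespace, name string) string {
	return "/namespaces/" + url.PathEscape(namespace) + "/pods/" + url.PathEscape(name)
}

func main() {
	name := "web\nERROR forged entry"
	log.Printf("looking up pod %s", stripNewlines(name))
	fmt.Println(podPath("emojivoto", "../secrets")) // the "/" is escaped as %2F
}
```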

Note that CodeQL continued to mark some of these uses as issues, but
we've marked them as false-positive. See github/codeql-go#635 and
github/codeql-go#650.

[cql]: https://codeql.github.com/codeql-query-help/go/go-log-injection/

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-01-24 10:18:39 -08:00
Stepan Rabotkin 5e6a1b5508
Graceful shutdown for admin server (#6817)
* Graceful shutdown for admin server

Signed-off-by: Stepan Rabotkin <epicstyt@gmail.com>
2021-09-07 10:50:31 -05:00
Alex Leong 24792cfd1c
Remove core dependency on viz (#6497)
Fixes #5589 

The core control plane has a dependency on the viz package in order to use the `BuildResource` function.  This "backwards" dependency means that the viz source code needs to be included in core docker-builds and is bad for code hygiene.

We move the `BuildResource` function into the viz package.  In `cli/cmd/metrics.go` we replace a call to `BuildResource` with a call directly to `CanonicalResourceNameFromFriendlyName`.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-07-19 14:28:45 -07:00
dependabot[bot] 789aeea561
Fix gRPC servers (#6510)
Bump github.com/linkerd/linkerd2-proxy-api from 0.1.18 to 0.2.0

Bumps [github.com/linkerd/linkerd2-proxy-api](https://github.com/linkerd/linkerd2-proxy-api) from 0.1.18 to 0.2.0.
- [Release notes](https://github.com/linkerd/linkerd2-proxy-api/releases)
- [Changelog](https://github.com/linkerd/linkerd2-proxy-api/blob/main/CHANGES.md)
- [Commits](https://github.com/linkerd/linkerd2-proxy-api/compare/v0.1.18...v0.2.0)

---
updated-dependencies:
- dependency-name: github.com/linkerd/linkerd2-proxy-api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Oliver Gould <olix0r@gmail.com>

Co-authored-by: Oliver Gould <ver@buoyant.io>
Co-authored-by: Oliver Gould <olix0r@gmail.com>
2021-07-19 10:24:23 -05:00
dependabot[bot] a3c21d7aad
Bump github.com/prometheus/common from 0.10.0 to 0.29.0 (#6327)
* Bump github.com/prometheus/common from 0.10.0 to 0.29.0

Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.10.0 to 0.29.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.10.0...v0.29.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-06-22 09:56:12 -06:00
Alex Leong 948f9a4ece
Update protoc (#6333)
Update protoc from 3.6.0 to 3.15.7

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-06-21 16:37:57 -07:00
Oliver Gould ab32c4e32b
tap: Avoid debug logging of headers values (#6144)
CodeQL flags our debug-logging of user and group header values. While
this information isn't strictly sensitive, it isn't really necessary,
either--we haven't used any of this logged information for practical
diagnostics in years, if ever. And because these header names are read
as configuration and not hardcoded, there's no way for us to be really
_sure_ that these headers are safe to log. So, I propose that we just
stop logging the values and log the header names instead :).

While we're here, we should use `Header.Values()` instead of direct
slice access. `Values()` ensures that headers are encoded in the
canonical MIME format ("Train-Case") to ensure case-insensitive
comparison.

[1]: https://codeql.github.com/codeql-query-help/go/go-clear-text-logging/
2021-05-24 09:47:36 -07:00
Shubhendra Singh Chauhan ad3b9accd8
fix: issues affecting code quality (#5827)
Fix various lint issues:

- Remove unnecessary calls to fmt.Sprint
- Fix check for empty string
- Fix unnecessary calls to Printf
- Combine multiple `append`s into a single call

Signed-off-by: shubhendra <withshubh@gmail.com>
2021-03-15 17:35:40 -04:00
Alex Leong 57d851b434
Report better errors for pods with tap disabled (#5799)
Fixes https://github.com/linkerd/linkerd2/discussions/5777

When a user runs `linkerd viz check --proxy`, it will print a warning if there are any proxies which cannot be tapped. This is a normal state of affairs after freshly installing the linkerd-viz extension because any existing pods will need to be restarted before they can be tapped. The check warning may lead users to falsely believe that something has gone wrong with their installation.

We remove this specific check from `linkerd viz check --proxy`.  To replace it, we improve the error output when attempting to tap a resource which is not tappable.  This gives the user actionable feedback when the tap command fails.

Old:

```console
> linkerd viz tap -n emojivoto deploy/vote-bot
no pods to tap for deployment/vote-bot
```

New:

```console
> linkerd viz tap -n emojivoto deploy/vote-bot
no pods to tap for deployment/vote-bot
1 pods found with tap not enabled:
	* vote-bot-64dd87cb87-7mcv4
restart these pods to enable tap and make them valid tap targets
```

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-02-24 12:32:46 -08:00
Mayank Shah 96e078421c
CLI: Remove the `--disable-tap` flag from inject (#5671)
Fixes https://github.com/linkerd/linkerd2/issues/5664

- Remove `--disable-tap` from `inject`
- Move `config.linkerd.io/disable-tap` to `viz.linkerd.io/disable-tap`

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2021-02-11 10:01:53 -05:00
Kevin Leimkuhler 75fcc9d623
Move tap from core into Viz extension (#5651)
Closes #5545.

This change moves all tap and tap-injector code into the viz directory. 

The tap and tap-injector components now also use a new tap image—separating
these components from the controller image that they are currently part of. This
means the controller image has removed all its build dependencies related to
tap.

Finally, the tap Protobuf has been separated from the metrics-api and moved into
its own `.proto` file and gen directory. This introduces a clear split between
metrics-api and tap Protobuf.

There is no change in behavior for the `viz tap` command.

### Reviewing

#### Docker images

All the bin directory scripts should be updated to build and load the tap image.
All the CI workflows should be updated to build and push the tap image.

#### Controller and pkg directories

This is primarily deletions. Most of the deleted code in this directory is now
in the tap directory of the Viz extension.

#### viz/tap

This is the location that all the tap related code now lives in. New files are
mostly moved from the controller and pkg directories. Imports have all been
updated to point at the right locations and Protobuf.

The Protobuf here is taken from metrics-api and contains all tap-related
Protobuf.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-09 12:43:21 -05:00