Commit Graph

34 Commits

Author SHA1 Message Date
Alejandro Pedraza f3b1ebfa99
Separate observability API (#5510)
* Separate observability API

Closes #5312

This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.

- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.prot`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto` except for the `Version` Stuff.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.

The other changes in the go files are just changes in the imports to point to the new protobufs.

Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`
2021-01-13 14:34:54 -05:00
Tarun Pothulapati 3a16baa141
Use errors.Is instead of checking underlying err messages (#5140)
* Use errors.Is instead of checking underlying err messages

Fixes #5132

This PR replaces the usage of `strings.hasSuffix` with `errors.Is`
wherever error messages are being checked. So, that the code is not
effected by changes in the underlying message. Also adds a string
const for http2 response body closed error

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-10-28 21:33:17 +05:30
Tarun Pothulapati 39e7f84773
cli: fix and update timeout warnings in profile cmd (#5122)
Fixes #5121

* cli: skip emitting warnings in Profile


Whenever the tapDuration gets completed, there is a warning occured
which we do not emit. This looks like it has been changed in the latest
versions of the dependency.

* Use context.withDeadline instead of client.timeout

The usage of `client.Timeout` is not working correctly causing `W1022
17:20:12.372780   19049 transport.go:260] Unable to cancel request for
   promhttp.RoundTripperFunc` to be emitted by the Kubernetes Client.

This is fixed by using context.WithDeadline and passing that into the
http Request.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-10-27 22:08:21 +05:30
Alejandro Pedraza aea541d6f9
Upgrade generated protobuf files to v1.4.2 (#4673)
Regenerated protobuf files, using version 1.4.2 that was upgraded from
1.3.2 with the proxy-api update in #4614.

As of v1.4 protobuf messages are disallowed to be copied (because they
hold a mutex), so whenever a message is passed to or returned from a
function we need to use a pointer.

This affects _mostly_ test files.

This is required to unblock #4620 which is adding a field to the config
protobuf.
2020-06-26 09:36:48 -05:00
amariampolskiy a46fa05fd7
Generate correct path regex for proto files without package name (#4098)
Signed-off-by: amariampolskiy <amariampolskiy@pushwoosh.com>
2020-03-23 14:21:42 -05:00
Lewis Cowper 5ca9bc6db5
Support configuration of service profile timeouts (#4072)
This change is in a similar vein to #4052 which provided support for
configuring service profile retries via a vendor extension of
`x-linkerd-retryable`, when generating from an openapi specification.

This change is very similar to the final version of that pull request,
and adds a timeout value based on `x-linkerd-timeout`.

At this point I believe that if the timeout is not specified then the
default provided by linkerd of 30s will apply anyway, but won't
explicitly be reflected in the service profile, which I'm somewhat okay
with as a current state, but I think there's a potential future
improvement that the default timeout is always shown when generating
from an open api spec, but that's more to make it clear and obvious that
that timeout exists.

Signed-off-by: Lewis Cowper <lewis.cowper@googlemail.com>
2020-03-10 13:22:26 -07:00
Kohsheen Tiku dea8b8c547
Support for configuring service profile retries(x-linkerd-retryable) via openapi spec (#4052)
* Retry policy is manually written in yaml file and patched it into the service profile

Added support for configuring service profile retries(x-linkerd-retryable) via openapi spec

Signed-off-by: Kohsheen Tiku <kohsheen.t@gmail.com>
2020-02-18 13:07:07 -08:00
Alejandro Pedraza 3324966702
Upgrade go to 1.13.4 (#3702)
Fixes #3566

As explained in #3566, as of go 1.13 there's a strict check that ensures a dependency's timestamp matches it's sha (as declared in go.mod). Our smi-sdk dependency has a problem with that that got resolved later on, but more work would be required to upgrade that dependency. In the meantime a quick pair of replace statements at the bottom of go.mod fix the issue.
2019-11-13 12:54:36 -05:00
arminbuerkle e3d68da1dc Allow setting custom cluster domain in service profiles (#3148)
Continue of #2950.

I decided to check for the `clusterDomain` in the config map in web server main for the same reasons as as pointed out here https://github.com/linkerd/linkerd2/pull/3113#discussion_r306935817

It decouples the server implementations from the config.

Signed-off-by: Armin Buerkle <armin.buerkle@alfatraining.de>
2019-08-07 09:49:54 -07:00
Andrew Seigner 0565955428
Update `linkerd profile --tap` to Tap APIService (#3187)
PR #3167 introduced a Tap APIService, and migrated linkerd tap to it.

This change migrates `linkerd profile --tap` to the new Tap APIService.

Depends on #3186
Fixes #3169

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-08-02 12:44:58 -07:00
Alex Leong d6ef9ea460
Update ServiceProfile CRD to version v1alpha2 and remove validation (#3078)
The openAPIV3Schema validation in the ServiceProfiles CRD is very limited in what it can validate and is obviated by more sophisticated validation done by the validating admission controller.  Therefore, we would like to remove the openAPIV3Schema validation to reduce the size and complexity of the CRD object.

To do so, we must also bump the version of the ServiceProfile custom resource from v1alpha1 to v1alpha2.  This ensures that when the controller is upgraded, it will attempt to watch the v1alpha2 resource.  If it cannot (because, for example, the controller pod started before the ServiceProfile CRD was updated and therefore the v1alpha2 version does not exist) then it will go into a crash loop backoff until it can.  This essentially means that the controller will wait for the CRD to be upgraded to include v1alpha2 before it will start.  

Bumping the version is necessary because if we did not, it would be possible for the controller to start before the CRD is updated (removing the validation).  In this case, when the CRD is edited, the controller will lose its list watch on ServiceProfiles and will stop getting updates.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-07-23 11:46:31 -07:00
Alex Leong 06a69f69c5
Refactor destination service (#2786)
This is a major refactor of the destination service.  The goals of this refactor are to simplify the code for improved maintainability.  In particular:

* Remove the "resolver" interfaces.  These were a holdover from when our decision tree was more complex about how to handle different kinds of authorities.  The current implementation only accepts fully qualified kubernetes service names and thus this was an unnecessary level of indirection.
* Moved the endpoints and profile watchers into their own package for a more clear separation of concerns.  These watchers deal only in Kubernetes primitives and are agnostic to how they are used.  This allows a cleaner layering when we use them from our gRPC service.
* Renamed the "listener" types to "translator" to make it more clear that the function of these structs is to translate kubernetes updates from the watcher to gRPC messages.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-06-04 15:01:16 -07:00
Andrew Seigner ec5a0ca8d9
Authorization-aware control-plane components (#2349)
The control-plane components relied on a `--single-namespace` param,
passed from `linkerd install` into each individual component, to
determine which namespaces they were authorized to access, and whether
to support ServiceProfiles. This command-line flag was redundant given
the authorization rules encoded in the parent `linkerd install` output,
via [Cluster]Role[Binding]s.

Modify the control-plane components to query Kubernetes at startup to
determine which namespaces they are authorized to access, and whether
ServiceProfile support is available. This allows removal of the
`--single-namespace` flag on the components.

Also update `bin/test-cleanup` to cleanup the ServiceProfile CRD.

TODO:
- Remove `--single-namespace` flag on `linkerd install`, part of #2164

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-02-26 11:54:52 -08:00
Andrew Seigner ad0d0b72a0
lint: Enable unconvert (#2368)
unconvert removes unnecessary type conversions:
https://github.com/mdempsky/unconvert

Part of #217

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-02-25 16:07:42 -08:00
Andrew Seigner 25e462352d
lint: Enable goimports (#2366)
goimports checks import lines, adding missing ones and removing
unreferenced ones:
https://godoc.org/golang.org/x/tools/cmd/goimports

It also requires named imports for packages whose
import paths don't match their package names:
- https://github.com/golang/go/issues/28428
- https://go-review.googlesource.com/c/tools/+/145699/

Also standardized named imports of common Kubernetes packaages.

Part of #217

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-02-25 15:51:10 -08:00
Oliver Gould f7435800da
lint: Enable scopelint (#2364)
[scopelint][scopelint] detects a nasty reference-scoping issue in loops.

[scopelint]: https://github.com/kyoh86/scopelint
2019-02-24 08:59:51 -08:00
Kevin Leimkuhler b2bbeb05ef
Issue 2276: Do not log error when timeout is blank (#2279)
# Problem

When a route does not specify a timeout, the proxy-api defaults to the default
timeout and logs an error:

```
time="2019-02-13T16:29:12Z" level=error msg="failed to parse duration for route POST /io.linkerd.proxy.destination.Destination/GetProfile: time: invalid duration"
```

# Solution

We now check if a route timeout is blank. If it is not set, it is set to
`DefaultRouteTimeout`. If it is set, we try to parse it into a `Duration`.

A request was made to improve logging to include the service profile and
namespace as well.

# Validation

With valid service profiles installed, edit the `.yaml` to include an invalid
`timeout`:

```
...
name: GET /
timeout: foo
```

We should now see the following errors:

```
proxy-api time="2019-02-13T22:27:32Z" level=error msg="failed to parse duration for route 'GET /' in service profile 'webapp.default.svc.cluster.local' in namespace 'default': time: invalid duration foo"
```

This error does not show up when `timeout` is blank.

Fixes #2276

Signed-off-by: Kevin Leimkuhler <kevinl@buoyant.io>
2019-02-14 17:09:02 -08:00
Andrew Seigner 2305974202
Introduce golangci-lint tooling, fixes (#2239)
`golangci-lint` performs numerous checks on Go code, including golint,
ineffassign, govet, and gofmt.

This change modifies `bin/lint` to use `golangci-lint`, and replaces
usage of golint and govet.

Also perform a one-time gofmt cleanup:
- `gofmt -s -w controller/`
- `gofmt -s -w pkg/`

Part of #217

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-02-13 11:16:28 -08:00
Alex Leong 3f333c2860
Validate service profiles in all namespaces (#2237)
Fixes #2220 

The service profile validation which is part of `linkerd check` only validates service profiles in the Linkerd namespace.  Due to a recent change, service profiles now can exist in any namespace.

Update the logic so that service profiles in all namespaces are validated.  
Additionally:
* Relax validation of service profile names to support external names

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-02-11 09:52:47 -08:00
Alex Leong 5b054785e5
Read service profiles from client or server namespace instead of control namespace (#2200)
Fixes #2077 

When looking up service profiles, Linkerd always looks for the service profile objects in the Linkerd control namespace.  This is limiting because service owners who wish to create service profiles may not have write access to the Linkerd control namespace.

Instead, we have the control plane look for the service profile in both the client namespace (as read from the proxy's `proxy_id` field from the GetProfiles request and from the service's namespace.  If a service profile exists in both namespaces, the client namespace takes priority.  In this way, clients may override the behavior dictated by the service.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-02-07 14:51:43 -08:00
Andrew Seigner 907f01fba6
Improve ServiceProfile validation in linkerd check (#2218)
The `linkerd check` command was doing limited validation on
ServiceProfiles.

Make ServiceProfile validation more complete, specifically validate:
- types of all fields
- presence of required fields
- presence of unknown fields
- recursive fields

Also move all validation code into a new `Validate` function in the
profiles package.

Validation of field types and required fields is handled via
`yaml.UnmarshalStrict` in the `Validate` function. This motivated
migrating from github.com/ghodss/yaml to a fork, sigs.k8s.io/yaml.

Fixes #2190
2019-02-07 14:35:47 -08:00
Risha Mars e531655d26
Add a --tap flag to the linkerd profile command (#2139)
Adds the ability to generate a service profile by running a tap for a configurable 
amount of time, and using the route results from the routes seen during the tap.

e.g. `linkerd profile web --tap deploy/web -n emojivoto --tap-duration 2s`
2019-02-06 12:43:16 -08:00
Alejandro Pedraza 50fbcc60b5
Add support for `basePath` in swagger 2.0 files (#2211)
Fixes #2175

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-02-06 13:44:35 -05:00
Alex Leong 3bd4231cec
Add support for timeouts in service profiles (#2149)
Fixes #2042 

Adds a new field to service profile routes called `timeout`.  Any requests to that route which take longer than the given timeout will be aborted and a 504 response will be returned instead.  If the timeout field is not specified, a default timeout of 10 seconds is used.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-01-30 16:48:55 -08:00
Alex Leong 872e1bb026
Add --proto flag to linkerd profile command to read protobuf files (#2128)
Fixes #1425 

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-01-25 11:15:20 -08:00
Alex Leong d542571b65
GetProfiles should always respond with a (possibly empty) profile immediately (#2146)
When `GetProfiles` is called for a destination that does not have a service profile, the proxy-api service does not return any messages until a service profile is created for that service.  This can be interpreted as hanging, and can make it difficult to calculate response latency metrics.

Change the behavior of the API to always return a service profile message immediately.  If the service does not have a service profile, the default service profile is returned.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-01-24 15:22:14 -08:00
Alex Leong 771542dde2
Add support for retries (#2038) 2019-01-16 14:13:48 -08:00
Andrew Seigner 1c302182ef
Enable lint check for comments (#2023)
Commit 1: Enable lint check for comments

Part of #217. Follow up from #1982 and #2018.

A subsequent commit will fix the ci failure.

Commit 2: Address all comment-related linter errors.

This change addresses all comment-related linter errors by doing the
following:
- Add comments to exported symbols
- Make some exported symbols private
- Recommend via TODOs that some exported symbols should should move or
  be removed

This PR does not:
- Modify, move, or remove any code
- Modify existing comments

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-01-02 14:03:59 -08:00
Alex Leong 04ed200e36
Rename path_regex to pathRegex (#1951)
Rename snake case fields to camel case in service profile spec.  This improves the way they are rendered when the `kubectl describe` command is used.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-12-06 11:51:33 -08:00
Alex Leong 4f3e55e937
Rename path to path_regex in ServiceProfile CRD (#1923)
We rename path to path_regex in the ServiceProfile CRD to make it clear that this field accepts a regular expression. We also take this opportunity to remove unnecessary line anchors from regular expressions now that these anchors are added in the proxy.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-12-05 10:42:47 -08:00
Risha Mars e8a39cd17e
Add ability to download a service profile template from the web UI (#1893)
Adds an endpoint, at /profiles/new that allows you to input a service name and
namespace, and download a service profile yaml template. 

This will enable future work, where we can add more of the yaml customization via 
a form in the dashboard, and use that data to help the user configure routes.
2018-12-03 16:48:43 -08:00
Alex Leong 32d556e732
Improve ergonomics of service profile spec (#1828)
We make several changes to the service profile spec to make service profiles more ergonomic and to make them more consistent with the destination profile API.

* Allow multiple fields to be simultaneously set on a RequestMatch or ResponseMatch condition.  Doing so is equivalent to combining the fields with an "all" condition.
* Rename "responses" to "response_classes"
* Change "IsSuccess" to "is_failure"

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-31 12:00:22 -07:00
Alex Leong 622185a4dd
Send metric labels in profile API (#1800)
* Send metric labels in profile API

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-29 14:28:09 -07:00
Alex Leong f549868033
Fix integration test and docker build (#1790)
Fix broken docker build by moving Service Profile conversion and validation into `/pkg`.

Fix broken integration test by adding service profile validation output to `check`'s expected output.

Testing done:
* `gotest -v ./...`
* `bin/docker-build`
* `bin/test-run (pwd)/bin/linkerd`

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-19 10:23:34 -07:00