Commit Graph

272 Commits

Author SHA1 Message Date
Alejandro Pedraza 8ac5360041
Extract from public-api all the Prometheus dependencies, and moves things into a new viz component 'linkerd-metrics-api' (#5560)
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it remains being used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.

* Added chart templates for new viz linkerd-metrics-api pod

* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.

* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).

* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompaigning http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.

* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.

* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.

* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
  - Updated `endpoints.go` according to new API interface name.
  - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
  - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
  - Added `metrics-api` to list of docker images to build in actions workflows.
  - In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).

* Add retry to 'tap API service is running' check

* mc check shouldn't err when viz is not available. Also properly set the log in multicluster/cmd/root.go so that it properly displays messages when --verbose is used
2021-01-21 18:26:38 -05:00
Oleh Ozimok c416e78261
destination: Fix crash when EndpointSlices are enabled (#5543)
The Destination controller can panic due to a nil-deref when
the EndpointSlices API is enabled.

This change updates the controller to properly initialize values
to avoid this segmentation fault.

Fixes #5521

Signed-off-by: Oleg Ozimok <oleg.ozimok@corp.kismia.com>
2021-01-15 12:52:11 -08:00
Alejandro Pedraza f3b1ebfa99
Separate observability API (#5510)
* Separate observability API

Closes #5312

This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.

- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.prot`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto` except for the `Version` Stuff.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.

The other changes in the go files are just changes in the imports to point to the new protobufs.

Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`
2021-01-13 14:34:54 -05:00
Filip Petkovski 40192e258a
Ignore pods with status.phase=Succeeded when watching IP addresses (#5412)
Ignore pods with status.phase=Succeeded when watching IP addresses

When a pod terminates successfully, some CNIs will assign its IP address
to newly created pods. This can lead to duplicate pod IPs in the same
Kubernetes cluster.

Filter out pods which are in a Succeeded phase since they are not 
routable anymore.

Fixes #5394

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-01-12 12:25:37 -05:00
Tarun Pothulapati e04647fb8d
remove prom check for public-api self-check (#5436)
Currently, public-api is part of the core control-plane where
the prom check fails when ran before the viz extension is installed.
This change comments out that check, Once metrics api is moved into
viz, maybe this check can be part of it instead or directly part of
`linkerd viz check`.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-05 17:22:39 -05:00
Alejandro Pedraza d3d7f4e2e2
Destination should return `OpaqueTransport` hint when annotation matches resolved target port (#5458)
The destination service now returns `OpaqueTransport` hint when the annotation
matches the resolve target port. This is different from the current behavior
which always sets the hint when a proxy is present.

Closes #5421

This happens by changing the endpoint watcher to set a pod's opaque port
annotation in certain cases. If the pod already has an annotation, then its
value is used. If the pod has no annotation, then it checks the namespace that
the endpoint belongs to; if it finds an annotation on the namespace then it
overrides the pod's annotation value with that.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-05 14:54:55 -05:00
Kevin Leimkuhler b830efdad7
Add OpaqueTransport field to destination protocol hints (#5421)
## What

When the destination service returns a destination profile for an endpoint,
indicate if the endpoint can receive opaque traffic.

## Why

Closes #5400

## How

When translating a pod address to a destination profile, the destination service
checks if the pod is controlled by any linkerd control plane. If it is, it can
set a protocol hint where we indicate that it supports H2 and opaque traffic.

If the pod supports opaque traffic, we need to get the port that it expects
inbound traffic on. We do this by getting the proxy container and reading it's
`LINKERD2_PROXY_INBOUND_LISTEN_ADDR` environment variable. If we successfully
parse that into a port, we can set the opaque transport field in the destination
profile.

## Testing

A test has been added to the destination server where a pod has a
`linkerd-proxy` container. We can expect the `OpaqueTransport` field to be set
in the returned destination profile's protocol hint.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-23 11:06:39 -05:00
Kevin Leimkuhler 7c0843a823
Add opaque ports to destination service updates (#5294)
## Summary

This changes the destination service to start indicating whether a profile is an
opaque protocol or not.

Currently, profiles returned by the destination service are built by chaining
together updates coming from watching Profile and Traffic Split updates.

With this change, we now also watch updates to Opaque Port annotations on pods
and namespaces; if an update occurs this is now included in building a profile
update and is sent to the client.

## Details

Watching updates to Profiles and Traffic Splits is straightforward--we watch
those resources and if an update occurs on one associated to a service we care
about then the update is passed through.

For Opaque Ports this is a little different because it is an annotation on pods
or namespaces. To account for this, we watch the endpoints that we should care
about.

### When host is a Pod IP

When getting the profile for a Pod IP, we check for the opaque ports annotation
on the pod and the pod's namespace. If one is found, we'll indicate if the
profile is an opaque protocol if the requested port is in the annotation.

We do not subscribe for updates to this pod IP. The only update we really care
about is if the pod is deleted and this is already handled by the proxy.

### When host is a Service

When getting the profile for a Service, we subscribe for updates to the
endpoints of that service. For any ports set in the opaque ports annotation on
any of the pods, we check if the requested port is present.

Since the endpoints for a service can be added and removed, we do subscribe for
updates to the endpoints of the service.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-18 12:38:59 -05:00
Kevin Leimkuhler ca86a31816
Add destination service tests for the IP path (#5266)
This adds additional tests for the destination service that assert `GetProfile`
behavior when the path is an IP address.

1. Assert that when the path is a cluster IP, the configured service profile is
   returned.
2. Assert that when the path a pod IP, the endpoint field is populated in the
   service profile returned.
3. Assert that when the path is not a cluster or pod IP, the default service
   profile is returned.
4. Assert that when path is a pod IP with or without the controller annotation,
the endpoint has or does not have a protocol hint

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-11-23 13:17:05 -05:00
Kevin Leimkuhler 92f9387997
Check correct label value when setting protocl hint (#5267)
This fixes an issue where the protocol hint is always set on endpoint responses.
We now check the right value which determines if the pod has the required label.

A test for this has been added to #5266.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-11-20 13:32:50 -08:00
Oliver Gould 375ffd782f
proxy: v2.121.0 (#5253)
This release changes error handling to teardown the server-side
connection when an unexpected error is encountered.

Additionally, the outbound TCP routing stack can now skip redundant
service discovery lookups when profile responses include endpoint
information.

Finally, the cache implementation has been updated to reduce latency by
removing unnecessary buffers.

---

* h2: enable HTTP/2 keepalive PING frames (linkerd/linkerd2-proxy#737)
* actions: Add timeouts to GitHub actions (linkerd/linkerd2-proxy#738)
* outbound: Skip endpoint resolution on profile hint (linkerd/linkerd2-proxy#736)
* Add a FromStr for dns::Name (linkerd/linkerd2-proxy#746)
* outbound: Avoid redundant TCP endpoint resolution (linkerd/linkerd2-proxy#742)
* cache: Make the cache cloneable with RwLock (linkerd/linkerd2-proxy#743)
* http: Teardown serverside connections on error (linkerd/linkerd2-proxy#747)
2020-11-18 16:55:53 -08:00
Kevin Leimkuhler e65f216d52
Add endpoint to GetProfile response (#5227)
Context: #5209

This updates the destination service to set the `Endpoint` field in `GetProfile`
responses.

The `Endpoint` field is only set if the IP maps to a Pod--not a Service.

Additionally in this scenario, the default Service Profile is used as the base
profile so no other significant fields are set.

### Examples

```
# GetProfile for an IP that maps to a Service
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.43.222.0:9090
INFO[0000] fully_qualified_name:"linkerd-prometheus.linkerd.svc.cluster.local"  retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}}  dst_overrides:{authority:"linkerd-prometheus.linkerd.svc.cluster.local.:9090"  weight:10000}
```

Before:

```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
```


After:

```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}}  endpoint:{addr:{ip:{ipv4:170524692}}  weight:10000  metric_labels:{key:"control_plane_ns"  value:"linkerd"}  metric_labels:{key:"deployment"  value:"fast-1"}  metric_labels:{key:"pod"  value:"fast-1-5cc87f64bc-9hx7h"}  metric_labels:{key:"pod_template_hash"  value:"5cc87f64bc"}  metric_labels:{key:"serviceaccount"  value:"default"}  tls_identity:{dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}}  protocol_hint:{h2:{}}}
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-11-18 15:41:25 -05:00
Tarun Pothulapati 4c106e9c08
cli: make check return SkipError when there is no prometheus configured (#5150)
Fixes #5143

The availability of prometheus is useful for some calls in public-api
that the check uses. This change updates the ListPods in public-api
to still return the pods even when prometheus is not configured.

For a test that exclusively checks for prometheus metrics, we have a gate
which checks if a prometheus is configured and skips it othervise.

Signed-off-by: Tarun Pothulapati tarunpothulapati@outlook.com
2020-10-29 19:57:11 +05:30
Alex Leong 4f34ce8e2f
Empty the stored addresses when the endpoint translator gets a NoEndpoints message (#5126)
Signed-off-by: Alex Leong <alex@buoyant.io>
2020-10-22 17:01:03 -07:00
Kevin Leimkuhler eff50936bf
Fix --all-namespaces flag handling (#5085)
## Motivations

Closes #5080

## Solution

When the `--all-namespaces` (`-A`) flag is set for the `linkerd edges` command,
ignore the `namespace` value set by default or `-n`.

This is similar to the behavior for `kubectl`. `kubectl get -A -n linkerd pods`
showing pods in all namespaces.

### Behavior changes

With linkerd and emojivoto installed, this results in:

Before:

```
❯ linkerd edges -A pods
No edges found.
```

After:

```
❯ linkerd edges -A pods
SRC                                   DST                                       SRC_NS      DST_NS      SECURED       
vote-bot-6cb9cb9569-wl6w5             web-5d69bcfdb7-mxf8f                      emojivoto   emojivoto   √  
web-5d69bcfdb7-mxf8f                  emoji-7dc976587b-rb9c5                    emojivoto   emojivoto   √  
web-5d69bcfdb7-mxf8f                  voting-bdf4f778c-pjkjg                    emojivoto   emojivoto   √  
linkerd-prometheus-68d6897d75-ghmgm   emoji-7dc976587b-rb9c5                    linkerd     emojivoto   √  
linkerd-prometheus-68d6897d75-ghmgm   vote-bot-6cb9cb9569-wl6w5                 linkerd     emojivoto   √  
linkerd-prometheus-68d6897d75-ghmgm   voting-bdf4f778c-pjkjg                    linkerd     emojivoto   √  
linkerd-prometheus-68d6897d75-ghmgm   web-5d69bcfdb7-mxf8f                      linkerd     emojivoto   √  
linkerd-controller-7d965cf78d-qw6xj   linkerd-prometheus-68d6897d75-ghmgm       linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-controller-7d965cf78d-qw6xj       linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-destination-74dbb9c46b-nkxgh      linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-grafana-5d9fb67dc6-sn2l8          linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-identity-c875b5d58-b756v          linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-proxy-injector-767b55988d-n9r6f   linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-sp-validator-6c8df84fb9-4w8kc     linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-tap-777fbf7656-p87dm              linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-web-546c9444b5-68xpx              linkerd     linkerd     √
```

`linkerd edges -A -n linkerd pods` results in all edges as well (the result
above).

The behavior of `linkerd edges pods` does not change and shows edges in the
`default` namespace.

```
❯ linkerd edges pods
No edges found.
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-10-16 16:49:10 -04:00
Alejandro Pedraza 777b06ac55
Expand 'linkerd edges' to work with TCP connections (#5040)
* Expand 'linkerd edges' to work with TCP connections

Fixes #4999

Before:
```
$ bin/linkerd edges po -owide
SRC                                   DST                                    SRC_NS    DST_NS    CLIENT_ID   SERVER_ID   SECURED
linkerd-prometheus-764ddd4f88-t6c2j   rabbitmq-controller-5c6cf7cc6d-8lxp2   linkerd   default                           √
linkerd-prometheus-764ddd4f88-t6c2j   temp                                   linkerd   default                           √

```

After:
```
$ bin/linkerd edges po -owide
SRC                                   DST                                    SRC_NS    DST_NS    CLIENT_ID         SERVER_ID         SECURED
temp                                  rabbitmq-controller-5c6cf7cc6d-5fpsc   default   default   default.default   default.default   √
linkerd-prometheus-66fb97b7fc-vpnxf   rabbitmq-controller-5c6cf7cc6d-5fpsc   linkerd   default                                       √
linkerd-prometheus-66fb97b7fc-vpnxf   temp                                   linkerd   default                                       √
```

With the latest proxy upgrade to v2.113.0 (#5037), the `tcp_open_total` metric now contains the `client_id` label so that we can replace the http-only metric `response_total` with this one to determine edges for TCP-only connections.

This change basically performs the same query as before, but two times, one for `response_total` and another for `tcp_open_total`. For each resulting entry, the latter is kept if `client_id` is present, otherwise the former is used (if present at all). That way things keep on working for older proxies.

Disclaimers:
- This doesn't fix #3706: if two sources connect to the same destination there's no way to tell them appart from the metrics perspective and their edges can get mangled. To fix that, the proxy would have to expose `src_resource` labels in the `tcp_open_total` total inbound metric.
- Note connections coming from prometheus are still unidentified. The reason is those hit the proxy's admin server (instead of the main container) which doesn't expose metrics.
2020-10-12 09:14:39 -05:00
Tarun Pothulapati 5e774aaf05
Remove dependency of linkerd-config for control plane components (#4915)
* Remove dependency of linkerd-config for most control plane components

This PR removes the dependency of `linkerd-config` into control
plane components by making all that information passed through CLI
flags. As most of these components require a couple of flags, passing
them as flags could be more helpful, as updations to the flags trigger a
rollout unlike a configMap update.

This does not update the proxy-injector as it needs a lot more data
and mounting `linkerd-config` is better.
2020-10-06 22:19:18 +05:30
Kevin Leimkuhler 6b7a39c9fa
Set FQN in profile resolutions (#5019)
## Motivation

Closes #5016

Depends on linkerd/linkerd2-proxy-api#44

## Solution

A `profileTranslator` exists for each service and now has a new
`fullyQualifiedName` field.

This field is used to set the `FullyQualifiedName` field of
`DestinationProfile`s each time an update is sent.

In the case that no service profile exists for a service, a default
`DestinationProfile` is created and we can use the field to set the correct
name.

In the case that a service profile does exist for a service, we still use this
field to set the name to keep it consistent.

### Example

Install linkerd on a cluster and run the destination server:

```
go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```

Get the IP of a service. Here, we'll get the ip for `linkerd-identity`:

```
> kubectl get -n linkerd svc/linkerd-identity
NAME               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
linkerd-identity   ClusterIP   10.43.161.68   <none>        8080/TCP   4h25m
```

Get the profile of `linkerd-identity` from service name or IP and note the
`FullyQualifiedName` field:

```
> go run controller/script/destination-client/main.go -method getProfile -path 10.43.161.68:8080
INFO[0000] fully_qualified_name:"linkerd-identity.linkerd.svc.cluster.local" ..
```

```
> go run controller/script/destination-client/main.go -method getProfile -path linkerd-identity.linkerd.svc.cluster.local
INFO[0000] fully_qualified_name:"linkerd-identity.linkerd.svc.cluster.local" ..
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-10-01 11:06:00 -04:00
Tarun Pothulapati d0caaa86c4
Bump k8s client-go to v0.19.2 (#5002)
Fixes #4191 #4993

This bumps Kubernetes client-go to the latest v0.19.2 (We had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5

This consists of the following changes:

- Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD.
- Bump all k8s related dependencies to v0.19.2
- Generate CRD types, client code using the latest k8s.io/code-generator
- Use context.Context as the first argument, in all code paths that touch the k8s client-go interface

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-09-28 12:45:18 -05:00
Oliver Gould 6d67b84447
profiles: Eliminate default timeout (#4958)
* profiles: Eliminate default timeout
2020-09-10 14:00:18 -07:00
Alejandro Pedraza ccf027c051
Push docker images to ghcr.io instead of gcr.io (#4953)
* Push docker images to ghcr.io instead of gcr.io

The `cloud_integration.yml` and `release.yml` workflows were modified to
log into ghcr.io, and remove the `Configure gcloud` step which is no
longer necessary.

Note that besides the changes to cloud_integration.yml and release.yml, there was a change to the upgrade-stable integration test so that we do linkerd upgrade --addon-overwrite to reset the addons settings because in stable-2.8.1 the Grafana image was pegged to gcr.io/linkerd-io/grafana in linkerd-config-addons. This will need to be mentioned in the 2.9 upgrade notes.

Also the egress integration test has a debug container that now is pegged to the edge-20.9.2 tag.

Besides that, the other changes are just a global search and replace (s/gcr.io\/linkerd-io/ghcr.io\/linkerd/).
2020-09-10 15:16:24 -05:00
Ali Ariff 5186383c81
Add ARM64 Integration Test (#4897)
* Add ARM64 Integration Test

Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
2020-08-28 10:38:40 -07:00
Kevin Leimkuhler c2301749ef
Always return destination overrides for services (#4890)
## Motivation

#4879

## Solution

When no traffic split exists for services, return a single destination override
with a weight of 100%.

Using the destination client on a new linkerd installation, this results in the
following output for `linkerd-identity` service:

```
❯ go run controller/script/destination-client/main.go -method getProfile -path linkerd-identity.linkerd.svc.cluster.local:8080
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"linkerd-identity.linkerd.svc.cluster.local.:8080" weight:100000} 
INFO[0000]
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-08-19 12:25:58 -07:00
Matei David f797ab1e65
service topologies: topology-aware service routing (#4780)
[Link to RFC](https://github.com/linkerd/rfc/pull/23)

### What
---
* PR that puts together all past pieces of the puzzle to deliver topology-aware service routing, as specified in the [Kubernetes docs](https://kubernetes.io/docs/concepts/services-networking/service-topology/) but with a much better load balancing algorithm and all the coolness of linkerd :) 
* The first piece of this PR is focused on adding topology metadata: topology preference for services and topology `<k,v>` pairs for endpoints.
* The second piece of this PR puts together the new context format and fetching the source node topology metadata in order to allow for endpoints filtering.
* The final part is doing the filtering -- passing all of the metadata to the listener and on every `Add` filtering endpoints based on the topology preference of the service, topology `<k,v>` pairs of endpoints and topology of the source (again `<k,v>` pairs).

### How
---

* **Collecting metadata**:
   -  Services do not have values for topology keys -- the topological keys defined in a service's spec are only there to dictate locality preference for routing; as such, I decided to store them in an array, they will be taken exactly as they are found in the service spec, this ensures we respect the preference order.

   - For EndpointSlices, we are using a map -- an EndpointSlice has locality information in the form of `<k,v>` pair, where the key is a topological key (similar to what's listed in the service) and the value is the locality information -- e.g `hostname: minikube`. For each address we now have a map of topology values which gets populated when we translate the endpoints to an address set. Because normal Endpoints do not have any topology information, we create each address with an empty map which is subsequently populated ONLY for slices in the `endpointSliceToAddressSet` function.

* **Filtering endpoints**:
  - This was a tricky part and filled me with doubts. I think there are a few ways to do this, but this is how I "envisioned" it. First, the `endpoint_translator.go` should be the one to do the filtering; this means that on subscription, we need to feed all of the relevant metadata to the listener. To do this, I created a new function `AddTopologyFilter` as part of the listener interface.

  - To complement the `AddTopologyFilter` function, I created a new `TopologyFilter` struct in `endpoints_watcher.go`. I then embedded this structure in all listeners that implement the interface. The structure holds the source topology (source node), a boolean to tell if slices are activated in case we need to double check (or write tests for the function) and the service preference. We create the filter on Subscription -- we have access to the k8s client here as well as the service, so it's the best point to collect all of this data together. Addresses all have their own topology added to them so they do not have to be collected by the filter.

  - When we add a new set of addresses, we check to see if slices are enabled -- chances are if slices are enabled, service topology might be too. This lets us skip this step if the latest version is not adopted. Prior to sending an `Add` we filter the endpoints -- if the preference is registered by the filter we strictly enforce it, otherwise nothing changes.

And that's pretty much it. 

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-08-18 11:11:09 -07:00
Josh Soref 72aadb540f
Spelling (#4872)
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).

The misspellings have been reported at aaf440489e (commitcomment-41423663)

The action reports that the changes in this PR would make it happy: 5b82c6c5ca

Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-08-12 21:59:50 -07:00
Alex Leong a1543b33e3
Add support for service-mirror selectors (#4795)
* Add selector support

Signed-off-by: Alex Leong <alex@buoyant.io>

* Removed unused labels

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-07-30 10:07:14 -07:00
Tharun Rajendran e24c323bf9
Gateway Metrics in Dashboard (#4717)
* Introduce multicluster gateway api handler in web api server
* Added MetricsUtil for Gateway metrics
* Added gateway api helper
* Added Gateway Component

Updated metricsTable component to support gateway metrics
Added handler for gateway

Fixes #4601

Signed-off-by: Tharun <rajendrantharun@live.com>
2020-07-27 12:43:54 -07:00
Matei David 1c197b14e7
Change destination context token format (#4771)
Add a new structure on the destination controller side to keep track of contextual information.
The token format has been changed from ns:<namespace> to a JSON format so that more variables can be
encdoed in the token. As part of this PR, a new field 'nodeName' has been added to help with service
topologies.

Fixes #4498

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-27 09:49:48 -07:00
Alex Leong d540e16c8b
Make service mirror controller per target cluster (#4710)
This PR removes the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in https://github.com/linkerd/rfc/pull/31.  For fuller context, please see that RFC.

Basic multicluster functionality works here including:
* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveliness, and probe latences.
* The `linkerd check` multicluster checks have been updated for the new architecture.  Several checks have been rendered obsolete by the new architecture and have been removed.

The following are known issues requiring further work:
* the service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror.  it does not yet support configuring a label selector.
* an unlink command is needed for removing multicluster links: see https://github.com/linkerd/linkerd2/issues/4707
* an mc uninstall command is needed for uninstalling the multicluster addon: see https://github.com/linkerd/linkerd2/issues/4708

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-07-23 14:32:50 -07:00
Matei David 146c593cd5
Uncomment EndpointSliceAccess function (#4760)
* Small PR that uncomments the `EndpointSliceAcess` method and cleans up left over todos in the destination service.
* Based on the past three PRs related to `EndpointSlices` (#4663 #4696 #4740); they should now be functional (albeit prone to bugs) and ready to use.

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-20 14:50:43 -07:00
Tarun Pothulapati b7e9507174
Remove/Relax prometheus related checks (#4724)
* Removes/Relaxes prometheus related checks

Now that prometheus is an add-on, There can be cases where prometheus is
disabled at which the check should show a warning but not fail. This
decouples the tight depedency.

This changes the following checks:

- Removes serviceAccount and pod checks in the CLI.
- Relaxes `linkerd-api` checks to only check for prometheus access when
the URL is not empty. This should work seamlessly with external
prometheus as that URL will be passed and it performs the same
check.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-07-20 14:24:00 -07:00
Matei David 8b85716eb8
Introduce install flag for EndpointSlices (#4740)
EndpointSlices have been made opt-in due to their experimental nature. This PR
introduces a new install flag 'enableEndpointSlices' that will allow adopters to
specify in their cli install or helm install step whether they would like to
use endpointslices as a resource in the destination service, instead of the
endpoints k8s resource.

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-15 09:53:04 -07:00
Kevin Leimkuhler f49b40c4a9
Add support for profile lookups by IP address (#4727)
## Motivation

Closes #3916

This adds the ability to get profiles for services by IP address.

### Change in behavior

When the destination server receives a `GetProfile` request with an IP address,
it now tries to map that IP address to a service.

If the IP address maps to an existing service, then the destination server
returns the profile stream subscribes for updates to the _service_--this is the
existing behavior. If the IP changes to a new service, the stream will still
send updates for the first service the IP address corresponded to since that is
what it is subscribed to.

If the IP address does not map to an existing service, then the destination
server returns the profile stream but does not subscribe for updates. The stream
will receive one update, the default profile.

### Solution

This change uses the `IPWatcher` within the destination server to check for what
services an IP address correspond to. By adding a new method `GetSvc` to
`IPWatcher`, the server now calls this method if `GetProfile` receives a request
with an IP address.

## Testing

Install linkerd on a cluster and get the cluster IP of any service:

```bash
❯ kubectl get -n linkerd svc/linkerd-tap -o wide
NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE   SELECTOR
linkerd-tap   ClusterIP   10.104.57.90   <none>        8088/TCP,443/TCP   16h   linkerd.io/control-plane-component=tap
```

Run the destination server:

```bash
❯ go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```

Get the profile for the tap service by IP address:

```bash
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.104.57.90:8088
INFO[0000] retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}} 
INFO[0000]
```

Get the profile for an IP address that does not correspond to a service:

```bash
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.256.0.1:8088
INFO[0000] retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}} 
INFO[0000]
```

You can add and remove settings for the service profile for tap and get updates.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-07-10 14:41:15 -07:00
Zahari Dichev 73010149ce
Do not treat evicted pods as failed in healthchecks (#4732)
When a k8s pod is evicted its Phase is set to Failed and the reason is set to Evicted. Because in the ListPods method of the public APi we only transmit the phase and treat it as Status, the healthchecks assume such evicted data plane pods to be failed. Since this check is retryable, the results is that linkerd check --proxy appears to hang when there are evicted pods. As @adleong correctly pointed out here, the presence of evicted pod is not something that we should make the checks fail.

This change modifies the publci api to set the Pod.Status to "Evicted" for evicted pods. The healtcheks are also modified to not treat evicted pods as error cases.

Fix #4690

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-07-09 14:22:27 +03:00
Suraj Deshmukh d7dbe9cbff
Fix spelling mistakes using codespell (#4700)
Using following command the wrong spelling were found and later on
fixed:

```
codespell --skip CHANGES.md,.git,go.sum,\
    controller/cmd/service-mirror/events_formatting.go,\
    controller/cmd/service-mirror/cluster_watcher_test_util.go,\
    SECURITY_AUDIT.pdf,.gcp.json.enc,web/app/img/favicon.png \
    --ignore-words-list=aks,uint,ans,files\' --check-filenames \
    --check-hidden
```

Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
2020-07-07 17:07:22 -05:00
Matei David 9d8d89cce8
Add EndpointSlice logic to EndpointsWatcher (#4501) (#4663)
Introduce support for the EndpointSlice k8s resource (k8s v1.16+) in the destination service.
Through this PR, in the EndpointsWatcher, there will be a dedicated informer for EndpointSlice;
the informer cannot run at the same time as the Endpoints resource informer. The main difference
is that EndpointSlices have a one-to-many relationship with a service, they provide better performance benefits,
dual-stack addresses and more. EndpointSlice support also implies service topology and other k8s related features.

Validated and tested manually, as well as with dedicated unit tests.

Closes #4501

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-07 13:20:40 -07:00
Alejandro Pedraza aea541d6f9
Upgrade generated protobuf files to v1.4.2 (#4673)
Regenerated protobuf files, using version 1.4.2 that was upgraded from
1.3.2 with the proxy-api update in #4614.

As of v1.4 protobuf messages are disallowed to be copied (because they
hold a mutex), so whenever a message is passed to or returned from a
function we need to use a pointer.

This affects _mostly_ test files.

This is required to unblock #4620 which is adding a field to the config
protobuf.
2020-06-26 09:36:48 -05:00
Oliver Gould c4d649e25d
Update proxy-api version to v0.1.13 (#4614)
This update includes no API changes, but updates grpc-go
to the latest release.
2020-06-24 12:52:59 -07:00
Zahari Dichev 3365455e45
Fix mc labels (#4560)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-05 19:36:09 +03:00
Zahari Dichev 6c3922a7f1
Probe manager simplification (#4510)
There are a few notable things happening in this PR: 

- the probe manager has been decoupled from the cluster_watcher. Now its only responsibility is to watch for mirrored gateways beeing created and to probe them. This means that probes are initiated for all gateways no matter whether there are mirrored services being paired
- the number of paired services is derived from the existing services in the cluster rather than being published as a metric by the prober
- there are no events being exchanged between the cluster watcher and the probe manager

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-01 14:41:29 -07:00
Zahari Dichev 4176580a0f
Threadsafe buffering listener (#4359)
* Add thread safety to watcher tests

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-14 20:45:41 +03:00
Zahari Dichev ef1a2c2b10
Multicluster dashboard for traffic metrics (#4178)
This change adds labels to endpoints that target remote services. It also adds a Grafana dashboard that can be used to monitor multicluster traffic.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-14 17:48:27 +03:00
Alex Leong 8fbaa3ef9b
Don't send NoEndpoints during pod updates for ip watches (#4338)
When the proxy has an IP watch on a pod and the destination controller gets a pod update event, the destination controller sends a NoEndpoints message to all listeners followed by an Add with the new pod state.  This can result in the proxy's load balancer being briefly empty and could result in failing requests in the period.  

Since consecutive Add events with the same address will override each other, we can simply send the Adds without needing to clear the previous state with a NoEndpoints message.
2020-05-07 16:10:17 -07:00
Zahari Dichev 5149152ef3
Multicluster gateway and remote setup command (#4265)
Add multicluster gateway and setup command

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-04-29 20:33:23 +03:00
Zahari Dichev 17dacf5548
Add gateways command, allowing the retrieval of gateway stats (#4241)
Add gateways command, allowing the retrieval of gateway stats

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-04-27 13:55:01 +03:00
Zahari Dichev 26c14d3c66
Detect changes in addresses when getting updates in endpoints watcher (#4104)
Detect changes in addresses when getting updates in endpoints watcher

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-04-10 11:42:39 +03:00
Alex Leong d8eebee4f7
Upgrade to client-go 0.17.4 and smi-sdk-go 0.3.0 (#4221)
Here we upgrade our dependencies on client-go to 0.17.4 and smi-sdk-go to 0.3.0.  Since smi-sdk-go uses client-go 0.17.4, these upgrades must be performed simultaneously.

This also requires simultaneously upgrading our dependency on linkerd/stern to a SHA which also uses client-go 0.17.4.  This keeps all of our transitive dependencies synchronized on one version of client-go.

This ALSO requires updating our codegen scripts to use the 0.17.4 version of code-generator and running it to generate 0.17.4 compatible generated code.  I took this opportunity to update our code generation script to properly use the version of code-generater from `go.mod` rather than a hardcoded SHA.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-01 10:07:23 -07:00
Zahari Dichev 10ecd8889e
Set auth override (#4160)
Set AuthOverride when present on endpoints annotation

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-03-25 10:56:36 +02:00
Mayank Shah 963b9b049a
Add kubectl-style label selectors (#4120)
* Update tap, routes and top commands to support label selectors

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2020-03-20 10:45:06 -05:00
Zahari Dichev 2db307ee91
Remove target port requirement in port resolution (#4174)
This change removes the target port requirement when resolving ports in the dst service. Based on the comments, it seems that we need to have a target port defined in the port spec in order to resolve to the port in the Endpoints. In reality if target port is note defined when creating the service, k8s will set the port and the target port to the same value. Seems to me that checking for the targetPort to be different than 0, is a no-op.

Signed-off-by: Zahari Dichev zaharidichev@gmail.com
2020-03-16 23:04:08 +02:00