Commit Graph

645 Commits

Author SHA1 Message Date
Kevin Leimkuhler 07d5071cc4
Remove default skip ports and add to opaque ports (#5810)
This change removes the default ignored inbound and outbound ports from the
proxy init configuration.

These ports have been moved to the the `proxy.opaquePorts` configuration so that
by default, installations will proxy all traffic on these ports opaquely.

Closes #5571 
Closes #5595 

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-24 16:22:09 -05:00
Kevin Leimkuhler 51a965e228
Return default opaque ports in the destination service (#5814)
This changes the destination service to always use a default set of opaque ports
for pods and services. This is so that after Linkerd is installed onto a
cluster, users can benefit from common opaque ports without having to annotate
the workloads that serve the applications.

After #5810 merges, the proxy containers will be have the default opaque ports
`25,443,587,3306,5432,11211`. This value on the proxy container does not affect
traffic though; it only configures the proxy.

In order for clients and servers to detect opaque protocols and determine opaque
transports, the pods and services need to have these annotations.

The ports `25,443,587,3306,5432,11211` are now handled opaquely when a pod or
service does not have the opaque ports annotation. If the annotation is present
with a different value, this is used instead of the default. If the annotation
is present but is an empty string, there are no opaque ports for the workload.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-24 14:55:31 -05:00
Kevin Leimkuhler 5bd5db6524
Revert "Rename multicluster annotation prefix and move when possible (#5771)" (#5813)
This reverts commit f9ab867cbc which renamed the
multicluster label name from `mirror.linkerd.io` to `multicluster.linkerd.io`.

While this change was made to follow similar namings in other extensions, it
complicates the multicluster upgrade process due to the secret creation.

`mirror.linkerd.io` is not that important of a label to change and this will
allow a smoother upgrade process for `stable-2.10.x`

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-24 12:54:52 -05:00
Kevin Leimkuhler ff93d2d317
Mirror opaque port annotations on services (#5770)
This change introduces an opaque ports annotation watcher that will send
destination profile updates when a service has its opaque ports annotation
change.

The user facing change introduced by this is that the opaque ports annotation is
now required on services when using the multicluster extension. This is because
the service mirror will create mirrored services in the source cluster, and
destination lookups in the source cluster need to discover that the workloads in
the target cluster are opaque protocols.

### Why

Closes #5650

### How

The destination server now has a new opaque ports annotation watcher. When a
client subscribes to updates for a service name or cluster IP, the `GetProfile`
method creates a profile translator stack that passes updates through resource
adaptors such as: traffic split adaptor, service profile adaptor, and now opaque
ports adaptor.

When the annotation on a service changes, the update is passed through to the
client where the `opaque_protocol` field will either be set to true or false.

A few scenarios to consider are:

  - If the annotation is removed from the service, the client should receive
    an update with no opaque ports set.
  - If the service is deleted, the stream stays open so the client should
    receive an update with no opaque ports set.
  - If the service has the annotation added, the client should receive that
    update.

### Testing

Unit test have been added to the watcher as well as the destination server.

An integration test has been added that tests the opaque port annotation on a
service.

For manual testing, using the destination server scripts is easiest:

```
# install Linkerd

# start the destination server
$ go run controller/cmd/main.go destination -kubeconfig ~/.kube/config

# Create a service or namespace with the annotation and inject it

# get the destination profile for that service and observe the opaque protocol field
$ go run controller/script/destination-client/main.go -method getProfile -path test-svc.default.svc.cluster.local:8080
INFO[0000] fully_qualified_name:"terminus-svc.default.svc.cluster.local" opaque_protocol:true retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"terminus-svc.default.svc.cluster.local.:8080" weight:10000} 
INFO[0000]                                              
INFO[0000] fully_qualified_name:"terminus-svc.default.svc.cluster.local" opaque_protocol:true retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"terminus-svc.default.svc.cluster.local.:8080" weight:10000} 
INFO[0000]
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-23 13:36:17 -05:00
Max Goltzsche a8e5bff21f
Provide CA cert as env var to identity controller. (#5690)
Currently the identity controller is the only component that receives the CA certificate / trust anchors as option `-identity-trust-anchors-pem` instead of an env var.
This stops one from letting it read the trust anchors from a Secret that is managed by e.g. cert-manager.

This PR uses an env var instead of the option to provide the trust anchors. For most helm chart users this doesn't change anything. However using kustomize the helm output manifest can now be adjusted (again) so that the certificate is loaded from a ConfigMap or Secret like in [this example](https://github.com/mgoltzsche/khelm/tree/master/example/kpt/linkerd) which aims to produce a static manifest to make the installation/update more declarative and support GitOps workflows.

This PR does not provide chart options/values to specify Secrets upfront - it would introduce dependencies to other operators.

Relates to #3843, see https://github.com/linkerd/linkerd2/issues/3843#issuecomment-775516217

Fixes #3321

Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
2021-02-22 10:30:43 -05:00
Dennis Adjei-Baah 57db567204
send serviceprofile counts in heartbeat URL (#5775)
This change counts the number of service profiles installed in a cluster
and adds that info to the heartbeat HTTP request.

Fixes #5474

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2021-02-19 10:11:26 -05:00
Kevin Leimkuhler f9ab867cbc
Rename multicluster annotation prefix and move when possible (#5771)
This renames the multicluster annotation prefix from `mirror.linkerd.io` to
`multicluster.linkerd.io` in order to reflect other extension naming patterns.

Additionally, it moves labels only used in the Multicluster extension into their
own labels file—again to reflect other extensions.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-18 17:10:33 -05:00
Dennis Adjei-Baah 89ec6056c0
Report extension usage (#5761)
This change modifies `linkerd-heartbeat` to report on all linkerd
extensions being used in a cluster.

The heartbeat URL now includes extension names and a value of 
`"1"` if an extension is installed. 
```
https://versioncheck.linkerd.io/version.json?install-time=1613518301&k8s-version=v1.20.2&linkerd-jaeger=1&linkerd-multicluster=1&linkerd-viz=1
```

Fixes #5516

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2021-02-18 07:33:35 -08:00
Kevin Leimkuhler edd3812f30
Add services to proxy injector for opaque ports annotation (#5766)
This adds namespace inheritance of the opaque ports annotation to services. 

This means that the proxy injector now watches services creation in a cluster.
When a new service is created, the webhook receives an admission request for
that service and determines whether a patch needs to be created.

A patch is created if the service does not have the annotation, but the
namespace does. This means the service inherits the annotation from the
namespace.

A patch is not created if the service and the namespace do not have the
annotation, or the service has the annotation. In the case of the service having
the annotation, we don't even need to check the namespace since it would not
inherit it anyways.

If a namespace has the annotation value changed, this will not be reflected on
the service. The service would need to be recreated so that it goes through
another admission request.

None of this applies to the `inject` command which still skips service
injection. We rely on being able to check the namespace annotations, and this is
only possible in the proxy injector webhook when we can query the k8s API.

Closes #5737

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-17 20:58:18 -05:00
Oliver Gould 6dc7efd704
docker: Access container images via cr.l5d.io (#5756)
We've created a custom domain, `cr.l5d.io`, that redirects to `ghcr.io`
(using `scarf.sh`). This custom domain allows us to swap the underlying
container registry without impacting users. It also provides us with
important metrics about container usage, without collecting PII like IP
addresses.

This change updates our Helm charts and CLIs to reference this custom
domain. The integration test workflow now refers to the new domain,
while the release workflow continues to use the `ghcr.io/linkerd` registry
for the purpose of publishing images.
2021-02-17 14:31:54 -08:00
Alejandro Pedraza 826d579924
Upgrade proxy-init to v1.3.9 (#5759)
Fixes #5755 follow-up to #5750 and #5751

- Unifies the Go version across Docker and CI to be 1.14.15;
- Updates the GitHub Actions base image from ubuntu-18.04 to ubuntu-20.04; and
- Updates the runtime base image from debian:buster-20201117-slim to debian:buster-20210208-slim.
2021-02-16 17:59:28 -05:00
Oliver Gould 2774780fb8
Update Go to 1.14.15 (#5751)
The Go-1.14 release branch includes a number of important updates. This
change updates our containers' base image to the latest release, 1.14.15

See linkerd/linkerd2-proxy-init#32
Fixes #5655
2021-02-16 08:40:06 -08:00
Kevin Leimkuhler 5dc662ae97
Remove namespace inheritance of opaque ports annotation (#5739)
This change removes the namespace inheritance of the opaque ports annotation.
Now when setting opaque port related fields in destination profile responses, we
only look at the pod annotations.

This prepares for #5736 where the proxy-injector will add the annotation from
the namespace if the pod does not have it already.

Closes #5735

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-15 10:21:20 -05:00
Filip Petkovski 73f9fb3518
Use a shared informer when getting node topology (#5722)
Getting information about node topology queries the k8s api directly.
In an environment with high traffic and high number of pods, the
k8s api server can become overwhelmed or start throttling requests.

This MR introduces a node informer to resolve the bottleneck and
fetch node information asynchronously.

Fixes #5684

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-02-12 11:05:38 -05:00
Tarun Pothulapati a393c42536
values: removal of .global field (#5699)
* values: removal of .global field

Fixes #5425

With the new extension model, We no longer need `Global` field
as we don't rely on chart dependencies anymore. This helps us
further cleanup Values, and make configuration more simpler.

To make upgrades and the usage of new CLI with older config work,
We add a new method called `config.RemoveGlobalFieldIfPresent` that
is used in the upgrade and `FetchCurrentConfiguration` paths to remove
global field and attach its child nodes if global is present. This is verified
by the `TestFetchCurrentConfiguration`'s older test that has the global
field.

We also don't yet remove .global in some helm stable-upgrade tests for
the initial install to work.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-02-11 23:38:34 +05:30
Oliver Gould 8f2d01c5c0
Use fully-qualified DNS names in proxy configuration (#5707)
Pods with unusual DNS configurations may not be able to resolve the
control plane's domain names. We can avoid search path shenanigans by
adding a trailing dot to these names.
2021-02-10 08:27:35 -08:00
Kevin Leimkuhler 75fcc9d623
Move tap from core into Viz extension (#5651)
Closes #5545.

This change moves all tap and tap-injector code into the viz directory. 

The tap and tap-injector components now also use a new tap image—separating
these components from the controller image that they are currently part of. This
means the controller image has removed all its build dependencies related to
tap.

Finally, the tap Protobuf has been separated from the metrics-api and moved into
it's own `.proto` file and gen directory. This introduces a clear split between
metrics-api and tap Protobuf.

There is no change in behavior for the `viz tap` command.

### Reviewing

#### Docker images

All the bin directory scripts should be updated to build and load the tap image.
All the CI workflows should be updated to build and push the tap image.

#### Controller and pkg directories

This is primarily deletions. Most of the deleted code in this directory is now
in the tap directory of the Viz extension.

#### viz/tap

This is the location that all the tap related code now lives in. New files are
mostly moved from the controller and pkg directories. Imports have all been
updated to point at the right locations and Protobuf.

The Protobuf here is taken from metrics-api and contains all tap-related
Protobuf.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-09 12:43:21 -05:00
Alejandro Pedraza a04b30d2ab
Simplify SelfCheck API (#5665)
Fixes #5575

Now that only viz makes use of the `SelfCheck` api, merged the `healthcheck.proto` into `viz.proto`.

Also removed the "checkRPC" functionality that was used for handling multiple API responses and was only used by `SelfCheck`, because the extra complexity was not granted. Revert to use the plain vanilla "check" by just concatenating error responses.

## Success Output

```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
√ viz extension self-check
```

## Failure Examples

Failure when viz fails to connect to the k8s api:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
    Error calling the Kubernetes API: someerror
    see https://linkerd.io/checks/#l5d-api-control-api for hints

Status check results are ×
```

Failure when viz fails to connect to Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
    Error calling Prometheus from the control plane: someerror
    see https://linkerd.io/checks/#l5d-api-control-api for hints

Status check results are ×
```

Failure when viz fails to connect to both the k8s api and Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
    Error calling the Kubernetes API: someerror
    Error calling Prometheus from the control plane: someerror
    see https://linkerd.io/checks/#l5d-api-control-api for hints

Status check results are ×
```
2021-02-05 10:13:45 -05:00
Kevin Leimkuhler 228d8e9e95
Add tracing enabled annotation (#5643)
This change adds the `jaeger.linkerd.io/tracing-enabled` annotation which is
automatically added by the Jaeger extension's `jaeger-injector`.

All pods that receive this annotation have also had the required environment
variables and volume/volume mounts add by the injector.

The purpose of this annotation is that it will allow `jaeger check` to check for
the presence of this annotation instead of needing to look at the proxy
containers directly. If this annotation is not present on pods, `jaeger check`
can warn users that tracing is not configured for those pods. This is similar to
`viz check` warning users that tap is not configured—recenlty added in #5602.

Closes #5632

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-03 14:05:15 -05:00
Tarun Pothulapati cd2e911be3
viz: add data-plane and prometheus healthchecks (#5602)
* viz: add data-plane and prometheus healthchecks

Fixes #5325

This branch adds the remaining healthchecks for the viz extension
i.e

- Data-plane metrics check in Prometheus
- `--proxy` mode which also checks for tap injections based
  on annotations.

For this, The following changes were needed
- Category.ID is made public so that --proxy toggleness can be
allowed
- Made tap env key as a field so that it can be re-used for
checks

simplify viz.NewHealthChecker by removing the need to
 pass categoryIDs and instead using
hc.appendCategories directly at the caller to add the
required categories. This is possible by dividing the vizCategories
into separate functions

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-02-01 23:01:13 +05:30
Kevin Leimkuhler 964a190069
viz: only tap pods that have tap explicitly enabled (#5608)
## What this changes

This allows the tap controller to inform `tap` users when pods either have tap
disabled or tap is not enabled yet.

## Why

When a user taps a resource that has not been admitted by the Viz extension's
`tap-injector`, tap is not explicitly disabled but it is also not enabled.
Therefore, the `tap` command hangs and provides no feedback to the user.

Closes #5544

## How

A new `viz.linkerd.io/tap-enabled` annotation is introduced which is
automatically added by the Viz extension's `tap-injector`. This annotation is
added to a pod when it is able to be tapped; this means that the pod and the
pod's namespace do not have the `config.linkerd.io/disable-tap` annotation
added.

When a user attempts to tap a resource, the tap controller now looks for this
new annotation; if the annotation is present on the pod then that pod is
tappable.

If the annotation is not present or tap is explicitly disabled, an error is
returned.

## UI changes

Multiple errors can now occur when trying to tap a resource:

1. There are no pods for the resource.
2. There are pods for the resource, but tap is disabled via pod or namespace
   annotation.
3. There are pods for the resource, but tap is not yet enabled because the
   `tap-injector` did not admit the resource.

Errors are now handled as shown below:

Tap is disabled:

```
❯ bin/linkerd viz tap deploy/test
Error: no pods to tap for deployment/test
pods found with tap disabled via the config.linkerd.io/disable-tap annotation
```

Tap is not enabled:

```
❯ bin/linkerd viz tap deploy/test
Error: no pods to tap for deployment/test
pods found with tap not enabled; try restarting resource so that it can be injected
```

There are a mix of pods with tap disabled or tap not enabled:

```
❯ bin/linkerd viz tap deploy/test
Error: no pods to tap for deployment/test
pods found with tap disabled via the config.linkerd.io/disable-tap annotation
pods found with tap not enabled; try restarting resource so that it can be injected
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-28 17:37:45 -05:00
Alex Leong 964ce11559
Update generated serviceprofile code (#5580)
I ran `bin/update-codegen.sh` to update the generated code to include the opaque ports in the generated deepcopy function for service profiles.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-01-22 14:34:49 -08:00
Alejandro Pedraza 8ac5360041
Extract from public-api all the Prometheus dependencies, and moves things into a new viz component 'linkerd-metrics-api' (#5560)
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it remains being used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.

* Added chart templates for new viz linkerd-metrics-api pod

* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.

* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).

* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompaigning http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.

* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.

* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.

* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
  - Updated `endpoints.go` according to new API interface name.
  - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
  - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
  - Added `metrics-api` to list of docker images to build in actions workflows.
  - In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).

* Add retry to 'tap API service is running' check

* mc check shouldn't err when viz is not available. Also properly set the log in multicluster/cmd/root.go so that it properly displays messages when --verbose is used
2021-01-21 18:26:38 -05:00
Kevin Leimkuhler e7f2a3fba3
viz: add tap-injector (#5540)
## What this changes

This adds a tap-injector component to the `linkerd-viz` extension which is
responsible for adding the tap service name environment variable to the Linkerd
proxy container.

If a pod does not have a Linkerd proxy, no action is taken. If tap is disabled
via annotation on the pod or the namespace, no action is taken.

This also removes the environment variable for explicitly disabling tap through
an environment variable. Tap status for a proxy is now determined only be the
presence or absence of the tap service name environment variable.

Closes #5326

## How it changes

### tap-injector

The tap-injector component determines if `LINKERD2_PROXY_TAP_SVC_NAME` should be
added to a pod's Linkerd proxy container environment. If the pod satisfies the
following, it is added:

- The pod has a Linkerd proxy container
- The pod has not already been mutated
- Tap is not disabled via annotation on the pod or the pod's namespace

### LINKERD2_PROXY_TAP_DISABLED

Now that tap is an extension of Linkerd and not a core component, it no longer
made sense to explicitly enable or disable tap through this Linkerd proxy
environment variable. The status of tap is now determined only be if the
tap-injector adds or does not add the `LINKERD2_PROXY_TAP_SVC_NAME` environment
variable.

### controller image

The tap-injector has been added to the controller image's several startup
commands which determines what it will do in the cluster.

As a follow-up, I think splitting out the `tap` and `tap-injector` commands from
the controller image into a linkerd-viz image (or something like that) makes
sense.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-21 11:24:08 -05:00
Oleh Ozimok c416e78261
destination: Fix crash when EndpointSlices are enabled (#5543)
The Destination controller can panic due to a nil-deref when
the EndpointSlices API is enabled.

This change updates the controller to properly initialize values
to avoid this segmentation fault.

Fixes #5521

Signed-off-by: Oleg Ozimok <oleg.ozimok@corp.kismia.com>
2021-01-15 12:52:11 -08:00
Alejandro Pedraza f3b1ebfa99
Separate observability API (#5510)
* Separate observability API

Closes #5312

This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.

- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.prot`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto` except for the `Version` Stuff.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.

The other changes in the go files are just changes in the imports to point to the new protobufs.

Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`
2021-01-13 14:34:54 -05:00
Filip Petkovski 40192e258a
Ignore pods with status.phase=Succeeded when watching IP addresses (#5412)
Ignore pods with status.phase=Succeeded when watching IP addresses

When a pod terminates successfully, some CNIs will assign its IP address
to newly created pods. This can lead to duplicate pod IPs in the same
Kubernetes cluster.

Filter out pods which are in a Succeeded phase since they are not 
routable anymore.

Fixes #5394

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-01-12 12:25:37 -05:00
Tarun Pothulapati e04647fb8d
remove prom check for public-api self-check (#5436)
Currently, public-api is part of the core control-plane where
the prom check fails when ran before the viz extension is installed.
This change comments out that check, Once metrics api is moved into
viz, maybe this check can be part of it instead or directly part of
`linkerd viz check`.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-05 17:22:39 -05:00
Alejandro Pedraza d3d7f4e2e2
Destination should return `OpaqueTransport` hint when annotation matches resolved target port (#5458)
The destination service now returns `OpaqueTransport` hint when the annotation
matches the resolve target port. This is different from the current behavior
which always sets the hint when a proxy is present.

Closes #5421

This happens by changing the endpoint watcher to set a pod's opaque port
annotation in certain cases. If the pod already has an annotation, then its
value is used. If the pod has no annotation, then it checks the namespace that
the endpoint belongs to; if it finds an annotation on the namespace then it
overrides the pod's annotation value with that.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-05 14:54:55 -05:00
Kevin Leimkuhler b830efdad7
Add OpaqueTransport field to destination protocol hints (#5421)
## What

When the destination service returns a destination profile for an endpoint,
indicate if the endpoint can receive opaque traffic.

## Why

Closes #5400

## How

When translating a pod address to a destination profile, the destination service
checks if the pod is controlled by any linkerd control plane. If it is, it can
set a protocol hint where we indicate that it supports H2 and opaque traffic.

If the pod supports opaque traffic, we need to get the port that it expects
inbound traffic on. We do this by getting the proxy container and reading it's
`LINKERD2_PROXY_INBOUND_LISTEN_ADDR` environment variable. If we successfully
parse that into a port, we can set the opaque transport field in the destination
profile.

## Testing

A test has been added to the destination server where a pod has a
`linkerd-proxy` container. We can expect the `OpaqueTransport` field to be set
in the returned destination profile's protocol hint.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-23 11:06:39 -05:00
Kevin Leimkuhler 7c0843a823
Add opaque ports to destination service updates (#5294)
## Summary

This changes the destination service to start indicating whether a profile is an
opaque protocol or not.

Currently, profiles returned by the destination service are built by chaining
together updates coming from watching Profile and Traffic Split updates.

With this change, we now also watch updates to Opaque Port annotations on pods
and namespaces; if an update occurs this is now included in building a profile
update and is sent to the client.

## Details

Watching updates to Profiles and Traffic Splits is straightforward--we watch
those resources and if an update occurs on one associated to a service we care
about then the update is passed through.

For Opaque Ports this is a little different because it is an annotation on pods
or namespaces. To account for this, we watch the endpoints that we should care
about.

### When host is a Pod IP

When getting the profile for a Pod IP, we check for the opaque ports annotation
on the pod and the pod's namespace. If one is found, we'll indicate if the
profile is an opaque protocol if the requested port is in the annotation.

We do not subscribe for updates to this pod IP. The only update we really care
about is if the pod is deleted and this is already handled by the proxy.

### When host is a Service

When getting the profile for a Service, we subscribe for updates to the
endpoints of that service. For any ports set in the opaque ports annotation on
any of the pods, we check if the requested port is present.

Since the endpoints for a service can be added and removed, we do subscribe for
updates to the endpoints of the service.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-12-18 12:38:59 -05:00
Alejandro Pedraza d661054795
Fix CLI install/upgrade overriding settings in HA (#5399)
Fixes #5385

## The problems

- `linkerd install --ha` isn't honoring flags
- `linkerd upgrade --ha` is overridding existing configs silently or failing with an error
- *Upgrading HA instances from before 2.9 to version 2.9.1 results in configs being overridden silently, or the upgrade fails with an error*

## The cause

The change in #5358 attempted to fix `linkerd install --ha` that was only applying some of the `values-ha.yaml` defaults, by calling `charts.NewValues(true)` and merging that with the values built from `values.yaml` overriden by the flags. It turns out the `charts.NewValues()` implementation was by itself merging against `values.yaml` and as a result any flag was getting overridden by its default.

This also happened when doing `linkerd upgrade --ha` on an existing instance, which could result in silently overriding settings, or it could also fail loudly like for example when upgrading set up that has an external issuer (in this case the issuer cert won't be able to be read during upgrade and an error would occur as described in #5385).

Finally, when doing `linkerd upgrade` (no --ha flag) on an HA install from before 2.9 results in configs getting overridden as well (silently or with an error) because in order to generate the `linkerd-config-overrides` secret, the original install flags are retrieved from `linkerd-config` via the `loadStoredValuesLegacy()` function which then effectively ends up performing a `linkerd upgrade` with all the flags used for `linkerd install` and falls into the same trap as above.

## The fix

In `values.go` the faulting merging logic is not used anymore, so now `NewValues()` only returns the default values from `values.yaml` and doesn't require an argument anymore. It calls `readDefaults()` which now only returns the appropriate values depending on whether we're on HA or not.
There's a new function `MergeHAValues()` that merges `values-ha.yaml` into the current values (it doesn't look into `values.yaml` anymore), which is only used when processing the `--ha` flag in `options.go`.

## How to test

To replicate the issue try setting a custom setting and check it's not applied:
```bash
linkerd install --ha --controller-log level debug | grep log.level
- -log-level=info
```

## Followup

This wasn't caught because we don't have HA integration tests. Now that our test infra is based on k3d, it should be easy to make such a test using a cluster with multiple nodes. Either that or issuing `linkerd install --ha` with additional configs and compare against a golden file.
2020-12-18 12:11:52 -05:00
Alejandro Pedraza 578d4a19e9
Have the tap APIServer refresh its cert automatically (#5388)
Followup to #5282, fixes #5272 in its totality.

This follows the same pattern as the injector/sp-validator webhooks, leveraging `FsCredsWatcher` to watch for changes in the cert files.

To reuse code from the webhooks, we moved `updateCert()` to `creds_watcher.go`, and `run()` as well (which now is called `ProcessEvents()`).

The `TestNewAPIServer` test in `apiserver_test.go` was removed as it really was just testing two things: (1) that `apiServerAuth` doesn't error which is already covered in the following test, and (2) that the golib call `net.Listen("tcp", addr)` doesn't error, which we're not interested in testing here.

## How to test

To test that the injector/sp-validator functionality is still correct, you can refer to #5282

The steps below are similar, but focused towards the tap component:

```bash
# Create some root cert
$ step certificate create linkerd-tap.linkerd.svc ca.crt ca.key   --profile root-ca --no-password --insecure

# configure tap's caBundle to be that root cert
$ cat > linkerd-overrides.yml << EOF
tap:
  externalSecret: true
  caBundle: |
    < ca.crt contents>
EOF

# Install linkerd
$ bin/linkerd install --config linkerd-overrides.yml | k apply -f -

# Generate an intermediatery cert with short lifespan
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc

# Create the secret using that intermediate cert
$ kubectl create secret tls \
  linkerd-tap-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# Rollout the tap pod for it to pick the new secret
$ k -n linkerd rollout restart deploy/linkerd-tap

# Tap should work
$ bin/linkerd tap -n linkerd deploy/linkerd-web
req id=0:0 proxy=in  src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :method=GET :authority=10.42.0.11:9994 :path=/metrics
rsp id=0:0 proxy=in  src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true :status=200 latency=1779µs
end id=0:0 proxy=in  src=10.42.0.15:33040 dst=10.42.0.11:9994 tls=true duration=65µs response-length=1709B

# Wait 5 minutes and rollout tap again
$ k -n linkerd rollout restart deploy/linkerd-tap

# You'll see in the logs that the cert expired:
$ k -n linkerd logs -f deploy/linkerd-tap tap
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45866: remote error: tls: bad certificate
2020/12/15 16:03:41 http: TLS handshake error from 127.0.0.1:45870: remote error: tls: bad certificate

# Recreate the secret
$ step certificate create linkerd-tap.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-tap.linkerd.svc
$ k -n linkerd delete secret linkerd-tap-k8s-tls
$ kubectl create secret tls \
  linkerd-tap-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# Wait a few moments and you'll see the certs got reloaded and tap is working again
time="2020-12-15T16:03:42Z" level=info msg="Updated certificate" addr=":8089" component=apiserver
```
2020-12-16 17:46:14 -05:00
Alex Leong cdc57d1af0
Use linkerd-jaeger extension for control plane tracing (#5299)
Now that tracing has been split out of the main control plane and into the linkerd-jaeger extension, we remove references to tracing from the main control plane including:

* removing the tracing components from the main control plane chart
* removing the tracing injection logic from the main proxy injector and inject CLI (these will be added back into the new injector in the linkerd-jaeger extension)
* removing tracing related checks (these will be added back into `linkerd jaeger check`)
* removing related tests

We also update the `--control-plane-tracing` flag to configure the control plane components to send traces to the linkerd-jaeger extension.  To make sure this works even when the linkerd-jaeger extension is installed in a non-default namespace, we also add a `--control-plane-tracing-namespace` flag which can be used to change the namespace that the control plane components send traces to.

Note that for now, only the control plane components send traces; the proxies in the control plane do not.  This is because the linkerd-jaeger injector is not yet available.  However, this change adds the appropriate namespace annotations to the control plane namespace to configure the proxies to send traces to the linkerd-jaeger extension once the linkerd-jaeger injector is available.

I tested this by doing the following:

1. bin/linkerd install | kubectl apply -f -
1. bin/helm install jaeger jaeger/charts/jaeger
1. bin/linkerd upgrade --control-plane-tracing=true | kubectl apply -f -
1. kubectl -n linkerd-jaeger port-forward svc/jaeger 16686
1. open http://localhost:16686
1. see traces from the linkerd control plane

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-12-08 14:34:26 -08:00
Tarun Pothulapati 72a0ca974d
extension: Separate multicluster chart and binary (#5293)
Fixes #5257

This branch movies mc charts and cli level code to a new
top level directory. None of the logic is changed.

Also, moves some common types into `/pkg` so that they
are accessible both to the main cli and extensions.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-12-04 16:36:10 -08:00
Alejandro Pedraza 4c634a3816
Have webhooks refresh their certs automatically (#5282)
* Have webhooks refresh their certs automatically

Fixes partially #5272

In 2.9 we introduced the ability for providing the certs for `proxy-injector` and `sp-validator` through some external means like cert-manager, through the new helm setting `externalSecret`.
We forgot however to have those services watch changes in their secrets, so whenever they were rotated they would fail with a cert error, with the only workaround being to restart those pods to pick the new secrets.

This addresses that by first abstracting out `FsCredsWatcher` from the identity controller, which now lives under `pkg/tls`.

The webhook's logic in `launcher.go` no longer reads the certs before starting the https server, moving that instead into `server.go` which in a similar way as identity will receive events from `FsCredsWatcher` and update `Server.cert`. We're leveraging `http.Server.TLSConfig.GetCertificate` which allows us to provide a function that will return the current cert for every incoming request.

### How to test

```bash
# Create some root cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca.crt ca.key \
  --profile root-ca --no-password --insecure --san linkerd-proxy-injector.linkerd.svc

# configure injector's caBundle to be that root cert
$ cat > linkerd-overrides.yaml << EOF
proxyInjector:
  externalSecret: true
    caBundle: |
      < ca.crt contents>
EOF

# Install linkerd. The injector won't start untill we create the secret below
$ bin/linkerd install --controller-log-level debug --config linkerd-overrides.yaml | k apply -f -

# Generate an intermediatery cert with short lifespan
step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc

# Create the secret using that intermediate cert
$ kubectl create secret tls \
  linkerd-proxy-injector-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# start following the injector log
$ k -n linkerd logs -f -l linkerd.io/control-plane-component=proxy-injector -c proxy-injector

# Inject emojivoto. The pods should be injected normally
$ bin/linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f -

# Wait about 5 minutes and delete a pod
$ k -n emojivoto delete po -l app=emoji-svc

# You'll see it won't be injected, and something like "remote error: tls: bad certificate" will appear in the injector logs.

# Regenerate the intermediate cert
$ step certificate create linkerd-proxy-injector.linkerd.svc ca-int.crt ca-int.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 4m --no-password --insecure --san linkerd-proxy-injector.linkerd.svc

# Delete the secret and recreate it
$ k -n linkerd delete secret linkerd-proxy-injector-k8s-tls
$ kubectl create secret tls \
  linkerd-proxy-injector-k8s-tls \
   --cert=ca-int.crt \
   --key=ca-int.key \
   --namespace=linkerd

# Wait a couple of minutes and you'll see some filesystem events in the injector log along with a "Certificate has been updated" entry
# Then delete the pod again and you'll see it gets injected this time
$ k -n emojivoto delete po -l app=emoji-svc

```
2020-12-04 16:25:59 -05:00
Alejandro Pedraza 9cbfb08a38
Bump proxy-init to v1.3.8 (#5283) 2020-11-27 09:07:34 -05:00
hodbn 92eb174e06
Add safe accessor for Global in linkerd-config (#5269)
CLI crashes if linkerd-config contains unexpected values.

Add a safe accessor that initializes an empty Global on the first
access. Refactor all accesses to use the newly introduced accessor using
gopls.

Add test for linkerd-config data without Global.

Fixes #5215

Co-authored-by: Itai Schwartz <yitai27@gmail.com>
Signed-off-by: Hod Bin Noon <bin.noon.hod@gmail.com>
2020-11-23 12:45:58 -08:00
Kevin Leimkuhler ca86a31816
Add destination service tests for the IP path (#5266)
This adds additional tests for the destination service that assert `GetProfile`
behavior when the path is an IP address.

1. Assert that when the path is a cluster IP, the configured service profile is
   returned.
2. Assert that when the path a pod IP, the endpoint field is populated in the
   service profile returned.
3. Assert that when the path is not a cluster or pod IP, the default service
   profile is returned.
4. Assert that when path is a pod IP with or without the controller annotation,
the endpoint has or does not have a protocol hint

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-11-23 13:17:05 -05:00
Alejandro Pedraza 4687dc52aa
Refactor webhook framework to allow webhooks define their flags (#5256)
* Refactor webhook framework to allow webhook define their flags

Pulled out of `launcher.go` the flag parsing logic and moved it into the `Main` methods of the webhooks (under `controller/cmd/proxy.injector/main.go` and `controller/cmd/sp-validator/main.go`), so that individual webhooks themselves can define the flags they want to use.

Also no longer require that webhooks have cluster-wide access.

Finally, renamed the type `webhook.handlerFunc` to `webhook.Handler` so it can be exported. This will be used in the upcoming jaeger webhook.
2020-11-23 10:40:30 -05:00
Kevin Leimkuhler 92f9387997
Check correct label value when setting protocl hint (#5267)
This fixes an issue where the protocol hint is always set on endpoint responses.
We now check the right value which determines if the pod has the required label.

A test for this has been added to #5266.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-11-20 13:32:50 -08:00
Oliver Gould 375ffd782f
proxy: v2.121.0 (#5253)
This release changes error handling to teardown the server-side
connection when an unexpected error is encountered.

Additionally, the outbound TCP routing stack can now skip redundant
service discovery lookups when profile responses include endpoint
information.

Finally, the cache implementation has been updated to reduce latency by
removing unnecessary buffers.

---

* h2: enable HTTP/2 keepalive PING frames (linkerd/linkerd2-proxy#737)
* actions: Add timeouts to GitHub actions (linkerd/linkerd2-proxy#738)
* outbound: Skip endpoint resolution on profile hint (linkerd/linkerd2-proxy#736)
* Add a FromStr for dns::Name (linkerd/linkerd2-proxy#746)
* outbound: Avoid redundant TCP endpoint resolution (linkerd/linkerd2-proxy#742)
* cache: Make the cache cloneable with RwLock (linkerd/linkerd2-proxy#743)
* http: Teardown serverside connections on error (linkerd/linkerd2-proxy#747)
2020-11-18 16:55:53 -08:00
Kevin Leimkuhler e65f216d52
Add endpoint to GetProfile response (#5227)
Context: #5209

This updates the destination service to set the `Endpoint` field in `GetProfile`
responses.

The `Endpoint` field is only set if the IP maps to a Pod--not a Service.

Additionally in this scenario, the default Service Profile is used as the base
profile so no other significant fields are set.

### Examples

```
# GetProfile for an IP that maps to a Service
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.43.222.0:9090
INFO[0000] fully_qualified_name:"linkerd-prometheus.linkerd.svc.cluster.local"  retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}}  dst_overrides:{authority:"linkerd-prometheus.linkerd.svc.cluster.local.:9090"  weight:10000}
```

Before:

```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
```


After:

```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}}  endpoint:{addr:{ip:{ipv4:170524692}}  weight:10000  metric_labels:{key:"control_plane_ns"  value:"linkerd"}  metric_labels:{key:"deployment"  value:"fast-1"}  metric_labels:{key:"pod"  value:"fast-1-5cc87f64bc-9hx7h"}  metric_labels:{key:"pod_template_hash"  value:"5cc87f64bc"}  metric_labels:{key:"serviceaccount"  value:"default"}  tls_identity:{dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}}  protocol_hint:{h2:{}}}
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-11-18 15:41:25 -05:00
Alejandro Pedraza 5a707323e6
Update proxy-init to v1.3.7 (#5221)
This upgrades both the proxy-init image itself, and the go dependency on
proxy-init as a library, which fixes CNI in k3s and any host using
binaries coming from BusyBox, where `nsenter` has an
issue parsing arguments (see rancher/k3s#1434).
2020-11-13 15:59:14 -05:00
Tarun Pothulapati 4c106e9c08
cli: make check return SkipError when there is no prometheus configured (#5150)
Fixes #5143

The availability of prometheus is useful for some calls in public-api
that the check uses. This change updates the ListPods in public-api
to still return the pods even when prometheus is not configured.

For a test that exclusively checks for prometheus metrics, we have a gate
which checks if a prometheus is configured and skips it othervise.

Signed-off-by: Tarun Pothulapati tarunpothulapati@outlook.com
2020-10-29 19:57:11 +05:30
Alex Leong 4f34ce8e2f
Empty the stored addresses when the endpoint translator gets a NoEndpoints message (#5126)
Signed-off-by: Alex Leong <alex@buoyant.io>
2020-10-22 17:01:03 -07:00
Oliver Gould d0bce594ea
Remove defunct proxy config variables (#5109)
The proxy no longer honors DESTINATION_GET variables, as profile lookups
inform when endpoint resolution is performed.  Also, there is no longer
a router capacity limit.
2020-10-20 16:13:53 -07:00
Oliver Gould c5d3b281be
Add 100.64.0.0/10 to the set of discoverable networks (#5099)
It appears that Amazon can use the `100.64.0.0/10` network, which is
technically private, for a cluster's Pod network.

Wikipedia describes the network as:

> Shared address space for communications between a service provider
> and its subscribers when using a carrier-grade NAT.

In order to avoid requiring additional configuration on EKS clusters, we
should permit discovery for this network by default.
2020-10-19 12:59:44 -07:00
Kevin Leimkuhler eff50936bf
Fix --all-namespaces flag handling (#5085)
## Motivations

Closes #5080

## Solution

When the `--all-namespaces` (`-A`) flag is set for the `linkerd edges` command,
ignore the `namespace` value set by default or `-n`.

This is similar to the behavior for `kubectl`. `kubectl get -A -n linkerd pods`
showing pods in all namespaces.

### Behavior changes

With linkerd and emojivoto installed, this results in:

Before:

```
❯ linkerd edges -A pods
No edges found.
```

After:

```
❯ linkerd edges -A pods
SRC                                   DST                                       SRC_NS      DST_NS      SECURED       
vote-bot-6cb9cb9569-wl6w5             web-5d69bcfdb7-mxf8f                      emojivoto   emojivoto   √  
web-5d69bcfdb7-mxf8f                  emoji-7dc976587b-rb9c5                    emojivoto   emojivoto   √  
web-5d69bcfdb7-mxf8f                  voting-bdf4f778c-pjkjg                    emojivoto   emojivoto   √  
linkerd-prometheus-68d6897d75-ghmgm   emoji-7dc976587b-rb9c5                    linkerd     emojivoto   √  
linkerd-prometheus-68d6897d75-ghmgm   vote-bot-6cb9cb9569-wl6w5                 linkerd     emojivoto   √  
linkerd-prometheus-68d6897d75-ghmgm   voting-bdf4f778c-pjkjg                    linkerd     emojivoto   √  
linkerd-prometheus-68d6897d75-ghmgm   web-5d69bcfdb7-mxf8f                      linkerd     emojivoto   √  
linkerd-controller-7d965cf78d-qw6xj   linkerd-prometheus-68d6897d75-ghmgm       linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-controller-7d965cf78d-qw6xj       linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-destination-74dbb9c46b-nkxgh      linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-grafana-5d9fb67dc6-sn2l8          linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-identity-c875b5d58-b756v          linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-proxy-injector-767b55988d-n9r6f   linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-sp-validator-6c8df84fb9-4w8kc     linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-tap-777fbf7656-p87dm              linkerd     linkerd     √  
linkerd-prometheus-68d6897d75-ghmgm   linkerd-web-546c9444b5-68xpx              linkerd     linkerd     √
```

`linkerd edges -A -n linkerd pods` results in all edges as well (the result
above).

The behavior of `linkerd edges pods` does not change and shows edges in the
`default` namespace.

```
❯ linkerd edges pods
No edges found.
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-10-16 16:49:10 -04:00
Oliver Gould 4f16a234aa
Add a default set of ports to bypass the proxy (#5093)
The proxy has a default, hardcoded set of ports on which it doesn't do
protocol detection (25, 587, 3306 -- all of which are server-first
protocols). In a recent change, this default set was removed from
the outbound proxy, since there was no way to configure it to anything
other than the default set. I had thought that there was a default set
applied to proxy-init, but this appears to not be the case.

This change adds these ports to the default Helm values to restore the
prior behavior.

I have also elected to include 443 in this set, as it is generally our
recommendation to avoid proxying HTTPS traffic, since the proxy provides
very little value on these connections today.

Additionally, the memcached port 11211 is skipped by default, as clients
do not issue any sort of preamble that is immediately detectable.

These defaults may change in the future, but seem like good choices for
the 2.9 release.
2020-10-16 11:53:41 -07:00
Alejandro Pedraza 777b06ac55
Expand 'linkerd edges' to work with TCP connections (#5040)
* Expand 'linkerd edges' to work with TCP connections

Fixes #4999

Before:
```
$ bin/linkerd edges po -owide
SRC                                   DST                                    SRC_NS    DST_NS    CLIENT_ID   SERVER_ID   SECURED
linkerd-prometheus-764ddd4f88-t6c2j   rabbitmq-controller-5c6cf7cc6d-8lxp2   linkerd   default                           √
linkerd-prometheus-764ddd4f88-t6c2j   temp                                   linkerd   default                           √

```

After:
```
$ bin/linkerd edges po -owide
SRC                                   DST                                    SRC_NS    DST_NS    CLIENT_ID         SERVER_ID         SECURED
temp                                  rabbitmq-controller-5c6cf7cc6d-5fpsc   default   default   default.default   default.default   √
linkerd-prometheus-66fb97b7fc-vpnxf   rabbitmq-controller-5c6cf7cc6d-5fpsc   linkerd   default                                       √
linkerd-prometheus-66fb97b7fc-vpnxf   temp                                   linkerd   default                                       √
```

With the latest proxy upgrade to v2.113.0 (#5037), the `tcp_open_total` metric now contains the `client_id` label so that we can replace the http-only metric `response_total` with this one to determine edges for TCP-only connections.

This change basically performs the same query as before, but two times, one for `response_total` and another for `tcp_open_total`. For each resulting entry, the latter is kept if `client_id` is present, otherwise the former is used (if present at all). That way things keep on working for older proxies.

Disclaimers:
- This doesn't fix #3706: if two sources connect to the same destination there's no way to tell them appart from the metrics perspective and their edges can get mangled. To fix that, the proxy would have to expose `src_resource` labels in the `tcp_open_total` total inbound metric.
- Note connections coming from prometheus are still unidentified. The reason is those hit the proxy's admin server (instead of the main container) which doesn't expose metrics.
2020-10-12 09:14:39 -05:00
Alejandro Pedraza 11a5d1d427
Fix Heartbeat mem and cpu stats (#5042)
Since k8s 1.16 cadvisor uses the `container` label instead of
`container_name` in the prometheus metrics it exposes.
The heartbeat queries were using the latter, so they were broken
for k8s version since 1.16.

Note that the `p99-handle-us` value is still missing because the
`request_handle_us` metrics is always zero.
2020-10-08 16:31:16 -05:00
Tarun Pothulapati 1e7bb1217d
Update Injection to use new linkerd-config.values (#5036)
This PR Updates the Injection Logic (both CLI and proxy-injector)
to use `Values` struct instead of protobuf Config, part of our move
in removing the protobuf.

This does not touch any of the flags, install related code.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

Co-authored-by: Alex Leong <alex@buoyant.io>
2020-10-07 09:54:34 -07:00
Tarun Pothulapati 5e774aaf05
Remove dependency of linkerd-config for control plane components (#4915)
* Remove dependency of linkerd-config for most control plane components

This PR removes the dependency of `linkerd-config` into control
plane components by making all that information passed through CLI
flags. As most of these components require a couple of flags, passing
them as flags could be more helpful, as updations to the flags trigger a
rollout unlike a configMap update.

This does not update the proxy-injector as it needs a lot more data
and mounting `linkerd-config` is better.
2020-10-06 22:19:18 +05:30
Kevin Leimkuhler 6b7a39c9fa
Set FQN in profile resolutions (#5019)
## Motivation

Closes #5016

Depends on linkerd/linkerd2-proxy-api#44

## Solution

A `profileTranslator` exists for each service and now has a new
`fullyQualifiedName` field.

This field is used to set the `FullyQualifiedName` field of
`DestinationProfile`s each time an update is sent.

In the case that no service profile exists for a service, a default
`DestinationProfile` is created and we can use the field to set the correct
name.

In the case that a service profile does exist for a service, we still use this
field to set the name to keep it consistent.

### Example

Install linkerd on a cluster and run the destination server:

```
go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```

Get the IP of a service. Here, we'll get the ip for `linkerd-identity`:

```
> kubectl get -n linkerd svc/linkerd-identity
NAME               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
linkerd-identity   ClusterIP   10.43.161.68   <none>        8080/TCP   4h25m
```

Get the profile of `linkerd-identity` from service name or IP and note the
`FullyQualifiedName` field:

```
> go run controller/script/destination-client/main.go -method getProfile -path 10.43.161.68:8080
INFO[0000] fully_qualified_name:"linkerd-identity.linkerd.svc.cluster.local" ..
```

```
> go run controller/script/destination-client/main.go -method getProfile -path linkerd-identity.linkerd.svc.cluster.local
INFO[0000] fully_qualified_name:"linkerd-identity.linkerd.svc.cluster.local" ..
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-10-01 11:06:00 -04:00
Tarun Pothulapati d0caaa86c4
Bump k8s client-go to v0.19.2 (#5002)
Fixes #4191 #4993

This bumps Kubernetes client-go to the latest v0.19.2 (We had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5

This consists of the following changes:

- Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD.
- Bump all k8s related dependencies to v0.19.2
- Generate CRD types, client code using the latest k8s.io/code-generator
- Use context.Context as the first argument, in all code paths that touch the k8s client-go interface

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-09-28 12:45:18 -05:00
Alejandro Pedraza b30d35f46a
Reset service-mirror component when target's k8s API is unreachable (#4996)
When the service-mirror component can't reach the target's k8s API, the goroutine blocks and it can't be unblocked.

This was happenining specifically in the case of the multicluster integration test (still to be pushed), where the source and target clusters are created in quick succession and the target's API service doesn't always have time to be exposed before being requested by the service mirror.

The fix consists on no longer have restartClusterWatcher be side-effecting, and instead return an error. If such error is not nil then the link watcher is stopped and reset after 10 seconds.
2020-09-25 11:00:28 -05:00
Zahari Dichev 0b649e3ed7
Remove double slash (#4985)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-09-21 12:15:54 -07:00
Oliver Gould 6d67b84447
profiles: Eliminate default timeout (#4958)
* profiles: Eliminate default timeout
2020-09-10 14:00:18 -07:00
Alejandro Pedraza ccf027c051
Push docker images to ghcr.io instead of gcr.io (#4953)
* Push docker images to ghcr.io instead of gcr.io

The `cloud_integration.yml` and `release.yml` workflows were modified to
log into ghcr.io, and remove the `Configure gcloud` step which is no
longer necessary.

Note that besides the changes to cloud_integration.yml and release.yml, there was a change to the upgrade-stable integration test so that we do linkerd upgrade --addon-overwrite to reset the addons settings because in stable-2.8.1 the Grafana image was pegged to gcr.io/linkerd-io/grafana in linkerd-config-addons. This will need to be mentioned in the 2.9 upgrade notes.

Also the egress integration test has a debug container that now is pegged to the edge-20.9.2 tag.

Besides that, the other changes are just a global search and replace (s/gcr.io\/linkerd-io/ghcr.io\/linkerd/).
2020-09-10 15:16:24 -05:00
Oliver Gould 7ee638bb0c
inject: Configure the proxy to discover profiles for unnamed services (#4960)
The proxy performs endpoint discovery for unnamed services, but not
service profiles.

The destination controller and proxy have been updated to support
lookups for unnamed services in linkerd/linkerd2#4727 and
linkerd/linkerd2-proxy#626, respectively.

This change modifies the injection template so that the
`proxy.destinationGetNetworks` configuration enables profile
discovery for all networks on which endpoint discovery is permitted.
2020-09-10 12:44:00 -07:00
Zahari Dichev 77c88419b8
Make destination and identity services headless (#4923)
* Make destination and identity svcs headless

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-09-02 14:53:38 -05:00
Ali Ariff 5186383c81
Add ARM64 Integration Test (#4897)
* Add ARM64 Integration Test

Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
2020-08-28 10:38:40 -07:00
Alex Leong 9d3cf6ee4d
Move most service-mirror code out of cmd package (#4901)
All of the code for the service mirror controller lives in the `linkerd/linkerd2/controller/cmd` package.  It is typical for control plane components to only have a `main.go` entrypoint in the cmd package.  This can sometimes make it hard to find the service mirror code since I wouldn't expect it to be in the cmd package.

We move the majority of the code to a dedicated controller package, leaving only main.go in the cmd package.  This is purely organizational; no behavior change is expected.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-08-27 14:17:18 -07:00
Matei David 7ed904f31d
Enable endpoint slices when upgrading through CLI (#4864)
## What/How
@adleong  pointed out in #4780 that when enabling slices during an upgrade, the new value does not persist in the `linkerd-config` ConfigMap. I took a closer look and it seems that we were never overwriting the values in case they were different.

* To fix this, I added an if block when validating and building the upgrade options -- if the current flag value differs from what we have in the ConfigMap, then change the ConfigMap value.
* When doing so, I made sure to check that if the cluster does not support `EndpointSlices` yet the flag is set to true, we will error out. This is done similarly (copy&paste similarily) to what's in the install part.
* Additionally, I have noticed that the helm ConfigMap template stored the flag value under `enableEndpointSlices` field name. I assume this was not changed in the initial PR to reflect the changes made in the protocol buffer. The API (and thus the CLI) uses the field name `endpointSliceEnabled` instead. I have changed the config template so that helm installations will use the same field, which can then be used in the destination service or other components that may implement slice support in the future.

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-08-24 14:34:50 -07:00
Kevin Leimkuhler c2301749ef
Always return destination overrides for services (#4890)
## Motivation

#4879

## Solution

When no traffic split exists for services, return a single destination override
with a weight of 100%.

Using the destination client on a new linkerd installation, this results in the
following output for `linkerd-identity` service:

```
❯ go run controller/script/destination-client/main.go -method getProfile -path linkerd-identity.linkerd.svc.cluster.local:8080
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"linkerd-identity.linkerd.svc.cluster.local.:8080" weight:100000} 
INFO[0000]
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-08-19 12:25:58 -07:00
Matei David f797ab1e65
service topologies: topology-aware service routing (#4780)
[Link to RFC](https://github.com/linkerd/rfc/pull/23)

### What
---
* PR that puts together all past pieces of the puzzle to deliver topology-aware service routing, as specified in the [Kubernetes docs](https://kubernetes.io/docs/concepts/services-networking/service-topology/) but with a much better load balancing algorithm and all the coolness of linkerd :) 
* The first piece of this PR is focused on adding topology metadata: topology preference for services and topology `<k,v>` pairs for endpoints.
* The second piece of this PR puts together the new context format and fetching the source node topology metadata in order to allow for endpoints filtering.
* The final part is doing the filtering -- passing all of the metadata to the listener and on every `Add` filtering endpoints based on the topology preference of the service, topology `<k,v>` pairs of endpoints and topology of the source (again `<k,v>` pairs).

### How
---

* **Collecting metadata**:
   -  Services do not have values for topology keys -- the topological keys defined in a service's spec are only there to dictate locality preference for routing; as such, I decided to store them in an array, they will be taken exactly as they are found in the service spec, this ensures we respect the preference order.

   - For EndpointSlices, we are using a map -- an EndpointSlice has locality information in the form of `<k,v>` pair, where the key is a topological key (similar to what's listed in the service) and the value is the locality information -- e.g `hostname: minikube`. For each address we now have a map of topology values which gets populated when we translate the endpoints to an address set. Because normal Endpoints do not have any topology information, we create each address with an empty map which is subsequently populated ONLY for slices in the `endpointSliceToAddressSet` function.

* **Filtering endpoints**:
  - This was a tricky part and filled me with doubts. I think there are a few ways to do this, but this is how I "envisioned" it. First, the `endpoint_translator.go` should be the one to do the filtering; this means that on subscription, we need to feed all of the relevant metadata to the listener. To do this, I created a new function `AddTopologyFilter` as part of the listener interface.

  - To complement the `AddTopologyFilter` function, I created a new `TopologyFilter` struct in `endpoints_watcher.go`. I then embedded this structure in all listeners that implement the interface. The structure holds the source topology (source node), a boolean to tell if slices are activated in case we need to double check (or write tests for the function) and the service preference. We create the filter on Subscription -- we have access to the k8s client here as well as the service, so it's the best point to collect all of this data together. Addresses all have their own topology added to them so they do not have to be collected by the filter.

  - When we add a new set of addresses, we check to see if slices are enabled -- chances are if slices are enabled, service topology might be too. This lets us skip this step if the latest version is not adopted. Prior to sending an `Add` we filter the endpoints -- if the preference is registered by the filter we strictly enforce it, otherwise nothing changes.

And that's pretty much it. 

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-08-18 11:11:09 -07:00
Kevin Leimkuhler 525ad264b0
Print identity in destination client and fix proxy-identity log line (#4873)
## Motivation

These changes came up when testing mock identity. I found it useful for the
destination client to print the identity of endpoints.

```
❯ go run controller/script/destination-client/main.go -method get -path h1.test.example.com:8080
INFO[0000] Add:                                         
INFO[0000] labels: map[concrete:h1.test.example.com:8080] 
INFO[0000] - 127.0.0.1:4143                             
INFO[0000]   - labels: map[addr:127.0.0.1:4143 h2:false] 
INFO[0000]   - protocol hint: UNKNOWN                   
INFO[0000]   - identity: dns_like_identity:{name:"foo.ns1.serviceaccount.identity.linkerd.cluster.local"} 
INFO[0000]
```

I also fixed a log line in the proxy-identity where used the wrong value for the
CSR path

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-08-13 13:49:55 -07:00
Josh Soref 72aadb540f
Spelling (#4872)
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).

The misspellings have been reported at aaf440489e (commitcomment-41423663)

The action reports that the changes in this PR would make it happy: 5b82c6c5ca

Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-08-12 21:59:50 -07:00
Alejandro Pedraza 4876a94ed0
Update proxy-init version to v1.3.6 (#4850)
Supersedes #4846

Bump proxy-init to v1.3.6, containing CNI fixes and support for
multi-arch builds.
#4846 included this in v1.3.5 but proxy.golang.org refused to update the
modified SHA
2020-08-11 11:54:00 -05:00
Alex Leong 024a35a3d3
Move multicluster API connectivity checks earlier (#4819)
Fixes #4774

When a service mirror controller is unable to connect to the target cluster's API, the service mirror controller crashes with the error that it has failed to sync caches.  This error lacks the necessary detail to debug the situation.  Unfortunately, client-go does not surface more useful information about why the caches failed to sync.

To make this more debuggable we do a couple things:

1. When creating the target cluster api client, we eagerly issue a server version check to test the connection.  If the connection fails, the service-mirror-controller logs now look like this:

```
time="2020-07-30T23:53:31Z" level=info msg="Got updated link broken: {Name:broken Namespace:linkerd-multicluster TargetClusterName:broken TargetClusterDomain:cluster.local TargetClusterLinkerdNamespace:linkerd ClusterCredentialsSecret:cluster-credentials-broken GatewayAddress:35.230.81.215 GatewayPort:4143 GatewayIdentity:linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local ProbeSpec:ProbeSpec: {path: /health, port: 4181, period: 3s} Selector:{MatchLabels:map[] MatchExpressions:[{Key:mirror.linkerd.io/exported Operator:Exists Values:[]}]}}"
time="2020-07-30T23:54:01Z" level=error msg="Unable to create cluster watcher: cannot connect to api for target cluster remote: Get \"https://36.199.152.138/version?timeout=32s\": dial tcp 36.199.152.138:443: i/o timeout"
```

This error also no longer causes the service mirror controller to crash.  Updating the Link resource will cause the service mirror controller to reload the credentials and try again.

2. We rearrange the checks in `linkerd check --multicluster` to perform the target API connectivity checks before the service mirror controller checks.  This means that we can validate the target cluster API connection even if the service mirror controller is not healthy.  We also add a server version check here to quickly determine if the connection is healthy.  Sample check output:

```
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
	* broken
W0730 16:52:05.620806   36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc
× remote cluster access credentials are valid
            * failed to connect to API for cluster: [broken]: Get "https://36.199.152.138/version?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    see https://linkerd.io/checks/#l5d-smc-target-clusters-access for hints

W0730 16:52:35.645499   36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc
× clusters share trust anchors
    Problematic clusters:
    * broken: unable to fetch anchors: Get "https://36.199.152.138/api/v1/namespaces/linkerd/configmaps/linkerd-config?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    see https://linkerd.io/checks/#l5d-multicluster-clusters-share-anchors for hints
√ service mirror controller has required permissions
	* broken
√ service mirror controllers are running
	* broken
× all gateway mirrors are healthy
        wrong number of (0) gateway metrics entries for probe-gateway-broken.linkerd-multicluster
    see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints
√ all mirror services have endpoints
‼ all mirror services are part of a Link
        mirror service voting-svc-gke.emojivoto is not part of any Link
    see https://linkerd.io/checks/#l5d-multicluster-orphaned-services for hints
```

Some logs from the underlying go network libraries sneak into the output which is kinda gross but I don't think it interferes too much with being able to understand what's going on.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-08-05 11:48:23 -07:00
Ali Ariff 61d7dedd98
Build ARM docker images (#4794)
Build ARM docker images in the release workflow.

# Changes:
- Add a new env key `DOCKER_MULTIARCH` and `DOCKER_PUSH`. When set, it will build multi-arch images and push them to the registry. See https://github.com/docker/buildx/issues/59 for why it must be pushed to the registry.
- Usage of `crazy-max/ghaction-docker-buildx ` is necessary as it already configured with the ability to perform cross-compilation (using QEMU) so we can just use it, instead of manually set up it.
- Usage of `buildx` now make default global arguments. (See: https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope)

# Follow-up:
- Releasing the CLI binary file in ARM architecture. The docker images resulting from these changes already build in the ARM arch. Still, we need to make another adjustment like how to retrieve those binaries and to name it correctly as part of Github Release artifacts.

Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
2020-08-05 11:14:01 -07:00
Alex Leong 381f237f69
Add multicluster unlink command (#4802)
Fixes #4707 

In order to remove a multicluster link, we add a `linkerd multicluster unlink` command which produces the yaml necessary to delete all of the resources associated with a `linkerd multicluster link`.  These are:
* the link resource
* the service mirror controller deployment
* the service mirror controller's RBAC
* the probe gateway mirror for this link
* all mirror services for this link

This command follows the same pattern as the `linkerd uninstall` command in that its output is expected to be piped to `kubectl delete`.  The typical usage of this command is:

```
linkerd --context=source multicluster unlink --cluster-name=foo | kubectl --context=source delete -f -
```

This change also fixes the shutdown lifecycle of the service mirror controller by properly having it listen for the shutdown signal and exit its main loop.

A few alternative designs were considered:

I investigated using owner references as suggested [here](https://github.com/linkerd/linkerd2/issues/4707#issuecomment-653494591) but it turns out that owner references must refer to resources in the same namespace (or to cluster scoped resources).  This was not feasible here because a service mirror controller can create mirror services in many different namespaces.

I also considered having the service mirror controller delete the mirror services that it created during its own shutdown.  However, this could lead to scenarios where the controller is killed before it finishes deleting the services that it created.  It seemed more reliable to have all the deletions happen from `kubectl delete`.  Since this is the case, we avoid having the service mirror controller delete mirror services, even when the link is deleted, to avoid the race condition where the controller and CLI both attempt to delete the same mirror services and one of them fails with a potentially alarming error message.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-08-04 16:21:59 -07:00
cpretzer 670caaf8ff
Update to proxy-init v1.3.4 (#4815)
Signed-off-by: Charles Pretzer <charles@buoyant.io>
2020-07-30 15:58:58 -05:00
Alex Leong a1543b33e3
Add support for service-mirror selectors (#4795)
* Add selector support

Signed-off-by: Alex Leong <alex@buoyant.io>

* Removed unused labels

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-07-30 10:07:14 -07:00
Matt Miller fc33b9b9aa
support overriding inbound and outbound connect timeouts. (#4759)
* support overriding inbound and outbound connect timeouts.
* add validation on user provided TCP connect timeouts
* convert valid time values into ms

Signed-off-by: Matt Miller <mamiller@rosettastone.com>
2020-07-27 13:56:21 -07:00
Tharun Rajendran e24c323bf9
Gateway Metrics in Dashboard (#4717)
* Introduce multicluster gateway api handler in web api server
* Added MetricsUtil for Gateway metrics
* Added gateway api helper
* Added Gateway Component

Updated metricsTable component to support gateway metrics
Added handler for gateway

Fixes #4601

Signed-off-by: Tharun <rajendrantharun@live.com>
2020-07-27 12:43:54 -07:00
Matei David 1c197b14e7
Change destination context token format (#4771)
Add a new structure on the destination controller side to keep track of contextual information.
The token format has been changed from ns:<namespace> to a JSON format so that more variables can be
encdoed in the token. As part of this PR, a new field 'nodeName' has been added to help with service
topologies.

Fixes #4498

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-27 09:49:48 -07:00
Alex Leong d540e16c8b
Make service mirror controller per target cluster (#4710)
This PR removes the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in https://github.com/linkerd/rfc/pull/31.  For fuller context, please see that RFC.

Basic multicluster functionality works here including:
* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveliness, and probe latences.
* The `linkerd check` multicluster checks have been updated for the new architecture.  Several checks have been rendered obsolete by the new architecture and have been removed.

The following are known issues requiring further work:
* the service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror.  it does not yet support configuring a label selector.
* an unlink command is needed for removing multicluster links: see https://github.com/linkerd/linkerd2/issues/4707
* an mc uninstall command is needed for uninstalling the multicluster addon: see https://github.com/linkerd/linkerd2/issues/4708

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-07-23 14:32:50 -07:00
Alejandro Pedraza 5e789ba152
Migrate CI to docker buildx and other improvements (#4765)
* Migrate CI to docker buildx and other improvements

## Motivation
- Improve build times in forks. Specially when rerunning builds because of some flaky test.
- Start using `docker buildx` to pave the way for multiplatform builds.

## Performance improvements
These timings were taken for the `kind_integration.yml` workflow when we merged and rerun the lodash bump PR (#4762)

Before these improvements:
- when merging: `24:18`
- when rerunning after merge (docker cache warm): `19:00`
- when running the same changes in a fork (no docker cache): `32:15`

After these improvements:
- when merging: `25:38`
- when rerunning after merge (docker cache warm): `19:25`
- when running the same changes in a fork (docker cache warm): `19:25`

As explained below, non-forks and forks now use the same cache, so the important take is that forks will always start with a warm cache and we'll no longer see long build times like the `32:15` above.
The downside is a slight increase in the build times for non-forks (up to a little more than a minute, depending on the case).

## Build containers in parallel
The `docker_build` job in the `kind_integration.yml`, `cloud_integration.yml` and `release.yml` workflows relied on running `bin/docker-build` which builds all the containers in sequence. Now each container is built in parallel using a matrix strategy.

## New caching strategy
CI now uses `docker buildx` for building the container images, which allows using an external cache source for builds, a location in the filesystem in this case. That location gets cached using actions/cache, using the key `{{ runner.os }}-buildx-${{ matrix.target }}-${{ env.TAG }}` and the restore key `${{ runner.os }}-buildx-${{ matrix.target }}-`.

For example when building the `web` container, its image and all the intermediary layers get cached under the key `Linux-buildx-web-git-abc0123`. When that has been cached in the `main` branch, that cache will be available to all the child branches, including forks. If a new branch in a fork asks for a key like `Linux-buildx-web-git-def456`, the key won't be found during the first CI run, but the system falls back to the key `Linux-buildx-web-git-abc0123` from `main` and so the build will start with a warm cache (more info about how keys are matched in the [actions/cache docs](https://docs.github.com/en/actions/configuring-and-managing-workflows/caching-dependencies-to-speed-up-workflows#matching-a-cache-key)).

## Packet host no longer needed
To benefit from the warm caches both in non-forks and forks like just explained, we're required to ditch doing the builds in Packet and now everything runs in the github runners VMs.
As a result there's no longer separate logic for non-forks and forks in the workflow files; `kind_integration.yml` was greatly simplified but `cloud_integration.yml` and `release.yml` got a little bigger in order to use the actions artifacts as a repository for the images built. This bloat will be fixed when support for [composite actions](https://github.com/actions/runner/blob/users/ethanchewy/compositeADR/docs/adrs/0549-composite-run-steps.md) lands in github.

## Local builds
You still are able to run `bin/docker-build` or any of the `docker-build.*` scripts. And to make use of buildx, run those same scripts after having set the env var `DOCKER_BUILDKIT=1`. Using buildx supposes you have installed it, as instructed [here](https://github.com/docker/buildx).

## Other
- A new script `bin/docker-cache-prune` is used to remove unused images from the cache. Without that the cache grows constantly and we can rapidly hit the 5GB limit (when the limit is attained the oldest entries get evicted).
- The `go-deps` dockerfile base image was changed from `golang:1.14.2` (ubuntu based) to `golang-1:14.2-alpine` also to conserve cache space.

# Addressed separately in #4875:

Got rid of the `go-deps` image and instead added something similar on top of all the Dockerfiles dealing with `go`, as a first stage for those Dockerfiles. That continues to serve as a way to pre-populate go's build cache, which speeds up the builds in the subsequent stages. That build should in theory be rebuilt automatically only when `go.mod` or `go.sum` change, and now we don't require running `bin/update-go-deps-shas`. That script was removed along with all the logic elsewhere that used it, including the `go_dependencies` job in the `static_checks.yml` github workflow.

The list of modules preinstalled was moved from `Dockerfile-go-deps` to a new script `bin/install-deps`. I couldn't find a way to generate that list dynamically, so whenever a slow-to-compile dependency is found, we have to make sure it's included in that list.

Although this simplifies the dev workflow, note that the real motivation behind this was a limitation in buildx's `docker-container` driver that forbids us from depending on images that haven't been pushed to a registry, so we have to resort to building the dependencies as a first stage in the Dockerfiles.
2020-07-22 14:27:45 -05:00
Wei Lun 85a042c151
add fish shell completion (#4751)
fixes #4208

Signed-off-by: Wei Lun <weilun_95@hotmail.com>
2020-07-20 15:46:30 -07:00
Matei David 146c593cd5
Uncomment EndpointSliceAccess function (#4760)
* Small PR that uncomments the `EndpointSliceAcess` method and cleans up left over todos in the destination service.
* Based on the past three PRs related to `EndpointSlices` (#4663 #4696 #4740); they should now be functional (albeit prone to bugs) and ready to use.

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-20 14:50:43 -07:00
Tarun Pothulapati b7e9507174
Remove/Relax prometheus related checks (#4724)
* Removes/Relaxes prometheus related checks

Now that prometheus is an add-on, There can be cases where prometheus is
disabled at which the check should show a warning but not fail. This
decouples the tight depedency.

This changes the following checks:

- Removes serviceAccount and pod checks in the CLI.
- Relaxes `linkerd-api` checks to only check for prometheus access when
the URL is not empty. This should work seamlessly with external
prometheus as that URL will be passed and it performs the same
check.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-07-20 14:24:00 -07:00
Matei David 8b85716eb8
Introduce install flag for EndpointSlices (#4740)
EndpointSlices have been made opt-in due to their experimental nature. This PR
introduces a new install flag 'enableEndpointSlices' that will allow adopters to
specify in their cli install or helm install step whether they would like to
use endpointslices as a resource in the destination service, instead of the
endpoints k8s resource.

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-15 09:53:04 -07:00
Kevin Leimkuhler f49b40c4a9
Add support for profile lookups by IP address (#4727)
## Motivation

Closes #3916

This adds the ability to get profiles for services by IP address.

### Change in behavior

When the destination server receives a `GetProfile` request with an IP address,
it now tries to map that IP address to a service.

If the IP address maps to an existing service, then the destination server
returns the profile stream subscribes for updates to the _service_--this is the
existing behavior. If the IP changes to a new service, the stream will still
send updates for the first service the IP address corresponded to since that is
what it is subscribed to.

If the IP address does not map to an existing service, then the destination
server returns the profile stream but does not subscribe for updates. The stream
will receive one update, the default profile.

### Solution

This change uses the `IPWatcher` within the destination server to check for what
services an IP address correspond to. By adding a new method `GetSvc` to
`IPWatcher`, the server now calls this method if `GetProfile` receives a request
with an IP address.

## Testing

Install linkerd on a cluster and get the cluster IP of any service:

```bash
❯ kubectl get -n linkerd svc/linkerd-tap -o wide
NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE   SELECTOR
linkerd-tap   ClusterIP   10.104.57.90   <none>        8088/TCP,443/TCP   16h   linkerd.io/control-plane-component=tap
```

Run the destination server:

```bash
❯ go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```

Get the profile for the tap service by IP address:

```bash
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.104.57.90:8088
INFO[0000] retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}} 
INFO[0000]
```

Get the profile for an IP address that does not correspond to a service:

```bash
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.256.0.1:8088
INFO[0000] retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}} 
INFO[0000]
```

You can add and remove settings for the service profile for tap and get updates.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-07-10 14:41:15 -07:00
cpretzer d3553c59fd
Add volume and volumeMount for buster-based proxy-init (#4692)
* Add volume and volumeMount for buster-based proxy-init

Signed-off-by: Charles Pretzer <charles@buoyant.io>
2020-07-09 09:55:07 -07:00
Zahari Dichev 73010149ce
Do not treat evicted pods as failed in healthchecks (#4732)
When a k8s pod is evicted its Phase is set to Failed and the reason is set to Evicted. Because in the ListPods method of the public APi we only transmit the phase and treat it as Status, the healthchecks assume such evicted data plane pods to be failed. Since this check is retryable, the results is that linkerd check --proxy appears to hang when there are evicted pods. As @adleong correctly pointed out here, the presence of evicted pod is not something that we should make the checks fail.

This change modifies the publci api to set the Pod.Status to "Evicted" for evicted pods. The healtcheks are also modified to not treat evicted pods as error cases.

Fix #4690

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-07-09 14:22:27 +03:00
Alejandro Pedraza e4273522b8
Delete unused files (#4729)
Removed `controller/proxy-injector/webhook_ops.go` and `controller/sp-validator/webhook_ops.go` that we used when we first introduced webhooks to dynamically create their configs, but we ended up doing that upfront at install time.
2020-07-08 06:41:44 -05:00
Suraj Deshmukh d7dbe9cbff
Fix spelling mistakes using codespell (#4700)
Using following command the wrong spelling were found and later on
fixed:

```
codespell --skip CHANGES.md,.git,go.sum,\
    controller/cmd/service-mirror/events_formatting.go,\
    controller/cmd/service-mirror/cluster_watcher_test_util.go,\
    SECURITY_AUDIT.pdf,.gcp.json.enc,web/app/img/favicon.png \
    --ignore-words-list=aks,uint,ans,files\' --check-filenames \
    --check-hidden
```

Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
2020-07-07 17:07:22 -05:00
Matei David 9d8d89cce8
Add EndpointSlice logic to EndpointsWatcher (#4501) (#4663)
Introduce support for the EndpointSlice k8s resource (k8s v1.16+) in the destination service.
Through this PR, in the EndpointsWatcher, there will be a dedicated informer for EndpointSlice;
the informer cannot run at the same time as the Endpoints resource informer. The main difference
is that EndpointSlices have a one-to-many relationship with a service, they provide better performance benefits,
dual-stack addresses and more. EndpointSlice support also implies service topology and other k8s related features.

Validated and tested manually, as well as with dedicated unit tests.

Closes #4501

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-07 13:20:40 -07:00
Matei David a2bd230cd6
service topologies: add Kubernetes/API EndpointSlice support (#4696)
Based on the [EndpointSlice PR](https://github.com/linkerd/linkerd2/pull/4663), this is just the k8s/api support for endpointslices to shorten the first PR.

* Adds CRD
* Adds functions that check whether the cluster has EndpointSlice access
* Adds discovery & endpointslice informers to api.

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-06 15:28:48 -07:00
Naseem 361d35bb6a
feat: add log format annotation and helm value (#4620)
* feat: add log format annotation and helm value

Json log formatting has been added via https://github.com/linkerd/linkerd2-proxy/pull/500
but wiring the option through as an annotation/helm value is still
necessary.

This PR adds the annotation and helm value to configure log format.

Closes #2491

Signed-off-by: Naseem <naseem@transit.app>
2020-07-02 10:08:52 -05:00
Arthur Silva Sens 021048d576
GoDocs for completion, dashboard and diagnostics cli commands (#4518)
Signed-off-by: arthursens <arthursens2005@gmail.com>
2020-06-30 05:53:50 -05:00
Alejandro Pedraza aea541d6f9
Upgrade generated protobuf files to v1.4.2 (#4673)
Regenerated protobuf files, using version 1.4.2 that was upgraded from
1.3.2 with the proxy-api update in #4614.

As of v1.4 protobuf messages are disallowed to be copied (because they
hold a mutex), so whenever a message is passed to or returned from a
function we need to use a pointer.

This affects _mostly_ test files.

This is required to unblock #4620 which is adding a field to the config
protobuf.
2020-06-26 09:36:48 -05:00
Oliver Gould c4d649e25d
Update proxy-api version to v0.1.13 (#4614)
This update includes no API changes, but updates grpc-go
to the latest release.
2020-06-24 12:52:59 -07:00
Lutz Behnke 846d2f11d4
Add support for Helm configuration of per-component proxy resources requests and limits (#4226)
Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>
2020-06-24 12:54:27 -05:00
Zahari Dichev 7f3d872930
Add destination-get-networks option (#4608)
In #4585 we are observing an issue where a loop is encountered when using nginx ingress. The problem is that the outbound proxy does a dst lookup on the IP address which happens to be the very same address the ingress is listening on.

In order to avoid situations like that this PR introduces a way to modify the set of networks for which the proxy shall do IP based discovery. The change introduces a helm flag `.Values.global.proxy.destinationGetNetworks` that can be used to modify this value. There are two ways a user can affect the this setting: 


- setting the `destinationGetNetworks` field in values during a Helm install, which changes the default on all injected pods
- using an annotation ` config.linkerd.io/proxy-destination-get-networks` for injected workloads to override this value

Note that this setting cannot be tweaked through the `install` or `inject` command

Fix: #4585

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-18 20:07:47 +03:00
Alex Leong 755538b84a
Resolve gateway hostnames into IP addresses (#4588)
Fixes #4582 

When a target cluster gateway is exposed as a hostname rather than with a fixed IP address, the service mirror controller fails to create mirror services and gateway mirrors for that gateway.  This is because we only look at the IP field of the gateway service.

We make two changes to address this problem:
 
First, when extracting the gateway spec from a gateway that has a hostname instead of an IP address, we do a DNS lookup to resolve that hostname into an IP address to use in the mirror service endpoints and gateway mirror endpoints.

Second, we schedule a repair job on a regular (1 minute) to update these endpoint objects.  This has the effect of re-resolving the DNS names every minute to pick up any changes in DNS resolution.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-15 10:33:49 -07:00
Zahari Dichev f01bcfe722
Tweak service-mirror log levels (#4562)
This PR just modifies the log levels on the probe and cluster watchers
to emit in INFO what they would emit in DEBUG. I think it makes sense
as we need that information to track problems. The only difference is
that when probing gateways we only log if the probe attempt was
unsuccessful.

Fix #4546
2020-06-05 13:12:36 -07:00
Zahari Dichev 3365455e45
Fix mc labels (#4560)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-05 19:36:09 +03:00