Commit Graph

578 Commits

Author SHA1 Message Date
Kevin Leimkuhler 525ad264b0
Print identity in destination client and fix proxy-identity log line (#4873)
## Motivation

These changes came up when testing mock identity. I found it useful for the
destination client to print the identity of endpoints.

```
❯ go run controller/script/destination-client/main.go -method get -path h1.test.example.com:8080
INFO[0000] Add:                                         
INFO[0000] labels: map[concrete:h1.test.example.com:8080] 
INFO[0000] - 127.0.0.1:4143                             
INFO[0000]   - labels: map[addr:127.0.0.1:4143 h2:false] 
INFO[0000]   - protocol hint: UNKNOWN                   
INFO[0000]   - identity: dns_like_identity:{name:"foo.ns1.serviceaccount.identity.linkerd.cluster.local"} 
INFO[0000]
```

I also fixed a log line in the proxy-identity where used the wrong value for the
CSR path

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-08-13 13:49:55 -07:00
Josh Soref 72aadb540f
Spelling (#4872)
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).

The misspellings have been reported at aaf440489e (commitcomment-41423663)

The action reports that the changes in this PR would make it happy: 5b82c6c5ca

Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-08-12 21:59:50 -07:00
Alejandro Pedraza 4876a94ed0
Update proxy-init version to v1.3.6 (#4850)
Supersedes #4846

Bump proxy-init to v1.3.6, containing CNI fixes and support for
multi-arch builds.
#4846 included this in v1.3.5 but proxy.golang.org refused to update the
modified SHA
2020-08-11 11:54:00 -05:00
Alex Leong 024a35a3d3
Move multicluster API connectivity checks earlier (#4819)
Fixes #4774

When a service mirror controller is unable to connect to the target cluster's API, the service mirror controller crashes with the error that it has failed to sync caches.  This error lacks the necessary detail to debug the situation.  Unfortunately, client-go does not surface more useful information about why the caches failed to sync.

To make this more debuggable we do a couple things:

1. When creating the target cluster api client, we eagerly issue a server version check to test the connection.  If the connection fails, the service-mirror-controller logs now look like this:

```
time="2020-07-30T23:53:31Z" level=info msg="Got updated link broken: {Name:broken Namespace:linkerd-multicluster TargetClusterName:broken TargetClusterDomain:cluster.local TargetClusterLinkerdNamespace:linkerd ClusterCredentialsSecret:cluster-credentials-broken GatewayAddress:35.230.81.215 GatewayPort:4143 GatewayIdentity:linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local ProbeSpec:ProbeSpec: {path: /health, port: 4181, period: 3s} Selector:{MatchLabels:map[] MatchExpressions:[{Key:mirror.linkerd.io/exported Operator:Exists Values:[]}]}}"
time="2020-07-30T23:54:01Z" level=error msg="Unable to create cluster watcher: cannot connect to api for target cluster remote: Get \"https://36.199.152.138/version?timeout=32s\": dial tcp 36.199.152.138:443: i/o timeout"
```

This error also no longer causes the service mirror controller to crash.  Updating the Link resource will cause the service mirror controller to reload the credentials and try again.

2. We rearrange the checks in `linkerd check --multicluster` to perform the target API connectivity checks before the service mirror controller checks.  This means that we can validate the target cluster API connection even if the service mirror controller is not healthy.  We also add a server version check here to quickly determine if the connection is healthy.  Sample check output:

```
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
	* broken
W0730 16:52:05.620806   36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc
× remote cluster access credentials are valid
            * failed to connect to API for cluster: [broken]: Get "https://36.199.152.138/version?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    see https://linkerd.io/checks/#l5d-smc-target-clusters-access for hints

W0730 16:52:35.645499   36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc
× clusters share trust anchors
    Problematic clusters:
    * broken: unable to fetch anchors: Get "https://36.199.152.138/api/v1/namespaces/linkerd/configmaps/linkerd-config?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    see https://linkerd.io/checks/#l5d-multicluster-clusters-share-anchors for hints
√ service mirror controller has required permissions
	* broken
√ service mirror controllers are running
	* broken
× all gateway mirrors are healthy
        wrong number of (0) gateway metrics entries for probe-gateway-broken.linkerd-multicluster
    see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints
√ all mirror services have endpoints
‼ all mirror services are part of a Link
        mirror service voting-svc-gke.emojivoto is not part of any Link
    see https://linkerd.io/checks/#l5d-multicluster-orphaned-services for hints
```

Some logs from the underlying go network libraries sneak into the output which is kinda gross but I don't think it interferes too much with being able to understand what's going on.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-08-05 11:48:23 -07:00
Ali Ariff 61d7dedd98
Build ARM docker images (#4794)
Build ARM docker images in the release workflow.

# Changes:
- Add a new env key `DOCKER_MULTIARCH` and `DOCKER_PUSH`. When set, it will build multi-arch images and push them to the registry. See https://github.com/docker/buildx/issues/59 for why it must be pushed to the registry.
- Usage of `crazy-max/ghaction-docker-buildx ` is necessary as it already configured with the ability to perform cross-compilation (using QEMU) so we can just use it, instead of manually set up it.
- Usage of `buildx` now make default global arguments. (See: https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope)

# Follow-up:
- Releasing the CLI binary file in ARM architecture. The docker images resulting from these changes already build in the ARM arch. Still, we need to make another adjustment like how to retrieve those binaries and to name it correctly as part of Github Release artifacts.

Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
2020-08-05 11:14:01 -07:00
Alex Leong 381f237f69
Add multicluster unlink command (#4802)
Fixes #4707 

In order to remove a multicluster link, we add a `linkerd multicluster unlink` command which produces the yaml necessary to delete all of the resources associated with a `linkerd multicluster link`.  These are:
* the link resource
* the service mirror controller deployment
* the service mirror controller's RBAC
* the probe gateway mirror for this link
* all mirror services for this link

This command follows the same pattern as the `linkerd uninstall` command in that its output is expected to be piped to `kubectl delete`.  The typical usage of this command is:

```
linkerd --context=source multicluster unlink --cluster-name=foo | kubectl --context=source delete -f -
```

This change also fixes the shutdown lifecycle of the service mirror controller by properly having it listen for the shutdown signal and exit its main loop.

A few alternative designs were considered:

I investigated using owner references as suggested [here](https://github.com/linkerd/linkerd2/issues/4707#issuecomment-653494591) but it turns out that owner references must refer to resources in the same namespace (or to cluster scoped resources).  This was not feasible here because a service mirror controller can create mirror services in many different namespaces.

I also considered having the service mirror controller delete the mirror services that it created during its own shutdown.  However, this could lead to scenarios where the controller is killed before it finishes deleting the services that it created.  It seemed more reliable to have all the deletions happen from `kubectl delete`.  Since this is the case, we avoid having the service mirror controller delete mirror services, even when the link is deleted, to avoid the race condition where the controller and CLI both attempt to delete the same mirror services and one of them fails with a potentially alarming error message.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-08-04 16:21:59 -07:00
cpretzer 670caaf8ff
Update to proxy-init v1.3.4 (#4815)
Signed-off-by: Charles Pretzer <charles@buoyant.io>
2020-07-30 15:58:58 -05:00
Alex Leong a1543b33e3
Add support for service-mirror selectors (#4795)
* Add selector support

Signed-off-by: Alex Leong <alex@buoyant.io>

* Removed unused labels

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-07-30 10:07:14 -07:00
Matt Miller fc33b9b9aa
support overriding inbound and outbound connect timeouts. (#4759)
* support overriding inbound and outbound connect timeouts.
* add validation on user provided TCP connect timeouts
* convert valid time values into ms

Signed-off-by: Matt Miller <mamiller@rosettastone.com>
2020-07-27 13:56:21 -07:00
Tharun Rajendran e24c323bf9
Gateway Metrics in Dashboard (#4717)
* Introduce multicluster gateway api handler in web api server
* Added MetricsUtil for Gateway metrics
* Added gateway api helper
* Added Gateway Component

Updated metricsTable component to support gateway metrics
Added handler for gateway

Fixes #4601

Signed-off-by: Tharun <rajendrantharun@live.com>
2020-07-27 12:43:54 -07:00
Matei David 1c197b14e7
Change destination context token format (#4771)
Add a new structure on the destination controller side to keep track of contextual information.
The token format has been changed from ns:<namespace> to a JSON format so that more variables can be
encdoed in the token. As part of this PR, a new field 'nodeName' has been added to help with service
topologies.

Fixes #4498

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-27 09:49:48 -07:00
Alex Leong d540e16c8b
Make service mirror controller per target cluster (#4710)
This PR removes the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in https://github.com/linkerd/rfc/pull/31.  For fuller context, please see that RFC.

Basic multicluster functionality works here including:
* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveliness, and probe latences.
* The `linkerd check` multicluster checks have been updated for the new architecture.  Several checks have been rendered obsolete by the new architecture and have been removed.

The following are known issues requiring further work:
* the service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror.  it does not yet support configuring a label selector.
* an unlink command is needed for removing multicluster links: see https://github.com/linkerd/linkerd2/issues/4707
* an mc uninstall command is needed for uninstalling the multicluster addon: see https://github.com/linkerd/linkerd2/issues/4708

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-07-23 14:32:50 -07:00
Alejandro Pedraza 5e789ba152
Migrate CI to docker buildx and other improvements (#4765)
* Migrate CI to docker buildx and other improvements

## Motivation
- Improve build times in forks. Specially when rerunning builds because of some flaky test.
- Start using `docker buildx` to pave the way for multiplatform builds.

## Performance improvements
These timings were taken for the `kind_integration.yml` workflow when we merged and rerun the lodash bump PR (#4762)

Before these improvements:
- when merging: `24:18`
- when rerunning after merge (docker cache warm): `19:00`
- when running the same changes in a fork (no docker cache): `32:15`

After these improvements:
- when merging: `25:38`
- when rerunning after merge (docker cache warm): `19:25`
- when running the same changes in a fork (docker cache warm): `19:25`

As explained below, non-forks and forks now use the same cache, so the important take is that forks will always start with a warm cache and we'll no longer see long build times like the `32:15` above.
The downside is a slight increase in the build times for non-forks (up to a little more than a minute, depending on the case).

## Build containers in parallel
The `docker_build` job in the `kind_integration.yml`, `cloud_integration.yml` and `release.yml` workflows relied on running `bin/docker-build` which builds all the containers in sequence. Now each container is built in parallel using a matrix strategy.

## New caching strategy
CI now uses `docker buildx` for building the container images, which allows using an external cache source for builds, a location in the filesystem in this case. That location gets cached using actions/cache, using the key `{{ runner.os }}-buildx-${{ matrix.target }}-${{ env.TAG }}` and the restore key `${{ runner.os }}-buildx-${{ matrix.target }}-`.

For example when building the `web` container, its image and all the intermediary layers get cached under the key `Linux-buildx-web-git-abc0123`. When that has been cached in the `main` branch, that cache will be available to all the child branches, including forks. If a new branch in a fork asks for a key like `Linux-buildx-web-git-def456`, the key won't be found during the first CI run, but the system falls back to the key `Linux-buildx-web-git-abc0123` from `main` and so the build will start with a warm cache (more info about how keys are matched in the [actions/cache docs](https://docs.github.com/en/actions/configuring-and-managing-workflows/caching-dependencies-to-speed-up-workflows#matching-a-cache-key)).

## Packet host no longer needed
To benefit from the warm caches both in non-forks and forks like just explained, we're required to ditch doing the builds in Packet and now everything runs in the github runners VMs.
As a result there's no longer separate logic for non-forks and forks in the workflow files; `kind_integration.yml` was greatly simplified but `cloud_integration.yml` and `release.yml` got a little bigger in order to use the actions artifacts as a repository for the images built. This bloat will be fixed when support for [composite actions](https://github.com/actions/runner/blob/users/ethanchewy/compositeADR/docs/adrs/0549-composite-run-steps.md) lands in github.

## Local builds
You still are able to run `bin/docker-build` or any of the `docker-build.*` scripts. And to make use of buildx, run those same scripts after having set the env var `DOCKER_BUILDKIT=1`. Using buildx supposes you have installed it, as instructed [here](https://github.com/docker/buildx).

## Other
- A new script `bin/docker-cache-prune` is used to remove unused images from the cache. Without that the cache grows constantly and we can rapidly hit the 5GB limit (when the limit is attained the oldest entries get evicted).
- The `go-deps` dockerfile base image was changed from `golang:1.14.2` (ubuntu based) to `golang-1:14.2-alpine` also to conserve cache space.

# Addressed separately in #4875:

Got rid of the `go-deps` image and instead added something similar on top of all the Dockerfiles dealing with `go`, as a first stage for those Dockerfiles. That continues to serve as a way to pre-populate go's build cache, which speeds up the builds in the subsequent stages. That build should in theory be rebuilt automatically only when `go.mod` or `go.sum` change, and now we don't require running `bin/update-go-deps-shas`. That script was removed along with all the logic elsewhere that used it, including the `go_dependencies` job in the `static_checks.yml` github workflow.

The list of modules preinstalled was moved from `Dockerfile-go-deps` to a new script `bin/install-deps`. I couldn't find a way to generate that list dynamically, so whenever a slow-to-compile dependency is found, we have to make sure it's included in that list.

Although this simplifies the dev workflow, note that the real motivation behind this was a limitation in buildx's `docker-container` driver that forbids us from depending on images that haven't been pushed to a registry, so we have to resort to building the dependencies as a first stage in the Dockerfiles.
2020-07-22 14:27:45 -05:00
Wei Lun 85a042c151
add fish shell completion (#4751)
fixes #4208

Signed-off-by: Wei Lun <weilun_95@hotmail.com>
2020-07-20 15:46:30 -07:00
Matei David 146c593cd5
Uncomment EndpointSliceAccess function (#4760)
* Small PR that uncomments the `EndpointSliceAcess` method and cleans up left over todos in the destination service.
* Based on the past three PRs related to `EndpointSlices` (#4663 #4696 #4740); they should now be functional (albeit prone to bugs) and ready to use.

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-20 14:50:43 -07:00
Tarun Pothulapati b7e9507174
Remove/Relax prometheus related checks (#4724)
* Removes/Relaxes prometheus related checks

Now that prometheus is an add-on, There can be cases where prometheus is
disabled at which the check should show a warning but not fail. This
decouples the tight depedency.

This changes the following checks:

- Removes serviceAccount and pod checks in the CLI.
- Relaxes `linkerd-api` checks to only check for prometheus access when
the URL is not empty. This should work seamlessly with external
prometheus as that URL will be passed and it performs the same
check.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-07-20 14:24:00 -07:00
Matei David 8b85716eb8
Introduce install flag for EndpointSlices (#4740)
EndpointSlices have been made opt-in due to their experimental nature. This PR
introduces a new install flag 'enableEndpointSlices' that will allow adopters to
specify in their cli install or helm install step whether they would like to
use endpointslices as a resource in the destination service, instead of the
endpoints k8s resource.

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-15 09:53:04 -07:00
Kevin Leimkuhler f49b40c4a9
Add support for profile lookups by IP address (#4727)
## Motivation

Closes #3916

This adds the ability to get profiles for services by IP address.

### Change in behavior

When the destination server receives a `GetProfile` request with an IP address,
it now tries to map that IP address to a service.

If the IP address maps to an existing service, then the destination server
returns the profile stream subscribes for updates to the _service_--this is the
existing behavior. If the IP changes to a new service, the stream will still
send updates for the first service the IP address corresponded to since that is
what it is subscribed to.

If the IP address does not map to an existing service, then the destination
server returns the profile stream but does not subscribe for updates. The stream
will receive one update, the default profile.

### Solution

This change uses the `IPWatcher` within the destination server to check for what
services an IP address correspond to. By adding a new method `GetSvc` to
`IPWatcher`, the server now calls this method if `GetProfile` receives a request
with an IP address.

## Testing

Install linkerd on a cluster and get the cluster IP of any service:

```bash
❯ kubectl get -n linkerd svc/linkerd-tap -o wide
NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE   SELECTOR
linkerd-tap   ClusterIP   10.104.57.90   <none>        8088/TCP,443/TCP   16h   linkerd.io/control-plane-component=tap
```

Run the destination server:

```bash
❯ go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```

Get the profile for the tap service by IP address:

```bash
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.104.57.90:8088
INFO[0000] retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}} 
INFO[0000]
```

Get the profile for an IP address that does not correspond to a service:

```bash
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.256.0.1:8088
INFO[0000] retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}} 
INFO[0000]
```

You can add and remove settings for the service profile for tap and get updates.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-07-10 14:41:15 -07:00
cpretzer d3553c59fd
Add volume and volumeMount for buster-based proxy-init (#4692)
* Add volume and volumeMount for buster-based proxy-init

Signed-off-by: Charles Pretzer <charles@buoyant.io>
2020-07-09 09:55:07 -07:00
Zahari Dichev 73010149ce
Do not treat evicted pods as failed in healthchecks (#4732)
When a k8s pod is evicted its Phase is set to Failed and the reason is set to Evicted. Because in the ListPods method of the public APi we only transmit the phase and treat it as Status, the healthchecks assume such evicted data plane pods to be failed. Since this check is retryable, the results is that linkerd check --proxy appears to hang when there are evicted pods. As @adleong correctly pointed out here, the presence of evicted pod is not something that we should make the checks fail.

This change modifies the publci api to set the Pod.Status to "Evicted" for evicted pods. The healtcheks are also modified to not treat evicted pods as error cases.

Fix #4690

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-07-09 14:22:27 +03:00
Alejandro Pedraza e4273522b8
Delete unused files (#4729)
Removed `controller/proxy-injector/webhook_ops.go` and `controller/sp-validator/webhook_ops.go` that we used when we first introduced webhooks to dynamically create their configs, but we ended up doing that upfront at install time.
2020-07-08 06:41:44 -05:00
Suraj Deshmukh d7dbe9cbff
Fix spelling mistakes using codespell (#4700)
Using following command the wrong spelling were found and later on
fixed:

```
codespell --skip CHANGES.md,.git,go.sum,\
    controller/cmd/service-mirror/events_formatting.go,\
    controller/cmd/service-mirror/cluster_watcher_test_util.go,\
    SECURITY_AUDIT.pdf,.gcp.json.enc,web/app/img/favicon.png \
    --ignore-words-list=aks,uint,ans,files\' --check-filenames \
    --check-hidden
```

Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
2020-07-07 17:07:22 -05:00
Matei David 9d8d89cce8
Add EndpointSlice logic to EndpointsWatcher (#4501) (#4663)
Introduce support for the EndpointSlice k8s resource (k8s v1.16+) in the destination service.
Through this PR, in the EndpointsWatcher, there will be a dedicated informer for EndpointSlice;
the informer cannot run at the same time as the Endpoints resource informer. The main difference
is that EndpointSlices have a one-to-many relationship with a service, they provide better performance benefits,
dual-stack addresses and more. EndpointSlice support also implies service topology and other k8s related features.

Validated and tested manually, as well as with dedicated unit tests.

Closes #4501

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-07 13:20:40 -07:00
Matei David a2bd230cd6
service topologies: add Kubernetes/API EndpointSlice support (#4696)
Based on the [EndpointSlice PR](https://github.com/linkerd/linkerd2/pull/4663), this is just the k8s/api support for endpointslices to shorten the first PR.

* Adds CRD
* Adds functions that check whether the cluster has EndpointSlice access
* Adds discovery & endpointslice informers to api.

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-07-06 15:28:48 -07:00
Naseem 361d35bb6a
feat: add log format annotation and helm value (#4620)
* feat: add log format annotation and helm value

Json log formatting has been added via https://github.com/linkerd/linkerd2-proxy/pull/500
but wiring the option through as an annotation/helm value is still
necessary.

This PR adds the annotation and helm value to configure log format.

Closes #2491

Signed-off-by: Naseem <naseem@transit.app>
2020-07-02 10:08:52 -05:00
Arthur Silva Sens 021048d576
GoDocs for completion, dashboard and diagnostics cli commands (#4518)
Signed-off-by: arthursens <arthursens2005@gmail.com>
2020-06-30 05:53:50 -05:00
Alejandro Pedraza aea541d6f9
Upgrade generated protobuf files to v1.4.2 (#4673)
Regenerated protobuf files, using version 1.4.2 that was upgraded from
1.3.2 with the proxy-api update in #4614.

As of v1.4 protobuf messages are disallowed to be copied (because they
hold a mutex), so whenever a message is passed to or returned from a
function we need to use a pointer.

This affects _mostly_ test files.

This is required to unblock #4620 which is adding a field to the config
protobuf.
2020-06-26 09:36:48 -05:00
Oliver Gould c4d649e25d
Update proxy-api version to v0.1.13 (#4614)
This update includes no API changes, but updates grpc-go
to the latest release.
2020-06-24 12:52:59 -07:00
Lutz Behnke 846d2f11d4
Add support for Helm configuration of per-component proxy resources requests and limits (#4226)
Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>
2020-06-24 12:54:27 -05:00
Zahari Dichev 7f3d872930
Add destination-get-networks option (#4608)
In #4585 we are observing an issue where a loop is encountered when using nginx ingress. The problem is that the outbound proxy does a dst lookup on the IP address which happens to be the very same address the ingress is listening on.

In order to avoid situations like that this PR introduces a way to modify the set of networks for which the proxy shall do IP based discovery. The change introduces a helm flag `.Values.global.proxy.destinationGetNetworks` that can be used to modify this value. There are two ways a user can affect the this setting: 


- setting the `destinationGetNetworks` field in values during a Helm install, which changes the default on all injected pods
- using an annotation ` config.linkerd.io/proxy-destination-get-networks` for injected workloads to override this value

Note that this setting cannot be tweaked through the `install` or `inject` command

Fix: #4585

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-18 20:07:47 +03:00
Alex Leong 755538b84a
Resolve gateway hostnames into IP addresses (#4588)
Fixes #4582 

When a target cluster gateway is exposed as a hostname rather than with a fixed IP address, the service mirror controller fails to create mirror services and gateway mirrors for that gateway.  This is because we only look at the IP field of the gateway service.

We make two changes to address this problem:
 
First, when extracting the gateway spec from a gateway that has a hostname instead of an IP address, we do a DNS lookup to resolve that hostname into an IP address to use in the mirror service endpoints and gateway mirror endpoints.

Second, we schedule a repair job on a regular (1 minute) to update these endpoint objects.  This has the effect of re-resolving the DNS names every minute to pick up any changes in DNS resolution.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-15 10:33:49 -07:00
Zahari Dichev f01bcfe722
Tweak service-mirror log levels (#4562)
This PR just modifies the log levels on the probe and cluster watchers
to emit in INFO what they would emit in DEBUG. I think it makes sense
as we need that information to track problems. The only difference is
that when probing gateways we only log if the probe attempt was
unsuccessful.

Fix #4546
2020-06-05 13:12:36 -07:00
Zahari Dichev 3365455e45
Fix mc labels (#4560)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-05 19:36:09 +03:00
Zahari Dichev b6b95455aa
Fix load balancer missing ip race condition (#4554)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-05 19:35:47 +03:00
Alex Leong cffa07ddba
Update gateway identity on gateway mirror endpoints (#4559)
When the identity annotation on a gateway service is updated, this change is not propagated to the mirror gateway endpoints object.

This is because the annotations are updated on the wrong object and the changes are lost.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-05 09:21:35 -07:00
Alex Leong 0f84ff61db
Update gateway mirror ports (#4551)
* Update gateway mirror spec when remote gateway changes

Signed-off-by: Alex Leong <alex@buoyant.io>

* Only update ports

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-04 17:25:46 +03:00
Kevin Leimkuhler 8a932ac905
Change text to use source/target terminology in events and metrics (#4527)
Change terminology from local/remote to source/target in events and metrics.

This does not change any variable, function, struct, or field names since
testing is still improving

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-06-03 15:02:39 -04:00
Oliver Gould 7cc5e5c646
multicluster: Use the proxy as an HTTP gateway (#4528)
This change modifies the linkerd-gateway component to use the inbound
proxy, rather than nginx, for gateway. This allows us to detect loops and
propagate identity through the gateway.

This change also cleans up port naming to `mc-gateway` and `mc-probe`
to resolve conflicts with Kubernetes validation.

---

* proxy: v2.99.0

The proxy can now operate as gateway, routing requests from its inbound
proxy to the outbound proxy, without passing the requests to a local
application. This supports Linkerd's multicluster feature by adding a
`Forwarded` header to propagate the original client identity and assist
in loop detection.

---

* Add loop detection to inbound & TCP forwarding (linkerd/linkerd2-proxy#527)
* Test loop detection (linkerd/linkerd2-proxy#532)
* fallback: Unwrap errors recursively (linkerd/linkerd2-proxy#534)
* app: Split inbound/outbound constructors into components (linkerd/linkerd2-proxy#533)
* Introduce a gateway between inbound and outbound (linkerd/linkerd2-proxy#540)
* gateway: Add a Forwarded header (linkerd/linkerd2-proxy#544)
* gateway: Return errors instead of responses (linkerd/linkerd2-proxy#547)
* Fail requests that loop through the gateway (linkerd/linkerd2-proxy#545)

* inject: Support config.linkerd.io/enable-gateway

This change introduces a new annotation,
config.linkerd.io/enable-gateway, that, when set, enables the proxy to
act as a gateway, routing all traffic targetting the inbound listener
through the outbound proxy.

This also removes the nginx default listener and gateway port of 4180,
instead using 4143 (the inbound port).

* proxy: v2.100.0

This change modifies the inbound gateway caching so that requests may be
routed to multiple leaves of a traffic split.

---

* inbound: Do not cache gateway services (linkerd/linkerd2-proxy#549)
2020-06-02 19:37:14 -07:00
Kevin Leimkuhler d7f84e6c7b
Change help text to use source/target terminology in service-mirror and healthchecks (#4524)
Change terminology from local/remote to source/target in service-mirror and
healthchecks help text.

This does not change any variable, function, struct, or field names since
testing is still improving

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-06-02 15:21:52 -04:00
Alex Leong 91a067c924
Rename gateway ports (#4526)
* Rename gateway ports

Signed-off-by: Alex Leong <alex@buoyant.io>

* fmt

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-06-02 09:08:23 +03:00
Kevin Leimkuhler b4804a0bb5
Format fix (#4525)
Fixes CI failures

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-06-01 18:51:00 -04:00
Zahari Dichev 6c3922a7f1
Probe manager simplification (#4510)
There are a few notable things happening in this PR: 

- the probe manager has been decoupled from the cluster_watcher. Now its only responsibility is to watch for mirrored gateways beeing created and to probe them. This means that probes are initiated for all gateways no matter whether there are mirrored services being paired
- the number of paired services is derived from the existing services in the cluster rather than being published as a metric by the prober
- there are no events being exchanged between the cluster watcher and the probe manager

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-06-01 14:41:29 -07:00
Zahari Dichev f7f70690fb
Fix resync bug + service selection annotations (#4453)
THis PR addresses two problems: 

- when a resync happens (or the mirror controller is restarted) we incorrectly classify the remote gateway as a mirrored service that is not mirrored anymore and we delete it
- when updating services due to a gateway update, we need to select only the services for the particular cluster

The latter fixes #4451
2020-05-21 14:15:13 -07:00
Alex Leong acacf2e023
Add --close-wait-timeout inject flag (#4409)
Depends on https://github.com/linkerd/linkerd2-proxy-init/pull/10

Fixes #4276 

We add a `--close-wait-timeout` inject flag which configures the proxy-init container to run with `privileged: true` and to set `nf_conntrack_tcp_timeout_close_wait`. 

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-05-21 14:14:14 -07:00
Alex Leong 9cd4557644
Properly show the meshed count for non-selector services (#4446)
When viewing the output of `linkerd stat` for services which do not have a selector (such as services created by the service-mirror, for example) the meshed count column shows the total number which exist, even though the service actually selects no pods at all.

We update the StatSummary implementation to account for services which have no selector.

Additionally, we update the logic of the `--unmeshed` flag.  When the `--unmeshed` flag is not set, we typically skip rows for unmeshed resources because those resources would have no stats.  This is not appropriate to do when the `--from` flag is also set because in this case, metrics are not collected on the target resource but are instead collected on the client-side.  This means that stats can be present, even for unmeshed resources and these resources should still be displayed, even if the `--unmeshed` flag is not set.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-05-20 10:08:27 -07:00
Zahari Dichev 31e33d18d3
Enable service mirroring to work in private networks (#4440)
This change creates a gateway proxy for every gateway. This enables the probe worker to leverage the destination service functionality in order to discover the identity of the gateway.

Fix #4411

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-20 19:48:36 +03:00
Zahari Dichev 6574f124a7
Restrict Service mirror RBACs (#4426)
This PR introduces a few changes that were requested after a bit of service mirror reviewing.

- we restrict the RBACs so the service mirror controller cannot read secrets in all namespaces but only in the one that it is installed in
- we unify the namespace namings so all multicluster resources are installedi n `linkerd-multicluster` on both clusters
- fixed checks to account for changes

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-20 17:08:01 +03:00
Zahari Dichev 4176580a0f
Threadsafe buffering listener (#4359)
* Add thread safety to watcher tests

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-14 20:45:41 +03:00
Zahari Dichev 115bab9868
Fix gateway update problems (#4388)
* Fix gateway update problems

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-14 10:59:30 -05:00
Zahari Dichev ef1a2c2b10
Multicluster dashboard for traffic metrics (#4178)
This change adds labels to endpoints that target remote services. It also adds a Grafana dashboard that can be used to monitor multicluster traffic.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-14 17:48:27 +03:00
Zahari Dichev fd59ce532d
Add better logging to service mirror controller (#4361)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-11 10:30:16 +03:00
Zahari Dichev edd9b654a7
Make gateway require TLS for incoming requests (#4339)
Make gateway require TLS for incoming requests

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-11 10:07:48 +03:00
Alex Leong 8fbaa3ef9b
Don't send NoEndpoints during pod updates for ip watches (#4338)
When the proxy has an IP watch on a pod and the destination controller gets a pod update event, the destination controller sends a NoEndpoints message to all listeners followed by an Add with the new pod state.  This can result in the proxy's load balancer being briefly empty and could result in failing requests in the period.  

Since consecutive Add events with the same address will override each other, we can simply send the Adds without needing to clear the previous state with a NoEndpoints message.
2020-05-07 16:10:17 -07:00
Zahari Dichev 4e82ba8878
Multicluster checks (#4279)
Multicluster checks

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-05 10:19:38 +03:00
Zahari Dichev cd04b94bb9
Probe manager events emission tests (#4312)
Probe manager events emission tests

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-05-05 08:57:05 +03:00
Alex Leong 40b921508f
Inject LINKERD2_PROXY_DESTINATION_GET_NETWORKS proxy variable (#4300)
Fixes #3807

By setting the LINKERD2_PROXY_DESTINATION_GET_NETWORKS environment variable, we configure the Linkerd proxy to do destination lookups for authorities which are IP addresses in the private network range.  This allows us to get destination metadata including identity for HTTP requests which target an IP address in the cluster, Prometheus metrics scrape requests, for example.

This change allowed us to update the "direct edges" test which ensures that the edges command produces correct output for traffic which is addressed directly to a pod IP.

We also re-enabled the "linkerd stat" integration tests which had been disabled while the destination service did not yet support these types of IP queries.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-30 11:22:24 -07:00
Zahari Dichev 5149152ef3
Multicluster gateway and remote setup command (#4265)
Add multicluster gateway and setup command

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-04-29 20:33:23 +03:00
Zahari Dichev 17dacf5548
Add gateways command, allowing the retrieval of gateway stats (#4241)
Add gateways command, allowing the retrieval of gateway stats

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-04-27 13:55:01 +03:00
Zahari Dichev 09262ebd72
Add liveliness checks and metrics for multicluster gateway (#4233)
Add liveliness checks for gateway

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-04-27 13:06:58 +03:00
Tarun Pothulapati 2b1cbc6fc1
charts: Using downwardAPI to mount labels to the proxy container (#4199)
* use downward API to mount labels to the proxy container as a volume
* add namespace as a label to the pod
* add a trace inject test
* add downwardAPi for controlplaneTracing
* add controlPlaneTracing condition to volumeMounts
* update add-ons to have workload-ns
* add workload-ns label to control-plane components

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-04-22 10:33:51 -05:00
Alex Leong 9bf54d36ed
Upgrade to go 1.14.2 (#4278)
Upgrade Linkerd's base docker image to use go 1.14.2 in order to stay modern.

The only code change required was to update a test which was checking the error message of a `crypto/x509.CertificateInvalidError`.  The error message of this error changed between go versions.  We update the test to not check for the specific error string so that this test passes regardless of go version.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-20 17:14:51 -07:00
Alex Leong 5d3862c120
Use /live for liveness probe (#4270)
Fixes #3984

We use the new `/live` admin endpoint in the Linkerd proxy for liveness probes instead of the `/metrics` endpoint.  This endpoint returns a much smaller payload.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-17 14:53:32 -07:00
Kevin Leimkuhler b6aad75b35
Add `operationID` field to tap openapi response (#4245)
This fixes an issue users are experiencing when upgrading from from Linkerd
2.6 to 2.7 and use the [kubernetes-external-secrets]() project.

The change introduced by #3700 resulted in the tap service showing up in the
`/openapi/v2` API response. I confirmed this with a local build.

A dependency within the project expects the `operationID` field to be present
in the swagger definition. It is optional as stated in the
[spec](https://swagger.io/docs/specification/paths-and-operations/). It's
purpose is to identify an operation and should be unique.

This change adds that field to tap service swagger spec. While this can be
fixed in the KES dependency, it certainly does not hurt to add and other
libraries may similarly expect this field.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-04-15 09:41:06 -07:00
Zahari Dichev 26c14d3c66
Detect changes in addresses when getting updates in endpoints watcher (#4104)
Detect changes in addresses when getting updates in endpoints watcher

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-04-10 11:42:39 +03:00
Alex Leong d8eebee4f7
Upgrade to client-go 0.17.4 and smi-sdk-go 0.3.0 (#4221)
Here we upgrade our dependencies on client-go to 0.17.4 and smi-sdk-go to 0.3.0.  Since smi-sdk-go uses client-go 0.17.4, these upgrades must be performed simultaneously.

This also requires simultaneously upgrading our dependency on linkerd/stern to a SHA which also uses client-go 0.17.4.  This keeps all of our transitive dependencies synchronized on one version of client-go.

This ALSO requires updating our codegen scripts to use the 0.17.4 version of code-generator and running it to generate 0.17.4 compatible generated code.  I took this opportunity to update our code generation script to properly use the version of code-generater from `go.mod` rather than a hardcoded SHA.

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-04-01 10:07:23 -07:00
Zahari Dichev 10ecd8889e
Set auth override (#4160)
Set AuthOverride when present on endpoints annotation

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-03-25 10:56:36 +02:00
Mayank Shah 963b9b049a
Add kubectl-style label selectors (#4120)
* Update tap, routes and top commands to support label selectors

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2020-03-20 10:45:06 -05:00
Alejandro Pedraza 8f79e07ee2
Bump proxy-init to v1.3.2 (#4170)
* Bump proxy-init to v1.3.2

Bumped `proxy-init` version to v1.3.2, fixing an issue with `go.mod`
(linkerd/linkerd2-proxy-init#9).
This is a non-user-facing fix.
2020-03-17 14:49:25 -05:00
Kevin Leimkuhler 10db65bcb3
Update linkerd/stern to fix go.mod parsing (#4173)
## Motivation

I noticed the Go language server stopped working in VS Code and narrowed it
down to `go build ./...` failing with the following:

```
❯ go build ./...
go: github.com/linkerd/stern@v0.0.0-20190907020106-201e8ccdff9c: parsing go.mod: go.mod:3: usage: go 1.23
```

This change updates `linkerd/stern` version with changes made in
linkerd/stern#3 to fix this issue.

This does not depend on #4170, but it is also needed in order to completely
fix `go build ./...`
2020-03-17 11:16:18 -07:00
Zahari Dichev 2db307ee91
Remove target port requirement in port resolution (#4174)
This change removes the target port requirement when resolving ports in the dst service. Based on the comments, it seems that we need to have a target port defined in the port spec in order to resolve to the port in the Endpoints. In reality if target port is note defined when creating the service, k8s will set the port and the target port to the same value. Seems to me that checking for the targetPort to be different than 0, is a no-op.

Signed-off-by: Zahari Dichev zaharidichev@gmail.com
2020-03-16 23:04:08 +02:00
Zahari Dichev caf4e61daf
Enable identitiy on endpoints not associated with pods (#4134)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-03-09 20:55:57 +02:00
Zahari Dichev 72fc94b03c
Service mirroring tests (#4115)
Unit tests that exercise most of the code in cluster_watcher.go. Essentially the whole cluster mirroring machinary can be tought of as a function that takes remote cluster state, local cluster state, and modification events and as a result it either modifies local cluster state or issues new events onto the queue. This is what these tests are trying to model. I think this covers a lot of the logic there. Any suggestions for other edge cases are welcome.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-03-04 20:17:21 +02:00
Zahari Dichev edd7fd203d
Service Mirroring Component (#4028)
This PR introduces a service mirroring component that is responsible for watching remote clusters and mirroring their services locally.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-03-02 21:16:08 +02:00
Christy Jacob 8111e54606
Check for extension server certificate (#4062)
* Check Extension api server Authentication
* Added Checks and tests for extension api-server authentication
* Fixed Failing Static Checks
* Updated the golden file

Signed-off-by: Christy Jacob <christyjacob4@gmail.com>
2020-02-28 13:39:02 -08:00
Mayank Shah 3c3a4a5f5d
cli: Add label selector flag for `stat` (#4040)
* Update `linkerd-namespace` shorthand to `L`
* Add --selector (-l) flag for `stat`

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2020-02-17 13:40:07 -05:00
Zahari Dichev 6fa9407318
Ensure we get the correct type out of Informer Deletion events (#4034)
Ensure we get what we expect when receiving DELETE events from the k8s Informer api

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-02-15 10:15:24 +02:00
Alex Leong ec51434eb9
Show traffic split metrics from sources in all namespaces (#3967)
Fixes #3562 

When a pod in one namespace sends traffic to a service which is the apex of a traffic split in another namespace, that traffic is not displayed in the `linkerd stat trafficsplit` output.  This is because when we do a Prometheus query for traffic to the traffic split, we supply a Prometheus label selector to only select traffic sources in the namespace of the traffic split.

Since any pod in any namespace can send traffic to the apex service of a traffic split, we must look at all possible sources of traffic, not just the ones in the same namespace.

Before:

```
$ bin/linkerd stat ts
NAME           APEX     LEAF       WEIGHT   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
webapp-split   webapp   webapp       900m         -     -             -             -             -
webapp-split   webapp   webapp-2     100m         -     -             -             -             -
```

After:

```
$ bin/linkerd stat ts
NAME           APEX     LEAF       WEIGHT   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
webapp-split   webapp   webapp       900m    80.00%   1.4rps          31ms          99ms        2530ms
webapp-split   webapp   webapp-2     100m    60.00%   0.2rps          35ms          93ms          99ms
```

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-02-12 09:21:59 -08:00
Alejandro Pedraza 3ba66f6f9d
Fix flakey TestGetProfiles (#3965)
Fixes #3332

Fixes the very rare test failure
```
--- FAIL: TestGetProfiles (0.33s)
    --- FAIL: TestGetProfiles/Returns_server_profile (0.11s)
            server_test.go:228: Expected 1 or 2 updates but got 3:
            [retry_budget:<retry_ratio:0.2 min_retries_per_second:10
            ttl:<seconds:10 > >  routes:<condition:<path:<regex:"/a/b/c"
            > > metrics_labels:<key:"route" value:"route1" >
            timeout:<seconds:10 > > retry_budget:<retry_ratio:0.2
            min_retries_per_second:10 ttl:<seconds:10 > >
            routes:<condition:<path:<regex:"/a/b/c" > >
            metrics_labels:<key:"route" value:"route1" >
            timeout:<seconds:10 > > retry_budget:<retry_ratio:0.2
            min_retries_per_second:10 ttl:<seconds:10 > > ]
            FAIL
            FAIL  github.com/linkerd/linkerd2/controller/api/destination
            0.624s
```
that occurs when a third unexpected stream update occurs, when the fake
API takes more time to notify its listeners about the resources created.

For all the nasty details check #3332
2020-02-07 19:43:29 -05:00
Dax McDonald 76d3285247
Use correct go module file syntax (#4021)
The correct syntax for the go module file is
go MAJOR.MINOR

Signed-off-by: Dax McDonald <dax@rancher.com>
2020-02-07 07:58:54 -08:00
Alejandro Pedraza afb93cddc8
Use `t.Name()` instead of `t.Name` in tests (#3970)
Use `t.Name()` instead of `t.Name` when retrieving the name of tests.
This was causing an error to be added in the log:
```
output: logrus_error="can not add field \"test\"
```

Followup to
[comment](https://github.com/linkerd/linkerd2/pull/3965#discussion_r370387990)
2020-01-27 09:17:19 -05:00
Kevin Leimkuhler 53baecb382
Changes for edge-20.1.3 (#3966)
## edge-20.1.3

* CLI
  * Introduced `linkerd check --pre --linkerd-cni-enabled`, used when the CNI
    plugin is used, to check it has been properly installed before proceeding
    with the control plane installation
  * Added support for the `--as-group` flag so that users can impersonate
    groups for Kubernetes operations (thanks @mayankshah160!)
* Controller
  * Fixed an issue where an override of the Docker registry was not being
    applied to debug containers (thanks @javaducky!)
  * Added check for the Subject Alternate Name attributes to the API server
    when access restrictions have been enabled (thanks @javaducky!)
  * Added support for arbitrary pod labels so that users can leverage the
    Linkerd provided Prometheus instance to scrape for their own labels
    (thanks @daxmc99!)
  * Fixed an issue with CNI config parsing

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2020-01-23 16:55:21 -08:00
Zahari Dichev a9d38189fb Fix CNI config parsing (#3953)
This PR addreses the problem introduced after #3766.

Fixes #3941 

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2020-01-23 09:55:04 -08:00
Mayank Shah 60ac0d5527 Add `as-group` CLI flag (#3952)
Add CLI flag --as-group that can impersonate group for k8s operations

Signed-off-by: Mayank Shah mayankshah1614@gmail.com
2020-01-22 16:38:31 +02:00
Paul Balogh b5e39bcbf7 Utilize Common Name or Subject Alternate Name for access checks (#3459) (#3949)
Subject
Utilize Common Name or Subject Alternate Name for access checks (#3459)

Problem
When access restrictions to API server have been enabled with the requestheader-allowed-names configuration, only the Common Name of the requestor certificate is being checked. This check should include the use of Subject Alternate Name attributes.

Solution
API server will now check the SAN attributes (DNS Names, Email Addresses, IP Addresses, and URIs) when determining accessibility for allowed names.

Fixes issue #3459

Signed-off-by: Paul Balogh <javaducky@gmail.com>
2020-01-22 08:58:19 +02:00
Paul Balogh dabee12b93 Fix issue for debug containers when using custom Docker registry (#3873)
**Subject**
Fixes bug where override of Docker registry was not being applied to debug containers (#3851)

**Problem**
Overrides for Docker registry are not being applied to debug containers and provide no means to correct the image.

**Solution**
This update expands the `data.proxy` configuration section within the Linkerd `ConfigMap` to maintain the overridden image name for debug containers at _install_-time similar to handling of the `proxy` and `proxyInit` images.

This change also enables the further override option of the registry for debug containers at _inject_-time given utilization of the `--registry` CLI option.

**Validation**
Several new unit tests have been created to confirm functionality.  In addition, the following workflows were run through:

### Standard Workflow with Custom Registry
This workflow installs Linkerd control plane based upon a custom registry, then injecting the debug sidecar into a service.

* Start with a k8s instance having no Linkerd installation
* Build all images locally using `bin/docker-build`
* Create custom tags (using same version) for generated images, e.g. `docker tag gcr.io/linkerd-io/debug:git-a4ebecb6 javaducky.com/linkerd-io/debug:git-a4ebecb6`
* Install Linkerd with registry override `bin/linkerd install --registry=javaducky.com/linkerd-io | kubectl apply -f -`
* Once Linkerd has been fully initialized, you should be able to confirm that the `linkerd-config` ConfigMap now contains the debug image name, pull policy, and version within the `data.proxy` section
* Request injection of the debug image into an available container.  I used the Emojivoto voting service as described in https://linkerd.io/2/tasks/using-the-debug-container/ as `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar - | kubectl apply -f -`
* Once the deployment creates a new pod for the service, inspection should show that the container now includes the "linkerd-debug" container name based on the applicable override image seen previously within the ConfigMap
* Debugging can also be verified by viewing debug container logs as `kubectl -n emojivoto logs deploy/voting linkerd-debug -f`
* Modifying the `config.linkerd.io/enable-debug-sidecar` annotation, setting to “false”, should show that the pod will be recreated no longer running the debug container.

### Overriding the Custom Registry Override at Injection
This builds upon the “Standard Workflow with Custom Registry” by overriding the Docker registry utilized for the debug container at the time of injection.

* “Clean” the Emojivoto voting service by removing any Linkerd annotations from the deployment
* Request injection similar to before, except provide the `--registry` option as in `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar --registry=gcr.io/linkerd-io - | kubectl apply -f -`
* Inspection of the deployment config should now show the override annotation for `config.linkerd.io/debug-image` having the debug container from the new registry.  Viewing the running pod should show that the `linkerd-debug` container was injected and running the correct image.  Of note, the proxy and proxy-init images are still running the “original” override images.
* As before, modifying the `config.linkerd.io/enable-debug-sidecar` annotation setting to “false”, should show that the pod will be recreated no longer running the debug container.

### Standard Workflow with Default Registry
This workflow is the typical workflow which utilizes the standard Linkerd image registry.

* Uninstall the Linkerd control plane using `bin/linkerd install --ignore-cluster | kubectl delete -f -` as described at https://linkerd.io/2/tasks/uninstall/
* Clean the Emojivoto environment using `curl -sL https://run.linkerd.io/emojivoto.yml | kubectl delete -f -` then reinstall using `curl -sL https://run.linkerd.io/emojivoto.yml | kubectl apply -f -`
* Perform standard Linkerd installation as `bin/linkerd install | kubectl apply -f -`
* Once Linkerd has been fully initialized, you should be able to confirm that the `linkerd-config` ConfigMap references the default debug image of `gcr.io/linkerd-io/debug` within the `data.proxy` section
* Request injection of the debug image into an available container as `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar - | kubectl apply -f -`
* Debugging can also be verified by viewing debug container logs as `kubectl -n emojivoto logs deploy/voting linkerd-debug -f`
* Modifying the `config.linkerd.io/enable-debug-sidecar` annotation, setting to “false”, should show that the pod will be recreated no longer running the debug container.

### Overriding the Default Registry at Injection
This workflow builds upon the “Standard Workflow with Default Registry” by overriding the Docker registry utilized for the debug container at the time of injection.

* “Clean” the Emojivoto voting service by removing any Linkerd annotations from the deployment
* Request injection similar to before, except provide the `--registry` option as in `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar --registry=javaducky.com/linkerd-io - | kubectl apply -f -`
* Inspection of the deployment config should now show the override annotation for `config.linkerd.io/debug-image` having the debug container from the new registry.  Viewing the running pod should show that the `linkerd-debug` container was injected and running the correct image.  Of note, the proxy and proxy-init images are still running the “original” override images.
* As before, modifying the `config.linkerd.io/enable-debug-sidecar` annotation setting to “false”, should show that the pod will be recreated no longer running the debug container.

Fixes issue #3851 

Signed-off-by: Paul Balogh javaducky@gmail.com
2020-01-17 10:18:03 -08:00
Mayank Shah b94e03a8a6 Remove empty fields from generated configs (#3886)
Fixes
- https://github.com/linkerd/linkerd2/issues/2962
- https://github.com/linkerd/linkerd2/issues/2545

### Problem
Field omissions for workload objects are not respected while marshaling to JSON.

### Solution
After digging a bit into the code, I came to realize that while marshaling, workload objects have empty structs as values for various fields which would rather be omitted. As of now, the standard library`encoding/json` does not support zero values of structs with the `omitemty` tag. The relevant issue can be found [here](https://github.com/golang/go/issues/11939). To tackle this problem, the object declaration should have _pointer-to-struct_ as a field type instead of _struct_ itself. However, this approach would be out of scope as the workload object declaration is handled by the k8s library.

I was able to find a drop-in replacement for the `encoding/json` library which supports zero value of structs with the `omitempty` tag. It can be found [here](https://github.com/clarketm/json). I have made use of this library to implement a simple filter like functionality to remove empty tags once a YAML with empty tags is generated, hence leaving the previously existing methods unaffected

Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
2020-01-13 10:02:24 -08:00
Alex Leong 93a81dce97
Change default proxy log level to "warn,linkerd=info" (#3908)
Fixes #3901 

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-01-09 14:22:06 -08:00
Paul Balogh 2cd2ecfa30 Enable mixed configuration of skip-[inbound|outbound]-ports (#3766)
* Enable mixed configuration of skip-[inbound|outbound]-ports using port numbers and ranges (#3752)
* included tests for generated output given proxy-ignore configuration options
* renamed "validate" method to "parseAndValidate" given mutation
* updated documentation to denote inclusiveness of ranges
* Updates for expansion of ignored inbound and outbound port ranges to be handled by the proxy-init rather than CLI (#3766)

This change maintains the configured ports and ranges as strings rather than unsigned integers, while still providing validation at the command layer.

* Bump versions for proxy-init to v1.3.0

Signed-off-by: Paul Balogh <javaducky@gmail.com>
2019-12-20 09:32:13 -05:00
Alex Leong 03762cc526
Support pod ip and service cluster ip lookups in the destination service (#3595)
Fixes #3444 
Fixes #3443 

## Background and Behavior

This change adds support for the destination service to resolve Get requests which contain a service clusterIP or pod ip as the `Path` parameter.  It returns the stream of endpoints, just as if `Get` had been called with the service's authority.  This lays the groundwork for allowing the proxy to TLS TCP connections by allowing the proxy to do destination lookups for the SO_ORIG_DST of tcp connections.  When that ip address corresponds to a service cluster ip or pod ip, the destination service will return the endpoints stream, including the pod metadata required to establish identity.

Prior to this change, attempting to look up an ip address in the destination service would result in a `InvalidArgument` error.

Updating the `GetProfile` method to support ip address lookups is out of scope and attempts to look up an ip address with the `GetProfile` method will result in `InvalidArgument`.

## Implementation

We do this by creating a `IPWatcher` which wraps the `EndpointsWatcher` and supports lookups by ip.   `IPWatcher` maintains a mapping up clusterIPs to service ids and translates subscriptions to an IP address into a subscription to the service id using the underlying `EndpointsWatcher`.

Since the service name is no longer always infer-able directly from the input parameters, we restructure `EndpointTranslator` and `PodSet` so that we propagate the service name from the endpoints API response.

## Testing

This can be tested by running the destination service locally, using the current kube context to connect to a Kubernetes cluster:

```
go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```

Then lookups can be issued using the destination client:

```
go run controller/script/destination-client/main.go -path 192.168.54.78:80 -method get -addr localhost:8086
```

Service cluster ips and pod ips can be used as the `path` argument.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-12-19 09:25:12 -08:00
Sergio C. Arteaga a1141fc507 Cache StatSummary responses in dashboard web server (#3769)
Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com>
2019-12-17 09:15:00 -05:00
Dax McDonald 3088f404ce Upgrade prometheus to v1.2.1 (#3541)
Signed-off-by: Dax McDonald <dax@rancher.com>
2019-12-11 15:26:16 -08:00
Sergio C. Arteaga cee8e3d0ae Add CronJobs and ReplicaSets to dashboard and CLI (#3687)
This PR adds support for CronJobs and ReplicaSets to `linkerd inject`, the web
dashboard and CLI. It adds a new Grafana dashboard for each kind of resource. 

Closes #3614 
Closes #3630 
Closes #3584 
Closes #3585

Signed-off-by: Sergio Castaño Arteaga tegioz@icloud.com
Signed-off-by: Cintia Sanchez Garcia cynthiasg@icloud.com
2019-12-11 10:02:37 -08:00
Zahari Dichev e5f75a8c3d
Add validation to ensure stat time window is at least 15s (#3720)
* Add stat time window minimum of 10s

Signed-off-by: zaharidichev <zaharidichev@gmail.com>

* Address comments

Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-12-04 08:12:01 +02:00
Alejandro Pedraza cf9fa0a8c9
Removed calico logutils dependency, incompatible with go 1.13 (#3763)
* Removed calico logutils dependency, incompatible with go 1.13

Fixes #1153

Removed dependency on
`github.com/projectcalico/libcalico-go/lib/logutils` because it has
problems with go modules, as described in
projectcalico/libcalico-go#1153

Not a big deal since it was only used for modifying the plugin's log
format.
2019-11-29 09:19:11 -05:00
Zahari Dichev b83c3a2137
Add Responses to path items to satisfy kube apiserver (#3700)
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
2019-11-15 20:41:50 +02:00
Alex Leong 0026103362 Unit and integration test fixups (#3730)
- Added cleanup step at the end of all integration tests.
- Disable external_issuer_integration_tests in cloud_tests due to
  namespace issue. Running this via `kind` tests is sufficient for now.
- Set a flakey test to `Skip`, relates to #3332.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-11-15 03:40:42 -08:00
Alejandro Pedraza 4b6254b52e
Replaced `uuid` with `uid` from linkerd-config resource (#3694)
* Replaced `uuid` with `uid` from linkerd-config resource

Fixes #3621

Removed the old `uuid` for identifying linkerd installations, and
replaced it with the `uid` property from the `linkerd-config` ConfigMap.

I tested that this `uid` remains the same by updating the config and
also upgrading linkerd, using both the CLI and Helm.

Note that this required granting `linkerd-web` RBAC access to the
`linkerd-config` Config.

I also added an integration test to verify the stability of the uid.
2019-11-13 13:56:01 -05:00
Alejandro Pedraza 3324966702
Upgrade go to 1.13.4 (#3702)
Fixes #3566

As explained in #3566, as of go 1.13 there's a strict check that ensures a dependency's timestamp matches it's sha (as declared in go.mod). Our smi-sdk dependency has a problem with that that got resolved later on, but more work would be required to upgrade that dependency. In the meantime a quick pair of replace statements at the bottom of go.mod fix the issue.
2019-11-13 12:54:36 -05:00
Tarun Pothulapati f18e27b115 use appsv1 api in identity (#3682)
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-11-06 15:06:09 -08:00
Alejandro Pedraza 0e8958cd07
Fixed bad identity string for target pod in tap (#3675)
* Fixed bad identity string for target pod in tap

Fixes #3506

Was using the cluster domain instead of the trust domain, which results
in an error when those domains differ.
2019-11-05 15:57:41 -05:00