* Edge-20.10.1 changes
## edge-20.10.1
This edge release includes a couple of external contributions towards
improved cert-manager support and Grafana charts fixes, among other
enhancements.
* Changed the type of the injector and tap API secrets to `kubernetes.io/tls`,
so they can be provisioned by cert-manager (thanks @cypherfox!)
* Fixed the "Kubernetes cluster monitoring" Grafana dashboard that had a few
charts with incomplete data (thanks @aimbot31!)
* Fixed the `service-mirror` multicluster component so that it retries
connections to the target cluster's Kubernetes API when it's not reachable,
instead of blocking
* Increased the proxy's default timeout for DNS resolution to 500ms, as there
were reports that 100ms was too restrictive
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This branch updates the check functionality to read
the new `linkerd-config.values` which contains the full
Values struct showing the current state of the Linkerd
installation. (being added in #5020 )
This is done by adding a new `FetchCurrentConfiguraiton`
which first tries to get the latest, if not falls back
to the older `linkerd-config` protobuf format.`
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
## Motivation
Closes#5016
Depends on linkerd/linkerd2-proxy-api#44
## Solution
A `profileTranslator` exists for each service and now has a new
`fullyQualifiedName` field.
This field is used to set the `FullyQualifiedName` field of
`DestinationProfile`s each time an update is sent.
In the case that no service profile exists for a service, a default
`DestinationProfile` is created and we can use the field to set the correct
name.
In the case that a service profile does exist for a service, we still use this
field to set the name to keep it consistent.
### Example
Install linkerd on a cluster and run the destination server:
```
go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```
Get the IP of a service. Here, we'll get the ip for `linkerd-identity`:
```
> kubectl get -n linkerd svc/linkerd-identity
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
linkerd-identity ClusterIP 10.43.161.68 <none> 8080/TCP 4h25m
```
Get the profile of `linkerd-identity` from service name or IP and note the
`FullyQualifiedName` field:
```
> go run controller/script/destination-client/main.go -method getProfile -path 10.43.161.68:8080
INFO[0000] fully_qualified_name:"linkerd-identity.linkerd.svc.cluster.local" ..
```
```
> go run controller/script/destination-client/main.go -method getProfile -path linkerd-identity.linkerd.svc.cluster.local
INFO[0000] fully_qualified_name:"linkerd-identity.linkerd.svc.cluster.local" ..
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes#5008
We add a `values` file to the `ConfigMap/linkerd-config` resource. This file holds the full Values which were used to render the chart except that private data such as the identity issuer key are redacted. This file is currently unused but will eventually be used by CLI commands such as `check` and `inject` which need to load Linkerd's configuration (as described in #5009).
This is one step in a larger effort to eventually get rid of the other files in `ConfigMap/linkerd-config`.
Signed-off-by: Alex Leong <alex@buoyant.io>
A conflict between #4911 and #4737 caused unit test to be broken.
#4737 added a new test to `upgrade_test.go` and the changes in
#4911 updated all of these test to ignore differences in the config
overrides secret. Since these two PRs merged in parallel, the new
test was missing this update.
Update the new test to also ignore differences in the config overrides
secret as the other ones do.
Signed-off-by: Alex Leong <alex@buoyant.io>
Prometheus use a relabel rule that changed since 1.16
Use "pod_name" and "pod" to avoid breaking changes.
Also use "container" and "container_name" for the
same reasons.
Fixes#4380
Signed-off-by: Florian Davasse <florian.davasse@stack-labs.com>
This PR adds a new secret to the output of `linkerd install` called `linkerd-config-overrides`. This is the first step towards simplifying the configuration of the linkerd install and upgrade flow through the CLI. This secret contains the subset of the values.yaml which have been overridden. In other words, the subset of values which differ from their default values. The idea is that this will give us a simpler way to produce the `linkerd upgrade` output while still persisting options set during install. This will eventually replace the `linkerd-config` configmap entirely.
This PR only adds and populates the new secret. The secret is not yet read or used anywhere. Subsequent PRs will update individual control plane components to accept their configuration through flags and will update the `linkerd upgrade` flow to use this secret instead of the `linkerd-config` configmap.
This secret is only generated by the CLI and is not present or required when installing or upgrading with Helm.
Here are sample contents of the secret, base64 decoded. Note that identity tls context is saved as an override so that it can be persisted across updates. Since these fields contain private key material, this object must be a secret. This secret is only used for upgrades and thus only the CLI needs to be able to read it. We will not create any RBAC bindings to grant service accounts access to this secret.
```
global:
identityTrustAnchorsPEM: |
-----BEGIN CERTIFICATE-----
MIIBhDCCASmgAwIBAgIBATAKBggqhkjOPQQDAjApMScwJQYDVQQDEx5pZGVudGl0
eS5saW5rZXJkLmNsdXN0ZXIubG9jYWwwHhcNMjAwODI1MjMzMTU3WhcNMjEwODI1
MjMzMjE3WjApMScwJQYDVQQDEx5pZGVudGl0eS5saW5rZXJkLmNsdXN0ZXIubG9j
YWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAAQ0e7IPBlVZ03TL8UVlODllbh8b
2pcM5mbtSGgpX9z0l3n5M70oHn715xu2szh63oBjPl2ZfOA5Bd43cJIksONQo0Iw
QDAOBgNVHQ8BAf8EBAMCAQYwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMC
MA8GA1UdEwEB/wQFMAMBAf8wCgYIKoZIzj0EAwIDSQAwRgIhAI7Sy8P+3TYCJBlK
pIJSZD4lGTUyXPD4Chl/FwWdFfvyAiEA6AgCPbNCx1dOZ8RpjsN2icMRA8vwPtTx
oSfEG/rBb68=
-----END CERTIFICATE-----
heartbeatSchedule: '42 23 * * * '
identity:
issuer:
crtExpiry: "2021-08-25T23:32:17Z"
tls:
crtPEM: |
-----BEGIN CERTIFICATE-----
MIIBhDCCASmgAwIBAgIBATAKBggqhkjOPQQDAjApMScwJQYDVQQDEx5pZGVudGl0
eS5saW5rZXJkLmNsdXN0ZXIubG9jYWwwHhcNMjAwODI1MjMzMTU3WhcNMjEwODI1
MjMzMjE3WjApMScwJQYDVQQDEx5pZGVudGl0eS5saW5rZXJkLmNsdXN0ZXIubG9j
YWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAAQ0e7IPBlVZ03TL8UVlODllbh8b
2pcM5mbtSGgpX9z0l3n5M70oHn715xu2szh63oBjPl2ZfOA5Bd43cJIksONQo0Iw
QDAOBgNVHQ8BAf8EBAMCAQYwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMC
MA8GA1UdEwEB/wQFMAMBAf8wCgYIKoZIzj0EAwIDSQAwRgIhAI7Sy8P+3TYCJBlK
pIJSZD4lGTUyXPD4Chl/FwWdFfvyAiEA6AgCPbNCx1dOZ8RpjsN2icMRA8vwPtTx
oSfEG/rBb68=
-----END CERTIFICATE-----
keyPEM: |
-----BEGIN EC PRIVATE KEY-----
MHcCAQEEIJaqjoDnqkKSsTqJMGeo3/1VMfJTBsMEuMWYzdJVxIhToAoGCCqGSM49
AwEHoUQDQgAENHuyDwZVWdN0y/FFZTg5ZW4fG9qXDOZm7UhoKV/c9Jd5+TO9KB5+
9ecbtrM4et6AYz5dmXzgOQXeN3CSJLDjUA==
-----END EC PRIVATE KEY-----
```
Signed-off-by: Alex Leong <alex@buoyant.io>
Currently the secrets for the proxy-injector, sp-validator webhooks and tap API service are using the Opaque secret type and linkerd-specific field names. This makes it impossible to use cert-manager (https://github.com/jetstack/cert-manager) to provisions and rotate the secrets for these services. This change converts the secrets defined in the linkerd2 helm charts and the controller use the kubernetes.io/tls format instead. This format is used for secrets containing the generated secrets by cert-manager.
Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>
Fixes#4191#4993
This bumps Kubernetes client-go to the latest v0.19.2 (We had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5
This consists of the following changes:
- Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD.
- Bump all k8s related dependencies to v0.19.2
- Generate CRD types, client code using the latest k8s.io/code-generator
- Use context.Context as the first argument, in all code paths that touch the k8s client-go interface
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This implements the run_multicluster_test() function in bin/_test-helpers.sh.
The idea is to create two clusters (source and target) using k3d, with linkerd and multicluster support in both, plus emojivoto (without vote-bot) in target, and vote-bot in source.
We then link the clusters and make sure traffic is flowing.
Detailed sequence:
Create certficates.
Install linkerd along with multicluster support in the target cluster.
Run the target1 test: install emojivoto in the target cluster (without vote-bot).
Run linkerd mc link on the target cluster.
Install linkerd along with multicluster support in the source cluster.
Apply the link resource in the source cluster.
Run the source test: Check linkerd mc gateways returns the target cluster link, and only install emojivoto's vote-bot in the source cluster. Note vote-bot's yaml defines the web-svc service as web-svc-target.emojivoto:80
Run the target2 test: Make sure web-svc in the target cluster is receiving requests.
* Add support for k3d in integration tests
KinD doesn't support setting LoadBalancer services out of the box. It can be added with some additional work, but it seems the solutions are not cross-platform.
K3d on the other hand facilitates this, so we'll be using k3d clusters for the multicluster integration test.
The current change sets the ground by generalizing some of the integration tests operations that were hard-coded to KinD.
- Added `bin/k3d` to wrap the setup and running of a pinned version of `k3d`.
- Refactored `bin/_test-helpers.sh` to account for tests to be run in either KinD or k3d.
- Renamed `bin/kind-load` to `bin/image-load` and make it more generic to load images for both KinD (default) and k3d. Also got rid of the no longer used `--images-host` option.
- Added a placeholder for the new `multicluster` test in the lists in `bin/_test-helpers.sh`. It starts by setting up two k3d clusters.
* Refactor handling of the `--multicluster` flag in integration tests (#4995)
Followup to #4994, based off of that branch (`alpeb/k3d-tests`).
This is more preliminary work previous to the more complete multicluster integration test.
- Removed the `--multicluster` flag from all the tests we had in `bin/_test-helpers.sh`, so only the new "multicluster" integration test will make use of that. Also got rid of the `TestUninstallMulticluster()` test in `install_test.go` to keep the multicluster stuff around, needed for the more complete multicluster test that will be implemented in a followup PR.
- Added "multicluster" to the list of tests in the `kind_integration.yml` workflow.
- For now, this new "multicluster" test in `run_multicluster_test()` is just running the install tests (`test/integration/install_test.go`) with the `--multicluster` flag.
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
## Motivation
Closes#4950
## Solution
Add the `config.linkerd.io/opaque-ports` annotation to either a namespace or pod
spec to set the proxy `LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION`
environment variable.
Currently this environment variable is not used by the proxy, but will be
addressed by #4938.
## Valid values
Ports: `config.linkerd.io/opaque-ports: 4322,3306`
Port ranges: `config.linkerd.io/opaque-ports: 4320-4325`
Mixed ports and port ranges: `config.linkerd.io/opaque-ports: 4320-4325`
If the pod has named ports such as:
```
- name: nginx
image: nginx:latest
ports:
- name: nginx-port
containerPort: 80
protocol: TCP
```
The name can also be used as a value: `config.linkerd.io/opaque-ports:
nginx-port`
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Adds bin/certs-openssl, which creates self-signed root cert/key and issuer cert/key using openssl. This will be used in the two clusters set up in the multicluster integration test (followup PR), given CI already has openssl and to avoid having to install step.
Adds a new flag `--certs-path` to the integration tests, pointing to the path where those certs (ca.crt, ca.key, issuer.key and issuer.crt) will be located to be fed into linkerd install's `--identity-*` flags.
When the service-mirror component can't reach the target's k8s API, the goroutine blocks and it can't be unblocked.
This was happenining specifically in the case of the multicluster integration test (still to be pushed), where the source and target clusters are created in quick succession and the target's API service doesn't always have time to be exposed before being requested by the service mirror.
The fix consists on no longer have restartClusterWatcher be side-effecting, and instead return an error. If such error is not nil then the link watcher is stopped and reset after 10 seconds.
## edge-20.9.4
This edge release introduces support for authenticated docker registries and
fixes a recent multicluster regression.
* Fixed a regression in multicluster gateway configurations that would forbid
inbound gateway traffic
* Upgraded bundled Grafana to v7.1.5
* Enabled Jaeger receiver in collector configuration in Helm chart (thanks
@olivierboudet!)
* Fixed skip port configuration being skipped in CNI plugin
* Introduced support for authenticated docker registries (thanks @c-n-c!)
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Integration test for smi-metrics
This PR adds an integration test which installs SMI-Metrics and performs
queries and matches the reply with a regex query.
Currently, We store the SMI Helm pkg locally and run the test on top, so
That our CI does not break and we will periodically update the package
based on the newer releases of SMI-Metrics
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* tests: Add new CNI deep integration tests
Fixes#3944
This PR adds a new test, called cni-calico-deep which installs the Linkerd CNI
plugin on top of a cluster with Calico and performs the current integration tests on top, thus
validating various Linkerd features when CNI is enabled. For Calico
to work, special config is required for kind which is at `cni-calico.yaml`
This is different from the CNI integration tests that we run in
cloud integration which performs the CNI level integration tests.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Introduce support for authenticated docker registries using imagePullSecrets
Problem: Private Docker Registries are not supported for the moment as detailed in issue #4413
Solution: Every Service Account of linkerd subcomponents are Attached with imagePullSecrets,
which in turn can then pulls the docker images from authenticated private registries using them.
The imagePullSecret is configured in global.imagePullSecret parameter of values.yaml like
imagePullSecret:
- name: <name-of-private-registry-secret-resource>
Fixes#4413
Signed-off-by: Nilakhya <nilakhya@hotmail.com>
* CNI: Use skip ports configuration in CNI
This PR updates the install and `cmdAdd` workflow (which is called
for each new Pod creation) to retrieve and set the configured Skip
Ports. This also updates the `cmdAdd` workflow to check if the new
pod is a control plane Pod, and adds `443` to OutBoundSkipPort so
that 443 (used with k8s API) is skipped as it was causing errors because
a resolve lookup was happening for them which is not intended.
When some test failed in the middle of the
`./tests/integration/install_test.go` suite, multicluster resources can
be left-over, which `./bin/test-cleanup` wasn't removing.
This was affecting the ARM integration tests, that require good cleanup
since they use a non-transient cluster.
This edge release includes fixes and updates for the control plane and
CLI.
* Added `--dest-cni-bin-dir` flag to the `linkerd install-cni` command,
to configure the directory on the host where the CNI binary will be
placed
* Removed `collector.name` and `jaeger.name` config fields from the
tracing addon
* Updated Jaeger to 1.19.2
* Fixed a warning about deprecated Go packages in controller container
logs
* Do not run cloud integration tests in CI
Closes#4963
Removed the `./.github/workflows/cloud_integration.yml` workflow, and
removed the `cloud_integration_tests` job from the ``./.github/workflows/release.yml` workflow.
This adds a `replace` statement to `go.mod` to force the newer version `1.14.x` of `github.com/grpc-ecosystem/grpc-gateway` to avoid the following warning in all the controller container logs:
```
WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
A future release of golang/protobuf will delete this package,
which has long been excluded from the compatibility promise.
```
More info [here](https://github.com/golang/protobuf/issues/1104)
Currently, This field has to be configured to make CNI work in
GKE clusters as thats where the binaries have to be stored. This
was configurable through Helm, but the same can be allowed through
the CLI too
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* tracing: Move default values into chart
This branch updates the tracing add-on's values into their own chart's values.yaml
(just like grafana and prometheus). This prevents them from being saved into
`linkerd-config-addons` where only the overridden values are stored. Thus allowing
us to change the defaults.
This also
- Updates the check command to fall back to default values, if there are no
overridden name fields.
- Updates jaeger to `1.19.2`
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Push docker images to ghcr.io instead of gcr.io
The `cloud_integration.yml` and `release.yml` workflows were modified to
log into ghcr.io, and remove the `Configure gcloud` step which is no
longer necessary.
Note that besides the changes to cloud_integration.yml and release.yml, there was a change to the upgrade-stable integration test so that we do linkerd upgrade --addon-overwrite to reset the addons settings because in stable-2.8.1 the Grafana image was pegged to gcr.io/linkerd-io/grafana in linkerd-config-addons. This will need to be mentioned in the 2.9 upgrade notes.
Also the egress integration test has a debug container that now is pegged to the edge-20.9.2 tag.
Besides that, the other changes are just a global search and replace (s/gcr.io\/linkerd-io/ghcr.io\/linkerd/).
This release includes several major changes to the proxy's behavior:
- Service profile lookups are now necessary and fundamental to outbound
discovery for HTTP traffic. That is, if a service profile lookup is
rejected, endpoint discovery will not be performed; and endpoint
discovery must succeed for all destinations that are permitted by
service profiles. This simplifies caching and buffering to reduce
latency (especially under concurrency).
- Service discovery is now performed for all TCP traffic, and
connections are balanced over endpoints according to connection
latency.
- This enables mTLS for **all** meshed connections; not just HTTP.
- Outbound TCP metrics are now hydrated with endpoint-specific labels.
---
* outbound: Cache balancers within profile stack (linkerd/linkerd2-proxy#641)
* outbound: Remove unused error type (linkerd/linkerd2-proxy#648)
* Eliminate the ConnectAddr trait (linkerd/linkerd2-proxy#649)
* profiles: Do not rely on tuples as stack targets (linkerd/linkerd2-proxy#650)
* proxy-http: Remove unneeded boilerplate (linkerd/linkerd2-proxy#651)
* outbound: Clarify Http target types (linkerd/linkerd2-proxy#653)
* outbound: TCP discovery and load balancing (linkerd/linkerd2-proxy#652)
* metrics: Add endpoint labels to outbound TCP metrics (linkerd/linkerd2-proxy#654)
The proxy performs endpoint discovery for unnamed services, but not
service profiles.
The destination controller and proxy have been updated to support
lookups for unnamed services in linkerd/linkerd2#4727 and
linkerd/linkerd2-proxy#626, respectively.
This change modifies the injection template so that the
`proxy.destinationGetNetworks` configuration enables profile
discovery for all networks on which endpoint discovery is permitted.
`./bin/test-cleanup` was trying to remove the
resources with the label `linkerd.io/is-test-helm` which we're not
using. Instead, we simply call `helm delete` on the appropriate helm
releases.
This is required for a clean cleanup after the ARM integration test, whose
cluster is just cleaned by this script at the end and is not torn down.
## edge-20.9.1
This edge release contains an important proxy update that allows linkerd to
continue to operate normally in HA during node outages. We're also adding full
Kubernetes 1.19 support!
* Improved the proxy's error handling for DNS errors encountered when
discovering control plane addresses, which can be common during installation,
before all components have been started
* The destination and identity services had to be made headless in order to
support that new controller discovery (which now can leverage SRV records)
* Use SAN fields when generating the linkerd webhook configs; this completes the
Kubernetes 1.19 support which enforces them
* Fixed `linkerd check` for multicluster that was spuriously claiming the
absence of some resources
* Improved the injection test cleanup (thanks @zhouhao3!)
* Added ability to run the integration test suite using a cluster in an ARM
architecture (thanks @aliariff!)
Updating only the go 1.15 version, makes the upgrades fail from older versions,
as the identity certs do not have that setting and go 1.15 expects them.
This PR upgrades the cert generation code to have that field,
allowing us to move to go 1.15 in later versions of Linkerd.
* Create webhook certs with SANs, along with legacy Common Name
Fixes#4918
In Kubernetes 1.18, Go version has been updated to 1.15 which updated
how certificates are verified. They moved away from legacy Common Name
field to SANs.
This PR replaces the internal Helm cert generation functions to
`genSelfSignedCert` as they allow alternate DNS names to be specified.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The multicluster checks make sure that the correct resources exist for each service mirror controller. When looking up these resources, it uses the `linkerd.io/control-plane-component=linkerd-service-mirror` label selector. However, these resources have the label `linkerd.io/control-plane-component=service-mirror`. This causes the resource lookup to fail to find the resource and the check spuriously fails.
```
× service mirror controller has required permissions
missing ServiceAccounts: linkerd-service-mirror-self
missing ClusterRoles: linkerd-service-mirror-access-local-resources-self
missing ClusterRoleBindings: linkerd-service-mirror-access-local-resources-self
missing Roles: linkerd-service-mirror-read-remote-creds-self
missing RoleBindings: linkerd-service-mirror-read-remote-creds-self
see https://linkerd.io/checks/#l5d-multicluster-source-rbac-correct for hints
| * no service mirror controller deployment for Link self
```
Instead, use the correct label selector when looking up these resources.
Signed-off-by: Alex Leong <alex@buoyant.io>