EndpointSlices have been made opt-in due to their experimental nature. This PR
introduces a new install flag 'enableEndpointSlices' that will allow adopters to
specify in their cli install or helm install step whether they would like to
use endpointslices as a resource in the destination service, instead of the
endpoints k8s resource.
Signed-off-by: Matei David <matei.david.35@gmail.com>
https://github.com/linkerd/linkerd2-proxy/pull/593 changed the proxy
release process to produce platform-specific binaries.
This change modifies the bin/fetch-proxy script to fetch amd64-specific
binaries. The proxy version has been updated to v1.104.1, which includes
no code changes since v1.104.0.
Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
The `tests` variable wasn't being properly initialized, which resulted
in the `helm-deep` tests being repeated, and without cleanup in between,
the attempt to create resources that were already there caused an error.
## Motivation
Closes #3916
This adds the ability to get profiles for services by IP address.
### Change in behavior
When the destination server receives a `GetProfile` request with an IP address,
it now tries to map that IP address to a service.
If the IP address maps to an existing service, then the destination server
returns the profile stream and subscribes for updates to the _service_. This is
the existing behavior. If the IP address is later reassigned to a different
service, the stream will still send updates for the first service the IP address
corresponded to, since that is what it is subscribed to.
If the IP address does not map to an existing service, then the destination
server returns the profile stream but does not subscribe for updates. The stream
will receive one update, the default profile.
### Solution
This change uses the `IPWatcher` within the destination server to determine
which service an IP address corresponds to. A new method, `GetSvc`, is added to
`IPWatcher`, and the server calls this method when `GetProfile` receives a
request with an IP address.
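The lookup described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual `IPWatcher` code: the `clusterIPIndex` type, the `resolve` method, and the `serviceID` struct are hypothetical names, and the real `GetSvc` signature may differ.

```go
package main

import (
	"fmt"
	"net"
)

// serviceID is a hypothetical namespace/name pair identifying a service.
type serviceID struct {
	Namespace string
	Name      string
}

// clusterIPIndex is a simplified in-memory stand-in for the watcher's
// mapping from cluster IP to service (an assumption for illustration).
type clusterIPIndex map[string]serviceID

// resolve reports whether path ("host:port" or a bare host) is an IP address
// and, if it is, whether that IP maps to a known service.
func (idx clusterIPIndex) resolve(path string) (svc serviceID, isIP bool, found bool) {
	host, _, err := net.SplitHostPort(path)
	if err != nil {
		// No port present; treat the whole path as the host.
		host = path
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return serviceID{}, false, false
	}
	svc, found = idx[ip.String()]
	return svc, true, found
}

func main() {
	idx := clusterIPIndex{
		"10.104.57.90": {Namespace: "linkerd", Name: "linkerd-tap"},
	}
	paths := []string{"10.104.57.90:8088", "10.0.0.1:8088", "web.default.svc.cluster.local:80"}
	for _, path := range paths {
		svc, isIP, found := idx.resolve(path)
		switch {
		case !isIP:
			fmt.Printf("%s: not an IP address, use the name-based lookup\n", path)
		case found:
			fmt.Printf("%s: subscribe to profile updates for %s/%s\n", path, svc.Namespace, svc.Name)
		default:
			fmt.Printf("%s: send the default profile once, no subscription\n", path)
		}
	}
}
```

The three branches mirror the behavior described under "Change in behavior": a known IP subscribes to its service, an unknown IP gets a single default profile, and non-IP paths are unaffected.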
## Testing
Install linkerd on a cluster and get the cluster IP of any service:
```bash
❯ kubectl get -n linkerd svc/linkerd-tap -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
linkerd-tap ClusterIP 10.104.57.90 <none> 8088/TCP,443/TCP 16h linkerd.io/control-plane-component=tap
```
Run the destination server:
```bash
❯ go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```
Get the profile for the tap service by IP address:
```bash
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.104.57.90:8088
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
INFO[0000]
```
Get the profile for an IP address that does not correspond to a service:
```bash
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.256.0.1:8088
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
INFO[0000]
```
You can add and remove settings for the service profile for tap and get updates.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This creates a new integration test target that launches the deep suite,
using a linkerd instance installed through Helm.
I've added a `global.proxyInit.ignoreInboundPorts=1234,5678` override
during install and enhanced the injection test to catch problems like
what we saw in #4679.
The deep integration tests started failing on GKE.
Originally, this was thought to be a cleanup issue, but we have not cleaned up
deep integration tests in the past. We install Linkerd once, and then run all
the tests serially.
Since it has been a while since we last ran the full deep test suite on GKE, we
may simply need more resources to run it now.
This increases the node count of the GKE cluster that we run on from 1 to 2.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
An inappropriate variable reuse resulted in the failure of the test for
upgrading using manifests. This only happened when the upgrade was
retried a second time (when there's a discrepancy in the heartbeat cron
schedule, which is benign).
This edge release moves Linkerd's bundled Prometheus into an add-on.
This makes the Linkerd Prometheus more configurable, gives it a separate
upgrade lifecycle from the rest of the control plane, and allows users
to disable the bundled Prometheus instance. In addition, this release
includes fixes for several issues, including a regression where the
proxy would fail to report OpenCensus spans.
* Prometheus is now an optional add-on, enabled by default
* Custom tolerations can now be specified for control plane resources
when installing with Helm (thanks @DesmondH0!)
* Evicted data plane pods are no longer considered to be failed by
`linkerd check --proxy`, fixing an issue where the check would be
retried indefinitely as long as evicted pods are present
* Fixed a regression where proxy spans were not reported to OpenCensus
* Fixed a bug where the proxy injector would fail to render skipped port
lists when installed with Helm
* Internal improvements to the proxy for lower latencies under high
concurrency
* Thanks to @Hellcatlk and @surajssd for adding new unit tests and
spelling fixes!
This fixes the deep integration test, which currently only calls `run_test` for
the `edges` integration test.
This occurs because `run_test "${tests[@]}"` will pass an entire array of
filenames when `run_test` only expects *one* filename.
The solution is to loop through `tests` and call `run_test` for each file.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This moves Prometheus into an add-on, making it optional but enabled by default. This also makes `linkerd-prometheus` more configurable and allows it to have its own life-cycle for upgrades, configuration, etc.
This work will be followed by documentation that helps users configure an existing Prometheus to work with Linkerd.
**Changes Include:**
- moving prometheus manifests into a separate chart at `charts/add-ons/prometheus`, and adding it as a dependency to `linkerd2`
- implementing the `addOn` interface to support the same with the CLI
- including configuration in `linkerd-config-addons`
**User Facing Changes:**
The default install experience does not change much, but users who have already configured Prometheus differently will need to apply the same settings using the new configuration fields described in the chart README.
The splitStringListToPorts helm function currently formats a list of ports incorrectly, as an array of Port objects that look like {"port" : 555}. The config map protobuf representation, however, expects the ignoreOutboundPorts and ignoreInboundPorts fields to be a list of PortRange objects ({"portRange" : 555}).
This was causing the injector to return an empty string when trying to parse a PortRange object, resulting in the ports not getting set correctly when injecting workloads. Note that this happens only with helm installations, as this is the only case where we use a helm template for outputting the config map.
To fix that, the splitStringListToPorts helm function is changed to format the objects as the JSON representation of PortRange and is renamed to splitStringListToPortRanges.
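As a rough illustration of the intended output shape, the following Go sketch turns a comma-separated port list into a JSON array of `portRange` objects. It is not the helm template itself: the `splitToPortRanges` function and the `portRange` struct are hypothetical, and keeping the value as a string (so that a range could also be expressed) is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// portRange mirrors the {"portRange": ...} shape the config map expects.
// Storing the value as a string is an assumption made for this sketch.
type portRange struct {
	PortRange string `json:"portRange"`
}

// splitToPortRanges converts "1234,5678" into a JSON list of portRange
// objects, skipping empty items.
func splitToPortRanges(list string) (string, error) {
	ranges := []portRange{}
	for _, item := range strings.Split(list, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue
		}
		ranges = append(ranges, portRange{PortRange: item})
	}
	out, err := json.Marshal(ranges)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := splitToPortRanges("1234,5678")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints [{"portRange":"1234"},{"portRange":"5678"}]
}
```

The key point is the field name: emitting `port` instead of `portRange` yields objects the consumer cannot map onto PortRange, which matches the failure described above.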
Fix: #4679
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
When a k8s pod is evicted, its Phase is set to Failed and the reason is set to Evicted. Because the ListPods method of the public API only transmits the phase and treats it as Status, the health checks assume such evicted data plane pods to be failed. Since this check is retryable, the result is that linkerd check --proxy appears to hang when there are evicted pods. As @adleong correctly pointed out, the presence of evicted pods is not something that should make the checks fail.
This change modifies the public API to set the Pod.Status to "Evicted" for evicted pods. The health checks are also modified to not treat evicted pods as error cases.
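The status mapping can be sketched as below. This is a simplified illustration that avoids the Kubernetes client types: `podStatus`, `displayStatus`, and `countsTowardCheck` are hypothetical names, and the real health check logic considers more states than shown here.

```go
package main

import "fmt"

// podStatus is a hypothetical, trimmed-down view of a pod's status fields.
type podStatus struct {
	Phase  string
	Reason string
}

// displayStatus returns "Evicted" for evicted pods instead of the raw
// "Failed" phase, and the phase itself otherwise.
func displayStatus(s podStatus) string {
	if s.Phase == "Failed" && s.Reason == "Evicted" {
		return "Evicted"
	}
	return s.Phase
}

// countsTowardCheck reports whether a pod with the given status should be
// considered by the data plane check; evicted pods are skipped.
func countsTowardCheck(status string) bool {
	return status != "Evicted"
}

func main() {
	pods := []podStatus{
		{Phase: "Running"},
		{Phase: "Failed", Reason: "Evicted"},
		{Phase: "Failed"},
	}
	for _, p := range pods {
		status := displayStatus(p)
		fmt.Printf("%s (checked: %t)\n", status, countsTowardCheck(status))
	}
}
```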
Fix #4690
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* update helm render tests to read child charts values.yaml
By default, a Helm installation reads the values.yaml of dependent charts
and uses them in rendering. Add-ons rely on this to keep their default
template values, while allowing further overrides from the parent chart's
(i.e. linkerd2) values.yaml or from --addon-config through the CLI.
This PR updates the Helm tests to reflect the same behavior, i.e. to consider
the values.yaml of chart dependencies if present.
This does not have any UX changes but helps with the follow up
add-on related work.
Removed `controller/proxy-injector/webhook_ops.go` and `controller/sp-validator/webhook_ops.go` that we used when we first introduced webhooks to dynamically create their configs, but we ended up doing that upfront at install time.
The misspellings were found using the following command and then
fixed:
```
codespell --skip CHANGES.md,.git,go.sum,\
controller/cmd/service-mirror/events_formatting.go,\
controller/cmd/service-mirror/cluster_watcher_test_util.go,\
SECURITY_AUDIT.pdf,.gcp.json.enc,web/app/img/favicon.png \
--ignore-words-list=aks,uint,ans,files\' --check-filenames \
--check-hidden
```
Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
The function triggering the test for k8s custom cluster domain was
misnamed, and thus the test wasn't being run.
This also adds some extra error handling to catch this and other
potential issues.
Introduce support for the EndpointSlice k8s resource (k8s v1.16+) in the destination service.
Through this PR, the EndpointsWatcher gets a dedicated informer for EndpointSlice;
this informer cannot run at the same time as the Endpoints resource informer. The main differences
are that a single service can be backed by many EndpointSlices, and that EndpointSlices offer better performance,
dual-stack addresses and more. EndpointSlice support also implies service topology and other k8s-related features.
Validated and tested manually, as well as with dedicated unit tests.
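To illustrate the one-to-many relationship, here is a small sketch that collects the addresses of one service across several slices. The `endpointSlice` struct and `addressesFor` function are hypothetical simplifications; in Kubernetes the link from a slice to its service is carried by the `kubernetes.io/service-name` label rather than a struct field.

```go
package main

import "fmt"

// endpointSlice is a hypothetical, simplified view of an EndpointSlice.
type endpointSlice struct {
	Name      string
	Service   string // stands in for the kubernetes.io/service-name label
	Addresses []string
}

// addressesFor gathers the addresses of every slice that belongs to the
// given service, preserving slice order.
func addressesFor(service string, slices []endpointSlice) []string {
	var addrs []string
	for _, s := range slices {
		if s.Service == service {
			addrs = append(addrs, s.Addresses...)
		}
	}
	return addrs
}

func main() {
	slices := []endpointSlice{
		{Name: "web-abc", Service: "web", Addresses: []string{"10.1.0.1", "10.1.0.2"}},
		{Name: "web-def", Service: "web", Addresses: []string{"10.1.0.3"}},
		{Name: "api-xyz", Service: "api", Addresses: []string{"10.1.1.1"}},
	}
	fmt.Println(addressesFor("web", slices)) // prints [10.1.0.1 10.1.0.2 10.1.0.3]
}
```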
Closes #4501
Signed-off-by: Matei David <matei.david.35@gmail.com>
Based on the [EndpointSlice PR](https://github.com/linkerd/linkerd2/pull/4663), this is just the k8s/api support for endpointslices to shorten the first PR.
* Adds CRD
* Adds functions that check whether the cluster has EndpointSlice access
* Adds discovery & endpointslice informers to api.
Signed-off-by: Matei David <matei.david.35@gmail.com>
Added Sue BV to the adopters list
Added Youmail to list of adopters (#4694)
Signed-off-by: Freddy Andersen <fandersen@youmail.com>
Co-authored-by: Freddy Andersen <53147+heimdull@users.noreply.github.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
This release increases the default buffer size to match the proxy's
in-flight request limit. This reduces contention in overload--especially
high-concurrency--situations, substantially reducing tail latency.
---
* update test-support clients and servers to be natively async (linkerd/linkerd2-proxy#580)
* Print build diagnostics in docker (linkerd/linkerd2-proxy#583)
* update test controllers to std::future/Tonic; remove threads (linkerd/linkerd2-proxy#585)
* buffer: Box the inner service's reponse future (linkerd/linkerd2-proxy#586)
* Eliminate Bind & Listen traits (linkerd/linkerd2-proxy#584)
* cache: replace Lock with Buffer (linkerd/linkerd2-proxy#587)
This PR adds a new CLI test that checks whether installation YAMLs are
generated correctly on Windows. This is important because file paths
differ between Windows and Linux, and any code that uses the wrong path
format might cause the chart generation commands to fail on Windows.
This creates a separate workflow for both release and integration.
Also, all the existing integration tests are moved into
/tests/integration to separate them from /test/cli, as this test does not
fall under the integration tests category.
* feat: add log format annotation and helm value
JSON log formatting has been added via https://github.com/linkerd/linkerd2-proxy/pull/500
but wiring the option through as an annotation/helm value is still
necessary.
This PR adds the annotation and helm value to configure log format.
Closes #2491
Signed-off-by: Naseem <naseem@transit.app>
Currently, linkerd check appears to hang on HA installations where there are unschedulable pods. In reality, it is just waiting on a condition that might never become true, without showing any useful information (i.e. which pods are not scheduled). This change sets `surfaceErrorOnRetry: true` so that the user gets feedback about which conditions are not yet met, instead of simply being shown that it is waiting for the check to complete.
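The effect of such a flag can be sketched as follows. This is a toy model rather than the health checker's code: the `check` struct, `runWithRetries`, and `errUnschedulable` are hypothetical, and the sketch omits the delay between retries.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnschedulable is a hypothetical error used only for this illustration.
var errUnschedulable = errors.New("pod web-0 is unschedulable")

// check is a simplified, hypothetical model of a retryable health check.
type check struct {
	description         string
	surfaceErrorOnRetry bool
	run                 func() error
}

// runWithRetries runs the check up to maxAttempts times and returns the
// messages a user would see along the way.
func runWithRetries(c check, maxAttempts int) []string {
	var out []string
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err := c.run()
		if err == nil {
			return append(out, "ok: "+c.description)
		}
		if c.surfaceErrorOnRetry {
			// Show the concrete condition that is not yet met.
			out = append(out, "retrying: "+err.Error())
		} else {
			out = append(out, "waiting for check to complete")
		}
	}
	return append(out, "failed: "+c.description)
}

func main() {
	c := check{
		description:         "no unschedulable pods",
		surfaceErrorOnRetry: true,
		run:                 func() error { return errUnschedulable },
	}
	for _, line := range runWithRetries(c, 2) {
		fmt.Println(line)
	}
}
```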
Fix #4680
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Currently, commands that need access to the public API execute the `LinkerdControlPlaneExistenceChecks`. This set of checks includes one that specifically verifies that there are no unschedulable pods. In fact, commands like stat and edge do not need that requirement to be met.
This change relaxes the requirement by making the no-unschedulable-pods check a warning only. Fixes #3940
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Refactor install test helpers
- Move testResourcesPostInstall to testutil.TestResourcesPostInstall
- Move exerciseTestAppEndpoint to testutil.ExerciseTestAppEndpoint
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
* Trigger CI
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
Data disappears when Prometheus restarts because it is all held in memory.
Adding an option to enable persistence by means of a PVC would be the right approach. This is commonly seen in a wide array of Helm charts.
Fixes #4576
Signed-off-by: Naseem <naseem@transit.app>
- match messaging w/website
- replace specific K8s versions with "modern" (future-proofing)
- Copyright 2019 -> 2020
- Minor tweaks
Signed-off-by: William Morgan <william@buoyant.io>
Regenerated protobuf files, using version 1.4.2 that was upgraded from
1.3.2 with the proxy-api update in #4614.
As of v1.4, protobuf messages must not be copied (because they
hold a mutex), so whenever a message is passed to or returned from a
function we need to use a pointer.
This affects _mostly_ test files.
This is required to unblock #4620 which is adding a field to the config
protobuf.
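The pointer requirement can be illustrated with a plain struct that embeds a mutex. The `message` type below is a hypothetical stand-in for a generated protobuf message, not actual generated code; `rename` and `newMessage` are likewise illustrative names.

```go
package main

import (
	"fmt"
	"sync"
)

// message is a hypothetical stand-in for a generated protobuf message that
// holds a mutex and therefore must not be copied.
type message struct {
	mu   sync.Mutex
	Name string
}

// rename takes a pointer so that the caller and callee share one mutex.
// Taking `m message` by value would copy the lock, which `go vet` reports
// through its copylocks check.
func rename(m *message, name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.Name = name
}

// newMessage returns a pointer for the same reason: returning the struct by
// value would copy it on the way out.
func newMessage(name string) *message {
	return &message{Name: name}
}

func main() {
	m := newMessage("before")
	rename(m, "after")
	fmt.Println(m.Name) // prints after
}
```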
This edge release moves the proxy onto a new version of the Tokio runtime. This
allows us to more easily integrate with the ecosystem and may yield performance
benefits as well.
* Upgraded the proxy's underlying Tokio runtime and its related libraries
* Added support for PKCS8 formatted ECDSA private keys
* Added support for Helm configuration of per-component proxy resources requests
and limits (thanks @cypherfox!)
* Updated the `linkerd inject` command to throw an error while injecting
non-compliant pods (thanks @mayankshah1607)
Signed-off-by: Alex Leong <alex@buoyant.io>
This release fixes a regression that could cause service profile lookups
to be retried indefinitely, despite the server returning an
`InvalidArgument` response (which indicates the proxy should not retry).
---
* fix InvalidProfileAddr not converting into DiscoveryRejected (linkerd/linkerd2-proxy#581)
## Description
As discussed [here](https://github.com/linkerd/linkerd2/pull/4653#discussion_r445543061), the `kind_integration` job of the release workflow was not kept in sync with the changes made in #4593.
Until GitHub actions can reuse yaml for separate workflows, these sections are supposed to be kept in sync.
This would be an issue if we had tried doing a release since #4593 merged, but that has not happened yet.
## Changes
This updates the release workflow `kind_integration` job to use the new test interface, mainly removing cluster creation and image loading as necessary prerequisites.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
## Summary
Change the default behavior of integration tests to be isolated by cluster.
Additionally, make running one or all tests easier than the current process.
These changes are explained more in the [Testing
RFC](https://github.com/linkerd/rfc/blob/master/design/0004-isolated-integration-tests.md)
## Changes
This is a script used only by Linkerd developers, but there are many useful
usage examples and explanations in the `bin/tests --help` output:
```
Run Linkerd integration tests.
Optionally specify one of the following tests: [upgrade helm helm-upgrade uninstall deep external-issuer]
Usage:
tests [--images] [--images-host ssh://linkerd-docker] [--name test-name] [--skip-kind-create] /path/to/linkerd
Examples:
# Run all tests in isolated clusters
tests /path/to/linkerd
# Run single test in isolated clusters
tests --name test-name /path/to/linkerd
# Skip KinD cluster creation and run all tests in default cluster context
tests --skip-kind-create /path/to/linkerd
# Load images from tar files located under the 'image-archives' directory
# Note: This is primarly for CI
tests --images /path/to/linkerd
# Retrieve images from a remote docker instance and then load them into KinD
# Note: This is primarly for CI
tests --images --images-host ssh://linkerd-docker /path/to/linkerd
Available Commands:
--name: the argument to this option is the specific test to run
--skip-kind-create: skip KinD cluster creation step and run tests in an existing cluster.
--images: (Primarily for CI) use 'kind load image-archive' to load the images from local .tar files in the current directory.
--images-host: (Primarily for CI) the argument to this option is used as the remote docker instance from which images are first retrieved (using 'docker save') to be then loaded into KinD. This command requires --images.
```
### Run all tests
Old:
```bash
bin/test-run $PWD/bin/linkerd
```
New:
```bash
bin/tests $PWD/bin/linkerd
```
### Run single test (upgrade for example):
Old:
```bash
. bin/_test-run.sh
init_test_run $PWD/bin/linkerd
upgrade_integration_tests
```
New:
```bash
bin/tests --name upgrade $PWD/bin/linkerd
```
### Run tests in isolated KinD clusters
Old: Not possible without running single tests in newly created clusters
manually
New:
```bash
bin/tests $PWD/bin/linkerd
```
### Run tests in isolated namespaces on an existing cluster
Old:
```bash
bin/test-run $PWD/bin/linkerd
```
New:
```bash
bin/tests --skip-kind-create $PWD/bin/linkerd
```
## CI
`kind_integration` has been updated so that it does not create a KinD cluster as
part of its test setup.
`cloud_integration` passes the `--skip-kind-create` flag so that the tests are
run serially in a non-KinD cluster.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>