This branch updates the check functionality to read
the new `linkerd-config.values` entry, which contains the full
Values struct describing the current state of the Linkerd
installation (being added in #5020).
This is done by adding a new `FetchCurrentConfiguration`
function which first tries to read the new format and, if it is
absent, falls back to the older `linkerd-config` protobuf format.
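A minimal sketch of that fallback, assuming client-go and a hypothetical function name (this is not the actual implementation):
```
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// fetchCurrentConfiguration prefers the new "values" entry in the
// linkerd-config ConfigMap and falls back to the legacy protobuf-based
// entries when it is missing. Names and keys here are illustrative.
func fetchCurrentConfiguration(ctx context.Context, k kubernetes.Interface) ([]byte, error) {
	cm, err := k.CoreV1().ConfigMaps("linkerd").Get(ctx, "linkerd-config", metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	if v, ok := cm.Data["values"]; ok {
		return []byte(v), nil // new full-Values format
	}
	return []byte(cm.Data["global"]), nil // legacy protobuf-based entry
}
```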
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes #5008
We add a `values` file to the `ConfigMap/linkerd-config` resource. This file holds the full Values which were used to render the chart except that private data such as the identity issuer key are redacted. This file is currently unused but will eventually be used by CLI commands such as `check` and `inject` which need to load Linkerd's configuration (as described in #5009).
This is one step in a larger effort to eventually get rid of the other files in `ConfigMap/linkerd-config`.
Signed-off-by: Alex Leong <alex@buoyant.io>
A conflict between #4911 and #4737 caused a unit test to break.
#4737 added a new test to `upgrade_test.go` and the changes in
#4911 updated all of these tests to ignore differences in the config
overrides secret. Since these two PRs merged in parallel, the new
test was missing this update.
Update the new test to also ignore differences in the config overrides
secret as the other ones do.
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR adds a new secret to the output of `linkerd install` called `linkerd-config-overrides`. This is the first step towards simplifying the configuration of the linkerd install and upgrade flow through the CLI. This secret contains the subset of the values.yaml which have been overridden. In other words, the subset of values which differ from their default values. The idea is that this will give us a simpler way to produce the `linkerd upgrade` output while still persisting options set during install. This will eventually replace the `linkerd-config` configmap entirely.
This PR only adds and populates the new secret. The secret is not yet read or used anywhere. Subsequent PRs will update individual control plane components to accept their configuration through flags and will update the `linkerd upgrade` flow to use this secret instead of the `linkerd-config` configmap.
This secret is only generated by the CLI and is not present or required when installing or upgrading with Helm.
Here are sample contents of the secret, base64 decoded. Note that the identity TLS context is saved as an override so that it can be persisted across updates. Since these fields contain private key material, this object must be a secret. This secret is only used for upgrades and thus only the CLI needs to be able to read it. We will not create any RBAC bindings that grant service accounts access to this secret.
```
global:
  identityTrustAnchorsPEM: |
    -----BEGIN CERTIFICATE-----
    MIIBhDCCASmgAwIBAgIBATAKBggqhkjOPQQDAjApMScwJQYDVQQDEx5pZGVudGl0
    eS5saW5rZXJkLmNsdXN0ZXIubG9jYWwwHhcNMjAwODI1MjMzMTU3WhcNMjEwODI1
    MjMzMjE3WjApMScwJQYDVQQDEx5pZGVudGl0eS5saW5rZXJkLmNsdXN0ZXIubG9j
    YWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAAQ0e7IPBlVZ03TL8UVlODllbh8b
    2pcM5mbtSGgpX9z0l3n5M70oHn715xu2szh63oBjPl2ZfOA5Bd43cJIksONQo0Iw
    QDAOBgNVHQ8BAf8EBAMCAQYwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMC
    MA8GA1UdEwEB/wQFMAMBAf8wCgYIKoZIzj0EAwIDSQAwRgIhAI7Sy8P+3TYCJBlK
    pIJSZD4lGTUyXPD4Chl/FwWdFfvyAiEA6AgCPbNCx1dOZ8RpjsN2icMRA8vwPtTx
    oSfEG/rBb68=
    -----END CERTIFICATE-----
heartbeatSchedule: '42 23 * * * '
identity:
  issuer:
    crtExpiry: "2021-08-25T23:32:17Z"
    tls:
      crtPEM: |
        -----BEGIN CERTIFICATE-----
        MIIBhDCCASmgAwIBAgIBATAKBggqhkjOPQQDAjApMScwJQYDVQQDEx5pZGVudGl0
        eS5saW5rZXJkLmNsdXN0ZXIubG9jYWwwHhcNMjAwODI1MjMzMTU3WhcNMjEwODI1
        MjMzMjE3WjApMScwJQYDVQQDEx5pZGVudGl0eS5saW5rZXJkLmNsdXN0ZXIubG9j
        YWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAAQ0e7IPBlVZ03TL8UVlODllbh8b
        2pcM5mbtSGgpX9z0l3n5M70oHn715xu2szh63oBjPl2ZfOA5Bd43cJIksONQo0Iw
        QDAOBgNVHQ8BAf8EBAMCAQYwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMC
        MA8GA1UdEwEB/wQFMAMBAf8wCgYIKoZIzj0EAwIDSQAwRgIhAI7Sy8P+3TYCJBlK
        pIJSZD4lGTUyXPD4Chl/FwWdFfvyAiEA6AgCPbNCx1dOZ8RpjsN2icMRA8vwPtTx
        oSfEG/rBb68=
        -----END CERTIFICATE-----
      keyPEM: |
        -----BEGIN EC PRIVATE KEY-----
        MHcCAQEEIJaqjoDnqkKSsTqJMGeo3/1VMfJTBsMEuMWYzdJVxIhToAoGCCqGSM49
        AwEHoUQDQgAENHuyDwZVWdN0y/FFZTg5ZW4fG9qXDOZm7UhoKV/c9Jd5+TO9KB5+
        9ecbtrM4et6AYz5dmXzgOQXeN3CSJLDjUA==
        -----END EC PRIVATE KEY-----
```
Signed-off-by: Alex Leong <alex@buoyant.io>
Currently the secrets for the proxy-injector and sp-validator webhooks and the tap API service use the Opaque secret type with linkerd-specific field names. This makes it impossible to use cert-manager (https://github.com/jetstack/cert-manager) to provision and rotate the secrets for these services. This change converts the secrets defined in the linkerd2 helm charts and used by the controller to the kubernetes.io/tls format instead, which is the format cert-manager uses for the secrets it generates.
Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>
Fixes #4191 #4993
This bumps Kubernetes client-go to the latest v0.19.2 (we had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5.
This consists of the following changes:
- Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD.
- Bump all k8s related dependencies to v0.19.2
- Generate CRD types, client code using the latest k8s.io/code-generator
- Use context.Context as the first argument in all code paths that touch the k8s client-go interface (see the sketch below)
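For illustration, a small sketch (assumed namespace and client) of the new calling convention:
```
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listPods shows the v0.19 client-go convention: every request method now
// takes a context.Context as its first argument.
func listPods(ctx context.Context, k kubernetes.Interface) error {
	_, err := k.CoreV1().Pods("linkerd").List(ctx, metav1.ListOptions{})
	return err
}
```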
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
## Motivation
Closes #4950
## Solution
Add the `config.linkerd.io/opaque-ports` annotation to either a namespace or pod
spec to set the proxy `LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION`
environment variable.
Currently this environment variable is not used by the proxy, but will be
addressed by #4938.
## Valid values
Ports: `config.linkerd.io/opaque-ports: 4322,3306`
Port ranges: `config.linkerd.io/opaque-ports: 4320-4325`
Mixed ports and port ranges: `config.linkerd.io/opaque-ports: 4320-4325,3306`
If the pod has named ports such as:
```
- name: nginx
  image: nginx:latest
  ports:
    - name: nginx-port
      containerPort: 80
      protocol: TCP
```
The name can also be used as a value: `config.linkerd.io/opaque-ports: nginx-port`
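For illustration, a hedged Go sketch (not the actual injector code; the helper name is hypothetical) of how the annotation value could be mapped onto the proxy's environment variable:
```
import corev1 "k8s.io/api/core/v1"

// opaquePortsEnv is a hypothetical helper: it copies the annotation value,
// if present, into the environment variable mentioned above.
func opaquePortsEnv(annotations map[string]string) []corev1.EnvVar {
	v, ok := annotations["config.linkerd.io/opaque-ports"]
	if !ok {
		return nil
	}
	return []corev1.EnvVar{{
		Name:  "LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION",
		Value: v,
	}}
}
```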
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Introduce support for authenticated docker registries using imagePullSecrets
Problem: private Docker registries are not currently supported, as detailed in issue #4413.
Solution: every linkerd subcomponent's ServiceAccount is attached with imagePullSecrets,
which are then used to pull the docker images from authenticated private registries.
The imagePullSecret is configured in the global.imagePullSecret parameter of values.yaml like:
```
imagePullSecret:
- name: <name-of-private-registry-secret-resource>
```
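A minimal Go sketch of the attachment mechanism, assuming client-go types (the helper is illustrative, not the actual implementation):
```
import corev1 "k8s.io/api/core/v1"

// attachPullSecret appends the configured registry secret to a
// ServiceAccount's imagePullSecrets.
func attachPullSecret(sa *corev1.ServiceAccount, secretName string) {
	sa.ImagePullSecrets = append(sa.ImagePullSecrets,
		corev1.LocalObjectReference{Name: secretName})
}
```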
Fixes #4413
Signed-off-by: Nilakhya <nilakhya@hotmail.com>
* CNI: Use skip ports configuration in CNI
This PR updates the install and `cmdAdd` workflow (which is called
for each new Pod creation) to retrieve and set the configured Skip
Ports. This also updates the `cmdAdd` workflow to check if the new
Pod is a control plane Pod, and if so adds `443` to the outbound skip
ports so that 443 (used with the k8s API) is skipped; it was causing
errors because an unintended resolve lookup was happening for those
connections.
Currently, this field has to be configured to make CNI work in
GKE clusters, as that is where the binaries have to be stored. This
was configurable through Helm, but the same can now be done through
the CLI too.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* tracing: Move default values into chart
This branch moves the tracing add-on's default values into its own chart's values.yaml
(just like grafana and prometheus). This prevents them from being saved into
`linkerd-config-addons`, where only the overridden values are stored, thus allowing
us to change the defaults.
This also
- Updates the check command to fall back to default values, if there are no
overridden name fields.
- Updates jaeger to `1.19.2`
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Push docker images to ghcr.io instead of gcr.io
The `cloud_integration.yml` and `release.yml` workflows were modified to
log into ghcr.io, and remove the `Configure gcloud` step which is no
longer necessary.
Note that besides the changes to cloud_integration.yml and release.yml, there was a change to the upgrade-stable integration test: we now run `linkerd upgrade --addon-overwrite` to reset the add-on settings, because in stable-2.8.1 the Grafana image was pegged to gcr.io/linkerd-io/grafana in linkerd-config-addons. This will need to be mentioned in the 2.9 upgrade notes.
Also the egress integration test has a debug container that now is pegged to the edge-20.9.2 tag.
Besides that, the other changes are just a global search and replace (s/gcr.io\/linkerd-io/ghcr.io\/linkerd/).
The proxy performs endpoint discovery for unnamed services, but not
service profiles.
The destination controller and proxy have been updated to support
lookups for unnamed services in linkerd/linkerd2#4727 and
linkerd/linkerd2-proxy#626, respectively.
This change modifies the injection template so that the
`proxy.destinationGetNetworks` configuration enables profile
discovery for all networks on which endpoint discovery is permitted.
Updating only the Go version to 1.15 makes upgrades from older versions fail,
as the identity certs do not have the setting that Go 1.15 expects
(Go 1.15 requires certificates to carry SANs rather than relying on CommonName).
This PR upgrades the cert generation code to include that field,
allowing us to move to Go 1.15 in later versions of Linkerd.
Fixes#4790
This PR removes the SMI-Metrics templates along with the
experimental sub-commands. This also removes the `smi-metrics` package
as there is no direct use for it without the commands.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
## What/How
@adleong pointed out in #4780 that when enabling slices during an upgrade, the new value does not persist in the `linkerd-config` ConfigMap. I took a closer look and it seems that we were never overwriting the values in case they were different.
* To fix this, I added an if block when validating and building the upgrade options -- if the current flag value differs from what we have in the ConfigMap, then change the ConfigMap value.
* When doing so, I made sure to check that if the cluster does not support `EndpointSlices` yet the flag is set to true, we will error out. This is done similarly (mostly copy & paste) to what's in the install part.
* Additionally, I have noticed that the helm ConfigMap template stored the flag value under `enableEndpointSlices` field name. I assume this was not changed in the initial PR to reflect the changes made in the protocol buffer. The API (and thus the CLI) uses the field name `endpointSliceEnabled` instead. I have changed the config template so that helm installations will use the same field, which can then be used in the destination service or other components that may implement slice support in the future.
Signed-off-by: Matei David <matei.david.35@gmail.com>
[Link to RFC](https://github.com/linkerd/rfc/pull/23)
### What
---
* PR that puts together all past pieces of the puzzle to deliver topology-aware service routing, as specified in the [Kubernetes docs](https://kubernetes.io/docs/concepts/services-networking/service-topology/) but with a much better load balancing algorithm and all the coolness of linkerd :)
* The first piece of this PR is focused on adding topology metadata: topology preference for services and topology `<k,v>` pairs for endpoints.
* The second piece of this PR puts together the new context format and fetching the source node topology metadata in order to allow for endpoints filtering.
* The final part is doing the filtering -- passing all of the metadata to the listener and on every `Add` filtering endpoints based on the topology preference of the service, topology `<k,v>` pairs of endpoints and topology of the source (again `<k,v>` pairs).
### How
---
* **Collecting metadata**:
- Services do not have values for topology keys -- the topological keys defined in a service's spec are only there to dictate locality preference for routing. As such, I decided to store them in an array; they will be taken exactly as they are found in the service spec, which ensures we respect the preference order.
- For EndpointSlices, we are using a map -- an EndpointSlice has locality information in the form of a `<k,v>` pair, where the key is a topological key (similar to what's listed in the service) and the value is the locality information -- e.g. `hostname: minikube`. For each address we now have a map of topology values which gets populated when we translate the endpoints to an address set. Because normal Endpoints do not have any topology information, we create each address with an empty map which is subsequently populated ONLY for slices in the `endpointSliceToAddressSet` function.
* **Filtering endpoints**:
- This was a tricky part and filled me with doubts. I think there are a few ways to do this, but this is how I "envisioned" it. First, the `endpoint_translator.go` should be the one to do the filtering; this means that on subscription, we need to feed all of the relevant metadata to the listener. To do this, I created a new function `AddTopologyFilter` as part of the listener interface.
- To complement the `AddTopologyFilter` function, I created a new `TopologyFilter` struct in `endpoints_watcher.go`. I then embedded this structure in all listeners that implement the interface. The structure holds the source topology (source node), a boolean to tell if slices are activated in case we need to double check (or write tests for the function) and the service preference. We create the filter on Subscription -- we have access to the k8s client here as well as the service, so it's the best point to collect all of this data together. Addresses all have their own topology added to them so they do not have to be collected by the filter.
- When we add a new set of addresses, we check to see if slices are enabled -- chances are if slices are enabled, service topology might be too. This lets us skip this step if the latest version is not adopted. Prior to sending an `Add` we filter the endpoints -- if the preference is registered by the filter we strictly enforce it, otherwise nothing changes.
And that's pretty much it.
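For concreteness, a self-contained sketch of the filtering step, with assumed types and names (this is not the actual `endpoint_translator.go` code):
```
// address is a stand-in for Linkerd's address type; Topology holds the
// <k,v> pairs collected from the EndpointSlice, e.g.
// "kubernetes.io/hostname": "minikube".
type address struct {
	IP       string
	Topology map[string]string
}

// filterByTopology walks the service's topological preferences in order and
// keeps the first non-empty set of endpoints whose value for a key matches
// the source node's value for that key.
func filterByTopology(prefs []string, node map[string]string, addrs []address) []address {
	for _, key := range prefs {
		if key == "*" { // wildcard preference: accept any endpoint
			return addrs
		}
		var matched []address
		for _, a := range addrs {
			if a.Topology[key] == node[key] {
				matched = append(matched, a)
			}
		}
		if len(matched) > 0 {
			return matched
		}
	}
	return nil // preferences were set but none matched: strictly enforced
}
```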
Signed-off-by: Matei David <matei.david.35@gmail.com>
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).
The misspellings have been reported at aaf440489e (commitcomment-41423663)
The action reports that the changes in this PR would make it happy: 5b82c6c5ca
Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Fixes #4708
Adds a `linkerd multicluster uninstall` command which outputs the manifests required to uninstall the multicluster components. This command first checks that no links exist and advises that any links must be removed with `linkerd multicluster unlink` before proceeding. Typical usage is:
```
linkerd multicluster uninstall | kubectl delete -f -
```
Signed-off-by: Alex Leong <alex@buoyant.io>
When the Link CRD does not exist, multicluster checks in `linkerd check` will be skipped. The `--multicluster` flag is intended to force these checks on, but was being ignored.
We update the options to force the multicluster checks on when the `--multicluster` flag is used, as intended.
Now when `linkerd check --multicluster` is run on a cluster without the multicluster support installed, it gives the following output:
```
linkerd-multicluster
--------------------
× Link CRD exists
multicluster.linkerd.io/Link CRD is missing: the server could not find the requested resource
see https://linkerd.io/checks/#l5d-multicluster-link-crd-exists for hints
Status check results are ×
```
Signed-off-by: Alex Leong <alex@buoyant.io>
Supersedes #4846
Bump proxy-init to v1.3.6, containing CNI fixes and support for
multi-arch builds.
#4846 included this in v1.3.5 but proxy.golang.org refused to update the
modified SHA
The upgrade tests were failing due to hardcoded certificates which had expired. Additionally, these tests contained large swaths of yaml that made it very difficult to understand the semantics of each test case and even more difficult to maintain.
We greatly improve the readability and maintainability of these tests by using a slightly different approach. Each test follows this basic structure:
* Render an install manifest
* Initialize a fake k8s client with the install manifest (and sometimes additional manifests)
* Render an upgrade manifest
* Parse the manifests as yaml tree structures
* Perform a structured diff on the yaml tree structures and look for expected and unexpected differences
The install manifests are generated dynamically using the regular install flow. This means that we no longer need large sections of hardcoded yaml in the tests themselves. Additionally, we now assess the output by doing a structured diff against the install manifest. This means that we no longer need golden files with explicit expected output.
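A condensed sketch of that structure, with hypothetical helper names (`renderInstall`, `renderUpgrade`) standing in for the real rendering flow:
```
import (
	"reflect"
	"testing"

	"sigs.k8s.io/yaml"
)

func TestUpgradeMatchesInstall(t *testing.T) {
	install := renderInstall(t)          // hypothetical: render install manifest
	upgrade := renderUpgrade(t, install) // hypothetical: seed fake client, render upgrade

	// Parse both manifests as YAML trees and diff them structurally
	// (a real implementation would handle multi-document manifests and
	// distinguish expected from unexpected differences).
	var a, b map[string]interface{}
	if err := yaml.Unmarshal([]byte(install), &a); err != nil {
		t.Fatal(err)
	}
	if err := yaml.Unmarshal([]byte(upgrade), &b); err != nil {
		t.Fatal(err)
	}
	if !reflect.DeepEqual(a, b) {
		t.Error("unexpected differences between install and upgrade manifests")
	}
}
```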
All test cases were preserved except for the following:
* Any test cases related to multiphase install (config/control plane) were not replicated. This flow doesn't follow the same pattern as the tests above because the install and upgrade manifests are not expected to be the same or similar. I also felt that these tests were lower priority because the multiphase install/upgrade feature does not seem to be very popular and is a potential candidate for deprecation.
* Any tests involving upgrading from a very old config were not replicated. The code to generate these old style configs is no longer present in the codebase so in order to test this case, we would need to resort to hardcoded install manifests. These tests also seemed low priority to me because Linkerd versions that used the old config are now over 1 year old so it may no longer be critical that we support upgrading from them. We generally recommend that users upgrading from an old version of Linkerd do so by upgrading through each major version rather than directly to the latest.
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR moves default values into add-on specific values.yaml files, thus
allowing us to update default values, as they will no longer be present in
the linkerd-config-addons ConfigMap.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Some installations upgrading from versions prior to 2.7.x may be missing the debug image name and version. This fix ensures that the default values are in place for this scenario and additionally upgrades the version of the debug image along with the control plane version.
Signed-off-by: Paul Balogh <javaducky@gmail.com>
Fixes #4707
In order to remove a multicluster link, we add a `linkerd multicluster unlink` command which produces the yaml necessary to delete all of the resources associated with a `linkerd multicluster link`. These are:
* the link resource
* the service mirror controller deployment
* the service mirror controller's RBAC
* the probe gateway mirror for this link
* all mirror services for this link
This command follows the same pattern as the `linkerd uninstall` command in that its output is expected to be piped to `kubectl delete`. The typical usage of this command is:
```
linkerd --context=source multicluster unlink --cluster-name=foo | kubectl --context=source delete -f -
```
This change also fixes the shutdown lifecycle of the service mirror controller by properly having it listen for the shutdown signal and exit its main loop.
A few alternative designs were considered:
I investigated using owner references as suggested [here](https://github.com/linkerd/linkerd2/issues/4707#issuecomment-653494591) but it turns out that owner references must refer to resources in the same namespace (or to cluster scoped resources). This was not feasible here because a service mirror controller can create mirror services in many different namespaces.
I also considered having the service mirror controller delete the mirror services that it created during its own shutdown. However, this could lead to scenarios where the controller is killed before it finishes deleting the services that it created. It seemed more reliable to have all the deletions happen from `kubectl delete`. Since this is the case, we avoid having the service mirror controller delete mirror services, even when the link is deleted, to avoid the race condition where the controller and CLI both attempt to delete the same mirror services and one of them fails with a potentially alarming error message.
Signed-off-by: Alex Leong <alex@buoyant.io>
* bump prometheus to the latest v2.19.3
The latest prometheus version shows a large decrease in memory usage,
among other benefits.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* CNI add support for priorityClassName
As requested in #2981, one should be able to optionally define a priorityClassName for the linkerd2 pods.
With this commit, support for priorityClassName is added to the CNI plugin helm chart as well as to the
cli command for installing the CNI plugin.
Also added an `installNamespace` Helm option for the CNI installation.
Implements part of #2981.
Signed-off-by: alex.berger@nexiot.ch <alex.berger@nexiot.ch>
This PR adds a `global.prometheusUrl` field which will be used to configure the public-api, heartbeat, grafana, etc. (i.e. the query path) to use an external Prometheus.
* support overriding inbound and outbound connect timeouts.
* add validation on user provided TCP connect timeouts
* convert valid time values into ms
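A minimal sketch of the validation and millisecond conversion, assuming a string-valued flag (names are illustrative, not the actual code):
```
import (
	"fmt"
	"time"
)

// connectTimeoutMs validates a user-supplied connect timeout and converts it
// to a millisecond string suitable for the proxy configuration.
func connectTimeoutMs(raw string) (string, error) {
	d, err := time.ParseDuration(raw)
	if err != nil {
		return "", fmt.Errorf("invalid connect timeout %q: %w", raw, err)
	}
	if d <= 0 {
		return "", fmt.Errorf("connect timeout must be positive, got %q", raw)
	}
	return fmt.Sprintf("%dms", d.Milliseconds()), nil
}
```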
Signed-off-by: Matt Miller <mamiller@rosettastone.com>
* Add sidecar container support for linkerd-prometheus
Adds a new setting to the Prometheus Helm config, allowing any kind of sidecar containers to be added alongside the main container.
The specific use case that inspired this was for exporting data from Prometheus to external systems (e.g. cloudwatch, stackdriver, datadog) using a process that watches the prometheus write-ahead log (WAL).
Signed-off-by: Nathan J. Mehl <n@oden.io>
Add a new structure on the destination controller side to keep track of contextual information.
The token format has been changed from ns:<namespace> to a JSON format so that more variables can be
encoded in the token. As part of this PR, a new field 'nodeName' has been added to help with service
topologies.
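As an illustration, the token might look like the following hedged sketch (field names follow the description above, but the struct itself is an assumption):
```
import "encoding/json"

// contextToken is an assumed representation of the JSON token; it replaces
// the older "ns:<namespace>" string format.
type contextToken struct {
	Ns       string `json:"ns"`
	NodeName string `json:"nodeName"`
}

func newToken(ns, node string) (string, error) {
	b, err := json.Marshal(contextToken{Ns: ns, NodeName: node})
	return string(b), err
}
```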
Fixes #4498
Signed-off-by: Matei David <matei.david.35@gmail.com>
This PR moves the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in https://github.com/linkerd/rfc/pull/31. For fuller context, please see that RFC.
Basic multicluster functionality works here including:
* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveness, and probe latencies.
* The `linkerd check` multicluster checks have been updated for the new architecture. Several checks have been rendered obsolete by the new architecture and have been removed.
The following are known issues requiring further work:
* the service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror. it does not yet support configuring a label selector.
* an unlink command is needed for removing multicluster links: see https://github.com/linkerd/linkerd2/issues/4707
* an mc uninstall command is needed for uninstalling the multicluster addon: see https://github.com/linkerd/linkerd2/issues/4708
Signed-off-by: Alex Leong <alex@buoyant.io>
EndpointSlices have been made opt-in due to their experimental nature. This PR
introduces a new install flag 'enableEndpointSlices' that will allow adopters to
specify in their cli install or helm install step whether they would like to
use endpointslices as a resource in the destination service, instead of the
endpoints k8s resource.
Signed-off-by: Matei David <matei.david.35@gmail.com>
This moves Prometheus to an add-on, making it optional but enabled by default. This also makes `linkerd-prometheus` more configurable and allows it to have its own life-cycle for upgrades, configuration, etc.
This work will be followed by documentation that helps users configure an existing Prometheus to work with Linkerd.
**Changes Include:**
- moving prometheus manifests into a separate chart at `charts/add-ons/prometheus`, and adding it as a dependency to `linkerd2`
- implement the `addOn` interface to support the same with CLI.
- include configuration in `linkerd-config-addons`
**User Facing Changes:**
The default install experience does not change much, but users who have already configured Prometheus differently will need to apply the same configuration using the new fields documented in the chart README.
The splitStringListToPorts helm function currently incorrectly formats a list of ports as an array of Port objects that look like {"port" : 555}. The config map protobuf representation, however, expects the ignoreOutboundPorts and ignoreInboundPorts fields to be a list of PortRange objects ({"portRange" : 555}).
This was causing the injector to return an empty string when trying to parse a PortRange object, resulting in the ports not getting set correctly when injecting workloads. Note that this happens only with helm installations, as that is when we actually use a helm template to output the config map.
To fix that, the splitStringListToPorts helm function is changed to format the objects as the json representation of PortRange and is renamed to splitStringListToPortRanges.
Fix: #4679
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* update helm render tests to read child charts values.yaml
By default, Helm installation considers the values.yaml of dependent charts
and uses them in rendering. This behavior is used for add-ons to
keep the default template values, allowing them to be further overridden from the
parent chart's (i.e. linkerd2) values.yaml or --addon-config through the CLI.
This PR updates the Helm tests to do the same, i.e. consider the
values.yaml of chart dependencies if present.
This does not have any UX changes but helps with the follow up
add-on related work.
The following command was used to find the wrong spellings, which were
then fixed:
```
codespell --skip CHANGES.md,.git,go.sum,\
controller/cmd/service-mirror/events_formatting.go,\
controller/cmd/service-mirror/cluster_watcher_test_util.go,\
SECURITY_AUDIT.pdf,.gcp.json.enc,web/app/img/favicon.png \
--ignore-words-list=aks,uint,ans,files\' --check-filenames \
--check-hidden
```
Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
Based on the [EndpointSlice PR](https://github.com/linkerd/linkerd2/pull/4663), this is just the k8s/api support for endpointslices to shorten the first PR.
* Adds CRD
* Adds functions that check whether the cluster has EndpointSlice access
* Adds discovery & endpointslice informers to api.
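A hedged sketch of such an access check (the helper name is an assumption; discovery.k8s.io/v1beta1 was the EndpointSlice group/version current at the time of this PR):
```
import "k8s.io/client-go/kubernetes"

// hasEndpointSlices reports whether the API server serves the EndpointSlice
// resource.
func hasEndpointSlices(k kubernetes.Interface) bool {
	res, err := k.Discovery().ServerResourcesForGroupVersion("discovery.k8s.io/v1beta1")
	if err != nil {
		return false
	}
	for _, r := range res.APIResources {
		if r.Kind == "EndpointSlice" {
			return true
		}
	}
	return false
}
```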
Signed-off-by: Matei David <matei.david.35@gmail.com>
* feat: add log format annotation and helm value
JSON log formatting was added via https://github.com/linkerd/linkerd2-proxy/pull/500,
but wiring the option through as an annotation/helm value is still
necessary.
This PR adds the annotation and helm value to configure the log format.
Closes #2491
Signed-off-by: Naseem <naseem@transit.app>
Data disappears upon prometheus restarts due to it being all in-memory.
Adding an option to enable persistence by means of a PVC is the right approach; it is commonly seen in a wide array of helm charts.
Fixes #4576
Signed-off-by: Naseem <naseem@transit.app>
Regenerated protobuf files, using version 1.4.2 that was upgraded from
1.3.2 with the proxy-api update in #4614.
As of v1.4, protobuf messages may not be copied (because they
hold a mutex), so whenever a message is passed to or returned from a
function we need to use a pointer.
This affects _mostly_ test files.
This is required to unblock #4620 which is adding a field to the config
protobuf.
* Update inject to error out on failure
Update the injection process to throw an error when the reason for failure is due to an existing sidecar, UDP ports, automountServiceAccountToken, or hostNetwork
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
In #4585 we are observing an issue where a loop is encountered when using nginx ingress. The problem is that the outbound proxy does a dst lookup on the IP address which happens to be the very same address the ingress is listening on.
In order to avoid situations like that, this PR introduces a way to modify the set of networks for which the proxy shall do IP based discovery. The change introduces a helm flag `.Values.global.proxy.destinationGetNetworks` that can be used to modify this value. There are two ways a user can affect this setting:
- setting the `destinationGetNetworks` field in values during a Helm install, which changes the default on all injected pods
- using the annotation `config.linkerd.io/proxy-destination-get-networks` on injected workloads to override this value
Note that this setting cannot be tweaked through the `install` or `inject` commands.
Fix: #4585
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Fixes #4606
This has not worked as far back as stable-2.6.0.
## Solution
The recommended upgrade process is to include `--prune` as part of `kubectl apply ..`:
```bash
$ linkerd upgrade | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
```
This is an issue for multi-stage upgrade because `linkerd upgrade config` does
not include the `linkerd-config` ConfigMap in its output.
`kubectl apply --prune ..` will then prune this resource because it matches the
label selector *and* is not in the above output.
The issue occurs when `linkerd upgrade control-plane` is run and expects to find
the ConfigMap that was just pruned.
This can be fixed by not suggesting to prune resources as part of the
multi-stage upgrade.
## Considered
Including `templates/config.yaml` in the install output regardless of the stage.
Instead of it being a template only used in `control-plane` stage in
[render](4aa3ca7f87/cli/cmd/install.go (L873-L886)), it could always be rendered.
This just exposes other things that are pruned in the process:
```bash
❯ bin/linkerd upgrade control-plane |kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
× Failed to build upgrade configuration: secrets "linkerd-identity-issuer" not found
For troubleshooting help, visit: https://linkerd.io/upgrade/#troubleshooting
error: no objects passed to apply
```
Ultimately, resources that are part of the `control-plane` stage need to remain, and that
will not happen if we prune all resources not in the `config` stage output.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
As reported in #4259, linkerd check run from linkerd's web console is
broken, as the underlying RBAC Role cannot access the apiregistration.k8s.io API Group.
With this commit the RBAC Role is fixed, allowing read-only access to the API Group
apiregistration.k8s.io.
Fixes #4259
Signed-off-by: alex.berger@nexiot.ch <alex.berger@nexiot.ch>
Put back space after `grafanaUrl` label in `linkerd-config-addons.yaml`
to avoid breaking the yaml parsing.
```
$ linkerd check
...
linkerd-addons
--------------
‼ 'linkerd-config-addons' config map exists
could not unmarshal linkerd-config-addons config-map: error
unmarshaling JSON: while decoding JSON: json: cannot unmarshal
string into Go struct field Values.global of type linkerd2.Global
```
This was added in #4544 to avoid the configmap being badly formatted.
So this PR fixes the yaml, but then if we don't set `grafanaUrl` the
configmap format gets messed up, but apparently that's just a cosmetic
problem:
```
apiVersion: v1
data:
values: "global:\n grafanaUrl: \ngrafana:\n enabled: true\n
image:\n name:
gcr.io/linkerd-io/grafana\n name: linkerd-grafana\n resources:\n
cpu:\n limit:
240m\n memory:\n limit: null\ntracing:\n enabled:
false"
kind: ConfigMap
```
Fixes #4541
This PR adds the following checks
- if a mirrored service has endpoints. (This includes gateway mirrors too).
- if an exported service is referencing a gateway that does not exist.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Alex Leong <alex@buoyant.io>
* Add namespace global flag to hold default namespace name (#4469)
Signed-off-by: Matei David <matei.david.35@gmail.com>
* Change name of controlplane install namespace constant and init point for kubeNamespace
Signed-off-by: Matei David <matei.david.35@gmail.com>
Problem
When updating or writing tests with complex data, e.g. the certificates, the built-in diff is not as powerful as a dedicated external tool.
Solution
Dump all resource specifications created as part of failing tests to a supplied folder for external analysis.
Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>
Fixes #4454
As explained
[here](https://github.com/kubernetes/kubernetes/issues/36222#issuecomment-553966166),
trailing spaces in configmap data make it look funky when retrieved
later on. This is currently affecting `linkerd-config-addons` and
`linkerd-gateway-config`:
```
$ k -n linkerd-multicluster get cm linkerd-gateway-config -oyaml
apiVersion: v1
data:
nginx.conf: "events {\n}\nstream { \n
\ server { \n
\ listen 4180; \n
\ proxy_pass 127.0.0.1:4140; \n
\ } \n}
\nhttp {\n server {\n listen 4181;\n location /health {\n access_log
off;\n return 200 \"healthy\\n\";\n }\n }\n server {\n listen
\ 8888;\n location /health-local {\n access_log off;\n return
200 \"healthy\\n\";\n }\n } \n}"
kind: ConfigMap
```
AFAIK this is only cosmetic and doesn't affect functionality.
* Fixes #4305
Fixed SP route for `POST /api/v1/query`:
```
$ bin/linkerd routes -n linkerd deploy/linkerd-prometheus
ROUTE SERVICE SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
GET /api/v1/query_range linkerd-prometheus 100.00% 3.9rps 1ms 2ms 2ms
GET /api/v1/series linkerd-prometheus 100.00% 1.1rps 1ms 1ms 1ms
POST /api/v1/query linkerd-prometheus 100.00% 3.1rps 1ms 17ms 19ms
[DEFAULT] linkerd-prometheus - - - - -
```
Also added one missing route for `linkerd-grafana`, realizing afterwards there are
many other ones missing, but not really worth adding them all.
I also removed the routes in `linkerd-controller` for the tap routes
given that's no longer handled in that service.
And the tap service SP was also removed altogether since nothing was
getting reported.
Change terminology from local/remote to source/target in `multicluster` CLI help
text.
This does not change any variable, function, struct, or field names since
testing is still improving.
Relevant issue: #4480
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
There are a few notable things happening in this PR:
- the probe manager has been decoupled from the cluster_watcher. Now its only responsibility is to watch for mirrored gateways being created and to probe them. This means that probes are initiated for all gateways, no matter whether there are mirrored services being paired
- the number of paired services is derived from the existing services in the cluster rather than being published as a metric by the prober
- there are no events being exchanged between the cluster watcher and the probe manager
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Failures in `bin/_test-run` from commands other than `go test`
aren't currently properly reported, in part because CI's bash default is
to have `set -e` which terminates the script and just outputs
`##[error]Process completed with exit code 2.` like
[here](https://github.com/linkerd/linkerd2/pull/4496/checks?check_run_id=720720352#step:14:116)
```
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
× no unschedulable pods
linkerd-controller-6c77c7ffb8-w8wh5: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-destination-6767d88f7f-rcnbq: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-grafana-76c76fcfb9-pdhfb: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-identity-5bcf97d6c8-q6rll: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-prometheus-6b95c56b44-hd9m6: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-proxy-injector-58d794ff9-jf7cj: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-sp-validator-6c5f999bfb-qg252: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-tap-6fdf84fc65-6txvr: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-web-8484fbd867-nm8z2: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
see https://linkerd.io/checks/#l5d-existence-unschedulable-pods for hints
Status check results are ×
[error]Process completed with exit code 2.
```
I've made the following changes to `bin/_test-run` to generate better
messages and Github annotations when an error occurs:
- Unset `set -e` so that errors don't immediately exit the script and
don't allow us to properly format the errors.
- Removed many of the `exit_on_err` calls after go test calls because
those output enough information already (they were not being used
anyways in CI because of `set -e`). And instead have `run_test` exit
upon a `go test` error.
- Added `exit_on_err` calls right after non-`go-test` commands to
properly report their failure.
- Refactored the `exit_on_err` function so that it generates a Github
error annotation upon failure.
- Removed `trap` in `install_stable`, since the OS should be able to
handle GC for stuff under `/tmp`.
Also, I've changed the exit code from `linkerd check` when it fails
from 2 to 1.
Quoting the list of directories passed to `goimports` was causing the list to be interpreted as a single argument which was stopping `bin/fmt` from working.
Instead, use `read` to split the list of directories into an array.
Also fix up incorrect formatting that has crept in while `bin/fmt` has been broken.
Signed-off-by: Alex Leong <alex@buoyant.io>
This change adds `allow` and `link` commands, effectively enabling a cluster to have more than one set of credentials that allow it to be mirrored.
Fix #4461
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
This is @psinghal20's change from #4462, which is currently failing CI.
Fixes #4456
Description from the original PR:
> This pr renames the `cluster` command in CLI to `multicluster` command. It
> also adds a shorthand `mc` for easy use.
>
> Fixes#4456
>
> Signed-off-by: psinghal20 <psinghal20@gmail.com>
The CI failure doesn't seem to be related to this change, but has only been seen
on forks. Opening this from a non-fork for now to continue investigating.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Co-authored-by: psinghal20 <psinghal20@gmail.com>
Added comments to document several methods and structs in the cmd package, based on GoDoc guidelines. Focused on the alpha cli command.
Signed-off-by: arthursens <arthursens2005@gmail.com>
Depends on https://github.com/linkerd/linkerd2-proxy-init/pull/10
Fixes #4276
We add a `--close-wait-timeout` inject flag which configures the proxy-init container to run with `privileged: true` and to set `nf_conntrack_tcp_timeout_close_wait`.
Signed-off-by: Alex Leong <alex@buoyant.io>
When viewing the output of `linkerd stat` for services which do not have a selector (such as services created by the service-mirror, for example) the meshed count column shows the total number which exist, even though the service actually selects no pods at all.
We update the StatSummary implementation to account for services which have no selector.
Additionally, we update the logic of the `--unmeshed` flag. When the `--unmeshed` flag is not set, we typically skip rows for unmeshed resources because those resources would have no stats. This is not appropriate to do when the `--from` flag is also set because in this case, metrics are not collected on the target resource but are instead collected on the client-side. This means that stats can be present, even for unmeshed resources and these resources should still be displayed, even if the `--unmeshed` flag is not set.
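A minimal sketch of the selector check described above, with an assumed counting callback standing in for the real StatSummary lookup:
```
import corev1 "k8s.io/api/core/v1"

// meshedCount returns zero for selector-less services, since they select no
// pods at all; countMeshedPods is an assumed stand-in for the real lookup.
func meshedCount(svc *corev1.Service, countMeshedPods func(selector map[string]string) int) int {
	if len(svc.Spec.Selector) == 0 {
		return 0
	}
	return countMeshedPods(svc.Spec.Selector)
}
```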
Signed-off-by: Alex Leong <alex@buoyant.io>
This change creates a gateway proxy for every gateway. This enables the probe worker to leverage the destination service functionality in order to discover the identity of the gateway.
Fix#4411
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This PR introduces a few changes that were requested after a bit of service mirror reviewing.
- we restrict the RBACs so the service mirror controller cannot read secrets in all namespaces but only in the one that it is installed in
- we unify the namespace namings so all multicluster resources are installed in `linkerd-multicluster` on both clusters
- fixed checks to account for changes
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Certain install flags are intended to help with Linkerd development and generally are not useful (and are potentially confusing) to users.
We hide these flags in release (edge or stable) builds of the CLI but show them in all other builds. The list of affected flags is:
* control-plane-version
* proxy-image
* proxy-version
* image-pull-policy
* init-image
* init-image-version
Signed-off-by: Alex Leong <alex@buoyant.io>
When using cli commands that work on namespaced resources in the cluster, the default namespace used by the cli is hardcoded to the default Kubernetes namespace (i.e. 'default'). This update allows cli commands that operate on namespaced resources to automatically infer the name of the default namespace by taking the relevant default from the currently used kubeconfig context. In short, this allows the omission of the -n flag in commands such as linkerd metrics, when working with resources that belong to a namespace that is set as default in the currently active context.
Validation was done manually by setting the default namespace of the currently used context, as well as through two integration tests that target the tap and get command respectively.
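A sketch of the inference using client-go's clientcmd (the helper name is illustrative, not the actual CLI code):
```
import "k8s.io/client-go/tools/clientcmd"

// defaultNamespace resolves the namespace of the currently active kubeconfig
// context, falling back to "default" when none is set.
func defaultNamespace(kubeconfigPath string) (string, error) {
	rules := clientcmd.NewDefaultClientConfigLoadingRules()
	rules.ExplicitPath = kubeconfigPath
	cfg := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		rules, &clientcmd.ConfigOverrides{})
	ns, _, err := cfg.Namespace()
	return ns, err
}
```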
Signed-off-by: Matei David <matei.david.35@gmail.com>