Fixes #4790
This PR removes both the SMI-Metrics templates and the experimental
sub-commands. It also removes the `smi-metrics` pkg, as there is no
direct use for it without the commands.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
## What/How
@adleong pointed out in #4780 that when enabling slices during an upgrade, the new value does not persist in the `linkerd-config` ConfigMap. I took a closer look and it seems that we were never overwriting the values in case they were different.
* To fix this, I added an if block when validating and building the upgrade options -- if the current flag value differs from what we have in the ConfigMap, then change the ConfigMap value.
* When doing so, I made sure to check that if the cluster does not support `EndpointSlices` but the flag is set to true, we error out. This is done similarly (largely copy & paste) to what's done in the install path.
* Additionally, I have noticed that the helm ConfigMap template stored the flag value under `enableEndpointSlices` field name. I assume this was not changed in the initial PR to reflect the changes made in the protocol buffer. The API (and thus the CLI) uses the field name `endpointSliceEnabled` instead. I have changed the config template so that helm installations will use the same field, which can then be used in the destination service or other components that may implement slice support in the future.
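For illustration, a minimal Go sketch of the flag-vs-ConfigMap reconciliation described in the bullets above (the `reconcileEndpointSlices` helper and its signature are hypothetical, not the actual upgrade code):
```
package main

import (
	"errors"
	"fmt"
)

// reconcileEndpointSlices is a sketch only. flagEnabled is the value passed
// on the CLI, configEnabled is the value stored in the linkerd-config
// ConfigMap, and clusterSupports reflects a discovery check.
func reconcileEndpointSlices(flagEnabled, configEnabled, clusterSupports bool) (bool, error) {
	if flagEnabled && !clusterSupports {
		// Mirror the install-time validation: refuse to enable slices on
		// clusters that do not serve the EndpointSlice API.
		return false, errors.New("EndpointSlices are not supported by this cluster")
	}
	if flagEnabled != configEnabled {
		// Overwrite the stored value so the change persists after the upgrade.
		return flagEnabled, nil
	}
	return configEnabled, nil
}

func main() {
	v, err := reconcileEndpointSlices(true, false, true)
	fmt.Println(v, err) // true <nil>
}
```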
Signed-off-by: Matei David <matei.david.35@gmail.com>
[Link to RFC](https://github.com/linkerd/rfc/pull/23)
### What
---
* PR that puts together all past pieces of the puzzle to deliver topology-aware service routing, as specified in the [Kubernetes docs](https://kubernetes.io/docs/concepts/services-networking/service-topology/) but with a much better load balancing algorithm and all the coolness of linkerd :)
* The first piece of this PR is focused on adding topology metadata: topology preference for services and topology `<k,v>` pairs for endpoints.
* The second piece of this PR puts together the new context format and fetching the source node topology metadata in order to allow for endpoints filtering.
* The final part is doing the filtering -- passing all of the metadata to the listener and on every `Add` filtering endpoints based on the topology preference of the service, topology `<k,v>` pairs of endpoints and topology of the source (again `<k,v>` pairs).
### How
---
* **Collecting metadata**:
- Services do not have values for topology keys -- the topological keys defined in a service's spec are only there to dictate locality preference for routing. As such, I decided to store them in an array; they are taken exactly as they are found in the service spec, which ensures we respect the preference order.
- For EndpointSlices, we are using a map -- an EndpointSlice has locality information in the form of a `<k,v>` pair, where the key is a topological key (similar to what's listed in the service) and the value is the locality information -- e.g. `hostname: minikube`. For each address we now have a map of topology values which gets populated when we translate the endpoints to an address set. Because normal Endpoints do not have any topology information, we create each address with an empty map which is subsequently populated ONLY for slices in the `endpointSliceToAddressSet` function.
* **Filtering endpoints**:
- This was a tricky part and filled me with doubts. I think there are a few ways to do this, but this is how I "envisioned" it. First, the `endpoint_translator.go` should be the one to do the filtering; this means that on subscription, we need to feed all of the relevant metadata to the listener. To do this, I created a new function `AddTopologyFilter` as part of the listener interface.
- To complement the `AddTopologyFilter` function, I created a new `TopologyFilter` struct in `endpoints_watcher.go`. I then embedded this structure in all listeners that implement the interface. The structure holds the source topology (source node), a boolean to tell if slices are activated in case we need to double check (or write tests for the function) and the service preference. We create the filter on Subscription -- we have access to the k8s client here as well as the service, so it's the best point to collect all of this data together. Addresses all have their own topology added to them so they do not have to be collected by the filter.
- When we add a new set of addresses, we check to see if slices are enabled -- chances are if slices are enabled, service topology might be too. This lets us skip this step if the latest version is not adopted. Prior to sending an `Add` we filter the endpoints -- if the preference is registered by the filter we strictly enforce it, otherwise nothing changes.
And that's pretty much it.
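To make the filtering step concrete, here is a minimal, self-contained Go sketch of the idea (the `address` type and `filterByTopology` helper are illustrative assumptions, not the actual `endpoints_watcher.go` code):
```
package main

import "fmt"

// address pairs an endpoint IP with its topology labels,
// e.g. {"kubernetes.io/hostname": "minikube"}.
type address struct {
	ip       string
	topology map[string]string
}

// filterByTopology returns the addresses matching the first topological
// preference of the service that the source node can satisfy. The special
// key "*" matches everything; if no key matches, nothing is returned,
// mirroring the strict enforcement mentioned above.
func filterByTopology(addrs []address, preference []string, nodeTopology map[string]string) []address {
	for _, key := range preference {
		if key == "*" {
			return addrs
		}
		nodeValue, ok := nodeTopology[key]
		if !ok {
			continue
		}
		var matched []address
		for _, a := range addrs {
			if a.topology[key] == nodeValue {
				matched = append(matched, a)
			}
		}
		if len(matched) > 0 {
			return matched
		}
	}
	return nil
}

func main() {
	addrs := []address{
		{ip: "10.0.0.1", topology: map[string]string{"kubernetes.io/hostname": "node-a"}},
		{ip: "10.0.0.2", topology: map[string]string{"kubernetes.io/hostname": "node-b"}},
	}
	pref := []string{"kubernetes.io/hostname", "*"}
	node := map[string]string{"kubernetes.io/hostname": "node-a"}
	fmt.Println(filterByTopology(addrs, pref, node)) // only the node-a address
}
```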
Signed-off-by: Matei David <matei.david.35@gmail.com>
Fixes #4511
Add the `linkerd.io/control-plane-component: gateway` label to the multicluster gateway. Change the value of `linkerd.io/control-plane-component` from `linkerd-service-mirror` to `service-mirror` for the service mirror controller.
These changes are for consistency and should not result in any change in functionality.
Signed-off-by: Alex Leong <alex@buoyant.io>
Supersedes #4846
Bump proxy-init to v1.3.6, containing CNI fixes and support for
multi-arch builds.
#4846 included this in v1.3.5, but proxy.golang.org refused to update the
modified SHA.
The upgrade tests were failing due to hardcoded certificates which had expired. Additionally, these tests contained large swaths of yaml that made it very difficult to understand the semantics of each test case and even more difficult to maintain.
We greatly improve the readability and maintainability of these tests by using a slightly different approach. Each test follows this basic structure:
* Render an install manifest
* Initialize a fake k8s client with the install manifest (and sometimes additional manifests)
* Render an upgrade manifest
* Parse the manifests as yaml tree structures
* Perform a structured diff on the yaml tree structures and look for expected and unexpected differences
The install manifests are generated dynamically using the regular install flow. This means that we no longer need large sections of hardcoded yaml in the tests themselves. Additionally, we now assess the output by doing a structured diff against the install manifest. This means that we no longer need golden files with explicit expected output.
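As a rough illustration of the parse-and-compare approach, a simplified sketch of how a test might decode two rendered manifests into yaml trees and compare them (using `gopkg.in/yaml.v2` and `reflect.DeepEqual` here purely as an assumption; the real tests use their own structured diff helpers):
```
package main

import (
	"fmt"
	"io"
	"reflect"
	"strings"

	"gopkg.in/yaml.v2"
)

// parseManifest splits a multi-document manifest into parsed yaml trees.
func parseManifest(manifest string) ([]interface{}, error) {
	dec := yaml.NewDecoder(strings.NewReader(manifest))
	var docs []interface{}
	for {
		var doc interface{}
		if err := dec.Decode(&doc); err != nil {
			if err == io.EOF {
				break
			}
			return nil, err
		}
		docs = append(docs, doc)
	}
	return docs, nil
}

func main() {
	install := "apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: linkerd-config\n"
	upgrade := "apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: linkerd-config\n"
	a, _ := parseManifest(install)
	b, _ := parseManifest(upgrade)
	// A structured comparison replaces hardcoded golden files.
	fmt.Println(reflect.DeepEqual(a, b)) // true: no unexpected differences
}
```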
All test cases were preserved except for the following:
* Any test cases related to multiphase install (config/control plane) were not replicated. This flow doesn't follow the same pattern as the tests above because the install and upgrade manifests are not expected to be the same or similar. I also felt that these tests were lower priority because the multiphase install/upgrade feature does not seem to be very popular and is a potential candidate for deprecation.
* Any tests involving upgrading from a very old config were not replicated. The code to generate these old style configs is no longer present in the codebase so in order to test this case, we would need to resort to hardcoded install manifests. These tests also seemed low priority to me because Linkerd versions that used the old config are now over 1 year old so it may no longer be critical that we support upgrading from them. We generally recommend that users upgrading from an old version of Linkerd do so by upgrading through each major version rather than directly to the latest.
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR moves default values into the add-on specific values.yaml, thus
allowing us to update default values, as they will no longer be present
in the linkerd-config-addons cm.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* bump prometheus to the latest v2.19.3
The latest Prometheus version shows a significant decrease in memory
usage, among other benefits.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* CNI add support for priorityClassName
As requested in #2981, one should be able to optionally define a priorityClassName for the linkerd2 pods.
With this commit, support for priorityClassName is added to the CNI plugin Helm chart as well as to the
CLI command for installing the CNI plugin.
Also added an `installNamespace` Helm option for the CNI installation.
Implements part of #2981.
Signed-off-by: alex.berger@nexiot.ch <alex.berger@nexiot.ch>
This PR adds a `global.prometheusUrl` field which will be used to configure public-api, heartbeat, grafana, etc. (i.e. the query path) to use an external Prometheus.
* support overriding inbound and outbound connect timeouts.
* add validation on user provided TCP connect timeouts
* convert valid time values into ms
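A rough sketch of the validation and millisecond conversion described in the bullets above (the `validateTimeout` helper name is hypothetical, not the actual inject code):
```
package main

import (
	"fmt"
	"time"
)

// validateTimeout parses a user-provided duration such as "30s" or "250ms"
// and converts it to whole milliseconds, rejecting invalid or negative values.
func validateTimeout(v string) (string, error) {
	d, err := time.ParseDuration(v)
	if err != nil {
		return "", fmt.Errorf("invalid connect timeout %q: %w", v, err)
	}
	if d < 0 {
		return "", fmt.Errorf("connect timeout %q must not be negative", v)
	}
	return fmt.Sprintf("%dms", d.Milliseconds()), nil
}

func main() {
	ms, err := validateTimeout("1s")
	fmt.Println(ms, err) // 1000ms <nil>
}
```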
Signed-off-by: Matt Miller <mamiller@rosettastone.com>
* Add sidecar container support for linkerd-prometheus
Adds a new setting to the Prometheus Helm config, allowing any kind of sidecar containers to be added alongside the main container.
The specific use case that inspired this was for exporting data from Prometheus to external systems (e.g. cloudwatch, stackdriver, datadog) using a process that watches the prometheus write-ahead log (WAL).
Signed-off-by: Nathan J. Mehl <n@oden.io>
Add a new structure on the destination controller side to keep track of contextual information.
The token format has been changed from `ns:<namespace>` to a JSON format so that more variables can be
encoded in the token. As part of this PR, a new field 'nodeName' has been added to help with service
topologies.
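For illustration, a minimal sketch of what such a JSON token could look like (the struct and the `ns` field name are illustrative; only `nodeName` is named above):
```
package main

import (
	"encoding/json"
	"fmt"
)

// contextToken sketches the structured token: instead of the old
// "ns:<namespace>" string, the token carries JSON, including the source node.
type contextToken struct {
	Ns       string `json:"ns"`
	NodeName string `json:"nodeName"`
}

func main() {
	tok, _ := json.Marshal(contextToken{Ns: "emojivoto", NodeName: "node-a"})
	fmt.Println(string(tok)) // {"ns":"emojivoto","nodeName":"node-a"}
}
```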
Fixes #4498
Signed-off-by: Matei David <matei.david.35@gmail.com>
This PR moves the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in https://github.com/linkerd/rfc/pull/31. For fuller context, please see that RFC.
Basic multicluster functionality works here including:
* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveness, and probe latencies.
* The `linkerd check` multicluster checks have been updated for the new architecture. Several checks have been rendered obsolete by the new architecture and have been removed.
The following are known issues requiring further work:
* the service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror. It does not yet support configuring a label selector.
* an unlink command is needed for removing multicluster links: see https://github.com/linkerd/linkerd2/issues/4707
* an mc uninstall command is needed for uninstalling the multicluster addon: see https://github.com/linkerd/linkerd2/issues/4708
Signed-off-by: Alex Leong <alex@buoyant.io>
EndpointSlices have been made opt-in due to their experimental nature. This PR
introduces a new install flag 'enableEndpointSlices' that allows adopters to
specify, in their CLI or Helm install step, whether they would like the
destination service to use the EndpointSlice resource instead of the
Endpoints k8s resource.
Signed-off-by: Matei David <matei.david.35@gmail.com>
This moves Prometheus to an add-on, thus making it optional but enabled by default. This also makes `linkerd-prometheus` more configurable, and allows it to have its own life-cycle for upgrades, configuration, etc.
This work will be followed by documentation that helps users configure an existing Prometheus to work with Linkerd.
**Changes Include:**
- moving prometheus manifests into a separate chart at `charts/add-ons/prometheus`, and adding it as a dependency to `linkerd2`
- implementing the `addOn` interface to support the same via the CLI.
- including configuration in `linkerd-config-addons`
**User Facing Changes:**
The default install experience does not change much, but users who have already configured Prometheus differently will need to apply the same configuration using the new fields documented in the chart README.
The splitStringListToPorts helm function is currently incorrectly formatting a list of ports as an array of Port objects that look like {"port" : 555}. The config map protobuf representation, however, expects the ignoreOutboundPorts and ignoreInboundPorts fields to be a list of PortRange objects ({"portRange" : 555}).
This was causing the injector to return an empty string when trying to parse a PortRange object, resulting in the ports not getting set correctly when injecting workloads. Note that this happens only with Helm installations, as that is when we actually use a Helm template for outputting the config map.
To fix that, the splitStringListToPorts helm function is changed to format the objects as the JSON representation of PortRange and is renamed to splitStringListToPortRanges.
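To illustrate the shape change, a small Go sketch (not the actual helm template function) that renders a comma-separated port list as PortRange objects rather than Port objects:
```
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// toPortRanges turns a list like "25,443,587" into {"portRange": ...}
// objects, matching what the config map protobuf expects.
func toPortRanges(list string) (string, error) {
	var ranges []map[string]string
	for _, p := range strings.Split(list, ",") {
		if p = strings.TrimSpace(p); p != "" {
			ranges = append(ranges, map[string]string{"portRange": p})
		}
	}
	out, err := json.Marshal(ranges)
	return string(out), err
}

func main() {
	s, _ := toPortRanges("25,443,587")
	fmt.Println(s) // [{"portRange":"25"},{"portRange":"443"},{"portRange":"587"}]
}
```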
Fix: #4679
Signed-off-by: Zahari Dichev zaharidichev@gmail.com
The wrong spellings were found and subsequently fixed using the following
command:
```
codespell --skip CHANGES.md,.git,go.sum,\
controller/cmd/service-mirror/events_formatting.go,\
controller/cmd/service-mirror/cluster_watcher_test_util.go,\
SECURITY_AUDIT.pdf,.gcp.json.enc,web/app/img/favicon.png \
--ignore-words-list=aks,uint,ans,files\' --check-filenames \
--check-hidden
```
Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
Based on the [EndpointSlice PR](https://github.com/linkerd/linkerd2/pull/4663), this is just the k8s/api support for endpointslices to shorten the first PR.
* Adds CRD
* Adds functions that check whether the cluster has EndpointSlice access
* Adds discovery & endpointslice informers to api.
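A sketch of what such an access check could look like with client-go's discovery API (the helper name and error handling are assumptions, not the actual linkerd2 implementation):
```
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// serverSupportsEndpointSlices asks the discovery API whether the cluster
// serves the EndpointSlice resource in discovery.k8s.io/v1beta1.
func serverSupportsEndpointSlices(client kubernetes.Interface) bool {
	resources, err := client.Discovery().ServerResourcesForGroupVersion("discovery.k8s.io/v1beta1")
	if err != nil {
		// Treat any error as "not supported"; a real check would
		// distinguish NotFound from transient failures.
		return false
	}
	for _, r := range resources.APIResources {
		if r.Name == "endpointslices" {
			return true
		}
	}
	return false
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	fmt.Println("EndpointSlice support:", serverSupportsEndpointSlices(client))
}
```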
Signed-off-by: Matei David <matei.david.35@gmail.com>
* feat: add log format annotation and helm value
JSON log formatting has been added via https://github.com/linkerd/linkerd2-proxy/pull/500,
but wiring the option through as an annotation/helm value is still
necessary.
This PR adds the annotation and helm value to configure log format.
Closes #2491
Signed-off-by: Naseem <naseem@transit.app>
Data disappears upon Prometheus restarts because it is all in-memory.
Adding an option to enable persistence by means of a PVC is the right approach, and it is commonly seen in a wide array of helm charts.
Fixes #4576
Signed-off-by: Naseem <naseem@transit.app>
In #4585 we are observing an issue where a loop is encountered when using nginx ingress. The problem is that the outbound proxy does a dst lookup on the IP address which happens to be the very same address the ingress is listening on.
In order to avoid situations like that, this PR introduces a way to modify the set of networks for which the proxy does IP-based discovery. The change introduces a helm flag `.Values.global.proxy.destinationGetNetworks` that can be used to modify this value. There are two ways a user can affect this setting:
- setting the `destinationGetNetworks` field in values during a Helm install, which changes the default on all injected pods
- using the `config.linkerd.io/proxy-destination-get-networks` annotation on injected workloads to override this value
Note that this setting cannot be tweaked through the `install` or `inject` commands.
Fix: #4585
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
As reported in #4259, linkerd check run from linkerd's web console is
broken, as the underlying RBAC Role cannot access the apiregistration.k8s.io API Group.
With this commit the RBAC Role is fixed, allowing read-only access to the
apiregistration.k8s.io API Group.
Fixes #4259
Signed-off-by: alex.berger@nexiot.ch <alex.berger@nexiot.ch>
Put back space after `grafanaUrl` label in `linkerd-config-addons.yaml`
to avoid breaking the yaml parsing.
```
$ linkerd check
...
linkerd-addons
--------------
‼ 'linkerd-config-addons' config map exists
could not unmarshal linkerd-config-addons config-map: error
unmarshaling JSON: while decoding JSON: json: cannot unmarshal
string into Go struct field Values.global of type linkerd2.Global
```
This was added in #4544 to avoid the configmap being badly formatted.
So this PR fixes the yaml; but then, if we don't set `grafanaUrl`, the
configmap format still gets messed up, though apparently that's just a
cosmetic problem:
```
apiVersion: v1
data:
values: "global:\n grafanaUrl: \ngrafana:\n enabled: true\n
image:\n name:
gcr.io/linkerd-io/grafana\n name: linkerd-grafana\n resources:\n
cpu:\n limit:
240m\n memory:\n limit: null\ntracing:\n enabled:
false"
kind: ConfigMap
```
Container-Optimized OS on GKE runs with a set of read/write rules that prevent the linkerd-gateway from starting up.
These changes move the directories that nginx needs to write to into /tmp and configure the error_log to write to stderr.
Signed-off-by: Charles Pretzer charles@buoyant.io
Fixes #4454
As explained
[here](https://github.com/kubernetes/kubernetes/issues/36222#issuecomment-553966166),
trailing spaces in configmap data make it look funky when retrieved
later on. This is currently affecting `linkerd-config-addons` and
`linkerd-gateway-config`:
```
$ k -n linkerd-multicluster get cm linkerd-gateway-config -oyaml
apiVersion: v1
data:
nginx.conf: "events {\n}\nstream { \n
\ server { \n
\ listen 4180; \n
\ proxy_pass 127.0.0.1:4140; \n
\ } \n}
\nhttp {\n server {\n listen 4181;\n location /health {\n access_log
off;\n return 200 \"healthy\\n\";\n }\n }\n server {\n listen
\ 8888;\n location /health-local {\n access_log off;\n return
200 \"healthy\\n\";\n }\n } \n}"
kind: ConfigMap
```
AFAIK this is only cosmetic and doesn't affect functionality.
Fixes #4531
This PR updates the `linkerd-gateway` cm's name to be templated, to allow multiple gateway installations in the same cluster with different configmaps.
(Installing multiple gateways in the same cluster is possible only through Helm, as the CLI doesn't expose those commands currently.)
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This change modifies the linkerd-gateway component to use the inbound
proxy, rather than nginx, as the gateway. This allows us to detect loops and
propagate identity through the gateway.
This change also cleans up port naming to `mc-gateway` and `mc-probe`
to resolve conflicts with Kubernetes validation.
---
* proxy: v2.99.0
The proxy can now operate as gateway, routing requests from its inbound
proxy to the outbound proxy, without passing the requests to a local
application. This supports Linkerd's multicluster feature by adding a
`Forwarded` header to propagate the original client identity and assist
in loop detection.
---
* Add loop detection to inbound & TCP forwarding (linkerd/linkerd2-proxy#527)
* Test loop detection (linkerd/linkerd2-proxy#532)
* fallback: Unwrap errors recursively (linkerd/linkerd2-proxy#534)
* app: Split inbound/outbound constructors into components (linkerd/linkerd2-proxy#533)
* Introduce a gateway between inbound and outbound (linkerd/linkerd2-proxy#540)
* gateway: Add a Forwarded header (linkerd/linkerd2-proxy#544)
* gateway: Return errors instead of responses (linkerd/linkerd2-proxy#547)
* Fail requests that loop through the gateway (linkerd/linkerd2-proxy#545)
* inject: Support config.linkerd.io/enable-gateway
This change introduces a new annotation,
config.linkerd.io/enable-gateway, that, when set, enables the proxy to
act as a gateway, routing all traffic targeting the inbound listener
through the outbound proxy.
This also removes the nginx default listener and gateway port of 4180,
instead using 4143 (the inbound port).
* proxy: v2.100.0
This change modifies the inbound gateway caching so that requests may be
routed to multiple leaves of a traffic split.
---
* inbound: Do not cache gateway services (linkerd/linkerd2-proxy#549)
There are a few notable things happening in this PR:
- the probe manager has been decoupled from the cluster_watcher. Now its only responsibility is to watch for mirrored gateways being created and to probe them. This means that probes are initiated for all gateways, regardless of whether there are mirrored services paired with them
- the number of paired services is derived from the existing services in the cluster rather than being published as a metric by the prober
- there are no events being exchanged between the cluster watcher and the probe manager
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
For the Edge-20.5.6 release notes: Mention under the Helm section that the user might want to manually remove the `nginx-configuration` configmap that is left over after this upgrade.
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
This change adds `allow` and `link` commands, effectively enabling a cluster to have more than one set of credentials that allow it to be mirrored.
Fix #4461
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
This PR addresses two problems:
- when a resync happens (or the mirror controller is restarted), we incorrectly classify the remote gateway as a mirrored service that is not mirrored anymore and delete it
- when updating services due to a gateway update, we need to select only the services for the particular cluster
The latter fixes #4451