This adds additional tests for the destination service that assert `GetProfile`
behavior when the path is an IP address.
1. Assert that when the path is a cluster IP, the configured service profile is
returned.
2. Assert that when the path is a pod IP, the endpoint field is populated in the
returned service profile.
3. Assert that when the path is not a cluster or pod IP, the default service
profile is returned.
4. Assert that when the path is a pod IP, the endpoint has a protocol hint only
when the pod has the controller annotation (see the sketch after this list).
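For illustration, a minimal sketch of what the fourth assertion might look like; `newTestServer` and `getProfile` are hypothetical stand-ins for the package's actual test fixtures:
```go
// Hypothetical sketch: newTestServer and getProfile are illustrative
// stand-ins, not the package's real test helpers.
func TestGetProfilePodIPProtocolHint(t *testing.T) {
	server := newTestServer(t) // assumed fixture wiring up fake k8s state

	// A pod carrying the controller annotation should yield a protocol hint.
	profile := getProfile(t, server, "10.42.0.20:8080")
	if profile.GetEndpoint().GetProtocolHint() == nil {
		t.Fatal("expected a protocol hint for an annotated pod")
	}

	// A pod without the annotation should not.
	profile = getProfile(t, server, "10.42.0.21:8080")
	if profile.GetEndpoint().GetProtocolHint() != nil {
		t.Fatal("expected no protocol hint for an unannotated pod")
	}
}
```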
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Refactor webhook framework to allow webhooks to define their own flags
Pulled the flag-parsing logic out of `launcher.go` and moved it into the `Main` methods of the webhooks (under `controller/cmd/proxy-injector/main.go` and `controller/cmd/sp-validator/main.go`), so that the individual webhooks themselves can define the flags they want to use.
Also no longer require that webhooks have cluster-wide access.
Finally, renamed the type `webhook.handlerFunc` to `webhook.Handler` so it can be exported. This will be used in the upcoming jaeger webhook.
This fixes an issue where the protocol hint is always set on endpoint responses.
We now check the correct value, which determines whether the pod has the required label.
A test for this has been added to #5266.
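For context, a minimal sketch of the corrected check; the surrounding names are assumed, and `linkerd.io/control-plane-ns` is the label being referred to:
```go
// Sketch only: weightedAddr and h2Hint are illustrative names. The hint is
// set solely based on whether the pod carries the control-plane label.
if controllerNS := pod.Labels["linkerd.io/control-plane-ns"]; controllerNS != "" {
	weightedAddr.ProtocolHint = h2Hint() // hypothetical helper building the hint
}
```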
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This release changes error handling to teardown the server-side
connection when an unexpected error is encountered.
Additionally, the outbound TCP routing stack can now skip redundant
service discovery lookups when profile responses include endpoint
information.
Finally, the cache implementation has been updated to reduce latency by
removing unnecessary buffers.
---
* h2: enable HTTP/2 keepalive PING frames (linkerd/linkerd2-proxy#737)
* actions: Add timeouts to GitHub actions (linkerd/linkerd2-proxy#738)
* outbound: Skip endpoint resolution on profile hint (linkerd/linkerd2-proxy#736)
* Add a FromStr for dns::Name (linkerd/linkerd2-proxy#746)
* outbound: Avoid redundant TCP endpoint resolution (linkerd/linkerd2-proxy#742)
* cache: Make the cache cloneable with RwLock (linkerd/linkerd2-proxy#743)
* http: Teardown serverside connections on error (linkerd/linkerd2-proxy#747)
Context: #5209
This updates the destination service to set the `Endpoint` field in `GetProfile`
responses.
The `Endpoint` field is only set if the IP maps to a Pod, not a Service.
Additionally, in this scenario the default Service Profile is used as the base
profile, so no other significant fields are set.
### Examples
```
# GetProfile for an IP that maps to a Service
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.43.222.0:9090
INFO[0000] fully_qualified_name:"linkerd-prometheus.linkerd.svc.cluster.local" retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"linkerd-prometheus.linkerd.svc.cluster.local.:9090" weight:10000}
```
Before:
```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
```
After:
```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:170524692}} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"deployment" value:"fast-1"} metric_labels:{key:"pod" value:"fast-1-5cc87f64bc-9hx7h"} metric_labels:{key:"pod_template_hash" value:"5cc87f64bc"} metric_labels:{key:"serviceaccount" value:"default"} tls_identity:{dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{}}}
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This upgrades both the proxy-init image itself and the go dependency on
proxy-init as a library. This fixes CNI in k3s and on any host using
binaries coming from BusyBox, where `nsenter` has an
issue parsing arguments (see rancher/k3s#1434).
Fixes #5143
The availability of prometheus is useful for some calls in the public-api
that the check uses. This change updates ListPods in the public-api
to still return the pods even when prometheus is not configured.
For a test that exclusively checks for prometheus metrics, we add a gate
which checks whether a prometheus is configured and skips the test otherwise.
Signed-off-by: Tarun Pothulapati tarunpothulapati@outlook.com
The proxy no longer honors DESTINATION_GET variables, as profile lookups
inform when endpoint resolution is performed. Also, there is no longer
a router capacity limit.
It appears that Amazon can use the `100.64.0.0/10` network, which is
technically private, for a cluster's Pod network.
Wikipedia describes the network as:
> Shared address space for communications between a service provider
> and its subscribers when using a carrier-grade NAT.
In order to avoid requiring additional configuration on EKS clusters, we
should permit discovery for this network by default.
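For illustration, a hedged sketch of what such a default could look like under the `.Values.global.proxy.destinationGetNetworks` Helm key mentioned later in these notes; the exact default list is an assumption:
```yaml
# Sketch of a values override; the precise defaults shipped may differ.
global:
  proxy:
    destinationGetNetworks: "10.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16"
```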
## Motivations
Closes #5080
## Solution
When the `--all-namespaces` (`-A`) flag is set for the `linkerd edges` command,
ignore the `namespace` value set by default or `-n`.
This is similar to the behavior of `kubectl`: `kubectl get -A -n linkerd pods`
shows pods in all namespaces.
### Behavior changes
With linkerd and emojivoto installed, this results in:
Before:
```
❯ linkerd edges -A pods
No edges found.
```
After:
```
❯ linkerd edges -A pods
SRC DST SRC_NS DST_NS SECURED
vote-bot-6cb9cb9569-wl6w5 web-5d69bcfdb7-mxf8f emojivoto emojivoto √
web-5d69bcfdb7-mxf8f emoji-7dc976587b-rb9c5 emojivoto emojivoto √
web-5d69bcfdb7-mxf8f voting-bdf4f778c-pjkjg emojivoto emojivoto √
linkerd-prometheus-68d6897d75-ghmgm emoji-7dc976587b-rb9c5 linkerd emojivoto √
linkerd-prometheus-68d6897d75-ghmgm vote-bot-6cb9cb9569-wl6w5 linkerd emojivoto √
linkerd-prometheus-68d6897d75-ghmgm voting-bdf4f778c-pjkjg linkerd emojivoto √
linkerd-prometheus-68d6897d75-ghmgm web-5d69bcfdb7-mxf8f linkerd emojivoto √
linkerd-controller-7d965cf78d-qw6xj linkerd-prometheus-68d6897d75-ghmgm linkerd linkerd √
linkerd-prometheus-68d6897d75-ghmgm linkerd-controller-7d965cf78d-qw6xj linkerd linkerd √
linkerd-prometheus-68d6897d75-ghmgm linkerd-destination-74dbb9c46b-nkxgh linkerd linkerd √
linkerd-prometheus-68d6897d75-ghmgm linkerd-grafana-5d9fb67dc6-sn2l8 linkerd linkerd √
linkerd-prometheus-68d6897d75-ghmgm linkerd-identity-c875b5d58-b756v linkerd linkerd √
linkerd-prometheus-68d6897d75-ghmgm linkerd-proxy-injector-767b55988d-n9r6f linkerd linkerd √
linkerd-prometheus-68d6897d75-ghmgm linkerd-sp-validator-6c8df84fb9-4w8kc linkerd linkerd √
linkerd-prometheus-68d6897d75-ghmgm linkerd-tap-777fbf7656-p87dm linkerd linkerd √
linkerd-prometheus-68d6897d75-ghmgm linkerd-web-546c9444b5-68xpx linkerd linkerd √
```
`linkerd edges -A -n linkerd pods` results in all edges as well (the result
above).
The behavior of `linkerd edges pods` does not change and shows edges in the
`default` namespace.
```
❯ linkerd edges pods
No edges found.
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
The proxy has a default, hardcoded set of ports on which it doesn't do
protocol detection (25, 587, 3306 -- all of which are server-first
protocols). In a recent change, this default set was removed from
the outbound proxy, since there was no way to configure it to anything
other than the default set. I had thought that there was a default set
applied to proxy-init, but this appears to not be the case.
This change adds these ports to the default Helm values to restore the
prior behavior.
I have also elected to include 443 in this set, as it is generally our
recommendation to avoid proxying HTTPS traffic, since the proxy provides
very little value on these connections today.
Additionally, the memcached port 11211 is skipped by default, as clients
do not issue any sort of preamble that is immediately detectable.
These defaults may change in the future, but seem like good choices for
the 2.9 release.
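As a sketch of the resulting Helm defaults; the key names under `proxyInit` are assumptions and may differ slightly from the chart:
```yaml
# Hedged sketch: ports the proxy-init firewall rules would skip by default.
global:
  proxyInit:
    ignoreInboundPorts: "25,443,587,3306,11211"
    ignoreOutboundPorts: "25,443,587,3306,11211"
```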
* Expand 'linkerd edges' to work with TCP connections
Fixes #4999
Before:
```
$ bin/linkerd edges po -owide
SRC DST SRC_NS DST_NS CLIENT_ID SERVER_ID SECURED
linkerd-prometheus-764ddd4f88-t6c2j rabbitmq-controller-5c6cf7cc6d-8lxp2 linkerd default √
linkerd-prometheus-764ddd4f88-t6c2j temp linkerd default √
```
After:
```
$ bin/linkerd edges po -owide
SRC DST SRC_NS DST_NS CLIENT_ID SERVER_ID SECURED
temp rabbitmq-controller-5c6cf7cc6d-5fpsc default default default.default default.default √
linkerd-prometheus-66fb97b7fc-vpnxf rabbitmq-controller-5c6cf7cc6d-5fpsc linkerd default √
linkerd-prometheus-66fb97b7fc-vpnxf temp linkerd default √
```
With the latest proxy upgrade to v2.113.0 (#5037), the `tcp_open_total` metric now contains the `client_id` label so that we can replace the http-only metric `response_total` with this one to determine edges for TCP-only connections.
This change performs essentially the same query as before, but twice: once for `response_total` and once for `tcp_open_total`. For each resulting entry, the latter is kept if `client_id` is present; otherwise the former is used (if present at all). That way things keep working for older proxies.
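A rough sketch of that merge, with hypothetical types standing in for the real prometheus result handling:
```go
// edgeRow is an illustrative stand-in for a decoded prometheus sample.
type edgeRow struct {
	src, dst, clientID string
}

// mergeEdges prefers a tcp_open_total row when it carries a client_id,
// falling back to the http-only response_total row otherwise.
func mergeEdges(httpRows, tcpRows map[string]edgeRow) map[string]edgeRow {
	merged := make(map[string]edgeRow, len(httpRows)+len(tcpRows))
	for key, row := range httpRows {
		merged[key] = row
	}
	for key, row := range tcpRows {
		if row.clientID != "" {
			merged[key] = row
		}
	}
	return merged
}
```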
Disclaimers:
- This doesn't fix #3706: if two sources connect to the same destination there's no way to tell them apart from the metrics perspective and their edges can get mangled. To fix that, the proxy would have to expose `src_resource` labels in the `tcp_open_total` inbound metric.
- Note that connections coming from prometheus are still unidentified. The reason is that those hit the proxy's admin server (instead of the main container), which doesn't expose metrics.
Since k8s 1.16, cadvisor uses the `container` label instead of
`container_name` in the prometheus metrics it exposes.
The heartbeat queries were using the latter, so they were broken
for k8s versions since 1.16.
Note that the `p99-handle-us` value is still missing because the
`request_handle_us` metric is always zero.
This PR updates the injection logic (both CLI and proxy-injector)
to use the `Values` struct instead of the protobuf Config, as part of our move
toward removing the protobuf.
This does not touch any of the flags or install-related code.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
* Remove dependency of linkerd-config for most control plane components
This PR removes the dependency on `linkerd-config` from most control
plane components by passing all of that information through CLI
flags. As most of these components require only a couple of flags, passing
them as flags is more helpful, since updates to the flags trigger a
rollout, unlike a configMap update.
This does not update the proxy-injector, as it needs a lot more data
and mounting `linkerd-config` is better.
## Motivation
Closes #5016
Depends on linkerd/linkerd2-proxy-api#44
## Solution
A `profileTranslator` exists for each service and now has a new
`fullyQualifiedName` field.
This field is used to set the `FullyQualifiedName` field of
`DestinationProfile`s each time an update is sent.
In the case that no service profile exists for a service, a default
`DestinationProfile` is created and we can use the field to set the correct
name.
In the case that a service profile does exist for a service, we still use this
field to set the name to keep it consistent.
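A condensed sketch of the translator shape; field and helper names beyond `fullyQualifiedName` are assumptions:
```go
// profileTranslator, abridged: every update sent on the stream carries the
// service's fully-qualified name, whether or not a ServiceProfile exists.
type profileTranslator struct {
	stream             pb.Destination_GetProfileServer
	fullyQualifiedName string
}

func (pt *profileTranslator) Update(profile *sp.ServiceProfile) error {
	dest := pt.toProfile(profile) // hypothetical helper; defaults when profile is nil
	dest.FullyQualifiedName = pt.fullyQualifiedName
	return pt.stream.Send(dest)
}
```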
### Example
Install linkerd on a cluster and run the destination server:
```
go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```
Get the IP of a service. Here, we'll get the IP for `linkerd-identity`:
```
> kubectl get -n linkerd svc/linkerd-identity
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
linkerd-identity ClusterIP 10.43.161.68 <none> 8080/TCP 4h25m
```
Get the profile of `linkerd-identity` from service name or IP and note the
`FullyQualifiedName` field:
```
> go run controller/script/destination-client/main.go -method getProfile -path 10.43.161.68:8080
INFO[0000] fully_qualified_name:"linkerd-identity.linkerd.svc.cluster.local" ..
```
```
> go run controller/script/destination-client/main.go -method getProfile -path linkerd-identity.linkerd.svc.cluster.local
INFO[0000] fully_qualified_name:"linkerd-identity.linkerd.svc.cluster.local" ..
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #4191 #4993
This bumps Kubernetes client-go to the latest v0.19.2 (we had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5
This consists of the following changes:
- Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD.
- Bump all k8s related dependencies to v0.19.2
- Generate CRD types, client code using the latest k8s.io/code-generator
- Use context.Context as the first argument, in all code paths that touch the k8s client-go interface
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
When the service-mirror component can't reach the target's k8s API, the goroutine blocks and can't be unblocked.
This was happening specifically in the multicluster integration test (still to be pushed), where the source and target clusters are created in quick succession and the target's API service doesn't always have time to be exposed before being requested by the service mirror.
The fix consists of no longer having restartClusterWatcher be side-effecting; instead it returns an error. If that error is not nil, the link watcher is stopped and reset after 10 seconds.
* Push docker images to ghcr.io instead of gcr.io
The `cloud_integration.yml` and `release.yml` workflows were modified to
log into ghcr.io, and remove the `Configure gcloud` step which is no
longer necessary.
Note that besides the changes to cloud_integration.yml and release.yml, there was a change to the upgrade-stable integration test so that we do `linkerd upgrade --addon-overwrite` to reset the addons settings, because in stable-2.8.1 the Grafana image was pegged to gcr.io/linkerd-io/grafana in linkerd-config-addons. This will need to be mentioned in the 2.9 upgrade notes.
Also, the egress integration test has a debug container that is now pegged to the edge-20.9.2 tag.
Besides that, the other changes are just a global search and replace (s/gcr.io\/linkerd-io/ghcr.io\/linkerd/).
The proxy performs endpoint discovery for unnamed services, but not
service profiles.
The destination controller and proxy have been updated to support
lookups for unnamed services in linkerd/linkerd2#4727 and
linkerd/linkerd2-proxy#626, respectively.
This change modifies the injection template so that the
`proxy.destinationGetNetworks` configuration enables profile
discovery for all networks on which endpoint discovery is permitted.
All of the code for the service mirror controller lives in the `linkerd/linkerd2/controller/cmd` package. It is typical for control plane components to only have a `main.go` entrypoint in the cmd package. This can sometimes make it hard to find the service mirror code since I wouldn't expect it to be in the cmd package.
We move the majority of the code to a dedicated controller package, leaving only main.go in the cmd package. This is purely organizational; no behavior change is expected.
Signed-off-by: Alex Leong <alex@buoyant.io>
## What/How
@adleong pointed out in #4780 that when enabling slices during an upgrade, the new value does not persist in the `linkerd-config` ConfigMap. I took a closer look and it seems that we were never overwriting the values in case they were different.
* To fix this, I added an if block when validating and building the upgrade options -- if the current flag value differs from what we have in the ConfigMap, then change the ConfigMap value.
* When doing so, I made sure to check that if the cluster does not support `EndpointSlices` yet the flag is set to true, we will error out. This is done similarly (essentially copy & paste) to what's in the install part.
* Additionally, I have noticed that the helm ConfigMap template stored the flag value under `enableEndpointSlices` field name. I assume this was not changed in the initial PR to reflect the changes made in the protocol buffer. The API (and thus the CLI) uses the field name `endpointSliceEnabled` instead. I have changed the config template so that helm installations will use the same field, which can then be used in the destination service or other components that may implement slice support in the future.
Signed-off-by: Matei David <matei.david.35@gmail.com>
## Motivation
#4879
## Solution
When no traffic split exists for services, return a single destination override
with a weight of 100%.
Using the destination client on a new linkerd installation, this results in the
following output for `linkerd-identity` service:
```
❯ go run controller/script/destination-client/main.go -method getProfile -path linkerd-identity.linkerd.svc.cluster.local:8080
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"linkerd-identity.linkerd.svc.cluster.local.:8080" weight:100000}
INFO[0000]
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
[Link to RFC](https://github.com/linkerd/rfc/pull/23)
### What
---
* This PR puts together all past pieces of the puzzle to deliver topology-aware service routing, as specified in the [Kubernetes docs](https://kubernetes.io/docs/concepts/services-networking/service-topology/) but with a much better load balancing algorithm and all the coolness of linkerd :)
* The first piece of this PR is focused on adding topology metadata: topology preference for services and topology `<k,v>` pairs for endpoints.
* The second piece of this PR puts together the new context format and fetching the source node topology metadata in order to allow for endpoints filtering.
* The final part is doing the filtering -- passing all of the metadata to the listener and on every `Add` filtering endpoints based on the topology preference of the service, topology `<k,v>` pairs of endpoints and topology of the source (again `<k,v>` pairs).
### How
---
* **Collecting metadata**:
- Services do not have values for topology keys -- the topological keys defined in a service's spec are only there to dictate locality preference for routing. As such, I decided to store them in an array; they are taken exactly as they are found in the service spec, which ensures we respect the preference order.
- For EndpointSlices, we are using a map -- an EndpointSlice has locality information in the form of a `<k,v>` pair, where the key is a topological key (similar to what's listed in the service) and the value is the locality information -- e.g. `hostname: minikube`. For each address we now have a map of topology values which gets populated when we translate the endpoints to an address set. Because normal Endpoints do not have any topology information, we create each address with an empty map which is subsequently populated ONLY for slices in the `endpointSliceToAddressSet` function.
* **Filtering endpoints**:
- This was a tricky part and filled me with doubts. I think there are a few ways to do this, but this is how I "envisioned" it. First, the `endpoint_translator.go` should be the one to do the filtering; this means that on subscription, we need to feed all of the relevant metadata to the listener. To do this, I created a new function `AddTopologyFilter` as part of the listener interface.
- To complement the `AddTopologyFilter` function, I created a new `TopologyFilter` struct in `endpoints_watcher.go`. I then embedded this structure in all listeners that implement the interface. The structure holds the source topology (source node), a boolean to tell if slices are activated in case we need to double check (or write tests for the function) and the service preference. We create the filter on Subscription -- we have access to the k8s client here as well as the service, so it's the best point to collect all of this data together. Addresses all have their own topology added to them so they do not have to be collected by the filter.
- When we add a new set of addresses, we check to see if slices are enabled -- chances are that if slices are enabled, service topology might be too. This lets us skip this step if the latest version is not adopted. Prior to sending an `Add` we filter the endpoints: if a preference is registered by the filter we strictly enforce it, otherwise nothing changes. A sketch of this filtering step follows below.
And that's pretty much it.
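To make the filtering step concrete, here is a simplified sketch under assumed types (an `Address` carrying its own topology map, the source node's `<k,v>` pairs, and the service's ordered preference list):
```go
// Address is a pared-down stand-in for the watcher's address type.
type Address struct {
	IP       string
	Topology map[string]string
}

// filterByTopology walks the service's topological keys in preference order
// and keeps the first non-empty set of endpoints whose locality matches the
// source node. "*" is the catch-all preference.
func filterByTopology(prefs []string, nodeTopology map[string]string, addrs []Address) []Address {
	for _, key := range prefs {
		if key == "*" {
			return addrs
		}
		nodeValue, ok := nodeTopology[key]
		if !ok {
			continue
		}
		var matched []Address
		for _, addr := range addrs {
			if addr.Topology[key] == nodeValue {
				matched = append(matched, addr)
			}
		}
		if len(matched) > 0 {
			return matched
		}
	}
	return nil // preference registered but unsatisfied: strictly enforced
}
```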
Signed-off-by: Matei David <matei.david.35@gmail.com>
## Motivation
These changes came up when testing mock identity. I found it useful for the
destination client to print the identity of endpoints.
```
❯ go run controller/script/destination-client/main.go -method get -path h1.test.example.com:8080
INFO[0000] Add:
INFO[0000] labels: map[concrete:h1.test.example.com:8080]
INFO[0000] - 127.0.0.1:4143
INFO[0000] - labels: map[addr:127.0.0.1:4143 h2:false]
INFO[0000] - protocol hint: UNKNOWN
INFO[0000] - identity: dns_like_identity:{name:"foo.ns1.serviceaccount.identity.linkerd.cluster.local"}
INFO[0000]
```
I also fixed a log line in proxy-identity where the wrong value was used for
the CSR path.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).
The misspellings have been reported at aaf440489e (commitcomment-41423663)
The action reports that the changes in this PR would make it happy: 5b82c6c5ca
Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Supersedes #4846
Bump proxy-init to v1.3.6, containing CNI fixes and support for
multi-arch builds.
#4846 included this in v1.3.5 but proxy.golang.org refused to update the
modified SHA
Fixes #4774
When a service mirror controller is unable to connect to the target cluster's API, the service mirror controller crashes with the error that it has failed to sync caches. This error lacks the necessary detail to debug the situation. Unfortunately, client-go does not surface more useful information about why the caches failed to sync.
To make this more debuggable we do a couple things:
1. When creating the target cluster api client, we eagerly issue a server version check to test the connection. If the connection fails, the service-mirror-controller logs now look like this:
```
time="2020-07-30T23:53:31Z" level=info msg="Got updated link broken: {Name:broken Namespace:linkerd-multicluster TargetClusterName:broken TargetClusterDomain:cluster.local TargetClusterLinkerdNamespace:linkerd ClusterCredentialsSecret:cluster-credentials-broken GatewayAddress:35.230.81.215 GatewayPort:4143 GatewayIdentity:linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local ProbeSpec:ProbeSpec: {path: /health, port: 4181, period: 3s} Selector:{MatchLabels:map[] MatchExpressions:[{Key:mirror.linkerd.io/exported Operator:Exists Values:[]}]}}"
time="2020-07-30T23:54:01Z" level=error msg="Unable to create cluster watcher: cannot connect to api for target cluster remote: Get \"https://36.199.152.138/version?timeout=32s\": dial tcp 36.199.152.138:443: i/o timeout"
```
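The eager version check itself is a small amount of code; a sketch using client-go's discovery client, with the surrounding wiring assumed:
```go
package mirror

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
)

// checkTargetAPI eagerly verifies connectivity to the target cluster's API
// server by requesting its version before starting the watchers.
func checkTargetAPI(client kubernetes.Interface, cluster string) error {
	if _, err := client.Discovery().ServerVersion(); err != nil {
		return fmt.Errorf("cannot connect to api for target cluster %s: %w", cluster, err)
	}
	return nil
}
```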
This error also no longer causes the service mirror controller to crash. Updating the Link resource will cause the service mirror controller to reload the credentials and try again.
2. We rearrange the checks in `linkerd check --multicluster` to perform the target API connectivity checks before the service mirror controller checks. This means that we can validate the target cluster API connection even if the service mirror controller is not healthy. We also add a server version check here to quickly determine if the connection is healthy. Sample check output:
```
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
* broken
W0730 16:52:05.620806 36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc
× remote cluster access credentials are valid
* failed to connect to API for cluster: [broken]: Get "https://36.199.152.138/version?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
see https://linkerd.io/checks/#l5d-smc-target-clusters-access for hints
W0730 16:52:35.645499 36735 transport.go:243] Unable to cancel request for promhttp.RoundTripperFunc
× clusters share trust anchors
Problematic clusters:
* broken: unable to fetch anchors: Get "https://36.199.152.138/api/v1/namespaces/linkerd/configmaps/linkerd-config?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
see https://linkerd.io/checks/#l5d-multicluster-clusters-share-anchors for hints
√ service mirror controller has required permissions
* broken
√ service mirror controllers are running
* broken
× all gateway mirrors are healthy
wrong number of (0) gateway metrics entries for probe-gateway-broken.linkerd-multicluster
see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints
√ all mirror services have endpoints
‼ all mirror services are part of a Link
mirror service voting-svc-gke.emojivoto is not part of any Link
see https://linkerd.io/checks/#l5d-multicluster-orphaned-services for hints
```
Some logs from the underlying go network libraries sneak into the output, which is kinda gross, but I don't think it interferes too much with being able to understand what's going on.
Signed-off-by: Alex Leong <alex@buoyant.io>
Build ARM docker images in the release workflow.
# Changes:
- Add new env keys `DOCKER_MULTIARCH` and `DOCKER_PUSH`. When set, the build will produce multi-arch images and push them to the registry. See https://github.com/docker/buildx/issues/59 for why they must be pushed to the registry.
- Usage of `crazy-max/ghaction-docker-buildx` is necessary as it comes already configured with the ability to perform cross-compilation (using QEMU), so we can just use it instead of setting it up manually.
- Usage of `buildx` now makes the automatic platform build arguments available in the global scope. (See: https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope)
# Follow-up:
- Releasing the CLI binary file for the ARM architecture. The docker images resulting from these changes are already built for ARM. Still, we need further adjustments, such as retrieving those binaries and naming them correctly as part of the GitHub Release artifacts.
Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
Fixes #4707
In order to remove a multicluster link, we add a `linkerd multicluster unlink` command which produces the yaml necessary to delete all of the resources associated with a `linkerd multicluster link`. These are:
* the link resource
* the service mirror controller deployment
* the service mirror controller's RBAC
* the probe gateway mirror for this link
* all mirror services for this link
This command follows the same pattern as the `linkerd uninstall` command in that its output is expected to be piped to `kubectl delete`. The typical usage of this command is:
```
linkerd --context=source multicluster unlink --cluster-name=foo | kubectl --context=source delete -f -
```
This change also fixes the shutdown lifecycle of the service mirror controller by properly having it listen for the shutdown signal and exit its main loop.
A few alternative designs were considered:
I investigated using owner references as suggested [here](https://github.com/linkerd/linkerd2/issues/4707#issuecomment-653494591) but it turns out that owner references must refer to resources in the same namespace (or to cluster scoped resources). This was not feasible here because a service mirror controller can create mirror services in many different namespaces.
I also considered having the service mirror controller delete the mirror services that it created during its own shutdown. However, this could lead to scenarios where the controller is killed before it finishes deleting the services that it created. It seemed more reliable to have all the deletions happen from `kubectl delete`. Since this is the case, we avoid having the service mirror controller delete mirror services, even when the link is deleted, to avoid the race condition where the controller and CLI both attempt to delete the same mirror services and one of them fails with a potentially alarming error message.
Signed-off-by: Alex Leong <alex@buoyant.io>
* support overriding inbound and outbound connect timeouts.
* add validation on user provided TCP connect timeouts
* convert valid time values into ms
Signed-off-by: Matt Miller <mamiller@rosettastone.com>
* Introduce multicluster gateway api handler in web api server
* Added MetricsUtil for Gateway metrics
* Added gateway api helper
* Added Gateway Component
Updated metricsTable component to support gateway metrics
Added handler for gateway
Fixes #4601
Signed-off-by: Tharun <rajendrantharun@live.com>
Add a new structure on the destination controller side to keep track of contextual information.
The token format has been changed from `ns:<namespace>` to a JSON format so that more variables can be
encoded in the token. As part of this PR, a new field `nodeName` has been added to help with service
topologies.
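For illustration, the shape of the new token might look like this; no fields beyond the `ns` and `nodeName` mentioned above are assumed:
```go
// contextToken sketches the JSON token; e.g. it serializes to
// {"ns":"emojivoto","nodeName":"node-1"} instead of the old "ns:emojivoto".
type contextToken struct {
	Ns       string `json:"ns"`
	NodeName string `json:"nodeName"`
}
```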
Fixes #4498
Signed-off-by: Matei David <matei.david.35@gmail.com>
This PR moves the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in https://github.com/linkerd/rfc/pull/31. For fuller context, please see that RFC.
Basic multicluster functionality works here including:
* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveness, and probe latencies.
* The `linkerd check` multicluster checks have been updated for the new architecture. Several checks have been rendered obsolete by the new architecture and have been removed.
The following are known issues requiring further work:
* the service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror. It does not yet support configuring a label selector.
* an unlink command is needed for removing multicluster links: see https://github.com/linkerd/linkerd2/issues/4707
* an mc uninstall command is needed for uninstalling the multicluster addon: see https://github.com/linkerd/linkerd2/issues/4708
Signed-off-by: Alex Leong <alex@buoyant.io>
* Migrate CI to docker buildx and other improvements
## Motivation
- Improve build times in forks, especially when rerunning builds because of some flaky test.
- Start using `docker buildx` to pave the way for multiplatform builds.
## Performance improvements
These timings were taken for the `kind_integration.yml` workflow when we merged and reran the lodash bump PR (#4762)
Before these improvements:
- when merging: `24:18`
- when rerunning after merge (docker cache warm): `19:00`
- when running the same changes in a fork (no docker cache): `32:15`
After these improvements:
- when merging: `25:38`
- when rerunning after merge (docker cache warm): `19:25`
- when running the same changes in a fork (docker cache warm): `19:25`
As explained below, non-forks and forks now use the same cache, so the important take is that forks will always start with a warm cache and we'll no longer see long build times like the `32:15` above.
The downside is a slight increase in the build times for non-forks (up to a little more than a minute, depending on the case).
## Build containers in parallel
The `docker_build` job in the `kind_integration.yml`, `cloud_integration.yml` and `release.yml` workflows relied on running `bin/docker-build` which builds all the containers in sequence. Now each container is built in parallel using a matrix strategy.
## New caching strategy
CI now uses `docker buildx` for building the container images, which allows using an external cache source for builds, a location in the filesystem in this case. That location gets cached using actions/cache, using the key `{{ runner.os }}-buildx-${{ matrix.target }}-${{ env.TAG }}` and the restore key `${{ runner.os }}-buildx-${{ matrix.target }}-`.
For example when building the `web` container, its image and all the intermediary layers get cached under the key `Linux-buildx-web-git-abc0123`. When that has been cached in the `main` branch, that cache will be available to all the child branches, including forks. If a new branch in a fork asks for a key like `Linux-buildx-web-git-def456`, the key won't be found during the first CI run, but the system falls back to the key `Linux-buildx-web-git-abc0123` from `main` and so the build will start with a warm cache (more info about how keys are matched in the [actions/cache docs](https://docs.github.com/en/actions/configuring-and-managing-workflows/caching-dependencies-to-speed-up-workflows#matching-a-cache-key)).
## Packet host no longer needed
To benefit from the warm caches both in non-forks and forks like just explained, we're required to ditch doing the builds in Packet and now everything runs in the github runners VMs.
As a result there's no longer separate logic for non-forks and forks in the workflow files; `kind_integration.yml` was greatly simplified but `cloud_integration.yml` and `release.yml` got a little bigger in order to use the actions artifacts as a repository for the images built. This bloat will be fixed when support for [composite actions](https://github.com/actions/runner/blob/users/ethanchewy/compositeADR/docs/adrs/0549-composite-run-steps.md) lands in github.
## Local builds
You are still able to run `bin/docker-build` or any of the `docker-build.*` scripts. To make use of buildx, run those same scripts after having set the env var `DOCKER_BUILDKIT=1`. Using buildx presupposes you have installed it, as instructed [here](https://github.com/docker/buildx).
## Other
- A new script `bin/docker-cache-prune` is used to remove unused images from the cache. Without that the cache grows constantly and we can rapidly hit the 5GB limit (when the limit is attained the oldest entries get evicted).
- The `go-deps` dockerfile base image was changed from `golang:1.14.2` (ubuntu based) to `golang:1.14.2-alpine`, also to conserve cache space.
# Addressed separately in #4875:
Got rid of the `go-deps` image and instead added something similar on top of all the Dockerfiles dealing with `go`, as a first stage for those Dockerfiles. That continues to serve as a way to pre-populate go's build cache, which speeds up the builds in the subsequent stages. That build should in theory be rebuilt automatically only when `go.mod` or `go.sum` change, and now we don't require running `bin/update-go-deps-shas`. That script was removed along with all the logic elsewhere that used it, including the `go_dependencies` job in the `static_checks.yml` github workflow.
The list of modules preinstalled was moved from `Dockerfile-go-deps` to a new script `bin/install-deps`. I couldn't find a way to generate that list dynamically, so whenever a slow-to-compile dependency is found, we have to make sure it's included in that list.
Although this simplifies the dev workflow, note that the real motivation behind this was a limitation in buildx's `docker-container` driver that forbids us from depending on images that haven't been pushed to a registry, so we have to resort to building the dependencies as a first stage in the Dockerfiles.
* Small PR that uncomments the `EndpointSliceAccess` method and cleans up leftover todos in the destination service.
* Based on the past three PRs related to `EndpointSlices` (#4663, #4696, #4740), they should now be functional (albeit prone to bugs) and ready to use.
Signed-off-by: Matei David <matei.david.35@gmail.com>
* Removes/relaxes prometheus-related checks
Now that prometheus is an add-on, there can be cases where prometheus is
disabled, in which case the check should show a warning but not fail. This
decouples the tight dependency.
This changes the following checks:
- Removes serviceAccount and pod checks in the CLI.
- Relaxes `linkerd-api` checks to only check for prometheus access when
the URL is not empty. This should work seamlessly with external
prometheus as that URL will be passed and it performs the same
check.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
EndpointSlices have been made opt-in due to their experimental nature. This PR
introduces a new install flag 'enableEndpointSlices' that will allow adopters to
specify in their cli install or helm install step whether they would like to
use endpointslices as a resource in the destination service, instead of the
endpoints k8s resource.
Signed-off-by: Matei David <matei.david.35@gmail.com>
## Motivation
Closes #3916
This adds the ability to get profiles for services by IP address.
### Change in behavior
When the destination server receives a `GetProfile` request with an IP address,
it now tries to map that IP address to a service.
If the IP address maps to an existing service, then the destination server
returns the profile stream and subscribes for updates to the _service_; this is
the existing behavior. If the IP address is later assigned to a different
service, the stream will still send updates for the first service the IP
address corresponded to, since that is what it is subscribed to.
If the IP address does not map to an existing service, then the destination
server returns the profile stream but does not subscribe for updates. The stream
will receive one update, the default profile.
### Solution
This change uses the `IPWatcher` within the destination server to check what
services an IP address corresponds to. By adding a new method `GetSvc` to
`IPWatcher`, the server now calls this method when `GetProfile` receives a
request with an IP address.
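An abridged sketch of the lookup path; method and helper names besides `GetSvc` are assumptions:
```go
// getProfileByIP maps the IP to a service when possible; when no service
// owns the IP, a single default profile is sent and no subscription is made.
func (s *server) getProfileByIP(ip string, port uint32, stream pb.Destination_GetProfileServer) error {
	svcID, err := s.ips.GetSvc(ip) // the new IPWatcher method described above
	if err != nil {
		return err
	}
	if svcID == nil {
		return stream.Send(defaultProfile()) // hypothetical helper
	}
	return s.subscribeToProfile(svcID, port, stream) // existing service path
}
```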
## Testing
Install linkerd on a cluster and get the cluster IP of any service:
```bash
❯ kubectl get -n linkerd svc/linkerd-tap -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
linkerd-tap ClusterIP 10.104.57.90 <none> 8088/TCP,443/TCP 16h linkerd.io/control-plane-component=tap
```
Run the destination server:
```bash
❯ go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```
Get the profile for the tap service by IP address:
```bash
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.104.57.90:8088
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
INFO[0000]
```
Get the profile for an IP address that does not correspond to a service:
```bash
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.256.0.1:8088
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
INFO[0000]
```
You can add and remove settings for the service profile for tap and get updates.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
When a k8s pod is evicted, its Phase is set to Failed and the reason is set to Evicted. Because in the ListPods method of the public API we only transmit the phase and treat it as Status, the healthchecks assume such evicted data plane pods have failed. Since this check is retryable, the result is that `linkerd check --proxy` appears to hang when there are evicted pods. As @adleong correctly pointed out here, the presence of evicted pods is not something that should make the checks fail.
This change modifies the public api to set the Pod.Status to "Evicted" for evicted pods. The healthchecks are also modified to not treat evicted pods as error cases.
Fix #4690
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Removed `controller/proxy-injector/webhook_ops.go` and `controller/sp-validator/webhook_ops.go` that we used when we first introduced webhooks to dynamically create their configs, but we ended up doing that upfront at install time.
Using the following command, the wrong spellings were found and later
fixed:
```
codespell --skip CHANGES.md,.git,go.sum,\
controller/cmd/service-mirror/events_formatting.go,\
controller/cmd/service-mirror/cluster_watcher_test_util.go,\
SECURITY_AUDIT.pdf,.gcp.json.enc,web/app/img/favicon.png \
--ignore-words-list=aks,uint,ans,files\' --check-filenames \
--check-hidden
```
Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
Introduce support for the EndpointSlice k8s resource (k8s v1.16+) in the destination service.
Through this PR, in the EndpointsWatcher, there will be a dedicated informer for EndpointSlice;
the informer cannot run at the same time as the Endpoints resource informer. The main difference
is that EndpointSlices have a one-to-many relationship with a service, they provide better performance benefits,
dual-stack addresses and more. EndpointSlice support also implies service topology and other k8s related features.
Validated and tested manually, as well as with dedicated unit tests.
Closes #4501
Signed-off-by: Matei David <matei.david.35@gmail.com>
Based on the [EndpointSlice PR](https://github.com/linkerd/linkerd2/pull/4663), this is just the k8s/api support for endpointslices to shorten the first PR.
* Adds CRD
* Adds functions that check whether the cluster has EndpointSlice access
* Adds discovery & endpointslice informers to api.
Signed-off-by: Matei David <matei.david.35@gmail.com>
* feat: add log format annotation and helm value
JSON log formatting has been added via https://github.com/linkerd/linkerd2-proxy/pull/500,
but wiring the option through as an annotation/helm value is still
necessary.
This PR adds the annotation and helm value to configure log format.
Closes #2491
Signed-off-by: Naseem <naseem@transit.app>
Regenerated protobuf files, using version 1.4.2 that was upgraded from
1.3.2 with the proxy-api update in #4614.
As of v1.4, protobuf messages may no longer be copied (because they
hold a mutex), so whenever a message is passed to or returned from a
function we need to use a pointer.
This affects _mostly_ test files.
This is required to unblock #4620 which is adding a field to the config
protobuf.
In #4585 we are observing an issue where a loop is encountered when using nginx ingress. The problem is that the outbound proxy does a dst lookup on the IP address which happens to be the very same address the ingress is listening on.
In order to avoid situations like that, this PR introduces a way to modify the set of networks for which the proxy does IP-based discovery. The change introduces a helm flag `.Values.global.proxy.destinationGetNetworks` that can be used to modify this value. There are two ways a user can affect this setting:
- setting the `destinationGetNetworks` field in values during a Helm install, which changes the default on all injected pods
- using the annotation `config.linkerd.io/proxy-destination-get-networks` on injected workloads to override this value
Note that this setting cannot be tweaked through the `install` or `inject` commands
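For example, a per-workload override using the annotation above; the CIDR list shown is illustrative:
```yaml
# Sketch of a workload override; adjust the networks to your cluster.
metadata:
  annotations:
    config.linkerd.io/proxy-destination-get-networks: "10.0.0.0/8,192.168.0.0/16"
```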
Fix: #4585
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Fixes #4582
When a target cluster gateway is exposed as a hostname rather than with a fixed IP address, the service mirror controller fails to create mirror services and gateway mirrors for that gateway. This is because we only look at the IP field of the gateway service.
We make two changes to address this problem:
First, when extracting the gateway spec from a gateway that has a hostname instead of an IP address, we do a DNS lookup to resolve that hostname into an IP address to use in the mirror service endpoints and gateway mirror endpoints.
Second, we schedule a repair job on a regular (1 minute) schedule to update these endpoint objects. This has the effect of re-resolving the DNS names every minute to pick up any changes in DNS resolution.
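A sketch of the resolution step using the standard library; the function name and its call sites are assumptions:
```go
package mirror

import (
	"fmt"
	"net"
)

// resolveGatewayAddress turns a gateway hostname into an IP address for the
// mirror service and gateway mirror endpoints; the 1-minute repair job
// re-runs this to pick up DNS changes.
func resolveGatewayAddress(hostname string) (string, error) {
	addrs, err := net.LookupIP(hostname)
	if err != nil {
		return "", err
	}
	if len(addrs) == 0 {
		return "", fmt.Errorf("no addresses found for gateway %s", hostname)
	}
	return addrs[0].String(), nil
}
```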
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR just modifies the log levels on the probe and cluster watchers
to emit in INFO what they would emit in DEBUG. I think it makes sense
as we need that information to track problems. The only difference is
that when probing gateways we only log if the probe attempt was
unsuccessful.
Fix #4546
When the identity annotation on a gateway service is updated, this change is not propagated to the mirror gateway endpoints object.
This is because the annotations are updated on the wrong object and the changes are lost.
Signed-off-by: Alex Leong <alex@buoyant.io>
Change terminology from local/remote to source/target in events and metrics.
This does not change any variable, function, struct, or field names since
testing is still improving
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This change modifies the linkerd-gateway component to use the inbound
proxy, rather than nginx, as the gateway. This allows us to detect loops and
propagate identity through the gateway.
This change also cleans up port naming to `mc-gateway` and `mc-probe`
to resolve conflicts with Kubernetes validation.
---
* proxy: v2.99.0
The proxy can now operate as gateway, routing requests from its inbound
proxy to the outbound proxy, without passing the requests to a local
application. This supports Linkerd's multicluster feature by adding a
`Forwarded` header to propagate the original client identity and assist
in loop detection.
---
* Add loop detection to inbound & TCP forwarding (linkerd/linkerd2-proxy#527)
* Test loop detection (linkerd/linkerd2-proxy#532)
* fallback: Unwrap errors recursively (linkerd/linkerd2-proxy#534)
* app: Split inbound/outbound constructors into components (linkerd/linkerd2-proxy#533)
* Introduce a gateway between inbound and outbound (linkerd/linkerd2-proxy#540)
* gateway: Add a Forwarded header (linkerd/linkerd2-proxy#544)
* gateway: Return errors instead of responses (linkerd/linkerd2-proxy#547)
* Fail requests that loop through the gateway (linkerd/linkerd2-proxy#545)
* inject: Support config.linkerd.io/enable-gateway
This change introduces a new annotation,
config.linkerd.io/enable-gateway, that, when set, enables the proxy to
act as a gateway, routing all traffic targeting the inbound listener
through the outbound proxy.
This also removes the nginx default listener and gateway port of 4180,
instead using 4143 (the inbound port).
* proxy: v2.100.0
This change modifies the inbound gateway caching so that requests may be
routed to multiple leaves of a traffic split.
---
* inbound: Do not cache gateway services (linkerd/linkerd2-proxy#549)
Change terminology from local/remote to source/target in service-mirror and
healthchecks help text.
This does not change any variable, function, struct, or field names since
testing is still improving
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
There are a few notable things happening in this PR:
- the probe manager has been decoupled from the cluster_watcher. Now its only responsibility is to watch for mirrored gateways being created and to probe them. This means that probes are initiated for all gateways, no matter whether there are mirrored services being paired
- the number of paired services is derived from the existing services in the cluster rather than being published as a metric by the prober
- there are no events being exchanged between the cluster watcher and the probe manager
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This PR addresses two problems:
- when a resync happens (or the mirror controller is restarted), we incorrectly classify the remote gateway as a mirrored service that is not mirrored anymore, and we delete it
- when updating services due to a gateway update, we need to select only the services for the particular cluster
The latter fixes #4451
Depends on https://github.com/linkerd/linkerd2-proxy-init/pull/10
Fixes #4276
We add a `--close-wait-timeout` inject flag which configures the proxy-init container to run with `privileged: true` and to set `nf_conntrack_tcp_timeout_close_wait`.
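A hypothetical invocation; the flag name is real per this change, while the value shown is illustrative:
```
linkerd inject --close-wait-timeout=3600s app.yml | kubectl apply -f -
```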
Signed-off-by: Alex Leong <alex@buoyant.io>
When viewing the output of `linkerd stat` for services which do not have a selector (such as services created by the service-mirror, for example), the meshed count column shows the total number of pods which exist, even though the service actually selects no pods at all.
We update the StatSummary implementation to account for services which have no selector.
Additionally, we update the logic of the `--unmeshed` flag. When the `--unmeshed` flag is not set, we typically skip rows for unmeshed resources because those resources would have no stats. This is not appropriate to do when the `--from` flag is also set because in this case, metrics are not collected on the target resource but are instead collected on the client-side. This means that stats can be present, even for unmeshed resources and these resources should still be displayed, even if the `--unmeshed` flag is not set.
Signed-off-by: Alex Leong <alex@buoyant.io>
This change creates a gateway proxy for every gateway. This enables the probe worker to leverage the destination service functionality in order to discover the identity of the gateway.
Fix #4411
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This PR introduces a few changes that were requested after a bit of service mirror reviewing.
- we restrict the RBACs so the service mirror controller cannot read secrets in all namespaces, but only in the one that it is installed in
- we unify the namespace naming so all multicluster resources are installed in `linkerd-multicluster` on both clusters
- fixed checks to account for these changes
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This change adds labels to endpoints that target remote services. It also adds a Grafana dashboard that can be used to monitor multicluster traffic.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
When the proxy has an IP watch on a pod and the destination controller gets a pod update event, the destination controller sends a NoEndpoints message to all listeners followed by an Add with the new pod state. This can result in the proxy's load balancer being briefly empty and could result in failed requests during that period.
Since consecutive Add events with the same address will override each other, we can simply send the Adds without needing to clear the previous state with a NoEndpoints message.
Fixes #3807
By setting the LINKERD2_PROXY_DESTINATION_GET_NETWORKS environment variable, we configure the Linkerd proxy to do destination lookups for authorities which are IP addresses in the private network range. This allows us to get destination metadata including identity for HTTP requests which target an IP address in the cluster, Prometheus metrics scrape requests, for example.
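For example, the proxy container might carry an env entry like the following; the CIDR list shown is the standard set of RFC 1918 private ranges, and the exact value injected may differ:
```
LINKERD2_PROXY_DESTINATION_GET_NETWORKS=10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
```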
This change allowed us to update the "direct edges" test which ensures that the edges command produces correct output for traffic which is addressed directly to a pod IP.
We also re-enabled the "linkerd stat" integration tests which had been disabled while the destination service did not yet support these types of IP queries.
Signed-off-by: Alex Leong <alex@buoyant.io>
* use downward API to mount labels to the proxy container as a volume
* add namespace as a label to the pod
* add a trace inject test
* add downwardAPI for controlPlaneTracing
* add controlPlaneTracing condition to volumeMounts
* update add-ons to have workload-ns
* add workload-ns label to control-plane components
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Upgrade Linkerd's base docker image to use go 1.14.2 in order to stay modern.
The only code change required was to update a test which was checking the error message of a `crypto/x509.CertificateInvalidError`. The error message of this error changed between go versions. We update the test to not check for the specific error string so that this test passes regardless of go version.
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #3984
We use the new `/live` admin endpoint in the Linkerd proxy for liveness probes instead of the `/metrics` endpoint. This endpoint returns a much smaller payload.
Signed-off-by: Alex Leong <alex@buoyant.io>
This fixes an issue users are experiencing when upgrading from Linkerd
2.6 to 2.7 while using the [kubernetes-external-secrets]() project.
The change introduced by #3700 resulted in the tap service showing up in the
`/openapi/v2` API response. I confirmed this with a local build.
A dependency within the project expects the `operationID` field to be present
in the swagger definition. It is optional as stated in the
[spec](https://swagger.io/docs/specification/paths-and-operations/). It's
purpose is to identify an operation and should be unique.
This change adds that field to the tap service swagger spec. While this can be
fixed in the KES dependency, it certainly does not hurt to add it, and other
libraries may similarly expect this field.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Here we upgrade our dependencies on client-go to 0.17.4 and smi-sdk-go to 0.3.0. Since smi-sdk-go uses client-go 0.17.4, these upgrades must be performed simultaneously.
This also requires simultaneously upgrading our dependency on linkerd/stern to a SHA which also uses client-go 0.17.4. This keeps all of our transitive dependencies synchronized on one version of client-go.
This ALSO requires updating our codegen scripts to use the 0.17.4 version of code-generator and running it to generate 0.17.4 compatible generated code. I took this opportunity to update our code generation script to properly use the version of code-generator from `go.mod` rather than a hardcoded SHA.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Bump proxy-init to v1.3.2
Bumped `proxy-init` version to v1.3.2, fixing an issue with `go.mod`
(linkerd/linkerd2-proxy-init#9).
This is a non-user-facing fix.
## Motivation
I noticed the Go language server stopped working in VS Code and narrowed it
down to `go build ./...` failing with the following:
```
❯ go build ./...
go: github.com/linkerd/stern@v0.0.0-20190907020106-201e8ccdff9c: parsing go.mod: go.mod:3: usage: go 1.23
```
This change updates `linkerd/stern` version with changes made in
linkerd/stern#3 to fix this issue.
This does not depend on #4170, but it is also needed in order to completely
fix `go build ./...`
This change removes the target port requirement when resolving ports in the dst service. Based on the comments, it seems that we need to have a target port defined in the port spec in order to resolve to the port in the Endpoints. In reality, if the target port is not defined when creating the service, k8s will set the port and the target port to the same value. It seems to me that checking for the targetPort to be different from 0 is a no-op.
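To illustrate the Kubernetes defaulting the paragraph relies on (a generic example, not taken from this change):
```yaml
# With targetPort omitted, Kubernetes defaults it to the value of port,
# so both end up as 8080 here.
apiVersion: v1
kind: Service
metadata:
  name: example
spec:
  ports:
  - port: 8080
```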
Signed-off-by: Zahari Dichev zaharidichev@gmail.com