Fix broken docker build by moving Service Profile conversion and validation into `/pkg`.
Fix broken integration test by adding service profile validation output to `check`'s expected output.
Testing done:
* `gotest -v ./...`
* `bin/docker-build`
* `bin/test-run (pwd)/bin/linkerd`
Signed-off-by: Alex Leong <alex@buoyant.io>
We implement the getProfiles method in the destination service. This method returns a stream of destination profiles for a given authority. It does this by looking up the ServiceProfile resource in the controller namespace named `<svc>.<ns>` where `<svc>` is the name of the service and `<ns>` is the namespace of the service.
This PR includes:
* Adding a ServiceProfile Custom Resource Definition to linkerd install
* A watch based implementation of the getProfiles method in the destination service, similar to the implementation of get.
* An update to the destination client script that allows querying the getProfiles method.
Signed-off-by: Alex Leong <alex@buoyant.io>
Sometimes, the tap server causes the controller pod to restart after it receives this error.
This error arises when the Tap server does not close gRPC tap streams to proxies before the tap server terminates its streams to its upstream clients and causes the controller pod to restart.
This PR uses the request context from the initial TapByReource to help shutdown tap streams to the data plane proxies gracefully.
fixes#1504
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Increase the MaxRps on the tap server to 100 RPS.
The max RPS for tap/top was increased in for the CLI #1531, but we were
still manually setting this to 1 RPS in the Web UI and Web server.
Remove the pervasive setting of MaxRps to 1 in the web frontend and server
The default value for the max-rps argument to the tap and top commands is an overly conservative 1rps. This causes the data to come in very slowly and much data to be discarded. Furthermore, because tap requests are windowed to 10 seconds, this causes long pauses between updates.
We fix this in two ways. Firstly we reduce the window size to 1s so that updates will come in at least once per second, even when the actual RPS of the data path is extremely high. Secondly, we increase the default max-rps parameter from 1 to 100. This allows tap to paint an accurate picture of the data much more quickly and sidesteps some sampling bias that happens when the max-rps is low.
In general, tap events tend to happen in bursts. For example, one request in may trigger one or more requests out. Likewise, a single upstream event may trigger several requests to the tapped pod in quick succession. Sampling bias will occur when the max-rps is less than the actual rps and when the tap event limit subdivides these event bursts (biasing towards the first few events in the burst). The greater the max-rps, the less the effects of this bias.
Fixes#1525
Signed-off-by: Alex Leong <alex@buoyant.io>
Previously, we would tap any resource's pods, regardless of whether the pods
were meshed or not. We can't actually tap non-meshed pods, so I'm adding a check
that will filter out non-meshed pods from the pods that tap watches.
Previous behaviour:
When attempting to hang a non meshed pod, it would establish
a watch on the pods, but then never return any results. In the CLI you could
just cancel it with Ctrl-C. In the web, clicking Stop would send a
WebSocket.close(1000) but wouldn't actually close the connection...
Behaviour after change :
If no pods under the specified resource are meshed, it'll
return an error of no pods being found to tap
Fixes#1493.
When the tap server hydrates metadata for the source or destination peer
of a Tap event from the peer's IP address, it doesn't currently add a
namespace label. However, destinations labeled by the proxy do have such
a label.
This is because the tap server currently gets the hydrated labels from
the `GetPodLabels` function, which is also used by the Destination
service for labeling the individual endpoints in a `WeightedAddrSet`
response. However, the Destination service also adds some labels to all
the endpoints in the set, including the namespace and service, so
`GetPodLabels` doesn't return these labels. However, when the tap server
uses that function, it does not add the service or namespace labels.
This branch fixes this issue by adding those labels to the Tap event
after calling `GetPodLabels`. In addition, it fixes a missing space
between the `src/dst_res` and `src/dst_ns` labels in Tap CLI output
with the `-o wide` flag set. This issue was introduced during the
review of #1437, but was missed at the time because the namespace label
wasn't being set correctly.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Based on @adleong's suggestion in
https://github.com/linkerd/linkerd2/pull/1434#pullrequestreview-145428857,
this branch adds label hydration from destination IPs to the Tap server.
This works the same as the label hydration for destination IPs added in
#1434. However, it is only applied to the destination fields of events
recorded by proxies in the inbound direction, since outbound
destinations are already labeled with metadata provided by the
Destination service.
This means that when a user taps inbound traffic, the CLI will show k8s
metadata labels for the destination peer (if it's available). This can
be useful especially when tapping several pods at once, as it makes it
easier to distinguish what pod received a request.
This branch also refactors how the label hydration is performed,
primarily to make adding it to the destination field less repetitive.
Also, the `hydrateIPLabels` function now mutates the label map in the
`TapEvent`, rather than returning a new map of labels, so that the case
where no pod was found doesn't require an additional allocation of an
empty map.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The `TapEvent` protobuf contains two maps, `DestinationMeta` and
`SourceMeta`. The `DestinationMeta` contains all the metadata provided
by the proxy that originated the event (ultimately originating from the
Destination service), while the `SourceMeta` currently only contains the
source connection's TLS status.
This branch modifies the Tap server to hydrate the same set of metadata
from the source IP address, when the source was within the cluster. It
does this by adding an indexer of pod IPs to pods to its k8s API client,
and looking up IPs against this index. If a pod was found, the extra
metadata is added to the tap event sent to the client.
This branch also changes the client so that if a source pod name was
provided in the metadata, it prints the pod name rather than the IP
address for the `src` field in its output. This mimics what is currently
done for the `dst` field in tap output. Furthermore, the added source
metadata will be necessary for adding src resource types to tap output
(see issue #1170).
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
* Modify the Stat endpoint to also return the count of failed pods
* Add comments explaining pod count stats
* Rename total pod count to running pod count
This is to support the service mesh overview page, as I'd like to include an indicator of
failed pods there.
The `conduit tap` command is now deprecated.
Replace `conduit tap` with `connduit tapByResource`. Rename tapByResource
to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now
returns Unimplemented.
Fixes#804
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
public-api and and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.
This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The TapByResource endpoint was previously a stub.
Implement end-to-end tapByResource functionality, with support for
specifying any kubernetes resource(s) as target and destination.
Fixes#803, #49
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This changes the public api to have a new rpc type, `TapByResource`.
This api supersedes the Tap api. `TapByResource` is richer, more closely
reflecting the proxy's capabilities.
The proxy's Tap api is extended to select over destination labels,
corresponding with those returned by the Destination api.
Now both `Tap` and `TapByResource`'s responses may include destination
labels.
This change avoids breaking backwards compatibility by:
* introducing the new `TapByResource` rpc type, opting not to change Tap
* extending the proxy's Match type with a new, optional, `destination_label` field.
* `TapEvent` is extended with a new, optional, `destination_meta`.
* Extracted logic from destination server
* Make tests follow style used elsewhere in the code
* Extract single interface for resolvers
* Add tests for k8s and ipv4 resolvers
* Fix small usability issues
* Update dep
* Act on feedback
* Add pod-based metric_labels to destinations response
* Add documentation on running control plane to BUILD.md
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Fix mock controller in proxy tests (#656)
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
* Address review feedback
* Rename files in the destination package
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
When attempting to tap N pods when N is greater than the target rps, a rounding error occurs that requests 0 rps from each pod and no tap data is returned.
Ensure that tap requests at least 1 rps from each target pod.
Tested in Kubernetes on docker-for-desktop with a 15 replica deployment and a maxRps of 10.
Signed-off-by: Alex Leong <alex@buoyant.io>
Previously, running `$conduit tap` would return a `Unexpected EOF` error when the server wasn't available. This was due to a few problems with the way we were handling errors all the way down the tap server. This change fixes that and cleans some of the protobuf-over-HTTP code.
- first step towards #49
- closes#106
* Sort imports
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Upgrade k8s.io/client-go to v6.0.0
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Make k8s store initialization blocking with timeout
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
We’ve built Conduit from the ground up to be the fastest, lightest,
simplest, and most secure service mesh in the world. It features an
incredibly fast and safe data plane written in Rust, a simple yet
powerful control plane written in Go, and a design that’s focused on
performance, security, and usability. Most importantly, Conduit
incorporates the many lessons we’ve learned from over 18 months of
production service mesh experience with Linkerd.
This repository contains a few tightly-related components:
- `proxy` -- an HTTP/2 proxy written in Rust;
- `controller` -- a control plane written in Go with gRPC;
- `web` -- a UI written in React, served by Go.