Commit Graph

30 Commits

Author SHA1 Message Date
Oliver Gould 2a4f38b9e7
proto: Use explicit `go_package` option (#1120)
protobuf has a `go_package` option that can be used to explicitly name
Go packages such that they can be imported without additional rewrites.

This allows us to store proto files without additional, redundant
directories (which were used for packaging hints, previously).

This change adds an explicit `go_package` to all .proto files and
updates `bin/protoc-go.sh` to ensure these packages are output into
$GOPATH (so that the go_package can be absolute). This removes the need
to manually rewrite imports in bin/protoc-go.sh.
2018-06-14 14:03:00 -07:00
Kevin Lingerfelt ec2433e9bd
Update controller to use 'tls' metric label (#1044)
* Update controller to use 'tls' metric label
* Fix meshed column formatter

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-01 16:44:33 -07:00
Eliza Weisman 5a42ce357e
proto: Add TLS identity to WeightedAddr message (#1041)
Required for #1008.

This PR adds the `TlsIdentity` message to the Destination service proto,
to describe what strategy the proxy should use for verifying an endpoint's TLS
certificates. It also adds a `TlsIdentity` field to the `WeightedAddr` message.

Currently, there is one possible variant for `TlsIdentity`, `KubernetesPodName`, 
which consists of the Kubernetes pod name of the endpoint, the namespace of
the endpoint, and the namespace of that pod's Conduit control plane. The proxy
should attempt to connect over TLS if the control plane namespace matches its 
own control plane namespace. The pod name and namespace are used to verify 
the endpoint's TLS certificate.

See https://github.com/runconduit/conduit/issues/386#issuecomment-392948046.

This change was initially part of #1008, but I factored it out to make the diff
smaller.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-31 11:48:25 -07:00
Risha Mars ffabdefc6c
Add queries to prometheus to determine number of fully meshed requests (#983)
- Update the `response_total` prometheus query of the StatSummary endpoint to also
break queries out by a `meshed` label. 
- Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic
starting and ending in the mesh

This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that
starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we
add TLS by default for intra mesh requests).

The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will
show up as 0% secured in the web/CLI.
2018-05-24 11:05:09 -07:00
Risha Mars f94856e489
Modify the Stat endpoint to also return the number of failed conduit pods (#895)
* Modify the Stat endpoint to also return the count of failed pods
* Add comments explaining pod count stats
* Rename total pod count to running pod count

This is to support the service mesh overview page, as I'd like to include an indicator of
failed pods there.
2018-05-08 10:35:21 -07:00
Andrew Seigner dce31b888f
Deprecate Tap, rename TapByResource to Tap (#844)
The `conduit tap` command is now deprecated.

Replace `conduit tap` with `connduit tapByResource`. Rename tapByResource
to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now
returns Unimplemented.

Fixes #804

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-25 12:24:46 -07:00
Brian Smith ef5fac2109
Remove docs for never-implemented Destination service heartbeat (#820)
Fixes #610.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-20 04:54:33 -10:00
Oliver Gould 06dd8d90ee
Introduce the TapByResource API (#778)
This changes the public api to have a new rpc type, `TapByResource`.
This api supersedes the Tap api. `TapByResource` is richer, more closely 
reflecting the proxy's capabilities.

The proxy's Tap api is extended to select over destination labels,
corresponding with those returned by the Destination api.

Now both `Tap` and `TapByResource`'s responses may include destination
labels.

This change avoids breaking backwards compatibility by:

* introducing the new `TapByResource` rpc type, opting not to change Tap
* extending the proxy's Match type with a new, optional, `destination_label` field.
* `TapEvent` is extended with a new, optional, `destination_meta`.
2018-04-18 15:37:07 -07:00
Andrew Seigner 727521f914
Permit arbitrary time windows in public-api (#774)
The public-api previously only permitted 4 hard-coded time windows:
10s, 1m, 10m, 1h. This was primarily a relic of the recently removed
telemetry system.

Modify the public-api to validate the time string, but allow for any
window size, which is then passed through to Prometheus.

Fixes #686

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-16 17:37:17 -07:00
Andrew Seigner 77fb6d3709
Add namespace as a resource type in public-api (#760)
* Add namespace as a resource type in public-api

The cli and public-api only supported deployments as a resource type.

This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes

Part of #627

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

* Refactor filter and groupby label formulation

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Rename stat_summary.go to stat.go in cli

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Update rbac privileges for namespace stats

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 16:53:01 -07:00
Kevin Lingerfelt fb15fe7c1a
Remove the telemetry service (#757)
* Remove the telemetry service

The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.

* Fix time window tests

* Remove deprecated controller scrape config

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 11:21:29 -07:00
Andrew Seigner 624b87f743
Implement ListPods in public-api (#743)
The ListPods endpoint's logic resides in the telemetry service, which is
going away.

Move ListPods logic into public-api, use new k8s informer APIs.

Fixes #694

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-11 17:53:57 -07:00
Andrew Seigner 259fdcd134
Add latency stats in new stat summary endpoint (#737)
The new StatSummary endpoint was only providing request volume and
successs rate information.

Add support for retrieving latency stats via StatSummary. Also make
all prometheus calls in parallel, and implement kubernetes test
fixtures.

Fixes #681

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-11 11:58:32 -07:00
Risha Mars 2f5b5ea5f2
Start implementing conduit stat summary endpoint (#671)
Start implementing new conduit stat summary endpoint. 
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.

Current implementation just retrieves requests and mesh/total pod count 
(so latency stats are always 0). 

Uses API defined in #663
Example queries the stat endpoint will eventually satisfy in #627

This branch includes commits from @klingerf 

* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
2018-04-05 17:05:06 -07:00
Risha Mars d1a39ea6bf
Define a new telemetry Stat API (#663)
* Define a new telemetry Stat API

Proposal definition for a new Stat API, for the purposes of satisfying the queries proposed in #627.
StatSummary will replace Stat once implemented and the original Stat deleted.
2018-04-03 14:45:58 -07:00
Phil Calçado 19001f8d38 Add pod-based metric_labels to destinations response (#429) (#654)
* Extracted logic from destination server
* Make tests follow style used elsewhere in the code
* Extract single interface for resolvers
* Add tests for k8s and ipv4 resolvers
* Fix small usability issues
* Update dep
* Act on feedback
* Add pod-based metric_labels to destinations response
* Add documentation on running control plane to BUILD.md

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Fix mock controller in proxy tests (#656)

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

* Address review feedback
* Rename files in the destination package

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-02 18:36:57 -07:00
Brian Smith 7dc21f9588
Add the NoEndpoints message to the Destination API (#564)
Have the controller tell the client whether the service exists, not
just what are available. This way we can implement fallback logic to
alternate service discovery mechanisms for ambigious names.

Signed-off-by: Brian Smith <brian@briansmith.org>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-27 10:45:41 -10:00
Andrew Seigner 50f4aa57e5
Require timestamp on all telemetry requests (#342)
PR #298 moved summary (non-timeseries) requests to Prometheus' Query
endpoint, with no timestamp provided. This Query endpoint returns a
single data point with whatever timestamp was provided in the request.
In the absense of a timestamp, it uses current server time. This causes
the Public API to return discreet data points with slightly different
timestamps, which is unexpected behavior.

Modify the Public API -> Telemetry -> Prometheus request path to always
require a timestamp for single data point requests.

Fixes #340

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-13 13:52:21 -08:00
Eliza Weisman 8bc497a057
Remove unused metrics (#322)
Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams.

Closes #266 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 17:14:17 -08:00
Eliza Weisman 458e9d2ac5
Remove per-path metrics from telemetry pipeline (#317)
Follow-up from #315.

Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.

I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.

Closes #265 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 14:20:28 -08:00
Eliza Weisman 2015d992cc
Remove pod-level metrics from web and CLI (#304)
This PR updates the web UI to remove the pod detail page, and to remove the links to that page from pod names in metrics tables. It also removes the `pods` option from `conduit stat`, and the `sourcePod` and `targetPod` fields from the controller API proto's `MetricMetadata` message.

I've updated the `conduit stat` tests to reflect these changes, and manually verified the web UI changes.

Closes #261 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-08 19:07:10 -08:00
Eliza Weisman 915f08ac4c
Store proxy latencies in a structure that matches controller histogram (#11)
The proxy currently stores latency values in an `OrderMap` and reports every observed latency value to the controller's telemetry API since the last report. The telemetry API then sends each individual value to Prometheus. This doesn't scale well when there are a large number of proxies making reports. 

I've modified the proxy to use a fixed-size histogram that matches the histogram buckets in Prometheus. Each report now includes an array indicating the histogram bounds, and each response scope contains a set of counts corresponding to each index in the bounds array, indicating the number of times a latency in that bucket was observed. The controller then reports the upper bound of each bucket to Prometheus, and can use the proxy's reported set of bucket bounds so that the observed values will be correct even if the bounds in the control plane are changed independently of those set in the proxy.

I've also modified `simulate-proxy` to generate the new report structure, and added tests in the proxy's telemetry test suite validating the new behaviour.
2018-02-07 18:02:59 -08:00
Eliza Weisman eddc37de28 Adopt external tower-grpc and tower-h2 deps #225)
The conduit repo includes several library projects that have since been
moved into external repos, including `tower-grpc` and `tower-h2`.

This change removes these vendored libraries in favor of using the new
external crates.
2018-02-01 11:57:02 -08:00
Andrew Seigner d0a0bb22bd
Move EosCtx to common for Tap and Telemetery (#204)
* Make Eos optional in TapEvent

grpc_status not being set in protobuf is the same as being set to zero,
which is also status OK

Modify TapEvent to include an optional EOS struct

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

Part of #198

* Add Eos to proto & proxy tap end-of-stream events

The proxy now outputs `Eos` instead of `grpc_status` in all end-of-stream tap events. The EOS value is set to `grpc_status_code` when the response ended with a `grpc_status` trailer, `http_reset_code` when the response ended with a reset, and no `Eos` when the response ended gracefully without a `grpc_status` trailer.

This PR updates the proxy. The proto and controller changes are in PR #204.
Part of #198. Closes #202

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-24 15:48:00 -08:00
Kevin Lingerfelt fd3cfcb5d9
Move healthcheck proto to separate file, use throughout (#150)
* Move healthcheck proto to separate file, use throughout

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Remove Check message from healthcheck.proto

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Standardize healthcheck protobuf import name

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-01-17 11:15:38 -08:00
Phil Calçado e328db7e87
Adds conduit-api check for status command (#140)
* Abstract Conduit API client from protobuf interface to add new features

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Consolidate mock api clients

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Add simple implementation of healthcheck for conduit api

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Change NextSteps to FriendlyMessageToUser

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Add grpc check for status on the client

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Add simple server-side check for Conduit API

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Fix feedback from PR

Signed-off-by: Phil Calcado <phil@buoyant.io>
2018-01-12 15:35:22 -05:00
Eliza Weisman 63d1a5d70d
Add Protocol field to Transports telemetry (#138)
See #132. This PR adds a protocol field to the ClientTransport and ServerTransport messages, and modifies the proxy to report a value for this field (currently, it's only ever HTTP).

Currently, HTTP/1 and HTTP/2 are collapsed into one Protocol variant, see #132 (comment). I expect that we can treat H1 as a subset of H2 as far as metrics goes.

Note that after discussing it with @klingerf, I learned that the control plane telemetry API currently does not do anything with the ClientTransport and ServerTransport messages, so beyond regenerating the protobuf-generated code, no controller changes were actually necessary. As we actually add metrics to TCP transports, we'll want to make some additions to the telemetry API to ingest these metrics. If any metrics are shared between HTTP and raw TCP transports (say, bytes sent), we'll want to differentiate between them in Prometheus. All the metrics that the control plane currently ingests from telemetry reports are likely to be HTTP-specific (requests, responses, response latencies), or at least, do not apply to raw TCP.

Actually adding metrics to raw TCP transports will probably have to wait until there are raw TCP transports implemented in the proxy...

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-11 16:00:38 -08:00
Kevin Lingerfelt 2f114e69fa
Add support for path stats in cli and web api (#13)
* Add support for path stats in cli and web api

The cli stat command supports grouping by pod and deployment. With this
change, it will also support grouping by path, in order to facilitate a
summary stats per individual endpoint.

* Right-align numeric columns in stat output
2017-12-08 12:24:39 -08:00
Kevin Lingerfelt 906d4e8b69
Fix public-api error marshaling and unmarshaling (#16) 2017-12-08 11:03:55 -08:00
Oliver Gould b104bd0676 Introducing Conduit, the ultralight service mesh
We’ve built Conduit from the ground up to be the fastest, lightest,
simplest, and most secure service mesh in the world. It features an
incredibly fast and safe data plane written in Rust, a simple yet
powerful control plane written in Go, and a design that’s focused on
performance, security, and usability. Most importantly, Conduit
incorporates the many lessons we’ve learned from over 18 months of
production service mesh experience with Linkerd.

This repository contains a few tightly-related components:
- `proxy` -- an HTTP/2 proxy written in Rust;
- `controller` -- a control plane written in Go with gRPC;
- `web` -- a UI written in React, served by Go.
2017-12-05 00:24:55 +00:00