2636 Commits

Author SHA1 Message Date
Oliver Gould c454ac413c
Upgrade to Rust 1.24.0 (#363)
2018-02-16 14:37:29 -08:00
Brian Smith 64f270b631
Strip `conduit` CLI executables in `docker build`. (#367)
File sizes (in bytes) before and after this change:

        conduit-darwin conduit-linux conduit-windows
Before:     27,056,288    27,282,364      27,359,744
After:      20,023,456    18,080,576      18,262,528
----------------------------------------------------
Diff         7,032,832     9,201,788       9,097,216

Fixes #352.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-16 08:20:18 -10:00
Alex Leong 552204366c
Use Prometheus to track added data plane pods. (#338)
The instance cache that powers the ListPods API is stored in memory in the telemetry service. This means that when there are multiple replicas of the telemetry service, each replica will have a distinct, incomplete view of the added pods based on which pods report to that telemetry replica. This causes the data plane bubbles on the dashboard to not all be filled in, and to flicker with each data refresh.

We create a Prometheus counter called reports_total which has pod as a label. Whenever a telemetry service instance receives a report from a pod, it increments reports_total for that pod. This allows us to remove the in-memory instance cache and instead query Prometheus to see if each pod has had a report in the last 30 seconds.

Fixes #337

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-02-14 16:09:55 -08:00
Kevin Lingerfelt 300fd3475b
Remove unused web routes and helper (#356)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-14 11:52:39 -08:00
Andrew Seigner 1db7d2a2fb
Ensure latency quantile queries match timestamps (#348)
In PR #298 we moved time window parsing (10s => (time.now - 10s,
time.now)) down the stack to immediately before the query. This had the
unintended effect of creating parallel latency quantile requests with
slightly different timestamps.

This change parses the time window prior to latency quantile fan out,
ensuring all requests have the same timestamp.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-13 16:26:54 -08:00
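The idea behind #348 above can be illustrated with a minimal Go sketch: resolve the relative window into absolute bounds once, then hand the same bounds to every fanned-out request. The `window` type and the `resolveWindow` and `queryQuantiles` functions are hypothetical names for illustration, not the project's actual code, and the Prometheus call is replaced by a stub.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// window is a hypothetical resolved time range in Unix seconds.
type window struct {
	start, end int64
}

// resolveWindow turns a relative window (e.g. 10s) into absolute bounds.
// It is called once, before any fan-out happens.
func resolveWindow(nowUnix, seconds int64) window {
	return window{start: nowUnix - seconds, end: nowUnix}
}

// queryQuantiles issues one concurrent request per quantile. Every request
// receives the same pre-resolved window, so all timestamps match.
func queryQuantiles(w window, quantiles []float64) []window {
	used := make([]window, len(quantiles))
	var wg sync.WaitGroup
	for i := range quantiles {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Stub: a real implementation would query Prometheus here.
			// We only record which window the request would have used.
			used[i] = w
		}(i)
	}
	wg.Wait()
	return used
}

func main() {
	w := resolveWindow(time.Now().Unix(), 10)
	for i, u := range queryQuantiles(w, []float64{0.5, 0.95, 0.99}) {
		fmt.Printf("request %d: start=%d end=%d\n", i, u.start, u.end)
	}
}
```

Had each goroutine computed its own bounds from the current time, the requests could have observed slightly different clocks, which is the behavior the commit describes.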
Brian Smith aa123b8ad5
Test the proxy in release mode in Docker in CI on the master branch. (#327)
* Test the proxy in release mode in Docker in CI on the master branch.

Previously we were not running the proxy tests in the release configuration.

Run the proxy tests in the release configuration through Docker.

Docker builds with tests in release mode are too slow to run on every
pull request so release mode tests will only be run on the master
branch.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-13 12:16:50 -10:00
Andrew Seigner 50f4aa57e5
Require timestamp on all telemetry requests (#342)
PR #298 moved summary (non-timeseries) requests to Prometheus' Query
endpoint, with no timestamp provided. This Query endpoint returns a
single data point with whatever timestamp was provided in the request.
In the absence of a timestamp, it uses current server time. This causes
the Public API to return discrete data points with slightly different
timestamps, which is unexpected behavior.

Modify the Public API -> Telemetry -> Prometheus request path to always
require a timestamp for single data point requests.

Fixes #340

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-13 13:52:21 -08:00
Oliver Gould 4154db2d4f
Improve wording around Getting Started (#288)
Some of the phrasing around the getting started section of the README
was awkward.
2018-02-13 13:38:37 -08:00
Brian Smith b18fe459d4
Precompile large Go libraries in go-deps Docker image. (#332)
On my system (i9-7960x running Docker natively in Linux) this regularly saves
over 11 seconds of build time when a file under pkg/ changes and over 1.5
seconds of build time when a file under controller/ changes. Since most
contributors are running Docker in a VM on less powerful computers, the
savings for most contributors should be significantly greater.

I imagine the savings for web/ and cli/ and proxy-init/ are similar, but I
did not measure them.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-13 11:35:10 -10:00
Andrew Seigner 797bba6bc6
Upgrade to Prometheus 2.1.0 (#344)
Conduit has been on Prometheus 1.8.1. Prometheus 2.x promises better
performance.

Upgrade Conduit to Prometheus 2.1.0

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-13 13:22:53 -08:00
Brian Smith 37008f9626
Improve caching behavior of controller/Dockerfile. (#331)
Precompiling pkg/ in an earlier layer saves ~10 seconds of wall clock
time on an incremental build on my machine (i9-7960x) when I update a
file in controller/ such as controller/destination/server.go. This
makes a significant difference in the edit-build-test loop.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-13 11:21:22 -10:00
Brian Smith ec5a02fd64
Upgrade to Go 1.9.4. (#326)
Go 1.9.4 is a security release.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-12 13:47:40 -10:00
Brian Smith 86ea1c06bf
Improve the caching behavior of Dockerfile-go-deps. (#325)
Previously Dockerfile-go-deps would run `dep ensure` whenever anything in the
source tree changed. Also, because it was a multi-stage Dockerfile it did not
work well with Docker's `--cache-from` feature.

Change Dockerfile-go-deps to only re-run `dep ensure` when Gopkg.{toml,lock}
and/or bin/dep change. Simplify it to a single stage so that it works better
with Docker's `--cache-from` feature.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-12 13:40:20 -10:00
Brian Smith c78df4ba13
Use bin/dep in Dockerfile-go-deps. (#324)
bin/dep verifies the digest of the downloaded `dep` executable,
whereas previously Dockerfile-go-deps didn't.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-12 13:32:08 -10:00
Risha Mars 1f6aa27922
UI updates, graph removals (#319)
UI cleanups. Remove repetitive labels in the UI, remove unused elements, 
remove graphs until we improve their utility.

- remove “Deployment” from the headers of the Deployment Detail Page
- remove Routes in sidebar
- kill leftmost 100px of sidebar
- remove word controller from service mesh page first table
- add Twitter, GitHub, and Slack links
- kill the graphs, replace with one large header (request rate, success rate, latency top bar)
- put upstream/downstream diagram before upstream/downstream tables

* Clean up DeploymentList page (#321)

- remove "Most active deployments" graphs from the Deployments List page
- remove the scatterplot sections of the page as I don't think we'll be using them for a while
2018-02-12 12:44:33 -08:00
Andrew Seigner 261586b862
Fix pointer copying (#330)
The Public API's stat endpoint copies a slice of values to a slice of
pointers prior to gRPC response. Go's range clause re-uses the same
variable for each iteration of the loop, so taking its address turns a
slice of {1,2,3} into {3,3,3}.

Fix the range loop to directly reference pointers in the slice of
values, ignoring the range variable. Also add tests to catch this case.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-10 11:04:28 -08:00
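The fix described in #330 above can be sketched in Go as follows. `toPointers` is a hypothetical helper, not the project's function; it takes the address of each slice element by index instead of the address of the range variable. Note that since Go 1.22 each loop iteration gets its own variable, so the original bug only reproduces on older toolchains, but indexing is correct on all versions.

```go
package main

import "fmt"

// toPointers returns one pointer per element of values.
// Each pointer refers to the element inside the slice itself.
func toPointers(values []int) []*int {
	ptrs := make([]*int, 0, len(values))
	for i := range values {
		// Index into the slice rather than writing `for _, v := range values`
		// and appending &v, which aliased a single variable before Go 1.22.
		ptrs = append(ptrs, &values[i])
	}
	return ptrs
}

func main() {
	for _, p := range toPointers([]int{1, 2, 3}) {
		fmt.Println(*p)
	}
}
```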
Eliza Weisman 8bc497a057
Remove unused metrics (#322)
Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams.

Closes #266 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 17:14:17 -08:00
Andrew Seigner bffa5ff3e6
Concurrent Telemetry requests (#323)
All requests from the public API service to the Telemetry service were
done serially. In some cases a single request to the public API's Stat
endpoint resulted in 5 serial requests to the Telemetry service.

Make all requests from the Public API to Telemetry concurrent.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

Part of #299
2018-02-09 17:11:20 -08:00
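A minimal sketch of the serial-to-concurrent change in #323 above, assuming each Telemetry request can be modeled as a function returning a value and an error. `fetchAll` is a hypothetical name, and the string results stand in for the real gRPC responses.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAll runs every request concurrently and returns the results in the
// same order as the requests. It returns the first error encountered, if any.
func fetchAll(requests []func() (string, error)) ([]string, error) {
	results := make([]string, len(requests))
	errs := make([]error, len(requests))
	var wg sync.WaitGroup
	for i, req := range requests {
		wg.Add(1)
		// Pass i and req as arguments so each goroutine has its own copies.
		go func(i int, req func() (string, error)) {
			defer wg.Done()
			// Each goroutine writes only to its own index, so no lock is needed.
			results[i], errs[i] = req()
		}(i, req)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	requests := []func() (string, error){
		func() (string, error) { return "requests", nil },
		func() (string, error) { return "success-rate", nil },
		func() (string, error) { return "latency", nil },
	}
	results, err := fetchAll(requests)
	fmt.Println(results, err)
}
```

With this shape, total latency is roughly that of the slowest request rather than the sum of all of them.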
Eliza Weisman 458e9d2ac5
Remove per-path metrics from telemetry pipeline (#317)
Follow-up from #315.

Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.

I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.

Closes #265 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 14:20:28 -08:00
Eliza Weisman 6c2ac6125f
Remove per-path metrics from UIs (#315)
I've removed per-path metrics from the web dashboard and from the `conduit stat` command. 

Manually validated that these metrics are no longer displayed.

Closes #263

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 12:35:49 -08:00
Andrew Seigner 33e3c3ace9
Optimize Prometheus queries (#298)
Prometheus queries from the Telemetry service were taking seconds or tens
of seconds.

Optimize these queries:
- Move all summary queries requiring a single data point off of Prometheus'
  QueryRange() endpoint, onto Query()
- Set `defaultVectorRange` to 30s, and also use it regardless of time
  window

Also add tests for grpc_server and telemetry server

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

Fixes #260
2018-02-09 10:55:07 -08:00
Eliza Weisman 2015d992cc
Remove pod-level metrics from web and CLI (#304)
This PR updates the web UI to remove the pod detail page, and to remove the links to that page from pod names in metrics tables. It also removes the `pods` option from `conduit stat`, and the `sourcePod` and `targetPod` fields from the controller API proto's `MetricMetadata` message.

I've updated the `conduit stat` tests to reflect these changes, and manually verified the web UI changes.

Closes #261 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-08 19:07:10 -08:00
Risha Mars 81d4b7b924
Fix bug where table data wasn't being updated (#290)
2018-02-08 10:33:33 -08:00
Brian Smith 4fadfa2243
Don't manually install Docker in Travis CI. (#297)
Travis CI now installs Docker 17.09 or later, which is good enough for
us, so avoid installing Docker manually.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-08 08:31:00 -10:00
Jeff Haynie f721a0f800
Fixed misspelling in conduit inject args (#300)
2018-02-08 12:48:40 -05:00
Eliza Weisman 915f08ac4c
Store proxy latencies in a structure that matches controller histogram (#11)
The proxy currently stores latency values in an `OrderMap` and reports every observed latency value to the controller's telemetry API since the last report. The telemetry API then sends each individual value to Prometheus. This doesn't scale well when there are a large number of proxies making reports. 

I've modified the proxy to use a fixed-size histogram that matches the histogram buckets in Prometheus. Each report now includes an array indicating the histogram bounds, and each response scope contains a set of counts corresponding to each index in the bounds array, indicating the number of times a latency in that bucket was observed. The controller then reports the upper bound of each bucket to Prometheus, and can use the proxy's reported set of bucket bounds so that the observed values will be correct even if the bounds in the control plane are changed independently of those set in the proxy.

I've also modified `simulate-proxy` to generate the new report structure, and added tests in the proxy's telemetry test suite validating the new behaviour.
2018-02-07 18:02:59 -08:00
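The fixed-size histogram described in the commit above (#11) can be sketched as follows. The proxy itself is written in Rust; this Go version only illustrates the data structure. The bucket bounds are made-up values, and the extra overflow slot for latencies above the last bound is an assumption of this sketch rather than something the commit message states.

```go
package main

import "fmt"

// histogram counts observations per bucket instead of storing every value.
type histogram struct {
	bounds []uint64 // inclusive upper bounds in milliseconds, ascending
	counts []uint32 // counts[i] pairs with bounds[i]; the last slot is overflow (assumption)
}

// newHistogram allocates one counter per bound plus one overflow counter.
func newHistogram(bounds []uint64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint32, len(bounds)+1)}
}

// observe increments the first bucket whose upper bound is >= latencyMs.
func (h *histogram) observe(latencyMs uint64) {
	for i, b := range h.bounds {
		if latencyMs <= b {
			h.counts[i]++
			return
		}
	}
	h.counts[len(h.bounds)]++ // larger than every bound
}

func main() {
	h := newHistogram([]uint64{10, 100, 1000}) // hypothetical bounds
	for _, ms := range []uint64{5, 10, 50, 5000} {
		h.observe(ms)
	}
	fmt.Println(h.bounds, h.counts)
}
```

The size of a report is then fixed by the number of buckets, no matter how many latencies were observed since the last report.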
Kevin Lingerfelt fbb4e812f8
Change default version string from "unknown" to "latest" (#284)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-07 10:01:12 -08:00
Risha Mars ff15574a0d
MetricsTable: Consolidate latency, success, request metrics into one tab (#276)
* Consolidate latency, success, request metrics into one tab
on the SortableMetricsTable

- removes sparklines from the table
- makes tables sortable by default
- move pod table in DeploymentDetail to its own row

* remove request distribution column, reorder columns
2018-02-07 09:50:01 -08:00
Oliver Gould a2d537f5c4
Use a load-aware balancer (#251)
Currently, the conduit proxy uses a simplistic Round-Robin load
balancing algorithm. This strategy degrades severely when individual
endpoints exhibit abnormally high latency.

This change improves this situation somewhat by making the load balancer
aware of the number of outstanding requests to each endpoint. When nodes
exhibit high latency, they should tend to have more pending requests
than faster nodes; and the Power-of-Two-Choices node selector can be
used to distribute requests to lesser-loaded instances.

From the finagle guide:

    The algorithm randomly picks two nodes from the set of ready endpoints
    and selects the least loaded of the two. By repeatedly using this
    strategy, we can expect a manageable upper bound on the maximum load of
    any server.

    The maximum load variance between any two servers is bound by
    `ln(ln(n))` where `n` is the number of servers in the cluster.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2018-02-07 09:39:31 -08:00
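The Power-of-Two-Choices selection quoted in #251 above can be sketched in Go. This is an illustration of the general technique, not the proxy's Rust implementation; the `endpoint` type, the `pick` function, and the injected `intn` random source are all hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// endpoint tracks the number of outstanding requests to one backend.
type endpoint struct {
	name    string
	pending int
}

// pick samples two distinct endpoints at random and returns the index of the
// one with fewer pending requests. endpoints must be non-empty. intn must
// return a value in [0, n), like rand.Intn.
func pick(endpoints []endpoint, intn func(n int) int) int {
	n := len(endpoints)
	if n == 1 {
		return 0
	}
	a := intn(n)
	b := intn(n - 1)
	if b >= a {
		b++ // shift so that b is uniformly chosen among the indices other than a
	}
	if endpoints[b].pending < endpoints[a].pending {
		return b
	}
	return a
}

func main() {
	endpoints := []endpoint{
		{name: "fast", pending: 1},
		{name: "slow", pending: 12},
		{name: "idle", pending: 0},
	}
	i := pick(endpoints, rand.Intn)
	endpoints[i].pending++ // the chosen endpoint now has one more request in flight
	fmt.Println("selected:", endpoints[i].name)
}
```

Because a slow endpoint accumulates pending requests, it loses most pairwise comparisons and receives less new traffic, without the balancer having to scan every endpoint.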
Kevin Lingerfelt 447ee142c0
Stop running "cargo check" in CI (#285)
* Stop running "cargo check" in CI

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Attempt to clear cargo cache

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Remove cache clearing step

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-06 15:57:22 -08:00
Oliver Gould 95b91c5976
Set PROXY_SKIP_TESTS for CI Docker build (#283)
The SKIP_TESTS flag is not used. The PROXY_SKIP_TESTS flag should be set
so that unoptimized proxy tests are not built.
2018-02-06 13:37:38 -08:00
Oliver Gould 6a0936e699
Remove proxy/Dockerfile-deps (#279)
The current proxy Dockerfile configuration does not cache dependencies
well, which can increase build times substantially.

By carefully splitting proxy/Dockerfile into several stages that mock
parts of the project, dependencies may be built and cached in Docker
such that changes to the proxy only require building the conduit-proxy
crate.

Furthermore, proxy/Dockerfile now runs the proxy's tests before
producing an artifact, unless the `PROXY_SKIP_TESTS` build-arg is set
and non-empty.

The `PROXY_UNOPTIMIZED` build-arg has been added to support quicker,
debug-friendly builds.
2018-02-06 13:01:38 -08:00
Risha Mars 185f48b086
DeploymentsList: Replace "least healthy" with "most active" deployments (#277)
* Replace Least Healthy Deployments section with Most Active Deployments (MAD)

* Fix old arguments to ConduitLink
2018-02-06 11:35:57 -08:00
Phil Calçado 8041d07e4d
Add --url option to skip browser when opening dashboard (#281)
Browser opening is a UX nicety, but it shouldn't be mandatory.
2018-02-06 14:07:55 -05:00
Phil Calçado 9c03764a29
Remove hardcoded port and shared state for http test (#282)
We now create a new test HTTP server per test case instead of sharing it across them all.

This should solve the data races we have experienced on Travis.

Signed-off-by: Phil Calcado <phil@buoyant.io>
2018-02-06 13:48:14 -05:00
Oliver Gould e2093e37f8
Move the Rust gRPC bindings to a dedicated crate (#275)
The proxy depends on `protoc`-generated gRPC bindings to communicate
with the controller. In order to generate these bindings, build-time
dependencies must be compiled.

In order to support a more granular, cacheable build scheme, a new crate
has been created to house these gRPC bindings,
`conduit-proxy-controller-grpc`.

Because `TryFrom` and `TryInto` conversions are implemented for
protobuf-defined types, the `convert` module also had to be moved
into a dedicated crate.

Furthermore, because the proxy's tests require that
`quickcheck::Arbitrary` be implemented for protobuf types, the
`conduit-proxy-controller-grpc` crate supports an _arbitrary_ feature
flag.

While we're moving these libraries around, the `tower-router` crate has
been moved to `proxy/router` and renamed to `conduit-proxy-router`.
`futures-mpsc-lossy` has been moved into the proxy directory but has not
been renamed.

Finally, the `proxy/Dockerfile-deps` image has been updated to avoid the
wasteful building of dependency artifacts, as they are not actually used
by `proxy/Dockerfile`.
2018-02-06 10:31:48 -08:00
Brian Smith c52600eb78
Check SHA-256 sum of dep binary before running it. (#272)
Previously we didn't verify that the downloaded dep binary is the right
binary.

Verify that the downloaded binary is correct.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-05 16:02:35 -10:00
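The check added in #272 above lives in a wrapper script; the same idea can be sketched in Go with the standard library. `verifySHA256` is a hypothetical helper and the payload below is a placeholder, not the real `dep` binary or its digest.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// verifySHA256 reports whether data hashes to the expected hex-encoded digest.
// A malformed or wrong-length expected digest is treated as a mismatch.
func verifySHA256(data []byte, wantHex string) bool {
	want, err := hex.DecodeString(wantHex)
	if err != nil || len(want) != sha256.Size {
		return false
	}
	got := sha256.Sum256(data)
	return subtle.ConstantTimeCompare(got[:], want) == 1
}

func main() {
	payload := []byte("placeholder for the downloaded binary")
	sum := sha256.Sum256(payload)
	pinned := hex.EncodeToString(sum[:]) // in practice this value is pinned in the repository

	fmt.Println("intact payload accepted:", verifySHA256(payload, pinned))
	fmt.Println("tampered payload accepted:", verifySHA256(append(payload, '!'), pinned))
}
```

The important property is that the binary is only executed after the comparison succeeds, so a corrupted or substituted download is rejected before it runs.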
Phil Calçado 5628d3c8f4
Add --short and --client to CLI version command (#274)
These are very useful when writing scripts, e.g.

conduit_version=`conduit version --short --client`

Signed-off-by: Phil Calcado <phil@buoyant.io>
2018-02-05 17:02:45 -05:00
Risha Mars c2da891be7
Minor UI title renames and other tweaks (#256)
* ServiceMesh: plot public-api instead of destination, retitle destination and telemetry graphs

* ResourceHealthOverview: Hide Inbound/Outbound request rate if there are 0 deployments

* ResourceMetricOverview: retitle DeploymentDetail/PodDetail sections
2018-02-05 11:27:31 -08:00
Brian Smith 704f00ae8f
Allow bin/dep wrapper script for dep to work on Windows. (#271)
Previously the script only worked on Linux and macOS.

Make it work on Windows too.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-05 09:24:18 -10:00
Brian Smith 4da0b57204
Always use the 64-bit version of dep. (#270)
The logic for choosing the 32-bit vs. 64-bit version of dep was
inverted.

Fix this by simply always using the 64-bit version.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-05 09:07:31 -10:00
Risha Mars 9887f10749
Add ability to change the time window for metrics fetching throughout the app (#237)
* Control metricsWindow from root of app

- Add buttons [currently hidden] on metrics pages to control window of metrics requests
- Consolidate metricsWindow usage (stop passing it around)
- Add a ConduitLink component so we can stop passing around pathPrefix
- Add tests for ApiHelpers

* Hide the time window buttons; fix bug in absolute links
* Add a note explaining why metricWindow buttons are disabled
* Convert ConduitLink into a component that wraps another
2018-02-05 10:56:17 -08:00
Alex Leong b691c2e25b
Rename --version flag in conduit install to --conduit-version (#255)
This makes the `conduit install` flag match the `conduit inject` flag.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-02-05 10:45:44 -08:00
Andrew Seigner 4156af786d
Enable race detection in ci (#259)
We previously did not have race detection enabled because our tests
would fail. Following #249, this is no longer the case.

Enable race detection in ci and build instructions. This change also
fixes client_test.go attempting to allocate a 2GB buffer due to bad test
input.

Fixes #173

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-02 15:04:52 -08:00
Risha Mars 9cadb30795
Fix reference to emojivoto voting deployment (#267)
2018-02-02 13:06:59 -08:00
Andrew Seigner 9a40d984ff
Replace shelling out with kubernetes proxy (#249)
The conduit dashboard command asynchronously shells out and runs "kubectl
proxy".

This change replaces the shelling out with calls to kubernetes proxy
APIs. It also allows us to enable race detection in our go tests, as the
shell out code tests did not pass race detection.

Fixes #173

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-02 10:31:59 -08:00
Alex Leong fa2f5a0140
Add dep wrapper script to ensure consistent version of dep is used (#253)
* Add `bin/dep` which fetches a fixed version of `dep` to be used. 
* Upgrade from dep 0.3.1 to 0.4.1
* Fix inconsistent Gopkg.lock by checking in the result of `bin/dep ensure`

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-02-01 16:09:05 -08:00
Oliver Gould e771f61298
Add a newline to dco.yml (#254)
2018-02-01 15:16:02 -08:00
Andrew Seigner 277c06cf1e
Simplify and refactor k8s labels and annotations (#227)
The conduit.io/* k8s labels and annotations were redundant in some
cases, and not flexible enough in others.

This change modifies the labels in the following ways:
- `conduit.io/plane: control` => `conduit.io/controller-component: web`
- `conduit.io/controller: conduit` => `conduit.io/controller-ns: conduit`
- `conduit.io/plane: data` => (remove, redundant with `conduit.io/controller-ns`)

It also centralizes all k8s labels and annotations into
pkg/k8s/labels.go, and adds tests for the install command.

Part of #201

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-01 14:12:06 -08:00
Oliver Gould abaf498cfc
Do not require DCO signoff for project members (#252)
We only need the DCO bot to validate external submissions.
2018-02-01 13:45:32 -08:00