* Add integration test for conduit controller stats
* Update test for new SECURED column in stat output
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Add an app-wide context for global props.
We've been passing the `api` object down from the top of the react tree. With
16.x, there's now the ability to have context that can inject anywhere in the
tree. This creates a top level context provider that contains most of the global
variables we've been using (api, appData, ...). It subsequently cleans up some
of the routes and nested components.
- Bumps `react-dom` to 16.3.2 (to match `react`).
- Adds `enzyme-context-patch` for now. This is fixed in enzyme master, but there
has not been a release yet. Needs to be removed when that is fixed.
* Use a default inside appData for controllerNamespace
* Update syntax of if to use curly brackets
- Update the `response_total` prometheus query of the StatSummary endpoint to also
break queries out by a `meshed` label.
- Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic
starting and ending in the mesh
This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that
starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we
add TLS by default for intra mesh requests).
The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will
show up as 0% secured in the web/CLI.
It appears that hyper does not necessarily poll bodies to completion,
and instead simply drops a body as soon as `content-length` is reached
(hyperium/hyper#1521).
This change implements Drop for MeasuredBody such that the stream-end
event is triggered if it had not been triggered previously. This ensures
that response latencies and counts are recorded for HTTP/1 streams.
Fixes#994
In the h1-h2 glue code, we incorrectly called `is_empty()` to determine
if an H1 stream had ended. `is_empty` only returns true if there was no
body at all (rather than if the body has been fully consumed).
By changing this to call `hyper::body::Payload::is_end_stream`, h1
bodies now behave the same as h2 bodies.
Relates to #994
Currently, the proxy records a request's latency as the time between
when a request is opened and when its response stream completes. This is
not what we intend to record, especially when a response is long-lived.
In order to more accurate record latency, we want to track the time at
which the first response body frame is received (which is a close
approximation of time-to-first-byte).
Telemetry aggregation has been changed to use the first-frame time to
compute latencies; tests have been updated to exercise this behavior; and
the metrics documentation has been updated to reflect this change.
Addresses #818
Relates to #980
The proxy does not yet support a `meshed` label.
In anticipation of a `meshed` label in the proxy, introduce this label
in `simulate-proxy`, for testing.
Relates to #306 and #386.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
secured -> meshed
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Grafana dashboards were explicitly filtering out Conduit
control-plane data.
Remove control-plane filtering from Grafana dashboards. This brings
Grafana in-line with web, and also encourages better dog-fooding of our
proxy metrics and dashboards. Also update Grafana to 5.1.3, update the
BUILD.md architecture diagram to include Promethues and Grafana, and
introduce a Prometheus Benchmark dashboard, courtesy of Robust
Perception.
Fixes#908
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The StatSummary endpoint was dereferencing
StatSummaryRequest.Selector.Resource, causing a panic when it received
an empty request.
Fix StatSummary to use the nil-friendly
StatSummaryRequest.GetSelector().GetResource() methods, and add a test
to validate.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Update web dockerfile to use dev deps when building prod assets
* Don't re-run yarn install as pre-req for build/run/test
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
As part of trying to be fancy, I moved the `setup` step into build. This breaks the docker builds because we need to run yarn *without* NODE_ENV=production and then the build *with* NODE_ENV=production (to do things like minify/compress assets).
Split apart build as something without setup and provide a default target that does setup + build for travis.
- Switched from `es2015` to `env` for the default preset. This is the recommended preset and allows us to track the latest and greatest moving forward.
- Added `react-app` as a preset. We get class properties (and thus => for context) as well as the current recommended settings for react apps.
- Created a `web` script that provides functions for common tasks. `react-app` requires that BABEL_ENV/NODE_ENV is set and this guarantees it.
- Updated the web dockerfile to set NODE_ENV correctly and use `bin/web`.
- Moved the babel related modules over to devDependencies.
Proxy tasks emit events to the telemetry system. These events are used
aggregate counts and latencies, as well as to inform Tap requests.
Initially, these events included durations, describing the relevant time
that elapsed between this event and another.
This approach is somewhat inflexible -- it unnecessarily constrains the
set of measurements that can computed in the telemetry system.
To remedy this, the `Event` types can be changed to report discrete
`Instant`s (rather than `Duration`s). Then, when latencies are computed
in the telemetry system, these discrete instants can be compared to
produce durations.
There are no functional changes in this PR.
Running `conduit install --api-port xxx` where xxx != 8086 would yield a
broken install.
Fix the install command to correctly propagate the `api-port` flag,
setting it as the serve address in the proxy-api container.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Debugging issues in the dashboard is a little frustrating without source maps and the full source map takes awhile to build.
Just enables one of the cheaper source maps by default. It is good enough (tm) for what is there now.
* Remove package-scoped vars in cmd package
* Run gofmt on all cmd package files
* Re-add missing Args setting on check command
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
A common pattern when using the old Tokio API was separating the configuration
of a task from binding it to an executor to run on. This was often necessary
when we wanted to construct a type corresponding to some task before the
reactor on which it would execute was initialized. Typically, this was
accomplished with two separate types, one of which represented the
configuration and exposed only a method to take a reactor `Handle` and
transform it to the other type, representing the actual task.
After we migrate to the new Tokio API in #944, executors no longer need to be
passed explictly, as we can use `DefaultExecutor::current` or
`current_thread::TaskExecutor::current` to spawn a task on the current
executor. Therefore, a lot of this complexity can be refactored away.
This PR refactors the `Config` and `Process` structs in
i`control::destination::background` into a single `Background` struct, and
removes the `dns::Config` and `telemetry::MakeControl` structs (`dns::Resolver`
and `telemetry::Control` are now constructed directly). It should not cause
any functional changes.
Closes#966
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This PR modifies the Namespace page in the web UI to replace the 3 existing api calls
with a single call.
* Consolidate calls to /metrics to use the new resource type all
* Simplify urlsForResource, add comment with assumptions
Now that `impl Trait` is stable, we don't need to box as many futures. We still
need to box before spawning them on an executor, but the component futures no
longer require their own boxes.
Signed-off-by: Eliza Weisman <eliza@buoyant.io
* Fix bug where we were dropping parts of the StatSummaryRequest
* Add tests for prometheus query strings and for failed cases
Problem
In #928 I rewrote the stat api to handle 'all' as a resource type. To query for all resource types,
we would copy the Resource, LabelSelector and TimeWindow of the original request, and then
go through all the resource types and set Resource.Type for each resource we wanted to get.
The bug is that while we copy over some fields of the original request, we didn't copy over all
of them - namely Resource.Name and the Outbound resource. So the Stat endpoint would
ignore any --to or --from flags, and would ignore requests for a specific named resource.
Solution
Copy over all fields from the request.
I've also added tests for this case. In this process I've refactored the stat_summary_test code
to make it a bit easier to read/use.
Closes#888. Closes#867.
This branch upgrades Conduit to use the new Tokio API. It was also necessary to
upgrade some other dependencies (including `hyper`, and `trust-dns`) alongside
this upgrade.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Instead of having connect errors destroy all buffered requests,
this changes Bind to return a service that can rebind itself when
there is a connect error.
It won't try to establish the new connection itself, but waits for
the buffer to poll again. Combing this with changes in tower-buffer
to remove canceled requests from the buffer should mean that we
won't loop on connect errors for forever.
Signed-off-by: Sean McArthur <sean@seanmonstar.com>
A proxy dispatches requests over a constrained number of routes. When
the router's upper bound is reached---and potentially in other future
scenarios---router capacity is created by removing unused routes, their
load balancers, and all related endpoint stacks.
However, in the current regime, the controller subsystem will continue
to monitor discovery observations. As the number of active observations
expands over time, the controller task ends up with more and more work
to do.
This change introduces a shared atomic boolean between the resolution
returned to the load balancer and the state maintained when
communicating with the service. Before the controller polls its active
resolutions, it first ensures that all unused resolutions are dropped.
A recent upstream change in `tower-h2` (tower-rs/tower-h2@d9b3140) caused some
HTTP/2 streams that were previously terminated by TRAILERS frames to be
terminated by empty DATA frames with the end of stream bit set, instead.
This broke some tests in my dev branch for #944, as our test server also uses
`tower-h2`, and some of the metrics tests were no longer seeing the expected
`StreamResponseEnd` events due to this change. This issue may also occur in
other cases, resulting in incorrect metrics.
This PR changes `MeasuredBody::poll_data` to trigger the Stream End event if
it sees a DATA frame that ends the stream.
Fixes#954
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Problem
If you navigate directly to (or do a hard refresh on) a path with more than one segment,
e.g. http://localhost:8084/namespaces/conduit, the dashboard js is not served.
Pages with two paths have to be accessed by loading the dashboard on a different
path and then clicking through.
When accessing the dashboard via conduit dashboard we append a path prefix so that
we can connect using the k8s proxy. This means that moving the dashboard to serve
images off relative paths won't work, because we need to serve images whether the
dashboard is loaded from http://localhost:8084/namespaces/conduit or
from http://localhost:8084/namespaces.
Solution
Check whether we're serving the dashboard with the proxy url, and if we are, adjust
the url at which we serve the index bundle from.
I've also added a very manual override if the conduit logo can't be found at the usual url.
While preparing #946, I was again struck by the `discovery` module being very weighty
(nearly 800 dense lines). The intent of this change is only to improve readability. There
are no functional changes. The following aesthetic changes have been made:
* `control::discovery` has been renamed to `control::destination` to be more consistent
with the rest of conduit's terminology (destinations aren't the only thing that need to
be discovered).
* In that vein, the `Discovery` type has been renamed `Resolver` (since it exposes one
function, `resolve`).
* The `Watch` type has been renamed `Resolution`. This disambiguates the type form
`futures_watch::Watch`(which is used in the same code) and makes it more clearly the
product of a `Resolver`.
* The `Background` and `DiscoveryWork` names were very opaque. `Background` is now
`background::Config` to indicate that it can't actually _do_ anything; and
`DiscoveryWork` is now `background::Process` to indicate that it's responsible for
processing destination updates.
* `DestinationSet` is now a private implementation detail in the `background` module.
* An internal `ResolveRequest` type replaces an unnamed tuple (now that it's used across
files).
* `rustfmt` has been run on `background.rs` and `endpoint.rs`
This is in preparation for landing the Tokio upgrade.
In the upcoming Hyper release, the handling of absolute form request URIs
moved from `hyper::Request` to the `hyper::client::connect::Connect` trait.
Once we upgrade to the new Tokio, we will have to upgrade our Hyper
dependency as well.
Currently, Conduit detects whether the request URI is in absolute form in
`h1::normalize_our_view_of_uri` and adds an extension to the request if it is.
This will no longer work with the new Hyper, as that function is called from
the `bind::NormalizeUri` service, which is not constructed until after the
client connection is established. Therefore, it is necessary to move this
information to `bind::Protocol`, so that it can be passed to
`transparency::client::HyperConnect` (our implementation of Hyper's `Connect`
trait) when we are using the newest Hyper.
For now, however, I've left in the `UriIsAbsoluteForm` extension and continued
to set it in `h1::normalize_our_view_of_uri`, since we currently still use it
on the current Hyper version. I thought it was good to minimize the changes to
this existing code, as it will be removed when we migrate to the new Hyper.
This PR shouldn't cause any functional changes.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This enables the removal of the inline-block display for links and
fixes menu items not showing up when sidebar is expanded on firefox
Problem
Previously we were linking the icon and expand text of the menu bar separately.
This caused the clickable areas of the menus to be inconsistent, which we were
fixing via css. This wasn't consistently displayed across browsers.
Fix
Linkify the whole Menu Item rather than linking the icon and text separately.
This enables the removal of the inline-block display for links and
fixes menu items not showing up when sidebar is expanded on firefox.
Additionally it makes the clicking of menu links way more consistent.
The frontend assets was not optimized, resulting in suboptimal page load times.
Enabled webpack production mode in the Dockerfile, this still allows good development
and debugging experience when running the web interface locally during development.
Also added minification of the CSS handled by css-loader.
The web interface still works as expected.
The size of the JS file has been reduced from 3.6 MB to 1.2 MB.
And the CSS minification has resulted in sidebar.css from 5.71 kB to 4.33 kb,
and styles.css from 4.18 kB to 3.1 kB.
Fixes#378
Signed-off-by: Kim Christensen <kimworking@gmail.com>
This is in preparation for landing the Tokio upgrade.
In order to be generic over Tokio's current thread and threadpool executors,
a number of types in Conduit which were not previously `Send` are now required
to be `Send`. A majority of this work will be done in the main Tokio upgrade
PR, as it is in many cases not possible to make these types `Send` _without_
using the new Tokio API (in order to remove `Handle`s, etc.); however, I'm
factoring out everything possible and trying to land it in separate PRs.
The p2c load balancer constructed in `Outbound` is currently parameterized
over a random number generator. We currently construct it by getting the
thread-local RNG, and passing it to the load balancer constructor. However,
the thread-local RNG is not `Send`. I've fixed this issue by creating a new
zero-sized empty struct type which implements `rand::Rng` simply by calling
`thread_rng()` every time its' called, and passing that to
`choose::power_of_two_choices` instead. Since this is an empty type which
contains no data, and the correct thread-local RNG is accessed whenever
the methods are called, this new type can trivially be `Send`. According to
the `rand` crate's documentation, this is the correct way to use `ThreadRng`
anyway:
> Retrieve the lazily-initialized thread-local random number generator, seeded
> by the system. Intended to be used in method chaining style, e.g.
> `thread_rng().gen::<i32>()`.
> (from https://docs.rs/rand/0.4.2/rand/fn.thread_rng.html)
This shouldn't lead to any functional changes.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This is in preparation for landing the Tokio upgrade.
The test `discovery::outbound_updates_newer_services` currently contains an
assertion that an HTTP/2 request to an HTTP/1 service will return a response
with status code 500. This is because the current version of Hyper on which
Conduit depends does not support protocol upgrades.
However, commit hyperium/hyper@bc6af88a32, which
adds support for this kind of protocol upgrade, was recently merged to Hyper's
master branch. Therefore, this assertion will no longer be correct once we
depend on the upcoming Hyper release. When we migrate to the new Tokio, it will
be necessary to upgrade our Hyper dependency as well, and this test will fail.
I've modified the test to no longer make assertions about the response's status
code, so that it's compatible with both the current and future Hyper versions.
If the response is not `Ok`, the test will still fail, since
`tests::support::Client::request()` `expect`s that the response is successful,
but the status code is ignored. I've added a comment in the test explaining
this.
Eventually, when the master version of Conduit depends on the latest Hyper, we
may want to change this test to assert that the status code is 200 instead. We
may also want to add more tests for Hyper's protocol upgrade functionality, but
that seems out of scope for this PR.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Allow the Stat endpoint in the public-api to accept requests for resourceType "all".
Currently, this queries Pods, Deployments, RCs and Services, but can be modified
to query other resources as well.
Both the CLI and web endpoints now work if you set resourceType to all.
e.g. `conduit stat all`
The router's cache has no means to evict unused entries when capacity is
reached.
This change does the following:
- Wraps cache values in a smart pointer that tracks the last time of
access for each entry. The smart pointer updates the access time when
the reference to entry is dropped.
- When capacity is not available, all nodes that have not been accessed
within some minimal idle age are dropped.
Accesses and updates to the map are O(1) when capacity is available.
Reclaiming capacity is O(n), so it's expected that the router is
configured with enough capacity such that capacity need not be reclaimed
usually.
* Fix issue where we were waiting for the next polling interval when switching tabs
Fix issue where we were waiting for the next polling interval when switching tabs.
When we switch tabs, we update the Props of the ResourceList component, but we weren't
resetting how we poll the server. This meant we'd wait until the end of the current polling interval
(2s) to get the data for the tab we just switched to.
I've added stopServerPolling and startServerPolling methods so that we can cancel the resource
requests of the page we're leaving and immediately start polling for new data if the resource type
changes.
The router's `Inner` type contains a map of routes. Recently, this map's
capacity has become constrained to prevent leakage for long-running
processes.
This change prepares for a fuller LRU implementation by moving the
router's `Inner` type to a new (tested) module, `cache`.
The router stores its cache and `Recognize` implementation within a `Mutex`,
but there is no need for the recognizer to be locked.
This change creates a new `Cache` type that is locked independently of
`Recognize`. In order to accomplish this, `Recognize::bind_service` has
been changed to take an immutable reference to its `self`.
The (unused) `Single` type has been removed because it relied on
`bind_service` being mutable.
The way that git-related version information is linked into go binaries
busts Docker's cache such that every commit causes all binaries to
rebuilt.
In order to ameliorate this, we can build each binary once without
version information first so that its artifacts are cached. When Go
sources are not changed and only the version information changes, builds
are 4.3x faster than before (from 5+ minutes to <90s).
On `master`
Branch off of master and build (mostly cached):
```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build 9.10s user 6.30s system 5% cpu 4:26.47 total
```
Rebuild without changing anything (highly cached):
```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build 9.23s user 6.04s system 47% cpu 32.017 total
```
Update only the git sha and rebuild:
```
:; git ci -am 'bump it' --allow-empty
[ver/eg 2749eb3] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build 8.55s user 6.08s system 4% cpu 5:22.25 total
```
On this branch:
Rebuild without changing anything (highly cached):
```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build 8.94s user 5.97s system 46% cpu 32.257 total
```
Update only the git sha and rebuild:
```
:; git ci -am 'bump it' --allow-empty
[ver/go-docker-cache-versionless 77a80b5] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build-cli-bin 2.02s user 1.34s system 9% cpu 34.144 total
```