Version 1.7.0 of the url crate seems to be broken, which means we cannot
`cargo update` the proxy without locking url to version 1.6. Since we only
use the url crate in a very limited way anyway, and since we already use
http::uri for parsing much more, switch all uses of the url crate over to
http::uri instead.
This eliminates some build dependencies.
Signed-off-by: Brian Smith <brian@briansmith.org>
Closes #403.
When the Destination service does not return a result for a service, the proxy connection for that service will hang indefinitely waiting for a result from Destination. If, for example, the requested name doesn't exist, the proxy will wait forever rather than responding with an error.
I've added a timeout wrapping the service returned from `<Outbound as Recognize>::bind_service`. The timeout can be configured by setting the `CONDUIT_PROXY_BIND_TIMEOUT` environment variable, and defaults to 10 seconds (because that's the default value for [a similar configuration in Linkerd](https://linkerd.io/config/1.3.5/linkerd/index.html#router-parameters)).
Testing with @klingerf's reproduction from #403:
```
curl -sIH 'Host: httpbin.org' $(minikube service proxy-http --url)/get | head -n1
HTTP/1.1 500 Internal Server Error
```
proxy logs:
```
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy using controller at HostAndPort { host: Domain("proxy-api.conduit.svc.cluster.local"), port: 8086 }
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy routing on V4(127.0.0.1:4140)
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy::transport::connect "controller-client", DNS resolved proxy-api.conduit.svc.cluster.local to 10.0.0.240
proxy-5698f79b66-8rczl conduit-proxy ERR! conduit_proxy::map_err turning service error into 500: Inner(Timeout(Duration { secs: 10, nanos: 0 }))
```
This PR adds a `flaky_tests` cargo feature to control whether or not to ignore tests that are timing-dependent. This feature is enabled by default in local builds, but disabled on CI and in all Docker builds.
Closes #440
The telemetry service in the controller pod uses a non-fully-qualified URL to connect to the Prometheus pod in the control plane. This PR changes the telemetry service's Prometheus URL to be fully qualified, for consistency with other URLs in the control plane. This change was tested in minikube: the logs report no errors, and the Prometheus dashboard shows that stats are being recorded from all Conduit proxies.
Fixes #414
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Refactor the `conduit install` test into a data-driven test. Then add a test of the actual default output of `conduit install`.
This test makes it clear when we change the default settings of
`conduit install`.
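A rough sketch of the data-driven shape (names such as `renderInstall` and the golden-file path are hypothetical; the real test lives alongside the `conduit install` code):
```go
package cli

import (
	"io/ioutil"
	"testing"
)

// renderInstall stands in for the function that renders the `conduit install`
// output for a given set of options (hypothetical signature).
func renderInstall(options map[string]string) (string, error) {
	return "", nil
}

func TestRenderInstall(t *testing.T) {
	testCases := []struct {
		name       string
		options    map[string]string
		goldenFile string
	}{
		// The default case pins down the exact default output of `conduit install`.
		{"default options", nil, "testdata/install_default.golden"},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			expected, err := ioutil.ReadFile(tc.goldenFile)
			if err != nil {
				t.Fatalf("failed to read golden file: %v", err)
			}
			actual, err := renderInstall(tc.options)
			if err != nil {
				t.Fatalf("failed to render install output: %v", err)
			}
			if string(expected) != actual {
				t.Errorf("rendered output does not match %s", tc.goldenFile)
			}
		})
	}
}
```
Each new configuration case then only needs a new table entry plus a golden file capturing the expected output.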
Signed-off-by: Brian Smith <brian@briansmith.org>
The control plane is proxied through the Conduit proxy. The Conduit
proxy is based on the base image, and the control plane containers
and the proxy share a networking namespace. This means we don't
need the extra base utilities in the controller images, since we
can use the utilities in the proxy image.
This is a step towards building the initial no-networking Conduit CA
pod. Since the Conduit CA will not do any networking of its own,
networking debugging utilities are not helpful for it. They are
actually an unnecessary risk, because they could facilitate the
exfiltration of the CA's private key. (The Conduit CA pod won't
have the Conduit Proxy injected into it either.)
This also simplifies & slightly speeds up the building of the
controller images. This is a stepping stone towards being able to
build the controller images without `docker build` to improve build
times.
Signed-off-by: Brian Smith <brian@briansmith.org>
The `app` label should be reserved for end-user applications and we
shouldn't use it ourselves. We already have a Conduit-specific label
prefixed with `conduit.io/` to avoid naming collisions with users'
labels, so just use that one instead.
Signed-off-by: Brian Smith <brian@briansmith.org>
The template used by `conduit install` was hard-coded in install.go.
This change moves the template into its own file, in anticipation of
increasing the template's size and complexity.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
In order to take advantage of the benefits the Conduit proxy gives to deployments, this PR injects the Conduit proxy into the control plane pod. This helps us lay the groundwork for future work such as TLS, control plane observability, etc.
Fixes #311
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
The build scripts assume they are executed from the root of this repo.
This prevents running scripts from other locations, for example,
`cd web && ../bin/go-run .`.
Modify the build scripts to work regardless of current directory.
Fixes #301
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Use Go 1.10.0 to build Go components.
Take advantage of the new build cache in Go 1.10. Future work on improving
build performance will utilize the build cache further.
Signed-off-by: Brian Smith <brian@briansmith.org>
Previously Dockerfile-go-deps was converted from a multi-stage Dockerfile
to a single-stage Dockerfile in anticipation of enabling efficient use
of `--cache-from` in CI. However, that resulted in the image ballooning
in size because it contained the Git repo for every package downloaded
by `dep ensure`.
Bring the image back down to the proper size by removing the temporary
files created.
Signed-off-by: Brian Smith <brian@briansmith.org>
Currently, the max number of in-flight requests in the proxy is
unbounded. This is due to the `Buffer` middleware being unbounded.
This is resolved by adding an instance of `InFlightLimit` around
`Buffer`, capping the max number of in-flight requests for a given
endpoint.
Currently, the limit is hardcoded to 10,000. However, this will
eventually become a configuration value.
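The proxy implements this with Rust middleware; purely as an illustration of the in-flight-limit idea (not the actual implementation, whose backpressure semantics differ), here is a minimal Go sketch that rejects requests once the cap is reached:
```go
package main

import (
	"errors"
	"fmt"
)

// inFlightLimit wraps a request handler and refuses new requests once
// maxInFlight requests are already being processed. This is a concept sketch,
// not the proxy's Rust implementation.
type inFlightLimit struct {
	sem   chan struct{}
	inner func(req string) (string, error)
}

func newInFlightLimit(inner func(string) (string, error), maxInFlight int) *inFlightLimit {
	return &inFlightLimit{sem: make(chan struct{}, maxInFlight), inner: inner}
}

func (l *inFlightLimit) Call(req string) (string, error) {
	select {
	case l.sem <- struct{}{}: // acquire an in-flight slot
		defer func() { <-l.sem }() // release it when the request completes
		return l.inner(req)
	default:
		return "", errors.New("in-flight request limit reached")
	}
}

func main() {
	svc := newInFlightLimit(func(req string) (string, error) {
		return "ok: " + req, nil
	}, 10000)
	resp, err := svc.Call("GET /")
	fmt.Println(resp, err)
}
```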
Fixes #287
Signed-off-by: Carl Lerche <me@carllerche.com>
The proxy will check whether the requested authority looks like a local service; if it doesn't, it will no longer ask the Destination service about the request and will instead just use SO_ORIGINAL_DST, enabling egress naturally.
The rules used to determine if it looks like a local service come from this comment:
> If default_zone.is_none() and the name is in the form $a.$b.svc, or if !default_zone.is_none() and the name is in the form $a.$b.svc.$default_zone, for some a and some b, then use the Destination service. Otherwise, use the IP given.
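A sketch of that rule (illustrative Go; the proxy itself implements this in Rust):
```go
package main

import (
	"fmt"
	"strings"
)

// useDestinationService reports whether a requested name should be resolved
// via the Destination service, per the rule quoted above: "$a.$b.svc" when no
// default zone is configured, or "$a.$b.svc.$defaultZone" when one is.
func useDestinationService(name, defaultZone string) bool {
	name = strings.TrimSuffix(name, ".") // tolerate a trailing dot
	labels := strings.Split(name, ".")
	if defaultZone == "" {
		// Expect exactly $a.$b.svc.
		return len(labels) == 3 && labels[2] == "svc"
	}
	// Expect $a.$b.svc.$defaultZone.
	suffix := ".svc." + defaultZone
	if !strings.HasSuffix(name, suffix) {
		return false
	}
	rest := strings.TrimSuffix(name, suffix)
	return len(strings.Split(rest, ".")) == 2
}

func main() {
	fmt.Println(useDestinationService("httpbin.org", ""))                                // false: fall back to SO_ORIGINAL_DST
	fmt.Println(useDestinationService("web.default.svc", ""))                            // true
	fmt.Println(useDestinationService("web.default.svc.cluster.local", "cluster.local")) // true
}
```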
* Update go-run to set version equal to root-tag
* Fix inject tests for undefined version change
* Pass inject version explicitly as arg
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remove kubectl dependency, validate k8s server version via API (see the client-go sketch below)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remove unused MockKubectl
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Rename kubectl.go to version.go
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
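A minimal client-go sketch of fetching the server version directly from the Kubernetes API rather than shelling out to `kubectl version` (kubeconfig loading shown in its simplest form; the CLI's actual client construction and version comparison may differ):
```go
package main

import (
	"fmt"
	"log"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig the same way kubectl would.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatalf("error loading kubeconfig: %v", err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatalf("error creating Kubernetes client: %v", err)
	}

	// Ask the API server for its version instead of invoking `kubectl version`.
	versionInfo, err := clientset.Discovery().ServerVersion()
	if err != nil {
		log.Fatalf("error fetching server version: %v", err)
	}
	fmt.Printf("Kubernetes server version: %s.%s\n", versionInfo.Major, versionInfo.Minor)
}
```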
When the `inject` command is used on a YAML file that is invalid, it prints out an invalid YAML file with the injected proxy. This may give the user a false indication that the inject was successful, even though the inject command prints an error message further down the terminal window. This PR fixes #303 and contains a test input and output file that indicate what should be shown.
This PR also fixes #390.
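A rough sketch of the intended behavior: parse the input first and emit nothing if it is not valid YAML (the `injectProxy` helper and single-document handling here are simplifications):
```go
package main

import (
	"fmt"
	"os"

	yaml "gopkg.in/yaml.v2"
)

// injectProxy is a stand-in for the code that adds the conduit-proxy
// container to a parsed object (hypothetical).
func injectProxy(obj map[interface{}]interface{}) map[interface{}]interface{} {
	return obj
}

func main() {
	in, err := os.ReadFile("deployment.yml")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	var obj map[interface{}]interface{}
	// Validate the input before producing any output: if the YAML does not
	// parse, report the error and emit nothing, rather than printing a
	// half-injected document.
	if err := yaml.Unmarshal(in, &obj); err != nil {
		fmt.Fprintf(os.Stderr, "error parsing YAML: %v\n", err)
		os.Exit(1)
	}

	out, err := yaml.Marshal(injectProxy(obj))
	if err != nil {
		fmt.Fprintf(os.Stderr, "error serializing YAML: %v\n", err)
		os.Exit(1)
	}
	os.Stdout.Write(out)
}
```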
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* Run conduit dashboard on ephemeral port by default
* Fix wording on dashboard --port flag
* log.Debug error instead of discarding it
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Display more decimal points for truncated numbers, add hover info
* Filter completed pods out of web UI
* Decrease the polling interval from 10s to 2s
* Add more detailed pod categorization based on status
* Tweak filtering of pods, tweak explanations in status table
* Refactor `conduit inject` code to eliminate duplicate logic
Previously there was a lot of code repeated once for each type of
object that has a pod spec.
Refactor the code to reduce the amount of duplication there, to make future
changes easier.
Signed-off-by: Brian Smith <brian@briansmith.org>
* CLI: Remove now-unnecessary "enhanced" Kubernetes object types
The "enhanced" types aren't necessary because now the Kuberentes API
implementation has the correct JSON annotations for the InitContainers
field.
Signed-off-by: Brian Smith <brian@briansmith.org>
- Reduce row spacing on tables to make them more compact
- Rename TabbedMetricsTable to MetricsTable since it's not tabbed any more
- Format latencies greater than 1000ms as seconds
- Make the sidebar collapsible
- Poll the /pods endpoint from the sidebar in order to refresh the list of deployments in the autocomplete
- Display the conduit namespace in the service mesh details table
- Use floats rather than Col for a more responsive layout (fixes #224)
The init container injected by `conduit inject` rewrites the iptables configuration for its network namespace. This causes havoc when the network namespace isn't restricted to the pod, i.e. when hostNetwork=true.
Skip pods with hostNetwork=true to avoid this problem.
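A minimal sketch of the check using the Kubernetes API types (the surrounding inject plumbing is omitted):
```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// shouldInject reports whether conduit inject should modify a pod spec.
// Pods that share the host's network namespace are skipped, since rewriting
// iptables there would affect the whole node rather than just the pod.
func shouldInject(spec *corev1.PodSpec) bool {
	return !spec.HostNetwork
}

func main() {
	hostNet := &corev1.PodSpec{HostNetwork: true}
	regular := &corev1.PodSpec{}
	fmt.Println(shouldInject(hostNet)) // false: leave the pod untouched
	fmt.Println(shouldInject(regular)) // true: inject proxy-init and conduit-proxy
}
```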
Fixes #366.
Signed-off-by: Brian Smith <brian@briansmith.org>
Refactor `conduit inject` code to make it unit-testable.
Refactor the `conduit inject` code to make it easier to add unit tests. This work was done by @deebo91 in #365. This is the same PR without the `conduit install` changes, so that it can land ahead of #365. In particular, this will be used for testing the fix for high-priority bug #366.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Signed-off-by: Brian Smith <brian@briansmith.org>
File sizes (in bytes) before and after this change:
|         | conduit-darwin | conduit-linux | conduit-windows |
|---------|---------------:|--------------:|----------------:|
| Before  |     27,056,288 |    27,282,364 |      27,359,744 |
| After   |     20,023,456 |    18,080,576 |      18,262,528 |
| Diff    |      7,032,832 |     9,201,788 |       9,097,216 |
Fixes #352.
Signed-off-by: Brian Smith <brian@briansmith.org>
The instance cache that powers the ListPods API is stored in memory in the telemetry service. This means that when there are multiple replicas of the telemetry service, each replica will have a distinct, incomplete view of the added pods based on which pods report to that telemetry replica. This causes the data plane bubbles on the dashboard to not all be filled in, and to flicker with each data refresh.
We create a Prometheus counter called `reports_total` which has `pod` as a label. Whenever a telemetry service instance receives a report from a pod, it increments `reports_total` for that pod. This allows us to remove the in-memory instance cache and instead query Prometheus to see if each pod has had a report in the last 30 seconds.
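A sketch of the counter and how it is incremented, using the Prometheus Go client (registration and handler wiring simplified; `handleReport` is a hypothetical stand-in for the telemetry service's report handler):
```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

// reportsTotal counts telemetry reports received, labeled by the reporting pod.
// Querying Prometheus for recent activity on this counter replaces the
// in-memory instance cache when answering ListPods.
var reportsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "reports_total",
		Help: "Total number of telemetry reports received, by pod.",
	},
	[]string{"pod"},
)

func init() {
	prometheus.MustRegister(reportsTotal)
}

// handleReport bumps the counter for the pod that sent a report.
func handleReport(pod string) {
	reportsTotal.With(prometheus.Labels{"pod": pod}).Inc()
}

func main() {
	handleReport("web-abc123")
}
```
On the read side, a query over `reports_total` covering roughly the last 30 seconds can then tell whether a given pod has reported recently.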
Fixes #337
Signed-off-by: Alex Leong <alex@buoyant.io>
In PR #298 we moved time window parsing (10s => (time.now - 10s,
time.now)) down the stack to immediately before the query. This had the
unintended effect of creating parallel latency quantile requests with
slightly different timestamps.
This change parses the time window prior to latency quantile fan out,
ensuring all requests have the same timestamp.
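A sketch of the idea: resolve the relative window to a concrete (start, end) pair once, then reuse it for every per-quantile request (function and quantile names are illustrative):
```go
package main

import (
	"fmt"
	"time"
)

// resolveWindow turns a relative window like 10s into a fixed (start, end)
// pair, anchored to a single call to time.Now().
func resolveWindow(window time.Duration) (start, end time.Time) {
	end = time.Now()
	start = end.Add(-window)
	return start, end
}

func main() {
	// Parse the window once, before fanning out...
	start, end := resolveWindow(10 * time.Second)

	// ...so that every latency-quantile query shares the same timestamps,
	// instead of each call recomputing time.Now() at a slightly different moment.
	for _, quantile := range []string{"p50", "p95", "p99"} {
		fmt.Printf("querying %s over [%s, %s]\n",
			quantile, start.Format(time.RFC3339), end.Format(time.RFC3339))
	}
}
```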
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Test the proxy in release mode in Docker in CI on the master branch.
Previously we were not running the proxy tests in the release configuration.
Run the proxy tests in the release configuration through Docker.
Docker builds with tests in release mode are too slow to run on every
pull request, so release mode tests will only be run on the master
branch.
Signed-off-by: Brian Smith <brian@briansmith.org>
PR #298 moved summary (non-timeseries) requests to Prometheus' Query
endpoint, with no timestamp provided. This Query endpoint returns a
single data point with whatever timestamp was provided in the request.
In the absence of a timestamp, it uses the current server time. This causes
the Public API to return discrete data points with slightly different
timestamps, which is unexpected behavior.
Modify the Public API -> Telemetry -> Prometheus request path to always
require a timestamp for single data point requests.
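A sketch with the Prometheus Go client: pin the evaluation timestamp on the query itself so repeated single-point requests line up (the address and query expression are placeholders):
```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://127.0.0.1:9090"})
	if err != nil {
		log.Fatalf("error creating Prometheus client: %v", err)
	}
	promAPI := promv1.NewAPI(client)

	// Pin the evaluation time explicitly instead of letting the server use
	// "now", so that fanned-out single-point queries share one timestamp.
	ts := time.Now()
	result, warnings, err := promAPI.Query(context.Background(), "sum(requests_total)", ts)
	if err != nil {
		log.Fatalf("query failed: %v", err)
	}
	if len(warnings) > 0 {
		log.Printf("warnings: %v", warnings)
	}
	fmt.Println(result)
}
```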
Fixes #340
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
On my system (i9-7960x running Docker natively in Linux) this regularly saves
over 11 seconds of build time when a file under pkg/ changes and over 1.5
seconds of build time when a file under controller/ changes. Since most
contributors are running Docker in a VM on less powerful computers, the
savings for most contributors should be significantly greater.
I imagine the savings for web/ and cli/ and proxy-init/ are similar, but I
did not measure them.
Signed-off-by: Brian Smith <brian@briansmith.org>
Conduit has been on Prometheus 1.8.1. Prometheus 2.x promises better
performance.
Upgrade Conduit to Prometheus 2.1.0.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Precompiling pkg/ in an earlier layer saves ~10 seconds of wall clock
time on an incremental build on my machine (i9-7960x) when I update a
file in controller/ such as controller/destination/server.go. This
makes a significant difference in the edit-build-test loop.
Signed-off-by: Brian Smith <brian@briansmith.org>
Previously Dockerfile-go-deps would run `dep ensure` whenever anything in the
source tree changed. Also, because it was a multi-stage Dockerfile it did not
work well with Docker's `--cache-from` feature.
Change Dockerfile-go-deps to only re-run `dep ensure` when Gopkg.{toml,lock}
and/or bin/dep change. Simplify it to a single stage so that it works better
with Docker's `--cache-from` feature.
Signed-off-by: Brian Smith <brian@briansmith.org>
bin/dep verifies the digest of the downloaded `dep` executable, whereas
previously Dockerfile-go-deps did not.
Signed-off-by: Brian Smith <brian@briansmith.org>