The control plane is proxied through the Conduit proxy. The Conduit
proxy image is based on the base image, and the control plane containers
share a networking namespace with the proxy. This means we don't need
the extra base utilities in the controller images, since we can use the
utilities in the proxy image.
This is a step towards building the initial no-networking Conduit CA
pod. Since the Conduit CA will not do any networking of its own,
networking debugging utilities are not helpful for it. They are actually
an unnecessary risk, because they could facilitate the exfiltration of
the CA's private key. (The Conduit CA pod won't have the Conduit proxy
injected into it either.)
This also simplifies & slightly speeds up the building of the
controller images. This is a stepping stone towards being able to
build the controller images without `docker build` to improve build
times.
Signed-off-by: Brian Smith <brian@briansmith.org>
The `app` label should be reserved for end-user applications, and we
shouldn't use it ourselves. We already have a Conduit-specific label
prefixed with `conduit.io/` to avoid naming collisions with users'
labels, so just use that one instead.
Signed-off-by: Brian Smith <brian@briansmith.org>
The template used by `conduit install` was hard-coded in install.go.
This change moves the template into its own file, in anticipation of
increasing the template's size and complexity.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
In order to take advantage of the benefits the Conduit proxy gives to deployments, this PR injects the Conduit proxy into the control plane pod. This lays the groundwork for future work such as TLS, control plane observability, etc.
Fixes #311
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
The build scripts assume they are executed from the root of this repo.
This prevents running scripts from other locations, for example,
`cd web && ../bin/go-run .`.
Modify the build scripts to work regardless of current directory.
Fixes #301
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Use Go 1.10.0 to build Go components.
Take advantage of the new build cache in Go 1.10. Future work on improving
build performance will utilize the build cache further.
Signed-off-by: Brian Smith <brian@briansmith.org>
Previously Dockerfile-go-deps was converted from a multi-stage Dockerfile
to a single-stage Dockerfile in anticipation of enabling efficient use
of `--cache-from` in CI. However, that resulted in the image ballooning
in size because it contained the Git repo for every package downloaded
by `dep ensure`.
Bring the image back down to the proper size by removing the temporary
files created.
Signed-off-by: Brian Smith <brian@briansmith.org>
Currently, the max number of in-flight requests in the proxy is
unbounded. This is due to the `Buffer` middleware being unbounded.
This is resolved by adding an instance of `InFlightLimit` around
`Buffer`, capping the max number of in-flight requests for a given
endpoint.
Currently, the limit is hardcoded to 10,000. However, this will
eventually become a configuration value.
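For illustration, here is a minimal sketch of the in-flight-limit idea in Go (the proxy itself implements this as Rust middleware; the handler wrapper and the 503 response below are assumptions for the sketch, not Conduit's actual behavior):

```go
package main

import "net/http"

// inFlightLimit wraps next with a counting semaphore so that at most
// `limit` requests are being served at any instant; excess requests
// are shed immediately instead of queueing without bound.
func inFlightLimit(limit int, next http.Handler) http.Handler {
	sem := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}: // acquire a slot
			defer func() { <-sem }() // release it when done
			next.ServeHTTP(w, r)
		default: // all slots taken: reject rather than buffer
			http.Error(w, "too many in-flight requests",
				http.StatusServiceUnavailable)
		}
	})
}
```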
Fixes #287
Signed-off-by: Carl Lerche <me@carllerche.com>
The proxy now checks whether the requested authority looks like a local service. If it doesn't, the proxy no longer asks the Destination service about the request and instead just uses SO_ORIGINAL_DST, enabling egress naturally.
The rules used to determine if it looks like a local service come from this comment:
> If default_zone.is_none() and the name is in the form $a.$b.svc, or if !default_zone.is_none() and the name is in the form $a.$b.svc.$default_zone, for some a and some b, then use the Destination service. Otherwise, use the IP given.
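That rule transcribes fairly directly; a sketch in Go (the proxy implements this in Rust, so this is only an illustration of the logic):

```go
package main

import "strings"

// looksLikeLocalService applies the rule quoted above: with no default
// zone the name must be $a.$b.svc; with a default zone it must be
// $a.$b.svc.$default_zone.
func looksLikeLocalService(name, defaultZone string) bool {
	labels := strings.Split(name, ".")
	if len(labels) < 3 || labels[0] == "" || labels[1] == "" || labels[2] != "svc" {
		return false
	}
	if defaultZone == "" {
		return len(labels) == 3
	}
	return strings.Join(labels[3:], ".") == defaultZone
}
```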
* Update go-run to set version equal to root-tag
* Fix inject tests for undefined version change
* Pass inject version explicitly as arg
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remove kubectl dependency, validate k8s server version via api
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Remove unused MockKubectl
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Rename kubectl.go to version.go
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
When the `inject` command is used on an invalid YAML file, it prints out the invalid YAML with the proxy injected anyway. This may give the user a false indication that the inject was successful, even though the command prints an error message further down the terminal window. This PR fixes #303 and contains test input and output files that indicate what should be shown.
This PR also fixes #390.
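A sketch of the validate-before-emit approach, assuming a gopkg.in/yaml.v2-style decoder (the helper name is illustrative):

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	yaml "gopkg.in/yaml.v2"
)

// validateYAML decodes every document in the stream before any output
// is produced, so a parse error aborts inject instead of appearing
// after a plausible-looking (but invalid) injected manifest.
func validateYAML(input []byte) error {
	dec := yaml.NewDecoder(bytes.NewReader(input))
	for {
		var doc interface{}
		if err := dec.Decode(&doc); err == io.EOF {
			return nil
		} else if err != nil {
			return fmt.Errorf("invalid YAML: %v", err)
		}
	}
}
```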
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* Run conduit dashboard on ephemeral port by default
* Fix wording on dashboard --port flag
* log.Debug error instead of discarding it
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Display more decimal points for truncated numbers, add hover info
* Filter completed pods out of web UI
* Decrease the polling interval from 10s to 2s
* Add more detailed pod categorization based on status
* Tweak filtering of pods, tweak explanations in status table
* Refactor `conduit inject` code to eliminate duplicate logic
Previously there was a lot of code repeated once for each type of
object that has a pod spec.
Refactor the code to reduce the amount of duplication there, to make future
changes easier.
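The shape of the refactor, sketched with k8s.io/api types (the kind list and helper names here are illustrative, not the exact code):

```go
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	v1 "k8s.io/api/core/v1"
)

// injectPodTemplate holds the single copy of the injection logic;
// every kind that embeds a pod template funnels through it.
func injectPodTemplate(t *v1.PodTemplateSpec) {
	_ = t // ...add the proxy and proxy-init containers here...
}

// injectObject dispatches on kind but no longer duplicates the logic.
func injectObject(obj interface{}) {
	switch o := obj.(type) {
	case *appsv1.Deployment:
		injectPodTemplate(&o.Spec.Template)
	case *appsv1.DaemonSet:
		injectPodTemplate(&o.Spec.Template)
	case *appsv1.StatefulSet:
		injectPodTemplate(&o.Spec.Template)
	}
}
```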
Signed-off-by: Brian Smith <brian@briansmith.org>
* CLI: Remove now-unnecessary "enhanced" Kubernetes object types
The "enhanced" types aren't necessary because now the Kuberentes API
implementation has the correct JSON annotations for the InitContainers
field.
Signed-off-by: Brian Smith <brian@briansmith.org>
- Reduce row spacing on tables to make them more compact
- Rename TabbedMetricsTable to MetricsTable since it's not tabbed any more
- Format latencies greater than 1000ms as seconds
- Make sidebar collapsible
- Poll the /pods endpoint from the sidebar in order to refresh the list of deployments in the autocomplete
- Display the conduit namespace in the service mesh details table
- Use floats rather than Col for more responsive layout (fixes #224)
The init container injected by conduit inject rewrites the iptables configuration for its network namespace. This causes havoc when the network namespace isn't restricted to the pod, i.e. when hostNetwork=true.
Skip pods with hostNetwork=true to avoid this problem.
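The check itself is tiny; a sketch against the k8s.io/api types (the helper name is hypothetical):

```go
package main

import v1 "k8s.io/api/core/v1"

// shouldInject reports whether adding the proxy-init container is safe:
// with hostNetwork=true the iptables rewrite would apply to the host's
// network namespace, not just the pod's.
func shouldInject(spec *v1.PodSpec) bool {
	return !spec.HostNetwork
}
```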
Fixes #366.
Signed-off-by: Brian Smith <brian@briansmith.org>
Refactor `conduit inject` code to make it unit-testable.
Refactor the conduit inject code to make it easier to add unit tests. This work was done by @deebo91 in #365. This is the same PR without the conduit install changes, so that it can land ahead of #365. In particular, this will be used for testing the fix for high-priority bug #366.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Signed-off-by: Brian Smith <brian@briansmith.org>
File sizes (in bytes) before and after this change:

            conduit-darwin  conduit-linux  conduit-windows
Before:         27,056,288     27,282,364       27,359,744
After:          20,023,456     18,080,576       18,262,528
Diff:            7,032,832      9,201,788        9,097,216
Fixes #352.
Signed-off-by: Brian Smith <brian@briansmith.org>
The instance cache that powers the ListPods API is stored in memory in the telemetry service. This means that when there are multiple replicas of the telemetry service, each replica has a distinct, incomplete view of the added pods, based on which pods happen to report to that replica. This leaves some of the data plane bubbles on the dashboard unfilled, and makes them flicker with each data refresh.
We create a Prometheus counter called reports_total which has pod as a label. Whenever a telemetry service instance receives a report from a pod, it increments reports_total for that pod. This allows us to remove the in-memory instance cache and instead query Prometheus to see if each pod has had a report in the last 30 seconds.
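A sketch of that counter with the Prometheus Go client (the query in the comment is one way to ask "which pods reported recently"; the exact expression may differ):

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// reportsTotal counts telemetry reports per pod. Every telemetry
// replica increments the same metric, so Prometheus, which scrapes
// them all, has the complete view that no single replica holds.
var reportsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "reports_total",
		Help: "Total number of telemetry reports received, by pod.",
	},
	[]string{"pod"},
)

func init() {
	prometheus.MustRegister(reportsTotal)
}

// handleReport is called once per received report; a query such as
// sum by (pod) (increase(reports_total[30s])) > 0 then lists the pods
// that have reported in the last 30 seconds.
func handleReport(podName string) {
	reportsTotal.With(prometheus.Labels{"pod": podName}).Inc()
}
```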
Fixes #337
Signed-off-by: Alex Leong <alex@buoyant.io>
In PR #298 we moved time window parsing (e.g. 10s => (time.now - 10s,
time.now)) down the stack to immediately before the query. This had the
unintended effect of creating parallel latency quantile requests with
slightly different timestamps.
This change parses the time window prior to latency quantile fan out,
ensuring all requests have the same timestamp.
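In sketch form (queryQuantile is a hypothetical stand-in for one per-quantile telemetry request):

```go
package main

import (
	"sync"
	"time"
)

func queryQuantile(q string, start, end time.Time) { /* hypothetical */ }

// queryLatencies resolves the window to absolute timestamps once,
// before fanning out, so every quantile query sees the same range.
func queryLatencies(window time.Duration) {
	end := time.Now()
	start := end.Add(-window)

	var wg sync.WaitGroup
	for _, q := range []string{"0.5", "0.95", "0.99"} {
		wg.Add(1)
		go func(q string) {
			defer wg.Done()
			queryQuantile(q, start, end)
		}(q)
	}
	wg.Wait()
}
```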
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Test the proxy in release mode in Docker in CI on the master branch.
Previously we were not running the proxy tests in the release configuration.
Run the proxy tests in the release configuration through Docker.
Docker builds with tests in release mode are too slow to run on every
pull request, so release-mode tests will only be run on the master
branch.
Signed-off-by: Brian Smith <brian@briansmith.org>
PR #298 moved summary (non-timeseries) requests to Prometheus' Query
endpoint, with no timestamp provided. This Query endpoint returns a
single data point with whatever timestamp was provided in the request.
In the absence of a timestamp, it uses the current server time. This
causes the Public API to return discrete data points with slightly
different timestamps, which is unexpected behavior.
Modify the Public API -> Telemetry -> Prometheus request path to always
require a timestamp for single data point requests.
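Against Prometheus' HTTP API this amounts to always populating the `time` parameter of the instant-query endpoint; a minimal sketch:

```go
package main

import (
	"net/http"
	"net/url"
	"strconv"
	"time"
)

// instantQuery issues /api/v1/query with an explicit evaluation
// timestamp, so two callers passing the same ts get the same point.
func instantQuery(promURL, query string, ts time.Time) (*http.Response, error) {
	params := url.Values{}
	params.Set("query", query)
	params.Set("time", strconv.FormatInt(ts.Unix(), 10))
	return http.Get(promURL + "/api/v1/query?" + params.Encode())
}
```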
Fixes #340
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
On my system (i9-7960x running Docker natively in Linux) this regularly saves
over 11 seconds of build time when a file under pkg/ changes and over 1.5
seconds of build time when a file under controller/ changes. Since most
contributors are running Docker in a VM on less powerful computers, the
savings for most contributors should be significantly greater.
I imagine the savings for web/ and cli/ and proxy-init/ are similar, but I
did not measure them.
Signed-off-by: Brian Smith <brian@briansmith.org>
Conduit has been on Prometheus 1.8.1. Prometheus 2.x promises better
performance.
Upgrade Conduit to Prometheus 2.1.0
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Precompiling pkg/ in an earlier layer saves ~10 seconds of wall clock
time on an incremental build on my machine (i9-7960x) when I update a
file in controller/ such as controller/destination/server.go. This
makes a significant difference in the edit-build-test loop.
Signed-off-by: Brian Smith <brian@briansmith.org>
Previously Dockerfile-go-deps would run `dep ensure` whenever anything in the
source tree changed. Also, because it was a multi-stage Dockerfile it did not
work well with Docker's `--cache-from` feature.
Change Dockerfile-go-deps to only re-run `dep ensure` when Gopkg.{toml,lock}
and/or bin/dep change. Simplify it to a single stage so that it works better
with Docker's `--cache-from` feature.
Signed-off-by: Brian Smith <brian@briansmith.org>
bin/dep verifies the digest of the downloaded `dep` executable, whereas
previously Dockerfile-go-deps didn't.
Signed-off-by: Brian Smith <brian@briansmith.org>
UI cleanups. Remove repetitive labels in the UI, remove unused elements,
remove graphs until we improve their utility.
- remove “Deployment” from the headers of the Deployment Detail page
- remove Routes in sidebar
- kill leftmost 100px of sidebar
- remove the word “controller” from the first table on the service mesh page
- add Twitter, GitHub, and Slack links
- kill the graphs, replace with one large header (request rate, success rate, latency top bar)
- put upstream/downstream diagram before upstream/downstream tables
* Clean up DeploymentList page (#321)
- remove "Most active deployments" graphs from the Deployments List page
- remove the scatterplot sections of the page as I don't think we'll be using them for a while
The Public API's stat endpoint copies a slice of values to a slice of
pointers prior to the gRPC response. Go's range clause reuses the same
loop variable on every iteration, so taking its address yields the same
pointer each time, turning a slice of {1,2,3} into {3,3,3}.
Fix the range loop to take pointers directly into the slice of values,
ignoring the range value variable. Also add tests to catch this case.
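The bug and the fix, distilled (Go 1.10-era semantics; Go 1.22 later made the loop variable per-iteration, which masks this mistake):

```go
package main

type Row struct{ N int }

func toPointers(rows []Row) []*Row {
	out := make([]*Row, 0, len(rows))

	// Buggy: v is one variable reused on every iteration, so every
	// appended pointer aliases it; {1,2,3} comes out as {3,3,3}.
	for _, v := range rows {
		out = append(out, &v)
	}

	// Fixed: point at the slice elements themselves.
	out = out[:0]
	for i := range rows {
		out = append(out, &rows[i])
	}
	return out
}
```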
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams.
Closes #266
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
All requests from the public API service to the Telemetry service were
done serially. In some cases a single request to the public API's Stat
endpoint resulted in 5 serial requests to the Telemetry service.
Make all requests from the Public API to Telemetry concurrent.
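A sketch of the fan-out with errgroup (queryTelemetry is a hypothetical stand-in for a single Telemetry RPC):

```go
package main

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// queryTelemetry is hypothetical; it issues one Telemetry request.
func queryTelemetry(ctx context.Context, req string) error { return nil }

// statFanOut issues all Telemetry requests concurrently and returns
// the first error, instead of walking them one at a time.
func statFanOut(ctx context.Context, reqs []string) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, req := range reqs {
		req := req // capture per-iteration (pre-Go 1.22)
		g.Go(func() error {
			return queryTelemetry(ctx, req)
		})
	}
	return g.Wait()
}
```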
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Part of #299
Follow-up from #315.
Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.
I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.
Closes #265
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
I've removed per-path metrics from the web dashboard and from the `conduit stat` command.
Manually validated that these metrics are no longer displayed.
Closes #263
Signed-off-by: Eliza Weisman <eliza@buoyant.io>