Commit Graph

30 Commits

Author SHA1 Message Date
Brian Smith b114ef6819
Add initial infrastructure for optionally accepting TLS connections (#1047)
* Add initial infrastructure for optinally accepting TLS connections.

If the environment gives us the paths to the certificate chain and private key
then use TLS for all accepted TCP connections. Otherwise, continue on using
plaintext for all accepted TCP connections. The default behavior--no TLS--isn't
changed.

Later we'll make this smarter by adding protocol detection so that when the TLS
configuration is available, we'll accept both TLS and non-TLS connections.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-05-31 12:20:57 -10:00
Oliver Gould e91699bba2
proxy/router: Implement LRU cache eviction (#925)
The router's cache has no means to evict unused entries when capacity is
reached.

This change does the following:

- Wraps cache values in a smart pointer that tracks the last time of
  access for each entry. The smart pointer updates the access time when
  the reference to entry is dropped.
- When capacity is not available, all nodes that have not been accessed
  within some minimal idle age are dropped.

Accesses and updates to the map are O(1) when capacity is available.
Reclaiming capacity is O(n), so it's expected that the router is
configured with enough capacity such that capacity need not be reclaimed
usually.
2018-05-10 19:06:31 -07:00
Oliver Gould 63fbbd6931
proxy: Parse units with duration configurations (#909)
Configuration values that take durations are currently specified as
time values with no units.  So `600` may mean 600ms in some contexts and
10 minutes in others.

In order to avoid this problem, this change now requires that
configurations provide explicit units for time values such as '600ms' or
10 minutes'.

Fixes #27.
2018-05-08 13:54:12 -07:00
Oliver Gould 68e203a2fc
proxy: Use Duration types for config defaults (#906)
It's easy to misconfigure default durations, since they're recorded as
integers and converted to Durations separately.

Now, all default constants that represent durations use const `Duration`
instances (enabled by a recent Rust release).

This fixes #905 which was caused by using the wrong time unit for the
metrics retain time.
2018-05-08 10:58:22 -07:00
Oliver Gould 02e6d018d0
proxy: Bound on router capacity (#898)
Currently, the proxy may cache an unbounded number of routes. In order
to prevent such leaks in production, new configurations are introduced
to limit the number of inbound and outbound HTTP routes. By default, we
support 100 inbound routes and 10K outbound routes.

In a followup, we'll introduce an eviction strategy so that capacity can
be reclaimed gracefully.
2018-05-04 16:32:30 -07:00
Oliver Gould ada5cb267e
proxy: Expire metrics that have not been updated for 10 minutes (#880)
The proxy is now configured with the CONDUIT_PROXY_METRICS_RETAIN_IDLE
environment variable that dictates the amount of time that the proxy will retain
metrics that have not been updated.

A timestamp is maintained for each unique set of labels, indicating the last time
that the scope was updated. Then, when metrics are read, all metrics older than
CONDUIT_PROXY_METRICS_RETAIN_IDLE are dropped from the stats registry.

A ctx::test_utils module has been added to aid testing.

Fixes #819
2018-04-30 16:11:12 -07:00
Oliver Gould cc44db054f
Remove NODE_NAME and POD_NAME env usage (#758)
* proxy: Remove pod_name and node_name

* cli: Do not inject POD_NAME and NODE_NAME env vars
2018-04-13 13:09:51 -07:00
Oliver Gould efdfc93b50
Stop pushing telemetry reports from the proxy (#616)
Now that the controller does not depend on pushed telemetry reports, the
proxy need not depend on the telemetry API or maintain legacy sampling
logic.
2018-04-12 17:39:29 -07:00
Brian Smith 7d3b715c4d
Proxy: Move DNS name normalization to service discovery (#722)
Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallack isn't
implemented in service discovery yet, Outbound still a temporary
workaround using FullyQualifiedName to keep things working; thta will
be removed once DNS fallback is implemented in service discovery.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-06 15:04:09 -10:00
Brian Smith 7bc4ffd0a4
Revert "Proxy: Refactor DNS name parsing and normalization (#673)" (#700)
This reverts commit 311ef410a8.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 16:49:32 -10:00
Brian Smith 311ef410a8
Proxy: Refactor DNS name parsing and normalization (#673)
Proxy: Refactor DNS name parsing and normalization

Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallack isn't
implemented in service discovery yet, Outbound still a temporary
workaround using FullyQualifiedName to keep things working; thta will
be removed once DNS fallback is implemented in service discovery.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 12:32:12 -10:00
Sean McArthur 47f9665b8e
proxy: allow disable protocol detection on specific ports (#648)
- Adds environment variables to configure a set of ports that, when an
  incoming connection has an SO_ORIGINAL_DST with a port matching, will
  disable protocol detection for that connection and immediately start a
  TCP proxy.
- Adds a default list of well known ports: SMTP and MySQL.

Closes #339
2018-04-02 14:24:36 -07:00
Eliza Weisman 1c9ce4d118
Add Prometheus /metrics endpoint to proxy (#569)
This PR adds an endpoint to the proxy that serves metrics in Prometheus' text exposition format. The endpoint currently serves the `request_total`, `response_total`, `response_latency_ms`, and `response_duration_ms metrics`, as described in #536. The endpoint's port and address are configurable with the `CONDUIT_PROXY_METRICS_LISTENER` environment variable.

Tests have been added in t`ests/telemetry.rs`
2018-03-21 16:19:32 -07:00
Brian Smith 649e784d9c
Simplify cluster zone suffix handling in the proxy (#528)
* Temporarily stop trying to support configurable zones in the proxy.

None of the zone configuration is tested and lots of things assume the cluster
zone is `cluster.local`. Further, how exactly the proxy will actually learn the
cluster zone hasn't been decided yet.

Just hard-code the zone as "cluster.local" in the proxy until configurable zones
are fully implemented and tested to be working correctly.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Remove the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting

The way that Kubernetes configures DNS search suffixes has some negative
consequences as some names like "example.com" are ambiguous: depending on
whether there is a service "example" in the "com" namespace, "example.com"
may refer to an external service or an internal service, and this can
fluctuate over time. In recognition of that we added the
CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting, thinking this would
be part of a solution for users to opt out of the unfortunate behavior
if their applications didn't depend on the DNS search suffix feature.

It turns out similar effects can be acheived using a custom dnsConfig,
starting in Kubernetes 1.10 when dnsConfig reaches the beta stability level.
Now any CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN-based seems duplicative.
Further, attempting to support it optionally made the code complex and hard
to read.

Therefore, let's just remove it. If/when somebody actually requests this
functionality then we can add it back, if dnsConfig isn't a valid alternative
for them.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Further hard-code "cluster.local" as the zone, temporarily.

Addresses review feedback.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 14:30:13 -10:00
Brian Smith 72c6a9cab2
Proxy: Make CONDUIT_PROXY_POD_NAMESPACE a required parameter. (#527)
Wwe will be able to simplify service discovery in the near future if we
can rely on the namespace being available.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 11:12:05 -10:00
Eliza Weisman ad073c79b9
Remove connect timeouts from Bind (#487)
Currently, the `Reconnect` middleware does not reconnect on connection errors (see  #491) and treats them as request errors. This means that when a connection timeout is wrapped in a `Reconnect`, timeout errors are treated as request errors, and the request returns HTTP 500. Since  this is not the desired behavior, the connection timeouts should be removed, at least until their errors can be handled differently.

This PR removes the connect timeouts from `Bind`, as described in https://github.com/runconduit/conduit/pull/483#issuecomment-369380003.

It removes the `CONDUIT_PROXY_PUBLIC_CONNECT_TIMEOUT_MS` environment variable, but _not_ the `CONDUIT_PROXY_PRIVATE_CONNECT_TIMEOUT_MS` variable, since this is also used for the TCP connect timeouts. If we want also want to remove the TCP connection timeouts, I can do that as well.

Closes #483. Fixes #491.
2018-03-05 15:38:20 -08:00
Brian Smith 8607875267
Stop using the url crate in the proxy. (#450)
Version 1.7.0 of the url crate seems to be broken which means we cannot
`cargo update` the proxy without locking url to version 1.6. Since we only
use it in a very limited way anyway, and since we use http::uri for parsing
much more, just switch all uses of the url crate to use http::uri for parsing
instead.

This eliminates some build dependencies.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-26 08:55:48 -10:00
Eliza Weisman 694f691b71
Add timeout to Outbound::bind_service (#436)
Closes #403.

When the Destination service does not return a result for a service, the proxy connection for that service will hang indefinitely waiting for a result from Destination. If, for example, the requested name doesn't exist, this means that the proxy will wait forever, rather than responding with an error.

I've added a timeout wrapping the service returned from `<Outbound as Recognize>::bind_service`. The timeout can be configured by setting the `CONDUIT_PROXY_BIND_TIMEOUT` environment variable, and defaults to 10 seconds (because that's the default value for [a similar configuration in Linkerd](https://linkerd.io/config/1.3.5/linkerd/index.html#router-parameters)).

Testing with @klingerf's reproduction from #403:
```
curl -sIH 'Host: httpbin.org' $(minikube service proxy-http --url)/get | head -n1
HTTP/1.1 500 Internal Server Error
```
proxy logs:
```rust
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy using controller at HostAndPort { host: Domain("proxy-api.conduit.svc.cluster.local"), port: 8086 }
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy routing on V4(127.0.0.1:4140)
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy::transport::connect "controller-client", DNS resolved proxy-api.conduit.svc.cluster.local to 10.0.0.240
proxy-5698f79b66-8rczl conduit-proxy ERR! conduit_proxy::map_err turning service error into 500: Inner(Timeout(Duration { secs: 10, nanos: 0 }))
```
2018-02-26 10:18:35 -08:00
Sean McArthur 236f71fbe0
proxy: use original dst if authority doesnt look like local service (#397)
The proxy will check that the requested authority looks like a local service, and if it doesn't, it will no longer ask the Destination service about the request, instead just using the SO_ORIGINAL_DST, enabling egress naturally.

The rules used to determine if it looks like a local service come from this comment:

> If default_zone.is_none() and the name is in the form $a.$b.svc, or if !default_zone.is_none() and the name is in the form $a.$b.svc.$default_zone, for some a and some b, then use the Destination service. Otherwise, use the IP given.
2018-02-20 18:09:21 -08:00
Brian Smith 02176d8d16
Remove default controller URL from proxy. (#48)
Previously there was a default controller URL in the proxy. This
default was never used for any proxy injected by `conduit inject` and
it was the wrong default when using the proxy outside of Kubernetes.
Also more generally this is such an important setting in terms of
correctness and security that it was dangerous to let it be implied in
any context.

Remove the default, requiring that it be set in order for the proxy to
start.
2018-01-02 08:44:27 -10:00
Sky Ao 238c54414b correct typo: Enviroment -> Environment (#100)
Signed-off-by: Sky Ao <aoxiaojian@gmail.com>
2017-12-29 10:14:48 -08:00
Brian Smith 8385a7a8c1
Proxy: Map unqualified/partially-qualified names to FQDN (#59)
* Proxy: Map unqualified/partially-qualified names to FQDN

Previously we required the service to fully qualify all service names
for outbound traffic. Many services are written assuming that
Kubernetes will complete names using its DNS search path, and those
services weren't working with Conduit.

Now add an option, used by default, to fully-qualify the domain names.
Currently only Kubernetes-like name completion for services is
supported, but the configuration syntax is open-ended to allow for
alternatives in the future. Also, the auto-completion can be disabled
for applications that prefer to ensure they're always using unambiguous
names. Once routing is implemented then it is likely that (default)
routing rules will replace these hard-coded rules.

Unit tests for the name completion logic are included.

Part of the solution for #9. The changes to `conduit inject` to
actually use this facility will be in another PR.
2017-12-19 11:59:26 -10:00
Brian Smith 81fb0fea5f
Move default private connect timeout to `Config` (#42)
Previously the default value of this setting was in lib.rs instead of
being automatically set in `Config` like all the other defaults, which
was inconsistent and confusing.

Fix this by moving the defaulting logic to `Config`.

Validated by running the test suite.
2017-12-13 21:15:21 -06:00
Brian Smith 0185522821
Proxy: Parse environment variables in one place (#26)
Previously `Process` did its own environment variable parsing and did
not benefit from the improved error handling that `config` now has.
Additionally, future changes will need access to these same environment
variables in other parts of the proxy.

Move `Process`'s environment variable parsing to `config` to address
both of these issues. Now there are no uses of `env::var` outside of
`config` except for logging, which is the final desired state.

I validated this manually.
2017-12-13 19:33:37 -06:00
Brian Smith 559f4a76fb
Proxy: Use production config parsing in tests (#25)
* Proxy: Use production config parsing in tests

Previosuly the testing code for the proxy was sensitive to the values
of environment variables unintentionally, because `Config` looked at
the environment variables. Also, the tests were largely avoiding
testing the production configuration parsing code since they were
doing their own parsing.

Now the tests avoid looking at environment variables other than
`ENV_LOG`, which makes them more resilient. Also the tests now parse
the settings using the same code as production use uses.

I validated this manually.
2017-12-13 19:27:50 -06:00
Brian Smith 0ebc20c013
Proxy: Parse all environment variables before aborting (#24)
Previously, as soon as we would encounter one environment variable with
an invalid value we would exit. This is frustrating behavior when
deploying to Kubernetes and there are multiple problems because the
edit-compile-test cycle is so slow.

Fix this by parsing all the environment variables and logging error
messages before exiting.

I validated this manually.
2017-12-13 18:56:14 -06:00
Eliza Weisman 2fdb859dff
Add timeout to in-flight telemetry reports (#12)
This PR adds a configurable timeout duration after which in-flight telemetry reports are dropped, cancelling the corresponding RPC request to the control plane.

I've also made the `Timeout` implementation used in `TimeoutConnect` generic, and reused it in multiple places, including the timeout for in-flight reports.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2017-12-13 15:07:36 -08:00
Brian Smith e29a02d63b
Proxy: Improve error reporting for invalid environment variables (#23)
* Proxy: Improve error reporting for invalid environment variables

Previously when an environment variable had an invalid value the
process would exit with an error that did not mention which
environment variable is invalid.

Start fixing this by routing environment variable parsing through
functions that always know the name of the environment variable when
they report errors.

I validated this change manually.

* Proxy: Improve configuration URL parsing

Previously there was a bit of duplicated logic between parsing `Addr`
and `HostAndPort` values.

Factor out the common logic. In the process, improve the error
reporting in the cases where parsing fails.
2017-12-08 12:32:43 -06:00
Oliver Gould 980f85963d apply rustffmt on proxy, remove rustfmt.toml for now 2017-12-05 00:44:16 +00:00
Oliver Gould b104bd0676 Introducing Conduit, the ultralight service mesh
We’ve built Conduit from the ground up to be the fastest, lightest,
simplest, and most secure service mesh in the world. It features an
incredibly fast and safe data plane written in Rust, a simple yet
powerful control plane written in Go, and a design that’s focused on
performance, security, and usability. Most importantly, Conduit
incorporates the many lessons we’ve learned from over 18 months of
production service mesh experience with Linkerd.

This repository contains a few tightly-related components:
- `proxy` -- an HTTP/2 proxy written in Rust;
- `controller` -- a control plane written in Go with gRPC;
- `web` -- a UI written in React, served by Go.
2017-12-05 00:24:55 +00:00