Commit Graph

12 Commits

Author SHA1 Message Date
Oliver Gould d5e2ff2cb7
Canonicalize outbound names via DNS for inbound profiles (#129)
When the inbound proxy receives requests, these requests may have
relative `:authority` values like _web:8080_. Because these requests can
come from hosts with a variety of DNS configurations, the inbound proxy
can't make a sufficient guess about the fully qualified name (e.g.
_web.ns.svc.cluster.local._).

In order for the inbound proxy to discover inbound service profiles, we
need to establish some means for the inbound proxy to determine the
"canonical" name of the service for each request.

This change introduces a new `l5d-dst-canonical` header that is set by
the outbound proxy and used by the remote inbound proxy to determine
which profile should be used.

The outbound proxy determines the canonical destination by performing
DNS resolution as requests are routed and uses this name for profile and
address discovery. This change removes the proxy's hardcoded Kubernetes
dependency.

The `LINKERD2_PROXY_DESTINATION_GET_SUFFIXES` and
`LINKERD2_PROXY_DESTINATION_PROFILE_SUFFIXES` environment variables
control which domains may be discovered via the destination service.
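
As a rough illustration of how such a suffix list might gate discovery, the sketch below assumes the variable holds a comma-separated list of domain suffixes and that a name matches when it equals a suffix or ends with it on a label boundary. The list format and the helper names `parse_suffixes` and `matches_suffix` are assumptions made for illustration, not the proxy's actual implementation.

```rust
/// Hypothetical helper: splits a comma-separated suffix list, dropping
/// surrounding whitespace, trailing dots, and empty entries.
fn parse_suffixes(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(|s| s.trim().trim_end_matches('.').to_ascii_lowercase())
        .filter(|s| !s.is_empty())
        .collect()
}

/// Hypothetical helper: true if `name` equals one of the suffixes or ends
/// with one of them on a DNS label boundary.
fn matches_suffix(name: &str, suffixes: &[String]) -> bool {
    let name = name.trim_end_matches('.').to_ascii_lowercase();
    suffixes.iter().any(|suffix| {
        name == *suffix || name.ends_with(format!(".{}", suffix).as_str())
    })
}
```

With a list such as `svc.cluster.local.`, the name `web.ns.svc.cluster.local.` would be eligible for discovery under these assumptions, while `web.ns.svc.cluster.localhost` would not.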

Finally, HTTP settings detection has been moved into a dedicated routing
layer at the "bottom" of the stack. This is done so that
canonicalization and discovery need not be done redundantly for each set
of HTTP settings. Now, HTTP settings only configure the HTTP client
stack within an endpoint.

Fixes linkerd/linkerd2#1798
2018-11-15 11:41:17 -08:00
Oliver Gould 8fca9ebde2
Only use the classification label for response_total (#116)
As described in https://github.com/linkerd/linkerd2/issues/1832, our eager
classification is too complicated.

This changes the `classification` label to only be used with the `response_total` metric.

The following changes have been made:
1. response_latency metrics only include a status_code and not a classification.
2. response_total metrics include classification labels.
3. transport metrics no longer expose a `classification` label (since it's misleading).
   Now, the `errno` label is set to empty when there is no error.
4. gRPC classification applies only when the request's content type starts
   with `application/grpc+`.

The `proxy::http::classify` APIs have been changed so that classifiers cannot
return a classification before the classifier is fully consumed.
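
The label split and the consume-before-classify rule described above could look roughly like the following. This is a minimal sketch: the `ClassifyResponse` type, its `eos` method, the `"success"`/`"failure"` values, and the rule used to decide between them are all assumptions for illustration and are not taken from the `proxy::http::classify` module.

```rust
/// Hypothetical check for item 4: gRPC classification is considered only
/// when the content type starts with `application/grpc+`.
fn is_grpc(content_type: &str) -> bool {
    content_type.starts_with("application/grpc+")
}

/// Hypothetical classifier state held while a response is in flight.
struct ClassifyResponse {
    http_status: u16,
}

impl ClassifyResponse {
    /// Taking `self` by value means a classification can only be obtained
    /// by consuming the classifier at end of stream.
    fn eos(self, grpc_status: Option<u32>) -> &'static str {
        let ok = match grpc_status {
            Some(code) => code == 0,         // assumed: gRPC status 0 is success
            None => self.http_status < 500,  // assumed: 5xx is a failure
        };
        if ok { "success" } else { "failure" }
    }
}

/// Latency metrics carry only a status code (item 1).
fn response_latency_labels(status_code: u16) -> String {
    format!("status_code=\"{}\"", status_code)
}

/// Response totals also carry the classification (item 2).
fn response_total_labels(status_code: u16, classification: &str) -> String {
    format!(
        "status_code=\"{}\",classification=\"{}\"",
        status_code, classification
    )
}
```

Because `eos` consumes the classifier, calling code cannot read a classification early and then keep using the same value.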
2018-11-01 14:59:44 -07:00
Sean McArthur 68ec7f1488 fix test import conflict of assert_contains macro
Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-10-26 14:21:26 -07:00
Oliver Gould 978fed1cf6
refactor: Structure the proxy in terms of `Stack` (#100)
As the proxy's functionality has grown, the HTTP routing functionality
has become complex. Module boundaries have become ill-defined, which
leads to tight coupling--especially around the `ctx` metadata types and
`Service` type signatures.

This change introduces a `Stack` type (and subcrate) that is used as the
base building block for proxy functionality. The `proxy` module now
exposes generic components--stack layers--that are configured and
instantiated in the `app::main` module.
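
To give a feel for the idea, here is a deliberately simplified sketch of stacks and layers. The trait shapes, the `Connect` and `Labeled` types, and the use of strings in place of real services are assumptions for illustration; the actual `lib/stack` crate may differ in its signatures and error handling.

```rust
/// Hypothetical, simplified stack: builds a value for a given target.
trait Stack<T> {
    type Value;
    fn make(&self, target: &T) -> Self::Value;
}

/// Hypothetical layer: wraps an inner stack to produce a new stack.
trait Layer<S> {
    type Stack;
    fn bind(&self, inner: S) -> Self::Stack;
}

/// Innermost stack; the returned string stands in for a client service.
struct Connect;

impl Stack<String> for Connect {
    type Value = String;
    fn make(&self, target: &String) -> String {
        format!("connect({})", target)
    }
}

/// A layer that decorates whatever the inner stack makes with a label.
struct Labeled(&'static str);

struct LabeledStack<S> {
    label: &'static str,
    inner: S,
}

impl<S> Layer<S> for Labeled {
    type Stack = LabeledStack<S>;
    fn bind(&self, inner: S) -> LabeledStack<S> {
        LabeledStack { label: self.0, inner }
    }
}

impl<T, S> Stack<T> for LabeledStack<S>
where
    S: Stack<T>,
    <S as Stack<T>>::Value: std::fmt::Display,
{
    type Value = String;
    fn make(&self, target: &T) -> String {
        format!("{}({})", self.label, self.inner.make(target))
    }
}
```

In this sketch, `Labeled("metrics").bind(Labeled("timeout").bind(Connect))` composes generic layers in one place, which mirrors how the components are described as being configured and instantiated in `app::main`.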

This change reorganizes the repo as follows:
- Several auxiliary crates have been split out from the `src/` directory
  into `lib/fs-watch`, `lib/stack` and `lib/task`.
- All logic specific to configuring and running the linkerd2 sidecar
  proxy has been moved into `src/app`. The `Main` type has been moved
  from `src/lib.rs` to `src/app/main.rs`.
- The `src/proxy` has reusable, generic components useful for building
  proxies in terms of `Stack`s.

The logic contained in `lib/bind.rs`, pertaining to per-endpoint service
behavior, has almost entirely been moved into `app::main`.

`control::destination` has changed so that it is not responsible for
building services. (It used to take a clone of `Bind` and use it to
create per-endpoint services). Instead, the destination service
implements the new `proxy::Resolve` trait, which produces an infinite
`Resolution` stream for each lookup. This allows the `proxy::balance`
module to be generic over the service discovery source.

Furthermore, the `router::Recognize` API has changed to only expose a
`recognize()` method and not a `bind_service()` method. The
`bind_service` logic is now modeled as a `Stack`.

The `telemetry::http` module has been replaced by a
`proxy::http::metrics` module that is generic over its metadata types
and does not rely on the old telemetry event system. These events are
now a local implementation detail of the `tap` module.

There are no user-facing changes in the proxy's behavior.
2018-10-11 11:25:03 -07:00
Oliver Gould 0d890dab6b
Mark tcp_connect_err tests as macos-specific (#63)
The `tcp_connect_err` tests do not pass on Linux.

I initially observed that the errno produced is ECONNREFUSED on Linux,
not EXFULL, but even changing this does not help the test to pass
reliably, as the socket teardown semantics appear to be slightly
different.

In order to prevent spurious test failures during development, this
change marks these two tests as macos-specific.
2018-08-14 12:32:41 -07:00
Oliver Gould 18a8d7956d
Improve tcp_connect_err test flakiness (#37)
Both tcp_connect_err tests frequently fail, even during local
development. This seems to happen because the proxy doesn't necessarily
observe the socket closure.

Instead of shutting down the socket gracefully, we can just drop it!
This helps the test pass much more reliably.
2018-08-01 17:25:49 -07:00
Oliver Gould b2fcd5d276
Remove the telemetry system's event channel (#30)
The proxy's telemetry system is implemented with a channel: the proxy thread
generates events and the control thread consumes these events to record
metrics and satisfy Tap observations. This design was intended to minimize
latency overhead in the data path.

However, this design leads to substantial CPU overhead: the control thread's
work scales with the proxy thread's work, leading to resource contention in
busy, resource-limited deployments. This design also has other drawbacks in
terms of allocation & makes it difficult to implement planned features like
payload-aware Tapping.

This change removes the event channel so that all telemetry is recorded
instantaneously in the data path, setting up for further simplifications so
that, eventually, the metrics registry properly uses service lifetimes to
support eviction.

This change has a potentially negative side effect: metrics scrapes obtain
the same lock that the data path uses to write metrics so, if the metrics
server gets heavy traffic, it can directly impact proxy latency. These
effects will be ameliorated by future changes that reduce the need for the
Mutex in the proxy thread.
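
The trade-off can be pictured with a small sketch in which the data path and the scrape handler share one lock. The `Registry` type and its methods are hypothetical and only illustrate the locking pattern described above, not the proxy's metrics code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical registry shared by the data path and the metrics server.
#[derive(Clone, Default)]
struct Registry {
    counters: Arc<Mutex<HashMap<String, u64>>>,
}

impl Registry {
    /// Called inline on the data path: no event channel, but the lock is
    /// held briefly for every recorded event.
    fn record(&self, name: &str) {
        let mut counters = self.counters.lock().unwrap();
        *counters.entry(name.to_string()).or_insert(0) += 1;
    }

    /// Called by the metrics server: takes the same lock, so a slow or
    /// frequent scrape can delay `record` calls.
    fn scrape(&self) -> String {
        let counters = self.counters.lock().unwrap();
        let mut lines: Vec<String> = counters
            .iter()
            .map(|(name, value)| format!("{} {}", name, value))
            .collect();
        lines.sort();
        lines.join("\n")
    }
}
```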
2018-07-26 11:16:27 -07:00
Markus Jais 7788f60e0e fixed some typos in comments and Dockerfile (#25)
Signed-off-by: Markus Jais <markusjais@googlemail.com>
2018-07-25 10:10:59 -10:00
Eliza Weisman 2d4086aee9
Add errno label to transport close metrics (when applicable) (#12)
This branch adds a label displaying the Unix error code name (or the raw
error code, on other operating systems or if the error code was not 
recognized) to the metrics generated for TCP connection failures.

It also adds a couple of tests for label formatting.
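
A minimal sketch of this kind of label formatting, assuming a small lookup table of Linux errno values and a hypothetical `errno_label` helper (the real implementation and the set of recognized codes may differ):

```rust
/// Hypothetical formatter: uses the symbolic name when the code is
/// recognized and falls back to the raw number otherwise.
/// The numeric values below are the Linux errno values.
fn errno_label(code: i32) -> String {
    let name = match code {
        32 => Some("EPIPE"),
        104 => Some("ECONNRESET"),
        111 => Some("ECONNREFUSED"),
        _ => None,
    };
    match name {
        Some(name) => format!("errno=\"{}\"", name),
        None => format!("errno=\"{}\"", code),
    }
}
```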

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-23 15:37:04 -07:00
Oliver Gould bbf217ff4f
Replace references to _Conduit_ (#6)
There are various comments, examples, and documentation that refer to
Conduit. This change replaces or removes these references.

CONTRIBUTING.md has been updated to refer to GOVERNANCE/MAINTAINERS.
2018-07-12 20:41:17 -07:00
Eliza Weisman 2f4c1b220a
Add labels for `tls::ReasonForNoIdentity` (#5)
Fixes linkerd/linkerd2#1276.

Currently, metrics with the `tls="no_identity"` label are duplicated.
This is because that label is generated from the `tls_status` label on
the `TransportLabels` struct, which is either `Some(())` or a
`ReasonForNoTls`. `ReasonForNoTls` has a
variant `ReasonForNoTls::NoIdentity`, which contains a
`ReasonForNoIdentity`, but when we format that variant as a label, we
always just produce the string "no_identity", regardless of the value of
the `ReasonForNoIdentity`. 

However, label types are _also_ used as hash map keys into the map that
stores the metrics scopes, so although two instances of
`ReasonForNoTls::NoIdentity` with different `ReasonForNoIdentity`s
produce the same formatted label output, they aren't _equal_, since that
field differs, so they correspond to different metrics.

This branch resolves this issue by adding an additional label to these
metrics, based on the `ReasonForNoIdentity`. Now, the separate lines in
the metrics output that correspond to each `ReasonForNoIdentity` have a
label differentiating them from each other.
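
The duplication can be reproduced in miniature. In the sketch below, the enum variants `Loopback` and `NotHttp`, the `Disabled` variant, the `no_tls_reason` label name, and the helper functions are assumptions chosen for illustration; only the overall shape (distinct hash map keys that format to the same label) follows the description above.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ReasonForNoIdentity {
    Loopback, // hypothetical variant
    NotHttp,  // hypothetical variant
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ReasonForNoTls {
    Disabled, // hypothetical variant
    NoIdentity(ReasonForNoIdentity),
}

/// Old behavior: the inner reason is ignored when formatting.
fn old_label(reason: ReasonForNoTls) -> String {
    match reason {
        ReasonForNoTls::Disabled => "tls=\"disabled\"".to_string(),
        ReasonForNoTls::NoIdentity(_) => "tls=\"no_identity\"".to_string(),
    }
}

/// New behavior: an extra label distinguishes the inner reasons.
fn new_label(reason: ReasonForNoTls) -> String {
    match reason {
        ReasonForNoTls::Disabled => "tls=\"disabled\"".to_string(),
        ReasonForNoTls::NoIdentity(why) => {
            let why = match why {
                ReasonForNoIdentity::Loopback => "loopback",
                ReasonForNoIdentity::NotHttp => "not_http",
            };
            format!("tls=\"no_identity\",no_tls_reason=\"{}\"", why)
        }
    }
}

/// Formats one label set per map key, sorted for stable output.
fn render(
    metrics: &HashMap<ReasonForNoTls, u64>,
    label: fn(ReasonForNoTls) -> String,
) -> Vec<String> {
    let mut lines: Vec<String> = metrics.keys().map(|key| label(*key)).collect();
    lines.sort();
    lines
}
```

Two `NoIdentity` keys with different inner reasons occupy two map entries; `old_label` renders both identically, while `new_label` renders them as distinct lines.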

Note that the `NotImplementedForTap` and `NotImplementedForMetrics`
reasons will never show up in metrics labels, since we don't currently
gather metrics from the tap and metrics servers.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-12 16:04:25 -07:00
Oliver Gould c23ecd0cbc
Migrate `conduit-proxy` to `linkerd2-proxy`
The proxy now honors environment variables starting with
`LINKERD2_PROXY_`.
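
As a small illustration of prefix-based configuration, the sketch below filters a set of environment variables down to those carrying the `LINKERD2_PROXY_` prefix. The `proxy_settings` helper, and the choice to strip the prefix from the returned keys, are assumptions for illustration rather than the proxy's configuration code.

```rust
const PREFIX: &str = "LINKERD2_PROXY_";

/// Hypothetical helper: keeps only variables that start with the prefix
/// and returns them, prefix stripped, in sorted order.
fn proxy_settings<I>(vars: I) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut settings: Vec<(String, String)> = vars
        .into_iter()
        .filter(|(key, _)| key.starts_with(PREFIX))
        .map(|(key, value)| (key[PREFIX.len()..].to_string(), value))
        .collect();
    settings.sort();
    settings
}
```

In a real process this could be fed from `std::env::vars()`, for example `proxy_settings(std::env::vars())`.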
2018-07-07 22:45:21 +00:00