Followup to #12844
This new field defines the default policy for Servers, i.e. if a request doesn't match the policy associated with a Server, then this policy applies. The values are the same as for `proxy.defaultInboundPolicy` and the `config.linkerd.io/default-inbound-policy` annotation (all-unauthenticated, all-authenticated, cluster-authenticated, cluster-unauthenticated, deny), plus a new value "audit". The default is "deny", which keeps the behavior backwards-compatible.
This field is also exposed as an additional printer column.
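As a rough illustration of the accepted values and the "deny" default, here is a minimal validation sketch in Go; the function and variable names are hypothetical, not the controller's actual code:
```go
package policy

import "fmt"

// validDefaultPolicies mirrors the values accepted by proxy.defaultInboundPolicy
// and the config.linkerd.io/default-inbound-policy annotation, plus the new
// "audit" value. These names are illustrative only.
var validDefaultPolicies = map[string]struct{}{
	"all-unauthenticated":     {},
	"all-authenticated":       {},
	"cluster-authenticated":   {},
	"cluster-unauthenticated": {},
	"deny":                    {},
	"audit":                   {},
}

// normalizeDefaultPolicy treats an unset value as the backwards-compatible
// default ("deny") and rejects anything outside the accepted set.
func normalizeDefaultPolicy(v string) (string, error) {
	if v == "" {
		return "deny", nil
	}
	if _, ok := validDefaultPolicies[v]; !ok {
		return "", fmt.Errorf("unsupported default policy: %q", v)
	}
	return v, nil
}
```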
The main change here is a refactoring of the address functions in `addr.go` that support the Destination controller and Viz's Tap controller. Some of those functions only worked for IPv4, so they have been made IP-family agnostic.
This enabled adding (and fixing) IPv6 unit tests as detailed in the following sections.
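For context, the family-agnostic direction looks roughly like the sketch below, which leans on `net/netip` rather than IPv4-only parsing; it is illustrative only and not the actual `addr.go` code:
```go
package addr

import (
	"fmt"
	"net/netip"
)

// parseAddr accepts "host:port" for either IP family (including the bracketed
// IPv6 form, e.g. "[2001:db8::1]:8080") and returns a family-agnostic
// netip.AddrPort. Illustrative sketch only.
func parseAddr(hostPort string) (netip.AddrPort, error) {
	ap, err := netip.ParseAddrPort(hostPort)
	if err != nil {
		return netip.AddrPort{}, fmt.Errorf("invalid address %q: %w", hostPort, err)
	}
	return ap, nil
}

// isIPv6 reports whether the parsed address is IPv6; callers that still need
// per-family behaviour can branch here instead of assuming IPv4.
func isIPv6(ap netip.AddrPort) bool {
	return ap.Addr().Is6() && !ap.Addr().Is4In6()
}
```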
Other changes:
- The `ProxyAddressesToString()` function was no longer used, so it got removed.
- The `ProxyIPToString()` function was only used by the destination-client script, so that got stripped out.
## `addr_test.go`
We added IPv6 cases to each test; these would have failed before this change.
## `endpoint_translator_test.go`
One of the test pods (pod3) was changed to have an IPv6 address. Without the other changes in this PR those tests would still have passed, but only because the comparison between actual and expected IPs didn't check whether both were zero. We added checks against that here.
## `server_test.go`
As above, we added checks against empty IPs. And in the mocked resources in `test_util.go` we added an IPv6 EndpointSlice.
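The guard looks roughly like this hypothetical helper (a sketch; the helper name and package are illustrative, not the tests' exact code):
```go
package destination

import (
	"net"
	"testing"
)

// requireSameNonZeroIP fails when either IP is empty, so a zero-vs-zero
// comparison can no longer pass silently, and then checks equality.
func requireSameNonZeroIP(t *testing.T, expected, actual net.IP) {
	t.Helper()
	if len(expected) == 0 || len(actual) == 0 {
		t.Fatalf("empty IP in comparison: expected=%v actual=%v", expected, actual)
	}
	if !expected.Equal(actual) {
		t.Fatalf("expected IP %s, got %s", expected, actual)
	}
}
```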
As of this commit, the package `github.com/ghodss/yaml`
is no longer actively maintained.
`sigs.k8s.io/yaml` is a permanent fork of `ghodss/yaml` and is actively
maintained by a Kubernetes SIG.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
[gocritic][gc] helps to enforce some consistency and check for potential
errors. This change applies linting changes and enables gocritic via
golangci-lint.
[gc]: https://github.com/go-critic/go-critic
Signed-off-by: Oliver Gould <ver@buoyant.io>
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection are wrapping-aware.
This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).
[el]: https://github.com/polyfloyd/go-errorlint
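For illustration, the pattern `errorlint` steers toward looks like this generic sketch (not code from this change): wrap with `%w`, then inspect with `errors.Is`/`errors.As` instead of `==` or bare type assertions.
```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

func loadConfig(path string) ([]byte, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		// Wrap with %w so callers can still inspect the underlying error.
		return nil, fmt.Errorf("failed to read config %s: %w", path, err)
	}
	return b, nil
}

func main() {
	_, err := loadConfig("/etc/linkerd/does-not-exist")
	// errorlint flags `err == fs.ErrNotExist` and bare type assertions;
	// errors.Is/errors.As unwrap the chain created by %w.
	if errors.Is(err, fs.ErrNotExist) {
		fmt.Println("config missing, using defaults")
	}
	var pathErr *fs.PathError
	if errors.As(err, &pathErr) {
		fmt.Println("failing path:", pathErr.Path)
	}
}
```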
Signed-off-by: Oliver Gould <ver@buoyant.io>
The destination API expects a json formatted context token which includes the node name of the requesting client. e.g.
```bash
go run controller/script/destination-client/main.go -method get -path web-svc.emojivoto.svc.cluster.local:80 --token '{"nodeName":"alex-worker"}'
```
Signed-off-by: Alex Leong <alex@buoyant.io>
This change ensures that if a Server with `proxyProtocol: opaque` selects an endpoint backed by a pod, then destination requests for that pod reflect the fact that it handles opaque traffic.
Currently, the only way opaque traffic is honored in the destination service is if the pod has the `config.linkerd.io/opaque-ports` annotation. With the introduction of Servers, though, users can set `server.Spec.ProxyProtocol: opaque` to indicate that if a Server selects a pod, traffic to that pod's `server.Spec.Port` should be opaque. The destination service does not yet take this into account.
There is an existing change up that _also_ adds this functionality; it takes a different approach by creating a policy server client for each endpoint that a destination has. For `Get` requests on a service, the number of clients scales with the number of endpoints that back that service.
This change fixes that issue by instead creating a Server watch in the endpoint watcher and sending updates through to the endpoint translator.
The two primary scenarios to consider are:
### A `Get` request for some service is streaming when a Server is created/updated/deleted
When a Server is created or updated, the endpoint watcher iterates through its endpoint watches (`servicePublisher` -> `portPublisher`); if the Server selects any of those endpoints and has marked the port as opaque, the port publisher sends an update.
When a Server is deleted, the endpoint watcher once again iterates through its endpoint watches and deletes the address set's `OpaquePodPorts` field—ensuring that updates have been cleared of Server overrides.
### A `Get` request for some service happens after a Server is created
When a `Get` request occurs (or new endpoints are added—they both take the same path), we must check if any of those endpoints are selected by some existing Server. If so, we have to take that into account when creating the address set.
This part of the change gives me a little concern as we first must get all the Servers on the cluster and then create a set of _all_ the pod-backed endpoints that they select in order to determine if any of these _new_ endpoints are selected.
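A sketch of that selection check, with simplified stand-in types (the real watcher also resolves named ports and reacts to Server updates incrementally):
```go
package watcher

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// serverSpec is a pared-down stand-in for the Server CRD spec, used only in
// this sketch; the real types live in the policy APIs.
type serverSpec struct {
	PodSelector   *metav1.LabelSelector
	Port          int32
	ProxyProtocol string
}

// serverMarksPortOpaque reports whether any known Server selects the given
// pod and marks the given port as opaque. Named-port resolution and
// incremental update handling are omitted for brevity.
func serverMarksPortOpaque(servers []serverSpec, pod *corev1.Pod, port int32) bool {
	for _, srv := range servers {
		if srv.ProxyProtocol != "opaque" || srv.Port != port {
			continue
		}
		sel, err := metav1.LabelSelectorAsSelector(srv.PodSelector)
		if err != nil {
			continue
		}
		if sel.Matches(labels.Set(pod.Labels)) {
			return true
		}
	}
	return false
}
```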
## Testing
Right now this can be tested by starting up the destination service locally and running `Get` requests against a service that has endpoints selected by a Server:
**app.yaml**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
  labels:
    app: pod
spec:
  containers:
    - name: app
      image: nginx
      ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: svc
spec:
  selector:
    app: pod
  ports:
    - name: http
      port: 80
---
apiVersion: policy.linkerd.io/v1alpha1
kind: Server
metadata:
  name: srv
  labels:
    policy: srv
spec:
  podSelector:
    matchLabels:
      app: pod
  port: 80
  proxyProtocol: HTTP/1
```
```bash
$ go run controller/script/destination-client/main.go -path svc.default.svc.cluster.local:80
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This adds the policy CRD APIs for `Server` and `ServerAuthorization` CRDs.
The structure of each (in their respective `types.go`) is based off the `policy-crd.yaml` specs for each CRD.
Unlike service profiles, servers and server authorizations use `oneof` extensively, so I encoded each `oneof` as a struct with a pointer for each possible variant. For example, a server's `PodSelector` is either `MatchExpressions` or `MatchLabels`. Therefore, a `PodSelector` is defined as:
```go
type PodSelector struct {
	MatchExpressions *MatchExpressions
	MatchLabels      *MatchLabels
}
```
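Consumers then branch on whichever pointer is set; a minimal sketch of that dispatch, assuming the `PodSelector` definition above and a hypothetical method name:
```go
// selectorKind reports which variant of the pointer-based "oneof" is set;
// exactly one field is expected to be non-nil. This extends the PodSelector
// snippet above and is illustrative only.
func (s PodSelector) selectorKind() (string, bool) {
	switch {
	case s.MatchLabels != nil && s.MatchExpressions != nil:
		return "", false // both set: invalid
	case s.MatchLabels != nil:
		return "matchLabels", true
	case s.MatchExpressions != nil:
		return "matchExpressions", true
	default:
		return "", false // neither set: invalid
	}
}
```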
Closes #6970
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
## Motivation
These changes came up when testing mock identity. I found it useful for the
destination client to print the identity of endpoints.
```
❯ go run controller/script/destination-client/main.go -method get -path h1.test.example.com:8080
INFO[0000] Add:
INFO[0000] labels: map[concrete:h1.test.example.com:8080]
INFO[0000] - 127.0.0.1:4143
INFO[0000] - labels: map[addr:127.0.0.1:4143 h2:false]
INFO[0000] - protocol hint: UNKNOWN
INFO[0000] - identity: dns_like_identity:{name:"foo.ns1.serviceaccount.identity.linkerd.cluster.local"}
INFO[0000]
```
I also fixed a log line in the proxy-identity that used the wrong value for the
CSR path.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Followup to #2990, which refactored `linkerd endpoints` to use the
`Destination.Get` API instead of the `Discovery.Endpoints` API, leaving
the Discovery service with no implemented methods. This PR removes all the
leftover Discovery code.
Fixes #3499
* Have `linkerd endpoints` use `Destination.Get`
Fixes #2885
We're refactoring `linkerd endpoints` so it hits the
`Destination.Get` endpoint directly, instead of relying on the
Discovery service.
For that, I've created a new `client.go` for Destination and added it to
the `APIClient` interface.
I've also added a `destinationClient` struct that mimics `tapClient`,
and whose common logic has been moved into `stream_client.go`.
Analogously, I added a `destinationServer` struct that mimics
`tapServer`.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
The proxy's TLS implementation has changed to use a new _Identity_ controller.
In preparation for this, the `--tls=optional` CLI flag has been removed
from install and inject; and the `ca` controller has been deleted. Metrics
and UI treatments for TLS have **not** been removed, as they will continue to
be valuable for the new Identity system.
With the removal of the old identity scheme, the Destination service's proxy
ID field is now set with an opaque string (e.g. `ns:emojivoto`) to enable
locality awareness.
Up until now, the proxy-api controller service has been the sole service
that the proxy communicates with, implementing the majority of the API
defined in the `linkerd2-proxy-api` repo. But this is about to change:
linkerd/linkerd2-proxy-api#25 introduces a new Identity service; and
this service must be served outside of the existing proxy-api service
in the linkerd-controller deployment (so that it may run under a
distinct service account).
With this change, the "proxy-api" name becomes less descriptive. It's no
longer "the service that serves the API for the proxy," it's "the
service that serves the Destination API to the proxy." Therefore, it
seems best to bite the bullet and rename this to be the "destination"
service (i.e. because it only serves the
`io.linkerd.proxy.destination.Destination` service).
Co-authored-by: Kevin Lingerfelt <kl@buoyant.io>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The Proxy API service lacked introspection of its internal state.
Introduce a new gRPC Discovery API, implemented by two servers:
1) Proxy API Server: returns a snapshot of discovery state
2) Public API Server: pass-through to the Proxy API Server
Also wire up a new `linkerd endpoints` command.
Fixes #2165
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This is a followup branch from #2023:
- delete `proxy/client.go`, move code to `destination-client`
- move `RenderTapEvent` and stat functions from `util` to `cmd`
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
A container called `proxy-api` runs in the Linkerd2 controller pod. This container listens on port 8086 and serves the proxy-api, but does nothing other than forward gRPC requests to the destination container, which listens on port 8089.
We remove the proxy-api container altogether and change the destination container to listen on port 8086 instead of 8089. The result is that clients still use the proxy-api by connecting to `proxy-api.<ns>.svc.cluster.local:8086`, but the controller has one fewer container. This results in a simpler system that is easier to reason about.
Signed-off-by: Alex Leong <alex@buoyant.io>
We implement the getProfiles method in the destination service. This method returns a stream of destination profiles for a given authority. It does this by looking up the ServiceProfile resource in the controller namespace named `<svc>.<ns>` where `<svc>` is the name of the service and `<ns>` is the namespace of the service.
This PR includes:
* Adding a ServiceProfile Custom Resource Definition to linkerd install
* A watch based implementation of the getProfiles method in the destination service, similar to the implementation of get.
* An update to the destination client script that allows querying the getProfiles method.
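The `<svc>.<ns>` naming described above can be illustrated with a small sketch (hypothetical helper; the real code also accounts for the configured cluster domain):
```go
package profiles

import (
	"fmt"
	"strings"
)

// profileNameForAuthority sketches the `<svc>.<ns>` convention: given an
// authority like "web-svc.emojivoto.svc.cluster.local:80" it returns
// "web-svc.emojivoto", the name of the ServiceProfile resource to watch.
// Error handling and cluster-domain configuration are simplified.
func profileNameForAuthority(authority string) (string, error) {
	host := authority
	if i := strings.LastIndex(host, ":"); i >= 0 {
		host = host[:i]
	}
	parts := strings.Split(host, ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("authority %q is not a fully-qualified service name", authority)
	}
	return parts[0] + "." + parts[1], nil
}
```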
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables.
* Go paths have been updated to reflect the new (future) repo location.
* Update dest service with a different tls identity strategy
* Send controller namespace as separate field
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The proxy does not yet support a `meshed` label.
In anticipation of a `meshed` label in the proxy, introduce this label
in `simulate-proxy`, for testing.
Relates to #306 and #386.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
secured -> meshed
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Modify the Stat endpoint to also return the count of failed pods
* Add comments explaining pod count stats
* Rename total pod count to running pod count
This is to support the service mesh overview page, as I'd like to include an indicator of
failed pods there.
The Kubernetes client-go Informer/Lister APIs are implemented in several
parts of the code base.
This change introduces a Lister module, providing Informer/Lister
capability through a simple interface. Once this merges, we can follow
up with moving public-api and tap onto Lister.
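A sketch of the kind of narrow interface such a Lister module might expose (names are illustrative, not the module's actual API):
```go
package listers

import (
	coreinformers "k8s.io/client-go/informers/core/v1"
	corelisters "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/tools/cache"
)

// Lister sketches a simple interface for cached, namespaced reads backed by
// a shared informer, so callers don't wire up client-go themselves.
type Lister interface {
	// Pods returns a namespaced pod lister backed by the shared informer cache.
	Pods(namespace string) corelisters.PodNamespaceLister
	// Sync blocks until the underlying informer cache has synced.
	Sync(stopCh <-chan struct{}) bool
}

type lister struct {
	pods coreinformers.PodInformer
}

// NewLister wraps an existing shared PodInformer.
func NewLister(pods coreinformers.PodInformer) Lister {
	return &lister{pods: pods}
}

func (l *lister) Pods(namespace string) corelisters.PodNamespaceLister {
	return l.pods.Lister().Pods(namespace)
}

func (l *lister) Sync(stopCh <-chan struct{}) bool {
	return cache.WaitForCacheSync(stopCh, l.pods.Informer().HasSynced)
}
```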
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This PR removes the unused `request_duration_ms` and `response_duration_ms` histogram metrics from the proxy. It also removes them from the `simulate-proxy` script's output, and from `docs/proxy-metrics.md`.
Closes #821
This PR makes two changes to the `simulate-proxy` script:
1. Removed the `protocol={"http", "tcp"}` label from TCP metrics. The proxy no longer adds this label (see https://github.com/runconduit/conduit/pull/785#discussion_r182563499).
2. Fixed failed responses being labeled with `classification="fail"` rather than `classification="failure"` (the label the proxy sets). I noticed that while I was here and decided to fix it as well.
Note that the first change required some minor changes to the `proxyMetricCollectors` struct in `simulate-proxy`; since the label cardinality for TCP open stats decreased by one due to removing the `protocol` label, it's no longer necessary for that struct to have `CounterVec`/`GaugeVec` pointers for these stats. It now owns the actual `Counter`/`Gauge` instead. This means that the metric vecs that are created to be labeled for `inbound` and `outbound` are now stored as variables in the `newSimulatedProxy` function rather than going in a `proxyMetricCollectors` struct first. This shouldn't impact behaviour at all.
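The `Counter`-vs-`CounterVec` distinction in `client_golang`, shown as a generic before/after sketch (variable and metric names are not the script's actual ones, and only one form would be registered in practice):
```go
package main

import "github.com/prometheus/client_golang/prometheus"

var (
	// Before: a vec keyed by the now-removed "protocol" label.
	tcpOpenTotalVec = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "tcp_open_total", Help: "TCP connections opened."},
		[]string{"protocol"},
	)

	// After: the label is gone, so the collector owns the Counter directly.
	tcpOpenTotal = prometheus.NewCounter(
		prometheus.CounterOpts{Name: "tcp_open_total", Help: "TCP connections opened."},
	)
)

func main() {
	// Shown side by side for comparison only.
	tcpOpenTotalVec.With(prometheus.Labels{"protocol": "tcp"}).Inc()
	tcpOpenTotal.Inc()
}
```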
This PR adds the transport-level metrics described in #742 to the `simulate-proxy` script. This will be useful while adding these metrics to the Grafana dashboard and/or CLI.
Closes #793
The Destination service used slightly different labels than the
telemetry pipeline expected, specifically, prefixed with `k8s_*`.
Make all Prometheus labels consistent by dropping `k8s_*`. Also rename
`pod_name` to `pod` for consistency with `deployment`, etc. Also update
and reorganize `proxy-metrics.md` to reflect new labelling.
Fixes #655
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Extracted logic from destination server
* Make tests follow style used elsewhere in the code
* Extract single interface for resolvers
* Add tests for k8s and ipv4 resolvers
* Fix small usability issues
* Update dep
* Act on feedback
* Add pod-based metric_labels to destinations response
* Add documentation on running control plane to BUILD.md
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Fix mock controller in proxy tests (#656)
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
* Address review feedback
* Rename files in the destination package
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
simulate-proxy uses a deployment object from kubernetes to simulate
each proxy metrics endpoint.
Modify simulate-proxy to instead use a pod to simulate each proxy
metrics endpoint. This ensures that each metrics endpoint consistently
represents a pod in Kubernetes, including its namespace, deployment,
and label information.
This change also adds support for:
- a new `metric-ports` flag, defaulting to `10000-10009`.
- `classification`, `pod_name`, and `pod_template_hash` labels
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
simulate-proxy increments a single set of metrics on each iteration, and
also randomizes HTTP status codes, leaving counters unchanged across
several collections.
Modify simulate-proxy to increment all metrics on each iteration,
provide a 90% success rate, ensure a pod does not call itself, and
increase the proxy count from 3 to 10.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Prometheus scrape config collects from Conduit proxies, and maps
Kubernetes labels to Prometheus labels, prefixing them with "k8s_".
This change keeps the resultant Prometheus labels consistent with their
source Kubernetes labels. For example: "deployment" and
"pod_template_hash".
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Have the controller tell the client whether the service exists, not
just which endpoints are available. This way we can implement fallback logic to
alternate service discovery mechanisms for ambiguous names.
Signed-off-by: Brian Smith <brian@briansmith.org>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The Prometheus config in the docker-compose environment had fallen
behind the prod setup.
This change updates the docker-compose environment in the following
ways:
- Prometheus config more closely matches prod, based on #583
- simulate-proxy labels match prod, based on #605
- add Grafana container
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The simulate-proxy script pushes metrics to the telemetry service. This PR modifies the script to expose metrics on a Prometheus endpoint instead. This functionality creates a server that randomly generates response_total, request_totals, response_duration_ms and response_latency_ms. The server reads pod information from a k8s cluster and picks a random namespace to use for all exposed metrics.
Tested out these changes with a locally running prometheus server. I also ran the docker-compose.yml to make sure metrics were being recorded by the prometheus docker container.
Fixes #498
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams.
Closes #266
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Follow-up from #315.
Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.
I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.
Closes #265
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The proxy currently stores latency values in an `OrderMap` and reports every latency value observed since the last report to the controller's telemetry API. The telemetry API then sends each individual value to Prometheus. This doesn't scale well when there are a large number of proxies making reports.
I've modified the proxy to use a fixed-size histogram that matches the histogram buckets in Prometheus. Each report now includes an array indicating the histogram bounds, and each response scope contains a set of counts corresponding to each index in the bounds array, indicating the number of times a latency in that bucket was observed. The controller then reports the upper bound of each bucket to Prometheus, and can use the proxy's reported set of bucket bounds so that the observed values will be correct even if the bounds in the control plane are changed independently of those set in the proxy.
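The fixed-bucket scheme can be sketched like this (bucket bounds and names are illustrative, not the proxy's actual values):
```go
package main

import "fmt"

// latencyHistogram reports the bucket bounds once per report plus a count per
// bucket, instead of every raw latency value.
type latencyHistogram struct {
	boundsMs []uint32 // upper bound of each bucket, in milliseconds
	counts   []uint64 // counts[i] pairs with boundsMs[i]
}

func newLatencyHistogram() *latencyHistogram {
	bounds := []uint32{1, 5, 10, 50, 100, 500, 1000, 5000, 10000}
	return &latencyHistogram{boundsMs: bounds, counts: make([]uint64, len(bounds))}
}

// observe increments the first bucket whose upper bound contains the value;
// values above the last bound land in the final bucket.
func (h *latencyHistogram) observe(latencyMs uint32) {
	for i, bound := range h.boundsMs {
		if latencyMs <= bound || i == len(h.boundsMs)-1 {
			h.counts[i]++
			return
		}
	}
}

func main() {
	h := newLatencyHistogram()
	h.observe(3)
	h.observe(42)
	h.observe(120000) // overflows into the last bucket
	fmt.Println(h.boundsMs, h.counts)
}
```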
I've also modified `simulate-proxy` to generate the new report structure, and added tests in the proxy's telemetry test suite validating the new behaviour.
Currently, all "success"/"failure" classifications in the telemetry API are made based on the `grpc-status` trailer. If the trailer is not present, then a request is assumed to have failed. As we start proxying non-gRPC traffic, the controller needs to also be aware of HTTP status codes, so that non-gRPC requests are not assumed to always fail.
I've modified the telemetry API server to classify requests based on their HTTP status codes when the `grpc-status` trailer is not present.
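The classification rule can be sketched as follows (a simplified illustration, not the server's actual code): the `grpc-status` trailer wins when present, otherwise the HTTP status code decides.
```go
package main

import (
	"fmt"
	"strconv"
)

// classify treats a present grpc-status trailer as authoritative (0 is
// success, anything else is failure) and otherwise falls back to the HTTP
// status code, treating 5xx as failure.
func classify(grpcStatus *string, httpStatus int) string {
	if grpcStatus != nil {
		if code, err := strconv.Atoi(*grpcStatus); err == nil && code == 0 {
			return "success"
		}
		return "failure"
	}
	if httpStatus >= 500 {
		return "failure"
	}
	return "success"
}

func main() {
	ok := "0"
	internal := "13"
	fmt.Println(classify(&ok, 200))       // success
	fmt.Println(classify(&internal, 200)) // failure (grpc-status takes precedence)
	fmt.Println(classify(nil, 503))       // failure (HTTP fallback)
	fmt.Println(classify(nil, 200))       // success
}
```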
I've also modified the `simulate-proxy` script to generate fake HTTP/2 traffic without the `grpc-status` trailer.
Closes #196
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
simulate-proxy sends HTTP example data.
Modify this test script to also send TCP example data.
Part of #132
Signed-off-by: Andrew Seigner <andrew@sig.gy>
* Sort imports
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Upgrade k8s.io/client-go to v6.0.0
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Make k8s store initialization blocking with timeout
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Problem:
simulate-proxy would seemingly hang when used.
In simulate-proxy we were using `rand.Uint32()` to generate `Count`. This value is far too large (in `telemetry/server.go` we call `latencyStat.observe()` `Count` times, so this loop was taking forever).
Solution:
Use a count of 1 (as the surrounding loop will generate `count` requests).
Validation:
Script now works without hanging.