The Rustls/Ring/Webpki crates have issues communicating with a variety
of Kubernetes clusters. This change modifies the policy-controller's
default TLS implementation to use `libssl` (as provided by the
`distroless/base` container image).
The policy controller synthesizes identity strings based on service account
names, but it assumed that `linkerd` was the name of the control plane
namespace. This change updates the policy controller to take a
`--control-plane-namespace` command-line argument to set this value in
identity strings. The helm templates have been updated to configure the policy
controller appropriately.
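For illustration, a minimal sketch of how the flag might be wired up
(assuming a structopt-style parser; the field and helper names here are
illustrative, not the controller's actual ones):

```rust
use structopt::StructOpt;

#[derive(StructOpt)]
struct Args {
    /// Namespace in which the control plane is installed.
    #[structopt(long, default_value = "linkerd")]
    control_plane_namespace: String,
}

/// Builds an identity string for a workload's service account, e.g.
/// `default.emojivoto.serviceaccount.identity.linkerd.cluster.local`.
fn identity(sa: &str, ns: &str, control_plane_ns: &str, trust_domain: &str) -> String {
    format!(
        "{}.{}.serviceaccount.identity.{}.{}",
        sa, ns, control_plane_ns, trust_domain
    )
}
```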
Fixes #7204
Co-authored-by: Oliver Gould <ver@buoyant.io>
Kubernetes v1.19 is reaching its end-of-life date on 2021-10-28. In
anticipation of this, we should explicitly update our minimum supported
version to v1.20. This allows us to keep our dependencies up-to-date and
ensures that we can actually test against our minimum supported version.
Fixes #7171
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
While testing the proxy with various allocators, we've seen that
jemalloc generally uses less memory without incurring CPU or latency
costs.
This change updates the policy-controller to use jemalloc on x86_64
gnu/linux. We continue to use the system allocator on other platforms
(especially arm), since the jemalloc tests do not pass on these
platforms (according to the jemallocator readme).
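A minimal sketch of the conditional allocator selection, assuming the
`jemallocator` crate; all other targets fall through to the system
allocator:

```rust
// Only x86_64 gnu/linux opts into jemalloc; other platforms keep the
// default system allocator.
#[cfg(all(target_arch = "x86_64", target_os = "linux", target_env = "gnu"))]
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
```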
Fixes #6827
We upgrade the Server and ServerAuthorization CRD versions from v1alpha1 to v1beta1. This version update does not change the schema at all and the v1alpha1 versions will continue to be served for now. We also update the CLI and control plane to use the v1beta1 versions.
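For reference, a sketch of how the version bump surfaces on a kube-derive
type (the spec fields here are abbreviated and illustrative):

```rust
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;

// The schema itself is unchanged; only the served version moves to v1beta1.
#[derive(CustomResource, Clone, Debug, Deserialize, Serialize, JsonSchema)]
#[kube(
    group = "policy.linkerd.io",
    version = "v1beta1",
    kind = "Server",
    namespaced
)]
pub struct ServerSpec {
    pub pod_selector: BTreeMap<String, String>,
    pub port: String,
}
```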
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add label & protocol test for server
* Add authz tests for server
Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
The policy controller currently logs a warning message every
time a Server resource is applied. There is a mismatch between
the format of the blob that we're trying to deserialise and the
type we are deserialising into. To fix, I've changed the
`parse_server` function to deserialise only the spec;
the function signature has also changed to return the name
of the Server as a string.
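A rough sketch of the adjusted shape, using a generic spec type for
brevity (the names are illustrative rather than the function's actual
signature):

```rust
use anyhow::Context;
use kube::api::DynamicObject;

// Deserialise only the `spec` field and return the Server's name alongside
// it, rather than deserialising the whole object into a typed resource.
fn parse_server<S>(obj: DynamicObject) -> anyhow::Result<(String, S)>
where
    S: serde::de::DeserializeOwned,
{
    let name = obj.metadata.name.clone().context("Server is missing a name")?;
    let spec = obj
        .data
        .get("spec")
        .cloned()
        .context("Server is missing a spec")?;
    Ok((name, serde_json::from_value(spec)?))
}
```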
Closes #6860
The policy controller only emitted logs in the default plain format.
This change adds new CLI flags to the policy-controller, `--log-format`
and `--log-level`, which configure logging (replacing the `RUST_LOG`
environment variable). The helm chart is updated to configure these
flags--the `controllerLogLevel` variable is used to configure the policy
controller as well.
Example:
```
{"timestamp":"2021-09-15T03:30:49.552704Z","level":"INFO","fields":{"message":"HTTP admin server listening","addr":"0.0.0.0:8080"},"target":"linkerd_policy_controller::admin","spans":[{"addr":"0.0.0.0:8080","name":"serve"}]}
{"timestamp":"2021-09-15T03:30:49.552689Z","level":"INFO","fields":{"message":"gRPC server listening","addr":"0.0.0.0:8090"},"target":"linkerd_policy_controller","spans":[{"addr":"0.0.0.0:8090","cluster_networks":"[10.0.0.0/8, 100.64.0.0/10, 172.16.0.0/12, 192.168.0.0/16]","name":"grpc"}]}
{"timestamp":"2021-09-15T03:30:49.567734Z","level":"DEBUG","fields":{"message":"Ready"},"target":"linkerd_policy_controller_k8s_index"}
^C{"timestamp":"2021-09-15T03:30:51.245387Z","level":"DEBUG","fields":{"message":"Received ctrl-c"},"target":"linkerd_policy_controller"}
{"timestamp":"2021-09-15T03:30:51.245473Z","level":"INFO","fields":{"message":"Shutting down"},"target":"linkerd_policy_controller"}
```
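For context, a minimal sketch of how these flags might map onto
`tracing-subscriber` (assuming its `env-filter` and `json` features; the
function name is illustrative):

```rust
fn init_logging(level: &str, format: &str) {
    // `--log-level` replaces RUST_LOG as the source of filter directives.
    let filter = tracing_subscriber::EnvFilter::new(level);
    match format {
        // `--log-format json` emits structured records like the example above.
        "json" => tracing_subscriber::fmt()
            .json()
            .with_env_filter(filter)
            .init(),
        // The default remains the plain, human-readable format.
        _ => tracing_subscriber::fmt().with_env_filter(filter).init(),
    }
}
```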
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
Currently, the policy controller's indexing does not detect when a
server update changes its protocol (due to an incorrect comparison).
This change fixes this comparison so that protocol hint changes are
properly honored.
Various development tools (including Rust Analyzer and some reusable
actions) expect the root of the project to define a Cargo workspace.
In order to work more naturally with these tools, this change moves the
`Cargo.lock`, `rust-toolchain`, and `deny.toml` files to the root of the
project. A `Cargo.toml` is factored out of `policy-controller` to define
the top-level workspace.
We initially implemented a mechanism to automatically authorize
unauthenticated traffic from each pod's Kubelet's IP. Our initial method
of determining a pod's Kubelet IP--using the first IP from its node's
pod CIDRs--is not a generally usable solution. In particular, CNIs
complicate matters (and EKS doesn't even set the podCIDRs field).
This change removes the policy controller's node watch and removes the
`default:kubelet` authorization. When using a restrictive default
policy, users will have to define `ServerAuthorization` resources that
permit kubelet traffic. It's probably possible to programmatically
generate these authorizations (i.e. by inspecting pod probe
configurations); but this is out of scope for the core control plane
functionality.
We've observed noticeable (~10%) RSS & CPU improvements by enabling LTO
in the proxy release builds. This change enables this setting for the
policy controller as well.
We add a validating admission controller to the policy controller which validates `Server` resources. When a `Server` admission request is received, we look at all existing `Server` resources in the cluster and ensure that no other `Server` has an identical selector and port.
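A simplified sketch of the uniqueness check (the types here are
illustrative stand-ins for the fields that must not collide):

```rust
use std::collections::BTreeMap;

#[derive(PartialEq)]
struct ServerKey {
    pod_selector: BTreeMap<String, String>,
    port: String,
}

// Reject the admission request if any existing Server selects the same
// pods on the same port.
fn validate_unique(new: &ServerKey, existing: &[ServerKey]) -> Result<(), String> {
    if existing.iter().any(|other| other == new) {
        return Err("a Server with an identical selector and port already exists".to_string());
    }
    Ok(())
}
```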
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
Fixes #6743
As in #6392 for the proxy image (fixed by #6451), using the
`distroless/cc:nonroot` base image breaks the policy container in some
environments. So we're changing that to `distroless/cc`. The policy
container is already being run using a non-root user, so we're not
compromising on security.
Policy controller API responses include a set of labels. These labels
are to be used in proxy metrics to indicate why traffic is permitted to
a pod. This permits metrics to be associated with `Server` and
`ServerAuthorization` resources (i.e. for `stat`).
This change updates the response API to include a `name` label
referencing the server's name. When the policy is derived from a default
configuration (and not a `Server` instance), the name takes the form
`default:<policy>`.
This change also updates authorization labels. Defaults are encoded as
servers are, otherwise the authorization's name is set as a label. The
`tls` and `authn` labels have been removed, as they're redundant with
other labels that are already present.
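As a rough illustration of the labelling scheme (the helper is
hypothetical):

```rust
use std::collections::HashMap;

// Policies backed by a `Server` are labelled with the resource name;
// defaults are encoded as `default:<policy>`.
fn name_label(server: Option<&str>, default_policy: &str) -> HashMap<String, String> {
    let name = match server {
        Some(server) => server.to_string(),
        None => format!("default:{}", default_policy),
    };
    HashMap::from([("name".to_string(), name)])
}
```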
Pods may carry annotations like
`config.linkerd.io/opaque-ports` and
`config.linkerd.io/proxy-require-identity-inbound-ports`--these
annotations configure default behavior that should be honored when a
`Server` does not match the workload's ports. As it stands now, the
policy controller would break opaque-ports configurations that aren't
reflected in a `Server`.
This change reworks the pod indexer to create a default server watch for
each _port_ (rather than for each pod). The cache of default server
watches is now lazy, creating watches as needed for all used
combinations of default policies. These watches are never dropped, but
there are only a few possible combinations of port configurations, so
this doesn't pose any concerns re: memory usage.
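A rough sketch of the lazy cache, with illustrative types (the real index
keys on the full set of default-policy and per-port settings):

```rust
use std::collections::HashMap;
use tokio::sync::watch;

// Illustrative key: one entry per combination of default policy and
// per-port annotations (opaque-ports, require-identity).
#[derive(Clone, PartialEq, Eq, Hash)]
struct DefaultKey {
    policy: String,
    opaque: bool,
    require_identity: bool,
}

#[derive(Default)]
struct DefaultWatches {
    rxs: HashMap<DefaultKey, watch::Receiver<String>>,
    // Senders are retained so the watches are never dropped.
    txs: Vec<watch::Sender<String>>,
}

impl DefaultWatches {
    // Create a watch the first time a combination is needed; reuse it after.
    fn get_or_create(&mut self, key: DefaultKey) -> watch::Receiver<String> {
        if let Some(rx) = self.rxs.get(&key) {
            return rx.clone();
        }
        let (tx, rx) = watch::channel(key.policy.clone());
        self.txs.push(tx);
        self.rxs.insert(key, rx.clone());
        rx
    }
}
```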
While doing this, the names used to describe these default policies are
updated to be prefixed with `default:`. This generally makes these names
more descriptive and easier to understand.
The policy controller's readiness and liveness admin endpoint is tied to
watch state: the controller only advertises liveness when all watches
have received updates; and after a watch disconnects, liveness fails
until a new update is received.
However, in some environments--especially when the API server ends the
stream before the client gracefully reconnects--the watch terminates so
that liveness is not advertised even though the client resumes watching
resources. Because the watch is resumed with a `resourceVersion`, no
updates are provided despite the watch being reestablished, and liveness
checks fail until the pod is terminated (or an update is received).
To fix this, we modify readiness advertisements to fail only until the
initial state is acquired from all watches. After this, the controller
serves cached state indefinitely.
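A minimal sketch of the relaxed readiness rule (types and names are
illustrative):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

#[derive(Clone)]
struct Readiness {
    initialized: Arc<[AtomicBool]>,
}

impl Readiness {
    fn new(watches: usize) -> Self {
        let initialized = (0..watches).map(|_| AtomicBool::new(false)).collect();
        Self { initialized }
    }

    /// Set once a watch has received its initial state; never cleared, even
    /// if the watch later disconnects, because cached state keeps being served.
    fn mark_initialized(&self, watch: usize) {
        self.initialized[watch].store(true, Ordering::Release);
    }

    /// The admin endpoint reports ready only once every watch has initialized.
    fn is_ready(&self) -> bool {
        self.initialized.iter().all(|b| b.load(Ordering::Acquire))
    }
}
```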
While diagnosing this, logging changes were needed, especially for the
`Watch` type. Watches now properly maintain logging contexts and state
transitions are logged in more cases. The signature and logging context
of `Index::run` have been updated as well. Additionally, node lookup
debug logs have been elaborated to help confirm that 'pending' messages
are benign.
We can't use the typical multiarch docker build with the proxy:
qemu-hosted arm64/arm builds take 45+ minutes before failing due to
missing tooling--specifically `protoc`. (While there is a `protoc`
binary available for arm64, there are no binaries available for 32-bit
arm hosts).
To fix this, this change updates the release process to cross-build the
policy-controller on an amd64 host to the target architecture. We
separate the policy-controller's dockerfiles as `amd64.dockerfile`,
`arm64.dockerfile`, and `arm.dockerfile`. Then, in CI we build and push
each of these images individually (in parallel, via a build matrix).
Once all of these are complete, we use the `docker manifest` CLI tools
to unify these images into a single multi-arch manifest.
This cross-building approach requires that we move from using
`native-tls` to `rustls`, as we cannot build against the platform-
appropriate native TLS libraries. The policy-controller is now feature-
flagged to use `rustls` by default, though it may be necessary to use
`native-tls` in local development, as `rustls` cannot validate TLS
connections that target IP addresses.
The policy-controller has also been updated to pull in `tracing-log` for
compatibility with crates that do not use `tracing` natively. This was
helpful while debugging connectivity issues with the Kubernetes cluster.
The `bin/docker-build-policy-controller` helper script now *only* builds
the amd64 variant of the policy controller. It fails when asked to build
multiarch images.
kube v0.59 depends on k8s-openapi v0.13, which includes breaking
changes.
This change updates these dependencies and modifies our code to account
for these changes.
Furthermore, we now use the k8s-openapi feature `v1_16` so that we use
an API version that is compatible with Linkerd's minimum supported
Kubernetes version.
Closes #6657 #6658 #6659
crazy-max/ghaction-docker-buildx#172 describes a problem with
cross-building docker images--especially 32b ARM images--and docker
caching.
This change removes caching from the policy-controller dockerfile to
avoid this issue.
We've implemented a new controller--in Rust!--that implements discovery
APIs for inbound server policies. This change imports this code from
linkerd/polixy@25af9b5e.
This policy controller watches nodes, pods, and the recently-introduced
`policy.linkerd.io` CRD resources. It indexes these resources and serves
a gRPC API that will be used by proxies to configure the inbound proxy
for policy enforcement.
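As a rough sketch of that watch-then-index pattern (shown for pods, using
kube-runtime's watcher API; the indexing calls are elided):

```rust
use futures::StreamExt;
use k8s_openapi::api::core::v1::Pod;
use kube::{api::ListParams, Api, Client};
use kube_runtime::watcher;

// The controller runs a loop like this for each watched resource (nodes,
// pods, and the policy.linkerd.io CRDs), feeding events into its index.
async fn index_pods(client: Client) -> anyhow::Result<()> {
    let pods: Api<Pod> = Api::all(client);
    let mut events = watcher(pods, ListParams::default()).boxed();
    while let Some(event) = events.next().await {
        match event? {
            watcher::Event::Applied(_pod) => { /* add or update the pod in the index */ }
            watcher::Event::Deleted(_pod) => { /* remove the pod from the index */ }
            watcher::Event::Restarted(_pods) => { /* replace the indexed pods wholesale */ }
        }
    }
    Ok(())
}
```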
This change introduces a new policy-controller container image and adds a
container to the `linkerd-destination` pod along with a `linkerd-policy` service
to be used by proxies.
This change adds a `policyController` object to the Helm `values.yaml` that
supports configuring the policy controller at runtime.
Proxies are not currently configured to use the policy controller at runtime. This
will change in an upcoming proxy release.