The heartbeat cronjob specified `restartPolicy: OnFailure`. In cases
where failure was non-transient, such as if a cluster did not have
internet access, this would continuously restart and fail.
Change the heartbeat cronjob to `restartPolicy: Never`, as a failed job
has no user-facing impact.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Adds more PSP restrictions
* Update test fixtures
* Updates PSP to be conditional on initContainer
- The proxy-init container runs as root and needs the PSP to allow this
user when there is an init container.
Signed-off-by: Cody Vandermyn <cody.vandermyn@nordstrom.com>
The `_resources.yaml` partial hard-coded indentation, making it
cumbersome to use it in contexts that did not have a specific
indentation level.
Remove indentation from `_resources.yaml`, and instead specify required
indentation at the call site via `nindent`.
Fixes#3119
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
`linkerd check`, the web dashboard, and Grafana all perform version
checks to validate Linkerd is up to date. It's common for users to
seldom execute these codepaths. This makes it difficult to identify what
versions of Linkerd are currently in use and what environments it is
being run in, which helps prioritize testing and backports.
Introduce a `heartbeat` CronJob to the default Linkerd install. The
cronjob executes every 24 hours, starting from 5 minutes after
`linkerd install` is run.
Example check URL:
https://versioncheck.linkerd.io/version.json?
install-time=1562761177&
k8s-version=v1.15.0&
meshed-pods=8&
rps=3&
source=heartbeat&
uuid=cc4bb700-3314-426a-9f0f-ec588b9df020&
version=git-b97ee9f7
Fixes#2961
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The openAPIV3Schema validation in the ServiceProfiles CRD is very limited in what it can validate and is obviated by more sophisticated validation done by the validating admission controller. Therefore, we would like to remove the openAPIV3Schema validation to reduce the size and complexity of the CRD object.
To do so, we must also bump the version of the ServiceProfile custom resource from v1alpha1 to v1alpha2. This ensures that when the controller is upgraded, it will attempt to watch the v1alpha2 resource. If it cannot (because, for example, the controller pod started before the ServiceProfile CRD was updated and therefore the v1alpha2 version does not exist) then it will go into a crash loop backoff until it can. This essentially means that the controller will wait for the CRD to be upgraded to include v1alpha2 before it will start.
Bumping the version is necessary because if we did not, it would be possible for the controller to start before the CRD is updated (removing the validation). In this case, when the CRD is edited, the controller will lose its list watch on ServiceProfiles and will stop getting updates.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Added Anti Affinity when HA is configured
* Move check to validate()
* Test output with anti-affinity when ha upgrade
* Add anti-affinity to identity deployment
* made host anti-affinity default when ha
* Define affinity template in a separate file
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This change implements the DstOverrides feature of the destination profile API (aka traffic splitting).
We add a TrafficSplitWatcher to the destination service which watches for TrafficSplit resources and notifies subscribers about TrafficSplits for services that they are subscribed to. A new TrafficSplitAdaptor then merges the TrafficSplit logic into the DstOverrides field of the destination profile.
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes#2927
Also moved `TestInstallSP` after `TestCheckPostInstall` so we're sure
the validating webhook is ready before installing a service profile.
Signed-off-by: Alejandro Pedraza Borrero <alejandro@buoyant.io>
When installing multiple control planes, the mutatingwebhookconfiguration of the first control plane gets overwritten by any subsequent control plane install. This is caused by the fixed name given to the mutatingwebhookconfiguration manifest at install time.
This commit adds in the namespace to the manifest so that there is a unique configuration for each control plane.
Fixes#2887
* Add control plane and CNI PSP and RBAC resources
* Add the '--linkerd-cni-enabled' flag to the multi-stage install subcommands
This flag ensures that the NET_ADMIN capability is omitted from the control
plane's PSP during 'install config' and the proxy-init containers aren't
injected during 'install control-plane'.
Signed-off-by: Ivan Sim <ivan@buoyant.io>
* If HA, set the webhooks failure policy to 'Fail'
I'm adding to the linkerd namespace a new label
`linkerd.io/is-control-plane: true` that is used in the webhook configs'
selector to skip the proxy injector for this namespace. This avoids
running into the timing issues described in #2852.
Fixes#2852
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Added labels to webhook configurations in charts/
* Multiple replicas of proxy-injector and sp-validator in HA
* Use ControllerComponent template variable for webhookconfigurations
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Update helm charts to include webhooks config and TLS secret
* Update the webhooks to read the secret cert and key
* Update webhooks to not recreate config on restart
* Ensure upgrade preserve existing secrets
* Revert the change to rename the webhook configs
The renaming change breaks upgrade, where the new webhook configs conflict with
the existing ones. The older resources aren't deleted during upgrade because
they are dynamically created.
* Make the secret volume read-only
* Remove unnecessary exported getter functions
* Remove obsolete mwc and vwc templates
Signed-off-by: Ivan Sim <ivan@buoyant.io>
All ServiceAccounts are intended to be grouped together with other RBAC
resources, particularly for `linkerd install config` output. Grafana and
Web ServiceAccounts were still included with their respective
Deployments.
Group Grafana and Web ServiceAccounts with other RBAC resources.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
`linkerd install` supports a 2-stage install process, `linkerd upgrade`
did not.
Add 2-stage support for `linkerd upgrade`. Also exercise multi-stage
functionality during upgrade integration tests.
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This reverts commit 3de16d47be.
#2740 modified the ServiceProfiles CRD which will cause issues for users upgrading from the old CRD version to the new version. #2748 was an attempt to fix this by bumping the service profile CRD version, however, our testing infrastructure is not well set up to accommodate changes to CRDs because they are resources which are global to the cluster.
We revert this change for now and will revisit it in the future when we can give more thought to CRD versioning, upgrade, and testing.
Signed-off-by: Alex Leong <alex@buoyant.io>
This is an initial change to separate out config-specific k8s objects
from the control-plane components. The eventual goal will be rendering
these configs as the first stage of a multi-stage install.
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Add validation webhook for service profiles
Fixes#2075
Todo in a follow-up PRs: remove the SP check from the CLI check.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
`storage.tsdb.retention` is deprecated in favor of
`storage.tsdb.retention.time`.
Replace all occurrences.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
When installing Linkerd, a user may override default settings, or may
explicitly configure defaults. Consider install options like `--ha
--controller-replicas=4` -- the `--ha` flag sets a new default value for
the controller-replicas, and then we override it.
When we later upgrade this cluster, how can we know how to configure the
cluster?
We could store EnableHA and ControllerReplicas configurations in the
config, but what if, in a later upgrade, the default value changes? How
can we know whether the user specified an override or just used the
default?
To solve this, we add an `Install` message into a new config.
This message includes (at least) the CLI flags used to invoke
install.
upgrade does not specify defaults for install/proxy-options fields and,
instead, uses the persisted install flags to populate default values,
before applying overrides from the upgrade invocation.
This change breaks the protobuf compatibility by altering the
`installation_uuid` field introduced in 9c442f6885.
Because this change was not yet released (even in an edge release), we
feel that it is safe to break.
Fixes https://github.com/linkerd/linkerd2/issues/2574
This change moves resource-templating logic into a dedicated template,
creates new values types to model kubernetes resource constraints, and
changes the `--ha` flag's behavior to create these resource templates
instead of hardcoding the resource constraints in the various templates.
Some of our templates have started to use 'with .Values' scoping to
limit boilerplate within the tempates.
This change makes this uniform in all templates.
Have the Webhook react to pod creation/update only
This was already working almost out-of-the-box, just had to:
- Change the webhook config so it watches pods instead of deployments
- Grant some extra ClusterRole permissions
- Add the piece that figures what's the OwnerReference and add the label
for it
- Manually inject service account mount paths
- Readd volumes tests
Fixes#2342 and #1751
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
This change reintroduces identity hinting to the destination service.
The Get endpoint includes identities for pods that are injected with an
identity-mode of "default" and have the same linkerd control plane.
A `serviceaccount` label is now also added to destination response
metadata so that it's accessible in prometheus and tap.
Because the linkerd-config resource is created after pods that require
it, they can be started before the files are mounted, causing the pods
to restart integration tests to fail.
If we extract the config into its own template file, it can be inserted
before pods are created.
The introduction of identity in 0626fa37 created new state in the
control plane's configuration that must be considered when re-installing
the control plane or when injecting pods.
This change alters `install` to fail if it would seem to conflict with
an existing installation. This behavior may be disabled with the
`--ignore-cluster` flag.
Furthermore, `inject` now _requires_ that it can fetch a configuration
from the control plane in order to operate. Otherwise the
`--ignore-cluster` and `--disable-identity` flags must be specified.
This change does not actually instrument pods to use identity yet---it
lays the framework for proxy identity without changing the test fixture
output (besides a change to how identity HA is configured).
Fixes#2531
https://github.com/linkerd/linkerd2/pull/2521 introduces an "Identity"
controller, but there is no way to include it in linkerd installation.
This change alters the `install` flow as follows:
- An Identity service is _always_ installed;
- Issuer credentials may be specified via the CLI;
- If no Issuer credentials are provided, they are generated each time `install` is called.
- Proxies are NOT configured to use the identity service.
- It's possible to override the credential generation logic---especially
for tests---via install options that can be configured via the CLI.
The new proxy has changed its configuration as follows:
- `LISTENER` urls are now `LISTEN_ADDR` addresses;
- `CONTROL_URL` is now `DESTINATION_SVC_ADDR`;
- `*_NAMESPACE` vars are no longer needed;
- The `PROXY_ID` is now the `DESTINATION_CONTEXT`;
- The "metrics" port is now the "admin" port, since it serves more than
just metrics;
- A readiness probe now checks a dedicated /ready endpoint eagerly.
Identity injection is **NOT** configured by this branch.
The proxy's TLS implementation has changed to use a new _Identity_ controller.
In preparation for this, the `--tls=optional` CLI flag has been removed
from install and inject; and the `ca` controller has been deleted. Metrics
and UI treatments for TLS have **not** been removed, as they will continue to
be valuable for the new Identity system.
With the removal of the old identity scheme, the Destination service's proxy
ID field is now set with an opaque string (e.g. `ns:emojivoto`) to enable
locality awareness.
linkerd/linkerd2#1721 introduced a `--single-namespace` install flag,
enabling the control-plane to function within a single namespace. With
the introduction of ServiceProfiles, and upcoming identity changes, this
single namespace mode of operation is becoming less viable.
This change removes the `--single-namespace` install flag, and all
underlying support. The control-plane must have cluster-wide access to
operate.
A few related changes:
- Remove `--single-namespace` from `linkerd check`, this motivates
combining some check categories, as we can always assume cluster-wide
requirements.
- Simplify the `k8s.ResourceAuthz` API, as callers no longer need to
make a decision based on cluster-wide vs. namespace-wide access.
Components either have access, or they error out.
- Modify the web dashboard to always assume ServiceProfiles are enabled.
Reverts #1721
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>