* Re-add the destination container to the controller spec
This fix is necessary to avoid data plane downtime during an upgrade to
stable-2.6. All existing older proxies will continue to send requests to
this destination container, until the data plane is restarted.
On restart, the new pods will start forwarding their requests to the new
linkerd-dst service.
* Use the 2.6 destination service fqdn
* Fixed unit tests
* Fix integration test failure
Signed-off-by: Ivan Sim <ivan@buoyant.io>
### Motivation
In order to expose arbitrary headers through tap, headers and trailers should be
read from the linkerd2-proxy-api `TapEvent`s and set in the public `TapEvent`s.
This change should have no user facing changes as it just prepares the events
for JSON output in linkerd/linkerd2#3390
### Solution
The public API has been updated with a headers field for
`TapEvent_Http_RequestInit_` and `TapEvent_Http_ResponseInit_`, and trailers
field for `TapEvent_Http_ResponseEnd_`.
These values are set by reading the corresponding fields off of the proxy's tap
events.
The proto changes are equivalent to the proto changes proposed in
linkerd/linkerd2-proxy-api#33
Closes#3262
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
This reverts commit edd3b1f6d4.
This is a temporary revert of #3461 while we sort out some details of how this should configured and how it should interact with configuring a trace collector on the Linkerd proxy. We will reintroduce this change once the config plan is straightened out.
Signed-off-by: Alex Leong <alex@buoyant.io>
When running linkerd in HA mode, a cluster can be broken by bringing down the proxy-injector.
Add a label to MWC namespace selctor that skips any namespace.
Fixes#3346
Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
* Update prometheus cadvisor config to only keep container resources metrics
Signed-off-by: Ivan Sim <ivan@buoyant.io>
* Drop unused large metric
Signed-off-by: Ivan Sim <ivan@buoyant.io>
* Fix unit test
Signed-off-by: Ivan Sim <ivan@buoyant.io>
* Siggy's feedback
Signed-off-by: Ivan Sim <ivan@buoyant.io>
* Fix unit test
Signed-off-by: Ivan Sim <ivan@buoyant.io>
* Trim certs and keys in the Helm charts
Fixes#3419
When installing through the CLI the installation will fail if the certs
are malformed, so this only concerns the Helm templates.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Couple of injection events fixes
When generating events in quick succession against the same target, client-go issues a PATCH request instead of a POST, so we need the extra RBAC permission.
Also we have an informer on pods, so we also need the "watch" permission
for them, whose omission was causing an error entry in the logs.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
Fixes#3356
1.16 removes some api groups that were already deprecated. From k8s blog
post (https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/):
```
- PodSecurityPolicy: will no longer be served from extensions/v1beta1 in
v1.16.
Migrate to the policy/v1beta1 API, available since v1.10. Existing
persisted data can be retrieved/updated via the policy/v1beta1 API.
- DaemonSet, Deployment, StatefulSet, and ReplicaSet: will no longer be
served from extensions/v1beta1, apps/v1beta1, or apps/v1beta2 in v1.16.
Migrate to the apps/v1 API, available since v1.9. Existing persisted
data can be retrieved/updated via the apps/v1 API.
```
Previous PRs had already made this change at the Helm templates level,
but we still needed to do it at the API calls and tests.
The integration tests ran fine for k8s 1.12 and 1.15. They fail on 1.16
because the upgrade integration test tries to install linkerd 2.5 which is not
compatible with 1.16.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Set custom cluster domain in GetServiceProfileFor
* Set custom cluster domain in tap server
Move fetching cluster domain for tap server to cmd main
* Handle fetchting cluster domain errors separately
* Use custom cluster domain for traffic split adaptor
Signed-off-by: Armin Buerkle <armin.buerkle@alfatraining.de>
* Have the proxy-injector emit events upon injection/skipping injection
Fixes#3253
Have the proxy-injector emit an event whenever a injection happens, or
when injection is skipped for some reason (also added that reason into
the proxy-injector logs). The level is associated to the parent workload
(it can't be associated to the pod because at this point the pod hasn't
been persisted).
The event recorder was setup at the `webhook/server.go` level and passed
to the proxy-injector's `Inject` function. The sp-validator thus also
has access to the event recorder, but for now it's not using it.
Related changes:
- Refactored `api.GetOwnerKindAndName()` to have it return a more
generic object.
- Refactored `report.Injectable()` to also have it return the reason why
a workload is not injectable.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Rename template-values.go
* Define new constructor of charts.Values type
* Move all Helm values related code to the pkg/charts package
* Bump dependency
* Use '/' in filepath to remain compatible with VFS requirement
* Add unit test to verify Helm YAML output
* Alejandro's feedback
* Add unit test for Helm YAML validation (HA)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
### Summary
After the addition of the tap APIServer, all the logic related to tap in the public API no longer needs to be there. The servers and clients that are created but not used, as well as all the old testing infrastrucure related to tap can be removed.
This deprecates TapByResource and therefore required an update to the protobuf files with `bin/protoc-go.sh`. While the change to deprecate this method was extremely small, a lot of protobuf fils were updated in the process. These changes to the code and protobuf files should probably remain coupled since `TapByResource` is officially deprecated in the public API, but a majority of the additions/deletions are related to those files.
This draft passes `go test` as well as a local run of the integration tests.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
This PR adds `trafficsplit` as a supported resource for the `linkerd stat` command. Users can type `linkerd stat ts` to see the apex and leaf services of their trafficsplits, as well as metrics for those leaf services.
* Delete symlink to old Helm chart
* Update 'install' code to use common Helm template structs
* Remove obsolete TLS assets functions.
These are now handle by Helm functions inside the templates
* Read defaults from values.yaml and values-ha.yaml
* Ensure that webhooks TLS assets are retained during upgrade
* Fix a few bugs in the Helm templates (see bullet points):
* Merge the way the 'install' ha and non-ha options are handled into one function
* Honor the 'NoInitContainer' option in the components templates
* Control plane mTLS will not be disabled if identity context in the
config map is empty. The data plane mTLS will still be automatically disabled
if the context is nil.
* Resolve test failures from rebase with master
* Fix linter issues
* Set service account mount path read-only field
* Add TLS variables of the webhooks and tap to values.yaml
During upgrade, these secrets are preserved to ensure they remain synced
wih the CA bundle in the webhook configurations. These Helm variables are used
to override the defaults in the templates.
* Remove obsolete 'chart' folder
* Fix bugs in templates
* Handle missing webhooks and tap TLS assets during upgrade
When upgrading from an older version that don't have these secrets, fallback to let Helm
create them by creating an empty charts.TLS struct.
* Revert the selector labels of webhooks to be compatible with that in 2.4
In 2.4, the proxy injector and profile validator webhooks already have their selector labels defined.
Since these attributes are immutable, the recent change to these selectors introduced by the Helm chart work will cause upgrade to fail.
* Alejandro's feedback
* Siggy's feedback
* Removed redundant unexported custom types
Signed-off-by: Ivan Sim <ivan@buoyant.io>
Now that we inject at the pod level by default, `linkerd uninject` should remove the `linkerd.io/inject: enabled`
annotation. Also added a test for that.
Fix#3156
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
The `linkerd-linkerd-tap-admin` ClusterRole had `watch` privileges on
`*/tap` resources. This disallowed non-namespaced tap requests of the
form: `/apis/tap.linkerd.io/v1alpha1/watch/namespaces/linkerd/tap`,
because that URL structure is interpreted by the Kubernetes API as
watching a resource of type `tap` within the linkerd namespace, rather
than tapping the linkerd namespace.
Modify `linkerd-linkerd-tap-admin` to have `watch` privileges on `*`,
enabling any request of the form
`/apis/tap.linkerd.io/v1alpha1/watch/namespaces/linkerd/*` to succeed.
Fixes#3212
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The web dashboard will be migrating to the new Tap APIService, which
requires RBAC privileges to access.
Introduce a new ClusterRole, `linkerd-linkerd-tap-admin`, which gives
cluster-wide tap privileges. Also introduce a new ClusterRoleBinding,
`linkerd-linkerd-web-admin` which binds the `linkerd-web` service
account to the new tap ClusterRole. This ClusterRoleBinding is enabled
by default, but may be disabled via a new `linkerd install` flag
`--restrict-dashboard-privileges`.
Fixes#3177
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Refactor proxy injection to use Helm charts
Fixes#3128
A new chart `/charts/patch` was created, that generates the JSON patch
payload that is to be returned to the k8s API when doing the injection
through the proxy injector, and it's also leveraged by the `linkerd
inject --manual` CLI.
The VFS was used by `linkerd install` to access the old chart under
`/chart`. Now the proxy injection also uses the Helm charts to generate
the JSON patch (see above) so we've moved the VFS from `cli/static` to a
new common place under `/pkg/charts/static`, and the new root for the VFS is
now `/charts`.
`linkerd install` hasn't yet migrated to use the new charts (that'll
happen in #3127), so the only change in that regard was the creation of
`/charts/chart` which is a symlink pointing to `/chart` that
`install.go` now uses, so that the VFS contains both the old and new
charts, as a temporary measure.
You can see that `/bin/Dockerfile-bin`, `/controller/Dockerfile` and
`/bin/build-cli-bin` do now `go generate` pointing to the new location
(and the `go generate` annotation was moved from `/cli/main.go` to
`pkg/charts/static/templates.go`).
The symlink trick doesn't work when building the binaries through
Docker, so `/bin/Dockerfile-bin` replaces the symlink with an actual
copy of `/chart`.
Also note that in `/controller/Dockerfile` we now need to include the
`prod` tag in `go install` like we do in `/bin/Dockerfile-bin` so that
the proxy injector does use the VFS instead of the local file system.
- The common logic to parse a chart has been moved from `install.go` to
`/pkg/charts/util.go`.
- The special ENV var in the proxy for "outbound router capacity" that
only applies to the Prometheus pod is now handled directly in the proxy
partial and all the associated go code could be removed.
- The `patch.go` lib for generating the JSON patch in go along
with its tests `patch_test.go` are no longer needed.
- Lots of functions in `/pkg/inject/inject.go` got removed/simplified
with their logic being moved into the charts themselves. As a
consequence lots of things in `inject_test.go` became irrelevant.
- Moved `template-values.go` from `/pkg/inject` to `pkg/charts` as that
contains the go structs representation of the chart variables that
will be leveraged in #3127.
Don't forget to run `/bin/helm.sh` whenever you make changes to charts
;-)
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
The Tap Service enabled tapping of any meshed pod, regardless of user
privilege.
This change introduces a new Tap APIService. Kubernetes provides
authentication and authorization of Tap requests, and then forwards
requests to a new Tap APIServer, which implements a Kubernetes
aggregated APIServer. The Tap APIServer authenticates the client TLS
from Kubernetes, and authorizes the user via a SubjectAccessReview.
This change also modifies the `linkerd tap` command to make requests
against the new APIService.
The Tap APIService implements these Kubernetes-style endpoints:
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/tap
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/:res/:name/tap
GET /apis
GET /apis/tap.linkerd.io
GET /apis/tap.linkerd.io/v1alpha1
GET /healthz
GET /healthz/log
GET /healthz/ping
GET /metrics
GET /openapi/v2
GET /version
Users authorize to the new `tap.linkerd.io/v1alpha1` via RBAC. Only the
`watch` verb is supported. Access is also available via subresources
such as `deployments/tap` and `pods/tap`.
This change introduces the following resources into the default Linkerd
install:
- Global
- APIService/v1alpha1.tap.linkerd.io
- ClusterRoleBinding/linkerd-linkerd-tap-auth-delegator
- `linkerd` namespace:
- Secret/linkerd-tap-tls
- `kube-system` namespace:
- RoleBinding/linkerd-linkerd-tap-auth-reader
Tasks not covered by this PR:
- `linkerd top`
- `linkerd dashboard`
- `linkerd profile --tap`
- removal of the unauthenticated tap controller
Fixes#2725, #3162, #3172
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The heartbeat cronjob specified `restartPolicy: OnFailure`. In cases
where failure was non-transient, such as if a cluster did not have
internet access, this would continuously restart and fail.
Change the heartbeat cronjob to `restartPolicy: Never`, as a failed job
has no user-facing impact.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
### Summary
In order for Pods' tap servers to start authorizing tap clients, the tap server
must be able to check client names against the expected tap service name.
This change injects the `LINKERD2_PROXY_TAP_SVC_NAME` into proxy PodSpecs.
### Details
The tap servers on the individual resources being tapped should be able to
verify that the client is the tap service. The `LINKERD2_PROXY_TAP_SVC_NAME` is
now injected as an environment variable in the proxies so that it can check this
value against the client name of the TLS connection. Currently, this environment
will go unused. There is an open PR (linkerd2-proxy#290) to use this variable in
the proxy, but this is *not* dependent on that merging first.
Note: The variable is not injected if tap is disabled.
### Testing
Test output has been updated with the newly injected environment variable.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
* increased ha resource limits
* added resource limits to proxy when HA
* update golden files in cmd/main
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Adds more PSP restrictions
* Update test fixtures
* Updates PSP to be conditional on initContainer
- The proxy-init container runs as root and needs the PSP to allow this
user when there is an init container.
Signed-off-by: Cody Vandermyn <cody.vandermyn@nordstrom.com>
`linkerd check`, the web dashboard, and Grafana all perform version
checks to validate Linkerd is up to date. It's common for users to
seldom execute these codepaths. This makes it difficult to identify what
versions of Linkerd are currently in use and what environments it is
being run in, which helps prioritize testing and backports.
Introduce a `heartbeat` CronJob to the default Linkerd install. The
cronjob executes every 24 hours, starting from 5 minutes after
`linkerd install` is run.
Example check URL:
https://versioncheck.linkerd.io/version.json?
install-time=1562761177&
k8s-version=v1.15.0&
meshed-pods=8&
rps=3&
source=heartbeat&
uuid=cc4bb700-3314-426a-9f0f-ec588b9df020&
version=git-b97ee9f7
Fixes#2961
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The openAPIV3Schema validation in the ServiceProfiles CRD is very limited in what it can validate and is obviated by more sophisticated validation done by the validating admission controller. Therefore, we would like to remove the openAPIV3Schema validation to reduce the size and complexity of the CRD object.
To do so, we must also bump the version of the ServiceProfile custom resource from v1alpha1 to v1alpha2. This ensures that when the controller is upgraded, it will attempt to watch the v1alpha2 resource. If it cannot (because, for example, the controller pod started before the ServiceProfile CRD was updated and therefore the v1alpha2 version does not exist) then it will go into a crash loop backoff until it can. This essentially means that the controller will wait for the CRD to be upgraded to include v1alpha2 before it will start.
Bumping the version is necessary because if we did not, it would be possible for the controller to start before the CRD is updated (removing the validation). In this case, when the CRD is edited, the controller will lose its list watch on ServiceProfiles and will stop getting updates.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Allow custom cluster domain in destination watcher
The change relaxes the constrains of an authority requiring a
`svc.cluster.local` suffix to only require `svc` as third part.
A unit test could be added though the destination/server and endpoint
watcher already test this behaviour.
* Update proto to allow setting custom cluster domain
Update golden templates
* Allow setting custom domain in grpc, web server
* Remove cluster domain flags from web srv and public api
* Set defaultClusterDomain in validateAndBuild if none is set
Signed-off-by: Armin Buerkle <armin.buerkle@alfatraining.de>
* Added Anti Affinity when HA is configured
* Move check to validate()
* Test output with anti-affinity when ha upgrade
* Add anti-affinity to identity deployment
* made host anti-affinity default when ha
* Define affinity template in a separate file
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
When installing using some of the flags that persist in install, e.g
`linkerd install --ha`, and then doing `linkerd upgrade config` a nil
pointer error is thrown.
Fixes#3094
`newCmdUpgradeConfig()` was using passing `flags` as nil because
`linkerd upgrade config` doesn't expose any flags for the subcommand,
but turns out they're still needed down the call stack in
`setFlagsFromInstall` to reuse the flags persisted during install.
I also added a new unit test catching this.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
This PR improves the CLI output for `linkerd edges` to reflect the latest API
changes.
Source and destination namespaces for each edge are now shown by default. The
`MSG` column has been replaced with `Secured` and contains a green checkmark or
the reason for no identity. A new `-o wide` flag shows the identity of client
and server if known.
During operations with `linkerd stat` sometimes it's not clear the actual
pod status.
This commit introduces a method, to the `k8s`package, getting the pod status,
based on [`kubectl` logic](33a3e325f7/pkg/printers/internalversion/printers.go (L558-L640))
to expose the `STATUS` column for pods . Also, it changes the stat command
on the` cli` package adding a column when the resource type is a Pod.
Fixes#1967
Signed-off-by: Jonathan Juares Beber <jonathanbeber@gmail.com>
* Have `linkerd endpoints` use `Destination.Get`
Fixes#2885
We're refactoring `linkerd endpoints` so it hits
directly the `Destination.Get` endpoint, instead of relying on the
Discovery service.
For that, I've created a new `client.go` for Destination and added it to
the `APIClient` interface.
I've also added a `destinationClient` struct that mimics `tapClient`,
and whose common logic has been moved into `stream_client.go`.
Analogously, I added a `destinationServer` struct that mimics
`tapServer`.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
Linkerd's CLI flags all match 1:1 with their `config.linkerd.io/*`
annotation counterparts, except `--enable-debug-sidecar`, which
corresponded to `config.linkerd.io/debug`. Additionally, the Linkerd
docs assume this 1:1 mapping.
Rename the `config.linkerd.io/debug` annotation to
`config.linkerd.io/enable-debug-sidecar`.
Relates to https://github.com/linkerd/website/issues/381
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This change implements the DstOverrides feature of the destination profile API (aka traffic splitting).
We add a TrafficSplitWatcher to the destination service which watches for TrafficSplit resources and notifies subscribers about TrafficSplits for services that they are subscribed to. A new TrafficSplitAdaptor then merges the TrafficSplit logic into the DstOverrides field of the destination profile.
Signed-off-by: Alex Leong <alex@buoyant.io>