---
reviewers:
- soltysh
- sttts
- ericchiang
content_type: concept
title: Auditing
---

<!-- overview -->

Kubernetes _auditing_ provides a security-relevant, chronological set of records documenting
the sequence of actions in a cluster. The cluster audits the activities generated by users,
by applications that use the Kubernetes API, and by the control plane itself.

Auditing allows cluster administrators to answer the following questions:

- what happened?
- when did it happen?
- who initiated it?
- on what did it happen?
- where was it observed?
- from where was it initiated?
- to where was it going?

<!-- body -->

Audit records begin their lifecycle inside the
[kube-apiserver](/docs/reference/command-line-tools-reference/kube-apiserver/)
component. At each stage of its execution, a request generates an audit event,
which is then pre-processed according to a policy and written to a backend.
The policy determines what's recorded and the backends persist the records.
The current backend implementations include log files and webhooks.

Each request can be recorded with an associated _stage_. The defined stages are:

- `RequestReceived` - The stage for events generated as soon as the audit
  handler receives the request, and before it is delegated down the handler
  chain.
- `ResponseStarted` - Once the response headers are sent, but before the
  response body is sent. This stage is only generated for long-running requests
  (e.g. watch).
- `ResponseComplete` - The response body has been completed and no more bytes
  will be sent.
- `Panic` - Events generated when a panic occurred.

{{< note >}}
Audit events are different from the
[Event](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#event-v1-core)
API object.
{{< /note >}}

The audit logging feature increases the memory consumption of the API server
because some context required for auditing is stored for each request.
Memory consumption depends on the audit logging configuration.

## Audit policy

Audit policy defines rules about what events should be recorded and what data
they should include. The audit policy object structure is defined in the
[`audit.k8s.io` API group](https://github.com/kubernetes/kubernetes/blob/{{< param "githubbranch" >}}/staging/src/k8s.io/apiserver/pkg/apis/audit/v1/types.go).
When an event is processed, it's
compared against the list of rules in order. The first matching rule sets the
_audit level_ of the event. The defined audit levels are:

- `None` - don't log events that match this rule.
- `Metadata` - log request metadata (requesting user, timestamp, resource,
  verb, etc.) but not request or response body.
- `Request` - log event metadata and request body but not response body.
  This does not apply to non-resource requests.
- `RequestResponse` - log event metadata, request and response bodies.
  This does not apply to non-resource requests.

You can pass a file with the policy to `kube-apiserver`
using the `--audit-policy-file` flag. If the flag is omitted, no events are logged.
Note that the `rules` field __must__ be provided in the audit policy file.
A policy with no rules is treated as illegal.

Below is an example audit policy file:

{{< codenew file="audit/audit-policy.yaml" >}}

You can use a minimal audit policy file to log all requests at the `Metadata` level:

```yaml
# Log all requests at the Metadata level.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
```

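Because rules are compared in order and the first match sets the audit level, more
specific rules need to come before broader ones. The following is a minimal sketch of
that ordering, not a recommended policy: the users, verbs and resources chosen here are
illustrative assumptions, and `omitStages` is included on the assumption that you don't
need events for the `RequestReceived` stage.

```yaml
# Illustrative sketch only; the rule selection below is an assumption, not a recommendation.
apiVersion: audit.k8s.io/v1
kind: Policy
# Don't generate audit events for requests in the RequestReceived stage.
omitStages:
- "RequestReceived"
rules:
# Matched first: don't log watch requests by kube-proxy on endpoints or services.
- level: None
  users: ["system:kube-proxy"]
  verbs: ["watch"]
  resources:
  - group: "" # core API group
    resources: ["endpoints", "services"]
# Log Secret and ConfigMap access at the Metadata level, so bodies are never recorded.
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
# Log Pod requests together with the request body.
- level: Request
  resources:
  - group: ""
    resources: ["pods"]
# Catch-all rule: everything not matched above is logged at the Metadata level.
- level: Metadata
```

If the catch-all rule were listed first, it would match every request and the rules
after it would never take effect.
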
If you're crafting your own audit profile, you can use the audit profile for Google
Container-Optimized OS as a starting point. You can check the
[configure-helper.sh](https://github.com/kubernetes/kubernetes/blob/{{< param "githubbranch" >}}/cluster/gce/gci/configure-helper.sh)
script, which generates an audit policy file. You can see most of the audit policy file by looking directly at the script.

## Audit backends

Audit backends persist audit events to external storage.
Out of the box, the kube-apiserver provides two backends:

- Log backend, which writes events into the filesystem
- Webhook backend, which sends events to an external HTTP API

In all cases, audit events follow a structure defined by the Kubernetes API in the
`audit.k8s.io` API group. For Kubernetes {{< param "fullversion" >}}, that
API is at version
[`v1`](https://github.com/kubernetes/kubernetes/blob/{{< param "githubbranch" >}}/staging/src/k8s.io/apiserver/pkg/apis/audit/v1/types.go).

{{< note >}}
In the case of patches, the request body is a JSON array of patch operations, not a JSON object
containing a Kubernetes API object. For example, the following request body is a valid patch
request to `/apis/batch/v1/namespaces/some-namespace/jobs/some-job-name`:

```json
[
  {
    "op": "replace",
    "path": "/spec/parallelism",
    "value": 0
  },
  {
    "op": "remove",
    "path": "/spec/template/spec/containers/0/terminationMessagePolicy"
  }
]
```

{{< /note >}}

### Log backend

The log backend writes audit events to a file in [JSONlines](https://jsonlines.org/) format.
You can configure the log audit backend using the following `kube-apiserver` flags:

- `--audit-log-path` specifies the log file path that the log backend uses to write
  audit events. Not specifying this flag disables the log backend. `-` means standard out.
- `--audit-log-maxage` defines the maximum number of days to retain old audit log files
- `--audit-log-maxbackup` defines the maximum number of audit log files to retain
- `--audit-log-maxsize` defines the maximum size in megabytes of the audit log file before it gets rotated

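As a rough illustration of how these flags fit together, the fragment below sketches the
`command` section of a kube-apiserver static Pod manifest. The manifest layout and the
retention values (30 days, 10 files, 100 megabytes) are assumptions chosen for the
example, not recommended settings.

```yaml
# Fragment of a kube-apiserver static Pod manifest (assumed layout, illustrative values).
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    - --audit-log-path=/var/log/audit.log
    - --audit-log-maxage=30      # retain old audit log files for up to 30 days
    - --audit-log-maxbackup=10   # retain at most 10 audit log files
    - --audit-log-maxsize=100    # rotate the audit log file once it reaches 100 megabytes
```
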
If your cluster's control plane runs the kube-apiserver as a Pod, remember to mount the `hostPath`
to the location of the policy file and log file, so that audit records are persisted. For example:

```shell
--audit-policy-file=/etc/kubernetes/audit-policy.yaml \
--audit-log-path=/var/log/audit.log
```

then mount the volumes:

```yaml
...
volumeMounts:
  - mountPath: /etc/kubernetes/audit-policy.yaml
    name: audit
    readOnly: true
  - mountPath: /var/log/audit.log
    name: audit-log
    readOnly: false
```

and finally configure the `hostPath`:

```yaml
...
- name: audit
  hostPath:
    path: /etc/kubernetes/audit-policy.yaml
    type: File

- name: audit-log
  hostPath:
    path: /var/log/audit.log
    type: FileOrCreate
```

### Webhook backend

The webhook audit backend sends audit events to a remote web API, which is assumed to
be a form of the Kubernetes API, including means of authentication. You can configure
a webhook audit backend using the following kube-apiserver flags:

- `--audit-webhook-config-file` specifies the path to a file with a webhook
  configuration. The webhook configuration is effectively a specialized
  [kubeconfig](/docs/tasks/access-application-cluster/configure-access-multiple-clusters).
- `--audit-webhook-initial-backoff` specifies the amount of time to wait after the first failed
  request before retrying. Subsequent requests are retried with exponential backoff.

The webhook config file uses the kubeconfig format to specify the remote address of
the service and credentials used to connect to it.

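Since the webhook configuration uses the kubeconfig format, a file passed to
`--audit-webhook-config-file` might look like the sketch below. The server URL, the
entry names and all file paths are hypothetical placeholders, and client certificates
are only one of the credential types a kubeconfig can carry.

```yaml
# Hypothetical webhook configuration in kubeconfig format; every name and path is a placeholder.
apiVersion: v1
kind: Config
clusters:
- name: audit-webhook
  cluster:
    # CA bundle used to verify the remote service.
    certificate-authority: /etc/kubernetes/pki/audit-webhook-ca.crt
    # Remote address that receives the audit events.
    server: https://audit.example.com/events
users:
- name: kube-apiserver
  user:
    # Credentials the API server presents to the remote service.
    client-certificate: /etc/kubernetes/pki/audit-webhook-client.crt
    client-key: /etc/kubernetes/pki/audit-webhook-client.key
contexts:
- name: default
  context:
    cluster: audit-webhook
    user: kube-apiserver
current-context: default
```
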
## Event batching {#batching}

Both log and webhook backends support batching. Using webhook as an example, here's the list of
available flags. To get the same flag for the log backend, replace `webhook` with `log` in the flag
name. By default, batching is enabled in `webhook` and disabled in `log`. Similarly, by default
throttling is enabled in `webhook` and disabled in `log`.

- `--audit-webhook-mode` defines the buffering strategy. One of the following:
  - `batch` - buffer events and asynchronously process them in batches. This is the default.
  - `blocking` - block API server responses on processing each individual event.
  - `blocking-strict` - Same as `blocking`, but when there is a failure during audit logging at the
    `RequestReceived` stage, the whole request to the kube-apiserver fails.

The following flags are used only in the `batch` mode:

- `--audit-webhook-batch-buffer-size` defines the number of events to buffer before batching.
  If the rate of incoming events overflows the buffer, events are dropped.
- `--audit-webhook-batch-max-size` defines the maximum number of events in one batch.
- `--audit-webhook-batch-max-wait` defines the maximum amount of time to wait before unconditionally
  batching events in the queue.
- `--audit-webhook-batch-throttle-qps` defines the maximum average number of batches generated
  per second.
- `--audit-webhook-batch-throttle-burst` defines the maximum number of batches generated at the same
  moment if the allowed QPS was underutilized previously.

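To illustrate the `webhook` to `log` renaming described above, the fragment below sketches
how batching could be switched on for the log backend, where it is disabled by default.
The enclosing `command` list and the numeric values are assumptions made for the example
and are not tuned for any particular cluster.

```yaml
# Illustrative fragment of the kube-apiserver command line; values are placeholders.
command:
- kube-apiserver
- --audit-log-path=/var/log/audit.log
- --audit-log-mode=batch                # the log backend does not batch by default
- --audit-log-batch-buffer-size=10000   # events buffered before batching; overflow is dropped
- --audit-log-batch-max-size=400        # maximum number of events in one batch
- --audit-log-batch-max-wait=30s        # flush queued events after at most this long
```
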
## Parameter tuning

Parameters should be set to accommodate the load on the API server.

For example, if kube-apiserver receives 100 requests each second, and each request is audited only
at the `ResponseStarted` and `ResponseComplete` stages, you should account for approximately 200 audit
events being generated each second. Assuming that there are up to 100 events in a batch,
you should set the throttling level to at least 2 queries per second. Assuming that the backend can take up to
5 seconds to write events, you should set the buffer size to hold up to 5 seconds of events;
that is: 10 batches, or 1000 events.

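The arithmetic in this example can be written out as webhook flags. This is only a sketch
of the worked numbers above: the enclosing `command` list and the configuration file path
are hypothetical, and a real cluster would need values derived from its own load.

```yaml
# Flags matching the worked example: 100 requests/s, 2 audited stages per request.
command:
- kube-apiserver
- --audit-webhook-config-file=/etc/kubernetes/audit-webhook.yaml  # hypothetical path
- --audit-webhook-mode=batch
# 100 requests/s x 2 stages = about 200 events/s
- --audit-webhook-batch-max-size=100       # up to 100 events in a batch
- --audit-webhook-batch-throttle-qps=2     # 200 events/s / 100 events per batch = 2 batches/s
- --audit-webhook-batch-buffer-size=1000   # 5 s x 200 events/s = 1000 events (10 batches)
```
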
In most cases, however, the default parameters should be sufficient and you don't have to worry about
setting them manually. To monitor the state of the auditing subsystem, you can check the
kube-apiserver logs and the following Prometheus metrics exposed by kube-apiserver:

- `apiserver_audit_event_total` metric contains the total number of audit events exported.
- `apiserver_audit_error_total` metric contains the total number of events dropped due to an error
  during exporting.

### Log entry truncation {#truncate}

Both log and webhook backends support limiting the size of events that are logged.
As an example, the following is the list of flags available for the log backend:

- `--audit-log-truncate-enabled` defines whether event and batch truncating is enabled.
- `--audit-log-truncate-max-batch-size` defines the maximum size in bytes of the batch sent to the underlying backend.
- `--audit-log-truncate-max-event-size` defines the maximum size in bytes of the audit event sent to the underlying backend.

By default, truncation is disabled in both `webhook` and `log`. A cluster administrator should set
`--audit-log-truncate-enabled` or `--audit-webhook-truncate-enabled` to enable the feature.

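As a sketch of how truncation might be enabled for the log backend, the fragment below
sets all three flags together. The enclosing `command` list is assumed, and the byte
limits are illustrative values rather than recommendations.

```yaml
# Illustrative fragment of the kube-apiserver command line; limits are placeholders.
command:
- kube-apiserver
- --audit-log-path=/var/log/audit.log
- --audit-log-truncate-enabled=true
- --audit-log-truncate-max-event-size=102400     # maximum size of one audit event, in bytes
- --audit-log-truncate-max-batch-size=10485760   # maximum size of one batch, in bytes
```
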
## {{% heading "whatsnext" %}}

* Learn about [Mutating webhook auditing annotations](/docs/reference/access-authn-authz/extensible-admission-controllers/#mutating-webhook-auditing-annotations).