## API call latency SLIs/SLOs details

### Definition

| Status | SLI | SLO |
| --- | --- | --- |
| __Official__ | Latency of processing<sup>[1](#footnote1)</sup> mutating<sup>[2](#footnote2)</sup> API calls for single objects for every (resource, verb) pair, measured as the 99th percentile over the last 5 minutes | In a default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day <= 1s |
| __Official__ | Latency of processing<sup>[1](#footnote1)</sup> non-streaming read-only<sup>[3](#footnote3)</sup> API calls for every (resource, scope<sup>[4](#footnote4)</sup>) pair, measured as the 99th percentile over the last 5 minutes | In a default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day: (a) <= 1s if `scope=resource` (b) <= 30s<sup>[5](#footnote5)</sup> otherwise (if `scope=namespace` or `scope=cluster`) |

<a name="footnote1">\[1\]</a> The SLI measures only the latency incurred by processing
the request. The processing time of a request is the interval from the moment the
apiserver receives the request until the last byte of the response is sent to the
user, excluding latency incurred by webhooks and priority & fairness queue wait times.
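The exclusion described in this footnote can be sketched as a simple subtraction. This is only an illustration: the `RequestTiming` record and its field names are hypothetical and do not correspond to any apiserver type.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timing of one API request in seconds (hypothetical record layout)."""
    total: float                  # request received -> last response byte sent
    webhooks: float = 0.0         # time spent waiting on webhooks
    apf_queue_wait: float = 0.0   # time queued by priority & fairness


def sli_latency(timing: RequestTiming) -> float:
    """Latency counted by the SLI: total time minus the excluded components."""
    excluded = timing.webhooks + timing.apf_queue_wait
    # Clamp at zero in case the component timings are slightly inconsistent.
    return max(0.0, timing.total - excluded)
```

For example, a request that took 1.5s in total, of which 0.4s was spent in webhooks and 0.1s in the priority & fairness queue, would contribute roughly 1.0s to the SLI.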

<a name="footnote2">\[2\]</a> By mutating API calls we mean POST, PUT, DELETE
and PATCH.

<a name="footnote3">\[3\]</a> By non-streaming read-only API calls we mean GET
requests without the `watch=true` option set. (Note that internally Kubernetes
translates these to both GET and LIST calls.)

<a name="footnote4">\[4\]</a> The scope of a request can be one of:
- `resource` - if the request is about a single object
- `namespace` - if it is about objects from a single namespace
- `cluster` - if it spans objects from multiple namespaces
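The definitions in footnotes 2 to 4 can be sketched as two small classification helpers. The function names and the way a request is represented (HTTP method, `watch` flag, optional object name and namespace) are assumptions made for illustration, not the apiserver's actual implementation.

```python
from typing import Optional

# Footnote 2: HTTP methods treated as mutating.
MUTATING_METHODS = {"POST", "PUT", "DELETE", "PATCH"}


def call_class(http_method: str, watch: bool = False) -> str:
    """Classify a call as 'mutating', 'read-only', 'streaming' or 'other'."""
    method = http_method.upper()
    if method in MUTATING_METHODS:
        return "mutating"
    if method == "GET":
        # Footnote 3: GET with watch=true is streaming and not covered here.
        return "streaming" if watch else "read-only"
    return "other"


def request_scope(name: Optional[str], namespace: Optional[str]) -> str:
    """Footnote 4: derive the scope from what the request targets.

    Assumption: a request without an object name and without a namespace
    is treated as cluster-scoped.
    """
    if name:
        return "resource"
    if namespace:
        return "namespace"
    return "cluster"
```

Under these assumptions, listing pods in one namespace is classified as `("read-only", "namespace")`, while listing pods across all namespaces is `("read-only", "cluster")`.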

<a name="footnote5">\[5\]</a> Historically, the threshold for LISTs with
`scope=namespace` was set to 5 seconds. However, that threshold was chosen when
Kubernetes didn't support the scale it supports today and when individual
namespaces didn't contain tens of thousands (if not more) of objects of a given
type. We adjusted the limits to accommodate the change in usage patterns, given
that users are fine with listing tens of thousands of objects taking more than
5 seconds.
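As a rough illustration of how the thresholds in the table above could be checked, the sketch below computes a 99th percentile over a set of latency samples for one bucket and compares it with the corresponding limit. The nearest-rank percentile definition, the bucket names, and the function names are assumptions; real monitoring typically derives percentiles from histogram metrics instead of raw samples.

```python
from typing import Sequence

# SLO thresholds in seconds, taken from the table above.
THRESHOLDS_SECONDS = {
    "mutating": 1.0,    # mutating calls on single objects
    "resource": 1.0,    # read-only calls with scope=resource
    "namespace": 30.0,  # read-only calls with scope=namespace
    "cluster": 30.0,    # read-only calls with scope=cluster
}


def percentile(samples: Sequence[float], q: int) -> float:
    """Nearest-rank percentile for an integer q in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (q * len(ordered) + 99) // 100  # integer ceiling of q*n/100
    return ordered[rank - 1]


def meets_slo(samples: Sequence[float], bucket: str) -> bool:
    """True if the 99th percentile of the samples is within the threshold."""
    return percentile(samples, 99) <= THRESHOLDS_SECONDS[bucket]
```

In practice the samples would be collected per (resource, verb) or (resource, scope) pair over one cluster-day, as stated in the SLO column.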

### User stories
- As a user of vanilla Kubernetes, I want some guarantee of how quickly I get a
  response from an API call.
- As an administrator of a Kubernetes cluster, if I know the characteristics of
  the external dependencies of my apiserver (e.g. custom admission plugins,
  priority & fairness configuration, and webhooks), I want to be able to provide
  guarantees for API call latency to users of my cluster.

### Other notes
- We can’t give any guarantee in general, because cluster administrators are
  allowed to register custom admission plugins, webhooks, and priority and
  fairness configurations, over which we have no control and which directly
  impact API call latencies.
- As a result, we define the SLIs to be very generic (no matter how your
  cluster is set up), but we provide the SLO only for default installations
  (where we have control over what the apiserver is doing). This avoids giving
  the false impression that we provide a guarantee no matter how the cluster is
  set up and what is installed on top of it.
- At the same time, API calls are part of pretty much every non-trivial workflow
  in Kubernetes, so this metric is a building block for less trivial SLIs and
  SLOs.
- The SLO for latency of read-only API calls of a given type may have a
  significant buffer in its threshold. In fact, the latency of a request should
  be proportional to the amount of work to do (that is, the number of objects of
  a given type in a given scope) plus some constant overhead. For better tracking
  of performance, we may want to define a purely internal SLI of "latency per
  object", but that isn't in the near-term plans.
- As a reminder, SLOs are guaranteed only if the thresholds defined in the
  [thresholds file][] are satisfied. This is particularly important for this SLO,
  because it limits the number of objects that are returned by LIST calls.

[thresholds file]: https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md
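The relationship mentioned in the notes above (latency proportional to the number of objects, plus a constant overhead) can be written as `latency = overhead + per_object * count`. The sketch below estimates both terms from measurements with an ordinary least-squares fit. It only illustrates the idea of a "latency per object" figure; it is not a defined SLI, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def fit_latency_model(
    counts: Sequence[float], latencies: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of latency = overhead + per_object * count.

    Returns (overhead, per_object), both in the unit of `latencies`.
    """
    if len(counts) != len(latencies) or len(counts) < 2:
        raise ValueError("need at least two (count, latency) pairs")
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(latencies) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    if sxx == 0:
        raise ValueError("object counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, latencies))
    per_object = sxy / sxx
    overhead = mean_y - per_object * mean_x
    return overhead, per_object
```

With such a fit, a LIST that returns many objects can be judged by its per-object cost, which the fixed 30s threshold alone does not capture.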

### Caveats
- The SLO has to be satisfied regardless of the encoding used in user-originated
  requests. This makes the mix of clients important while testing. However, we
  assume that all `core` components communicate with the apiserver using protocol
  buffers.
- In the case of GET requests, the user can opt in to accepting potentially
  stale data (served from cache), and the SLO again has to be satisfied
  regardless of that. This makes a careful choice of requests in tests
  important.
- The SLI and SLO exclude latency incurred by factors that are outside our
  control, specifically from webhooks (1.23+) and API priority & fairness queue
  wait times (1.27+).
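One way to make the first two caveats concrete in a test is to enumerate request variants across encodings and cache behavior. The sketch below only builds request descriptions and sends nothing. It assumes that the encoding is selected through the `Accept` header (JSON or Kubernetes protobuf) and that `resourceVersion=0` is how a client opts in to potentially stale, cache-served data; the function name and dictionary layout are hypothetical.

```python
from itertools import product
from typing import Dict, List

# Assumed content types for the two encodings.
ENCODINGS = ["application/json", "application/vnd.kubernetes.protobuf"]
# None = default read; "0" = assumed opt-in to cache-served data.
RESOURCE_VERSIONS = [None, "0"]


def request_variants(path: str) -> List[Dict]:
    """Build one GET request description per (encoding, resourceVersion) pair."""
    variants = []
    for accept, rv in product(ENCODINGS, RESOURCE_VERSIONS):
        params = {} if rv is None else {"resourceVersion": rv}
        variants.append({
            "method": "GET",
            "path": path,
            "headers": {"Accept": accept},
            "params": params,
        })
    return variants
```

Calling `request_variants("/api/v1/namespaces/default/pods")` yields four variants, so that a test can exercise every combination of encoding and cache behavior.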

### TODOs
- We may consider treating `non-namespaced` resources as a separate bucket in
  the future. However, it may not make sense if their number is comparable to
  that of `namespaced` ones.

### Test scenario

__TODO: Describe test scenario.__