Merge pull request #2586 from wojtek-t/network_programmin_latency

Introduce definition of network programming latency SLI
k8s-ci-robot 2018-09-06 01:33:46 -07:00 committed by GitHub
commit b519bab844
2 changed files with 94 additions and 0 deletions


@@ -0,0 +1,93 @@
## Network programming latency SLIs/SLOs details
### Definition
| Status | SLI | SLO |
| --- | --- | --- |
| __WIP__ | Latency of programming a single (e.g. iptables on a given node) in-cluster load balancing mechanism, measured from when service spec or list of its `Ready` pods change to when it is reflected in load balancing mechanism, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile of (99th percentiles across all programmers (e.g. iptables)) per cluster-day <= X |
### User stories
- As a user of vanilla Kubernetes, I want some guarantee of how quickly new
backends of my service will become targets of in-cluster load-balancing
- As a user of vanilla Kubernetes, I want some guarantee of how quickly deleted
(or unhealthy) backends of my service will be removed from in-cluster
load-balancing
- As a user of vanilla Kubernetes, I want some guarantee of how quickly changes
to the service specification (including creation) will be reflected in
in-cluster load-balancing
### Other notes
- We are consciously focusing on in-cluster load-balancing for the purpose of
this SLI, as external load-balancing is clearly provider-specific (which makes
it hard to set an SLO for it).
- However, in the future it should be possible to formulate the SLI for external
load-balancing in pretty much the same way for consistency.
- An SLI measuring the end-to-end time from pod creation was also considered,
but rejected as application-specific, which would make introducing an SLO for
it impossible.
### Caveats
- The SLI is formulated for a single "programmer" (e.g. iptables on a single
node), even though that value by itself is not very interesting for the user.
If there are multiple programmers in the cluster, the aggregation across them
is done only at the SLO level (and only that yields a value that is meaningful
for the user). The reason for doing it this way is the feasibility of computing
it efficiently (see the sketch after this list):
  - if we were to aggregate at the SLI level (i.e. the SLI were formulated as
  "... reflected in the in-cluster load-balancing mechanism and visible from
  99% of programmers"), computing the SLI would be extremely difficult. In order
  to decide e.g. whether a pod's transition to the `Ready` state is reflected,
  we would have to know when exactly it was reflected in 99% of programmers
  (e.g. iptables). That requires tracking metrics on a per-change basis, which
  we can't do efficiently.
  - we admit that the SLO is a bit weaker in this form (i.e. it doesn't
  necessarily guarantee that a given change is reflected in 99% of programmers
  within a given 99th percentile latency), but it is a close enough
  approximation.
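For illustration, below is a minimal Go sketch of the two-level aggregation
described above: a per-programmer 99th percentile over the measurement window
(the SLI), and a 99th percentile across programmers that feeds the SLO. The
sample values and the simple nearest-rank percentile are made up for this
example and are not part of the proposal.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of the given samples,
// using the nearest-rank method.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}

func main() {
	// Programming latencies (in seconds) observed over the last 5 minutes,
	// keyed by programmer (e.g. kube-proxy instance per node).
	// Values are made up for illustration only.
	latenciesByProgrammer := map[string][]float64{
		"node-1": {0.8, 1.2, 2.5, 0.9},
		"node-2": {1.1, 0.7, 3.0},
		"node-3": {0.5, 0.6, 0.9, 4.2},
	}

	// SLI level: 99th percentile per programmer over the window.
	var perProgrammer []float64
	for node, samples := range latenciesByProgrammer {
		p99 := percentile(samples, 99)
		fmt.Printf("%s: p99=%.2fs\n", node, p99)
		perProgrammer = append(perProgrammer, p99)
	}

	// SLO level: 99th percentile across programmers for this window; over a
	// cluster-day, the SLO compares the 99th percentile of these values
	// against the threshold X.
	fmt.Printf("cluster-wide p99 of per-programmer p99s: %.2fs\n",
		percentile(perProgrammer, 99))
}
```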
### How to measure the SLI
The method of measuring this SLI is not obvious, so for completeness we describe
here how it will be implemented, together with all caveats.
1. We assume that for in-cluster load-balancing programming we are using
Kubernetes `Endpoints` objects.
1. We will introduce a dedicated annotation for the `Endpoints` object (name TBD).
1. The Endpoints controller (while updating a given `Endpoints` object) will set
the value of that annotation to the timestamp of the change that triggered
this update:
   - for a pod transition between the `Ready` and `NotReady` states, its
   timestamp is simply part of the pod condition
   - TBD for service updates (ideally we will add a `LastUpdateTimestamp` field
   in object metadata next to the already existing `CreationTimestamp`; the data
   is already present at the storage layer, so it won't be hard to propagate it).
1. The in-cluster load-balancing programmer will export a Prometheus metric
once done with programming. The latency of the operation is defined as the
difference between the timestamp when the operation is done and the timestamp
recorded in the newly introduced annotation. A sketch of this programmer side
is shown below.
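For concreteness, here is a minimal Go sketch of the programmer side (the last
step above). The annotation name, metric name, and helper function are
placeholders for illustration only (the proposal leaves the annotation name
TBD); the sketch assumes the Prometheus client library and the core
`v1.Endpoints` type.

```go
package main

import (
	"log"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	v1 "k8s.io/api/core/v1"
)

// Placeholder name; the proposal leaves the actual annotation name TBD.
const triggerTimeAnnotation = "example.kubernetes.io/trigger-time"

// networkProgrammingLatency would be exported by the in-cluster load-balancing
// programmer (e.g. kube-proxy) once a change has been programmed.
var networkProgrammingLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "network_programming_latency_seconds",
	Help:    "Time from the change that triggered an Endpoints update to when it was programmed locally.",
	Buckets: prometheus.ExponentialBuckets(0.1, 2, 12),
})

func init() {
	prometheus.MustRegister(networkProgrammingLatency)
}

// recordProgrammingLatency is called right after the programmer finishes
// applying an Endpoints update (e.g. after an iptables sync).
func recordProgrammingLatency(ep *v1.Endpoints, programmedAt time.Time) {
	raw, ok := ep.Annotations[triggerTimeAnnotation]
	if !ok {
		return // annotation not set, e.g. an older controller version
	}
	triggerTime, err := time.Parse(time.RFC3339Nano, raw)
	if err != nil {
		log.Printf("invalid %s annotation on %s/%s: %v",
			triggerTimeAnnotation, ep.Namespace, ep.Name, err)
		return
	}
	networkProgrammingLatency.Observe(programmedAt.Sub(triggerTime).Seconds())
}

func main() {
	// Example: an Endpoints object whose update was triggered 1.5s ago.
	ep := &v1.Endpoints{}
	ep.Namespace, ep.Name = "default", "my-service"
	ep.Annotations = map[string]string{
		triggerTimeAnnotation: time.Now().Add(-1500 * time.Millisecond).Format(time.RFC3339Nano),
	}
	recordProgrammingLatency(ep, time.Now())
}
```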
#### Caveats
There are a few caveats to this measurement method:
1. A single `Endpoints` object may batch multiple pod state transitions. <br/>
In that case, we simply choose the oldest one (and do not expose all timestamps,
to avoid theoretically unbounded growth of the object). That makes the metric
imprecise, but the batching period should be relatively small compared to the
whole end-to-end flow.
1. A single pod may transition its state multiple times within the batching
period. <br/>
For that case, we will add an additional cache in the Endpoints controller,
caching the first observed transition timestamp for each pod. The cache entry
will be cleared when the controller picks the pod up into an Endpoints object
update (see the sketch after this list). This is consistent with choosing the
oldest update in the point above. <br/>
Initially, we may consider simply ignoring this fact.
1. Components may fall out of the watch history window and thus miss some watch
events. <br/>
This may be the case for both the Endpoints controller and kube-proxy (or other
network programmers if used instead). It becomes a problem only when a single
object changed multiple times in the meantime (otherwise informers will trigger
the handlers on relisting). Additionally, this can happen only when components
are too slow in processing events (which would already be reflected in the
metrics) or (sometimes) after a kube-apiserver restart. Given that, we are going
to neglect this problem to avoid unnecessary complications for little or no
gain.
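As a rough illustration of the cache mentioned in the second caveat, here is a
minimal Go sketch that keeps, per pod, the first observed transition timestamp
and clears it when the pod is picked up into an Endpoints update. Names and
structure are hypothetical, not the actual Endpoints controller code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// triggerTimeCache remembers, per pod, the first state-transition timestamp
// observed since the pod was last included in an Endpoints update.
type triggerTimeCache struct {
	mu    sync.Mutex
	times map[string]time.Time // key: "namespace/name" of the pod
}

func newTriggerTimeCache() *triggerTimeCache {
	return &triggerTimeCache{times: map[string]time.Time{}}
}

// Observe records a transition time for a pod, keeping only the oldest one if
// the pod transitions multiple times within a batching period.
func (c *triggerTimeCache) Observe(podKey string, transition time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if existing, ok := c.times[podKey]; !ok || transition.Before(existing) {
		c.times[podKey] = transition
	}
}

// Pop returns the cached timestamp and clears the entry; it would be called
// when the controller picks the pod up into an Endpoints object update.
func (c *triggerTimeCache) Pop(podKey string) (time.Time, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	t, ok := c.times[podKey]
	delete(c.times, podKey)
	return t, ok
}

func main() {
	cache := newTriggerTimeCache()
	now := time.Now()
	// Two transitions of the same pod within one batching period:
	cache.Observe("default/my-pod", now)
	cache.Observe("default/my-pod", now.Add(2*time.Second)) // newer, ignored
	if t, ok := cache.Pop("default/my-pod"); ok {
		fmt.Println("annotation timestamp to use:", t)
	}
}
```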
### Test scenario
__TODO: Describe test scenario.__


@@ -106,6 +106,7 @@ Prerequisite: Kubernetes cluster is available and serving.
| __Official__ | Latency of mutating API calls for single objects for every (resource, verb) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> <= 1s | [Details](./api_call_latency.md) |
| __Official__ | Latency of non-streaming read-only API calls for every (resource, scope) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> (a) <= 1s if `scope=resource` (b) <= 5s if `scope=namespace` (c) <= 30s if `scope=cluster` | [Details](./api_call_latency.md) |
| __Official__ | Startup latency of stateless and schedulable pods, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> <= 5s | [Details](./pod_startup_latency.md) |
| __WIP__ | Latency of programming a single (e.g. iptables on a given node) in-cluster load balancing mechanism, measured from when service spec or list of its `Ready` pods change to when it is reflected in load balancing mechanism, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile of (99th percentiles across all programmers (e.g. iptables)) per cluster-day<sup>[1](#footnote1)</sup> <= X | [Details](./networking_programming_latency.md) |
<a name="footnote1">\[1\]</a> For the purpose of visualization it will be a
sliding window. However, for the purpose of reporting the SLO, it means one