Make SLOs page more clear

parent d6c5f0f515
commit 5a275c424c

@@ -1,5 +1,28 @@
 ## API call latency SLIs/SLOs details

+### Definition
+
+| Status | SLI | SLO |
+| --- | --- | --- |
+| __Official__ | Latency<sup>[1](#footnote1)</sup> of mutating<sup>[2](#footnote2)</sup> API calls for single objects for every (resource, verb) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day <= 1s |
+| __Official__ | Latency<sup>[1](#footnote1)</sup> of non-streaming read-only<sup>[3](#footnote3)</sup> API calls for every (resource, scope<sup>[4](#footnote4)</sup>) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day (a) <= 1s if `scope=resource` (b) <= 5s if `scope=namespace` (c) <= 30s if `scope=cluster` |
+
+<a name="footnote1">\[1\]</a>By latency of an API call in this doc we mean the
+time from the moment when the apiserver gets the request to when the last byte
+of the response is sent to the user.
+
+<a name="footnote2">\[2\]</a>By mutating API calls we mean POST, PUT, DELETE
+and PATCH.
+
+<a name="footnote3">\[3\]</a>By non-streaming read-only API calls we mean GET
+requests without the `watch=true` option set. (Note that internally Kubernetes
+translates them to both GET and LIST calls.)
+
+<a name="footnote4">\[4\]</a>The scope of a request can be either (a) `resource`
+if the request is about a single object, (b) `namespace` if it is about objects
+from a single namespace or (c) `cluster` if it spans objects from multiple
+namespaces.
+
 ### User stories
 - As a user of vanilla Kubernetes, I want some guarantee how quickly I get the
 response from an API call.
@@ -1,5 +1,13 @@
 ## API call extension points latency SLIs details

+### Definition
+
+| Status | SLI |
+| --- | --- |
+| WIP | Admission latency for each admission plugin type, measured as 99th percentile over last 5 minutes |
+| WIP | Webhook call latency for each webhook type, measured as 99th percentile over last 5 minutes |
+| WIP | Initializer latency for each initializer, measured as 99th percentile over last 5 minutes |
+
 ### User stories
 - As an administrator, if API calls are slow, I would like to know if this is
 because slow extension points (admission plugins, webhooks, initializers) and
@@ -1,5 +1,18 @@
 ## Pod startup latency SLI/SLO details

+### Definition
+
+| Status | SLI | SLO |
+| --- | --- | --- |
+| __Official__ | Startup latency of stateless<sup>[1](#footnote1)</sup> and schedulable<sup>[2](#footnote2)</sup> pods, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile per cluster-day <= 5s |
+
+<a name="footnote1">\[1\]</a>A `stateless pod` is defined as a pod that doesn't
+mount volumes with sources other than secrets, config maps, downward API and
+empty dir.
+
+<a name="footnote2">\[2\]</a>By schedulable pod we mean a pod that can be
+scheduled in the cluster without causing any preemption.
+
 ### User stories
 - As a user of vanilla Kubernetes, I want some guarantee how quickly my pods
 will be started.
@@ -100,37 +100,14 @@ Prerequisite: Kubernetes cluster is available and serving.

 | Status | SLI | SLO | User stories, test scenarios, ... |
 | --- | --- | --- | --- |
-| __Official__ | Latency<sup>[1](#footnote1)</sup> of mutating<sup>[2](#footnote2)</sup> API calls for single objects for every (resource, verb) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day<sup>[3](#footnote3)</sup> <= 1s | [Details](./api_call_latency.md) |
-| __Official__ | Latency<sup>[1](#footnote1)</sup> of non-streaming read-only<sup>[4](#footnote3)</sup> API calls for every (resource, scope<sup>[5](#footnote4)</sup>) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day (a) <= 1s if `scope=resource` (b) <= 5s if `scope=namespace` (c) <= 30s if `scope=cluster` | [Details](./api_call_latency.md) |
-| __Official__ | Startup latency of stateless<sup>[6](#footnode6)</sup> and schedulable<sup>[7](#footnote7)</sup> pods, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile per cluster-day <= 5s | [Details](./pod_startup_latency.md) |
+| __Official__ | Latency of mutating API calls for single objects for every (resource, verb) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> <= 1s | [Details](./api_call_latency.md) |
+| __Official__ | Latency of non-streaming read-only API calls for every (resource, scope) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> (a) <= 1s if `scope=resource` (b) <= 5s if `scope=namespace` (c) <= 30s if `scope=cluster` | [Details](./api_call_latency.md) |
+| __Official__ | Startup latency of stateless and schedulable pods, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> <= 5s | [Details](./pod_startup_latency.md) |

-<a name="footnote1">\[1\]</a>By latency of API call in this doc we mean time
-from the moment when apiserver gets the request to last byte of response sent
-to the user.
-
-<a name="footnote2">\[2\]</a>By mutating API calls we mean POST, PUT, DELETE
-and PATCH.
-
-<a name="footnote3">\[3\]</a> For the purpose of visualization it will be a
+<a name="footnote1">\[1\]</a> For the purpose of visualization it will be a
 sliding window. However, for the purpose of reporting the SLO, it means one
 point per day (whether SLO was satisfied on a given day or not).

-<a name="footnote4">\[4\]</a>By non-streaming read-only API calls we mean GET
-requests without `watch=true` option set. (Note that in Kubernetes internally
-it translates to both GET and LIST calls).
-
-<a name="footnote5">\[5\]</a>A scope of a request can be either (a) `resource`
-if the request is about a single object, (b) `namespace` if it is about objects
-from a single namespace or (c) `cluster` if it spawns objects from multiple
-namespaces.
-
-<a name="footnode6">[6\]</a>A `stateless pod` is defined as a pod that doesn't
-mount volumes with sources other than secrets, config maps, downward API and
-empty dir.
-
-<a name="footnode7">[7\]</a>By schedulable pod we mean a pod that can be
-scheduled in the cluster without causing any preemption.
-
 ### Burst SLIs/SLOs

 | Status | SLI | SLO | User stories, test scenarios, ... |
@@ -1,5 +1,11 @@
 ## System throughput SLI/SLO details

+### Definition
+
+| Status | SLI | SLO |
+| --- | --- | --- |
+| WIP | Time to start 30\*#nodes pods, measured from test scenario start until observing last Pod as ready | Benchmark: when all images are present on all Nodes, 99th percentile <= X minutes |
+
 ### User stories
 - As a user, I want a guarantee that my workload of X pods can be started
   within a given time
@@ -1,5 +1,11 @@
 ## Watch latency SLI details

+### Definition
+
+| Status | SLI |
+| --- | --- |
+| WIP | Watch latency for every resource (from the moment an object is stored in the database to when it's ready to be sent to all watchers), measured as 99th percentile over last 5 minutes |
+
 ### User stories
 - As an administrator, if Kubernetes is slow, I would like to know if the root
 cause of it is slow api-machinery (slow watch) or something farther the path