Remove burst SLIs/SLOs

wojtekt 2019-06-01 21:41:21 +02:00
parent f042a6d212
commit f0c6e48b7b
2 changed files with 0 additions and 58 deletions


@@ -63,18 +63,6 @@ we will not provide any guarantees for users.
[Service Level Objectives]: https://en.wikipedia.org/wiki/Service_level_objective
[Service Level Agreement]: https://en.wikipedia.org/wiki/Service-level_agreement
## Types of SLOs
While SLIs are very generic and don't really depend on anything (they just
define what we measure and how), that is not the case for SLOs.
SLOs provide guarantees, and satisfying them may depend on meeting some
specific requirements.
As a result, we build our SLOs in a "you promise, we promise" format:
we provide you a guarantee only if you satisfy the requirements that we
put on you.
As a consequence, we introduce two types of SLOs.
### Steady state SLOs
### Steady state SLOs
@@ -87,12 +75,6 @@ We define the system to be in steady state when the cluster churn per second is <= 2
churn = #(Pod spec creations/updates/deletions) + #(user originated requests) in a given second
```
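To make the definition concrete, here is a minimal Go sketch of the churn
computation. The event counts and how they would be collected are assumptions
for illustration, not part of the SLO document itself:
```go
package main

import "fmt"

// churnPerSecond implements the formula above: Pod spec creations, updates
// and deletions, plus user-originated requests, all counted in one second.
func churnPerSecond(podCreations, podUpdates, podDeletions, userRequests int) int {
	return podCreations + podUpdates + podDeletions + userRequests
}

func main() {
	// Hypothetical second: 5 creations, 3 updates, 2 deletions, 8 user requests.
	churn := churnPerSecond(5, 3, 2, 8)
	fmt.Println("churn:", churn) // compare against the steady-state threshold above
}
```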
### Burst SLO
With burst SLOs, we provide guarantees on how the system behaves under heavy
load (when the user wants the system to do something as quickly as possible,
not caring too much about response time).
## Environment
In order to meet the SLOs, the system must run in an environment satisfying
@@ -145,12 +127,6 @@ sliding window. However, for the purposes of the SLO itself, it basically means
"fraction of good minutes per day" being within the threshold.
### Burst SLIs/SLOs
| Status | SLI | SLO | User stories, test scenarios, ... |
| --- | --- | --- | --- |
| WIP | Time to start 30\*#nodes pods, measured from test scenario start until observing last Pod as ready | Benchmark: when all images present on all Nodes, 99th percentile <= X minutes | [Details](./system_throughput.md) |
### Other SLIs
| Status | SLI | User stories, ... |


@@ -1,34 +0,0 @@
## System throughput SLI/SLO details
### Definition
| Status | SLI | SLO |
| --- | --- | --- |
| WIP | Time to start 30\*#nodes pods, measured from test scenario start until observing last Pod as ready | Benchmark: when all images present on all Nodes, 99th percentile <= X minutes |
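A minimal Go sketch of the quantities this SLI ties together. The
`waitForAllPodsReady` callback is a hypothetical stand-in for the real
observation logic (e.g. watching Pod status), not something this document
defines:
```go
package main

import (
	"fmt"
	"time"
)

// targetPods returns the pod count from the SLI definition: 30*#nodes.
func targetPods(nodes int) int {
	return 30 * nodes
}

// measureStartupLatency measures wall-clock time from test scenario start
// until the supplied waiter reports the last pod as ready. The 99th
// percentile of this value across runs is what the SLO bounds by X minutes.
func measureStartupLatency(nodes int, waitForAllPodsReady func(pods int)) time.Duration {
	start := time.Now()
	waitForAllPodsReady(targetPods(nodes))
	return time.Since(start)
}

func main() {
	// Hypothetical waiter; here it just simulates a delay.
	d := measureStartupLatency(100, func(pods int) {
		time.Sleep(10 * time.Millisecond)
		_ = pods
	})
	fmt.Println("time to start 3000 pods:", d)
}
```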
### User stories
- As a user, I want a guarantee that my workload of X pods can be started
within a given time
- As a user, I want to understand how quickly I can react to a dramatic
change in workload profile when my workload exhibits very bursty behavior
(e.g. a shop during a Black Friday sale)
- As a user, I want a guarantee of how quickly I can recreate the whole setup
in case of a serious disaster which brings the whole cluster down.
### Test scenario
- Start with a healthy (all nodes ready, all cluster addons already running)
cluster with N (>0) running pause pods per node.
- Create a number of `Namespaces` and a number of `Deployments` in each of them.
- All `Namespaces` should be isomorphic, possibly excluding the last one, which
should run all the pods that didn't fit in the previous ones.
- A single namespace should run 5000 `Pods` in the following configuration
(the arithmetic is worked out in the sketch after this list):
  - one big `Deployment` running ~1/3 of all `Pods` from this `Namespace`
  - medium `Deployments`, each with 120 `Pods`, in total running ~1/3 of all
  `Pods` from this `Namespace`
  - small `Deployments`, each with 10 `Pods`, in total running ~1/3 of all `Pods`
  from this `Namespace`
- Each `Deployment` should be covered by a single `Service`.
- Each `Pod` in any `Deployment` contains two pause containers, one `Secret`
(other than the default `ServiceAccount` one) and one `ConfigMap`. Additionally,
it has resource requests set and doesn't use any advanced scheduling features
or init containers.
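For illustration, a minimal Go sketch deriving the per-`Namespace`
`Deployment` counts implied by the split above. The integer rounding is an
assumption; the scenario only fixes the per-`Deployment` sizes and the rough
1/3 split:
```go
package main

import "fmt"

func main() {
	const podsPerNamespace = 5000
	third := podsPerNamespace / 3 // ~1666 Pods per size class

	bigDeploymentPods := third       // one big Deployment (~1/3 of all Pods)
	mediumDeployments := third / 120 // 13 medium Deployments of 120 Pods each
	smallDeployments := third / 10   // 166 small Deployments of 10 Pods each

	fmt.Println("big Deployment size:", bigDeploymentPods)
	fmt.Println("medium Deployments:", mediumDeployments)
	fmt.Println("small Deployments:", smallDeployments)
}
```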