Improve Kubernetes scalability definition

2019-06-01 21:38:17 +02:00 · 2019-06-01 21:38:17 +02:00 · f042a6d212
parent bc3fe5de36
commit f042a6d212
1 changed file with 46 additions and 18 deletions
--- a/sig-scalability/slos/slos.md
+++ b/sig-scalability/slos/slos.md
@@ -9,27 +9,55 @@ you would expect to have some guarantees in those areas.
 The goal of this doc is to organize the guarantees that Kubernetes provides
 in these areas.
-## What do we require from SLIs/SLOs?
+## How do we define scalability?
-We are in the process of extending the number of SLIs ([Service Level Indicators])
+Our scalability definition is built on two concepts:
-and SLOs ([Service Level Objectives]) built on top of these SLIs to cover more areas
+- [Service Level Indicators]
-of the system and user expectations.
+- [Service Level Objectives]
-Our SLIs/SLOs need to have the following properties:
+We require our SLIs/SLOs to have the following properties:
- <b> They need to be testable </b> <br/>
+- <b> They are precise and well-defined </b> <br/>
-  Ideally, they (SLIs and SLOs) should be measurable in all running clusters,
+  It's extremely important to ensure that we and our users have exactly the
-	but if   that isn't possible a benchmark may be enough in some situations.
+  same understanding of what we guarantee.
-  That means that not every SLO may be translatable to SLA ([Service
+- <b> They are consistent with each other </b> <br/>
-  Level Agreement]).
+  This is mostly about using the same terminology, same concepts, etc.
- <b> They need to be understandable for users </b> <br/>
+- <b> They are user-oriented </b> <br/>
-  In particular, they need to be understandable for people not familiar
+  First, the SLOs we provide need to be things users really care about.
-  with the system internals, i.e. their formulation can't depend on some
+  Second, they need to be understandable for people not familiar with the system
-  arcane knowledge.
+  internals (e.g. their formulation can't depend on some arcane knowledge or
  implementation details of the system).
 - <b> They are testable </b> <br/>
  Ideally, SLIs/SLOs should be measurable in all running clusters, but if measuring
  some metrics isn't possible or would be extremely expensive (e.g. in terms
  of resource overhead for the system), benchmarks may sometimes be enough.
  That means that not every SLO may be translatable to an SLA ([Service Level
  Agreement]).
-We may also introduce internal(for developers only) SLIs, that may be useful
+While SLIs are generic (they just define what and how we measure), SLOs provide
-for understanding performance characteristic of the system, but for which
+specific guarantees and satisfying them may depend on meeting some specific
-we don't provide any guarantees for users (and thus don't require them to be
+requirements. Examples of factors that may visibly affect the ability to satisfy them
-that easily understandable).
+are:
 - cluster configuration
 - use of Kubernetes extensibility features
 - load on the cluster.
 As a result, we define Kubernetes scalability using a "you promise, we promise"
 framework, as follows:
 <b> If you promise to:
 - correctly configure your cluster
 - use extensibility features "reasonably"
 - keep the load in the cluster within recommended limits
 then we promise that your cluster scales, i.e.:
 - all the SLOs are satisfied. </b>
 We are in the process of extending coverage of the system with SLIs and SLOs
 to better reflect user expectations.
 Note that we may also introduce internal (for developers only) SLIs that may be
 useful for understanding performance characteristics of the system, but for which
 we will not provide any guarantees for users.
 [Service Level Indicators]: https://en.wikipedia.org/wiki/Service_level_indicator
 [Service Level Objectives]: https://en.wikipedia.org/wiki/Service_level_objective
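
The distinction the new text draws between SLIs (what and how we measure) and SLOs (a specific guarantee on top of a measurement), together with the "you promise, we promise" framing, can be sketched roughly as follows. This is only an illustration, not anything from the Kubernetes codebase: the `SLO` class, the `percentile_99` SLI, the `check_guarantee` helper, the promise keys, and the 1.0 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


def percentile_99(samples: Sequence[float]) -> float:
    """Hypothetical SLI: nearest-rank 99th percentile of the samples."""
    if not samples:
        raise ValueError("an SLI needs at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.99 * n, avoiding floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


@dataclass(frozen=True)
class SLO:
    """An SLO pairs a generic SLI with a specific guarantee (an upper bound)."""
    name: str
    sli: Callable[[Sequence[float]], float]
    threshold: float

    def is_satisfied(self, samples: Sequence[float]) -> bool:
        return self.sli(samples) <= self.threshold


def check_guarantee(
    user_promises: Dict[str, bool],
    slos: Sequence[SLO],
    samples: Dict[str, Sequence[float]],
) -> Optional[bool]:
    """Return None when the user's promises are not all kept (the guarantee
    does not apply); otherwise return whether every SLO is satisfied."""
    if not all(user_promises.values()):
        return None
    return all(slo.is_satisfied(samples[slo.name]) for slo in slos)


# Hypothetical promise keys mirroring the three bullet points in the diff.
promises = {
    "cluster_correctly_configured": True,
    "extensibility_used_reasonably": True,
    "load_within_recommended_limits": True,
}
latency_slo = SLO(name="example_latency", sli=percentile_99, threshold=1.0)
good_samples = {"example_latency": [0.1] * 99 + [5.0]}
bad_samples = {"example_latency": [0.1] * 98 + [5.0, 5.0]}

result_good = check_guarantee(promises, [latency_slo], good_samples)
result_bad = check_guarantee(promises, [latency_slo], bad_samples)
result_void = check_guarantee(
    {**promises, "load_within_recommended_limits": False},
    [latency_slo],
    good_samples,
)
```

Returning `None` rather than `False` when a promise is broken is a design choice of this sketch: it keeps "the SLOs were violated" separate from "no guarantee was made in the first place", which is the point of the conditional wording in the document.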