Merge pull request #3768 from wojtek-t/update_thresholds
Update scalability thresholds document
commit 2cf93d0d6a

## Background

Since the 1.6 release, Kubernetes officially supports 5000-node clusters.
However, the question is what that actually means. As of early Q3 2017 we are
in the process of defining a set of performance-related SLIs
([Service Level Indicators]) and SLOs ([Service Level Objectives]).
As described in the [How we define scalability] document, it is impossible
to provide guarantees in a generic situation. One of the prerequisites for
the SLOs being satisfied is keeping the load in the cluster within the
recommended limits. This document tries to explicitly summarize those
dimensions and the limits themselves.

However, no matter what SLIs and SLOs we have, there will always be users
reporting that their cluster does not meet the SLOs. In most cases the reason
turns out to be that we (as developers) silently assumed something (e.g. that
there will be no more than 10000 services in the cluster) and users were not
aware of that. Where we know, we also state whether we will try to relax a
given limit in the future.

[How we define scalability]: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/slos.md#how-we-define-scalability

## Kubernetes thresholds

Scalability dimensions and thresholds are a very complex topic. In fact, the
configurations that Kubernetes supports create a `Scalability Envelope`:

[Scalability Envelope diagram - binary image (98 KiB) added in this commit, not shown]

Some of the properties of the envelope:
1. It's NOT a cube, because the dimensions are sometimes not independent.
1. It's NOT convex.
1. As you move farther along one dimension, your cross-section wrt other
   dimensions gets smaller.
1. It's bounded.
1. It's decomposable into smaller envelopes.

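To make the first and third properties concrete, here is a minimal sketch.
The constants are the thresholds from the table below; the helper name and
the simple integer-division trade-off model are our own illustration, not an
official formula:

```go
package main

import "fmt"

const (
	maxPods        = 150000 // cluster-scope pod threshold (see table below)
	maxPodsPerNode = 110    // flat per-node cap (see table below)
)

// effectivePodsPerNode shows why the envelope is not a cube: the usable
// "pods per node" budget depends on another dimension (the node count),
// because the cluster-wide pod threshold also has to hold.
func effectivePodsPerNode(nodes int) int {
	budget := maxPods / nodes
	if budget > maxPodsPerNode {
		return maxPodsPerNode
	}
	return budget
}

func main() {
	for _, n := range []int{100, 1000, 5000} {
		fmt.Printf("nodes=%4d -> pods-per-node budget=%3d\n", n, effectivePodsPerNode(n))
	}
	// nodes= 100 -> 110 (per-node cap binds)
	// nodes=1000 -> 110
	// nodes=5000 ->  30 (cluster-wide pod threshold binds)
}
```

At 5000 nodes the per-node budget shrinks from 110 to 30 pods: moving far
along the "number of nodes" dimension makes the cross-section along the
"pods per node" dimension smaller.
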
You can learn more about it in this [Kubecon talk] (or [Kubecon slides]).

There are a couple of caveats to the thresholds we are presenting below:
1. In the majority of cases, thresholds are NOT hard limits - crossing
   the limit results in degraded performance and doesn't mean the cluster
   immediately fails over.
1. **Many of the thresholds (for cluster scope) are given for the largest
   possible cluster. For smaller clusters, the limits are proportionally
   lower** (see the sketch after this list).
1. The thresholds may differ (hopefully be non-decreasing) across Kubernetes
   releases. The thresholds below are given for Kubernetes head. <br/>
   **TODO:** We are planning to start versioning the table below, but we
   are not there yet.
1. Given that configuration influences thresholds, we assume a vanilla
   Kubernetes setup.

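A minimal sketch of what "proportionally lower" could mean in practice. The
linear model and the helper name are our assumption for illustration, not an
official formula:

```go
package main

import "fmt"

const maxClusterNodes = 5000 // largest officially supported cluster

// scaledThreshold scales a cluster-scope threshold down linearly for a
// smaller cluster. Linear scaling is an assumption for illustration.
func scaledThreshold(thresholdAtMax, nodes int) int {
	if nodes >= maxClusterNodes {
		return thresholdAtMax
	}
	return thresholdAtMax * nodes / maxClusterNodes
}

func main() {
	// In a 500-node cluster, expect roughly a tenth of the cluster-scope
	// thresholds from the table below.
	fmt.Println(scaledThreshold(150000, 500)) // pods:     15000
	fmt.Println(scaledThreshold(10000, 500))  // services: 1000
}
```
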
The table below is **NOT exhaustive** - more content is coming soon.

| Quantity               | Threshold: scope=namespace | Threshold: scope=cluster |
|------------------------|----------------------------|--------------------------|
| #Nodes                 | n/a                        | 5000                     |
| #Namespaces            | n/a                        | 10000                    |
| #Pods                  | 3000                       | 150000                   |
| #Pods per node         | min(110, 10*#cores)        | min(110, 10*#cores)      |
| #Services              | 5000                       | 10000                    |
| #All service endpoints | TBD                        | TBD                      |
| #Endpoints per service | 250                        | n/a                      |
| #Secrets               | TBD                        | TBD                      |
| #ConfigMaps            | TBD                        | TBD                      |
| #Deployments           | 2000                       | TBD                      |
| #DaemonSets            | TBD                        | TBD                      |
| #Jobs                  | TBD                        | TBD                      |
| #StatefulSets          | TBD                        | TBD                      |

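Note that the `#Pods per node` row is a formula rather than a constant. A
tiny sketch of evaluating it (the function name is ours):

```go
package main

import "fmt"

// podsPerNodeThreshold evaluates the min(110, 10*#cores) entry from the
// table above for a node with the given number of CPU cores.
func podsPerNodeThreshold(cores int) int {
	if t := 10 * cores; t < 110 {
		return t // small nodes: limited by 10 pods per core
	}
	return 110 // nodes with 11+ cores: flat per-node cap
}

func main() {
	fmt.Println(podsPerNodeThreshold(4))  // 40
	fmt.Println(podsPerNodeThreshold(16)) // 110
}
```
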
There are also thresholds that depend on the environment/cloud provider. The
**NOT exhaustive** list includes:

| Quantity                         | Threshold: scope=namespace | Threshold: scope=cluster |
|----------------------------------|----------------------------|--------------------------|
| #Ingresses                       | TBD                        | TBD                      |
| #PersistentVolumes               | n/a                        | TBD                      |
| #PersistentVolumeClaims          | TBD                        | TBD                      |
| #PersistentVolumeClaims per node | TBD                        | TBD                      |

The rationale for some of those numbers:
1. Total number of objects <br/>
There is a limit on the total number of objects in the system, as it
affects, among others, etcd and its resource consumption.
1. Number of nodes <br/>
We believe that having clusters with more than 5000 nodes is not the best
option and users should consider splitting into multiple clusters. However,
we may consider bumping the long-term goal at some time in the future.
1. Number of services and endpoints <br/>
Each service port and each service backend has a corresponding entry in
iptables. The number of backends of a given service impacts the size of the
`Endpoints` objects, which impacts the size of data that is being sent all
over the system (see the rough estimate after this list).
1. Number of objects of a given type per namespace <br/>
This holds for different objects (pods, secrets, deployments, ...). There
are a number of control loops in the system that need to iterate over all
objects in a given namespace as a reaction to some changes in state. Having
a large number of objects of a given type in a single namespace can make
those loops expensive and slow down processing of those state changes.

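To give a feel for the services/endpoints rationale, a back-of-the-envelope
estimate. The per-backend byte count is a made-up assumption for
illustration, not a measured value:

```go
package main

import "fmt"

func main() {
	const (
		services           = 10000 // cluster-scope threshold (table above)
		backendsPerService = 250   // per-service endpoints threshold (table above)
		bytesPerBackend    = 100   // assumed size of one backend entry; not measured
	)

	// Upper bound: every backend of every service gets iptables entries on
	// every node, so kube-proxy programs on the order of this many rules.
	fmt.Printf("up to ~%d service backend entries per node\n",
		services*backendsPerService) // 2500000

	// A maximal Endpoints object carries all backends of one service; its
	// size bounds the payload pushed to every watcher on each update.
	fmt.Printf("~%d KB per maximal Endpoints object\n",
		backendsPerService*bytesPerBackend/1000) // ~25 KB
}
```
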
[Service Level Indicators]: https://en.wikipedia.org/wiki/Service_level_indicator
[Service Level Objectives]: https://en.wikipedia.org/wiki/Service_level_objective
[Kubecon slides]: https://docs.google.com/presentation/d/1aWjxpY4YJ4KJQUTqaVHdR4sbhwqDiW30EF4_hGCc-gI
[Kubecon talk]: https://www.youtube.com/watch?v=t_Ww6ELKl4Q