diff --git a/sig-scalability/configs-and-limits/scalability-envelope.png b/sig-scalability/configs-and-limits/scalability-envelope.png
new file mode 100644
index 000000000..46b6479a4
Binary files /dev/null and b/sig-scalability/configs-and-limits/scalability-envelope.png differ
diff --git a/sig-scalability/configs-and-limits/thresholds.md b/sig-scalability/configs-and-limits/thresholds.md
index d8e8d8a65..adef53c54 100644
--- a/sig-scalability/configs-and-limits/thresholds.md
+++ b/sig-scalability/configs-and-limits/thresholds.md
@@ -2,100 +2,73 @@
 
 ## Background
 
-Since 1.6 release Kubernetes officially supports 5000-node clusters. However,
-the question is what that actually means. As of early Q3 2017 we are in the
-process of defining set of performance-related SLIs ([Service Level Indicators])
-and SLOs ([Service Level Objectives]).
+As described in the [How we define scalability] document, it is impossible
+to provide performance guarantees for an arbitrary configuration or workload.
+One of the prerequisites for the SLOs being satisfied is keeping the load in
+the cluster within recommended limits. This document explicitly summarizes
+those dimensions and the recommended limits themselves.
 
-However, no matter what SLIs and SLOs we have, there will always be some users
-coming and saying that their cluster is not meeting the SLOs. And in most cases
-it appears that the reason behind is that we (as developers) have silently
-assumed something (e.g. there will be no more than 10000 services in the
-cluster) and users were not aware of that.
-
-This document is trying to explicitly summarize limits for the number of objects
-in the system that we are aware of and state if we will try to relax them in the
-future or not.
+[How we define scalability]: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/slos.md#how-we-define-scalability
 
 ## Kubernetes thresholds
 
-We start with explicit definition of quantities and thresholds we assume are
-satisfied in the cluster. This is followed by an explanation for some of those.
-Important notes about the numbers:
-1. In most cases, exceeding these thresholds doesn’t mean that the cluster
-   fails over - it just means that its overall performance degrades.
-1. **Some thresholds below (e.g. total number of all objects, or total number of
-   pods or namespaces) are given for the largest possible cluster. For smaller
-   clusters, the limits are proportionally lower.**
-1. The thresholds obviously differ between different Kubernetes releases
-   (hopefully each of them is non-decreasing). The numbers we present are for
-   the current release (Kubernetes 1.7 release).
-1. There are a lot of factors that influence the thresholds, e.g. etcd version
-   or storage data format. For each of those we assume the default from the
-   release to avoid providing numbers for huge number of combinations of those.
-1. The “Head threshold” is representing the status of Kubernetes head. This
-   column should be snapshotted at every release to produce per-release
-   thresholds (and dedicated column for each release should then be added).
+Scalability dimensions and thresholds are a very complex topic. In fact,
+configurations that Kubernetes supports create the `Scalability Envelope`:
 
-| Quantity | Head threshold | 1.8 release | Long term goal |
-|-------------------------------------|----------------|-------------|----------------|
-| Total number of all objects | 250000 | | 1000000 |
-| Number of nodes | 5000 | | 5000 |
-| Number of pods | 150000 | | 500000 |
-| Number of pods per node <sup>1</sup> | 110 | | 500 |
-| Number of pods per core <sup>1</sup> | 10 | | 10 |
-| Number of namespaces (ns) | 10000 | | 100000 |
-| Number of pods per ns | 3000 | | 50000 |
-| Number of services | 10000 | | 100000 |
-| Number of services per ns | 5000 | | 5000 |
-| Number of all services backends | TBD | | 500000 |
-| Number of backends per service | 250 | | 5000 |
-| Number of deployments per ns | 2000 | | 10000 |
-| Number of pods per deployment | TBD | | 10000 |
-| Number of jobs per ns | TBD | | 1000 |
-| Number of daemon sets per ns | TBD | | 100 |
-| Number of stateful sets per ns | TBD | | 100 |
-| Number of secrets per ns | TBD | | TBD |
-| Number of secrets per pod | TBD | | TBD |
-| Number of config maps per ns | TBD | | TBD |
-| Number of config maps per pod | TBD | | TBD |
-| Number of storageclasses | TBD | | TBD |
-| Number of roles and rolebindings | TBD | | TBD |
+![Scalability Envelope](./scalability-envelope.png)
 
-There are also thresholds for other types, but for those the numbers depend
-also on the environment (bare metal or which cloud provider) the cluster is
-running in. These include:
+Some of the properties of the envelope:
+1. It's NOT a cube, because dimensions are sometimes not independent.
+1. It's NOT convex.
+1. As you move farther along one dimension, your cross-section wrt other
+   dimensions gets smaller.
+1. It's bounded.
+1. It's decomposable into smaller envelopes.
 
-| Quantity | Head threshold | 1.8 release | Long term goal |
-|-------------------------------------------|----------------|-------------|----------------|
-| Number of ingresses | TBD | | TBD |
-| Number of PersistentVolumes | TBD | | TBD |
-| Number of PersistentVolumeClaims per ns | TBD | | TBD |
-| Number of PersistentVolumeClaims per node | TBD | | TBD |
+
+You can learn more about it in this [Kubecon talk] (or in the [Kubecon slides]).
+
+There are a couple of caveats to the thresholds we present below:
+1. In the majority of cases, thresholds are NOT hard limits - crossing
+   a limit results in degraded performance and doesn't mean the cluster
+   immediately fails over.
+1. **Many of the thresholds (for cluster scope) are given for the largest
+   possible cluster. For smaller clusters, the limits are proportionally
+   lower** (see the sketch after this list).
+1. The thresholds may differ (hopefully be non-decreasing) across Kubernetes
+   releases. The thresholds below are given for Kubernetes head. <br/>
+   **TODO:** We are planning to start versioning the table below, but we
+   are not there yet.
+1. Given that configuration influences thresholds, we assume a vanilla
+   Kubernetes setup.
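+
+To make the "proportionally lower" caveat concrete, here is a minimal sketch
+(in Go; not part of any Kubernetes tooling). It assumes cluster size is
+measured in nodes and that scaling is linear against the 5000-node maximum
+from the table below - an illustrative assumption, not a guarantee:
+
+```go
+package main
+
+import "fmt"
+
+// scaledThreshold scales a cluster-scope threshold that is defined for the
+// largest supported cluster (5000 nodes) down to a smaller cluster.
+// ASSUMPTION: linear scaling with node count, for illustration only.
+func scaledThreshold(maxThreshold, nodes int) int {
+	const maxNodes = 5000
+	if nodes >= maxNodes {
+		return maxThreshold
+	}
+	return maxThreshold * nodes / maxNodes
+}
+
+func main() {
+	// E.g. the 150000-pod cluster-scope threshold in a 500-node cluster.
+	fmt.Println(scaledThreshold(150000, 500)) // 15000
+}
+```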
+
+The table below is **NOT exhaustive** - more content is coming soon.
+
+| Quantity               | Threshold: scope=namespace | Threshold: scope=cluster |
+|------------------------|----------------------------|--------------------------|
+| #Nodes                 | n/a                        | 5000                     |
+| #Namespaces            | n/a                        | 10000                    |
+| #Pods                  | 3000                       | 150000                   |
+| #Pods per node         | min(110, 10*#cores)        | min(110, 10*#cores)      |
+| #Services              | 5000                       | 10000                    |
+| #All service endpoints | TBD                        | TBD                      |
+| #Endpoints per service | 250                        | n/a                      |
+| #Secrets               | TBD                        | TBD                      |
+| #ConfigMaps            | TBD                        | TBD                      |
+| #Deployments           | 2000                       | TBD                      |
+| #DaemonSets            | TBD                        | TBD                      |
+| #Jobs                  | TBD                        | TBD                      |
+| #StatefulSets          | TBD                        | TBD                      |
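+
+The "#Pods per node" row above combines two caps: a flat limit of 110 pods
+and a limit of 10 pods per core, whichever is smaller. A minimal sketch of
+that rule (illustrative only; `maxPodsPerNode` is a hypothetical helper, not
+a Kubernetes API):
+
+```go
+package main
+
+import "fmt"
+
+// maxPodsPerNode returns the recommended per-node pod threshold from the
+// table above: the minimum of the flat 110-pod cap and 10 pods per core.
+func maxPodsPerNode(cores int) int {
+	const flatCap = 110
+	if perCore := 10 * cores; perCore < flatCap {
+		return perCore
+	}
+	return flatCap
+}
+
+func main() {
+	fmt.Println(maxPodsPerNode(4))  // 40: the per-core cap wins on small nodes
+	fmt.Println(maxPodsPerNode(32)) // 110: the flat cap wins on large nodes
+}
+```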
+
+There are also thresholds that depend on the environment/cloud provider the
+cluster is running in. The **NOT exhaustive** list includes:
+
+| Quantity                         | Threshold: scope=namespace | Threshold: scope=cluster |
+|----------------------------------|----------------------------|--------------------------|
+| #Ingresses                       | TBD                        | TBD                      |
+| #PersistentVolumes               | n/a                        | TBD                      |
+| #PersistentVolumeClaims          | TBD                        | TBD                      |
+| #PersistentVolumeClaims per node | TBD                        | TBD                      |
 
-The rationale for some of those numbers:
-1. Total number of objects <br/>
-There is a limitation on the total number of objects on the system, as this
-affects among others etcd and its resource consumption.
-1. Number of nodes <br/>
-We believe that having clusters with more than 5000 nodes is not the best
-option and users should consider splitting into multiple clusters. However,
-we may consider bumping the long term goal at some time in the future.
-1. Number of services and endpoints <br/>
-Each service port and each service backend has a corresponding entry in
-iptables. Number of backends of a given service impact the size of the
-`Endpoints` objects, which impacts size of data that is being sent all over
-the system.
-1. Number of objects of a given type per namespace <br/>
-This holds for different objects (pods, secrets, deployments, ...). There are
-a number of control loops in the system that need to iterate over all objects
-in a given namespace as a reaction to some changes in state. Having large
-number of objects of a given type in a single namespace can make those loops
-expensive and slow down processing given state changes.
-
----
-<sup>1</sup> The limit for number of pods on a given node is in fact minimum from the “pod per node” and “pods per core times number of cores of a node”.
-
-[Service Level Indicators]: https://en.wikipedia.org/wiki/Service_level_indicator
-[Service Level Objectives]: https://en.wikipedia.org/wiki/Service_level_objective
+[Kubecon slides]: https://docs.google.com/presentation/d/1aWjxpY4YJ4KJQUTqaVHdR4sbhwqDiW30EF4_hGCc-gI
+[Kubecon talk]: https://www.youtube.com/watch?v=t_Ww6ELKl4Q