Merge pull request #3768 from wojtek-t/update_thresholds
Update scalability thresholds document
commit 2cf93d0d6a

## Background

Since the 1.6 release, Kubernetes officially supports 5000-node clusters.
However, the question is what that actually means. As of early Q3 2017 we are
in the process of defining a set of performance-related SLIs
([Service Level Indicators]) and SLOs ([Service Level Objectives]).
As described in the [How we define scalability] document, it is impossible
to provide guarantees in a generic situation. One of the prerequisites for
the SLOs being satisfied is keeping the load in the cluster within the
recommended limits. This document tries to explicitly summarize those
dimensions and the limits themselves.

However, no matter what SLIs and SLOs we have, there will always be users
reporting that their cluster does not meet the SLOs. In most cases the reason
turns out to be that we (as developers) silently assumed something (e.g. that
there will be no more than 10000 services in the cluster) and users were not
aware of that. Where we know, we also state whether we will try to relax a
given limit in the future.

[How we define scalability]: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/slos.md#how-we-define-scalability

## Kubernetes thresholds

Scalability dimensions and thresholds are a very complex topic. In fact, the
configurations that Kubernetes supports create a `Scalability Envelope`:

[Scalability Envelope diagram - binary image (98 KiB) added in this commit, not shown]

Some of the properties of the envelope:
1. It's NOT a cube, because the dimensions are sometimes not independent.
1. It's NOT convex.
1. As you move farther along one dimension, your cross-section wrt other
   dimensions gets smaller.
1. It's bounded.
1. It's decomposable into smaller envelopes.

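To make the first and third properties concrete, here is a minimal sketch.
The constants are the thresholds from the table below; the helper name and
the simple integer-division trade-off model are our own illustration, not an
official formula:

```go
package main

import "fmt"

const (
	maxPods        = 150000 // cluster-scope pod threshold (see table below)
	maxPodsPerNode = 110    // flat per-node cap (see table below)
)

// effectivePodsPerNode shows why the envelope is not a cube: the usable
// "pods per node" budget depends on another dimension (the node count),
// because the cluster-wide pod threshold also has to hold.
func effectivePodsPerNode(nodes int) int {
	budget := maxPods / nodes
	if budget > maxPodsPerNode {
		return maxPodsPerNode
	}
	return budget
}

func main() {
	for _, n := range []int{100, 1000, 5000} {
		fmt.Printf("nodes=%4d -> pods-per-node budget=%3d\n", n, effectivePodsPerNode(n))
	}
	// nodes= 100 -> 110 (per-node cap binds)
	// nodes=1000 -> 110
	// nodes=5000 ->  30 (cluster-wide pod threshold binds)
}
```

At 5000 nodes the per-node budget shrinks from 110 to 30 pods: moving far
along the "number of nodes" dimension makes the cross-section along the
"pods per node" dimension smaller.
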
You can learn more about it in this [Kubecon talk] (or [Kubecon slides]).

There are a couple of caveats to the thresholds we are presenting below:
1. In the majority of cases, thresholds are NOT hard limits - crossing
   the limit results in degraded performance and doesn't mean the cluster
   immediately fails over.
1. **Many of the thresholds (for cluster scope) are given for the largest
   possible cluster. For smaller clusters, the limits are proportionally
   lower** (see the sketch after this list).
1. The thresholds may differ (hopefully be non-decreasing) across Kubernetes
   releases. The thresholds below are given for Kubernetes head. <br/>
   **TODO:** We are planning to start versioning the table below, but we
   are not there yet.
1. Given that configuration influences thresholds, we assume a vanilla
   Kubernetes setup.

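A minimal sketch of what "proportionally lower" could mean in practice. The
linear model and the helper name are our assumption for illustration, not an
official formula:

```go
package main

import "fmt"

const maxClusterNodes = 5000 // largest officially supported cluster

// scaledThreshold scales a cluster-scope threshold down linearly for a
// smaller cluster. Linear scaling is an assumption for illustration.
func scaledThreshold(thresholdAtMax, nodes int) int {
	if nodes >= maxClusterNodes {
		return thresholdAtMax
	}
	return thresholdAtMax * nodes / maxClusterNodes
}

func main() {
	// In a 500-node cluster, expect roughly a tenth of the cluster-scope
	// thresholds from the table below.
	fmt.Println(scaledThreshold(150000, 500)) // pods:     15000
	fmt.Println(scaledThreshold(10000, 500))  // services: 1000
}
```
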
The table below is **NOT exhaustive** - more content is coming soon.

| Quantity               | Threshold: scope=namespace | Threshold: scope=cluster |
|------------------------|----------------------------|--------------------------|
| #Nodes                 | n/a                        | 5000                     |
| #Namespaces            | n/a                        | 10000                    |
| #Pods                  | 3000                       | 150000                   |
| #Pods per node         | min(110, 10*#cores)        | min(110, 10*#cores)      |
| #Services              | 5000                       | 10000                    |
| #All service endpoints | TBD                        | TBD                      |
| #Endpoints per service | 250                        | n/a                      |
| #Secrets               | TBD                        | TBD                      |
| #ConfigMaps            | TBD                        | TBD                      |
| #Deployments           | 2000                       | TBD                      |
| #DaemonSets            | TBD                        | TBD                      |
| #Jobs                  | TBD                        | TBD                      |
| #StatefulSets          | TBD                        | TBD                      |

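Note that the `#Pods per node` row is a formula rather than a constant. A
tiny sketch of evaluating it (the function name is ours):

```go
package main

import "fmt"

// podsPerNodeThreshold evaluates the min(110, 10*#cores) entry from the
// table above for a node with the given number of CPU cores.
func podsPerNodeThreshold(cores int) int {
	if t := 10 * cores; t < 110 {
		return t // small nodes: limited by 10 pods per core
	}
	return 110 // nodes with 11+ cores: flat per-node cap
}

func main() {
	fmt.Println(podsPerNodeThreshold(4))  // 40
	fmt.Println(podsPerNodeThreshold(16)) // 110
}
```
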
There are also thresholds that depend on the environment/cloud provider. The
**NOT exhaustive** list includes:

| Quantity                         | Threshold: scope=namespace | Threshold: scope=cluster |
|----------------------------------|----------------------------|--------------------------|
| #Ingresses                       | TBD                        | TBD                      |
| #PersistentVolumes               | n/a                        | TBD                      |
| #PersistentVolumeClaims          | TBD                        | TBD                      |
| #PersistentVolumeClaims per node | TBD                        | TBD                      |

The rationale for some of those numbers:
1. Total number of objects <br/>
There is a limit on the total number of objects in the system, as it
affects, among others, etcd and its resource consumption.
1. Number of nodes <br/>
We believe that having clusters with more than 5000 nodes is not the best
option and users should consider splitting into multiple clusters. However,
we may consider bumping the long-term goal at some time in the future.
1. Number of services and endpoints <br/>
Each service port and each service backend has a corresponding entry in
iptables. The number of backends of a given service impacts the size of the
`Endpoints` objects, which impacts the size of data that is being sent all
over the system (see the rough estimate after this list).
1. Number of objects of a given type per namespace <br/>
This holds for different objects (pods, secrets, deployments, ...). There
are a number of control loops in the system that need to iterate over all
objects in a given namespace as a reaction to some changes in state. Having
a large number of objects of a given type in a single namespace can make
those loops expensive and slow down processing of those state changes.

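To give a feel for the services/endpoints rationale, a back-of-the-envelope
estimate. The per-backend byte count is a made-up assumption for
illustration, not a measured value:

```go
package main

import "fmt"

func main() {
	const (
		services           = 10000 // cluster-scope threshold (table above)
		backendsPerService = 250   // per-service endpoints threshold (table above)
		bytesPerBackend    = 100   // assumed size of one backend entry; not measured
	)

	// Upper bound: every backend of every service gets iptables entries on
	// every node, so kube-proxy programs on the order of this many rules.
	fmt.Printf("up to ~%d service backend entries per node\n",
		services*backendsPerService) // 2500000

	// A maximal Endpoints object carries all backends of one service; its
	// size bounds the payload pushed to every watcher on each update.
	fmt.Printf("~%d KB per maximal Endpoints object\n",
		backendsPerService*bytesPerBackend/1000) // ~25 KB
}
```
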
[Service Level Indicators]: https://en.wikipedia.org/wiki/Service_level_indicator
[Service Level Objectives]: https://en.wikipedia.org/wiki/Service_level_objective
[Kubecon slides]: https://docs.google.com/presentation/d/1aWjxpY4YJ4KJQUTqaVHdR4sbhwqDiW30EF4_hGCc-gI
[Kubecon talk]: https://www.youtube.com/watch?v=t_Ww6ELKl4Q