Merge pull request #29385 from shannonxtreme/assign-pods-nodes

Refactor Assigning Pods to Nodes
Commit b2097006f0 by Kubernetes Prow Robot on 2022-03-23 01:01:59 -07:00, committed by GitHub
4 changed files with 300 additions and 228 deletions
@ -12,158 +12,181 @@ weight: 20
<!-- overview -->

You can constrain a {{< glossary_tooltip text="Pod" term_id="pod" >}} so that it can only run on a particular set of
{{< glossary_tooltip text="node(s)" term_id="node" >}}.
There are several ways to do this and the recommended approaches all use
[label selectors](/docs/concepts/overview/working-with-objects/labels/) to facilitate the selection.
Generally such constraints are unnecessary, as the scheduler will automatically do a reasonable placement
(for example, spreading your Pods across nodes so as not to place Pods on a node with insufficient free resources).
However, there are some circumstances where you may want to control which node
the Pod deploys to: for example, to ensure that a Pod ends up on a node with an SSD attached to it, or to co-locate Pods from two different
services that communicate a lot into the same availability zone.

<!-- body -->
You can use any of the following methods to choose where Kubernetes schedules
specific Pods:
* [nodeSelector](#nodeselector) field matching against [node labels](#built-in-node-labels)
* [Affinity and anti-affinity](#affinity-and-anti-affinity)
* [nodeName](#nodename) field
## Node labels {#built-in-node-labels}
Like many other Kubernetes objects, nodes have
[labels](/docs/concepts/overview/working-with-objects/labels/). You can [attach labels manually](/docs/tasks/configure-pod-container/assign-pods-nodes/#add-a-label-to-a-node).
Kubernetes also populates a standard set of labels on all nodes in a cluster. See [Well-Known Labels, Annotations and Taints](/docs/reference/labels-annotations-taints/)
for a list of common node labels.
{{<note>}}
The value of these labels is cloud provider specific and is not guaranteed to be reliable.
For example, the value of `kubernetes.io/hostname` may be the same as the node name in some environments
and a different value in other environments.
{{</note>}}
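For example, you can attach a label to a node and verify it with `kubectl` (the node name here is illustrative):

```shell
# Attach the label disktype=ssd to a node of your choice
kubectl label nodes kubernetes-foo-node-1.c.a-robinson.internal disktype=ssd

# Verify that the label was applied
kubectl get nodes --show-labels
```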
### Node isolation/restriction
Adding labels to nodes allows you to target Pods for scheduling on specific
nodes or groups of nodes. You can use this functionality to ensure that specific
Pods only run on nodes with certain isolation, security, or regulatory
properties.
If you use labels for node isolation, choose label keys that the {{<glossary_tooltip text="kubelet" term_id="kubelet">}}
cannot modify. This prevents a compromised node from setting those labels on
itself so that the scheduler schedules workloads onto the compromised node.
The [`NodeRestriction` admission plugin](/docs/reference/access-authn-authz/admission-controllers/#noderestriction)
prevents the kubelet from setting or modifying labels with a
`node-restriction.kubernetes.io/` prefix.
To make use of that label prefix for node isolation:
1. Ensure you are using the [Node authorizer](/docs/reference/access-authn-authz/node/) and have _enabled_ the `NodeRestriction` admission plugin.
2. Add labels with the `node-restriction.kubernetes.io/` prefix to your nodes, and use those labels in your [node selectors](#nodeselector).
For example, `example.com.node-restriction.kubernetes.io/fips=true` or `example.com.node-restriction.kubernetes.io/pci-dss=true`.
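A Pod that should only land on those restricted nodes could then select them with a `nodeSelector` along these lines (the Pod name and container image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fips-workload     # illustrative name
spec:
  nodeSelector:
    # label added to nodes under the node-restriction.kubernetes.io/ prefix
    example.com.node-restriction.kubernetes.io/fips: "true"
  containers:
  - name: app
    image: nginx          # illustrative image
```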
## nodeSelector

`nodeSelector` is the simplest recommended form of node selection constraint.
You can add the `nodeSelector` field to your Pod specification and specify the
[node labels](#built-in-node-labels) you want the target node to have.
Kubernetes only schedules the Pod onto nodes that have each of the labels you
specify.

See [Assign Pods to Nodes](/docs/tasks/configure-pod-container/assign-pods-nodes) for more
information.
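As a quick sketch, a Pod that uses `nodeSelector` to target nodes labelled `disktype=ssd` (as in the task page example) looks like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd
```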
## Affinity and anti-affinity

`nodeSelector` is the simplest way to constrain Pods to nodes with specific
labels. Affinity and anti-affinity expand the types of constraints you can
define. Some of the benefits of affinity and anti-affinity include:

* The affinity/anti-affinity language is more expressive. `nodeSelector` only
  selects nodes with all the specified labels. Affinity/anti-affinity gives you
  more control over the selection logic.
* You can indicate that a rule is *soft* or *preferred*, so that the scheduler
  still schedules the Pod even if it can't find a matching node.
* You can constrain a Pod using labels on other Pods running on the node (or other topological domain),
  instead of just node labels, which allows you to define rules for which Pods
  can be co-located on a node.

The affinity feature consists of two types of affinity:

* *Node affinity* functions like the `nodeSelector` field but is more expressive and
  allows you to specify soft rules.
* *Inter-pod affinity/anti-affinity* allows you to constrain Pods against labels
  on other Pods.
### Node affinity

Node affinity is conceptually similar to `nodeSelector`, allowing you to constrain which nodes your
Pod can be scheduled on based on node labels. There are two types of node
affinity:

* `requiredDuringSchedulingIgnoredDuringExecution`: The scheduler can't
  schedule the Pod unless the rule is met. This functions like `nodeSelector`,
  but with a more expressive syntax.
* `preferredDuringSchedulingIgnoredDuringExecution`: The scheduler tries to
  find a node that meets the rule. If a matching node is not available, the
  scheduler still schedules the Pod.

{{<note>}}
In the preceding types, `IgnoredDuringExecution` means that if the node labels
change after Kubernetes schedules the Pod, the Pod continues to run.
{{</note>}}

You can specify node affinities using the `.spec.affinity.nodeAffinity` field in
your Pod spec.
For example, consider the following Pod spec:

{{< codenew file="pods/pod-with-node-affinity.yaml" >}}

In this example, the following rules apply:

* The node *must* have a label with the key `kubernetes.io/os` and the value
  `linux`.
* The node *preferably* has a label with the key `another-node-label-key` and
  the value `another-node-label-value`.
You can use the `operator` field to specify a logical operator for Kubernetes to use when
interpreting the rules. You can use `In`, `NotIn`, `Exists`, `DoesNotExist`,
`Gt` and `Lt`.

`NotIn` and `DoesNotExist` allow you to define node anti-affinity behavior.
Alternatively, you can use [node taints](/docs/concepts/scheduling-eviction/taint-and-toleration/)
to repel Pods from specific nodes.

{{<note>}}
If you specify both `nodeSelector` and `nodeAffinity`, *both* must be satisfied
for the Pod to be scheduled onto a node.

If you specify multiple `nodeSelectorTerms` associated with `nodeAffinity`
types, then the Pod can be scheduled onto a node if one of the specified
`nodeSelectorTerms` can be satisfied.

If you specify multiple `matchExpressions` associated with a single term in
`nodeSelectorTerms`, then the Pod can be scheduled onto a node only if all the
`matchExpressions` are satisfied.
{{</note>}}
See [Assign Pods to Nodes using Node Affinity](/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/)
for more information.
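To illustrate the matching semantics from the note above, a `nodeAffinity` sketch like the following (the `disktype` and `node-role.example.com/batch` labels are made up for illustration) matches a node that satisfies *either* term, and the first term only matches when *both* of its expressions do:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      # Term 1: both expressions must match (logical AND within a term)
      - matchExpressions:
        - key: kubernetes.io/os
          operator: In
          values:
          - linux
        - key: disktype                        # hypothetical label
          operator: In
          values:
          - ssd
      # Term 2: satisfying this term alone is also enough (logical OR between terms)
      - matchExpressions:
        - key: node-role.example.com/batch     # hypothetical label
          operator: Exists
```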
#### Node affinity weight
You can specify a `weight` between 1 and 100 for each instance of the
`preferredDuringSchedulingIgnoredDuringExecution` affinity type. When the
scheduler finds nodes that meet all the other scheduling requirements of the Pod, the
scheduler iterates through every preferred rule that the node satisfies and adds the
value of the `weight` for that expression to a sum.
The final sum is added to the score of other priority functions for the node.
Nodes with the highest total score are prioritized when the scheduler makes a
scheduling decision for the Pod.
For example, consider the following Pod spec:
{{<codenew file="pods/pod-with-affinity-anti-affinity.yaml">}}
If there are two possible nodes that match the
`requiredDuringSchedulingIgnoredDuringExecution` rule, one with the
`label-1:key-1` label and another with the `label-2:key-2` label, the scheduler
considers the `weight` of each node and adds the weight to the other scores for
that node, and schedules the Pod onto the node with the highest final score.
In this manifest, the node with `label-2:key-2` contributes a preference score of
50 compared to 1 for `label-1:key-1`, so the scheduler favors it when all other
scores are equal.
{{<note>}}
If you want Kubernetes to successfully schedule the Pods in this example, you
must have existing nodes with the `kubernetes.io/os=linux` label.
{{</note>}}
#### Node affinity per scheduling profile

{{< feature-state for_k8s_version="v1.20" state="beta" >}}

When configuring multiple [scheduling profiles](/docs/reference/scheduling/config/#multiple-profiles), you can associate
a profile with a node affinity, which is useful if a profile only applies to a specific set of nodes.
To do so, add an `addedAffinity` to the `args` field of the [`NodeAffinity` plugin](/docs/reference/scheduling/config/#scheduling-plugins)
in the [scheduler configuration](/docs/reference/scheduling/config/). For example:
@ -188,29 +211,41 @@ profiles:
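The scheduler configuration example itself is elided in this hunk. A minimal sketch of what such a configuration could look like, assuming the `kubescheduler.config.k8s.io/v1beta3` API version and an illustrative zone label:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta3   # assumed API version
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: foo-scheduler
    pluginConfig:
      - name: NodeAffinity
        args:
          # addedAffinity applies to every Pod scheduled with this profile
          addedAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone   # illustrative label key
                  operator: In
                  values:
                  - antarctica-east1                 # illustrative value
```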
The `addedAffinity` is applied to all Pods that set `.spec.schedulerName` to `foo-scheduler`, in addition to the
NodeAffinity specified in the PodSpec.
That is, in order to match the Pod, nodes need to satisfy `addedAffinity` and
the Pod's `.spec.affinity.nodeAffinity`.

Since the `addedAffinity` is not visible to end users, its behavior might be
unexpected to them. Use node labels that have a clear correlation to the
scheduler profile name.
{{< note >}}
The DaemonSet controller, which [creates Pods for DaemonSets](/docs/concepts/workloads/controllers/daemonset/#scheduled-by-default-scheduler),
does not support scheduling profiles. When the DaemonSet controller creates
Pods, the default Kubernetes scheduler places those Pods and honors any
`nodeAffinity` rules in the DaemonSet's Pod template.
{{< /note >}}
### Inter-pod affinity and anti-affinity

Inter-pod affinity and anti-affinity allow you to constrain which nodes your
Pods can be scheduled on based on the labels of **Pods** already running on that
node, instead of the node labels.

Inter-pod affinity and anti-affinity rules take the form "this
Pod should (or, in the case of anti-affinity, should not) run in an X if that X
is already running one or more Pods that meet rule Y", where X is a topology
domain like node, rack, cloud provider zone or region, and Y is the
rule Kubernetes tries to satisfy.
You express these rules (Y) as [label selectors](/docs/concepts/overview/working-with-objects/labels/#label-selectors)
with an optional associated list of namespaces. Pods are namespaced objects in
Kubernetes, so Pod labels also implicitly have namespaces. Any label selectors
for Pod labels should specify the namespaces in which Kubernetes should look for those
labels.
You express the topology domain (X) using a `topologyKey`, which is the key for
the node label that the system uses to denote the domain. For examples, see
[Well-Known Labels, Annotations and Taints](/docs/reference/labels-annotations-taints/).
{{< note >}}
Inter-pod affinity and anti-affinity require a substantial amount of
@ -219,80 +254,106 @@ not recommend using them in clusters larger than several hundred nodes.
{{< /note >}}
{{< note >}}
Pod anti-affinity requires nodes to be consistently labelled, in other words,
every node in the cluster must have an appropriate label matching `topologyKey`.
If some or all nodes are missing the specified `topologyKey` label, it can lead
to unintended behavior.
{{< /note >}}
#### Types of inter-pod affinity and anti-affinity

Similar to [node affinity](#node-affinity), there are two types of Pod affinity
and anti-affinity:

* `requiredDuringSchedulingIgnoredDuringExecution`
* `preferredDuringSchedulingIgnoredDuringExecution`
For example, you could use
`requiredDuringSchedulingIgnoredDuringExecution` affinity to tell the scheduler to
co-locate Pods of two services in the same cloud provider zone because they
communicate with each other a lot. Similarly, you could use
`preferredDuringSchedulingIgnoredDuringExecution` anti-affinity to spread Pods
from a service across multiple cloud provider zones.
To use inter-pod affinity, use the `affinity.podAffinity` field in the Pod spec.
For inter-pod anti-affinity, use the `affinity.podAntiAffinity` field in the Pod
spec.
#### Pod affinity example {#an-example-of-a-pod-that-uses-pod-affinity}
Consider the following Pod spec:
{{< codenew file="pods/pod-with-pod-affinity.yaml" >}} {{< codenew file="pods/pod-with-pod-affinity.yaml" >}}
This example defines one Pod affinity rule and one Pod anti-affinity rule. The
Pod affinity rule uses the "hard"
`requiredDuringSchedulingIgnoredDuringExecution`, while the anti-affinity rule
uses the "soft" `preferredDuringSchedulingIgnoredDuringExecution`.

The affinity rule says that the scheduler can only schedule a Pod onto a node if
the node is in the same zone as one or more existing Pods with the label
`security=S1`. More precisely, the scheduler must place the Pod on a node that has the
`topology.kubernetes.io/zone=V` label, as long as there is at least one node in
that zone that currently has one or more Pods with the Pod label `security=S1`.

The anti-affinity rule says that the scheduler should try to avoid scheduling
the Pod onto a node that is in the same zone as one or more Pods with the label
`security=S2`. More precisely, the scheduler should try to avoid placing the Pod on a node that has the
`topology.kubernetes.io/zone=R` label if there are other nodes in the
same zone currently running Pods with the `security=S2` Pod label.

See the
[design doc](https://git.k8s.io/community/contributors/design-proposals/scheduling/podaffinity.md)
for many more examples of Pod affinity and anti-affinity.
You can use the `In`, `NotIn`, `Exists` and `DoesNotExist` values in the
`operator` field for Pod affinity and anti-affinity.

In principle, the `topologyKey` can be any allowed label key, with the following
exceptions for performance and security reasons:

* For Pod affinity and anti-affinity, an empty `topologyKey` field is not allowed in both `requiredDuringSchedulingIgnoredDuringExecution`
  and `preferredDuringSchedulingIgnoredDuringExecution`.
* For `requiredDuringSchedulingIgnoredDuringExecution` Pod anti-affinity rules,
  the admission controller `LimitPodHardAntiAffinityTopology` limits
  `topologyKey` to `kubernetes.io/hostname`. You can modify or disable the
  admission controller if you want to allow custom topologies.

In addition to `labelSelector` and `topologyKey`, you can optionally specify a list
of namespaces which the `labelSelector` should match against using the
`namespaces` field at the same level as `labelSelector` and `topologyKey`.
If omitted or empty, `namespaces` defaults to the namespace of the Pod where the
affinity/anti-affinity definition appears.
#### Namespace selector

{{< feature-state for_k8s_version="v1.22" state="beta" >}}

You can also select matching namespaces using `namespaceSelector`, which is a label query over the set of namespaces.
The affinity term is applied to the union of the namespaces selected by `namespaceSelector` and the ones listed in the `namespaces` field.
Note that an empty `namespaceSelector` ({}) matches all namespaces, while a null or empty `namespaces` list and
a null `namespaceSelector` matches the namespace of the Pod where the rule is defined.

{{<note>}}
This feature is beta and enabled by default. You can disable it via the
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
`PodAffinityNamespaceSelector` in both kube-apiserver and kube-scheduler.
{{</note>}}
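A sketch of a Pod affinity term that combines a `labelSelector` with a `namespaceSelector` (the label keys and values here are illustrative):

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: web-store              # illustrative Pod label
      namespaceSelector:
        matchLabels:
          team: backend               # illustrative namespace label
      topologyKey: topology.kubernetes.io/zone
```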
#### More practical use-cases

Inter-pod affinity and anti-affinity can be even more useful when they are used with higher
level collections such as ReplicaSets, StatefulSets, Deployments, etc. These
rules allow you to configure that a set of workloads should
be co-located in the same defined topology; for example, on the same node.

Take, for example, a three-node cluster running a web application with an
in-memory cache like redis. You could use inter-pod affinity and anti-affinity
to co-locate the web servers with the cache as much as possible.

In the following example Deployment for the redis cache, the replicas get the label `app=store`. The
`podAntiAffinity` rule tells the scheduler to avoid placing multiple replicas
with the `app=store` label on a single node. This creates each cache in a
separate node.
```yaml
apiVersion: apps/v1
@ -324,7 +385,10 @@ spec:
        image: redis:3.2-alpine
```
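Most of that Deployment manifest is elided in the hunk above. Based on the surrounding description, the scheduling-relevant part would look roughly like this sketch (the replica count and container name are illustrative):

```yaml
spec:
  replicas: 3                        # illustrative
  selector:
    matchLabels:
      app: store
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          # never place two app=store replicas on the same node
          - labelSelector:
              matchLabels:
                app: store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server           # illustrative
        image: redis:3.2-alpine
```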
The following Deployment for the web servers creates replicas with the label `app=web-store`. The
Pod affinity rule tells the scheduler to place each replica on a node that has a
Pod with the label `app=store`. The Pod anti-affinity rule tells the scheduler
to avoid placing multiple `app=web-store` servers on a single node.
```yaml
apiVersion: apps/v1
@ -365,56 +429,37 @@ spec:
        image: nginx:1.16-alpine
```
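The web server manifest is likewise elided above; a sketch of its affinity configuration, based on the description (other details illustrative):

```yaml
spec:
  replicas: 3                        # illustrative
  selector:
    matchLabels:
      app: web-store
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          # spread the web servers across nodes
          - labelSelector:
              matchLabels:
                app: web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          # place each web server on a node that runs an app=store Pod
          - labelSelector:
              matchLabels:
                app: store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app                # illustrative
        image: nginx:1.16-alpine
```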
Creating the two preceding Deployments results in the following cluster layout,
where each web server is co-located with a cache, on three separate nodes.

| node-1 | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| *webserver-1* | *webserver-2* | *webserver-3* |
| *cache-1* | *cache-2* | *cache-3* |

See the [ZooKeeper tutorial](/docs/tutorials/stateful-application/zookeeper/#tolerating-node-failure)
for an example of a StatefulSet configured with anti-affinity for high
availability, using the same technique as this example.
## nodeName

`nodeName` is a more direct form of node selection than affinity or
`nodeSelector`. `nodeName` is a field in the Pod spec. If the `nodeName` field
is not empty, the scheduler ignores the Pod and the kubelet on the named node
tries to place the Pod on that node. Using `nodeName` overrules using
`nodeSelector` or affinity and anti-affinity rules.

Some of the limitations of using `nodeName` to select nodes are:

- If the named node does not exist, the Pod will not run, and in
  some cases may be automatically deleted.
- If the named node does not have the resources to accommodate the
  Pod, the Pod will fail and its reason will indicate why,
  for example OutOfmemory or OutOfcpu.
- Node names in cloud environments are not always predictable or
  stable.
Here is an example of a Pod spec using the `nodeName` field:

```yaml
apiVersion: v1
@ -428,21 +473,16 @@ spec:
  nodeName: kube-01
```

The above Pod will only run on the node `kube-01`.
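A complete manifest along these lines (the nginx container is illustrative; `nodeName: kube-01` comes from the elided example) could look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  # bypasses the scheduler: the kubelet on kube-01 runs this Pod
  nodeName: kube-01
```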
## {{% heading "whatsnext" %}}

* Read more about [taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/).
* Read the design docs for [node affinity](https://git.k8s.io/community/contributors/design-proposals/scheduling/nodeaffinity.md)
  and for [inter-pod affinity/anti-affinity](https://git.k8s.io/community/contributors/design-proposals/scheduling/podaffinity.md).
* Learn about how the [topology manager](/docs/tasks/administer-cluster/topology-manager/) takes part in node-level
  resource allocation decisions.
* Learn how to use [nodeSelector](/docs/tasks/configure-pod-container/assign-pods-nodes/).
* Learn how to use [affinity and anti-affinity](/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/).
@ -556,6 +556,7 @@ func TestExampleObjectSchemas(t *testing.T) {
"pod-projected-svc-token": {&api.Pod{}}, "pod-projected-svc-token": {&api.Pod{}},
"pod-rs": {&api.Pod{}, &api.Pod{}}, "pod-rs": {&api.Pod{}, &api.Pod{}},
"pod-single-configmap-env-variable": {&api.Pod{}}, "pod-single-configmap-env-variable": {&api.Pod{}},
"pod-with-affinity-anti-affinity": {&api.Pod{}},
"pod-with-node-affinity": {&api.Pod{}}, "pod-with-node-affinity": {&api.Pod{}},
"pod-with-pod-affinity": {&api.Pod{}}, "pod-with-pod-affinity": {&api.Pod{}},
"pod-with-toleration": {&api.Pod{}}, "pod-with-toleration": {&api.Pod{}},

View File

@ -0,0 +1,32 @@
apiVersion: v1
kind: Pod
metadata:
  name: with-affinity-anti-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: label-1
            operator: In
            values:
            - key-1
      - weight: 50
        preference:
          matchExpressions:
          - key: label-2
            operator: In
            values:
            - key-2
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
@ -8,11 +8,10 @@ spec:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference: