Add tutorial: Running a Replicated Stateful Application

Anthony Yeh 2016-11-23 14:19:13 -08:00
parent 3d08fd0fa2
commit 38edbd87e6
4 changed files with 529 additions and 0 deletions


@@ -55,3 +55,5 @@ toc:
section:
- title: Running a Single-Instance Stateful Application
path: /docs/tutorials/stateful-application/run-stateful-application/
- title: Running a Replicated Stateful Application
path: /docs/tutorials/replicated-stateful-application/run-replicated-stateful-application/


@@ -0,0 +1,4 @@
You need to either have a dynamic Persistent Volume provisioner with a default
[Storage Class](/docs/user-guide/persistent-volumes/#storageclasses),
or [statically provision Persistent Volumes](/docs/user-guide/persistent-volumes/#provisioning)
yourself to satisfy the Persistent Volume Claims used here.


@@ -21,6 +21,7 @@ each of which has a sequence of steps.
#### Stateful Applications
* [Running a Single-Instance Stateful Application](/docs/tutorials/stateful-application/run-stateful-application/)
* [Running a Replicated Stateful Application](/docs/tutorials/replicated-stateful-application/run-replicated-stateful-application/)
### What's next


@@ -0,0 +1,522 @@
---
assignees:
- bprashanth
- enisoc
- erictune
- foxish
- janetkuo
- kow3ns
- smarterclayton
---
{% capture overview %}
This page shows how to run a replicated stateful application using a
[Stateful Set](/docs/concepts/controllers/statefulsets/) controller.
The example is a MySQL single-master topology with multiple slaves running
asynchronous replication.
Note that **this is not a production configuration**.
In particular, MySQL settings remain on insecure defaults to keep the focus
on general patterns for running stateful applications in Kubernetes.
{% endcapture %}
{% capture prerequisites %}
* {% include task-tutorial-prereqs.md %}
* {% include default-storage-class-prereqs.md %}
* This tutorial assumes you are familiar with
[Persistent Volumes](/docs/user-guide/persistent-volumes/)
and [Stateful Sets](/docs/concepts/controllers/statefulsets/),
as well as other core concepts like Pods, Services and Config Maps.
* Some familiarity with MySQL will help, but this tutorial aims to present
general patterns that should be useful for other systems.
{% endcapture %}
{% capture objectives %}
* Deploy a replicated MySQL topology with a Stateful Set controller.
* Send MySQL client traffic.
* Observe resistance to downtime.
* Scale the Stateful Set up and down.
{% endcapture %}
{% capture lessoncontent %}
### Deploying MySQL
The example MySQL deployment consists of a Config Map, two Services,
and a Stateful Set.
#### Config Map
Create the Config Map by saving the following manifest to `mysql-configmap.yaml`
and running:
```shell
kubectl create -f mysql-configmap.yaml
```
{% include code.html language="yaml" file="mysql-configmap.yaml" ghlink="/docs/tutorials/replicated-stateful-application/mysql-configmap.yaml" %}
This Config Map provides `my.cnf` overrides that let you independently control
configuration on the master and the slaves.
In this case, you want the master to be able to serve replication logs to slaves,
and you want the slaves to reject any writes that don't come via replication.
There's nothing special about the Config Map itself that causes different
portions to apply to different Pods.
Each Pod will decide which portion to look at as it's initializing,
based on information provided by the Stateful Set controller.
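For reference, the overall shape of that Config Map looks roughly like the sketch below: a `master.cnf` section that enables binary logging so the master can serve replication logs, and a `slave.cnf` section that makes slaves reject direct writes. The authoritative contents are in the manifest included above.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
  labels:
    app: mysql
data:
  master.cnf: |
    # Apply this config only on the master.
    [mysqld]
    log-bin
  slave.cnf: |
    # Apply this config only on slaves.
    [mysqld]
    super-read-only
```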
#### Services
Create the Services by saving the following manifest to `mysql-services.yaml`
and running:
```shell
kubectl create -f mysql-services.yaml
```
{% include code.html language="yaml" file="mysql-services.yaml" ghlink="/docs/tutorials/replicated-stateful-application/mysql-services.yaml" %}
The Headless Service provides a home for the DNS entries that the Stateful Set
controller will create for each Pod that's part of the set.
Since the Headless Service is named `mysql`, the Pods will be accessible by
resolving `<pod-name>.mysql` from within any other Pod in the same Kubernetes
cluster and namespace.
The Client Service, called `mysql-read`, is a normal Service with its own
cluster IP that will distribute connections across all MySQL Pods that report
being Ready. The set of potential endpoints includes the master and all slaves.
Note that only read queries can use the load-balanced Client Service.
Since there is only one master, clients should connect directly to the master
Pod (through its DNS entry within the Headless Service) to execute writes.
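A rough sketch of how these two Services differ (see the manifest included above for the authoritative definitions): the Headless Service sets `clusterIP: None`, so it exists only to give each Pod a DNS entry, while `mysql-read` is an ordinary Service that load-balances port 3306 across all Ready Pods.
```yaml
# Headless Service for stable per-Pod DNS entries of the Stateful Set members.
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - name: mysql
    port: 3306
---
# Client Service for load-balanced read connections to any Ready MySQL Pod.
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  labels:
    app: mysql
spec:
  selector:
    app: mysql
  ports:
  - name: mysql
    port: 3306
```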
#### Stateful Set
Finally, create the Stateful Set by saving the following manifest to
`mysql-statefulset.yaml` and running:
```shell
kubectl create -f mysql-statefulset.yaml
```
{% include code.html language="yaml" file="mysql-statefulset.yaml" ghlink="/docs/tutorials/replicated-stateful-application/mysql-statefulset.yaml" %}
You can watch the startup progress by running:
```shell
kubectl get pods -l app=mysql --watch
```
After a while, you should see all 3 Pods become Running:
```
NAME READY STATUS RESTARTS AGE
mysql-0 2/2 Running 0 2m
mysql-1 2/2 Running 0 1m
mysql-2 2/2 Running 0 1m
```
Press **Ctrl+C** to cancel the watch.
If you don't see any progress, make sure you have a dynamic Persistent Volume
provisioner enabled as mentioned in the [prerequisites](#before-you-begin).
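One quick way to check is to look at the Persistent Volume Claims created from the Stateful Set's volume claim template; this sketch assumes the `app=mysql` label used throughout this example:
```shell
kubectl get pvc -l app=mysql
```
If the claims stay in a `Pending` state, no Persistent Volumes are being provisioned or bound for them, and the Pods will stay `Pending` as well.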
This manifest uses a variety of techniques for managing stateful Pods as part of
a Stateful Set. The next section highlights some of these techniques to explain
what happens as the Stateful Set creates Pods.
### Understanding stateful Pod initialization
The Stateful Set controller starts Pods one at a time, in order by their
ordinal index.
It waits until each Pod reports being Ready before starting the next one.
In addition, the controller assigns each Pod a unique, stable name of the form
`<statefulset-name>-<ordinal-index>`.
In this case, that results in Pods named `mysql-0`, `mysql-1`, and `mysql-2`.
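You can check both properties yourself; for example (a quick sketch, assuming a `busybox` image can be pulled in your cluster):
```shell
# List the stable Pod names assigned by the Stateful Set controller.
kubectl get pods -l app=mysql
# Resolve one of the per-Pod DNS entries created through the Headless Service.
kubectl run -i -t --rm dns-test --image=busybox --restart=Never -- \
  nslookup mysql-0.mysql
```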
The Pod template in the above Stateful Set manifest takes advantage of these
properties to perform orderly startup of MySQL replication.
#### Generating configuration
Before starting any of the containers in the Pod spec, the Pod first runs any
[Init Containers](/docs/user-guide/production-pods/#handling-initialization)
in the order defined.
In the Stateful Set manifest, you will find these defined within the
`pod.beta.kubernetes.io/init-containers` annotation.
The first Init Container, named `init-mysql`, generates special MySQL config
files based on the ordinal index.
The script determines the Pod's ordinal index by extracting it from the end of
the Pod name, which is returned by the `hostname` command.
Then it saves the ordinal (with a numeric offset to avoid reserved values)
into a file called `server-id.cnf` in the MySQL `conf.d` directory.
This translates the unique, stable identity provided by the Stateful Set
controller into the domain of MySQL server IDs, which require the same
properties.
The script in the `init-mysql` container also applies either `master.cnf` or
`slave.cnf` from the Config Map by copying the contents into `conf.d`.
Since the example topology consists of a single master and any number of slaves,
the script simply assigns ordinal `0` to be the master and every other Pod to
be a slave.
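The logic of that script looks roughly like the following bash sketch; the `/mnt/conf.d` and `/mnt/config-map` paths are the mount points this example's manifest uses:
```shell
# Determine the ordinal index from the end of the Pod's hostname.
[[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
ordinal=${BASH_REMATCH[1]}
# Write server-id.cnf, offsetting the ordinal to avoid the reserved value 0.
echo [mysqld] > /mnt/conf.d/server-id.cnf
echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
# Ordinal 0 gets the master config; every other Pod gets the slave config.
if [[ $ordinal -eq 0 ]]; then
  cp /mnt/config-map/master.cnf /mnt/conf.d/
else
  cp /mnt/config-map/slave.cnf /mnt/conf.d/
fi
```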
#### Cloning existing data
In general, when a new Pod joins the set as a slave, it must assume the master
may already have data on it. It also must assume that the replication logs may
not go all the way back to the beginning of time.
These conservative assumptions are the key to allowing a running Stateful Set
to scale up and down over time, rather than being fixed at its initial size.
The second Init Container, named `clone-mysql`, performs a clone operation the
first time a slave Pod starts up on an empty Persistent Volume.
That means it copies all existing data from another running Pod,
so its local state is consistent enough to begin replicating from the master.
MySQL itself does not provide a mechanism to do this, so the example uses a
popular open-source tool called Percona XtraBackup.
During the clone, the source MySQL server may suffer reduced performance.
To minimize impact on the master, the script instructs each Pod to clone from
the Pod whose ordinal index is one lower.
This works because the Stateful Set controller will always ensure Pod `N` is
Ready before starting Pod `N+1`.
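The clone logic can be sketched as follows, assuming (as in this example) that each running Pod serves an XtraBackup stream on port 3307:
```shell
# Skip the clone if data already exists, or if this is the master (ordinal 0).
[[ -d /var/lib/mysql/mysql ]] && exit 0
[[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
ordinal=${BASH_REMATCH[1]}
[[ $ordinal -eq 0 ]] && exit 0
# Stream a clone from the Pod one ordinal lower, then prepare it for use.
ncat --recv-only mysql-$(($ordinal-1)).mysql 3307 | xbstream -x -C /var/lib/mysql
xtrabackup --prepare --target-dir=/var/lib/mysql
```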
#### Starting replication
After the Init Containers complete successfully, the regular containers run.
The MySQL Pods consist of a `mysql` container that runs the actual `mysqld`
server, and an `xtrabackup` container that acts as a
[sidecar](http://blog.kubernetes.io/2015/06/the-distributed-system-toolkit-patterns.html).
The `xtrabackup` sidecar looks at the cloned data files and determines if
it's necessary to initialize MySQL replication on the slave.
If so, it waits for `mysqld` to be ready and then executes the
`CHANGE MASTER TO` and `START SLAVE` commands with replication parameters
extracted from the XtraBackup clone files.
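In rough terms, what the sidecar runs looks like this sketch; the log file and position come from the XtraBackup metadata files (such as `xtrabackup_binlog_info`), and the credentials are this example's insecure defaults:
```shell
# MASTER_LOG_FILE and MASTER_LOG_POS are read from the clone's metadata files.
mysql -h 127.0.0.1 <<EOF
CHANGE MASTER TO
  MASTER_HOST='mysql-0.mysql',
  MASTER_USER='root',
  MASTER_PASSWORD='',
  MASTER_LOG_FILE='${MASTER_LOG_FILE}',
  MASTER_LOG_POS=${MASTER_LOG_POS},
  MASTER_CONNECT_RETRY=10;
START SLAVE;
EOF
```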
Once a slave begins replication, by default it remembers its master and
reconnects automatically if the slave restarts or the connection dies.
Also, since slaves look for the master at its stable DNS name (`mysql-0.mysql`),
they will automatically find the master even if it gets a new Pod IP due to
being rescheduled.
Lastly, after starting replication, the `xtrabackup` container listens for
connections from other Pods requesting a data clone.
This server remains up indefinitely in case the Stateful Set scales up, or in
case the next Pod loses its Persistent Volume Claim and needs to redo the clone.
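The listening side can be sketched as a single command, using the same port 3307 convention as the clone sketch above:
```shell
# Serve an XtraBackup stream of the local data to one requesting peer at a time.
exec ncat --listen --keep-open --send-only --max-conns=1 3307 -c \
  "xtrabackup --backup --slave-info --stream=xbstream --host=127.0.0.1 --user=root"
```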
### Sending client traffic
You can send test queries to the master (hostname `mysql-0.mysql`)
by starting a temporary container with the `mysql:5.7` image and running the
`mysql` client binary:
```shell
kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never -- \
  mysql -h mysql-0.mysql <<EOF
CREATE DATABASE test;
CREATE TABLE test.messages (message VARCHAR(250));
INSERT INTO test.messages VALUES ('hello');
EOF
```
Use the hostname `mysql-read` to send test queries to any server that reports
being Ready:
```shell
kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never -- \
  mysql -h mysql-read -e "SELECT * FROM test.messages"
```
You should get output like this:
```
Waiting for pod default/mysql-client to be running, status is Pending, pod ready: false
+---------+
| message |
+---------+
| hello |
+---------+
pod "mysql-client" deleted
```
To demonstrate that the `mysql-read` Service distributes connections across
servers, you can run `SELECT @@server_id` in a loop:
```shell
kubectl run mysql-client-loop --image=mysql:5.7 -i -t --rm --restart=Never -- \
  bash -ic "while sleep 1; do mysql -h mysql-read -e 'SELECT @@server_id,NOW()'; done"
```
You should see the reported `@@server_id` change randomly, since a different
endpoint may be selected upon each connection attempt:
```
+-------------+---------------------+
| @@server_id | NOW() |
+-------------+---------------------+
| 100 | 2006-01-02 15:04:05 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW() |
+-------------+---------------------+
| 102 | 2006-01-02 15:04:06 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW() |
+-------------+---------------------+
| 101 | 2006-01-02 15:04:07 |
+-------------+---------------------+
```
You can press **Ctrl+C** when you want to stop the loop, but it's useful to keep
it running in another window so you can see the effects of the following steps.
### Simulating Pod and Node downtime
To demonstrate the increased availability of reading from the pool of slaves
instead of a single server, keep the `SELECT @@server_id` loop from above
running while you force a Pod out of the Ready state.
#### Break the Readiness Probe
The [readiness probe](/docs/user-guide/production-pods/#liveness-and-readiness-probes-aka-health-checks)
for the `mysql` container runs the command `mysql -h 127.0.0.1 -e 'SELECT 1'`
to make sure the server is up and able to execute queries.
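In the Stateful Set manifest, that probe is declared on the `mysql` container roughly as follows; the timing values here are illustrative, so check the manifest above for the actual ones:
```yaml
readinessProbe:
  exec:
    command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
  initialDelaySeconds: 5
  periodSeconds: 2
  timeoutSeconds: 1
```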
One way to force this readiness probe to fail is to break that command:
```shell
kubectl exec mysql-2 -c mysql -- mv /usr/bin/mysql /usr/bin/mysql.off
```
This reaches into the actual container's filesystem for Pod `mysql-2` and
renames the `mysql` command so the readiness probe can't find it.
After a few seconds, the Pod should report one of its containers as not Ready,
which you can check by running:
```shell
kubectl get pod mysql-2
```
Look for `1/2` in the `READY` column:
```
NAME READY STATUS RESTARTS AGE
mysql-2 1/2 Running 0 3m
```
At this point, you should see your `SELECT @@server_id` loop continue to run,
although it never reports `102` anymore.
Recall that the `init-mysql` script defined `server-id` as `100 + $ordinal`,
so server ID `102` corresponds to Pod `mysql-2`.
Now repair the Pod and it should reappear in the loop output
after a few seconds:
```shell
kubectl exec mysql-2 -c mysql -- mv /usr/bin/mysql.off /usr/bin/mysql
```
#### Delete Pods
The Stateful Set controller also recreates Pods if they're deleted, similar to
what a Replica Set does for stateless Pods. To see this, delete the `mysql-2` Pod:
```shell
kubectl delete pod mysql-2
```
The Stateful Set controller will notice that no `mysql-2` Pod exists anymore,
and will create a new one with the same name and linked to the same
Persistent Volume Claim.
You should see server ID `102` disappear from the loop output for a while
and then return on its own.
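If you want to confirm that the replacement Pod reattached to the same claim, one rough check (relying on the `data-<pod-name>` naming convention of the volume claim template in this example) is:
```shell
kubectl get pod mysql-2 -o yaml | grep claimName
```
This should print `claimName: data-mysql-2`.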
#### Drain a Node
If your Kubernetes cluster has multiple Nodes, you can simulate Node downtime
(such as when Nodes are upgraded) by issuing a
[drain](http://kubernetes.io/docs/user-guide/kubectl/kubectl_drain/).
First determine which Node one of the MySQL Pods is on:
```shell
kubectl get pod mysql-2 -o wide
```
The Node name should show up in the last column:
```
NAME READY STATUS RESTARTS AGE IP NODE
mysql-2 2/2 Running 0 15m 10.244.5.27 kubernetes-minion-group-9l2t
```
Then drain the Node by running the following command, which will cordon it so
no new Pods may schedule there, and then evict any existing Pods.
Replace `<node-name>` with the name of the Node you found in the last step.
This may impact other applications on the Node, so it's best to
**only do this in a test cluster**.
```shell
kubectl drain <node-name> --force --delete-local-data --ignore-daemonsets
```
Now you can watch as the Pod reschedules on a different Node:
```shell
kubectl get pod mysql-2 -o wide --watch
```
It should look something like this:
```
NAME READY STATUS RESTARTS AGE IP NODE
mysql-2 2/2 Terminating 0 15m 10.244.1.56 kubernetes-minion-group-9l2t
[...]
mysql-2 0/2 Pending 0 0s <none> kubernetes-minion-group-fjlm
mysql-2 0/2 Init:0/2 0 0s <none> kubernetes-minion-group-fjlm
mysql-2 0/2 Init:1/2 0 20s 10.244.5.32 kubernetes-minion-group-fjlm
mysql-2 0/2 PodInitializing 0 21s 10.244.5.32 kubernetes-minion-group-fjlm
mysql-2 1/2 Running 0 22s 10.244.5.32 kubernetes-minion-group-fjlm
mysql-2 2/2 Running 0 30s 10.244.5.32 kubernetes-minion-group-fjlm
```
And again, you should see server ID `102` disappear from the
`SELECT @@server_id` loop output for a while and then return.
Now uncordon the Node to return it to a normal state:
```shell
kubectl uncordon <node-name>
```
### Scaling the number of slaves
With MySQL replication, you can scale your read query capacity by adding slaves.
With a Stateful Set, you can do this with a single command:
```shell
kubectl scale --replicas=5 statefulset mysql
```
Watch the new Pods come up by running:
```shell
kubectl get pods -l app=mysql --watch
```
Once they're up, you should see server IDs `103` and `104` start appearing in
the `SELECT @@server_id` loop output.
You can also verify that these new servers have the data you added before they
existed:
```shell
kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never -- \
  mysql -h mysql-3.mysql -e "SELECT * FROM test.messages"
```
```
Waiting for pod default/mysql-client to be running, status is Pending, pod ready: false
+---------+
| message |
+---------+
| hello |
+---------+
pod "mysql-client" deleted
```
Scaling back down is also seamless:
```shell
kubectl scale --replicas=3 statefulset mysql
```
Note, however, that while scaling up creates new Persistent Volume Claims
automatically, scaling down does not automatically delete these PVCs.
This gives you the choice to keep those initialized PVCs around to make
scaling back up quicker, or to extract data before deleting them.
You can see this by running:
```shell
kubectl get pvc -l app=mysql
```
This shows that all 5 PVCs still exist, even though the Stateful Set has been
scaled down to 3:
```
NAME STATUS VOLUME CAPACITY ACCESSMODES AGE
data-mysql-0 Bound pvc-8acbf5dc-b103-11e6-93fa-42010a800002 10Gi RWO 20m
data-mysql-1 Bound pvc-8ad39820-b103-11e6-93fa-42010a800002 10Gi RWO 20m
data-mysql-2 Bound pvc-8ad69a6d-b103-11e6-93fa-42010a800002 10Gi RWO 20m
data-mysql-3 Bound pvc-50043c45-b1c5-11e6-93fa-42010a800002 10Gi RWO 2m
data-mysql-4 Bound pvc-500a9957-b1c5-11e6-93fa-42010a800002 10Gi RWO 2m
```
If you don't intend to reuse the extra PVCs, you can delete them:
```shell
kubectl delete pvc data-mysql-3
kubectl delete pvc data-mysql-4
```
{% endcapture %}
{% capture cleanup %}
* Cancel the `SELECT @@server_id` loop by pressing **Ctrl+C** in its terminal,
or by running the following from another terminal:
```shell
kubectl delete pod mysql-client-loop --now
```
* Delete the Stateful Set. This will also begin terminating the Pods.
```shell
kubectl delete statefulset mysql
```
* Verify that the Pods disappear. They may take some time to finish terminating.
```shell
kubectl get pods -l app=mysql
```
You'll know the Pods have terminated when the above returns:
```
No resources found.
```
* Delete the ConfigMap, Services, and Persistent Volume Claims.
```shell
kubectl delete configmap,service,pvc -l app=mysql
```
{% endcapture %}
{% capture whatsnext %}
* Look in the [Helm Charts repository](https://github.com/kubernetes/charts)
for other stateful application examples.
{% endcapture %}
{% include templates/tutorial.md %}