From 03190b4bedac819dbf80e70e1574ed0f43ab2cda Mon Sep 17 00:00:00 2001
From: eduartua
Date: Wed, 30 Jan 2019 13:31:33 -0600
Subject: [PATCH] file writing-good-e2e-tests.md moved to the new folder
 /devel/sig-testing - URLs updated - tombstone file created

---
 .../sig-testing/writing-good-e2e-tests.md    | 231 +++++++++++++++++
 contributors/devel/writing-good-e2e-tests.md | 232 +----------------
 events/elections/2017/quintonhoole_bio.md    |   2 +-
 events/elections/2018/quintonhoole.md        |   2 +-
 4 files changed, 235 insertions(+), 232 deletions(-)
 create mode 100644 contributors/devel/sig-testing/writing-good-e2e-tests.md

diff --git a/contributors/devel/sig-testing/writing-good-e2e-tests.md b/contributors/devel/sig-testing/writing-good-e2e-tests.md
new file mode 100644
index 000000000..836479c2e
--- /dev/null
+++ b/contributors/devel/sig-testing/writing-good-e2e-tests.md
@@ -0,0 +1,231 @@

# Writing good e2e tests for Kubernetes #

## Patterns and Anti-Patterns ##

### Goals of e2e tests ###

Beyond the obvious goal of providing end-to-end system test coverage, there are a few less obvious goals that you should bear in mind when designing, writing and debugging your end-to-end tests. In particular, "flaky" tests, which pass most of the time but fail intermittently for difficult-to-diagnose reasons, are extremely costly in terms of blurring our regression signals and slowing down our automated merge velocity. Up-front time and effort designing your test to be reliable is very well spent. Bear in mind that we have hundreds of tests, each running in dozens of different environments, and if any test in any test environment fails, we have to assume that we potentially have some sort of regression. So if a significant number of tests fail even only 1% of the time, basic statistics dictates that we will almost never have a "green" regression indicator. Stated another way, writing a test that is only 99% reliable is just about useless in the harsh reality of a CI environment. In fact it's worse than useless, because not only does it not provide a reliable regression indicator, but it also costs a lot of subsequent debugging time and delayed merges.

#### Debuggability ####

If your test fails, its output should explain the reasons for the failure in as much detail as possible. "Timeout" is not a useful error message. "Timed out after 60 seconds waiting for pod xxx to enter running state, still in pending state" is much more useful to someone trying to figure out why your test failed and what to do about it. Specifically, [assertion](https://onsi.github.io/gomega/#making-assertions) code like the following generates rather useless errors:

```
Expect(err).NotTo(HaveOccurred())
```

Rather, [annotate](https://onsi.github.io/gomega/#annotating-assertions) your assertion with something like this:

```
Expect(err).NotTo(HaveOccurred(), "Failed to create %d foobars, only created %d", foobarsReqd, foobarsCreated)
```

On the other hand, overly verbose logging, particularly of non-error conditions, can make it unnecessarily difficult to figure out whether a test failed and, if so, why. So don't log lots of irrelevant stuff either.

#### Ability to run in non-dedicated test clusters ####

To reduce end-to-end delay and improve resource utilization when running e2e tests, we try, where possible, to run large numbers of tests in parallel against the same test cluster. This means that:

1. You should avoid making any assumption (implicit or explicit) that your test is the only thing running against the cluster. For example, assuming that your test can run a pod on every node in a cluster is not safe, as some other tests, running at the same time as yours, might have saturated one or more nodes in the cluster. Similarly, running a pod in the system namespace and assuming that this will increase the count of pods in the system namespace by one is not safe, as some other test might be creating or deleting pods in the system namespace at the same time as your test. If you do legitimately need to write a test like that, make sure to label it ["\[Serial\]"](e2e-tests.md#kinds-of-tests) so that it's easy to identify, and not run in parallel with any other tests (see the sketch after this list).
1. You should avoid doing things to the cluster that make it difficult for other tests to reliably do what they're trying to do at the same time. For example, rebooting nodes, disconnecting network interfaces, or upgrading cluster software as part of your test is likely to violate the assumptions that other tests might have made about a reasonably stable cluster environment. If you need to write such tests, please label them as ["\[Disruptive\]"](e2e-tests.md#kinds-of-tests) so that it's easy to identify them, and not run them in parallel with other tests.
1. You should avoid making assumptions about the Kubernetes API that are not part of the API specification, as your tests will break as soon as these assumptions become invalid. For example, relying on specific Events, Event reasons or Event messages will make your tests very brittle.
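For concreteness, here is a minimal sketch of how such a label is attached (the spec names are hypothetical; `Describe` and `It` come from Ginkgo, introduced below). Labels like `[Serial]` and `[Disruptive]` are plain substrings of a spec's name, which runners then match with flags such as `--ginkgo.skip`:

```
package e2e

import (
	. "github.com/onsi/ginkgo"
)

// The [Disruptive] tag is just part of the Describe text; test runners
// select or exclude such specs by matching the assembled spec name
// (e.g. with --ginkgo.skip=\[Disruptive\]).
var _ = Describe("Nodes [Disruptive]", func() {
	It("should recover when a node is rebooted", func() {
		// ... reboot a node, then verify that its pods are rescheduled ...
	})
})
```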
#### Speed of execution ####

We have hundreds of e2e tests, some of which we run serially, one after the other. If each test takes just a few minutes to run, that very quickly adds up to many, many hours of total execution time. We try to keep such total execution time down to a few tens of minutes at most. Therefore, try (very hard) to keep the execution time of your individual tests below 2 minutes, ideally shorter than that. Concretely, adding inappropriately long 'sleep' statements or other gratuitous waits to tests is a killer. If under normal circumstances your pod enters the running state within 10 seconds, and 99.9% of the time within 30 seconds, it would be gratuitous to wait 5 minutes for this to happen. Rather just fail after 30 seconds, with a clear error message as to why your test failed (e.g. "Pod x failed to become ready after 30 seconds; it usually takes 10 seconds"). If you do have a truly legitimate reason for waiting longer than that, or for writing a test which takes longer than 2 minutes to run, comment very clearly in the code why this is necessary, and label the test as ["\[Slow\]"](e2e-tests.md#kinds-of-tests), so that it's easy to identify and avoid in test runs that are required to complete in a timely manner (for example, those that are run against every code submission before it is allowed to be merged).

Note that completing within, say, 2 minutes only when the test passes is not generally good enough. Your test should also fail in a reasonable time. We have seen tests that, for example, wait up to 10 minutes for each of several pods to become ready. Under good conditions these tests might pass within a few seconds, but if the pods never become ready (e.g. due to a system regression) they take a very long time to fail, and typically cause the entire test run to time out so that no results are produced. Again, this is a lot less useful than a test that fails reliably within a minute or two when the system is not working correctly.
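As an illustration, here is a minimal sketch of a bounded, fail-fast wait, assuming the client-go and apimachinery APIs current when this document was written (the 2-second poll interval and the timeout bound are hypothetical values chosen from the example above):

```
package e2e

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodRunning polls every 2 seconds, gives up after a bound derived
// from the pod's normal startup time (usually ~10s, so 30s is already
// generous), and reports exactly what it was waiting for and what it saw.
func waitForPodRunning(c kubernetes.Interface, ns, name string, timeout time.Duration) error {
	var lastPhase v1.PodPhase
	err := wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		pod, err := c.CoreV1().Pods(ns).Get(name, metav1.GetOptions{})
		if err != nil {
			return false, nil // tolerate transient API errors until the timeout
		}
		lastPhase = pod.Status.Phase
		return lastPhase == v1.PodRunning, nil
	})
	if err != nil {
		return fmt.Errorf("pod %s/%s failed to become Running within %v (last observed phase: %q); it usually takes about 10s", ns, name, timeout, lastPhase)
	}
	return nil
}
```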
#### Resilience to relatively rare, temporary infrastructure glitches or delays ####

Remember that your test will be run many thousands of times, at different times of day and night, probably on different cloud providers, under different load conditions. And often the underlying state of these systems is stored in eventually consistent data stores. So, for example, if a resource creation request is theoretically asynchronous, even if you observe it to be practically synchronous most of the time, write your test to assume that it's asynchronous (e.g. make the "create" call, and poll or watch the resource until it's in the correct state before proceeding). Similarly, don't assume that API endpoints are 100% available. They're not. Under high load conditions, API calls might temporarily fail or time out. In such cases it's appropriate to back off and retry a few times before failing your test completely (in which case make the error message very clear about what happened, e.g. "Retried http://... 3 times - all failed with xxx"). Use the standard retry mechanisms provided in the libraries detailed below.
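For example, here is a minimal back-off-and-retry sketch using `wait.ExponentialBackoff` from apimachinery (the retry counts and intervals are illustrative; in real tests, prefer the shared retry helpers and constants in the libraries described below):

```
package e2e

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// createWithRetry retries a possibly-flaky API call with exponential
// back-off and, if it ultimately gives up, reports how many attempts
// were made and what the last failure was.
func createWithRetry(create func() error) error {
	attempts := 0
	var lastErr error
	backoff := wait.Backoff{Duration: time.Second, Factor: 2.0, Steps: 3}
	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		attempts++
		if lastErr = create(); lastErr != nil {
			return false, nil // transient failure: back off and try again
		}
		return true, nil
	})
	if err != nil {
		return fmt.Errorf("retried create %d times - all failed, last error: %v", attempts, lastErr)
	}
	return nil
}
```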
### Some concrete tools at your disposal ###

Obviously most of the above goals apply to many tests, not just yours. So we've developed a set of reusable test infrastructure, libraries and best practices to help you to do the right thing, or at least do the same thing as other tests, so that if that turns out to be the wrong thing, it can be fixed in one place, not hundreds, to be the right thing.

Here are a few pointers:

+ [E2e Framework](https://git.k8s.io/kubernetes/test/e2e/framework/framework.go): Familiarise yourself with this test framework and how to use it. Among other things, it automatically creates uniquely named namespaces within which your tests can run to avoid name clashes, and reliably automates cleaning up the mess after your test has completed (it just deletes everything in the namespace). This helps to ensure that tests do not leak resources. Note that deleting a namespace (and by implication everything in it) is currently an expensive operation. So the fewer resources you create, the less cleaning up the framework needs to do, and the faster your test (and other tests running concurrently with yours) will complete. Your tests should always use this framework. Trying other home-grown approaches to avoiding name clashes and resource leaks has proven to be a very bad idea.
+ [E2e utils library](https://git.k8s.io/kubernetes/test/e2e/framework/util.go): This handy library provides tons of reusable code for a host of commonly needed test functionality, including waiting for resources to enter specified states, safely and consistently retrying failed operations, usefully reporting errors, and much more. Make sure that you're familiar with what's available there, and use it. Likewise, if you come across a generally useful mechanism that's not yet implemented there, add it so that others can benefit from your brilliance. In particular, pay attention to the variety of timeout- and retry-related constants at the top of that file. Always try to reuse these constants rather than dream up your own values. Even if the values there are not precisely what you would like to use (timeout periods, retry counts etc.), the benefit of having them be consistent and centrally configurable across our entire test suite typically outweighs your personal preferences.
+ **Follow the examples of stable, well-written tests:** Some of our existing end-to-end tests are better written and more reliable than others. A few examples of well-written tests include [Replication Controllers](https://git.k8s.io/kubernetes/test/e2e/apps/rc.go), [Services](https://git.k8s.io/kubernetes/test/e2e/network/service.go), and [Reboot](https://git.k8s.io/kubernetes/test/e2e/lifecycle/reboot.go).
+ [Ginkgo Test Framework](https://github.com/onsi/ginkgo): This is the test library and runner upon which our e2e tests are built. Before you write or refactor a test, read the docs and make sure that you understand how it works. In particular, be aware that every test is uniquely identified and described (e.g. in test reports) by the concatenation of its `Describe` clause and nested `It` clauses. So, for example, `Describe("Pods", ...)` with a nested `It("should be scheduled with cpu and memory limits")` produces the sane test identifier and descriptor `Pods should be scheduled with cpu and memory limits`, which makes it clear what's being tested, and hence what's not working if it fails. Other good examples include:

```
CAdvisor should be healthy on every node
```

and

```
Daemon set should run and stop complex daemon
```

By contrast (these are real examples), the following are less good test descriptors:

```
KubeProxy should test kube-proxy
```

and

```
Nodes [Disruptive] Network when a node becomes unreachable
[replication controller] recreates pods scheduled on the
unreachable node AND allows scheduling of pods on a node after
it rejoins the cluster
```

An improvement might be

```
Unreachable nodes are evacuated and then repopulated upon rejoining [Disruptive]
```
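Putting several of these recommendations together, here is a minimal sketch of a framework-based test whose `Describe`/`It` concatenation yields the descriptor discussed above (the pod spec, image and timeout are illustrative, and `waitForPodRunning` is the hypothetical helper sketched earlier):

```
package e2e

import (
	"time"

	. "github.com/onsi/ginkgo"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/test/e2e/framework"
)

var _ = Describe("Pods", func() {
	// The framework gives each test a uniquely named namespace and
	// deletes it (and everything in it) when the test finishes.
	f := framework.NewDefaultFramework("pods")

	// Reported as: "Pods should be scheduled with cpu and memory limits".
	It("should be scheduled with cpu and memory limits", func() {
		pod := &v1.Pod{
			ObjectMeta: metav1.ObjectMeta{Name: "limits-pod"},
			Spec: v1.PodSpec{
				Containers: []v1.Container{{
					Name:  "pause",
					Image: "k8s.gcr.io/pause:3.1",
					Resources: v1.ResourceRequirements{
						Limits: v1.ResourceList{
							v1.ResourceCPU:    resource.MustParse("100m"),
							v1.ResourceMemory: resource.MustParse("64Mi"),
						},
					},
				}},
			},
		}
		_, err := f.ClientSet.CoreV1().Pods(f.Namespace.Name).Create(pod)
		framework.ExpectNoError(err, "Failed to create pod %s in namespace %s", pod.Name, f.Namespace.Name)

		// Fail fast, with a clear message, rather than hanging for minutes.
		err = waitForPodRunning(f.ClientSet, f.Namespace.Name, pod.Name, 30*time.Second)
		framework.ExpectNoError(err, "Pod %s never became Running", pod.Name)
	})
})
```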
Note that opening issues for specific better tooling is welcome, and code implementing that tooling is even more welcome :-).

diff --git a/contributors/devel/writing-good-e2e-tests.md b/contributors/devel/writing-good-e2e-tests.md
index 836479c2e..b39208eb7 100644
--- a/contributors/devel/writing-good-e2e-tests.md
+++ b/contributors/devel/writing-good-e2e-tests.md
@@ -1,231 +1,3 @@

# Writing good e2e tests for Kubernetes #

## Patterns and Anti-Patterns ##

### Goals of e2e tests ###

Beyond the obvious goal of providing end-to-end system test coverage, there are a few less obvious goals that you should bear in mind when designing, writing and debugging your end-to-end tests. In particular, "flaky" tests, which pass most of the time but fail intermittently for difficult-to-diagnose reasons, are extremely costly in terms of blurring our regression signals and slowing down our automated merge velocity. Up-front time and effort designing your test to be reliable is very well spent. Bear in mind that we have hundreds of tests, each running in dozens of different environments, and if any test in any test environment fails, we have to assume that we potentially have some sort of regression. So if a significant number of tests fail even only 1% of the time, basic statistics dictates that we will almost never have a "green" regression indicator. Stated another way, writing a test that is only 99% reliable is just about useless in the harsh reality of a CI environment. In fact it's worse than useless, because not only does it not provide a reliable regression indicator, but it also costs a lot of subsequent debugging time and delayed merges.

#### Debuggability ####

If your test fails, its output should explain the reasons for the failure in as much detail as possible. "Timeout" is not a useful error message. "Timed out after 60 seconds waiting for pod xxx to enter running state, still in pending state" is much more useful to someone trying to figure out why your test failed and what to do about it. Specifically, [assertion](https://onsi.github.io/gomega/#making-assertions) code like the following generates rather useless errors:

```
Expect(err).NotTo(HaveOccurred())
```

Rather, [annotate](https://onsi.github.io/gomega/#annotating-assertions) your assertion with something like this:

```
Expect(err).NotTo(HaveOccurred(), "Failed to create %d foobars, only created %d", foobarsReqd, foobarsCreated)
```

On the other hand, overly verbose logging, particularly of non-error conditions, can make it unnecessarily difficult to figure out whether a test failed and, if so, why. So don't log lots of irrelevant stuff either.

#### Ability to run in non-dedicated test clusters ####

To reduce end-to-end delay and improve resource utilization when running e2e tests, we try, where possible, to run large numbers of tests in parallel against the same test cluster. This means that:

1. You should avoid making any assumption (implicit or explicit) that your test is the only thing running against the cluster. For example, assuming that your test can run a pod on every node in a cluster is not safe, as some other tests, running at the same time as yours, might have saturated one or more nodes in the cluster. Similarly, running a pod in the system namespace and assuming that this will increase the count of pods in the system namespace by one is not safe, as some other test might be creating or deleting pods in the system namespace at the same time as your test. If you do legitimately need to write a test like that, make sure to label it ["\[Serial\]"](e2e-tests.md#kinds-of-tests) so that it's easy to identify, and not run in parallel with any other tests.
1. You should avoid doing things to the cluster that make it difficult for other tests to reliably do what they're trying to do at the same time. For example, rebooting nodes, disconnecting network interfaces, or upgrading cluster software as part of your test is likely to violate the assumptions that other tests might have made about a reasonably stable cluster environment. If you need to write such tests, please label them as ["\[Disruptive\]"](e2e-tests.md#kinds-of-tests) so that it's easy to identify them, and not run them in parallel with other tests.
1. You should avoid making assumptions about the Kubernetes API that are not part of the API specification, as your tests will break as soon as these assumptions become invalid. For example, relying on specific Events, Event reasons or Event messages will make your tests very brittle.
#### Speed of execution ####

We have hundreds of e2e tests, some of which we run serially, one after the other. If each test takes just a few minutes to run, that very quickly adds up to many, many hours of total execution time. We try to keep such total execution time down to a few tens of minutes at most. Therefore, try (very hard) to keep the execution time of your individual tests below 2 minutes, ideally shorter than that. Concretely, adding inappropriately long 'sleep' statements or other gratuitous waits to tests is a killer. If under normal circumstances your pod enters the running state within 10 seconds, and 99.9% of the time within 30 seconds, it would be gratuitous to wait 5 minutes for this to happen. Rather just fail after 30 seconds, with a clear error message as to why your test failed (e.g. "Pod x failed to become ready after 30 seconds; it usually takes 10 seconds"). If you do have a truly legitimate reason for waiting longer than that, or for writing a test which takes longer than 2 minutes to run, comment very clearly in the code why this is necessary, and label the test as ["\[Slow\]"](e2e-tests.md#kinds-of-tests), so that it's easy to identify and avoid in test runs that are required to complete in a timely manner (for example, those that are run against every code submission before it is allowed to be merged).

Note that completing within, say, 2 minutes only when the test passes is not generally good enough. Your test should also fail in a reasonable time. We have seen tests that, for example, wait up to 10 minutes for each of several pods to become ready. Under good conditions these tests might pass within a few seconds, but if the pods never become ready (e.g. due to a system regression) they take a very long time to fail, and typically cause the entire test run to time out so that no results are produced. Again, this is a lot less useful than a test that fails reliably within a minute or two when the system is not working correctly.

#### Resilience to relatively rare, temporary infrastructure glitches or delays ####

Remember that your test will be run many thousands of times, at different times of day and night, probably on different cloud providers, under different load conditions. And often the underlying state of these systems is stored in eventually consistent data stores. So, for example, if a resource creation request is theoretically asynchronous, even if you observe it to be practically synchronous most of the time, write your test to assume that it's asynchronous (e.g. make the "create" call, and poll or watch the resource until it's in the correct state before proceeding). Similarly, don't assume that API endpoints are 100% available. They're not. Under high load conditions, API calls might temporarily fail or time out. In such cases it's appropriate to back off and retry a few times before failing your test completely (in which case make the error message very clear about what happened, e.g. "Retried http://... 3 times - all failed with xxx"). Use the standard retry mechanisms provided in the libraries detailed below.
### Some concrete tools at your disposal ###

Obviously most of the above goals apply to many tests, not just yours. So we've developed a set of reusable test infrastructure, libraries and best practices to help you to do the right thing, or at least do the same thing as other tests, so that if that turns out to be the wrong thing, it can be fixed in one place, not hundreds, to be the right thing.

Here are a few pointers:

+ [E2e Framework](https://git.k8s.io/kubernetes/test/e2e/framework/framework.go): Familiarise yourself with this test framework and how to use it. Among other things, it automatically creates uniquely named namespaces within which your tests can run to avoid name clashes, and reliably automates cleaning up the mess after your test has completed (it just deletes everything in the namespace). This helps to ensure that tests do not leak resources. Note that deleting a namespace (and by implication everything in it) is currently an expensive operation. So the fewer resources you create, the less cleaning up the framework needs to do, and the faster your test (and other tests running concurrently with yours) will complete. Your tests should always use this framework. Trying other home-grown approaches to avoiding name clashes and resource leaks has proven to be a very bad idea.
+ [E2e utils library](https://git.k8s.io/kubernetes/test/e2e/framework/util.go): This handy library provides tons of reusable code for a host of commonly needed test functionality, including waiting for resources to enter specified states, safely and consistently retrying failed operations, usefully reporting errors, and much more. Make sure that you're familiar with what's available there, and use it. Likewise, if you come across a generally useful mechanism that's not yet implemented there, add it so that others can benefit from your brilliance. In particular, pay attention to the variety of timeout- and retry-related constants at the top of that file. Always try to reuse these constants rather than dream up your own values. Even if the values there are not precisely what you would like to use (timeout periods, retry counts etc.), the benefit of having them be consistent and centrally configurable across our entire test suite typically outweighs your personal preferences.
+ **Follow the examples of stable, well-written tests:** Some of our existing end-to-end tests are better written and more reliable than others. A few examples of well-written tests include [Replication Controllers](https://git.k8s.io/kubernetes/test/e2e/apps/rc.go), [Services](https://git.k8s.io/kubernetes/test/e2e/network/service.go), and [Reboot](https://git.k8s.io/kubernetes/test/e2e/lifecycle/reboot.go).
+ [Ginkgo Test Framework](https://github.com/onsi/ginkgo): This is the test library and runner upon which our e2e tests are built. Before you write or refactor a test, read the docs and make sure that you understand how it works. In particular, be aware that every test is uniquely identified and described (e.g. in test reports) by the concatenation of its `Describe` clause and nested `It` clauses. So, for example, `Describe("Pods", ...)` with a nested `It("should be scheduled with cpu and memory limits")` produces the sane test identifier and descriptor `Pods should be scheduled with cpu and memory limits`, which makes it clear what's being tested, and hence what's not working if it fails.
Other good examples include:

```
CAdvisor should be healthy on every node
```

and

```
Daemon set should run and stop complex daemon
```

By contrast (these are real examples), the following are less good test descriptors:

```
KubeProxy should test kube-proxy
```

and

```
Nodes [Disruptive] Network when a node becomes unreachable
[replication controller] recreates pods scheduled on the
unreachable node AND allows scheduling of pods on a node after
it rejoins the cluster
```

An improvement might be

```
Unreachable nodes are evacuated and then repopulated upon rejoining [Disruptive]
```

Note that opening issues for specific better tooling is welcome, and code implementing that tooling is even more welcome :-).

+This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/writing-good-e2e-tests.md.
+This file is a placeholder to preserve links. Please remove by April 30, 2019 or the release of Kubernetes 1.13, whichever comes first.
\ No newline at end of file

diff --git a/events/elections/2017/quintonhoole_bio.md b/events/elections/2017/quintonhoole_bio.md
index d72d6aa00..cee52cdd8 100644
--- a/events/elections/2017/quintonhoole_bio.md
+++ b/events/elections/2017/quintonhoole_bio.md
@@ -28,7 +28,7 @@ I have an M.Sc in Computer Science.
 # Highlights of What I've done on Kubernetes thus far
 
 Initially I led the effort to get our CI Testing stuff initiated and
-[working effectively](https://github.com/kubernetes/community/blob/master/contributors/devel/writing-good-e2e-tests.md), which was critical to being able to launch v1.0 successfully,
+[working effectively](/contributors/devel/sig-testing/writing-good-e2e-tests.md), which was critical to being able to launch v1.0 successfully,
 way back in July 2015. Around that time I handed it over to the now-famous SIG-Testing.
 
 After that I initiated and led the [Cluster
diff --git a/events/elections/2018/quintonhoole.md b/events/elections/2018/quintonhoole.md
index 416e8f658..0120e91a8 100644
--- a/events/elections/2018/quintonhoole.md
+++ b/events/elections/2018/quintonhoole.md
@@ -30,7 +30,7 @@ I have an M.Sc in Computer Science.
 1. steering committee member since October 2017.
 2. active contributor to [SIG-Architecture](https://github.com/kubernetes/community/tree/master/sig-architecture).
 3. lead [SIG-Multicluster](https://github.com/kubernetes/community/tree/master/sig-multicluster)
-4. early lead getting CI Testing [working effectively](https://github.com/kubernetes/community/blob/master/contributors/devel/writing-good-e2e-tests.md), critical to v1.0 launch.
+4. early lead getting CI Testing [working effectively](/contributors/devel/sig-testing/writing-good-e2e-tests.md), critical to v1.0 launch.
 5. Helped [SIG-Scalability set key objectives](https://github.com/kubernetes/community/blob/master/sig-scalability/goals.md).