# Node End-To-End (e2e) tests
Node e2e tests are component tests meant for testing the Kubelet code on a custom host environment. Tests can be run either locally or against a host running on Google Compute Engine (GCE). Node e2e tests are run as both pre- and post-submit tests by the Kubernetes project.
*Note: Linux only. Mac and Windows unsupported.*
*Note: There is no scheduler running. The e2e tests have to do manual scheduling, e.g. by using `framework.PodClient`.*
# Running tests
## Locally
Why run tests *locally*? It is much faster than running tests remotely.
Prerequisites:
- [Install etcd](https://github.com/coreos/etcd/releases) and include the path to the installation in your PATH
  - Verify etcd is installed correctly by running `which etcd`
  - Or make the etcd binary available and executable at `/tmp/etcd` (a sketch follows this list)
- [Install ginkgo](https://github.com/onsi/ginkgo) and include the path to the installation in your PATH
  - Verify ginkgo is installed correctly by running `which ginkgo`
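As a rough sketch (the etcd version below is an assumption, not something this doc pins), the binary can be made available at `/tmp/etcd` like so:
```sh
# Download an etcd release tarball and place the binary at /tmp/etcd.
# v3.5.0 is only an example; any recent release should work.
ETCD_VER=v3.5.0
curl -L "https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz" \
  | tar xz -C /tmp
cp "/tmp/etcd-${ETCD_VER}-linux-amd64/etcd" /tmp/etcd
chmod +x /tmp/etcd
```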
From the Kubernetes base directory, run:
```sh
make test-e2e-node
```
This will run the *ginkgo* binary against the subdirectory *test/e2e_node*, which will in turn:
- Ask for sudo access (needed for running some of the processes)
- Build the Kubernetes source code
- Pre-pull docker images used by the tests
- Start a local instance of *etcd*
- Start a local instance of *kube-apiserver*
- Start a local instance of *kubelet*
- Run the test using the locally started processes
- Output the test results to STDOUT
- Stop *kubelet*, *kube-apiserver*, and *etcd*
To view the settings and print help, run:
```sh
make test-e2e-node PRINT_HELP=y
```
## Remotely
Why run tests *remotely*? Tests will be run in a customized testing environment. This environment closely mimics the pre- and post- submit testing performed by the project.
Prerequisites:
- [Join the googlegroup](https://groups.google.com/forum/#!forum/kubernetes-dev) `kubernetes-dev@googlegroups.com`
  - *This provides read access to the node test images.*
- Set up a [Google Cloud Platform](https://cloud.google.com/) account and project with Google Compute Engine enabled
- Install and set up the [gcloud sdk](https://cloud.google.com/sdk/downloads)
  - Set your project and a zone by running `gcloud config set project $PROJECT` and `gcloud config set compute/zone $ZONE`
  - Verify the sdk is set up correctly by running `gcloud compute instances list` and `gcloud compute images list --project kubernetes-node-e2e-images` (a consolidated sketch follows this list)
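Putting the gcloud steps together, a minimal setup session might look like the following (the project ID and zone are placeholders):
```sh
# One-time gcloud setup; substitute your own project ID and preferred zone.
gcloud config set project my-gcp-project      # placeholder project ID
gcloud config set compute/zone us-central1-b  # placeholder zone

# Sanity checks: both commands should succeed without errors.
gcloud compute instances list
gcloud compute images list --project kubernetes-node-e2e-images
```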
Run:
```sh
make test-e2e-node REMOTE=true
```
This will:
- Build the Kubernetes source code
- Create a new GCE instance using the default test image
  - The instance will be named something like **test-cos-beta-81-12871-44-0**
- Look up the instance's public IP address
- Copy a compressed archive to the host containing the following binaries:
  - ginkgo
  - kubelet
  - kube-apiserver
  - e2e_node.test (this binary contains the actual tests to be run)
- Unzip the archive to a directory under **/tmp/gcloud**
- Run the tests using the `ginkgo` command
  - This starts etcd, kube-apiserver, and kubelet
  - The ginkgo command is used because it supports more features than running the test binary directly
- Output the remote test results to STDOUT
- `scp` the log files back to the local host under /tmp/_artifacts/e2e-node-containervm-v20160321-image
- Stop the processes on the remote host
- **Leave the GCE instance running**
- **Leave the GCE instance running**
**Note: Subsequent tests run using the same image will *reuse the existing host* instead of deleting it and
provisioning a new one. To delete the GCE instance after each test, see
*[DELETE_INSTANCES](#delete-instance-after-tests-run)*.**
## Additional Remote Options
### Run tests using different images
This is useful if you want to run tests against a host using a different OS distro or container runtime than
provided by the default image.
List the available test images using gcloud.
```sh
make test-e2e-node LIST_IMAGES=true
```
This will output a list of the available images for the default image project.
Then run:
```sh
make test-e2e-node REMOTE=true IMAGES="<comma-separated-list-images>"
```
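For instance, picking entries from the `LIST_IMAGES` output might look like this (the image names below are illustrative and may no longer exist):
```sh
# Run the suite against two images, one instance per image
make test-e2e-node REMOTE=true IMAGES="cos-stable-60-9592-84-0,cos-beta-81-12871-44-0"
```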
### Run tests against a running GCE instance (not an image)
This is useful if you already have a host instance running and want to run the tests there instead of on a new instance.
```sh
make test-e2e-node REMOTE=true HOSTS="<comma-separated-list-of-hostnames>"
```
### Delete instance after tests run
This is useful if you want to recreate the instance for each test run, e.g. to trigger flakes related to starting the instance.
```sh
make test-e2e-node REMOTE=true DELETE_INSTANCES=true
```
### Keep instance, test binaries, and processes around after tests run
This is useful if you want to manually inspect or debug the kubelet process run as part of the tests.
```sh
make test-e2e-node REMOTE=true CLEANUP=false
```
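With everything left running, you can then SSH in and inspect the test directory (the instance name is printed in the test output; the one below is a placeholder):
```sh
# List the unpacked test binaries and logs under /tmp/gcloud on the instance
gcloud compute ssh test-cos-beta-81-12871-44-0 --command="ls /tmp/gcloud"
```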
### Run tests using an image in another project
This is useful if you want to create your own host image in another project and use it for testing.
```sh
make test-e2e-node REMOTE=true IMAGE_PROJECT="<name-of-project-with-images>" IMAGES="<image-name>"
```
Setting up your own host image may require additional steps such as installing etcd or docker. See
[setup_host.sh](https://git.k8s.io/kubernetes/test/e2e_node/environment/setup_host.sh) for common steps to setup hosts to run node tests.
### Create instances using a different instance name prefix
This is useful if you want to create instances using a different name so that you can run multiple copies of the
test in parallel against different instances of the same image.
```sh
make test-e2e-node REMOTE=true INSTANCE_PREFIX="my-prefix"
```
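For example, two copies of the suite can run side by side against the same image, each on its own instance (run these from separate terminals; the prefixes are arbitrary):
```sh
# Terminal 1
make test-e2e-node REMOTE=true INSTANCE_PREFIX="my-prefix-a"
# Terminal 2
make test-e2e-node REMOTE=true INSTANCE_PREFIX="my-prefix-b"
```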
### Run tests using a custom image configuration
This is useful if you want to test out different runtime configurations. First, make a local
(temporary) copy of the base image config from the test-infra repo:
https://github.com/kubernetes/test-infra/tree/master/jobs/e2e_node
Make your desired modifications to the config, and update data paths to be absolute paths to the
relevant files on your local machine (e.g. prepend your home directory path to each). For example:
```diff
 images:
   cos-stable:
     image_regex: cos-stable-60-9592-84-0
     project: cos-cloud
-    metadata: "user-data</go/src/github.com/containerd/cri/test/e2e_node/init.yaml,containerd-configure-sh</go/src/github.com/containerd/cri/cluster/gce/configure.sh,containerd-extra-init-sh</go/src/github.com/containerd/cri/test/e2e_node/gci-init.sh,containerd-env</workspace/test-infra/jobs/e2e_node/containerd/cri-master/env,gci-update-strategy=update_disabled"
+    metadata: "user-data</home/tallclair/go/src/github.com/containerd/cri/test/e2e_node/init.yaml,containerd-configure-sh</home/tallclair/go/src/github.com/containerd/cri/cluster/gce/configure.sh,containerd-extra-init-sh</home/tallclair/go/src/github.com/containerd/cri/test/e2e_node/gci-init.sh,containerd-env</home/tallclair/workspace/test-infra/jobs/e2e_node/containerd/cri-master/env,gci-update-strategy=update_disabled"
```
Finally, run the tests with your custom config:
```sh
make test-e2e-node REMOTE=true IMAGE_CONFIG_FILE="<local file>" [...]
```
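For example (the config path is a placeholder for wherever you saved your modified copy):
```sh
make test-e2e-node REMOTE=true IMAGE_CONFIG_FILE="$HOME/my-image-config.yaml" FOCUS="\[NodeConformance\]"
```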
Image configuration files can further influence how FOCUS and SKIP match test cases.
For example:
```sh
--focus="\[NodeFeature:.+\]" --skip="\[Flaky\]|\[Serial\]"
```
runs all e2e tests labeled `"[NodeFeature:*]"` and will skip any tests labeled `"[Flaky]"` or `"[Serial]"`.
Two example e2e tests that match this expression are:
* https://github.com/kubernetes/kubernetes/blob/0e2220b4462130ae8a22ed657e8979f7844e22c1/test/e2e_node/security_context_test.go#L155
* https://github.com/kubernetes/kubernetes/blob/0e2220b4462130ae8a22ed657e8979f7844e22c1/test/e2e_node/security_context_test.go#L175
However, image configuration files select test cases based on the `tests` field.
See https://github.com/kubernetes/test-infra/blob/4572dc3bf92e70f572e55e7ac1be643bdf6b2566/jobs/e2e_node/benchmark-config.yaml#L22-23 for an example configuration.
If the [Prow e2e job configuration](https://github.com/kubernetes/test-infra/blob/master/jobs/e2e_node/image-config.yaml) does **not** specify the `tests` field, FOCUS and SKIP will run as expected.
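For illustration, a minimal image configuration using the `tests` field might look like the following sketch, modeled on the linked benchmark config (the image family, machine type, and test regex are placeholders):
```yaml
images:
  ubuntu:
    image_family: pipeline-1-4                       # placeholder image family
    project: kubernetes-node-e2e-images
    machine: n1-standard-1                           # placeholder machine type
    tests:
    - '.*resource tracking for 0 pods per node.*'    # regex selecting which tests run
```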
# Additional test options for both remote and local execution
## Only run a subset of the tests
To run tests matching a regex:
```sh
make test-e2e-node REMOTE=true FOCUS="<regex-to-match>"
```
To run tests NOT matching a regex:
```sh
make test-e2e-node REMOTE=true SKIP="<regex-to-match>"
```
These are often configured in the CI environment.
For example, the [`ci-kubernetes-node-kubelet`](https://github.com/kubernetes/test-infra/blob/05eeaff67cc936181c18a63fdc9d5847c55ef258/config/jobs/kubernetes/sig-node/node-kubelet.yaml#L31) job uses `--focus="\[NodeConformance\]" --skip="\[Flaky\]|\[Serial\]"`. The same selection can be passed to the make target as:
```sh
make test-e2e-node REMOTE=true FOCUS="\[NodeConformance\]" SKIP="\[Flaky\]|\[Serial\]"
```
See http://onsi.github.io/ginkgo/#focused-specs in the Ginkgo documentation to learn more about how FOCUS and SKIP work.
## Run a single test
To run a particular e2e test, simply pass the Ginkgo `It` string to the `--focus` argument.
For example, suppose we have the following test case: https://github.com/kubernetes/kubernetes/blob/0e2220b4462130ae8a22ed657e8979f7844e22c1/test/e2e_node/security_context_test.go#L175. We could select this test case by adding the argument:
```sh
--focus="should not show its pid in the non-hostpid containers \[NodeFeature:HostAccess\]"
```
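When invoking the suite through make, the same selection can be passed via FOCUS:
```sh
make test-e2e-node FOCUS="should not show its pid in the non-hostpid containers \[NodeFeature:HostAccess\]"
```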
## Run all tests related to a feature
In contrast, to run all node e2e tests related to the "HostAccess" feature one could run:
```sh
--focus="\[NodeFeature:HostAccess\]"
```
## Run tests continually until they fail
This is useful if you are trying to debug a flaky test failure. It will cause ginkgo to continually
run the tests until they fail. **Note: this will only perform test setup once (e.g. creating the instance) and is
less useful for catching flakes related to creating the instance from an image.**
```sh
make test-e2e-node REMOTE=true RUN_UNTIL_FAILURE=true
```
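To stress one suspected-flaky test, this combines naturally with FOCUS, e.g.:
```sh
# Re-run only the matching tests until one of them fails
make test-e2e-node REMOTE=true RUN_UNTIL_FAILURE=true FOCUS="\[NodeFeature:HostAccess\]"
```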
## Run tests in parallel
Running tests in parallel can usually shorten the test duration. By default the node
e2e tests run with `--nodes=8` (see the ginkgo flag
[--nodes](https://onsi.github.io/ginkgo/#parallel-specs)). You can use the
`PARALLELISM` option to change the parallelism.
```sh
make test-e2e-node PARALLELISM=4 # run test with 4 parallel nodes
make test-e2e-node PARALLELISM=1 # run test sequentially
```
## Run tests with kubenet network plugin
[kubenet](http://kubernetes.io/docs/admin/network-plugins/#kubenet) is
the default network plugin used by kubelet since Kubernetes 1.3. The
plugin requires [CNI](https://github.com/containernetworking/cni) and
[nsenter](http://man7.org/linux/man-pages/man1/nsenter.1.html).
Currently, kubenet is enabled by default for Remote execution (`REMOTE=true`)
but disabled for Local execution. **Note: kubenet is not currently supported
for local execution. This may cause network-related test results to differ
between Local and Remote execution, so if you want to run network-related
tests, Remote execution is recommended.**
To enable/disable kubenet:
```sh
# enable kubenet
make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin=kubenet --network-plugin-dir=/opt/cni/bin"'
# disable kubenet
make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin= --network-plugin-dir="'
```
## Additional QoS Cgroups Hierarchy level testing
For testing with the QoS Cgroup Hierarchy enabled, you can pass the --cgroups-per-qos flag as an argument into Ginkgo using TEST_ARGS:
```sh
make test-e2e-node TEST_ARGS="--cgroups-per-qos=true"
```
## Testgrid
TestGrid (https://testgrid.k8s.io) is a publicly hosted and configured automated testing framework developed by Google.
Here (https://testgrid.k8s.io/sig-node-containerd#containerd-node-features) we see an example of a Prow job running e2e tests. Each gray row in the grid corresponds
to an e2e test or a stage of the job (e.g. creating a VM, downloading files). Passed tests are colored green and failed tests are colored red.
# Notes on tests run by the Kubernetes project during pre- and post-submit
The node e2e tests are run by [Prow](https://prow.k8s.io/) for each Pull Request, and the results are published
in the status checks box at the bottom of the Pull Request, below all comments. To have Prow re-run the node e2e tests against a PR, add the comment
`/test pull-kubernetes-node-e2e` and **include a link to the test failure logs if caused by a flake.**
Note that [commands to Prow](https://prow.k8s.io/command-help#test) must be on separate lines from any commentary.
For example:
```
/test pull-kubernetes-node-e2e

flake due to #12345
```
The PR builder runs tests against the images listed in [image-config.yaml](https://github.com/kubernetes/test-infra/blob/master/jobs/e2e_node/image-config.yaml).
Other [node e2e Prow jobs](https://github.com/kubernetes/test-infra/tree/master/config/jobs/kubernetes/sig-node)
run against different images depending on the configuration chosen in the
[test-infra repo](https://github.com/kubernetes/test-infra/tree/master/jobs/e2e_node).
The source code for these tests comes from the [kubernetes/kubernetes repo](https://github.com/kubernetes/kubernetes/tree/master/test/e2e_node).
# Notes on the Topology Manager tests
The Topology Manager tests require a machine with multiple NUMA nodes (two or more) and at least one SRIOV device installed.
The tests automatically skip if these conditions aren't met.
The test code statically includes the manifests needed to configure the SRIOV device plugin.
However, it is not possible to anticipate every configuration, so the included configuration is intentionally minimal.
It is recommended that you supply a ConfigMap describing the cards installed in the machine running the tests using TEST_ARGS, as sketched below.
[Here's the upstream reference](https://github.com/intel/sriov-network-device-plugin/blob/master/deployments/configMap.yaml)
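A minimal sketch of such a ConfigMap, modeled on that upstream reference (the resource name and the vendor/device/driver IDs are placeholders for your actual hardware):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [{
        "resourceName": "intel_sriov_netdevice",
        "selectors": {
          "vendors": ["8086"],
          "devices": ["154c"],
          "drivers": ["iavf"]
        }
      }]
    }
```
Pass it to the tests via TEST_ARGS: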
```sh
make test-e2e-node TEST_ARGS='--sriovdp-configmap-file="/path/to/sriovdp-config-map.yaml"'
```
You must have the Virtual Functions (VFs) already created on the node you want to run the test on.
Example commands to create the VFs follow; please note that the PCI address of the SRIOV device depends on the host
system hardware.
```bash
# Check how many VFs are currently configured, then create 7 of them.
# The PCI address is an example; it depends on the host hardware.
cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/sriov_numvfs
echo 7 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/sriov_numvfs
```
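To confirm the VFs were created, the corresponding `virtfn*` symlinks should appear under the device's sysfs entry (again, the PCI address is host-specific):
```bash
# Each VF shows up as a virtfnN symlink under the physical function's sysfs entry
ls -l /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/virtfn*
```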
Some Topology Manager tests require minimal knowledge of the host topology in order to run:
namely, which NUMA node in the system the SRIOV devices are attached to.
The test code tries to autodetect the information it needs, skipping the relevant tests if the autodetection fails.
You can override the autodetection by adding annotations to the ConfigMap, as in this example:
```yaml
metadata:
  annotations:
    pcidevice_node0: "1"
    pcidevice_node1: "0"
    pcidevice_node2: "0"
    pcidevice_node3: "0"
```
Please note that if you add the annotations, you must provide the full information:
specify the number of SRIOV devices attached to each NUMA node in the system,
even if the number is zero.