Merge pull request #3163 from eduartua/issue-3064-grouping-by-sig-node

Grouping /devel files by SIGs - SIG Node
This commit is contained in:
Kubernetes Prow Robot 2019-01-31 15:33:09 -08:00 committed by GitHub
commit e57bfed3b4
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
20 changed files with 855 additions and 834 deletions

View File

@ -1499,7 +1499,7 @@
* Many fixes in the 1.10 release
* More great things in the pipeline
* SIG Node [Derek Carr] (confirmed)
* New [CRI testing policy](https://github.com/kubernetes/community/blob/master/contributors/devel/cri-testing-policy.md)
* New [CRI testing policy](/contributors/devel/sig-node/cri-testing-policy.md)
* Feature going into Beta - Local storage capacity isolation
* Feature going into alpha - debug container, supports pod pid limits, cri container log rotation
* wg-resource-mgmt : graduated device plugins, hugepages, cpu pinning (beta)

View File

@ -217,7 +217,7 @@ agent.
Each node runs a container runtime, which is responsible for downloading images and running containers.
Kubelet does not link in the base container runtime. Instead, we're defining a
[Container Runtime Interface](/contributors/devel/container-runtime-interface.md) to control the
[Container Runtime Interface](/contributors/devel/sig-node/container-runtime-interface.md) to control the
underlying runtime and facilitate pluggability of that layer.
This decoupling is needed in order to maintain clear component boundaries, facilitate testing, and facilitate pluggability.
Runtimes supported today, either upstream or by forks, include at least docker (for Linux and Windows),

View File

@ -268,7 +268,7 @@ already underway for Docker, called
## Container Runtime Interface
Other container runtimes will likely add AppArmor support eventually, so the
[Container Runtime Interface](/contributors/devel/container-runtime-interface.md) (CRI) needs to be made compatible
[Container Runtime Interface](/contributors/devel/sig-node/container-runtime-interface.md) (CRI) needs to be made compatible
with this design. The two important pieces are a way to report whether AppArmor is supported by the
runtime, and a way to specify the profile to load (likely through the `LinuxContainerConfig`).

View File

@ -29,7 +29,7 @@ This document proposes a design for the set of metrics included in an eventual C
"Kubelet": The daemon that runs on every kubernetes node and controls pod and container lifecycle, among many other things.
["cAdvisor":](https://github.com/google/cadvisor) An open source container monitoring solution which only monitors containers, and has no concept of kubernetes constructs like pods or volumes.
["Summary API":](https://git.k8s.io/kubernetes/pkg/kubelet/apis/stats/v1alpha1/types.go) A kubelet API which currently exposes node metrics for use by both system components and monitoring systems.
["CRI":](/contributors/devel/container-runtime-interface.md) The Container Runtime Interface designed to provide an abstraction over runtimes (docker, rkt, etc).
["CRI":](/contributors/devel/sig-node/container-runtime-interface.md) The Container Runtime Interface designed to provide an abstraction over runtimes (docker, rkt, etc).
"Core Metrics": A set of metrics described in the [Monitoring Architecture](/contributors/design-proposals/instrumentation/monitoring_architecture.md) whose purpose is to provide metrics for first-class resource isolation and utilization features, including [resource feasibility checking](https://github.com/eBay/Kubernetes/blob/master/docs/design/resources.md#the-resource-model) and node resource management.
"Resource": A consumable element of a node (e.g. memory, disk space, CPU time, etc).
"First-class Resource": A resource critical for scheduling, whose requests and limits can be (or soon will be) set via the Pod/Container Spec.

View File

@ -4,7 +4,7 @@
[#34672](https://github.com/kubernetes/kubernetes/issues/34672)
## Background
[Container Runtime Interface (CRI)](../devel/container-runtime-interface.md)
[Container Runtime Interface (CRI)](/contributors/devel/sig-node/container-runtime-interface.md)
is an ongoing project to allow container runtimes to integrate with
kubernetes via a newly-defined API.
[Dockershim](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/dockershim)

View File

@ -1,136 +1,3 @@
# CRI: the Container Runtime Interface
## What is CRI?
CRI (_Container Runtime Interface_) consists of a
[protobuf API](https://git.k8s.io/kubernetes/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto),
specifications/requirements (to-be-added),
and [libraries](https://git.k8s.io/kubernetes/pkg/kubelet/server/streaming)
for container runtimes to integrate with kubelet on a node. CRI is currently in Alpha.
In the future, we plan to add more developer tools such as the CRI validation
tests.
## Why develop CRI?
Prior to the existence of CRI, container runtimes (e.g., `docker`, `rkt`) were
integrated with kubelet through implementing an internal, high-level interface
in kubelet. The entrance barrier for runtimes was high because the integration
required understanding the internals of kubelet and contributing to the main
Kubernetes repository. More importantly, this would not scale because every new
addition incurs a significant maintenance overhead in the main Kubernetes
repository.
Kubernetes aims to be extensible. CRI is one small, yet important step to enable
pluggable container runtimes and build a healthier ecosystem.
## How to use CRI?
For Kubernetes 1.6+:
1. Start the image and runtime services on your node. You can have a single
service acting as both image and runtime services.
2. Set the kubelet flags
- Pass the unix socket(s) to which your services listen to kubelet:
`--container-runtime-endpoint` and `--image-service-endpoint`.
- Use the "remote" runtime by `--container-runtime=remote`.
CRI is still young and we are actively incorporating feedback from developers
to improve the API. Although we strive to maintain backward compatibility,
developers should expect occasional API breaking changes.
*For Kubernetes 1.5, additional flags are required:*
- Set apiserver flag `--feature-gates=StreamingProxyRedirects=true`.
- Set kubelet flag `--experimental-cri=true`.
## Does Kubelet use CRI today?
Yes, Kubelet always uses CRI except when using the rktnetes integration.
The old, pre-CRI Docker integration was removed in 1.7.
## Specifications, design documents and proposals
The Kubernetes 1.5 [blog post on CRI](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/)
serves as a general introduction.
Below is a mixed list of CRI specifications/requirements, design docs and
proposals. We are working on adding more documentation for the API.
- [Original proposal](https://github.com/kubernetes/kubernetes/blob/release-1.5/docs/proposals/container-runtime-interface-v1.md)
- [Networking](/contributors/devel/kubelet-cri-networking.md)
- [Container metrics](/contributors/devel/cri-container-stats.md)
- [Exec/attach/port-forward streaming requests](https://docs.google.com/document/d/1OE_QoInPlVCK9rMAx9aybRmgFiVjHpJCHI9LrfdNM_s/edit?usp=sharing)
- [Container stdout/stderr logs](https://github.com/kubernetes/kubernetes/blob/release-1.5/docs/proposals/kubelet-cri-logging.md)
## Work-In-Progress CRI runtimes
- [cri-o](https://github.com/kubernetes-incubator/cri-o)
- [rktlet](https://github.com/kubernetes-incubator/rktlet)
- [frakti](https://github.com/kubernetes/frakti)
- [cri-containerd](https://github.com/kubernetes-incubator/cri-containerd)
## [Status update](#status-update)
### Kubernetes v1.7 release (Docker-CRI integration GA, container metrics API)
- The Docker CRI integration has been promoted to GA.
- The legacy, non-CRI Docker integration has been completely removed from
Kubelet. The deprecated `--enable-cri` flag has been removed.
- CRI has been extended to support collecting container metrics from the
runtime.
### Kubernetes v1.6 release (Docker-CRI integration Beta)
**The Docker CRI integration has been promoted to Beta and is enabled by
default in Kubelet**.
- **Upgrade**: It is recommended to drain your node before upgrading the
Kubelet (see the sketch after this list). If you choose to perform an in-place
upgrade, the Kubelet will restart all Kubernetes-managed containers on the node.
- **Resource usage and performance**: There is no performance regression
in our measurement. The memory usage of Kubelet increases slightly
(~0.27MB per pod) due to the additional gRPC serialization for CRI.
- **Disable**: To disable the Docker CRI integration and fall back to the
old implementation, set `--enable-cri=false`. Note that the old
implementation has been *deprecated* and is scheduled to be removed in
the next release. You are encouraged to migrate to CRI as early as
possible.
- **Others**: The Docker container naming/labeling scheme has changed
significantly in 1.6. This is considered an implementation detail and
should not be relied upon by any external tools or scripts.
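For the upgrade note above, draining a node before an in-place Kubelet upgrade might look like the following sketch; the node name is a placeholder, and `--ignore-daemonsets` is commonly needed when DaemonSet pods are present:
```sh
# Sketch: evict pods before upgrading the Kubelet in place
kubectl drain node-1 --ignore-daemonsets
# ...upgrade and restart the Kubelet here...
kubectl uncordon node-1
```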
### Kubernetes v1.5 release (CRI v1alpha1)
- [v1alpha1 version](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/api/v1alpha1/runtime/api.proto) of CRI is released.
#### [CRI known issues](#cri-1.5-known-issues):
- [#27097](https://github.com/kubernetes/kubernetes/issues/27097): Container
metrics are not yet defined in CRI.
- [#36401](https://github.com/kubernetes/kubernetes/issues/36401): The new
container log path/format is not yet supported by the logging pipeline
(e.g., fluentd, GCL).
- CRI may not be compatible with other experimental features (e.g., Seccomp).
- Streaming server needs to be hardened.
- [#36666](https://github.com/kubernetes/kubernetes/issues/36666):
Authentication.
- [#36187](https://github.com/kubernetes/kubernetes/issues/36187): Avoid
including user data in the redirect URL.
#### [Docker CRI integration known issues](#docker-cri-1.5-known-issues)
- Docker compatibility: Supports only Docker v1.11 and v1.12.
- Network:
- [#35457](https://github.com/kubernetes/kubernetes/issues/35457): Does
not support host ports.
- [#37315](https://github.com/kubernetes/kubernetes/issues/37315): Does
not support bandwidth shaping.
- Exec/attach/port-forward (streaming requests):
- [#35747](https://github.com/kubernetes/kubernetes/issues/35747): Does
not support `nsenter` as the exec handler (`--exec-handler=nsenter`).
- Also see [CRI 1.5 known issues](#cri-1.5-known-issues) for limitations
on CRI streaming.
## Contacts
- Email: sig-node (kubernetes-sig-node@googlegroups.com)
- Slack: https://kubernetes.slack.com/messages/sig-node
This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/container-runtime-interface.md.
This file is a placeholder to preserve links. Please remove by April 28, 2019 or the release of kubernetes 1.13, whichever comes first.

View File

@ -1,121 +1,3 @@
# Container Runtime Interface: Container Metrics
[Container runtime interface
(CRI)](/contributors/devel/container-runtime-interface.md)
provides an abstraction for container runtimes to integrate with Kubernetes.
CRI expects the runtime to provide resource usage statistics for the
containers.
## Background
Historically Kubelet relied on the [cAdvisor](https://github.com/google/cadvisor)
library, an open-source project hosted in a separate repository, to retrieve
container metrics such as CPU and memory usage. These metrics are then aggregated
and exposed through Kubelet's [Summary
API](https://git.k8s.io/kubernetes/pkg/kubelet/apis/stats/v1alpha1/types.go)
for the monitoring pipeline (and other components) to consume. Any container
runtime (e.g., Docker and Rkt) integrated with Kubernetes needed to add a
corresponding package in cAdvisor to support tracking container and image file
system metrics.
With CRI being the new abstraction for integration, it was a natural
progression to augment CRI to serve container metrics to eliminate a separate
integration point.
*See the [core metrics design
proposal](/contributors/design-proposals/instrumentation/core-metrics-pipeline.md)
for more information on metrics exposed by Kubelet, and [monitoring
architecture](/contributors/design-proposals/instrumentation/monitoring_architecture.md)
for the evolving monitoring pipeline in Kubernetes.*
# Container Metrics
Kubelet is responsible for creating pod-level cgroups based on the Quality of
Service class to which the pod belongs, and passes this as a parent cgroup to the
runtime so that it can ensure all resources used by the pod (e.g., pod sandbox,
containers) will be charged to the cgroup. Therefore, Kubelet has the ability
to track resource usage at the pod level (using the built-in cAdvisor), and the
API enhancement focuses on the container-level metrics.
We include only the set of metrics that are necessary to fulfill the needs of
Kubelet. As the requirements evolve over time, we may extend the API to support
more metrics. Below is the API with the metrics supported today.
```proto
// ContainerStats returns stats of the container. If the container does not
// exist, the call returns an error.
rpc ContainerStats(ContainerStatsRequest) returns (ContainerStatsResponse) {}
// ListContainerStats returns stats of all running containers.
rpc ListContainerStats(ListContainerStatsRequest) returns (ListContainerStatsResponse) {}
```
```proto
// ContainerStats provides the resource usage statistics for a container.
message ContainerStats {
// Information of the container.
ContainerAttributes attributes = 1;
// CPU usage gathered from the container.
CpuUsage cpu = 2;
// Memory usage gathered from the container.
MemoryUsage memory = 3;
// Usage of the writable layer.
FilesystemUsage writable_layer = 4;
}
// CpuUsage provides the CPU usage information.
message CpuUsage {
// Timestamp in nanoseconds at which the information was collected. Must be > 0.
int64 timestamp = 1;
// Cumulative CPU usage (sum across all cores) since object creation.
UInt64Value usage_core_nano_seconds = 2;
}
// MemoryUsage provides the memory usage information.
message MemoryUsage {
// Timestamp in nanoseconds at which the information was collected. Must be > 0.
int64 timestamp = 1;
// The amount of working set memory in bytes.
UInt64Value working_set_bytes = 2;
}
// FilesystemUsage provides the filesystem usage information.
message FilesystemUsage {
// Timestamp in nanoseconds at which the information was collected. Must be > 0.
int64 timestamp = 1;
// The underlying storage of the filesystem.
StorageIdentifier storage_id = 2;
// UsedBytes represents the bytes used for images on the filesystem.
// This may differ from the total bytes used on the filesystem and may not
// equal CapacityBytes - AvailableBytes.
UInt64Value used_bytes = 3;
// InodesUsed represents the inodes used by the images.
// This may not equal InodesCapacity - InodesAvailable because the underlying
// filesystem may also be used for purposes other than storing images.
UInt64Value inodes_used = 4;
}
```
There are three categories of resources: CPU, memory, and filesystem. Each
resource usage message includes a timestamp to indicate when the usage
statistics were collected. This is necessary because some resource usage (e.g.,
filesystem) is inherently more expensive to collect and may be updated less
frequently than others. Having the timestamp allows the consumer to know how
stale or fresh the data is, while giving the runtime flexibility to adjust.
Although CRI does not dictate the frequency of the stats update, Kubelet needs
a minimum guarantee of freshness of the stats for certain resources so that it
can reclaim them in a timely manner when under pressure. We will formulate the
requirements for such resources and include them in CRI in the near future.
*For more details on why we request cached stats with timestamps as opposed to
requesting stats on-demand, here is the [rationale](https://github.com/kubernetes/kubernetes/pull/45614#issuecomment-302258090)
behind it.*
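To inspect what a runtime returns for these calls without going through Kubelet, the `crictl` tool from [cri-tools](https://github.com/kubernetes-sigs/cri-tools) can exercise the stats endpoints directly. A sketch, assuming crictl is installed and the endpoint matches your runtime:
```sh
# Sketch: list resource usage for all running containers (ListContainerStats)
crictl --runtime-endpoint unix:///var/run/dockershim.sock stats
# Stats for a single container (ContainerStats); the ID is a placeholder
crictl --runtime-endpoint unix:///var/run/dockershim.sock stats <container-id>
```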
## Status
The container metrics calls were added to CRI in Kubernetes 1.7, but Kubelet does not
yet use them to gather metrics from the runtime. We plan to enable Kubelet to
optionally consume the container metrics from the API in 1.8.
This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/cri-container-stats.md.
This file is a placeholder to preserve links. Please remove by April 28, 2019 or the release of kubernetes 1.13, whichever comes first.

View File

@ -1,118 +1,3 @@
# Container Runtime Interface: Testing Policy
This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/cri-testing-policy.md.
**Owner: SIG-Node**
This document describes testing policy and process for runtimes implementing the
[Container Runtime Interface (CRI)](/contributors/devel/container-runtime-interface.md)
to publish test results in a federated dashboard. The objective is to provide
the Kubernetes community an easy way to track the conformance, stability, and
supported features of a CRI runtime.
This document focuses on Kubernetes node/cluster end-to-end (E2E) testing
because many features require integration of runtime, OS, or even the cloud
provider. Higher-level integration tests provide better signals on vertical
stack compatibility to the Kubernetes community. On the other hand, runtime
developers are strongly encouraged to run the low-level
[CRI validation test suite](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/validation.md)
for validation as part of their development process.
## Required and optional tests
Runtime maintainers are **required** to submit the tests listed below.
1. Node conformance test suite
2. Node feature test suite
Node E2E tests qualify an OS image with a pre-installed CRI runtime. The
runtime maintainers are free to choose any OS distribution, packaging, and
deployment mechanism. Please see the
[tutorial](https://github.com/kubernetes/community/blob/master/contributors/devel/e2e-node-tests.md)
to learn more about the Node E2E test framework and the tests for validating a
compatible OS image.
The conformance suite is a set of platform-agnostic (e.g., OS, runtime, and
cloud provider) tests that validate the conformance of the OS image. The feature
suite allows the runtime to demonstrate what features are supported with the OS
distribution.
In addition to the required tests, the runtime maintainers are *strongly
recommended to run and submit results from the Kubernetes conformance test
suite*. This cluster-level E2E test suite provides extra test signal for areas
such as Networking that cannot be covered by CRI or Node-level
tests. Because networking requires deep integration between the runtime, the
cloud provider, and/or other cluster components, runtime maintainers are
encouraged to reach out to other relevant SIGs (e.g., SIG-GCP or SIG-AWS) for
guidance and/or sponsorship.
## Process for publishing test results
To publish test results, please submit a proposal in the
[Kubernetes community repository](https://github.com/kubernetes/community)
briefly explaining your runtime, providing at least two maintainers, and
assigning the proposal to the leads of SIG-Node.
These test results should be published under the `sig-node` tab, organized
as follows.
```
sig-node -> sig-node-cri-{Kubernetes-version} -> [page containing the required jobs]
```
Only the three most recent Kubernetes versions and the master branch are
kept at any time. This is consistent with the Kubernetes release schedule and
policy.
## Test job maintenance
Tests are required to run at least nightly.
The runtime maintainers are responsible for keeping the tests healthy. If the
tests are deemed not actively maintained, SIG-Node may remove the tests from
the test grid at its discretion.
## Process for adding pre-submit testing
If the tests are in good standing (i.e., consistently passing for more than 2
weeks), the runtime maintainers may request that the tests be included in the
pre-submit Pull Request (PR) tests. Please note that the pre-submit tests
require significantly higher testing capacity and are held to a higher standard,
since they directly affect the development velocity.
If the tests are flaky or failing, and the maintainers are unable to respond and
fix the issues in a timely manner, the SIG leads may remove the runtime from
the presubmit tests until the issues are resolved.
As of now, SIG-Node only accepts promotion of Node conformance tests to
pre-submit because Kubernetes conformance tests involve a wider scope and may
need co-sponsorships from other SIGs.
## FAQ
*1. Can runtime maintainers publish results from other E2E tests?*
Yes, runtime maintainers can publish additional Node E2E test results. These
test jobs will be displayed in the `sig-node-{runtime-name}` page. The same
policy for test maintenance applies.
As for additional Cluster E2E tests, SIG-Node may agree to host the
results. However, runtime maintainers are strongly encouraged to seek a more
appropriate SIG to sponsor or host the results.
*2. Can these runtime-specific test jobs be considered release blocking?*
This is beyond the authority of SIG-Node, and requires agreement and consensus
across multiple SIGs (e.g., Release, the relevant cloud provider SIG, etc).
*3. How to run the aforementioned tests?*
It is hard to keep instructions, or even links to them, up-to-date in one
document. Please contact the relevant SIGs for assistance.
*4. How can I change the test-grid to publish the test results?*
Please contact SIG-Node for the detailed instructions.
*5. How does this policy apply to Windows containers?*
Windows containers are still in the early development phase and the features
they support change rapidly. Therefore, it is suggested to treat them as a
feature with a select, whitelisted set of tests to run.
This file is a placeholder to preserve links. Please remove by April 28, 2019 or the release of kubernetes 1.13, whichever comes first.

View File

@ -1,53 +1,3 @@
# Container Runtime Interface (CRI) Validation Testing
This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/cri-validation.md.
CRI validation testing provides a test framework and a suite of tests to validate that the Container Runtime Interface (CRI) server implementation meets all the requirements. This allows the CRI runtime developers to verify that their runtime conforms to CRI, without needing to set up Kubernetes components or run Kubernetes end-to-end tests.
CRI validation testing is GA since v1.11.0 and is hosted at the [cri-tools](https://github.com/kubernetes-sigs/cri-tools) repository. We encourage the CRI developers to report bugs or help extend the test coverage by adding more tests.
## Install
The test suites can be downloaded from cri-tools [release page](https://github.com/kubernetes-sigs/cri-tools/releases):
```sh
VERSION="v1.11.0"
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/critest-$VERSION-linux-amd64.tar.gz
sudo tar zxvf critest-$VERSION-linux-amd64.tar.gz -C /usr/local/bin
rm -f critest-$VERSION-linux-amd64.tar.gz
```
critest requires [ginkgo](https://github.com/onsi/ginkgo) to run parallel tests. It can be installed with:
```sh
go get -u github.com/onsi/ginkgo/ginkgo
```
*Note: ensure Go is installed and `GOPATH` is set before installing ginkgo.*
## Running tests
### Prerequisite
Before running the tests, you need to _ensure that the CRI server under test is running and listening on a Unix socket_. Because the validation tests are designed to request changes (e.g., create/delete) to the containers and verify that the correct status is reported, they expect to be the only user of the CRI server. Please make sure that 1) there are no existing CRI-managed containers running on the node, and 2) no other processes (e.g., Kubelet) will interfere with the tests.
### Run
```sh
critest
```
This will:
- Connect to the shim of the CRI container runtime
- Run the tests using `ginkgo`
- Output the test results to STDOUT
critest connects to `unix:///var/run/dockershim.sock` by default. For other runtimes, the endpoint can be set with the `-runtime-endpoint` and `-image-endpoint` flags.
## Additional options
- `-ginkgo.focus`: Only run the tests that match the regular expression.
- `-image-endpoint`: Set the endpoint of the image service. Same as the runtime endpoint if not specified.
- `-runtime-endpoint`: Set the endpoint of the runtime service. Defaults to `unix:///var/run/dockershim.sock`.
- `-ginkgo.skip`: Skip the tests that match the regular expression.
- `-parallel`: The number of parallel test nodes to run (default 1). ginkgo must be installed to run parallel tests.
- `-h`: Show help and all supported options.
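Putting the options together, a run against a non-default runtime might look like the following sketch; the containerd socket path and the focus pattern are illustrative, not defaults:
```sh
# Sketch: run a focused subset against containerd with 4 parallel test nodes
critest -runtime-endpoint unix:///run/containerd/containerd.sock \
  -image-endpoint unix:///run/containerd/containerd.sock \
  -ginkgo.focus="Conformance" \
  -parallel 4
```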
This file is a placeholder to preserve links. Please remove by April 28, 2019 or the release of kubernetes 1.13, whichever comes first.

View File

@ -1,229 +1,3 @@
# Node End-To-End tests
Node e2e tests are component tests meant for testing the Kubelet code on a custom host environment.
Tests can be run either locally or against a host running on GCE.
Node e2e tests are run as both pre- and post- submit tests by the Kubernetes project.
*Note: Linux only. Mac and Windows unsupported.*
*Note: There is no scheduler running. The e2e tests have to do manual scheduling, e.g. by using `framework.PodClient`.*
# Running tests
## Locally
Why run tests *Locally*? Much faster than running tests Remotely.
Prerequisites:
- [Install etcd](https://github.com/coreos/etcd/releases) on your PATH
- Verify etcd is installed correctly by running `which etcd`
- Or make etcd binary available and executable at `/tmp/etcd`
- [Install ginkgo](https://github.com/onsi/ginkgo) on your PATH
- Verify ginkgo is installed correctly by running `which ginkgo`
From the Kubernetes base directory, run:
```sh
make test-e2e-node
```
This will: run the *ginkgo* binary against the subdirectory *test/e2e_node*, which will in turn:
- Ask for sudo access (needed for running some of the processes)
- Build the Kubernetes source code
- Pre-pull docker images used by the tests
- Start a local instance of *etcd*
- Start a local instance of *kube-apiserver*
- Start a local instance of *kubelet*
- Run the test using the locally started processes
- Output the test results to STDOUT
- Stop *kubelet*, *kube-apiserver*, and *etcd*
## Remotely
Why run tests *Remotely*? Tests will be run in a customized, pristine environment. This closely mimics
the pre- and post-submit testing performed by the project.
Prerequisites:
- [join the googlegroup](https://groups.google.com/forum/#!forum/kubernetes-dev)
`kubernetes-dev@googlegroups.com`
- *This provides read access to the node test images.*
- Set up a [Google Cloud Platform](https://cloud.google.com/) account and project with Google Compute Engine enabled
- Install and set up the [gcloud sdk](https://cloud.google.com/sdk/downloads)
- Verify the sdk is setup correctly by running `gcloud compute instances list` and `gcloud compute images list --project kubernetes-node-e2e-images`
Run:
```sh
make test-e2e-node REMOTE=true
```
This will:
- Build the Kubernetes source code
- Create a new GCE instance using the default test image
- Instance will be called **test-e2e-node-containervm-v20160321-image**
- Lookup the instance public ip address
- Copy a compressed archive file to the host containing the following binaries:
- ginkgo
- kubelet
- kube-apiserver
- e2e_node.test (this binary contains the actual tests to be run)
- Unzip the archive to a directory under **/tmp/gcloud**
- Run the tests using the `ginkgo` command
- Starts etcd, kube-apiserver, kubelet
- The ginkgo command is used because it supports more features than running the test binary directly
- Output the remote test results to STDOUT
- `scp` the log files back to the local host under /tmp/_artifacts/e2e-node-containervm-v20160321-image
- Stop the processes on the remote host
- **Leave the GCE instance running**
**Note: Subsequent tests run using the same image will *reuse the existing host* instead of deleting it and
provisioning a new one. To delete the GCE instance after each test see
*[DELETE_INSTANCE](#delete-instance-after-tests-run)*.**
# Additional Remote Options
## Run tests using different images
This is useful if you want to run tests against a host using a different OS distro or container runtime than
provided by the default image.
List the available test images using gcloud.
```sh
make test-e2e-node LIST_IMAGES=true
```
This will output a list of the available images for the default image project.
Then run:
```sh
make test-e2e-node REMOTE=true IMAGES="<comma-separated-list-images>"
```
## Run tests against a running GCE instance (not an image)
This is useful if you have a host instance running already and want to run the tests there instead of on a new instance.
```sh
make test-e2e-node REMOTE=true HOSTS="<comma-separated-list-of-hostnames>"
```
## Delete instance after tests run
This is useful if you want to recreate the instance for each test run to trigger flakes related to starting the instance.
```sh
make test-e2e-node REMOTE=true DELETE_INSTANCES=true
```
## Keep instance, test binaries, and *processes* around after tests run
This is useful if you want to manually inspect or debug the kubelet process run as part of the tests.
```sh
make test-e2e-node REMOTE=true CLEANUP=false
```
## Run tests using an image in another project
This is useful if you want to create your own host image in another project and use it for testing.
```sh
make test-e2e-node REMOTE=true IMAGE_PROJECT="<name-of-project-with-images>" IMAGES="<image-name>"
```
Setting up your own host image may require additional steps such as installing etcd or docker. See
[setup_host.sh](https://git.k8s.io/kubernetes/test/e2e_node/environment/setup_host.sh) for common steps to setup hosts to run node tests.
## Create instances using a different instance name prefix
This is useful if you want to create instances using a different name so that you can run multiple copies of the
test in parallel against different instances of the same image.
```sh
make test-e2e-node REMOTE=true INSTANCE_PREFIX="my-prefix"
```
# Additional Test Options for both Remote and Local execution
## Only run a subset of the tests
To run tests matching a regex:
```sh
make test-e2e-node REMOTE=true FOCUS="<regex-to-match>"
```
To run tests NOT matching a regex:
```sh
make test-e2e-node REMOTE=true SKIP="<regex-to-match>"
```
## Run tests continually until they fail
This is useful if you are trying to debug a flaky test failure. This will cause ginkgo to continually
run the tests until they fail. **Note: this will only perform test setup once (e.g. creating the instance) and is
less useful for catching flakes related to creating the instance from an image.**
```sh
make test-e2e-node REMOTE=true RUN_UNTIL_FAILURE=true
```
## Run tests in parallel
Running tests in parallel can usually shorten the test duration. By default, node
e2e tests run with `--nodes=8` (see ginkgo flag
[--nodes](https://onsi.github.io/ginkgo/#parallel-specs)). You can use the
`PARALLELISM` option to change the parallelism.
```sh
make test-e2e-node PARALLELISM=4 # run test with 4 parallel nodes
make test-e2e-node PARALLELISM=1 # run test sequentially
```
## Run tests with kubenet network plugin
[kubenet](http://kubernetes.io/docs/admin/network-plugins/#kubenet) is
the default network plugin used by kubelet since Kubernetes 1.3. The
plugin requires [CNI](https://github.com/containernetworking/cni) and
[nsenter](http://man7.org/linux/man-pages/man1/nsenter.1.html).
Currently, kubenet is enabled by default for Remote execution `REMOTE=true`,
but disabled for Local execution. **Note: kubenet is currently not supported for
local execution. This may cause network-related test results to differ
between Local and Remote execution, so if you want to run network-related
tests, Remote execution is recommended.**
To enable/disable kubenet:
```sh
# enable kubenet
make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin=kubenet --network-plugin-dir=/opt/cni/bin"'
# disable kubenet
make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin= --network-plugin-dir="'
```
## Additional QoS Cgroups Hierarchy level testing
For testing with the QoS Cgroup Hierarchy enabled, you can pass the `--cgroups-per-qos` flag to Ginkgo using `TEST_ARGS`:
```sh
make test-e2e-node TEST_ARGS="--cgroups-per-qos=true"
```
# Notes on tests run by the Kubernetes project during pre- and post-submit
The node e2e tests are run by the PR builder for each Pull Request and the results published at
the bottom of the comments section. To re-run just the node e2e tests from the PR builder add the comment
`@k8s-bot node e2e test this issue: #<Flake-Issue-Number or IGNORE>` and **include a link to the test
failure logs if caused by a flake.**
The PR builder runs tests against the images listed in [jenkins-pull.properties](https://git.k8s.io/kubernetes/test/e2e_node/jenkins/jenkins-pull.properties).
The post-submit tests run against the images listed in [jenkins-ci.properties](https://git.k8s.io/kubernetes/test/e2e_node/jenkins/jenkins-ci.properties).
This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/e2e-node-tests.md.
This file is a placeholder to preserve links. Please remove by April 28, 2019 or the release of kubernetes 1.13, whichever comes first.

View File

@ -1,56 +1,3 @@
# Container Runtime Interface (CRI) Networking Specifications
This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/kubelet-cri-networking.md.
## Introduction
[Container Runtime Interface (CRI)](container-runtime-interface.md) is
an ongoing project to allow container
runtimes to integrate with kubernetes via a newly-defined API. This document
specifies the network requirements for container runtime
interface (CRI). CRI networking requirements expand upon kubernetes pod
networking requirements. This document does not specify requirements
from the upper layers of the kubernetes network stack, such as `Service`. More
background on k8s networking can be found
[here](http://kubernetes.io/docs/admin/networking/).
## Requirements
1. Kubelet expects the runtime shim to manage the pod's network lifecycle. Pod
networking should be handled accordingly along with pod sandbox operations.
* `RunPodSandbox` must set up the pod's network. This includes, but is not limited
to, allocating a pod IP, configuring the pod's network interfaces and default
network route. If `RunPodSandbox` returns successfully, Kubelet expects the pod
sandbox to have an IP that is routable within the k8s cluster.
`RunPodSandbox` must return an error if it fails to set up the pod's network.
If the pod's network has already been set up, `RunPodSandbox` must skip
network setup and proceed.
* `StopPodSandbox` must tear down the pod's network. The runtime shim
must return an error on network teardown failure. If the pod's network has
already been torn down, `StopPodSandbox` must skip network teardown and proceed.
* `RemovePodSandbox` may tear down the pod's network, if the networking has
not been torn down already. `RemovePodSandbox` must return an error on
network teardown failure.
* The response from `PodSandboxStatus` must include the pod sandbox network status.
The runtime shim must return an empty network status if it fails
to construct one.
2. User-supplied pod networking configurations, which are NOT directly
exposed by the kubernetes API, should be handled directly by runtime
shims. For instance, `hairpin-mode`, `cni-bin-dir`, `cni-conf-dir`, `network-plugin`,
`network-plugin-mtu` and `non-masquerade-cidr` (see the sketch after this list).
Kubelet will no longer handle these configurations after the transition to CRI is complete.
3. Network configurations that are exposed through the kubernetes API
are communicated to the runtime shim through the `UpdateRuntimeConfig`
interface, e.g., `podCIDR`. For each runtime and network implementation,
some configs may not be applicable. The runtime shim may handle or ignore
network configuration updates from `UpdateRuntimeConfig` interface.
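As an illustration of requirement 2 above, these shim-level settings were historically passed to the kubelet as flags for the built-in dockershim; this is a sketch, and the values are illustrative only:
```sh
# Sketch: network settings consumed by the runtime shim, not by kubelet proper
kubelet --network-plugin=cni \
  --cni-conf-dir=/etc/cni/net.d \
  --cni-bin-dir=/opt/cni/bin \
  --non-masquerade-cidr=10.0.0.0/8
```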
## Extensibility
* Kubelet is oblivious to how the runtime shim manages networking, i.e., the
runtime shim is free to use [CNI](https://github.com/containernetworking/cni),
[CNM](https://github.com/docker/libnetwork/blob/master/docs/design.md) or
any other implementation as long as the CRI networking requirements and
k8s networking requirements are satisfied.
* Runtime shims have full visibility into pod networking configurations.
* As more network features arrive, CRI will evolve.
## Related Issues
* Kubelet network plugin for client/server container runtimes [#28667](https://github.com/kubernetes/kubernetes/issues/28667)
* CRI networking umbrella issue [#37316](https://github.com/kubernetes/kubernetes/issues/37316)
This file is a placeholder to preserve links. Please remove by April 28, 2019 or the release of kubernetes 1.13, whichever comes first.

View File

@ -1,121 +1,3 @@
# Measuring Node Performance
This document outlines the issues and pitfalls of measuring Node performance, as
well as the tools available.
## Cluster Set-up
There are many factors that can affect node performance numbers, so care
must be taken in setting up the cluster to make the intended measurements. In
addition to taking the following steps into consideration, it is important to
document precisely which setup was used. For example, performance can vary
wildly from commit-to-commit, so it is very important to **document which commit
or version** of Kubernetes was used, which Docker version was used, etc.
### Addon pods
Be aware of which addon pods are running on which nodes. By default Kubernetes
runs 8 addon pods, plus another 2 per node (`fluentd-elasticsearch` and
`kube-proxy`) in the `kube-system` namespace. The addon pods can be disabled for
more consistent results, but doing so can also have performance implications.
For example, Heapster polls each node regularly to collect stats data. Disabling
Heapster will hide the performance cost of serving those stats in the Kubelet.
#### Disabling Add-ons
Disabling addons is simple. Just ssh into the Kubernetes master and move the
addon from `/etc/kubernetes/addons/` to a backup location. More details
[here](https://git.k8s.io/kubernetes/cluster/addons/).
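A minimal sketch of that manual step, using the addon directory above; the backup location is arbitrary:
```sh
# Sketch: on the master, move an addon manifest out of the watched directory
sudo mkdir -p /root/disabled-addons
sudo mv /etc/kubernetes/addons/fluentd-elasticsearch /root/disabled-addons/
# Move the directory back to re-enable the addon
```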
### Which / how many pods?
Performance will vary a lot between a node with 0 pods and a node with 100 pods.
In many cases you'll want to make measurements with several different numbers of
pods. On a single-node cluster, scaling a replication controller makes this easy;
just make sure the system reaches a steady state before starting the
measurement. E.g. `kubectl scale replicationcontroller pause --replicas=100`
In most cases pause pods will yield the most consistent measurements since the
system will not be affected by pod load. However, in some special cases
Kubernetes has been tuned to optimize pods that are not doing anything, such as
the cAdvisor housekeeping (stats gathering). In these cases, performing a very
light task (such as a simple network ping) can make a difference.
Finally, you should also consider which features your pods should be using. For
example, if you want to measure performance with probing, you should obviously
use pods with liveness or readiness probes configured. Likewise for volumes,
number of containers, etc.
### Other Tips
**Number of nodes** - On the one hand, it can be easier to manage logs, pods,
environment etc. with a single node to worry about. On the other hand, having
multiple nodes will let you gather more data in parallel for more robust
sampling.
## E2E Performance Test
There is an end-to-end test for collecting overall resource usage of node
components: [kubelet_perf.go](https://git.k8s.io/kubernetes/test/e2e/node/kubelet_perf.go). To
run the test, simply make sure you have an e2e cluster running (`go run
hack/e2e.go -- -up`) and [set up](#cluster-set-up) correctly.
Run the test with `go run hack/e2e.go -- -v -test
--test_args="--ginkgo.focus=resource\susage\stracking"`. You may also wish to
customise the number of pods or other parameters of the test (remember to rerun
`make WHAT=test/e2e/e2e.test` after you do).
## Profiling
Kubelet installs the [go pprof handlers](https://golang.org/pkg/net/http/pprof/), which can be queried for CPU profiles:
```console
$ kubectl proxy &
Starting to serve on 127.0.0.1:8001
$ curl -G "http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/profile?seconds=${DURATION_SECONDS}" > $OUTPUT
$ KUBELET_BIN=_output/dockerized/bin/linux/amd64/kubelet
$ go tool pprof -web $KUBELET_BIN $OUTPUT
```
`pprof` can also provide heap usage, from the `/debug/pprof/heap` endpoint
(e.g. `http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/heap`).
More information on go profiling can be found
[here](http://blog.golang.org/profiling-go-programs).
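Combining the heap endpoint above with the same pprof workflow, a sketch reusing `$NODE` and `$KUBELET_BIN` from the previous block:
```sh
# Sketch: grab a heap profile from the kubelet and open it in pprof
curl -G "http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/heap" > /tmp/kubelet.heap
go tool pprof -web $KUBELET_BIN /tmp/kubelet.heap
```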
## Benchmarks
Before jumping through all the hoops to measure a live Kubernetes node in a real
cluster, it is worth considering whether the data you need can be gathered
through a benchmark test. Go provides a really simple benchmarking mechanism;
just add a test of the form:
```go
// In foo_test.go
package foo

import "testing"

// setupFoo and foo are the code under test (placeholders in this example).
func BenchmarkFoo(b *testing.B) {
	b.StopTimer()
	setupFoo() // Perform any global setup
	b.StartTimer()
	for i := 0; i < b.N; i++ {
		foo() // Functionality to measure
	}
}
```
Then:
```console
$ go test -bench=. -benchtime=${SECONDS}s foo_test.go
```
More details on benchmarking [here](https://golang.org/pkg/testing/).
## TODO
- (taotao) Measuring docker performance
- Expand cluster set-up section
- (vishh) Measuring disk usage
- (yujuhong) Measuring memory usage
- Add section on monitoring kubelet metrics (e.g. with prometheus)
This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/node-performance-testing.md.
This file is a placeholder to preserve links. Please remove by April 28, 2019 or the release of kubernetes 1.13, whichever comes first.

View File

@ -0,0 +1,136 @@
# CRI: the Container Runtime Interface
## What is CRI?
CRI (_Container Runtime Interface_) consists of a
[protobuf API](https://git.k8s.io/kubernetes/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto),
specifications/requirements (to-be-added),
and [libraries](https://git.k8s.io/kubernetes/pkg/kubelet/server/streaming)
for container runtimes to integrate with kubelet on a node. CRI is currently in Alpha.
In the future, we plan to add more developer tools such as the CRI validation
tests.
## Why develop CRI?
Prior to the existence of CRI, container runtimes (e.g., `docker`, `rkt`) were
integrated with kubelet through implementing an internal, high-level interface
in kubelet. The entrance barrier for runtimes was high because the integration
required understanding the internals of kubelet and contributing to the main
Kubernetes repository. More importantly, this would not scale because every new
addition incurs a significant maintenance overhead in the main Kubernetes
repository.
Kubernetes aims to be extensible. CRI is one small, yet important step to enable
pluggable container runtimes and build a healthier ecosystem.
## How to use CRI?
For Kubernetes 1.6+:
1. Start the image and runtime services on your node. You can have a single
service acting as both image and runtime services.
2. Set the kubelet flags
- Pass the unix socket(s) to which your services listen to kubelet:
`--container-runtime-endpoint` and `--image-service-endpoint`.
- Use the "remote" runtime by `--container-runtime=remote`.
CRI is still young and we are actively incorporating feedback from developers
to improve the API. Although we strive to maintain backward compatibility,
developers should expect occasional API breaking changes.
*For Kubernetes 1.5, additional flags are required:*
- Set apiserver flag `--feature-gates=StreamingProxyRedirects=true`.
- Set kubelet flag `--experimental-cri=true`.
## Does Kubelet use CRI today?
Yes, Kubelet always uses CRI except when using the rktnetes integration.
The old, pre-CRI Docker integration was removed in 1.7.
## Specifications, design documents and proposals
The Kubernetes 1.5 [blog post on CRI](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/)
serves as a general introduction.
Below is a mixed list of CRI specifications/requirements, design docs and
proposals. We are working on adding more documentation for the API.
- [Original proposal](https://github.com/kubernetes/kubernetes/blob/release-1.5/docs/proposals/container-runtime-interface-v1.md)
- [Networking](kubelet-cri-networking.md)
- [Container metrics](cri-container-stats.md)
- [Exec/attach/port-forward streaming requests](https://docs.google.com/document/d/1OE_QoInPlVCK9rMAx9aybRmgFiVjHpJCHI9LrfdNM_s/edit?usp=sharing)
- [Container stdout/stderr logs](https://github.com/kubernetes/kubernetes/blob/release-1.5/docs/proposals/kubelet-cri-logging.md)
## Work-In-Progress CRI runtimes
- [cri-o](https://github.com/kubernetes-incubator/cri-o)
- [rktlet](https://github.com/kubernetes-incubator/rktlet)
- [frakti](https://github.com/kubernetes/frakti)
- [cri-containerd](https://github.com/kubernetes-incubator/cri-containerd)
## [Status update](#status-update)
### Kubernetes v1.7 release (Docker-CRI integration GA, container metrics API)
- The Docker CRI integration has been promoted to GA.
- The legacy, non-CRI Docker integration has been completely removed from
Kubelet. The deprecated `--enable-cri` flag has been removed.
- CRI has been extended to support collecting container metrics from the
runtime.
### Kubernetes v1.6 release (Docker-CRI integration Beta)
**The Docker CRI integration has been promoted to Beta and is enabled by
default in Kubelet**.
- **Upgrade**: It is recommended to drain your node before upgrading the
Kubelet. If you choose to perform an in-place upgrade, the Kubelet will
restart all Kubernetes-managed containers on the node.
- **Resource usage and performance**: There is no performance regression
in our measurement. The memory usage of Kubelet increases slightly
(~0.27MB per pod) due to the additional gRPC serialization for CRI.
- **Disable**: To disable the Docker CRI integration and fall back to the
old implementation, set `--enable-cri=false`. Note that the old
implementation has been *deprecated* and is scheduled to be removed in
the next release. You are encouraged to migrate to CRI as early as
possible.
- **Others**: The Docker container naming/labeling scheme has changed
significantly in 1.6. This is considered an implementation detail and
should not be relied upon by any external tools or scripts.
### Kubernetes v1.5 release (CRI v1alpha1)
- [v1alpha1 version](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/api/v1alpha1/runtime/api.proto) of CRI is released.
#### [CRI known issues](#cri-1.5-known-issues):
- [#27097](https://github.com/kubernetes/kubernetes/issues/27097): Container
metrics are not yet defined in CRI.
- [#36401](https://github.com/kubernetes/kubernetes/issues/36401): The new
container log path/format is not yet supported by the logging pipeline
(e.g., fluentd, GCL).
- CRI may not be compatible with other experimental features (e.g., Seccomp).
- Streaming server needs to be hardened.
- [#36666](https://github.com/kubernetes/kubernetes/issues/36666):
Authentication.
- [#36187](https://github.com/kubernetes/kubernetes/issues/36187): Avoid
including user data in the redirect URL.
#### [Docker CRI integration known issues](#docker-cri-1.5-known-issues)
- Docker compatibility: Supports only Docker v1.11 and v1.12.
- Network:
- [#35457](https://github.com/kubernetes/kubernetes/issues/35457): Does
not support host ports.
- [#37315](https://github.com/kubernetes/kubernetes/issues/37315): Does
not support bandwidth shaping.
- Exec/attach/port-forward (streaming requests):
- [#35747](https://github.com/kubernetes/kubernetes/issues/35747): Does
not support `nsenter` as the exec handler (`--exec-handler=nsenter`).
- Also see [CRI 1.5 known issues](#cri-1.5-known-issues) for limitations
on CRI streaming.
## Contacts
- Email: sig-node (kubernetes-sig-node@googlegroups.com)
- Slack: https://kubernetes.slack.com/messages/sig-node

View File

@ -0,0 +1,121 @@
# Container Runtime Interface: Container Metrics
[Container runtime interface
(CRI)](/contributors/devel/container-runtime-interface.md)
provides an abstraction for container runtimes to integrate with Kubernetes.
CRI expects the runtime to provide resource usage statistics for the
containers.
## Background
Historically Kubelet relied on the [cAdvisor](https://github.com/google/cadvisor)
library, an open-source project hosted in a separate repository, to retrieve
container metrics such as CPU and memory usage. These metrics are then aggregated
and exposed through Kubelet's [Summary
API](https://git.k8s.io/kubernetes/pkg/kubelet/apis/stats/v1alpha1/types.go)
for the monitoring pipeline (and other components) to consume. Any container
runtime (e.g., Docker and Rkt) integrated with Kubernetes needed to add a
corresponding package in cAdvisor to support tracking container and image file
system metrics.
With CRI being the new abstraction for integration, it was a natural
progression to augment CRI to serve container metrics to eliminate a separate
integration point.
*See the [core metrics design
proposal](/contributors/design-proposals/instrumentation/core-metrics-pipeline.md)
for more information on metrics exposed by Kubelet, and [monitoring
architecture](/contributors/design-proposals/instrumentation/monitoring_architecture.md)
for the evolving monitoring pipeline in Kubernetes.*
# Container Metrics
Kubelet is responsible for creating pod-level cgroups based on the Quality of
Service class to which the pod belongs, and passes this as a parent cgroup to the
runtime so that it can ensure all resources used by the pod (e.g., pod sandbox,
containers) will be charged to the cgroup. Therefore, Kubelet has the ability
to track resource usage at the pod level (using the built-in cAdvisor), and the
API enhancement focuses on the container-level metrics.
We include only the set of metrics that are necessary to fulfill the needs of
Kubelet. As the requirements evolve over time, we may extend the API to support
more metrics. Below is the API with the metrics supported today.
```proto
// ContainerStats returns stats of the container. If the container does not
// exist, the call returns an error.
rpc ContainerStats(ContainerStatsRequest) returns (ContainerStatsResponse) {}
// ListContainerStats returns stats of all running containers.
rpc ListContainerStats(ListContainerStatsRequest) returns (ListContainerStatsResponse) {}
```
```proto
// ContainerStats provides the resource usage statistics for a container.
message ContainerStats {
// Information of the container.
ContainerAttributes attributes = 1;
// CPU usage gathered from the container.
CpuUsage cpu = 2;
// Memory usage gathered from the container.
MemoryUsage memory = 3;
// Usage of the writable layer.
FilesystemUsage writable_layer = 4;
}
// CpuUsage provides the CPU usage information.
message CpuUsage {
// Timestamp in nanoseconds at which the information was collected. Must be > 0.
int64 timestamp = 1;
// Cumulative CPU usage (sum across all cores) since object creation.
UInt64Value usage_core_nano_seconds = 2;
}
// MemoryUsage provides the memory usage information.
message MemoryUsage {
// Timestamp in nanoseconds at which the information was collected. Must be > 0.
int64 timestamp = 1;
// The amount of working set memory in bytes.
UInt64Value working_set_bytes = 2;
}
// FilesystemUsage provides the filesystem usage information.
message FilesystemUsage {
// Timestamp in nanoseconds at which the information was collected. Must be > 0.
int64 timestamp = 1;
// The underlying storage of the filesystem.
StorageIdentifier storage_id = 2;
// UsedBytes represents the bytes used for images on the filesystem.
// This may differ from the total bytes used on the filesystem and may not
// equal CapacityBytes - AvailableBytes.
UInt64Value used_bytes = 3;
// InodesUsed represents the inodes used by the images.
// This may not equal InodesCapacity - InodesAvailable because the underlying
// filesystem may also be used for purposes other than storing images.
UInt64Value inodes_used = 4;
}
```
There are three categories of resources: CPU, memory, and filesystem. Each
resource usage message includes a timestamp to indicate when the usage
statistics were collected. This is necessary because some resource usage (e.g.,
filesystem) is inherently more expensive to collect and may be updated less
frequently than others. Having the timestamp allows the consumer to know how
stale or fresh the data is, while giving the runtime flexibility to adjust.
Although CRI does not dictate the frequency of the stats update, Kubelet needs
a minimum guarantee of freshness of the stats for certain resources so that it
can reclaim them in a timely manner when under pressure. We will formulate the
requirements for such resources and include them in CRI in the near future.
*For more details on why we request cached stats with timestamps as opposed to
requesting stats on-demand, here is the [rationale](https://github.com/kubernetes/kubernetes/pull/45614#issuecomment-302258090)
behind it.*
## Status
The container metrics calls were added to CRI in Kubernetes 1.7, but Kubelet does not
yet use them to gather metrics from the runtime. We plan to enable Kubelet to
optionally consume the container metrics from the API in 1.8.

View File

@ -0,0 +1,118 @@
# Container Runtime Interface: Testing Policy
**Owner: SIG-Node**
This document describes testing policy and process for runtimes implementing the
[Container Runtime Interface (CRI)](/contributors/devel/container-runtime-interface.md)
to publish test results in a federated dashboard. The objective is to provide
the Kubernetes community an easy way to track the conformance, stability, and
supported features of a CRI runtime.
This document focuses on Kubernetes node/cluster end-to-end (E2E) testing
because many features require integration of runtime, OS, or even the cloud
provider. Higher-level integration tests provide better signals on vertical
stack compatibility to the Kubernetes community. On the other hand, runtime
developers are strongly encouraged to run the low-level
[CRI validation test suite](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/validation.md)
for validation as part of their development process.
## Required and optional tests
Runtime maintainers are **required** to submit the tests listed below.
1. Node conformance test suite
2. Node feature test suite
Node E2E tests qualify an OS image with a pre-installed CRI runtime. The
runtime maintainers are free to choose any OS distribution, packaging, and
deployment mechanism. Please see the
[tutorial](e2e-node-tests.md)
to learn more about the Node E2E test framework and the tests for validating a
compatible OS image.
The conformance suite is a set of platform-agnostic (e.g., OS, runtime, and
cloud provider) tests that validate the conformance of the OS image. The feature
suite allows the runtime to demonstrate what features are supported with the OS
distribution.
In addition to the required tests, the runtime maintainers are *strongly
recommended to run and submit results from the Kubernetes conformance test
suite*. This cluster-level E2E test suite provides extra test signal for areas
such as Networking that cannot be covered by CRI or Node-level
tests. Because networking requires deep integration between the runtime, the
cloud provider, and/or other cluster components, runtime maintainers are
encouraged to reach out to other relevant SIGs (e.g., SIG-GCP or SIG-AWS) for
guidance and/or sponsorship.
## Process for publishing test results
To publish test results, please submit a proposal in the
[Kubernetes community repository](https://github.com/kubernetes/community)
briefly explaining your runtime, providing at least two maintainers, and
assigning the proposal to the leads of SIG-Node.
These test results should be published under the `sig-node` tab, organized
as follows.
```
sig-node -> sig-node-cri-{Kubernetes-version} -> [page containing the required jobs]
```
Only the three most recent Kubernetes versions and the master branch are
kept at any time. This is consistent with the Kubernetes release schedule and
policy.
## Test job maintenance
Tests are required to run at least nightly.
The runtime maintainers are responsible for keeping the tests healthy. If the
tests are deemed not actively maintained, SIG-Node may remove the tests from
the test grid at its discretion.
## Process for adding pre-submit testing
If the tests are in good standing (i.e., consistently passing for more than 2
weeks), the runtime maintainers may request that the tests be included in the
pre-submit Pull Request (PR) tests. Please note that the pre-submit tests
require significantly higher testing capacity and are held to a higher standard,
since they directly affect the development velocity.
If the tests are flaky or failing, and the maintainers are unable to respond and
fix the issues in a timely manner, the SIG leads may remove the runtime from
the pre-submit tests until the issues are resolved.
As of now, SIG-Node only accepts promotion of Node conformance tests to
pre-submit because Kubernetes conformance tests involve a wider scope and may
need co-sponsorships from other SIGs.
## FAQ
*1. Can runtime maintainers publish results from other E2E tests?*
Yes, runtime maintainers can publish additional Node E2E tests results. These
test jobs will be displayed in the `sig-node-{runtime-name}` page. The same
policy for test maintenance applies.
As for additional Cluster E2E tests, SIG-Node may agree to host the
results. However, runtime maintainers are strongly encouraged to seek a more
appropriate SIG to sponsor or host the results.
*2. Can these runtime-specific test jobs be considered release blocking?*
This is beyond the authority of SIG-Node, and requires agreement and consensus
across multiple SIGs (e.g., Release, the relevant cloud provider SIG, etc).
*3. How to run the aforementioned tests?*
It is hard to keep instructions, or even links to them, up-to-date in one
document. Please contact the relevant SIGs for assistance.
*4. How can I change the test-grid to publish the test results?*
Please contact SIG-Node for the detailed instructions.
*5. How does this policy apply to Windows containers?*
Windows containers are still in the early development phase and the features
they support change rapidly. Therefore, it is suggested to treat them as a
feature, with a select, whitelisted set of tests to run.

View File

@ -0,0 +1,53 @@
# Container Runtime Interface (CRI) Validation Testing
CRI validation testing provides a test framework and a suite of tests to validate that the Container Runtime Interface (CRI) server implementation meets all the requirements. This allows the CRI runtime developers to verify that their runtime conforms to CRI, without needing to set up Kubernetes components or run Kubernetes end-to-end tests.
CRI validation testing has been GA since v1.11.0 and is hosted at the [cri-tools](https://github.com/kubernetes-sigs/cri-tools) repository. We encourage CRI developers to report bugs or help extend the test coverage by adding more tests.
## Install
The test suites can be downloaded from the cri-tools [release page](https://github.com/kubernetes-sigs/cri-tools/releases):
```sh
VERSION="v1.11.0"
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/critest-$VERSION-linux-amd64.tar.gz
sudo tar zxvf critest-$VERSION-linux-amd64.tar.gz -C /usr/local/bin
rm -f critest-$VERSION-linux-amd64.tar.gz
```
critest requires [ginkgo](https://github.com/onsi/ginkgo) to run parallel tests. It can be installed with:
```sh
go get -u github.com/onsi/ginkgo/ginkgo
```
*Note: ensure Go is installed and `GOPATH` is set before installing ginkgo.*
## Running tests
### Prerequisite
Before running the test, you need to _ensure that the CRI server under test is running and listening on a Unix socket_. Because the validation tests are designed to request changes (e.g., create/delete) to the containers and verify that the correct status is reported, the test suite expects to be the only user of the CRI server. Please make sure that 1) there are no existing CRI-managed containers running on the node, and 2) no other processes (e.g., Kubelet) will interfere with the tests.
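As an illustrative pre-flight check (assuming `crictl` from the same cri-tools release is installed), you can confirm that the node is quiescent before starting the suite:
```sh
# Both lists should be empty before running critest.
crictl --runtime-endpoint unix:///var/run/dockershim.sock pods
crictl --runtime-endpoint unix:///var/run/dockershim.sock ps -a
```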
### Run
```sh
critest
```
This will
- Connect to the shim of the CRI container runtime
- Run the tests using `ginkgo`
- Output the test results to STDOUT
critest connects to `unix:///var/run/dockershim.sock` by default. For other runtimes, the endpoint can be set by flags `-runtime-endpoint` and `-image-endpoint`.
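For example, to target containerd instead (the socket path below is a common default, not a guarantee):
```sh
critest -runtime-endpoint unix:///run/containerd/containerd.sock
```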
## Additional options
- `-ginkgo.focus`: Only run the tests that match the regular expression.
- `-image-endpoint`: Set the endpoint of the image service. Defaults to the runtime endpoint if not specified.
- `-runtime-endpoint`: Set the endpoint of the runtime service. Defaults to `unix:///var/run/dockershim.sock`.
- `-ginkgo.skip`: Skip the tests that match the regular expression.
- `-parallel`: The number of parallel test nodes to run (default 1). ginkgo must be installed to run parallel tests.
- `-h`: Show help and all supported options.
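For example, a sketch combining a few of these options (the focus regex is illustrative):
```sh
# Run only tests whose names match "PodSandbox", on 4 parallel ginkgo nodes.
critest -ginkgo.focus="PodSandbox" -parallel 4
```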

View File

@ -0,0 +1,229 @@
# Node End-To-End tests
Node e2e tests are component tests meant for testing the Kubelet code on a custom host environment.
Tests can be run either locally or against a host running on GCE.
Node e2e tests are run as both pre- and post-submit tests by the Kubernetes project.
*Note: Linux only. Mac and Windows unsupported.*
*Note: There is no scheduler running. The e2e tests have to do manual scheduling, e.g. by using `framework.PodClient`.*
# Running tests
## Locally
Why run tests *Locally*? It is much faster than running tests *Remotely*.
Prerequisites:
- [Install etcd](https://github.com/coreos/etcd/releases) on your PATH
- Verify etcd is installed correctly by running `which etcd`
- Or make etcd binary available and executable at `/tmp/etcd`
- [Install ginkgo](https://github.com/onsi/ginkgo) on your PATH
- Verify ginkgo is installed correctly by running `which ginkgo`
From the Kubernetes base directory, run:
```sh
make test-e2e-node
```
This will run the *ginkgo* binary against the subdirectory *test/e2e_node*, which will in turn:
- Ask for sudo access (needed for running some of the processes)
- Build the Kubernetes source code
- Pre-pull docker images used by the tests
- Start a local instance of *etcd*
- Start a local instance of *kube-apiserver*
- Start a local instance of *kubelet*
- Run the test using the locally started processes
- Output the test results to STDOUT
- Stop *kubelet*, *kube-apiserver*, and *etcd*
## Remotely
Why run tests *Remotely*? Tests will be run in a customized, pristine environment that closely mimics what is done
as pre- and post-submit testing by the project.
Prerequisites:
- [join the googlegroup](https://groups.google.com/forum/#!forum/kubernetes-dev)
`kubernetes-dev@googlegroups.com`
- *This provides read access to the node test images.*
- Setup a [Google Cloud Platform](https://cloud.google.com/) account and project with Google Compute Engine enabled
- Install and setup the [gcloud sdk](https://cloud.google.com/sdk/downloads)
- Verify the sdk is setup correctly by running `gcloud compute instances list` and `gcloud compute images list --project kubernetes-node-e2e-images`
Run:
```sh
make test-e2e-node REMOTE=true
```
This will:
- Build the Kubernetes source code
- Create a new GCE instance using the default test image
- Instance will be called **test-e2e-node-containervm-v20160321-image**
- Look up the instance's public IP address
- Copy a compressed archive file to the host containing the following binaries:
- ginkgo
- kubelet
- kube-apiserver
- e2e_node.test (this binary contains the actual tests to be run)
- Unzip the archive to a directory under **/tmp/gcloud**
- Run the tests using the `ginkgo` command
- Starts etcd, kube-apiserver, kubelet
- The ginkgo command is used because it supports more features than running the test binary directly
- Output the remote test results to STDOUT
- `scp` the log files back to the local host under /tmp/_artifacts/e2e-node-containervm-v20160321-image
- Stop the processes on the remote host
- **Leave the GCE instance running**
**Note: Subsequent tests run using the same image will *reuse the existing host* instead of deleting it and
provisioning a new one. To delete the GCE instance after each test, see
*[DELETE_INSTANCES](#delete-instance-after-tests-run)*.**
# Additional Remote Options
## Run tests using different images
This is useful if you want to run tests against a host using a different OS distro or container runtime than
provided by the default image.
List the available test images using gcloud.
```sh
make test-e2e-node LIST_IMAGES=true
```
This will output a list of the available images for the default image project.
Then run:
```sh
make test-e2e-node REMOTE=true IMAGES="<comma-separated-list-images>"
```
## Run tests against a running GCE instance (not an image)
This is useful if you have a host instance running already and want to run the tests there instead of on a new instance.
```sh
make test-e2e-node REMOTE=true HOSTS="<comma-separated-list-of-hostnames>"
```
## Delete instance after tests run
This is useful if you want to recreate the instance for each test run to trigger flakes related to starting the instance.
```sh
make test-e2e-node REMOTE=true DELETE_INSTANCES=true
```
## Keep instance, test binaries, and *processes* around after tests run
This is useful if you want to manually inspect or debug the kubelet process running as part of the tests.
```sh
make test-e2e-node REMOTE=true CLEANUP=false
```
## Run tests using an image in another project
This is useful if you want to create your own host image in another project and use it for testing.
```sh
make test-e2e-node REMOTE=true IMAGE_PROJECT="<name-of-project-with-images>" IMAGES="<image-name>"
```
Setting up your own host image may require additional steps such as installing etcd or docker. See
[setup_host.sh](https://git.k8s.io/kubernetes/test/e2e_node/environment/setup_host.sh) for common steps to setup hosts to run node tests.
## Create instances using a different instance name prefix
This is useful if you want to create instances using a different name so that you can run multiple copies of the
test in parallel against different instances of the same image.
```sh
make test-e2e-node REMOTE=true INSTANCE_PREFIX="my-prefix"
```
# Additional Test Options for both Remote and Local execution
## Only run a subset of the tests
To run tests matching a regex:
```sh
make test-e2e-node REMOTE=true FOCUS="<regex-to-match>"
```
To run tests NOT matching a regex:
```sh
make test-e2e-node REMOTE=true SKIP="<regex-to-match>"
```
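The two can be combined; for example (the regexes below are illustrative):
```sh
# Run only the MirrorPod tests, skipping anything tagged [Slow].
make test-e2e-node REMOTE=true FOCUS="MirrorPod" SKIP="\[Slow\]"
```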
## Run tests continually until they fail
This is useful if you are trying to debug a flaky test failure. This will cause ginkgo to continually
run the tests until they fail. **Note: this will only perform test setup once (e.g. creating the instance) and is
less useful for catching flakes related to creating the instance from an image.**
```sh
make test-e2e-node REMOTE=true RUN_UNTIL_FAILURE=true
```
## Run tests in parallel
Running tests in parallel can usually shorten the test duration. By default, the node
e2e test runs with `--nodes=8` (see ginkgo flag
[--nodes](https://onsi.github.io/ginkgo/#parallel-specs)). You can use the
`PARALLELISM` option to change the parallelism.
```sh
make test-e2e-node PARALLELISM=4 # run test with 4 parallel nodes
make test-e2e-node PARALLELISM=1 # run test sequentially
```
## Run tests with kubenet network plugin
[kubenet](http://kubernetes.io/docs/admin/network-plugins/#kubenet) is
the default network plugin used by kubelet since Kubernetes 1.3. The
plugin requires [CNI](https://github.com/containernetworking/cni) and
[nsenter](http://man7.org/linux/man-pages/man1/nsenter.1.html).
Currently, kubenet is enabled by default for Remote execution `REMOTE=true`,
but disabled for Local execution. **Note: kubenet is not currently supported for
Local execution. This may cause network-related test results to differ
between Local and Remote execution, so if you want to run network-related
tests, Remote execution is recommended.**
To enable/disable kubenet:
```sh
# enable kubenet
make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin=kubenet --network-plugin-dir=/opt/cni/bin"'
# disable kubenet
make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin= --network-plugin-dir="'
```
## Additional QoS Cgroups Hierarchy level testing
For testing with the QoS Cgroup Hierarchy enabled, you can pass the `--cgroups-per-qos` flag as an argument to Ginkgo using `TEST_ARGS`:
```sh
make test-e2e-node TEST_ARGS="--cgroups-per-qos=true"
```
# Notes on tests run by the Kubernetes project during pre- and post-submit.
The node e2e tests are run by the PR builder for each Pull Request, and the results are published at
the bottom of the comments section. To re-run just the node e2e tests from the PR builder, add the comment
`@k8s-bot node e2e test this issue: #<Flake-Issue-Number or IGNORE>` and **include a link to the test
failure logs if caused by a flake.**
The PR builder runs tests against the images listed in [jenkins-pull.properties](https://git.k8s.io/kubernetes/test/e2e_node/jenkins/jenkins-pull.properties).
The post-submit tests run against the images listed in [jenkins-ci.properties](https://git.k8s.io/kubernetes/test/e2e_node/jenkins/jenkins-ci.properties).

View File

@ -0,0 +1,56 @@
# Container Runtime Interface (CRI) Networking Specifications
## Introduction
[Container Runtime Interface (CRI)](container-runtime-interface.md) is
an ongoing project to allow container
runtimes to integrate with kubernetes via a newly-defined API. This document
specifies the network requirements for container runtime
interface (CRI). CRI networking requirements expand upon kubernetes pod
networking requirements. This document does not specify requirements
from upper layers of kubernetes network stack, such as `Service`. More
background on k8s networking can be found
[here](http://kubernetes.io/docs/admin/networking/).
## Requirements
1. Kubelet expects the runtime shim to manage the pod's network life cycle. Pod
networking should be handled accordingly along with pod sandbox operations.
* `RunPodSandbox` must set up the pod's network. This includes, but is not
limited to, allocating a pod IP, configuring the pod's network interfaces and
default network route. If `RunPodSandbox` returns successfully, kubelet
expects the pod sandbox to have an IP that is routable within the k8s cluster.
`RunPodSandbox` must return an error if it fails to set up the pod's network.
If the pod's network has already been set up, `RunPodSandbox` must skip
network setup and proceed.
* `StopPodSandbox` must tear down the pod's network. The runtime shim
must return an error on network tear-down failure. If the pod's network has
already been torn down, `StopPodSandbox` must skip network tear-down and proceed.
* `RemovePodSandbox` may tear down the pod's network, if the networking has
not been torn down already. `RemovePodSandbox` must return an error on
network tear-down failure.
* Response from `PodSandboxStatus` must include the pod sandbox network status.
The runtime shim must return an empty network status if it fails
to construct one (see the `crictl` sketch after this list).
2. User-supplied pod networking configurations, which are NOT directly
exposed by the kubernetes API, should be handled directly by runtime
shims. For instance, `hairpin-mode`, `cni-bin-dir`, `cni-conf-dir`, `network-plugin`,
`network-plugin-mtu` and `non-masquerade-cidr`. Kubelet will no longer handle
these configurations after the transition to CRI is complete.
3. Network configurations that are exposed through the kubernetes API
are communicated to the runtime shim through `UpdateRuntimeConfig`
interface, e.g. `podCIDR`. For each runtime and network implementation,
some configs may not be applicable. The runtime shim may handle or ignore
network configuration updates from `UpdateRuntimeConfig` interface.
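As a rough way to exercise these requirements by hand, the `crictl` CLI from
[cri-tools](https://github.com/kubernetes-sigs/cri-tools) drives the same CRI
calls; a sketch, assuming `crictl` is installed and pointed at your runtime's
endpoint (`pod-config.json` is a hypothetical `PodSandboxConfig` file):
```sh
crictl runp pod-config.json        # RunPodSandbox: must set up the pod's network
crictl inspectp <pod-sandbox-id>   # PodSandboxStatus: response includes the network status (pod IP)
crictl stopp <pod-sandbox-id>      # StopPodSandbox: must tear down the pod's network
crictl rmp <pod-sandbox-id>        # RemovePodSandbox
```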
## Extensibility
* Kubelet is oblivious to how the runtime shim manages networking, i.e.,
the runtime shim is free to use [CNI](https://github.com/containernetworking/cni),
[CNM](https://github.com/docker/libnetwork/blob/master/docs/design.md) or
any other implementation as long as the CRI networking requirements and
k8s networking requirements are satisfied.
* Runtime shims have full visibility into pod networking configurations.
* As more network features arrive, CRI will evolve.
## Related Issues
* Kubelet network plugin for client/server container runtimes [#28667](https://github.com/kubernetes/kubernetes/issues/28667)
* CRI networking umbrella issue [#37316](https://github.com/kubernetes/kubernetes/issues/37316)

View File

@ -0,0 +1,121 @@
# Measuring Node Performance
This document outlines the issues and pitfalls of measuring Node performance, as
well as the tools available.
## Cluster Set-up
There are lots of factors which can affect node performance numbers, so care
must be taken in setting up the cluster to make the intended measurements. In
addition to taking the following steps into consideration, it is important to
document precisely which setup was used. For example, performance can vary
wildly from commit-to-commit, so it is very important to **document which commit
or version** of Kubernetes was used, which Docker version was used, etc.
### Addon pods
Be aware of which addon pods are running on which nodes. By default Kubernetes
runs 8 addon pods, plus another 2 per node (`fluentd-elasticsearch` and
`kube-proxy`) in the `kube-system` namespace. The addon pods can be disabled for
more consistent results, but doing so can also have performance implications.
For example, Heapster polls each node regularly to collect stats data. Disabling
Heapster will hide the performance cost of serving those stats in the Kubelet.
#### Disabling Add-ons
Disabling addons is simple. Just ssh into the Kubernetes master and move the
addon from `/etc/kubernetes/addons/` to a backup location. More details
[here](https://git.k8s.io/kubernetes/cluster/addons/).
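For example (the paths and addon name are illustrative; defaults vary by deployment):
```sh
# On the Kubernetes master:
mkdir -p "$HOME/addons-backup"
sudo mv /etc/kubernetes/addons/fluentd-elasticsearch "$HOME/addons-backup/"
```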
### Which / how many pods?
Performance will vary a lot between a node with 0 pods and a node with 100 pods.
In many cases you'll want to make measurements with several different numbers of
pods. On a single-node cluster, scaling a replication controller makes this easy;
just make sure the system reaches a steady state before starting the
measurement, e.g. `kubectl scale replicationcontroller pause --replicas=100`.
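A minimal steady-state check might look like this (illustrative; adjust the condition to your setup):
```sh
kubectl scale replicationcontroller pause --replicas=100
# Wait until every pod reports Running before starting the measurement.
while kubectl get pods --no-headers | awk '$3 != "Running"' | grep -q .; do
  sleep 5
done
```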
In most cases pause pods will yield the most consistent measurements since the
system will not be affected by pod load. However, in some special cases
Kubernetes has been tuned to optimize pods that are not doing anything, such as
the cAdvisor housekeeping (stats gathering). In these cases, performing a very
light task (such as a simple network ping) can make a difference.
Finally, you should also consider which features your pods should be using. For
example, if you want to measure performance with probing, you should obviously
use pods with liveness or readiness probes configured. Likewise for volumes,
number of containers, etc.
### Other Tips
**Number of nodes** - On the one hand, it can be easier to manage logs, pods,
environment etc. with a single node to worry about. On the other hand, having
multiple nodes will let you gather more data in parallel for more robust
sampling.
## E2E Performance Test
There is an end-to-end test for collecting overall resource usage of node
components: [kubelet_perf.go](https://git.k8s.io/kubernetes/test/e2e/node/kubelet_perf.go). To
run the test, simply make sure you have an e2e cluster running (`go run
hack/e2e.go -- -up`) and [set up](#cluster-set-up) correctly.
Run the test with `go run hack/e2e.go -- -v -test
--test_args="--ginkgo.focus=resource\susage\stracking"`. You may also wish to
customise the number of pods or other parameters of the test (remember to rerun
`make WHAT=test/e2e/e2e.test` after you do).
## Profiling
Kubelet installs the [go pprof handlers](https://golang.org/pkg/net/http/pprof/), which can be queried for CPU profiles:
```console
$ kubectl proxy &
Starting to serve on 127.0.0.1:8001
$ curl -G "http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/profile?seconds=${DURATION_SECONDS}" > $OUTPUT
$ KUBELET_BIN=_output/dockerized/bin/linux/amd64/kubelet
$ go tool pprof -web $KUBELET_BIN $OUTPUT
```
`pprof` can also provide heap usage, from the `/debug/pprof/heap` endpoint
(e.g. `http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/heap`).
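For example, reusing the proxy session from above:
```console
$ curl -G "http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/heap" > heap.out
$ go tool pprof -web $KUBELET_BIN heap.out
```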
More information on go profiling can be found
[here](http://blog.golang.org/profiling-go-programs).
## Benchmarks
Before jumping through all the hoops to measure a live Kubernetes node in a real
cluster, it is worth considering whether the data you need can be gathered
through a benchmark test. Go provides a really simple benchmarking mechanism;
just add a unit test of the form:
```go
// In foo_test.go
func BenchmarkFoo(b *testing.B) {
b.StopTimer()
setupFoo() // Perform any global setup
b.StartTimer()
for i := 0; i < b.N; i++ {
foo() // Functionality to measure
}
}
```
Then:
```console
$ go test -bench=. -benchtime=${SECONDS}s foo_test.go
```
More details on benchmarking [here](https://golang.org/pkg/testing/).
## TODO
- (taotao) Measuring docker performance
- Expand cluster set-up section
- (vishh) Measuring disk usage
- (yujuhong) Measuring memory usage
- Add section on monitoring kubelet metrics (e.g. with prometheus)

View File

@ -66,8 +66,8 @@ None
SIG Technical Leads
[validation]: https://github.com/kubernetes/community/blob/master/contributors/devel/cri-validation.md
[testing policy]: https://github.com/kubernetes/community/blob/master/contributors/devel/cri-testing-policy.md
[validation]: /contributors/devel/sig-node/cri-validation.md
[testing policy]: /contributors/devel/sig-node/cri-testing-policy.md
[test grid]: https://k8s-testgrid.appspot.com/sig-node#Summary
[perf dashboard]: http://node-perf-dash.k8s.io/#/builds
[sig-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance.md