diff --git a/content/en/blog/_posts/2019-04-24-Hardware-Accelerated-SSLTLS-Termination-in-Ingress-Controllers-using-Kubernetes-Device-Plugins-and-RuntimeClass.md b/content/en/blog/_posts/2019-04-24-Hardware-Accelerated-SSLTLS-Termination-in-Ingress-Controllers-using-Kubernetes-Device-Plugins-and-RuntimeClass.md
new file mode 100644
index 0000000000..1db938ef6a
--- /dev/null
+++ b/content/en/blog/_posts/2019-04-24-Hardware-Accelerated-SSLTLS-Termination-in-Ingress-Controllers-using-Kubernetes-Device-Plugins-and-RuntimeClass.md
@@ -0,0 +1,174 @@
---
layout: blog
title: 'Hardware Accelerated SSL/TLS Termination in Ingress Controllers using Kubernetes Device Plugins and RuntimeClass'
date: 2019-04-24
---

**Authors:** Mikko Ylinen (Intel)

## Abstract

A Kubernetes Ingress is a way to connect cluster services to the world outside the cluster. In order
to route the traffic correctly to service backends, the cluster needs an Ingress controller. The
Ingress controller is responsible for configuring the right backend destinations based on the
information in the Ingress API objects. The actual traffic is routed through a proxy server that
is responsible for tasks such as load balancing and SSL/TLS termination (below, "SSL" refers to both SSL
and TLS). SSL termination is a CPU-heavy operation because of the crypto operations
involved. To offload some of this CPU-intensive work, OpenSSL-based proxy
servers can take advantage of the OpenSSL ENGINE API and dedicated crypto hardware. This frees
CPU cycles for other processing and improves the overall throughput of the proxy server.

In this blog post, we show how easy it is to make hardware-accelerated crypto available
for containers running the Ingress controller proxy using some of the recently created Kubernetes
building blocks: the device plugin framework and RuntimeClass. At the end, a reference setup is given
using an HAproxy-based Ingress controller accelerated with Intel® QuickAssist Technology cards.

## About Proxies, OpenSSL Engine and Crypto Hardware

The proxy server plays a vital role in a Kubernetes Ingress controller: it proxies
the traffic to the backends according to the Ingress objects' routes. Under heavy traffic load, performance
becomes critical, especially if the proxying involves CPU-intensive operations like SSL crypto.

The OpenSSL project provides the widely adopted library for implementing the SSL protocol. Of
the commonly known proxy servers used by Kubernetes Ingress controllers, Nginx and HAproxy use
OpenSSL. The CNCF-graduated Envoy proxy uses BoringSSL, but there seems to be [community interest
in having OpenSSL as the alternative](https://github.com/envoyproxy/envoy/pull/5161#issuecomment-446374130) for it too.

The OpenSSL SSL protocol library relies on libcrypto, which implements the cryptographic functions.
For quite some time now (first introduced in the 0.9.6 release), OpenSSL has provided an [ENGINE
concept](https://github.com/openssl/openssl/blob/master/README.ENGINE) that allows these cryptographic operations to be offloaded to dedicated crypto
acceleration hardware. Later, a special *dynamic* ENGINE made it possible to implement the crypto hardware specific
pieces in an independent loadable module that can be developed outside the
OpenSSL code base and distributed separately. From the application's perspective, this is also
ideal because the application does not need to know the details of how to use the hardware, and the hardware
specific module can be loaded and used when the hardware is available.
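To make the ENGINE configuration concrete, here is a minimal sketch of an OpenSSL configuration file that loads a dynamic engine module, wrapped in a Kubernetes ConfigMap so that it could later be mounted into a proxy container. The ConfigMap name, engine id, and module path are illustrative assumptions that depend on the engine module being used:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: openssl-qat-engine           # illustrative name
data:
  openssl.cnf: |
    openssl_conf = openssl_init

    [openssl_init]
    engines = engine_section

    [engine_section]
    qat = qat_section

    [qat_section]
    # engine id and .so path are assumptions; they depend on the engine build
    engine_id = qatengine
    dynamic_path = /usr/lib64/engines-1.1/qatengine.so
    default_algorithms = ALL
    init = 1
```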
Hardware-based crypto can greatly improve a cloud application's performance due to the hardware-accelerated
processing of SSL operations discussed above, and it can provide other crypto
services like key and random number generation. Clouds can make the hardware easily available
using the dynamic ENGINE, and several loadable module implementations exist, for
example, [CloudHSM](https://docs.aws.amazon.com/cloudhsm/latest/userguide/openssl-library.html), [IBMCA](https://github.com/opencryptoki/openssl-ibmca), or [QAT Engine](https://github.com/intel/QAT_Engine/).

For cloud deployments, the ideal scenario is for these modules to be shipped as part of
the container workload. The workload would get scheduled on a node that provides the
underlying hardware that the module needs to access. On the other hand, the workloads
should run the same way, without code modifications, regardless of whether the crypto acceleration
hardware is available or not. The OpenSSL dynamic engine enables this. Figure 1 below
illustrates these two scenarios using a typical Ingress controller container as an example.
The red colored boxes indicate the differences between a container with a crypto hardware
engine enabled and a "standard" one. It's worth pointing out that the configuration
changes shown do not necessarily require another version of the container image, since the configuration
could be managed, e.g., using ConfigMaps.

{{< figure src="/images/blog/2019-04-23-hardware-accelerated-tls-termination/k8s-blog-fig1.png" >}}
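As a sketch of the ConfigMap-based approach mentioned above, the pod template fragment below mounts the `openssl-qat-engine` ConfigMap from the previous sketch and points OpenSSL at it through the standard `OPENSSL_CONF` environment variable. The image name and paths are illustrative, and proxies such as HAproxy also offer their own engine configuration directives (covered later in this post):

```yaml
# Pod template fragment: the same image runs with or without the engine;
# only the mounted OpenSSL configuration differs.
spec:
  containers:
  - name: ingress-proxy
    image: example.com/ingress-proxy:latest    # illustrative image
    env:
    - name: OPENSSL_CONF                       # point OpenSSL to the mounted configuration
      value: /etc/crypto/openssl.cnf
    volumeMounts:
    - name: crypto-conf
      mountPath: /etc/crypto
  volumes:
  - name: crypto-conf
    configMap:
      name: openssl-qat-engine
```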
## Hardware Resources and Isolation

To be able to deploy workloads with hardware dependencies, Kubernetes provides excellent extension
and configurability mechanisms. Let's take a closer look at the Kubernetes [device plugin framework](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/)
(beta in 1.14) and [RuntimeClass](https://kubernetes.io/docs/concepts/containers/runtime-class/) (beta in 1.14) and learn how they can be leveraged to expose crypto
hardware to workloads.

The device plugin framework, first introduced in Kubernetes 1.8, provides a way for hardware vendors
to register and allocate node hardware resources to kubelets. The plugins implement the hardware-specific
initialization logic and resource management. Pods can request hardware resources in
their PodSpec, which also guarantees that the pod is scheduled on a node that can provide those resources.

Device resource allocation for containers is non-trivial. For applications dealing with security,
hardware-level isolation is critical. PCIe-based crypto acceleration device functions can
benefit from I/O hardware virtualization, through an I/O Memory Management Unit (IOMMU), to provide
the isolation: the *IOMMU group* the device belongs to provides the isolated resource for a workload
(assuming the crypto cards do not share the IOMMU group with other devices). The number of isolated
resources can be further increased if the PCIe device supports the Single-Root I/O Virtualization
(SR-IOV) specification. SR-IOV allows the PCIe device to be split further into *virtual functions* (VFs),
derived from *physical function* (PF) devices, each belonging to its own IOMMU group. To expose
these IOMMU-isolated device functions to user space and containers, the host kernel should bind
them to a specific device driver. In Linux, this driver is vfio-pci, and it makes each device
available through a character device in user space. The kernel vfio-pci driver provides user space
applications with direct, IOMMU-backed access to PCIe devices and functions, using a mechanism
called *PCI passthrough*. The interface can be leveraged by user space frameworks, such as the
Data Plane Development Kit (DPDK). Additionally, virtual machine (VM) hypervisors can provide
these user space device nodes to VMs and expose them as PCI devices to the guest kernel.
Assuming support from the guest kernel, the VM gets close-to-native-performance direct access to the
underlying host devices.

To advertise these device resources to Kubernetes, we can have a simple Kubernetes device plugin
that runs the initialization (i.e., binding), calls kubelet's `Registration` gRPC service, and
implements the `DevicePlugin` gRPC service that kubelet calls, e.g., to `Allocate` the resources
upon pod creation.

## Device Assignment and Pod Deployment

At this point, you may ask what a container could do with a VFIO device node. The answer comes
after we first take a quick look at the Kubernetes RuntimeClass.

The Kubernetes RuntimeClass was created to provide better control and configurability
over the variety of *runtimes* (an earlier [blog post](https://kubernetes.io/blog/2018/10/10/kubernetes-v1.12-introducing-runtimeclass/) goes into the details of the needs,
status, and roadmap for it) that are available in the cluster. In essence, RuntimeClass
gives cluster users better tools to pick and use the runtime that best suits the pod's use case.
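For example, a RuntimeClass object for Kata Containers could look roughly like the sketch below (RuntimeClass is served from `node.k8s.io/v1beta1` in Kubernetes 1.14; the `handler` value is an assumption that depends on how the CRI runtime is configured on the nodes):

```yaml
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata-containers    # the name pods reference via runtimeClassName
handler: kata              # assumption: the handler name configured in the CRI runtime
```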
The OCI-compatible [Kata Containers runtime](https://katacontainers.io/) provides workloads with a hardware-virtualized
isolation layer. In addition to workload isolation, the Kata Containers VM has the added
side benefit that the VFIO devices, as `Allocate`'d by the device plugin, can be passed
through to the container as hardware-isolated devices. The only requirement is that the
Kata Containers kernel has the driver for the exposed device enabled.

That's all it really takes to enable hardware-accelerated crypto for container workloads. To summarize:

 1. The cluster needs a device plugin running on the node that provides the hardware
 2. The device plugin exposes the hardware to user space using the VFIO driver
 3. The pod requests the device resources and Kata Containers as the RuntimeClass in its PodSpec
 4. The container image has the hardware adaptation library and the OpenSSL engine module
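Putting the list above together, a minimal, hypothetical pod spec could look like the following sketch. The image name is illustrative, and the exact extended resource name is whatever the device plugin advertises (shown here as `qat.intel.com/generic`):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: accelerated-proxy
spec:
  runtimeClassName: kata-containers              # step 3: run the pod with the Kata Containers runtime
  containers:
  - name: proxy
    image: example.com/ingress-proxy:latest      # step 4: image with the adaptation library and engine module
    resources:
      limits:
        qat.intel.com/generic: 1                 # step 3: device resource advertised by the device plugin
```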
Figure 2 shows the overall setup using the Container A illustrated earlier.

{{< figure src="/images/blog/2019-04-23-hardware-accelerated-tls-termination/k8s-blog-fig2.png" >}}

## Reference Setup

Finally, we describe the necessary building blocks and steps to build the functional
setup described in Figure 2, which enables hardware-accelerated SSL termination in
an Ingress controller using an Intel® QuickAssist Technology (QAT) PCIe device.
It should be noted that the use cases are not limited to Ingress controllers;
any OpenSSL-based workload can be accelerated.

### Cluster configuration:
 * Kubernetes 1.14 (the `RuntimeClass` and `DevicePlugins` feature gates enabled; both are `true` in 1.14)
 * A RuntimeClass-ready runtime and Kata Containers configured

### Host configuration:
 * Intel® QAT driver release with the kernel drivers installed for both the host kernel and the Kata Containers kernel (or available on its rootfs as loadable modules)
 * [QAT device plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/master/cmd/qat_plugin) DaemonSet deployed

### Ingress controller configuration and deployment:
 * [HAproxy-ingress](https://github.com/jcmoraisjr/haproxy-ingress) ingress controller in a modified container that has
   * the QAT HW HAL user space library (part of the Intel® QAT SW release) and
   * the [OpenSSL QAT Engine](https://github.com/intel/QAT_Engine/) built in
 * Haproxy-ingress ConfigMap to enable QAT engine usage
   * `ssl-engine="qat"`
   * `ssl-mode-async=true`
 * Haproxy-ingress deployment `.yaml` to
   * request `qat.intel.com: n` resources
   * request `runtimeClassName: kata-containers` (the name value depends on the cluster configuration)
 * (a QAT device config file for each requested device resource, with the OpenSSL engine configured, available in the container)

Once the building blocks are available, hardware-accelerated SSL/TLS can be tested by following the [TLS termination
example](https://github.com/jcmoraisjr/haproxy-ingress/tree/master/examples/tls-termination) steps. In order to verify that the hardware is used, you can check the `/sys/kernel/debug/*/fw_counters` files on the host, as they
get updated by the Intel® QAT firmware.

Haproxy-ingress and HAproxy are used because HAproxy can be directly configured to use an OpenSSL engine with the
`ssl-engine <name> [algo ALGOs]` configuration directive, without modifications to the global OpenSSL configuration file.
Moreover, HAproxy can offload the configured algorithms using asynchronous calls (with `ssl-mode-async`) to further improve performance.
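For reference, the two options listed above map to keys in the haproxy-ingress configuration ConfigMap roughly as follows (the ConfigMap name and namespace are assumptions that depend on how the controller is deployed):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-ingress           # must match the ConfigMap the controller is started with
  namespace: ingress-controller   # illustrative namespace
data:
  ssl-engine: "qat"
  ssl-mode-async: "true"
```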
## Call to Action

In this blog post, we have shown how Kubernetes Device Plugins and RuntimeClass can be used to provide isolated hardware
access for applications in pods to offload crypto operations to hardware accelerators. Hardware accelerators can be used
to speed up crypto operations and also save CPU cycles for other tasks. We demonstrated the setup using HAproxy, which already
supports asynchronous crypto offload with OpenSSL.

The next step for our team is to repeat the same for Envoy (with an OpenSSL based TLS transport socket built
as an extension). Furthermore, we are working to enhance Envoy to be able to [offload BoringSSL asynchronous
private key operations](https://github.com/envoyproxy/envoy/issues/6248) to crypto acceleration hardware. Any review feedback or help is appreciated!

How many CPU cycles can your crypto application save for other tasks when offloading crypto processing to a dedicated accelerator?
diff --git a/static/images/blog/2019-04-23-hardware-accelerated-tls-termination/k8s-blog-fig1.png b/static/images/blog/2019-04-23-hardware-accelerated-tls-termination/k8s-blog-fig1.png
new file mode 100644
index 0000000000..9b8ce51185
Binary files /dev/null and b/static/images/blog/2019-04-23-hardware-accelerated-tls-termination/k8s-blog-fig1.png differ
diff --git a/static/images/blog/2019-04-23-hardware-accelerated-tls-termination/k8s-blog-fig2.png b/static/images/blog/2019-04-23-hardware-accelerated-tls-termination/k8s-blog-fig2.png
new file mode 100644
index 0000000000..a2b37098f6
Binary files /dev/null and b/static/images/blog/2019-04-23-hardware-accelerated-tls-termination/k8s-blog-fig2.png differ