Updated according to the comments on GitHub. Main functional changes: HostPort with SCTP shall be supported; type=LoadBalancer with SCTP shall be supported.
parent 683f9a0192
commit c234ca8c12
The goal of the SCTP support feature is to enable the usage of the SCTP protocol in Kubernetes [Service][], [NetworkPolicy][], and [ContainerPort][] as an additional protocol value option besides the current TCP and UDP options.
SCTP is an IETF protocol specified in [RFC4960][], and it is used widely in telecommunications network stacks.
Once SCTP support is added as a new protocol option, applications that require SCTP as their L4 protocol can be deployed on Kubernetes clusters in a more straightforward way. For example, they can use the native kube-dns based service discovery, and their communication can be controlled via native NetworkPolicies.
[Service]: https://kubernetes.io/docs/concepts/services-networking/service/
[NetworkPolicy]: https://kubernetes.io/docs/concepts/services-networking/network-policies/
Add SCTP support to Kubernetes ContainerPort, Service, and NetworkPolicy, so applications running in pods can use the native kube-dns based service discovery for SCTP based services, and their communication can be controlled via native NetworkPolicies.
It is also a goal to enable ingress SCTP connections from clients outside the Kubernetes cluster, and to enable egress SCTP connections to servers outside the Kubernetes cluster.
### Non-Goals
It is not a goal here to add SCTP support to the load balancers provided by cloud providers. The Kubernetes side implementation will not restrict the usage of SCTP as the protocol of Services with type=LoadBalancer, but we do not implement SCTP support in the cloud specific load balancer implementations.
It is not a goal to support multi-homed SCTP associations. Such support also depends on the ability to manage multiple IP addresses for a pod, and in the case of Services with ClusterIP or NodePort it would also require NAT support for multi-homed associations in the SCTP related netfilter conntrack modules.
## Proposal
### User Stories [optional]
#### Headless Service with SCTP
As a user of Kubernetes I want to define headless Services for my applications that use SCTP as the L4 protocol on their interfaces, so client applications can discover my applications in kube-dns, or via any other service discovery method that gets information about endpoints via the Kubernetes API.
Example:
```
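# A reconstructed sketch of a headless SCTP Service; names and ports are
# illustrative and mirror the other examples in this document.
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  clusterIP: None   # headless: no virtual IP is allocated
  selector:
    app: MyApp
  ports:
  - protocol: SCTP
    port: 80
    targetPort: 9376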
```
#### SCTP as container port protocol in Pod definition
As a user of Kubernetes I want to define a containerPort, and optionally a hostPort, for the SCTP based interfaces of my applications.
Example:
```
apiVersion: v1
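# The fields below complete the example; the Pod name and image are illustrative.
kind: Pod
metadata:
  name: my-diameter-app
spec:
  containers:
  - name: my-diameter-app
    image: my-diameter-image:1.0
    ports: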
    - name: diameter
      protocol: SCTP
      containerPort: 80
      hostPort: 80
```
#### SCTP port accessible from outside the cluster
As a user of Kubernetes I want client applications that reside outside of the cluster to be able to access my SCTP based services that run in the cluster.
Example:
```
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  type: NodePort
  selector:
    app: MyApp
  ports:
  - protocol: SCTP
    port: 80
    targetPort: 9376
```
Example:
```
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
  - protocol: SCTP
    port: 80
    targetPort: 9376
  externalIPs:
  - 80.11.12.10
```
#### NetworkPolicy with SCTP
As a user of Kubernetes I want to define NetworkPolicies for my applications that use SCTP as the L4 protocol on their interfaces, so the network plugins that support SCTP can control the accessibility of my applications on the SCTP based interfaces, too.
Example:
```
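# The header fields below complete the example; the policy name is illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-network-policy
spec:
  podSelector:
    matchLabels: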
      role: myservice
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - ipBlock:
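        cidr: 172.17.0.0/16   # illustrative CIDR
    ports:
    - protocol: SCTP
      port: 80                # illustrative ingress port
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24     # illustrative CIDR
    ports:
    - protocol: SCTP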
      port: 7777
```
#### Userspace SCTP stack
As a user of Kubernetes I want to deploy and run my applications that use a userspace SCTP stack, and at the same time I want to define SCTP Services in the same cluster. I use a userspace SCTP stack because of the limitations of the kernel's SCTP support; for example, it is not possible to write an SCTP server that proxies or filters arbitrary SCTP streams using the socket APIs and kernel SCTP.
### Implementation Details/Notes/Constraints [optional]
#### SCTP in Services
##### Kubernetes API modification
The Kubernetes API modification for Services to support SCTP is straightforward: the protocol field of the Service port simply accepts SCTP as a new value.
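For illustration, a minimal sketch of the kind of type change this implies in the core v1 API; the new constant follows the pattern of the existing TCP and UDP values:

```
package v1

// Protocol is the existing string type used by ContainerPort, ServicePort,
// and NetworkPolicyPort; SCTP support adds one more validated value.
type Protocol string

const (
	ProtocolTCP  Protocol = "TCP"
	ProtocolUDP  Protocol = "UDP"
	ProtocolSCTP Protocol = "SCTP" // the new value proposed here
)
```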
##### Services with host level ports
In the case of Services with ClusterIP, NodePort, or externalIP, and in the case of containers with a HostPort defined, kube-proxy and the kubelet start listening on the defined TCP or UDP port. The goal is to reserve the port in question so that no other host level process can take it by accident. For SCTP the agreement is that we do not follow this pattern: Kubernetes will not listen on host level ports with SCTP as the protocol. The reason for this decision is that the current TCP and UDP implementation is not perfect either: it has known gaps in some use cases where this listening is not started, and since no one has complained about those gaps, this port reservation via listening is most probably not needed at all.
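The following sketch illustrates the agreed behavior; the holdLocalPort helper is hypothetical and only demonstrates the per-protocol branching, not the actual kube-proxy code:

```
package main

import (
	"fmt"
	"io"
	"net"
)

// holdLocalPort is a hypothetical helper that illustrates the agreed
// behavior: TCP and UDP host level ports are reserved by opening a real
// socket, while for SCTP no socket is opened at all, so the SCTP kernel
// module is never loaded by Kubernetes itself.
func holdLocalPort(protocol string, port int) (io.Closer, error) {
	addr := fmt.Sprintf(":%d", port)
	switch protocol {
	case "TCP":
		l, err := net.Listen("tcp", addr)
		return l, err
	case "UDP":
		c, err := net.ListenPacket("udp", addr)
		return c, err
	case "SCTP":
		return nil, nil // deliberately no port reservation for SCTP
	default:
		return nil, fmt.Errorf("unsupported protocol %q", protocol)
	}
}

func main() {
	if closer, err := holdLocalPort("TCP", 30080); err == nil && closer != nil {
		defer closer.Close()
		fmt.Println("TCP port 30080 is now held open; an SCTP port would not be")
	}
}
```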
##### Services with type=LoadBalancer
For Services with type=LoadBalancer we expect that the cloud provider's load balancer API client in Kubernetes rejects requests with an unsupported protocol.
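A minimal sketch of the kind of guard we expect on the cloud provider side; the function and types below are illustrative, not an existing cloud provider API:

```
package main

import "fmt"

// ServicePort mirrors the relevant part of the Kubernetes Service spec
// for this illustration.
type ServicePort struct {
	Protocol string
	Port     int32
}

// checkLoadBalancerProtocols is a hypothetical guard of the kind we expect
// in each cloud provider's load balancer client: a request that asks for a
// protocol the cloud load balancer cannot handle is rejected up front.
func checkLoadBalancerProtocols(ports []ServicePort) error {
	for _, p := range ports {
		if p.Protocol != "TCP" && p.Protocol != "UDP" {
			return fmt.Errorf("load balancer does not support protocol %s on port %d",
				p.Protocol, p.Port)
		}
	}
	return nil
}

func main() {
	err := checkLoadBalancerProtocols([]ServicePort{{Protocol: "SCTP", Port: 80}})
	fmt.Println(err) // load balancer does not support protocol SCTP on port 80
}
```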
#### SCTP support in Kube DNS
Kube DNS shall support SRV records with "_sctp" as the "proto" value. According to our investigation, the DNS controller is flexible from this perspective and can create SRV records with any protocol name, so no additional implementation is needed to achieve this goal.
Example:
```
_diameter._sctp.my-service.default.svc.cluster.local. 30 IN SRV 10 100 1234 my-service.default.svc.cluster.local.
```
#### SCTP in the Pod's ContainerPort
The Kubernetes API modification for the Pod is equally straightforward.
We support SCTP as the protocol for any combination of containerPort and hostPort.
#### SCTP in NetworkPolicy
The Kubernetes API modification for the NetworkPolicy is straightforward as well.
In order to utilize the new protocol value, the network plugin must support it.
#### Interworking with applications that use a userspace SCTP stack
##### Problem definition
A userspace SCTP stack usually creates raw sockets with IPPROTO_SCTP, and as is clearly highlighted in the [documentation of raw sockets][]:
> Raw sockets may tap all IP protocols in Linux, even protocols like ICMP or TCP which have a protocol module in the kernel. In this case, the packets are passed to both the kernel module and the raw socket(s).
That is, if both the kernel module (lksctp) and a userspace SCTP stack are active on the same node, both receive the incoming SCTP packets according to the current [kernel][] logic.
But in turn the SCTP kernel module will handle the packets that are actually destined to the raw socket as Out of the Blue (OOTB) packets, according to the rules defined in [RFC4960][]. That is, the SCTP kernel module sends an SCTP ABORT to the sender, and in that way it aborts the connections of the userspace SCTP stack.
As we can see, a userspace SCTP stack cannot co-exist with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where applications that use a userspace SCTP stack are planned to run. The SCTP kernel module is loaded automatically as soon as an application starts managing SCTP sockets via the standard socket API.
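A minimal Linux-only sketch of the two socket types in play, assuming the golang.org/x/sys/unix package (running it needs root or CAP_NET_RAW): a single socket(2) call with IPPROTO_SCTP is enough to auto-load the SCTP kernel module, while a userspace stack opens a raw socket instead:

```
//go:build linux

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// A kernel SCTP socket: this single call makes the kernel auto-load
	// the sctp module if it is not loaded yet.
	kfd, err := unix.Socket(unix.AF_INET, unix.SOCK_STREAM, unix.IPPROTO_SCTP)
	if err != nil {
		fmt.Println("kernel SCTP socket failed:", err)
	} else {
		unix.Close(kfd)
		fmt.Println("kernel SCTP socket created; sctp.ko is now loaded")
	}

	// A raw socket as used by userspace SCTP stacks (needs CAP_NET_RAW).
	// If sctp.ko is loaded, incoming SCTP packets are delivered both here
	// and to the kernel module, which ABORTs them as OOTB packets.
	rfd, err := unix.Socket(unix.AF_INET, unix.SOCK_RAW, unix.IPPROTO_SCTP)
	if err != nil {
		fmt.Println("raw SCTP socket failed:", err)
		return
	}
	defer unix.Close(rfd)
	fmt.Println("raw SCTP socket created, as a userspace stack would do")
}
```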
In the past, the solution to this problem was to dedicate nodes to userspace SCTP applications: applications that would trigger the loading of the SCTP kernel module were simply not deployed on those nodes.
##### The solution in the Kubernetes SCTP support implementation
Our main task here is to provide the same node level isolation possibility that was used in the past: that is, to provide the option to dedicate some nodes to userspace SCTP applications, and to ensure that the actions performed by Kubernetes (kubelet, kube-proxy) do not load the SCTP kernel module on those dedicated nodes.
On the Kubernetes side we solve this problem by not starting to listen on the SCTP ports defined for Services with ClusterIP, NodePort, or externalIP, nor in the case of containers with an SCTP hostPort. In this way we avoid loading the SCTP kernel module as a result of Kubernetes actions.
On the application side it is easy to separate application pods that use a userspace SCTP stack from those that use the kernel space SCTP stack: the usual nodeSelector label based mechanism, or taints, are there for this very purpose.
NOTE! The handling of TCP and UDP Services does not change on those dedicated nodes.
We propose the following solution:
We describe in the Kubernetes documentation the mutually exclusive nature of the userspace and kernel space SCTP stacks, and we highlight that the required separation of userspace SCTP stack applications from kernel module users shall be achieved with the usual nodeSelector or taint based mechanisms.
[documentation of raw sockets]: http://man7.org/linux/man-pages/man7/raw.7.html