It turned out that iptables and ipvs do not trigger the loading of the SCTP kernel module. The proposed solution has been updated accordingly, and it has become a lot simpler. The usage of SCTP as a protocol value in the Pod/container descriptor is now also described in the document.

## Summary

The goal of the SCTP support feature is to enable the usage of the SCTP protocol in the Kubernetes Pod container port, [Service][], and [NetworkPolicy][] as an additional protocol option beside the current TCP and UDP options.

SCTP is an IETF protocol specified in [RFC4960][], and it is used widely in telecommunications network stacks.

Once SCTP support is added as a new protocol option for Service, container port, and NetworkPolicy, applications that require SCTP as the L4 protocol on their interfaces can be deployed on Kubernetes clusters in a more straightforward way. For example, they can use the native kube-dns based service discovery, and their communication can be controlled in the native NetworkPolicy way.

[Service]: https://kubernetes.io/docs/concepts/services-networking/service/
[NetworkPolicy]: https://kubernetes.io/docs/concepts/services-networking/network-policies/

## Motivation

SCTP is a widely used protocol in telecommunications. It would ease the management and execution of telecommunication applications on Kubernetes if SCTP were added as a protocol option to the Kubernetes container port, Service, and NetworkPolicy.

### Goals

Add SCTP support to the Kubernetes container port, Service, and NetworkPolicy, so that applications running in pods can use the native kube-dns based service discovery for SCTP based services, can define container ports for their SCTP based interfaces, and can have their communication controlled in the native NetworkPolicy way.

### Non-Goals

It is not a goal here to add SCTP support to load balancers that are provided by cloud providers. That is, the Kubernetes user can define Services with type=LoadBalancer and protocol=SCTP, but if the actual load balancer implementation does not support SCTP then the creation of the Service/load balancer fails.

It is not a goal to support multi-homed SCTP associations.

## Proposal

#### SCTP as container port protocol in Pod definition

As a user of Kubernetes I want to define hostPort based port mappings for the SCTP based interfaces of my applications.

Example:

```
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: container-1
    image: mycontainerimg
    ports:
    - name: diameter
      protocol: SCTP
      containerPort: 3868   # illustrative value; containerPort is mandatory for a port entry
      hostPort: 3868        # illustrative hostPort mapping for the SCTP based interface
```

#### NetworkPolicy with SCTP

As a user of Kubernetes I want to define NetworkPolicies for my applications that use SCTP as the L4 protocol on their interfaces, so that network controllers that support SCTP can also control the accessibility of my applications on their SCTP based interfaces.

Example:

```
      port: 7777
```

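A complete policy of this shape might look like the following minimal sketch; the policy name and the pod selectors are illustrative and not part of the proposal, only the SCTP protocol value and the port entry shown above are taken from it:

```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-sctp-ingress        # illustrative name
spec:
  podSelector:
    matchLabels:
      app: my-sctp-app            # illustrative selector
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: sctp-client       # illustrative peer selector
    ports:
    - protocol: SCTP              # the new protocol value proposed here
      port: 7777
```
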
#### Userspace SCTP stack

As a user of Kubernetes I want to deploy and run my applications that use a userspace SCTP stack, and at the same time I want to define SCTP Services in the same cluster.

### Implementation Details/Notes/Constraints [optional]

#### SCTP in Services

The Kubernetes API modification for Services is straightforward: SCTP becomes a new accepted value of the existing protocol field of the Service port.

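For illustration, a minimal sketch of such a Service definition; the name, selector, and port numbers are illustrative:

```
apiVersion: v1
kind: Service
metadata:
  name: my-sctp-service           # illustrative name
spec:
  selector:
    app: my-sctp-app              # illustrative selector
  ports:
  - name: diameter
    protocol: SCTP                # the new protocol value proposed here
    port: 3868                    # illustrative port numbers
    targetPort: 3868
```
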
The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp.

For Services with type=LoadBalancer we have to check how the cloud provider implementations handle new protocols, and we have to make sure that if SCTP is not supported then the request for a new load balancer, firewall rule, etc. with protocol=SCTP is rejected gracefully.

Kube DNS shall support SRV records with "_sctp" as the "proto" value. According to our investigations, the DNS controller is very flexible from this perspective, and it can create SRV records with any protocol name. Example:

```
_diameter._sctp.my-service.default.svc.cluster.local. 30 IN SRV 10 100 1234 my-service.default.svc.cluster.local.
```

#### SCTP in the Pod's container port

The Kubernetes API modification for the Pod is straightforward: SCTP becomes a new accepted value of the existing protocol field of the container port.

The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp.

#### SCTP in NetworkPolicy

The Kubernetes API modification for the NetworkPolicy is straightforward: SCTP becomes a new accepted value of the existing protocol field of the NetworkPolicy port.

In order to utilize the new protocol value the network controller must support it.

#### Interworking with applications that use a user space SCTP stack

A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where applications that use a userspace SCTP stack are planned to run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a Service is created, the relevant port reservation logic kicks in on every node, starts listening on the port, and as a consequence loads the SCTP kernel module on every node. This immediately ruins the connectivity of the userspace SCTP applications on those nodes.

NOTE! This is not a new interworking problem between userspace SCTP stack implementations and the SCTP kernel module. It is a known phenomenon. The userspace SCTP stack creates raw sockets with IPPROTO_SCTP. As it is clearly highlighted in the [documentation of raw sockets][]:

>Raw sockets may tap all IP protocols in Linux, even protocols like ICMP or TCP which have a protocol module in the kernel. In this case, the packets are passed to both the kernel module and the raw socket(s).

For this reason the main task here is to provide the same isolation possibility: i.e. to provide the option to dedicate some nodes to userspace SCTP applications and ensure that k8s does not load the SCTP kernel module on those dedicated nodes.

As we can easily see, it is pretty easy to separate application pods that use a userspace SCTP stack from those that use the kernel space SCTP stack: the usual nodeSelector label based mechanism, or taints, are there for this very purpose.

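For example, assuming a hypothetical, operator-defined node label sctp-stack=userspace on the dedicated nodes (the label key and all names here are illustrative, not part of this proposal), a pod that uses a userspace SCTP stack could be pinned to those nodes like this:

```
apiVersion: v1
kind: Pod
metadata:
  name: userspace-sctp-app            # illustrative name
spec:
  # Schedule only onto nodes dedicated to userspace SCTP applications,
  # e.g. labelled with "kubectl label node <node> sctp-stack=userspace".
  nodeSelector:
    sctp-stack: userspace              # illustrative, operator-defined label
  containers:
  - name: app
    image: userspace-sctp-app-image    # illustrative image
```
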
The real challenge here is to ensure that when an SCTP Service is created in a k8s cluster the k8s logic does not create listening SCTP sockets on those nodes that are dedicated to the applications that use a userspace SCTP stack, because such an action would trigger the loading of the kernel module.

There is no such challenge with regard to headless SCTP Services.

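For reference, a headless Service is simply a Service with clusterIP set to None: no ClusterIP is allocated for it, so no listening SCTP socket has to be created on the nodes. A minimal sketch, with illustrative name, selector, and port:

```
apiVersion: v1
kind: Service
metadata:
  name: my-headless-sctp-service   # illustrative name
spec:
  clusterIP: None                  # headless: no ClusterIP is allocated
  selector:
    app: my-sctp-app               # illustrative selector
  ports:
  - name: diameter
    protocol: SCTP
    port: 3868                     # illustrative port
```
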
This is how our way of thinking goes:

The first task is to provide a way to dedicate nodes to userspace SCTP applications so that k8s itself is aware of that role of those nodes. It may be achieved with a node level parameter. Based on that parameter the kube-proxy would be aware of the role of the node and it would not create listening SCTP sockets for SCTP Services on the node.

NOTE! The handling of TCP and UDP Services does not change on those dedicated nodes.

NOTE! When the user defines SCTP ports for a container in a Pod definition, that triggers the creation of a listening SCTP socket (and thus the loading of the SCTP kernel module) only on those nodes to which the Pod is scheduled - i.e. the regular node selectors and taints can be used to avoid the collision of userspace SCTP stacks with the SCTP kernel module.

We propose the following alternatives for consideration in the community:

##### Documentation only

In this alternative we would describe in the Kubernetes documentation the mutually exclusive nature of userspace and kernel space SCTP stacks, and we would highlight that the new SCTP Service feature must not be used in clusters where userspace SCTP stack based applications are deployed, and in turn, userspace SCTP stack based applications cannot be deployed in clusters where kernel space SCTP stack based applications have already been deployed. We would also highlight that the usage of headless SCTP Services is allowed because such Services do not trigger the creation of listening SCTP sockets, thus they do not trigger the loading of the SCTP kernel module on every node.

We would also describe that SCTP must not be used as a protocol value in the Pod/container definition for those applications that use a userspace SCTP stack.

##### A node level parameter to dedicate nodes for userspace SCTP applications

In this alternative we would implement all the tasks that we listed above, i.e. a node level parameter based on which the kube-proxy logic can skip the creation of listening SCTP sockets on the affected nodes.

[documentation of raw sockets]: http://man7.org/linux/man-pages/man7/raw.7.html
[kernel]: https://github.com/torvalds/linux/blob/0fbc4aeabc91f2e39e0dffebe8f81a0eb3648d97/net/ipv4/ip_input.c#L191