From d0b4d300b278c5bf0210c8a70397a19af697618b Mon Sep 17 00:00:00 2001
From: Brad Hoekstra
Date: Mon, 29 Oct 2018 15:58:59 -0400
Subject: [PATCH] Fill in more details

---
 ...1-20181017-kube-proxy-services-optional.md | 55 +++++++++++++++++--
 1 file changed, 50 insertions(+), 5 deletions(-)

diff --git a/keps/sig-network/0031-20181017-kube-proxy-services-optional.md b/keps/sig-network/0031-20181017-kube-proxy-services-optional.md
index 6e5e6e8b9..a1fef246a 100644
--- a/keps/sig-network/0031-20181017-kube-proxy-services-optional.md
+++ b/keps/sig-network/0031-20181017-kube-proxy-services-optional.md
@@ -31,13 +31,16 @@ superseded-by:
     * [User Stories](#user-stories)
       * [Story 1](#story-1)
     * [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
+      * [Design](#design)
+      * [Considerations](#considerations)
+      * [Testing](#testing)
     * [Risks and Mitigations](#risks-and-mitigations)
   * [Graduation Criteria](#graduation-criteria)
 
 ## Summary
 
 In a cluster that has a service mesh a lot of the work being done by kube-proxy is redundant and wasted.
-Specifically, services that are only reached via other services in the mesh will never use the service abstaction implemented by kube-proxy in iptables (or ipvs).
+Specifically, services that are only reached via other services in the mesh will never use the service abstraction implemented by kube-proxy in iptables (or ipvs).
 By informing the kube-proxy of this, we can lighten the work it is doing and the burden on its proxy backend.
 
 ## Motivation
@@ -54,6 +57,7 @@ The goal is to reduce the load on:
 ### Non-Goals
 
 * Making sure the service is still routable via the service mesh
+* Preserving any kube-proxy functionality for any intentionally disabled Service, including but not limited to: externalIPs, external LB routing, nodePorts, externalTrafficPolicy, healthCheckNodePort, UDP, SCTP
 
 ## Proposal
 
@@ -65,21 +69,61 @@ As a cluster operator, operating a cluster using a service mesh I want to be abl
 
 ### Implementation Details/Notes/Constraints
 
+#### Overview
+
 It is important for overall scalability that kube-proxy does not watch for service/endpoint changes that it is not going to affect.
 This can save a lot of load on the apiserver, networking, and kube-proxy itself by never requesting the updates in the first place.
 As such, annotating the services directly is considered insufficient as the kube-proxy would still have to watch for changed to the service.
 
-The proposal is to make this feature available at the namespace level:
+The proposal is to make this feature available at the namespace level. We will support a new label for namespaces: `networking.k8s.io/service-proxy=disabled`
 
-We will support a new label for namespaces: networking.k8s.io/kube-proxy=disabled
+When this label is set, kube-proxy will behave as if services in that namespace do not exist. None of the functionality that kube-proxy provides will be available for services in that namespace.
 
-kube-proxy will be modified to watch all namespaces and stop watching for services/endpoints in namespaces with the above label.
+It is expected that this feature will mainly be used on large clusters with many (>1000) services. Any use of this feature in a smaller cluster will have negligible impact.
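+
+As a minimal sketch of the intended label semantics (the package placement and helper name here are hypothetical, not an existing kube-proxy API), the check could look like the following; only the exact value 'disabled' opts a namespace out, and any other value behaves as if the label were not set:
+
+```go
+package proxy // illustrative placement only
+
+import v1 "k8s.io/api/core/v1"
+
+// serviceProxyDisabled reports whether kube-proxy should ignore Services
+// and Endpoints in the given namespace, based on the proposed label.
+func serviceProxyDisabled(ns *v1.Namespace) bool {
+	// A nil or missing label map returns "", which is treated as enabled.
+	return ns.Labels["networking.k8s.io/service-proxy"] == "disabled"
+}
+```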
+
+The envisioned cluster that will make use of this feature looks something like the following:
+* Most/all traffic from outside the cluster is handled by gateways, such that each service in the cluster does not need a nodePort
+* These few entry points into the cluster are part of the service mesh
+* There are many micro-services in the cluster, all part of the service mesh, that are only accessed from inside the service mesh
+  * These services are in a separate namespace from the gateways
+
+#### Design
+
+Currently, when ProxyServer starts up it creates informers for all Service (ServiceConfig) and Endpoints (EndpointsConfig) objects using a single shared informer factory. The new design will make these objects per-namespace and will only watch namespaces that are not 'disabled' (see the illustrative sketch after this patch).
+
+The ProxyServer type will be updated with the following new methods:
+* `func (s *ProxyServer) StartWatchingNamespace(ns string)`
+  * Check whether the namespace is currently watched; if it is, return
+  * Create a shared informer factory configured with the namespace
+  * Create ServiceConfig and EndpointsConfig objects using the shared informer factory
+* `func (s *ProxyServer) StopWatchingNamespace(ns string)`
+  * Check whether the namespace is currently watched; if it is not, return
+  * Stop the ServiceConfig and EndpointsConfig for that namespace
+  * Send deletion events for all objects those configs knew about
+  * Delete the config objects
+
+At startup, ProxyServer will create an informer for all Namespace objects:
+* When a namespace object is created or updated:
+  * Check for the above label; if it is not set or is not 'disabled':
+    * `StartWatchingNamespace()`
+  * Else:
+    * `StopWatchingNamespace()`
+* When a namespace object is deleted:
+  * `StopWatchingNamespace()`
+
+#### Considerations
+
+kube-proxy currently has logic to avoid syncing rules until the config objects have been synced. Care should be taken to make sure this logic still works, and that the data is only considered synced when the Namespace informer and all ServiceConfig and EndpointsConfig objects are synced.
+
+#### Testing
 The following cases should be tested. In each case, make sure that services are added/removed from iptables (or other) as expected:
 * Adding/removing services from namespaces with and without the above label
 * Adding/removing the above label from namespaces with existing services
+* Deleting a namespace with services with and without the above label
+* Having a label value other than 'disabled', which should behave as if the label is not set
 
 ### Risks and Mitigations
 
-We will keep kube-proxy enabled by default, and only disable it when the cluster operator specifically asks to do so.
+We will keep the existing behaviour enabled by default, and only disable it when the cluster operator specifically asks to do so.
 
 ## Graduation Criteria
@@ -88,3 +132,4 @@ N/A
 ## Implementation History
 
 - 2018-10-17 - This KEP is created
+- 2018-10-28 - KEP updated
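
A rough sketch of the start/stop flow from the Design section, under a few stated assumptions: the `namespaceWatcher` type stands in for the real `ProxyServer`, the wiring into kube-proxy's ServiceConfig/EndpointsConfig (and the deletion events on stop) is only indicated in comments, and every name other than the `networking.k8s.io/service-proxy` label is hypothetical:

```go
package proxy

import (
	"sync"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

const serviceProxyLabel = "networking.k8s.io/service-proxy" // label key from the proposal

// namespaceWatcher is a stand-in for the proposed ProxyServer changes.
type namespaceWatcher struct {
	client kubernetes.Interface

	mu      sync.Mutex
	watched map[string]chan struct{} // namespace -> stop channel for its informer factory
}

func newNamespaceWatcher(client kubernetes.Interface) *namespaceWatcher {
	return &namespaceWatcher{client: client, watched: map[string]chan struct{}{}}
}

// StartWatchingNamespace creates a namespace-scoped informer factory for
// Services and Endpoints, unless the namespace is already being watched.
// In kube-proxy these informers would feed ServiceConfig/EndpointsConfig.
func (w *namespaceWatcher) StartWatchingNamespace(ns string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if _, ok := w.watched[ns]; ok {
		return // already watched
	}
	stop := make(chan struct{})
	factory := informers.NewSharedInformerFactoryWithOptions(
		w.client, 30*time.Second, informers.WithNamespace(ns))
	factory.Core().V1().Services().Informer()  // ServiceConfig would register handlers here
	factory.Core().V1().Endpoints().Informer() // EndpointsConfig would register handlers here
	factory.Start(stop)
	w.watched[ns] = stop
}

// StopWatchingNamespace stops the per-namespace informers. The real change
// would also send deletion events for every object those configs knew about.
func (w *namespaceWatcher) StopWatchingNamespace(ns string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	stop, ok := w.watched[ns]
	if !ok {
		return // not watched
	}
	close(stop)
	delete(w.watched, ns)
}

// Run registers a cluster-wide Namespace informer whose events drive the
// start/stop decisions based on the proposed label.
func (w *namespaceWatcher) Run(stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(w.client, 30*time.Second)
	nsInformer := factory.Core().V1().Namespaces().Informer()
	nsInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { w.sync(obj.(*v1.Namespace)) },
		UpdateFunc: func(_, newObj interface{}) { w.sync(newObj.(*v1.Namespace)) },
		DeleteFunc: func(obj interface{}) {
			// Tombstone handling is omitted for brevity.
			if ns, ok := obj.(*v1.Namespace); ok {
				w.StopWatchingNamespace(ns.Name)
			}
		},
	})
	factory.Start(stopCh)
}

// sync applies the label semantics: only the exact value "disabled" stops
// watching; a missing label or any other value keeps the namespace watched.
func (w *namespaceWatcher) sync(ns *v1.Namespace) {
	if ns.Labels[serviceProxyLabel] == "disabled" {
		w.StopWatchingNamespace(ns.Name)
	} else {
		w.StartWatchingNamespace(ns.Name)
	}
}
```

Keeping one stop channel per namespace keeps `StopWatchingNamespace` cheap and lets the per-namespace Service/Endpoints watches be torn down as soon as a namespace's label flips to 'disabled', which is the scalability point the Overview makes about not requesting those updates in the first place.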