
title: Service discovery with native Kubernetes naming and resolution
authors: @bivas
reviewers: @XiShanYongYe-Chang, @RainbowMango, @GitHubxsy, @Rains6, @jwcesign, @chaunceyjiang, TBD
approvers: @RainbowMango
creation-date: 2023-06-22
update-date: 2023-08-19

Service discovery with native Kubernetes naming and resolution

Summary

With the current serviceImport controller, when a ServiceImport object is reconciled, the derived service is created with a derived- prefix in its name.

This proposal describes a method for multi-cluster service discovery using native Kubernetes Services, modifying the current implementation of Karmada's MCS. With this approach, services accessed across clusters no longer carry the derived- prefix.

Motivation

Having a derived- prefix for Service resources seems counterintuitive when thinking about service discovery:

  • Assume a pod is exported as the service foo
  • Another pod on the same cluster that wishes to access it simply calls foo, and Kubernetes binds to the correct one
  • If that pod is scheduled to another cluster, the original service discovery fails because there is no service named foo there
  • To find the original pod, the other pod must know it is in another cluster and use derived-foo instead

Goals

  • Remove the "derived-" prefix from the service
  • User-friendly and native service discovery

Non-Goals

  • Multi-cluster connectivity

Proposal

The following flows support the service import proposal:

  1. Deployment and Service are created on cluster member1, and the Service is imported to cluster member2 using ServiceImport (described below as user story 1)
  2. Deployment and Service are created on cluster member1 and both are propagated to cluster member2. The Service from cluster member1 is imported to cluster member2 using ServiceImport (described below as user story 2)

This flow can be referred to as local-and-remote service discovery. Its handling can be divided into the following scenarios:

  1. Local only - if there is a local service named foo, Karmada never attempts to import the remote service and does not create an EndpointSlice
  2. Local and Remote - users accessing the foo service will reach either member1 or member2
  3. Remote only - if there is a local service named foo, Karmada will remove the local EndpointSlice and create an EndpointSlice pointing to the other cluster (e.g. instead of resolving to cluster member2, it will reach member1)

Based on the above three scenarios, we have proposed two strategies:

  • RemoteAndLocal - when accessing the Service, traffic is evenly distributed between the local cluster's Service and the remote clusters' Services.
  • LocalFirst - when accessing the Service, requests are served by the local cluster's Service if it is available; if the Service on the local cluster fails, requests go to the Service on remote clusters.

Note: How can we detect the failure? We may need to watch the EndpointSlice resources of the relevant Services in the member clusters. If an EndpointSlice resource disappears or its status becomes not ready, we need to synchronize that information to the other clusters. The specific implementation of the LocalFirst strategy can be iterated on subsequently.

This proposal suggests using the MultiClusterService API to enable cross-cluster service discovery. To avoid conflicts with the previously provided prefixed cross-cluster service discovery, we can add an annotation to the MultiClusterService API with the key discovery.karmada.io/strategy, whose value can be either RemoteAndLocal or LocalFirst.

User Stories (Optional)

Story 1

As a Kubernetes cluster member, I want to access a service from another cluster member, so that I can communicate with the service using its original name.

Background: The Service named foo is created on cluster member1 and imported to cluster member2 using ServiceImport.

Scenario:

  1. Given that the Service named foo exists on cluster member1
  2. And the ServiceImport resource is created on cluster member2, specifying the import of foo
  3. When I try to access the service inside member2
  4. Then I can access the service using the name foo.myspace.svc.cluster.local
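For reference, a minimal sketch of the ServiceImport from this story, as it could be created for cluster member2 (the namespace myspace comes from the scenario; the port is an illustrative assumption and must match the exported Service):

apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceImport
metadata:
  name: foo
  namespace: myspace
spec:
  type: ClusterSetIP
  ports:
    - port: 80        # example port; must match the exported Service's port
      protocol: TCP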

Story 2

As a Kubernetes cluster member, I want to handle conflicts when importing a service from another cluster member, so that I can access the service without collisions and maintain high availability.

Background: The Service named foo is created on cluster member1, and a conflict occurs when attempting to import it to cluster member2. Conflict refers to the situation where a Service named foo already exists on the cluster (e.g. propagated with a PropagationPolicy), but we still need to import Service foo from other clusters onto this cluster (using ServiceImport).
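For context, the pre-existing foo Service on member2 in this story could come from a PropagationPolicy like the following sketch, which propagates the same Service to both clusters (the policy name is an assumption for this example):

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo-both-clusters   # hypothetical name
  namespace: myspace
spec:
  resourceSelectors:
    - apiVersion: v1
      kind: Service
      name: foo
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2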

Scenario:

  1. Given that the Service named foo exists on cluster member1
  2. And there is already a conflicting Service named foo on cluster member2
  3. When I attempt to access the service in cluster member2 using foo.myspace.svc.cluster.local
  4. Then the requests round-robin between the local foo service and the imported foo service (member1 and member2)

Notes/Constraints/Caveats (Optional)

Risks and Mitigations

Adding a Service that resolves to a remote cluster introduces network latency, since requests must travel between clusters.

Design Details

API changes

Add an annotation to the MultiClusterService API with the key discovery.karmada.io/strategy; its value can be either RemoteAndLocal or LocalFirst.

apiVersion: networking.karmada.io/v1alpha1
kind: MultiClusterService
metadata:
  name: foo
  annotations:
    discovery.karmada.io/strategy: RemoteAndLocal
spec:
  types:
    - CrossCluster
  range:
    clusterNames:
      - member2

The design of the MultiClusterService API needs further iteration and improvement, for example by promoting the discovery.karmada.io/strategy annotation to a field in the spec.
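Purely as a hypothetical illustration of that future direction (the strategy field below is invented for this sketch and is not part of the current API):

apiVersion: networking.karmada.io/v1alpha1
kind: MultiClusterService
metadata:
  name: foo
spec:
  types:
    - CrossCluster
  strategy: LocalFirst   # hypothetical spec field replacing the annotation
  range:
    clusterNames:
      - member2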

General Idea

Before delving into the specific design details, let's first look, from the user's perspective, at what preparations are needed.

First, the user creates a foo Deployment and Service on the Karmada control plane, and creates a PropagationPolicy to distribute them to the member cluster member1.
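Such a PropagationPolicy could look like the following minimal sketch (resource names mirror the example; the namespace myspace is carried over from the user stories):

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
  namespace: myspace
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: foo
    - apiVersion: v1
      kind: Service
      name: foo
  placement:
    clusterAffinity:
      clusterNames:
        - member1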


Second, the user creates a MultiClusterService object on the Karmada control plane to enable cross-cluster access to the foo Service. In this way, workloads on cluster member2 can access the foo Service on cluster member1.


Next, we present the specific design.

  1. When the mcs-controller detects that a user has created a MultiClusterService object, it creates a ServiceExport object in the Karmada control plane and propagates it to the source clusters. This process involves two issues:
  • How are the source clusters determined?
  • How is the ServiceExport object propagated?

Detailed explanations are given below:

  • There are two ways of thinking about the first question:
    • We can determine which clusters the target service was propagated to by looking up the ResourceBinding associated with the target service; those are the source clusters.
    • Alternatively, we can simply treat all clusters as source clusters. This creates some redundancy, but it can be eliminated in subsequent iterations.
  • There are four ways to propagate ServiceExport to the member clusters:
    • Propagate via a PropagationPolicy, specifying the source clusters in the .spec.placement.clusterAffinity.clusterNames field of the PropagationPolicy.
      • pros:
        • Reuses previous code to a greater extent.
      • cons:
        • PropagationPolicy is a user-oriented API, so this has an impact on user perception.
        • To get real-time source-cluster information, the controller needs to watch the ResourceBinding object. This drawback disappears if all clusters are simply treated as source clusters.
    • Propagate via a ResourceBinding, specifying the source clusters in the .spec.clusters field of the ResourceBinding.
      • pros:
        • Reuses previous code to a greater extent.
      • cons:
        • To get real-time source-cluster information, the controller needs to watch the ResourceBinding object. This drawback disappears if all clusters are simply treated as source clusters.
    • Propagate via a set of Works in the namespaces that correspond to the source clusters.
      • pros:
        • Clear implementation logic.
      • cons:
        • To get real-time source-cluster information, the controller needs to watch the ResourceBinding object. This drawback disappears if all clusters are simply treated as source clusters.
        • Less reuse of code logic; Work objects need to be created one by one.
    • Set the .spec.propagateDeps field of the ResourceBinding associated with the target Service object to true, enable the dependency distributor capability, and add the ServiceExport resource to the InterpretDependency resource interpreter of the Service resource.
      • pros:
        • High reuse of code logic; no need to watch ResourceBinding resource changes.
      • cons:
        • The controller needs to enable the dependency distributor capability for the target Service object and maintain it.
Taken together, we can propagate ServiceExport to all clusters with the help of ResourceBinding.
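For illustration only, here is a sketch of the ServiceExport created in the control plane, together with a hypothetical ResourceBinding that propagates it to all clusters (in practice Karmada generates ResourceBinding objects itself; the binding name and cluster list below are assumptions):

apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: foo                   # must match the exported Service name
  namespace: myspace
---
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: foo-serviceexport     # hypothetical name
  namespace: myspace
spec:
  resource:
    apiVersion: multicluster.x-k8s.io/v1alpha1
    kind: ServiceExport
    namespace: myspace
    name: foo
  clusters:                   # here, all clusters are treated as source clusters
    - name: member1
    - name: member2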


  2. Relying on the existing MCS atomic capabilities, the serviceExport controller and endpointSlice controller collect the EndpointSlices related to the foo Service into the Karmada control plane.
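As a sketch, one of the EndpointSlices collected into the control plane might look like the following (the object name, pod IP, and port are illustrative assumptions):

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: foo-member1                     # hypothetical name for the collected slice
  namespace: myspace
  labels:
    kubernetes.io/service-name: foo     # associates the slice with the foo Service
addressType: IPv4
endpoints:
  - addresses:
      - 10.244.1.5                      # example pod IP in member1
ports:
  - port: 8080                          # example target port
    protocol: TCP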


  3. The mcs-controller propagates the Service and EndpointSlice objects from the Karmada control plane to the destination clusters, watching for and synchronizing their changes. Again, this process requires consideration of two issues:
  • How are the destination clusters determined?
  • How are the Service and EndpointSlice objects propagated?

Note: In this scenario, we haven't used the ServiceImport object yet, so we don't need to propagate it to the destination clusters.

Detailed explanations are given below:

  • We can get the destination clusters from the .spec.range field of the MultiClusterService object. One thing to consider, however, is that the resources to be propagated may already exist in the destination clusters.
    • If a Service already exists on the target cluster, there is no need to resynchronize the EndpointSlices exported from that cluster back to it; only the EndpointSlices collected from other clusters need to be synchronized.
    • If there is no Service on the target cluster, both the Service and the EndpointSlices collected from other clusters need to be synchronized to that cluster.
  • There are three ways to propagate Service and EndpointSlice to the destination clusters:
    • Propagate the Service and EndpointSlice resources via their respective ResourceBindings, specifying the destination clusters in the .spec.clusters field of each ResourceBinding.
      • pros:
        • Reuses previous code to a greater extent.
      • cons:
        • Since the Service object has already been propagated to the source clusters by the user, we need to create a new ResourceBinding object to propagate it to the destination clusters.
    • Propagate the Service and EndpointSlice resources via the respective sets of Works in the namespaces that correspond to the destination clusters.
      • pros:
        • Clear implementation logic.
      • cons:
        • Less reuse of code logic; Work objects need to be created one by one.
    • Set the .spec.propagateDeps field of the ResourceBinding associated with the target Service object to true, enable the dependency distributor capability, and add the EndpointSlice resource to the InterpretDependency resource interpreter of the Service resource.
      • pros:
        • High reuse of code logic; no need to watch ResourceBinding resource changes.
      • cons:
        • The controller needs to enable the dependency distributor capability for the target Service object and maintain it.
        • Since the Service object has already been propagated to the source clusters by the user, we need to create a new ResourceBinding object to propagate it to the destination clusters.

We need to choose a way, or provide new ideas, to accomplish the propagation of Service and EndpointSlice resources.

Taken together, we can propagate Service and EndpointSlice to the destination clusters with the help of ResourceBinding.
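As with the ServiceExport above, a hypothetical ResourceBinding for one of the collected EndpointSlices could look as follows (names are assumptions; a similar binding would carry the Service itself when the destination cluster lacks it):

apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: foo-member1-endpointslice   # hypothetical name
  namespace: myspace
spec:
  resource:
    apiVersion: discovery.k8s.io/v1
    kind: EndpointSlice
    namespace: myspace
    name: foo-member1
  clusters:                         # destination clusters from the MCS .spec.range
    - name: member2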


At this point, the entire process is complete, and the foo Service can be accessed across clusters.


Test Plan

  • Unit tests covering the newly added code
  • E2E tests covering the newly added cases

Alternatives

One alternative approach to service discovery with native Kubernetes naming and resolution is to rely on external DNS-based service discovery mechanisms. However, this approach may require additional configuration and management overhead, as well as potential inconsistencies between different DNS implementations. By leveraging the native Kubernetes naming and resolution capabilities, the proposed solution simplifies service discovery and provides a consistent user experience.

Another alternative approach could be to enforce a strict naming convention for imported services, where a specific prefix or suffix is added to the service name to differentiate it from local services. However, this approach may introduce complexity for users and require manual handling of naming collisions. The proposed solution aims to provide a more user-friendly experience by removing the "derived-" prefix and allowing services to be accessed using their original names.