title | authors | reviewers | approvers | creation-date | update-date
---|---|---|---|---|---
Service discovery with native Kubernetes naming and resolution | | | | 2023-06-22 | 2023-08-19
Service discovery with native Kubernetes naming and resolution
Summary
With the current ServiceImportController, when a ServiceImport object is reconciled, the derived Service is given a "derived-" prefix.
This proposal describes a method for multi-cluster service discovery that uses the native Kubernetes Service and modifies the current implementation of Karmada's MCS. With this approach, no "derived-" prefix is needed when accessing services across clusters.
Motivation
Having a "derived-" prefix on Service resources is counterintuitive for service discovery:
- Assume a pod is exported as the service foo.
- Another pod on the same cluster that wishes to access it simply calls foo, and Kubernetes resolves the name to the correct Service.
- If that pod is scheduled to another cluster, the original service discovery fails because there is no service named foo in that cluster.
- To find the original pod, the other pod must know that it is in another cluster and use derived-foo instead.
Goals
- Remove the "derived-" prefix from the service
- User-friendly and native service discovery
Non-Goals
- Multi cluster connectivity
Proposal
Following are flows to support the service import proposal:
- Deployment and Service are created on cluster member1 and the Service imported to cluster member2 using ServiceImport (described below as user story 1)
- Deployment and Service are created on cluster member1 and both propagated to cluster member2. Service from cluster member1 is imported to cluster member2 using ServiceImport (described below as user story 2)
The proposal for these flows can be referred to as local-and-remote service discovery. Handling falls into the following scenarios:
- Local only - If there is a local service named foo, Karmada never attempts to import the remote service and does not create an EndpointSlice.
- Local and Remote - Users accessing the foo service reach either member1 or member2.
- Remote only - If there is a local service named foo, Karmada removes the local EndpointSlice and creates an EndpointSlice pointing to the other cluster (e.g. instead of resolving to the member2 cluster, it reaches member1).
Based on the above three scenarios, we propose two strategies:
- RemoteAndLocal - When a Service is accessed, traffic is distributed evenly between the Service in the local cluster and the Service in the remote clusters.
- LocalFirst - When a Service is accessed, traffic goes directly to the Service in the local cluster as long as it can serve requests. If the Service in the local cluster fails, traffic goes to the Service in the remote clusters.
Note: How can we detect the failure? One option is to watch the EndpointSlice resources of the relevant Services in the member cluster. If an EndpointSlice no longer exists or its status becomes not ready, it needs to be synchronized with other clusters. The specific implementation of the LocalFirst strategy can be iterated on subsequently.
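The two strategies and the readiness check from the note can be sketched as follows. This is a minimal illustration, not the controller implementation: the Endpoint class and the function names are hypothetical, and the sketch only models which endpoints are candidates for the Service in the local cluster, not how traffic is actually balanced across them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical, simplified stand-in for one endpoint of an EndpointSlice."""
    address: str
    cluster: str
    ready: bool = True


def ready_endpoints(endpoints: List[Endpoint]) -> List[Endpoint]:
    """Keep only the endpoints that are marked ready."""
    return [e for e in endpoints if e.ready]


def select_backends(strategy: str, local: List[Endpoint], remote: List[Endpoint]) -> List[Endpoint]:
    """Return the endpoints that should back the Service in the local cluster."""
    local_ready = ready_endpoints(local)
    remote_ready = ready_endpoints(remote)
    if strategy == "RemoteAndLocal":
        # Local and remote endpoints are served side by side.
        return local_ready + remote_ready
    if strategy == "LocalFirst":
        # Fall back to remote endpoints only when no local endpoint is ready.
        return local_ready if local_ready else remote_ready
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Under this sketch, a missing or not-ready local EndpointSlice is treated the same way: the local list contributes no ready endpoints, so LocalFirst falls back to the remote clusters.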
This proposal suggests using the MultiClusterService API to enable cross-cluster service discovery. To avoid conflicts with the previously provided prefixed cross-cluster service discovery, we can add an annotation to the MultiClusterService object with the key discovery.karmada.io/strategy, whose value can be either RemoteAndLocal or LocalFirst.
apiVersion: networking.karmada.io/v1alpha1
kind: MultiClusterService
metadata:
  name: foo
  annotations:
    discovery.karmada.io/strategy: RemoteAndLocal
spec:
  types:
    - CrossCluster
  range:
    clusterNames:
      - member2
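As an illustration of how a controller might read this annotation, the sketch below extracts and validates the strategy from a manifest represented as a plain dictionary. The function name is hypothetical, and treating a missing annotation as "native-name discovery not requested" is an assumption of this sketch, not something the proposal specifies.

```python
from typing import Optional

STRATEGY_ANNOTATION = "discovery.karmada.io/strategy"
VALID_STRATEGIES = {"RemoteAndLocal", "LocalFirst"}


def discovery_strategy(mcs: dict) -> Optional[str]:
    """Return the strategy set on a MultiClusterService manifest, or None if absent."""
    annotations = (mcs.get("metadata") or {}).get("annotations") or {}
    value = annotations.get(STRATEGY_ANNOTATION)
    if value is None:
        # Assumption: no annotation means native-name discovery was not requested.
        return None
    if value not in VALID_STRATEGIES:
        raise ValueError(f"unsupported {STRATEGY_ANNOTATION} value: {value!r}")
    return value


# The manifest shown above, expressed as a dictionary.
mcs = {
    "apiVersion": "networking.karmada.io/v1alpha1",
    "kind": "MultiClusterService",
    "metadata": {
        "name": "foo",
        "annotations": {STRATEGY_ANNOTATION: "RemoteAndLocal"},
    },
    "spec": {"types": ["CrossCluster"], "range": {"clusterNames": ["member2"]}},
}
print(discovery_strategy(mcs))
```

Rejecting unknown values early keeps a typo in the annotation from silently selecting a default behavior.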
User Stories (Optional)
Story 1
As a Kubernetes cluster member, I want to access a service from another cluster member, so that I can communicate with the service using its original name.
Background: The Service named foo is created on cluster member1 and imported to cluster member2 using ServiceImport.
Scenario:
- Given that the Service named foo exists on cluster member1
- And the ServiceImport resource is created on cluster member2, specifying the import of foo
- When I try to access the service inside member2
- Then I can access the service using the name foo.myspace.svc.cluster.local
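To make the difference concrete, the sketch below builds the in-cluster DNS name a client in member2 uses today and the one it can use under this proposal. The helper name is hypothetical; the myspace namespace comes from the scenario above, and cluster.local is assumed as the cluster domain.

```python
def service_fqdn(name: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Service: <name>.<namespace>.svc.<cluster domain>."""
    return f"{name}.{namespace}.svc.{cluster_domain}"


# Name a client in member2 has to use with the current derived Service.
current_name = service_fqdn("derived-foo", "myspace")
# Name the same client can use under this proposal.
proposed_name = service_fqdn("foo", "myspace")
```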
Story 2
As a Kubernetes cluster member, I want to handle conflicts when importing a service from another cluster member, so that I can access the service without collisions and maintain high availability.
Background: The Service named foo is created on cluster member1 and conflicts with an existing Service when it is imported to cluster member2.
Conflict refers to the situation where a Service foo already exists on the cluster (e.g. propagated with PropagationPolicy), but we still need to import the Service foo from other clusters onto this cluster (using ServiceImport).
Scenario:
- Given that the Service named foo exists on cluster member1
- And there is already a conflicting Service named foo on cluster member2
- When I attempt to access the service in cluster member2 using foo.myspace.svc.cluster.local
- Then the requests round-robin between the local foo service and the imported foo service (member1 and member2)
Notes/Constraints/Caveats (Optional)
Risks and Mitigations
Adding a Service that resolves to a remote cluster adds the network latency of communication between clusters.
Design Details
API changes
The design of the MultiClusterService API needs further iteration and improvement, such as turning the annotation discovery.karmada.io/strategy into a field in the spec.
General Idea
Before delving into the specific design details, let's first take a look from the user's perspective at what preparations they need to make.
- The user creates a foo Deployment and Service on the Karmada control plane, and creates a PropagationPolicy to distribute them to the member cluster member1.
- The user creates an MCS object on the Karmada control plane to enable cross-cluster access to the Service foo. In this way, services on cluster member2 can access the foo Service on cluster member1.
The specific design is then as follows.
- When the mcs-controller detects that a user has created a MultiClusterService object, it creates a ServiceExport in the Karmada control plane and propagates it to the source clusters by creating a ResourceBinding (the source clusters can be determined from the ResourceBinding associated with the Service).
- Relying on the existing MCS atomic capabilities, the service-export-controller collects the EndpointSlices related to the foo Service into the Karmada control plane.
- The mcs-controller, on the Karmada control plane, creates a ResourceBinding to propagate the Service and EndpointSlices to the destination clusters. Because the target Service may already exist in some destination clusters, the specific destination clusters must be confirmed based on the strategy specified in the MultiClusterService object.
  - If a Service already exists on the target cluster, there is no need to resynchronize the EndpointSlices exported from that cluster back to it. Only the EndpointSlices received from other clusters are synchronized.
  - If there is no Service on the target cluster, both the Service and the EndpointSlices collected from other clusters need to be synchronized to that cluster.
At this point, the entire process is complete, and the foo Service can be accessed across clusters.
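The per-cluster decision in the last step can be sketched as follows. This is a simplified model under stated assumptions: plan_sync and its parameters are hypothetical names, EndpointSlices are represented only by name, and the strategy from the MultiClusterService object is not modeled.

```python
from typing import Dict, List, Set


def plan_sync(
    destination_clusters: List[str],
    clusters_with_service: Set[str],
    slices_by_source: Dict[str, List[str]],
) -> Dict[str, dict]:
    """Decide, per destination cluster, what has to be propagated to it.

    slices_by_source maps a source cluster name to the names of the
    EndpointSlices collected from that cluster.
    """
    plan = {}
    for cluster in destination_clusters:
        # Never send a cluster its own EndpointSlices back.
        foreign = sorted(
            name
            for source, names in slices_by_source.items()
            if source != cluster
            for name in names
        )
        plan[cluster] = {
            # The Service is propagated only where it does not exist yet.
            "sync_service": cluster not in clusters_with_service,
            "endpoint_slices": foreign,
        }
    return plan
```

For user story 1, member2 has no local Service, so it would receive both the Service and the EndpointSlices collected from member1. For user story 2, member2 already has the Service, so it would receive only the EndpointSlices from member1.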
Test Plan
- Unit tests (UT) covering the newly added code
- E2E tests covering the newly added cases
Alternatives
One alternative approach to service discovery with native Kubernetes naming and resolution is to rely on external DNS-based service discovery mechanisms. However, this approach may require additional configuration and management overhead, as well as potential inconsistencies between different DNS implementations. By leveraging the native Kubernetes naming and resolution capabilities, the proposed solution simplifies service discovery and provides a consistent user experience.
Another alternative approach could be to enforce a strict naming convention for imported services, where a specific prefix or suffix is added to the service name to differentiate it from local services. However, this approach may introduce complexity for users and require manual handling of naming collisions. The proposed solution aims to provide a more user-friendly experience by removing the "derived-" prefix and allowing services to be accessed using their original names.