Merge pull request #4287 from jwcesign/added-K8s-native-implementation-details

doc: add proposal for mcs with native service name
This commit is contained in:
karmada-bot 2023-11-29 22:34:58 +08:00 committed by GitHub
commit 01b086d2e1
10 changed files with 174 additions and 103 deletions


@ -3,13 +3,12 @@ title: Service discovery with native Kubernetes naming and resolution
authors:
- "@bivas"
- "@XiShanYongYe-Chang"
- "@jwcesign"
reviewers:
- "@RainbowMango"
- "@GitHubxsy"
- "@Rains6"
- "@jwcesign"
- "@chaunceyjiang"
approvers:
- "@RainbowMango"
@ -22,7 +21,7 @@ update-date: 2023-08-19
## Summary
In multi-cluster scenarios, there is a need to access services across clusters. Currently, Karmada supports this by creating a derived service (with the `derived-` prefix) in other clusters to access the service.
This proposal describes a method for multi-cluster service discovery using native Kubernetes `Service` names, modifying the current implementation of Karmada's MCS so that no `derived-` prefix is needed when accessing services across clusters.
@ -32,12 +31,15 @@ This Proposal propose a method for multi-cluster service discovery using Kuberne
This section is for explicitly listing the motivation, goals, and non-goals of
this KEP. Describe why the change is important and the benefits to users.
-->
Having a `derived-` prefix for `Service` resources seems counterintuitive when thinking about service discovery:
- Assume a pod is exported as the service `foo`
- Another pod that wishes to access it on the same cluster can simply call `foo`, and Kubernetes will bind to the correct one
- If that pod is scheduled to another cluster, the original service discovery will fail, as there is no service by the name `foo`
- To find the original pod, the other pod is required to know that it is in another cluster and use `derived-foo` instead
If Karmada supports service discovery using native Kubernetes naming and resolution (without the `derived-` prefix), users can access the service using its original name without needing to modify their code to accommodate services with the `derived-` prefix.
### Goals
- Remove the "derived-" prefix from the service
@ -51,8 +53,8 @@ Having a `derived-` prefix for `Service` resources seems counterintuitive when t
Following are flows to support the service import proposal:
1. `Deployment` and `Service` are created on cluster member1 and the `Service` is imported to cluster member2 using `MultiClusterService` (described below as [user story 1](#story-1))
2. `Deployment` and `Service` are created on cluster member1 and both are propagated to cluster member2. The `Service` from cluster member1 is imported to cluster member2 using `MultiClusterService` (described below as [user story 2](#story-2); a minimal propagation sketch follows this list)
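As a minimal sketch (not part of the proposal's required configuration), the `Deployment` and `Service` in these flows could be propagated to cluster member1 with a `PropagationPolicy` like the following; the names `foo` and the namespace `myspace` are illustrative:

```yaml
# Hypothetical example: propagate the foo workload and its Service to member1.
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
  namespace: myspace
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: foo
  - apiVersion: v1
    kind: Service
    name: foo
  placement:
    clusterAffinity:
      clusterNames:
      - member1
```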
The proposal for this flow can be referred to as local-and-remote service discovery. The handling can be divided into the following scenarios:
@ -60,16 +62,17 @@ The proposal for this flow is what can be referred to as local-and-remote servic
2. **Local** and **Remote** - Users accessing the `foo` service will reach either member1 or member2
3. **Remote** only - in case there's a local service by the name `foo`, Karmada will remove the local `EndpointSlice` and create an `EndpointSlice` pointing to the other cluster (e.g. instead of resolving to the member2 cluster, it will reach member1)
Based on the above three scenarios, we think there are two reasonable strategies (users can utilize a `PropagationPolicy` to propagate the `Service` and implement the **Local** scenario themselves, so it is not necessary to implement it with `MultiClusterService`):
- **RemoteAndLocal** - When accessing the Service, the traffic will be evenly distributed between the local cluster's and the remote clusters' Service backends.
- **LocalFirst** - When accessing the Service, if the local cluster's Service can serve the request, it will be accessed directly. If a failure occurs in the Service on the local cluster, the Service on remote clusters will be accessed.
> Note: How can we detect the failure?
> We may need to watch the EndpointSlice resources of the relevant Services in the member cluster. If an EndpointSlice resource becomes non-existent or its status becomes not ready, we need to synchronize this to the other clusters.
> As for the specific implementation of the LocalFirst strategy, we can iterate on it subsequently.
This proposal suggests using the [MultiClusterService API](https://github.com/karmada-io/karmada/blob/24bb5829500658dd1caeea16eeace8252bcef682/pkg/apis/networking/v1alpha1/service_types.go#L30) to enable cross-cluster service discovery.
This proposal focuses on the `RemoteAndLocal` strategy, and we will subsequently iterate on the `LocalFirst` strategy.
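As a concrete sketch of strategy selection, using the annotation form described in the API changes section below (names and namespace are illustrative):

```yaml
apiVersion: networking.karmada.io/v1alpha1
kind: MultiClusterService
metadata:
  name: foo
  namespace: myspace
  annotations:
    # RemoteAndLocal is the strategy this proposal focuses on; LocalFirst is a later iteration.
    discovery.karmada.io/strategy: RemoteAndLocal
spec:
  types:
  - CrossCluster
```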
### User Stories (Optional)
@ -82,27 +85,16 @@ bogged down.
#### Story 1
As a user of a Kubernetes cluster, I want to be able to access a service whose corresponding pods are located in another cluster, and I want to communicate with the service using its original name.
**Scenario**:
1. Given that the `Service` named `foo` exists on cluster member1 and is imported to cluster member2 using `MultiClusterService`, when I try to access the service inside member2, I can access it using the name `foo.myspace.svc.cluster.local`
#### Story 2
As a user of a Kubernetes cluster, I want to access a service that has pods located both in this cluster and in another. I expect to communicate with the service using its original name, and to have requests routed to the appropriate pods across clusters.
**Scenario**:
@ -131,6 +123,7 @@ How will UX be reviewed, and by whom?
Consider including folks who also work outside the SIG or subproject.
-->
Adding a `Service` that resolves to a remote cluster will add network latency to communication between clusters.
## Design Details
@ -144,118 +137,196 @@ proposal will be implemented, this is the place to discuss them.
### API changes
This proposal introduces two new fields, `ServiceProvisionClusters` and `ServiceConsumptionClusters`, in the `MultiClusterService` API. To select the discovery strategy, an annotation with the key `discovery.karmada.io/strategy` is added on the `MultiClusterService` object; its value can be either `RemoteAndLocal` or `LocalFirst`.
```go
type MultiClusterServiceSpec struct {
	...

	// ServiceProvisionClusters specifies the clusters which will provision the service backend.
	// If left empty, we will collect the backend endpoints from all clusters and sync
	// them to the ServiceConsumptionClusters.
	// +optional
	ServiceProvisionClusters []string `json:"serviceProvisionClusters,omitempty"`

	// ServiceConsumptionClusters specifies the clusters where the service will be exposed, for clients.
	// If left empty, the service will be exposed to all clusters.
	// +optional
	ServiceConsumptionClusters []string `json:"serviceConsumptionClusters,omitempty"`
}
```
With this API:
* `ServiceProvisionClusters` specifies the member clusters which will provision the service backend. If left empty, the backend endpoints will be collected from all clusters and synced to the `ServiceConsumptionClusters`.
* `ServiceConsumptionClusters` specifies the clusters where the service will be exposed. If left empty, the service will be exposed to all clusters.
For example, if we want to access the `foo` service, which is located in member2, from member3, we can use the following YAML:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: foo
---
apiVersion: networking.karmada.io/v1alpha1
kind: MultiClusterService
metadata:
  name: foo
  annotations:
    discovery.karmada.io/strategy: RemoteAndLocal
spec:
  types:
  - CrossCluster
  serviceProvisionClusters:
  - member2
  serviceConsumptionClusters:
  - member3
```
The design of the MultiClusterService API needs further iteration and improvement, for example moving the `discovery.karmada.io/strategy` annotation into the spec.
### Implementation workflow
#### Service propagation
The process of propagating a `Service` from the Karmada control plane to member clusters is as follows:
<p align="center">
<img src="./statics/mcs-svc-sync.png" width="60%">
</p>
1. The `multiclusterservice` controller will list&watch `Service` and `MultiClusterService` resources from the Karmada control plane.
1. Once a `MultiClusterService` and a `Service` with the same name exist, the `multiclusterservice` controller will create the Work (corresponding to the `Service`); the target cluster execution namespaces are those of all clusters in the fields `spec.serviceProvisionClusters` and `spec.serviceConsumptionClusters` (a sketch of such a Work is shown after this list).
1. The Work will be synchronized to the member clusters. After synchronization, the `EndpointSlice` will be created in the member clusters.
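As an illustration, a Work created by the `multiclusterservice` controller for the `Service` might look roughly like the sketch below; the Work name and the `karmada-es-member2` execution-namespace naming are assumptions for the example:

```yaml
apiVersion: work.karmada.io/v1alpha1
kind: Work
metadata:
  name: foo-service                # illustrative name
  namespace: karmada-es-member2    # execution namespace of the target cluster (assumed naming)
spec:
  workload:
    manifests:
    - apiVersion: v1
      kind: Service
      metadata:
        name: foo
        namespace: myspace
      spec:
        ports:
        - port: 80
          targetPort: 8080
        selector:
          app: foo
```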
#### `EndpointSlice` synchronization
The process of synchronizing `EndpointSlice` from `ServiceProvisionClusters` to `ServiceConsumptionClusters` is as follows:
<p align="center">
<img src="./statics/mcs-eps-collect.png" height="400px">
</p>
1. The `endpointsliceCollect` controller will list&watch `MultiClusterService`.
1. The `endpointsliceCollect` controller will build informers to list&watch the target service's `EndpointSlice` in the `ServiceProvisionClusters`.
1. The `endpointsliceCollect` controller will create the corresponding Work for each `EndpointSlice` in the cluster execution namespace.
   When creating the Work, in order to delete the corresponding Work when the `MultiClusterService` is deleted, we should add the following labels (a sketch of such a Work is shown after this list):
   * `endpointslice.karmada.io/name`: the service name of the original `EndpointSlice`.
   * `endpointslice.karmada.io/namespace`: the service namespace of the original `EndpointSlice`.
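A minimal sketch of a Work wrapping a collected `EndpointSlice` with the labels above; the Work name, namespace naming, and addresses are illustrative:

```yaml
apiVersion: work.karmada.io/v1alpha1
kind: Work
metadata:
  name: foo-endpointslice-x7k2p          # illustrative name
  namespace: karmada-es-member1          # execution namespace of a provision cluster (assumed naming)
  labels:
    endpointslice.karmada.io/name: foo           # service name of the original EndpointSlice
    endpointslice.karmada.io/namespace: myspace  # service namespace of the original EndpointSlice
spec:
  workload:
    manifests:
    - apiVersion: discovery.k8s.io/v1
      kind: EndpointSlice
      metadata:
        name: foo-x7k2p
        namespace: myspace
      addressType: IPv4
      endpoints:
      - addresses:
        - "10.244.1.5"
      ports:
      - port: 8080
```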
<p align="center">
<img src="./statics/mcs-eps-sync.png" height="400px">
</p>
1. The `endpointsliceDispatch` controller will list&watch `MultiClusterService`.
1. The `endpointsliceDispatch` controller will list&watch `EndpointSlice` from the `MultiClusterService`'s `spec.serviceProvisionClusters`.
1. The `endpointsliceDispatch` controller will create the corresponding Work for each `EndpointSlice` in the cluster execution namespaces of the `MultiClusterService`'s `spec.serviceConsumptionClusters`.
   When creating the Work, in order to facilitate problem investigation, we should add the following annotation to record the original `EndpointSlice` information:
   * `endpointslice.karmada.io/work-provision-cluster`: the cluster name of the original `EndpointSlice`.
   Also, we should add the following annotations to the synced `EndpointSlice` to record the original information (a sketch of such an `EndpointSlice` is shown after this list):
   * `endpointslice.karmada.io/endpointslice-generation`: the resource generation of the `EndpointSlice`, which can be used to check whether the `EndpointSlice` is the newest version.
   * `endpointslice.karmada.io/provision-cluster`: the cluster location of the original `EndpointSlice`.
1. Karmada will sync the `EndpointSlice`'s Work to the member clusters.
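A sketch of what a synced `EndpointSlice` in a consumption cluster might look like with the annotations above; the name, generation, and addresses are illustrative:

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: foo-member1-x7k2p              # illustrative name
  namespace: myspace
  labels:
    kubernetes.io/service-name: foo    # associates the EndpointSlice with the foo Service
  annotations:
    endpointslice.karmada.io/endpointslice-generation: "2"  # generation of the original EndpointSlice
    endpointslice.karmada.io/provision-cluster: member1     # cluster where the original EndpointSlice lives
addressType: IPv4
endpoints:
- addresses:
  - "10.244.1.5"
ports:
- port: 8080
```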
However, there is one point to note. Assume we have the following configuration:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: foo
---
apiVersion: networking.karmada.io/v1alpha1
kind: MultiClusterService
metadata:
  name: foo
spec:
  types:
  - CrossCluster
  serviceProvisionClusters:
  - member1
  - member2
  serviceConsumptionClusters:
  - member2
```
When creating the corresponding Work, Karmada should only sync the `EndpointSlice` that exists in `member1` to `member2`; the `EndpointSlice` exported from `member2` itself does not need to be synced back to it.
### Components change
#### karmada-controller
* Add the `multiclusterservice` controller to reconcile `MultiClusterService` objects and Clusters, handling creation/deletion/update.
* Add the `endpointsliceCollect` controller to reconcile `MultiClusterService` objects and Clusters, collecting `EndpointSlice`s from the `serviceProvisionClusters` as Work.
* Add the `endpointsliceDispatch` controller to reconcile `MultiClusterService` objects and Clusters, dispatching `EndpointSlice` Work from `serviceProvisionClusters` to `serviceConsumptionClusters`.
### Status Record
We should have the following conditions in `MultiClusterService`:
```go
MCSServiceAppliedConditionType = "ServiceApplied"
- We can get the destination clusters from the `.spec.range` field of the `MultiClusterService` object. One thing to consider, however, is that the resources to be propagated may already exist in the destination clusters.
- If there is a Service existing on the target cluster, there is no need to resynchronize the EndpointSlices exported from this cluster to the cluster. Only synchronize the EndpointSlices received from other clusters.
- If there is no Service on the target cluster, both the Service and the EndpointSlices collected from other clusters need to be synchronized to that cluster.
- There are three ways we can use to propagate `Service` and `EndpointSlice` to the destination clusters:
- Propagated the `Service` and `EndpointSlice` resources by specifying the respective `ResourceBinding`, specify the source clusters in the `.spec.clusters` field of the `ResourceBinding`
- pros:
- Ability to reuse previous code to a greater extent.
- cons:
- Since the `Service` object has already been propagated to the source clusters by the user, we need to create a new `ResourceBinding` object to propagate it to the destination clusters.
- Propagated the `Service` and `EndpointSlice` resources by specifying the respective set of `Work`s in the namespaces that correspond to the source clusters.
- pros:
- Clear implementation logic.
- cons:
- Less reuse of code logic, `Work` object needs to be created one by one.
- Modify the `.spec.propagateDeps` field of the ResourceBinding associated with the target `Service` object to true, enable the dependency distributor capability, and add the `EndpointSlice` resource to the [InterpretDependency](https://karmada.io/docs/next/userguide/globalview/customizing-resource-interpreter/#interpretdependency) resource interpreter of the `Service` resource.
- pros:
- Code logic reuse is large, do not need to watch ResourceBinding resource changes.
- cons:
- The controller need to enable the dependency distributor capability of the target `Service` object and maintain it.
- Since the `Service` object has already been propagated to the source clusters by the user, we need to create a new `ResourceBinding` object to propagate it to the destination clusters.
MCSEndpointSliceCollectedCondtionType = "EndpointSliceCollected"
We need to choose a way, or provide new ideas, to accomplish the propagation of `Service` and `EndpointSlice` resources.
MCSEndpointSliceAppliedCondtionType = "EndpointSliceApplied"
```
`MCSServiceAppliedConditionType` is used to record the status of `Service` propagation, for example:
```yaml
status:
  conditions:
  - lastTransitionTime: "2023-11-20T02:30:49Z"
    message: Service is propagated to target clusters.
    reason: ServiceAppliedSuccess
    status: "True"
    type: ServiceApplied
```
`MCSEndpointSliceCollectedConditionType` is used to record the status of `EndpointSlice` collection, for example:
```yaml
status:
  conditions:
  - lastTransitionTime: "2023-11-20T02:30:49Z"
    message: Failed to list&watch EndpointSlice in member3.
    reason: EndpointSliceCollectedFailed
    status: "False"
    type: EndpointSliceCollected
```
`MCSEndpointSliceAppliedConditionType` is used to record the status of `EndpointSlice` synchronization, for example:
```yaml
status:
  conditions:
  - lastTransitionTime: "2023-11-20T02:30:49Z"
    message: EndpointSlices are propagated to target clusters.
    reason: EndpointSliceAppliedSuccess
    status: "True"
    type: EndpointSliceApplied
```
### Metrics Record
For better monitoring, we should have the following metrics:
* `mcs_sync_svc_duration_seconds` - The duration of syncing `Service` from Karmada control plane to member clusters.
* `mcs_sync_eps_duration_seconds` - The time it takes from detecting the EndpointSlice to creating/updating the corresponding Work in a specific namespace.
### Development Plan
* API definition, including API files, CRD files, and generated code. (1d)
* For the `multiclusterservice` controller: list&watch MCS and Service, and reconcile the Work in the execution namespaces. (5d)
* For the `multiclusterservice` controller: list&watch cluster creation/deletion, and reconcile the Work in the corresponding cluster execution namespaces. (10d)
* For the `endpointsliceCollect` controller: list&watch MCS, collect the corresponding EndpointSlice from `serviceProvisionClusters`, and have the `endpointsliceDispatch` controller sync the corresponding Work. (5d)
* For the `endpointsliceCollect` controller: list&watch cluster creation/deletion, and reconcile the EndpointSlice's Work in the corresponding cluster execution namespaces. (10d)
* If a cluster becomes unhealthy, the mcs-eps-controller should delete the EndpointSlice from all cluster execution namespaces. (5d)
### Test Plan
