| title | authors | reviewers | approvers | creation-date |
|---|---|---|---|---|
| Resource Aggregation Proxy | | | | 2022-06-24 |
# Resource Aggregation Proxy

## Summary
Before using Karmada, there is only one kubernetes cluster per IDC. The deployment system, monitor & alarm system, service discovery and various other systems access raw kubernetes resources directly, including workload, configmap, secret, pod, node.
We can't access pod and node information from the Karmada API server. So after introducing Karmada, the PaaS components can no longer access pod and node resources. This breaks the PaaS components.
In order to solve the problem, we can either:
- Modify all the PaaS components. Force all of them to understand Karmada's logic and find pods and nodes in the appropriate clusters by themselves.
- Provide a Resource Aggregation Proxy. All PaaS components simply ask the proxy for pod and node information. The proxy finds the right clusters, fetches the data, merges it, and returns it to the client.

Apparently, using the proxy greatly simplifies external components' access to kubernetes clusters and hides the multi-cluster implementation details from them. That's what we propose.
We have considered these existing Karmada features:
- **Aggregated Kubernetes API Endpoint**: this component is built for unified authentication. The RESTful API client still needs to enumerate all the member clusters to get the right information.
- **Caching (aka karmada-search)**: this component can only get/list data, not watch or update it.
## Motivation

We want to add a resource aggregation proxy feature to karmada-search. It will provide native kubernetes APIs for components outside kubernetes. When a client requests an operation (read or write), the proxy will figure out the correct downstream kubernetes API servers (including the Karmada API server and member clusters), send the request to all the selected servers, and merge the responses into one before responding to the client. External components can then treat Karmada-managed kubernetes clusters as one single cluster.
### Goals
Allow kubernetes API clients (kubectl, client-go, and other clients that use kubernetes APIs) to read/write the following resources:
- All resources in the control plane, such as deployments and jobs, supporting all the request behaviors, just like `karmada-apiserver`.
- Pods/nodes/events/endpointslices in member clusters, supporting all request behaviors except `create`.
- Custom resources in member clusters, with limited support.
### Non-Goals

### Risks and Mitigations
- This feature aims to build a cache to store arbitrary resources from multiple member clusters, and these resources are exposed by the `search/proxy` REST APIs. If a user has access privilege to `search/proxy`, they can directly access the cached resources without their requests being routed to the member clusters.
- As previously mentioned, resource query requests will not be routed to the member clusters. So if a secret is cached in the Karmada control plane but a user in the member cluster cannot access it via the member cluster's apiserver due to RBAC privilege limitations, they can still access the secret through the Karmada control plane.
- This feature is designed for administrators who need to query and view resources in multiple clusters, not for end users. Exposing this API to end users may allow them to view resources that do not belong to them.
## Proposal

### User Stories

#### Story 1: Service Discovery
Not everyone uses kubernetes Services for service discovery. We use a private service discovery architecture to accommodate both containerized and traditional applications deployed on physical machines.
Its main workflow can be summarized as follows:
- The service discovery controller lists and watches pods from clusters.
- When a pod IP changes, the controller updates the corresponding application's routing metadata.
- Then the controller marks the processed pod by patching a custom annotation onto it.

The controller ignores pods with this annotation to avoid redundant routing metadata changes. Workload redeployment or other operations that cause pods to be deleted/replaced will automatically remove the added annotation, making the pods eligible for processing again.
We need Karmada to provide the following functionality:
- list/watch/patch pods
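For illustration only, here is a minimal client-go sketch of this workflow run against the proposed proxying endpoint. The kubeconfig path and the annotation key `example.io/route-synced` are assumptions made up for this example; the proxy path follows the request-routing design described later.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load a kubeconfig pointing at the Karmada apiserver (the path is an example).
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/karmada/karmada-apiserver.config")
	if err != nil {
		panic(err)
	}
	// Point the client at the proxying endpoint, so the native /api/v1/...
	// paths below are answered with merged data from all member clusters.
	config.Host += "/apis/search.karmada.io/v1alpha1/proxying"

	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// List pods across member clusters through the proxy.
	pods, err := client.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// Mark each processed pod with a custom annotation; the proxy routes the
	// patch to the member cluster that owns the pod.
	patch := []byte(`{"metadata":{"annotations":{"example.io/route-synced":"true"}}}`)
	for _, pod := range pods.Items {
		if _, err := client.CoreV1().Pods(pod.Namespace).Patch(
			context.TODO(), pod.Name, types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
			fmt.Printf("failed to patch pod %s: %v\n", pod.Name, err)
		}
	}
}
```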
#### Story 2: Show Pod Logs
The PaaS platform provides a UI to access pod logs. It looks like the kubernetes dashboard:
- Find a pod in the web UI
- Click "log" button
- Check out the logs.
We need Karmada to provide the following functionality:
- list all pods
- access pod logs
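A small sketch of the log access, assuming a clientset configured against the proxying endpoint as in the Story 1 sketch (the function name is an illustration, not an existing API):

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

// fetchPodLogs reads one container's logs through the proxy; the proxy
// forwards the log subresource request to the member cluster owning the pod.
func fetchPodLogs(ctx context.Context, client kubernetes.Interface, namespace, pod, container string) (string, error) {
	req := client.CoreV1().Pods(namespace).GetLogs(pod, &corev1.PodLogOptions{Container: container})
	raw, err := req.DoRaw(ctx)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}
```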
### Notes/Constraints/Caveats

`fieldSelectors` and `table print` (used for `kubectl get`) are not supported for CRs. Usually, these are hard coded.
## Design Details

### Architecture
We add two modules to the karmada-search component:
- `Router` determines how to direct a client request: to the cache, to a cluster directly, to the karmada apiserver, or to the search module.
- `Cache` lists and watches resources from member clusters and stores the data locally, to accelerate member cluster resource access.
### Define the Scope of Resources to be Proxied

Resources defined in `ResourceRegistry` will be proxied in member clusters.

### Request Routing
Clients call kubernetes native APIs with the search group prefix in the URL: `/apis/search.karmada.io/v1alpha1/proxying`.
The `Router` module removes the prefix from the URL and routes the request by resource type:
- If the requested resource is defined in `ResourceRegistry`:
  - Read operations, like get/list/watch: read data from the cache, merge the results, and then return them.
  - Write operations, like update/patch/delete: find the right member cluster by querying the cache, then send the request to the corresponding member cluster API server.
- Others: redirect the request to the Karmada apiserver.
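To make these rules concrete, here is a rough sketch of the decision logic. All type and helper names here are hypothetical illustrations, not the actual karmada-search code:

```go
package router

import "k8s.io/apimachinery/pkg/runtime/schema"

// Destination is where the Router forwards a request (hypothetical type).
type Destination int

const (
	ToCache         Destination = iota // serve reads from the local cache and merge
	ToMemberCluster                    // forward writes to the owning member cluster
	ToKarmada                          // redirect to the Karmada apiserver
)

// Request carries only the fields the Router needs (hypothetical type).
type Request struct {
	Verb            string // get, list, watch, update, patch, delete, ...
	Resource        schema.GroupVersionResource
	Namespace, Name string
}

// Registry reports whether a resource is selected by any ResourceRegistry.
type Registry interface {
	Matches(gvr schema.GroupVersionResource) bool
}

// route mirrors the rules above: reads of registered resources are served
// from the cache, writes go to the member cluster owning the object (looked
// up in the cache), and everything else falls through to karmada-apiserver.
func route(registry Registry, req Request) Destination {
	if !registry.Matches(req.Resource) {
		return ToKarmada
	}
	switch req.Verb {
	case "get", "list", "watch":
		return ToCache
	default:
		return ToMemberCluster
	}
}
```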
#### Example

##### List pods
The client sends a request to `/apis/search.karmada.io/v1alpha1/proxying/api/v1/namespaces/default/pods`.
The Router returns the pod list from the cache.
##### Update pod

The client sends `PUT /apis/search.karmada.io/v1alpha1/proxying/api/v1/namespaces/default/pods/foo`.
The Router queries the foo pod from the cache, finds it in cluster1, then redirects `PUT /api/v1/namespaces/default/pods/foo` to cluster1.
##### Log pod

The client sends `GET /apis/search.karmada.io/v1alpha1/proxying/api/v1/namespaces/default/pods/foo/log?container=bar`.
The Router queries the foo pod from the cache, finds it in cluster1, then redirects `GET /api/v1/namespaces/default/pods/foo/log?container=bar` to cluster1.
### Caching resource from clusters

A `Cacher` is used for caching one type of resource from one member cluster. Cachers list and watch resources from member clusters and store them in memory. When a client sends a query request (get, list, or watch) for a resource, `MUX` fetches it from multiple cachers and merges the results into one response for the client.
### How to Merge Result

The merge strategy depends on the request type:
- Get request: return the first result.
- List request: concatenate the `items` of the list objects.
- Watch request: return resource events one by one.
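A minimal sketch of the list-merge step for pods (the package and function names are illustrative; the real MUX would work on generic objects):

```go
package mux

import corev1 "k8s.io/api/core/v1"

// mergePodLists concatenates the items of per-cluster list results into one
// list, and collects each cluster's resourceVersion so it can be encoded
// into the global resourceVersion described in the next section.
func mergePodLists(perCluster map[string]*corev1.PodList) (*corev1.PodList, map[string]string) {
	merged := &corev1.PodList{}
	clusterVersions := make(map[string]string, len(perCluster))
	for cluster, list := range perCluster {
		merged.Items = append(merged.Items, list.Items...)
		clusterVersions[cluster] = list.ResourceVersion
	}
	return merged, clusterVersions
}
```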
### Global Resource Version

`resourceVersion` is useful in the list & watch mechanism. Every cluster maintains an independent version (in fact, it is the modified index in etcd). To build a global resource version, we merge all clusters' resourceVersions into one JSON object, encode it with `base64.RawURLEncoding`, and use the resulting string.

`base64.RawURLEncoding({"cluster1":"1234","cluster2":"5678"}) = "eyJjbHVzdGVyMSI6IjEyMzQiLCJjbHVzdGVyMiI6IjU2NzgifQ"`
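Expressed with the Go standard library, the encoding and its inverse would look roughly like this (function names are illustrative):

```go
package version

import (
	"encoding/base64"
	"encoding/json"
)

// encodeGlobalRV packs per-cluster resourceVersions into one opaque string,
// e.g. {"cluster1":"1234","cluster2":"5678"} becomes
// "eyJjbHVzdGVyMSI6IjEyMzQiLCJjbHVzdGVyMiI6IjU2NzgifQ".
func encodeGlobalRV(perCluster map[string]string) (string, error) {
	raw, err := json.Marshal(perCluster)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(raw), nil
}

// decodeGlobalRV recovers the per-cluster resourceVersions, e.g. when a
// client passes the global resourceVersion back in a list or watch request.
func decodeGlobalRV(encoded string) (map[string]string, error) {
	raw, err := base64.RawURLEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	perCluster := map[string]string{}
	if err := json.Unmarshal(raw, &perCluster); err != nil {
		return nil, err
	}
	return perCluster, nil
}
```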
#### Example

There are two member clusters: cluster1 and cluster2.

| | cluster1 | cluster2 |
|---|---|---|
| resourceVersion | 1234 | 5678 |
| pods | nginx-1, nginx-2 | nginx-3, nginx-4 |

`resourceVersion` is the latest version in a cluster. When listing resources from a cluster, the apiserver responds with a resourceVersion, e.g. `kind: PodList, apiVersion: v1, metadata: resourceVersion: 1234, items: [...]`.
##### List Pod

Response:

```yaml
apiVersion: v1
kind: List
metadata:
  # eyJjbHVzdGVyMSI6IjEyMzQiLCJjbHVzdGVyMiI6IjU2NzgifQ is the result of
  # base64.RawURLEncoding({"cluster1":"1234","cluster2":"5678"})
  resourceVersion: eyJjbHVzdGVyMSI6IjEyMzQiLCJjbHVzdGVyMiI6IjU2NzgifQ
items:
- apiVersion: v1
  kind: Pod
  metadata:
    name: nginx-1
  ...
- apiVersion: v1
  kind: Pod
  metadata:
    name: nginx-2
  ...
- apiVersion: v1
  kind: Pod
  metadata:
    name: nginx-3
  ...
- apiVersion: v1
  kind: Pod
  metadata:
    name: nginx-4
  ...
```
##### Get Pod

Response:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-2
  namespace: default
  resourceVersion: eyJjbHVzdGVyMSI6IjEyMzQiLCJjbHVzdGVyMiI6IjU2NzgifQ
  ...
```
##### Watch Pod

Response:

```yaml
type: MODIFIED
object:
  apiVersion: v1
  kind: Pod
  metadata:
    name: nginx-2
    namespace: default
    # nginx-2 is updated and cluster1's resourceVersion becomes 1235
    # eyJjbHVzdGVyMSI6IjEyMzUiLCJjbHVzdGVyMiI6IjU2NzgifQ is the result of
    # base64.RawURLEncoding({"cluster1":"1235","cluster2":"5678"})
    resourceVersion: eyJjbHVzdGVyMSI6IjEyMzUiLCJjbHVzdGVyMiI6IjU2NzgifQ
    ...
```
### Resource name conflicting across clusters

If two or more resources in different member clusters have the same name and a client requests by that name, the proxy does not know which resource to use.
Users shall guarantee there is no name conflict across clusters by their own strategies, e.g. overriding the deployment name in different clusters. Otherwise, they shall guarantee that no named request will be issued for these resources.
### Authorization

`Karmada-apiserver` determines whether a request is allowed or denied, just like kubernetes.

What can it do?
- Limit request access by verb, apiGroup, and resource.

What can't it do?
- Deny access to pods in some clusters, just like kubernetes can't deny access to pods on some nodes.
## Test Plan

## Alternatives
### Placement for proxy

There are two places to put the proxy:
1. Put it before the API server. This requires the proxy to perform authentication itself.
2. Put it behind the API server as an Aggregated API server. The API server and the proxy then handle authentication cooperatively.
#### Comparison

| | before API server | behind API server |
|---|---|---|
| url can be the same as kubernetes api server | ⭐️ ✔ | ✘ |
| supported Authentication strategies | X509 Client Certs | ⭐️ All k8s supported Authentication strategies: X509 Client Certs, Static Token, Bootstrap Token, Service Account Token, OpenID Connect Token, Webhook Token Authentication, Authenticating Proxy |
| performance | ⭐️ Higher. Much shorter api request path: client => proxy => kube-apiserver | Lower. Longer api request path: client => kube-apiserver => proxy => kube-apiserver |
| development complexity | High. In order to support all kubernetes Authentication strategies, we need to include the kubernetes Authentication module's code, and update the proxy when kubernetes updates. | ⭐️ Low. Only need to write an aggregated API server. |
#### Conclusion

Some users may need authentication strategies other than X509 Client Certs, but implementing them in the proxy itself would be too complex. So we consider putting the proxy behind the API server the better option.
### Cacher vs. Informer

Both `k8s.io/apiserver/pkg/storage/cacher.Cacher` and `k8s.io/client-go/informers.GenericInformer` provide a resource cache and events.
#### Comparison

| | Cacher | Informer |
|---|---|---|
| how to sync from datasource | List/Watch | List/Watch |
| Get/List with resourceVersion | ✔ supports semantics of resourceVersion | ✘ always from cache, equivalent to resourceVersion="0" |
| List with resourceVersionMatch | ✔ supported | ✘ not supported |
| List with labelSelector | ✔ supported | ✔ supported |
| List with fieldSelector | ✔ supported | ✘ not supported |
| watch / watch with allowWatchBookmarks | ✔ supported | ✘ not supported |
#### Conclusion

`Cacher` is naturally designed for the k8s API and is a better match than Informer.