apiserver Architecture
1. Server Composition
The kube-apiserver binary is not one server, but a server chain of three distinct
GenericAPIServer instances: the core Kubernetes API, API extensions (CRDs), and the aggregation
layer.
This composition is managed by a layered configuration system, starting with command-line
flags parsed by options structs (e.g., RecommendedOptions in pkg/server/options), which
populate a Config object that is then used to instantiate the GenericAPIServer
instances. The construction of this delegation chain can be found in the CreateServerChain
function in cmd/kube-apiserver/app/server.go.
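The layering from flags to options to config to server can be sketched with Go's standard library alone. ServerOptions, Config, Server, and buildServer are hypothetical names chosen for illustration. The real RecommendedOptions and Config types carry many more fields and steps, for example completing the config before a server is built.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// ServerOptions is a hypothetical stand-in for an options struct such as
// RecommendedOptions: it only binds and validates command-line flags.
type ServerOptions struct {
	SecurePort int
	Name       string
}

// AddFlags registers the options on a FlagSet.
func (o *ServerOptions) AddFlags(fs *flag.FlagSet) {
	fs.IntVar(&o.SecurePort, "secure-port", 6443, "port to serve HTTPS on")
	fs.StringVar(&o.Name, "name", "kube-apiserver", "server name")
}

// Validate rejects option values that can never work.
func (o *ServerOptions) Validate() error {
	if o.SecurePort < 1 || o.SecurePort > 65535 {
		return errors.New("--secure-port must be between 1 and 65535")
	}
	return nil
}

// Config is the hypothetical runtime configuration derived from the options.
type Config struct {
	Name string
	Addr string
}

// ApplyTo copies validated options into a Config.
func (o *ServerOptions) ApplyTo(c *Config) {
	c.Name = o.Name
	c.Addr = fmt.Sprintf(":%d", o.SecurePort)
}

// Server is the hypothetical object instantiated from a Config.
type Server struct {
	Name string
	Addr string
}

// New instantiates a Server from the Config.
func (c *Config) New() *Server {
	return &Server{Name: c.Name, Addr: c.Addr}
}

// buildServer runs the layered pipeline: flags -> options -> config -> server.
func buildServer(args []string) (*Server, error) {
	opts := &ServerOptions{}
	fs := flag.NewFlagSet("apiserver", flag.ContinueOnError)
	opts.AddFlags(fs)
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if err := opts.Validate(); err != nil {
		return nil, err
	}
	cfg := &Config{}
	opts.ApplyTo(cfg)
	return cfg.New(), nil
}

func main() {
	s, err := buildServer([]string{"--secure-port=8443"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s listening on %s\n", s.Name, s.Addr) // kube-apiserver listening on :8443
}
```

Keeping flag parsing, validation, and config population in separate steps lets each layer be tested on its own. That separation is the point of the options/config split.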
```mermaid
graph TD
subgraph Incoming Request
direction LR
A[User/Client] --> B{"/apis/apps/v1/deployments"};
end
subgraph kube-apiserver process
direction LR
B --> C[Aggregator Server];
C -- Not an APIService --> D[Kube API Server];
D -- Handles Request --> E[REST Storage];
C -- Is an APIService --> F[Proxy to Extension API Server];
D -- Not a Core API --> G[API Extensions Server];
G -- Handles CRD --> E;
end
```
- Aggregator Server (kube-aggregator):
  - Purpose: Handles the apiregistration.k8s.io API and acts as a reverse proxy for extension API servers. This design lets third-party APIs be "aggregated" seamlessly into the main Kubernetes API.
  - Mechanism: It watches APIService objects. When a request arrives (e.g., for /apis/mycompany.com/v1/myresources), it checks whether an APIService has "claimed" that group and version. If so, it uses a ServiceResolver to find the IP of the backing Service and proxies the request to it.
  - Delegation: If no APIService matches, it delegates the request to the next server in the chain.
- Kube API Server (Core):
  - Purpose: Serves all the built-in Kubernetes APIs (core/v1, apps/v1, etc.).
  - Mechanism: This is the main server, configured with the REST storage and strategies for every built-in resource.
  - Delegation: If a request is for a path that is not a core API (e.g., a CRD), it delegates the request to the next server in the chain.
- API Extensions Server (apiextensions-apiserver):
  - Purpose: Handles the apiextensions.k8s.io API, which manages CustomResourceDefinition objects. CRDs graduated to GA in Kubernetes v1.16. A series of KEPs documents their evolution from a simple extension mechanism into a feature-rich system with validation, versioning, and defaulting.
  - Mechanism: When a CRD is created, this server dynamically creates and installs a REST storage handler for the new resource. The resource becomes available without restarting the server.
  - Delegation: It is the end of the chain. If it cannot handle a request, it returns 404 Not Found.
2. Handler Chain
Every request flows through a standard chain of HTTP handlers (filters). The request body
is not deserialized until it has passed authentication and authorization. The default handler
chain is constructed by the DefaultBuildHandlerChain function in
staging/src/k8s.io/apiserver/pkg/server/config.go.
```mermaid
sequenceDiagram
participant Client
participant Handler Chain
participant Authentication
participant Priority and Fairness
participant Authorization
participant Admission Control
participant REST Endpoint
Client->>Handler Chain: Request
Handler Chain->>Authentication: Authenticate
Authentication-->>Handler Chain: User Info
Handler Chain->>Priority and Fairness: Classify & Queue
Priority and Fairness-->>Handler Chain: Proceed
Handler Chain->>Authorization: Authorize
Authorization-->>Handler Chain: Allowed/Denied
Handler Chain->>Admission Control: Mutate & Validate
Admission Control-->>Handler Chain: Object OK
Handler Chain->>REST Endpoint: Handle
REST Endpoint-->>Handler Chain: Response
Handler Chain-->>Client: Response
```
The handler chain consists of the following stages:
- Authentication (pkg/authentication): This filter identifies the user. The system is pluggable and composed of multiple authenticators (e.g., client certs, bearer tokens, OIDC). The user's identity is determined by the first authenticator in the chain that successfully identifies the user.
- Priority and Fairness (pkg/util/flowcontrol): This subsystem manages request concurrency by classifying requests into FlowSchemas and PriorityLevels to prevent overload. It was introduced to keep high traffic from overwhelming the API server and to ensure that critical cluster operations are not starved. It runs after authentication, because classification depends on who is making the request, and before authorization.
- Authorization (pkg/authorization): This filter checks whether the user is permitted to perform the action. This system is also pluggable and composed of multiple authorizers (e.g., RBAC, Node, Webhook). Each authorizer responds with allow, deny, or no opinion. On no opinion, the request is passed to the next authorizer in the chain. If no authorizer allows the request, it is rejected.
- Admission Control (pkg/admission): This is the primary mechanism for policy enforcement. Only at this stage is the request body deserialized into an object. Admission is a chain of plugins that first mutate and then validate the object, and the REST handlers invoke it rather than running it as a standalone HTTP filter. The built-in Pod Security admission controller is a key example: it enforces Pod Security Standards at the namespace level.
- REST Endpoint Handling (pkg/endpoints): The request is finally dispatched to the appropriate REST handler, which is installed by the APIInstaller.
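The allow / deny / no-opinion semantics can be sketched as a loop in which only an explicit decision ends the search. The Decision type, the Attributes fields, and both authorizers below are hypothetical simplifications, not the k8s.io/apiserver authorizer API.

```go
package main

import "fmt"

// Decision is one of the three possible authorizer outcomes.
type Decision int

const (
	NoOpinion Decision = iota
	Allow
	Deny
)

// Attributes is a hypothetical, trimmed-down description of a request.
type Attributes struct {
	User     string
	Verb     string
	Resource string
}

// Authorizer returns a decision plus a human-readable reason.
type Authorizer func(a Attributes) (Decision, string)

// denyAnonymousDeletes has an opinion only about anonymous deletes.
func denyAnonymousDeletes(a Attributes) (Decision, string) {
	if a.User == "system:anonymous" && a.Verb == "delete" {
		return Deny, "anonymous deletes are denied"
	}
	return NoOpinion, ""
}

// roleBindings is a toy RBAC table (user -> allowed verbs). It deliberately
// grants system:anonymous "delete" to show that an earlier Deny still wins.
var roleBindings = map[string][]string{
	"alice":            {"get", "list"},
	"system:anonymous": {"get", "delete"},
}

// rbacLike allows listed verbs and otherwise has no opinion.
func rbacLike(a Attributes) (Decision, string) {
	for _, v := range roleBindings[a.User] {
		if v == a.Verb {
			return Allow, "allowed by role binding"
		}
	}
	return NoOpinion, ""
}

// unionAuthorize consults authorizers in order: the first Allow or Deny ends
// the search, NoOpinion falls through, and no Allow at all means rejection.
func unionAuthorize(a Attributes, authorizers ...Authorizer) (bool, string) {
	for _, authz := range authorizers {
		switch d, reason := authz(a); d {
		case Allow:
			return true, reason
		case Deny:
			return false, reason
		}
	}
	return false, "no authorizer allowed the request"
}

func main() {
	chain := []Authorizer{denyAnonymousDeletes, rbacLike}
	for _, a := range []Attributes{
		{User: "alice", Verb: "get", Resource: "pods"},
		{User: "alice", Verb: "delete", Resource: "pods"},
		{User: "system:anonymous", Verb: "delete", Resource: "pods"},
	} {
		ok, reason := unionAuthorize(a, chain...)
		fmt.Printf("%s %s %s: allowed=%v (%s)\n", a.User, a.Verb, a.Resource, ok, reason)
	}
}
```

Because the first explicit decision wins, authorizer order is significant: a deny-capable authorizer placed early can veto what a later one would allow.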
3. API Group Registration
The high-level steps for introducing an API are:
- Define Types: Create or modify the Go structs in the types.go file for the API group.
- Generate Code: Use the code generators provided by the Kubernetes project to create the required boilerplate methods for deep-copy, conversion, and defaulting.
- Implement the Strategy: Write the custom business logic and validation for the resource in its Strategy object.
- Register and Install: Create the APIGroupInfo struct, bundling the Scheme and the Strategy-configured storage, and pass it to the GenericAPIServer's InstallAPIGroup method.
The API Group Registry
The runtime.Scheme acts as a central registry for API type information. Each API group registers its types, conversions, and defaulting functions into a Scheme through its AddToScheme function. (kube-apiserver uses one shared Scheme, legacyscheme.Scheme, for all built-in groups.) The Scheme is responsible for the following key capabilities:
- Type Registration and Mapping: The Scheme's primary role is to map a GroupVersionKind (GVK) to its corresponding Go type and back. Every registered type must implement runtime.Object, which requires a DeepCopyObject() method; the deepcopy-gen tool generates these deep-copy methods. They let callers copy an object before modifying it, so objects shared through caches are never mutated in place.
- API Conversion: The Scheme stores the conversion functions that translate objects between different API versions. The conversion-gen tool typically generates these functions. They implement a hub-and-spoke model, in which each external version converts to and from the internal version instead of directly to every other version.
- Defaulting: The Scheme registers defaulting functions that populate optional fields in an object. These are usually generated by the defaulter-gen tool.
- Declarative Validation: The Scheme can store and execute code-generated validation functions, providing a baseline level of validation. This is distinct from the primary, handwritten business-logic validation, which the Strategy object handles.
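These responsibilities can be illustrated with a toy registry built on reflect. Everything here is hypothetical: the example.com group, the Widget versions, and the Scheme methods. The sketch covers only GVK-to-type mapping, per-type defaulting, and hub-and-spoke conversion through an internal version. It is not the real runtime.Scheme API.

```go
package main

import (
	"fmt"
	"reflect"
)

// GVK identifies a kind within a group and version.
type GVK struct{ Group, Version, Kind string }

// WidgetV1 and WidgetV2 are hypothetical external versions; WidgetInternal is the hub.
type WidgetV1 struct{ Size int }

type WidgetV2 struct {
	Size  int
	Color string
}

type WidgetInternal struct {
	Size  int
	Color string
}

// Scheme is a toy registry, not the real runtime.Scheme.
type Scheme struct {
	types      map[GVK]reflect.Type
	kinds      map[reflect.Type]GVK
	defaulters map[reflect.Type]func(any)
	toHub      map[reflect.Type]func(any) any
	fromHub    map[GVK]func(any) any
}

func NewScheme() *Scheme {
	return &Scheme{types: map[GVK]reflect.Type{}, kinds: map[reflect.Type]GVK{},
		defaulters: map[reflect.Type]func(any){}, toHub: map[reflect.Type]func(any) any{},
		fromHub: map[GVK]func(any) any{}}
}

// AddKnownType maps a GVK to the Go type of obj (a pointer) and back.
func (s *Scheme) AddKnownType(gvk GVK, obj any) {
	t := reflect.TypeOf(obj)
	s.types[gvk], s.kinds[t] = t, gvk
}

// New allocates an empty object for a registered GVK.
func (s *Scheme) New(gvk GVK) (any, error) {
	t, ok := s.types[gvk]
	if !ok {
		return nil, fmt.Errorf("no kind %v is registered", gvk)
	}
	return reflect.New(t.Elem()).Interface(), nil
}

// ObjectKind looks up the GVK of a registered object.
func (s *Scheme) ObjectKind(obj any) (GVK, bool) {
	gvk, ok := s.kinds[reflect.TypeOf(obj)]
	return gvk, ok
}

// Default runs the defaulter registered for obj's type, if any.
func (s *Scheme) Default(obj any) {
	if fn, ok := s.defaulters[reflect.TypeOf(obj)]; ok {
		fn(obj)
	}
}

// Convert goes spoke -> hub -> spoke, so N versions need 2N functions, not N*N.
func (s *Scheme) Convert(obj any, target GVK) (any, error) {
	toHub, ok1 := s.toHub[reflect.TypeOf(obj)]
	fromHub, ok2 := s.fromHub[target]
	if !ok1 || !ok2 {
		return nil, fmt.Errorf("no conversion path to %v", target)
	}
	return fromHub(toHub(obj)), nil
}

var v1, v2 = GVK{"example.com", "v1", "Widget"}, GVK{"example.com", "v2", "Widget"}

// newWidgetScheme registers both versions, a v2 defaulter, and hub conversions.
func newWidgetScheme() *Scheme {
	s := NewScheme()
	s.AddKnownType(v1, &WidgetV1{})
	s.AddKnownType(v2, &WidgetV2{})
	s.defaulters[reflect.TypeOf(&WidgetV2{})] = func(o any) {
		if w := o.(*WidgetV2); w.Color == "" {
			w.Color = "blue"
		}
	}
	s.toHub[reflect.TypeOf(&WidgetV1{})] = func(o any) any { return &WidgetInternal{Size: o.(*WidgetV1).Size} }
	s.toHub[reflect.TypeOf(&WidgetV2{})] = func(o any) any { w := o.(*WidgetV2); return &WidgetInternal{Size: w.Size, Color: w.Color} }
	s.fromHub[v1] = func(h any) any { return &WidgetV1{Size: h.(*WidgetInternal).Size} }
	s.fromHub[v2] = func(h any) any { w := h.(*WidgetInternal); return &WidgetV2{Size: w.Size, Color: w.Color} }
	return s
}

func main() {
	s := newWidgetScheme()
	out, _ := s.Convert(&WidgetV1{Size: 3}, v2)
	s.Default(out)
	gvk, _ := s.ObjectKind(out)
	fmt.Printf("%+v %v\n", out, gvk) // &{Size:3 Color:blue} {example.com v2 Widget}
}
```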
The APIGroupInfo Struct and Strategy Object
With a populated Scheme, the API group is registered with the GenericAPIServer by bundling
the Scheme with the storage backend and versioning information into an APIGroupInfo struct.
```mermaid
graph TD
subgraph Server Configuration
A[APIGroupInfo for apps v1];
A --> B{Scheme: Knows Deployment v1};
A --> C{Storage: deployments RESTStorage};
A --> D{Version Priority: v1, v1beta1};
end
subgraph RESTStorage Implementation
C --> E[genericregistry.Store];
E --> F[etcd client];
E --> G[Deployment Strategy];
end
subgraph Server Runtime
H[GenericAPIServer] -- InstallAPIGroup --> I[APIInstaller];
I -- Uses --> A;
I --> J{Register /apis/apps/v1/deployments};
J --> K[HTTP Handler];
K -- On Request --> C;
end
```
The registration process follows these steps:
- APIGroupInfo Construction: For each API group, an APIGroupInfo struct is created. It contains the populated Scheme and a map from version to resource name to storage implementation (VersionedResourcesStorageMap). It also holds the group's versions in priority order (PrioritizedVersions).
- REST Storage Instantiation: For each resource, a genericregistry.Store is created. It is configured with a resource-specific Strategy object that contains the core business logic (e.g., handwritten validation).
- API Group Installation: The GenericAPIServer's InstallAPIGroup method takes the APIGroupInfo and uses an APIInstaller to expose the resources as HTTP endpoints.
4. Watch Cache
To handle the high volume of watch requests from controllers without overwhelming etcd, the
apiserver uses a watch cache. The implementation can be found in
staging/src/k8s.io/apiserver/pkg/storage/cacher/.
- Initialization: The cacher first performs a LIST to get the current state of all objects and the ResourceVersion for that point in time. It then starts a WATCH from that version to ensure a consistent stream of events.
- Serving from Cache: Most list and watch requests are served from this in-memory cache, which dramatically reduces the load on etcd. Consistent reads can also be served from the cache (the ConsistentListFromCache feature). The server first fetches the revision of the latest write from etcd. It then ensures the cache is at least that recent, waiting for it to catch up if necessary, before serving the request.
- Fallback to Storage: If a client request cannot be served from the cache's buffer, the request "falls through" to the underlying etcd storage.
- Bookmarks: The cacher sends BOOKMARK events to watchers that carry the latest ResourceVersion even when none of the watched objects have changed. This keeps a client's last-seen ResourceVersion from becoming too old. A reconnecting watch can then resume instead of forcing an expensive relist.
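The "wait until fresh, then serve from memory" behavior for consistent reads can be sketched with a mutex and a sync.Cond. Cache, Event, and the PROGRESS event type are hypothetical. In the real flow the minimum revision comes from etcd, and etcd progress notifications can advance the cache's revision when no objects change.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Event is a simplified watch event carrying the revision it was observed at.
type Event struct {
	Type string // "ADDED", "MODIFIED", "DELETED", or "PROGRESS"
	Key  string
	Val  string
	RV   uint64
}

// Cache is seeded by a LIST, advanced by watch events, and can block until
// it is at least as fresh as a given revision.
type Cache struct {
	mu    sync.Mutex
	cond  *sync.Cond
	items map[string]string
	rv    uint64
}

// NewCache seeds the cache from a LIST result taken at revision rv.
func NewCache(list map[string]string, rv uint64) *Cache {
	c := &Cache{items: map[string]string{}, rv: rv}
	for k, v := range list {
		c.items[k] = v
	}
	c.cond = sync.NewCond(&c.mu)
	return c
}

// Apply processes one event and wakes any readers waiting for freshness.
func (c *Cache) Apply(e Event) {
	c.mu.Lock()
	defer c.mu.Unlock()
	switch e.Type {
	case "ADDED", "MODIFIED":
		c.items[e.Key] = e.Val
	case "DELETED":
		delete(c.items, e.Key)
	}
	// A PROGRESS event changes no object but still moves the revision forward.
	if e.RV > c.rv {
		c.rv = e.RV
	}
	c.cond.Broadcast()
}

// ConsistentList waits until the cache has reached minRV, then lists keys
// from memory together with the revision they reflect.
func (c *Cache) ConsistentList(minRV uint64) ([]string, uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for c.rv < minRV {
		c.cond.Wait()
	}
	keys := make([]string, 0, len(c.items))
	for k := range c.items {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, c.rv
}

func main() {
	c := NewCache(map[string]string{"pods/a": "running"}, 10)
	done := make(chan []string)
	go func() {
		keys, _ := c.ConsistentList(12) // blocks until the cache reaches revision 12
		done <- keys
	}()
	c.Apply(Event{Type: "ADDED", Key: "pods/b", Val: "pending", RV: 11})
	c.Apply(Event{Type: "PROGRESS", RV: 12})
	fmt.Println(<-done) // [pods/a pods/b]
}
```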
5. Conflict Resolution
- Optimistic Concurrency via resourceVersion: Clients are expected to perform updates using a read-modify-write workflow. The apiserver uses the resourceVersion field of every object to enforce optimistic concurrency. An update (PUT) carries the resourceVersion of the object the client based its modifications on, and a PATCH may include one as a precondition. If that resourceVersion does not match the one currently stored on the server, the server rejects the request with a 409 Conflict error. The client must then re-read the object, resolve the conflict, and resubmit with the new resourceVersion.
- Server-Side Apply: A declarative, "intent-based" patch. The server maintains a managedFields section in the object's metadata to track which "manager" (e.g., a controller) owns each field. This allows multiple actors to manage different parts of the same object without overwriting each other's changes.
6. Discovery and OpenAPI
Apiservers serve the /apis discovery endpoints and the /openapi/v2 and /openapi/v3
specifications. The generation of the OpenAPI specification is a multi-stage process.
- openapi-gen: This code generator parses the Go type definitions, their godoc comments, and their struct and comment tags to generate a map of all API definitions.
- zz_generated.openapi.go: The output is a large Go file containing a GetOpenAPIDefinitions function.
- Runtime: The GenericAPIServer calls this generated function to build the final OpenAPI JSON spec that it serves to clients.
7. Security & Observability
- Audit (pkg/audit): The apiserver has a policy-driven event logging pipeline. The audit policy controls what is logged and at which stage of a request.
- Security:
  - mTLS: The primary authentication mechanism for system components.
  - Service Account Token Issuance: The kube-apiserver acts as an OIDC provider, issuing and validating JWTs for ServiceAccounts.
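Audit policy evaluation picks the first rule that matches a request. The sketch models this with a hypothetical Rule type that carries a subset of the real policy fields (users, verbs, resources, omitStages). A request that matches no rule is treated as level None and is not audited.

```go
package main

import "fmt"

// Level controls how much of a request an audit event records.
type Level string

const (
	LevelNone            Level = "None"
	LevelMetadata        Level = "Metadata"
	LevelRequest         Level = "Request"
	LevelRequestResponse Level = "RequestResponse"
)

// Rule is a hypothetical subset of an audit policy rule; empty lists match anything.
type Rule struct {
	Level      Level
	Users      []string
	Verbs      []string
	Resources  []string
	OmitStages []string
}

// matches reports whether v is in list, treating an empty list as a wildcard.
func matches(list []string, v string) bool {
	if len(list) == 0 {
		return true
	}
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// evaluate returns the level and omitted stages of the first matching rule.
// A request that matches no rule gets LevelNone and is not audited.
func evaluate(policy []Rule, user, verb, resource string) (Level, []string) {
	for _, r := range policy {
		if matches(r.Users, user) && matches(r.Verbs, verb) && matches(r.Resources, resource) {
			return r.Level, r.OmitStages
		}
	}
	return LevelNone, nil
}

// examplePolicy is illustrative only; order matters because the first match wins.
var examplePolicy = []Rule{
	{Level: LevelNone, Users: []string{"system:kube-proxy"}, Verbs: []string{"watch"}},
	{Level: LevelMetadata, Resources: []string{"secrets", "configmaps"}},
	{Level: LevelRequestResponse, Resources: []string{"pods"}, OmitStages: []string{"RequestReceived"}},
}

func main() {
	for _, q := range [][3]string{
		{"alice", "get", "secrets"},
		{"system:kube-proxy", "watch", "pods"},
		{"alice", "create", "pods"},
		{"alice", "get", "nodes"},
	} {
		level, omit := evaluate(examplePolicy, q[0], q[1], q[2])
		fmt.Println(q, level, omit)
	}
}
```

Placing the Metadata rule for secrets before any broader RequestResponse rule matters here. That ordering keeps secret payloads out of the audit log, because the first match wins.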
8. Streaming Protocols
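- WebSockets: The apiserver upgrades HTTP connections for interactive, streaming protocols such as exec, attach, and port-forward. Current releases use WebSockets for these streams, where SPDY was historically used. The UpgradeAwareHandler (in k8s.io/apimachinery/pkg/util/proxy) manages this process when proxying the upgraded connection.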