Autoscaling

The Knative Pod Autoscaler (KPA) provides fast, request-based autoscaling. To configure scale-to-zero correctly for revisions, you must modify the autoscaler's parameters.

target defines how many concurrent requests each replica should handle at a given time. It is a soft limit, and it is the recommended configuration for autoscaling in Knative.

The minScale and maxScale annotations can be used to configure the minimum and maximum number of pods that can serve applications.
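
To make the relationship between target, minScale, and maxScale concrete, here is a rough sketch of how a replica count could be estimated from the current load. It assumes a simplified rule (in-flight requests divided by target, rounded up, then clamped to the configured bounds); the real autoscaler also averages over time windows and applies a target utilization percentage, which this sketch ignores. The function name `estimate_replicas` is hypothetical and not part of Knative.

```shell
# Simplified estimate only, not the actual KPA algorithm.
# Assumption: replicas = ceil(inflight / target), clamped to [min, max].
# Assumption: max_scale is a positive number (an unlimited maximum is not modeled).
estimate_replicas() {
  inflight=$1 target=$2 min_scale=$3 max_scale=$4
  # Integer ceiling division.
  replicas=$(( (inflight + target - 1) / target ))
  if [ "$replicas" -lt "$min_scale" ]; then replicas=$min_scale; fi
  if [ "$replicas" -gt "$max_scale" ]; then replicas=$max_scale; fi
  echo "$replicas"
}

estimate_replicas 250 100 0 10    # prints 3
estimate_replicas 0 100 0 10      # prints 0 (scaled to zero)
estimate_replicas 5000 100 0 10   # prints 10 (capped by maxScale)
```

With minScale set to 1 instead of 0, the second call would print 1, which is how a minimum keeps a replica running when there is no traffic.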

You can access autoscaling capabilities by using kn to modify Knative services without editing YAML files directly.

Use the service create and service update commands with the appropriate flags to configure the autoscaling behavior.

| Flag | Description |
|:-----|:------------|
| `--concurrency-limit int` | Hard limit of concurrent requests to be processed by a single replica. |
| `--scale-target int` | Recommendation for when to scale up based on the concurrent number of incoming requests. Defaults to `--concurrency-limit`. |
| `--scale-max int` | Maximum number of replicas. |
| `--scale-min int` | Minimum number of replicas. |
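
As an illustration, the flags listed here might be combined as follows. The service name, image, and numeric values are placeholders chosen for this sketch, and the small `run` helper is a hypothetical convenience that prints each command unless `RUN=1` is set, so the example can be tried without a cluster.

```shell
# Placeholder service name and sample image, for illustration only.
SERVICE=hello
IMAGE=gcr.io/knative-samples/helloworld-go

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Create a service that can scale to zero, with a soft target of 50
# concurrent requests per replica and a hard limit of 100.
run kn service create "$SERVICE" --image "$IMAGE" \
  --scale-min 0 --scale-max 10 \
  --scale-target 50 --concurrency-limit 100

# Later, keep at least one replica running and raise the ceiling.
run kn service update "$SERVICE" --scale-min 1 --scale-max 20
```

Setting the soft target below the hard limit leaves headroom, so the autoscaler can add replicas before any single replica reaches the hard limit.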