# Autoscaling
The Knative Pod Autoscaler (KPA) provides fast, request-based autoscaling
capabilities. To configure scale to zero correctly for revisions, you must
adjust the autoscaler's parameters.

`target` sets the desired number of concurrent requests per replica at a given
time. It is a soft limit and is the recommended way to configure autoscaling in
Knative.

The `minScale` and `maxScale` annotations configure the minimum and maximum
number of pods that can serve an application.
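
As a minimal sketch of how these settings might look in a manifest, the
fragment below sets `target`, `minScale`, and `maxScale` as annotations on the
revision template. The service name `hello`, the sample image, and the numeric
values are placeholders chosen for illustration, and the
`autoscaling.knative.dev/` annotation keys are assumed to match your Knative
version.

```yaml
# Sketch only: the service name, image, and values are placeholders.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        # Soft limit: aim for about 10 concurrent requests per pod.
        autoscaling.knative.dev/target: "10"
        # Keep at least 1 pod running (0 would permit scale to zero).
        autoscaling.knative.dev/minScale: "1"
        # Never run more than 5 pods.
        autoscaling.knative.dev/maxScale: "5"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```

Annotation values are quoted because Kubernetes annotations must be strings.
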

You can configure autoscaling with `kn`, which modifies Knative services
without requiring you to edit YAML files directly. Use the `kn service create`
and `kn service update` commands with the following flags to set the
autoscaling behavior:

| Flag | Description |
| :------------------------- | :-------------------------------------------------------------------------------------------------------------------------- |
| `--concurrency-limit int` | Hard limit of concurrent requests to be processed by a single replica. |
| `--scale-target int` | Recommendation for when to scale up based on the concurrent number of incoming requests. Defaults to `--concurrency-limit`. |
| `--scale-max int` | Maximum number of replicas. |
| `--scale-min int` | Minimum number of replicas. |