add stable window documentation to "scale-bounds" (#4514)

* add stable window documentation to "scale-bounds"
* moving scaleDownDelayBack and referencing panic mode
* Update docs/serving/autoscaling/scale-bounds.md
  Co-authored-by: Samia Nneji <snneji@vmware.com>
* Update docs/serving/autoscaling/scale-bounds.md
  Co-authored-by: Samia Nneji <snneji@vmware.com>
* Update docs/serving/autoscaling/scale-bounds.md
  Co-authored-by: Julian Friedman <julz.friedman@uk.ibm.com>
* use better formatting for kpa only note
* Update docs/serving/autoscaling/scale-bounds.md
  Co-authored-by: Samia Nneji <snneji@vmware.com>
* Update docs/serving/autoscaling/scale-bounds.md
  Co-authored-by: Samia Nneji <snneji@vmware.com>

Co-authored-by: Samia Nneji <snneji@vmware.com>
Co-authored-by: Julian Friedman <julz.friedman@uk.ibm.com>
2022-01-17 17:16:29 +01:00 · 362c294b52
parent d4902d2b10
commit 362c294b52
1 changed file with 57 additions and 5 deletions
--- a/docs/serving/autoscaling/scale-bounds.md
+++ b/docs/serving/autoscaling/scale-bounds.md
@@ -158,10 +158,6 @@ When the Revision is created, the larger of initial scale and lower bound is aut
          allow-zero-initial-scale: "true"
    ```

-
-
-
-
 ## Scale Down Delay

 Scale Down Delay specifies a time window which must pass at reduced concurrency
@@ -169,7 +165,9 @@ before a scale-down decision is applied. This can be useful, for example, to
 keep containers around for a configurable duration to avoid a cold start
 penalty if new requests come in. Unlike setting a lower bound, the revision
 will eventually be scaled down if reduced concurrency is maintained for the
-delay period.
+delay period.
+!!! note
+    Only supported for the default KPA autoscaler class.

 * **Global key:** `scale-down-delay`
 * **Per-revision annotation key:** `autoscaling.knative.dev/scale-down-delay`
@@ -217,4 +215,58 @@ delay period.
        autoscaler:
          scale-down-delay: "15m"
    ```
+## Stable window
+
+The stable window defines the sliding time window over which metrics are averaged to provide the input for scaling decisions when the autoscaler is not in [Panic mode](kpa-specific.md).
+
+* **Global key:** `stable-window`
+* **Per-revision annotation key:** `autoscaling.knative.dev/window`
+* **Possible values:** Duration, `6s` <= value <= `1h`
+* **Default:** `60s`
+
+!!! note
+    During scale down, in most cases the last Replica is removed after there has been no traffic to the Revision for the entire duration of the stable window.
+
+**Example:**
+
+=== "Per Revision"
+    ```yaml
+    apiVersion: serving.knative.dev/v1
+    kind: Service
+    metadata:
+      name: helloworld-go
+      namespace: default
+    spec:
+      template:
+        metadata:
+          annotations:
+            autoscaling.knative.dev/window: "40s"
+        spec:
+          containers:
+            - image: gcr.io/knative-samples/helloworld-go
+    ```
+
+=== "Global (ConfigMap)"
+    ```yaml
+    apiVersion: v1
+    kind: ConfigMap
+    metadata:
+      name: config-autoscaler
+      namespace: knative-serving
+    data:
+      stable-window: "40s"
+    ```
+
+=== "Global (Operator)"
+    ```yaml
+    apiVersion: operator.knative.dev/v1alpha1
+    kind: KnativeServing
+    metadata:
+      name: knative-serving
+    spec:
+      config:
+        autoscaler:
+          stable-window: "40s"
+    ```
+
 ---
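
The two per-revision settings touched by this diff can, in principle, be set together on one Revision. The manifest below is a minimal sketch of that combination, not part of the commit itself: the annotation keys and the `40s` and `15m` values come from the examples in the diff, while putting both annotations on the same `helloworld-go` Service is an assumption made purely for illustration.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Stable window: sliding window over which metrics are averaged
        # outside panic mode. Must be between 6s and 1h; default is 60s.
        autoscaling.knative.dev/window: "40s"
        # Scale down delay: time that must pass at reduced concurrency
        # before a scale-down decision is applied (KPA autoscaler class only).
        autoscaling.knative.dev/scale-down-delay: "15m"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```

With these illustrative values, the scale-down delay of `15m` is much longer than the `40s` stable window, so the delay would likely be what keeps replicas around after traffic drops.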