[WIP] Improvements and structure changes for autoscaling docs (#2439)

2020-06-04 13:23:18 -05:00 · 2020-06-04 13:23:18 -05:00 · b3ffdbfcb4
parent 7e6a242c45
commit b3ffdbfcb4
24 changed files with 1078 additions and 839 deletions
--- a/docs/README.md
+++ b/docs/README.md
@ -12,7 +12,7 @@ focus on solving mundane but difficult tasks such as:

 - [Deploying a container](./serving/getting-started-knative-app.md)
 - [Routing and managing traffic with blue/green deployment](./serving/samples/blue-green-deployment.md)
- [Scaling automatically and sizing workloads based on demand](./serving/configuring-autoscaling.md)
+- [Scaling automatically and sizing workloads based on demand](./serving/autoscaling)
 - [Binding running services to eventing ecosystems](./eventing/samples/kubernetes-event-source/)

 Developers on Knative can use familiar idioms, languages, and frameworks to
@ -88,7 +88,7 @@ Follow the links below to learn more about Knative.

 ### Samples and demos

- [Autoscaling](./serving/samples/autoscale-go/README.md)
+- [Autoscaling](./serving/autoscaling/autoscale-go/)
 - [Binding running services to eventing ecosystems](./eventing/samples/kubernetes-event-source/)
 - [Telemetry](./serving/samples/telemetry-go/README.md)
 - [REST API sample](./serving/samples/rest-api-go/README.md)
--- a/docs/serving/README.md
+++ b/docs/serving/README.md
@ -5,7 +5,7 @@ and scales to support advanced scenarios.
 The Knative Serving project provides middleware primitives that enable:

 - Rapid deployment of serverless containers
- Automatic scaling up and down to zero
+- [Automatic scaling up and down to zero](./autoscaling/README.md)
 - Routing and network programming for Istio components
 - Point-in-time snapshots of deployed code and configurations

@ -36,7 +36,7 @@ serverless workload behaves on the cluster:
  are immutable objects and can be retained for as long as useful. Knative
  Serving Revisions can be automatically scaled up and down according to
  incoming traffic. See
-  [Configuring the Autoscaler](./configuring-autoscaling.md) for more
+  [Configuring the Autoscaler](./autoscaling) for more
  information.

 ![Diagram that displays how the Serving resources coordinate with each other.](https://github.com/knative/serving/raw/master/docs/spec/images/object_model.png)
@ -83,5 +83,3 @@ in the Knative Serving repository.

 See the [Knative Serving Issues](https://github.com/knative/serving/issues) page
 for a full list of known issues.
-
-
--- a/docs/serving/autoscaling/README.md
+++ b/docs/serving/autoscaling/README.md
@ -0,0 +1,18 @@
+One of the main features of Knative is automatic scaling of replicas for an application to closely match incoming demand, including scaling applications to zero if no traffic is being received.
+Knative Serving enables this by default, using the Knative Pod Autoscaler (KPA).
+The Autoscaler component watches traffic flow to the application, and scales replicas up or down based on configured metrics.
+
+Knative services default to using autoscaling settings that are suitable for the majority of use cases. However, some workloads may require a custom, more finely-tuned configuration.
+This guide provides information about configuration options that you can modify to fit the requirements of your workload.
+
+For more information about how autoscaling for Knative works, see the [Autoscaling concepts](./autoscaling-concepts.md) documentation.
+
+For more information about which metrics can be used to control the Autoscaler, see the [metrics](./autoscaling-metrics.md) documentation.
+
+## Optional autoscaling configuration tasks
+
+* Configure your Knative deployment to use the Kubernetes [Horizontal Pod Autoscaler (HPA)](../../install/any-kubernetes-cluster.md#optional-serving-extensions) instead of the default KPA.
+* Disable scale to zero functionality for your cluster ([global configuration only](./scale-to-zero.md)).
+* Configure the [type of metrics](./autoscaling-metrics.md) your Autoscaler consumes.
+* Configure [concurrency limits](./concurrency.md) for applications.
+* Try out the [Go Autoscale Sample App](./autoscale-go/README.md).
--- a/docs/serving/autoscaling/_index.md
+++ b/docs/serving/autoscaling/_index.md
@ -0,0 +1,10 @@
+---
+title: "Autoscaling"
+linkTitle: "Autoscaling"
+weight: 20
+type: "docs"
+aliases:
+    - /docs/serving/configuring-autoscaling/
+---
+
+{{% readfile file="README.md" %}}
--- a/docs/serving/autoscaling/autoscale-go/Dockerfile
+++ b/docs/serving/autoscaling/autoscale-go/Dockerfile
--- a/docs/serving/autoscaling/autoscale-go/OWNERS
+++ b/docs/serving/autoscaling/autoscale-go/OWNERS
--- a/docs/serving/autoscaling/autoscale-go/README.md
+++ b/docs/serving/autoscaling/autoscale-go/README.md
--- a/docs/serving/autoscaling/autoscale-go/autoscale.go
+++ b/docs/serving/autoscaling/autoscale-go/autoscale.go
--- a/docs/serving/autoscaling/autoscale-go/index.md
+++ b/docs/serving/autoscaling/autoscale-go/index.md
@ -0,0 +1,10 @@
+---
+title: "Autoscale Sample App - Go"
+linkTitle: "Autoscale Sample App - Go"
+weight: 100
+type: "docs"
+aliases:
+    - /docs/serving/samples/autoscale-go
+---
+
+{{% readfile file="README.md" %}}
--- a/docs/serving/autoscaling/autoscale-go/request-dashboard.png
+++ b/docs/serving/autoscaling/autoscale-go/request-dashboard.png
--- a/docs/serving/autoscaling/autoscale-go/scale-dashboard.png
+++ b/docs/serving/autoscaling/autoscale-go/scale-dashboard.png
--- a/docs/serving/autoscaling/autoscale-go/service.yaml
+++ b/docs/serving/autoscaling/autoscale-go/service.yaml
--- a/docs/serving/autoscaling/autoscale-go/test/test.go
+++ b/docs/serving/autoscaling/autoscale-go/test/test.go
--- a/docs/serving/autoscaling/autoscaling-concepts.md
+++ b/docs/serving/autoscaling/autoscaling-concepts.md
@ -0,0 +1,135 @@
+---
+title: "Autoscaling concepts"
+linkTitle: "Autoscaling concepts"
+weight: 01
+type: "docs"
+---
+
+This section covers conceptual information about which Autoscaler types are supported, as well as fundamental information about how autoscaling is configured.
+
+## Supported Autoscaler types
+
+Knative Serving supports the implementation of Knative Pod Autoscaler (KPA) and Kubernetes' Horizontal Pod Autoscaler (HPA). The features and limitations of each of these Autoscalers are listed below.
+
+**IMPORTANT:** If you want to use Kubernetes Horizontal Pod Autoscaler (HPA), you must install it after you install [Knative Serving](../../install/any-kubernetes-cluster.md#optional-serving-extensions).
+
+### Knative Pod Autoscaler (KPA)
+
+* Part of the Knative Serving core and enabled by default once Knative Serving is installed.
+* Supports scale to zero functionality.
+* Does not support CPU-based autoscaling.
+
+### Horizontal Pod Autoscaler (HPA)
+
+* Not part of the Knative Serving core, and must be enabled after [Knative Serving installation](../../install/any-kubernetes-cluster.md#optional-serving-extensions).
+* Does not support scale to zero functionality.
+* Supports CPU-based autoscaling.
+
+### Configuring the Autoscaler implementation
+
+The type of Autoscaler implementation (KPA or HPA) can be configured by using the `class` annotation.
+
+* **Global settings key:** `pod-autoscaler-class`
+* **Per-revision annotation key:** `autoscaling.knative.dev/class`
+* **Possible values:** `"kpa.autoscaling.knative.dev"` or `"hpa.autoscaling.knative.dev"`
+* **Default:** `"kpa.autoscaling.knative.dev"`
+
+**Example:**
+{{< tabs name="class" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```
+{{< /tab >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ pod-autoscaler-class: "kpa.autoscaling.knative.dev"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      pod-autoscaler-class: "kpa.autoscaling.knative.dev"
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+## Global versus per-revision settings
+
+Autoscaling in Knative can be configured using either global or per-revision settings.
+
+1. If no per-revision autoscaling settings are specified, the global settings are used.
+1. If per-revision settings are specified, they override the corresponding global settings.
+
+### Global settings
+
+Global settings for autoscaling are configured using the `config-autoscaler` ConfigMap. If you installed Knative Serving using the Operator, you can set global configuration settings in the `spec.config.autoscaler` ConfigMap, located in the `KnativeServing` custom resource (CR).
+
+#### Example of the default autoscaling ConfigMap
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ container-concurrency-target-default: "100"
+ container-concurrency-target-percentage: "0.7"
+ enable-scale-to-zero: "true"
+ max-scale-up-rate: "1000"
+ max-scale-down-rate: "2"
+ panic-window-percentage: "10"
+ panic-threshold-percentage: "200"
+ scale-to-zero-grace-period: "30s"
+ scale-to-zero-pod-retention-period: "0s"
+ stable-window: "60s"
+ target-burst-capacity: "200"
+ requests-per-second-target-default: "200"
+```
+
+### Per-revision settings
+
+Per-revision settings for autoscaling are configured by adding _annotations_ to a revision.
+
+#### Example
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/target: "70"
+```
+
+**IMPORTANT:** If you are creating revisions by using a service or configuration, you must set the annotations in the _revision template_ so that any modifications will be applied to each revision as they are created.
+Setting annotations in the top level metadata of a single revision will not propagate the changes to other revisions and will not apply changes to the autoscaling configuration for your application.
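+
+As a minimal sketch of the placement described above, the following Service sets an autoscaling annotation in the revision template, so that it is applied to each revision the Service creates. The service name, image, and target value are reused from the other examples and are illustrative only.
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      # Revision template metadata: annotations set here are applied
+      # to every revision created from this template.
+      annotations:
+        autoscaling.knative.dev/target: "70"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```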
--- a/docs/serving/autoscaling/autoscaling-metrics.md
+++ b/docs/serving/autoscaling/autoscaling-metrics.md
@ -0,0 +1,74 @@
+---
+title: "Metrics"
+linkTitle: "Metrics"
+weight: 03
+type: "docs"
+---
+
+The metric configuration defines which metric type is watched by the Autoscaler.
+
+## Setting metrics per revision
+
+For [per-revision](./autoscaling-concepts.md) configuration, this is determined using the `autoscaling.knative.dev/metric` annotation.
+The possible metric types that can be configured per revision depend on the type of Autoscaler implementation you are using:
+
+* The default KPA Autoscaler supports the `concurrency` and `rps` metrics.
+* The HPA Autoscaler supports the `concurrency`, `rps` and `cpu` metrics.
+
+<!-- TODO: Add details about different metrics types, how concurrency and rps differ. Explain cpu. -->
+
+For more information about KPA and HPA, see the documentation on [Supported Autoscaler types](./autoscaling-concepts.md).
+
+* **Per-revision annotation key:** `autoscaling.knative.dev/metric`
+* **Possible values:** `"concurrency"`, `"rps"` or `"cpu"`, depending on your Autoscaler type. The `cpu` metric is only supported on revisions with the HPA class.
+* **Default:** `"concurrency"`
+
+{{< tabs name="Examples of configuring metric types per revision" default="Per-revision concurrency configuration" >}}
+{{% tab name="Per-revision concurrency configuration" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/metric: "concurrency"
+```
+{{< /tab >}}
+{{% tab name="Per-revision rps configuration" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/metric: "rps"
+```
+{{< /tab >}}
+{{% tab name="Per-revision cpu configuration" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/metric: "cpu"
+```
+{{< /tab >}}
+{{< /tabs >}}
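+
+Because the `cpu` metric is only supported on revisions with the HPA class, a revision that uses it would typically set the `class` and `metric` annotations together. The following is a sketch that assumes the HPA extension is already installed; the service name and image are illustrative.
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        # Select the HPA implementation instead of the default KPA.
+        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
+        # Scale on CPU, which is only available with the HPA class.
+        autoscaling.knative.dev/metric: "cpu"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```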
+
+## Next steps
+
+* Configure [concurrency targets](./concurrency.md) for applications
+* Configure [requests per second targets](./rps-target.md) for replicas of an application
--- a/docs/serving/autoscaling/autoscaling-targets.md
+++ b/docs/serving/autoscaling/autoscaling-targets.md
@ -0,0 +1,81 @@
+---
+title: "Targets"
+linkTitle: "Targets"
+weight: 04
+type: "docs"
+---
+
+Configuring a target provides the Autoscaler with a value that it tries to maintain for the configured metric for a revision.
+See the [metrics](./autoscaling-metrics.md) documentation for more information about configurable metric types.
+
+The `target` annotation, used to configure per-revision targets, is _metric agnostic_. This means the target is simply an integer value, which can be applied to any metric type.
+
+## Configuring targets
+
+* **Global settings key:** `container-concurrency-target-default` for setting a concurrency target, and `requests-per-second-target-default` for setting a requests-per-second (RPS) target. For more information, see the documentation on [metrics](./autoscaling-metrics.md).
+* **Per-revision annotation key:** `autoscaling.knative.dev/target`
+* **Possible values:** An integer (metric agnostic).
+* **Default:** `"100"` for `container-concurrency-target-default`, and `"200"` for `requests-per-second-target-default`. There is no default value set for the `target` annotation.
+
+{{< tabs name="Configuring targets" default="Target annotation - Per-revision" >}}
+{{% tab name="Target annotation - Per-revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/target: "50"
+```
+{{< /tab >}}
+{{% tab name="Concurrency target - Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ container-concurrency-target-default: "200"
+```
+{{< /tab >}}
+{{% tab name="Concurrency target - Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      container-concurrency-target-default: "200"
+```
+{{< /tab >}}
+{{% tab name="Requests per second (RPS) target - Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ requests-per-second-target-default: "150"
+```
+{{< /tab >}}
+{{% tab name="Requests per second (RPS) target - Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      requests-per-second-target-default: "150"
+```
+{{< /tab >}}
+{{< /tabs >}}
--- a/docs/serving/autoscaling/concurrency.md
+++ b/docs/serving/autoscaling/concurrency.md
@ -0,0 +1,160 @@
+---
+title: "Configuring concurrency"
+linkTitle: "Configuring concurrency"
+weight: 40
+type: "docs"
+---
+
+Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time.
+<!-- this is where including files would be useful. We could create a concurrency global config module and insert it here, in the docs for metrics, and in the docs for targets. Showing the correct information each time instead of having it in one place with the per revision config jumbled in with it makes it easier to understand IMHO, and would mean users don't need to visit different pages or hunt for the same information for similar user stories @abrennan89.-->
+For per-revision concurrency, you must configure both `autoscaling.knative.dev/metric` and `autoscaling.knative.dev/target` for a [soft limit](#soft-limit), or `containerConcurrency` for a [hard limit](#hard-limit).
+
+For global concurrency, you can set the `container-concurrency-target-default` value.
+
+## Soft versus hard concurrency limits
+
+It is possible to set either a _soft_ or _hard_ concurrency limit.
+
+**NOTE:** If both a soft and a hard limit are specified, the smaller of the two values will be used. This prevents the Autoscaler from having a target value that is not permitted by the hard limit value.
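+
+To illustrate the note above, the following sketch sets both limits on one revision. The values are hypothetical: the soft limit (`200`) is larger than the hard limit (`50`), so according to the rule above the smaller value, `50`, is the one that is used.
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        # Soft limit: a target, not an enforced bound.
+        autoscaling.knative.dev/target: "200"
+    spec:
+      # Hard limit: an enforced upper bound per replica.
+      # It is smaller than the soft limit, so 50 is the value used.
+      containerConcurrency: 50
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```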
+
+The soft limit is a targeted limit rather than a strictly enforced bound. In some situations, particularly if there is a sudden _burst_ of requests, this value can be exceeded.
+
+The hard limit is an enforced upper bound.
+If concurrency reaches the hard limit, surplus requests will be buffered and must wait until enough capacity is free to execute the requests.
+
+**IMPORTANT:** Using a hard limit configuration is only recommended if there is a clear use case for it with your application. Having a low hard limit specified may have a negative impact on the throughput and latency of an application, and may cause additional cold starts.
+
+### Soft limit
+
+* **Global key:** `container-concurrency-target-default`
+* **Per-revision annotation key:** `autoscaling.knative.dev/target`
+* **Possible values:** An integer.
+* **Default:** `"100"`
+
+**Example:**
+{{< tabs name="target-concurrency" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/target: "200"
+```
+{{< /tab >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ container-concurrency-target-default: "200"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      container-concurrency-target-default: "200"
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+### Hard limit
+
+The hard limit has no global setting and can only be specified [per revision](./autoscaling-concepts.md).
+This particular per-revision setting is not an annotation; it is present on the revision's spec itself as `containerConcurrency`.
+
+* The default value is `0`, meaning that there is no limit on the number of requests that are allowed to flow into the revision.
+* A value greater than `0` specifies the exact number of requests that are allowed to flow to the replica at any one time.
+
+* **Global key:** No global key.
+* **Per-revision spec key:** `containerConcurrency`
+* **Possible values:** integer
+* **Default:** `0`, meaning no limit
+
+**Example:**
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    spec:
+      containerConcurrency: 50
+```
+
+## Target utilization
+
+In addition to the literal settings explained previously, concurrency values can be further adjusted by using a _target utilization value_.
+
+This value specifies what percentage of the previously specified target should actually be targeted by the Autoscaler.
+This is also known as specifying the _hotness_ at which a replica runs, which causes the Autoscaler to scale up before the defined hard limit is reached.
+
+For example, if `containerConcurrency` is set to 10, and the target utilization value is set to 70 (percent), the Autoscaler will create a new replica when the average number of concurrent requests across all existing replicas reaches 7.
+Requests numbered 7 to 10 will still be sent to the existing replicas, but this allows for additional replicas to be started in anticipation of being needed when the `containerConcurrency` limit is reached.
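+
+The scenario above could be expressed per revision roughly as follows. This is a sketch only; the service name and image are illustrative, and the arithmetic in the comments restates the example from the text.
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        # 70 percent of the hard limit below: 10 * 0.70 = 7.
+        # The Autoscaler adds a replica once average concurrency reaches 7.
+        autoscaling.knative.dev/targetUtilizationPercentage: "70"
+    spec:
+      # Each replica still accepts up to 10 concurrent requests.
+      containerConcurrency: 10
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```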
+
+* **Global key:** `container-concurrency-target-percentage`
+* **Per-revision annotation key:** `autoscaling.knative.dev/targetUtilizationPercentage`
+* **Possible values:** float
+* **Default:** `70`
+
+**NOTE:** If the activator is in the routing path, it will fully load all replicas up to `containerConcurrency`. It currently does not take target utilization into account.
+
+**Example:**
+{{< tabs name="target-utilization" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/targetUtilizationPercentage: "80"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```
+{{< /tab >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ container-concurrency-target-percentage: "80"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      container-concurrency-target-percentage: "80"
+```
+{{< /tab >}}
+{{< /tabs >}}
--- a/docs/serving/autoscaling/kpa-specific.md
+++ b/docs/serving/autoscaling/kpa-specific.md
@ -0,0 +1,266 @@
+---
+title: "Additional autoscaling configuration for Knative Pod Autoscaler"
+linkTitle: "Additional autoscaling configuration for Knative Pod Autoscaler"
+weight: 60
+type: "docs"
+---
+
+The following settings are specific to the Knative Pod Autoscaler (KPA).
+
+## Modes
+
+The KPA acts on [metrics](./autoscaling-metrics.md) (`concurrency` or `rps`) aggregated over time-based windows.
+
+These windows define the amount of historical data that the Autoscaler takes into account, and are used to smooth the data over the specified amount of time.
+The shorter these windows are, the more quickly the Autoscaler will react.
+
+The KPA's implementation has two modes: **stable** and **panic**.
+
+Stable mode is used for general operation, while panic mode by default has a much shorter window, and will be used to quickly scale a revision up if a burst of traffic arrives.
+
+**NOTE:** To avoid churn, the revision will not scale down while in panic mode. The Autoscaler will leave panic mode if there has been no reason to react quickly during the stable window's timeframe.
+
+### Stable window
+
+* **Global key:** `stable-window`
+* **Per-revision annotation key:** `autoscaling.knative.dev/window`
+* **Possible values:** Duration, `6s` <= value <= `1h`
+* **Default:** `60s`
+
+**NOTE:** During scale down, the last replica will only be removed after there has not been any traffic to the revision for the entire duration of the stable window.
+
+**Example:**
+{{< tabs name="stable-window" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/window: "40s"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```
+{{< /tab >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ stable-window: "40s"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      stable-window: "40s"
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+### Panic window
+
+The panic window is defined as a percentage of the stable window, so that the two windows always stay in proportion to each other.
+
+This value indicates how much the window over which historical data is evaluated shrinks upon entering panic mode. For example, a value of `10.0` means that in panic mode the window will be 10% of the stable window size.
+
+* **Global key:** `panic-window-percentage`
+* **Per-revision annotation key:** `autoscaling.knative.dev/panicWindowPercentage`
+* **Possible values:** float, `1.0` <= value <= `100.0`
+* **Default:** `10.0`
+
+**Example:**
+{{< tabs name="panic-window" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/panicWindowPercentage: "20.0"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```
+{{< /tab >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ panic-window-percentage: "20.0"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      panic-window-percentage: "20.0"
+```
+{{< /tab >}}
+{{< /tabs >}}
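+
+Because the panic window is relative to the stable window, the two settings are easiest to reason about together. The following sketch combines the per-revision values used in the examples above; the resulting panic window length in the comment is simple arithmetic on those values.
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        # Stable window: metrics are averaged over 40 seconds.
+        autoscaling.knative.dev/window: "40s"
+        # Panic window: 20 percent of the stable window, 40s * 0.20 = 8s.
+        autoscaling.knative.dev/panicWindowPercentage: "20.0"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```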
+
+### Panic mode threshold
+
+This threshold defines when the Autoscaler will move from stable mode into panic mode.
+
+This value is a percentage of the traffic that the current number of replicas can handle.
+
+**NOTE:** A value of `100.0` (100 percent) means that the Autoscaler is always in panic mode, therefore the minimum value should be higher than `100.0`.
+
+The default setting of `200.0` means that panic mode will start if traffic is twice as high as the current replica population can handle.
+
+* **Global key:** `panic-threshold-percentage`
+* **Per-revision annotation key:** `autoscaling.knative.dev/panicThresholdPercentage`
+* **Possible values:** float, `110.0` <= value <= `1000.0`
+* **Default:** `200.0`
+
+**Example:**
+{{< tabs name="panic-threshold" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/panicThresholdPercentage: "150.0"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```
+{{< /tab >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ panic-threshold-percentage: "150.0"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      panic-threshold-percentage: "150.0"
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+## Scale rates
+
+These settings control by how much the replica population can scale up or down in a single evaluation cycle.
+
+A minimal change of one replica in each direction is always permitted, so the Autoscaler can scale to +/- 1 replica at any time, regardless of the scale rates set.
+
+### Scale up rate
+
+This setting determines the maximum ratio of desired to existing pods. For example, with a value of `2.0`, the revision can only scale from `N` to `2*N` pods in one evaluation cycle.
+
+* **Global key:** `max-scale-up-rate`
+* **Per-revision annotation key:** n/a
+* **Possible values:** float
+* **Default:** `1000.0`
+
+**Example:**
+{{< tabs name="scale-up-rate" default="Global (ConfigMap)" >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ max-scale-up-rate: "500.0"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      max-scale-up-rate: "500.0"
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+### Scale down rate
+
+This setting determines the maximum ratio of existing to desired pods. For example, with a value of `2.0`, the revision can only scale from `N` to `N/2` pods in one evaluation cycle.
+
+* **Global key:** `max-scale-down-rate`
+* **Per-revision annotation key:** n/a
+* **Possible values:** float
+* **Default:** `2.0`
+
+**Example:**
+{{< tabs name="scale-down-rate" default="Global (ConfigMap)" >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ max-scale-down-rate: "4.0"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      max-scale-down-rate: "4.0"
+```
+{{< /tab >}}
+{{< /tabs >}}
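+
+Both scale rates are global keys in the same ConfigMap, so they can be set together. The following sketch reuses the values from the examples above; the replica counts in the comments are hypothetical and only illustrate the ratios described in this section.
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ # Scale up: from N pods to at most 500 * N pods in one evaluation cycle.
+ max-scale-up-rate: "500.0"
+ # Scale down: from N pods to no fewer than N / 4 pods in one evaluation
+ # cycle, for example from 20 pods to 5 pods.
+ max-scale-down-rate: "4.0"
+```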
--- a/docs/serving/autoscaling/rps-target.md
+++ b/docs/serving/autoscaling/rps-target.md
@ -0,0 +1,57 @@
+---
+title: "Configuring the requests per second (RPS) target"
+linkTitle: "Configuring the requests per second (RPS) target"
+weight: 50
+type: "docs"
+---
+
+This setting specifies a target for requests-per-second per replica of an application.
+
+* **Global key:** `requests-per-second-target-default`
+* **Per-revision annotation key:** `autoscaling.knative.dev/target` (your revision must also be configured to use the `rps` [metric annotation](./autoscaling-metrics.md))
+* **Possible values:** An integer.
+* **Default:** `"200"`
+
+**Example:**
+{{< tabs name="rps-target" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/metric: "rps"
+        autoscaling.knative.dev/target: "150"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```
+{{< /tab >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ requests-per-second-target-default: "150"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      requests-per-second-target-default: "150"
+```
+{{< /tab >}}
+{{< /tabs >}}
--- a/docs/serving/autoscaling/scale-bounds.md
+++ b/docs/serving/autoscaling/scale-bounds.md
@ -0,0 +1,74 @@
+---
+title: "Configuring scale bounds"
+linkTitle: "Configuring scale bounds"
+weight: 50
+type: "docs"
+---
+
+To apply upper and lower bounds to autoscaling behavior, you can specify scale bounds in both directions.
+
+## Lower bound
+
+This value controls the minimum number of replicas that each revision should have.
+Knative will attempt to never have fewer than this number of replicas at any one point in time.
+
+* **Global key:** n/a
+* **Per-revision annotation key:** `autoscaling.knative.dev/minScale`
+* **Possible values:** integer
+* **Default:** `0` if scale-to-zero is enabled and class KPA is used, `1` otherwise
+
+**NOTE:** For more information about scale-to-zero configuration, see the documentation on [Configuring scale to zero](./scale-to-zero.md).
+
+**Example:**
+{{< tabs name="min-scale" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/minScale: "3"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+## Upper bound
+
+This value controls the maximum number of replicas that each revision should have.
+Knative will attempt to never have more than this number of replicas running, or in the process of being created, at any one point in time.
+
+* **Global key:** n/a
+* **Per-revision annotation key:** `autoscaling.knative.dev/maxScale`
+* **Possible values:** integer
+* **Default:** `0`, which means unlimited
+
+**Example:**
+{{< tabs name="max-scale" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/maxScale: "3"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```
+{{< /tab >}}
+{{< /tabs >}}
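+
+Both bounds can be set on the same revision template. The following is a minimal sketch that reuses the `helloworld-go` sample from the examples above; the values `2` and `10` are arbitrary illustrations, not recommendations:
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        # Illustrative values only: keep at least 2 and at most 10 replicas.
+        autoscaling.knative.dev/minScale: "2"
+        autoscaling.knative.dev/maxScale: "10"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```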
+
+---
--- a/docs/serving/autoscaling/scale-to-zero.md
+++ b/docs/serving/autoscaling/scale-to-zero.md
@ -0,0 +1,142 @@
+---
+title: "Configuring scale to zero"
+linkTitle: "Configuring scale to zero"
+weight: 20
+type: "docs"
+---
+
+**IMPORTANT:** Scale to zero can only be enabled if you are using the Knative Pod Autoscaler (KPA), and can only be configured globally. For more information about using KPA or global configuration, see the documentation on [Autoscaling concepts](./autoscaling-concepts.md).
+
+## Enable scale to zero
+
+The scale to zero value controls whether Knative allows replicas to scale down to zero (if set to `true`) or stops at 1 replica (if set to `false`).
+
+**NOTE:** For more information about scale bounds configuration per revision, see the documentation on [Configuring scale bounds](./scale-bounds.md).
+
+* **Global key:** `enable-scale-to-zero`
+* **Per-revision annotation key:** No per-revision setting.
+* **Possible values:** boolean
+* **Default:** `true`
+
+**Example:**
+{{< tabs name="scale-to-zero" default="Global (ConfigMap)" >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ enable-scale-to-zero: "false"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      enable-scale-to-zero: "false"
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+## Scale to zero grace period
+
+This setting specifies an upper bound on how long the system waits internally for the scale-from-zero machinery to be in place before the last replica is removed.
+
+**IMPORTANT:** This is a value that controls how long internal network programming is allowed to take, and should only be adjusted if you have experienced issues with requests being dropped while a revision was scaling to zero replicas.
+
+This setting does not adjust how long the last replica will be kept after traffic ends, and it does not guarantee that the replica will actually be kept for this entire duration.
+
+* **Global key:** `scale-to-zero-grace-period`
+* **Per-revision annotation key:** n/a
+* **Possible values:** Duration
+* **Default:** `30s`
+
+**Example:**
+{{< tabs name="scale-to-zero-grace" default="Global (ConfigMap)" >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ scale-to-zero-grace-period: "40s"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      scale-to-zero-grace-period: "40s"
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+
+## Scale to zero last pod retention period
+
+The `scale-to-zero-pod-retention-period` flag determines the minimum amount of time that the last pod will remain active after the Autoscaler decides to scale pods to zero.
+
+This contrasts with the `scale-to-zero-grace-period` flag, which determines the maximum amount of time that the last pod will remain active after the Autoscaler decides to scale pods to zero.
+
+* **Global key:** `scale-to-zero-pod-retention-period`
+* **Per-revision annotation key:** `autoscaling.knative.dev/scaleToZeroPodRetentionPeriod`
+* **Possible values:** Non-negative duration string
+* **Default:** `0s`
+
+**Example:**
+{{< tabs name="scale-to-zero-pod-retention" default="Global (ConfigMap)" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/scaleToZeroPodRetentionPeriod: "1m5s"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```
+{{< /tab >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ scale-to-zero-pod-retention-period: "42s"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      scale-to-zero-pod-retention-period: "42s"
+```
+{{< /tab >}}
+{{< /tabs >}}
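+
+Because all three global keys described on this page are set in the same `config-autoscaler` ConfigMap, they can be configured together. The following is a minimal sketch that simply restates the defaults documented above; adjust the values to suit your installation:
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: config-autoscaler
+  namespace: knative-serving
+data:
+  # Values below are the defaults listed on this page, shown for illustration.
+  enable-scale-to-zero: "true"
+  scale-to-zero-grace-period: "30s"
+  scale-to-zero-pod-retention-period: "0s"
+```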
--- a/docs/serving/autoscaling/target-burst-capacity.md
+++ b/docs/serving/autoscaling/target-burst-capacity.md
@ -0,0 +1,47 @@
+---
+title: "Configuring target burst capacity"
+linkTitle: "Configuring target burst capacity"
+weight: 50
+type: "docs"
+---
+
+_Target burst capacity_ is a [global and per-revision](./autoscaling-concepts.md) setting that determines the size of the traffic burst a Knative application can handle without buffering.
+
+If a traffic burst is too large for the application to handle, the _Activator_ service will be placed in the request path to protect the revision and optimize request load balancing.
+
+The Activator service receives and buffers requests for inactive revisions, and for revisions where a traffic burst exceeds what that revision can handle without buffering.
+
+Target burst capacity can be configured using a combination of the following parameters:
+
+* Setting the targeted concurrency limits for the revision. For more information, see the documentation on [concurrency](./concurrency.md).
+* Setting the target utilization parameters. For more information, see the documentation on [target utilization](./concurrency.md#target-utilization).
+* Setting the target burst capacity per revision.
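+
+As a rough sketch of how these parameters sit together on one revision, the following template sets all three annotations at once. The reuse of the `helloworld-go` sample and the specific values are illustrative assumptions only:
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        # Soft concurrency target per replica (illustrative value).
+        autoscaling.knative.dev/target: "100"
+        # Percentage of the target that the autoscaler aims for (illustrative value).
+        autoscaling.knative.dev/targetUtilizationPercentage: "80"
+        # Target burst capacity for this revision (illustrative value).
+        autoscaling.knative.dev/targetBurstCapacity: "70"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```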
+
+## Setting the target burst capacity per revision
+
+* **Global key:** No global key.
+* **Per-revision annotation key:** `autoscaling.knative.dev/targetBurstCapacity`
+* **Possible values:** float
+* **Default:** `70`
+
+**Note:** If the activator is in the routing path, it will fully load all replicas up to `containerConcurrency`. It currently applies target utilization only at the revision level.
+
+**Example:**
+{{< tabs name="targetBurstCapacity" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: s3
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/minScale: "2"
+        autoscaling.knative.dev/targetBurstCapacity: "70"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```
+{{< /tab >}}
+{{< /tabs >}}
--- a/docs/serving/configuring-autoscaling.md
+++ b/docs/serving/configuring-autoscaling.md
@ -1,825 +0,0 @@
---
-title: "Configuring autoscaling "
-weight: 10
-type: "docs"
-aliases:
- /docs/serving/configuring-the-autoscaler/
---
-
-One of the main properties of serverless platforms is their ability to scale an application to closely match its incoming demand. That requires watching load as it flows into the application and adjusting the scale based on the respective metrics. It's the job of the autoscaling component of Knative Serving to do exactly that.
-
-Knative Serving comes with its own autoscaler, the **KPA** (Knative Pod Autoscaler) but can also be configured to use Kubernetes' **HPA** (Horizontal Pod Autoscaler) or even a custom third-party autoscaler.
-
-Knative Serving Revisions come with autoscaling preconfigured to defaults that have been proven to work for a variety of use-cases. However, some workloads call for a more fine-tuned approach. This document describes the knobs you can turn to adjust the autoscaler to fit the requirements of your workload.
-
-## Global vs. per-revision settings
-
-Some of the following settings can be configured as a global default and/or overridden per revision.
-
-Global settings are done in the **config-autoscaler** ConfigMap in the namespace of your Knative Serving installation (*knative-serving* by default) if you're manually installing the system with the YAML manifests. If you're using the operator, the settings are done as part of the `spec.config.autoscaler` map on the **KnativeServing CR**. The keys for both the ConfigMap and the CR are the same.
-
-Per-revision settings are done by setting annotations on the revision. When you're creating revisions through a Service or a Configuration, that means the annotations must be set on the respective Revision `template`. All per-revision annotation keys are prefixed with `autoscaling.knative.dev/`. Per-revision settings override global settings where both settings are available. If no per-revision setting is specified, the respective global setting is used.
-
-**Note:** It's important that the annotation is set inside the template key so that it will appear on each revision as they are created. Setting it in the top-level metadata will not propagate them to the revision and thus will not have any effect on autoscaling.
-
-**Example:**
-{{< tabs name="example" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/target: "70"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- container-concurrency-target-default: "100"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      container-concurrency-target-default: "100"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-## Class
-
-As mentioned above, there are multiple potential implementations of an autoscaler. Knative Serving supports the Knative Pod Autoscaler (KPA) and Kubernetes' Horizontal Pod Autoscaler (HPA). The HPA needs to be enabled during installation and is not part of the Knative Serving core.
-
-The KPA is the default and is tailored for serverless workloads. It has performance optimizations that the HPA currently does not have and, unlike the HPA, it supports scale to zero. The HPA on the other hand supports CPU based autoscaling, which the KPA does not support.
-
-* **Global key:** `pod-autoscaler-class`
-* **Per-revision annotation key:** `autoscaling.knative.dev/class`
-* **Possible values:** `"kpa.autoscaling.knative.dev"` or `"hpa.autoscaling.knative.dev"`
-* **Default:** `"kpa.autoscaling.knative.dev"`
-
-**Example:**
-{{< tabs name="class" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- pod-autoscaler-class: "kpa.autoscaling.knative.dev"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      pod-autoscaler-class: "kpa.autoscaling.knative.dev"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-## Metric
-
-The metric specifies which value is looked at and compared against the respective target. The KPA supports two metrics: **concurrency** and **requests-per-seconds** (rps). The HPA currently only supports **cpu** in Knative Serving.
-
-* **Global key:** n/a
-* **Per-revision annotation key:** `autoscaling.knative.dev/metric`
-* **Possible values:** `"concurrency"`, `"rps"` or `"cpu"`
-* **Default:** `"concurrency"`
-
-**Note:** `"cpu"` is only supported on revisions with the HPA class.
-
-**Example:**
-{{< tabs name="metric" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/metric: "rps"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-## Targets
-
-The autoscaling target is the value the autoscaler tries to maintain per replica of the application. If we, for example, specify a "concurrency target" of "10", the autoscaler will try to make sure that every replica receives on average 10 requests at a time. A target is always evaluated against a specified metric.
-
-**Note:** The per-revision annotation keys for specifying a target are always the same while the global keys specify the metric they correspond to. Globally, the distinction is needed as there can be revision using different metrics in the system. On the revision itself though, that'd be redundant information as the metric is defined separately (see above), hence the target annotation key is metric agnostic and can be used for all of the supported metrics.
-
-### Concurrency Targets/Limits
-
-When "Metric" is set to "concurrency", Knative Serving revisions are scaled on observed **concurrency**. As in the example above, it tries to maintain a stable amount of concurrent requests being worked on by each replica.
-
-Configuring a concurrency target is a little special because Knative Serving has a **soft** and a **hard** concurrency limit. As the name suggests, the hard limit is an enforced upper bound. If concurrency ever hits that bound, additional requests will be buffered and wait until enough capacity is free to execute the requests. The soft limit is only a target for the autoscaler. In some situations and especially on bursts this value can be exceeded by a given replica.
-
-**Note:** It is recommended to only use the hard limit if your application has a clear need for it. Low hard limit values can have an impact on the throughput and latency of the application.\
-**Note:** If both a soft and a hard limit are specified, the smaller of the two values will be used as to not have the autoscaler target a value that's not even allowed into the replica by the hard limit.
-
-#### Soft Limit (target)
-
-* **Global key:** `container-concurrency-target-default`
-* **Per-revision annotation key:** `autoscaling.knative.dev/target`
-* **Possible values:** integer
-* **Default:** `100`
-
-**Example:**
-{{< tabs name="target-concurrency" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/target: "200"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- container-concurrency-target-default: "200"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      container-concurrency-target-default: "200"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-#### Hard Limit (containerConcurrency)
-
-The hard limit has its global setting in the `config-defaults` config map and can also be specified per-revision. Its per-revision setting is not an annotation but is actually present on the revision's spec itself as `containerConcurrency`. Its default value is "0", which means an unlimited number of requests are allowed to flow into the replica. A value above "0" specifies the exact amount of requests allowed to the replica at a time.
-
-* **Global key:** `container-concurrency` (`config-defaults` config map)
-* **Per-revision spec key:** `containerConcurrency`
-* **Possible values:** integer
-* **Default:** `0`, which means unlimited
-
-**Example:**
-{{< tabs name="container-concurrency" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    spec:
-      containerConcurrency: 50
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-defaults
- namespace: knative-serving
-data:
- container-concurrency: "50"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    defaults:
-      container-concurrency: "50"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-#### Target Utilization
-
-In addition to the literal settings explained above, the concurrency values can further be adjusted via a target utilization value. This value specifies a percentage of the target to actually be targeted by the autoscaler. In effect, this specifies the “hotness” at which a replica runs, which causes the autoscaler to scale up before the total limit is reached.
-
-* **Global key:** `container-concurrency-target-percentage`
-* **Per-revision annotation key:** `autoscaling.knative.dev/targetUtilizationPercentage`
-* **Possible values:** float
-* **Default:** `70`
-
-**Note:** Target Utilization is only applied to the autoscaling as a suggestion. It is not applied to the hard limit enforcement itself. For example, if containerConcurrency is set to 10 and the utilization to "70" (percent), the autoscaler will create a new replica when the average number of concurrent requests across all replicas reaches "7". Note that requests numbered 7 through 10 will still be sent to the existing replicas, but this allows for additional replicas to be started in anticipation of it being needed when the containerConcurrency limit is reached.
-
-**Note:** If the activator is in the routing path, it will fully load all replicas up to `containerConcurrency`. It currently does not take target utilization into account.
-
-**Example:**
-{{< tabs name="target-utilization" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/targetUtilizationPercentage: "80"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- container-concurrency-target-percentage: "80"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      container-concurrency-target-percentage: "80"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-### Requests-per-second Target
-
-As the name suggests, this specifies a target requests-per-second per replica.
-
-* **Global key:** `requests-per-second-target-default`
-* **Per-revision annotation key:** `autoscaling.knative.dev/target`
-* **Possible values:** integer
-* **Default:** `200`
-
-**Example:**
-{{< tabs name="rps-target" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/target: "150"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- requests-per-second-target-default: "150"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      requests-per-second-target-default: "150"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-## Scale Bounds
-
-To apply upper and lower bounds to the scaling behavior, one can specify scale bounds in both directions.
-
-### Lower bound
-
-This value controls the minimum number of replicas each revision should have. Knative will attempt to never have less than this number of replicas at any one point in time.
-
-* **Global key:** n/a
-* **Per-revision annotation key:** `autoscaling.knative.dev/minScale`
-* **Possible values:** integer
-* **Default:** `0` if scale-to-zero is enabled and class KPA is used, `1` otherwise
-
-**Example:**
-{{< tabs name="min-scale" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/minScale: "3"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-### Upper bound
-
-This value controls the maximum number of replicas each revision should have. Knative will attempt to never have more than this number of replicas running, or in the process of being created, at any one point in time.
-
-* **Global key:** n/a
-* **Per-revision annotation key:** `autoscaling.knative.dev/maxScale`
-* **Possible values:** integer
-* **Default:** `0` which means unlimited
-
-**Example:**
-{{< tabs name="max-scale" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/maxScale: "3"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{< /tabs >}}
-
---
-
-# Knative Pod Autoscaler (KPA) Behavior
-
-The following settings are specific to the KPA.
-
-## Scale To Zero
-
-The scale-to-zero values control whether Knative allows revisions to scale down to zero, or stops at "1".
-
-### Enablement
-
-* **Global key:** `enable-scale-to-zero`
-* **Per-revision annotation key:** n/a
-* **Possible values:** boolean
-* **Default:** `true`
-
-**Note:** If this is set to `false`, the behavior of the lower Scale Bounds configuration changes as described above.
-
-**Example:**
-{{< tabs name="scale-to-zero" default="Global (ConfigMap)" >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- enable-scale-to-zero: "false"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      enable-scale-to-zero: "false"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-### Scale To Zero Grace Period
-
-This period is an upper bound amount of time the system waits internally for the scale-from-zero machinery to be in place before the last replica is actually removed. This is an implementation detail and does not adjust how long the last replica will be kept after traffic ends as it's not guaranteed that the will actually keep the replica for this time. This is a value that controls how long internal network programming is allowed to take and should only be adjusted if there have been issues with requests being dropped when a revision was scaling to zero.
-
-* **Global key:** `scale-to-zero-grace-period`
-* **Per-revision annotation key:** n/a
-* **Possible values:** Duration (must be at least 6s).
-* **Default:** `30s`
-
-**Example:**
-{{< tabs name="scale-to-zero-grace" default="Global (ConfigMap)" >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- scale-to-zero-grace-period: "40s"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      scale-to-zero-grace-period: "40s"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-
-### Scale To Zero Last Pod Retention Period
-
-The `scale-to-zero-pod-retention-period` flag determines the **minimum** amount of time that the last pod will remain active after the Autoscaler has decided to scale pods to zero.
-
-This contrasts with the `scale-to-zero-grace-period` flag, which determines the **maximum** amount of time that the last pod will remain active after the Autoscaler has decided to scale pods to zero.
-
-* **Global key:** `scale-to-zero-pod-retention-period`
-* **Per-revision annotation key:** `autoscaling.knative.dev/scaleToZeroPodRetentionPeriod`
-* **Possible values:** Non-negative duration string
-* **Default:** `0s`
-
-**Example:**
-{{< tabs name="scale-to-zero-grace" default="Global (ConfigMap)" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/scaleToZeroPodRetentionPeriod: "1m5s"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- scale-to-zero-pod-retention-period: "42s"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      scale-to-zero-pod-retention-period: "42s"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-## Modes: Stable and Panic
-
-The KPA acts on the respective metrics (concurrency or RPS) aggregated over time-based windows. These windows define the amount of historical data the autoscaler takes into account and are used to smooth the data over the specified amount of time. The shorter these windows are, the more quickly the autoscaler will react but the more hysterically it will react as well.
-
-The KPA's implementation has two modes: **stable** and **panic**. The stable mode is for general operation while the panic mode has, by default, a much shorter window and will be used to quickly scale a revision up if a burst of traffic comes in. As the panic window is a lot shorter, it will react more quickly to load. The revision will not scale down while in panic mode to avoid a lot of churn.
-
-### Stable Window
-
-* **Global key:** `stable-window`
-* **Per-revision annotation key:** `autoscaling.knative.dev/window`
-* **Possible values:** Duration, `6s` <= value <= `1h`
-* **Default:** `60s`
-
-**Note:** During scale-down, the final replica will only be removed after the revision hasn't seen any traffic for the entire duration of the stable window.\
-**Note:** The autoscaler will leave panic mode only after not seeing a reason to panic for the stable window's timeframe.
-
-**Example:**
-{{< tabs name="stable-window" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/window: "40s"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- stable-window: "40s"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      stable-window: "40s"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-### Panic Window
-
-The panic window is defined as a percentage of the stable window to assure they are both in a workable relation to each other. This value indicates how the window over which historical data is evaluated will be shrunk upon entering panic mode. In other words, a value of "10" means that in panic mode the window will be 10% of the stable window size.
-
-* **Global key:** `panic-window-percentage`
-* **Per-revision annotation key:** `autoscaling.knative.dev/panicWindowPercentage`
-* **Possible values:** float, `1.0` <= value <= `100.0`
-* **Default:** `10.0`
-
-**Example:**
-{{< tabs name="panic-window" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/panicWindowPercentage: "20.0"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- panic-window-percentage: "20.0"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      panic-window-percentage: "20.0"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-### Panic Mode Threshold
-
-This threshold defines when the autoscaler will move from stable mode into panic mode. It's a multiple of the traffic that the current amount of replicas can (or cannot) handle. 100 percent would mean that the autoscaler is always in panic mode, hence the minimum is somewhat higher than that. The default of 200 means that panic mode will be entered if traffic is twice as high as the current replica population can handle.
-
-* **Global key:** `panic-threshold-percentage`
-* **Per-revision annotation key:** `autoscaling.knative.dev/panicThresholdPercentage`
-* **Possible values:** float, `110.0` <= value <= `1000.0`
-* **Default:** `200.0`
-
-**Example:**
-{{< tabs name="panic-threshold" default="Per Revision" >}}
-{{% tab name="Per Revision" %}}
-```yaml
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: helloworld-go
-  namespace: default
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/panicThresholdPercentage: "150.0"
-    spec:
-      containers:
-        - image: gcr.io/knative-samples/helloworld-go
-```
-{{< /tab >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- panic-threshold-percentage: "150.0"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      panic-threshold-percentage: "150.0"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-## Scale Rates
-
-These settings control how much the revision's replica count can scale up or down in a single evaluation cycle. A change of one replica in each direction is always allowed; that is, the autoscaler can always scale up or down by at least one, regardless of these rates.
-
-### Scale Up Rate
-
-Maximum ratio of desired to observed pods. For example, with a value of `2.0`, the revision can scale from `N` to at most `2*N` pods in one evaluation cycle.
-
-* **Global key:** `max-scale-up-rate`
-* **Per-revision annotation key:** n/a
-* **Possible values:** float
-* **Default:** `1000.0`
-
-**Example:**
-{{< tabs name="scale-up-rate" default="Global (ConfigMap)" >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- max-scale-up-rate: "500.0"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      max-scale-up-rate: "500.0"
-```
-{{< /tab >}}
-{{< /tabs >}}
-
-### Scale Down Rate
-
-Maximum ratio of observed to desired pods. For example, with a value of `2.0`, the revision can scale from `N` down to no fewer than `N/2` pods in one evaluation cycle.
-
-* **Global key:** `max-scale-down-rate`
-* **Per-revision annotation key:** n/a
-* **Possible values:** float
-* **Default:** `2.0`
-
-**Example:**
-{{< tabs name="scale-down-rate" default="Global (ConfigMap)" >}}
-{{% tab name="Global (ConfigMap)" %}}
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: config-autoscaler
- namespace: knative-serving
-data:
- max-scale-down-rate: "4.0"
-```
-{{< /tab >}}
-{{% tab name="Global (Operator)" %}}
-```yaml
-apiVersion: operator.knative.dev/v1alpha1
-kind: KnativeServing
-metadata:
-  name: knative-serving
-spec:
-  config:
-    autoscaler:
-      max-scale-down-rate: "4.0"
-```
-{{< /tab >}}
-{{< /tabs >}}
--- a/docs/serving/samples/autoscale-go/index.md
+++ b/docs/serving/samples/autoscale-go/index.md
@ -1,8 +0,0 @@
---
-title: "Autoscale Sample App - Go"
-linkTitle: "Autoscaling - Go"
-weight: 1
-type: "docs"
---
-
-{{% readfile file="README.md" %}}