diff --git a/config/nav.yml b/config/nav.yml
index 1dbd9f913..aa52e2c9a 100644
--- a/config/nav.yml
+++ b/config/nav.yml
@@ -95,7 +95,10 @@ nav:
   # Serving
   ###############################################################################
   - Serving:
-    - Knative Serving overview: serving/README.md
+    - Knative Serving:
+      - Overview: serving/README.md
+      - Architecture: serving/architecture.md
+      - Request Flow: serving/request-flow.md
     - Resources:
       - Revisions:
         - About Revisions: serving/revisions/README.md
@@ -173,7 +176,6 @@ nav:
         - Debugging application issues: serving/troubleshooting/debugging-application-issues.md
   # Serving reference docs
   - Reference:
-    - Request Flow: serving/reference/request-flow.md
     - Serving API: serving/reference/serving-api.md
   ###############################################################################
   # Eventing
diff --git a/docs/install/README.md b/docs/install/README.md
index c4addafb8..8f051e86f 100644
--- a/docs/install/README.md
+++ b/docs/install/README.md
@@ -1,5 +1,8 @@
 # Installing Knative
 
+!!! note
+    Please also take a look at the [Serving Architecture](../serving/architecture.md), which explains the Knative Serving components and the general networking concepts.
+
 You can install the Serving component, Eventing component, or both on your cluster by using one of the following deployment options:
diff --git a/docs/serving/architecture.md b/docs/serving/architecture.md
new file mode 100644
index 000000000..71f3bd7a8
--- /dev/null
+++ b/docs/serving/architecture.md
@@ -0,0 +1,51 @@
+# Knative Serving Architecture
+
+Knative Serving consists of several components that form the backbone of the serverless platform.
+This page explains the high-level architecture of Knative Serving. Please also refer to [the Knative Serving Overview](./README.md)
+and [the Request Flow](./request-flow.md) for additional information.
+
+## Diagram
+
+![Knative Serving Architecture](images/serving-architecture.png)
+
+## Components
+
+| Component   | Responsibilities |
+|-------------|------------------|
+| Activator   | The activator is part of the **data-plane**. It is responsible for queuing incoming requests (if a `Knative Service` is scaled to zero). It communicates with the autoscaler to bring scaled-to-zero Services back up and then forwards the queued requests to them. The activator can also act as a request buffer to handle traffic bursts. Additional details can be found [here](https://github.com/knative/serving/blob/main/docs/scaling/SYSTEM.md). |
+| Autoscaler  | The autoscaler is responsible for scaling Knative Services based on configuration, metrics, and incoming requests. |
+| Controller  | The controller manages the state of Knative resources within the cluster. It watches several objects, manages the lifecycle of dependent resources, and updates the resource state. |
+| Queue-Proxy | The Queue-Proxy is a sidecar container in the Knative Service's Pod. It is responsible for collecting metrics and enforcing the desired concurrency when forwarding requests to the user's container. It can also act as a queue if necessary, similar to the Activator. |
+| Webhooks    | Knative Serving has several webhooks that are responsible for validating and mutating Knative Resources. |
+
+## Networking Layer and Ingress
+
+!!! note
+    `Ingress` in this case does not refer to the [Kubernetes Ingress Resource](https://kubernetes.io/docs/concepts/services-networking/ingress/). It refers to the concept of exposing external access to a resource on the cluster.
+
+Knative Serving depends on a `Networking Layer` that fulfills the [Knative Networking Specification](https://github.com/knative/networking).
+For this, Knative Serving defines an internal `KIngress` resource, which acts as an abstraction for multiple pluggable networking layers. Currently, three networking layers are available and supported by the community:
+
+* [net-kourier](https://github.com/knative-sandbox/net-kourier)
+* [net-contour](https://github.com/knative-sandbox/net-contour)
+* [net-istio](https://github.com/knative-sandbox/net-istio)
+
+
+## Traffic flow and DNS
+
+!!! note
+    There are slight differences between the networking layers; the following section describes the general concept. Also, there are multiple ways to expose your `Ingress Gateway` and configure DNS. Please refer to the installation documentation for more information.
+
+![Knative Serving Architecture Ingress](images/serving-architecture-ingress.png)
+
+* Each networking layer has a controller that is responsible for watching the `KIngress` resources and configuring the `Ingress Gateway` accordingly. It also reports back `status` information through this resource.
+* The `Ingress Gateway` is used to route requests to the `activator` or directly to a Knative Service Pod, depending on the mode (proxy/serve, see [here](https://github.com/knative/serving/blob/main/docs/scaling/SYSTEM.md) for more details). The `Ingress Gateway` handles requests from both inside and outside the cluster.
+* For the `Ingress Gateway` to be reachable outside the cluster, it must be [exposed](https://kubernetes.io/docs/tutorials/kubernetes-basics/expose/expose-intro/) using a Kubernetes Service of `type: LoadBalancer` or `type: NodePort`. The community-supported networking layers include this as part of their installation. [DNS](../install/yaml-install/serving/install-serving-with-yaml.md#configure-dns) is then configured to point to the IP address or hostname of the `Ingress Gateway`.
+
+!!! note
+    If you configure DNS, you should also configure the [same domain](./using-a-custom-domain.md) for Knative (see the sketch at the end of this page).
+
+
+## Autoscaling
+
+You can find more detailed information on our autoscaling mechanism [here](https://github.com/knative/serving/tree/main/docs/scaling).
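To make the component responsibilities and the autoscaling settings described above more concrete, here is a minimal sketch of a `Knative Service` manifest: the autoscaler acts on the `autoscaling.knative.dev/*` annotations, and the queue-proxy enforces `containerConcurrency`. The service name, image, and all values below are illustrative placeholders, not part of this page.

```yaml
# Minimal sketch of a Knative Service; the name, image, and values are placeholders.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                          # placeholder name
spec:
  template:
    metadata:
      annotations:
        # Soft target: desired concurrent requests per replica, used by the autoscaler.
        autoscaling.knative.dev/target: "10"
        # A minimum of zero replicas keeps scale-to-zero enabled.
        autoscaling.knative.dev/min-scale: "0"
    spec:
      # Hard limit on concurrent requests per replica, enforced by the queue-proxy.
      containerConcurrency: 50
      containers:
        - image: ghcr.io/example/hello # placeholder image
```

Applying a manifest like this is what causes the controller to create a new Revision and the networking layer to program the `Ingress Gateway` for it, as described in the sections above.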
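For the `Traffic flow and DNS` section, a minimal sketch of the corresponding domain configuration could look as follows. It assumes that a wildcard DNS record (for example `*.example.com`) already points to the `Ingress Gateway`; `example.com` is only a placeholder domain.

```yaml
# Minimal sketch of the config-domain ConfigMap; example.com is a placeholder.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  # Use example.com as the default domain, so a Knative Service is exposed
  # as <service-name>.<namespace>.example.com through the Ingress Gateway.
  example.com: ""
```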
diff --git a/docs/serving/reference/request-flow.png b/docs/serving/images/request-flow.png
similarity index 100%
rename from docs/serving/reference/request-flow.png
rename to docs/serving/images/request-flow.png
diff --git a/docs/serving/images/serving-architecture-ingress.png b/docs/serving/images/serving-architecture-ingress.png
new file mode 100644
index 000000000..5aa4a1120
Binary files /dev/null and b/docs/serving/images/serving-architecture-ingress.png differ
diff --git a/docs/serving/images/serving-architecture.png b/docs/serving/images/serving-architecture.png
new file mode 100644
index 000000000..984c3cbe1
Binary files /dev/null and b/docs/serving/images/serving-architecture.png differ
diff --git a/docs/serving/reference/request-flow.md b/docs/serving/request-flow.md
similarity index 91%
rename from docs/serving/reference/request-flow.md
rename to docs/serving/request-flow.md
index 36495d95c..d1ce3e0f7 100644
--- a/docs/serving/reference/request-flow.md
+++ b/docs/serving/request-flow.md
@@ -1,12 +1,13 @@
 # HTTP Request Flows
 
-While [the overview](/docs/serving) describes the logical components of Knative
+While [the overview](/docs/serving) describes the logical components and
+[the architecture](./architecture.md) describes the overall architecture of Knative
 Serving, this page explains the behavior and flow of HTTP requests to an application which is running on Knative Serving. The following diagram shows the different request flows and control plane loops for Knative Serving. Note that some components, such as the autoscaler and the apiserver are not updated on every request, but instead measure the system periodically (this is referred to as the control plane).
 
-![Diagram of Knative request flow through HTTP router to optional Activator, then queue-proxy and user container](./request-flow.png)
+![Diagram of Knative request flow through HTTP router to optional Activator, then queue-proxy and user container](images/request-flow.png)
 
 The HTTP router, activator, and autoscaler are all shared cluster-level
@@ -19,7 +20,7 @@
 pluggable ingress layer), and are recorded on the request in an internal header. Once a request has been assigned to a Revision, the subsequent routing depends on the measured traffic flow; at low or zero traffic, incoming requests are routed to the activator, while at high traffic levels ([spare capacity greater
-than `target-burst-capacity`](../../load-balancing/target-burst-capacity))
+than `target-burst-capacity`](./load-balancing/target-burst-capacity.md))
 traffic is routed directly to the application pods.
 
 ## Scale From Zero
@@ -33,7 +34,7 @@
 additional capacity is needed. When the autoscaler detects that the available capacity for a Revision is below the requested capacity, it [increases the number of pods requested from
-Kubernetes](../../autoscaling/autoscale-go#algorithm).
+Kubernetes](./autoscaling/autoscale-go/README.md#algorithm).
 
 When these new pods become ready or an existing pod has capacity, the activator will forward the delayed request to a ready pod. If a new pod needs to be
@@ -42,7 +43,7 @@
 started to handle a request, this is called a _cold-start_.
 
 ## High scale
 
 When a Revision has a high amount of traffic ([the spare capacity is greater
-than `target-burst-capacity`](../../load-balancing/target-burst-capacity)), the
+than `target-burst-capacity`](./load-balancing/target-burst-capacity.md)), the
 ingress router is programmed directly with the pod adresses of the Revision, and the activator is removed from the traffic flow.
This reduces latency and increases efficiency when the additional buffering of the activator is not @@ -78,7 +79,7 @@ reliability and scaling of Knative: activator is removed from the request path. * Implements the [`containerConcurrency` hard limit on request - concurrency](https://knative.dev/docs/serving/autoscaling/concurrency/#hard-limit) + concurrency](./autoscaling/concurrency.md#hard-limit) if requested. * Handles graceful shutdown on Pod termination (refuse new requests, fail