From 55f1a468d97fbf9387e577e901b3f290ed8aa15b Mon Sep 17 00:00:00 2001
From: Trevor Royer
Date: Fri, 16 May 2025 19:43:45 -0700
Subject: [PATCH] Move cli args docs to its own page (#18228) (#18264)

Signed-off-by: Trevor Royer
---
 docs/source/index.md                          |  1 +
 docs/source/serving/engine_args.md            |  2 +
 .../serving/openai_compatible_server.md       | 50 +------------------
 docs/source/serving/serve_args.md             | 47 +++++++++++++++++
 4 files changed, 51 insertions(+), 49 deletions(-)
 create mode 100644 docs/source/serving/serve_args.md

diff --git a/docs/source/index.md b/docs/source/index.md
index bbff7361f7..0470a43a95 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -117,6 +117,7 @@ training/rlhf.md
 
 serving/offline_inference
 serving/openai_compatible_server
+serving/serve_args
 serving/multimodal_inputs
 serving/distributed_serving
 serving/metrics
diff --git a/docs/source/serving/engine_args.md b/docs/source/serving/engine_args.md
index 97ea01cd3b..9325a2406e 100644
--- a/docs/source/serving/engine_args.md
+++ b/docs/source/serving/engine_args.md
@@ -7,6 +7,8 @@ Engine arguments control the behavior of the vLLM engine.
 
 - For [offline inference](#offline-inference), they are part of the arguments to `LLM` class.
 - For [online serving](#openai-compatible-server), they are part of the arguments to `vllm serve`.
 
+For a reference to all arguments available from `vllm serve`, see the [serve args](#serve-args) documentation.
+
 Below, you can find an explanation of every engine argument:
 
diff --git a/docs/source/serving/openai_compatible_server.md b/docs/source/serving/openai_compatible_server.md
index 07bd211c23..61f7e98bf1 100644
--- a/docs/source/serving/openai_compatible_server.md
+++ b/docs/source/serving/openai_compatible_server.md
@@ -4,7 +4,7 @@
 vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.
 
-In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`](#vllm-serve) command. (You can also use our [Docker](#deployment-docker) image.)
+In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`](#serve-args) command. (You can also use our [Docker](#deployment-docker) image.)
 
 ```bash
 vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
 ```
@@ -168,54 +168,6 @@ completion = client.completions.create(
 print(completion._request_id)
 ```
 
-## CLI Reference
-
-(vllm-serve)=
-
-### `vllm serve`
-
-The `vllm serve` command is used to launch the OpenAI-compatible server.
-
-:::{tip}
-The vast majority of command-line arguments are based on those for offline inference.
-
-See [here](configuration-options) for some common options.
-:::
-
-:::{argparse}
-:module: vllm.entrypoints.openai.cli_args
-:func: create_parser_for_docs
-:prog: vllm serve
-:::
-
-#### Configuration file
-
-You can load CLI arguments via a [YAML](https://yaml.org/) config file.
-The argument names must be the long form of those outlined [above](#vllm-serve).
-
-For example:
-
-```yaml
-# config.yaml
-
-model: meta-llama/Llama-3.1-8B-Instruct
-host: "127.0.0.1"
-port: 6379
-uvicorn-log-level: "info"
-```
-
-To use the above config file:
-
-```bash
-vllm serve --config config.yaml
-```
-
-:::{note}
-In case an argument is supplied simultaneously using command line and the config file, the value from the command line will take precedence.
-The order of priorities is `command line > config file values > defaults`.
-e.g. `vllm serve SOME_MODEL --config config.yaml`, SOME_MODEL takes precedence over `model` in config file.
-:::
-
 ## API Reference
 
 (completions-api)=
diff --git a/docs/source/serving/serve_args.md b/docs/source/serving/serve_args.md
new file mode 100644
index 0000000000..edb49f4ba6
--- /dev/null
+++ b/docs/source/serving/serve_args.md
@@ -0,0 +1,47 @@
+(serve-args)=
+
+# Server Arguments
+
+The `vllm serve` command is used to launch the OpenAI-compatible server.
+
+## CLI Arguments
+
+The following are all arguments available from the `vllm serve` command:
+
+
+```{eval-rst}
+.. argparse::
+    :module: vllm.entrypoints.openai.cli_args
+    :func: create_parser_for_docs
+    :prog: vllm serve
+    :nodefaultconst:
+    :markdownhelp:
+```
+
+## Configuration file
+
+You can load CLI arguments via a [YAML](https://yaml.org/) config file.
+The argument names must be the long form of those outlined [above](#serve-args).
+
+For example:
+
+```yaml
+# config.yaml
+
+model: meta-llama/Llama-3.1-8B-Instruct
+host: "127.0.0.1"
+port: 6379
+uvicorn-log-level: "info"
+```
+
+To use the above config file:
+
+```bash
+vllm serve --config config.yaml
+```
+
+:::{note}
+If an argument is supplied both on the command line and in the config file, the value from the command line takes precedence.
+The order of priority is `command line > config file values > defaults`.
+For example, in `vllm serve SOME_MODEL --config config.yaml`, `SOME_MODEL` takes precedence over the `model` entry in the config file.
+:::
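As a quick sketch of the precedence rule described in the note above (reusing the `config.yaml` from the example and assuming the standard `--port` option of `vllm serve`):

```bash
# config.yaml sets port: 6379, but the command-line flag wins
# (command line > config file values > defaults), so the server listens on 8000.
vllm serve --config config.yaml --port 8000
```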