mirror of https://github.com/vllm-project/vllm.git
Signed-off-by: Trevor Royer <troyer@redhat.com>
commit 55f1a468d9 (parent fd195b194e)
@@ -117,6 +117,7 @@ training/rlhf.md
 serving/offline_inference
 serving/openai_compatible_server
+serving/serve_args
 serving/multimodal_inputs
 serving/distributed_serving
 serving/metrics
@@ -7,6 +7,8 @@ Engine arguments control the behavior of the vLLM engine.
 - For [offline inference](#offline-inference), they are part of the arguments to `LLM` class.
 - For [online serving](#openai-compatible-server), they are part of the arguments to `vllm serve`.
+
+For references to all arguments available from `vllm serve`, see the [serve args](#serve-args) documentation.

 Below, you can find an explanation of every engine argument:

 <!--- pyml disable-num-lines 7 no-space-in-emphasis -->
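To ground the first bullet in the hunk above, here is a minimal sketch (not part of the diff) of passing engine arguments to the `LLM` class for offline inference. The model name and the specific argument values are illustrative only; the same options are also exposed as `vllm serve` flags (e.g. `--dtype auto`) for online serving.

```python
# Minimal sketch (illustrative values): engine arguments passed to the LLM
# class for offline inference. The same options exist as `vllm serve` flags.
from vllm import LLM, SamplingParams

llm = LLM(
    model="NousResearch/Meta-Llama-3-8B-Instruct",  # any supported model id
    dtype="auto",                 # engine argument, also `--dtype auto`
    max_model_len=4096,           # engine argument, also `--max-model-len`
    gpu_memory_utilization=0.90,  # engine argument, also `--gpu-memory-utilization`
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```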
@@ -4,7 +4,7 @@

 vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.

-In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`](#vllm-serve) command. (You can also use our [Docker](#deployment-docker) image.)
+In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`](#serve-args) command. (You can also use our [Docker](#deployment-docker) image.)

 ```bash
 vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
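As a companion to the `vllm serve` quick-start in the hunk above, here is a minimal sketch (not part of the diff) of querying the resulting OpenAI-compatible endpoint with the official `openai` Python client. It assumes the server is running locally on the default port 8000 and was started with `--api-key token-abc123`, serving the model shown in the example command.

```python
# Minimal sketch: query the OpenAI-compatible server started with `vllm serve`.
# Assumes the server listens at the default http://localhost:8000/v1 and was
# launched with --api-key token-abc123, as in the quick-start command above.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)

completion = client.chat.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)
```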
@@ -168,54 +168,6 @@ completion = client.completions.create(
 print(completion._request_id)
 ```

-## CLI Reference
-
-(vllm-serve)=
-
-### `vllm serve`
-
-The `vllm serve` command is used to launch the OpenAI-compatible server.
-
-:::{tip}
-The vast majority of command-line arguments are based on those for offline inference.
-
-See [here](configuration-options) for some common options.
-:::
-
-:::{argparse}
-:module: vllm.entrypoints.openai.cli_args
-:func: create_parser_for_docs
-:prog: vllm serve
-:::
-
-#### Configuration file
-
-You can load CLI arguments via a [YAML](https://yaml.org/) config file.
-The argument names must be the long form of those outlined [above](#vllm-serve).
-
-For example:
-
-```yaml
-# config.yaml
-
-model: meta-llama/Llama-3.1-8B-Instruct
-host: "127.0.0.1"
-port: 6379
-uvicorn-log-level: "info"
-```
-
-To use the above config file:
-
-```bash
-vllm serve --config config.yaml
-```
-
-:::{note}
-In case an argument is supplied simultaneously using command line and the config file, the value from the command line will take precedence.
-The order of priorities is `command line > config file values > defaults`.
-e.g. `vllm serve SOME_MODEL --config config.yaml`, SOME_MODEL takes precedence over `model` in config file.
-:::
-
 ## API Reference

 (completions-api)=
@@ -0,0 +1,47 @@
+(serve-args)=
+
+# Server Arguments
+
+The `vllm serve` command is used to launch the OpenAI-compatible server.
+
+## CLI Arguments
+
+The following are all arguments available from the `vllm serve` command:
+
+<!--- pyml disable-num-lines 7 no-space-in-emphasis -->
+```{eval-rst}
+.. argparse::
+    :module: vllm.entrypoints.openai.cli_args
+    :func: create_parser_for_docs
+    :prog: vllm serve
+    :nodefaultconst:
+    :markdownhelp:
+```
+
+## Configuration file
+
+You can load CLI arguments via a [YAML](https://yaml.org/) config file.
+The argument names must be the long form of those outlined [above](#serve-args).
+
+For example:
+
+```yaml
+# config.yaml
+
+model: meta-llama/Llama-3.1-8B-Instruct
+host: "127.0.0.1"
+port: 6379
+uvicorn-log-level: "info"
+```
+
+To use the above config file:
+
+```bash
+vllm serve --config config.yaml
+```
+
+:::{note}
+If an argument is supplied both on the command line and in the config file, the value from the command line takes precedence.
+The order of priorities is `command line > config file values > defaults`.
+For example, with `vllm serve SOME_MODEL --config config.yaml`, `SOME_MODEL` takes precedence over the `model` value in the config file.
+:::