From 55f1a468d97fbf9387e577e901b3f290ed8aa15b Mon Sep 17 00:00:00 2001
From: Trevor Royer
Date: Fri, 16 May 2025 19:43:45 -0700
Subject: [PATCH] Move cli args docs to its own page (#18228) (#18264)

Signed-off-by: Trevor Royer
---
 docs/source/index.md                          |  1 +
 docs/source/serving/engine_args.md            |  2 +
 .../serving/openai_compatible_server.md       | 50 +------------------
 docs/source/serving/serve_args.md             | 47 +++++++++++++++++
 4 files changed, 51 insertions(+), 49 deletions(-)
 create mode 100644 docs/source/serving/serve_args.md

diff --git a/docs/source/index.md b/docs/source/index.md
index bbff7361f7..0470a43a95 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -117,6 +117,7 @@ training/rlhf.md
 
 serving/offline_inference
 serving/openai_compatible_server
+serving/serve_args
 serving/multimodal_inputs
 serving/distributed_serving
 serving/metrics
diff --git a/docs/source/serving/engine_args.md b/docs/source/serving/engine_args.md
index 97ea01cd3b..9325a2406e 100644
--- a/docs/source/serving/engine_args.md
+++ b/docs/source/serving/engine_args.md
@@ -7,6 +7,8 @@ Engine arguments control the behavior of the vLLM engine.
 
 - For [offline inference](#offline-inference), they are part of the arguments to `LLM` class.
 - For [online serving](#openai-compatible-server), they are part of the arguments to `vllm serve`.
 
+For a reference to all arguments available from `vllm serve`, see the [serve args](#serve-args) documentation.
+
 Below, you can find an explanation of every engine argument:
 
diff --git a/docs/source/serving/openai_compatible_server.md b/docs/source/serving/openai_compatible_server.md
index 07bd211c23..61f7e98bf1 100644
--- a/docs/source/serving/openai_compatible_server.md
+++ b/docs/source/serving/openai_compatible_server.md
@@ -4,7 +4,7 @@
 vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.
 
-In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`](#vllm-serve) command. (You can also use our [Docker](#deployment-docker) image.)
+In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`](#serve-args) command. (You can also use our [Docker](#deployment-docker) image.)
 
 ```bash
 vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
 ```
@@ -168,54 +168,6 @@ completion = client.completions.create(
 print(completion._request_id)
 ```
 
-## CLI Reference
-
-(vllm-serve)=
-
-### `vllm serve`
-
-The `vllm serve` command is used to launch the OpenAI-compatible server.
-
-:::{tip}
-The vast majority of command-line arguments are based on those for offline inference.
-
-See [here](configuration-options) for some common options.
-:::
-
-:::{argparse}
-:module: vllm.entrypoints.openai.cli_args
-:func: create_parser_for_docs
-:prog: vllm serve
-:::
-
-#### Configuration file
-
-You can load CLI arguments via a [YAML](https://yaml.org/) config file.
-The argument names must be the long form of those outlined [above](#vllm-serve).
-
-For example:
-
-```yaml
-# config.yaml
-
-model: meta-llama/Llama-3.1-8B-Instruct
-host: "127.0.0.1"
-port: 6379
-uvicorn-log-level: "info"
-```
-
-To use the above config file:
-
-```bash
-vllm serve --config config.yaml
-```
-
-:::{note}
-In case an argument is supplied simultaneously using command line and the config file, the value from the command line will take precedence.
-The order of priorities is `command line > config file values > defaults`.
-e.g. `vllm serve SOME_MODEL --config config.yaml`, SOME_MODEL takes precedence over `model` in config file.
-:::
-
 ## API Reference
 
 (completions-api)=
diff --git a/docs/source/serving/serve_args.md b/docs/source/serving/serve_args.md
new file mode 100644
index 0000000000..edb49f4ba6
--- /dev/null
+++ b/docs/source/serving/serve_args.md
@@ -0,0 +1,47 @@
+(serve-args)=
+
+# Server Arguments
+
+The `vllm serve` command is used to launch the OpenAI-compatible server.
+
+## CLI Arguments
+
+The following are all arguments available from the `vllm serve` command:
+
+
+```{eval-rst}
+.. argparse::
+    :module: vllm.entrypoints.openai.cli_args
+    :func: create_parser_for_docs
+    :prog: vllm serve
+    :nodefaultconst:
+    :markdownhelp:
+```
+
+## Configuration file
+
+You can load CLI arguments via a [YAML](https://yaml.org/) config file.
+The argument names must be the long form of those outlined [above](#serve-args).
+
+For example:
+
+```yaml
+# config.yaml
+
+model: meta-llama/Llama-3.1-8B-Instruct
+host: "127.0.0.1"
+port: 6379
+uvicorn-log-level: "info"
+```
+
+To use the above config file:
+
+```bash
+vllm serve --config config.yaml
+```
+
+:::{note}
+If an argument is supplied both on the command line and in the config file, the value from the command line takes precedence.
+The order of priority is `command line > config file values > defaults`.
+For example, in `vllm serve SOME_MODEL --config config.yaml`, `SOME_MODEL` takes precedence over the `model` entry in the config file.
+:::
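As a quick sketch of the precedence rule described in the note above (reusing the `config.yaml` from the example and assuming the standard `--port` option of `vllm serve`):

```bash
# config.yaml sets port: 6379, but the command-line flag wins
# (command line > config file values > defaults), so the server listens on 8000.
vllm serve --config config.yaml --port 8000
```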