Move cli args docs to its own page (#18228) (#18264)

Signed-off-by: Trevor Royer <troyer@redhat.com>
Trevor Royer 2025-05-16 19:43:45 -07:00 committed by GitHub
parent fd195b194e
commit 55f1a468d9
4 changed files with 51 additions and 49 deletions


@@ -117,6 +117,7 @@ training/rlhf.md
serving/offline_inference
serving/openai_compatible_server
serving/serve_args
serving/multimodal_inputs
serving/distributed_serving
serving/metrics


@@ -7,6 +7,8 @@ Engine arguments control the behavior of the vLLM engine.
- For [offline inference](#offline-inference), they are part of the arguments to `LLM` class.
- For [online serving](#openai-compatible-server), they are part of the arguments to `vllm serve`.
For a reference of all arguments available from `vllm serve`, see the [serve args](#serve-args) documentation.
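As an illustration, the same engine argument can be passed as a keyword argument to `LLM` for offline inference or as a `--kebab-case` flag to `vllm serve` for online serving. A minimal sketch (the model name and values are illustrative, not recommendations):
```python
from vllm import LLM

# Engine arguments such as max_model_len and gpu_memory_utilization are
# passed as keyword arguments to the LLM class for offline inference.
# The model name and values below are placeholders.
llm = LLM(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    max_model_len=4096,
    gpu_memory_utilization=0.90,
)

outputs = llm.generate(["Hello, my name is"])
print(outputs[0].outputs[0].text)
```
The online-serving equivalent would pass the same values as flags, e.g. `vllm serve NousResearch/Meta-Llama-3-8B-Instruct --max-model-len 4096 --gpu-memory-utilization 0.90`.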
Below, you can find an explanation of every engine argument:
<!--- pyml disable-num-lines 7 no-space-in-emphasis -->


@@ -4,7 +4,7 @@
vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.
In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`](#vllm-serve) command. (You can also use our [Docker](#deployment-docker) image.)
In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`](#serve-args) command. (You can also use our [Docker](#deployment-docker) image.)
```bash
vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
@@ -168,54 +168,6 @@ completion = client.completions.create(
print(completion._request_id)
```
## CLI Reference
(vllm-serve)=
### `vllm serve`
The `vllm serve` command is used to launch the OpenAI-compatible server.
:::{tip}
The vast majority of command-line arguments are based on those for offline inference.
See [here](configuration-options) for some common options.
:::
:::{argparse}
:module: vllm.entrypoints.openai.cli_args
:func: create_parser_for_docs
:prog: vllm serve
:::
#### Configuration file
You can load CLI arguments via a [YAML](https://yaml.org/) config file.
The argument names must be the long form of those outlined [above](#vllm-serve).
For example:
```yaml
# config.yaml
model: meta-llama/Llama-3.1-8B-Instruct
host: "127.0.0.1"
port: 6379
uvicorn-log-level: "info"
```
To use the above config file:
```bash
vllm serve --config config.yaml
```
:::{note}
If an argument is supplied both on the command line and in the config file, the value from the command line takes precedence.
The order of priority is `command line > config file values > defaults`.
For example, in `vllm serve SOME_MODEL --config config.yaml`, `SOME_MODEL` takes precedence over `model` in the config file.
:::
## API Reference
(completions-api)=


@@ -0,0 +1,47 @@
(serve-args)=
# Server Arguments
The `vllm serve` command is used to launch the OpenAI-compatible server.
## CLI Arguments
The following are all arguments available from the `vllm serve` command:
<!--- pyml disable-num-lines 7 no-space-in-emphasis -->
```{eval-rst}
.. argparse::
:module: vllm.entrypoints.openai.cli_args
:func: create_parser_for_docs
:prog: vllm serve
:nodefaultconst:
:markdownhelp:
```
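For example, a typical invocation combines a positional model argument with a handful of these flags. A hedged sketch (the model name, bind address, API key, and values are illustrative, not recommendations):
```bash
# Serve a model on an explicit host/port and require an API key for requests.
# The model name, address, and token below are placeholders.
vllm serve NousResearch/Meta-Llama-3-8B-Instruct \
    --host 0.0.0.0 \
    --port 8000 \
    --api-key token-abc123 \
    --max-model-len 4096
```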
## Configuration file
You can load CLI arguments via a [YAML](https://yaml.org/) config file.
The argument names must be the long form of those outlined [above](#serve-args).
For example:
```yaml
# config.yaml
model: meta-llama/Llama-3.1-8B-Instruct
host: "127.0.0.1"
port: 6379
uvicorn-log-level: "info"
```
To use the above config file:
```bash
vllm serve --config config.yaml
```
:::{note}
If an argument is supplied both on the command line and in the config file, the value from the command line takes precedence.
The order of priority is `command line > config file values > defaults`.
For example, in `vllm serve SOME_MODEL --config config.yaml`, `SOME_MODEL` takes precedence over `model` in the config file.
:::
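For instance, combining the example `config.yaml` above with an explicit flag, a small sketch of how this precedence plays out (the port values are illustrative):
```bash
# config.yaml sets port: 6379, but the command-line flag takes precedence,
# so the server listens on port 8000 instead.
vllm serve --config config.yaml --port 8000
```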