Signed-off-by: Aaron Pham <contact@aarnphm.xyz> |
||
---|---|---|
.. | ||
README.md | ||
pyproject.toml | ||
structured_outputs.py |
README.md
Structured Outputs
This script demonstrates various structured output capabilities of vLLM's OpenAI-compatible server. It can run individual constraint type or all of them. It supports both streaming responses and concurrent non-streaming requests.
To use this example, you must start an vLLM server with any model of your choice.
vllm serve Qwen/Qwen2.5-3B-Instruct
To serve a reasoning model, you can use the following command:
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --reasoning-parser deepseek_r1
If you want to run this script standalone with uv
, you can use the following:
uvx --from git+https://github.com/vllm-project/vllm#subdirectory=examples/online_serving/structured_outputs structured-output
See feature docs for more information.
!!! tip
If vLLM is running remotely, then set OPENAI_BASE_URL=<remote_url>
before running the script.
Usage
Run all constraints, non-streaming:
uv run structured_outputs.py
Run all constraints, streaming:
uv run structured_outputs.py --stream
Run certain constraints, for example structural_tag
and regex
, streaming:
uv run structured_outputs.py --constraint structural_tag regex --stream
Run all constraints, with reasoning models and streaming:
uv run structured_outputs.py --reasoning --stream