# Structured Outputs

This script demonstrates various structured output capabilities of vLLM's OpenAI-compatible server. It can run individual constraint types or all of them, and it supports both streaming responses and concurrent non-streaming requests.
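
Each constraint in the script boils down to a single request against the OpenAI-compatible API. As a minimal sketch (the model name, prompt, and `guided_choice` values here are illustrative, not taken from the script), a choice-constrained request looks like this:

```python
from openai import OpenAI

# Point the client at the vLLM server (adjust base_url if it runs remotely).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# vLLM accepts guided-decoding parameters through `extra_body`; here the
# output is constrained to one of two fixed choices.
completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[{"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}],
    extra_body={"guided_choice": ["positive", "negative"]},
)
print(completion.choices[0].message.content)
```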

To use this example, start a vLLM server with any model of your choice:

```bash
vllm serve Qwen/Qwen2.5-3B-Instruct
```

To serve a reasoning model, you can use the following command:

```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --reasoning-parser deepseek_r1
```

To run this script standalone with uv, use the following:

```bash
uvx --from git+https://github.com/vllm-project/vllm#subdirectory=examples/online_serving/structured_outputs structured-output
```

See the [structured outputs documentation](https://docs.vllm.ai/en/latest/features/structured_outputs.html) for more information.

!!! tip

    If vLLM is running remotely, set `OPENAI_BASE_URL=<remote_url>` before running the script.
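
    For example, with a placeholder remote host:

    ```bash
    export OPENAI_BASE_URL=http://<remote-host>:8000/v1
    uv run structured_outputs.py
    ```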

## Usage

Run all constraints, non-streaming:

```bash
uv run structured_outputs.py
```

Run all constraints, streaming:

```bash
uv run structured_outputs.py --stream
```

Run specific constraints (for example, `structural_tag` and `regex`), streaming:

```bash
uv run structured_outputs.py --constraint structural_tag regex --stream
```

Run all constraints, with reasoning models and streaming:

```bash
uv run structured_outputs.py --reasoning --stream
```