vllm/examples/offline_inference/disaggregated-prefill-v1

Disaggregated Prefill V1

The scripts in this example demonstrate disaggregated prefill using vLLM's offline inference.

Files

  • run.sh - A helper script that runs prefill_example.py and then decode_example.py, in that order.
  • prefill_example.py - Performs prefill only. It saves the KV state to the local_storage directory and the prompts to output.txt.
  • decode_example.py - Performs decode only. It loads the KV state from the local_storage directory and the prompts from output.txt.
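
The handoff between the two scripts can be sketched roughly as below. This is a minimal illustration, not the contents of the actual scripts. The helper names (`save_prompts`, `load_prompts`, `make_llm`, `run_prefill`, `run_decode`) are hypothetical, as is the one-prompt-per-line layout of output.txt. The connector name `SharedStorageConnector`, the role `kv_both`, and the `shared_storage_path` key are assumptions, so check them against the installed vLLM version. The model name is left as a parameter.

```python
def save_prompts(prompts, path="output.txt"):
    """Prefill side: write one prompt per line (assumes single-line prompts)."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            f.write(prompt + "\n")


def load_prompts(path="output.txt"):
    """Decode side: read the prompts back in the same order."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def make_llm(model, storage_dir="local_storage"):
    """Build an engine whose KV cache is shared through a directory on disk."""
    # Imported lazily so the file helpers above work without vLLM installed.
    from vllm import LLM
    from vllm.config import KVTransferConfig

    # Assumption: the connector name, role, and extra-config key are
    # illustrative values, not confirmed by this README.
    kv_config = KVTransferConfig(
        kv_connector="SharedStorageConnector",
        kv_role="kv_both",
        kv_connector_extra_config={"shared_storage_path": storage_dir},
    )
    return LLM(model=model, kv_transfer_config=kv_config)


def run_prefill(llm, prompts, path="output.txt"):
    """Process the prompts once so their KV state is saved, then record them."""
    from vllm import SamplingParams

    # max_tokens=1 keeps generation minimal; the work done is mostly prefill.
    llm.generate(prompts, SamplingParams(temperature=0, max_tokens=1))
    save_prompts(prompts, path)


def run_decode(llm, path="output.txt", max_tokens=10):
    """Reload the recorded prompts and generate, reusing the stored KV state."""
    from vllm import SamplingParams

    params = SamplingParams(temperature=0, max_tokens=max_tokens)
    outputs = llm.generate(load_prompts(path), params)
    return [out.outputs[0].text for out in outputs]
```

In this sketch the two stages would run as separate processes, one calling `run_prefill` and the next calling `run_decode`, which mirrors the sequential order that run.sh uses. Both stages must use the same storage directory and the same model for the saved KV state to be reusable.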