Disaggregated Serving
This directory contains example scripts that demonstrate vLLM's disaggregated serving features, in which the prefill and decode phases run on separate instances.
Files
disagg_proxy_demo.py
- Demonstrates an XpYd setup: X prefill instances and Y decode instances behind a proxy.
kv_events.sh
- Demonstrates KV cache event publishing.
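
To make the XpYd idea concrete, here is a minimal sketch of how a proxy might pair each incoming request with one prefill instance and one decode instance. It is not the code in disagg_proxy_demo.py. The `XpYdRouter` class, the round-robin policy, and the localhost addresses are all assumptions chosen for illustration.

```python
from __future__ import annotations

from itertools import cycle


class XpYdRouter:
    """Hypothetical round-robin router over X prefill and Y decode instances."""

    def __init__(self, prefill_urls: list[str], decode_urls: list[str]) -> None:
        if not prefill_urls or not decode_urls:
            raise ValueError("need at least one prefill and one decode instance")
        # cycle() repeats each list forever, which gives round-robin selection.
        self._prefill = cycle(prefill_urls)
        self._decode = cycle(decode_urls)

    def route(self) -> tuple[str, str]:
        """Return the (prefill, decode) pair chosen for the next request."""
        return next(self._prefill), next(self._decode)


# 2p1d: two prefill instances, one decode instance (placeholder addresses).
router = XpYdRouter(
    prefill_urls=["http://localhost:8100", "http://localhost:8101"],
    decode_urls=["http://localhost:8200"],
)
pairs = [router.route() for _ in range(3)]
for prefill, decode in pairs:
    print(f"prefill={prefill} decode={decode}")
```

Round-robin is only one possible policy. A real proxy could instead pick instances by load or cache locality, and it would also need to forward the request and hand the KV cache from the prefill instance to the decode instance, which this sketch leaves out.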