
# LMCache Examples

This folder demonstrates how to use LMCache for disaggregated prefill, CPU offloading, and KV cache sharing.

## 1. Disaggregated Prefill in vLLM v1

This example demonstrates how to run LMCache with disaggregated prefill using NIXL on a single node.

### Prerequisites

- Install LMCache: `pip install lmcache`.
- Install NIXL.
- At least 2 GPUs.
- A valid Hugging Face token (`HF_TOKEN`) for Llama 3.1 8B Instruct.
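A possible setup sequence, assuming a pip-based environment (NIXL must be built and installed separately by following its own instructions):

```bash
# Install LMCache into the current environment
pip install lmcache

# NIXL is installed separately; see the NIXL project for build steps.

# Export a Hugging Face token that has access to Llama 3.1 8B Instruct
export HF_TOKEN=<your_huggingface_token>
```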

### Usage

Navigate to the `disagg_prefill_lmcache_v1` folder and run:

```bash
cd disagg_prefill_lmcache_v1
bash disagg_example_nixl.sh
```

This runs disaggregated prefill and benchmarks its performance.

### Components

#### Server Scripts

- `disagg_prefill_lmcache_v1/disagg_vllm_launcher.sh` - Launches the individual vLLM prefill/decode servers as well as the proxy server
- `disagg_prefill_lmcache_v1/disagg_proxy_server.py` - FastAPI proxy server that coordinates between the prefiller and the decoder
- `disagg_prefill_lmcache_v1/disagg_example_nixl.sh` - Main script that runs the example
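The coordination flow the proxy implements can be sketched as follows. This is an illustrative simplification with hypothetical names, not the actual `disagg_proxy_server.py`: a common pattern is to ask the prefiller for a single token, which forces it to compute and publish the prompt's KV cache, and then forward the full request to the decoder.

```python
from typing import Any, Callable, Dict

def proxy_request(
    request: Dict[str, Any],
    send_to_prefiller: Callable[[Dict[str, Any]], Any],
    send_to_decoder: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Coordinate one request across the prefill and decode servers."""
    # Ask the prefiller to emit a single token: this makes it compute the
    # prompt's KV cache (transferred to the decoder via NIXL) without
    # doing any meaningful decoding work itself.
    prefill_request = dict(request, max_tokens=1)
    send_to_prefiller(prefill_request)
    # Forward the untouched request to the decoder, which reuses the
    # prefiller's KV cache and generates the full completion.
    return send_to_decoder(request)
```

In the real example, the two callables correspond to HTTP calls against the prefiller and decoder vLLM servers, with the decoder's response streamed back to the client.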

#### Configuration

- `disagg_prefill_lmcache_v1/configs/lmcache-prefiller-config.yaml` - Configuration for the prefiller server
- `disagg_prefill_lmcache_v1/configs/lmcache-decoder-config.yaml` - Configuration for the decoder server

#### Log Files

The main script generates several log files:

- `prefiller.log` - Logs from the prefill server
- `decoder.log` - Logs from the decode server
- `proxy.log` - Logs from the proxy server

## 2. CPU Offload Examples

- `python cpu_offload_lmcache.py -v v0` - CPU offloading for vLLM v0
- `python cpu_offload_lmcache.py -v v1` - CPU offloading for vLLM v1
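The general pattern these examples follow is to configure LMCache through environment variables before constructing the vLLM engine. A minimal sketch of that configuration step (the variable names follow LMCache's documented settings; the engine construction is shown commented out because it requires vLLM and a GPU):

```python
import os

# Keep KV blocks that no longer fit in GPU memory in CPU RAM instead of
# recomputing them (values are strings because they are env vars).
os.environ["LMCACHE_LOCAL_CPU"] = "True"
# Upper bound for the CPU-side KV cache, in GiB.
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5.0"

# The engine is then created with an LMCache KV connector, roughly like
# this (requires vLLM and a GPU, so left commented out here):
# from vllm import LLM
# from vllm.config import KVTransferConfig
# llm = LLM(
#     model="meta-llama/Llama-3.1-8B-Instruct",
#     kv_transfer_config=KVTransferConfig(
#         kv_connector="LMCacheConnectorV1", kv_role="kv_both"
#     ),
# )
```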

## 3. KV Cache Sharing

The `kv_cache_sharing_lmcache_v1.py` example demonstrates how to share KV caches between vLLM v1 instances.
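Conceptually, sharing works because LMCache keys KV data by hashes of token chunks, so any instance that processes the same prefix can reuse KV produced by another instance. A toy sketch of that idea (the store and functions here are hypothetical stand-ins, not LMCache's API):

```python
import hashlib

# Stand-in for the shared LMCache backend that all instances point at.
shared_kv_store: dict[str, bytes] = {}

def chunk_key(tokens: tuple) -> str:
    # KV data is keyed by a hash of the token chunk, so identical
    # prefixes map to the same entry regardless of which instance
    # produced them.
    return hashlib.sha256(repr(tokens).encode()).hexdigest()

def prefill(tokens: tuple) -> bytes:
    key = chunk_key(tokens)
    if key in shared_kv_store:
        # Cache hit: another instance (or an earlier request) already
        # computed the KV for this chunk, so skip recomputation.
        return shared_kv_store[key]
    kv = b"kv-for-" + repr(tokens).encode()  # placeholder for KV tensors
    shared_kv_store[key] = kv
    return kv
```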

## 4. Disaggregated Prefill in vLLM v0

The `disaggregated_prefill_lmcache_v0.py` script shows how to run disaggregated prefill in vLLM v0.