Minor
Signed-off-by: WoosukKwon <woosuk.kwon@berkeley.edu>
parent f90c18079b
commit e83a5be9ff

@@ -112,7 +112,7 @@ The final piece of the puzzle for vLLM V1 was integrating [FlashAttention 3](htt
 # Performance
 
-Thanks to the extensive architectural enhancements, vLLM V1 achieves state-of-the-art throughput and latency, delivering up to **1.7x** higher throughput compared to V0 (*without multi-step scheduling*).
+Thanks to the extensive architectural enhancements, vLLM V1 achieves state-of-the-art throughput and latency, delivering up to **1.7x higher throughput** compared to V0 (*without multi-step scheduling*).
 These dramatic performance gains stem from comprehensive CPU overhead reductions across the entire stack.
 The improvements are even more pronounced for vision-language models (VLMs) like Qwen2-VL, thanks to V1's enhanced support for VLMs.

@@ -185,7 +185,7 @@ The V1 re-architecture is a continued joint effort across the entire vLLM team a
 - [Tyler Michael Smith](https://github.com/tlrmchlsmth) implemented the tensor parallelism support with Python multiprocessing.
 - [Rui Qiao](https://github.com/ruisearch42) implemented the tensor parallelism support with Ray and is implementing pipeline parallelism support.
 - [Lucas Wilkinson](https://github.com/LucasWilkinson) added support for FlashAttention 3.
-- [Alexander Matveev](https://github.com/alexm-redhat) implemented the optimized preprocessor for multimodal inputs.
+- [Alexander Matveev](https://github.com/alexm-redhat) implemented the optimized preprocessor for multimodal inputs and is implementing TPU support.
 - [Sourashis Roy](https://github.com/sroy745) implemented the logit penalties in the sampler.
 - [Cyrus Leung](https://github.com/DarkLight1337) led the MLLM input processing refactoring effort and helped its integration to V1.
 - [Russell Bryant](https://github.com/russellb) addressed several multiprocess-related issues.
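
As context for the throughput claim edited above, here is a minimal sketch of how one might measure offline generation throughput with the V1 engine. It assumes the `VLLM_USE_V1=1` opt-in environment variable from the V1 alpha announcement and the standard `vllm.LLM` offline API; the model name, prompts, and sampling parameters are illustrative and are not the configuration behind the published 1.7x figure.

```python
# Sketch: offline throughput measurement with the V1 engine (illustrative only).
import os
import time

os.environ["VLLM_USE_V1"] = "1"  # opt into the V1 engine before constructing the LLM

from vllm import LLM, SamplingParams

# Placeholder workload: swap in your own model and prompt set.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
prompts = ["Explain KV caching in one paragraph."] * 256
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests and report tokens per second.
generated = sum(len(out.outputs[0].token_ids) for out in outputs)
print(f"{generated / elapsed:.1f} output tokens/s over {elapsed:.1f}s")
```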