Merge pull request #2 from vllm-project/bold

Bold
This commit is contained in:
Zhuohan Li 2023-11-14 14:32:29 -08:00 committed by GitHub
commit 73685a63a9
1 changed file with 3 additions and 3 deletions

@@ -23,8 +23,8 @@ For the majority of workloads, vLLM is faster than (or performs comparably to) D
 We've identified two key differences between vLLM and DeepSpeed in terms of performance optimization:
-1. DeepSpeed adopts a conservative/suboptimal memory allocation scheme, which wastes memory when output lengths are large.
-2. DeepSpeed's Dynamic SplitFuse scheduling gives speedup only when prompt lengths are much greater than output lengths.
+1. **DeepSpeed adopts a conservative/suboptimal memory allocation scheme**, which wastes memory when output lengths are large.
+2. DeepSpeed's Dynamic SplitFuse scheduling gives **speedup only when prompt lengths are much greater than output lengths**.
 As a result, DeepSpeed outperforms when the workload is consistently long prompt and short output.
 In other scenarios, vLLM shows superior performance.
@@ -40,7 +40,7 @@ However, the performance gain we observe isn't as significant as 2x.
 </p>
 #### Scenario 2: Other cases
-In these cases, vLLM is up to 1.8x faster than DeepSpeed.
+In these cases, vLLM is up to **1.8x** faster than DeepSpeed.
 <p align="center">
 <picture>