update lfai

simon-mo 2024-07-25 14:56:21 -07:00
parent 33d16cb301
commit 9227cfd6d5
2 changed files with 9 additions and 2 deletions

Gemfile.lock

@@ -18,8 +18,12 @@ GEM
eventmachine (>= 0.12.9)
http_parser.rb (~> 0)
eventmachine (1.2.7)
ffi (1.17.0-arm64-darwin)
ffi (1.17.0-x86_64-darwin)
forwardable-extended (2.6.0)
google-protobuf (4.27.2-arm64-darwin)
bigdecimal
rake (>= 13)
google-protobuf (4.27.2-x86_64-darwin)
bigdecimal
rake (>= 13)
@@ -70,6 +74,8 @@ GEM
strscan
rouge (4.3.0)
safe_yaml (1.0.5)
sass-embedded (1.77.8-arm64-darwin)
google-protobuf (~> 4.26)
sass-embedded (1.77.8-x86_64-darwin)
google-protobuf (~> 4.26)
strscan (3.1.0)
@@ -79,6 +85,7 @@ GEM
webrick (1.8.1)
PLATFORMS
arm64-darwin-23
x86_64-darwin-22
x86_64-darwin-23

Blog post (Markdown)

@@ -33,7 +33,7 @@ In our objective for performance optimization, we have made the following progre
* Publication of benchmarks
-* Published per-commit performance tracker at [perf.vllm.ai](perf.vllm.ai) on our public benchmarks. The goal of this is to track performance enhancement and regressions.
+* Published per-commit performance tracker at [perf.vllm.ai](https://perf.vllm.ai) on our public benchmarks. The goal of this is to track performance enhancement and regressions.
* Published reproducible benchmark ([docs](https://docs.vllm.ai/en/latest/performance_benchmark/benchmarks.html)) of vLLM compared to LMDeploy, TGI, and TensorRT-LLM. The goal is to identify gaps in performance and close them.
* Development and integration of highly optimized kernels
* Integrated FlashAttention2 with PagedAttention, and [FlashInfer](https://github.com/flashinfer-ai/flashinfer). We plan to integrate [FlashAttention3](https://github.com/vllm-project/vllm/issues/6348).
@@ -44,7 +44,7 @@ In our objective for performance optimization, we have made the following progre
* We identified vLLMs OpenAI-compatible API frontend has higher than desired overhead. [We are working on isolating it from the critical path of scheduler and model inference. ](https://github.com/vllm-project/vllm/issues/6797)
* We identified vLLMs input preparation, and output processing scale suboptimally with the data size. Many of the operations can be vectorized and enhanced by moving them off the critical path.
-We will continue to update the community in vLLMs progress in closing the performance gap. You can track out overall progress [here](https://github.com/vllm-project/vllm/issues/6801). Please continue to suggest new ideas and contribute with your improvements!
+We will continue to update the community in vLLMs progress in closing the performance gap. You can track our overall progress [here](https://github.com/vllm-project/vllm/issues/6801). Please continue to suggest new ideas and contribute with your improvements!
### More Resources