update lfai

parent 33d16cb301
commit 9227cfd6d5
@@ -18,8 +18,12 @@ GEM
       eventmachine (>= 0.12.9)
       http_parser.rb (~> 0)
     eventmachine (1.2.7)
+    ffi (1.17.0-arm64-darwin)
     ffi (1.17.0-x86_64-darwin)
     forwardable-extended (2.6.0)
+    google-protobuf (4.27.2-arm64-darwin)
+      bigdecimal
+      rake (>= 13)
     google-protobuf (4.27.2-x86_64-darwin)
       bigdecimal
       rake (>= 13)
@@ -70,6 +74,8 @@ GEM
       strscan
     rouge (4.3.0)
     safe_yaml (1.0.5)
+    sass-embedded (1.77.8-arm64-darwin)
+      google-protobuf (~> 4.26)
     sass-embedded (1.77.8-x86_64-darwin)
       google-protobuf (~> 4.26)
     strscan (3.1.0)
@@ -79,6 +85,7 @@ GEM
     webrick (1.8.1)
 
 PLATFORMS
+  arm64-darwin-23
   x86_64-darwin-22
   x86_64-darwin-23
 
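For context: platform-scoped lockfile entries like the `arm64-darwin` variants above are generated by Bundler rather than written by hand. Assuming this commit came from resolving the bundle for Apple Silicon (an inference from the hunk line counts, not stated in the commit message), the equivalent explicit command would be:

```sh
# Record the new platform in Gemfile.lock and resolve native gem variants for it;
# running `bundle install` on an arm64 Mac has the same effect as a side effect.
bundle lock --add-platform arm64-darwin-23
```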
@@ -33,7 +33,7 @@ In our objective for performance optimization, we have made the following progre
 
 * Publication of benchmarks
-  * Published per-commit performance tracker at [perf.vllm.ai](perf.vllm.ai) on our public benchmarks. The goal of this is to track performance enhancement and regressions.
+  * Published per-commit performance tracker at [perf.vllm.ai](https://perf.vllm.ai) on our public benchmarks. The goal of this is to track performance enhancement and regressions.
   * Published reproducible benchmark ([docs](https://docs.vllm.ai/en/latest/performance_benchmark/benchmarks.html)) of vLLM compared to LMDeploy, TGI, and TensorRT-LLM. The goal is to identify gaps in performance and close them.
 * Development and integration of highly optimized kernels
   * Integrated FlashAttention2 with PagedAttention, and [FlashInfer](https://github.com/flashinfer-ai/flashinfer). We plan to integrate [FlashAttention3](https://github.com/vllm-project/vllm/issues/6348).
 
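On the FlashAttention2 + PagedAttention bullet above: PagedAttention keeps the KV cache in fixed-size physical blocks and maps each sequence's logical token positions to them through a block table, so an attention kernel reads keys and values through that indirection instead of from one contiguous buffer. A minimal NumPy sketch of the lookup (illustrative shapes and names only, not vLLM's actual kernel code):

```python
import numpy as np

# Illustrative PagedAttention-style KV lookup; not vLLM's actual kernels.
BLOCK_SIZE = 16   # tokens stored per physical KV-cache block
NUM_BLOCKS = 64   # size of the shared physical block pool
HEAD_DIM = 8      # per-head hidden size (toy value)

rng = np.random.default_rng(0)
# Physical key cache: a pool of fixed-size blocks shared by all sequences.
key_cache = rng.standard_normal((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM), dtype=np.float32)

def gather_keys(block_table: np.ndarray, seq_len: int) -> np.ndarray:
    """Collect one sequence's keys from its (non-contiguous) physical blocks."""
    blocks = key_cache[block_table]        # (n_blocks, BLOCK_SIZE, HEAD_DIM)
    flat = blocks.reshape(-1, HEAD_DIM)    # flatten into logical token order
    return flat[:seq_len]                  # drop unwritten slots in the last block

# One sequence owning three scattered blocks, with 40 tokens written so far.
keys = gather_keys(np.array([42, 7, 13]), seq_len=40)
print(keys.shape)  # (40, 8)
```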
@@ -44,7 +44,7 @@ In our objective for performance optimization, we have made the following progre
 * We identified vLLM’s OpenAI-compatible API frontend has higher than desired overhead. [We are working on isolating it from the critical path of scheduler and model inference. ](https://github.com/vllm-project/vllm/issues/6797)
 * We identified vLLM’s input preparation, and output processing scale suboptimally with the data size. Many of the operations can be vectorized and enhanced by moving them off the critical path.
 
-We will continue to update the community in vLLM’s progress in closing the performance gap. You can track out overall progress [here](https://github.com/vllm-project/vllm/issues/6801). Please continue to suggest new ideas and contribute with your improvements!
+We will continue to update the community in vLLM’s progress in closing the performance gap. You can track our overall progress [here](https://github.com/vllm-project/vllm/issues/6801). Please continue to suggest new ideas and contribute with your improvements!
 
 
 ### More Resources
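On the input-preparation and output-processing bullet in the hunk above: "vectorized" there means replacing per-sequence Python loops with single batched array operations, so interpreter overhead is paid once per batch rather than once per token. A generic sketch of the pattern (hypothetical shapes, not vLLM's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sampler output for one decode step: (num_seqs, vocab_size) logits.
logits = rng.standard_normal((256, 32_000), dtype=np.float32)

def next_tokens_loop(logits: np.ndarray) -> list[int]:
    # Scalar path: one Python-level argmax per sequence (per-request overhead).
    return [int(np.argmax(row)) for row in logits]

def next_tokens_vectorized(logits: np.ndarray) -> np.ndarray:
    # Batched path: a single argmax call over the whole batch.
    return np.argmax(logits, axis=-1)

assert next_tokens_loop(logits) == list(map(int, next_tokens_vectorized(logits)))
```

The "off the critical path" half of that bullet is complementary: work such as detokenization can additionally be moved out of the scheduler/model-inference loop (the isolation tracked in issue 6797 above), so GPU steps never wait on frontend bookkeeping.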