diff --git a/Gemfile.lock b/Gemfile.lock
index 8c9ad74..b925f49 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -18,8 +18,12 @@ GEM
       eventmachine (>= 0.12.9)
       http_parser.rb (~> 0)
     eventmachine (1.2.7)
+    ffi (1.17.0-arm64-darwin)
     ffi (1.17.0-x86_64-darwin)
     forwardable-extended (2.6.0)
+    google-protobuf (4.27.2-arm64-darwin)
+      bigdecimal
+      rake (>= 13)
     google-protobuf (4.27.2-x86_64-darwin)
       bigdecimal
       rake (>= 13)
@@ -70,6 +74,8 @@ GEM
       strscan
     rouge (4.3.0)
     safe_yaml (1.0.5)
+    sass-embedded (1.77.8-arm64-darwin)
+      google-protobuf (~> 4.26)
     sass-embedded (1.77.8-x86_64-darwin)
       google-protobuf (~> 4.26)
     strscan (3.1.0)
@@ -79,6 +85,7 @@ GEM
     webrick (1.8.1)
 
 PLATFORMS
+  arm64-darwin-23
   x86_64-darwin-22
   x86_64-darwin-23
diff --git a/_posts/2024-07-25-lfai-perf.md b/_posts/2024-07-25-lfai-perf.md
index e827aae..2b8dd91 100644
--- a/_posts/2024-07-25-lfai-perf.md
+++ b/_posts/2024-07-25-lfai-perf.md
@@ -33,7 +33,7 @@ In our objective for performance optimization, we have made the following progre
 * Publication of benchmarks
-  * Published per-commit performance tracker at [perf.vllm.ai](perf.vllm.ai) on our public benchmarks. The goal of this is to track performance enhancement and regressions.
+  * Published per-commit performance tracker at [perf.vllm.ai](https://perf.vllm.ai) on our public benchmarks. The goal of this is to track performance enhancement and regressions.
   * Published reproducible benchmark ([docs](https://docs.vllm.ai/en/latest/performance_benchmark/benchmarks.html)) of vLLM compared to LMDeploy, TGI, and TensorRT-LLM. The goal is to identify gaps in performance and close them.
 * Development and integration of highly optimized kernels
   * Integrated FlashAttention2 with PagedAttention, and [FlashInfer](https://github.com/flashinfer-ai/flashinfer). We plan to integrate [FlashAttention3](https://github.com/vllm-project/vllm/issues/6348).
@@ -44,7 +44,7 @@ In our objective for performance optimization, we have made the following progre
 * We identified vLLM’s OpenAI-compatible API frontend has higher than desired overhead. [We are working on isolating it from the critical path of scheduler and model inference. ](https://github.com/vllm-project/vllm/issues/6797)
 * We identified vLLM’s input preparation, and output processing scale suboptimally with the data size. Many of the operations can be vectorized and enhanced by moving them off the critical path.
 
-We will continue to update the community in vLLM’s progress in closing the performance gap. You can track out overall progress [here](https://github.com/vllm-project/vllm/issues/6801). Please continue to suggest new ideas and contribute with your improvements!
+We will continue to update the community in vLLM’s progress in closing the performance gap. You can track our overall progress [here](https://github.com/vllm-project/vllm/issues/6801). Please continue to suggest new ideas and contribute with your improvements!
 
 ### More Resources