update lfai

simon-mo 2024-07-25 14:56:21 -07:00
parent 33d16cb301
commit 9227cfd6d5
2 changed files with 9 additions and 2 deletions

Gemfile.lock

@@ -18,8 +18,12 @@ GEM
eventmachine (>= 0.12.9)
http_parser.rb (~> 0)
eventmachine (1.2.7)
ffi (1.17.0-arm64-darwin)
ffi (1.17.0-x86_64-darwin)
forwardable-extended (2.6.0)
google-protobuf (4.27.2-arm64-darwin)
bigdecimal
rake (>= 13)
google-protobuf (4.27.2-x86_64-darwin)
bigdecimal
rake (>= 13)
@@ -70,6 +74,8 @@ GEM
strscan
rouge (4.3.0)
safe_yaml (1.0.5)
sass-embedded (1.77.8-arm64-darwin)
google-protobuf (~> 4.26)
sass-embedded (1.77.8-x86_64-darwin)
google-protobuf (~> 4.26)
strscan (3.1.0)
@@ -79,6 +85,7 @@ GEM
webrick (1.8.1)
PLATFORMS
arm64-darwin-23
x86_64-darwin-22
x86_64-darwin-23

Blog post (Markdown)

@@ -33,7 +33,7 @@ In our objective for performance optimization, we have made the following progre
* Publication of benchmarks
-* Published per-commit performance tracker at [perf.vllm.ai](perf.vllm.ai) on our public benchmarks. The goal of this is to track performance enhancement and regressions.
+* Published per-commit performance tracker at [perf.vllm.ai](https://perf.vllm.ai) on our public benchmarks. The goal of this is to track performance enhancement and regressions.
* Published reproducible benchmark ([docs](https://docs.vllm.ai/en/latest/performance_benchmark/benchmarks.html)) of vLLM compared to LMDeploy, TGI, and TensorRT-LLM. The goal is to identify gaps in performance and close them.
* Development and integration of highly optimized kernels
* Integrated FlashAttention2 with PagedAttention, and [FlashInfer](https://github.com/flashinfer-ai/flashinfer). We plan to integrate [FlashAttention3](https://github.com/vllm-project/vllm/issues/6348).
@@ -44,7 +44,7 @@ In our objective for performance optimization, we have made the following progre
* We identified vLLMs OpenAI-compatible API frontend has higher than desired overhead. [We are working on isolating it from the critical path of scheduler and model inference. ](https://github.com/vllm-project/vllm/issues/6797)
* We identified vLLMs input preparation, and output processing scale suboptimally with the data size. Many of the operations can be vectorized and enhanced by moving them off the critical path.
-We will continue to update the community in vLLMs progress in closing the performance gap. You can track out overall progress [here](https://github.com/vllm-project/vllm/issues/6801). Please continue to suggest new ideas and contribute with your improvements!
+We will continue to update the community in vLLMs progress in closing the performance gap. You can track our overall progress [here](https://github.com/vllm-project/vllm/issues/6801). Please continue to suggest new ideas and contribute with your improvements!
### More Resources