From eef62fac45e7ae50af50de5ee3ea3d0eca512918 Mon Sep 17 00:00:00 2001
From: WoosukKwon
Date: Sun, 26 Jan 2025 21:43:55 -0800
Subject: [PATCH] Minor

Signed-off-by: WoosukKwon
---
 _posts/2025-01-26-v1-alpha-release.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/_posts/2025-01-26-v1-alpha-release.md b/_posts/2025-01-26-v1-alpha-release.md
index 51e438f..6adc74b 100644
--- a/_posts/2025-01-26-v1-alpha-release.md
+++ b/_posts/2025-01-26-v1-alpha-release.md
@@ -136,8 +136,9 @@ Given that the kernels used for V0 and V1 are almost identical, the performance

-We evaluated the performance on VLMs by testing Qwen2-VL using the [VisionArena dataset](https://arxiv.org/abs/2412.08687).
+We evaluated the performance on VLMs by testing Qwen2-VL using the [VisionArena](https://arxiv.org/abs/2412.08687) dataset. V1 delivered even larger speedups over V0, thanks to its improved VLM support, driven by two key improvements: offloading input processing to a separate process and implementing more flexible scheduling for multimodal queries.
+We would also like to note that prefix caching is now natively supported for multimodal models in V1, though we omit those benchmark results here.
 
 **Looking Forward**