diff --git a/_posts/2025-01-24-v1.md b/_posts/2025-01-24-v1.md
index 2f3d669..ad6455d 100644
--- a/_posts/2025-01-24-v1.md
+++ b/_posts/2025-01-24-v1.md
@@ -32,7 +32,11 @@ vLLM V1 introduces a comprehensive re-architecture of its core components, inclu
 
 ## 1. Optimized Execution Loop & API Server
 
-![][image2]
+<p align="center">
+<picture>
+<img src="/assets/figures/v1/v1_server_architecture.png">
+</picture>
+</p>
 
 As a full-fledged continuous batching engine and OpenAI-compatible API server, vLLM’s core execution loop relies on CPU operations to manage request states between model forward passes. As GPUs become faster and model execution times shrink, the CPU overhead of tasks like running the API server, scheduling work, preparing inputs, de-tokenizing outputs, and streaming responses to users becomes increasingly pronounced. This issue is particularly noticeable with smaller models like Llama-8B running on NVIDIA H100 GPUs, where execution time on the GPU is as low as ~5ms.
 
diff --git a/assets/figures/v1/v1_server_architecture.png b/assets/figures/v1/v1_server_architecture.png
new file mode 100644
index 0000000..7acae43
Binary files /dev/null and b/assets/figures/v1/v1_server_architecture.png differ
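
For intuition on the numbers in that paragraph, here is a small back-of-the-envelope sketch (not part of the diff). It uses the ~5 ms Llama-8B-on-H100 GPU step time quoted in the post; the 2 ms of serialized CPU work per step is an assumed illustrative value, not a measurement from vLLM.

```python
# Back-of-the-envelope: impact of serialized per-step CPU work when the GPU
# forward pass takes only ~5 ms (the Llama-8B-on-H100 figure from the post).
# The 2 ms CPU overhead below is an assumed placeholder, not a measured number.

gpu_step_ms = 5.0      # model forward pass on the GPU (from the post)
cpu_overhead_ms = 2.0  # assumed: scheduling, input prep, detokenization, streaming

ideal_steps_per_s = 1000 / gpu_step_ms                       # if CPU work were fully overlapped
actual_steps_per_s = 1000 / (gpu_step_ms + cpu_overhead_ms)  # CPU work on the critical path
lost = 1 - actual_steps_per_s / ideal_steps_per_s

print(f"GPU-only:   {ideal_steps_per_s:.0f} steps/s")
print(f"Serialized: {actual_steps_per_s:.0f} steps/s (~{lost:.0%} throughput lost to CPU overhead)")
```

Under these assumptions, roughly a quarter of the potential step throughput disappears into CPU work that sits on the critical path, which is the overhead the V1 execution loop and API server redesign targets.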