From d2264ca838b1c2ba42f3a78fb22aab4506f841b5 Mon Sep 17 00:00:00 2001
From: Yuan Tang <terrytangyuan@gmail.com>
Date: Fri, 24 Jan 2025 16:42:11 -0500
Subject: [PATCH] Move diagram to the right

---
 _posts/2025-01-27-intro-to-llama-stack-with-vllm.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md b/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md
index 52b2434..6d48e28 100644
--- a/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md
+++ b/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md
@@ -9,12 +9,12 @@ We are excited to announce that vLLM inference provider is now available in [Lla
 
 # What is Llama Stack?
 
+<img align="right" src="https://llama-stack.readthedocs.io/en/latest/_images/llama-stack.png" alt="llama-stack-diagram" width="50%" height="50%">
+
 Llama Stack defines and standardizes the set of core building blocks needed to bring generative AI applications to market. These building blocks are presented in the form of interoperable APIs with a broad set of Service Providers providing their implementations.
 
 Llama Stack focuses on making it easy to build production applications with a variety of models - ranging from the latest Llama 3.3 model to specialized models like Llama Guard for safety. More models beyond the Llama model family are in the works. The goal is to provide pre-packaged implementations (aka “distributions”) which can be run in a variety of deployment environments. The Stack can assist you in your entire app development lifecycle - start iterating on local, mobile or desktop and seamlessly transition to on-prem or public cloud deployments. At every point in this transition, the same set of APIs and the same developer experience is available.
 
-<!-- ideally we could float this image to the right so it does not come in the flow of the doc -->
-<img width="320" src="https://llama-stack.readthedocs.io/en/latest/_images/llama-stack.png" />
 
 Each specific implementation of an API is called a "Provider" in this architecture. Users can swap providers via configuration. `vLLM` is a prominent example of a high-performance API backing the inference API.