# vLLM Inference Provider

Llama Stack provides two vLLM inference providers:

1. [Remote vLLM inference provider](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html) through vLLM's [OpenAI-compatible server](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-completions-api-with-vllm) (see the sketch below);
1. [Inline vLLM inference provider](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline/inference/vllm) that runs alongside the Llama Stack server.

In this article, we will demonstrate the functionality through the remote vLLM inference provider.
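Before diving into the tutorial, the snippet below is a minimal sketch of what the remote path builds on: vLLM exposes an OpenAI-compatible HTTP server (started, for example, with `vllm serve <model>` on its default port 8000), and the remote provider forwards inference requests to that endpoint. The model name and port here are placeholders for illustration.

```python
from openai import OpenAI

# A vLLM OpenAI-compatible server is assumed to be running locally,
# e.g. started with: vllm serve meta-llama/Llama-3.2-3B-Instruct
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "What is Llama Stack?"}],
)
print(response.choices[0].message.content)
```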
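As a preview of what the tutorial walks through, here is a minimal sketch of calling a Llama Stack server that is configured with the remote vLLM provider, using the `llama-stack-client` Python SDK. The port and model ID are assumptions for illustration, and the exact client methods may vary between Llama Stack releases.

```python
from llama_stack_client import LlamaStackClient

# Assumes a Llama Stack server configured with the remote vLLM
# inference provider is listening locally on port 5001.
client = LlamaStackClient(base_url="http://localhost:5001")

# Chat completion requests are forwarded by the remote provider to
# vLLM's OpenAI-compatible server behind the scenes.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model ID
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about GPUs."},
    ],
)
print(response.completion_message.content)
```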
# Tutorial