Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Yuan Tang 2025-01-19 00:05:16 -05:00
parent d63d39a314
commit ed4835234b
1 changed file with 4 additions and 3 deletions


@@ -15,10 +15,11 @@ TODO(ashwin): more background information on Llama Stack
 # vLLM Inference Provider
-There are two options: https://docs.vllm.ai/en/latest/serving/serving_with_llamastack.html
-TODO(yuan): more details here
+Llama Stack provides two vLLM inference providers:
+1. [Remote vLLM inference provider](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html) through vLLM's [OpenAI-compatible server](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-completions-api-with-vllm);
+1. [Inline vLLM inference provider](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline/inference/vllm) that runs alongside the Llama Stack server.
-We will cover the remote vLLM provider here.
+In this article, we will demonstrate the functionality through the remote vLLM inference provider.
 # Tutorial
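
As quick context for the change above: the remote provider works by pointing Llama Stack at vLLM's OpenAI-compatible HTTP endpoint, so that endpoint can be exercised directly with any OpenAI client. A minimal sketch follows; the host, port, model name, and placeholder API key are illustrative assumptions, not values from this commit.

```python
# Minimal sketch: query a running vLLM OpenAI-compatible server directly.
# Assumptions (not from this commit): vLLM was started with
# `vllm serve meta-llama/Llama-3.1-8B-Instruct` and listens on localhost:8000.
from openai import OpenAI

# vLLM does not require a real API key by default; any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is Llama Stack?"}],
)
print(response.choices[0].message.content)
```

The remote vLLM inference provider talks to this same endpoint, so a request like the one above is a quick way to confirm the server is reachable before wiring it into a Llama Stack distribution.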