From c8f485f16d2a0642cb7ce7cd090a0beab3f87178 Mon Sep 17 00:00:00 2001
From: Yuan Tang
Date: Fri, 24 Jan 2025 20:49:08 -0500
Subject: [PATCH] v0.1.0 with fix #879

Signed-off-by: Yuan Tang
---
 _posts/2025-01-27-intro-to-llama-stack-with-vllm.md | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md b/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md
index e5efefe..72d9815 100644
--- a/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md
+++ b/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md
@@ -32,10 +32,9 @@ In this article, we will demonstrate the functionality through the remote vLLM i

 * Linux operating system
 * [Hugging Face CLI](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) if you'd like to download the model via CLI.
-* OCI-compliant technologies like [Podman](https://podman.io/) or [Docker](https://www.docker.com/) (can be specified via the `CONTAINER_BINARY` environment variable when running `llama stack` CLI commands).
+* OCI-compliant container technologies like [Podman](https://podman.io/) or [Docker](https://www.docker.com/) (can be specified via the `CONTAINER_BINARY` environment variable when running `llama stack` CLI commands).
 * [Kind](https://kind.sigs.k8s.io/) for Kubernetes deployment.
 * [Conda](https://github.com/conda/conda) for managing Python environment.
-* Python >= 3.10 if you'd like to test the [Llama Stack Python SDK](https://github.com/meta-llama/llama-stack-client-python).

 ## Get Started via Containers

@@ -119,7 +118,7 @@ image_type: container
 EOF

 export CONTAINER_BINARY=podman
-LLAMA_STACK_DIR=. PYTHONPATH=. python -m llama_stack.cli.llama stack build --config /tmp/test-vllm-llama-stack/vllm-llama-stack-build.yaml
+LLAMA_STACK_DIR=. PYTHONPATH=. python -m llama_stack.cli.llama stack build --config /tmp/test-vllm-llama-stack/vllm-llama-stack-build.yaml --image-name distribution-myenv
 ```

 Once the container image has been built successfully, we can then edit the generated `vllm-run.yaml` to be `/tmp/test-vllm-llama-stack/vllm-llama-stack-run.yaml` with the following change in the `models` field:
@@ -137,14 +136,14 @@ Then we can start the LlamaStack Server with the image we built via `llama stack
 export INFERENCE_ADDR=host.containers.internal
 export INFERENCE_PORT=8000
 export INFERENCE_MODEL=meta-llama/Llama-3.2-1B-Instruct
-export LLAMASTACK_PORT=5000
+export LLAMA_STACK_PORT=5000

 LLAMA_STACK_DIR=. PYTHONPATH=. python -m llama_stack.cli.llama stack run \
 --env INFERENCE_MODEL=$INFERENCE_MODEL \
 --env VLLM_URL=http://$INFERENCE_ADDR:$INFERENCE_PORT/v1 \
 --env VLLM_MAX_TOKENS=8192 \
 --env VLLM_API_TOKEN=fake \
---env LLAMASTACK_PORT=$LLAMASTACK_PORT \
+--env LLAMA_STACK_PORT=$LLAMA_STACK_PORT \
 /tmp/test-vllm-llama-stack/vllm-llama-stack-run.yaml
 ```

@@ -155,7 +154,7 @@ podman run --security-opt label=disable -it --network host -v /tmp/test-vllm-lla
 --env VLLM_URL=http://$INFERENCE_ADDR:$INFERENCE_PORT/v1 \
 --env VLLM_MAX_TOKENS=8192 \
 --env VLLM_API_TOKEN=fake \
---env LLAMASTACK_PORT=$LLAMASTACK_PORT \
+--env LLAMA_STACK_PORT=$LLAMA_STACK_PORT \
 --entrypoint='["python", "-m", "llama_stack.distribution.server.server", "--yaml-config", "/app/config.yaml"]' \
 localhost/distribution-myenv:dev
 ```
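
To try this change locally, the minimal sketch below shows one way to sanity-check and apply the patch with standard git tooling. The filename `0001-v0.1.0-with-fix-879.patch` is a hypothetical placeholder for wherever you saved this mail, and the commands assume you run them from the root of a checkout of the blog repository with the full patch (including complete author email headers).

```bash
# A minimal sketch, assuming the patch was saved as the hypothetical file below
# and that the working tree is a clean checkout of the blog repository.
git apply --stat 0001-v0.1.0-with-fix-879.patch   # preview the diffstat without touching the tree
git apply --check 0001-v0.1.0-with-fix-879.patch  # verify the patch applies cleanly
git am 0001-v0.1.0-with-fix-879.patch             # apply it as a commit, preserving author and message
```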