diff --git a/_posts/2023-06-20-vllm.md b/_posts/2023-06-20-vllm.md
index dfb0914..e465fdf 100644
--- a/_posts/2023-06-20-vllm.md
+++ b/_posts/2023-06-20-vllm.md
@@ -108,7 +108,7 @@ This utilization of vLLM has also significantly reduced operational costs. With
 
 ### Get started with vLLM
 
-Install vLLM with the following command (check out our [installation guide](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) for more):
+Install vLLM with the following command (check out our [installation guide](https://docs.vllm.ai/en/latest/getting_started/installation.html) for more):
 
 ```bash
 $ pip install vllm
diff --git a/_posts/2024-09-05-perf-update.md b/_posts/2024-09-05-perf-update.md
index caf7d3b..e5f46b9 100644
--- a/_posts/2024-09-05-perf-update.md
+++ b/_posts/2024-09-05-perf-update.md
@@ -150,7 +150,7 @@ Importantly, we will also focus on improving the core of vLLM to reduce the comp
 
 ### Get Involved
 
-If you haven’t, we highly recommend you to update the vLLM version (see instructions [here](https://docs.vllm.ai/en/latest/getting_started/installation/index.html)) and try it out for yourself\! We always love to learn more about your use cases and how we can make vLLM better for you. The vLLM team can be reached out via [vllm-questions@lists.berkeley.edu](mailto:vllm-questions@lists.berkeley.edu). vLLM is also a community project, if you are interested in participating and contributing, we welcome you to check out our [roadmap](https://roadmap.vllm.ai/) and see [good first issues](https://github.com/vllm-project/vllm/issues?q=is:open+is:issue+label:%22good+first+issue%22) to tackle. Stay tuned for more updates by [following us on X](https://x.com/vllm\_project).
+If you haven’t, we highly recommend you to update the vLLM version (see instructions [here](https://docs.vllm.ai/en/latest/getting_started/installation.html)) and try it out for yourself\! We always love to learn more about your use cases and how we can make vLLM better for you. The vLLM team can be reached out via [vllm-questions@lists.berkeley.edu](mailto:vllm-questions@lists.berkeley.edu). vLLM is also a community project, if you are interested in participating and contributing, we welcome you to check out our [roadmap](https://roadmap.vllm.ai/) and see [good first issues](https://github.com/vllm-project/vllm/issues?q=is:open+is:issue+label:%22good+first+issue%22) to tackle. Stay tuned for more updates by [following us on X](https://x.com/vllm\_project).
 
 If you are in the Bay Area, you can meet the vLLM team at the following events: [vLLM’s sixth meetup with NVIDIA(09/09)](https://lu.ma/87q3nvnh), [PyTorch Conference (09/19)](https://pytorch2024.sched.com/event/1fHmx/vllm-easy-fast-and-cheap-llm-serving-for-everyone-woosuk-kwon-uc-berkeley-xiaoxuan-liu-ucb), [CUDA MODE IRL meetup (09/21)](https://events.accel.com/cudamode), and [the first ever vLLM track at Ray Summit (10/01-02)](https://raysummit.anyscale.com/flow/anyscale/raysummit2024/landing/page/sessioncatalog?search.sessiontracks=1719251906298001uzJ2).
 
diff --git a/_posts/2025-01-10-dev-experience.md b/_posts/2025-01-10-dev-experience.md
index 03bf578..d0aee21 100644
--- a/_posts/2025-01-10-dev-experience.md
+++ b/_posts/2025-01-10-dev-experience.md
@@ -29,7 +29,7 @@ For those who prefer a faster package manager, [**uv**](https://github.com/astra
 uv pip install vllm
 ```
 
-Refer to the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html?device=cuda#create-a-new-python-environment) for more details on setting up [**uv**](https://github.com/astral-sh/uv). Using a simple server-grade setup (Intel 8th Gen CPU), we observe that [**uv**](https://github.com/astral-sh/uv) is 200x faster than pip:
+Refer to the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=cuda#create-a-new-python-environment) for more details on setting up [**uv**](https://github.com/astral-sh/uv). Using a simple server-grade setup (Intel 8th Gen CPU), we observe that [**uv**](https://github.com/astral-sh/uv) is 200x faster than pip:
 
 ```sh
 # with cached packages, clean virtual environment
@@ -77,11 +77,11 @@ VLLM_USE_PRECOMPILED=1 pip install -e .
 
 The `VLLM_USE_PRECOMPILED=1` flag instructs the installer to use pre-compiled CUDA kernels instead of building them from source, significantly reducing installation time. This is perfect for developers focusing on Python-level features like API improvements, model support, or integration work.
 
-This lightweight process runs efficiently, even on a laptop. Refer to our [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html?device=cuda#build-wheel-from-source) for more advanced usage.
+This lightweight process runs efficiently, even on a laptop. Refer to our [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=cuda#build-wheel-from-source) for more advanced usage.
 
 ### C++/Kernel Developers
 
-For advanced contributors working with C++ code or CUDA kernels, we incorporate a compilation cache to minimize build time and streamline kernel development. Please check our [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html?device=cuda#build-wheel-from-source) for more details.
+For advanced contributors working with C++ code or CUDA kernels, we incorporate a compilation cache to minimize build time and streamline kernel development. Please check our [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=cuda#build-wheel-from-source) for more details.
 
 ## Track Changes with Ease
 
diff --git a/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md b/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md
index dbe4d1f..5954c1d 100644
--- a/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md
+++ b/_posts/2025-01-27-intro-to-llama-stack-with-vllm.md
@@ -49,7 +49,7 @@ huggingface-cli login --token
 huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --local-dir /tmp/test-vllm-llama-stack/.cache/huggingface/hub/models/Llama-3.2-1B-Instruct
 ```
 
-Next, let's build the vLLM CPU container image from source. Note that while we use it for demonstration purposes, there are plenty of [other images available for different hardware and architectures](https://docs.vllm.ai/en/latest/getting_started/installation/index.html).
+Next, let's build the vLLM CPU container image from source. Note that while we use it for demonstration purposes, there are plenty of [other images available for different hardware and architectures](https://docs.vllm.ai/en/latest/getting_started/installation.html).
 
 ```
 git clone git@github.com:vllm-project/vllm.git /tmp/test-vllm-llama-stack