Merge pull request #7 from tjtanaa/2024-10-23-vllm-serving-amd-spelling-fix-tj

[Bug] [Spelling] Fix spell spelling
Simon Mo 2024-10-29 17:31:28 -07:00 committed by GitHub
commit dc147caa3f
GPG Key ID: B5690EEEBB952194
1 changed file with 3 additions and 3 deletions


@@ -1,7 +1,7 @@
---
layout: post
title: "Serving LLMs on AMD MI300X: Best Practices"
-author: "Guest Post by Embedded LLM and Hot Aisles Inc."
+author: "Guest Post by Embedded LLM and Hot Aisle Inc."
---
**TL;DR:** vLLM unlocks incredible performance on the AMD MI300X, achieving 1.5x higher throughput and 1.7x faster time-to-first-token (TTFT) than Text Generation Inference (TGI) for Llama 3.1 405B. It also achieves 1.8x higher throughput and 5.1x faster TTFT than TGI for Llama 3.1 70B. This guide explores 8 key vLLM settings to maximize efficiency, showing you how to leverage the power of open-source LLM inference on AMD. If you just want to see the optimal parameters, jump to the [Quick Start Guide](#quick-start-guide).
@@ -311,13 +311,13 @@ However, it's important to acknowledge that our exploration has focused primaril
We also want to acknowledge [this wonderful blogpost](https://shisa.ai/blog/posts/tuning-vllm-mi300x/) by Leonard Lin on how to further optimize vLLM for MI300X, including hipBLAS vs hipBLASLt, CK Flash Attention vs Triton Flash Attention, Tensor Parallelism vs Pipeline Parallelism, etc.
### Acknowledgements
-This blog post is drafted by the team at [Embedded LLM](https://embeddedllm.com/) and thank you to [Hot Aisles Inc.](https://hotaisle.xyz/) for sponsoring MI300X for benchmarking vLLM.
+This blog post is drafted by the team at [Embedded LLM](https://embeddedllm.com/) and thank you to [Hot Aisle Inc.](https://hotaisle.xyz/) for sponsoring MI300X for benchmarking vLLM.
### Appendix
#### Server Specification
-The following are the configuration of the amazing Hot Aisles server:
+The following are the configuration of the amazing Hot Aisle server:
- CPU: 2 x Intel Xeon Platinum 8470
- GPU: 8 x AMD Instinct MI300X Accelerators
The model and software that we are using in the benchmark are as follows: