vLLM
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Updated 2025-08-26 01:48:08 +08:00
A high-throughput and memory-efficient inference and serving engine for LLMs
Updated 2025-08-01 01:35:07 +08:00
This repo hosts code for the vLLM CI & performance benchmark infrastructure.
Updated 2025-07-31 04:54:25 +08:00
Community maintained hardware plugin for vLLM on Spyre
Updated 2025-07-20 16:37:05 +08:00
Cost-efficient and pluggable infrastructure components for GenAI inference
Updated 2025-07-20 16:33:47 +08:00
vLLM's reference system for K8s-native cluster-wide deployment with community-driven performance optimization
Updated 2025-07-20 16:31:04 +08:00
Community maintained hardware plugin for vLLM on Ascend
Updated 2025-07-20 16:30:47 +08:00
vLLM Logo Assets
Updated 2024-12-12 09:11:44 +08:00
vLLM performance dashboard
Updated 2024-04-26 14:13:44 +08:00