- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM (updated 2025-08-26 01:48:08 +08:00)
- Work with LLMs in a local environment using containers (updated 2025-08-25 23:46:14 +08:00)
- A high-throughput and memory-efficient inference and serving engine for LLMs (updated 2025-08-01 01:35:07 +08:00)