A Transformers-compatible library for applying a range of compression algorithms to LLMs, optimizing them for deployment with vLLM
Updated 2025-08-26 01:48:08 +08:00
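This entry appears to be vllm-project/llm-compressor. Below is a minimal one-shot compression sketch under that assumption; the model, dataset, and recipe values are illustrative placeholders, and import paths can vary between library versions:

```python
# Sketch of one-shot post-training compression, assuming this entry is
# vllm-project/llm-compressor; model, dataset, and recipe values are
# illustrative, and import paths may differ across versions.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Recipe: smooth activation outliers, then apply GPTQ INT8 quantization
# to the Linear layers, leaving the output head in full precision.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(scheme="W8A8", targets="Linear", ignore=["lm_head"]),
]

# One-shot calibration pass; the saved directory can then be served by vLLM.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-INT8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```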
Work with LLMs in a local environment using containers
Updated 2025-08-25 23:46:14 +08:00
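The entry above describes a container-based local LLM runner (possibly RamaLama). A hedged end-to-end sketch under that assumption; the command name, flags, port, model reference, and endpoint below are all assumptions rather than a documented interface:

```python
# Hedged sketch: start a containerized local model server and query it.
# Assumes a CLI named `ramalama` whose `serve` subcommand exposes an
# OpenAI-compatible HTTP endpoint; command, flags, port, and model name
# are placeholders, not confirmed by this listing.
import json
import subprocess
import time
import urllib.error
import urllib.request

server = subprocess.Popen(["ramalama", "serve", "--port", "8080", "tinyllama"])
try:
    payload = json.dumps(
        {"messages": [{"role": "user", "content": "Say hello."}]}
    ).encode()
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Poll until the container and model are up (the first pull can be slow).
    for _ in range(60):
        try:
            with urllib.request.urlopen(req) as resp:
                print(json.load(resp)["choices"][0]["message"]["content"])
                break
        except (urllib.error.URLError, ConnectionError):
            time.sleep(2)
finally:
    server.terminate()
```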
A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows the topic list below)
Topics: amd, cuda, deepseek, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, qwen, rocm, tpu, trainium, transformer, xpu
Updated 2025-08-01 01:35:07 +08:00
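Given the description and topic tags, this entry is almost certainly vllm-project/vllm. A minimal offline-inference sketch using its Python API; the checkpoint name and sampling values are placeholders:

```python
from vllm import LLM, SamplingParams

# Small placeholder checkpoint; any Hugging Face-compatible causal LM works.
llm = LLM(model="facebook/opt-125m")

# Standard sampling knobs; the values here are illustrative only.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# vLLM batches and schedules prompts internally (continuous batching over
# paged KV-cache memory), which is where its throughput advantage comes from.
outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```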