Community-maintained hardware plugin for vLLM on Ascend
Updated 2025-07-20 16:30:47 +08:00
A high-throughput and memory-efficient inference and serving engine for LLMs
llm
mlops
pytorch
cuda
inference
llama
llm-serving
llmops
model-serving
qwen
rocm
tpu
trainium
transformer
amd
xpu
deepseek
gpt
hpu
inferentia
Updated 2025-07-04 16:00:34 +08:00