Community-maintained hardware plugin for vLLM on Ascend
Updated 2025-07-20 16:30:47 +08:00
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Updated 2025-07-19 18:35:54 +08:00
A high-throughput and memory-efficient inference and serving engine for LLMs
Updated 2025-07-04 16:00:34 +08:00
Distributed ML Training and Fine-Tuning on Kubernetes
Updated 2025-06-20 16:05:11 +08:00
🤖 Discover how to apply your LLM app skills on Kubernetes!
Updated 2024-03-09 05:47:39 +08:00