Community-maintained hardware plugin for vLLM on Spyre
Updated 2025-07-20 16:37:05 +08:00
vLLM’s reference system for K8s-native cluster-wide deployment with community-driven performance optimization
Updated 2025-07-20 16:31:04 +08:00
Community-maintained hardware plugin for vLLM on Ascend
Updated 2025-07-20 16:30:47 +08:00
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Updated 2025-07-20 13:59:02 +08:00
Updated 2025-07-20 13:58:44 +08:00
Work with LLMs in a local environment using containers
Updated 2025-07-20 13:50:12 +08:00
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Updated 2025-07-19 18:35:54 +08:00
Updated 2025-07-04 16:08:08 +08:00
A high-throughput and memory-efficient inference and serving engine for LLMs
llm
mlops
pytorch
cuda
inference
llama
llm-serving
llmops
model-serving
qwen
rocm
tpu
trainium
transformer
amd
xpu
deepseek
gpt
hpu
inferentia
Updated 2025-07-04 16:00:34 +08:00
Updated 2025-07-04 15:19:52 +08:00
This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
Updated 2025-07-04 15:19:47 +08:00
Automated Machine Learning on Kubernetes
kubernetes
machine-learning
kubeflow
ai
tensorflow
huggingface
llm
mlops
jax
pytorch
hyperparameter-tuning
neural-architecture-search
automl
scikit-learn
Updated 2025-06-26 22:13:16 +08:00
Distributed ML Training and Fine-Tuning on Kubernetes
kubernetes
huggingface
ai
llm
gpu
jax
kubeflow
distributed
xgboost
machine-learning
mlops
python
pytorch
tensorflow
fine-tuning
Updated 2025-06-20 16:05:11 +08:00
Examples for building and running LLM services and applications locally with Podman
Updated 2025-06-19 16:22:59 +08:00
🤖 Discover how to apply your LLM app skills on Kubernetes!
Updated 2024-03-09 05:47:39 +08:00