Cost-efficient and pluggable infrastructure components for GenAI inference
AIBrix

Welcome to AIBrix, an open-source initiative designed to provide essential building blocks to construct scalable GenAI inference infrastructure. AIBrix delivers a cloud-native solution optimized for deploying, managing, and scaling large language model (LLM) inference, tailored specifically to enterprise needs.

| Documentation | Blog | White Paper | Twitter/X | Developer Slack |

Latest News

Key Features

The initial release includes the following key features:

  • High-Density LoRA Management: Streamlined support for lightweight, low-rank adaptations of models.
  • LLM Gateway and Routing: Efficiently manage and direct traffic across multiple models and replicas.
  • LLM App-Tailored Autoscaler: Dynamically scale inference resources based on real-time demand.
  • Unified AI Runtime: A versatile sidecar enabling metric standardization, model downloading, and management.
  • Distributed Inference: Scalable architecture to handle large workloads across multiple nodes.
  • Distributed KV Cache: Enables high-capacity, cross-engine KV reuse.
  • Cost-efficient Heterogeneous Serving: Enables mixed GPU inference to reduce costs with SLO guarantees.
  • GPU Hardware Failure Detection: Proactive detection of GPU hardware issues.
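
As a concrete illustration of the gateway-and-routing feature above, AIBrix routes OpenAI-compatible requests to model backends it discovers in the cluster. The sketch below is illustrative only: the label keys (`model.aibrix.ai/name`, `model.aibrix.ai/port`), the model name, the deployment name, and the local port-forward target are assumptions modeled on typical AIBrix samples, not guaranteed to match your release; consult the documentation for the exact conventions.

```shell
# Illustrative: label an existing model Deployment so the AIBrix gateway
# can discover it and route requests to it. Label keys are assumptions.
kubectl label deployment my-llm-deployment \
  model.aibrix.ai/name=deepseek-r1-distill-llama-8b \
  model.aibrix.ai/port=8000

# Send an OpenAI-compatible request through the gateway
# (assumes the gateway service is port-forwarded to localhost:8888).
curl http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1-distill-llama-8b",
       "messages": [{"role": "user", "content": "Hello"}]}'
```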

Architecture

(Figure: AIBrix architecture overview, aibrix-architecture-v1)

Quick Start

To get started with AIBrix, clone this repository and follow the setup instructions in the documentation. Our comprehensive guide will help you configure and deploy your first LLM infrastructure seamlessly.

```shell
# Local testing
git clone https://github.com/vllm-project/aibrix.git
cd aibrix

# Install nightly aibrix dependencies
kubectl apply -k config/dependency --server-side

# Install nightly aibrix components
kubectl apply -k config/default
```

Install stable distribution

```shell
# Install component dependencies
kubectl apply -f "https://github.com/vllm-project/aibrix/releases/download/v0.3.0/aibrix-dependency-v0.3.0.yaml" --server-side

# Install aibrix components
kubectl apply -f "https://github.com/vllm-project/aibrix/releases/download/v0.3.0/aibrix-core-v0.3.0.yaml"
```
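
After either install path, it can help to confirm the control-plane components are healthy before deploying models. A quick check, assuming the default manifests install into an `aibrix-system` namespace (verify the namespace against your release):

```shell
# Check that the AIBrix controller and gateway pods are running
kubectl get pods -n aibrix-system

# Confirm the AIBrix CRDs were registered
kubectl get crds | grep aibrix
```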

Documentation

For detailed documentation on installation, configuration, and usage, please visit our documentation page.

Contributing

We welcome contributions from the community! Check out our contributing guidelines to see how you can make a difference.

Slack Channel: #aibrix

License

AIBrix is licensed under the Apache 2.0 License.

Support

If you have any questions or encounter any issues, please submit an issue on our GitHub issues page.

Thank you for choosing AIBrix for your GenAI infrastructure needs!