Commit Graph

610 Commits

Author SHA1 Message Date
Le Xu 418ffc375c
Adding help flag to benchmark script (#1302)
* adding help flag to benchmark script
---------

Signed-off-by: Le Xu <le.xu@bytedance.com>
Co-authored-by: Le Xu <le.xu@bytedance.com>
2025-07-19 14:52:33 +08:00
Jiaxin Shan 2a40822624
[Docs] Update stormservice docs and link to index page (#1299)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-18 09:14:18 +08:00
Yuchen Cheng 71772156ba
[CI] Disable docker push images workflow in forked repositories (#1301)
Signed-off-by: rudeigerc <rudeigerc@gmail.com>
2025-07-18 08:55:02 +08:00
Haiyang Shi dfe578aa6b
[Fix] KVCache: enhance status (#1304)
Not capturing assertion errors

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-18 08:50:49 +08:00
firebook 24760b200c
[Bug] fix incorrect request count (#1246)
* [Bug] fix incorrect request count

Signed-off-by: firebook <dzzf@163.com>

* add constants and add UT

Signed-off-by: firebook <dzzf@163.com>

* fix lint check

Signed-off-by: firebook <dzzf@163.com>

---------

Signed-off-by: firebook <dzzf@163.com>
Co-authored-by: Jingyuan <zhangjyr@gmail.com>
2025-07-17 11:32:27 -07:00
vie-serendipity b48d6d44ff
[Misc] Add unit test code coverage of rolesyncer (#1296)
test: add ut for StatelessRoleSyncer Scale and Rollout

Signed-off-by: vie-serendipity <2733147505@qq.com>
2025-07-17 19:28:19 +08:00
vie-serendipity c1a754a369
[Misc] extract hashfunc as a field to allow injection (#1297)
feat: extract hash func as a field to allow injection for testing

Signed-off-by: vie-serendipity <2733147505@qq.com>
2025-07-17 18:06:21 +08:00
Haiyang Shi d7c378bbb3
[Integration] vLLM V1 Connector integration (#1295)
- add vLLM V1 connector integration
- update vLLM V0 connector integration

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-17 13:15:54 +08:00
Jiaxin Shan e07132d376
Improve UT coverage for stormservice controller (#1283)
* Improve UT coverage for stormservice controller
* Address code review feedback
---------

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-16 15:18:30 +08:00
Haiyang Shi 26758a539c
[Fix] KVCache: fix requirements (#1294)
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-15 22:25:58 -07:00
rongzhi 8ccd354e1e
[Docs] Add documentation for StormService (#1285)
* Add documentation for StormService
* Fix typo
* Add code link and fix font alignment
---------

Signed-off-by: Rongzhi Li <e1088149@u.nus.edu>
2025-07-16 05:59:44 +08:00
Jiaxin Shan 39691b4fc4
[CI] Support custom IMAGE_TAG to override build tags (#1274)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-16 05:57:46 +08:00
Haiyang Shi 9755d6b411
[Feature] KVCache: optimize allocator for compact layout (#1288)
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-16 05:27:14 +08:00
Haiyang Shi 57d28a5af0
[Feature] KVCache: optimize token list iteration and key building (#1287)
[Feature] KVCache: optimize token list iteration

- Introduce TokenListView to avoid list slicing, which involves a
  memory copy
- Memoize key building results to avoid duplicate computation
- Refine key builder benchmarks and change the default key builder to
  RollingHashKeyBuilder, which delivers the best performance among all
  built-in key builders according to the benchmarking results

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-15 15:19:24 +08:00
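The TokenListView idea from #1287 can be illustrated with a minimal sketch. This is not the AIBrix implementation: the class body, constructor arguments, and restriction to contiguous slices are assumptions made for illustration. The point is that slicing the view returns another view over the same underlying list instead of copying tokens.

```python
from typing import Iterator, Optional, Sequence


class TokenListView:
    """Read-only window over a token sequence (hypothetical sketch).

    Slicing returns a new view that shares the underlying storage,
    so no tokens are copied.
    """

    def __init__(self, tokens: Sequence[int], start: int = 0,
                 stop: Optional[int] = None) -> None:
        self._tokens = tokens
        self._start = start
        self._stop = len(tokens) if stop is None else stop

    def __len__(self) -> int:
        return self._stop - self._start

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Normalize the slice against the length of this view.
            start, stop, step = index.indices(len(self))
            if step != 1:
                raise ValueError("only contiguous views are supported")
            stop = max(stop, start)
            return TokenListView(self._tokens,
                                 self._start + start,
                                 self._start + stop)
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._tokens[self._start + index]

    def __iter__(self) -> Iterator[int]:
        for i in range(self._start, self._stop):
            yield self._tokens[i]
```

With a plain list, `tokens[2:6]` allocates a new list of four elements; with the view, the same expression only records two offsets.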
Haiyang Shi 3596787c36
[Fix] KVCache: change cuda kernel's namespace (#1286)
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-15 08:57:54 +08:00
Haiyang Shi 49d1c7231f
[Feature] KVCache: enhance profiling (#1278)
- add pyroscope based cpu profiling
- add env vars to control both cpu profiling and nvtx

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-14 09:37:30 -07:00
Haiyang Shi 9017cca28f
[Fix] KVCache: enhance rdma auto-detection (#1276)
Fall back to skipping RDMA auto-detection if pyverbs is not available
on the runtime platform.

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-14 22:10:09 +08:00
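The fallback described in #1276 follows a common optional-dependency pattern, sketched below. The helper names `try_import` and `rdma_auto_detect_enabled` are hypothetical and do not come from the AIBrix code base. The sketch only shows how a missing module can switch a feature off instead of raising at import time.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError.
        return None


def rdma_auto_detect_enabled(module_name: str = "pyverbs") -> bool:
    """Enable auto-detection only when the optional module is importable."""
    return try_import(module_name) is not None
```

A caller would check `rdma_auto_detect_enabled()` once at startup and use explicitly configured devices when it returns `False`.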
Jiaxin Shan 29d86d7fa8
[Misc] Support role replica index in pod labels (#1280)
Support role replica index in pod labels

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-14 11:41:12 +08:00
Omer Aplatony 72a09c44a4
[Misc] Improve the unit test coverage of stormservice controller (#1282)
Improve the unit test coverage of stormservice controller

Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2025-07-14 10:07:47 +08:00
Haiyang Shi b2adc9201e
[Doc] KVCache: add section for env vars (#1279)
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-13 09:30:16 +08:00
Nicole LiHui 47dffef991
[Docs] fix after reorganize incorrect file path (#1271)
Fix file paths after reorganizing the docs/development and docs/tutorial structure for PR #455

Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
2025-07-10 09:00:34 +08:00
Nicole LiHui 1710af5a14
[Docs]fix: observability docs dashboard link 404 (#1270)
Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
2025-07-09 22:55:54 +08:00
Jiaxin Shan cec057a46c
Support /scale sub resource for replica mode (#1259)
Support /scale sub resource for replica mode autoscaling

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-09 18:19:55 +08:00
CYJiang 6cb7846dd4
[Misc]: add ROLE_TEMPLATE_HASH info to container env (#1268)
add ROLE_TEMPLATE_HASH info to container env

Signed-off-by: googs1025 <googs1025@gmail.com>
2025-07-09 11:28:54 +08:00
ZHENYU c746ef753a
[Bug] Optimize prefix cache hashing to O(N) via block-hash same as vllm (#1262)
* [Fix&Optimization] optimize prefix cache hashing to O(N) via block-hash chaining same as vllm
* Fix: further optimize prefix cache hashing

---------

Signed-off-by: ae86zhizhi <550149470@qq.com>
2025-07-09 07:03:08 +08:00
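The block-hash chaining mentioned in #1262 can be sketched as follows. The actual change lives in the project's own code and may differ in detail: the function name, the use of SHA-256, and the token serialization here are assumptions for illustration. Each block's hash is derived from the previous block's hash plus the current block's tokens, so every token is hashed once and the total cost is O(N), instead of rehashing the whole prefix for every block.

```python
import hashlib
from typing import List, Sequence


def chained_block_hashes(tokens: Sequence[int], block_size: int) -> List[str]:
    """Hash each full block together with the hash of its predecessor."""
    hashes: List[str] = []
    parent = b""  # the first block has no predecessor
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        h = hashlib.sha256()
        h.update(parent)
        h.update(b",".join(str(t).encode() for t in block))
        parent = h.digest()
        hashes.append(parent.hex())
    return hashes
```

Two requests that share a prefix produce identical hashes for the shared blocks and diverge from the first differing block onward, which is what a prefix cache lookup needs.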
Jiaxin Shan 11eb7e9635
[Misc] Use domain-qualified finalizer name (#1258)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-07 13:48:30 +08:00
Jiaxin Shan e13e387c19
Ignore StormService NotFound error during deletion (#1257)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-07 13:40:23 +08:00
Jiaxin Shan 8fb3541d5e
[docs] Move aibrix component design doc to separate architecture folder (#1250)
* Refactor the kvcache offloading docs
* Update AI Engine Runtime docs
* Polish engine runtime and router docs
* Refactor autoscaler docs

---------

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-07 10:57:31 +08:00
Jiaxin Shan 788ebf158d
Set Storm Service default update strategy (#1256)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-06 20:42:59 +08:00
Jiaxin Shan 31a3a4e1de
Update stormservice controller DefaultRequeueAfter to 15s (#1253)
Change DefaultRequeueAfter to 15s

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-06 17:37:18 +08:00
Haiyang Shi a193a4c32a
[Feature] kvcache cuda kernel (#1247)
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-05 17:51:57 +08:00
CYJiang 2d8de1ddfa
[Misc] feature: use kvcache webhook (#1187)
* feature: use kvcache webhook

Signed-off-by: googs1025 <googs1025@gmail.com>

* fix integration validating test error

Signed-off-by: googs1025 <googs1025@gmail.com>

* add mutating webhook integration test for kvcache

Signed-off-by: googs1025 <googs1025@gmail.com>

---------

Signed-off-by: googs1025 <googs1025@gmail.com>
2025-07-03 16:55:30 +08:00
Haiyang Shi e79fcb216a
[Lint] KVCache uses pre-commit lint (#1243)
Change to a pre-commit approach for lint, format, and other checks. It
makes it easier to add new lints (e.g., the clang-format lint that will
be used for C++ and CUDA code).

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-03 16:22:58 +08:00
yyzxw e1918a2be4
[Docs]refactor: change architecture to stand-alone directories (#1236)
refactor: change architecture to stand-alone directories

Signed-off-by: zxw <1020938856@qq.com>
2025-07-02 11:07:37 +08:00
yyzxw 91032d9535
[Docs]fix: example docs error (#1237)
fix: example docs error

Signed-off-by: zxw <1020938856@qq.com>
2025-07-02 10:14:10 +08:00
Jiaxin Shan 863dae015d
Support standalone stormservice deployment (#1239)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-01 17:54:46 +08:00
Jiaxin Shan 90d1d2152f
[Misc] Fix storm service rbac issue (#1235)
Fix storm service rbac issue

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-01 15:56:30 +08:00
Jingyuan cf64e51ba2
[Misc] SLO-aware router with profile support (#1192)
* Combined feature/load_aware_routing changes
* Improve comments
* Typo Fix
* Improve comments
* Improve log overhead
* Amend comment for GPU profile struct and remove unnecessary tests.
* Add cache.InitForTest to take over the previous behavior of cache.NewForTest. cache.NewForTest no longer changes the global store and is stateless, as expected.

---------

Signed-off-by: Jingyuan Zhang <jingyuan.zhang0929@bytedance.com>
Co-authored-by: Jingyuan Zhang <jingyuan.zhang0929@bytedance.com>
2025-07-01 09:47:17 +08:00
TimWang 8e0111fafc
[Docs]: add advanced kubernetes deployment examples (#1230)
* docs(installation): add advanced kubernetes deployment examples

Add comprehensive documentation for deploying AIBrix on Kubernetes with
persistent model caching, security policies, and monitoring integration.
Includes YAML examples for namespace, deployment, service, and PVC
configuration with detailed usage notes.

Signed-off-by: haitwang-cloud <haitao_wht@outlook.com>

* docs: update k8s examples and gateway plugins documentation

- Adjust pod security policies from privileged to baseline/restricted
- Update probe timeouts and delays for better reliability
- Add startup probe configuration
- Fix code block formatting in gateway plugins docs
- Update feature list and usage notes to reflect changes

Signed-off-by: haitwang-cloud <haitao_wht@outlook.com>

---------

Signed-off-by: haitwang-cloud <haitao_wht@outlook.com>
2025-06-26 21:30:00 +08:00
Haiyang Shi fa39b73c9b
[Refactor] New memory layout for AIBrix KVCache (#1174)
* [Refactor] New memory layout for AIBrix KVCache

- Legacy layout embedded tokens directly in the key, which could result in very long keys
  for cache blocks with long prefixes
- New layout uses hash as the key and stores tokens as part of the value
- L2Cache get operation now uses hash key and verifies token match after
  retrieving value

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>

* [Integration] Enable new memory layout in vLLM v0.8.5

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>

* [Chore] Add hpkv dependency

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>

* [Fix] Fix typing errors with python3.11

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>

* [Fix] Fix BaseKVCacheManager

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>

* [Chore] Optimize L2Cache tokens comparison

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>

* [Feature] KVCache layout: compact layout

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>

---------

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-06-25 19:01:27 -07:00
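The layout change described in #1174 (hash as the key, tokens stored with the value, token match verified on get) can be sketched with an in-memory dictionary. This is an illustrative sketch only: the class name `HashKeyedCache`, the `key_fn` parameter, and the SHA-256 key are assumptions, not the AIBrix L2Cache API.

```python
import hashlib
from typing import Callable, Dict, Optional, Sequence, Tuple


def default_key(tokens: Sequence[int]) -> bytes:
    """Fixed-length key, independent of how long the token prefix is."""
    return hashlib.sha256(repr(tuple(tokens)).encode()).digest()


class HashKeyedCache:
    """Stores (tokens, payload) under a hash key and verifies tokens on get."""

    def __init__(self, key_fn: Callable[[Sequence[int]], bytes] = default_key) -> None:
        self._key_fn = key_fn
        self._store: Dict[bytes, Tuple[Tuple[int, ...], bytes]] = {}

    def put(self, tokens: Sequence[int], payload: bytes) -> None:
        # Tokens travel with the value so a later get can verify them.
        self._store[self._key_fn(tokens)] = (tuple(tokens), payload)

    def get(self, tokens: Sequence[int]) -> Optional[bytes]:
        entry = self._store.get(self._key_fn(tokens))
        if entry is None:
            return None
        stored_tokens, payload = entry
        if stored_tokens != tuple(tokens):
            return None  # same key but different tokens: treat as a miss
        return payload
```

Keys stay short no matter how long the prefix is, and the token comparison after retrieval guards against returning the wrong block on a hash collision.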
ModiCodeCraftsman 4842c7d43f
Add new test cases for gateway server (#1217)
Signed-off-by: Modi Tamam <modi.tamam@gmail.com>
2025-06-25 17:22:00 -07:00
Jiaxin Shan e3bb459aff
Add RoleSet and StormService controller implementation (#1229)
* Add RoleSet and StormService implementation
* Update existing comments to English
* Add license header
* Adjust package import sequence
* Add controller registration logic
* Refactor the constants and remove duplication
* Disable PodGroup temporarily
* Add headless service for each storm service
* Add missing constant file
* Address linter errors
* Address code review feedback from gemini
* Add unit test and improve coverage

---------

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-06-25 21:37:52 +08:00
Jiaxin Shan 8fd23efae5
Add RoleSet and StormService detail spec (#1226)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-06-25 08:19:05 +08:00
Le Xu 1b0ec4ca4d
Adding callback patterns for generator client (#993)
* adding futures wip
* merge fix
* update client async function
* roll back to async io design
* Remove client pool from parameters
* analysis script update and timing bug fix

---------

Signed-off-by: Le Xu <le.xu@bytedance.com>
Co-authored-by: Le Xu <le.xu@bytedance.com>
2025-06-25 08:04:23 +08:00
Venkat Raman 80dcc997ba
[FEATURE]: metrics server support for gateway plugins & dashboard (#1211)
Signed-off-by: Venkat Raman <vraman2811@gmail.com>
2025-06-24 15:49:49 -07:00
CYJiang d63669da5d
[Misc]: ensure cache sync before starting controller reconcile (#1219)
Ensure cache sync before starting controller reconcile

Signed-off-by: googs1025 <googs1025@gmail.com>
2025-06-24 19:06:48 +08:00
Jiaxin Shan bacdc71c8d
[PD] Add RoleSet and StormService API skeleton for disaggregation orchestration (#1209)
* Add StormService API
* Update StormService to adapt AIBrix structure
* Add RoleSet API
* Update RoleSet to adapt AIBrix structure

---------

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-06-24 10:43:32 +08:00
Jiaxin Shan 5dfad68c2b
[Doc] Add maintainer guidelines and contributor promotion criteria (#1224)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-06-24 06:26:55 +08:00
Venkat Raman 782e8998e3
[FIX]: vtc-basic router constructor config init, enable e2e tests & add benchmark results only (#1222)
* [FIX]: vtc-basic router constructor config init, enable e2e tests & add benchmark results only

Signed-off-by: Venkat Raman <vraman2811@gmail.com>

* feat: address review comment

Signed-off-by: Venkat Raman <vraman2811@gmail.com>

---------

Signed-off-by: Venkat Raman <vraman2811@gmail.com>
2025-06-24 06:20:14 +08:00
Bao Jiangnan 49e51f5c46
fix prefix hash incorrect sometimes (#1218)
2025-06-22 16:21:51 -07:00