Le Xu
418ffc375c
Adding help flag to benchmark script ( #1302 )
...
* adding help flag to benchmark script
---------
Signed-off-by: Le Xu <le.xu@bytedance.com>
Co-authored-by: Le Xu <le.xu@bytedance.com>
2025-07-19 14:52:33 +08:00
Jiaxin Shan
2a40822624
[Docs] Update stormservice docs and link to index page ( #1299 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-18 09:14:18 +08:00
Yuchen Cheng
71772156ba
[CI] Disable docker push images workflow in forked repositories ( #1301 )
...
Signed-off-by: rudeigerc <rudeigerc@gmail.com>
2025-07-18 08:55:02 +08:00
Haiyang Shi
dfe578aa6b
[Fix] KVCache: enhance status ( #1304 )
...
Not capturing assertion errors
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-18 08:50:49 +08:00
firebook
24760b200c
[Bug] fix incorrect request count ( #1246 )
...
* [Bug] fix incorrect request count
Signed-off-by: firebook <dzzf@163.com>
* add constants and add UT
Signed-off-by: firebook <dzzf@163.com>
* fix lint check
Signed-off-by: firebook <dzzf@163.com>
---------
Signed-off-by: firebook <dzzf@163.com>
Co-authored-by: Jingyuan <zhangjyr@gmail.com>
2025-07-17 11:32:27 -07:00
vie-serendipity
b48d6d44ff
[Misc] Add unit test code coverage of rolesyncer ( #1296 )
...
test: add ut for StatelessRoleSyncer Scale and Rollout
Signed-off-by: vie-serendipity <2733147505@qq.com>
2025-07-17 19:28:19 +08:00
vie-serendipity
c1a754a369
[Misc] extract hashfunc as a field to allow injection ( #1297 )
...
feat: extract hash func as a field to allow injection for testing
Signed-off-by: vie-serendipity <2733147505@qq.com>
2025-07-17 18:06:21 +08:00
Haiyang Shi
d7c378bbb3
[Integration] vLLM V1 Connector integration ( #1295 )
...
- add vLLM V1 connector integration
- update vLLM V0 connector integration
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-17 13:15:54 +08:00
Jiaxin Shan
e07132d376
Improve UT coverage for stormservice controller ( #1283 )
...
* Improve UT coverage for stormservice controller
* Address code review feedback
---------
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-16 15:18:30 +08:00
Haiyang Shi
26758a539c
[Fix] KVCache: fix requirements ( #1294 )
...
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-15 22:25:58 -07:00
rongzhi
8ccd354e1e
[Docs] Add documentation for StormService ( #1285 )
...
* Add documentation for StormService
* Fix typo
* Add code link and fix font alignment
---------
Signed-off-by: Rongzhi Li <e1088149@u.nus.edu>
2025-07-16 05:59:44 +08:00
Jiaxin Shan
39691b4fc4
[CI] Support custom IMAGE_TAG to override build tags ( #1274 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-16 05:57:46 +08:00
Haiyang Shi
9755d6b411
[Feature] KVCache: optimize allocator for compact layout ( #1288 )
...
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-16 05:27:14 +08:00
Haiyang Shi
57d28a5af0
[Feature] KVCache: optimize token list iteration and key building ( #1287 )
...
[Feature] KVCache: optimize token list iteration
- Introduce TokenListView to avoid list slicing, which involves a
memory copy
- Memoize key-building results to avoid duplicate computation
- Refine key builder benchmarks and change the default key builder to
RollingHashKeyBuilder, which delivers the best performance among all
built-in key builders according to the benchmark results
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-15 15:19:24 +08:00
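The view idea in the commit above (#1287) can be illustrated with a minimal sketch. This is not the project's actual TokenListView; the class body, the `subview` method, and all parameter names here are assumptions made for illustration. It only shows how a window over a token list can be narrowed and iterated without copying the underlying data, which is what list slicing would do.

```python
from typing import Iterator, Optional, Sequence


class TokenListView:
    """Read-only window over a token sequence (hypothetical sketch)."""

    def __init__(self, tokens: Sequence[int], start: int = 0,
                 stop: Optional[int] = None) -> None:
        # Keep a reference to the original sequence; nothing is copied.
        self._tokens = tokens
        self._start = start
        self._stop = len(tokens) if stop is None else stop

    def __len__(self) -> int:
        return self._stop - self._start

    def __getitem__(self, index: int) -> int:
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._tokens[self._start + index]

    def __iter__(self) -> Iterator[int]:
        for i in range(self._start, self._stop):
            yield self._tokens[i]

    def subview(self, start: int, stop: int) -> "TokenListView":
        # Narrow the window by adjusting offsets only (assumed helper name).
        return TokenListView(self._tokens, self._start + start,
                             self._start + stop)
```

Creating a subview costs constant time regardless of its length, whereas `tokens[a:b]` allocates a new list of `b - a` elements each time.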
Haiyang Shi
3596787c36
[Fix] KVCache: change cuda kernel's namespace ( #1286 )
...
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-15 08:57:54 +08:00
Haiyang Shi
49d1c7231f
[Feature] KVCache: enhance profiling ( #1278 )
...
- add pyroscope based cpu profiling
- add env vars to control both cpu profiling and nvtx
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-14 09:37:30 -07:00
Haiyang Shi
9017cca28f
[Fix] KVCache: enhance rdma auto-detection ( #1276 )
...
Fall back to not using rdma auto-detection if pyverbs is not available
on the runtime platform.
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-14 22:10:09 +08:00
Jiaxin Shan
29d86d7fa8
[Misc] Support role replica index in pod labels ( #1280 )
...
Support role replica index in pod labels
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-14 11:41:12 +08:00
Omer Aplatony
72a09c44a4
[Misc] Improve the unit test coverage of stormservice controller ( #1282 )
...
Improve the unit test coverage of stormservice controller
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2025-07-14 10:07:47 +08:00
Haiyang Shi
b2adc9201e
[Doc] KVCache: add section for env vars ( #1279 )
...
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-13 09:30:16 +08:00
Nicole LiHui
47dffef991
[Docs] fix incorrect file paths after reorganization ( #1271 )
...
fix file paths after reorganizing the docs/development and docs/tutorial structure for PR #455
Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
2025-07-10 09:00:34 +08:00
Nicole LiHui
1710af5a14
[Docs]fix: observability docs dashboard link 404 ( #1270 )
...
Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
2025-07-09 22:55:54 +08:00
Jiaxin Shan
cec057a46c
Support /scale sub resource for replica mode ( #1259 )
...
Support /scale sub resource for replica mode autoscaling
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-09 18:19:55 +08:00
CYJiang
6cb7846dd4
[Misc]: add ROLE_TEMPLATE_HASH info to container env ( #1268 )
...
add ROLE_TEMPLATE_HASH info to container env
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-07-09 11:28:54 +08:00
ZHENYU
c746ef753a
[Bug] Optimize prefix cache hashing to O(N) via block-hash same as vllm ( #1262 )
...
* [Fix&Optimization] optimize prefix cache hashing to O(N) via block-hash chaining same as vllm
* Fix: further optimize prefix cache hashing
---------
Signed-off-by: ae86zhizhi <550149470@qq.com>
2025-07-09 07:03:08 +08:00
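Block-hash chaining, as named in the commit above (#1262), can be sketched as follows. This is an illustration only, not the code from that change: the function name, the use of SHA-256, and the token serialization are all assumptions. The point is that each block's hash is derived from the previous block's hash plus the block's own tokens, so hashing every prefix of N tokens takes O(N) work in total instead of rehashing each prefix from the start.

```python
import hashlib
from typing import List, Sequence


def chained_block_hashes(tokens: Sequence[int], block_size: int) -> List[str]:
    """Return one hash per full block; each hash covers the whole prefix."""
    hashes: List[str] = []
    prev = b""  # digest of the previous block (empty for the first block)
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        h = hashlib.sha256()  # assumed hash function, chosen for illustration
        h.update(prev)
        h.update(b",".join(str(t).encode() for t in block))
        prev = h.digest()
        hashes.append(prev.hex())
    return hashes
```

Two requests that share their first k blocks produce identical hashes for those k blocks, and a block's hash changes whenever anything before it changes, which is the property a prefix cache lookup relies on.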
Jiaxin Shan
11eb7e9635
[Misc] Use domain-qualified finalizer name ( #1258 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-07 13:48:30 +08:00
Jiaxin Shan
e13e387c19
Ignore StormService NotFound error during deletion ( #1257 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-07 13:40:23 +08:00
Jiaxin Shan
8fb3541d5e
[docs] Move aibrix component design doc to separate architecture folder ( #1250 )
...
* Refactor the kvcache offloading docs
* Update AI Engine Runtime docs
* Polish engine runtime and router docs
* Refactor autoscaler docs
---------
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-07 10:57:31 +08:00
Jiaxin Shan
788ebf158d
Set Storm Service default update strategy ( #1256 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-06 20:42:59 +08:00
Jiaxin Shan
31a3a4e1de
Update stormservice controller DefaultRequeueAfter to 15s ( #1253 )
...
Change DefaultRequeueAfter to 15s
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-06 17:37:18 +08:00
Haiyang Shi
a193a4c32a
[Feature] kvcache cuda kernel ( #1247 )
...
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-05 17:51:57 +08:00
CYJiang
2d8de1ddfa
[Misc] feature: use kvcache webhook ( #1187 )
...
* feature: use kvcache webhook
Signed-off-by: googs1025 <googs1025@gmail.com>
* fix integration validating test error
Signed-off-by: googs1025 <googs1025@gmail.com>
* add mutating webhook integration test for kvcache
Signed-off-by: googs1025 <googs1025@gmail.com>
---------
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-07-03 16:55:30 +08:00
Haiyang Shi
e79fcb216a
[Lint] KVCache uses pre-commit lint ( #1243 )
...
Change to the pre-commit approach for lint, format and other checks. This
makes it easier to add new lints (e.g., the clang-format lint that will
be used for C++ and CUDA code).
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-07-03 16:22:58 +08:00
yyzxw
e1918a2be4
[Docs]refactor: change architecture to stand-alone directories ( #1236 )
...
refactor: change architecture to stand-alone directories
Signed-off-by: zxw <1020938856@qq.com>
2025-07-02 11:07:37 +08:00
yyzxw
91032d9535
[Docs]fix: example docs error ( #1237 )
...
fix: example docs error
Signed-off-by: zxw <1020938856@qq.com>
2025-07-02 10:14:10 +08:00
Jiaxin Shan
863dae015d
Support standalone stormservice deployment ( #1239 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-01 17:54:46 +08:00
Jiaxin Shan
90d1d2152f
[Misc] Fix storm service rbac issue ( #1235 )
...
Fix storm service rbac issue
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-07-01 15:56:30 +08:00
Jingyuan
cf64e51ba2
[Misc] SLO-aware router with profile support ( #1192 )
...
* Combined feature/load_aware_routing changes
* Improve comments
* Typo Fix
* Improve comments
* Improve log overhead
* Amend comment for GPU profile struct and remove unnecessary tests.
* Add cache.InitForTest to replace the previous cache.NewForTest. cache.NewForTest now leaves the global store unchanged and is stateless, as expected.
---------
Signed-off-by: Jingyuan Zhang <jingyuan.zhang0929@bytedance.com>
Co-authored-by: Jingyuan Zhang <jingyuan.zhang0929@bytedance.com>
2025-07-01 09:47:17 +08:00
TimWang
8e0111fafc
[Docs]: add advanced kubernetes deployment examples ( #1230 )
...
* docs(installation): add advanced kubernetes deployment examples
Add comprehensive documentation for deploying AIBrix on Kubernetes with
persistent model caching, security policies, and monitoring integration.
Includes YAML examples for namespace, deployment, service, and PVC
configuration with detailed usage notes.
Signed-off-by: haitwang-cloud <haitao_wht@outlook.com>
* docs: update k8s examples and gateway plugins documentation
- Adjust pod security policies from privileged to baseline/restricted
- Update probe timeouts and delays for better reliability
- Add startup probe configuration
- Fix code block formatting in gateway plugins docs
- Update feature list and usage notes to reflect changes
Signed-off-by: haitwang-cloud <haitao_wht@outlook.com>
---------
Signed-off-by: haitwang-cloud <haitao_wht@outlook.com>
2025-06-26 21:30:00 +08:00
Haiyang Shi
fa39b73c9b
[Refactor] New memory layout for AIBrix KVCache ( #1174 )
...
* [Refactor] New memory layout for AIBrix KVCache
- Legacy layout embedded tokens directly in the key, which could result in very long keys
for cache blocks with long prefixes
- New layout uses hash as the key and stores tokens as part of the value
- L2Cache get operation now uses hash key and verifies token match after
retrieving value
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
* [Integration] Enable new memory layout in vLLM v0.8.5
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
* [Chore] Add hpkv dependency
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
* [Fix] Fix typing errors with python3.11
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
* [Fix] Fix BaseKVCacheManager
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
* [Chore] Optimize L2Cache tokens comparison
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
* [Feature] KVCache layout: compact layout
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
---------
Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
Co-authored-by: Haiyang Shi <haiyang.shi@bytedance.com>
2025-06-25 19:01:27 -07:00
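The layout described in the commit above (#1174) uses a hash as the key, stores the tokens in the value, and verifies them on get. A minimal sketch of that pattern follows. It is not the AIBrix L2Cache implementation: the class name, the `key_fn` parameter, and the SHA-256 default are hypothetical and exist only to make the pattern concrete.

```python
import hashlib
from typing import Callable, Dict, Optional, Sequence, Tuple


def _sha256_key(tokens: Sequence[int]) -> str:
    # Fixed-length key, however long the token prefix is.
    return hashlib.sha256(",".join(map(str, tokens)).encode()).hexdigest()


class HashKeyedCache:
    """Toy cache keyed by a token hash, with tokens kept in the value."""

    def __init__(self, key_fn: Callable[[Sequence[int]], str] = _sha256_key) -> None:
        self._key_fn = key_fn
        self._store: Dict[str, Tuple[Tuple[int, ...], bytes]] = {}

    def put(self, tokens: Sequence[int], payload: bytes) -> None:
        # Store the tokens next to the payload so that get() can verify them.
        self._store[self._key_fn(tokens)] = (tuple(tokens), payload)

    def get(self, tokens: Sequence[int]) -> Optional[bytes]:
        entry = self._store.get(self._key_fn(tokens))
        if entry is None:
            return None
        stored_tokens, payload = entry
        # A hash collision would map different tokens to the same key,
        # so compare the stored tokens before trusting the payload.
        if stored_tokens != tuple(tokens):
            return None
        return payload
```

Passing a deliberately colliding `key_fn` (for example one that always returns the same string) shows the verification step rejecting a lookup for tokens that were never stored.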
ModiCodeCraftsman
4842c7d43f
Add new test cases for gateway server ( #1217 )
...
Signed-off-by: Modi Tamam <modi.tamam@gmail.com>
2025-06-25 17:22:00 -07:00
Jiaxin Shan
e3bb459aff
Add RoleSet and StormService controller implementation ( #1229 )
...
* Add RoleSet and StormService implementation
* Update existing comments to English
* Add license header
* Adjust package import sequence
* Add controller registration logic
* Refactor the constants and remove duplication
* Disable PodGroup temporarily
* Add headless service for each storm service
* Add missing constant file
* Address linter errors
* Address code review feedback from gemini
* Add unit test and improve coverage
---------
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-06-25 21:37:52 +08:00
Jiaxin Shan
8fd23efae5
Add RoleSet and StormService detail spec ( #1226 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-06-25 08:19:05 +08:00
Le Xu
1b0ec4ca4d
Adding callback patterns for generator client ( #993 )
...
* adding futures wip
* merge fix
* update client async function
* roll back to async io design
* Remove client pool from parameters
* analysis script update and timing bug fix
---------
Signed-off-by: Le Xu <le.xu@bytedance.com>
Co-authored-by: Le Xu <le.xu@bytedance.com>
2025-06-25 08:04:23 +08:00
Venkat Raman
80dcc997ba
[FEATURE]: metrics server support for gateway plugins & dashboard ( #1211 )
...
Signed-off-by: Venkat Raman <vraman2811@gmail.com>
2025-06-24 15:49:49 -07:00
CYJiang
d63669da5d
[Misc]: ensure cache sync before starting controller reconcile ( #1219 )
...
Ensure cache sync before starting controller reconcile
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-06-24 19:06:48 +08:00
Jiaxin Shan
bacdc71c8d
[PD] Add RoleSet and StormService API skeleton for disaggregation orchestration ( #1209 )
...
* Add StormService API
* Update StormService to adapt AIBrix structure
* Add RoleSet API
* Update RoleSet to adapt AIBrix structure
---------
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-06-24 10:43:32 +08:00
Jiaxin Shan
5dfad68c2b
[Doc] Add maintainer guidelines and contributor promotion criteria ( #1224 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2025-06-24 06:26:55 +08:00
Venkat Raman
782e8998e3
[FIX]: vtc-basic router constructor config init, enable e2e tests & add benchmark results only ( #1222 )
...
* [FIX]: vtc-basic router constructor config init, enable e2e tests & add benchmark results only
Signed-off-by: Venkat Raman <vraman2811@gmail.com>
* feat: address review comment
Signed-off-by: Venkat Raman <vraman2811@gmail.com>
---------
Signed-off-by: Venkat Raman <vraman2811@gmail.com>
2025-06-24 06:20:14 +08:00
Bao Jiangnan
49e51f5c46
Fix prefix hash that is sometimes incorrect ( #1218 )
2025-06-22 16:21:51 -07:00