Chen Zhang
|
a89209b78d
|
[v1] Support mamba2 (#19327)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-18 20:34:15 +00:00 |
lkchen
|
d4629dc43f
|
[Misc] Add __str__ for RequestStatus (#19780)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-06-18 03:03:01 +00:00 |
Isotr0py
|
1173804dca
|
[Bugfix] Fix TP inference for Flex attention backend (#19657)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-16 11:21:37 +00:00 |
Chengji Yao
|
a77aea59fd
|
[TPU] support attention head dim smaller than 128 (#19620)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-06-16 06:40:53 +00:00 |
Isotr0py
|
2db9044ab6
|
[Bugfix] Fix auto dtype casting for BatchFeature (#19316)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-06-14 15:13:08 +00:00 |
jmswen
|
c9280e6346
|
[Bugfix] Respect num-gpu-blocks-override in v1 (#19503)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-12 11:00:23 +00:00 |
Nick Hill
|
d5bdf899e4
|
[BugFix] Work-around incremental detokenization edge case error (#19449)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-12 06:43:20 +00:00 |
Ning Xie
|
2f1c19b245
|
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-11 19:57:10 -07:00 |
leopardracer
|
7c644ab6d5
|
Fix Typo in Documentation and Function Name (#19442)
|
2025-06-10 22:44:11 -07:00 |
Isotr0py
|
5f1ac1e1d1
|
Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404)
|
2025-06-10 01:30:20 -07:00 |
Nick Hill
|
646d62f636
|
[Core] Use tuple for kv cache group block ids (#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-10 07:01:17 +02:00 |
Siyuan Liu
|
7d44c469fe
|
[TPU]Fix KV cache sharing tests (#19371)
|
2025-06-09 18:38:15 -04:00 |
Isotr0py
|
b8089195b4
|
[v1] Add fp32 support to v1 engine through flex attn (#19319)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-06-09 22:10:44 +08:00 |
Luka Govedič
|
2d8476e465
|
[BugFix][V1] Fix memory profiling bug (#18974)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-07 10:34:51 -07:00 |
Nick Hill
|
46ecc57973
|
[BugFix] Fix tpu_model_runner block_id concatenation (#19228)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-06 16:28:17 -07:00 |
Adolfo Victoria
|
ca27f0f9c1
|
[Bugfix][Core] Update cancellation logic in `generate()` to handle Generator exits (#19225)
Co-authored-by: Adolfo Victoria <adovi@meta.com>
|
2025-06-06 20:17:54 +00:00 |
Nick Hill
|
aad30bd306
|
[BugFix] Fix MultiConnector test after HMA changes (#19291)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-06 20:16:24 +00:00 |
jmswen
|
7353492a47
|
[Core] Raise when non-multi-instance DP clients target a DP rank (#19227)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-06 19:03:01 +08:00 |
Chen Zhang
|
f8a1a2d108
|
[v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-05 20:47:09 -07:00 |
Benjamin Chislett
|
3465b87ef8
|
[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-06-05 19:10:08 -07:00 |
Robert Shaw
|
c56ed8bb0e
|
[Bugfix][Nixl] Fix full prefix cache hit bug (#18632)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-05 02:07:32 +00:00 |
Nicolò Lucchesi
|
b2fac67130
|
[P/D] Heterogeneous TP (#18833)
Signed-off-by: nicklucche <nlucches@redhat.com>
|
2025-06-04 23:25:34 +00:00 |
Siyuan Liu
|
7ee2590478
|
[TPU] Update dynamo dump file name in compilation test (#19108)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-06-04 16:13:43 -04:00 |
jmswen
|
c8dcc15921
|
Allow AsyncLLMEngine.generate to target a specific DP rank (#19102)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-04 08:26:47 -07:00 |
Seiji Eicher
|
2669a0d7b5
|
Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-06-04 17:10:45 +08:00 |
Siyuan Liu
|
8e972d9c44
|
[TPU] Skip hanging tests (#19115)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-06-04 01:43:00 -07:00 |
Yan Ru Pei
|
b712be98c7
|
feat: add data parallel rank to KVEventBatch (#18925)
|
2025-06-03 17:14:20 -07:00 |
Chen Zhang
|
a8da78eac9
|
[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-04 00:14:06 +00:00 |
Chen Zhang
|
b5fd9506c1
|
[Bugfix] get_num_blocks_to_allocate with null_block (#19031)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 15:30:55 -07:00 |
Chen Zhang
|
6cac54f4d1
|
[v1] Re-init input batch for multiple kv cache groups (#18654)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 21:41:36 +00:00 |
Yong Hoon Shin
|
bdf13965ab
|
[V1] Support cross-layer KV sharing (#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-06-03 20:33:07 +00:00 |
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
Chen Zhang
|
f32fcd9444
|
[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 08:01:48 +00:00 |
Rui Qiao
|
bdce64f236
|
[V1] Support DP with Ray (#18779)
|
2025-06-02 21:15:13 -07:00 |
Siyuan Liu
|
9112b443a0
|
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-03 00:06:20 +00:00 |
22quinn
|
9760fd8f6a
|
[Core] Support inplace model weights loading (#18745)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-02 17:38:50 +08:00 |
Nick Hill
|
2dbe8c0774
|
[Perf] API-server scaleout with many-to-many server-engine comms (#17546)
|
2025-05-30 08:17:00 -07:00 |
Carol Zheng
|
fba02e3bd1
|
[Bugfix][TPU] Fix tpu model runner testcase failure (#18810)
Signed-off-by: Carol Zheng <cazheng@google.com>
|
2025-05-30 18:04:03 +08:00 |
Nick Hill
|
d1d61f3351
|
[BugFix] Make DP work with connector-delayed new requests (#18559)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Will Eaton <weaton@redhat.com>
|
2025-05-29 18:04:18 +00:00 |
Nicolò Lucchesi
|
32ce3cf7c9
|
[V1] Allocate kv_cache with stride order for V1 (#18775)
Signed-off-by: nicklucche <nlucches@redhat.com>
|
2025-05-29 17:54:16 +00:00 |
Mark McLoughlin
|
06a0338015
|
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-27 09:37:06 +00:00 |
qizixi
|
c1e4a4052d
|
[V1][Spec Decode] Support multi-layer eagle draft model (#18030)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-05-24 09:45:34 +00:00 |
qizixi
|
d55e446d13
|
[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (#18424)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-05-24 06:51:22 +00:00 |
Robert Shaw
|
2b10ba7491
|
[Bugfix][Nixl] Fix Preemption Bug (#18631)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-05-23 23:30:16 +00:00 |
Feng XiaoLong
|
4fc1bf813a
|
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
|
2025-05-23 16:16:26 -07:00 |
Chen Zhang
|
6550114c9c
|
[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-05-23 09:39:47 -07:00 |
Chauncey
|
b046cf792d
|
[Feature][V1]: supports cached_tokens in response usage (#18149)
Co-authored-by: simon-mo <xmo@berkeley.edu>
|
2025-05-23 01:41:03 -07:00 |
lkchen
|
e44d8ce8c7
|
[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-05-23 02:54:42 +00:00 |
Mark McLoughlin
|
c6b636f9fb
|
[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-23 02:05:44 +00:00 |
rasmith
|
46791e1b4b
|
[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-05-22 18:45:35 -07:00 |