Commit Graph

283 Commits

Author SHA1 Message Date
Chen Zhang a89209b78d
[v1] Support mamba2 (#19327)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-18 20:34:15 +00:00
lkchen d4629dc43f
[Misc] Add __str__ for RequestStatus (#19780)
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-06-18 03:03:01 +00:00
Isotr0py 1173804dca
[Bugfix] Fix TP inference for Flex attention backend (#19657)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-16 11:21:37 +00:00
Chengji Yao a77aea59fd
[TPU] support attention head dim smaller than 128 (#19620)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-06-16 06:40:53 +00:00
Isotr0py 2db9044ab6
[Bugfix] Fix auto dtype casting for BatchFeature (#19316)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-06-14 15:13:08 +00:00
jmswen c9280e6346
[Bugfix] Respect num-gpu-blocks-override in v1 (#19503)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
2025-06-12 11:00:23 +00:00
Nick Hill d5bdf899e4
[BugFix] Work-around incremental detokenization edge case error (#19449)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-12 06:43:20 +00:00
Ning Xie 2f1c19b245
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-11 19:57:10 -07:00
leopardracer 7c644ab6d5
Fix Typo in Documentation and Function Name (#19442)
2025-06-10 22:44:11 -07:00
Isotr0py 5f1ac1e1d1
Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404) 2025-06-10 01:30:20 -07:00
Nick Hill 646d62f636
[Core] Use tuple for kv cache group block ids (#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-10 07:01:17 +02:00
Siyuan Liu 7d44c469fe
[TPU]Fix KV cache sharing tests (#19371)
2025-06-09 18:38:15 -04:00
Isotr0py b8089195b4
[v1] Add fp32 support to v1 engine through flex attn (#19319)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-06-09 22:10:44 +08:00
Luka Govedič 2d8476e465
[BugFix][V1] Fix memory profiling bug (#18974)
Signed-off-by: luka <luka@neuralmagic.com>
2025-06-07 10:34:51 -07:00
Nick Hill 46ecc57973
[BugFix] Fix tpu_model_runner block_id concatenation (#19228)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-06 16:28:17 -07:00
Adolfo Victoria ca27f0f9c1
[Bugfix][Core] Update cancellation logic in `generate()` to handle Generator exits (#19225)
Co-authored-by: Adolfo Victoria <adovi@meta.com>
2025-06-06 20:17:54 +00:00
Nick Hill aad30bd306
[BugFix] Fix MultiConnector test after HMA changes (#19291)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-06 20:16:24 +00:00
jmswen 7353492a47
[Core] Raise when non-multi-instance DP clients target a DP rank (#19227)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
2025-06-06 19:03:01 +08:00
Chen Zhang f8a1a2d108
[v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-05 20:47:09 -07:00
Benjamin Chislett 3465b87ef8
[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-06-05 19:10:08 -07:00
Robert Shaw c56ed8bb0e
[Bugfix][Nixl] Fix full prefix cache hit bug (#18632)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-06-05 02:07:32 +00:00
Nicolò Lucchesi b2fac67130
[P/D] Heterogeneous TP (#18833)
Signed-off-by: nicklucche <nlucches@redhat.com>
2025-06-04 23:25:34 +00:00
Siyuan Liu 7ee2590478
[TPU] Update dynamo dump file name in compilation test (#19108)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-06-04 16:13:43 -04:00
jmswen c8dcc15921
Allow AsyncLLMEngine.generate to target a specific DP rank (#19102)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
2025-06-04 08:26:47 -07:00
Seiji Eicher 2669a0d7b5
Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-06-04 17:10:45 +08:00
Siyuan Liu 8e972d9c44
[TPU] Skip hanging tests (#19115)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-06-04 01:43:00 -07:00
Yan Ru Pei b712be98c7
feat: add data parallel rank to KVEventBatch (#18925)
2025-06-03 17:14:20 -07:00
Chen Zhang a8da78eac9
[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-04 00:14:06 +00:00
Chen Zhang b5fd9506c1
[Bugfix] get_num_blocks_to_allocate with null_block (#19031)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-03 15:30:55 -07:00
Chen Zhang 6cac54f4d1
[v1] Re-init input batch for multiple kv cache groups (#18654)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-03 21:41:36 +00:00
Yong Hoon Shin bdf13965ab
[V1] Support cross-layer KV sharing (#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-06-03 20:33:07 +00:00
Simon Mo 02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Chen Zhang f32fcd9444
[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-03 08:01:48 +00:00
Rui Qiao bdce64f236
[V1] Support DP with Ray (#18779)
2025-06-02 21:15:13 -07:00
Siyuan Liu 9112b443a0
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-06-03 00:06:20 +00:00
22quinn 9760fd8f6a
[Core] Support inplace model weights loading (#18745)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-02 17:38:50 +08:00
Nick Hill 2dbe8c0774
[Perf] API-server scaleout with many-to-many server-engine comms (#17546)
2025-05-30 08:17:00 -07:00
Carol Zheng fba02e3bd1
[Bugfix][TPU] Fix tpu model runner testcase failure (#18810)
Signed-off-by: Carol Zheng <cazheng@google.com>
2025-05-30 18:04:03 +08:00
Nick Hill d1d61f3351
[BugFix] Make DP work with connector-delayed new requests (#18559)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Will Eaton <weaton@redhat.com>
2025-05-29 18:04:18 +00:00
Nicolò Lucchesi 32ce3cf7c9
[V1] Allocate kv_cache with stride order for V1 (#18775)
Signed-off-by: nicklucche <nlucches@redhat.com>
2025-05-29 17:54:16 +00:00
Mark McLoughlin 06a0338015
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-27 09:37:06 +00:00
qizixi c1e4a4052d
[V1][Spec Decode] Support multi-layer eagle draft model (#18030)
Signed-off-by: qizixi <qizixi@meta.com>
2025-05-24 09:45:34 +00:00
qizixi d55e446d13
[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (#18424)
Signed-off-by: qizixi <qizixi@meta.com>
2025-05-24 06:51:22 +00:00
Robert Shaw 2b10ba7491
[Bugfix][Nixl] Fix Preemption Bug (#18631)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-05-23 23:30:16 +00:00
Feng XiaoLong 4fc1bf813a
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
2025-05-23 16:16:26 -07:00
Chen Zhang 6550114c9c
[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-23 09:39:47 -07:00
Chauncey b046cf792d
[Feature][V1]: supports cached_tokens in response usage (#18149)
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-05-23 01:41:03 -07:00
lkchen e44d8ce8c7
[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576)
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-05-23 02:54:42 +00:00
Mark McLoughlin c6b636f9fb
[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-23 02:05:44 +00:00
rasmith 46791e1b4b
[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2025-05-22 18:45:35 -07:00