vLLM/vllm - vllm - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Trevor Royer	55f1a468d9	Move cli args docs to its own page (#18228) (#18264) Signed-off-by: Trevor Royer <troyer@redhat.com>	2025-05-16 19:43:45 -07:00
Michael Goin	fd195b194e	[V1][P/D] Local attention optimization for NIXL (#18170) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-16 21:16:33 -04:00
Woosuk Kwon	fabe89bbc4	[Spec Decode] Don't fall back to V0 when spec decoding is enabled (#18265)	2025-05-16 16:10:27 -07:00
Jinzhen Lin	e73b7dfd69	[Bugfix] fix `an illegal memory access was encountered` of marlin kernel + act_order (#18245)	2025-05-16 16:02:44 -07:00
Bowen Wang	7fdfa01530	[Sampler] Adapt to FlashInfer 0.2.3 sampler API (#15777) Signed-off-by: Bowen Wang <abmfy@icloud.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-05-16 15:14:03 -07:00
Sanger Steel	aef94c6d07	[CI] Assign reviewer to mergify with changes to Tensorizer files (#18278)	2025-05-16 12:04:14 -07:00
Nick Hill	0ceaebf87b	[BugFix] Fix ordering of KVConnector finished send/rcv sets (#18211) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-16 09:20:54 -07:00
Nick Hill	1db4f47f81	[BugFix] Fix multi async save in MultiConnector (#18246) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-16 08:13:47 -07:00
Reid	d3d91b6f71	[Misc][MacOS] fix bfloat16 error (#18249) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-16 15:05:59 +00:00
learner0810	87d871470d	[Model] Use autoweightloader for dbrx (#18251) Signed-off-by: learner0810 <zhongjun.li@daocloud.io>	2025-05-16 07:54:13 -07:00
fxmarty-amd	a5f8c111c2	[Fix] Fix typo in `resolve_hf_chat_template` (#18259) Signed-off-by: Felix Marty <felmarty@amd.com>	2025-05-16 14:52:41 +00:00
Lain	e23564cb70	use ceil_div in cutlass block scaling shape check (#17918)	2025-05-16 03:02:58 -07:00
Isotr0py	390ec88905	[Misc] Consolidate Audio tests into multimodal common generation tests (#18214) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-16 09:18:08 +00:00
Seiji Eicher	541817670c	[Misc] Add Ray Prometheus logger to V1 (#17925) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-05-16 01:02:42 -07:00
Vadim Gimpelson	67da5720d4	[PERF] Speed up Qwen2.5-VL model by speed up rotary position embedding (#17973) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>	2025-05-15 23:31:02 -07:00
David Xia	5c04bb8b86	[doc] fix multimodal example script (#18089) Signed-off-by: David Xia <david@davidxia.com>	2025-05-16 06:05:34 +00:00
Lucia Fang	3d2779c29a	[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 22:28:27 -07:00
Will Eaton	6b31c84aff	Throw better error for when running into k8s service discovery issue (#18209) Signed-off-by: Will Eaton <weaton@redhat.com>	2025-05-15 21:07:28 -07:00
Harry Mellor	b18201fe06	Allow users to pass arbitrary JSON keys from CLI (#18208) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-15 21:05:34 -07:00
Sky Lee	f4937a51c1	[Model] vLLM v1 supports Medusa (#17956) Signed-off-by: lisiqi23 <lisiqi23@xiaomi.com> Signed-off-by: skylee-01 <497627264@qq.com> Co-authored-by: lisiqi23 <lisiqi23@xiaomi.com>	2025-05-15 21:05:31 -07:00
kliuae	ee659e3b60	[Bugfix][ROCm] Use `chunked_prefill_paged_decode` as fallback for V1 attention on ROCm (#18093) Signed-off-by: kf <kuanfu.liu@embeddedllm.com>	2025-05-15 19:30:17 -07:00
Lucas Wilkinson	4e1c6a0264	[Bugfix] fix rotary embedding test for _get_padded_tensor_shape (#18229) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-16 01:32:45 +00:00
Lucas Wilkinson	c7852a6d9b	[Build] Allow shipping PTX on a per-file basis (#18155) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-15 16:41:55 -07:00
Lucia Fang	8795eb9975	[Bugfix] Fix test_eagle test (#18223) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 15:59:42 -07:00
Alexei-V-Ivanov-AMD	0b34593017	Adding "AMD: Tensorizer Test" to amdproduction. (#18216)	2025-05-15 11:01:25 -07:00
Nicolò Lucchesi	e3f3aee6f4	[Misc] Avoid cuda graph log when sizes still match (#18202) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-05-15 09:59:38 -07:00
TJian	92540529c0	[Bugfix] [ROCm]: Remove assertion logic when using AITER fused moe in unquantizedMethod to reenable LLama4 BF16 (#18205) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-15 09:53:18 -07:00
Zhonghua Deng	fadb8d5c2d	[Bugfix]Change the exception thrown by call_hf_processor from RuntimeError to ValueError (#18181) Signed-off-by: Abatom <abzhonghua@gmail.com>	2025-05-15 09:01:47 -07:00
Sebastian Schoennenbeck	2aa5470ac5	[Frontend] Fix chat template content format detection (#18190) Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>	2025-05-15 09:00:21 -07:00
Harry Mellor	51ff154639	Improve examples rendering in docs and GitHub (#18203) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-15 15:57:49 +00:00
Alexei-V-Ivanov-AMD	566ec04c3d	Adding "Basic Models Test" and "Multi-Modal Models Test (Extended) 3" in AMD Pipeline (#18106) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-15 08:49:23 -07:00
Thomas Parnell	01c22335ba	[Kernel] [V1] Fix performance regression for triton unified attention (#18161) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-15 06:39:00 -07:00
hustxiayang	451da4bcbd	add tools into TokenizeChatRequest (#18187) Signed-off-by: yangxia <yangxiast@gmail.com>	2025-05-15 04:01:49 -07:00
Harry Mellor	07ad27121f	Update deprecated type hinting in `model_loader` (#18130) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-15 04:00:21 -07:00
omahs	a9944aabfa	fix: typos (#18151) Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>	2025-05-15 02:16:15 -07:00
Russell Bryant	a8f5aec20a	[V1] Update zmq socket creation in nixl connector (#18148) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 23:17:57 -07:00
David Xia	de71fec81b	[CI] don't skip fixed `test_kv_cache_events()` (#18183) Signed-off-by: David Xia <david@davidxia.com>	2025-05-14 23:17:16 -07:00
Mengqing Cao	70f8b96724	[Bugfix] Fix FusedMoEPrepareAndFinalize for cuda-disalike backends (#18178) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-14 23:16:31 -07:00
inkcherry	dd2a94596a	[Model] Allow the use of sliding window in Qwen2 (#17772) Signed-off-by: inkcherry <mingzhi.liu@intel.com>	2025-05-14 22:29:38 -07:00
Ning Xie	420caf7557	[UT] Add ut for none hash (#17892) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-15 13:28:11 +08:00
Chenheli Hua	4f07a64075	Support custom implementations of VideoLoader backends. (#18091)	2025-05-15 13:26:49 +08:00
Thomas Parnell	e6b8e65d2d	[Bugfix] Fix fp8 tests for triton_unified_attention for Triton 3.3 (#18013) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-15 13:26:34 +08:00
Harry Mellor	26d0419309	Update deprecated type hinting in `models` (#18132) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-14 22:06:50 -07:00
Luka Govedič	83f74c698f	[Fix][ROCm] Enforce eager for all encoder-decoder models on ROCm (#18154) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-05-14 22:04:43 -07:00
Reid	2dff093574	[Misc] add lobe-chat support (#18177) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-15 05:02:23 +00:00
Aaron Pham	afe3236e90	[Chore] astral's ty (#18116) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-15 05:00:43 +00:00
Mark McLoughlin	65334ef3b9	[V1][Metrics] Remove unused code (#18158) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-14 20:13:17 -07:00
Chen Zhang	e60f550b38	[v1] Support multiple KV cache groups in GPU model runner (#17945) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-14 18:54:54 -07:00
David Xia	f25e0d1125	[Bugfix]: make most of `test_openai_schema.py` pass (#17664)	2025-05-14 17:04:35 -07:00
Andrey Talman	09f106a91e	Upload vllm index for the rc builds (#18173)	2025-05-14 16:35:56 -07:00

6557 Commits