vLLM/vllm - vllm - Gitea

Commit Graph

Author	SHA1	Message	Date
Richard Zou	77f0d465d0	[BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-11 07:54:41 +08:00
Xu Wenqing	22c3c0aa4a	Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 (#19401 ) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>	2025-06-11 07:23:57 +08:00
py-andy-c	33f8dba7c6	[Model] use AutoWeightsLoader for commandr (#19399 ) Signed-off-by: py-andy-c <pychen1017@gmail.com>	2025-06-10 22:42:21 +00:00
Gregory Shtrasberg	5241ca50d6	[ROCm][V1] Adding ROCm to the list of plaforms using V1 by default (#19440 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-06-10 22:06:15 +00:00
Russell Bryant	da9b523ce1	[Docs] Note that alternative structured output backends are supported (#19426 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-06-10 16:20:00 +00:00
Jee Jee Li	b6553be1bc	[Misc] Slight improvement of the BNB (#19418 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-06-10 13:51:49 +00:00
youkaichao	64a9af5afa	Simplify ep kernels installation (#19412 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-06-10 20:06:08 +08:00
Li, Jiang	e4248849ec	[BugFix][CPU] Fix CPU CI by ignore collecting test_pixtral (#19411 ) Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-06-10 12:02:40 +00:00
Rachel Guo	467bef18a3	[BugFix][FlashInfer] Fix attention backend interface mismatch with unexpected keyword `use_irope` (#19134 ) Signed-off-by: Yunqiu Guo <guorachel@meta.com>	2025-06-10 16:48:51 +08:00
Isotr0py	5f1ac1e1d1	Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404 )	2025-06-10 01:30:20 -07:00
Louie Tsai	9368cc90b2	Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node. (#17930 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2025-06-10 06:22:05 +00:00
Anna Pendleton	32b3946bb4	Add clear documentation around the impact of debugging flag (#19369 ) Signed-off-by: Anna Pendleton <pendleton@google.com>	2025-06-10 06:16:09 +00:00
Reid	6b1391ca7e	[Misc] refactor neuron_multimodal and profiling (#19397 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-10 06:12:42 +00:00
Russell Bryant	a3f66e75d1	Add security warning to bug report template (#19365 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-06-10 06:06:36 +00:00
Lukas Geiger	319cb1e351	[Core] Batch multi modal input using pinned memory (#19169 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-06-10 13:44:59 +08:00
Li Wang	1efef71645	[Bugfix] Fix modelscope token passed in (#19389 ) Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-10 13:39:37 +08:00
Nick Hill	646d62f636	[Core] Use tuple for kv cache group block ids (#19175 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-10 07:01:17 +02:00
Reid	6cd4ae8acd	[Frontend] Add tqdm_leave_pbar to control progress bar visibility (#19357 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-10 04:55:09 +00:00
Harry Mellor	c016047ed7	Fix docs/mkdocs/hooks/remove_announcement.py (#19382 )	2025-06-09 21:36:54 -07:00
XiongfeiWei	9af6d22e4c	Use xla flag to improve the quantized model performance (#19303 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-06-10 01:28:45 +00:00
Tianyu Guo	4589b94032	[Bugfix] Fix benchmark_moe.py (#19016 ) Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>	2025-06-09 18:04:36 -07:00
Ye (Charlotte) Qi	cc867be19c	[V1] Reuse V0's memory_profiling util for gpu worker memory profiling (#19312 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-06-10 08:40:01 +08:00
Siyuan Liu	3a7cd627a8	[Misc] Fix a config typo in disable_hybrid_kv_cache_manager configuration (#19383 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-09 16:41:51 -07:00
Pavani Majety	8058c91108	[HOT-FIX] Add `kv_sharing_target_layer_name` argument to cutlass_mla backend (#19374 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-06-09 19:00:07 -04:00
Siyuan Liu	7d44c469fe	[TPU]Fix KV cache sharing tests (#19371 )	2025-06-09 18:38:15 -04:00
liusiqian-tal	31f58be96a	[Frontend] Make TIMEOUT_KEEP_ALIVE configurable through env var (#18472 ) Signed-off-by: liusiqian <liusiqian@tal.com>	2025-06-09 21:41:21 +00:00
Kyle Sayers	ebb2f383b8	[Quantization] Bump compressed-tensors version (#19295 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-06-09 14:33:15 -07:00
22quinn	c1c7dbbeeb	[Bugfix][Core] Prevent token lengths exceeding `max_model_len` in V0 (#19348 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-09 23:01:29 +08:00
Varun Sundar Rabindranath	5cf2daea9a	[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-09 10:50:39 -04:00
Isotr0py	b8089195b4	[v1] Add fp32 support to v1 engine through flex attn (#19319 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-06-09 22:10:44 +08:00
Yinghai Lu	770e5dcdb8	[full_graph] Fix query_start_loc padding (#19321 ) Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>	2025-06-09 21:32:56 +08:00
Michael Yao	c57c9415b1	[Docs] Fix a bullet list in usage/security.md (#19358 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-06-09 13:28:51 +00:00
Lu Fang	01810f9236	[CI] Introduce rules for llama auto-label (#19323 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-06-09 20:05:42 +08:00
Conroy Cheers	59abbd84f9	[Fix] Allow kernel compilation for CUDA capability 8.7 (#19328 ) Signed-off-by: Conroy Cheers <conroy@corncheese.org>	2025-06-09 02:57:23 -07:00
Jee Jee Li	95a6568b5c	[CI/Build] Fix LoRA test (#19350 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-09 09:52:10 +00:00
Se7en	0eca5eacd0	[Doc] Fix description in the Automatic Prefix Caching design doc (#19333 ) Signed-off-by: cr7258 <chengzw258@163.com>	2025-06-09 17:30:02 +08:00
Reid	12e5829221	[doc] improve ci doc (#19307 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-09 07:26:12 +00:00
Richard Zou	3a4d417707	[Misc] Cleanup compilation tests (#19343 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-09 15:05:44 +08:00
Kseniya Parkhamchuk	8335667c22	[Frontend] Remove unreachable code from llm.py (#19288 ) Signed-off-by: KsuParkhamchuk <k.parkhamchuk@gmail.com>	2025-06-09 10:22:10 +08:00
Isotr0py	e1c4380d4c	[Misc] Add documentation update reminder to PR template (#19289 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-09 10:20:53 +08:00
Cyrus Leung	e31ae3de36	[Deprecation] Remove `inputs` arg fallback in Engine classes (#18799 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-09 10:19:56 +08:00
wang.yuqi	2ffb9b6e07	[Bugfix] model_max_length should consider max_model_len in tokenizer_config (#19201 )	2025-06-08 07:17:53 -07:00
jennyyyyzhen	cda10fa3e2	[Multi Modal] Add an env var for message queue max chunk bytes (#19242 ) Signed-off-by: yZhen <yZhen@fb.com> Co-authored-by: yZhen <yZhen@fb.com>	2025-06-08 21:39:12 +08:00
Dipika Sikka	c123bc33f9	[Quantization] Add compressed-tensors NVFP4 support (#18312 )	2025-06-08 09:05:55 -04:00
Akash kaothalkar	b9a1791e2c	[Hardware][POWER] Add IBM POWER11 Support to CPU Extension Detection (#19082 ) Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>	2025-06-08 09:17:14 +00:00
Xu Wenqing	989dcee981	Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B (#19315 ) Signed-off-by: Xu Wenqing <xuwq1993@qq.com>	2025-06-08 16:07:02 +08:00
Richard Zou	3d64d366e0	[Misc] Change tests/compile to use VLLM_V1 by default (#19302 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-08 16:06:48 +08:00
Richard Zou	eaa2e51088	[Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-06-08 08:56:12 +08:00
Chauncey	d77f7fb871	[Bugfix]: Fix TypeError: 'float' object cannot be interpreted as an integer (#19283 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-06-08 08:16:31 +08:00
Luka Govedič	2d8476e465	[BugFix][V1] Fix memory profiling bug (#18974 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-06-07 10:34:51 -07:00

7204 Commits