Roger Wang
e17250f0d2
fix precommit
2025-06-18 21:17:43 -07:00
Jee Jee Li
4959915089
[Quantization] Modify the logic of BNB double quantization ( #19742 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-19 03:52:09 +00:00
Lu Fang
8d1e89d946
[Misc][ROCm] Enforce no unused variable in ROCm C++ files ( #19796 )
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-18 20:25:15 -07:00
Michael Goin
36239f79dd
Fix FA2 fallback for Blackwell V1 ( #19781 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-19 09:53:55 +08:00
afeldman-nm
dfada85eee
[Frontend] Expose custom args in OpenAI APIs ( #16862 )
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-06-18 17:41:11 -07:00
Richard Zou
ed33349738
[BugFix] Fix use_cudagraph=False ( #19612 )
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-06-19 08:23:12 +08:00
Woosuk Kwon
d49adea1f9
[Multimodal] Use fast processor for Qwen2/2.5-VL ( #19789 )
2025-06-18 15:49:40 -07:00
Russell Bryant
14fdd21d39
[Core] More fixes to MultiModalEmbeddings type handling ( #19715 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-18 22:48:29 +00:00
QiliangCui
04fefe7c9a
[TPU] Update torch-xla version to include paged attention tuned block change ( #19813 )
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-06-18 22:41:13 +00:00
Lukas Geiger
3b523e38d9
[Core] Do not copy array during hashing ( #19484 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-06-18 15:36:55 -07:00
afeldman-nm
16c16301c8
Disable "Forbid direct 'import triton'" check for `vllm/triton_utils/importing.py` in an extensible way ( #19783 )
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-06-18 15:08:00 -07:00
Nathan Weinberg
9206d0ff01
docs: fix Slack bulletpoint in README ( #19811 )
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-06-18 20:47:08 +00:00
Chen Zhang
a89209b78d
[v1] Support mamba2 ( #19327 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-18 20:34:15 +00:00
Russell Bryant
ffacb222cb
[Docs] Add Huzaifa Sidhpurwala to vuln mgmt team doc ( #19808 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-18 20:22:28 +00:00
Chauncey
12575cfa7a
[Bugfix] fix RAY_CGRAPH_get_timeout is not set successfully ( #19725 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-06-18 10:26:16 -07:00
Zzz9990
8b6e1d639c
[Hardware][AMD] integrate aiter chunked prefill into vllm ( #18596 )
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: fsx950223 <fsx950223@outlook.com>
Co-authored-by: charlifu <charlifu@amd.com>
2025-06-18 08:46:51 -07:00
Lu Fang
735a9de71f
[Qwen] Add tagging rule for Qwen related PRs ( #19799 )
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-18 14:26:43 +00:00
wangxiyuan
257ab95439
[Platform] Allow platform use V1 Engine by default ( #19792 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-06-18 13:03:36 +00:00
Reid
cca91a7a10
[doc] fix the incorrect label ( #19787 )
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-18 10:30:58 +00:00
Woosuk Kwon
f04d604567
[Minor] Zero-initialize attn output buffer ( #19784 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-18 06:59:27 +00:00
afeldman-nm
19a53b2783
[V1] Decouple GPU and TPU `InputBatch` ( #19778 )
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-06-18 06:38:13 +00:00
Zhonghua Deng
eccdc8318c
[V1][P/D] An native implementation of xPyD based on P2P NCCL ( #18242 )
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-06-18 06:32:36 +00:00
Russell Bryant
5f52a84685
[V1] Add API docs for EncoderCacheManager ( #19294 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-18 13:37:01 +08:00
lkchen
d4629dc43f
[Misc] Add __str__ for RequestStatus ( #19780 )
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-06-18 03:03:01 +00:00
Ning Xie
6e9cc73f67
[MISC] correct DeviceConfig device field static type analysis ( #19699 )
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-17 17:21:50 -07:00
Ning Xie
c53711bd63
[MISC] correct copy_blocks src_to_dists param type ( #19696 )
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-17 17:21:06 -07:00
Chenyaaang
dac8cc49f4
[TPU] Update torch version to include paged attention kernel change ( #19706 )
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-06-17 22:24:49 +00:00
Charlie Fu
a44b1c951d
[Feature][ROCm] Add full graph capture support for TritonAttentionBackend ( #19158 )
Signed-off-by: charlifu <charlifu@amd.com>
2025-06-17 17:03:06 -04:00
Michael Goin
b447624ee3
[Bugfix] Fix faulty triton importing logic when using Ray for DP ( #19734 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-17 20:59:29 +00:00
Jiayi Yao
cda92307c1
[Misc] Update lmcache connector with the latest connector apis ( #19441 )
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
2025-06-17 19:57:54 +00:00
Michael Goin
bf57ccc5c2
Remove sm120 arch from sm100 cutlass kernel arch list ( #19716 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-17 11:49:39 -07:00
Wentao Ye
ffb2cd6b54
[Perf] Optimize `moe_align_block_size` CUDA kernel ( #19572 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-06-17 11:49:26 -07:00
Isotr0py
ca94d7fa00
[Bugfix] Update multimodel models mapping to fit new checkpoint after Transformers v4.52 ( #19151 )
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-17 15:58:38 +00:00
CYJiang
5a1c2e15d8
[Mis] remove duplicate engine status checks ( #19647 )
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-06-17 08:17:38 -07:00
Nicolò Lucchesi
4c8f64faa7
[V1][Kernel] Flashinfer HND KV cache layout ( #19280 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-06-17 09:09:22 -04:00
David Xia
93aee29fdb
[doc] split "Other AI Accelerators" tabs ( #19708 )
2025-06-17 22:05:29 +09:00
Reid
154d063b9f
[doc][mkdocs] Add edit button to documentation ( #19637 )
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-17 11:10:31 +00:00
jvlunteren
ccd7c05089
[Kernel] Add Split-KV Support to Unified Triton Attention Kernel ( #19152 )
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
2025-06-17 10:45:07 +00:00
Huy Do
c48c6c4008
Add a doc on how to update PyTorch version ( #19705 )
2025-06-17 18:10:37 +08:00
Isotr0py
aed8468642
[Doc] Add missing llava family multi-image examples ( #19698 )
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-17 07:05:21 +00:00
quanliu
5c76b9cdaf
[Core] add remove_seq_from_computed_blocks_tracker to BlockSpaceManager ( #19686 )
Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
2025-06-17 04:40:58 +00:00
Driss Guessous
ddfed314f9
Fixes IMA for TP w/ flex-attention ( #19712 )
Signed-off-by: drisspg <drisspguessous@gmail.com>
2025-06-17 04:01:50 +00:00
Di Liu
5b3ad5ecf2
[DOC] fix doc typos ( #19600 )
Signed-off-by: Di Liu <liu-di@sjtu.edu.cn>
2025-06-17 11:34:53 +08:00
nguyenhoangthuan99
ede5c4ebdf
[Frontend] add chunking audio for > 30s audio ( #19597 )
Signed-off-by: nguyenhoangthuan99 <thuanhppro12@gmail.com>
2025-06-17 11:34:00 +08:00
Lucas Wilkinson
07334959d8
[Wheel Size] Only build FA2 8.0+PTX ( #19336 )
2025-06-17 12:32:49 +09:00
David Xia
119f683949
[doc] add project flag to gcloud TPU command ( #19664 )
Signed-off-by: David Xia <david@davidxia.com>
2025-06-17 01:00:09 +00:00
Conroy Cheers
0860087aff
[Fix] Fall back to Gloo when NCCL backend is unavailable ( #19641 )
Signed-off-by: conroy-cheers <conroy@corncheese.org>
2025-06-17 08:42:14 +08:00
Dipika Sikka
6bc7b57315
[Quantization] Remove FP4 emulation; Fall-back to marlin for device < 100 ( #19563 )
2025-06-16 17:33:51 -04:00
Russell Bryant
90f9c2eb5c
[V1] Change return type on get_multimodal_embeddings() ( #19446 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-16 13:32:15 -04:00
qscqesze
387bdf0ab9
[Model] Add support for MiniMaxM1ForCausalLM (shares architecture with MiniMaxText01ForCausalLM) ( #19677 )
Signed-off-by: QscQ <qscqesze@gmail.com>
2025-06-16 09:47:14 -07:00