Commit Graph

7204 Commits

Author SHA1 Message Date
Reid 441dc63ac7
[Frontend] improve vllm serve --help display (#18643)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-24 07:53:22 +00:00
qizixi d55e446d13
[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (#18424)
Signed-off-by: qizixi <qizixi@meta.com>
2025-05-24 06:51:22 +00:00
Wenhua Cheng ec82c3e388
FIX MOE issue in AutoRound format (#18586)
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
2025-05-23 22:01:40 -07:00
Mathieu Borderé 45ab403a1f
config.py: Clarify that only local GGUF checkpoints are supported. (#18623)
Signed-off-by: Mathieu Bordere <mathieu@letmetweakit.com>
2025-05-24 08:46:34 +08:00
Robert Shaw 2b10ba7491
[Bugfix][Nixl] Fix Preemption Bug (#18631)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-05-23 23:30:16 +00:00
Feng XiaoLong 4fc1bf813a
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
2025-05-23 16:16:26 -07:00
Pavani Majety f2036734fb
[ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation (#18160)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-05-23 15:52:20 -07:00
Cyrus Leung 7d9216495c
[Doc] Update references to doc files (#18637)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-23 15:49:21 -07:00
Michael Goin 0ddf88e16e
[CI] Enable test_initialization to run on V1 (#16736)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-23 15:09:44 -07:00
Huy Do 1645b60196
Use prebuilt FlashInfer x86_64 PyTorch 2.7 CUDA 12.8 wheel for CI (#18537)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-05-23 21:17:16 +00:00
Jiayi Yao 2628a69e35
[V1] Support Deepseek MTP (#18435)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
2025-05-23 10:26:28 -07:00
Cyrus Leung 371f7e4ca2
[Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar (#18627)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-23 10:22:40 -07:00
Cyrus Leung 15b45ffb9a
[Doc] Avoid documenting dynamic / internal modules (#18626)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-23 09:58:02 -07:00
Cyrus Leung 273cb3b4d9
[Doc] Fix top-level API links/docs (#18621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-23 09:46:56 -07:00
David Xia 8ddd1cf26a
[Doc] fix list formatting (#18624)
Signed-off-by: David Xia <david@davidxia.com>
2025-05-23 09:41:17 -07:00
Chen Zhang 6550114c9c
[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-23 09:39:47 -07:00
Michael Goin 9520a989df
[Docs] Change mkdocs to not use directory urls (#18622)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-23 09:33:21 -07:00
Harry Mellor 3d28ad343f
Fix figures in design doc (#18612)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-23 09:09:54 -07:00
youkaichao 6a7988c55b
Refactor pplx init logic to make it modular (prepare for deepep) (#18200)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-05-23 23:43:43 +08:00
Cyrus Leung 022d8abe29
[Doc] Use a different color for the announcement (#18616)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-23 08:25:03 -07:00
Hyogeun Oh (오효근) 5221815a00
[Doc] Fix markdown list indentation for MkDocs rendering (#18620)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-05-23 08:23:21 -07:00
Simon Mo 1068556b2c
[Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS (#18579) 2025-05-23 07:43:58 -07:00
Reid 2cd1fa4556
[Misc] add Haystack integration (#18601)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-23 06:21:19 -07:00
Harry Mellor d4c2919760
Include private attributes in API documentation (#18614)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-23 06:18:31 -07:00
Tristan Leclercq 6220f3c6b0
[Bugfix] Fix transformers model impl ignored for mixtral quant (#18602)
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
2025-05-23 05:54:13 -07:00
Harry Mellor 52fb23f47e
Fix examples with code blocks in docs (#18609)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-23 05:53:44 -07:00
Cyrus Leung 6dd51c7ef1
[CI/Build] Fix V1 flag being set in entrypoints tests (#18598)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-23 05:51:53 -07:00
Harry Mellor 2edb533af2
Replace `{func}` with mkdocs style links (#18610)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-23 05:51:38 -07:00
Hyogeun Oh (오효근) 38a95cb4a8
[Doc] Fix indent of contributing to vllm (#18611)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-05-23 05:50:07 -07:00
Ning Xie cd821ea5d2
[CI] fix kv_cache_type argument (#18594)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-23 04:49:18 -07:00
Kay Yan 7ab056c273
[Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to `requirements/cpu.txt` (#18542)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-05-23 04:38:42 -07:00
Harry Mellor 6526e05111
Add myself as docs code owner (#18605)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-23 04:08:31 -07:00
Madeesh Kannan e493e48524
[V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled (#17731)
Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-05-23 03:38:23 -07:00
Mengqing Cao 4ce64e2df4
[Bugfix][Model] Fix baichuan model loader for tp (#18597)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-05-23 02:39:05 -07:00
Cyrus Leung fbb13a2c15
Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (#18034)" (#18600)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-23 02:18:22 -07:00
Harry Mellor a1fe24d961
Migrate docs from Sphinx to MkDocs (#18145)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-23 02:09:53 -07:00
Yuqi Zhang d0bc2f810b
[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (#18430)
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Co-authored-by: Yuqi Zhang <yuqizhang@google.com>
2025-05-23 01:41:37 -07:00
Chauncey b046cf792d
[Feature][V1]: suupports cached_tokens in response usage (#18149)
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-05-23 01:41:03 -07:00
Michael Goin 54af915949
[Doc] Update quickstart and install for cu128 using `--torch-backend=auto` (#18505)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-23 08:36:37 +00:00
cascade 71ea614d4a
[Feature]Add async tensor parallelism using compilation pass (#17882)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-23 01:03:34 -07:00
RonaldBXu 4c611348a7
[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (#18034)
Signed-off-by: Ronald Xu <ronaldxu@amazon.com>
2025-05-23 00:37:18 -07:00
Ning Xie 60cad94b86
[Hardware] correct method signatures for HPU,ROCm,XPU (#18551)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-22 22:31:59 -07:00
Shanshan Shen 9c1baa5bc6
[Misc] Replace `cuda` hard code with `current_platform` (#16983)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-05-23 04:38:50 +00:00
Teruaki Ishizaki 4be2255c81
[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (#17291)
Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>
2025-05-23 12:30:47 +08:00
aws-elaineyz ed5d408255
[Neuron] Remove bypass on EAGLEConfig and add a test (#18514)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
2025-05-22 21:26:32 -07:00
Benjamin Chislett 583507d130
[Spec Decode] Make EAGLE3 draft token ID mapping optional (#18488)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-05-22 20:17:39 -07:00
lkchen e44d8ce8c7
[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576)
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-05-23 02:54:42 +00:00
Nick Hill 93ecb8139c
[BugFix] Increase TP execute_model timeout (#18558)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-23 10:22:11 +08:00
CYJiang fae453f8ce
[Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs (#18482)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-05-23 10:15:32 +08:00
Harry Mellor 4b0da7b60e
Enable hybrid attention models for Transformers backend (#18494)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-23 10:12:08 +08:00