Commit Graph

7204 Commits

Author SHA1 Message Date
Harry Mellor 27bebcd897
Convert `examples` to `ruff-format` (#18400)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-26 16:57:54 +00:00
Lukas Geiger e7523c2e03
[V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs (#18608) 2025-05-26 11:49:36 -04:00
Cyrus Leung a869baca73
[Bugfix] Fix Llama GGUF initialization (#18717)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:49:22 -07:00
Cyrus Leung 82e2339b06
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:38:04 -07:00
Cyrus Leung 9553fdb41e
[Doc] Improve API docs (#18713)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:33:34 -07:00
dylan 243eb9199f
[Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM (#18701) 2025-05-26 07:10:56 -07:00
Reid 0665e29998
[Misc] add AutoGen integration (#18712)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-26 13:56:18 +00:00
Łukasz Durejko e76be06550
[Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI (#18709)
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>
2025-05-26 05:26:07 -07:00
Isotr0py 0877750029
[CI/Build] Split pooling and generation extended language models tests in CI (#18705)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-26 04:00:08 -07:00
Naveassaf 6d68030f1c
[Model] Add support for YARN in NemotronNAS models (#18427)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-05-26 10:31:49 +00:00
Ning Xie 5a2c76cbe1
[CI] fix dump_input for str type (#18697)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-26 18:23:35 +08:00
Cyrus Leung 38b13dfe78
[CI/Build] Replace `math.isclose` with `pytest.approx` (#18703)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 02:05:17 -07:00
Cyrus Leung 61a45e7a72
[Bugfix] Fix Mistral-format models with sliding window (#18693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 01:44:04 -07:00
Cyrus Leung 65523a0995
[Doc] Fix issue template format (#18699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 00:45:39 -07:00
Cyrus Leung 4b7740a105
[GH] Add issue template for reporting CI failures (#18696)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 00:42:04 -07:00
Ning Xie 4ea62c0ea0
[CI] add missing argument (#18694)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-26 00:22:04 -07:00
Maximilien de Bayser 561b77a0d6
[Bugfix] Fix the lm_head in gpt_bigcode in lora mode (#6357)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
2025-05-26 14:52:25 +08:00
CYJiang abd4030d94
refactor: simplify request handler, use positive condition check for handler assignment (#18690)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-05-26 06:32:28 +00:00
AlexZhao 8820821b59
[Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example (#18644)
Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com>
Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
2025-05-26 13:51:27 +08:00
Cyrus Leung fba0642704
[CI/Build][Doc] Update `gte-Qwen2-1.5B-instruct` usage (#18683)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-05-25 20:27:50 -07:00
Lukas Geiger 6071e989df
[Core][Multimodal] Convert PIL Image to array without data copy when hashing (#18682)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-25 17:33:35 +00:00
Cyrus Leung 57fd13a707
[Bugfix] Fix profiling dummy data for Pixtral (#18677)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-25 14:05:30 +00:00
Reid 3a886bd58c
[Misc] small improve (#18680)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-25 06:05:38 -07:00
Reid 35be8fad62
[CI/build] fix no regex (#18676)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-25 10:10:51 +00:00
Yuqi Zhang f2faac745d
[Bugfix] Fix cpu usage and cache hit stats reporting on cpu environment (#18674)
Signed-off-by: zzzyq <zhangyuqi94@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-25 02:36:06 -07:00
Reid 279f854519
[doc] improve readability (#18675)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-25 01:40:31 -07:00
Reid 624b77a2b3
[doc] fix broken links (#18671)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-25 01:36:33 -07:00
Cyrus Leung 503f8487c2
[Misc] Reduce logs on startup (#18649)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-24 23:03:53 -07:00
Ning Xie 44073a7ac3
[BUGFIX] catch subclass first for try...except (#18672)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-25 05:34:24 +00:00
Michael Goin 63934543a0
Speed up the `kernels/quantization/` tests (#18669)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-25 05:02:59 +00:00
Isotr0py 75f81750f3
[VLM] Initialize video input support for InternVL models (#18499)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-25 04:51:25 +00:00
Mengqing Cao 6ab681bcbe
[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (#18655)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-05-25 04:51:21 +00:00
Chenguang Li cebc22f3b6
[Misc]Replace `cuda` hard code with `current_platform` in Ray (#14668)
Signed-off-by: noemotiovon <757486878@qq.com>
2025-05-24 20:26:31 -07:00
Ning Xie 6c6dcd8611
[MISC] correct signature for LoaderFunction (#18670)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-24 20:17:47 -07:00
Seiji Eicher 7891fdf0c6
[V1] Fix _pickle.PicklingError: Can't pickle <class 'transformers_modules.deepseek-ai.DeepSeek-V2-Lite... (#18640)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-05-24 20:07:20 -07:00
Woosuk Kwon 6825d9a998
[BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding (#18668)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-05-24 17:33:46 -07:00
Reid b554ab736e
[CI/Build] fix permission denied issue (#18645)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-24 16:09:10 +00:00
Aaron Pham 9ea7f1abf3
fix(regression): clone from reference items (#18662)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-05-24 15:25:20 +00:00
Aaron Pham 2807271c86
[CI] enforce import regex instead of re (#18665)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-05-24 08:04:14 -07:00
wangxiyuan b9018a3f9f
[BugFix] Fix import error for fused_moe (#18642)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-05-24 07:53:36 -07:00
Ning Xie 4ceafb6299
[MISC] typo fix and clean import (#18664)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-24 07:52:09 -07:00
Cyrus Leung 2e6705784f
[CI/Build] `chmod +x` to `cleanup_pr_body.sh` (#18650)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-24 07:26:45 -07:00
Cyrus Leung 1cb194a018
[Doc] Reorganize user guide (#18661)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-24 07:25:33 -07:00
ztang2370 2cd4d58df4
[Model] use AutoWeightsLoader for gpt2 (#18625)
Signed-off-by: zt2370 <ztang2370@gmail.com>
2025-05-24 13:36:13 +00:00
Cyrus Leung 6d166a8d35
[Doc] Add community links (#18657)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-24 06:06:38 -07:00
Cyrus Leung ef1dd6870f
[Doc] Fix indentation problems in V0 Paged Attention docs (#18659)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-24 06:06:35 -07:00
Mengqing Cao e77dc4bad8
[MISC][pre-commit] Add pre-commit check for triton import (#17716)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-05-24 20:09:15 +08:00
Cyrus Leung 07458a51ce
[Doc] Update README links, mark external links (#18635)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-24 09:57:15 +00:00
qizixi c1e4a4052d
[V1][Spec Decode] Support multi-layer eagle draft model (#18030)
Signed-off-by: qizixi <qizixi@meta.com>
2025-05-24 09:45:34 +00:00
Yuanhao WU a859320575
[Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) (#18647) 2025-05-24 09:15:36 +00:00