Commit Graph

7469 Commits

Author SHA1 Message Date
Li, Jiang ec0ff9f64b
[V0 deprecation] Remove V0 CPU (#20437)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-03 20:26:30 -07:00
Woosuk Kwon c139974a79 enforce eager
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-03 08:19:48 -07:00
Woosuk Kwon 45878b0233 fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-03 08:05:21 -07:00
Woosuk Kwon b32187f244 fix attn test
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-03 08:01:56 -07:00
Woosuk Kwon 475c22e1ab Merge branch 'main' into woosuk/remove-v0-part1 2025-07-03 07:59:48 -07:00
Nicolò Lucchesi d1b689c445
[Bugfix] Fix flaky `test_streaming_response` test (#20363)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-03 14:46:24 +00:00
Reid 9854dc9040
[Frontend] improve vllm bench <bench_type> --help display (#20430)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-03 14:22:16 +00:00
Isotr0py ff5c60fad8
[Misc] Automatically tag PRs to add new models (#20222)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-03 07:11:03 -07:00
wang.yuqi 6f1229f91d
[Model][2/N] Automatic conversion of CrossEncoding model (#19978)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-03 13:59:23 +00:00
Jee Jee Li 1819fbda63
[Quantization] Bump to use latest bitsandbytes (#20424)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-03 21:58:46 +08:00
Li, Jiang 7f0367109e
[CI/Build][CPU] Enable cross compilation in CPU release pipeline (#20423)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-03 05:26:12 -07:00
Ning Xie fb14d53cf6
[Kernel] refactor cpu worker v0 cache dtype (#20080)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-03 08:39:14 +00:00
Cyrus Leung b024a42e93
[Core] Move multimodal placeholder from chat utils to model definition (#20355)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-03 08:18:30 +00:00
Michael Yao cb97f2bfc5
[Docs] Replace two list with tables in intel_gaudi.md (#20414)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-07-03 00:48:25 -07:00
Reid 359200f6ac
[doc] fix link (#20417)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-03 00:21:57 -07:00
Lifans 220aee902a
[Misc] Add rules to label Speculative Decoding Related PRs (#20406)
Signed-off-by: Lifan Shen <lifans@meta.com>
2025-07-02 23:56:49 -07:00
Nick Hill 67d25eca05
[Tests] Update online DP tests to verify that requests are balanced (#20157)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-03 14:49:13 +08:00
qscqesze 363528de27
[Feature] Support MiniMax-M1 function calls features (#20297)
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>
2025-07-03 06:48:27 +00:00
QiliangCui 4ff61ababa
[TPU] Add a case to cover RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 (#20385)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-03 06:46:41 +00:00
Woosuk Kwon c6235f24f7 fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-02 23:27:17 -07:00
Woosuk Kwon 5203865c1e fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-02 23:24:33 -07:00
Woosuk Kwon 01eb1fff98 Remove --dtype float32
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-02 23:07:13 -07:00
Woosuk Kwon d28a003b18 fix xpu
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-02 22:19:17 -07:00
Woosuk Kwon 9e94110c43 fix cpu
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-02 22:17:58 -07:00
Woosuk Kwon cda307df48 remove attn backends
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-02 22:13:09 -07:00
Woosuk Kwon 99eddb06a0 [V0 deprecation] Remove V0 CPU/XPU/TPU backends
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-02 22:11:08 -07:00
Li, Jiang 0ec3779df7
[Bugfix][CI/CD][CPU] Fix CPU CI tests (#20383)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-02 20:11:36 -07:00
Chenheli Hua b616f6a53d
[Misc] Small: Fix video loader return type annotations. (#20389)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-07-03 03:10:39 +00:00
bnellnm 2e25bb12a8
[Bugfix] Fix import of CutlassExpertsFp8 in compressed_tensors_moe.py (#20381)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-07-03 02:07:43 +00:00
Louie Tsai 9965c47d0d
Enable CPU nightly performance benchmark and its Markdown report (#18444)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-07-02 17:50:25 -07:00
Nick Hill 059d4cdb49
[BugFix] Fix DP headless mode arg validation (#20398)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-02 17:15:32 -07:00
Tyler Michael Smith bdb84e26b0
[Bugfix] Fixes for FlashInfer's TORCH_CUDA_ARCH_LIST (#20136)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-07-02 17:15:11 -07:00
Nicolò Lucchesi 3dd359147d
[Docs] Update EAGLE example (#20375)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-02 17:13:51 -07:00
Nick Hill 657f2f301a
[DP] Support external DP Load Balancer mode (#19790)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-02 10:21:52 -07:00
vllmellm a1aafc827a
[ROCm][FEAT] Enable Full Graph Mode in AITER MLA V1 Attn Backend (Decode Phase only) (#20254)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-07-02 16:25:46 +00:00
rongfu.leng 139508a418
[Misc] add handler HF_TOKEN is emptry string (#20369)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-07-02 09:14:31 -07:00
Nick Hill d265414dbc
[Minor] Clean up incorrect comment in test (#20382)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-02 09:13:37 -07:00
afeldman-nm 48fb076cbc
[V1] LogitsProcessor programming model (#16728)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-07-02 09:10:42 -07:00
bnellnm c1909e7e8c
[Kernels] MoE refactor (#19636)
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
2025-07-02 06:08:27 -07:00
cronoik-inceptionai b95877509b
Documentation update tool_calling: mapping back to function from response (#20373) 2025-07-02 05:55:49 -07:00
zichongli5 706ff13224
[Model] Adds support for SlimMoE models Phi-tiny-MoE-instruct (#20286)
Signed-off-by: Zichong Li <t-lizichong@microsoft.com@Reasoning-H100-VM3.drbuo4tcjzruhloch3eo0b25ef.cx.internal.cloudapp.net>
Co-authored-by: Zichong Li <t-lizichong@microsoft.com@Reasoning-H100-VM3.drbuo4tcjzruhloch3eo0b25ef.cx.internal.cloudapp.net>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-07-02 12:54:12 +00:00
WangHuaqiang ccbfb1d1c9
[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322)
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
2025-07-02 12:53:36 +00:00
Joonchen Liau 9e5552aa13
[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280)
Signed-off-by: kaln27 <liaojuncheng123@foxmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-02 06:47:19 -06:00
Lu Fang 0c600b9ab6
[Build/CI] Automatically tag DeepSeek related PRs (#20370)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-07-02 04:02:43 -07:00
CSWYF3634076 e303dcf523
[Model] Add Ernie4.5 and Ernie4.5MoE Model Support (#20220)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2025-07-02 03:37:01 -07:00
Michael Yao ae9c4d416f
[Docs] Make TPU ref prettier in google_tpu.md (#20356)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-07-02 02:04:08 -07:00
Michael Yao d853520b3e
[Docs] Fix indentations for 2-level items in deprecation_policy.md (#20352)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-07-01 23:50:31 -07:00
Cyrus Leung ba51aea65e
[Bugfix] Keye-VL compatibility with `tok_kwargs` (#20058) (#20353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-01 23:46:59 -07:00
Kwai-Keye 8452946c06
[Model][VLM] Support Keye-VL-8B-Preview (#20126)
Signed-off-by: Kwai-Keye <Keye@kuaishou.com>
2025-07-01 23:35:04 -07:00
Chenheli Hua 2e7cbf2d7d
[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (#20105)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-07-01 23:34:03 -07:00