vLLM/vllm

Commit Graph

Author	SHA1	Message	Date
Michael Goin	2142035b51	[V1] Support multiple kv connectors (#17564) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-05-14 16:28:02 -07:00
Russell Bryant	78aa341d12	[CI] Fix race condition in test_kv_cache_events test (#18169) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 16:27:48 -07:00
Jerry Zhang	7974736740	Add support for loading torchao models with `AOPerModuleConfig` (#17826) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-05-14 16:24:59 -07:00
Aaron Pham	2fc9075b82	[V1] Structured Outputs + Thinking compatibility (#16577) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 15:45:24 -07:00
Lucas Wilkinson	d93c976a0d	[Kernel] Have rotary embeddings support tensors (#18046) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-14 15:43:55 -07:00
Robert Shaw	856865008e	[CI] Disable Failing Tests (#18165)	2025-05-14 13:49:56 -07:00
bnellnm	f9c069c85e	Modularize fused experts and integrate PPLX kernels (#15956)	2025-05-14 13:11:54 -07:00
Nick Hill	59dd311cf5	[KVConnector] Keep KVTransferParams as a dict (#18033)	2025-05-14 08:05:57 -07:00
Cyrus Leung	d066e52013	[Bugfix] Fix chat utils tests (#18139) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 05:38:21 -07:00
Cyrus Leung	d62a076e84	[Model] GritLM supports other attention backends (#18109) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 03:33:19 -07:00
Jee Jee Li	259127f8b8	[Bugfix] Fix LoRA test (#18123) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-14 10:25:47 +00:00
TJian	612c2edb4f	[FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-14 03:03:11 -07:00
rongfu.leng	82e7f9bb03	[Misc] replace does not exist model (#18119) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-05-14 02:13:47 -07:00
Cyrus Leung	8f5dc41481	[Bugfix] Fix entrypoints audio test failure (#18111) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 09:08:07 +00:00
wang.yuqi	63ad622233	[New Model]: support GTE NewModel (#17986)	2025-05-14 01:31:31 -07:00
lkchen	6685890d11	[Fix] Move "model_config" as keyword args in chat_utils.py (#18098) Signed-off-by: Linkun <github@lkchen.net>	2025-05-13 23:27:26 -07:00
Charlie Fu	7b2f28deba	[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-13 22:13:56 -07:00
vllmellm	2d912fb66f	[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 (#17955) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 22:03:47 -07:00
Chen Zhang	f2ae883b67	[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-13 19:09:39 -07:00
vllmellm	40de1ef455	[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 19:08:20 -07:00
Nick Hill	55aa7af994	[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 10:48:21 -07:00
Aaron Pham	cb528d0585	[Fix] check to make sure processor has chat templates (#18047) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-13 03:04:10 -07:00
Michael Goin	ea6ae8cb45	[Bugfix] Fix marlin moe fallback logic for llama4 (#18042) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-13 07:53:28 +00:00
Chen Zhang	f0d610a8ae	[v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-13 06:50:38 +00:00
Chauncey	dc1a821768	[Feature][V1] Support `tool_choice: required` when using Xgrammar as the `StructuredOutputBackend`. (#17845) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-12 23:01:31 -07:00
hissu-hyvarinen	f6518b2b48	[ROCm] Skip tests for quantizations incompatible with ROCm (#17905) Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>	2025-05-12 18:39:28 -06:00
bwshen-mi	acee8f48aa	[Model] Support MiMo-7B inference with MTP (#17433) Signed-off-by: wp-alpha <wangpeng66@xiaomi.com> Co-authored-by: wangpeng66 <wangpeng66@xiaomi.com>	2025-05-12 23:25:33 +00:00
wwl2755	dc9905368d	[V1][Spec Decode] Eagle unit tests (#17350) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-05-12 23:01:17 +00:00
Russell Bryant	ebab1ac37c	[CI] Make JSON output tests less likely to fail (#17859) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-12 22:31:54 +00:00
Jonathan Berkhahn	98ea35601c	[Lora][Frontend]Add default local directory LoRA resolver plugin. (#16855) Signed-off-by: jberkhahn <jaberkha@us.ibm.com>	2025-05-12 10:39:10 -07:00
Robert Shaw	d19110204c	[P/D] NIXL Integration (#17751) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Brent Salisbury <bsalisbu@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>	2025-05-12 09:46:16 -07:00
Maximilien de Bayser	05a4324f8e	Initialize the delta tool call fields explicitly (#17340) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: igmainc <igmainc@icloud.com>	2025-05-12 13:28:58 +00:00
Cheng Kuan Yong Jason	08bf784078	[Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails (#17623) Signed-off-by: Jason Cheng <jasoncky96@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-12 09:06:10 +08:00
Isotr0py	021c16c7ca	[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-11 17:56:30 -07:00
TJian	a810b5b088	[BugFix] [ROCm]: Bugfix and handle addition case of input for `rocm_aiter_rms_norm` (#17857) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-11 04:17:11 -07:00
wang.yuqi	e4b8713380	[New Model]: nomic-embed-text-v2-moe (#17785)	2025-05-11 00:59:43 -07:00
Dipika Sikka	cd3edfc908	[Misc] Add compressed-tensors NVFP4A16 emulation support (#17914) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com>	2025-05-11 15:58:38 +08:00
Frieda Huang	9cea90eab4	[Frontend] Add /classify endpoint (#17032) Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>	2025-05-11 07:57:07 +00:00
Ben Browning	8132365b74	[Bugfix]: v1 engine - consider lora adapters in allowed_token_ids (#17855) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-05-11 00:53:58 -07:00
Jinzhen Lin	d74e5f37bc	[Kernel] fp4 marlin kernel (#17687) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-05-10 19:58:49 -07:00
Chen Zhang	ca66a1674c	[v1] Rename specialized_manager.py to single_type_kv_cache_manager.py (#17946) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-10 16:14:12 -07:00
Chen Zhang	950751a987	[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders (#17483) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-10 16:12:04 -07:00
Ximo Guanter	fc4441a4ee	Add missing content type headers to /ping and /health (#17036) (#17786) Signed-off-by: Ximo Guanter <ximo.guanter@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-10 07:13:32 +01:00
tracelogfb	246e3e0a36	fix broken test vllm:test_kernels - test_attention_selector.py::test_flash_attn (#17873) Co-authored-by: Stephen Chen <tracelog@meta.com>	2025-05-10 10:46:54 +08:00
Pavani Majety	0c0fdae84f	[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362)	2025-05-09 16:24:41 -07:00
Harry Mellor	4b2ed7926a	Improve configs - the rest! (#17562) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 15:18:44 -07:00
Cyrus Leung	6e5595ca39	[CI/Build] Automatically retry flaky tests (#17856) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-09 09:55:17 -06:00
Chen Zhang	200da9a517	[v1] Move block management logic from KVCacheManager to SpecializedManager (#17474) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-09 15:25:34 +00:00
Harry Mellor	c6798baa9c	Change `top_k` to be disabled with `0` (still accept `-1` for now) (#17773) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 10:01:49 +00:00
Ning Xie	d310e6de98	[BUGFIX]: return fast when request requires prompt logprobs (#17251)	2025-05-08 21:25:41 -07:00
vllmellm	3c9396a64f	[FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>	2025-05-09 10:42:05 +08:00
Shu Wang	376786fac1	Add cutlass support for blackwell fp8 blockwise gemm (#14383) Signed-off-by: Shu Wang <shuw@nvidia.com>	2025-05-08 15:09:55 -07:00
Russell Bryant	ec54d73c31	[CI] Fix test_collective_rpc (#17858) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-08 16:47:12 +00:00
fxmarty-amd	bb239a730f	[Bugfix] Fix quark fp8 format loading on AMD GPUs (#12612) Signed-off-by: Felix Marty <felmarty@amd.com> Signed-off-by: kewang2 <kewang2@amd.com> Co-authored-by: kewang2 <kewang2@amd.com>	2025-05-08 02:53:53 -07:00
Jevin Jiang	a463555dee	[TPU] Fix the test_sampler (#17820)	2025-05-08 05:51:33 -04:00
Cyrus Leung	96722aa81d	[Frontend] Chat template fallbacks for multimodal models (#17805) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-07 23:05:54 -07:00
Hashem Hashemi	5a499e70d5	[Kernel][Hardware][AMD] Bf16 mfma opt for ROCm skinny GEMMs (#17071) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com> Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: charlifu <charlifu@amd.com>	2025-05-07 22:34:49 -07:00
Russell Bryant	6930a41116	[V1] Add VLLM_ALLOW_INSECURE_SERIALIZATION env var (#17490) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-05-08 13:34:02 +08:00
Chanh Nguyen	7ea2adb802	[Core] Support full cuda graph in v1 (#16072) Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com> Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>	2025-05-07 22:30:15 -07:00
Wallas Henrique	d43f914d42	[Core][Feature] Input metadata dump on crash (#13407) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2025-05-07 22:15:09 +00:00
Akshat Tripathi	c20ef40fd0	[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-05-07 16:28:47 -04:00
Bowen Bao	db593aa67f	[Quantization] Quark MXFP4 format loading (#16943)	2025-05-07 15:05:05 -04:00
Isotr0py	f98e307588	[Bugfix] Fix missing lora name mapping for lora without prefix (#17793) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-07 16:17:12 +00:00
Isotr0py	be8ff88e66	[Bugfix] Fix Video IO error for short video (#17791) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-07 15:36:06 +00:00
Yong Hoon Shin	98c89e16ff	Make key optional for rotary embedding (#17566) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-05-07 00:11:46 -07:00
Yong Hoon Shin	324a3119b0	Fix test_memory_usage_no_spec (#17754) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-05-07 00:10:33 -07:00
Cyrus Leung	8a15c2603a	[Frontend] Add missing chat templates for various MLLMs (#17758) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-07 00:10:01 -07:00
Satyajith Chilappagari	043e4c4955	Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> Co-authored-by: Aaron Dou <yzdou@amazon.com> Co-authored-by: Shashwat Srijan <sssrijan@amazon.com> Co-authored-by: Chongming Ni <chongmni@amazon.com> Co-authored-by: Amulya Ballakur <amulyaab@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Lin Lin Pan <tailinpa@amazon.com> Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>	2025-05-07 00:07:30 -07:00
Szymon Ożóg	1a45a61387	[Kernel] GGUF MoeVec kernel (#16780) Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com> Signed-off-by: SzymonOzog <szymon.ozog@gmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-05-06 23:07:23 -07:00
Jee Jee Li	822de7fb94	[Misc] Split model loader (#17712) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-07 12:42:26 +08:00
Michael Goin	e50a1f1a9c	[TPU] Add kernel test for moe_pallas (#17496) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-05-06 17:59:57 -07:00
Chih-Chieh Yang	18dd5e01f2	[Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels (#17146) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>	2025-05-06 17:59:30 -07:00
Thomas Parnell	2f925e5777	[Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (#16828) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-06 18:21:48 -04:00
Chen Zhang	aabcd2cae3	[v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager (#17479) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-06 08:50:34 -07:00
Li, Jiang	a6fed02068	[V1][PP] Support PP for MultiprocExecutor (#14219) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-05-06 07:58:05 -07:00
Mengqing Cao	f9bc5a0693	[Bugfix] Fix triton import with local TritonPlaceholder (#17446) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-06 17:53:09 +08:00
Lucas Wilkinson	6eae34533a	[Misc] Fix ScalarType float4 naming (#17690) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-06 01:07:15 -07:00
Stan Wozniak	999328be0d	[Model] Add GraniteMoeHybrid 4.0 model (#17497) Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-05-06 12:00:31 +08:00
Nicolò Lucchesi	5941e0b7ea	[TPU][V1] Add support for top-logprobs (#17072) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-05-05 14:20:15 -07:00
XiongfeiWei	9765940824	[TPU] Enable gemma3-27b with TP>1 on multi-chips. (#17335) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-05-05 14:19:58 -07:00
Nick Hill	5ea5c514da	[BugFix] Increase timeout for startup failure test (#17642) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-05 20:53:19 +00:00
Jinzhen Lin	1d0c9d6b2d	[Kernel] some optimizations for dense marlin and moe marlin (#16850) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-05-05 09:39:30 -07:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Isotr0py	f66f1e0fa3	[Bugfix] Fix broken Qwen2.5-omni tests (#17613) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-03 17:08:14 +00:00
Cyrus Leung	887d7af882	[Core] Gate `prompt_embeds` behind a feature flag (#17607) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-04 00:19:20 +08:00
Richard Zou	b90b0852e9	[easy] Print number of needed GPUs in skip message (#17594) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-02 15:27:43 -07:00
Caleb_Du	3e887d2e0c	permute/unpermute kernel for moe optimization (#14568) Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>	2025-05-02 11:31:55 -07:00
Cyrus Leung	cb234955df	[Misc] Clean up input processing (#17582) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 08:11:53 -07:00
Cyrus Leung	99404f53c7	[Security] Fix image hash collision (#17378) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 08:36:39 -04:00
Harry Mellor	785d75a03b	Automatically tell users that dict args must be valid JSON in CLI (#17577) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-02 05:24:55 -07:00
Cyrus Leung	d7543862bd	[Misc] Rename assets for testing (#17575) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 03:29:25 -07:00
Robert Shaw	c777df79f7	[BugFix] Fix Memory Leak (#17567) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-05-02 01:07:03 -07:00
Andrew Sansom	cc2a77d7f1	[Core] [Bugfix] Add Input Embeddings (#15428) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: 临景 <linjing.yx@alibaba-inc.com> Co-authored-by: Bryce1010 <bryceyx@gmail.com> Co-authored-by: Nan2018 <nan@protopia.ai> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 01:06:39 -07:00
Jerry Zhang	109e15a335	Add `pt_load_map_location` to allow loading to cuda (#16869) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-05-01 23:23:42 -07:00
Cyrus Leung	f89d0e11bf	[Misc] Continue refactoring model tests (#17573) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 22:06:08 -07:00
Michael Goin	292fc59d61	[CI] Actually run tests/kv_transfer/test_disagg.py in CI (#17555) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-02 04:05:04 +00:00
Isotr0py	88c8304104	[Model] Refactor Ovis2 to support original tokenizer (#17537) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-01 11:00:53 -07:00
Sage Moore	460a2b1100	[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations (#10867) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-05-01 07:59:28 -07:00
Chauncey	98060b001d	[Feature][Frontend]: Deprecate --enable-reasoning (#17452) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-01 06:46:16 -07:00
Huy Do	b74d888c63	Fix more broken speculative decode tests (#17450) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-05-01 06:05:58 -07:00
Cyrus Leung	48e925fab5	[Misc] Clean up test docstrings and names (#17521) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 05:19:32 -07:00
Russell Bryant	fbefc8a78d	[Core] Enable IPv6 with vllm.utils.make_zmq_socket() (#16506) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-01 09:38:18 +00:00
Noah Yoshida	13cf6b6236	[BugFix] fix speculative decoding memory leak when speculation is disabled (#15506) Signed-off-by: Noah Yoshida <noahcy117@gmail.com>	2025-04-30 23:28:17 -07:00
Cyrus Leung	afb4429b4f	[CI/Build] Reorganize models tests (#17459) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-30 23:03:08 -07:00
Michael Goin	aa4502e7f3	[CI][Bugfix] Fix failing V1 Test due to missing 'cache_salt' arg (#17500) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-30 21:03:30 -07:00
Michael Goin	17b4d85f63	[CI][TPU] Skip structured outputs+spec decode tests on TPU (#17510) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-30 20:36:20 -07:00
Siyuan Liu	dbc18e7816	[CI][TPU] Skip Multimodal test (#17488) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-04-30 19:51:39 -07:00
Chen Zhang	81ecf425f0	[v1][Spec Decode] Make sliding window compatible with eagle prefix caching (#17398) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-30 18:25:53 +00:00
Russell Bryant	947f2f5375	[V1] Allow turning off pickle fallback in vllm.v1.serial_utils (#17427) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-30 16:10:54 +00:00
Alec	0be6d05b5e	[V1][Metrics] add support for kv event publishing (#16750) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-04-30 07:44:45 -07:00
Marko Rosenmueller	77073c77bc	[Core] Prevent side-channel attacks via cache salting (#17045) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-04-30 20:27:21 +08:00
Nicolò Lucchesi	a7d5b016bd	[TPU][V1][CI] Update regression test baseline for v6 CI (#17064) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-30 04:03:22 -07:00
Marco	54072f315f	[MODEL ADDITION] Ovis2 Model Addition (#15826) Signed-off-by: Marco <121761685+mlinmg@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-04-30 07:33:29 +00:00
Huy Do	88fcf00dda	Fix some speculative decode tests with tl.dot (#17371) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-04-29 19:41:02 -07:00
Harry Mellor	13698db634	Improve configs - `ModelConfig` (#17130) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-30 10:38:22 +08:00
Gabriel Marinho	1c2bc7ead0	Truncation control for embedding models (#14776) Signed-off-by: Gabriel Marinho <gmarinho@ibm.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-30 09:24:57 +08:00
Benjamin Chislett	34120f5acd	[V1][Feature] Enable Speculative Decoding with Structured Outputs (#14702) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-04-30 00:02:10 +00:00
Harry Mellor	7489ec0bab	Remove Bamba 9B from CI (#17407) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 21:10:31 +00:00
Harry Mellor	0350809f3a	Remove Falcon3 2x7B from CI (#17404) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 19:52:25 +00:00
Harry Mellor	a6977dbd15	Simplify (and fix) passing of guided decoding backend options (#17008) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 19:02:23 +00:00
mofanke	a39203f99e	[Bugfix] add qwen3 reasoning-parser fix content is None when disable … (#17369) Signed-off-by: mofanke <mofanke@gmail.com>	2025-04-29 16:32:40 +00:00
Harry Mellor	2ef5d106bb	Improve literal dataclass field conversion to argparse argument (#17391) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 16:25:08 +00:00
Cyrus Leung	88ad9ec6b2	[Frontend] Support `chat_template_kwargs` in `LLM.chat` (#17356) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-29 22:03:35 +08:00
Cyrus Leung	00ee37efa2	[Bugfix] Clean up MiniMax-VL and fix processing (#17354) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-29 20:42:16 +08:00
Jee Jee Li	890f104cdf	[Doc] Fix QWen3MOE info (#17381) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-29 12:38:32 +00:00
ponix-j	bdb2cddafc	[Misc]Use a platform independent interface to obtain the device attributes (#17100)	2025-04-29 06:59:13 +00:00
qscqesze	cde384cd92	[Model] support MiniMax-VL-01 model (#16328) Signed-off-by: qingjun <qingjun@minimaxi.com>	2025-04-29 12:05:50 +08:00
Michał Moskal	86d9fc29cb	implement Structural Tag with Guidance backend (#17333) Signed-off-by: Michal Moskal <michal@moskal.me>	2025-04-29 02:21:32 +00:00
Harry Mellor	b6dd32aa07	Make name of `compressed-tensors` quant method consistent across vLLM (#17255) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-28 16:28:13 +00:00
Harry Mellor	f94886946e	Improve conversion from dataclass configs to argparse arguments (#17303) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-28 16:22:12 +00:00
Alex Brooks	fa93cd9f60	[Model] Add Granite Speech Support (#16246) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-04-28 10:05:00 +00:00
Lily Liu	20e489eaa1	[V1][Spec Decode] Make eagle compatible with prefix caching. (#17137) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-27 09:29:43 -07:00
cascade	690fe019f0	[Feature] support sequence parallelism using compilation pass (#16155) Signed-off-by: cascade812 <cascade812@outlook.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-04-27 06:29:35 -07:00
Kaixi Hou	ed7a29d9f8	[NVIDIA] Support Cutlass MLA for Blackwell GPUs (#16032) Signed-off-by: kaixih <kaixih@nvidia.com>	2025-04-27 06:29:21 -07:00
Alex Brooks	756848e79e	[Bugfix] Fix Lora Name Parsing (#17196) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-27 20:33:09 +08:00
Cyrus Leung	93a126fbc7	[Misc] Make cached tokenizer pickle-compatible (#17048) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-27 13:05:00 +08:00
rasmith	8e4b351a0c	[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-04-27 00:35:08 +00:00
Happy	9869453c42	Update test_flash_attn.py (#17102) Signed-off-by: ShuaibinLi <lishuaibin@live.cn>	2025-04-26 22:17:35 +00:00
Ning Xie	fd11a325b8	[MISC] rename interval to max_recent_requests (#14285)	2025-04-26 16:59:18 +00:00
Russell Bryant	f8acd01ff7	[V1] Add `structural_tag` support using xgrammar (#17085)	2025-04-26 14:06:37 +00:00
Cyrus Leung	909fdaf152	[Bugfix] Fix standard models tests (#17217) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-26 02:26:41 -07:00
Nick Hill	df6f3ce883	[Core] Remove prompt string from engine core data structures (#17214) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 23:41:05 -07:00
Woosuk Kwon	513f074766	[CI/test] Fix Eagle Correctness Test (#17209) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-25 23:40:36 -07:00
Nick Hill	b07bf83c7d	[BugFix] Avoid race conditions in zero-copy tensor transmission (#17203) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-26 06:00:07 +00:00
Zijing Liu	53e8cf53a4	[V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:05:40 -07:00
Shu Wang	9e96f56efb	Allocate kv_cache with stride order (#16605) Signed-off-by: shuw <shuw@nvidia.com>	2025-04-25 22:03:31 -07:00
Benjamin Chislett	a0e619e62a	[V1][Spec Decode] EAGLE-3 Support (#16937) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-25 15:43:07 -07:00
Nick Hill	70116459c3	[BugFix][Frontend] Fix `LLM.chat()` tokenization (#16081) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:20:05 +00:00
Cyrus Leung	43faa0461a	[Bugfix] Fix hybrid model tests (#17182) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 15:14:37 -07:00
Harry Mellor	423e9f1cbe	Use Transformers helper `get_text_config()` instead of checking for `text_config` (#17105) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:47:35 -07:00
Harry Mellor	0bd7f8fca5	Bump Transformers to 4.51.3 (#17116) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:34:34 -07:00
Cyrus Leung	19dcc02a72	[Bugfix] Fix mistral model tests (#17181) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 06:03:34 -07:00
Sangyeon Cho	6aae216b4e	[Bugfix] remove fallback in guided_json (int range, patterns) (#16725) Signed-off-by: csy1204 <josang1204@gmail.com> Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>	2025-04-25 06:54:43 +00:00
vllmellm	eef364723c	[FEAT] [ROCm]: AITER Fused MOE V1 Support (#16752) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-04-25 11:06:50 +08:00
Maximilien de Bayser	05e1fbfc52	Add chat template for Llama 4 models (#16428) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-24 20:19:36 +00:00
Harry Mellor	0fa939e2d1	Improve configs - `LoRAConfig` + `PromptAdapterConfig` (#16980) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 10:29:34 -07:00
Mark McLoughlin	340d7b1b21	[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-24 08:57:40 -07:00
Michael Goin	82e43b2d7e	Add missing rocm_skinny_gemms kernel test to CI (#17060) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 07:49:37 -07:00
wang.yuqi	67309a1cb5	[Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970)	2025-04-24 07:06:28 -07:00
Rui Qiao	c0dfd97519	[V1][PP] Optimization: continue scheduling prefill chunks (#17080) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-24 05:27:08 -07:00
Harry Mellor	a9138e85b1	Fix OOT registration test (#17099) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:44:12 -07:00
Harry Mellor	0a05ed57e6	Simplify `TokenizerGroup` (#16790) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:43:56 -07:00
Michael Goin	14288d1332	Disable enforce_eager for V1 TPU sampler and structured output tests (#17016) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 02:50:09 -07:00
Woosuk Kwon	b411418ff0	[Chore] Remove Sampler from Model Code (#17084) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-24 02:49:33 -07:00
Travis Johnson	3cde34a4a4	[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-04-23 18:34:41 +00:00
Michael Goin	6317a5174a	Categorize `tests/kernels/` based on kernel type (#16799) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-23 09:21:07 -04:00
Nick Hill	1e013fa388	[V1][DP] More robust DP/EP dummy request coordination (#16277) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-22 19:12:15 -07:00
Aleksandr Malyshev	bc7c4d206b	[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (#13305) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com> Signed-off-by: maleksan85 <maleksan@amd.com> Signed-off-by: <> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>	2025-04-22 19:11:56 -07:00
Guillaume Calmettes	36fe78769f	[Bugfix] validate urls object for multimodal content parts (#16990) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-23 09:43:06 +08:00
Chenyaaang	83d933718c	[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-22 18:05:23 -06:00
vllmellm	30bc3e0f66	[FEAT][ROCm]: Support AITER MLA (#15893) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com>	2025-04-22 09:31:13 -07:00
Lei Wang	8d32dc603d	[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036) Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com> Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>	2025-04-22 09:01:36 +01:00
Woosuk Kwon	c4ab9f3e71	[V1] Remove pre-allocation for KV cache (#16941) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-22 00:52:18 -07:00
Chauncey	acba33a0f1	[Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams (#16767) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-04-22 06:02:20 +00:00
Charlie Fu	188b7f9b8c	[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830) Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-04-21 20:46:22 -07:00
Varun Sundar Rabindranath	7b8a2ab76f	[Kernel] Add expert_map support to Cutlass FP8 MOE (#16861) Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com> Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>	2025-04-21 20:44:32 -07:00
Jeffrey Li	0e4254492f	[Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other (#16863) Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com>	2025-04-22 11:40:19 +08:00
Nicolò Lucchesi	fa3bba2a53	[TPU][V1] Enable Top-P (#16843) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-22 00:46:07 +00:00
Michael Goin	986537f1c3	[V1] V1 FlashInfer Attention (#16684) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Aurick Qiao <qiao@aurick.net>	2025-04-22 00:38:41 +00:00
Nicolò Lucchesi	210207525e	[TPU][V1] Capture multimodal encoder during model compilation (#15051 ) Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Siyuan Liu <lsiyuan@google.com>	2025-04-21 18:36:59 -06:00
Chengji Yao	471fe65630	[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-21 15:43:13 -06:00
Woosuk Kwon	3a0fba5cf4	[V1][Spec Decode] Handle draft tokens beyond max_model_len (#16087 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-21 12:38:50 -07:00
qizixi	bb3605db85	[Bugfix] Fix v1/spec_decode/test_ngram.py (#16895 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-04-20 20:54:29 -07:00
Staszek Paśko	87aaadef73	Serialize tensors using int8 views (#16866 ) Signed-off-by: Staszek Pasko <staszek@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-19 10:28:34 -07:00
Isotr0py	83f3c3bd91	[Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-19 02:26:11 -07:00
vie-serendipity	d9737ca1c6	[V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460 ) Signed-off-by: vie-serendipity <2733147505@qq.com>	2025-04-19 02:25:19 -07:00
Nicolò Lucchesi	9d4ca19d50	[Misc] Benchmarks for audio models (#16505 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-19 02:24:14 -07:00
Nicolò Lucchesi	2ef0dc53b8	[Frontend] Add sampling params to `v1/audio/transcriptions` endpoint (#16591 ) Signed-off-by: Jannis Schönleber <joennlae@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Jannis Schönleber <joennlae@gmail.com>	2025-04-19 07:03:54 +00:00
Yang Fan	2c1bd848a6	[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130 ) Signed-off-by: fyabc <suyang.fy@alibaba-inc.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Xiong Wang <wangxiongts@163.com>	2025-04-18 23:14:36 -07:00
wang.yuqi	3d3ab3689f	[New Model]: Snowflake Arctic Embed (Family) (#16649 )	2025-04-18 08:11:57 -07:00
Harry Mellor	686623c5e7	Fix `nullable_kvs` fallback (#16837 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-18 05:58:39 -07:00
Harry Mellor	e78587a64c	Improve-mm-and-pooler-and-decoding-configs (#16789 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 22:13:32 -07:00
Cyrus Leung	c16fb5dae8	[Doc] Improve help examples for `--compilation-config` (#16729 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-17 21:22:34 -07:00
Tarun Kumar	e37073efd7	Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema (#16721 ) Signed-off-by: Tarun Kumar <takumar@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-17 21:08:27 -07:00
Yihua Cheng	3408e47159	[P/D][V1] KV Connector API V1 (#15960 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-04-17 13:22:40 -07:00
Nicolò Lucchesi	eb5819b2d9	[V1][TPU] Enable Top K (#15489 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Hyesoo Yang <hyeygit@gmail.com> Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>	2025-04-17 18:18:11 +00:00
Nicolò Lucchesi	5989f4684d	[TPU][V1] Fix padding recompilation when `max-num-batched-tokens` is not even (#16726 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-17 18:09:57 +00:00
Nick Hill	05fcd1b430	[V1][Perf] Faster incremental detokenization (#15137 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-17 07:45:24 -07:00
Harry Mellor	d27ea94034	Improve configs - `TokenizerPoolConfig` + `DeviceConfig` (#16603 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 11:19:42 +00:00
intervitens	5b1aca2ae3	[Bugfix] Fix GLM4 model (#16618 ) Signed-off-by: intervitens <intervitens@tutanota.com>	2025-04-17 03:35:07 -07:00
Isotr0py	cb072ce93b	[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-17 04:17:39 +00:00
Robert Shaw	2b05b8ce69	[V1][Frontend] Improve Shutdown And Logs (#11737 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:48:34 -07:00
Staszek Paśko	3092375e27	[V1][Performance] Implement custom serialization for MultiModalKwargs [Rebased] (#16432 ) Signed-off-by: Staszek Pasko <staszek@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:28:32 -07:00
xsank	ee378f3d49	[Model] support modernbert (#16648 ) Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com> Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>	2025-04-16 05:30:15 -07:00
Shanshan Shen	976711d9db	[V1][Structured Output] Move xgrammar related utils to `backend_xgrammar.py` (#16578 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-16 17:01:36 +08:00
Shinichi Hemmi	3badb0213b	[Model] Add PLaMo2 (#14323 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com> Signed-off-by: shemmi <shemmi@preferred.jp> Co-authored-by: Kento Nozawa <nzw0301@preferred.jp> Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp> Co-authored-by: Calvin Metzger <metzger@preferred.jp>	2025-04-15 19:31:30 -07:00
Angky William	fdcb850f14	[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546 ) Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local> Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>	2025-04-15 22:31:38 +00:00
Dipika Sikka	54a66e5fee	[Misc] Update `compressed-tensors` WNA16 to support zero-points (#14211 )	2025-04-15 07:33:51 -06:00
Jee Jee Li	1575c1701a	[CI/Build] Fix LoRA OOM (#16624 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-15 16:38:19 +08:00
Michael Goin	b4fe16c75b	Add `vllm bench [latency, throughput]` CLI commands (#16508 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-14 23:10:35 -07:00
Pooya Davoodi	bc5dd4f669	[Bugfix] Fix broken GritLM model and tests (missing pooling_metadata) (#16631 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2025-04-14 23:09:58 -07:00
Tyler Michael Smith	dbb036cf61	[Bugfix] Fix tests/kernels/test_mamba_ssm_ssd.py (#16623 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-04-15 05:35:38 +00:00
Jinzhen Lin	d06ba4ed3f	[Kernel] moe wna16 marlin kernel (#14447 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-14 20:05:22 -07:00
Alex Brooks	6b40996ae8	[Core][Bugfix] Fix Offline MM Beam Search (#16390 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-15 10:33:02 +08:00
courage17340	b1308b84a3	[Model][VLM] Add Kimi-VL model support (#16387 ) Signed-off-by: courage17340 <courage17340@163.com>	2025-04-14 21:41:48 +00:00
Nicolò Lucchesi	b3f2fddd17	[TPU][V1] Fix exponential padding when `max-num-batched-tokens` is not a power of 2 (#16596 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-14 17:01:05 +00:00
Cyrus Leung	aa29841ede	[Bugfix] Multi-modal caches not acting like LRU caches (#16593 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-14 09:24:16 -07:00
Russell Bryant	dc1b4a6f13	[Core][V0] Enable regex support with xgrammar (#13228 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-14 10:13:38 +08:00
Lily Liu	f49e5aff11	[V1][Spec Decode] KV cache slots for eagle heads (#16370 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-12 19:42:51 -07:00
Ryan McConville	6c11ecf8d3	[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (#16529 ) Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>	2025-04-12 20:19:19 +00:00
Cyrus Leung	d9fc8cd9da	[V1] Enable multi-input by default (#15799 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-12 08:52:39 +00:00
Cyrus Leung	c5bc0e7fcc	[Misc] Update chat utils tests (#16520 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-12 06:48:43 +00:00
wang.yuqi	fbf722c6e6	[Frontend] support matryoshka representation / support embedding API dimensions (#16331 )	2025-04-11 23:23:10 -07:00
leon-seidel	e92d7085bf	[Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516 ) Signed-off-by: Leon Seidel <leon.seidel@fau.de>	2025-04-11 23:22:07 -07:00
Nick Hill	41cc883c29	[BugFix] Handle non-contiguous tensors properly when serializing (#16492 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-11 17:54:06 -07:00
Ye (Charlotte) Qi	16eda8c43a	[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Kai Wu <kaiwu@meta.com>	2025-04-12 06:26:17 +08:00
Travis Johnson	71b9cde010	[Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-04-11 19:59:50 +00:00
Michael Goin	f41647ee6b	[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 17:54:08 +00:00
chaow-amd	9e90c9f73f	[Bugfix] Fix bugs of running Quark quantized models (#16236 ) Signed-off-by: chaow <chaow@amd.com>	2025-04-11 10:18:32 -04:00
DefTruth	e9528f6dc6	[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-11 06:50:50 -06:00
Michael Goin	aa3b3d76e0	Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 08:09:52 +00:00
Isotr0py	93195146ea	[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-11 04:57:16 +00:00
Nicolò Lucchesi	3cc9af88ff	[TPU][V1] Disable per-request seed/Generator (#16172 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:05:44 -04:00
Nick Hill	dd143ef541	[V1] Zero-copy tensor/ndarray serialization/transmission (#13790 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-10 19:23:14 +00:00
Lily Liu	e8224f3dca	[V1][Spec Decode] Eagle Model loading (#16035 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-10 11:21:48 -07:00
Russell Bryant	9665313c39	[V1] Set structured output backend to `auto` by default (#15724 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-10 17:53:26 +00:00
Cyrus Leung	83b824c8b4	[VLM] Remove `BaseProcessingInfo.get_mm_max_tokens_per_item` (#16408 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 09:06:58 -07:00
Cyrus Leung	3d4c87758e	[Misc] Update transformers version limits of multi-modal tests (#16381 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-09 23:03:33 -07:00
Yuxuan Zhang	1e44ffc3ff	Add GLM-4-0414 support (#16338 ) Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com> Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: yihong0618 <zouzou0208@gmail.com> Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Ajay Vohra <ajayvohr@amazon.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com> Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: yihong <zouzou0208@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-10 09:19:42 +08:00
Chengji Yao	a454748544	[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (#16275 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-09 18:51:51 -06:00
Guillaume Calmettes	98d01d3ce2	[Bugfix][Frontend] respect provided default guided decoding backend (#15476 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-09 05:11:10 -07:00
Accelerator1996	24f6b9a713	[Misc] Fix test_sharded_state_loader.py(#16004 ) (#16005 ) Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>	2025-04-09 14:47:30 +08:00
Luka Govedič	9cdde47289	[BugFix] Fix fusion test and add them to CI (#16287 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-04-08 23:46:45 -07:00
Chengji Yao	b1eb4ca152	[TPU] Update PyTorch/XLA (#16288 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-09 14:46:32 +08:00
Michael Goin	87b4ac56c2	[CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding (#16221 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-09 04:14:46 +00:00
rongfu.leng	4716377fbc	[Feature] Estimate max-model-len use available KV cache memory (#16168 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-08 19:12:51 -07:00
Chauncey	102bf967f0	[Model] Add smolvlm support (#16017 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-08 19:12:17 -07:00
Jee Jee Li	86c3369eb8	[CI/Build] Fix CI LoRA failure (#16270 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-09 09:13:56 +08:00
Cyrus Leung	4ebc0b9640	[Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-08 09:45:21 -07:00
wang.yuqi	1f5d13ab9f	[New Model]: jinaai/jina-embeddings-v3 (#16120 )	2025-04-08 08:39:12 -07:00

2143 Commits