vLLM/vllm

Commit Graph

Author	SHA1	Message	Date
vllmellm	3c9396a64f	[FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>	2025-05-09 10:42:05 +08:00
Shu Wang	376786fac1	Add cutlass support for blackwell fp8 blockwise gemm (#14383 ) Signed-off-by: Shu Wang <shuw@nvidia.com>	2025-05-08 15:09:55 -07:00
Russell Bryant	ec54d73c31	[CI] Fix test_collective_rpc (#17858 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-08 16:47:12 +00:00
fxmarty-amd	bb239a730f	[Bugfix] Fix quark fp8 format loading on AMD GPUs (#12612 ) Signed-off-by: Felix Marty <felmarty@amd.com> Signed-off-by: kewang2 <kewang2@amd.com> Co-authored-by: kewang2 <kewang2@amd.com>	2025-05-08 02:53:53 -07:00
Jevin Jiang	a463555dee	[TPU] Fix the test_sampler (#17820 )	2025-05-08 05:51:33 -04:00
Cyrus Leung	96722aa81d	[Frontend] Chat template fallbacks for multimodal models (#17805 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-07 23:05:54 -07:00
Hashem Hashemi	5a499e70d5	[Kernel][Hardware][AMD] Bf16 mfma opt for ROCm skinny GEMMs (#17071 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com> Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: charlifu <charlifu@amd.com>	2025-05-07 22:34:49 -07:00
Russell Bryant	6930a41116	[V1] Add VLLM_ALLOW_INSECURE_SERIALIZATION env var (#17490 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-05-08 13:34:02 +08:00
Chanh Nguyen	7ea2adb802	[Core] Support full cuda graph in v1 (#16072 ) Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com> Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>	2025-05-07 22:30:15 -07:00
Wallas Henrique	d43f914d42	[Core][Feature] Input metadata dump on crash (#13407 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2025-05-07 22:15:09 +00:00
Akshat Tripathi	c20ef40fd0	[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-05-07 16:28:47 -04:00
Bowen Bao	db593aa67f	[Quantization] Quark MXFP4 format loading (#16943 )	2025-05-07 15:05:05 -04:00
Isotr0py	f98e307588	[Bugfix] Fix missing lora name mapping for lora without prefix (#17793 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-07 16:17:12 +00:00
Isotr0py	be8ff88e66	[Bugfix] Fix Video IO error for short video (#17791 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-07 15:36:06 +00:00
Yong Hoon Shin	98c89e16ff	Make key optional for rotary embedding (#17566 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-05-07 00:11:46 -07:00
Yong Hoon Shin	324a3119b0	Fix test_memory_usage_no_spec (#17754 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-05-07 00:10:33 -07:00
Cyrus Leung	8a15c2603a	[Frontend] Add missing chat templates for various MLLMs (#17758 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-07 00:10:01 -07:00
Satyajith Chilappagari	043e4c4955	Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> Co-authored-by: Aaron Dou <yzdou@amazon.com> Co-authored-by: Shashwat Srijan <sssrijan@amazon.com> Co-authored-by: Chongming Ni <chongmni@amazon.com> Co-authored-by: Amulya Ballakur <amulyaab@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Lin Lin Pan <tailinpa@amazon.com> Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>	2025-05-07 00:07:30 -07:00
Szymon Ożóg	1a45a61387	[Kernel] GGUF MoeVec kernel (#16780 ) Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com> Signed-off-by: SzymonOzog <szymon.ozog@gmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-05-06 23:07:23 -07:00
Jee Jee Li	822de7fb94	[Misc] Split model loader (#17712 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-07 12:42:26 +08:00
Michael Goin	e50a1f1a9c	[TPU] Add kernel test for moe_pallas (#17496 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-05-06 17:59:57 -07:00
Chih-Chieh Yang	18dd5e01f2	[Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels (#17146 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>	2025-05-06 17:59:30 -07:00
Thomas Parnell	2f925e5777	[Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (#16828 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-06 18:21:48 -04:00
Chen Zhang	aabcd2cae3	[v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager (#17479 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-06 08:50:34 -07:00
Li, Jiang	a6fed02068	[V1][PP] Support PP for MultiprocExecutor (#14219 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-05-06 07:58:05 -07:00
Mengqing Cao	f9bc5a0693	[Bugfix] Fix triton import with local TritonPlaceholder (#17446 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-06 17:53:09 +08:00
Lucas Wilkinson	6eae34533a	[Misc] Fix ScalarType float4 naming (#17690 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-06 01:07:15 -07:00
Stan Wozniak	999328be0d	[Model] Add GraniteMoeHybrid 4.0 model (#17497 ) Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-05-06 12:00:31 +08:00
Nicolò Lucchesi	5941e0b7ea	[TPU][V1] Add support for top-logprobs (#17072 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-05-05 14:20:15 -07:00
XiongfeiWei	9765940824	[TPU] Enable gemma3-27b with TP>1 on multi-chips. (#17335 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-05-05 14:19:58 -07:00
Nick Hill	5ea5c514da	[BugFix] Increase timeout for startup failure test (#17642 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-05 20:53:19 +00:00
Jinzhen Lin	1d0c9d6b2d	[Kernel] some optimizations for dense marlin and moe marlin (#16850 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-05-05 09:39:30 -07:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Isotr0py	f66f1e0fa3	[Bugfix] Fix broken Qwen2.5-omni tests (#17613 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-03 17:08:14 +00:00
Cyrus Leung	887d7af882	[Core] Gate `prompt_embeds` behind a feature flag (#17607 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-04 00:19:20 +08:00
Richard Zou	b90b0852e9	[easy] Print number of needed GPUs in skip message (#17594 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-02 15:27:43 -07:00
Caleb_Du	3e887d2e0c	permute/unpermute kernel for moe optimization (#14568 ) Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>	2025-05-02 11:31:55 -07:00
Cyrus Leung	cb234955df	[Misc] Clean up input processing (#17582 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 08:11:53 -07:00
Cyrus Leung	99404f53c7	[Security] Fix image hash collision (#17378 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 08:36:39 -04:00
Harry Mellor	785d75a03b	Automatically tell users that dict args must be valid JSON in CLI (#17577 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-02 05:24:55 -07:00
Cyrus Leung	d7543862bd	[Misc] Rename assets for testing (#17575 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 03:29:25 -07:00
Robert Shaw	c777df79f7	[BugFix] Fix Memory Leak (#17567 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-05-02 01:07:03 -07:00
Andrew Sansom	cc2a77d7f1	[Core] [Bugfix] Add Input Embeddings (#15428 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: 临景 <linjing.yx@alibaba-inc.com> Co-authored-by: Bryce1010 <bryceyx@gmail.com> Co-authored-by: Nan2018 <nan@protopia.ai> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 01:06:39 -07:00
Jerry Zhang	109e15a335	Add `pt_load_map_location` to allow loading to cuda (#16869 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-05-01 23:23:42 -07:00
Cyrus Leung	f89d0e11bf	[Misc] Continue refactoring model tests (#17573 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 22:06:08 -07:00
Michael Goin	292fc59d61	[CI] Actually run tests/kv_transfer/test_disagg.py in CI (#17555 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-02 04:05:04 +00:00
Isotr0py	88c8304104	[Model] Refactor Ovis2 to support original tokenizer (#17537 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-01 11:00:53 -07:00
Sage Moore	460a2b1100	[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations (#10867 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-05-01 07:59:28 -07:00
Chauncey	98060b001d	[Feature][Frontend]: Deprecate --enable-reasoning (#17452 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-01 06:46:16 -07:00
Huy Do	b74d888c63	Fix more broken speculative decode tests (#17450 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-05-01 06:05:58 -07:00
Cyrus Leung	48e925fab5	[Misc] Clean up test docstrings and names (#17521 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 05:19:32 -07:00
Russell Bryant	fbefc8a78d	[Core] Enable IPv6 with vllm.utils.make_zmq_socket() (#16506 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-01 09:38:18 +00:00
Noah Yoshida	13cf6b6236	[BugFix] fix speculative decoding memory leak when speculation is disabled (#15506 ) Signed-off-by: Noah Yoshida <noahcy117@gmail.com>	2025-04-30 23:28:17 -07:00
Cyrus Leung	afb4429b4f	[CI/Build] Reorganize models tests (#17459 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-30 23:03:08 -07:00
Michael Goin	aa4502e7f3	[CI][Bugfix] Fix failing V1 Test due to missing 'cache_salt' arg (#17500 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-30 21:03:30 -07:00
Michael Goin	17b4d85f63	[CI][TPU] Skip structured outputs+spec decode tests on TPU (#17510 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-30 20:36:20 -07:00
Siyuan Liu	dbc18e7816	[CI][TPU] Skip Multimodal test (#17488 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-04-30 19:51:39 -07:00
Chen Zhang	81ecf425f0	[v1][Spec Decode] Make sliding window compatible with eagle prefix caching (#17398 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-30 18:25:53 +00:00
Russell Bryant	947f2f5375	[V1] Allow turning off pickle fallback in vllm.v1.serial_utils (#17427 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-30 16:10:54 +00:00
Alec	0be6d05b5e	[V1][Metrics] add support for kv event publishing (#16750 ) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-04-30 07:44:45 -07:00
Marko Rosenmueller	77073c77bc	[Core] Prevent side-channel attacks via cache salting (#17045 ) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-04-30 20:27:21 +08:00
Nicolò Lucchesi	a7d5b016bd	[TPU][V1][CI] Update regression test baseline for v6 CI (#17064 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-30 04:03:22 -07:00
Marco	54072f315f	[MODEL ADDITION] Ovis2 Model Addition (#15826 ) Signed-off-by: Marco <121761685+mlinmg@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-04-30 07:33:29 +00:00
Huy Do	88fcf00dda	Fix some speculative decode tests with tl.dot (#17371 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-04-29 19:41:02 -07:00
Harry Mellor	13698db634	Improve configs - `ModelConfig` (#17130 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-30 10:38:22 +08:00
Gabriel Marinho	1c2bc7ead0	Truncation control for embedding models (#14776 ) Signed-off-by: Gabriel Marinho <gmarinho@ibm.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-30 09:24:57 +08:00
Benjamin Chislett	34120f5acd	[V1][Feature] Enable Speculative Decoding with Structured Outputs (#14702 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-04-30 00:02:10 +00:00
Harry Mellor	7489ec0bab	Remove Bamba 9B from CI (#17407 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 21:10:31 +00:00
Harry Mellor	0350809f3a	Remove Falcon3 2x7B from CI (#17404 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 19:52:25 +00:00
Harry Mellor	a6977dbd15	Simplify (and fix) passing of guided decoding backend options (#17008 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 19:02:23 +00:00
mofanke	a39203f99e	[Bugfix] add qwen3 reasoning-parser fix content is None when disable … (#17369 ) Signed-off-by: mofanke <mofanke@gmail.com>	2025-04-29 16:32:40 +00:00
Harry Mellor	2ef5d106bb	Improve literal dataclass field conversion to argparse argument (#17391 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 16:25:08 +00:00
Cyrus Leung	88ad9ec6b2	[Frontend] Support `chat_template_kwargs` in `LLM.chat` (#17356 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-29 22:03:35 +08:00
Cyrus Leung	00ee37efa2	[Bugfix] Clean up MiniMax-VL and fix processing (#17354 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-29 20:42:16 +08:00
Jee Jee Li	890f104cdf	[Doc] Fix QWen3MOE info (#17381 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-29 12:38:32 +00:00
ponix-j	bdb2cddafc	[Misc]Use a platform independent interface to obtain the device attributes (#17100 )	2025-04-29 06:59:13 +00:00
qscqesze	cde384cd92	[Model] support MiniMax-VL-01 model (#16328 ) Signed-off-by: qingjun <qingjun@minimaxi.com>	2025-04-29 12:05:50 +08:00
Michał Moskal	86d9fc29cb	implement Structural Tag with Guidance backend (#17333 ) Signed-off-by: Michal Moskal <michal@moskal.me>	2025-04-29 02:21:32 +00:00
Harry Mellor	b6dd32aa07	Make name of `compressed-tensors` quant method consistent across vLLM (#17255 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-28 16:28:13 +00:00
Harry Mellor	f94886946e	Improve conversion from dataclass configs to argparse arguments (#17303 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-28 16:22:12 +00:00
Alex Brooks	fa93cd9f60	[Model] Add Granite Speech Support (#16246 ) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-04-28 10:05:00 +00:00
Lily Liu	20e489eaa1	[V1][Spec Decode] Make eagle compatible with prefix caching. (#17137 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-27 09:29:43 -07:00
cascade	690fe019f0	[Feature] support sequence parallelism using compilation pass (#16155 ) Signed-off-by: cascade812 <cascade812@outlook.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-04-27 06:29:35 -07:00
Kaixi Hou	ed7a29d9f8	[NVIDIA] Support Cutlass MLA for Blackwell GPUs (#16032 ) Signed-off-by: kaixih <kaixih@nvidia.com>	2025-04-27 06:29:21 -07:00
Alex Brooks	756848e79e	[Bugfix] Fix Lora Name Parsing (#17196 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-27 20:33:09 +08:00
Cyrus Leung	93a126fbc7	[Misc] Make cached tokenizer pickle-compatible (#17048 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-27 13:05:00 +08:00
rasmith	8e4b351a0c	[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-04-27 00:35:08 +00:00
Happy	9869453c42	Update test_flash_attn.py (#17102 ) Signed-off-by: ShuaibinLi <lishuaibin@live.cn>	2025-04-26 22:17:35 +00:00
Ning Xie	fd11a325b8	[MISC] rename interval to max_recent_requests (#14285 )	2025-04-26 16:59:18 +00:00
Russell Bryant	f8acd01ff7	[V1] Add `structural_tag` support using xgrammar (#17085 )	2025-04-26 14:06:37 +00:00
Cyrus Leung	909fdaf152	[Bugfix] Fix standard models tests (#17217 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-26 02:26:41 -07:00
Nick Hill	df6f3ce883	[Core] Remove prompt string from engine core data structures (#17214 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 23:41:05 -07:00
Woosuk Kwon	513f074766	[CI/test] Fix Eagle Correctness Test (#17209 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-25 23:40:36 -07:00
Nick Hill	b07bf83c7d	[BugFix] Avoid race conditions in zero-copy tensor transmission (#17203 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-26 06:00:07 +00:00
Zijing Liu	53e8cf53a4	[V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661 ) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:05:40 -07:00
Shu Wang	9e96f56efb	Allocate kv_cache with stride order (#16605 ) Signed-off-by: shuw <shuw@nvidia.com>	2025-04-25 22:03:31 -07:00
Benjamin Chislett	a0e619e62a	[V1][Spec Decode] EAGLE-3 Support (#16937 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-25 15:43:07 -07:00
Nick Hill	70116459c3	[BugFix][Frontend] Fix `LLM.chat()` tokenization (#16081 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:20:05 +00:00
Cyrus Leung	43faa0461a	[Bugfix] Fix hybrid model tests (#17182 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 15:14:37 -07:00
Harry Mellor	423e9f1cbe	Use Transformers helper `get_text_config()` instead of checking for `text_config` (#17105 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:47:35 -07:00
Harry Mellor	0bd7f8fca5	Bump Transformers to 4.51.3 (#17116 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:34:34 -07:00
Cyrus Leung	19dcc02a72	[Bugfix] Fix mistral model tests (#17181 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 06:03:34 -07:00
Sangyeon Cho	6aae216b4e	[Bugfix] remove fallback in guided_json (int range, patterns) (#16725 ) Signed-off-by: csy1204 <josang1204@gmail.com> Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>	2025-04-25 06:54:43 +00:00
vllmellm	eef364723c	[FEAT] [ROCm]: AITER Fused MOE V1 Support (#16752 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-04-25 11:06:50 +08:00
Maximilien de Bayser	05e1fbfc52	Add chat template for Llama 4 models (#16428 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-24 20:19:36 +00:00
Harry Mellor	0fa939e2d1	Improve configs - `LoRAConfig` + `PromptAdapterConfig` (#16980 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 10:29:34 -07:00
Mark McLoughlin	340d7b1b21	[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-24 08:57:40 -07:00
Michael Goin	82e43b2d7e	Add missing rocm_skinny_gemms kernel test to CI (#17060 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 07:49:37 -07:00
wang.yuqi	67309a1cb5	[Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970 )	2025-04-24 07:06:28 -07:00
Rui Qiao	c0dfd97519	[V1][PP] Optimization: continue scheduling prefill chunks (#17080 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-24 05:27:08 -07:00
Harry Mellor	a9138e85b1	Fix OOT registration test (#17099 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:44:12 -07:00
Harry Mellor	0a05ed57e6	Simplify `TokenizerGroup` (#16790 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:43:56 -07:00
Michael Goin	14288d1332	Disable enforce_eager for V1 TPU sampler and structured output tests (#17016 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 02:50:09 -07:00
Woosuk Kwon	b411418ff0	[Chore] Remove Sampler from Model Code (#17084 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-24 02:49:33 -07:00
Travis Johnson	3cde34a4a4	[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-04-23 18:34:41 +00:00
Michael Goin	6317a5174a	Categorize `tests/kernels/` based on kernel type (#16799 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-23 09:21:07 -04:00
Nick Hill	1e013fa388	[V1][DP] More robust DP/EP dummy request coordination (#16277 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-22 19:12:15 -07:00
Aleksandr Malyshev	bc7c4d206b	[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (#13305 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com> Signed-off-by: maleksan85 <maleksan@amd.com> Signed-off-by: <> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>	2025-04-22 19:11:56 -07:00
Guillaume Calmettes	36fe78769f	[Bugfix] validate urls object for multimodal content parts (#16990 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-23 09:43:06 +08:00
Chenyaaang	83d933718c	[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-22 18:05:23 -06:00
vllmellm	30bc3e0f66	[FEAT][ROCm]: Support AITER MLA (#15893 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com>	2025-04-22 09:31:13 -07:00
Lei Wang	8d32dc603d	[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036 ) Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com> Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>	2025-04-22 09:01:36 +01:00
Woosuk Kwon	c4ab9f3e71	[V1] Remove pre-allocation for KV cache (#16941 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-22 00:52:18 -07:00
Chauncey	acba33a0f1	[Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams (#16767 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-04-22 06:02:20 +00:00
Charlie Fu	188b7f9b8c	[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830 ) Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-04-21 20:46:22 -07:00
Varun Sundar Rabindranath	7b8a2ab76f	[Kernel] Add expert_map support to Cutlass FP8 MOE (#16861 ) Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com> Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>	2025-04-21 20:44:32 -07:00
Jeffrey Li	0e4254492f	[Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other (#16863 ) Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com>	2025-04-22 11:40:19 +08:00
Nicolò Lucchesi	fa3bba2a53	[TPU][V1] Enable Top-P (#16843 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-22 00:46:07 +00:00
Michael Goin	986537f1c3	[V1] V1 FlashInfer Attention (#16684 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Aurick Qiao <qiao@aurick.net>	2025-04-22 00:38:41 +00:00
Nicolò Lucchesi	210207525e	[TPU][V1] Capture multimodal encoder during model compilation (#15051 ) Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Siyuan Liu <lsiyuan@google.com>	2025-04-21 18:36:59 -06:00
Chengji Yao	471fe65630	[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-21 15:43:13 -06:00
Woosuk Kwon	3a0fba5cf4	[V1][Spec Decode] Handle draft tokens beyond max_model_len (#16087 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-21 12:38:50 -07:00
qizixi	bb3605db85	[Bugfix] Fix v1/spec_decode/test_ngram.py (#16895 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-04-20 20:54:29 -07:00
Staszek Paśko	87aaadef73	Serialize tensors using int8 views (#16866 ) Signed-off-by: Staszek Pasko <staszek@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-19 10:28:34 -07:00
Isotr0py	83f3c3bd91	[Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-19 02:26:11 -07:00
vie-serendipity	d9737ca1c6	[V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460 ) Signed-off-by: vie-serendipity <2733147505@qq.com>	2025-04-19 02:25:19 -07:00
Nicolò Lucchesi	9d4ca19d50	[Misc] Benchmarks for audio models (#16505 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-19 02:24:14 -07:00
Nicolò Lucchesi	2ef0dc53b8	[Frontend] Add sampling params to `v1/audio/transcriptions` endpoint (#16591 ) Signed-off-by: Jannis Schönleber <joennlae@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Jannis Schönleber <joennlae@gmail.com>	2025-04-19 07:03:54 +00:00
Yang Fan	2c1bd848a6	[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130 ) Signed-off-by: fyabc <suyang.fy@alibaba-inc.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Xiong Wang <wangxiongts@163.com>	2025-04-18 23:14:36 -07:00
wang.yuqi	3d3ab3689f	[New Model]: Snowflake Arctic Embed (Family) (#16649 )	2025-04-18 08:11:57 -07:00
Harry Mellor	686623c5e7	Fix `nullable_kvs` fallback (#16837 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-18 05:58:39 -07:00
Harry Mellor	e78587a64c	Improve-mm-and-pooler-and-decoding-configs (#16789 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 22:13:32 -07:00
Cyrus Leung	c16fb5dae8	[Doc] Improve help examples for `--compilation-config` (#16729 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-17 21:22:34 -07:00
Tarun Kumar	e37073efd7	Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema (#16721 ) Signed-off-by: Tarun Kumar <takumar@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-17 21:08:27 -07:00
Yihua Cheng	3408e47159	[P/D][V1] KV Connector API V1 (#15960 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-04-17 13:22:40 -07:00
Nicolò Lucchesi	eb5819b2d9	[V1][TPU] Enable Top K (#15489 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Hyesoo Yang <hyeygit@gmail.com> Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>	2025-04-17 18:18:11 +00:00
Nicolò Lucchesi	5989f4684d	[TPU][V1] Fix padding recompilation when `max-num-batched-tokens` is not even (#16726 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-17 18:09:57 +00:00
Nick Hill	05fcd1b430	[V1][Perf] Faster incremental detokenization (#15137 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-17 07:45:24 -07:00
Harry Mellor	d27ea94034	Improve configs - `TokenizerPoolConfig` + `DeviceConfig` (#16603 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 11:19:42 +00:00
intervitens	5b1aca2ae3	[Bugfix] Fix GLM4 model (#16618 ) Signed-off-by: intervitens <intervitens@tutanota.com>	2025-04-17 03:35:07 -07:00
Isotr0py	cb072ce93b	[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-17 04:17:39 +00:00
Robert Shaw	2b05b8ce69	[V1][Frontend] Improve Shutdown And Logs (#11737 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:48:34 -07:00
Staszek Paśko	3092375e27	[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased] (#16432 ) Signed-off-by: Staszek Pasko <staszek@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:28:32 -07:00
xsank	ee378f3d49	[Model] support modernbert (#16648 ) Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com> Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>	2025-04-16 05:30:15 -07:00
Shanshan Shen	976711d9db	[V1][Structured Output] Move xgrammar related utils to `backend_xgrammar.py` (#16578 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-16 17:01:36 +08:00
Shinichi Hemmi	3badb0213b	[Model] Add PLaMo2 (#14323 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com> Signed-off-by: shemmi <shemmi@preferred.jp> Co-authored-by: Kento Nozawa <nzw0301@preferred.jp> Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp> Co-authored-by: Calvin Metzger <metzger@preferred.jp>	2025-04-15 19:31:30 -07:00
Angky William	fdcb850f14	[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546 ) Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local> Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>	2025-04-15 22:31:38 +00:00
Dipika Sikka	54a66e5fee	[Misc] Update `compressed-tensors` WNA16 to support zero-points (#14211 )	2025-04-15 07:33:51 -06:00
Jee Jee Li	1575c1701a	[CI/Build] Fix LoRA OOM (#16624 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-15 16:38:19 +08:00
Michael Goin	b4fe16c75b	Add `vllm bench [latency, throughput]` CLI commands (#16508 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-14 23:10:35 -07:00
Pooya Davoodi	bc5dd4f669	[Bugfix] Fix broken GritLM model and tests (missing pooling_metadata) (#16631 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2025-04-14 23:09:58 -07:00
Tyler Michael Smith	dbb036cf61	[Bugfix] Fix tests/kernels/test_mamba_ssm_ssd.py (#16623 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-04-15 05:35:38 +00:00
Jinzhen Lin	d06ba4ed3f	[Kernel] moe wna16 marlin kernel (#14447 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-14 20:05:22 -07:00
Alex Brooks	6b40996ae8	[Core][Bugfix] Fix Offline MM Beam Search (#16390 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-15 10:33:02 +08:00
courage17340	b1308b84a3	[Model][VLM] Add Kimi-VL model support (#16387 ) Signed-off-by: courage17340 <courage17340@163.com>	2025-04-14 21:41:48 +00:00
Nicolò Lucchesi	b3f2fddd17	[TPU][V1] Fix exponential padding when `max-num-batched-tokens` is not a power of 2 (#16596 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-14 17:01:05 +00:00
Cyrus Leung	aa29841ede	[Bugfix] Multi-modal caches not acting like LRU caches (#16593 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-14 09:24:16 -07:00
Russell Bryant	dc1b4a6f13	[Core][V0] Enable regex support with xgrammar (#13228 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-14 10:13:38 +08:00
Lily Liu	f49e5aff11	[V1][Spec Decode] KV cache slots for eagle heads (#16370 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-12 19:42:51 -07:00
Ryan McConville	6c11ecf8d3	[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (#16529 ) Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>	2025-04-12 20:19:19 +00:00
Cyrus Leung	d9fc8cd9da	[V1] Enable multi-input by default (#15799 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-12 08:52:39 +00:00
Cyrus Leung	c5bc0e7fcc	[Misc] Update chat utils tests (#16520 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-12 06:48:43 +00:00
wang.yuqi	fbf722c6e6	[Frontend] support matryoshka representation / support embedding API dimensions (#16331 )	2025-04-11 23:23:10 -07:00
leon-seidel	e92d7085bf	[Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516 ) Signed-off-by: Leon Seidel <leon.seidel@fau.de>	2025-04-11 23:22:07 -07:00
Nick Hill	41cc883c29	[BugFix] Handle non-contiguous tensors properly when serializing (#16492 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-11 17:54:06 -07:00
Ye (Charlotte) Qi	16eda8c43a	[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Kai Wu <kaiwu@meta.com>	2025-04-12 06:26:17 +08:00
Travis Johnson	71b9cde010	[Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-04-11 19:59:50 +00:00
Michael Goin	f41647ee6b	[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 17:54:08 +00:00
chaow-amd	9e90c9f73f	[Bugfix] Fix bugs of running Quark quantized models (#16236 ) Signed-off-by: chaow <chaow@amd.com>	2025-04-11 10:18:32 -04:00
DefTruth	e9528f6dc6	[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-11 06:50:50 -06:00
Michael Goin	aa3b3d76e0	Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 08:09:52 +00:00
Isotr0py	93195146ea	[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-11 04:57:16 +00:00
Nicolò Lucchesi	3cc9af88ff	[TPU][V1] Disable per-request seed/Generator (#16172 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:05:44 -04:00
Nick Hill	dd143ef541	[V1] Zero-copy tensor/ndarray serialization/transmission (#13790 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-10 19:23:14 +00:00
Lily Liu	e8224f3dca	[V1][Spec Decode] Eagle Model loading (#16035 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-10 11:21:48 -07:00
Russell Bryant	9665313c39	[V1] Set structured output backend to `auto` by default (#15724 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-10 17:53:26 +00:00
Cyrus Leung	83b824c8b4	[VLM] Remove `BaseProcessingInfo.get_mm_max_tokens_per_item` (#16408 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 09:06:58 -07:00
Cyrus Leung	3d4c87758e	[Misc] Update transformers version limits of multi-modal tests (#16381 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-09 23:03:33 -07:00
Yuxuan Zhang	1e44ffc3ff	Add GLM-4-0414 support (#16338 ) Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com> Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: yihong0618 <zouzou0208@gmail.com> Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Ajay Vohra <ajayvohr@amazon.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com> Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: yihong <zouzou0208@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-10 09:19:42 +08:00
Chengji Yao	a454748544	[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (#16275 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-09 18:51:51 -06:00
Guillaume Calmettes	98d01d3ce2	[Bugfix][Frontend] respect provided default guided decoding backend (#15476 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-09 05:11:10 -07:00
Accelerator1996	24f6b9a713	[Misc] Fix test_sharded_state_loader.py(#16004 ) (#16005 ) Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>	2025-04-09 14:47:30 +08:00
Luka Govedič	9cdde47289	[BugFix] Fix fusion test and add them to CI (#16287 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-04-08 23:46:45 -07:00
Chengji Yao	b1eb4ca152	[TPU] Update PyTorch/XLA (#16288 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-09 14:46:32 +08:00
Michael Goin	87b4ac56c2	[CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding (#16221 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-09 04:14:46 +00:00
rongfu.leng	4716377fbc	[Feature] Estimate max-model-len use available KV cache memory (#16168 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-08 19:12:51 -07:00
Chauncey	102bf967f0	[Model] Add smolvlm support (#16017 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-08 19:12:17 -07:00
Jee Jee Li	86c3369eb8	[CI/Build] Fix CI LoRA failure (#16270 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-09 09:13:56 +08:00
Cyrus Leung	4ebc0b9640	[Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-08 09:45:21 -07:00
wang.yuqi	1f5d13ab9f	[New Model]: jinaai/jina-embeddings-v3 (#16120 )	2025-04-08 08:39:12 -07:00
Kebe	e11880deea	[Bugfix] Remove triton do_bench fast_flush arg (#16256 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-04-08 13:51:06 +00:00
Michael Goin	8e5314a468	[V1] Add `disable_chunked_mm_input` arg to disable partial mm input prefill (#15837 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-07 23:24:07 -07:00
Isotr0py	f6b32efb7f	[Bugfix] Fix and reorganize broken GGUF tests and bump gguf version (#16194 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-08 13:38:13 +08:00
Michael Goin	b99733d092	[Bugfix] Do not skip "empty" parts of chats that are parsable (#16219 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-08 05:14:15 +00:00
Roger Wang	f2ebb6f541	[V1] Scatter and gather placeholders in the model runner (#16076 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>	2025-04-08 10:43:41 +08:00
Driss Guessous	652907b354	Torchao (#14231 ) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-04-07 19:39:28 -04:00
leon-seidel	24f1c01e0f	[Bugfix][V0] XGrammar structured output supports Enum (#15878 ) Signed-off-by: Leon Seidel <leon.seidel@fau.de>	2025-04-07 22:38:25 +00:00
Nick Hill	7f6d47c1a2	[V1][BugFix] Exit properly if engine core fails during startup (#16137 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-07 15:30:15 -07:00
Nicolò Lucchesi	090c856d76	[Misc] Human-readable `max-model-len` cli arg (#16181 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-04-07 14:40:58 -04:00
Cyrus Leung	66d433b94f	[V1] Revert the default `max_num_seqs` to V0 values for most hardware (#16158 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 13:54:36 -04:00
Cyrus Leung	027b204ff1	[Bugfix] Re-enable support for `ChatGLMForConditionalGeneration` (#16187 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 23:15:58 +08:00
Lu Fang	55dcce91df	Upstream Llama4 Support to Main (#16113 ) Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com> Signed-off-by: Chris Thi <chris.c.thi@gmail.com> Signed-off-by: drisspg <drisspguessous@gmail.com> Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Keyun Tong <tongkeyun@gmail.com> Signed-off-by: Lu Fang <fanglu@meta.com> Signed-off-by: Xiaodong Wang <xdwang@meta.com> Signed-off-by: Yang Chen <yangche@fb.com> Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Lu Fang <lufang@fb.com> Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 08:06:27 -07:00
YamPengLi	7699258ef0	[Model] Add Qwen3 and Qwen3MoE (#15289 ) Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-07 04:06:41 -07:00
Roger Wang	bb8dab821e	[CI] Set max transformers version for Ultravox model test (#16149 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-04-07 04:37:58 +00:00
Isotr0py	fc0f87768a	[Bugfix] Make dummy encoder prompt padding alternative and add missing warnings (#16129 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-07 04:07:15 +00:00
Tristan Leclercq	4285e423a6	[Misc] Auto detect bitsandbytes pre-quantized models (#16027 ) Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>	2025-04-04 23:30:45 -07:00
Roger Wang	af51d80fa1	Revert "[V1] Scatter and gather placeholders in the model runner" (#16075 )	2025-04-04 14:50:57 -07:00
Cyrus Leung	f5722a5052	[V1] Scatter and gather placeholders in the model runner (#15712 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-04-04 21:26:44 +00:00
Mark McLoughlin	a35a8a8392	[V1][Spec Decode] Avoid logging useless nan metrics (#16023 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-04 08:52:41 -07:00
bnellnm	dcc56d62da	[Bugfix] Fix function names in test_block_fp8.py (#16033 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-04-03 23:01:34 +00:00
Robert Shaw	f15e70d906	[TPU] Switch Test to Non-Sliding Window (#15981 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2025-04-03 14:28:45 -07:00
iefgnoix	b6be6f8d1e	[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-04-03 14:23:28 -07:00
bnellnm	15ba07ef25	[Minor] Fused experts refactor (#15914 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-04-03 10:19:38 -07:00
Liangfu Chen	d2b58ca203	[Neuron][kernel] Fuse kv cache into a single tensor (#15911 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com>	2025-04-03 09:51:32 -07:00
Aleksandr Malyshev	e73ff24e31	[ROCM][KERNEL] Paged attention for V1 (#15720 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>	2025-04-02 19:48:00 -07:00
Hyesoo Yang	1b84eff03a	[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736 ) Signed-off-by: Hyesoo Yang <hyeygit@gmail.com> Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>	2025-04-02 17:18:08 -07:00
Matthias Matt	cefb9e5a28	[Frontend] Implement Tool Calling with `tool_choice='required'` (#13483 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com> Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at> Co-authored-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2025-04-02 07:45:45 -07:00
Mark McLoughlin	98d7367b61	[Metrics] Hide deprecated metrics (#15458 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-02 07:37:19 -07:00
Chauncey	594a8b9030	[Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. (#15938 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-02 06:33:52 -07:00
Russell Bryant	14e53ed11f	[V1] Fix json_object support with xgrammar (#15488 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-02 02:00:08 -07:00
Eric Tang	ddb94c2605	[core] Add tags parameter to wake_up() (#15500 ) Signed-off-by: Eric <erictang000@gmail.com>	2025-04-02 01:59:27 -07:00
LukasBluebaum	90969fb39a	[Kernel] Add more dtype support for GGUF dequantization (#15879 ) Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>	2025-04-02 01:58:48 -07:00
Jee Jee Li	4203926f10	[CI/Build] Further clean up LoRA tests (#15920 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-02 01:39:09 -07:00
Gerald	9ef98d527e	[Model][MiniMaxText01] Support MiniMaxText01 model inference (#13454 ) Signed-off-by: qscqesze <475517977@qq.com> Co-authored-by: qingjun <qingjun@minimaxi.com> Co-authored-by: qscqesze <475517977@qq.com>	2025-04-01 16:23:55 -04:00
Mark McLoughlin	a79cc68b3a	[V1][Metrics] Initial speculative decoding metrics (#15151 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-01 10:45:04 -07:00
Roger Wang	7e3f7a4ee7	[CI] Disable flaky structure decoding test temporarily. (#15892 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-04-01 17:42:34 +00:00
Jennifer Zhao	38327cf454	[Model] Aya Vision (#15441 ) Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-04-01 16:30:43 +00:00
Jee Jee Li	dfa82e2a3d	[CI/Build] Clean up LoRA tests (#15867 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-01 16:28:50 +00:00
bnellnm	e59ca942f5	Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-04-01 12:07:43 -04:00
wang.yuqi	085cbc4f9f	[New Model]: jinaai/jina-reranker-v2-base-multilingual (#15876 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-01 08:32:26 -07:00
Michael Goin	51d7c6a2b2	[Model] Support Mistral3 in the HF Transformers format (#15505 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-01 06:10:05 -07:00
Varun Sundar Rabindranath	79455cf421	[Misc] Enable V1 LoRA by default (#15320 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-04-01 16:53:56 +08:00
Wei Zeng	30d6a015e0	[Feature] specify model in config.yaml (#15798 ) Signed-off-by: weizeng <weizeng@roblox.com>	2025-04-01 01:20:06 -07:00
Chen Zhang	3a5f0afcd2	[V1] Implement sliding window attention in kv_cache_manager (#14097 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-01 00:33:17 -07:00
Yan Ma	ff6473980d	[Bugfix][Model] fix mllama multi-image (#14883 ) Signed-off-by: yan ma <yan.ma@intel.com>	2025-03-31 22:53:37 -07:00
Harry Mellor	a76f547e11	Rename fallback model and refactor supported models section (#15829 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-31 22:49:41 -07:00
Ilya Markov	b7b7676d67	[Distributed] Add custom allreduce support for ROCM (#14125 ) Signed-off-by: ilmarkov <imarkov@redhat.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-03-31 22:49:12 -07:00
Mark McLoughlin	f98a4920f9	[V1][Core] Remove unused speculative config from scheduler (#15818 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-31 19:15:21 +00:00
Alexander Matveev	9a2160fa55	[V1] TPU CI - Add basic perf regression test (#15414 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-31 13:25:20 -04:00
shangmingc	239b7befdd	[V1][Spec Decode] Remove deprecated spec decode config params (#15466 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-03-31 09:19:35 -07:00
Cyrus Leung	09e974d483	[Bugfix] Check dimensions of multimodal embeddings in V1 (#15816 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-31 09:01:35 -07:00
Harry Mellor	e5ef4fa99a	Upgrade `transformers` to `v4.50.3` (#13905 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-31 08:59:37 -07:00
Alex Brooks	c2e7507ad4	[Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats (#15813 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-03-31 13:23:53 +00:00
Naveassaf	3aa2b6a637	[Model] Update support for NemotronNAS models (#15008 ) Signed-off-by: Nave Assaf <nassaf@nvidia.com>	2025-03-31 20:35:14 +08:00
youkaichao	555aa21905	[V1] Fully Transparent Implementation of CPU Offloading (#15354 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-31 20:22:34 +08:00
Charlie Fu	e85829450d	[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-03-31 04:42:18 -07:00
yihong	248e76c4df	fix: lint fix a ruff checkout syntax error (#15767 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-03-30 03:36:02 -07:00
Cyrus Leung	803d5c35f3	[V1] Override `mm_counts` for dummy data creation (#15703 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-30 03:20:42 -07:00
pansicheng	7fd8c0f85c	fix test_phi3v (#15321 ) Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>	2025-03-30 02:01:34 -07:00
Julien Denize	6909a76201	[Bugfix] Fix Mistral guided generation using xgrammar (#15704 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-03-29 20:20:19 -07:00
Chauncey	045533716b	[CI] xgrammar structured output supports Enum. (#15757 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-03-29 20:20:02 -07:00
Roger Wang	c67abd614f	[V1] Support interleaved modality items (#15605 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-29 06:30:09 -07:00
TJian	4965ec42d2	[FEAT] [ROCm] Add AITER int8 scaled gemm kernel (#15433 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-03-29 03:33:56 -07:00
Russell Bryant	7a7992085b	[CI] Speed up V1 structured output tests (#15718 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-28 21:10:45 -07:00
Varun Sundar Rabindranath	1286211f57	[Bugfix] LoRA V1: add and fix entrypoints tests (#15715 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-28 21:10:41 -07:00
pengyuange	de1cb38769	[Model] Support Skywork-R1V (#15397 ) Signed-off-by: jiacai.liu <932997367@qq.com> Co-authored-by: jiacai.liu <932997367@qq.com>	2025-03-28 20:39:21 -07:00
Alexander Matveev	c3f687ac22	[V1] TPU - Fix the chunked prompt bug (#15713 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-28 20:19:04 +00:00
Luka Govedič	04437e313d	[Bugfix] [torch.compile] Add Dynamo metrics context during compilation (#15639 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-03-28 14:01:09 -06:00
Cyrus Leung	c6bc0034d0	[Misc] Remove unused utils and clean up imports (#15708 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-28 09:41:16 -07:00
Michael Goin	47e9038d23	Fix cpu offload testing for gptq/awq/ct (#15648 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-29 00:29:32 +08:00
Russell Bryant	7329ff5468	[V1] Support disable_any_whtespace for guidance backend (#15584 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-28 23:46:45 +08:00
Chauncey	3b00ff9138	[Bugfix][v1] xgrammar structured output supports Enum. (#15594 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-03-28 06:14:53 -07:00
Ce Gao	3bbaacbe15	[Bugfix][Frontend] Eliminate regex based check in reasoning full generator (#14821 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-28 11:20:35 +00:00
Lize Cai	a10314c6b3	[Misc] Fix test_sleep to use query parameters (#14373 ) Signed-off-by: Lize Cai <lize.cai@sap.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-03-28 18:00:14 +08:00
Ce Gao	32b14baf8a	[Refactor][Frontend] Keep all logic about reasoning into one class (#14428 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-28 00:23:30 -07:00
Robert Shaw	2d9045fce8	[TPU][CI] Fix TPUModelRunner Test (#15667 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2025-03-28 00:01:26 -07:00
Cyrus Leung	355f66348c	[V1] Remove legacy input registry (#15673 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-27 23:34:34 -07:00
Robert Shaw	8a49eea74b	[CI][TPU] Temporarily Disable Quant Test on TPU (#15649 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-27 19:45:05 -07:00
Jee Jee Li	726efc6a32	[Quantization][V1] BitsAndBytes support V1 (#15611 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-28 10:12:47 +08:00
Nick Hill	15dac210f0	[V1] AsyncLLM data parallel (#13923 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-27 16:14:41 -07:00
Nicolò Lucchesi	4098b72210	[Bugfix][TPU][V1] Fix recompilation (#15553 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-27 19:15:06 +00:00
Cyrus Leung	247181536f	[Misc] Replace `is_encoder_decoder_inputs` with `split_enc_dec_inputs` (#15620 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-27 17:36:32 +00:00
Cody Yu	54aa619459	[V1] Refactor num_computed_tokens logic (#15307 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-27 04:54:36 +00:00
Varun Sundar Rabindranath	8095341a01	[misc] LoRA: Remove unused long context test data (#15558 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-27 10:04:51 +08:00
ElizaWszola	9239bf718e	[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972 ) Signed-off-by: ElizaWszola <eliza@neuralmagic.com> Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>	2025-03-27 00:54:44 +00:00
Matthew Vine	7a6d45bc8a	Support FIPS enabled machines with MD5 hashing (#15299 ) Signed-off-by: Matthew Vine <32849887+MattTheCuber@users.noreply.github.com>	2025-03-26 20:19:46 -04:00
Alexander Matveev	9d119a86ae	[V1] TPU CI - Fix test_compilation.py (#15570 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-26 21:51:54 +00:00
marko	27df5199d9	Support SHA256 as hash function in prefix caching (#15297 ) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-03-26 11:11:28 -07:00
Nick Hill	35fad35a48	[V1][Sampler] Faster top-k only implementation (#15478 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-26 10:56:47 -07:00
Alex Brooks	1711b929b6	[Model] Add Reasoning Parser for Granite Models (#14202 ) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de>	2025-03-26 14:28:07 +00:00
Harry Mellor	cf5c8f1686	Separate base model from `TransformersModel` (#15467 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-03-26 18:13:38 +08:00
wwl2755	99f536f830	[Misc] Enhance warning information to user-defined chat template (#15408 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-03-26 02:21:15 -07:00
vllmellm	5ebf66748b	[FEAT][ROCm] Integrate Fused MoE Kernels from AITER (#14967 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-03-26 16:30:30 +08:00
Cyrus Leung	997c8811d6	[Model] Support multi-image for Molmo (#15438 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-26 11:26:33 +08:00
Harry Mellor	e42389f9d7	Transformers backend already supports V1 (#15463 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-25 20:26:16 -07:00
Varun Sundar Rabindranath	ff38f0a32c	[CI/Build] LoRA: Delete long context tests (#15503 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-25 17:18:34 -07:00
Chenyaaang	ac3cd6e83c	[core] add bucket padding to tpu_model_runner (#14995 ) Signed-off-by: Chenyaaang <llccyy1212@gmail.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-25 17:27:22 -04:00
Lu Fang	082ab86f5f	[V1] Support long_prefill_token_threshold in v1 scheduler (#15419 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-03-25 14:22:26 -07:00
yarongmu-google	0a049c7d86	[CI/Build] Add tests for the V1 tpu_model_runner. (#14843 ) Signed-off-by: Yarong Mu <ymu@google.com>	2025-03-25 12:27:16 -04:00
Cyrus Leung	a9e879b316	[Misc] Clean up MiniCPM-V/O code (#15337 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-25 10:22:52 +00:00

2143 Commits