vLLM/vllm - vllm - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
qscqesze	387bdf0ab9	[Model] Add support for MiniMaxM1ForCausalLM (shares architecture with MiniMaxText01ForCausalLM) (#19677 ) Signed-off-by: QscQ <qscqesze@gmail.com>	2025-06-16 09:47:14 -07:00
Isotr0py	1173804dca	[Bugfix] Fix TP inference for Flex attention backend (#19657 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-16 11:21:37 +00:00
wang.yuqi	f40f763f12	[CI] Add mteb testing for rerank models (#19344 )	2025-06-16 01:36:43 -07:00
Chengji Yao	a77aea59fd	[TPU] support attention head dim smaller than 128 (#19620 ) Signed-off-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-06-16 06:40:53 +00:00
Ye (Charlotte) Qi	b692e9cd07	[Misc] Fix skipped max-model-len validation when deriving max model length from tokenizer config (#19660 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-06-16 06:30:29 +00:00
quanliu	92183b41f3	[Bugfix][Core] Prefix caching causes incorrect outputs due to outdated ComputedBlocksTracker (#18957 ) Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn> Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>	2025-06-15 21:56:37 -07:00
22quinn	0b73736a0d	[Kernel] Raise verbose error and consolidate `num_heads/num_kv_heads` divisibility check (#19339 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-15 13:43:48 +08:00
Lu Fang	ee1531bc38	[Bugfix][2/n] Fix speculative decoding CI - Fix test_ngram_e2e_greedy_correctness (#19644 )	2025-06-14 21:15:41 -07:00
Isotr0py	2db9044ab6	[Bugfix] Fix auto dtype casting for BatchFeature (#19316 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-06-14 15:13:08 +00:00
Lu Fang	06be858828	[Bugfix] Fix the speculative decoding test by setting the target dtype (#19633 )	2025-06-13 20:57:32 -07:00
Concurrensee	d65668b4e8	Adding "AMD: Multi-step Tests" to amdproduction. (#19508 ) Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-06-13 17:08:51 -07:00
Luka Govedič	3597b06a4f	[CUDA] Enable full cudagraph for FlashMLA (#18581 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-06-13 18:12:26 +00:00
Ekagra Ranjan	017ef648e9	[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (#18847 )	2025-06-12 10:30:56 -07:00
Luka Govedič	f98548b9da	[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com>	2025-06-12 08:31:04 -07:00
mobicham	96846bb360	Fix TorchAOConfig skip layers (#19265 ) Signed-off-by: mobicham <hicham@mobiuslabs.com>	2025-06-12 22:22:53 +08:00
Wentao Ye	b6efafd9e4	[Perf] Vectorize static / dynamic INT8 quant kernels (#19233 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-12 06:51:41 -07:00
jmswen	c9280e6346	[Bugfix] Respect num-gpu-blocks-override in v1 (#19503 ) Signed-off-by: Jon Swenson <jmswen@gmail.com>	2025-06-12 11:00:23 +00:00
Nick Hill	d5bdf899e4	[BugFix] Work-around incremental detokenization edge case error (#19449 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-12 06:43:20 +00:00
Ning Xie	2f1c19b245	[CI] change spell checker from codespell to typos (#18711 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-11 19:57:10 -07:00
bnellnm	29fa5cac1c	[Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-06-11 12:53:10 -04:00
Cyrus Leung	a2142f0196	Support non-string values in JSON keys from CLI (#19471 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-11 09:34:04 +00:00
leopardracer	7c644ab6d5	Fix Typo in Documentation and Function Name (#19442 )	2025-06-10 22:44:11 -07:00
Michael Goin	1e473b3010	[CI] Disable failing GGUF model test (#19454 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-11 05:12:38 +00:00
Lu Fang	2b1e2111b0	Fix test_max_model_len in tests/entrypoints/llm/test_generate.py (#19451 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-06-11 12:54:59 +08:00
wang.yuqi	3952731e8f	[New Model]: Support Qwen3 Embedding & Reranker (#19260 )	2025-06-10 20:07:30 -07:00
Richard Zou	77f0d465d0	[BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-11 07:54:41 +08:00
Isotr0py	5f1ac1e1d1	Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404 )	2025-06-10 01:30:20 -07:00
Nick Hill	646d62f636	[Core] Use tuple for kv cache group block ids (#19175 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-10 07:01:17 +02:00
Siyuan Liu	7d44c469fe	[TPU]Fix KV cache sharing tests (#19371 )	2025-06-09 18:38:15 -04:00
liusiqian-tal	31f58be96a	[Frontend] Make TIMEOUT_KEEP_ALIVE configurable through env var (#18472 ) Signed-off-by: liusiqian <liusiqian@tal.com>	2025-06-09 21:41:21 +00:00
22quinn	c1c7dbbeeb	[Bugfix][Core] Prevent token lengths exceeding `max_model_len` in V0 (#19348 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-09 23:01:29 +08:00
Varun Sundar Rabindranath	5cf2daea9a	[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-09 10:50:39 -04:00
Isotr0py	b8089195b4	[v1] Add fp32 support to v1 engine through flex attn (#19319 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-06-09 22:10:44 +08:00
Jee Jee Li	95a6568b5c	[CI/Build] Fix LoRA test (#19350 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-09 09:52:10 +00:00
Richard Zou	3a4d417707	[Misc] Cleanup compilation tests (#19343 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-09 15:05:44 +08:00
Dipika Sikka	c123bc33f9	[Quantization] Add compressed-tensors NVFP4 support (#18312 )	2025-06-08 09:05:55 -04:00
Richard Zou	3d64d366e0	[Misc] Change tests/compile to use VLLM_V1 by default (#19302 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-08 16:06:48 +08:00
Richard Zou	eaa2e51088	[Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-06-08 08:56:12 +08:00
Luka Govedič	2d8476e465	[BugFix][V1] Fix memory profiling bug (#18974 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-06-07 10:34:51 -07:00
Isotr0py	d2f0e7e615	[CI/Build] Improve Llama GGUF test robustness (#19287 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-07 17:23:28 +08:00
Driss Guessous	cf02f9b283	Add FlexAttention to V1 (#16078 ) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-06-06 21:58:55 -07:00
ElizaWszola	84166fee97	[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-06 18:26:11 -07:00
Lu Fang	6e0cd10f72	[Easy][Test] Simplify test_function_tool_use with multiple parametrizes (#19269 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-06-07 09:19:09 +08:00
Nick Hill	46ecc57973	[BugFix] Fix tpu_model_runner block_id concatenation (#19228 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-06 16:28:17 -07:00
Adolfo Victoria	ca27f0f9c1	[Bugfix][Core] Update cancellation logic in `generate()` to handle Generator exits (#19225 ) Co-authored-by: Adolfo Victoria <adovi@meta.com>	2025-06-06 20:17:54 +00:00
Nick Hill	aad30bd306	[BugFix] Fix MultiConnector test after HMA changes (#19291 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-06 20:16:24 +00:00
jmswen	7353492a47	[Core] Raise when non-multi-instance DP clients target a DP rank (#19227 ) Signed-off-by: Jon Swenson <jmswen@gmail.com>	2025-06-06 19:03:01 +08:00
Siqi Yan	f168b85725	Unit Test for run_dp_sharded_vision_model (#19103 ) Signed-off-by: Siqi Yan <siqi@meta.com> Co-authored-by: Siqi Yan <siqi@meta.com>	2025-06-06 16:24:02 +08:00
Richard Zou	da511d54d8	Fix CompilationConfig repr (#19091 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-06 16:23:35 +08:00
Dipika Sikka	94870359cd	[Quantization] Bump compressed-tensors version; update NVFP4A16 test model (#19224 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>	2025-06-06 01:21:54 -07:00
Chengji Yao	b61dc5f972	[TPU] update torch_xla pin (#19231 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-06-06 04:27:38 +00:00
Chen Zhang	f8a1a2d108	[v1] Hybrid Memory Allocator (#17996 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-05 20:47:09 -07:00
Benjamin Chislett	3465b87ef8	[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-06-05 19:10:08 -07:00
Jerry Zhang	c8134bea15	Fix AOPerModuleConfig name changes (#18869 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-06-05 18:51:32 -07:00
Luis Vega	cb6d572e85	[Model] NemotronH support (#18863 ) Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com> Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>	2025-06-05 21:29:28 +00:00
Dipika Sikka	aa49f14832	[Quantization] Skip Fp4 Test for `compressed-tensors` (#19217 )	2025-06-05 18:21:53 +00:00
Povilas Kanapickas	85e2b7bb13	[MISC][Bugfix] Use less CPU when message queue has been empty for some time (#16226 ) Signed-off-by: Povilas Kanapickas <povilas@radix.lt>	2025-06-05 16:53:08 +00:00
Chiyue Wei	61059bee40	[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110 ) Signed-off-by: Chiyue Wei <chiyuew@nvidia.com> Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>	2025-06-05 09:48:26 -07:00
Guillaume Calmettes	9bc8bb07cf	[Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided (#19202 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-06-05 12:59:28 +00:00
Chauncey	8fc57501d3	[Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled (#19135 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-06-05 06:24:24 +00:00
Robert Shaw	c56ed8bb0e	[Bugfix][Nixl] Fix full prefix cache hit bug (#18632 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-06-05 02:07:32 +00:00
Nicolò Lucchesi	b2fac67130	[P/D] Heterogeneous TP (#18833 ) Signed-off-by: nicklucche <nlucches@redhat.com>	2025-06-04 23:25:34 +00:00
Varun Sundar Rabindranath	c3fd4d669a	[Kernel] Integrate batched/masked deepgemm kernel (#19111 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-04 21:59:18 +00:00
Siyuan Liu	7ee2590478	[TPU] Update dynamo dump file name in compilation test (#19108 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-04 16:13:43 -04:00
jmswen	c8dcc15921	Allow AsyncLLMEngine.generate to target a specific DP rank (#19102 ) Signed-off-by: Jon Swenson <jmswen@gmail.com>	2025-06-04 08:26:47 -07:00
Cyrus Leung	01dc9a76db	[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-04 04:49:20 -07:00
wang.yuqi	35cf32df30	Improve the output precision of embedding models (#19092 )	2025-06-04 11:48:57 +00:00
Seiji Eicher	2669a0d7b5	Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-06-04 17:10:45 +08:00
Siyuan Liu	8e972d9c44	[TPU] Skip hanging tests (#19115 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-04 01:43:00 -07:00
Woosuk Kwon	b124e1085b	[Bugfix] Fix FA3 full cuda graph correctness (#19106 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-03 23:10:15 -07:00
Kaixi Hou	41aa578428	[NVIDIA] Add Cutlass MLA backend (#17625 )	2025-06-03 21:40:26 -07:00
Vadim Gimpelson	5d6d1adf15	[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437 )	2025-06-03 21:13:01 -07:00
Li, Jiang	4555143ea7	[CPU] V1 support for the CPU backend (#16441 )	2025-06-03 18:43:01 -07:00
Yan Ru Pei	b712be98c7	feat: add data parallel rank to KVEventBatch (#18925 )	2025-06-03 17:14:20 -07:00
Chen Zhang	a8da78eac9	[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-04 00:14:06 +00:00
Chauncey	4de790fcad	[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (#19075 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-06-03 23:27:24 +00:00
Chen Zhang	b5fd9506c1	[Bugfix] get_num_blocks_to_allocate with null_block (#19031 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 15:30:55 -07:00
Chen Zhang	6cac54f4d1	[v1] Re-init input batch for multiple kv cache groups (#18654 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 21:41:36 +00:00
Harry Mellor	6865fe0074	Fix interaction between `Optional` and `Annotated` in CLI typing (#19093 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yikun Jiang <yikun@apache.org>	2025-06-03 21:07:19 +00:00
Yong Hoon Shin	bdf13965ab	[V1] Support cross-layer KV sharing (#18212 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-06-03 20:33:07 +00:00
Varun Sundar Rabindranath	fa98d77773	[Kernel] DeepEP dispatch-combine kernel integration (#18434 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-03 12:30:02 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Harry Mellor	476844d44c	Fix underscores in dict keys passed via CLI (#19030 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-06-03 14:39:24 +00:00
Jee Jee Li	4e68ae5e59	[CI/Build] Remove V0 LoRA test (#19066 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-03 14:30:18 +00:00
Chen Zhang	f32fcd9444	[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 08:01:48 +00:00
汪志鹏	1282bd812e	Add tarsier model support (#18985 ) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-06-03 13:13:13 +08:00
Rui Qiao	bdce64f236	[V1] Support DP with Ray (#18779 )	2025-06-02 21:15:13 -07:00
Siyuan Liu	9112b443a0	[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-06-03 00:06:20 +00:00
22quinn	9760fd8f6a	[Core] Support inplace model weights loading (#18745 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-02 17:38:50 +08:00
Cyrus Leung	6aa8f9a4e7	[Core] Rework dtype resolution (#18751 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-01 11:04:23 +08:00
Reid	20079c6e36	[Misc] add return token strs for tokenize (#18941 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-31 18:00:11 +00:00
Charlie Fu	306d60401d	[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-31 07:40:05 -07:00
vllmellm	0f5e0d567e	[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-05-31 03:39:31 -07:00
Pooya Davoodi	dff80b0e42	[Frontend] Add rerank support to run_batch endpoint (#16278 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2025-05-31 07:40:01 +00:00
Will Eaton	1dab4d5718	Tool parser regex timeout handling (#18960 ) Signed-off-by: Will Eaton <weaton@redhat.com>	2025-05-30 21:02:54 +00:00
Isotr0py	5a8641638a	[VLM] Add PP support and fix GPTQ inference for Ovis models (#18958 ) Signed-off-by: isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-30 17:11:44 +00:00
Nick Hill	2dbe8c0774	[Perf] API-server scaleout with many-to-many server-engine comms (#17546 )	2025-05-30 08:17:00 -07:00
Shawn Huang	e1fadf1197	[Feature] minicpm eagle support (#18943 ) Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com> Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>	2025-05-30 06:45:56 -07:00
Carol Zheng	fba02e3bd1	[Bugfix][TPU] Fix tpu model runner testcase failure (#18810 ) Signed-off-by: Carol Zheng <cazheng@google.com>	2025-05-30 18:04:03 +08:00
Cyrus Leung	1aa2f81b43	[Misc] Update type annotation for rotary embedding `base` (#18914 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-30 10:17:01 +08:00
Chengji Yao	a1cc9f33a3	[TPU] remove transpose ops in moe kernel (#18923 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-05-29 23:00:11 +00:00
Nick Hill	d1d61f3351	[BugFix] Make DP work with connector-delayed new requests (#18559 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Will Eaton <weaton@redhat.com>	2025-05-29 18:04:18 +00:00
Nicolò Lucchesi	32ce3cf7c9	[V1] Allocate kv_cache with stride order for V1 (#18775 ) Signed-off-by: nicklucche <nlucches@redhat.com>	2025-05-29 17:54:16 +00:00
Cyrus Leung	c29034037d	[Deprecation] Disallow pos-args other than `model` when initializing `LLM` (#18802 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-29 09:36:58 -07:00
Isotr0py	c9479b2920	[Bugfix] Fix the failing gte embedding test (#18720 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-29 07:39:25 -07:00
Satyajith Chilappagari	972eddf7c9	[Neuron] Add multi-LoRA support for Neuron. (#18284 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-29 16:41:22 +08:00
Richard Zou	26b4fa45be	Add ability to use CUDAGraphs with use_inductor=False (#17345 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-29 10:16:52 +08:00
Hongxia Yang	269d901734	[Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-29 07:21:46 +08:00
Akshat Tripathi	643622ba46	[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend (#15655 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Chengji Yao <chengjiyao@google.com> Signed-off-by: xihajun <junfan@krai.ai> Signed-off-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Signed-off-by: Jorge de Freitas <jorge@krai.ai> Co-authored-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: xihajun <junfan@krai.ai> Co-authored-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Co-authored-by: Jorge de Freitas <jorge@krai.ai>	2025-05-28 19:59:09 +00:00
Mark McLoughlin	0e98964e94	[V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-28 18:54:12 +00:00
Alex Brooks	321331b8ae	[Core] Add Lora Support to Beam Search (#18346 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-05-28 08:58:24 -07:00
Reid	435fa95444	[Frontend] add run batch to CLI (#18804 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-28 07:08:57 -07:00
Harry Mellor	4c2b38ce9e	Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-28 12:46:04 +00:00
wang.yuqi	de65fc8e1e	[CI] improve embed testing (#18747 )	2025-05-28 00:16:35 -07:00
Rabi Mishra	b78f844a67	[Bugfix][FailingTest]Fix test_model_load_with_params.py (#18758 ) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-28 05:42:54 +00:00
wang.yuqi	3e9ce609bd	[Bugfix] Fix nomic max_model_len (#18755 )	2025-05-27 20:29:53 -07:00
Satyajith Chilappagari	e0cbad4e30	[Neuron] Support quantization on neuron (#18283 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-27 22:10:33 +00:00
Michael Goin	5873877241	[Bugfix] Mistral tool calling when content is list (#18729 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-27 09:05:37 -07:00
Mark McLoughlin	06a0338015	[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-27 09:37:06 +00:00
Isotr0py	1f1b1bc03b	[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-27 04:40:28 +00:00
Cyrus Leung	82e2339b06	[Doc] Move examples and further reorganize user guide (#18666 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 07:38:04 -07:00
Ning Xie	5a2c76cbe1	[CI] fix dump_input for str type (#18697 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-26 18:23:35 +08:00
Cyrus Leung	38b13dfe78	[CI/Build] Replace `math.isclose` with `pytest.approx` (#18703 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 02:05:17 -07:00
Ning Xie	4ea62c0ea0	[CI] add missing argument (#18694 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-26 00:22:04 -07:00
Cyrus Leung	fba0642704	[CI/Build][Doc] Update `gte-Qwen2-1.5B-instruct` usage (#18683 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-05-25 20:27:50 -07:00
Cyrus Leung	57fd13a707	[Bugfix] Fix profiling dummy data for Pixtral (#18677 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-25 14:05:30 +00:00
Michael Goin	63934543a0	Speed up the `kernels/quantization/` tests (#18669 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-25 05:02:59 +00:00
Isotr0py	75f81750f3	[VLM] Initialize video input support for InternVL models (#18499 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-25 04:51:25 +00:00
Mengqing Cao	6ab681bcbe	[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (#18655 ) Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-05-25 04:51:21 +00:00
qizixi	c1e4a4052d	[V1][Spec Decode] Support multi-layer eagle draft model (#18030 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-24 09:45:34 +00:00
Yuanhao WU	a859320575	[Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) (#18647 )	2025-05-24 09:15:36 +00:00
qizixi	d55e446d13	[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (#18424 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-24 06:51:22 +00:00
Robert Shaw	2b10ba7491	[Bugfix][Nixl] Fix Preemption Bug (#18631 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-05-23 23:30:16 +00:00
Feng XiaoLong	4fc1bf813a	[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454 ) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>	2025-05-23 16:16:26 -07:00
Michael Goin	0ddf88e16e	[CI] Enable test_initialization to run on V1 (#16736 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-23 15:09:44 -07:00
Chen Zhang	6550114c9c	[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945 )" (#18593 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-23 09:39:47 -07:00
Ning Xie	cd821ea5d2	[CI] fix kv_cache_type argument (#18594 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-23 04:49:18 -07:00
Chauncey	b046cf792d	[Feature][V1]: suupports cached_tokens in response usage (#18149 ) Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-05-23 01:41:03 -07:00
cascade	71ea614d4a	[Feature]Add async tensor parallelism using compilation pass (#17882 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-23 01:03:34 -07:00
aws-elaineyz	ed5d408255	[Neuron] Remove bypass on EAGLEConfig and add a test (#18514 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-22 21:26:32 -07:00
lkchen	e44d8ce8c7	[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-05-23 02:54:42 +00:00
Harry Mellor	4b0da7b60e	Enable hybrid attention models for Transformers backend (#18494 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 10:12:08 +08:00
Mark McLoughlin	c6b636f9fb	[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-23 02:05:44 +00:00
Chenheli Hua	04eb88dc80	Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-05-23 01:59:18 +00:00
rasmith	46791e1b4b	[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-05-22 18:45:35 -07:00
Sanger Steel	c32e249a23	[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926 ) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-05-22 18:44:18 -07:00
Kai Wu	c91fe7b1b9	[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917 ) Signed-off-by: Kai Wu <kaiwu@meta.com>	2025-05-22 16:44:08 -07:00
Tyler Michael Smith	6e588da0f4	[Build/CI] Fix CUDA 11.8 build (#17679 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-22 12:13:54 -07:00
David Xia	1f3a1200e4	[Bugfix] make `test_openai_schema.py` pass (#18224 ) Signed-off-by: David Xia <david@davidxia.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-22 18:34:06 +00:00
Harry Mellor	ca86a7cf6e	[CI/Build] Update bamba test model location (#18544 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-22 06:01:07 -07:00

1 2 3 4 5 ...

2233 Commits