vLLM/vllm - vllm - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
bnellnm	5e5baa91aa	[Kernels] Use empty for modular MoE workspaces (#19667 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-06-16 14:58:01 +00:00
Chauncey	836d4ce140	[Bugfix] fix missing 'finish_reason': null in streaming chat (#19662 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-06-16 14:10:39 +00:00
Ning Xie	c3fec47bb7	[MISC] bump huggingface_hub pkg to 0.33.0 (#19547 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-16 05:22:28 -07:00
Isotr0py	1173804dca	[Bugfix] Fix TP inference for Flex attention backend (#19657 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-16 11:21:37 +00:00
Shawn Tan	4d5424029b	[Feature]:Allow for Granite MoE Hybrid models with _only_ shared experts. (#19652 ) Signed-off-by: Shawn Tan <shawntan@ibm.com>	2025-06-16 11:14:18 +00:00
Navanit Dubey	3e7506975c	[DOC] Add reasoning capability to vLLM streamlit code (#19557 )	2025-06-16 07:09:12 -04:00
Nick Hill	ee35e96ac3	[BugFix] Don't catch BaseException when dumping execute_model errors (#19626 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-16 11:01:08 +00:00
Szymon Ożóg	dec66d253b	[Kernel] GGUF MMVQ kernel for multiple input vectors (#18754 ) Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>	2025-06-16 17:33:26 +08:00
Russell Bryant	8d120701fd	[Docs] Move multiproc doc to v1 dir (#19651 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-06-16 09:10:12 +00:00
wang.yuqi	f40f763f12	[CI] Add mteb testing for rerank models (#19344 )	2025-06-16 01:36:43 -07:00
Ning Xie	26bc46ef89	[MISC] typo fix (#19672 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-16 07:18:49 +00:00
Chengji Yao	a77aea59fd	[TPU] support attention head dim smaller than 128 (#19620 ) Signed-off-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-06-16 06:40:53 +00:00
Ye (Charlotte) Qi	b692e9cd07	[Misc] Fix skipped max-model-len validation when deriving max model length from tokenizer config (#19660 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-06-16 06:30:29 +00:00
Francesco Bertolotti	367871a469	[Misc][Frontend] passthrough `bad_words` (#19564 ) Signed-off-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai> Co-authored-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai> Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>	2025-06-16 05:05:13 +00:00
quanliu	92183b41f3	[Bugfix][Core] Prefix caching causes incorrect outputs due to outdated ComputedBlocksTracker (#18957 ) Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn> Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>	2025-06-15 21:56:37 -07:00
Lu Fang	c6703d1e0d	[MISC] Remove unused variableds in C++ (#19609 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-06-15 20:05:28 -07:00
Isotr0py	a5e7242d5f	[Misc] Remove duplicate multiproc method setting for CPU platform (#19649 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-16 02:26:58 +00:00
Richard Zou	91b2c17a55	[CI/Build] Fix torch nightly CI dependencies part 2 (#19589 )	2025-06-15 20:01:10 +08:00
Woosuk Kwon	055915e6ce	Enable prefix caching with full cuda graphs (#19617 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-15 01:05:05 -07:00
Wentao Ye	3d330c4c09	[Benchmark] Refactor benchmark script for fp8 & int8 (#19627 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-15 15:15:37 +08:00
22quinn	0b73736a0d	[Kernel] Raise verbose error and consolidate `num_heads/num_kv_heads` divisibility check (#19339 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-15 13:43:48 +08:00
Lu Fang	ee1531bc38	[Bugfix][2/n] Fix speculative decoding CI - Fix test_ngram_e2e_greedy_correctness (#19644 )	2025-06-14 21:15:41 -07:00
Ilya Markov	e13945f9dd	[Perf] Further tunings for SM100 FP8 CUTLASS kernel (#19566 )	2025-06-14 17:25:10 -07:00
maobaolong	08500011d3	[Fix] Convert kv_transfer_config from dict to KVTransferConfig (#19262 )	2025-06-14 12:32:07 -07:00
Konrad Zawora	861a0a0a39	[Bugfix] Don't attempt to use triton if no driver is active (#19561 )	2025-06-14 12:30:54 -07:00
Huy Do	bc956b38d0	Only build CUTLASS MoE kernels on Hopper (#19648 )	2025-06-14 11:44:15 -07:00
jiahanc	294fc1e2c9	[Hardware][NVIDIA][kernel] Fp4 MOE quant kernel optimization (#19500 )	2025-06-14 09:34:28 -07:00
Isotr0py	2db9044ab6	[Bugfix] Fix auto dtype casting for BatchFeature (#19316 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-06-14 15:13:08 +00:00
Reid	6fa718a460	[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-14 16:54:52 +08:00
Lu Fang	06be858828	[Bugfix] Fix the speculative decoding test by setting the target dtype (#19633 )	2025-06-13 20:57:32 -07:00
Saheli Bhattacharjee	d1e34cc9ac	[V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics. (#18354 ) Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>	2025-06-14 11:07:36 +08:00
Nick Hill	bd517eb9fe	[BugFix] Fix DP Coordinator incorrect debug log message (#19624 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-14 00:18:03 +00:00
Concurrensee	d65668b4e8	Adding "AMD: Multi-step Tests" to amdproduction. (#19508 ) Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-06-13 17:08:51 -07:00
Woosuk Kwon	aafbbd981f	[torch.compile] Use custom ops when use_inductor=False (#19618 )	2025-06-13 15:05:54 -07:00
Anna Pendleton	0f0874515a	[Doc] Add troubleshooting section to k8s deployment (#19377 ) Signed-off-by: Anna Pendleton <pendleton@google.com>	2025-06-13 21:47:51 +00:00
Luka Govedič	3597b06a4f	[CUDA] Enable full cudagraph for FlashMLA (#18581 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-06-13 18:12:26 +00:00
Reid	1015296b79	[doc][mkdocs] fix the duplicate Supported features sections in GPU docs (#19606 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-13 16:25:08 +00:00
Wentao Ye	ce9dc02c93	[Refactor] Remove unused variables in `moe_permute_unpermute_kernel.inl` (#19573 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-13 06:12:15 -07:00
qscqesze	a24cb91600	[Model] Fix minimax model cache & lm_head precision (#19592 ) Signed-off-by: qingjun <qingjun@minimaxi.com>	2025-06-13 12:08:20 +00:00
Nick Hill	7e8d97dd3f	[BugFix] Honor `enable_caching` in connector-delayed kvcache load case (#19435 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-13 09:46:32 +00:00
youkaichao	d70bc7c029	[torch.compile] reorganize the cache directory to support compiling multiple models (#19064 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-06-13 15:23:25 +08:00
Boyuan Feng	ce688ad46e	use base version for version comparison (#19587 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-06-13 15:09:34 +08:00
汪志鹏	cefdb9962d	[Fix] The zip function in Python 3.9 does not have the strict argument (#19549 ) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-06-13 14:57:48 +08:00
汪志鹏	ace5cdaff0	[Fix] bump mistral common to support magistral (#19533 ) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-06-12 22:28:12 -07:00
Li, Jiang	6458721108	[CPU] Refine default config for the CPU backend (#19539 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-06-13 13:27:39 +08:00
Hyogeun Oh (오효근)	bb4a0decef	[Misc] Correct broken docs link (#19553 ) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-06-12 22:27:13 -07:00
Reid	c707cfc12e	[doc] fix incorrect link (#19586 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-13 04:26:09 +00:00
Aaron Pham	7b3c9ff91d	[Doc] uses absolute links for structured outputs (#19582 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-06-13 03:35:17 +00:00
qizixi	c68698b326	[Bugfix] Fix EAGLE vocab embedding for multimodal target model (#19570 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-06-12 23:09:19 -04:00
Varun Sundar Rabindranath	e3b12667d4	[BugFix] : Fix Batched DeepGemm Experts (#19515 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-12 20:43:02 -06:00

7204 Commits, All Branches