vLLM/vllm

Commit Graph

Author	SHA1	Message	Date
Richard Zou	84ec470fca	Improve "failed to get the hash of the compiled graph" error (#18956) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-30 15:00:54 +00:00
Russell Bryant	b29ca5c4d5	[Docs] Update SECURITY.md with link to our security guide (#18961) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-30 07:37:27 -07:00
Reid	ec6833c5e9	[doc] show the count for fork and watch (#18950) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-30 06:45:59 -07:00
Shawn Huang	e1fadf1197	[Feature] minicpm eagle support (#18943) Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com> Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>	2025-05-30 06:45:56 -07:00
Daniele	43ff405b90	[CI/Build] remove regex from build dependencies (#18945) Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-30 04:02:50 -07:00
Carol Zheng	fba02e3bd1	[Bugfix][TPU] Fix tpu model runner testcase failure (#18810) Signed-off-by: Carol Zheng <cazheng@google.com>	2025-05-30 18:04:03 +08:00
Always-Naive	4577fc9abb	[Misc]Fix typo (#18947)	2025-05-30 02:21:35 -07:00
Rabi Mishra	5f1d0c8118	[Bugfix][Failing Test] Fix test_vllm_port.py (#18618) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-30 17:13:47 +08:00
Lukas Geiger	c3bb9f2331	[Model] Use in-place adds in SigLIP (#18922) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-05-30 17:12:59 +08:00
Reid	8f8900cee9	[doc] add mkdocs doc (#18930) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-30 07:58:44 +00:00
Rabi Mishra	6acb7a6285	[Misc]Fix benchmarks/README.md for speculative decoding (#18897) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-30 07:58:04 +00:00
Cyrus Leung	4f4a6b844a	[Deprecation] Remove mean pooling default for `Qwen2EmbeddingModel` (#18913) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-30 06:53:37 +00:00
Michael Goin	4d0a1541be	[Bugfix] Remove NVFP4 scales assertions to fix load_format=dummy (#18861) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-30 13:37:36 +08:00
vllmellm	77b6e74fe2	[ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. (#18938) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-05-29 22:33:17 -07:00
H	5acf828d99	[docs] fix: fix markdown syntax (#18927)	2025-05-30 05:20:48 +00:00
iLeGend	3987e2ae96	[Model] Use AutoWeightsLoader for mamba2 (#18918) Signed-off-by: iLeGend <824040212@qq.com>	2025-05-30 04:50:10 +00:00
Chauncey	77164dad5e	[Bugfix] Consistent ascii handling in tool parsers (#18883) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-30 04:44:43 +00:00
Wenhua Cheng	3de3eadf5b	improve the robustness of parsing vlms config in AutoRound (#18894) Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>	2025-05-29 19:24:47 -07:00
Carol Zheng	3132290a14	[TPU][CI/CD] Clean up docker for TPU tests. (#18926) Signed-off-by: Carol Zheng <cazheng@google.com>	2025-05-30 10:24:19 +08:00
Cyrus Leung	1aa2f81b43	[Misc] Update type annotation for rotary embedding `base` (#18914) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-30 10:17:01 +08:00
Michael Goin	d54af615d5	[Bugfix] Fix PP default fallback behavior for V1 (#18915) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-30 10:13:17 +08:00
Chengji Yao	a1cc9f33a3	[TPU] remove transpose ops in moe kernel (#18923) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-05-29 23:00:11 +00:00
Richard Zou	a521ef06e5	Use standalone_compile by default in torch >= 2.8.0 (#18846) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-30 06:41:58 +08:00
Will Eaton	64eaf5fe05	[P/D] NixlConnector DP fixes (#18903) Signed-off-by: Will Eaton <weaton@redhat.com>	2025-05-29 18:08:40 +00:00
Nick Hill	d1d61f3351	[BugFix] Make DP work with connector-delayed new requests (#18559) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Will Eaton <weaton@redhat.com>	2025-05-29 18:04:18 +00:00
Nicolò Lucchesi	32ce3cf7c9	[V1] Allocate kv_cache with stride order for V1 (#18775) Signed-off-by: nicklucche <nlucches@redhat.com>	2025-05-29 17:54:16 +00:00
CYJiang	d58f9c7f7a	[Misc] Remove duplicate init for self.vllm_config (#18896) Signed-off-by: googs1025 <googs1025@gmail.com>	2025-05-29 17:26:07 +00:00
Cyrus Leung	c29034037d	[Deprecation] Disallow pos-args other than `model` when initializing `LLM` (#18802) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-29 09:36:58 -07:00
Gregory Shtrasberg	1b7cfd5a36	[ROCm][V0][Attention] Revert to the previous FA triton kernel (#18226) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-29 12:13:18 -04:00
Gregory Shtrasberg	da4b69d0b4	[Attention][V1] Toggle for v1 attention backend (#18275) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-29 10:48:24 -04:00
Isotr0py	c9479b2920	[Bugfix] Fix the failing gte embedding test (#18720) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-29 07:39:25 -07:00
Hyogeun Oh (오효근)	6f2909405e	[Doc] Fix codeblocks formatting in LoRA adapters documentation (#18907) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-05-29 07:38:55 -07:00
Duyi-Wang	b169d5f7b6	[Misc][Tools][Benchmark] Add benchmark_serving supports for llama.cpp. (#18692) Signed-off-by: Duyi-Wang <duyi.wang@intel.com>	2025-05-29 20:02:08 +08:00
Chenyaaang	f8977c233f	Fix an error in dummy weight loading for quantization models (#18855) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-29 03:07:20 -07:00
Luka Govedič	f274581f44	[BugFix] Update pydantic to fix error on python 3.10 (#18852) Signed-off-by: luka <luka@neuralmagic.com>	2025-05-29 03:05:46 -07:00
Lukas Geiger	0b1447f890	[Bugfix] Ensure tensors are contiguous during serialisation (#18860) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-05-29 03:05:20 -07:00
Nicolò Lucchesi	24d0ef8970	[Misc] Replace TODO in serving transcription (#18895) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-05-29 02:58:14 -07:00
Jee Jee Li	7fcfd954ff	[Bugfix] Fix misleading information in the documentation (#18845) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-29 02:54:14 -07:00
Reid	e740d07f07	[doc] add CLI doc (#18871) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-29 09:51:36 +00:00
Michael Yao	a652e71dd0	[Doc] Remove redundant spaces from compatibility_matrix.md (#18891) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-05-29 02:51:20 -07:00
Jee Jee Li	34d6c447c4	[LoRA] Add LoRA support for InternVL (#18842) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-29 08:46:24 +00:00
Satyajith Chilappagari	972eddf7c9	[Neuron] Add multi-LoRA support for Neuron. (#18284) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-29 16:41:22 +08:00
Brent Salisbury	fd7bb88d72	Fixes a dead link in nightly benchmark readme (#18856) Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>	2025-05-29 04:41:39 +00:00
Yikun Jiang	3c49dbdd03	Skip device and quant Pydantic validation to make plugin device work (#18843) Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-05-28 20:12:30 -07:00
aws-elaineyz	1661a9c28f	[Doc][Neuron] Update documentation for Neuron (#18868) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-28 19:44:01 -07:00
Chengji Yao	8e882ffdc0	[Bugfix][TPU] fix moe custom kernel import (#18853) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-05-28 19:34:19 -07:00
Richard Zou	26b4fa45be	Add ability to use CUDAGraphs with use_inductor=False (#17345) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-29 10:16:52 +08:00
Maximilien de Bayser	515b413ebf	Prevent the cross-encoder logic from being applied to classification tasks (#18838) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-28 19:16:17 -07:00
Hongxia Yang	269d901734	[Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-29 07:21:46 +08:00
Varun Sundar Rabindranath	7951d78738	[Core] Enable CUDA graphs for DP + All2All kernels (#18724) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-05-28 22:55:30 +00:00

7204 Commits