vLLM/vllm

Commit Graph

Author	SHA1	Message	Date
Michael Goin	6d18ed2a2e	Update docker docs with ARM CUDA cross-compile (#19037) Signed-off-by: mgoin <michael@neuralmagic.com>	2025-06-03 08:21:53 +00:00
Chen Zhang	f32fcd9444	[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 08:01:48 +00:00
Lu Fang	d32aa2e670	[Bugfix] Use cmake 3.26.1 instead of 3.26 to avoid build failure (#19019) Signed-off-by: Lu Fang <lufang@fb.com>	2025-06-03 00:16:17 -07:00
Michael Goin	cc977286e7	Reduce logs in CLI scripts and plugin loader (#18970) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-03 06:00:45 +00:00
Reid	17430e3653	[bugfix] small fix logic issue (#18999) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-03 05:35:12 +00:00
汪志鹏	1282bd812e	Add tarsier model support (#18985) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-06-03 13:13:13 +08:00
Rui Qiao	bdce64f236	[V1] Support DP with Ray (#18779)	2025-06-02 21:15:13 -07:00
Gregory Shtrasberg	9e6f61e8c3	[ROCm][Build] Clean up the ROCm build (#19040) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-06-02 20:47:47 -07:00
Li, Jiang	8655f47f37	[CPU][CI] Re-enable the CPU CI tests (#19046) Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-06-02 20:46:47 -07:00
Concurrensee	4ce42f9204	Adding "LoRA Test %N" to AMD production tests (#18929) Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>	2025-06-02 20:46:44 -07:00
Tyler Michael Smith	8a57872b2a	[Bugfix][EP+DP] Use pplx-kernel internode instead of intranode (#19034) Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-03 11:36:51 +08:00
Hyogeun Oh (오효근)	5bc1ad6cee	[Doc] Remove duplicate TOCs during MkDocs migration (#19021) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-06-02 19:49:48 -07:00
Siyuan Liu	9112b443a0	[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-06-03 00:06:20 +00:00
Calvin Chen	c57d577e8d	add an absolute path for run.sh (#18258) Signed-off-by: calvin chen <120380290@qq.com>	2025-06-02 19:38:23 +00:00
Gregory Shtrasberg	ca2f6b9c30	[Bugfix][Model] Attempt to fix eagle in V0. (#18978) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-06-02 08:15:53 -07:00
Frαnçois	20133cfee2	[Frontend] enable custom logging for the uvicorn server (OpenAI API server) (#18403) Signed-off-by: François Paupier <francois.paupier@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-06-02 15:04:23 +00:00
jennyyyyzhen	ebb1ec9318	[Model] enable data parallel for Llama4 vision encoder (#18368) Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com> Co-authored-by: yZhen <yZhen@fb.com> Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>	2025-06-02 19:22:54 +08:00
Reid	5b168b6d7a	[doc] add pytest tips (#19010) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-02 11:07:26 +00:00
22quinn	9760fd8f6a	[Core] Support inplace model weights loading (#18745) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-02 17:38:50 +08:00
Robert Shaw	b9f61e1387	[Bugfix][Nixl] Fix DP Metadata Handshake (#19008) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-06-02 03:30:41 +00:00
zhrrr	d6fd3a33b8	[Misc] reuse num_tokens_across_dp of get_dp_padding to avoid unnecessary dp all reduce in set_forward_context (#18935) Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-06-01 19:41:18 +00:00
Reid	432ec9926e	[doc] wrong output (#19000) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-01 11:26:14 +00:00
Nick Hill	2b102d51ad	[BugFix] Fix incorrect metrics shutdown error log message (#18992) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-01 11:42:23 +08:00
rongfu.leng	aa54a7bf7b	[BugFix] fix data parallel construct ipv6 url addres (#18991) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-06-01 11:42:10 +08:00
Michael Goin	2ad6194a02	Let max_num_batched_tokens use human_readable_int for large numbers (#18968) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-01 11:41:29 +08:00
Reid	c594cbf565	[doc] small fix - mkdocs (#18996) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-31 20:23:43 -07:00
Isotr0py	a35ca765a5	[LoRA] Support dynamically initialize `packed_modules_mapping` for VLM with arbitrary components (#18987) Signed-off-by: isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-01 11:06:57 +08:00
Cyrus Leung	6aa8f9a4e7	[Core] Rework dtype resolution (#18751) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-01 11:04:23 +08:00
Benjamin Chislett	1bc86a3da1	[Bugfix] Fix EAGLE3 broken logits (#18909) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-05-31 19:58:07 -07:00
Ekagra Ranjan	bbfa0c61d1	[Misc][Benchmark] Add support for CustomDataset (#18511)	2025-05-31 19:07:38 +00:00
Reid	20079c6e36	[Misc] add return token strs for tokenize (#18941) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-31 18:00:11 +00:00
Nick Hill	9a1b9b99d7	[BugFix] Fix multi-node offline data-parallel (#18981) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-05-31 08:34:52 -07:00
ptarasiewiczNV	8bf507d766	[P/D] NixlConnector use cache device index for memory registration (#18969) Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>	2025-05-31 11:19:18 -04:00
Charlie Fu	306d60401d	[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-31 07:40:05 -07:00
Fred Reiss	f2c3f66d59	[Bugfix] Fix for issue 17396 (#18773) Signed-off-by: Fred Reiss <frreiss@us.ibm.com>	2025-05-31 11:58:17 +00:00
vllmellm	0f5e0d567e	[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-05-31 03:39:31 -07:00
Luka Govedič	c55d804672	[BugFix] Pydantic part 2 (#18911) Signed-off-by: luka <luka@neuralmagic.com>	2025-05-31 03:39:28 -07:00
Reid	749f5bdd38	[doc] fix the list rendering issue - security.md (#18982) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-31 10:39:21 +00:00
Satyajith Chilappagari	2a50ef5760	[Neuron] Add Multi-Modal model support for Neuron (#18921) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com> Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com> Co-authored-by: FeliciaLuo <luof@amazon.com> Co-authored-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-31 10:39:11 +00:00
Lucia Fang	b8b904795d	fix security issue of logging llm output (#18980) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>	2025-05-31 10:38:56 +00:00
Chauncey	ba5111f237	[Bugfix]: Fix the incompatibility issue with Structured Outputs when Thinking is disabled (#18879) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-31 09:20:54 +00:00
Yong Hoon Shin	1e123529d7	[Misc] Fix estimated max model len msg (#18966) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-05-31 16:43:44 +08:00
Pooya Davoodi	dff80b0e42	[Frontend] Add rerank support to run_batch endpoint (#16278) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2025-05-31 07:40:01 +00:00
Yu Guo	7782464a17	create util function for batched arange (#18937)	2025-05-31 13:50:38 +08:00
Lukas Geiger	0f71e24034	[Docs] Correct multiprocessing design doc (#18964) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-05-31 01:30:15 +00:00
Will Eaton	1dab4d5718	Tool parser regex timeout handling (#18960) Signed-off-by: Will Eaton <weaton@redhat.com>	2025-05-30 21:02:54 +00:00
rongfu.leng	7f21e8052b	[Misc] add group_size is -1 in awq quantization (#18910) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-05-30 17:34:22 +00:00
Isotr0py	5a8641638a	[VLM] Add PP support and fix GPTQ inference for Ovis models (#18958) Signed-off-by: isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-30 17:11:44 +00:00
Michael Goin	f49239cb45	Benchmark script for fp8 vs bf16 gemm (#17126) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-30 10:56:11 -06:00
Nick Hill	2dbe8c0774	[Perf] API-server scaleout with many-to-many server-engine comms (#17546)	2025-05-30 08:17:00 -07:00

7204 Commits (All Branches)