vLLM/vllm - vllm - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Lain	e23564cb70	use ceil_div in cutlass block scaling shape check (#17918 )	2025-05-16 03:02:58 -07:00
Harry Mellor	009d9e7590	Convert `benchmarks` to `ruff format` (#18068 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 13:43:29 +00:00
Russell Bryant	23b3134eb5	[Benchmarks] Refactor run_structured_output_benchmarks.sh (#17722 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-13 01:47:29 -07:00
Brayden Zhong	891b9d33de	[Fix] Benchmark `"EngineClient" has no attribute "model_config"` (#17976 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 22:55:53 -07:00
Pavani Majety	0c0fdae84f	[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362 )	2025-05-09 16:24:41 -07:00
xsank	0a9bbaa104	[Misc] support model prefix & add deepseek vl2 tiny fused moe config (#17763 ) Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com> Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>	2025-05-08 07:50:22 +00:00
d.transposed	d456aea71f	[Misc] Add Next Edit Prediction (NEP) datasets support in `benchmark_serving.py` (#16839 ) Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Signed-off-by: dtransposed <> Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>	2025-05-06 15:38:45 -04:00
Mengqing Cao	f9bc5a0693	[Bugfix] Fix triton import with local TritonPlaceholder (#17446 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-06 17:53:09 +08:00
Mikhail Podvitskii	dc47ba32f8	[Bugfix] Fixed prompt length for random dataset (#17408 ) Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>	2025-05-06 07:00:08 +00:00
Russell Bryant	d3efde8176	[Benchmarks] Remove invalid option under V1 engine (#17651 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-05 16:30:22 -04:00
Xiaodong Wang	9352cdb56d	[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263 ) Signed-off-by: Lu Fang <lufang@fb.com> Co-authored-by: Lu Fang <lufang@fb.com>	2025-05-02 19:44:19 +00:00
Caleb_Du	3e887d2e0c	permute/unpermute kernel for moe optimization (#14568 ) Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>	2025-05-02 11:31:55 -07:00
Chenyaaang	9b70e2b4c1	[Misc][Tools][Benchmark] Publish script to auto tune server parameters (#17207 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-01 19:53:03 +00:00
Teruaki Ishizaki	86a1f67a3b	[Bugfix][Benchmarks] Allow benchmark of deepspeed-mii backend to select a model (#17285 ) Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>	2025-05-01 11:54:51 +00:00
Benjamin Chislett	34120f5acd	[V1][Feature] Enable Speculative Decoding with Structured Outputs (#14702 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-04-30 00:02:10 +00:00
Ekagra Ranjan	cfe4532093	[Benchmark] Add single turn MTBench to Serving Bench (#17202 )	2025-04-28 16:46:15 -07:00
Michael Goin	8fc88d63f1	[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-28 15:20:24 -07:00
Cyrus Leung	93a126fbc7	[Misc] Make cached tokenizer pickle-compatible (#17048 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-27 13:05:00 +08:00
Harry Mellor	423e9f1cbe	Use Transformers helper `get_text_config()` instead of checking for `text_config` (#17105 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:47:35 -07:00
Lucas Wilkinson	881f735827	[Misc] Benchmark Serving Script Support Appending Results (#17028 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-24 22:53:55 -07:00
Mengqing Cao	2f54045508	[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-04-24 22:51:02 -07:00
Reid	db2f8d915c	[V1] Update structured output (#16812 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-23 23:57:17 -07:00
Chenyaaang	83d933718c	[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-22 18:05:23 -06:00
Lei Wang	8d32dc603d	[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036 ) Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com> Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>	2025-04-22 09:01:36 +01:00
Kartik Ramesh	3b34fd5273	Raise error for data-parallel with benchmark_throughput (#16737 ) Signed-off-by: Kartik Ramesh <kartikx2000@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-04-21 23:51:43 +08:00
Nicolò Lucchesi	9d4ca19d50	[Misc] Benchmarks for audio models (#16505 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-19 02:24:14 -07:00
Jennifer Zhao	63d2705edb	[Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (#16556 )	2025-04-13 17:20:26 -07:00
Chenyaaang	d544d141ec	update benchmark_serving_structured_output to include auto backend (#16438 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-11 12:25:52 +08:00
Alexey Belyakov	3e397a9484	check input length of sonnet samples (#16423 ) Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>	2025-04-11 10:15:06 +08:00
WWW	268c325078	Fix range_ratio Bug in RandomDataset (#16126 ) Signed-off-by: jadewang21 <jadewangcn@outlook.com>	2025-04-10 15:31:17 -07:00
look	7cd0bd7212	[Bugfix] Fix output token length check logic (#16419 ) Signed-off-by: look <eeslook@163.com>	2025-04-10 20:16:48 +00:00
Chenyaaang	5fbab20e02	[Bugfix] Fix bug when dataset is json (#15899 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-10 18:35:41 +00:00
Chenyaaang	417bcefbae	fix sonnet dataset sample when prefix len is very small (#16379 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-10 05:35:07 +00:00
Michael Goin	b2ce859bd2	Fix `benchmark_throughput.py --backend=hf` (#16352 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-09 19:09:28 +00:00
yihong	04149cce27	[BugFix] fix some typos found by typos. (#16314 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-04-09 03:43:59 -07:00
Lu Fang	55dcce91df	Upstream Llama4 Support to Main (#16113 ) Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com> Signed-off-by: Chris Thi <chris.c.thi@gmail.com> Signed-off-by: drisspg <drisspguessous@gmail.com> Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Keyun Tong <tongkeyun@gmail.com> Signed-off-by: Lu Fang <fanglu@meta.com> Signed-off-by: Xiaodong Wang <xdwang@meta.com> Signed-off-by: Yang Chen <yangche@fb.com> Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Lu Fang <lufang@fb.com> Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 08:06:27 -07:00
Hyesoo Yang	ba10801961	[Benchmark] Add sampling parameters to benchmark_serving. (#16022 ) Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>	2025-04-06 12:30:35 +08:00
Ziji Shi (Steven)	95862f7b4d	[Benchmark][Doc] Update throughput benchmark and README (#15998 ) Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-04-04 09:39:02 -07:00
Ziji Shi (Steven)	06f21ce7a5	[Benchmark] Add AIMO Dataset to Benchmark (#15955 ) Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com> Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>	2025-04-03 06:09:18 +00:00
Brayden Zhong	252937806c	[Bugfix][Benchmarks] Ensure `async_request_deepspeed_mii` uses the OpenAI choices key (#15926 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-04-02 02:19:35 -07:00
Li Wang	aa557e6422	[Benchmark]Fix error message (#15866 ) Signed-off-by: wangli <wangli858794774@gmail.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-04-02 01:32:24 -07:00
bnellnm	e59ca942f5	Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-04-01 12:07:43 -04:00
Jennifer Zhao	effc5d24fa	[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup (#15748 ) Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>	2025-03-31 15:38:58 +08:00
Woosuk Kwon	70e132244a	[Minor] Remove TGI launching script (#15646 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-28 09:30:08 -07:00
Chen Xia	e7f720ea56	[Misc]add coding benchmark for speculative decoding (#15303 ) Signed-off-by: CXIAAAAA <cxia0209@gmail.com>	2025-03-28 10:47:05 +08:00
ElizaWszola	9239bf718e	[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972 ) Signed-off-by: ElizaWszola <eliza@neuralmagic.com> Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>	2025-03-27 00:54:44 +00:00
Tyler Michael Smith	23114d3364	[Misc] Warn about v0 in benchmark_paged_attn.py (#15495 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-25 20:31:04 -07:00
DefTruth	f90d34b498	[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-03-23 01:10:10 -07:00
Russell Bryant	1f16b7fe74	[Core][V0] Add guidance backend for structured output (#14589 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Loc Huynh <lohuynh@microsoft.com> Co-authored-by: Michal Moskal <michal@moskal.me> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-19 21:33:51 -07:00
Jennifer Zhao	b88be22165	[Benchmark] Allow oversample request in benchmark dataset (#15170 ) Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>	2025-03-20 12:32:58 +08:00

317 Commits