vLLM/vllm - vllm

Commit Graph

Author	SHA1	Message	Date
Wang, Yi	202c5df935	[Benchmark] fix request loss if "ping" is returned (#19535 ) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-06-22 07:21:04 +00:00
Brayden Zhong	5aa4a015ce	[Benchmark] Fix `Value of type "SampleRequest" is not indexable` (#18032 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-06-19 21:28:55 -07:00
Robert Shaw	10d82f9ac5	[Benchmark][Bugfix] Fix Dataset Length Calculation (#19868 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2025-06-19 18:30:41 -07:00
afeldman-nm	dfada85eee	[Frontend] Expose custom args in OpenAI APIs (#16862 ) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-06-18 17:41:11 -07:00
Wentao Ye	ffb2cd6b54	[Perf] Optimize `moe_align_block_size` CUDA kernel (#19572 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-06-17 11:49:26 -07:00
Wentao Ye	3d330c4c09	[Benchmark] Refactor benchmark script for fp8 & int8 (#19627 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-15 15:15:37 +08:00
Reid	6fa718a460	[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-14 16:54:52 +08:00
Wentao Ye	b6efafd9e4	[Perf] Vectorize static / dynamic INT8 quant kernels (#19233 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-12 06:51:41 -07:00
Tianyu Guo	4589b94032	[Bugfix] Fix benchmark_moe.py (#19016 ) Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>	2025-06-09 18:04:36 -07:00
Lifans	4e4f63ad45	[Nit][Benchmark]Fix example in benchmark_serving_structured_output.py (#19311 ) Signed-off-by: Lifan Shen <lifans@meta.com>	2025-06-07 18:25:38 +08:00
ElizaWszola	84166fee97	[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-06 18:26:11 -07:00
Chenyaaang	441b65d8c7	[Misc][Tools][Benchmark] Fix and improve auto tune script (#19163 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-06-06 23:31:19 +00:00
Benjamin Chislett	3465b87ef8	[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-06-05 19:10:08 -07:00
Chiyue Wei	61059bee40	[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110 ) Signed-off-by: Chiyue Wei <chiyuew@nvidia.com> Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>	2025-06-05 09:48:26 -07:00
Huy Do	0678b52251	Handle non-serializable objects when dumping benchmark results (#19114 )	2025-06-04 22:40:04 -07:00
Ekagra Ranjan	135cf55cd1	[V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix (#18971 )	2025-06-03 15:26:33 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Ekagra Ranjan	bbfa0c61d1	[Misc][Benchmark] Add support for CustomDataset (#18511 )	2025-05-31 19:07:38 +00:00
Michael Goin	f49239cb45	Benchmark script for fp8 vs bf16 gemm (#17126 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-30 10:56:11 -06:00
Rabi Mishra	6acb7a6285	[Misc]Fix benchmarks/README.md for speculative decoding (#18897 ) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-30 07:58:04 +00:00
Cyrus Leung	1aa2f81b43	[Misc] Update type annotation for rotary embedding `base` (#18914 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-30 10:17:01 +08:00
Duyi-Wang	b169d5f7b6	[Misc][Tools][Benchmark] Add benchmark_serving supports for llama.cpp. (#18692 ) Signed-off-by: Duyi-Wang <duyi.wang@intel.com>	2025-05-29 20:02:08 +08:00
Divakar Verma	774c5fde30	[V1] fix torch profiling for V1 offline scenarios (#18445 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-05-28 04:16:30 +00:00
cascade	aaa4ac1c95	Disable prefix cache by default for benchmark (#18639 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-27 20:06:34 +08:00
Calvin Chen	4693a3438c	[Doc] cleanup deprecated flag for doc (#18715 ) Signed-off-by: calvin chen <120380290@qq.com>	2025-05-27 07:12:02 +00:00
Cyrus Leung	82e2339b06	[Doc] Move examples and further reorganize user guide (#18666 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 07:38:04 -07:00
Feng XiaoLong	4fc1bf813a	[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454 ) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>	2025-05-23 16:16:26 -07:00
Teruaki Ishizaki	4be2255c81	[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (#17291 ) Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>	2025-05-23 12:30:47 +08:00
Chenheli Hua	04eb88dc80	Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-05-23 01:59:18 +00:00
Hosang	dd5fa7e04f	[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004 ) Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>	2025-05-21 08:35:00 -07:00
Lain	e23564cb70	use ceil_div in cutlass block scaling shape check (#17918 )	2025-05-16 03:02:58 -07:00
Harry Mellor	009d9e7590	Convert `benchmarks` to `ruff format` (#18068 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 13:43:29 +00:00
Russell Bryant	23b3134eb5	[Benchmarks] Refactor run_structured_output_benchmarks.sh (#17722 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-13 01:47:29 -07:00
Brayden Zhong	891b9d33de	[Fix] Benchmark `"EngineClient" has no attribute "model_config"` (#17976 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 22:55:53 -07:00
Pavani Majety	0c0fdae84f	[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362 )	2025-05-09 16:24:41 -07:00
xsank	0a9bbaa104	[Misc] support model prefix & add deepseek vl2 tiny fused moe config (#17763 ) Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com> Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>	2025-05-08 07:50:22 +00:00
d.transposed	d456aea71f	[Misc] Add Next Edit Prediction (NEP) datasets support in `benchmark_serving.py` (#16839 ) Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Signed-off-by: dtransposed <> Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>	2025-05-06 15:38:45 -04:00
Mengqing Cao	f9bc5a0693	[Bugfix] Fix triton import with local TritonPlaceholder (#17446 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-06 17:53:09 +08:00
Mikhail Podvitskii	dc47ba32f8	[Bugfix] Fixed prompt length for random dataset (#17408 ) Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>	2025-05-06 07:00:08 +00:00
Russell Bryant	d3efde8176	[Benchmarks] Remove invalid option under V1 engine (#17651 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-05 16:30:22 -04:00
Xiaodong Wang	9352cdb56d	[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263 ) Signed-off-by: Lu Fang <lufang@fb.com> Co-authored-by: Lu Fang <lufang@fb.com>	2025-05-02 19:44:19 +00:00
Caleb_Du	3e887d2e0c	permute/unpermute kernel for moe optimization (#14568 ) Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>	2025-05-02 11:31:55 -07:00
Chenyaaang	9b70e2b4c1	[Misc][Tools][Benchmark] Publish script to auto tune server parameters (#17207 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-01 19:53:03 +00:00
Teruaki Ishizaki	86a1f67a3b	[Bugfix][Benchmarks] Allow benchmark of deepspeed-mii backend to select a model (#17285 ) Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>	2025-05-01 11:54:51 +00:00
Benjamin Chislett	34120f5acd	[V1][Feature] Enable Speculative Decoding with Structured Outputs (#14702 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-04-30 00:02:10 +00:00
Ekagra Ranjan	cfe4532093	[Benchmark] Add single turn MTBench to Serving Bench (#17202 )	2025-04-28 16:46:15 -07:00
Michael Goin	8fc88d63f1	[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-28 15:20:24 -07:00
Cyrus Leung	93a126fbc7	[Misc] Make cached tokenizer pickle-compatible (#17048 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-27 13:05:00 +08:00
Harry Mellor	423e9f1cbe	Use Transformers helper `get_text_config()` instead of checking for `text_config` (#17105 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:47:35 -07:00
Lucas Wilkinson	881f735827	[Misc] Benchmark Serving Script Support Appending Results (#17028 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-24 22:53:55 -07:00


347 Commits