Robert Shaw
3c6fd286b4
updated
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-03 18:29:58 +00:00
bnellnm
c1909e7e8c
[Kernels] MoE refactor ( #19636 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
2025-07-02 06:08:27 -07:00
Kebe
b1c1fe35a5
[Misc] remove redundant char ( #20287 )
...
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-07-01 15:33:22 +08:00
czhu-cohere
9909726d2a
Enable ZP Support for Machete ( #20268 )
...
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
2025-07-01 07:12:20 +00:00
Reid
167aca45cb
[Misc] Use collapsible blocks for benchmark examples. ( #20017 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-26 03:35:16 -07:00
Ekagra Ranjan
9502c38138
[Benchmark][Bug] Fix multiple bugs in bench and add args to spec_decode offline ( #20083 )
2025-06-25 22:06:27 -07:00
Wentao Ye
879f69bed3
[Refactor] Remove duplicate `ceil_div` ( #20023 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-25 05:19:09 +00:00
Wentao Ye
a6c4b87fbc
Revert "[Feature] Integrate new deepgemm ( #19820 )" ( #20049 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-24 19:45:22 -07:00
Wentao Ye
c6e3bba8e6
[Feature] Integrate new deepgemm ( #19820 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-24 12:51:56 -07:00
d.transposed
c635c5f744
[Misc][Benchmarking] Add variable request-rate ("ramp-up") to the benchmarking client. ( #19423 )
...
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-06-24 18:41:49 +00:00
Reid
3014c920da
add some examples for other benchmark scripts ( #19893 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-24 05:57:46 +00:00
Chenyaaang
ee5ad8d2c5
[Misc][Tools][Benchmark] Add profile to autotune script ( #19711 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-06-24 00:59:41 +00:00
22quinn
4671ac6e2a
[Bugfix][Benchmark] Fix Marlin benchmark ( #19929 )
2025-06-24 07:25:12 +09:00
Wang, Yi
202c5df935
[Benchmark] fix request loss if "ping" is returned ( #19535 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-22 07:21:04 +00:00
Brayden Zhong
5aa4a015ce
[Benchmark] Fix `Value of type "SampleRequest" is not indexable` ( #18032 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-06-19 21:28:55 -07:00
Robert Shaw
10d82f9ac5
[Benchmark][Bugfix] Fix Dataset Length Calculation ( #19868 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-06-19 18:30:41 -07:00
afeldman-nm
dfada85eee
[Frontend] Expose custom args in OpenAI APIs ( #16862 )
...
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-06-18 17:41:11 -07:00
Wentao Ye
ffb2cd6b54
[Perf] Optimize `moe_align_block_size` CUDA kernel ( #19572 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-06-17 11:49:26 -07:00
Wentao Ye
3d330c4c09
[Benchmark] Refactor benchmark script for fp8 & int8 ( #19627 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-15 15:15:37 +08:00
Reid
6fa718a460
[Misc] Modularize CLI Argument Parsing in Benchmark Scripts ( #19593 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-14 16:54:52 +08:00
Wentao Ye
b6efafd9e4
[Perf] Vectorize static / dynamic INT8 quant kernels ( #19233 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-12 06:51:41 -07:00
Tianyu Guo
4589b94032
[Bugfix] Fix benchmark_moe.py ( #19016 )
...
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
2025-06-09 18:04:36 -07:00
Lifans
4e4f63ad45
[Nit][Benchmark]Fix example in benchmark_serving_structured_output.py ( #19311 )
...
Signed-off-by: Lifan Shen <lifans@meta.com>
2025-06-07 18:25:38 +08:00
ElizaWszola
84166fee97
[Kernel] Integrate CUTLASS MoE kernel with PPLX ( #18762 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-06-06 18:26:11 -07:00
Chenyaaang
441b65d8c7
[Misc][Tools][Benchmark] Fix and improve auto tune script ( #19163 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-06-06 23:31:19 +00:00
Benjamin Chislett
3465b87ef8
[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B ( #19033 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-06-05 19:10:08 -07:00
Chiyue Wei
61059bee40
[Hardware][NVIDIA] FP4 MoE kernel optimization ( #19110 )
...
Signed-off-by: Chiyue Wei <chiyuew@nvidia.com>
Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>
2025-06-05 09:48:26 -07:00
Huy Do
0678b52251
Handle non-serializable objects when dumping benchmark results ( #19114 )
2025-06-04 22:40:04 -07:00
Ekagra Ranjan
135cf55cd1
[V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix ( #18971 )
2025-06-03 15:26:33 -07:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText ( #19100 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Ekagra Ranjan
bbfa0c61d1
[Misc][Benchmark] Add support for CustomDataset ( #18511 )
2025-05-31 19:07:38 +00:00
Michael Goin
f49239cb45
Benchmark script for fp8 vs bf16 gemm ( #17126 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-30 10:56:11 -06:00
Rabi Mishra
6acb7a6285
[Misc]Fix benchmarks/README.md for speculative decoding ( #18897 )
...
Signed-off-by: rabi <ramishra@redhat.com>
2025-05-30 07:58:04 +00:00
Cyrus Leung
1aa2f81b43
[Misc] Update type annotation for rotary embedding `base` ( #18914 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-30 10:17:01 +08:00
Duyi-Wang
b169d5f7b6
[Misc][Tools][Benchmark] Add benchmark_serving supports for llama.cpp. ( #18692 )
...
Signed-off-by: Duyi-Wang <duyi.wang@intel.com>
2025-05-29 20:02:08 +08:00
Divakar Verma
774c5fde30
[V1] fix torch profiling for V1 offline scenarios ( #18445 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-05-28 04:16:30 +00:00
cascade
aaa4ac1c95
Disable prefix cache by default for benchmark ( #18639 )
...
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-27 20:06:34 +08:00
Calvin Chen
4693a3438c
[Doc] cleanup deprecated flag for doc ( #18715 )
...
Signed-off-by: calvin chen <120380290@qq.com>
2025-05-27 07:12:02 +00:00
Cyrus Leung
82e2339b06
[Doc] Move examples and further reorganize user guide ( #18666 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:38:04 -07:00
Feng XiaoLong
4fc1bf813a
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking ( #18454 )
...
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
2025-05-23 16:16:26 -07:00
Teruaki Ishizaki
4be2255c81
[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key ( #17291 )
...
Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>
2025-05-23 12:30:47 +08:00
Chenheli Hua
04eb88dc80
Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. ( #18569 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-05-23 01:59:18 +00:00
Hosang
dd5fa7e04f
[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 ( #17004 )
...
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
2025-05-21 08:35:00 -07:00
Lain
e23564cb70
use ceil_div in cutlass block scaling shape check ( #17918 )
2025-05-16 03:02:58 -07:00
Harry Mellor
009d9e7590
Convert `benchmarks` to `ruff format` ( #18068 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 13:43:29 +00:00
Russell Bryant
23b3134eb5
[Benchmarks] Refactor run_structured_output_benchmarks.sh ( #17722 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-13 01:47:29 -07:00
Brayden Zhong
891b9d33de
[Fix] Benchmark `"EngineClient" has no attribute "model_config"` ( #17976 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 22:55:53 -07:00
Pavani Majety
0c0fdae84f
[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model ( #16362 )
2025-05-09 16:24:41 -07:00
xsank
0a9bbaa104
[Misc] support model prefix & add deepseek vl2 tiny fused moe config ( #17763 )
...
Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com>
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>
2025-05-08 07:50:22 +00:00
d.transposed
d456aea71f
[Misc] Add Next Edit Prediction (NEP) datasets support in `benchmark_serving.py` ( #16839 )
...
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
2025-05-06 15:38:45 -04:00