Ekagra Ranjan
|
135cf55cd1
|
[V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix (#18971)
|
2025-06-03 15:26:33 -07:00 |
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
Ekagra Ranjan
|
bbfa0c61d1
|
[Misc][Benchmark] Add support for CustomDataset (#18511)
|
2025-05-31 19:07:38 +00:00 |
Michael Goin
|
f49239cb45
|
Benchmark script for fp8 vs bf16 gemm (#17126)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-30 10:56:11 -06:00 |
Rabi Mishra
|
6acb7a6285
|
[Misc] Fix benchmarks/README.md for speculative decoding (#18897)
Signed-off-by: rabi <ramishra@redhat.com>
|
2025-05-30 07:58:04 +00:00 |
Cyrus Leung
|
1aa2f81b43
|
[Misc] Update type annotation for rotary embedding `base` (#18914)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-30 10:17:01 +08:00 |
Duyi-Wang
|
b169d5f7b6
|
[Misc][Tools][Benchmark] Add benchmark_serving supports for llama.cpp. (#18692)
Signed-off-by: Duyi-Wang <duyi.wang@intel.com>
|
2025-05-29 20:02:08 +08:00 |
Divakar Verma
|
774c5fde30
|
[V1] fix torch profiling for V1 offline scenarios (#18445)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-05-28 04:16:30 +00:00 |
cascade
|
aaa4ac1c95
|
Disable prefix cache by default for benchmark (#18639)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-27 20:06:34 +08:00 |
Calvin Chen
|
4693a3438c
|
[Doc] cleanup deprecated flag for doc (#18715)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-05-27 07:12:02 +00:00 |
Cyrus Leung
|
82e2339b06
|
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 07:38:04 -07:00 |
Feng XiaoLong
|
4fc1bf813a
|
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
|
2025-05-23 16:16:26 -07:00 |
Teruaki Ishizaki
|
4be2255c81
|
[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (#17291)
Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>
|
2025-05-23 12:30:47 +08:00 |
Chenheli Hua
|
04eb88dc80
|
Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-05-23 01:59:18 +00:00 |
Hosang
|
dd5fa7e04f
|
[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
|
2025-05-21 08:35:00 -07:00 |
Lain
|
e23564cb70
|
use ceil_div in cutlass block scaling shape check (#17918)
|
2025-05-16 03:02:58 -07:00 |
Harry Mellor
|
009d9e7590
|
Convert `benchmarks` to `ruff format` (#18068)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-13 13:43:29 +00:00 |
Russell Bryant
|
23b3134eb5
|
[Benchmarks] Refactor run_structured_output_benchmarks.sh (#17722)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-13 01:47:29 -07:00 |
Brayden Zhong
|
891b9d33de
|
[Fix] Benchmark `"EngineClient" has no attribute "model_config"` (#17976)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-05-11 22:55:53 -07:00 |
Pavani Majety
|
0c0fdae84f
|
[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362)
|
2025-05-09 16:24:41 -07:00 |
xsank
|
0a9bbaa104
|
[Misc] support model prefix & add deepseek vl2 tiny fused moe config (#17763)
Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com>
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>
|
2025-05-08 07:50:22 +00:00 |
d.transposed
|
d456aea71f
|
[Misc] Add Next Edit Prediction (NEP) datasets support in `benchmark_serving.py` (#16839)
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
|
2025-05-06 15:38:45 -04:00 |
Mengqing Cao
|
f9bc5a0693
|
[Bugfix] Fix triton import with local TritonPlaceholder (#17446)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-05-06 17:53:09 +08:00 |
Mikhail Podvitskii
|
dc47ba32f8
|
[Bugfix] Fixed prompt length for random dataset (#17408)
Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>
|
2025-05-06 07:00:08 +00:00 |
Russell Bryant
|
d3efde8176
|
[Benchmarks] Remove invalid option under V1 engine (#17651)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-05 16:30:22 -04:00 |
Xiaodong Wang
|
9352cdb56d
|
[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Lu Fang <lufang@fb.com>
|
2025-05-02 19:44:19 +00:00 |
Caleb_Du
|
3e887d2e0c
|
permute/unpermute kernel for moe optimization (#14568)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
|
2025-05-02 11:31:55 -07:00 |
Chenyaaang
|
9b70e2b4c1
|
[Misc][Tools][Benchmark] Publish script to auto tune server parameters (#17207)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-05-01 19:53:03 +00:00 |
Teruaki Ishizaki
|
86a1f67a3b
|
[Bugfix][Benchmarks] Allow benchmark of deepspeed-mii backend to select a model (#17285)
Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>
|
2025-05-01 11:54:51 +00:00 |
Benjamin Chislett
|
34120f5acd
|
[V1][Feature] Enable Speculative Decoding with Structured Outputs (#14702)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-04-30 00:02:10 +00:00 |
Ekagra Ranjan
|
cfe4532093
|
[Benchmark] Add single turn MTBench to Serving Bench (#17202)
|
2025-04-28 16:46:15 -07:00 |
Michael Goin
|
8fc88d63f1
|
[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-28 15:20:24 -07:00 |
Cyrus Leung
|
93a126fbc7
|
[Misc] Make cached tokenizer pickle-compatible (#17048)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-27 13:05:00 +08:00 |
Harry Mellor
|
423e9f1cbe
|
Use Transformers helper `get_text_config()` instead of checking for `text_config` (#17105)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-25 08:47:35 -07:00 |
Lucas Wilkinson
|
881f735827
|
[Misc] Benchmark Serving Script Support Appending Results (#17028)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-24 22:53:55 -07:00 |
Mengqing Cao
|
2f54045508
|
[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-04-24 22:51:02 -07:00 |
Reid
|
db2f8d915c
|
[V1] Update structured output (#16812)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-23 23:57:17 -07:00 |
Chenyaaang
|
83d933718c
|
[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-04-22 18:05:23 -06:00 |
Lei Wang
|
8d32dc603d
|
[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036)
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>
|
2025-04-22 09:01:36 +01:00 |
Kartik Ramesh
|
3b34fd5273
|
Raise error for data-parallel with benchmark_throughput (#16737)
Signed-off-by: Kartik Ramesh <kartikx2000@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-04-21 23:51:43 +08:00 |
Nicolò Lucchesi
|
9d4ca19d50
|
[Misc] Benchmarks for audio models (#16505)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-19 02:24:14 -07:00 |
Jennifer Zhao
|
63d2705edb
|
[Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (#16556)
|
2025-04-13 17:20:26 -07:00 |
Chenyaaang
|
d544d141ec
|
update benchmark_serving_structured_output to include auto backend (#16438)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-04-11 12:25:52 +08:00 |
Alexey Belyakov
|
3e397a9484
|
check input length of sonnet samples (#16423)
Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>
|
2025-04-11 10:15:06 +08:00 |
WWW
|
268c325078
|
Fix range_ratio Bug in RandomDataset (#16126)
Signed-off-by: jadewang21 <jadewangcn@outlook.com>
|
2025-04-10 15:31:17 -07:00 |
look
|
7cd0bd7212
|
[Bugfix] Fix output token length check logic (#16419)
Signed-off-by: look <eeslook@163.com>
|
2025-04-10 20:16:48 +00:00 |
Chenyaaang
|
5fbab20e02
|
[Bugfix] Fix bug when dataset is json (#15899)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-04-10 18:35:41 +00:00 |
Chenyaaang
|
417bcefbae
|
fix sonnet dataset sample when prefix len is very small (#16379)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-04-10 05:35:07 +00:00 |
Michael Goin
|
b2ce859bd2
|
Fix `benchmark_throughput.py --backend=hf` (#16352)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-09 19:09:28 +00:00 |
yihong
|
04149cce27
|
[BugFix] fix some typos found by typos. (#16314)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-09 03:43:59 -07:00 |