296 Commits

Author SHA1 Message Date
Reid db2f8d915c
[V1] Update structured output (#16812)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-23 23:57:17 -07:00
Chenyaaang 83d933718c
[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-22 18:05:23 -06:00
Lei Wang 8d32dc603d
[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036)
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>
2025-04-22 09:01:36 +01:00
Kartik Ramesh 3b34fd5273
Raise error for data-parallel with benchmark_throughput (#16737)
Signed-off-by: Kartik Ramesh <kartikx2000@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-04-21 23:51:43 +08:00
Nicolò Lucchesi 9d4ca19d50
[Misc] Benchmarks for audio models (#16505)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-19 02:24:14 -07:00
Jennifer Zhao 63d2705edb
[Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (#16556)
2025-04-13 17:20:26 -07:00
Chenyaaang d544d141ec
update benchmark_serving_structured_output to include auto backend (#16438)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-11 12:25:52 +08:00
Alexey Belyakov 3e397a9484
check input length of sonnet samples (#16423)
Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>
2025-04-11 10:15:06 +08:00
WWW 268c325078
Fix range_ratio Bug in RandomDataset (#16126)
Signed-off-by: jadewang21 <jadewangcn@outlook.com>
2025-04-10 15:31:17 -07:00
look 7cd0bd7212
[Bugfix] Fix output token length check logic (#16419)
Signed-off-by: look <eeslook@163.com>
2025-04-10 20:16:48 +00:00
Chenyaaang 5fbab20e02
[Bugfix] Fix bug when dataset is json (#15899)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-10 18:35:41 +00:00
Chenyaaang 417bcefbae
fix sonnet dataset sample when prefix len is very small (#16379)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-10 05:35:07 +00:00
Michael Goin b2ce859bd2
Fix `benchmark_throughput.py --backend=hf` (#16352)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-09 19:09:28 +00:00
yihong 04149cce27
[BugFix] fix some typos found by typos. (#16314)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-09 03:43:59 -07:00
Lu Fang 55dcce91df
Upstream Llama4 Support to Main (#16113)
Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com>
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
Signed-off-by: drisspg <drisspguessous@gmail.com>
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Xiaodong Wang <xdwang@meta.com>
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Lu Fang <lufang@fb.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-07 08:06:27 -07:00
Hyesoo Yang ba10801961
[Benchmark] Add sampling parameters to benchmark_serving. (#16022)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
2025-04-06 12:30:35 +08:00
Ziji Shi (Steven) 95862f7b4d
[Benchmark][Doc] Update throughput benchmark and README (#15998)
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-04-04 09:39:02 -07:00
Ziji Shi (Steven) 06f21ce7a5
[Benchmark] Add AIMO Dataset to Benchmark (#15955)
Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com>
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
2025-04-03 06:09:18 +00:00
Brayden Zhong 252937806c
[Bugfix][Benchmarks] Ensure `async_request_deepspeed_mii` uses the OpenAI choices key (#15926)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-04-02 02:19:35 -07:00
Li Wang aa557e6422
[Benchmark]Fix error message (#15866)
Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-04-02 01:32:24 -07:00
bnellnm e59ca942f5
Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-04-01 12:07:43 -04:00
Jennifer Zhao effc5d24fa
[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup (#15748)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
2025-03-31 15:38:58 +08:00
Woosuk Kwon 70e132244a
[Minor] Remove TGI launching script (#15646)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-28 09:30:08 -07:00
Chen Xia e7f720ea56
[Misc]add coding benchmark for speculative decoding (#15303)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
2025-03-28 10:47:05 +08:00
ElizaWszola 9239bf718e
[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
2025-03-27 00:54:44 +00:00
Tyler Michael Smith 23114d3364
[Misc] Warn about v0 in benchmark_paged_attn.py (#15495)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-25 20:31:04 -07:00
DefTruth f90d34b498
[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-03-23 01:10:10 -07:00
Russell Bryant 1f16b7fe74
[Core][V0] Add guidance backend for structured output (#14589)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <lohuynh@microsoft.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-19 21:33:51 -07:00
Jennifer Zhao b88be22165
[Benchmark] Allow oversample request in benchmark dataset (#15170)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
2025-03-20 12:32:58 +08:00
Wang, Yi 40828ce5fe
fix "Total generated tokens:" is 0 if using --backend tgi and --endpo… (#14673)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-19 20:56:16 -07:00
Aaron Pham 6c5a3195db
[Misc][Benchmark] Add support for different `tokenizer_mode` (#15040)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-03-19 14:56:50 +00:00
Varun Sundar Rabindranath 400d483e87
[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-18 09:47:53 +00:00
Simon Mo 583a9778e0
[Benchmark] Do not save detailed info to json by default (#14879)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-16 21:48:11 -07:00
Roger Wang 3453b964a3
[Misc][Doc] Minor benchmark README update (#14874)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-16 09:46:17 +08:00
Li Wang 09269b3127
[BugFix]Fix performance serving benchmark when enable profiling (#14737)
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-03-14 07:02:05 +00:00
Jennifer Zhao a6e0d096dd
[Feature] Add visionarena offline support for benchmark_throughput (#14654)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-03-14 04:07:54 +00:00
Jee Jee Li a73122de96
[Bugfix] fix benchmark moe (#14653)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-13 16:12:42 +08:00
Jennifer Zhao 4a42b9f5d6
[Doc] Update benchmarks README (#14646)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-03-11 19:23:04 -07:00
Jeff Daily a1c8f3796c
dynamic distpatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
2025-03-11 10:54:56 -04:00
Russell Bryant 08a1a1121d
benchmarks: simplify test jsonschema (#14567)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-11 13:39:30 +00:00
Russell Bryant 432d6dad15
Fix typo in benchmark_serving_structured_output.py (#14566)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-10 14:58:58 -07:00
Varun Sundar Rabindranath 5ff0d32580
[V1] LoRA - Add triton kernels for V1 (#13096)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-10 17:27:53 -04:00
Harry Mellor 3b352a2f92
Correct capitalisation: `VLLM` -> `vLLM` (#14562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 16:36:21 +00:00
Jennifer Zhao 1253b15774
[Feature] Consolidate performance benchmark datasets (#14036)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-10 07:23:11 +00:00
Russell Bryant 9085aabd62
[benchmarks] Add option to use unique jsonschema for each request (#14457)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-08 06:36:39 -08:00
Jeremy Arnold 58abe35455
[Benchmarks] Make detokenization optional in benchmark scripts (#11697)
Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>
2025-03-07 08:09:00 -08:00
Aaron Pham 80e9afb5bc
[V1][Core] Support for Structured Outputs (#12388)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-07 07:19:11 -08:00
Aleksandr Malyshev 0ca3b8e01c
[BUGFIX] Skip tokenization support for throughput benchmark (#12712)
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-03-07 02:51:47 -08:00
Brayden Zhong c34eeec58d
[Bugfix] Correctly call `cudaProfilerStop` in benchmarks script (#14183)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-03-07 00:42:49 +00:00
Daniel Li ad60bbb2b2
[Doc] Fix a typo (#14385)
2025-03-06 16:31:52 -08:00