Commit Graph

70 Commits

Author SHA1 Message Date
Reid 6fa718a460
[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-14 16:54:52 +08:00
Simon Mo 02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Divakar Verma 774c5fde30
[V1] fix torch profiling for V1 offline scenarios (#18445)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-05-28 04:16:30 +00:00
cascade aaa4ac1c95
Disable prefix cache by default for benchmark (#18639)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-27 20:06:34 +08:00
Harry Mellor 009d9e7590
Convert `benchmarks` to `ruff format` (#18068)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 13:43:29 +00:00
Jeremy Arnold 58abe35455
[Benchmarks] Make detokenization optional in benchmark scripts (#11697)
Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>
2025-03-07 08:09:00 -08:00
Harry Mellor cf069aa8aa
Update deprecated Python 3.8 typing (#13971) 2025-03-02 17:34:51 -08:00
Huy Do e7ef74e26e
Fix some issues with benchmark data output (#13641)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-02-24 10:23:18 +08:00
Cyrus Leung 7f6bae561c
[CI/Build] Fix pre-commit errors (#13696) 2025-02-22 00:31:26 -08:00
Robin 8aca27fa11
[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len (#13691)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-02-22 14:10:38 +08:00
Huy Do 45186834a0
Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-02-17 08:16:32 +00:00
Russell Bryant e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files
    
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
    also be easily used by tools to help manage license compliance.
    
The Linux Foundation runs license scans against the codebase to help
ensure
    we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Ye (Charlotte) Qi 1d967acb45
[Bugfix] fix beam search input errors and latency benchmark script (#11875)
Signed-off-by: Ye Qi <yeq@meta.com>
Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com>
2025-01-09 17:36:39 +08:00
Divakar Verma 4d29e91be8
[Misc] sort torch profiler table by kernel timing (#11813) 2025-01-08 10:57:04 +08:00
Jeremy Arnold cb6fdaa0a0
[Misc] Make benchmarks use EngineArgs (#9529) 2024-10-22 15:40:38 -07:00
Kuntai Du 81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default (#8704)
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
sroy745 f3a507f1d3
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149) 2024-10-10 14:17:17 +08:00
youkaichao 18b296fdb2
[core] remove beam search from the core (#9105) 2024-10-07 05:47:04 +00:00
Cyrus Leung 3b00b9c26c
[Core] rename`PromptInputs` and `inputs` (#8876) 2024-09-26 20:35:15 -07:00
Simon Mo 4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility (#8760) (#8810) 2024-09-25 10:36:26 -07:00
Cyrus Leung 28e1299e60
rename PromptInputs and inputs with backward compatibility (#8760) 2024-09-25 09:36:47 -07:00
Simon Mo 3185fb0cca
Revert "[Core] Rename `PromptInputs` to `PromptType`, and `inputs` to `prompt`" (#8750) 2024-09-24 05:45:20 +00:00
Cyrus Leung 0057894ef7
[Core] Rename `PromptInputs` and `inputs`(#8673) 2024-09-20 19:00:54 -07:00
Aarni Koskela 8baa454937
[Misc] Move device options to a single place (#8322) 2024-09-11 13:25:58 -07:00
Cyrus Leung 739b61a348
[Frontend] Refactor prompt processing (#4028)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
Alexander Matveev 3476ed0809
[Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) (#5602) 2024-07-01 20:10:37 -07:00
Ilya Lavrenov 57f09a419c
[Hardware][Intel] OpenVINO vLLM backend (#5379) 2024-06-28 13:50:16 +00:00
Woo-Yeon Lee 2ce5d6688b
[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414) 2024-06-25 09:56:06 +00:00
Michael Goin 8065a7e220
[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) 2024-06-20 17:00:13 -06:00
DearPlanet d8714530d1
[Misc]Add param max-model-len in benchmark_latency.py (#5629) 2024-06-19 18:19:08 +08:00
Ronen Schaffer 7879f24dcc
[Misc] Add OpenTelemetry support (#4687)
This PR adds basic support for OpenTelemetry distributed tracing.
It includes changes to enable tracing functionality and improve monitoring capabilities.

I've also added a markdown with print-screens to guide users how to use this feature. You can find it here
2024-06-19 01:17:03 +09:00
Kuntai Du 9e4e6fe207
[CI] the readability of benchmarking and prepare for dashboard (#5571)
[CI] Improve the readability of performance benchmarking results and prepare for upcoming performance dashboard (#5571)
2024-06-17 11:41:08 -07:00
Kunshang Ji 728c4c8a06
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-06-17 11:01:25 -07:00
Kuntai Du 319ad7f1d3
[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label (#5073)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-06-13 22:36:20 -07:00
Woosuk Kwon 1a8bfd92d5
[Hardware] Initial TPU integration (#5292) 2024-06-12 11:53:03 -07:00
Benjamin Kitor b3376e5c76
[Misc] Add args for selecting distributed executor to benchmarks (#5335) 2024-06-08 09:20:16 +08:00
Marut Pandya 616e600e0b
[Misc] add gpu_memory_utilization arg (#5079)
Signed-off-by: pandyamarut <pandyamarut@gmail.com>
2024-05-28 17:16:18 -07:00
Cyrus Leung 5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Cody Yu a3a73ab069
[Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893)
The 2nd PR for #4532.

This PR supports loading FP8 kv-cache scaling factors from a FP8 checkpoint (with .kv_scale parameter).
2024-05-22 13:28:20 -07:00
Simon Mo f09edd8a25
Add JSON output support for benchmark_latency and benchmark_throughput (#4848) 2024-05-16 10:02:56 -07:00
Cody Yu 973617ae02
[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840)
Co-authored-by: Cade Daniel <edacih@gmail.com>
Co-authored-by: Cade Daniel <cade@anyscale.com>
2024-05-16 00:53:51 -07:00
Michael Goin 53b018edcb
[Bugfix] Get available quantization methods from quantization registry (#4098) 2024-04-18 00:21:55 -07:00
SangBin Cho 67b4221a61
[Core][5/N] Fully working chunked prefill e2e (#3884) 2024-04-10 17:56:48 -07:00
Zedong Peng c013d32c75
[Benchmark] Add cpu options to bench scripts (#3915) 2024-04-09 21:30:03 -07:00
youkaichao e4be7d70bb
[CI/Benchmark] add more iteration and use median for robust latency benchmark (#3889) 2024-04-06 21:32:30 +00:00
Adrian Abeyta 2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
SangBin Cho b51c1cc9d2
[2/N] Chunked prefill data update (#3538) 2024-03-28 10:06:01 -07:00
AmadeusChan 1956931436
[Misc] add the "download-dir" option to the latency/throughput benchmarks (#3621) 2024-03-27 13:39:05 -07:00
Philipp Moritz 17c3103c56
Make it easy to profile workers with nsight (#3162)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-03-03 16:19:13 -08:00
Woosuk Kwon 72d3a30c63
[Minor] Fix benchmark_latency script (#2765) 2024-02-05 12:45:37 -08:00