vLLM/vllm

Commit Graph

Author	SHA1	Message	Date
Mengqing Cao	f9bc5a0693	[Bugfix] Fix triton import with local TritonPlaceholder (#17446 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-06 17:53:09 +08:00
Mikhail Podvitskii	dc47ba32f8	[Bugfix] Fixed prompt length for random dataset (#17408 ) Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>	2025-05-06 07:00:08 +00:00
Russell Bryant	d3efde8176	[Benchmarks] Remove invalid option under V1 engine (#17651 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-05 16:30:22 -04:00
Xiaodong Wang	9352cdb56d	[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263 ) Signed-off-by: Lu Fang <lufang@fb.com> Co-authored-by: Lu Fang <lufang@fb.com>	2025-05-02 19:44:19 +00:00
Caleb_Du	3e887d2e0c	permute/unpermute kernel for moe optimization (#14568 ) Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>	2025-05-02 11:31:55 -07:00
Chenyaaang	9b70e2b4c1	[Misc][Tools][Benchmark] Publish script to auto tune server parameters (#17207 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-01 19:53:03 +00:00
Teruaki Ishizaki	86a1f67a3b	[Bugfix][Benchmarks] Allow benchmark of deepspeed-mii backend to select a model (#17285 ) Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>	2025-05-01 11:54:51 +00:00
Benjamin Chislett	34120f5acd	[V1][Feature] Enable Speculative Decoding with Structured Outputs (#14702 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-04-30 00:02:10 +00:00
Ekagra Ranjan	cfe4532093	[Benchmark] Add single turn MTBench to Serving Bench (#17202 )	2025-04-28 16:46:15 -07:00
Michael Goin	8fc88d63f1	[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-28 15:20:24 -07:00
Cyrus Leung	93a126fbc7	[Misc] Make cached tokenizer pickle-compatible (#17048 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-27 13:05:00 +08:00
Harry Mellor	423e9f1cbe	Use Transformers helper `get_text_config()` instead of checking for `text_config` (#17105 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:47:35 -07:00
Lucas Wilkinson	881f735827	[Misc] Benchmark Serving Script Support Appending Results (#17028 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-24 22:53:55 -07:00
Mengqing Cao	2f54045508	[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-04-24 22:51:02 -07:00
Reid	db2f8d915c	[V1] Update structured output (#16812 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-23 23:57:17 -07:00
Chenyaaang	83d933718c	[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-22 18:05:23 -06:00
Lei Wang	8d32dc603d	[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036 ) Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com> Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>	2025-04-22 09:01:36 +01:00
Kartik Ramesh	3b34fd5273	Raise error for data-parallel with benchmark_throughput (#16737 ) Signed-off-by: Kartik Ramesh <kartikx2000@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-04-21 23:51:43 +08:00
Nicolò Lucchesi	9d4ca19d50	[Misc] Benchmarks for audio models (#16505 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-19 02:24:14 -07:00
Jennifer Zhao	63d2705edb	[Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (#16556 )	2025-04-13 17:20:26 -07:00
Chenyaaang	d544d141ec	update benchmark_serving_structured_output to include auto backend (#16438 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-11 12:25:52 +08:00
Alexey Belyakov	3e397a9484	check input length of sonnet samples (#16423 ) Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>	2025-04-11 10:15:06 +08:00
WWW	268c325078	Fix range_ratio Bug in RandomDataset (#16126 ) Signed-off-by: jadewang21 <jadewangcn@outlook.com>	2025-04-10 15:31:17 -07:00
look	7cd0bd7212	[Bugfix] Fix output token length check logic (#16419 ) Signed-off-by: look <eeslook@163.com>	2025-04-10 20:16:48 +00:00
Chenyaaang	5fbab20e02	[Bugfix] Fix bug when dataset is json (#15899 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-10 18:35:41 +00:00
Chenyaaang	417bcefbae	fix sonnet dataset sample when prefix len is very small (#16379 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-10 05:35:07 +00:00
Michael Goin	b2ce859bd2	Fix `benchmark_throughput.py --backend=hf` (#16352 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-09 19:09:28 +00:00
yihong	04149cce27	[BugFix] fix some typos found by typos. (#16314 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-04-09 03:43:59 -07:00
Lu Fang	55dcce91df	Upstream Llama4 Support to Main (#16113 ) Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com> Signed-off-by: Chris Thi <chris.c.thi@gmail.com> Signed-off-by: drisspg <drisspguessous@gmail.com> Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Keyun Tong <tongkeyun@gmail.com> Signed-off-by: Lu Fang <fanglu@meta.com> Signed-off-by: Xiaodong Wang <xdwang@meta.com> Signed-off-by: Yang Chen <yangche@fb.com> Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Lu Fang <lufang@fb.com> Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 08:06:27 -07:00
Hyesoo Yang	ba10801961	[Benchmark] Add sampling parameters to benchmark_serving. (#16022 ) Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>	2025-04-06 12:30:35 +08:00
Ziji Shi (Steven)	95862f7b4d	[Benchmark][Doc] Update throughput benchmark and README (#15998 ) Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-04-04 09:39:02 -07:00
Ziji Shi (Steven)	06f21ce7a5	[Benchmark] Add AIMO Dataset to Benchmark (#15955 ) Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com> Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>	2025-04-03 06:09:18 +00:00
Brayden Zhong	252937806c	[Bugfix][Benchmarks] Ensure `async_request_deepspeed_mii` uses the OpenAI choices key (#15926 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-04-02 02:19:35 -07:00
Li Wang	aa557e6422	[Benchmark]Fix error message (#15866 ) Signed-off-by: wangli <wangli858794774@gmail.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-04-02 01:32:24 -07:00
bnellnm	e59ca942f5	Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-04-01 12:07:43 -04:00
Jennifer Zhao	effc5d24fa	[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup (#15748 ) Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>	2025-03-31 15:38:58 +08:00
Woosuk Kwon	70e132244a	[Minor] Remove TGI launching script (#15646 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-28 09:30:08 -07:00
Chen Xia	e7f720ea56	[Misc]add coding benchmark for speculative decoding (#15303 ) Signed-off-by: CXIAAAAA <cxia0209@gmail.com>	2025-03-28 10:47:05 +08:00
ElizaWszola	9239bf718e	[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972 ) Signed-off-by: ElizaWszola <eliza@neuralmagic.com> Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>	2025-03-27 00:54:44 +00:00
Tyler Michael Smith	23114d3364	[Misc] Warn about v0 in benchmark_paged_attn.py (#15495 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-25 20:31:04 -07:00
DefTruth	f90d34b498	[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-03-23 01:10:10 -07:00
Russell Bryant	1f16b7fe74	[Core][V0] Add guidance backend for structured output (#14589 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Loc Huynh <lohuynh@microsoft.com> Co-authored-by: Michal Moskal <michal@moskal.me> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-19 21:33:51 -07:00
Jennifer Zhao	b88be22165	[Benchmark] Allow oversample request in benchmark dataset (#15170 ) Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>	2025-03-20 12:32:58 +08:00
Wang, Yi	40828ce5fe	fix "Total generated tokens:" is 0 if using --backend tgi and --endpo… (#14673 ) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2025-03-19 20:56:16 -07:00
Aaron Pham	6c5a3195db	[Misc][Benchmark] Add support for different `tokenizer_mode` (#15040 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-03-19 14:56:50 +00:00
Varun Sundar Rabindranath	400d483e87	[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-18 09:47:53 +00:00
Simon Mo	583a9778e0	[Benchmark] Do not save detailed info to json by default (#14879 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-03-16 21:48:11 -07:00
Roger Wang	3453b964a3	[Misc][Doc] Minor benchmark README update (#14874 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-16 09:46:17 +08:00
Li Wang	09269b3127	[BugFix]Fix performance serving benchmark when enable profiling (#14737 ) Signed-off-by: wangli <wangli858794774@gmail.com>	2025-03-14 07:02:05 +00:00
Jennifer Zhao	a6e0d096dd	[Feature] Add visionarena offline support for benchmark_throughput (#14654 ) Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com> Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-03-14 04:07:54 +00:00
Jee Jee Li	a73122de96	[Bugfix] fix benchmark moe (#14653 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-13 16:12:42 +08:00
Jennifer Zhao	4a42b9f5d6	[Doc] Update benchmarks README (#14646 ) Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-03-11 19:23:04 -07:00
Jeff Daily	a1c8f3796c	dynamic distpatch of fp8 kernels (#14245 ) Signed-off-by: Jeff Daily <jeff.daily@amd.com>	2025-03-11 10:54:56 -04:00
Russell Bryant	08a1a1121d	benchmarks: simplify test jsonschema (#14567 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-11 13:39:30 +00:00
Russell Bryant	432d6dad15	Fix typo in benchmark_serving_structured_output.py (#14566 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-10 14:58:58 -07:00
Varun Sundar Rabindranath	5ff0d32580	[V1] LoRA - Add triton kernels for V1 (#13096 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-10 17:27:53 -04:00
Harry Mellor	3b352a2f92	Correct capitalisation: `VLLM` -> `vLLM` (#14562 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-10 16:36:21 +00:00
Jennifer Zhao	1253b15774	[Feature] Consolidate performance benchmark datasets (#14036 ) Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-03-10 07:23:11 +00:00
Russell Bryant	9085aabd62	[benchmarks] Add option to use unique jsonschema for each request (#14457 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-08 06:36:39 -08:00
Jeremy Arnold	58abe35455	[Benchmarks] Make detokenization optional in benchmark scripts (#11697 ) Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>	2025-03-07 08:09:00 -08:00
Aaron Pham	80e9afb5bc	[V1][Core] Support for Structured Outputs (#12388 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-07 07:19:11 -08:00
Aleksandr Malyshev	0ca3b8e01c	[BUGFIX] Skip tokenization support for throughput benchmark (#12712 ) Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-03-07 02:51:47 -08:00
Brayden Zhong	c34eeec58d	[Bugfix] Correctly call `cudaProfilerStop` in benchmarks script (#14183 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-03-07 00:42:49 +00:00
Daniel Li	ad60bbb2b2	[Doc] Fix a typo (#14385 )	2025-03-06 16:31:52 -08:00
Michael Goin	ca100c90fe	Add benchmark for DeepGEMM and vLLM Block FP8 Dense GEMM (#13917 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-05 17:08:51 -08:00
Vincent	a4f1ee35d6	Deprecate `best_of` Sampling Parameter in anticipation for vLLM V1 (#13997 ) Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com> Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-05 20:22:43 +00:00
Jee Jee Li	7bab4bb048	[Misc] Add Qwen2MoeForCausalLM moe tuning support (#14276 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-05 23:11:29 +08:00
Michael Goin	f78c0be80a	Fix benchmark_moe.py tuning for CUDA devices (#14164 )	2025-03-03 21:11:03 -08:00
Divakar Verma	bb5b640359	[core] moe fp8 block quant tuning support (#14068 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-03-04 01:30:23 +00:00
TJian	848a6438ae	[ROCm] Faster Custom Paged Attention kernels (#12348 )	2025-03-03 09:24:45 -08:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
YajieWang	6a92ff93e1	[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931 )	2025-02-28 22:30:59 -08:00
Jee Jee Li	6a84164add	[Bugfix] Add file lock for ModelScope download (#14060 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-01 06:10:28 +00:00
Brayden Zhong	ec8a5e5386	[Misc]: Add support for goodput on guided benchmarking + TPOT calculation refactor (#13736 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-02-26 19:06:47 +08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583 )	2025-02-24 07:33:20 -08:00
Huy Do	e7ef74e26e	Fix some issues with benchmark data output (#13641 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-24 10:23:18 +08:00
Roger Wang	9bebc9512f	[Misc] Deprecate `--dataset` from `benchmark_serving.py` (#13708 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-23 13:32:20 +00:00
Cyrus Leung	7f6bae561c	[CI/Build] Fix pre-commit errors (#13696 )	2025-02-22 00:31:26 -08:00
Robin	8aca27fa11	[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len (#13691 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-02-22 14:10:38 +08:00
Huy Do	45186834a0	Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-17 08:16:32 +00:00
Keyun Tong	3ee696a63d	[RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518 ) Signed-off-by: Keyun Tong <tongkeyun@gmail.com>	2025-02-12 12:25:58 +08:00
Woosuk Kwon	58047c6f04	[Benchmark] Add BurstGPT to benchmark_serving (#13063 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-02-10 21:25:30 -08:00
Cyrus Leung	8a69e0e20e	[CI/Build] Auto-fix Markdown files (#12941 )	2025-02-08 04:25:15 -08:00
Varun Sundar Rabindranath	7e1837676a	[misc] Add LoRA to benchmark_serving (#12898 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-08 17:15:44 +08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Tyler Michael Smith	cfa134d247	[Bugfix/CI] Fixup benchmark_moe.py (#12562 ) Fixes `is_marlin` not being passed into `get_default_config` Also allow `--tensor-parallel-size` in addition to `-tp` and `--tp-size` Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-01 13:41:35 +08:00
Lucas Wilkinson	9798b2fb00	[Kernel] Update `cutlass_scaled_mm` to support 2d group (blockwise) scaling (#11868 )	2025-01-30 18:33:00 -08:00
Divakar Verma	1c1bb0bbf2	[Misc][MoE] add Deepseek-V3 moe tuning support (#12558 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-30 00:47:30 +00:00
Harry Mellor	823ab79633	Update `pre-commit` hooks (#12475 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-27 17:23:08 -07:00
Junichi Sato	3bb8e2c9a2	[Misc] Enable proxy support in benchmark script (#12356 ) Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>	2025-01-24 14:58:26 +00:00
Roger Wang	3c818bdb42	[Misc] Use VisionArena Dataset for VLM Benchmarking (#12389 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-24 00:22:04 -08:00
Junichi Sato	9726ad676d	[Misc] Fix OpenAI API Compatibility Issues in Benchmark Script (#12357 ) Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>	2025-01-23 17:02:13 -05:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
Nick Hill	222a9dc350	[Benchmark] More accurate TPOT calc in `benchmark_serving.py` (#12288 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-01-22 13:46:14 +08:00
Divakar Verma	2acba47d9b	[bugfix] moe tuning. rm is_navi() (#12273 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-21 22:47:32 +00:00
gujing	936db119ed	benchmark_serving support --served-model-name param (#12109 ) Signed-off-by: zibai <zibai.gj@alibaba-inc.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-01-19 09:59:56 +00:00
Divakar Verma	8027a72461	[ROCm][MoE] moe tuning support for rocm (#12049 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-17 14:49:16 +08:00
Varun Sundar Rabindranath	5fd24ec02e	[misc] Add LoRA kernel micro benchmarks (#11579 )	2025-01-16 15:51:40 +00:00
elijah	c6db21313c	bugfix: Fix signature mismatch in benchmark's `get_tokenizer` function (#11982 ) Signed-off-by: elijah <f1renze.142857@gmail.com>	2025-01-13 15:22:07 +00:00


360 Commits