vLLM/vllm

Commit Graph

Author	SHA1	Message	Date
Lu Fang	55dcce91df	Upstream Llama4 Support to Main (#16113 ) Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com> Signed-off-by: Chris Thi <chris.c.thi@gmail.com> Signed-off-by: drisspg <drisspguessous@gmail.com> Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Keyun Tong <tongkeyun@gmail.com> Signed-off-by: Lu Fang <fanglu@meta.com> Signed-off-by: Xiaodong Wang <xdwang@meta.com> Signed-off-by: Yang Chen <yangche@fb.com> Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Lu Fang <lufang@fb.com> Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 08:06:27 -07:00
Hyesoo Yang	ba10801961	[Benchmark] Add sampling parameters to benchmark_serving. (#16022 ) Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>	2025-04-06 12:30:35 +08:00
Ziji Shi (Steven)	95862f7b4d	[Benchmark][Doc] Update throughput benchmark and README (#15998 ) Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-04-04 09:39:02 -07:00
Ziji Shi (Steven)	06f21ce7a5	[Benchmark] Add AIMO Dataset to Benchmark (#15955 ) Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com> Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>	2025-04-03 06:09:18 +00:00
Brayden Zhong	252937806c	[Bugfix][Benchmarks] Ensure `async_request_deepspeed_mii` uses the OpenAI choices key (#15926 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-04-02 02:19:35 -07:00
Li Wang	aa557e6422	[Benchmark]Fix error message (#15866 ) Signed-off-by: wangli <wangli858794774@gmail.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-04-02 01:32:24 -07:00
bnellnm	e59ca942f5	Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-04-01 12:07:43 -04:00
Jennifer Zhao	effc5d24fa	[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup (#15748 ) Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>	2025-03-31 15:38:58 +08:00
Woosuk Kwon	70e132244a	[Minor] Remove TGI launching script (#15646 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-28 09:30:08 -07:00
Chen Xia	e7f720ea56	[Misc]add coding benchmark for speculative decoding (#15303 ) Signed-off-by: CXIAAAAA <cxia0209@gmail.com>	2025-03-28 10:47:05 +08:00
ElizaWszola	9239bf718e	[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972 ) Signed-off-by: ElizaWszola <eliza@neuralmagic.com> Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>	2025-03-27 00:54:44 +00:00
Tyler Michael Smith	23114d3364	[Misc] Warn about v0 in benchmark_paged_attn.py (#15495 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-25 20:31:04 -07:00
DefTruth	f90d34b498	[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-03-23 01:10:10 -07:00
Russell Bryant	1f16b7fe74	[Core][V0] Add guidance backend for structured output (#14589 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Loc Huynh <lohuynh@microsoft.com> Co-authored-by: Michal Moskal <michal@moskal.me> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-19 21:33:51 -07:00
Jennifer Zhao	b88be22165	[Benchmark] Allow oversample request in benchmark dataset (#15170 ) Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>	2025-03-20 12:32:58 +08:00
Wang, Yi	40828ce5fe	fix "Total generated tokens:" is 0 if using --backend tgi and --endpo… (#14673 ) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2025-03-19 20:56:16 -07:00
Aaron Pham	6c5a3195db	[Misc][Benchmark] Add support for different `tokenizer_mode` (#15040 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-03-19 14:56:50 +00:00
Varun Sundar Rabindranath	400d483e87	[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-18 09:47:53 +00:00
Simon Mo	583a9778e0	[Benchmark] Do not save detailed info to json by default (#14879 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-03-16 21:48:11 -07:00
Roger Wang	3453b964a3	[Misc][Doc] Minor benchmark README update (#14874 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-16 09:46:17 +08:00
Li Wang	09269b3127	[BugFix]Fix performance serving benchmark when enable profiling (#14737 ) Signed-off-by: wangli <wangli858794774@gmail.com>	2025-03-14 07:02:05 +00:00
Jennifer Zhao	a6e0d096dd	[Feature] Add visionarena offline support for benchmark_throughput (#14654 ) Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com> Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-03-14 04:07:54 +00:00
Jee Jee Li	a73122de96	[Bugfix] fix benchmark moe (#14653 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-13 16:12:42 +08:00
Jennifer Zhao	4a42b9f5d6	[Doc] Update benchmarks README (#14646 ) Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-03-11 19:23:04 -07:00
Jeff Daily	a1c8f3796c	dynamic distpatch of fp8 kernels (#14245 ) Signed-off-by: Jeff Daily <jeff.daily@amd.com>	2025-03-11 10:54:56 -04:00
Russell Bryant	08a1a1121d	benchmarks: simplify test jsonschema (#14567 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-11 13:39:30 +00:00
Russell Bryant	432d6dad15	Fix typo in benchmark_serving_structured_output.py (#14566 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-10 14:58:58 -07:00
Varun Sundar Rabindranath	5ff0d32580	[V1] LoRA - Add triton kernels for V1 (#13096 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-10 17:27:53 -04:00
Harry Mellor	3b352a2f92	Correct capitalisation: `VLLM` -> `vLLM` (#14562 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-10 16:36:21 +00:00
Jennifer Zhao	1253b15774	[Feature] Consolidate performance benchmark datasets (#14036 ) Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-03-10 07:23:11 +00:00
Russell Bryant	9085aabd62	[benchmarks] Add option to use unique jsonschema for each request (#14457 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-08 06:36:39 -08:00
Jeremy Arnold	58abe35455	[Benchmarks] Make detokenization optional in benchmark scripts (#11697 ) Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>	2025-03-07 08:09:00 -08:00
Aaron Pham	80e9afb5bc	[V1][Core] Support for Structured Outputs (#12388 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-07 07:19:11 -08:00
Aleksandr Malyshev	0ca3b8e01c	[BUGFIX] Skip tokenization support for throughput benchmark (#12712 ) Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-03-07 02:51:47 -08:00
Brayden Zhong	c34eeec58d	[Bugfix] Correctly call `cudaProfilerStop` in benchmarks script (#14183 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-03-07 00:42:49 +00:00
Daniel Li	ad60bbb2b2	[Doc] Fix a typo (#14385 )	2025-03-06 16:31:52 -08:00
Michael Goin	ca100c90fe	Add benchmark for DeepGEMM and vLLM Block FP8 Dense GEMM (#13917 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-05 17:08:51 -08:00
Vincent	a4f1ee35d6	Deprecate `best_of` Sampling Parameter in anticipation for vLLM V1 (#13997 ) Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com> Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-05 20:22:43 +00:00
Jee Jee Li	7bab4bb048	[Misc] Add Qwen2MoeForCausalLM moe tuning support (#14276 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-05 23:11:29 +08:00
Michael Goin	f78c0be80a	Fix benchmark_moe.py tuning for CUDA devices (#14164 )	2025-03-03 21:11:03 -08:00
Divakar Verma	bb5b640359	[core] moe fp8 block quant tuning support (#14068 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-03-04 01:30:23 +00:00
TJian	848a6438ae	[ROCm] Faster Custom Paged Attention kernels (#12348 )	2025-03-03 09:24:45 -08:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
YajieWang	6a92ff93e1	[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931 )	2025-02-28 22:30:59 -08:00
Jee Jee Li	6a84164add	[Bugfix] Add file lock for ModelScope download (#14060 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-01 06:10:28 +00:00
Brayden Zhong	ec8a5e5386	[Misc]: Add support for goodput on guided benchmarking + TPOT calculation refactor (#13736 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-02-26 19:06:47 +08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583 )	2025-02-24 07:33:20 -08:00
Huy Do	e7ef74e26e	Fix some issues with benchmark data output (#13641 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-24 10:23:18 +08:00
Roger Wang	9bebc9512f	[Misc] Deprecate `--dataset` from `benchmark_serving.py` (#13708 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-23 13:32:20 +00:00
Cyrus Leung	7f6bae561c	[CI/Build] Fix pre-commit errors (#13696 )	2025-02-22 00:31:26 -08:00
Robin	8aca27fa11	[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len (#13691 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-02-22 14:10:38 +08:00
Huy Do	45186834a0	Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-17 08:16:32 +00:00
Keyun Tong	3ee696a63d	[RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518 ) Signed-off-by: Keyun Tong <tongkeyun@gmail.com>	2025-02-12 12:25:58 +08:00
Woosuk Kwon	58047c6f04	[Benchmark] Add BurstGPT to benchmark_serving (#13063 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-02-10 21:25:30 -08:00
Cyrus Leung	8a69e0e20e	[CI/Build] Auto-fix Markdown files (#12941 )	2025-02-08 04:25:15 -08:00
Varun Sundar Rabindranath	7e1837676a	[misc] Add LoRA to benchmark_serving (#12898 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-08 17:15:44 +08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Tyler Michael Smith	cfa134d247	[Bugfix/CI] Fixup benchmark_moe.py (#12562 ) Fixes `is_marlin` not being passed into `get_default_config` Also allow `--tensor-parallel-size` in addition to `-tp` and `--tp-size` Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-01 13:41:35 +08:00
Lucas Wilkinson	9798b2fb00	[Kernel] Update `cutlass_scaled_mm` to support 2d group (blockwise) scaling (#11868 )	2025-01-30 18:33:00 -08:00
Divakar Verma	1c1bb0bbf2	[Misc][MoE] add Deepseek-V3 moe tuning support (#12558 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-30 00:47:30 +00:00
Harry Mellor	823ab79633	Update `pre-commit` hooks (#12475 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-27 17:23:08 -07:00
Junichi Sato	3bb8e2c9a2	[Misc] Enable proxy support in benchmark script (#12356 ) Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>	2025-01-24 14:58:26 +00:00
Roger Wang	3c818bdb42	[Misc] Use VisionArena Dataset for VLM Benchmarking (#12389 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-24 00:22:04 -08:00
Junichi Sato	9726ad676d	[Misc] Fix OpenAI API Compatibility Issues in Benchmark Script (#12357 ) Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>	2025-01-23 17:02:13 -05:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
Nick Hill	222a9dc350	[Benchmark] More accurate TPOT calc in `benchmark_serving.py` (#12288 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-01-22 13:46:14 +08:00
Divakar Verma	2acba47d9b	[bugfix] moe tuning. rm is_navi() (#12273 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-21 22:47:32 +00:00
gujing	936db119ed	benchmark_serving support --served-model-name param (#12109 ) Signed-off-by: zibai <zibai.gj@alibaba-inc.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-01-19 09:59:56 +00:00
Divakar Verma	8027a72461	[ROCm][MoE] moe tuning support for rocm (#12049 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-17 14:49:16 +08:00
Varun Sundar Rabindranath	5fd24ec02e	[misc] Add LoRA kernel micro benchmarks (#11579 )	2025-01-16 15:51:40 +00:00
elijah	c6db21313c	bugfix: Fix signature mismatch in benchmark's `get_tokenizer` function (#11982 ) Signed-off-by: elijah <f1renze.142857@gmail.com>	2025-01-13 15:22:07 +00:00
minmin	8a579408f3	[Misc] Update benchmark_prefix_caching.py fixed example usage (#11920 ) Signed-off-by: Ren MinMin <renmm6@chinaunicom.cn> Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn>	2025-01-10 20:39:22 +00:00
Kuntai Du	5959564f94	Doc fix in `benchmark_long_document_qa_throughput.py` (#11933 ) Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2025-01-10 23:51:43 +08:00
Ye (Charlotte) Qi	1d967acb45	[Bugfix] fix beam search input errors and latency benchmark script (#11875 ) Signed-off-by: Ye Qi <yeq@meta.com> Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com>	2025-01-09 17:36:39 +08:00
Divakar Verma	4d29e91be8	[Misc] sort torch profiler table by kernel timing (#11813 )	2025-01-08 10:57:04 +08:00
Yihua Cheng	0c6f998554	[Benchmark] Add benchmark script for CPU offloading (#11533 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: KuntaiDu <kuntai@uchicago.edu>	2025-01-01 00:10:55 +00:00
Jiaxin Shan	fc601665eb	[Misc] Update disaggregation benchmark scripts and test logs (#11456 ) Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>	2024-12-25 06:58:48 +00:00
Varun Sundar Rabindranath	98356735ac	[misc] benchmark_throughput : Add LoRA (#11267 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-12-19 15:43:16 +08:00
Dipika Sikka	60508ffda9	[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995 ) Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com> Co-authored-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Rahul Tuli <rahul@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2024-12-18 09:57:16 -05:00
Roger Wang	02222a0256	[Misc] Kernel Benchmark for `RMSNorm` (#11241 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Xiaoyu Zhang <BBuf@users.noreply.github.com>	2024-12-17 06:57:02 +00:00
Alexander Matveev	238c0d93b4	[Misc] Add tokenizer_mode param to benchmark_serving.py (#11174 ) Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>	2024-12-13 16:19:10 +00:00
Luka Govedič	30870b4f66	[torch.compile] Dynamic fp8 + rms_norm fusion (#10906 ) Signed-off-by: luka <luka@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-12-13 03:19:23 +00:00
Chendi.Xue	82eb5ea8f3	Benchmark serving structured output (#10880 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-12-04 16:28:21 -05:00
Chendi.Xue	381ac93bb5	[Benchmark] Benchmark structured output with datasets (#10557 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2024-12-03 17:21:06 -07:00
Michael Goin	4433195ab7	[Bugfix] Prevent benchmark_throughput.py from using duplicated random prompts (#10753 )	2024-12-03 02:26:15 +00:00
Kuntai Du	0590ec3fd9	[Core] Implement disagg prefill by StatelessProcessGroup (#10502 ) This PR provides initial support for single-node disaggregated prefill in 1P1D scenario. Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>	2024-12-01 19:01:00 -06:00
Roger Wang	c11f172187	[Misc] Adding `MMMU-Pro` vision dataset to serving benchmark (#10804 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-12-01 08:47:05 +00:00
Wang, Yi	8a93a598d9	fix the issue that len(tokenizer(prompt)["input_ids"]) > prompt_len (#10524 ) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2024-11-21 11:15:36 +00:00
ElizaWszola	b00b33d77e	[Model][Quantization] HQQ support through Marlin kernel expansion (#9766 ) Signed-off-by: ElizaWszola <eliza@neuralmagic.com>	2024-11-19 13:31:12 -08:00
Ricky Xu	90a6c759ca	[misc] partial prefix & random input generation benchmark (#9929 ) Signed-off-by: rickyx <rickyx@anyscale.com>	2024-11-18 15:39:14 -08:00
Lucas Wilkinson	96d999fbe8	[Kernel] Initial Machete W4A8 support + Refactors (#9855 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2024-11-18 12:59:29 -07:00
Jaehyun An	8b6725b0cf	[Misc] Update benchmark to support image_url file or http (#10287 ) Signed-off-by: rbbang <anjaehyun87@gmail.com>	2024-11-16 18:15:40 +08:00
Cyrus Leung	f4c2187e29	[Misc] Fix typo in #5895 (#10145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-08 09:07:01 +00:00
DearPlanet	ad39bd640c	[Bugfix] Add error handling when server cannot respond any valid tokens (#5895 )	2024-11-08 04:58:37 +00:00
Cody Yu	201fc07730	[V1] Prefix caching (take 2) (#9972 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2024-11-07 17:34:44 -08:00
Russell Bryant	3be5b26a76	[CI/Build] Add shell script linting using shellcheck (#7925 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-07 18:17:29 +00:00
Atlas	a62bc0109c	[Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark. (#10105 ) Signed-off-by: Mozhou <spli161006@gmail.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-11-07 11:20:30 +00:00
Aaron Pham	21063c11c7	[CI/Build] drop support for Python 3.8 EOL (#8464 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-11-06 07:11:55 +00:00
lkchen	d2e80332a7	[Feature] Update benchmark_throughput.py to support image input (#9851 ) Signed-off-by: Linkun Chen <github+anyscale@lkchen.net> Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>	2024-11-05 19:30:02 +00:00

332 Commits