6944 Commits

Author SHA1 Message Date
Simon Mo 9a76ef07b9 Add pandas and datasets for benchmarks 2025-06-04 06:51:59 -07:00
Xu Wenqing 02658c2dfe
Add DeepSeek-R1-0528 function call chat template (#18874)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-06-04 13:24:18 +00:00
Cyrus Leung 01dc9a76db
[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-04 04:49:20 -07:00
wang.yuqi 35cf32df30
Improve the output precision of embedding models (#19092) 2025-06-04 11:48:57 +00:00
Isotr0py 8711bc5e68
[Misc] Add packages for benchmark as extra dependency (#19089)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-04 04:18:48 -07:00
Seiji Eicher 2669a0d7b5
Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-06-04 17:10:45 +08:00
Siyuan Liu 8e972d9c44
[TPU] Skip hanging tests (#19115)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-06-04 01:43:00 -07:00
汪志鹏 3336c8cfbe
Fix #19130 (#19132)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-06-04 01:42:06 -07:00
Woosuk Kwon b124e1085b
[Bugfix] Fix FA3 full cuda graph correctness (#19106)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-03 23:10:15 -07:00
Kaixi Hou 41aa578428
[NVIDIA] Add Cutlass MLA backend (#17625) 2025-06-03 21:40:26 -07:00
Calvin Chen 8d646c2e53
[Cleanup][v1]:remote guided-decoding-backend for example (#19059)
Signed-off-by: calvin chen <120380290@qq.com>
2025-06-04 04:23:26 +00:00
Vadim Gimpelson 5d6d1adf15
[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437) 2025-06-03 21:13:01 -07:00
Lukas Geiger 1409ef9134
[Core] Cast multimodal input in hf processor (#18862)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-06-03 20:24:56 -07:00
Li, Jiang 4555143ea7
[CPU] V1 support for the CPU backend (#16441) 2025-06-03 18:43:01 -07:00
Russell Bryant 52dceb172d
[Docs] Add developer doc about CI failures (#18782)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-06-04 01:09:13 +00:00
Jiaxin Shan abd7df2fca
[Misc] Fix path and python alias errors in disagg_prefill examples (#18919) 2025-06-03 17:15:18 -07:00
Yan Ru Pei b712be98c7
feat: add data parallel rank to KVEventBatch (#18925) 2025-06-03 17:14:20 -07:00
Chen Zhang a8da78eac9
[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-04 00:14:06 +00:00
Nicolò Lucchesi 5d96533e22
[Bugfix][P/D] Fix Prefix Cache Bug (#18411)
Signed-off-by: nicklucche <nlucches@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-06-03 23:53:16 +00:00
Chauncey 4de790fcad
[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (#19075)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-06-03 23:27:24 +00:00
Chen Zhang b5fd9506c1
[Bugfix] get_num_blocks_to_allocate with null_block (#19031)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-03 15:30:55 -07:00
Ekagra Ranjan 135cf55cd1
[V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix (#18971) 2025-06-03 15:26:33 -07:00
Chen Zhang 6cac54f4d1
[v1] Re-init input batch for multiple kv cache groups (#18654)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-03 21:41:36 +00:00
Harry Mellor 6865fe0074
Fix interaction between `Optional` and `Annotated` in CLI typing (#19093)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Yikun Jiang <yikun@apache.org>
2025-06-03 21:07:19 +00:00
Michael Goin e31446b6c8
[Perf] Tune `scaled_fp8_quant` by increasing vectorization (#18844)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-03 13:48:25 -07:00
Yong Hoon Shin bdf13965ab
[V1] Support cross-layer KV sharing (#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-06-03 20:33:07 +00:00
Varun Sundar Rabindranath fa98d77773
[Kernel] DeepEP dispatch-combine kernel integration (#18434)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-06-03 12:30:02 -07:00
Reid 01eee40536
[doc] update docker version (#19074)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-03 19:08:21 +00:00
SorenDreano 19bdaf32b1
[Doc] Readme standardization (#18695)
Co-authored-by: Soren Dreano <soren@numind.ai>
2025-06-03 11:50:55 -07:00
Simon Mo 02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
CYJiang d054da1992
[Misc] fix: add miss best_of param validation (#18555)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-06-03 11:02:07 -07:00
Nicolò Lucchesi 4b7817c119
[Misc] Add missing `_Backend` enums (#19081)
Signed-off-by: nicklucche <nlucches@redhat.com>
2025-06-03 16:15:16 +00:00
Lu Fang d00dd65cd4
[Doc] Improve the Pull Request template with key components (#19086)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-03 23:44:34 +08:00
Raushan Turganbay d81edded69
[Bugfix] disable processor cache (#19068)
Signed-off-by: raushan <raushan@huggingface.co>
2025-06-03 15:06:04 +00:00
Harry Mellor 476844d44c
Fix underscores in dict keys passed via CLI (#19030)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-06-03 14:39:24 +00:00
Jee Jee Li 4e68ae5e59
[CI/Build] Remove V0 LoRA test (#19066)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-03 14:30:18 +00:00
youkaichao 4e88723f32
[doc] clarify windows support (#19088)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-06-03 21:42:17 +08:00
Cyrus Leung 118ff92111
[Doc] Update V1 user guide for embedding and enc-dec models (#19060)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-03 02:29:41 -07:00
Isotr0py ec2dcd80bc
[Misc] Update `WeightsMapper` for qwen2-vl/qwen2.5-vl (#19054)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-03 09:08:20 +00:00
Jee Jee Li 42243fbda0
[Doc] Add InternVL LoRA support (#19055)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-03 09:08:03 +00:00
Michael Goin 6d18ed2a2e
Update docker docs with ARM CUDA cross-compile (#19037)
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-06-03 08:21:53 +00:00
Chen Zhang f32fcd9444
[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-03 08:01:48 +00:00
Lu Fang d32aa2e670
[Bugfix] Use cmake 3.26.1 instead of 3.26 to avoid build failure (#19019)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-03 00:16:17 -07:00
Michael Goin cc977286e7
Reduce logs in CLI scripts and plugin loader (#18970)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-03 06:00:45 +00:00
Reid 17430e3653
[bugfix] small fix logic issue (#18999)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-03 05:35:12 +00:00
汪志鹏 1282bd812e
Add tarsier model support (#18985)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-06-03 13:13:13 +08:00
Rui Qiao bdce64f236
[V1] Support DP with Ray (#18779) 2025-06-02 21:15:13 -07:00
Gregory Shtrasberg 9e6f61e8c3
[ROCm][Build] Clean up the ROCm build (#19040)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-06-02 20:47:47 -07:00
Li, Jiang 8655f47f37
[CPU][CI] Re-enable the CPU CI tests (#19046)
Signed-off-by: jiang.li <jiang1.li@intel.com>
2025-06-02 20:46:47 -07:00
Concurrensee 4ce42f9204
Adding "LoRA Test %N" to AMD production tests (#18929)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
2025-06-02 20:46:44 -07:00