Xu Wenqing
|
02658c2dfe
|
Add DeepSeek-R1-0528 function call chat template (#18874)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
|
2025-06-04 13:24:18 +00:00 |
汪志鹏
|
3336c8cfbe
|
Fix #19130 (#19132)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-04 01:42:06 -07:00 |
Calvin Chen
|
8d646c2e53
|
[Cleanup][v1]: remove guided-decoding-backend for example (#19059)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-06-04 04:23:26 +00:00 |
Jiaxin Shan
|
abd7df2fca
|
[Misc] Fix path and python alias errors in disagg_prefill examples (#18919)
|
2025-06-03 17:15:18 -07:00 |
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
汪志鹏
|
1282bd812e
|
Add tarsier model support (#18985)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-03 13:13:13 +08:00 |
Siyuan Liu
|
9112b443a0
|
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-03 00:06:20 +00:00 |
Calvin Chen
|
c57d577e8d
|
add an absolute path for run.sh (#18258)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-06-02 19:38:23 +00:00 |
Nick Hill
|
9a1b9b99d7
|
[BugFix] Fix multi-node offline data-parallel (#18981)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
|
2025-05-31 08:34:52 -07:00 |
Satyajith Chilappagari
|
2a50ef5760
|
[Neuron] Add Multi-Modal model support for Neuron (#18921)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com>
Co-authored-by: FeliciaLuo <luof@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
|
2025-05-31 10:39:11 +00:00 |
Mark McLoughlin
|
0e98964e94
|
[V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-28 18:54:12 +00:00 |
Reid
|
435fa95444
|
[Frontend] add run batch to CLI (#18804)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-28 07:08:57 -07:00 |
wang.yuqi
|
3e9ce609bd
|
[Bugfix] Fix nomic max_model_len (#18755)
|
2025-05-27 20:29:53 -07:00 |
Mark McLoughlin
|
06a0338015
|
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-27 09:37:06 +00:00 |
Reid
|
fc6d0c290f
|
[Misc] improve docs (#18734)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-27 07:07:01 +00:00 |
Cyrus Leung
|
753944fa9b
|
[Doc] Update reproducibility doc and example (#18741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 07:03:13 +00:00 |
Harry Mellor
|
27bebcd897
|
Convert `examples` to `ruff-format` (#18400)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-26 16:57:54 +00:00 |
Cyrus Leung
|
82e2339b06
|
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 07:38:04 -07:00 |
AlexZhao
|
8820821b59
|
[Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example (#18644)
Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com>
Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
|
2025-05-26 13:51:27 +08:00 |
Isotr0py
|
75f81750f3
|
[VLM] Initialize video input support for InternVL models (#18499)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-05-25 04:51:25 +00:00 |
Feng XiaoLong
|
4fc1bf813a
|
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
|
2025-05-23 16:16:26 -07:00 |
Chenheli Hua
|
04eb88dc80
|
Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-05-23 01:59:18 +00:00 |
Sanger Steel
|
c32e249a23
|
[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
|
2025-05-22 18:44:18 -07:00 |
Kai Wu
|
c91fe7b1b9
|
[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917)
Signed-off-by: Kai Wu <kaiwu@meta.com>
|
2025-05-22 16:44:08 -07:00 |
Reid
|
cb506ecb5a
|
[Misc] improve Automatic Prefix Caching example (#18554)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-22 14:50:46 +00:00 |
Calvin Chen
|
3f505233fd
|
[Doc] Add stream flag for chat completion example (#18524)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-05-22 14:07:10 +00:00 |
CYJiang
|
71075029f2
|
[Doc] Support --stream arg in openai_completion_client.py script (#18388)
Signed-off-by: googs1025 <googs1025@gmail.com>
|
2025-05-22 13:20:17 +00:00 |
Reid
|
107f5fc4cb
|
[Misc] refactor disaggregated-prefill-v1 example (#18474)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-21 11:10:14 +00:00 |
Reid
|
8f55962a7f
|
[Misc] refactor prompt embedding examples (#18405)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-20 15:26:12 +00:00 |
Gong Shufan
|
8171221834
|
[Misc] Fix typo (#18330)
|
2025-05-19 09:51:01 -07:00 |
Reid
|
27d0952600
|
[Misc] extract parser.parse_args() (#18323)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-19 04:06:26 +00:00 |
David Xia
|
5c04bb8b86
|
[doc] fix multimodal example script (#18089)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-05-16 06:05:34 +00:00 |
Lucia Fang
|
3d2779c29a
|
[Feature] Support Pipeline Parallelism in torchrun SPMD offline inference for V1 (#17827)
Signed-off-by: Lucia Fang <fanglu@fb.com>
|
2025-05-15 22:28:27 -07:00 |
Harry Mellor
|
51ff154639
|
Improve examples rendering in docs and GitHub (#18203)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-15 15:57:49 +00:00 |
omahs
|
a9944aabfa
|
fix: typos (#18151)
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>
|
2025-05-15 02:16:15 -07:00 |
bnellnm
|
f9c069c85e
|
Modularize fused experts and integrate PPLX kernels (#15956)
|
2025-05-14 13:11:54 -07:00 |
Ekagra Ranjan
|
418d2f8bfb
|
[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326)
Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-05-14 12:31:46 -07:00 |
majianpeng
|
e7ef61c1f0
|
[Bugfix][Example] make lmcache v0 work. (#18051)
Signed-off-by: Ma, Jianpeng <jianpeng.ma@intel.com>
|
2025-05-13 23:43:44 -07:00 |
Ecthlion_zyy
|
33011318c2
|
Fix broken example: examples/offline_inference/profiling at scheduler_config (#18117)
|
2025-05-13 23:19:14 -07:00 |
Tao He
|
60f7624334
|
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)
|
2025-05-12 19:52:47 -07:00 |
Harry Mellor
|
72a3f6b898
|
Construct `KVTransferConfig` properly from Python instead of using JSON blobs without CLI (#17994)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-12 11:25:33 -07:00 |
Xu Wenqing
|
3a5ea75129
|
[Feature] Support DeepSeekV3 Function Call (#17784)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
Signed-off-by: Xu Wenqing <xuwq1993@qq.com>
|
2025-05-12 00:45:21 -07:00 |
Isotr0py
|
021c16c7ca
|
[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-11 17:56:30 -07:00 |
Frieda Huang
|
9cea90eab4
|
[Frontend] Add /classify endpoint (#17032)
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
|
2025-05-11 07:57:07 +00:00 |
Reid
|
4c31218f80
|
[Misc] remove --model from vllm serve usage (#17944)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-10 13:23:31 +00:00 |
Mark McLoughlin
|
7e3571134f
|
[V1][Spec Decoding] Include bonus tokens in mean acceptance length (#17908)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-09 13:32:36 -07:00 |
Rui Qiao
|
c44c384b1c
|
[Misc] Add references in ray_serve_deepseek example (#17907)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-05-09 16:59:36 +00:00 |
Cyrus Leung
|
a1e19b635d
|
[Doc] Fix a typo in the file name (#17836)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-08 18:04:18 +08:00 |
Rick Yuan
|
ca04b97c93
|
[Bugfix] Fix tool call template validation for Mistral models (#17644)
Signed-off-by: Rick Yuan <yuan821120@gmail.com>
Signed-off-by: RIck Yuan <yuan821120@gmail.com>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
|
2025-05-08 09:47:19 +00:00 |
Cyrus Leung
|
96722aa81d
|
[Frontend] Chat template fallbacks for multimodal models (#17805)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-07 23:05:54 -07:00 |