Isotr0py | aed8468642 | 2025-06-17 07:05:21 +00:00
[Doc] Add missing llava family multi-image examples (#19698)
Signed-off-by: Isotr0py <2037008807@qq.com>

Ekagra Ranjan | 017ef648e9 | 2025-06-12 10:30:56 -07:00
[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (#18847)

niu_he | dff680001d | 2025-06-12 09:24:45 +00:00
Fix typo (#19525)
Signed-off-by: 2niuhe <carlton2tang@gmail.com>

wang.yuqi | 3952731e8f | 2025-06-10 20:07:30 -07:00
[New Model]: Support Qwen3 Embedding & Reranker (#19260)

Reid | 6b1391ca7e | 2025-06-10 06:12:42 +00:00
[Misc] refactor neuron_multimodal and profiling (#19397)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

Reid | 122cdca5f6 | 2025-06-07 05:13:21 +00:00
[Misc] refactor context extension (#19246)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

汪志鹏 | 3336c8cfbe | 2025-06-04 01:42:06 -07:00
Fix #19130 (#19132)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

Simon Mo | 02f0c7b220 | 2025-06-03 11:20:17 -07:00
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>

汪志鹏 | 1282bd812e | 2025-06-03 13:13:13 +08:00
Add tarsier model support (#18985)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

Siyuan Liu | 9112b443a0 | 2025-06-03 00:06:20 +00:00
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>

Calvin Chen | c57d577e8d | 2025-06-02 19:38:23 +00:00
add an absolute path for run.sh (#18258)
Signed-off-by: calvin chen <120380290@qq.com>

Nick Hill | 9a1b9b99d7 | 2025-05-31 08:34:52 -07:00
[BugFix] Fix multi-node offline data-parallel (#18981)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>

Satyajith Chilappagari | 2a50ef5760 | 2025-05-31 10:39:11 +00:00
[Neuron] Add Multi-Modal model support for Neuron (#18921)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com>
Co-authored-by: FeliciaLuo <luof@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>

Reid | 435fa95444 | 2025-05-28 07:08:57 -07:00
[Frontend] add run batch to CLI (#18804)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

wang.yuqi | 3e9ce609bd | 2025-05-27 20:29:53 -07:00
[Bugfix] Fix nomic max_model_len (#18755)

Mark McLoughlin | 06a0338015 | 2025-05-27 09:37:06 +00:00
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Reid | fc6d0c290f | 2025-05-27 07:07:01 +00:00
[Misc] improve docs (#18734)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

Cyrus Leung | 753944fa9b | 2025-05-27 07:03:13 +00:00
[Doc] Update reproducibility doc and example (#18741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Harry Mellor | 27bebcd897 | 2025-05-26 16:57:54 +00:00
Convert `examples` to `ruff-format` (#18400)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Isotr0py | 75f81750f3 | 2025-05-25 04:51:25 +00:00
[VLM] Initialize video input support for InternVL models (#18499)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Feng XiaoLong | 4fc1bf813a | 2025-05-23 16:16:26 -07:00
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>

Chenheli Hua | 04eb88dc80 | 2025-05-23 01:59:18 +00:00
Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>

Reid | cb506ecb5a | 2025-05-22 14:50:46 +00:00
[Misc] improve Automatic Prefix Caching example (#18554)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

Reid | 107f5fc4cb | 2025-05-21 11:10:14 +00:00
[Misc] refactor disaggregated-prefill-v1 example (#18474)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

Reid | 8f55962a7f | 2025-05-20 15:26:12 +00:00
[Misc] refactor prompt embedding examples (#18405)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

Reid | 27d0952600 | 2025-05-19 04:06:26 +00:00
[Misc] extract parser.parse_args() (#18323)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

Lucia Fang | 3d2779c29a | 2025-05-15 22:28:27 -07:00
[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827)
Signed-off-by: Lucia Fang <fanglu@fb.com>

Harry Mellor | 51ff154639 | 2025-05-15 15:57:49 +00:00
Improve examples rendering in docs and GitHub (#18203)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

omahs | a9944aabfa | 2025-05-15 02:16:15 -07:00
fix: typos (#18151)
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>

bnellnm | f9c069c85e | 2025-05-14 13:11:54 -07:00
Modularize fused experts and integrate PPLX kernels (#15956)

Ekagra Ranjan | 418d2f8bfb | 2025-05-14 12:31:46 -07:00
[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326)
Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Ecthlion_zyy | 33011318c2 | 2025-05-13 23:19:14 -07:00
Fix broken example: examples/offline_inference/profiling at scheduler_config (#18117)

Tao He | 60f7624334 | 2025-05-12 19:52:47 -07:00
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)

Harry Mellor | 72a3f6b898 | 2025-05-12 11:25:33 -07:00
Construct `KVTransferConfig` properly from Python instead of using JSON blobs without CLI (#17994)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Isotr0py | 021c16c7ca | 2025-05-11 17:56:30 -07:00
[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861)
Signed-off-by: Isotr0py <2037008807@qq.com>

Mark McLoughlin | 7e3571134f | 2025-05-09 13:32:36 -07:00
[V1][Spec Decoding] Include bonus tokens in mean acceptance length (#17908)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Cyrus Leung | a1e19b635d | 2025-05-08 18:04:18 +08:00
[Doc] Fix a typo in the file name (#17836)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Harry Mellor | 646a31e51e | 2025-05-07 16:51:06 +01:00
Fix and simplify `deprecated=True` CLI `kwarg` (#17781)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Satyajith Chilappagari | 043e4c4955 | 2025-05-07 00:07:30 -07:00
Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Aaron Dou <yzdou@amazon.com>
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com>
Co-authored-by: Chongming Ni <chongmni@amazon.com>
Co-authored-by: Amulya Ballakur <amulyaab@amazon.com>
Co-authored-by: Patrick Lange <patlange@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: Lin Lin Pan <tailinpa@amazon.com>
Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com>
Co-authored-by: Yishan McNabb <yishanm@amazon.com>
Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>

Jee Jee Li | ba7703e659 | 2025-05-06 23:10:37 -07:00
[Misc] Remove qlora_adapter_name_or_path (#17699)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Jevin Jiang | 621ca2c0ab | 2025-05-06 13:55:04 -04:00
[TPU] Increase block size and reset block shapes (#16458)

Cyrus Leung | 5b8c390747 | 2025-05-06 16:12:28 +00:00
[Bugfix] Fix modality limits in vision language example (#17721)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Harry Mellor | d6484ef3c3 | 2025-05-03 19:42:43 -07:00
Add full API docs and improve the UX of navigating them (#17485)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Cyrus Leung | d7543862bd | 2025-05-02 03:29:25 -07:00
[Misc] Rename assets for testing (#17575)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | f89d0e11bf | 2025-05-01 22:06:08 -07:00
[Misc] Continue refactoring model tests (#17573)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Isotr0py | 88c8304104 | 2025-05-01 11:00:53 -07:00
[Model] Refactor Ovis2 to support original tokenizer (#17537)
Signed-off-by: Isotr0py <2037008807@qq.com>

Marco | 54072f315f | 2025-04-30 07:33:29 +00:00
[MODEL ADDITION] Ovis2 Model Addition (#15826)
Signed-off-by: Marco <121761685+mlinmg@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Bryan Lu | 70788bdbdc | 2025-04-29 21:10:00 +00:00
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE (#17211)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>

Alex Brooks | fa93cd9f60 | 2025-04-28 10:05:00 +00:00
[Model] Add Granite Speech Support (#16246)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Isotr0py | 8c1c926d00 | 2025-04-26 08:49:52 +00:00
[Bugfix] Fix missing int type for `-n` in multi-image example (#17223)