Commit Graph

151 Commits

Author SHA1 Message Date
Simon Mo 02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Jee Jee Li 4e68ae5e59
[CI/Build] Remove V0 LoRA test (#19066)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-03 14:30:18 +00:00
Alex Brooks 321331b8ae
[Core] Add Lora Support to Beam Search (#18346)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-05-28 08:58:24 -07:00
Harry Mellor 4c2b38ce9e
Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-28 12:46:04 +00:00
Cyrus Leung 82e2339b06
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:38:04 -07:00
Sanger Steel c32e249a23
[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
2025-05-22 18:44:18 -07:00
Jee Jee Li db5a29ba19
[Bugfix] Fix LoRA test (#18518)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-21 21:48:53 -07:00
omahs a9944aabfa
fix: typos (#18151)
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>
2025-05-15 02:16:15 -07:00
Jee Jee Li 259127f8b8
[Bugfix] Fix LoRA test (#18123)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-14 10:25:47 +00:00
Ben Browning 8132365b74
[Bugfix]: v1 engine - consider lora adapters in allowed_token_ids (#17855)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-05-11 00:53:58 -07:00
Akshat Tripathi c20ef40fd0
[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-05-07 16:28:47 -04:00
Isotr0py f98e307588
[Bugfix] Fix missing lora name mapping for lora without prefix (#17793)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-07 16:17:12 +00:00
Alex Brooks 756848e79e
[Bugfix] Fix Lora Name Parsing (#17196)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-27 20:33:09 +08:00
Harry Mellor 0fa939e2d1
Improve configs - `LoRAConfig` + `PromptAdapterConfig` (#16980)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 10:29:34 -07:00
Harry Mellor 0a05ed57e6
Simplify `TokenizerGroup` (#16790)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 04:43:56 -07:00
Nick Hill 05fcd1b430
[V1][Perf] Faster incremental detokenization (#15137)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-17 07:45:24 -07:00
Angky William fdcb850f14
[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546)
Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
2025-04-15 22:31:38 +00:00
Jee Jee Li 1575c1701a
[CI/Build] Fix LoRA OOM (#16624)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-15 16:38:19 +08:00
Cyrus Leung aa29841ede
[Bugfix] Multi-modal caches not acting like LRU caches (#16593)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-14 09:24:16 -07:00
Jee Jee Li 86c3369eb8
[CI/Build] Fix CI LoRA failure (#16270)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-09 09:13:56 +08:00
Jee Jee Li 4203926f10
[CI/Build] Further clean up LoRA tests (#15920)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-02 01:39:09 -07:00
Jee Jee Li dfa82e2a3d
[CI/Build] Clean up LoRA tests (#15867)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-01 16:28:50 +00:00
Varun Sundar Rabindranath 79455cf421
[Misc] Enable V1 LoRA by default (#15320)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-04-01 16:53:56 +08:00
Varun Sundar Rabindranath 8095341a01
[misc] LoRA: Remove unused long context test data (#15558)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-27 10:04:51 +08:00
Varun Sundar Rabindranath ff38f0a32c
[CI/Build] LoRA: Delete long context tests (#15503)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-25 17:18:34 -07:00
Varun Sundar Rabindranath 8a8b30eac1
[Bugfix] LoRA V0 - Fix case where `max_num_seqs` is between cudagraph capture sizes (#15308)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-22 02:03:32 -07:00
Nick Hill da6ea29f7a
[V1] Avoid redundant input processing in n>1 case (#14985)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-20 22:24:10 -07:00
Varun Sundar Rabindranath 0cfe7d386d
[CI/Build] LoRA : make add_lora_test safer (#15181)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-21 09:28:53 +08:00
Varun Sundar Rabindranath 400d483e87
[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-18 09:47:53 +00:00
vllmellm 2bb0e1a799
[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-17 11:33:35 +00:00
Jee Jee Li e0fdfa1608
[CI/Build] Delete LoRA bias test (#14849)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-14 22:09:25 -07:00
Robert Shaw d4d93db2c5
[V1] V1 Enablement Oracle (#13726)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-03-14 22:02:20 -07:00
Varun Sundar Rabindranath 0b1cfa6180
[Kernel] LoRA - Enable CUDAGraphs for V1 (#14626)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-13 20:42:04 -07:00
Jee Jee Li bd44b812cb
[CI/Build] Delete ultravox LoRA test (#14730)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-13 07:57:39 +00:00
Li, Jiang ff47aab056
[CPU] Upgrade CPU backend to torch-2.6 (#13381)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-12 10:41:13 +00:00
Varun Sundar Rabindranath 5ff0d32580
[V1] LoRA - Add triton kernels for V1 (#13096)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-10 17:27:53 -04:00
Jee Jee Li 12c29a881f
[Bugfix] Further clean up LoRA test (#14422)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-07 10:30:55 +00:00
Jee Jee Li ddd1ef66ec
[Bugfix] Fix JambaForCausalLM LoRA (#14370)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-06 22:05:47 -08:00
Varun Sundar Rabindranath 3dbd2d813a
[V1] LoRA - Enable more V1 tests (#14315)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-06 11:55:42 +08:00
Harry Mellor cf069aa8aa
Update deprecated Python 3.8 typing (#13971) 2025-03-02 17:34:51 -08:00
Jee Jee Li cc5e8f6db8
[Model] Add LoRA support for TransformersModel (#13770)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-02 09:17:34 +08:00
Jee Jee Li 5157338ed9
[Misc] Improve LoRA spelling (#13831) 2025-02-25 23:43:01 -08:00
Jee Jee Li 37b6cb4985
[CI/Build] Fix V1 LoRA failure (#13767) 2025-02-25 02:01:15 -08:00
Varun Sundar Rabindranath 03f48b3db6
[Core] LoRA V1 - Add add/pin/list/remove_lora functions (#13705) 2025-02-25 00:18:02 -08:00
Jee Jee Li 105b8ce4c0
[Misc] Reduce LoRA-related static variable (#13166) 2025-02-22 00:21:30 -08:00
Jee Jee Li 512368e34a
[Misc] Qwen2.5 VL support LoRA (#13261) 2025-02-19 18:37:55 -08:00
Nick Hill caf7ff4456
[V1][Core] Generic mechanism for handling engine utility (#13060)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-19 17:09:22 +08:00
Varun Sundar Rabindranath cbc40128eb
[V1] LoRA - Enable Serving Usecase (#12883)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-14 14:21:12 +08:00
Jee Jee Li 82cabf53a3
[Misc] Delete unused LoRA modules (#13151) 2025-02-12 08:58:24 -08:00
Varun Sundar Rabindranath 78a141d768
[Misc] LoRA - Refactor Punica ops tests (#12970)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-11 07:26:03 +00:00