Commit Graph

26 Commits

Author SHA1 Message Date
Liangliang Ma a0389e0554
[UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
2025-07-02 09:06:04 +08:00
Woosuk Kwon 7f280d69c9
[Optimization] Cache sampled token ids in model runner (#20291)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-01 11:01:31 -07:00
Woosuk Kwon 2863befce3
[Optimization] Use Shared `CachedRequestData` Instance Across All Requests (#20232)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 09:07:50 -07:00
Vlad Tiberiu Mihailescu 2e3e3c86dc
Export NaNs in logits to scheduler_stats if output is corrupted (#18777)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
2025-06-20 22:47:16 +08:00
Maximilien de Bayser 799397ee4f
Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-18 21:36:33 -07:00
Isotr0py 5f1ac1e1d1
Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404) 2025-06-10 01:30:20 -07:00
Nick Hill 646d62f636
[Core] Use tuple for kv cache group block ids (#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-10 07:01:17 +02:00
Isotr0py b8089195b4
[v1] Add fp32 support to v1 engine through flex attn (#19319)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-06-09 22:10:44 +08:00
Chen Zhang f8a1a2d108
[v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-05 20:47:09 -07:00
Chen Zhang 6cac54f4d1
[v1] Re-init input batch for multiple kv cache groups (#18654)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-03 21:41:36 +00:00
Yong Hoon Shin bdf13965ab
[V1] Support cross-layer KV sharing (#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-06-03 20:33:07 +00:00
Simon Mo 02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
22quinn 9760fd8f6a
[Core] Support inplace model weights loading (#18745)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-02 17:38:50 +08:00
Nicolò Lucchesi 32ce3cf7c9
[V1] Allocate kv_cache with stride order for V1 (#18775)
Signed-off-by: nicklucche <nlucches@redhat.com>
2025-05-29 17:54:16 +00:00
Chen Zhang 6550114c9c
[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-23 09:39:47 -07:00
Mark McLoughlin bb0a311213
Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945) (#18459)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-21 10:25:23 -07:00
Chen Zhang e60f550b38
[v1] Support multiple KV cache groups in GPU model runner (#17945)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-14 18:54:54 -07:00
Chen Zhang 950751a987
[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders (#17483)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-10 16:12:04 -07:00
Nick Hill df6f3ce883
[Core] Remove prompt string from engine core data structures (#17214)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-25 23:41:05 -07:00
Woosuk Kwon 0c6f5023c3
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-20 17:50:43 -07:00
Aaron Pham 80e9afb5bc
[V1][Core] Support for Structured Outputs (#12388)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-07 07:19:11 -08:00
Chen Zhang e7bd944e08
[v1] Cleanup the BlockTable in InputBatch (#13977)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-02-28 19:03:16 +00:00
Nick Hill 30172b4947
[V1] Optimize handling of sampling metadata and req_ids list (#13244)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-18 12:15:33 -08:00
Woosuk Kwon 4c21ce9eba
[V1] Get input tokens from scheduler (#13339)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-17 11:01:07 -08:00
Lily Liu 80f63a3966
[V1][Spec Decode] Ngram Spec Decode (#12193)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2025-02-15 18:05:11 -08:00
Kero Liang b0ccfc565a
[Bugfix][V1] GPUModelRunner._update_states should return True when there is a finished request in batch (#13126) 2025-02-13 22:39:20 -08:00