vLLM/vllm - vllm - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Nick Hill	646d62f636	[Core] Use tuple for kv cache group block ids (#19175) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-10 07:01:17 +02:00
Chen Zhang	f8a1a2d108	[v1] Hybrid Memory Allocator (#17996) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-05 20:47:09 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Chen Zhang	f32fcd9444	[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 08:01:48 +00:00
Nick Hill	2dbe8c0774	[Perf] API-server scaleout with many-to-many server-engine comms (#17546)	2025-05-30 08:17:00 -07:00
Chen Zhang	6550114c9c	[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-23 09:39:47 -07:00
Mark McLoughlin	bb0a311213	Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945)" (#18459) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-21 10:25:23 -07:00
Chen Zhang	e60f550b38	[v1] Support multiple KV cache groups in GPU model runner (#17945) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-14 18:54:54 -07:00
Chen Zhang	f2ae883b67	[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-13 19:09:39 -07:00
Chen Zhang	200da9a517	[v1] Move block management logic from KVCacheManager to SpecializedManager (#17474) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-09 15:25:34 +00:00
Ning Xie	d310e6de98	[BUGFIX]: return fast when request requires prompt logprobs (#17251)	2025-05-08 21:25:41 -07:00
Chen Zhang	aabcd2cae3	[v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager (#17479) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-06 08:50:34 -07:00
Chen Zhang	81ecf425f0	[v1][Spec Decode] Make sliding window compatible with eagle prefix caching (#17398) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-30 18:25:53 +00:00
Alec	0be6d05b5e	[V1][Metrics] add support for kv event publishing (#16750) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-04-30 07:44:45 -07:00
Marko Rosenmueller	77073c77bc	[Core] Prevent side-channel attacks via cache salting (#17045) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-04-30 20:27:21 +08:00
Lily Liu	20e489eaa1	[V1][Spec Decode] Make eagle compatible with prefix caching. (#17137) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-27 09:29:43 -07:00
Nick Hill	df6f3ce883	[Core] Remove prompt string from engine core data structures (#17214) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 23:41:05 -07:00
Woosuk Kwon	c4ab9f3e71	[V1] Remove pre-allocation for KV cache (#16941) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-22 00:52:18 -07:00
vie-serendipity	d9737ca1c6	[V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460) Signed-off-by: vie-serendipity <2733147505@qq.com>	2025-04-19 02:25:19 -07:00
Chen Zhang	3a5f0afcd2	[V1] Implement sliding window attention in kv_cache_manager (#14097) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-01 00:33:17 -07:00
marko	27df5199d9	Support SHA256 as hash function in prefix caching (#15297) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-03-26 11:11:28 -07:00
afeldman-nm	ef64044079	[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949)	2025-03-08 01:48:12 +00:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971)	2025-03-02 17:34:51 -08:00
Chen Zhang	28943d36ce	[v1] Move block pool operations to a separate class (#13973) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-28 20:53:31 +00:00
Woosuk Kwon	3243158336	[V1] Move KV block hashes from Request to KVCacheManager (#12922) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-07 19:14:10 -08:00
Cody Yu	5095e96606	[V1] Revert `uncache_blocks` and support recaching full blocks (#12415) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-03 15:04:53 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Shawn Du	f8ece6e17f	[Core][v1] Unify allocating slots in prefill and decode in KV cache manager (#12608) As mentioned in RFC https://github.com/vllm-project/vllm/issues/12254, this PR achieves the task: combine allocate_slots and append_slots. There should be no functionality change, except that in decode, also raise exception when num_tokens is zero (like prefill), and change the unit test case accordingly. @comaniac @rickyyx @WoosukKwon @youkaichao @heheda12345 @simon-mo --------- Signed-off-by: Shawn Du <shawnd200@outlook.com>	2025-02-02 16:40:58 +08:00
Cody Yu	f0ef37233e	[V1] Add `uncache_blocks` (#12333)	2025-01-23 04:19:21 +00:00
Cody Yu	7206ce4ce1	[Core] Support `reset_prefix_cache` (#12284)	2025-01-22 18:52:27 +00:00
Chen Zhang	994fc655b7	[V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003)	2025-01-15 07:55:30 +00:00
Roger Wang	91b361ae89	[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 19:58:16 +00:00
Chen Zhang	8c3230d8c1	[V1] Simpify vision block hash for prefix caching by removing offset from hash (#11646)	2024-12-31 08:56:01 +00:00
sakunkun	2c5718809b	[Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (#11565)	2024-12-31 06:29:04 +00:00
Cody Yu	bf8717ebae	[V1] Prefix caching for vision language models (#11187) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2024-12-17 16:37:59 -08:00
Cody Yu	78ed8f57d8	[Misc][V1] Fix type in v1 prefix caching (#11151)	2024-12-13 00:57:40 +00:00
Woosuk Kwon	a79b122400	[V1] Do not allocate beyond the max_model_len (#10730) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-28 00:13:15 -08:00
Ricky Xu	97814fbf0f	[v1] Refactor KVCacheManager for more hash input than token ids (#10507) Signed-off-by: rickyx <rickyx@anyscale.com> Signed-off-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-11-22 23:27:25 +00:00
Cyrus Leung	0b8bb86bf1	[1/N] Initial prototype for multi-modal processor (#10044) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-13 12:39:03 +00:00
Cody Yu	201fc07730	[V1] Prefix caching (take 2) (#9972) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2024-11-07 17:34:44 -08:00

40 Commits