vLLM/vllm - vllm

Commit Graph

Author	SHA1	Message	Date
Cyrus Leung	01dc9a76db	[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-04 04:49:20 -07:00
wang.yuqi	35cf32df30	Improve the output precision of embedding models (#19092 )	2025-06-04 11:48:57 +00:00
Seiji Eicher	2669a0d7b5	Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-06-04 17:10:45 +08:00
Siyuan Liu	8e972d9c44	[TPU] Skip hanging tests (#19115 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-04 01:43:00 -07:00
Woosuk Kwon	b124e1085b	[Bugfix] Fix FA3 full cuda graph correctness (#19106 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-03 23:10:15 -07:00
Kaixi Hou	41aa578428	[NVIDIA] Add Cutlass MLA backend (#17625 )	2025-06-03 21:40:26 -07:00
Vadim Gimpelson	5d6d1adf15	[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437 )	2025-06-03 21:13:01 -07:00
Li, Jiang	4555143ea7	[CPU] V1 support for the CPU backend (#16441 )	2025-06-03 18:43:01 -07:00
Yan Ru Pei	b712be98c7	feat: add data parallel rank to KVEventBatch (#18925 )	2025-06-03 17:14:20 -07:00
Chen Zhang	a8da78eac9	[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-04 00:14:06 +00:00
Chauncey	4de790fcad	[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (#19075 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-06-03 23:27:24 +00:00
Chen Zhang	b5fd9506c1	[Bugfix] get_num_blocks_to_allocate with null_block (#19031 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 15:30:55 -07:00
Chen Zhang	6cac54f4d1	[v1] Re-init input batch for multiple kv cache groups (#18654 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 21:41:36 +00:00
Harry Mellor	6865fe0074	Fix interaction between `Optional` and `Annotated` in CLI typing (#19093 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yikun Jiang <yikun@apache.org>	2025-06-03 21:07:19 +00:00
Yong Hoon Shin	bdf13965ab	[V1] Support cross-layer KV sharing (#18212 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-06-03 20:33:07 +00:00
Varun Sundar Rabindranath	fa98d77773	[Kernel] DeepEP dispatch-combine kernel integration (#18434 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-03 12:30:02 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Harry Mellor	476844d44c	Fix underscores in dict keys passed via CLI (#19030 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-06-03 14:39:24 +00:00
Jee Jee Li	4e68ae5e59	[CI/Build] Remove V0 LoRA test (#19066 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-03 14:30:18 +00:00
Chen Zhang	f32fcd9444	[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 08:01:48 +00:00
汪志鹏	1282bd812e	Add tarsier model support (#18985 ) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-06-03 13:13:13 +08:00
Rui Qiao	bdce64f236	[V1] Support DP with Ray (#18779 )	2025-06-02 21:15:13 -07:00
Siyuan Liu	9112b443a0	[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-06-03 00:06:20 +00:00
22quinn	9760fd8f6a	[Core] Support inplace model weights loading (#18745 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-02 17:38:50 +08:00
Cyrus Leung	6aa8f9a4e7	[Core] Rework dtype resolution (#18751 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-01 11:04:23 +08:00
Reid	20079c6e36	[Misc] add return token strs for tokenize (#18941 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-31 18:00:11 +00:00
Charlie Fu	306d60401d	[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-31 07:40:05 -07:00
vllmellm	0f5e0d567e	[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-05-31 03:39:31 -07:00
Pooya Davoodi	dff80b0e42	[Frontend] Add rerank support to run_batch endpoint (#16278 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2025-05-31 07:40:01 +00:00
Will Eaton	1dab4d5718	Tool parser regex timeout handling (#18960 ) Signed-off-by: Will Eaton <weaton@redhat.com>	2025-05-30 21:02:54 +00:00
Isotr0py	5a8641638a	[VLM] Add PP support and fix GPTQ inference for Ovis models (#18958 ) Signed-off-by: isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-30 17:11:44 +00:00
Nick Hill	2dbe8c0774	[Perf] API-server scaleout with many-to-many server-engine comms (#17546 )	2025-05-30 08:17:00 -07:00
Shawn Huang	e1fadf1197	[Feature] minicpm eagle support (#18943 ) Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com> Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>	2025-05-30 06:45:56 -07:00
Carol Zheng	fba02e3bd1	[Bugfix][TPU] Fix tpu model runner testcase failure (#18810 ) Signed-off-by: Carol Zheng <cazheng@google.com>	2025-05-30 18:04:03 +08:00
Cyrus Leung	1aa2f81b43	[Misc] Update type annotation for rotary embedding `base` (#18914 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-30 10:17:01 +08:00
Chengji Yao	a1cc9f33a3	[TPU] remove transpose ops in moe kernel (#18923 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-05-29 23:00:11 +00:00
Nick Hill	d1d61f3351	[BugFix] Make DP work with connector-delayed new requests (#18559 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Will Eaton <weaton@redhat.com>	2025-05-29 18:04:18 +00:00
Nicolò Lucchesi	32ce3cf7c9	[V1] Allocate kv_cache with stride order for V1 (#18775 ) Signed-off-by: nicklucche <nlucches@redhat.com>	2025-05-29 17:54:16 +00:00
Cyrus Leung	c29034037d	[Deprecation] Disallow pos-args other than `model` when initializing `LLM` (#18802 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-29 09:36:58 -07:00
Isotr0py	c9479b2920	[Bugfix] Fix the failing gte embedding test (#18720 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-29 07:39:25 -07:00
Satyajith Chilappagari	972eddf7c9	[Neuron] Add multi-LoRA support for Neuron. (#18284 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-29 16:41:22 +08:00
Richard Zou	26b4fa45be	Add ability to use CUDAGraphs with use_inductor=False (#17345 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-29 10:16:52 +08:00
Hongxia Yang	269d901734	[Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-29 07:21:46 +08:00
Akshat Tripathi	643622ba46	[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend (#15655 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Chengji Yao <chengjiyao@google.com> Signed-off-by: xihajun <junfan@krai.ai> Signed-off-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Signed-off-by: Jorge de Freitas <jorge@krai.ai> Co-authored-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: xihajun <junfan@krai.ai> Co-authored-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Co-authored-by: Jorge de Freitas <jorge@krai.ai>	2025-05-28 19:59:09 +00:00
Mark McLoughlin	0e98964e94	[V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-28 18:54:12 +00:00
Alex Brooks	321331b8ae	[Core] Add Lora Support to Beam Search (#18346 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-05-28 08:58:24 -07:00
Reid	435fa95444	[Frontend] add run batch to CLI (#18804 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-28 07:08:57 -07:00
Harry Mellor	4c2b38ce9e	Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-28 12:46:04 +00:00
wang.yuqi	de65fc8e1e	[CI] improve embed testing (#18747 )	2025-05-28 00:16:35 -07:00
Rabi Mishra	b78f844a67	[Bugfix][FailingTest]Fix test_model_load_with_params.py (#18758 ) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-28 05:42:54 +00:00

2068 Commits