vLLM/vllm - vllm

Commit Graph

Author	SHA1	Message	Date
Chengji Yao	a77aea59fd	[TPU] support attention head dim smaller than 128 (#19620) Signed-off-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-06-16 06:40:53 +00:00
Nick Hill	646d62f636	[Core] Use tuple for kv cache group block ids (#19175) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-10 07:01:17 +02:00
Siyuan Liu	7d44c469fe	[TPU]Fix KV cache sharing tests (#19371)	2025-06-09 18:38:15 -04:00
Nick Hill	46ecc57973	[BugFix] Fix tpu_model_runner block_id concatenation (#19228) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-06 16:28:17 -07:00
Siyuan Liu	7ee2590478	[TPU] Update dynamo dump file name in compilation test (#19108) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-04 16:13:43 -04:00
Siyuan Liu	8e972d9c44	[TPU] Skip hanging tests (#19115) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-04 01:43:00 -07:00
Yong Hoon Shin	bdf13965ab	[V1] Support cross-layer KV sharing (#18212) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-06-03 20:33:07 +00:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Siyuan Liu	9112b443a0	[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-06-03 00:06:20 +00:00
Carol Zheng	fba02e3bd1	[Bugfix][TPU] Fix tpu model runner testcase failure (#18810) Signed-off-by: Carol Zheng <cazheng@google.com>	2025-05-30 18:04:03 +08:00
Jevin Jiang	a463555dee	[TPU] Fix the test_sampler (#17820)	2025-05-08 05:51:33 -04:00
Cyrus Leung	8a15c2603a	[Frontend] Add missing chat templates for various MLLMs (#17758) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-07 00:10:01 -07:00
Nicolò Lucchesi	5941e0b7ea	[TPU][V1] Add support for top-logprobs (#17072) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-05-05 14:20:15 -07:00
XiongfeiWei	9765940824	[TPU] Enable gemma3-27b with TP>1 on multi-chips. (#17335) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-05-05 14:19:58 -07:00
Siyuan Liu	dbc18e7816	[CI][TPU] Skip Multimodal test (#17488) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-04-30 19:51:39 -07:00
Nicolò Lucchesi	a7d5b016bd	[TPU][V1][CI] Update regression test baseline for v6 CI (#17064) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-30 04:03:22 -07:00
Nick Hill	df6f3ce883	[Core] Remove prompt string from engine core data structures (#17214) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 23:41:05 -07:00
Michael Goin	14288d1332	Disable enforce_eager for V1 TPU sampler and structured output tests (#17016) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 02:50:09 -07:00
Chenyaaang	83d933718c	[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-22 18:05:23 -06:00
Nicolò Lucchesi	fa3bba2a53	[TPU][V1] Enable Top-P (#16843) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-22 00:46:07 +00:00
Nicolò Lucchesi	210207525e	[TPU][V1] Capture multimodal encoder during model compilation (#15051) Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Siyuan Liu <lsiyuan@google.com>	2025-04-21 18:36:59 -06:00
Chengji Yao	471fe65630	[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-21 15:43:13 -06:00
Nicolò Lucchesi	eb5819b2d9	[V1][TPU] Enable Top K (#15489) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Hyesoo Yang <hyeygit@gmail.com> Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>	2025-04-17 18:18:11 +00:00
Nicolò Lucchesi	5989f4684d	[TPU][V1] Fix padding recompilation when `max-num-batched-tokens` is not even (#16726) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-17 18:09:57 +00:00
Nicolò Lucchesi	b3f2fddd17	[TPU][V1] Fix exponential padding when `max-num-batched-tokens` is not a power of 2 (#16596) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-14 17:01:05 +00:00
Nicolò Lucchesi	3cc9af88ff	[TPU][V1] Disable per-request seed/Generator (#16172) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:05:44 -04:00
Chengji Yao	a454748544	[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (#16275) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-09 18:51:51 -06:00
Chengji Yao	b1eb4ca152	[TPU] Update PyTorch/XLA (#16288) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-09 14:46:32 +08:00
iefgnoix	b6be6f8d1e	[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-04-03 14:23:28 -07:00
Hyesoo Yang	1b84eff03a	[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736) Signed-off-by: Hyesoo Yang <hyeygit@gmail.com> Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>	2025-04-02 17:18:08 -07:00
Alexander Matveev	9a2160fa55	[V1] TPU CI - Add basic perf regression test (#15414) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-31 13:25:20 -04:00
Alexander Matveev	c3f687ac22	[V1] TPU - Fix the chunked prompt bug (#15713) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-28 20:19:04 +00:00
Robert Shaw	2d9045fce8	[TPU][CI] Fix TPUModelRunner Test (#15667) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2025-03-28 00:01:26 -07:00
Robert Shaw	8a49eea74b	[CI][TPU] Temporarily Disable Quant Test on TPU (#15649) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-27 19:45:05 -07:00
Nicolò Lucchesi	4098b72210	[Bugfix][TPU][V1] Fix recompilation (#15553) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-27 19:15:06 +00:00
Chenyaaang	ac3cd6e83c	[core] add bucket padding to tpu_model_runner (#14995) Signed-off-by: Chenyaaang <llccyy1212@gmail.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-25 17:27:22 -04:00
yarongmu-google	0a049c7d86	[CI/Build] Add tests for the V1 tpu_model_runner. (#14843) Signed-off-by: Yarong Mu <ymu@google.com>	2025-03-25 12:27:16 -04:00
Nicolò Lucchesi	cfbb8c930f	[TPU][V1] MHA Pallas backend (#15288) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-21 08:50:39 -07:00
Hyesoo Yang	47195057e9	[V1][TPU] Speed up top-k on TPU by using torch.topk (#15242) Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>	2025-03-20 19:19:40 -07:00
Nicolò Lucchesi	d8c6d7d6b5	[V1][TPU] Support V1 Sampler for ragged attention (#14227) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-19 21:00:39 -07:00
Alexander Matveev	72a8639b68	[V1] TPU - CI/CD use smaller model (#15054) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-18 21:39:21 +00:00
Sibi	a73e183e36	[Misc] Replace os environ to monkeypatch in test suite (#14516) Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-16 20:35:57 -07:00
Alexander Matveev	cb8bdfade2	[V1] TPU - Add tensor parallel support via Ray (#13618) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-08 08:19:38 -05:00

43 Commits