Commit Graph

43 Commits

Author SHA1 Message Date
Chengji Yao a77aea59fd
[TPU] support attention head dim smaller than 128 (#19620)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-06-16 06:40:53 +00:00
Nick Hill 646d62f636
[Core] Use tuple for kv cache group block ids (#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-10 07:01:17 +02:00
Siyuan Liu 7d44c469fe
[TPU]Fix KV cache sharing tests (#19371) 2025-06-09 18:38:15 -04:00
Nick Hill 46ecc57973
[BugFix] Fix tpu_model_runner block_id concatenation (#19228)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-06 16:28:17 -07:00
Siyuan Liu 7ee2590478
[TPU] Update dynamo dump file name in compilation test (#19108)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-06-04 16:13:43 -04:00
Siyuan Liu 8e972d9c44
[TPU] Skip hanging tests (#19115)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-06-04 01:43:00 -07:00
Yong Hoon Shin bdf13965ab
[V1] Support cross-layer KV sharing (#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-06-03 20:33:07 +00:00
Simon Mo 02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Siyuan Liu 9112b443a0
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-06-03 00:06:20 +00:00
Carol Zheng fba02e3bd1
[Bugfix][TPU] Fix tpu model runner testcase failure (#18810)
Signed-off-by: Carol Zheng <cazheng@google.com>
2025-05-30 18:04:03 +08:00
Jevin Jiang a463555dee
[TPU] Fix the test_sampler (#17820) 2025-05-08 05:51:33 -04:00
Cyrus Leung 8a15c2603a
[Frontend] Add missing chat templates for various MLLMs (#17758)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-07 00:10:01 -07:00
Nicolò Lucchesi 5941e0b7ea
[TPU][V1] Add support for top-logprobs (#17072)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-05-05 14:20:15 -07:00
XiongfeiWei 9765940824
[TPU] Enable gemma3-27b with TP>1 on multi-chips. (#17335)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-05-05 14:19:58 -07:00
Siyuan Liu dbc18e7816
[CI][TPU] Skip Multimodal test (#17488)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-04-30 19:51:39 -07:00
Nicolò Lucchesi a7d5b016bd
[TPU][V1][CI] Update regression test baseline for v6 CI (#17064)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-30 04:03:22 -07:00
Nick Hill df6f3ce883
[Core] Remove prompt string from engine core data structures (#17214)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-25 23:41:05 -07:00
Michael Goin 14288d1332
Disable enforce_eager for V1 TPU sampler and structured output tests (#17016)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-24 02:50:09 -07:00
Chenyaaang 83d933718c
[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-22 18:05:23 -06:00
Nicolò Lucchesi fa3bba2a53
[TPU][V1] Enable Top-P (#16843)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-04-22 00:46:07 +00:00
Nicolò Lucchesi 210207525e
[TPU][V1] Capture multimodal encoder during model compilation (#15051)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Siyuan Liu <lsiyuan@google.com>
2025-04-21 18:36:59 -06:00
Chengji Yao 471fe65630
[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-21 15:43:13 -06:00
Nicolò Lucchesi eb5819b2d9
[V1][TPU] Enable Top K (#15489)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>
2025-04-17 18:18:11 +00:00
Nicolò Lucchesi 5989f4684d
[TPU][V1] Fix padding recompilation when `max-num-batched-tokens` is not even (#16726)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-17 18:09:57 +00:00
Nicolò Lucchesi b3f2fddd17
[TPU][V1] Fix exponential padding when `max-num-batched-tokens` is not a power of 2 (#16596)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-14 17:01:05 +00:00
Nicolò Lucchesi 3cc9af88ff
[TPU][V1] Disable per-request seed/Generator (#16172)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-10 17:05:44 -04:00
Chengji Yao a454748544
[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (#16275)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-09 18:51:51 -06:00
Chengji Yao b1eb4ca152
[TPU] Update PyTorch/XLA (#16288)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-09 14:46:32 +08:00
iefgnoix b6be6f8d1e
[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-04-03 14:23:28 -07:00
Hyesoo Yang 1b84eff03a
[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>
2025-04-02 17:18:08 -07:00
Alexander Matveev 9a2160fa55
[V1] TPU CI - Add basic perf regression test (#15414)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-31 13:25:20 -04:00
Alexander Matveev c3f687ac22
[V1] TPU - Fix the chunked prompt bug (#15713)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-28 20:19:04 +00:00
Robert Shaw 2d9045fce8
[TPU][CI] Fix TPUModelRunner Test (#15667)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-03-28 00:01:26 -07:00
Robert Shaw 8a49eea74b
[CI][TPU] Temporarily Disable Quant Test on TPU (#15649)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-27 19:45:05 -07:00
Nicolò Lucchesi 4098b72210
[Bugfix][TPU][V1] Fix recompilation (#15553)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-27 19:15:06 +00:00
Chenyaaang ac3cd6e83c
[core] add bucket padding to tpu_model_runner (#14995)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-25 17:27:22 -04:00
yarongmu-google 0a049c7d86
[CI/Build] Add tests for the V1 tpu_model_runner. (#14843)
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-25 12:27:16 -04:00
Nicolò Lucchesi cfbb8c930f
[TPU][V1] MHA Pallas backend (#15288)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-21 08:50:39 -07:00
Hyesoo Yang 47195057e9
[V1][TPU] Speed up top-k on TPU by using torch.topk (#15242)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
2025-03-20 19:19:40 -07:00
Nicolò Lucchesi d8c6d7d6b5
[V1][TPU] Support V1 Sampler for ragged attention (#14227)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-19 21:00:39 -07:00
Alexander Matveev 72a8639b68
[V1] TPU - CI/CD use smaller model (#15054)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-18 21:39:21 +00:00
Sibi a73e183e36
[Misc] Replace os environ to monkeypatch in test suite (#14516)
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-16 20:35:57 -07:00
Alexander Matveev cb8bdfade2
[V1] TPU - Add tensor parallel support via Ray (#13618)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-08 08:19:38 -05:00