Commit Graph

7 Commits

Author SHA1 Message Date
Luka Govedič a3896c7f02
[Build] Fixes for CMake install (#18570) 2025-05-27 20:49:24 -04:00
yexin(叶鑫) b22980a1dc
[Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457)
Signed-off-by: cynthieye <yexin93@qq.com>
Co-authored-by: MagnetoWang <magnetowang@outlook.com>
2025-04-25 14:52:28 +08:00
Lucas Wilkinson 41ca7eb491
[Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 (#16864)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-24 20:12:21 -07:00
Lucas Wilkinson 183dad7a85
[Attention] Update to lastest FA3 code (#13111)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-17 15:14:07 -07:00
Mickaël Seznec a597a57595
[Attention] Flash Attention 3 - fp8 (#14570)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
2025-03-20 01:14:20 -04:00
Pavani Majety ed6ea06577
[Hardware] Update the flash attn tag to support Blackwell (#14244) 2025-03-05 22:01:37 -08:00
Lucas Wilkinson f95903909f
[Kernel] FlashMLA integration (#13747)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-02-27 10:35:08 +08:00