Lucas Wilkinson
|
07334959d8
|
[Wheel Size] Only build FA2 8.0+PTX (#19336)
|
2025-06-17 12:32:49 +09:00 |
Luka Govedič
|
a3896c7f02
|
[Build] Fixes for CMake install (#18570)
|
2025-05-27 20:49:24 -04:00 |
yexin(叶鑫)
|
b22980a1dc
|
[Perf] Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457)
Signed-off-by: cynthieye <yexin93@qq.com>
Co-authored-by: MagnetoWang <magnetowang@outlook.com>
|
2025-04-25 14:52:28 +08:00 |
Lucas Wilkinson
|
41ca7eb491
|
[Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 (#16864)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-24 20:12:21 -07:00 |
Lucas Wilkinson
|
183dad7a85
|
[Attention] Update to latest FA3 code (#13111)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-17 15:14:07 -07:00 |
Mickaël Seznec
|
a597a57595
|
[Attention] Flash Attention 3 - fp8 (#14570)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
|
2025-03-20 01:14:20 -04:00 |
Pavani Majety
|
ed6ea06577
|
[Hardware] Update the flash attn tag to support Blackwell (#14244)
|
2025-03-05 22:01:37 -08:00 |
Lucas Wilkinson
|
f95903909f
|
[Kernel] FlashMLA integration (#13747)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-02-27 10:35:08 +08:00 |