Michael Goin
|
e31446b6c8
|
[Perf] Tune `scaled_fp8_quant` by increasing vectorization (#18844)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-03 13:48:25 -07:00 |
Arjun Kathuria
|
d8487ef557
|
[ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 (#13779)
Signed-off-by: Arjun Kathuria <arjun.kathuria8@gmail.com>
|
2025-05-12 20:36:33 -07:00 |
Richard Barnes
|
d6da8a8ff2
|
[Bugfix] Fix `numel()` downcast in fused_layernorm_dynamic_per_token_quant.cu (#17316)
|
2025-04-28 19:23:18 -07:00 |
Charlie Fu
|
e85829450d
|
[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-03-31 04:42:18 -07:00 |
Lu Fang
|
d3ccbd6350
|
Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
|
2025-03-21 10:01:11 +08:00 |
Jeff Daily
|
a1c8f3796c
|
dynamic distpatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
|
2025-03-11 10:54:56 -04:00 |
Luka Govedič
|
30870b4f66
|
[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-12-13 03:19:23 +00:00 |