Commit Graph

22 Commits

Author SHA1 Message Date
Lu Fang c6703d1e0d
[MISC] Remove unused variableds in C++ (#19609)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-15 20:05:28 -07:00
Ning Xie 2f1c19b245
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-11 19:57:10 -07:00
Charlie Fu 306d60401d
[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010)
Signed-off-by: charlifu <charlifu@amd.com>
2025-05-31 07:40:05 -07:00
Hosang dd5fa7e04f
[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
2025-05-21 08:35:00 -07:00
Lu Fang f50dcb7c21
[Easy] Eliminate c10::optional usage in vllm/csrc (#17819) 2025-05-08 03:05:10 -07:00
Hashem Hashemi 5a499e70d5
[Kernel][Hardware][AMD] Bf16 mfma opt for ROCm skinny GEMMs (#17071)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: charlifu <charlifu@amd.com>
2025-05-07 22:34:49 -07:00
Gregory Shtrasberg 32aa74c09c
[ROCm][FP8][Kernel] FP8 quantization fused into Custom Paged Attention (#17139)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-07 07:12:35 -07:00
Juan Villamizar 811a6c0972
[ROCM] Add gfx950 to the custom attention archs (#16034)
Signed-off-by: jpvillam <Juan.Villamizar@amd.com>
Signed-off-by: seungrokjung <seungrok.jung@amd.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: seungrokjung <seungrok.jung@amd.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-01 11:18:28 -07:00
Lu Fang c8e5be35f7
[MISC][AMD] Add unused annotation to rocm kernel file (#17097)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-04-25 20:33:35 -07:00
Charlie Fu 188b7f9b8c
[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830)
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-21 20:46:22 -07:00
Aleksandr Malyshev e73ff24e31
[ROCM][KERNEL] Paged attention for V1 (#15720)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
2025-04-02 19:48:00 -07:00
Lu Fang 051da7efe3
Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
2025-03-25 15:36:45 +08:00
Lu Fang cd0cd85102
[MISC] More AMD unused var clean up (#14926)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-17 16:40:41 +08:00
Lu Fang 8c0d15d5c5
[Misc][Easy] Annotate unused vars in the csrc files (#14798)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-15 12:40:09 +08:00
TJian 848a6438ae
[ROCm] Faster Custom Paged Attention kernels (#12348) 2025-03-03 09:24:45 -08:00
Gregory Shtrasberg 0023cd2b9d
[ROCm] MI300A compile targets deprecation (#13560) 2025-02-19 23:05:00 -08:00
Shiyan Deng f1042e86f0
[Misc] AMD Build Improvements (#12923) 2025-02-12 02:36:10 -08:00
Harry Mellor 823ab79633
Update `pre-commit` hooks (#12475)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-27 17:23:08 -07:00
Gregory Shtrasberg e97f802b2d
[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00
Lu Fang 4068f4b5b5
[MISC] Replace c10::optional with std::optional (#11730)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-01-05 10:20:34 +09:00
Charlie Fu 9cc373f390
[Kernel][Amd] Add fp8 kv cache support for rocm custom paged attention (#8577) 2024-09-19 17:37:57 +00:00
Charlie Fu 1ef0d2efd0
[Kernel][Hardware][Amd]Custom paged attention kernel for rocm (#8310) 2024-09-13 17:01:11 -07:00