vllm/csrc

Latest commit: 188b7f9b8c by Charlie Fu, 2025-04-21 20:46:22 -07:00
[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830)
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
| Name | Last commit | Last updated |
| --- | --- | --- |
| attention/ | [Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693) | 2025-04-16 03:31:39 -07:00 |
| core/ | [Attention] MLA with chunked prefill (#12639) | 2025-02-21 15:30:12 -08:00 |
| cpu/ | [Bugfix] fix gettid method is not define (#16084) | 2025-04-08 19:12:44 -07:00 |
| cutlass_extensions/ | [Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972) | 2025-03-27 00:54:44 +00:00 |
| mamba/ | [BugFix] fix some typos found by typos. (#16314) | 2025-04-09 03:43:59 -07:00 |
| moe/ | [BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801) | 2025-04-17 22:13:29 -07:00 |
| prepare_inputs/ | [Misc][Easy] Annotate unused vars in the csrc files (#14798) | 2025-03-15 12:40:09 +08:00 |
| quantization/ | [Kernel] Add expert_map support to Cutlass FP8 MOE (#16861) | 2025-04-21 20:44:32 -07:00 |
| rocm/ | [Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830) | 2025-04-21 20:46:22 -07:00 |
| sparse/cutlass/ | [BugFix/Build] Fix sparse kernels not getting built on hopper (#14572) | 2025-03-11 17:09:03 +00:00 |
| activation_kernels.cu | [Kernel] Support MulAndSilu (#11624) | 2025-01-15 02:29:53 +00:00 |
| cache.h | [Attention] MLA with chunked prefill (#12639) | 2025-02-21 15:30:12 -08:00 |
| cache_kernels.cu | [Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros (#14347) | 2025-03-18 05:50:19 -07:00 |
| cuda_compat.h | [Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927) | 2024-06-02 14:13:26 -07:00 |
| cuda_utils.h | [Attention] MLA with chunked prefill (#12639) | 2025-02-21 15:30:12 -08:00 |
| cuda_utils_kernels.cu | [NVIDIA] Support nvfp4 quantization (#12784) | 2025-02-12 19:51:51 -08:00 |
| cuda_view.cu | [V1] Fully Transparent Implementation of CPU Offloading (#15354) | 2025-03-31 20:22:34 +08:00 |
| cumem_allocator.cpp | [core] improve error handling when wake up from sleep mode (#12981) | 2025-02-10 09:38:57 +08:00 |
| custom_all_reduce.cu | [Distributed] Add custom allreduce support for ROCM (#14125) | 2025-03-31 22:49:12 -07:00 |
| custom_all_reduce.cuh | fix: spelling (#16466) | 2025-04-11 23:24:22 -07:00 |
| custom_all_reduce_test.cu | [Distributed] Add custom allreduce support for ROCM (#14125) | 2025-03-31 22:49:12 -07:00 |
| dispatch_utils.h | dynamic distpatch of fp8 kernels (#14245) | 2025-03-11 10:54:56 -04:00 |
| layernorm_kernels.cu | [torch.compile] Fuse RMSNorm with quant (#9138) | 2024-11-08 21:20:08 +00:00 |
| layernorm_quant_kernels.cu | dynamic distpatch of fp8 kernels (#14245) | 2025-03-11 10:54:56 -04:00 |
| ops.h | [Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173) | 2025-04-11 06:50:50 -06:00 |
| permute_cols.cu | [Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701) | 2024-09-23 13:46:26 -04:00 |
| pos_encoding_kernels.cu | [Kernel] Make rotary_embedding ops more flexible with input shape (#12777) | 2025-02-06 08:46:13 -08:00 |
| torch_bindings.cpp | [Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173) | 2025-04-11 06:50:50 -06:00 |
| type_convert.cuh | [torch.compile] Fuse RMSNorm with quant (#9138) | 2024-11-08 21:20:08 +00:00 |
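The sources above follow a common split: the `.cu` files implement CUDA kernels, `ops.h` declares their host-side entry points, and `torch_bindings.cpp` registers those entry points as PyTorch custom ops so Python code can dispatch to them. As a minimal sketch of that registration pattern (illustrative only; the op name, namespace, and signature below are assumptions, not copies of vLLM's actual bindings):

```cpp
#include <torch/library.h>
#include <torch/torch.h>

// Hypothetical kernel entry point; the real declarations live in ops.h.
void my_activation(torch::Tensor& out, const torch::Tensor& input);

// Define the op schema under a namespace ("my_ext" is a placeholder).
// "Tensor! out" marks an in-place output, matching the out-parameter style
// used by kernel launchers like those in activation_kernels.cu.
TORCH_LIBRARY(my_ext, m) {
  m.def("my_activation(Tensor! out, Tensor input) -> ()");
}

// Bind the CUDA implementation to that schema; the dispatcher routes
// torch.ops.my_ext.my_activation(...) here for CUDA tensors.
TORCH_LIBRARY_IMPL(my_ext, CUDA, m) {
  m.impl("my_activation", &my_activation);
}
```

Once the extension is loaded, the op is callable from Python as `torch.ops.my_ext.my_activation(out, x)`, which is how kernels compiled from this directory surface in the rest of the codebase.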