vllm/csrc

Latest commit: 188b7f9b8c by Charlie Fu, 2025-04-21 20:46:22 -07:00
[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830)
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
| Name | Last commit | Last updated |
| --- | --- | --- |
| attention/ | [Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693) | 2025-04-16 03:31:39 -07:00 |
| core/ | [Attention] MLA with chunked prefill (#12639) | 2025-02-21 15:30:12 -08:00 |
| cpu/ | [Bugfix] fix gettid method is not define (#16084) | 2025-04-08 19:12:44 -07:00 |
| cutlass_extensions/ | [Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972) | 2025-03-27 00:54:44 +00:00 |
| mamba/ | [BugFix] fix some typos found by typos. (#16314) | 2025-04-09 03:43:59 -07:00 |
| moe/ | [BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801) | 2025-04-17 22:13:29 -07:00 |
| prepare_inputs/ | [Misc][Easy] Annotate unused vars in the csrc files (#14798) | 2025-03-15 12:40:09 +08:00 |
| quantization/ | [Kernel] Add expert_map support to Cutlass FP8 MOE (#16861) | 2025-04-21 20:44:32 -07:00 |
| rocm/ | [Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830) | 2025-04-21 20:46:22 -07:00 |
| sparse/cutlass/ | [BugFix/Build] Fix sparse kernels not getting built on hopper (#14572) | 2025-03-11 17:09:03 +00:00 |
| activation_kernels.cu | [Kernel] Support MulAndSilu (#11624) | 2025-01-15 02:29:53 +00:00 |
| cache.h | [Attention] MLA with chunked prefill (#12639) | 2025-02-21 15:30:12 -08:00 |
| cache_kernels.cu | [Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros (#14347) | 2025-03-18 05:50:19 -07:00 |
| cuda_compat.h | [Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927) | 2024-06-02 14:13:26 -07:00 |
| cuda_utils.h | [Attention] MLA with chunked prefill (#12639) | 2025-02-21 15:30:12 -08:00 |
| cuda_utils_kernels.cu | [NVIDIA] Support nvfp4 quantization (#12784) | 2025-02-12 19:51:51 -08:00 |
| cuda_view.cu | [V1] Fully Transparent Implementation of CPU Offloading (#15354) | 2025-03-31 20:22:34 +08:00 |
| cumem_allocator.cpp | [core] improve error handling when wake up from sleep mode (#12981) | 2025-02-10 09:38:57 +08:00 |
| custom_all_reduce.cu | [Distributed] Add custom allreduce support for ROCM (#14125) | 2025-03-31 22:49:12 -07:00 |
| custom_all_reduce.cuh | fix: spelling (#16466) | 2025-04-11 23:24:22 -07:00 |
| custom_all_reduce_test.cu | [Distributed] Add custom allreduce support for ROCM (#14125) | 2025-03-31 22:49:12 -07:00 |
| dispatch_utils.h | dynamic distpatch of fp8 kernels (#14245) | 2025-03-11 10:54:56 -04:00 |
| layernorm_kernels.cu | [torch.compile] Fuse RMSNorm with quant (#9138) | 2024-11-08 21:20:08 +00:00 |
| layernorm_quant_kernels.cu | dynamic distpatch of fp8 kernels (#14245) | 2025-03-11 10:54:56 -04:00 |
| ops.h | [Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173) | 2025-04-11 06:50:50 -06:00 |
| permute_cols.cu | [Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701) | 2024-09-23 13:46:26 -04:00 |
| pos_encoding_kernels.cu | [Kernel] Make rotary_embedding ops more flexible with input shape (#12777) | 2025-02-06 08:46:13 -08:00 |
| torch_bindings.cpp | [Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173) | 2025-04-11 06:50:50 -06:00 |
| type_convert.cuh | [torch.compile] Fuse RMSNorm with quant (#9138) | 2024-11-08 21:20:08 +00:00 |
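The sources above follow a common split: the `.cu` files implement CUDA kernels, `ops.h` declares their host-side entry points, and `torch_bindings.cpp` registers those entry points as PyTorch custom ops so Python code can dispatch to them. As a minimal sketch of that registration pattern (illustrative only; the op name, namespace, and signature below are assumptions, not copies of vLLM's actual bindings):

```cpp
#include <torch/library.h>
#include <torch/torch.h>

// Hypothetical kernel entry point; the real declarations live in ops.h.
void my_activation(torch::Tensor& out, const torch::Tensor& input);

// Define the op schema under a namespace ("my_ext" is a placeholder).
// "Tensor! out" marks an in-place output, matching the out-parameter style
// used by kernel launchers like those in activation_kernels.cu.
TORCH_LIBRARY(my_ext, m) {
  m.def("my_activation(Tensor! out, Tensor input) -> ()");
}

// Bind the CUDA implementation to that schema; the dispatcher routes
// torch.ops.my_ext.my_activation(...) here for CUDA tensors.
TORCH_LIBRARY_IMPL(my_ext, CUDA, m) {
  m.impl("my_activation", &my_activation);
}
```

Once the extension is loaded, the op is callable from Python as `torch.ops.my_ext.my_activation(out, x)`, which is how kernels compiled from this directory surface in the rest of the codebase.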