vllm/csrc at lwilkinson/refactor-cmake

almersawi a547aeb828 feat(rocm-support): support mamba2 on rocm (#18565) Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai> Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>	2025-05-27 00:07:53 -07:00
attention	fix: typos (#18151)	2025-05-15 02:16:15 -07:00
core	[Kernel] fp4 marlin kernel (#17687)	2025-05-10 19:58:49 -07:00
cpu	[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (#18430)	2025-05-23 01:41:37 -07:00
cutlass_extensions	fix CUDA_check redefinition in #17918 (#18287)	2025-05-19 13:42:35 -07:00
mamba	feat(rocm-support): support mamba2 on rocm (#18565)	2025-05-27 00:07:53 -07:00
moe	[Build/CI] Fix CUDA 11.8 build (#17679)	2025-05-22 12:13:54 -07:00
prepare_inputs	[Misc][Easy] Annotate unused vars in the csrc files (#14798)	2025-03-15 12:40:09 +08:00
quantization	[Build/CI] Fix CUDA 11.8 build (#17679)	2025-05-22 12:13:54 -07:00
rocm	[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004)	2025-05-21 08:35:00 -07:00
sparse/cutlass	fix CUDA_check redefinition in #17918 (#18287)	2025-05-19 13:42:35 -07:00
activation_kernels.cu	Modularize fused experts and integrate PPLX kernels (#15956)	2025-05-14 13:11:54 -07:00
cache.h	[Attention] MLA with chunked prefill (#12639)	2025-02-21 15:30:12 -08:00
cache_kernels.cu	Allocate kv_cache with stride order (#16605)	2025-04-25 22:03:31 -07:00
cuda_compat.h	[Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927)	2024-06-02 14:13:26 -07:00
cuda_utils.h	[Attention] MLA with chunked prefill (#12639)	2025-02-21 15:30:12 -08:00
cuda_utils_kernels.cu	[NVIDIA] Support nvfp4 quantization (#12784)	2025-02-12 19:51:51 -08:00
cuda_view.cu	[V1] Fully Transparent Implementation of CPU Offloading (#15354)	2025-03-31 20:22:34 +08:00
cumem_allocator.cpp	[core] improve error handling when wake up from sleep mode (#12981)	2025-02-10 09:38:57 +08:00
custom_all_reduce.cu	[Distributed] Add custom allreduce support for ROCM (#14125)	2025-03-31 22:49:12 -07:00
custom_all_reduce.cuh	fix: spelling (#16466)	2025-04-11 23:24:22 -07:00
custom_all_reduce_test.cu	[Distributed] Add custom allreduce support for ROCM (#14125)	2025-03-31 22:49:12 -07:00
dispatch_utils.h	Modularize fused experts and integrate PPLX kernels (#15956)	2025-05-14 13:11:54 -07:00
layernorm_kernels.cu	[Kernel] Use fused rmsnorm for some models like qwen3 series (#17735)	2025-05-06 23:10:02 -07:00
layernorm_quant_kernels.cu	dynamic distpatch of fp8 kernels (#14245)	2025-03-11 10:54:56 -04:00
ops.h	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)	2025-05-12 19:52:47 -07:00
permute_cols.cu	[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)	2024-09-23 13:46:26 -04:00
pos_encoding_kernels.cu	[Kernel] Have rotary embeddings support tensors (#18046)	2025-05-14 15:43:55 -07:00
torch_bindings.cpp	feat(rocm-support): support mamba2 on rocm (#18565)	2025-05-27 00:07:53 -07:00
type_convert.cuh	[torch.compile] Fuse RMSNorm with quant (#9138)	2024-11-08 21:20:08 +00:00