vllm/csrc
Latest commit: ffb2cd6b54 by Wentao Ye, 2025-06-17 11:49:26 -07:00
[Perf] Optimize `moe_align_block_size` CUDA kernel (#19572)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| attention | [MISC] Remove unused variableds in C++ (#19609) | 2025-06-15 20:05:28 -07:00 |
| core | [Kernel] fp4 marlin kernel (#17687) | 2025-05-10 19:58:49 -07:00 |
| cpu | [CI] change spell checker from codespell to typos (#18711) | 2025-06-11 19:57:10 -07:00 |
| cutlass_extensions | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00 |
| mamba | feat(rocm-support): support mamba2 on rocm (#18565) | 2025-05-27 00:07:53 -07:00 |
| moe | [Perf] Optimize `moe_align_block_size` CUDA kernel (#19572) | 2025-06-17 11:49:26 -07:00 |
| prepare_inputs | [MISC] Remove unused variableds in C++ (#19609) | 2025-06-15 20:05:28 -07:00 |
| quantization | [Kernel] GGUF MMVQ kernel for multiple input vectors (#18754) | 2025-06-16 17:33:26 +08:00 |
| rocm | [MISC] Remove unused variableds in C++ (#19609) | 2025-06-15 20:05:28 -07:00 |
| sparse/cutlass | [CI] change spell checker from codespell to typos (#18711) | 2025-06-11 19:57:10 -07:00 |
| activation_kernels.cu | Modularize fused experts and integrate PPLX kernels (#15956) | 2025-05-14 13:11:54 -07:00 |
| cache.h | [Attention] MLA with chunked prefill (#12639) | 2025-02-21 15:30:12 -08:00 |
| cache_kernels.cu | Allocate kv_cache with stride order (#16605) | 2025-04-25 22:03:31 -07:00 |
| cuda_compat.h | [Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927) | 2024-06-02 14:13:26 -07:00 |
| cuda_utils.h | [Attention] MLA with chunked prefill (#12639) | 2025-02-21 15:30:12 -08:00 |
| cuda_utils_kernels.cu | [NVIDIA] Support nvfp4 quantization (#12784) | 2025-02-12 19:51:51 -08:00 |
| cuda_view.cu | [V1] Fully Transparent Implementation of CPU Offloading (#15354) | 2025-03-31 20:22:34 +08:00 |
| cumem_allocator.cpp | [core] improve error handling when wake up from sleep mode (#12981) | 2025-02-10 09:38:57 +08:00 |
| custom_all_reduce.cu | [Distributed] Add custom allreduce support for ROCM (#14125) | 2025-03-31 22:49:12 -07:00 |
| custom_all_reduce.cuh | fix: spelling (#16466) | 2025-04-11 23:24:22 -07:00 |
| custom_all_reduce_test.cu | [Distributed] Add custom allreduce support for ROCM (#14125) | 2025-03-31 22:49:12 -07:00 |
| dispatch_utils.h | Modularize fused experts and integrate PPLX kernels (#15956) | 2025-05-14 13:11:54 -07:00 |
| layernorm_kernels.cu | [Kernel] Use fused rmsnorm for some models like qwen3 series (#17735) | 2025-05-06 23:10:02 -07:00 |
| layernorm_quant_kernels.cu | dynamic distpatch of fp8 kernels (#14245) | 2025-03-11 10:54:56 -04:00 |
| ops.h | [Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762) | 2025-06-06 18:26:11 -07:00 |
| permute_cols.cu | [Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701) | 2024-09-23 13:46:26 -04:00 |
| pos_encoding_kernels.cu | [Kernel] Have rotary embeddings support tensors (#18046) | 2025-05-14 15:43:55 -07:00 |
| sampler.cu | [KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437) | 2025-06-03 21:13:01 -07:00 |
| torch_bindings.cpp | [Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762) | 2025-06-06 18:26:11 -07:00 |
| type_convert.cuh | [torch.compile] Fuse RMSNorm with quant (#9138) | 2024-11-08 21:20:08 +00:00 |