vllm/csrc/quantization

Latest commit: 7b8a2ab76f by Varun Sundar Rabindranath
[Kernel] Add expert_map support to Cutlass FP8 MOE (#16861)
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
Date: 2025-04-21 20:44:32 -07:00
| Name               | Last commit                                                                                                   | Date                      |
|--------------------|---------------------------------------------------------------------------------------------------------------|---------------------------|
| aqlm               | [Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596)                                    | 2024-08-16 14:00:11 -07:00 |
| awq                | [Kernel] Fix awq error when n is not divisable by 128 (#13227)                                                | 2025-02-13 20:07:05 -08:00 |
| compressed_tensors | [MISC] Replace c10::optional with std::optional (#11730)                                                      | 2025-01-05 10:20:34 +09:00 |
| cutlass_w8a8       | [Kernel] Add expert_map support to Cutlass FP8 MOE (#16861)                                                   | 2025-04-21 20:44:32 -07:00 |
| fp4                | [Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)                                                         | 2025-03-12 05:13:11 +00:00 |
| fp8                | [Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)                                          | 2025-03-31 04:42:18 -07:00 |
| fused_kernels      | [Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)                                          | 2025-03-31 04:42:18 -07:00 |
| gguf               | [BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm (#16247)                                              | 2025-04-08 05:10:26 -07:00 |
| gptq               | Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)      | 2025-03-21 10:01:11 +08:00 |
| gptq_allspark      | Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)      | 2025-03-21 10:01:11 +08:00 |
| gptq_marlin        | [Kernel] moe wna16 marlin kernel (#14447)                                                                     | 2025-04-14 20:05:22 -07:00 |
| machete            | add cutlass support for blackwell fp8 gemm (#13798)                                                           | 2025-03-04 07:55:07 -08:00 |
| marlin             | Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)       | 2025-03-25 15:36:45 +08:00 |
| utils.cuh          | [Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)                                          | 2025-03-31 04:42:18 -07:00 |
| vectorization.cuh  | dynamic distpatch of fp8 kernels (#14245)                                                                     | 2025-03-11 10:54:56 -04:00 |