vllm/csrc/quantization/gptq_marlin
Jinzhen Lin d06ba4ed3f
[Kernel] moe wna16 marlin kernel (#14447)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-04-14 20:05:22 -07:00
awq_marlin_repack.cu | Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160) | 2025-03-25 15:36:45 +08:00
gptq_marlin.cu | [Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (#15946) | 2025-04-05 20:04:22 -07:00
gptq_marlin_repack.cu | Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160) | 2025-03-25 15:36:45 +08:00
marlin.cuh | [Kernel] moe wna16 marlin kernel (#14447) | 2025-04-14 20:05:22 -07:00
marlin_dtypes.cuh | [Kernel] moe wna16 marlin kernel (#14447) | 2025-04-14 20:05:22 -07:00