vllm/csrc/quantization
Latest commit: 783921d889 by Wentao Ye, 2025-07-04 15:06:24 +08:00
[Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels (#20331)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
| Name | Last commit | Last commit date |
|------|-------------|------------------|
| aqlm | [Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596) | 2024-08-16 14:00:11 -07:00 |
| awq | [Kernel] Fix awq error when n is not divisible by 128 (#13227) | 2025-02-13 20:07:05 -08:00 |
| compressed_tensors | [Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels (#20331) | 2025-07-04 15:06:24 +08:00 |
| cutlass_w8a8 | [NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280) | 2025-07-02 06:47:19 -06:00 |
| fp4 | [Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 (#20324) | 2025-07-01 18:05:47 -07:00 |
| fp8 | [MISC] Remove unused variables in C++ (#19609) | 2025-06-15 20:05:28 -07:00 |
| fused_kernels | [Perf] Tune `scaled_fp8_quant` by increasing vectorization (#18844) | 2025-06-03 13:48:25 -07:00 |
| gguf | [Kernel] GGUF MMVQ kernel for multiple input vectors (#18754) | 2025-06-16 17:33:26 +08:00 |
| gptq | [MISC] Remove unused variables in C++ (#19609) | 2025-06-15 20:05:28 -07:00 |
| gptq_allspark | [Easy] Eliminate c10::optional usage in vllm/csrc (#17819) | 2025-05-08 03:05:10 -07:00 |
| gptq_marlin | remove unused variables in marlin_template.h (#20236) | 2025-07-02 00:51:52 +00:00 |
| machete | [CI] change spell checker from codespell to typos (#18711) | 2025-06-11 19:57:10 -07:00 |
| marlin | `pre-commit autoupdate` (#17380) | 2025-04-29 06:46:55 -07:00 |
| activation_kernels.cu | [AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082) | 2025-05-13 22:13:56 -07:00 |
| utils.cuh | [Feature][ROCm] Enable fusion pass for torch.compile on ROCm (#15050) | 2025-03-31 04:42:18 -07:00 |
| vectorization.cuh | [Perf] Tune `scaled_fp8_quant` by increasing vectorization (#18844) | 2025-06-03 13:48:25 -07:00 |
| vectorization_utils.cuh | [Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels (#20331) | 2025-07-04 15:06:24 +08:00 |
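The latest commit and `vectorization_utils.cuh` concern vectorized int8 quantization. As a rough illustration of the general technique only (not the actual vLLM kernel), the sketch below quantizes a float buffer to int8 using 128-bit `float4`/`char4` accesses; the kernel name, the helper function, and the alignment assumption are hypothetical.

```cuda
// Illustrative sketch only: per-tensor symmetric int8 quantization with
// vectorized (float4 -> char4) loads and stores. This is NOT the code in
// vectorization_utils.cuh; names and launch assumptions are hypothetical.
#include <cuda_runtime.h>
#include <cstdint>

__device__ __forceinline__ int8_t quantize_one(float x, float inv_scale) {
  // Scale, clamp to the int8 range, and round to nearest.
  float v = fminf(fmaxf(x * inv_scale, -128.f), 127.f);
  return static_cast<int8_t>(__float2int_rn(v));
}

// Assumes n is a multiple of 4 so every thread can issue 128-bit accesses;
// a production kernel would also handle an unaligned tail scalar-wise.
__global__ void int8_quant_vec4(const float* __restrict__ in,
                                int8_t* __restrict__ out,
                                float scale, int64_t n) {
  const float inv_scale = 1.f / scale;
  const int64_t num_vec = n / 4;
  for (int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
       i < num_vec; i += (int64_t)gridDim.x * blockDim.x) {
    float4 v = reinterpret_cast<const float4*>(in)[i];
    char4 q;
    q.x = quantize_one(v.x, inv_scale);
    q.y = quantize_one(v.y, inv_scale);
    q.z = quantize_one(v.z, inv_scale);
    q.w = quantize_one(v.w, inv_scale);
    reinterpret_cast<char4*>(out)[i] = q;
  }
}
```

Processing four elements per memory transaction cuts the number of load/store instructions per output byte, which is the same kind of gain the vectorization-tuning commits listed above pursue.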