vllm/csrc/quantization

Latest commit: e31446b6c8 by Michael Goin, [Perf] Tune `scaled_fp8_quant` by increasing vectorization (#18844), 2025-06-03 13:48:25 -07:00
Signed-off-by: mgoin <mgoin64@gmail.com>
| Name | Last commit | Date |
| --- | --- | --- |
| aqlm | [Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596) | 2024-08-16 14:00:11 -07:00 |
| awq | [Kernel] Fix awq error when n is not divisable by 128 (#13227) | 2025-02-13 20:07:05 -08:00 |
| compressed_tensors | [ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 (#13779) | 2025-05-12 20:36:33 -07:00 |
| cutlass_w8a8 | [Build/CI] Fix CUDA 11.8 build (#17679) | 2025-05-22 12:13:54 -07:00 |
| fp4 | [Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362) | 2025-05-09 16:24:41 -07:00 |
| fp8 | [Perf] Tune `scaled_fp8_quant` by increasing vectorization (#18844) | 2025-06-03 13:48:25 -07:00 |
| fused_kernels | [Perf] Tune `scaled_fp8_quant` by increasing vectorization (#18844) | 2025-06-03 13:48:25 -07:00 |
| gguf | [Kernel] GGUF MoeVec kernel (#16780) | 2025-05-06 23:07:23 -07:00 |
| gptq | Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159) | 2025-03-21 10:01:11 +08:00 |
| gptq_allspark | [Easy] Eliminate c10::optional usage in vllm/csrc (#17819) | 2025-05-08 03:05:10 -07:00 |
| gptq_marlin | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00 |
| machete | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00 |
| marlin | `pre-commit autoupdate` (#17380) | 2025-04-29 06:46:55 -07:00 |
| activation_kernels.cu | [AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082) | 2025-05-13 22:13:56 -07:00 |
| utils.cuh | [Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050) | 2025-03-31 04:42:18 -07:00 |
| vectorization.cuh | [Perf] Tune `scaled_fp8_quant` by increasing vectorization (#18844) | 2025-06-03 13:48:25 -07:00 |
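The headline commit above tunes `scaled_fp8_quant` by widening the kernel's vectorized loads. The numerical operation that kernel performs, per-tensor dynamic fp8 (e4m3) quantization, can be sketched in plain Python. This is an illustrative emulation, not the CUDA kernel's actual API; the function name `scaled_fp8_quant_ref` and the use of NumPy are assumptions for the sketch, and it skips the final 8-bit cast:

```python
import numpy as np

# Largest finite value representable in float8 e4m3.
FP8_E4M3_MAX = 448.0

def scaled_fp8_quant_ref(x: np.ndarray):
    """Emulate per-tensor dynamic fp8 quantization: derive one scale
    from the tensor's maximum magnitude, then clamp the scaled values
    to the e4m3 range.  A fused CUDA kernel does the same reduction
    and elementwise pass, reading several elements per vectorized load;
    here we stop short of the actual 8-bit storage."""
    scale = np.abs(x).max() / FP8_E4M3_MAX
    q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, np.float32(scale)

x = np.array([0.5, -2.0, 3.0, -448.0], dtype=np.float32)
q, scale = scaled_fp8_quant_ref(x)
# Dequantizing recovers the input exactly here, because the sketch
# omits the lossy cast to 8 bits.
assert np.allclose(q * scale, x)
```

The vectorization being tuned in #18844 is a memory-access detail (how many elements each thread loads per instruction), so it changes throughput, not the arithmetic above.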