vllm/tests/quantization

Latest commit: 7974736740, "Add support for loading torchao models with `AOPerModuleConfig`" (#17826)
Author: Jerry Zhang
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
Date: 2025-05-14 16:24:59 -07:00
| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | [CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425) | 2024-05-13 23:50:09 +09:00 |
| `test_bitsandbytes.py` | [Misc] Auto detect bitsandbytes pre-quantized models (#16027) | 2025-04-04 23:30:45 -07:00 |
| `test_compressed_tensors.py` | [Misc] Add compressed-tensors NVFP4A16 emulation support (#17914) | 2025-05-11 15:58:38 +08:00 |
| `test_configs.py` | Update deprecated Python 3.8 typing (#13971) | 2025-03-02 17:34:51 -08:00 |
| `test_cpu_offload.py` | [V1] Fully Transparent Implementation of CPU Offloading (#15354) | 2025-03-31 20:22:34 +08:00 |
| `test_experts_int8.py` | [Misc] Add SPDX-License-Identifier headers to python source files (#12628) | 2025-02-02 11:58:18 -08:00 |
| `test_fp8.py` | [FEAT][ROCm] Integrate Fused MoE Kernels from AITER (#14967) | 2025-03-26 16:30:30 +08:00 |
| `test_gptq_dynamic.py` | [V1] V1 Enablement Oracle (#13726) | 2025-03-14 22:02:20 -07:00 |
| `test_ipex_quant.py` | [Misc] Add SPDX-License-Identifier headers to python source files (#12628) | 2025-02-02 11:58:18 -08:00 |
| `test_lm_head.py` | [V1] V1 Enablement Oracle (#13726) | 2025-03-14 22:02:20 -07:00 |
| `test_ptpc_fp8.py` | [ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing (#12501) | 2025-02-07 08:13:43 -08:00 |
| `test_quark.py` | [Bugfix] Fix quark fp8 format loading on AMD GPUs (#12612) | 2025-05-08 02:53:53 -07:00 |
| `test_register_quantization_config.py` | Improve configs - `ModelConfig` (#17130) | 2025-04-30 10:38:22 +08:00 |
| `test_torchao.py` | Add support for loading torchao models with `AOPerModuleConfig` (#17826) | 2025-05-14 16:24:59 -07:00 |
| `utils.py` | [Misc] Add SPDX-License-Identifier headers to python source files (#12628) | 2025-02-02 11:58:18 -08:00 |
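
The tests listed above exercise vLLM's quantized-checkpoint loading paths (bitsandbytes, compressed-tensors, FP8, GPTQ, quark, torchao, and others). As a rough sketch of the workflow they cover, the snippet below loads a pre-quantized checkpoint through vLLM's `LLM` entry point; the model id is a hypothetical placeholder, and the quantization backend is assumed to be inferred from the checkpoint's config rather than passed explicitly.

```python
# Minimal sketch: loading a pre-quantized checkpoint with vLLM.
# The model id is a hypothetical placeholder, not a file from this directory.
from vllm import LLM, SamplingParams

# vLLM reads the quantization method declared in the checkpoint's config
# (e.g. bitsandbytes, compressed-tensors, torchao) and selects a matching backend.
llm = LLM(model="example-org/example-quantized-model")

params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```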