vllm/quantization at 6d0df0ebebd4e347e1ebcdea4be010a4b54b901b - vllm

Dipika Sikka 54a66e5fee [Misc] Update `compressed-tensors` WNA16 to support zero-points (#14211)		2025-04-15 07:33:51 -06:00
__init__.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
test_bitsandbytes.py	[Misc] Auto detect bitsandbytes pre-quantized models (#16027)	2025-04-04 23:30:45 -07:00
test_compressed_tensors.py	[Misc] Update `compressed-tensors` WNA16 to support zero-points (#14211)	2025-04-15 07:33:51 -06:00
test_configs.py	Update deprecated Python 3.8 typing (#13971)	2025-03-02 17:34:51 -08:00
test_cpu_offload.py	[V1] Fully Transparent Implementation of CPU Offloading (#15354)	2025-03-31 20:22:34 +08:00
test_experts_int8.py	[Misc] Add SPDX-License-Identifier headers to python source files (#12628)	2025-02-02 11:58:18 -08:00
test_fp8.py	[FEAT][ROCm] Integrate Fused MoE Kernels from AITER (#14967)	2025-03-26 16:30:30 +08:00
test_gptq_dynamic.py	[V1] V1 Enablement Oracle (#13726)	2025-03-14 22:02:20 -07:00
test_ipex_quant.py	[Misc] Add SPDX-License-Identifier headers to python source files (#12628)	2025-02-02 11:58:18 -08:00
test_lm_head.py	[V1] V1 Enablement Oracle (#13726)	2025-03-14 22:02:20 -07:00
test_ptpc_fp8.py	[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing (#12501)	2025-02-07 08:13:43 -08:00
test_quark.py	[Bugfix] Fix bugs of running Quark quantized models (#16236)	2025-04-11 10:18:32 -04:00
test_register_quantization_config.py	[V1] V1 Enablement Oracle (#13726)	2025-03-14 22:02:20 -07:00
test_torchao.py	Torchao (#14231)	2025-04-07 19:39:28 -04:00
utils.py	[Misc] Add SPDX-License-Identifier headers to python source files (#12628)	2025-02-02 11:58:18 -08:00