---
title: Quantization
---
[](){ #quantization-index }

Quantization trades off model precision for a smaller memory footprint, allowing large models to be run on a wider range of devices.

Contents:

- [Supported Hardware](supported_hardware.md)
- [AutoAWQ](auto_awq.md)
- [BitsAndBytes](bnb.md)
- [BitBLAS](bitblas.md)
- [GGUF](gguf.md)
- [GPTQModel](gptqmodel.md)
- [INT4 W4A16](int4.md)
- [INT8 W8A8](int8.md)
- [FP8 W8A8](fp8.md)
- [NVIDIA TensorRT Model Optimizer](modelopt.md)
- [AMD Quark](quark.md)
- [Quantized KV Cache](quantized_kvcache.md)
- [TorchAO](torchao.md)
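To make the precision-for-memory tradeoff concrete, here is a minimal NumPy sketch of symmetric per-tensor INT8 weight quantization; it is an illustration of the general idea only, not how vLLM or any of the backends listed above implement it:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # Symmetric per-tensor quantization: one float scale maps the
    # int8 range [-127, 127] onto the observed weight range.
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float32 weights; rounding error is
    # bounded by half the scale per element.
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
q, scale = quantize_int8(w)
error = np.abs(dequantize(q, scale) - w).max()

# The int8 buffer uses 4x less memory than the float32 original,
# at the cost of a small reconstruction error.
print(w.nbytes, q.nbytes, error)
```

The pages listed above cover production-grade variants of this idea (per-channel or group-wise scales, activation quantization, FP8 formats) along with the hardware each one supports.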