---
title: Quantization
---
[](){ #quantization-index }

Quantization trades model precision for a smaller memory footprint, allowing large models to run on a wider range of devices.

Contents:

- [Supported Hardware](supported_hardware.md)
- [AutoAWQ](auto_awq.md)
- [BitsAndBytes](bnb.md)
- [BitBLAS](bitblas.md)
- [GGUF](gguf.md)
- [GPTQModel](gptqmodel.md)
- [INT4 W4A16](int4.md)
- [INT8 W8A8](int8.md)
- [FP8 W8A8](fp8.md)
- [NVIDIA TensorRT Model Optimizer](modelopt.md)
- [AMD Quark](quark.md)
- [Quantized KV Cache](quantized_kvcache.md)
- [TorchAO](torchao.md)
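To illustrate the precision/memory trade-off, here is a minimal, hypothetical sketch of symmetric per-tensor int8 quantization in plain Python (not vLLM's implementation, which the pages above cover in detail): each float32 weight (4 bytes) is mapped to an 8-bit integer (1 byte), roughly a 4x reduction, at the cost of rounding error bounded by the scale.

```python
def quantize_int8(weights):
    # Symmetric per-tensor scheme: map the largest |weight| to +/-127,
    # so every weight fits in a signed 8-bit integer.
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    # Recover approximate float weights; rounding error is at most scale/2.
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Here `q` stores one byte per weight instead of four, and `restored` differs from `weights` by at most half the scale per element; real schemes refine this with per-channel or per-group scales to shrink that error.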