
---
title: Quantization
---
[](){ #quantization-index }

Quantization trades off model precision for smaller memory footprint, allowing large models to be run on a wider range of devices.
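To make the trade-off concrete, weight memory scales roughly linearly with bit width. The sketch below is an illustrative back-of-the-envelope estimate only (it counts weights alone, ignoring activations, the KV cache, and per-format packing overhead); the helper name is hypothetical:

```python
def weight_memory_gb(num_params_billion: float, bits_per_param: int) -> float:
    """Rough weight-only memory estimate: params * (bits / 8) bytes, in GB."""
    return num_params_billion * 1e9 * bits_per_param / 8 / 1e9

# A 7B-parameter model, for example:
print(weight_memory_gb(7, 16))  # FP16 baseline -> 14.0 GB
print(weight_memory_gb(7, 8))   # INT8/FP8      -> 7.0 GB
print(weight_memory_gb(7, 4))   # INT4          -> 3.5 GB
```

Halving the bit width halves the weight footprint, which is what lets a model that overflows one GPU at FP16 fit comfortably at 4-bit precision.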

Contents: