README.md
---
title: Quantization
---
[](){ #quantization-index }
Quantization trades off model precision for smaller memory footprint, allowing large models to be run on a wider range of devices.
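For orientation, here is a minimal sketch of running a quantized model through vLLM's offline `LLM` API. The model name `TheBloke/Llama-2-7B-Chat-AWQ` is only an assumed example of an AWQ-quantized checkpoint; each page below covers its own method and setup.

```python
from vllm import LLM

# Load an AWQ-quantized checkpoint (example model name assumed).
# vLLM can infer the quantization method from the checkpoint's
# config; passing `quantization="awq"` makes the choice explicit.
llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")

# Generate from a single prompt and print the completion.
outputs = llm.generate("Quantization reduces memory usage by")
print(outputs[0].outputs[0].text)
```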
Contents:

- [AutoAWQ](auto_awq.md)
- [BitBLAS](bitblas.md)
- [BitsAndBytes](bnb.md)
- [FP8](fp8.md)
- [GGUF](gguf.md)
- [GPTQModel](gptqmodel.md)
- [INT4](int4.md)
- [INT8](int8.md)
- [ModelOpt](modelopt.md)
- [Quantized KV Cache](quantized_kvcache.md)
- [Quark](quark.md)
- [Supported Hardware](supported_hardware.md)
- [TorchAO](torchao.md)