vllm/quantization at 202c5df9357e7c52b51e19abc70e8444f3f85ada

Cyrus Leung a5115f4ff5 [Doc] Fix quantization link titles (#19478) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>		2025-06-11 01:27:22 -07:00
README.md	[Doc] Fix quantization link titles (#19478)	2025-06-11 01:27:22 -07:00
auto_awq.md	[doc] improve readability (#18675)	2025-05-25 01:40:31 -07:00
bitblas.md	[doc] improve readability (#18675)	2025-05-25 01:40:31 -07:00
bnb.md	[Misc] small improve (#18680)	2025-05-25 06:05:38 -07:00
fp8.md	Migrate docs from Sphinx to MkDocs (#18145)	2025-05-23 02:09:53 -07:00
gguf.md	[doc] improve readability (#18675)	2025-05-25 01:40:31 -07:00
gptqmodel.md	[doc] improve readability (#18675)	2025-05-25 01:40:31 -07:00
int4.md	Migrate docs from Sphinx to MkDocs (#18145)	2025-05-23 02:09:53 -07:00
int8.md	Migrate docs from Sphinx to MkDocs (#18145)	2025-05-23 02:09:53 -07:00
modelopt.md	Migrate docs from Sphinx to MkDocs (#18145)	2025-05-23 02:09:53 -07:00
quantized_kvcache.md	Migrate docs from Sphinx to MkDocs (#18145)	2025-05-23 02:09:53 -07:00
quark.md	[Doc] Fix quantization link titles (#19478)	2025-06-11 01:27:22 -07:00
supported_hardware.md	[Doc][Neuron] Update documentation for Neuron (#18868)	2025-05-28 19:44:01 -07:00
torchao.md	[doc] improve readability (#18675)	2025-05-25 01:40:31 -07:00

README.md

title: Quantization

{ #quantization-index }

Quantization trades model precision for a smaller memory footprint, allowing large models to run on a wider range of devices.
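As a rough illustration of that trade-off, the sketch below estimates the memory needed for model weights alone at a few bit widths. The 7-billion-parameter count and the helper name `weight_memory_gib` are hypothetical and not part of vLLM. Real quantized checkpoints also store scales and zero points, and inference needs further memory for activations and the KV cache, so treat these figures as lower bounds.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GiB (hypothetical helper)."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: a 7B-parameter model, chosen only for illustration.
params = 7_000_000_000

for name, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name}: {weight_memory_gib(params, bits):.2f} GiB")
```

Halving the bits per weight halves the weight footprint, which is why 8-bit and 4-bit formats can fit a model on hardware where the 16-bit version would not.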

Contents: