
---
title: Quantization
---
[](){ #quantization-index }

Quantization trades off model precision for smaller memory footprint, allowing large models to be run on a wider range of devices.
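To make the trade-off concrete, weight memory scales roughly linearly with bit width. The sketch below is an illustrative back-of-the-envelope estimate only (it counts weights alone, ignoring activations, the KV cache, and per-format packing overhead); the helper name is hypothetical:

```python
def weight_memory_gb(num_params_billion: float, bits_per_param: int) -> float:
    """Rough weight-only memory estimate: params * (bits / 8) bytes, in GB."""
    return num_params_billion * 1e9 * bits_per_param / 8 / 1e9

# A 7B-parameter model, for example:
print(weight_memory_gb(7, 16))  # FP16 baseline -> 14.0 GB
print(weight_memory_gb(7, 8))   # INT8/FP8      -> 7.0 GB
print(weight_memory_gb(7, 4))   # INT4          -> 3.5 GB
```

Halving the bit width halves the weight footprint, which is what lets a model that overflows one GPU at FP16 fit comfortably at 4-bit precision.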

Contents: