title
Compatibility Matrix

{ #compatibility-matrix }

The tables below show which features are mutually exclusive and which features are supported on specific hardware.

The symbols used have the following meanings:

- ✅ = Full compatibility
- 🟠 = Partial compatibility
- ❌ = No compatibility
- ❔ = Unknown or TBD

!!! note
    Check the ❌ or 🟠 entries that carry links to see the tracking issue for the unsupported feature/hardware combination.

Feature x Feature

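| Feature | [CP][chunked-prefill] | [APC][automatic-prefix-caching] | [LoRA][lora-adapter] | prmpt adptr | [SD][spec-decode] | CUDA graph | pooling | enc-dec | logP | prmpt logP | async output | multi-step | mm | best-of | beam-search |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [CP][chunked-prefill] | ✅ | | | | | | | | | | | | | | |
| [APC][automatic-prefix-caching] | ✅ | ✅ | | | | | | | | | | | | | |
| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | | | | | | | | | | | | |
| prmpt adptr | ✅ | ✅ | ✅ | ✅ | | | | | | | | | | | |
| [SD][spec-decode] | ✅ | ✅ | ❌ | ✅ | ✅ | | | | | | | | | | |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | | | | | | | | |
| pooling | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | | | | | | | | |
| enc-dec | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | | | | | | | |
| logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | | | | | | |
| prmpt logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | | | | | |
| async output | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | | | | |
| multi-step | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | | | |
| mm | ✅ | 🟠 | 🟠 | ❔ | ❔ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | | |
| best-of | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | ❌ | ✅ | ✅ | |
| beam-search | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | ❌ | ❔ | ✅ | ✅ |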
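
Only the lower triangle of the Feature x Feature table is filled in, so a pair is read from the row of whichever feature comes later in the list. The following is a minimal sketch of that lookup, assuming a hand-transcribed subset of the table; the names `LOWER_TRIANGLE`, `status`, and `conflicts` are hypothetical helpers for illustration and are not part of vLLM.

```python
from itertools import combinations

# Hypothetical, hand-transcribed subset of the Feature x Feature table.
# As in the table, each row only lists columns up to and including itself.
LOWER_TRIANGLE = {
    "CP": {"CP": "✅"},
    "APC": {"CP": "✅", "APC": "✅"},
    "LoRA": {"CP": "✅", "APC": "✅", "LoRA": "✅"},
    "SD": {"CP": "✅", "APC": "✅", "LoRA": "❌", "SD": "✅"},
    "pooling": {"CP": "❌", "APC": "❌", "LoRA": "❌", "SD": "❌", "pooling": "✅"},
}


def status(a, b):
    """Return the table symbol for a feature pair, given in either order."""
    row = LOWER_TRIANGLE[a]
    if b in row:
        return row[b]
    # The pair is stored in the other feature's row (the table is symmetric).
    return LOWER_TRIANGLE[b][a]


def conflicts(enabled):
    """List every pair of enabled features that is not marked fully compatible."""
    return [
        (a, b, status(a, b))
        for a, b in combinations(enabled, 2)
        if status(a, b) != "✅"
    ]


if __name__ == "__main__":
    print(conflicts(["CP", "LoRA", "SD"]))
```

Treating every symbol other than ✅ as a conflict is a conservative choice for this sketch; 🟠 and ❔ entries may still work in practice and are worth checking individually.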

{ #feature-x-hardware }

Feature x Hardware

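| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | TPU |
|---|---|---|---|---|---|---|---|---|
| [CP][chunked-prefill] | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [APC][automatic-prefix-caching] | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| prmpt adptr | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
| [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
| pooling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❌ |
| enc-dec | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| mm | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| prmpt logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| async output | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |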
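
To answer "which features are unavailable on this hardware?", read one column of the Feature x Hardware table from top to bottom. A minimal sketch of that column scan follows, assuming a few rows copied by hand from the table; `HARDWARE`, `ROWS`, and `unsupported_on` are hypothetical names for illustration, not vLLM APIs.

```python
# Column order as it appears in the Feature x Hardware table.
HARDWARE = ["Volta", "Turing", "Ampere", "Ada", "Hopper", "CPU", "AMD", "TPU"]

# Hypothetical subset of rows, copied by hand; one symbol per hardware column.
ROWS = {
    "CP": "❌ ✅ ✅ ✅ ✅ ✅ ✅ ✅",
    "CUDA graph": "✅ ✅ ✅ ✅ ✅ ❌ ✅ ❌",
    "async output": "✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌",
    "multi-step": "✅ ✅ ✅ ✅ ✅ ❌ ✅ ❌",
}


def unsupported_on(hardware):
    """Return the features in ROWS not marked fully compatible on `hardware`."""
    col = HARDWARE.index(hardware)
    return [
        feature
        for feature, cells in ROWS.items()
        if cells.split()[col] != "✅"
    ]


if __name__ == "__main__":
    for target in ("Volta", "CPU", "AMD"):
        print(target, unsupported_on(target))
```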

!!! note
    Please refer to [Feature support through NxD Inference backend][feature-support-through-nxd-inference-backend] for features supported on AWS Neuron hardware.
