vllm/docs/features/compatibility_matrix.md

5.9 KiB

title
Compatibility Matrix

{ #compatibility-matrix }

The tables below show mutually exclusive features and the support on some hardware.

The symbols used have the following meanings:

  • = Full compatibility
  • 🟠 = Partial compatibility
  • = No compatibility
  • = Unknown or TBD

!!! note Check the or 🟠 with links to see tracking issue for unsupported feature/hardware combination.

Feature x Feature

Feature [CP][chunked-prefill] [APC][automatic-prefix-caching] [LoRA][lora-adapter] prmpt adptr [SD][spec-decode] CUDA graph pooling enc-dec logP prmpt logP async output multi-step mm best-of beam-search
[CP][chunked-prefill]
[APC][automatic-prefix-caching]
[LoRA][lora-adapter]
prmpt adptr
[SD][spec-decode]
CUDA graph
pooling
enc-dec
logP
prmpt logP
async output
multi-step
mm 🟠 🟠
best-of
beam-search

{ #feature-x-hardware }

Feature x Hardware

Feature Volta Turing Ampere Ada Hopper CPU AMD TPU
[CP][chunked-prefill]
[APC][automatic-prefix-caching]
[LoRA][lora-adapter]
prmpt adptr
[SD][spec-decode]
CUDA graph
pooling
enc-dec
mm
logP
prmpt logP
async output
multi-step
best-of
beam-search

!!! note Please refer to [Feature support through NxD Inference backend][feature-support-through-nxd-inference-backend] for features supported on AWS Neuron hardware