---
title: Summary
---
[](){ #new-model }
!!! important
    Many decoder language models can now be automatically loaded using the [Transformers backend][transformers-backend] without having to implement them in vLLM. See if `vllm serve <model>` works first!
vLLM models are specialized PyTorch models that take advantage of various [features][compatibility-matrix] to optimize their performance.
The complexity of integrating a model into vLLM depends heavily on the model's architecture. The process is considerably more straightforward if the model shares a similar architecture with an existing model in vLLM. However, it can be more complex for models that include new operators (e.g., a new attention mechanism).
Read through these pages for a step-by-step guide:

- [basic.md](basic.md)
- [multimodal.md](multimodal.md)
- [registration.md](registration.md)
- [tests.md](tests.md)
!!! tip
    If you are encountering issues while integrating your model into vLLM, feel free to open a GitHub issue or ask on our developer Slack. We will be happy to help you out!