---
title: Summary
---

[](){ #new-model }

!!! important
    Many decoder language models can now be automatically loaded using the [Transformers backend][transformers-backend] without having to implement them in vLLM. See if `vllm serve <model>` works first!
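
For example, a quick offline check might look like the following sketch. The model name is a placeholder, and the `model_impl="transformers"` engine argument (used here to force the Transformers backend instead of a native vLLM implementation) is an assumption about the arguments available in your vLLM version:

```python
# Hedged sketch: check whether a model already runs on the Transformers backend
# before implementing it natively in vLLM. The model name is a placeholder and
# the `model_impl` engine argument is assumed to exist in your vLLM version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/your-decoder-model",  # placeholder Hugging Face model ID
    model_impl="transformers",            # force the Transformers backend
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=16),
)
print(outputs[0].outputs[0].text)
```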

vLLM models are specialized PyTorch models that take advantage of various [features][compatibility-matrix] to optimize their performance.

The complexity of integrating a model into vLLM depends heavily on the model's architecture. The process is fairly straightforward if the model shares a similar architecture with an existing model in vLLM. However, it can be more complex for models that introduce new operators (e.g., a new attention mechanism).
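
At a high level, and only as a hedged sketch (the real interface and helper layers are described in [basic.md](basic.md); all names below are illustrative), a vLLM model is an `nn.Module` whose constructor and `forward` follow vLLM's conventions so the engine can drive it:

```python
# Hedged sketch of the general shape of a vLLM model class; names and signatures
# are illustrative, not the exact interface (see basic.md for that).
import torch
from torch import nn


class MyModelForCausalLM(nn.Module):
    def __init__(self, *, vllm_config, prefix: str = "") -> None:
        super().__init__()
        # Submodules are typically built from vllm_config.model_config.hf_config,
        # reusing vLLM's optimized attention/linear layers where possible.
        ...

    def forward(
        self,
        input_ids: torch.Tensor,
        positions: torch.Tensor,
    ) -> torch.Tensor:
        # Return the final hidden states for the given tokens; KV caching and
        # attention metadata are managed by vLLM's layers, not by this module.
        ...

    def load_weights(self, weights) -> None:
        # Map Hugging Face checkpoint weights onto this module's parameters.
        ...
```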

Read through these pages for a step-by-step guide:

- [Basic Implementation](basic.md)
- [Registering a Model](registration.md)
- [Unit Testing](tests.md)
- [Multi-Modal Support](multimodal.md)

!!! tip
    If you are encountering issues while integrating your model into vLLM, feel free to open a GitHub issue or ask on our developer Slack. We will be happy to help you out!