379 Commits

Author SHA1 Message Date
Rafael Vasquez f7db5f0fa9
[Doc] Use shell code-blocks and fix section headers (#9508)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-22 06:43:24 +00:00
Dhia Eddine Rhaiem f6b97293aa
[Model] FalconMamba Support (#9325) 2024-10-21 12:50:16 -04:00
Michael Goin 3921a2f29e
[Model] Support Pixtral models in the HF Transformers format (#9036) 2024-10-18 13:29:56 -06:00
Cyrus Leung 051eaf6db3
[Model] Add user-configurable task for models that support both generation and embedding (#9424) 2024-10-18 11:31:58 -07:00
Kuntai Du 81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default (#8704)
Remove block manager v1. This is the first piece of the prefix-caching-centric design. To achieve that design, we need to simplify the code path so that only the v2 block manager is used (it has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
Junhao Li 5b8a1fde84
[Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft (#9396) 2024-10-16 16:40:24 +00:00
Roger Wang 59230ef32b
[Misc] Consolidate example usage of OpenAI client for multimodal models (#9412)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-16 11:20:51 +00:00
Cyrus Leung 7abba39ee6
[Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303) 2024-10-16 14:31:00 +08:00
Michael Goin 8e836d982a
[Doc] Fix code formatting in spec_decode.rst (#9348) 2024-10-14 21:29:11 -07:00
Tyler Michael Smith 169b530607
[Bugfix] Clean up some cruft in mamba.py (#9343) 2024-10-15 00:24:25 +00:00
Reza Salehi dfe43a2071
[Model] Molmo vLLM Integration (#9016)
Co-authored-by: sanghol <sanghol@allenai.org>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-10-14 07:56:24 -07:00
Wallas Henrique 8baf85e4e9
[Doc] Compatibility matrix for mutually exclusive features (#8512)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-11 11:18:50 -07:00
sixgod 6cf1167c1a
[Model] Add GLM-4v support and meet vllm==0.6.2 (#9242) 2024-10-11 17:36:13 +00:00
Tyler Michael Smith 7342a7d7f8
[Model] Support Mamba (#6484) 2024-10-11 15:40:06 +00:00
Cyrus Leung e808156f30
[Misc] Collect model support info in a single process per model (#9233) 2024-10-11 11:08:11 +00:00
whyiug 04de9057ab
[Model] support input image embedding for minicpmv (#9237) 2024-10-10 15:00:47 +00:00
Jiangtao Hu dc4aea677a
[Doc] Fix VLM prompt placeholder sample bug (#9170) 2024-10-09 08:59:42 +00:00
Sayak Paul 1874c6a1b0
[Doc] Update vlm.rst to include an example on videos (#9155)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-08 18:12:29 +00:00
Cyrus Leung 151ef4efd2
[Model] Support NVLM-D and fix QK Norm in InternViT (#9045)
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2024-10-07 11:55:12 +00:00
Cyrus Leung b22b798471
[Model] PP support for embedding models and update docs (#9090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-10-06 16:35:27 +08:00
Cyrus Leung f22619fe96
[Misc] Remove user-facing error for removed VLM args (#9104) 2024-10-06 01:33:52 -07:00
Roger Wang 26aa325f4f
[Core][VLM] Test registration for OOT multimodal models (#8717)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:38:25 -07:00
Cyrus Leung 0e36fd4909
[Misc] Move registry to its own file (#9064) 2024-10-04 10:01:37 +00:00
Murali Andoorveedu 0f6d7a9a34
[Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
Nick Hill 18c2e30c57
[Doc] Update Granite model docs (#9025) 2024-10-03 02:42:24 +00:00
Cyrus Leung 4f341bd4bf
[Doc] Update list of supported models (#8987) 2024-10-02 00:35:39 +08:00
whyiug e01ab595d8
[Model] support input embeddings for qwen2vl (#8856) 2024-09-30 03:16:10 +00:00
Cyrus Leung 3b00b9c26c
[Core] rename `PromptInputs` and `inputs` (#8876) 2024-09-26 20:35:15 -07:00
Roger Wang 4bb98f2190
[Misc] Update config loading for Qwen2-VL and remove Granite (#8837) 2024-09-26 07:45:30 -07:00
Roger Wang e2c6e0a829
[Doc] Update doc for Transformers 4.45 (#8817) 2024-09-25 13:29:48 -07:00
Chen Zhang 770ec6024f
[Model] Add support for the multi-modal Llama 3.2 model (#8811)
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-25 13:29:32 -07:00
Simon Mo 4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility (#8760)" (#8810) 2024-09-25 10:36:26 -07:00
Cyrus Leung 28e1299e60
rename PromptInputs and inputs with backward compatibility (#8760) 2024-09-25 09:36:47 -07:00
Simon Mo 3185fb0cca
Revert "[Core] Rename `PromptInputs` to `PromptType`, and `inputs` to `prompt`" (#8750) 2024-09-24 05:45:20 +00:00
litianjian 5b59532760
[Model][VLM] Add LLaVA-Onevision model support (#8486)
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-22 10:51:44 -07:00
Cyrus Leung 0057894ef7
[Core] Rename `PromptInputs` and `inputs` (#8673) 2024-09-20 19:00:54 -07:00
Niklas Muennighoff 3b63de9353
[Model] Add OLMoE (#7922) 2024-09-20 09:31:41 -07:00
Jiaxin Shan 260d40b5ea
[Core] Support Lora lineage and base model metadata management (#6315) 2024-09-20 06:20:56 +00:00
Geun, Lim e18749ff09
[Model] Support Solar Model (#8386)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-18 11:04:00 -06:00
ywfang 8a0cf1ddc3
[Model] support minicpm3 (#8297)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
Cyrus Leung a84e598e21
[CI/Build] Reorganize models tests (#7820) 2024-09-13 10:20:06 -07:00
Alex Brooks c6202daeed
[Model] Support multiple images for qwen-vl (#8247)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:54 -07:00
Patrick von Platen d394787e52
Pixtral (#8377)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-11 14:41:55 -07:00
Yang Fan 3b7fea770f
[Model][VLM] Add Qwen2-VL model support (#7905)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-11 09:31:19 -07:00
Yangshen⚡Deng 6a512a00df
[model] Support for Llava-Next-Video model (#7559)
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-10 22:21:36 -07:00
Isotr0py e807125936
[Model][VLM] Support multi-images inputs for InternVL2 models (#8201) 2024-09-07 16:38:23 +08:00
Cyrus Leung 2f707fcb35
[Model] Multi-input support for LLaVA (#8238) 2024-09-07 02:57:24 +00:00
Jiaxin Shan db3bf7c991
[Core] Support load and unload LoRA in api server (#6566)
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-09-05 18:10:33 -07:00
sroy745 2febcf2777
[Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (#7962) 2024-09-05 16:25:29 -04:00
Alex Brooks 9da25a88aa
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-05 12:48:10 +00:00
Cyrus Leung 288a938872
[Doc] Indicate more information about supported modalities (#8181) 2024-09-05 10:51:53 +00:00
Wenxiang 1248e8506a
[Model] Adding support for MSFT Phi-3.5-MoE (#7729)
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
2024-08-30 13:42:57 -06:00
Yohan Na dc13e99348
[MODEL] add Exaone model support (#7819) 2024-08-29 23:34:20 -07:00
Peter Salas 57792ed469
[Doc] Fix incorrect docs from #7615 (#7788) 2024-08-22 10:02:06 -07:00
zifeitong df1a21131d
[Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710) 2024-08-22 09:36:24 +08:00
Peter Salas 1ca0d4f86b
[Model] Add UltravoxModel and UltravoxConfig (#7615) 2024-08-21 22:49:39 +00:00
Roger Wang 4506641212
[Doc] Section for Multimodal Language Models (#7719) 2024-08-20 23:24:01 -07:00
Cyrus Leung 3f674a49b5
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) 2024-08-14 17:55:42 +00:00
Peter Salas 00c3d68e45
[Frontend][Core] Add plumbing to support audio language models (#7446) 2024-08-13 17:39:33 +00:00
Roger Wang e6e42e4b17
[Core][VLM] Support image embeddings as input (#6613) 2024-08-12 16:16:06 +08:00
Jee Jee Li 757ac70a64
[Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 (#7273) 2024-08-08 14:02:41 +00:00
Stas Bekman 0e12cd67a8
[Doc] add online speculative decoding example (#7243) 2024-08-07 09:58:02 -07:00
Thomas Parnell 789937af2e
[Doc] [SpecDecode] Update MLPSpeculator documentation (#7100)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-08-05 23:29:43 +00:00
Jee Jee Li 179a6a36f2
[Model] Refactor MiniCPMV (#7020)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 08:12:41 +00:00
Alphi 2f4e108f75
[Bugfix] Clean up MiniCPM-V (#6939)
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-31 14:39:19 +00:00
Isotr0py 7cbd9ec7a9
[Model] Initialize support for InternVL2 series models (#6514)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00
Cyrus Leung 1ad86acf17
[Model] Initial support for BLIP-2 (#5920)
Co-authored-by: ywang96 <ywang@roblox.com>
2024-07-27 11:53:07 +00:00
Roger Wang ecb33a28cb
[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860) 2024-07-27 09:54:14 +00:00
Michael Goin 281977bd6e
[Doc] Add Nemotron to supported model docs (#6843) 2024-07-26 17:32:44 -04:00
Alphi 9e169a4c61
[Model] Adding support for MiniCPM-V (#4087) 2024-07-24 20:59:30 -07:00
Woosuk Kwon cb1362a889
[Docs] Announce llama3.1 support (#6688) 2024-07-23 08:18:15 -07:00
Roger Wang 22fa2e35cb
[VLM][Model] Support image input for Chameleon (#6633) 2024-07-22 23:50:48 -07:00
Cyrus Leung 739b61a348
[Frontend] Refactor prompt processing (#4028)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
Cyrus Leung 5bf35a91e4
[Doc][CI/Build] Update docs and tests to use `vllm serve` (#6431) 2024-07-17 07:43:21 +00:00
Jiaxin Shan 94162beb9f
[Doc] Fix the lora adapter path in server startup script (#6230) 2024-07-16 10:11:04 -07:00
Yuan Tang 6ef3bf912c
Remove unnecessary trailing period in spec_decode.rst (#6405) 2024-07-14 07:58:09 +00:00
Isotr0py 540c0368b1
[Model] Initialize Fuyu-8B support (#3924)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-14 05:27:14 +00:00
Roger Wang 6206dcb29e
[Model] Add PaliGemma (#5189)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-07-07 09:25:50 +08:00
Cyrus Leung 9389380015
[Doc] Move guide for multimodal model and other improvements (#6168) 2024-07-06 17:18:59 +08:00
Roger Wang 175c43eca4
[Doc] Reorganize Supported Models by Type (#6167) 2024-07-06 05:59:36 +00:00
Cyrus Leung ae96ef8fbd
[VLM] Calculate maximum number of multi-modal tokens by model (#6121) 2024-07-04 16:37:23 -07:00
xwjiang2010 d9e98f42e4
[vlm] Remove vision language config. (#6089)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
Cyrus Leung 9831aec49f
[Core] Dynamic image size support for VLMs (#5276)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-07-02 20:34:00 -07:00
Mor Zusman 9d6a8daa87
[Model] Jamba support (#4115)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 23:11:29 +00:00
xwjiang2010 98d6682cd1
[VLM] Remove `image_input_type` from VLM config (#5852)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
Cyrus Leung 5cbe8d155c
[Core] Registry for processing model inputs (#5214)
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-28 12:09:56 +00:00
Woosuk Kwon 79c92c7c8a
[Model] Add Gemma 2 (#5908) 2024-06-27 13:33:56 -07:00
Cyrus Leung 96354d6a29
[Model] Add base class for LoRA-supported models (#5018) 2024-06-27 16:03:04 +08:00
Roger Wang 3aa7b6cf66
[Misc][Doc] Add Example of using OpenAI Server with VLM (#5832) 2024-06-25 20:34:25 -07:00
Cyrus Leung f23871e9ee
[Doc] Add notice about breaking changes to VLMs (#5818) 2024-06-25 01:25:03 -07:00
Michael Goin 1744cc99ba
[Doc] Add Phi-3-medium to list of supported models (#5788) 2024-06-24 10:48:55 -07:00
Isotr0py daef218b55
[Model] Initialize Phi-3-vision support (#4986) 2024-06-17 19:34:33 -07:00
Cyrus Leung 0ce7b952f8
[Doc] Update LLaVA docs (#5437)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-13 11:22:07 -07:00
Cade Daniel 89ec06c33b
[Docs] [Spec decode] Fix docs error in code example (#5427) 2024-06-11 10:31:56 -07:00
Cade Daniel 4c2ffb28ff
[Speculative decoding] Initial spec decode docs (#5400) 2024-06-11 10:15:40 -07:00
SangBin Cho 246598a6b1
[CI] docfix (#5410)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-11 01:28:50 -07:00
Roger Wang 856c990041
[Docs] Add Docs on Limitations of VLM Support (#5383) 2024-06-10 09:53:50 -07:00
Cyrus Leung 6b29d6fe70
[Model] Initial support for LLaVA-NeXT (#4199)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-10 12:47:15 +00:00
Roger Wang 7a9cb294ae
[Frontend] Add OpenAI Vision API Support (#5237)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-06-07 11:23:32 -07:00
Cyrus Leung 7a64d24aad
[Core] Support image processor (#4197) 2024-06-02 22:56:41 -07:00
Nick Hill 657579113f
[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171) 2024-05-31 17:20:19 -07:00
Eric Xihui Lin 8e192ff967
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-24 22:00:52 -07:00
Isotr0py f12c3b5b3d
[Model] Add Phi-2 LoRA support (#4886) 2024-05-21 14:24:17 +09:00
Zhuohan Li ac1fbf7fd2
[Doc] Shorten README by removing supported model list (#4796) 2024-05-13 16:23:54 -07:00
SangBin Cho e7c46b9527
[Scheduler] Warning upon preemption and Swapping (#4647)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-13 23:50:44 +09:00
Simon Mo 51d4094fda
chunked-prefill-doc-syntax (#4603)
Fix the docs: https://docs.vllm.ai/en/latest/models/performance.html

Co-authored-by: sang <rkooo567@gmail.com>
2024-05-10 14:13:23 +09:00
SangBin Cho 36fb68f947
[Doc] Chunked Prefill Documentation (#4580) 2024-05-04 00:18:00 -07:00
Isotr0py fbf152d976
[Bugfix][Model] Refactor OLMo model to support new HF format in transformers 4.40.0 (#4324)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-25 09:35:56 -07:00
Caio Mendes 96e90fdeb3
[Model] Adds Phi-3 support (#4298) 2024-04-25 03:06:57 +00:00
xiaoji 7f2593b164
[Doc]: Update the doc of adding new models (#4236) 2024-04-21 09:57:08 -07:00
Harry Mellor fe7d648fe5
Don't show default value for flags in `EngineArgs` (#4223)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-21 09:15:28 -07:00
Harry Mellor 682789d402
Fix missing docs and out of sync `EngineArgs` (#4219)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-19 20:51:33 -07:00
Simon Mo 705578ae14
[Docs] document that Meta Llama 3 is supported (#4175) 2024-04-18 10:55:48 -07:00
Sanger Steel d619ae2d19
[Doc] Add better clarity for tensorizer usage (#4090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-15 13:28:25 -07:00
Simon Mo aceb17cf2d
[Docs] document that mixtral 8x22b is supported (#4073) 2024-04-14 14:35:55 -07:00
Sanger Steel 711a000255
[Frontend] [Core] feat: Add model loading using `tensorizer` (#3476) 2024-04-13 17:13:01 -07:00
youkaichao e35397468f
[Doc] Add doc to state our model support policy (#3948)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-10 17:03:02 +00:00
ywfang b4543c8f6b
[Model] add minicpm (#3893) 2024-04-08 18:28:36 +08:00
youkaichao 95baec828f
[Core] enable out-of-tree model register (#3871) 2024-04-06 17:11:41 -07:00
Sean Gallen 78107fa091
[Doc] Add asynchronous engine arguments to documentation. (#3810)
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-04 21:52:01 -07:00
wenyujin333 d6ea427f04
[Model] Add support for Qwen2MoeModel (#3346) 2024-03-28 15:19:59 +00:00
Woosuk Kwon 6d9aa00fc4
[Docs] Add Command-R to supported models (#3669) 2024-03-27 15:20:00 -07:00
Megha Agarwal e24336b5a7
[Model] Add support for DBRX (#3660) 2024-03-27 13:01:46 -07:00
Woosuk Kwon e66b629c04
[Misc] Minor fix in KVCache type (#3652) 2024-03-26 23:14:06 -07:00
Jee Li 76879342a3
[Doc] Add LoRA support (#3649) 2024-03-27 02:06:46 +00:00
Lalit Pradhan 4c07dd28c0
[🚀 Ready to be merged] Added support for Jais models (#3183) 2024-03-21 09:45:24 +00:00
Simon Mo ef65dcfa6f
[Doc] Add docs about OpenAI compatible server (#3288) 2024-03-18 22:05:34 -07:00
Philipp Moritz 657061fdce
[docs] Add LoRA support information for models (#3299) 2024-03-11 00:54:51 -07:00
Sage Moore ce4f5a29fb
Add Automatic Prefix Caching (#2762)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-03-02 00:50:01 -08:00
Ganesh Jagadeesan a8683102cc
multi-lora documentation fix (#3064) 2024-02-27 21:26:15 -08:00
Woosuk Kwon 8b430d7dea
[Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM (#3046) 2024-02-26 20:23:50 -08:00
张大成 48a8f4a7fd
Support Orion model (#2539)
Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-02-26 19:17:06 -08:00
Zhuohan Li a9c8212895
[FIX] Add Gemma model to the doc (#2966) 2024-02-21 09:46:15 -08:00
Isotr0py ab3a5a8259
Support OLMo models. (#2832) 2024-02-18 21:05:15 -08:00
jvmncs 8f36444c4f
multi-LoRA as extra models in OpenAI server (#2775)
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
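```terminal
$ # Local snapshot of the LoRA adapter to serve
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ # --lora-modules is an option of the OpenAI-compatible server entrypoint
$ python -m vllm.entrypoints.openai.api_server \
 --model meta-llama/Llama-2-7b-hf \
 --enable-lora \
 --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```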
The above server lists three separate entries when the user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in `EngineArgs`.

No work has been done here to scope client permissions to specific models.
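
As a rough illustration of the listing behaviour described above, the Python sketch below derives the model names such a server would expose from the base model and the `name=path` pairs passed to `--lora-modules`. The helper `served_model_names` is hypothetical and is not part of vLLM; the server's actual parsing and response format may differ.

```python
from typing import List


def served_model_names(base_model: str, lora_modules: List[str]) -> List[str]:
    """Return the base model name followed by one name per LoRA module.

    Hypothetical helper for illustration only; not vLLM's implementation.
    Each entry in lora_modules is a "name=path" string, as on the command line.
    """
    names = [base_model]
    for spec in lora_modules:
        # Split on the first "=" only, so paths containing "=" stay intact.
        name, sep, path = spec.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected name=path, got {spec!r}")
        names.append(name)
    return names


# Same adapter path registered under two names, as in the command above.
lora_path = "~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/"
print(served_model_names(
    "meta-llama/Llama-2-7b-hf",
    [f"sql-lora={lora_path}", f"sql-lora2={lora_path}"],
))
```

Two module names that share one path still yield two distinct entries, which matches the note that `sql-lora` and `sql-lora2` may point to the same underlying LoRA.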
2024-02-17 12:00:48 -08:00
Philipp Moritz 317b29de0f
Remove Yi model definition, please use `LlamaForCausalLM` instead (#2854)
Co-authored-by: Roy <jasonailu87@gmail.com>
2024-02-13 14:22:22 -08:00
Philipp Moritz 4ca2c358b1
Add documentation section about LoRA (#2834) 2024-02-12 17:24:45 +01:00
Fengzhe Zhou cd9e60c76c
Add Internlm2 (#2666) 2024-02-01 09:27:40 -08:00
Junyang Lin 2832e7b9f9
fix names and license for Qwen2 (#2589) 2024-01-24 22:37:51 -08:00
LastWhisper 223c19224b
Fix the syntax error in the doc of supported_models (#2584) 2024-01-24 11:22:51 -08:00
Junyang Lin 94b5edeb53
Add qwen2 (#2495) 2024-01-22 14:34:21 -08:00
Hyunsung Lee e1957c6ebd
Add StableLM3B model (#2372) 2024-01-16 20:32:40 -08:00
Zhuohan Li fd4ea8ef5c
Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
Ronen Schaffer c17daa9f89
[Docs] Fix broken links (#2222) 2023-12-20 12:43:42 -08:00
avideci de60a3fb93
Added DeciLM-7b and DeciLM-7b-instruct (#2062) 2023-12-19 02:29:33 -08:00
Suhong Moon 3ec8c25cd0
[Docs] Update documentation for gpu-memory-utilization option (#2162) 2023-12-17 10:51:57 -08:00
Woosuk Kwon f8c688d746
[Minor] Add Phi 2 to supported models (#2159) 2023-12-17 02:54:57 -08:00
Antoni Baum 21d93c140d
Optimize Mixtral with expert parallelism (#2090) 2023-12-13 23:55:07 -08:00
Woosuk Kwon 096827c284
[Docs] Add notes on ROCm-supported models (#2087) 2023-12-13 09:45:34 -08:00
Woosuk Kwon 4ff0203987
Minor fixes for Mixtral (#2015) 2023-12-11 09:16:15 -08:00
Peter Götz d940ce497e
Fix typo in adding_model.rst (#1947)
adpated -> adapted
2023-12-06 10:04:26 -08:00
Woosuk Kwon e5452ddfd6
Normalize head weights for Baichuan 2 (#1876) 2023-11-30 20:03:58 -08:00
Simon Mo 0f621c2c7d
[Docs] Add information about using shared memory in docker (#1845) 2023-11-29 18:33:56 -08:00
Casper a921d8be9d
[DOCS] Add engine args documentation (#1741) 2023-11-22 12:31:27 -08:00
liuyhwangyh edb305584b
Support download models from www.modelscope.cn (#1588) 2023-11-17 20:38:31 -08:00
Zhuohan Li 0fc280b06c
Update the adding-model doc according to the new refactor (#1692) 2023-11-16 18:46:26 -08:00
Zhuohan Li 415d109527
[Fix] Update Supported Models List (#1690) 2023-11-16 14:47:26 -08:00
Usama Ahmed 0967102c6d
fixing typo in `tiiuae/falcon-rw-7b` model name (#1226) 2023-09-29 13:40:25 -07:00
Woosuk Kwon 202351d5bf
Add Mistral to supported model list (#1221) 2023-09-28 14:33:04 -07:00
Zhuohan Li 002800f081
Align vLLM's beam search implementation with HF generate (#857) 2023-09-04 17:29:42 -07:00
Woosuk Kwon 55b28b1eee
[Docs] Minor fixes in supported models (#920)
* Minor fix in supported models

* Add another small fix for Aquila model

---------

Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-08-31 16:28:39 -07:00
Zhuohan Li 14f9c72bfd
Update Supported Model List (#825) 2023-08-22 11:51:44 -07:00
Uranus 1b151ed181
Fix baichuan doc style (#748) 2023-08-13 20:57:31 -07:00
Zhuohan Li f7389f4763
[Doc] Add Baichuan 13B to supported models (#656) 2023-08-02 16:45:12 -07:00
Zhuohan Li 1b0bd0fe8a
Add Falcon support (new) (#592) 2023-08-02 14:04:39 -07:00
Zhuohan Li df5dd3c68e
Add Baichuan-7B to README (#494) 2023-07-25 15:25:12 -07:00
Zhuohan Li 6fc2a38b11
Add support for LLaMA-2 (#505) 2023-07-20 11:38:27 -07:00
Andre Slavescu c894836108
[Model] Add support for GPT-J (#226)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Woosuk Kwon ffa6d2f9f9
[Docs] Fix typo (#346) 2023-07-03 16:51:47 -07:00
Woosuk Kwon 404422f42e
[Model] Add support for MPT (#334) 2023-07-03 16:47:53 -07:00
Woosuk Kwon e41f06702c
Add support for BLOOM (#331) 2023-07-03 13:12:35 -07:00
Woosuk Kwon 665c48963b
[Docs] Add GPTBigCode to supported models (#213) 2023-06-22 15:05:11 -07:00
Woosuk Kwon 794e578de0
[Minor] Fix URLs (#166) 2023-06-19 22:57:14 -07:00
Woosuk Kwon b7e62d3454
Fix repo & documentation URLs (#163) 2023-06-19 20:03:40 -07:00
Zhuohan Li 0b32a987dd
Add and list supported models in README (#161) 2023-06-20 10:57:46 +08:00
Woosuk Kwon dcda03b4cb
Write README and front page of doc (#147) 2023-06-18 03:19:38 -07:00
Woosuk Kwon 0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
Woosuk Kwon 456941cfe4
[Docs] Write the `Adding a New Model` section (#138) 2023-06-05 20:01:26 -07:00
Woosuk Kwon 62ec38ea41
Document supported models (#127) 2023-06-02 22:35:17 -07:00