Isotr0py
2cbeedad09
[Docs] Document Phi-4 support ( #12362 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-23 19:18:51 +00:00
Gregory Shtrasberg
e97f802b2d
[FP8][Kernel] Dynamic kv cache scaling factors computation ( #11906 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00
Cyrus Leung
d07efb31c5
[Doc] Troubleshooting errors during model inspection ( #12351 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-23 22:46:58 +08:00
youkaichao
511627445e
[doc] explain common errors around torch.compile ( #12340 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-23 14:56:02 +08:00
Russell Bryant
7551a34032
[Docs] Document vulnerability disclosure process ( #12326 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-01-23 03:44:09 +00:00
Michael Goin
01a55941f5
[Docs] Update FP8 KV Cache documentation ( #12238 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-23 11:18:09 +08:00
Hongxia Yang
09ccc9c8f7
[Documentation][AMD] Add information about prebuilt ROCm vLLM docker for perf validation purpose ( #12281 )
...
Signed-off-by: Hongxia Yang <hongxyan@amd.com>
2025-01-22 07:49:22 +08:00
Cyrus Leung
96912550c8
[Misc] Rename `MultiModalInputsV2 -> MultiModalInputs` ( #12244 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-21 07:31:19 +00:00
Gregory Shtrasberg
d4b62d4641
[AMD][Build] Porting dockerfiles from the ROCm/vllm fork ( #11777 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-01-21 12:22:23 +08:00
Isotr0py
83609791d2
[Model] Add Qwen2 PRM model support ( #12202 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-20 14:59:46 +08:00
Harry Mellor
3ea7b94523
Move linting to `pre-commit` ( #11975 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-20 14:58:01 +08:00
Roger Wang
81763c58a0
[V1] Add V1 support of Qwen2-VL ( #12128 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: imkero <kerorek@outlook.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-19 19:52:13 +08:00
Isotr0py
02798ecabe
[Model] Port deepseek-vl2 processor, remove dependency ( #12169 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-18 13:59:39 +08:00
Hongxia Yang
c09503ddd6
[AMD][CI/Build][Bugfix] use pytorch stale wheel ( #12172 )
...
Signed-off-by: hongxyan <hongxyan@amd.com>
2025-01-18 11:15:53 +08:00
Yuan Tang
1475847a14
[Doc] Add instructions on using Podman when SELinux is active ( #12136 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-17 04:45:36 +00:00
Isotr0py
62b06ba23d
[Model] Add support for deepseek-vl2-tiny model ( #12068 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-16 17:14:48 +00:00
Cyrus Leung
f8ef146f03
[Doc] Add documentation for specifying model architecture ( #12105 )
2025-01-16 15:53:43 +08:00
RunningLeon
97eb97b5a4
[Model]: Support internlm3 ( #12037 )
2025-01-15 11:35:17 +00:00
Kyle Sayers
3f9b7ab9f5
[Doc] Update examples to remove SparseAutoModelForCausalLM ( #12062 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-01-15 06:36:01 +00:00
Harry Mellor
c9d6ff530b
Explain where the engine args go when using Docker ( #12041 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-14 16:05:50 +00:00
TJian
8a1f938e6f
[Doc] Update Quantization Hardware Support Documentation ( #12025 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-01-14 04:37:52 +00:00
Woosuk Kwon
1a401252b5
[Docs] Add Sky Computing Lab to project intro ( #12019 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-13 17:24:36 -08:00
Harry Mellor
e8c23ff989
[Doc] Organise installation documentation into categories and tabs ( #11935 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-13 12:27:36 +00:00
Roger Wang
cd8249903f
[Doc][V1] Update model implementation guide for V1 support ( #11998 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-01-13 11:58:54 +00:00
Akshat Tripathi
8bddb73512
[Hardware][CPU] Multi-LoRA implementation for the CPU backend ( #11100 )
...
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-12 13:01:52 +00:00
Isotr0py
f967e51f38
[Model] Initialize support for Deepseek-VL2 models ( #11578 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-12 00:17:24 -08:00
Rafael Vasquez
43f3d9e699
[CI/Build] Add markdown linter ( #11857 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2025-01-12 00:17:13 -08:00
Cyrus Leung
a991f7d508
[Doc] Basic guide for writing unit tests for new models ( #11951 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-11 21:27:24 +08:00
Li, Jiang
aa1e77a19c
[Hardware][CPU] Support MOE models on x86 CPU ( #11831 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-01-10 11:07:58 -05:00
Harry Mellor
482cdc494e
[Doc] Rename offline inference examples ( #11927 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 23:50:29 +08:00
Cyrus Leung
12664ddda5
[Doc] [1/N] Initial guide for merged multi-modal processor ( #11925 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-10 14:30:25 +00:00
Harry Mellor
d85c47d6ad
Replace "online inference" with "online serving" ( #11923 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 12:05:56 +00:00
Cyrus Leung
3de2b1eafb
[Doc] Show default pooling method in a table ( #11904 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-10 11:25:20 +08:00
Cyrus Leung
c3cf54dda4
[Doc][5/N] Move Community and API Reference to the bottom ( #11896 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-01-10 03:10:12 +00:00
Charles Frye
36f5303578
[Docs] Add Modal to deployment frameworks ( #11907 )
2025-01-09 23:26:37 +00:00
Cyrus Leung
9a228348d2
[Misc] Provide correct Pixtral-HF chat template ( #11891 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-09 10:19:37 -07:00
Cyrus Leung
65097ca0af
[Doc] Add model development API Reference ( #11884 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-09 09:43:40 +00:00
Guspan Tanadi
a732900efc
[Doc] Intended links Python multiprocessing library ( #11878 )
2025-01-09 05:39:39 +00:00
Michael Goin
730e9592e9
[Doc] Recommend uv and python 3.12 for quickstart guide ( #11849 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-01-09 11:37:48 +08:00
Cyrus Leung
5984499e47
[Doc] Expand Multimodal API Reference ( #11852 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-08 17:14:14 +00:00
Cyrus Leung
6cd40a5bfe
[Doc][4/N] Reorganize API Reference ( #11843 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-08 21:34:44 +08:00
Harry Mellor
aba8d6ee00
[Doc] Move examples into categories ( #11840 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-08 13:09:53 +00:00
Wallas Henrique
cfd3219f58
[Hardware][Apple] Native support for macOS Apple Silicon ( #11696 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-08 16:35:49 +08:00
Simon Mo
a1b2b8606e
[Docs] Update sponsor name: 'Novita' to 'Novita AI' ( #11833 )
2025-01-07 23:05:46 -08:00
youkaichao
ad9f1aa679
[doc] update wheels url ( #11830 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-08 14:36:49 +08:00
Simon Mo
259abd8953
[Docs] reorganize sponsorship page ( #11639 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-01-07 21:16:08 -08:00
Harry Mellor
5950f555a1
[Doc] Group examples into categories ( #11782 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-08 09:20:12 +08:00
sroy745
973f5dc581
[Doc]Add documentation for using EAGLE in vLLM ( #11417 )
...
Signed-off-by: Sourashis Roy <sroy@roblox.com>
2025-01-07 19:19:12 +00:00
Cyrus Leung
c0efe92d8b
[Doc] Add note to `gte-Qwen2` models ( #11808 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-07 21:50:58 +08:00
youkaichao
d9fa1c05ad
[doc] update how pip can install nightly wheels ( #11806 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-07 21:42:58 +08:00
Roger Wang
2de197bdd4
[V1] Support audio language models on V1 ( #11733 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-01-07 19:47:36 +08:00
youkaichao
869e829b85
[doc] add doc to explain how to use uv ( #11773 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-07 18:41:17 +08:00
Roger Wang
8082ad7950
[V1][Doc] Update V1 support for `LLaVa-NeXT-Video` ( #11798 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-01-07 09:55:39 +00:00
Russell Bryant
ce1917fcf2
[Doc] Create a vulnerability management team ( #9925 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-01-06 22:57:32 -08:00
Cyrus Leung
8ceffbf315
[Doc][3/N] Reorganize Serving section ( #11766 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-07 11:20:01 +08:00
Roger Wang
91b361ae89
[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision ( #11685 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 19:58:16 +00:00
youkaichao
4ca5d40adc
[doc] explain how to add interleaving sliding window support ( #11771 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-06 21:57:44 +08:00
Cyrus Leung
ee77fdb5de
[Doc][2/N] Reorganize Models and Usage sections ( #11755 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 21:40:31 +08:00
Suraj Deshmukh
2a622d704a
k8s-config: Update the secret to use stringData ( #11679 )
...
Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
2025-01-06 08:01:22 +00:00
Cyrus Leung
402d378360
[Doc] [1/N] Reorganize Getting Started section ( #11645 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 02:18:33 +00:00
Alberto Ferrer
d1d49397e7
Update bnb.md with example for OpenAI ( #11718 )
2025-01-04 06:29:02 +00:00
Hust_YangXian
9c93636d84
Update tool_calling.md ( #11701 )
2025-01-04 06:16:30 +00:00
Sachin Varghese
2f1e8e8f54
Update default max_num_batch_tokens for chunked prefill ( #11694 )
2025-01-03 00:25:53 +00:00
Chunyang Wen
84c35c374a
According to vllm.EngineArgs, the name should be distributed_executor_backend ( #11689 )
2025-01-02 18:14:16 +00:00
Cyrus Leung
365801fedd
[VLM] Add max-count checking in data parser for single image models ( #11661 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-12-31 22:15:21 -08:00
Roger Wang
e7c7c5e822
[V1][VLM] V1 support for selected single-image models. ( #11632 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-12-31 21:17:22 +00:00
Matthias Vogler
a2a40bcd0d
[Model][LoRA]LoRA support added for MolmoForCausalLM ( #11439 )
...
Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-30 17:33:06 -08:00
youkaichao
b12e87f942
[platforms] enable platform plugins ( #11602 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-30 20:24:45 +08:00
Cyrus Leung
32b4c63f02
[Doc] Convert list tables to MyST ( #11594 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-29 15:56:22 +08:00
youkaichao
328841d002
[bugfix] interleaving sliding window for cohere2 model ( #11583 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-28 16:55:42 +00:00
Cyrus Leung
d427e5cfda
[Doc] Minor documentation fixes ( #11580 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-28 21:53:59 +08:00
Isotr0py
d34be24bb1
[Model] Support InternLM2 Reward models ( #11571 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-28 06:14:10 +00:00
Robert Shaw
df04dffade
[V1] [4/N] API Server: ZMQ/MP Utilities ( #11541 )
2024-12-28 01:45:08 +00:00
Cyrus Leung
101418096f
[VLM] Support caching in merged multi-modal processor ( #11396 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-27 17:22:48 +00:00
Chen1022
5ce4627a7e
[Doc] Add xgrammar in doc ( #11549 )
...
Signed-off-by: ccjincong <chenjincong11@gmail.com>
2024-12-27 13:05:10 +00:00
AlexHe99
d003f3ea39
Update deploying_with_k8s.md with AMD ROCm GPU example ( #11465 )
...
Signed-off-by: Alex He <alehe@amd.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-27 10:00:04 +00:00
Robert Shaw
0c0c2015c5
Update openai_compatible_server.md ( #11536 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-12-26 16:26:18 -08:00
Simon Mo
82d24f7aac
[Docs] Document Deepseek V3 support ( #11535 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-12-26 16:21:56 -08:00
Isotr0py
b85a977822
[Doc] Add video example to openai client for multimodal ( #11521 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-26 17:31:29 +00:00
Roger Wang
7492a36207
[Doc] Add `QVQ` and `QwQ` to the list of supported models ( #11509 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-12-26 09:44:32 +00:00
Cyrus Leung
6ad909fdda
[Doc] Improve GitHub links ( #11491 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-25 14:49:26 -08:00
Cyrus Leung
3f3e92e1f2
[Model] Automatic conversion of classification and reward models ( #11469 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-24 18:22:22 +00:00
Cyrus Leung
9edca6bf8f
[Frontend] Online Pooling API ( #11457 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-24 17:54:30 +08:00
Rafael Vasquez
32aa2059ad
[Docs] Convert rST to MyST (Markdown) ( #11145 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-12-23 22:35:38 +00:00
Yuan Tang
2e726680b3
[Bugfix] torch nightly version in ROCm installation guide ( #11423 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-12-23 17:20:22 +00:00
youkaichao
5d2248d81a
[doc] explain nccl requirements for rlhf ( #11381 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-20 13:00:56 -08:00
omer-dayan
995f56236b
[Core] Loading model from S3 using RunAI Model Streamer as optional loader ( #10192 )
...
Signed-off-by: OmerD <omer@run.ai>
2024-12-20 16:46:24 +00:00
youkaichao
1ecc645b8f
[doc] backward compatibility for 0.6.4 ( #11359 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-19 21:33:53 -08:00
youkaichao
7801f56ed7
[ci][gh200] dockerfile clean up ( #11351 )
...
Signed-off-by: drikster80 <ed.sealing@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: drikster80 <ed.sealing@gmail.com>
Co-authored-by: cenzhiyao <2523403608@qq.com>
2024-12-19 18:13:06 -08:00
Yehoshua Cohen
6c7f881541
[Model] Add JambaForSequenceClassification model ( #10860 )
...
Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-19 22:48:06 +08:00
Travis Johnson
17ca964273
[Model] IBM Granite 3.1 ( #11307 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-12-19 11:27:24 +08:00
kYLe
66d4b16724
[Frontend] Add OpenAI API support for input_audio ( #11027 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-16 22:09:58 -08:00
youkaichao
35bae114a8
fix gh200 tests on main ( #11246 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-16 17:22:38 -08:00
bk-TurbaAI
35ffa682b1
[Docs] hint to enable use of GPU performance counters in profiling tools for multi-node distributed serving ( #11235 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-12-16 22:20:39 +00:00
Jani Monoses
bddbbcb132
[Model] Support Cohere2ForCausalLM (Cohere R7B) ( #11203 )
2024-12-16 09:56:19 +00:00
cennn
b3b1526f03
WIP: [CI/Build] simplify Dockerfile build for ARM64 / GH200 ( #11212 )
...
Signed-off-by: drikster80 <ed.sealing@gmail.com>
Co-authored-by: drikster80 <ed.sealing@gmail.com>
2024-12-16 09:20:49 +00:00
AlexHe99
da6f409246
Update deploying_with_k8s.rst ( #10922 )
2024-12-15 16:33:58 -08:00
Kuntai Du
38e599d6a8
[Doc] add documentation for disaggregated prefilling ( #11197 )
...
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2024-12-15 13:31:16 -06:00
Jee Jee Li
15859f2357
[Misc] Upgrade bitsandbytes to the latest version 0.45.0 ( #11201 )
2024-12-15 03:03:06 +00:00
Russell Bryant
4863e5fba5
[Core] V1: Use multiprocessing by default ( #11074 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-12-13 16:27:32 -08:00
Cyrus Leung
0920ab9131
[Doc] Reorganize online pooling APIs ( #11172 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-14 00:22:22 +08:00
Cyrus Leung
eeec9e3390
[Frontend] Separate pooling APIs in offline inference ( #11129 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-13 10:40:07 +00:00
Jani Monoses
7cd7409142
PaliGemma 2 support ( #11142 )
2024-12-13 07:40:07 +00:00
Ramon Ziai
d4d5291cc2
fix(docs): typo in helm install instructions ( #11141 )
...
Signed-off-by: Ramon Ziai <ramon.ziai@bettermarks.com>
2024-12-12 17:36:32 +00:00
Pooya Davoodi
1da8f0e1dd
[Model] Add support for embedding model GritLM ( #10816 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2024-12-12 06:39:16 +00:00
Yuan Tang
24a36d6d5f
Update link to LlamaStack remote vLLM guide in serving_with_llamastack.rst ( #11112 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-12-12 02:39:21 +00:00
bingps
fd22220687
[Doc] Installed version of llmcompressor for int8/fp8 quantization ( #11103 )
...
Signed-off-by: Guangda Liu <bingps@users.noreply.github.com>
Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>
2024-12-11 15:43:24 +00:00
Cyrus Leung
cad5c0a6ed
[Doc] Update docs to refer to pooling models ( #11093 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 13:36:27 +00:00
Cyrus Leung
8f10d5e393
[Misc] Split up pooling tasks ( #10820 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 01:28:00 -08:00
Mor Zusman
ffa48c9146
[Model] PP support for Mamba-like models ( #10992 )
...
Signed-off-by: mzusman <mor.zusmann@gmail.com>
2024-12-10 21:53:37 -05:00
Maxime Fournioux
fe2e10c71b
Add example of helm chart for vllm deployment on k8s ( #9199 )
...
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
2024-12-10 09:19:27 +00:00
Joe Runde
980ad394a8
[Frontend] Use request id from header ( #10968 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-12-10 13:46:29 +08:00
Michael Goin
6d525288c1
[Docs] Add dedicated tool calling page to docs ( #10554 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-09 20:15:34 -05:00
Roger Wang
af7c4a92e6
[Doc][V1] Add V1 support column for multimodal models ( #10998 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-12-08 22:29:16 -08:00
Cyrus Leung
c889d5888b
[Doc] Explicitly state that PP isn't compatible with speculative decoding yet ( #10975 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 17:20:49 +00:00
Cyrus Leung
39e227c7ae
[Model] Update multi-modal processor to support Mantis(LLaVA) model ( #10711 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 17:10:05 +00:00
Cyrus Leung
1c768fe537
[Doc] Explicitly state that InternVL 2.5 is supported ( #10978 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 16:58:02 +00:00
Sam Stoelinga
7406274041
[Doc] add KubeAI to serving integrations ( #10837 )
...
Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>
2024-12-06 17:03:56 +00:00
Cyrus Leung
aa39a8e175
[Doc] Create a new "Usage" section ( #10827 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-05 11:19:35 +08:00
Daniele
e4c34c23de
[CI/Build] improve python-only dev setup ( #9621 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-12-04 21:48:13 +00:00
Kevin H. Luu
c92acb9693
[ci/build] Update vLLM postmerge ECR repo ( #10887 )
2024-12-04 09:01:20 +00:00
Aaron Pham
9323a3153b
[Core][Performance] Add XGrammar support for guided decoding and set it as default ( #10785 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-12-03 15:17:00 +08:00
Russell Bryant
ef51831ee8
[Doc] Add github links for source code references ( #10672 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-03 06:46:07 +00:00
Cyrus Leung
e95f275f57
[CI/Build] Update `mistral_common` version for tests and docs ( #10825 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-02 10:26:10 +00:00
youkaichao
169a0ff911
[doc] add warning about comparing hf and vllm outputs ( #10805 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-01 00:41:38 -08:00
Cyrus Leung
133707123e
[Model] Replace embedding models with pooling adapter ( #10769 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-01 08:02:54 +08:00
wangxiyuan
7e4bbda573
[doc] format fix ( #10789 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2024-11-30 11:38:40 +00:00
Isotr0py
c83919c7a6
[Model] Add Internlm2 LoRA support ( #5064 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-28 17:29:04 +00:00
sixgod
5fc5ce0fe4
[Model] Added GLM-4 series hf format model support vllm==0.6.4 ( #10561 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-11-28 14:53:31 +00:00
罗泽轩
278be671a3
[Doc] Update model in arch_overview.rst to match comment ( #10701 )
...
Signed-off-by: spacewander <spacewanderlzx@gmail.com>
2024-11-27 23:58:39 -08:00
shunxing12345
1209261e93
[Model] Support telechat2 ( #10311 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: xiangw2 <xiangw2@chinatelecom.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-11-27 11:32:35 +00:00
Murali Andoorveedu
db66e018ea
[Bugfix] Fix for Spec model TP + Chunked Prefill ( #10232 )
...
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Co-authored-by: Sourashis Roy <sroy@roblox.com>
2024-11-26 09:11:16 -08:00
Sage Moore
9a88f89799
custom allreduce + torch.compile ( #10121 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-25 22:00:16 -08:00
Sanket Kale
a6760f6456
[Feature] vLLM ARM Enablement for AARCH64 CPUs ( #9228 )
...
Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-11-25 18:32:39 -08:00
Shane A
9db713a1dc
[Model] Add OLMo November 2024 model ( #10503 )
2024-11-25 17:26:40 -05:00
Cyrus Leung
1b583cfefa
[Doc] Fix typos in docs ( #10636 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 10:15:45 -08:00
zhou fan
b1d920531f
[Model]: Add support for Aria model ( #10514 )
...
Signed-off-by: xffxff <1247714429@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-11-25 18:10:55 +00:00
fzyzcjy
2b0879bfc2
Super tiny little typo fix ( #10633 )
2024-11-25 13:08:30 +00:00
Cyrus Leung
ed46f14321
[Model] Support `is_causal` HF config field for Qwen2 model ( #10621 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 09:51:20 +00:00
Cyrus Leung
a30a605d21
[Doc] Add encoder-based models to Supported Models page ( #10616 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 06:34:07 +00:00
Maximilien de Bayser
214efc2c3c
Support Cross encoder models ( #10400 )
...
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
2024-11-24 18:56:20 -08:00
youkaichao
e4fbb14414
[doc] update the code to add models ( #10603 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-11-24 11:21:40 -08:00
Michael Goin
9afa014552
Add small example to metrics.rst ( #10550 )
2024-11-21 23:43:43 +00:00
Li, Jiang
63f1fde277
[Hardware][CPU] Support chunked-prefill and prefix-caching on CPU ( #10355 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-20 10:57:39 +00:00
wchen61
7629a9c6e5
[CI/Build] Support compilation with local cutlass path ( #10423 ) ( #10424 )
2024-11-19 21:35:50 -08:00
Cyrus Leung
b4be5a8adb
[Bugfix] Enforce no chunked prefill for embedding models ( #10470 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-20 05:12:51 +00:00
Russell Bryant
5390d6664f
[Doc] Add the start of an arch overview page ( #10368 )
2024-11-19 09:52:11 +00:00
Michael Goin
74f8c2cf5f
Add openai.beta.chat.completions.parse example to structured_outputs.rst ( #10433 )
2024-11-19 04:37:46 +00:00
Yan Ma
6b2d25efc7
[Hardware][XPU] AWQ/GPTQ support for xpu backend ( #10107 )
...
Signed-off-by: yan ma <yan.ma@intel.com>
2024-11-18 11:18:05 -07:00
ismael-dm
31894a2155
[Doc] Add documentation for Structured Outputs ( #9943 )
...
Signed-off-by: ismael-dm <ismaeldm99@gmail.com>
2024-11-18 09:52:12 -08:00
B-201
4186be8111
[Doc] Update doc for LoRA support in GLM-4V ( #10425 )
...
Signed-off-by: B-201 <Joy25810@foxmail.com>
2024-11-18 15:08:30 +00:00
youkaichao
755b85359b
[doc] add doc for the plugin system ( #10372 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-15 21:46:27 -08:00
Cyrus Leung
32e46e000f
[Frontend] Automatic detection of chat content format from AST ( #9919 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-16 13:35:40 +08:00
Michael Green
4f168f69a3
[Docs] Misc updates to TPU installation instructions ( #10165 )
2024-11-15 13:26:17 -08:00
Russell Bryant
3e8d14d8a1
[Doc] Move PR template content to docs ( #10159 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-15 13:20:20 -08:00
Simon Mo
c76ac49d26
[Docs] Add Nebius as sponsors ( #10371 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-11-15 12:47:40 -08:00
Cyrus Leung
2ac6d0e75b
[Misc] Consolidate pooler config overrides ( #10351 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-15 06:59:00 +00:00
Cyrus Leung
b40cf6402e
[Model] Support Qwen2 embeddings and use tags to select model tests ( #10184 )
2024-11-14 20:23:09 -08:00
Woosuk Kwon
1dbae0329c
[Docs] Publish meetup slides ( #10331 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-14 16:19:38 +00:00
Mike Depinet
f67ce05d0b
[Frontend] Pythonic tool parser ( #9859 )
...
Signed-off-by: Mike Depinet <mike@fixie.ai>
2024-11-14 04:14:34 +00:00
youkaichao
504ac53d18
[misc] error early for old-style class ( #10304 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-13 18:55:39 -08:00
Cyrus Leung
0b8bb86bf1
[1/N] Initial prototype for multi-modal processor ( #10044 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-13 12:39:03 +00:00
B-201
d909acf9fe
[Model][LoRA]LoRA support added for idefics3 ( #10281 )
...
Signed-off-by: B-201 <Joy25810@foxmail.com>
2024-11-13 17:25:59 +08:00
Austin Veselka
1b886aa104
[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 ( #9944 )
...
Signed-off-by: FurtherAI <austin.veselka@lighton.ai>
Co-authored-by: FurtherAI <austin.veselka@lighton.ai>
2024-11-13 08:28:13 +00:00
电脑星人
3945c82346
[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions ( #10221 )
...
Signed-off-by: imkero <kerorek@outlook.com>
2024-11-13 07:07:22 +00:00
youkaichao
377b74fe87
Revert "[ci][build] limit cmake version" ( #10271 )
2024-11-12 15:06:48 -08:00
youkaichao
18081451f9
[doc] improve debugging doc ( #10270 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-12 14:43:52 -08:00
youkaichao
96ae0eaeb2
[doc] fix location of runllm widget ( #10266 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-12 14:34:39 -08:00
Guillaume Calmettes
36c513a076
[BugFix] Do not raise a `ValueError` when `tool_choice` is set to the supported `none` option and `tools` are not defined. ( #10000 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2024-11-12 11:13:46 +00:00
youkaichao
3a28f18b0b
[doc] explain the class hierarchy in vLLM ( #10240 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 22:56:44 -08:00
youkaichao
d1c6799b88
[doc] update debugging guide ( #10236 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 15:21:12 -08:00
Yuan Tang
4800339c62
Add docs on serving with Llama Stack ( #10183 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2024-11-11 11:28:55 -08:00
youkaichao
f0f2e5638e
[doc] improve debugging code ( #10206 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-10 17:49:40 -08:00
Shawn Du
20cf2f553c
[Misc] small fixes to function tracing file path ( #9543 )
...
Signed-off-by: Shawn Du <shawnd200@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-10 15:21:06 -08:00
Yongzao
bfb7d61a7c
[doc] Polish the integration with huggingface doc ( #10195 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-10 10:22:04 -08:00
youkaichao
9fa4bdde9d
[ci][build] limit cmake version ( #10188 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 16:27:26 -08:00
cjackal
d88bff1b96
[Frontend] add `add_request_id` middleware ( #9594 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2024-11-09 10:18:29 +00:00
youkaichao
8a4358ecb5
[doc] explaining the integration with huggingface ( #10173 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 01:02:54 -08:00
Cyrus Leung
49d2a41a86
[Doc] Adjust RunLLM location ( #10176 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-08 20:07:10 -08:00
Cyrus Leung
e0191a95d8
[0/N] Rename `MultiModalInputs` to `MultiModalKwargs` ( #10040 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 11:31:02 +08:00
Rafael Vasquez
6b30471586
[Misc] Improve Web UI ( #10090 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-11-08 09:51:04 -08:00
Russell Bryant
3a7f15a398
[Doc] Move CONTRIBUTING to docs site ( #9924 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-08 05:15:12 +00:00
whyiug
40d0e7411d
[Doc] Update FAQ links in spec_decode.rst ( #9662 )
...
Signed-off-by: whyiug <whyiug@hotmail.com>
2024-11-08 04:44:58 +00:00
litianjian
28b2877d30
Online video support for VLMs ( #10020 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-07 20:25:59 +00:00
Maximilien de Bayser
ae62fd17c0
[Frontend] Tool calling parser for Granite 3.0 models ( #9027 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-11-07 07:09:02 -08:00
Rafael Vasquez
d7263a1bb8
Doc: Improve benchmark documentation ( #9927 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-11-06 23:50:35 -08:00
Cyrus Leung
db7db4aab9
[Misc] Consolidate ModelConfig code related to HF config ( #10104 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-07 06:00:21 +00:00
youkaichao
e7b84c394d
[doc] add back Python 3.8 ABI ( #10100 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-06 21:06:41 -08:00
Li, Jiang
a4b3e0c1e9
[Hardware][CPU] Update torch 2.5 ( #9911 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-07 04:43:08 +00:00
Russell Bryant
098f94de42
[CI/Build] Drop Python 3.8 support ( #10038 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-06 14:31:01 +00:00
Eric
406d4cc480
[Model][LoRA] LoRA support added for Qwen2VLForConditionalGeneration ( #10022 )
...
Signed-off-by: ericperfect <ericperfectttt@gmail.com>
2024-11-06 14:13:15 +00:00
Jee Jee Li
a5bba7d234
[Model] Add Idefics3 support ( #9767 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>
2024-11-06 11:41:17 +00:00
Jee Jee Li
2003cc3513
[Model][LoRA] LoRA support added for LlamaEmbeddingModel ( #10071 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-06 09:49:19 +00:00
Konrad Zawora
a02a50e6e5
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend ( #6143 )
...
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Bob Zhu <bob.zhu@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
Co-authored-by: Marceli Fylcek <mfylcek@habana.ai>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Dominika Olszewska <dolszewska@habana.ai>
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com>
Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Ilia Taraban <tarabanil@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai>
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com>
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Co-authored-by: Zehao Huang <zehao.huang@intel.com>
Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Nir David <ndavid@habana.ai>
Co-authored-by: Yu-Zhou <yu.zhou@intel.com>
Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: Jacek Czaja <jczaja@habana.ai>
Co-authored-by: Yuan <yuan.zhou@outlook.com>
2024-11-06 01:09:10 -08:00
Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL ( #8464 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
Richard Liu
cd34029e91
Refactor TPU requirements file and pin build dependencies ( #10010 )
...
Signed-off-by: Richard Liu <ricliu@google.com>
2024-11-05 16:48:44 +00:00
Roger Wang
6e056bcf04
[Doc] Update VLM doc about loading from local files ( #9999 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-11-04 19:47:11 +00:00
shanshan wang
54597724f4
[Model] Add support for H2OVL-Mississippi models ( #9747 )
...
Signed-off-by: Shanshan Wang <shanshan.wang@h2o.ai>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-11-04 00:15:36 +00:00
Michael Green
1d4cfe2be1
[Doc] Updated tpu-installation.rst with more details ( #9926 )
...
Signed-off-by: Michael Green <mikegre@google.com>
2024-11-02 10:06:45 -04:00
Nick Hill
eed92f12fc
[Docs] Update Granite 3.0 models in supported models table ( #9930 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-11-02 09:02:18 +00:00
Cyrus Leung
ba0d892074
[Frontend] Use a proper chat template for VLM2Vec ( #9912 )
2024-11-01 14:09:07 +00:00
Cyrus Leung
06386a64dd
[Frontend] Chat-based Embeddings API ( #9759 )
2024-11-01 08:13:35 +00:00
Cyrus Leung
d3aa2a8b2f
[Doc] Update multi-input support ( #9906 )
2024-11-01 07:34:49 +00:00
Yongzao
2b5bf20988
[torch.compile] Adding torch compile annotations to some models ( #9876 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-01 00:25:47 -07:00
Joe Runde
031a7995f3
[Bugfix][Frontend] Reject guided decoding in multistep mode ( #9892 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-11-01 01:09:46 +00:00
Jee Jee Li
5608e611c2
[Doc] Update Qwen documentation ( #9869 )
2024-10-31 08:54:18 +00:00
Guillaume Calmettes
abbfb6134d
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint ( #9837 )
2024-10-30 18:15:56 -07:00
youkaichao
c2cd1a2142
[doc] update pp support ( #9853 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-30 13:36:51 -07:00
Joe Runde
33d257735f
[Doc] link bug for multistep guided decoding ( #9843 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-30 17:28:29 +00:00
Woosuk Kwon
211fe91aa8
[TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA ( #9438 )
2024-10-30 09:41:38 +00:00
Yan Ma
04a3ae0aca
[Bugfix] Fix multi nodes TP+PP for XPU ( #8884 )
...
Signed-off-by: YiSheng5 <syhm@mail.ustc.edu.cn>
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: YiSheng5 <syhm@mail.ustc.edu.cn>
2024-10-29 21:34:45 -07:00
Will Eaton
882a1ad0de
[Model] tool calling support for ibm-granite/granite-20b-functioncalling ( #8339 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
2024-10-29 15:07:37 -07:00
Russell Bryant
c5d7fb9ddc
[Doc] fix third-party model example ( #9771 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-28 19:39:21 -07:00
kakao-kevin-us
6650e6a930
[Model] Add classification Task with Qwen2ForSequenceClassification ( #9704 )
...
Signed-off-by: Kevin-Yang <ykcha9@gmail.com>
Co-authored-by: Kevin-Yang <ykcha9@gmail.com>
2024-10-26 17:53:35 +00:00
Rafael Vasquez
228cfbd03f
[Doc] Improve quickstart documentation ( #9256 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-25 14:32:10 -07:00
Cyrus Leung
b979143d5b
[Doc] Move additional tips/notes to the top ( #9647 )
2024-10-24 09:43:59 +00:00
Yongzao
8a02cd045a
[torch.compile] Adding torch compile annotations to some models ( #9639 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-24 00:54:57 -07:00
Cyrus Leung
836e8ef6ee
[Bugfix] Fix PP for ChatGLM and Molmo ( #9422 )
2024-10-24 06:12:05 +00:00
Vinay R Damodaran
33bab41060
[Bugfix]: Make chat content text allow type content ( #9358 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
2024-10-24 05:05:49 +00:00
Yunfei Chu
fc6c274626
[Model] Add Qwen2-Audio model support ( #9248 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-23 17:54:22 +00:00
Cyrus Leung
831540cf04
[Model] Support E5-V ( #9576 )
2024-10-23 11:35:29 +08:00
Seth Kimmel
208cb34c81
[Doc]: Update tensorizer docs to include vllm[tensorizer] ( #7889 )
...
Co-authored-by: Kaunil Dhruv <dhruv.kaunil@gmail.com>
2024-10-22 15:43:25 -07:00
Yuan
32a1ee74a0
[Hardware][Intel CPU][DOC] Update docs for CPU backend ( #6212 )
...
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Gubrud, Aaron D <aaron.d.gubrud@intel.com>
Co-authored-by: adgubrud <96072084+adgubrud@users.noreply.github.com>
2024-10-22 10:38:04 -07:00
Isotr0py
bb392ea2d2
[Model][VLM] Initialize support for Mono-InternVL model ( #9528 )
2024-10-22 16:01:46 +00:00
Rafael Vasquez
f7db5f0fa9
[Doc] Use shell code-blocks and fix section headers ( #9508 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-22 06:43:24 +00:00
youkaichao
d621c43df7
[doc] fix format ( #9562 )
2024-10-21 13:54:57 -07:00
Dhia Eddine Rhaiem
f6b97293aa
[Model] FalconMamba Support ( #9325 )
2024-10-21 12:50:16 -04:00
Michael Goin
3921a2f29e
[Model] Support Pixtral models in the HF Transformers format ( #9036 )
2024-10-18 13:29:56 -06:00
Cyrus Leung
051eaf6db3
[Model] Add user-configurable task for models that support both generation and embedding ( #9424 )
2024-10-18 11:31:58 -07:00
tomeras91
d2b1bf55ec
[Frontend][Feature] Add jamba tool parser ( #9154 )
2024-10-18 10:27:48 +00:00
Kuntai Du
81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default ( #8704 )
...
Removes block manager v1. This is the first step toward a prefix-caching-centric design: to get there, the code path has to be simplified so that only the v2 block manager is used, which performs much better on prefix caching.
2024-10-17 11:38:15 -05:00
Li, Jiang
5eda21e773
[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support ( #9344 )
2024-10-17 12:21:04 -04:00
Junhao Li
5b8a1fde84
[Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft ( #9396 )
2024-10-16 16:40:24 +00:00
Roger Wang
59230ef32b
[Misc] Consolidate example usage of OpenAI client for multimodal models ( #9412 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-16 11:20:51 +00:00
Cyrus Leung
cee711fdbb
[Core] Rename input data types ( #8688 )
2024-10-16 10:49:37 +00:00
Cyrus Leung
7abba39ee6
[Model] VLM2Vec, the first multimodal embedding model in vLLM ( #9303 )
2024-10-16 14:31:00 +08:00
Michael Goin
8e836d982a
[Doc] Fix code formatting in spec_decode.rst ( #9348 )
2024-10-14 21:29:11 -07:00
Tyler Michael Smith
169b530607
[Bugfix] Clean up some cruft in mamba.py ( #9343 )
2024-10-15 00:24:25 +00:00
Reza Salehi
dfe43a2071
[Model] Molmo vLLM Integration ( #9016 )
...
Co-authored-by: sanghol <sanghol@allenai.org>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-10-14 07:56:24 -07:00
Yunmeng
2b184ddd4f
[Misc][Installation] Improve source installation script and doc ( #9309 )
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-10-12 09:36:40 -07:00
Wallas Henrique
8baf85e4e9
[Doc] Compatibility matrix for mutual exclusive features ( #8512 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-11 11:18:50 -07:00
sixgod
6cf1167c1a
[Model] Add GLM-4v support and meet vllm==0.6.2 ( #9242 )
2024-10-11 17:36:13 +00:00
Tyler Michael Smith
7342a7d7f8
[Model] Support Mamba ( #6484 )
2024-10-11 15:40:06 +00:00
Cyrus Leung
e808156f30
[Misc] Collect model support info in a single process per model ( #9233 )
2024-10-11 11:08:11 +00:00
omrishiv
f990bab2a4
[Doc][Neuron] add note to neuron documentation about resolving triton issue ( #9257 )
...
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-10-10 23:36:32 +00:00
Rafael Vasquez
055f3270d4
[Doc] Improve debugging documentation ( #9204 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-10 10:48:51 -07:00
whyiug
04de9057ab
[Model] support input image embedding for minicpmv ( #9237 )
2024-10-10 15:00:47 +00:00
youkaichao
de895f1697
[misc] improve model support check in another process ( #9208 )
2024-10-09 21:58:27 -07:00
Li, Jiang
ca77dd7a44
[Hardware][CPU] Support AWQ for CPU backend ( #7515 )
2024-10-09 10:28:08 -06:00
Jiangtao Hu
dc4aea677a
[Doc] Fix VLM prompt placeholder sample bug ( #9170 )
2024-10-09 08:59:42 +00:00
Yuan Tang
acce7630c1
Update link to KServe deployment guide ( #9173 )
2024-10-09 03:58:49 +00:00
Michael Goin
9ba0bd6aa6
Add `lm-eval` directly to requirements-test.txt ( #9161 )
2024-10-08 18:22:31 -07:00
Rafael Vasquez
de24046fcd
[Doc] Improve contributing and installation documentation ( #9132 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-08 20:22:08 +00:00
Sayak Paul
1874c6a1b0
[Doc] Update vlm.rst to include an example on videos ( #9155 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-08 18:12:29 +00:00
TimWang
93cf74a8a7
[Doc]: Add deploying_with_k8s guide ( #8451 )
2024-10-07 13:31:45 -07:00
Cyrus Leung
151ef4efd2
[Model] Support NVLM-D and fix QK Norm in InternViT ( #9045 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2024-10-07 11:55:12 +00:00
Cyrus Leung
b22b798471
[Model] PP support for embedding models and update docs ( #9090 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-10-06 16:35:27 +08:00
Cyrus Leung
f22619fe96
[Misc] Remove user-facing error for removed VLM args ( #9104 )
2024-10-06 01:33:52 -07:00
Andy Dai
5df1834895
[Bugfix] Fix order of arguments matters in config.yaml ( #8960 )
2024-10-05 17:35:11 +00:00
Roger Wang
26aa325f4f
[Core][VLM] Test registration for OOT multimodal models ( #8717 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:38:25 -07:00
Cyrus Leung
0e36fd4909
[Misc] Move registry to its own file ( #9064 )
2024-10-04 10:01:37 +00:00
Murali Andoorveedu
0f6d7a9a34
[Models] Add remaining model PP support ( #7168 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
代君
3dbb215b38
[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model ( #8405 )
2024-10-04 10:36:39 +08:00
Nick Hill
18c2e30c57
[Doc] Update Granite model docs ( #9025 )
2024-10-03 02:42:24 +00:00
Sergey Shlyapnikov
f58d4fccc9
[OpenVINO] Enable GPU support for OpenVINO vLLM backend ( #8192 )
2024-10-02 17:50:01 -04:00
Cyrus Leung
4f341bd4bf
[Doc] Update list of supported models ( #8987 )
2024-10-02 00:35:39 +08:00
whyiug
e01ab595d8
[Model] support input embeddings for qwen2vl ( #8856 )
2024-09-30 03:16:10 +00:00
youkaichao
cc276443b5
[doc] organize installation doc and expose per-commit docker ( #8931 )
2024-09-28 17:48:41 -07:00
youkaichao
d86f6b2afb
[misc] fix wheel name ( #8919 )
2024-09-27 22:10:44 -07:00
Cyrus Leung
3b00b9c26c
[Core] rename `PromptInputs` and `inputs` ( #8876 )
2024-09-26 20:35:15 -07:00
Maximilien de Bayser
344cd2b6f4
[Feature] Add support for Llama 3.1 and 3.2 tool use ( #8343 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-09-26 17:01:42 -07:00
youkaichao
70de39f6b4
[misc][installation] build from source without compilation ( #8818 )
2024-09-26 13:19:04 -07:00
Roger Wang
4bb98f2190
[Misc] Update config loading for Qwen2-VL and remove Granite ( #8837 )
2024-09-26 07:45:30 -07:00
Roger Wang
e2c6e0a829
[Doc] Update doc for Transformers 4.45 ( #8817 )
2024-09-25 13:29:48 -07:00
Chen Zhang
770ec6024f
[Model] Add support for the multi-modal Llama 3.2 model ( #8811 )
...
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-25 13:29:32 -07:00
Simon Mo
4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility ( #8760 )" ( #8810 )
2024-09-25 10:36:26 -07:00
Cyrus Leung
28e1299e60
rename PromptInputs and inputs with backward compatibility ( #8760 )
2024-09-25 09:36:47 -07:00
Hongxia Yang
1c046447a6
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade ( #8777 )
2024-09-25 22:26:37 +08:00
Jee Jee Li
13f9f7a3d0
[Misc] Upgrade bitsandbytes to the latest version 0.44.0 ( #8768 )
2024-09-24 17:08:55 -07:00
Simon Mo
3185fb0cca
Revert "[Core] Rename `PromptInputs` to `PromptType`, and `inputs` to `prompt`" ( #8750 )
2024-09-24 05:45:20 +00:00
Hongxia Yang
530821d00c
[Hardware][AMD] ROCm6.2 upgrade ( #8674 )
2024-09-23 18:52:39 -07:00
Daniele
ee5f34b1c2
[CI/Build] use setuptools-scm to set __version__ ( #4738 )
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-23 09:44:26 -07:00
Yan Ma
d23679eb99
[Bugfix] fix docker build for xpu ( #8652 )
2024-09-22 22:54:18 -07:00
youkaichao
d4a2ac8302
[build] enable existing pytorch (for GH200, aarch64, nightly) ( #8713 )
2024-09-22 12:47:54 -07:00
litianjian
5b59532760
[Model][VLM] Add LLaVA-Onevision model support ( #8486 )
...
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-22 10:51:44 -07:00
Andy Dai
4dfdf43196
[Doc] Fix typo in AMD installation guide ( #8689 )
2024-09-21 00:24:12 -07:00
Cyrus Leung
0057894ef7
[Core] Rename `PromptInputs` and `inputs` ( #8673 )
2024-09-20 19:00:54 -07:00
omrishiv
7c8566aa4f
[Doc] neuron documentation update ( #8671 )
...
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-09-20 15:04:37 -07:00
Niklas Muennighoff
3b63de9353
[Model] Add OLMoE ( #7922 )
2024-09-20 09:31:41 -07:00
Jiaxin Shan
260d40b5ea
[Core] Support Lora lineage and base model metadata management ( #6315 )
2024-09-20 06:20:56 +00:00
Isotr0py
ea4647b7d7
[Doc] Add documentation for GGUF quantization ( #8618 )
2024-09-19 13:15:55 -06:00
Geun, Lim
e18749ff09
[Model] Support Solar Model ( #8386 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-18 11:04:00 -06:00
Alexander Matveev
7c7714d856
[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH ( #8157 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
youkaichao
fa0c114fad
[doc] improve installation doc ( #8550 )
...
Co-authored-by: Andy Dai <76841985+Imss27@users.noreply.github.com>
2024-09-17 16:24:06 -07:00
youkaichao
2759a43a26
[doc] update doc on testing and debugging ( #8514 )
2024-09-16 12:10:23 -07:00
ywfang
8a0cf1ddc3
[Model] support minicpm3 ( #8297 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
Isotr0py
f57092c00b
[Doc] Add oneDNN installation to CPU backend documentation ( #8467 )
2024-09-13 18:06:30 +00:00
Cyrus Leung
a84e598e21
[CI/Build] Reorganize models tests ( #7820 )
2024-09-13 10:20:06 -07:00
youkaichao
cab69a15e4
[doc] recommend pip instead of conda ( #8446 )
2024-09-12 23:52:41 -07:00
Alex Brooks
c6202daeed
[Model] Support multiple images for qwen-vl ( #8247 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:54 -07:00
Patrick von Platen
d394787e52
Pixtral ( #8377 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-11 14:41:55 -07:00
Yang Fan
3b7fea770f
[Model][VLM] Add Qwen2-VL model support ( #7905 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-11 09:31:19 -07:00
Yangshen⚡Deng
6a512a00df
[model] Support for Llava-Next-Video model ( #7559 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-10 22:21:36 -07:00
Simon Mo
a1d874224d
Add NVIDIA Meetup slides, announce AMD meetup, and add contact info ( #8319 )
2024-09-09 23:21:00 -07:00
Isotr0py
e807125936
[Model][VLM] Support multi-images inputs for InternVL2 models ( #8201 )
2024-09-07 16:38:23 +08:00
Cyrus Leung
2f707fcb35
[Model] Multi-input support for LLaVA ( #8238 )
2024-09-07 02:57:24 +00:00
William Lin
12dd715807
[misc] [doc] [frontend] LLM torch profiler support ( #7943 )
2024-09-06 17:48:48 -07:00
Dipika Sikka
23f322297f
[Misc] Remove `SqueezeLLM` ( #8220 )
2024-09-06 16:29:03 -06:00
Jiaxin Shan
db3bf7c991
[Core] Support load and unload LoRA in api server ( #6566 )
...
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-09-05 18:10:33 -07:00
sroy745
2febcf2777
[Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM ( #7962 )
2024-09-05 16:25:29 -04:00
Alex Brooks
9da25a88aa
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) ( #8029 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-05 12:48:10 +00:00
Cyrus Leung
288a938872
[Doc] Indicate more information about supported modalities ( #8181 )
2024-09-05 10:51:53 +00:00
Kyle Mistele
e02ce498be
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models ( #5649 )
...
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00
Woosuk Kwon
61f4a93d14
[TPU][Bugfix] Use XLA rank for persistent cache path ( #8137 )
2024-09-03 18:35:33 -07:00
Wenxiang
1248e8506a
[Model] Adding support for MSFT Phi-3.5-MoE ( #7729 )
...
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
2024-08-30 13:42:57 -06:00
Kaunil Dhruv
058344f89a
[Frontend]-config-cli-args ( #7737 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>
2024-08-30 08:21:02 -07:00
Yohan Na
dc13e99348
[MODEL] add Exaone model support ( #7819 )
2024-08-29 23:34:20 -07:00
Stas Bekman
8c56e57def
[Doc] fix 404 link ( #7966 )
2024-08-28 13:54:23 -07:00
Woosuk Kwon
eeffde1ac0
[TPU] Upgrade PyTorch XLA nightly ( #7967 )
2024-08-28 13:10:21 -07:00
Stas Bekman
98c12cffe5
[Doc] fix the autoAWQ example ( #7937 )
2024-08-28 12:12:32 +00:00
Peter Salas
fab5f53e2d
[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt ( #7902 )
2024-08-28 01:53:56 +00:00
Patrick von Platen
6fc4e6e07a
[Model] Add Mistral Tokenization to improve robustness and chat encoding ( #7739 )
2024-08-27 12:40:02 +00:00
Peter Salas
57792ed469
[Doc] Fix incorrect docs from #7615 ( #7788 )
2024-08-22 10:02:06 -07:00
zifeitong
df1a21131d
[Model] Fix Phi-3.5-vision-instruct 'num_crops' issue ( #7710 )
2024-08-22 09:36:24 +08:00
Peter Salas
1ca0d4f86b
[Model] Add UltravoxModel and UltravoxConfig ( #7615 )
2024-08-21 22:49:39 +00:00
William Lin
dd53c4b023
[misc] Add Torch profiler support ( #7451 )
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-21 15:39:26 -07:00
Cyrus Leung
baaedfdb2d
[mypy] Enable following imports for entrypoints ( #7248 )
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
2024-08-20 23:28:21 -07:00
Roger Wang
4506641212
[Doc] Section for Multimodal Language Models ( #7719 )
2024-08-20 23:24:01 -07:00
Ilya Lavrenov
398521ad19
[OpenVINO] Updated documentation ( #7687 )
2024-08-20 07:33:56 -06:00
youkaichao
e54ebc2f8f
[doc] fix doc build error caused by msgspec ( #7659 )
2024-08-19 17:50:59 -07:00
Michael Goin
d4f0f17b02
[Doc] Update quantization supported hardware table ( #7595 )
2024-08-16 13:59:27 -07:00
Michael Goin
b3f4e17935
[Doc] Add docs for llmcompressor INT8 and FP8 checkpoints ( #7444 )
2024-08-16 13:59:16 -07:00
Kameshwara Pavan Kumar Mantha
22b39e11f2
llama_index serving integration documentation ( #6973 )
...
Co-authored-by: pavanmantha <pavan.mantha@thevaslabs.io>
2024-08-14 15:38:37 -07:00
Cyrus Leung
3f674a49b5
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt ( #7126 )
2024-08-14 17:55:42 +00:00
youkaichao
199adbb7cf
[doc] update test script to include cudagraph ( #7501 )
2024-08-13 21:52:58 -07:00
Cyrus Leung
dd164d72f3
[Bugfix][Docs] Update list of mock imports ( #7493 )
2024-08-13 20:37:30 -07:00
Woosuk Kwon
a08df8322e
[TPU] Support multi-host inference ( #7457 )
2024-08-13 16:31:20 -07:00
Peter Salas
00c3d68e45
[Frontend][Core] Add plumbing to support audio language models ( #7446 )
2024-08-13 17:39:33 +00:00
Woosuk Kwon
e20233d361
Revert "[Doc] Update supported_hardware.rst ( #7276 )" ( #7467 )
2024-08-13 01:37:08 -07:00
jon-chuang
a046f86397
[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel ( #7208 )
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-12 22:47:41 +00:00
Roger Wang
e6e42e4b17
[Core][VLM] Support image embeddings as input ( #6613 )
2024-08-12 16:16:06 +08:00
Simon Mo
f020a6297e
[Docs] Update readme ( #7316 )
2024-08-11 17:13:37 -07:00
tomeras91
02b1988b9f
[Doc] building vLLM with VLLM_TARGET_DEVICE=empty ( #7403 )
2024-08-11 14:38:17 -07:00
Woosuk Kwon
90bab18f24
[TPU] Use mark_dynamic to reduce compilation time ( #7340 )
2024-08-10 18:12:22 -07:00
Simon Mo
5923532e15
Add Skywork AI as Sponsor ( #7314 )
2024-08-08 13:59:57 -07:00
Jee Jee Li
757ac70a64
[Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 ( #7273 )
2024-08-08 14:02:41 +00:00
Michael Goin
6d94420246
[Doc] Update supported_hardware.rst ( #7276 )
2024-08-07 14:21:50 -07:00
Stas Bekman
0e12cd67a8
[Doc] add online speculative decoding example ( #7243 )
2024-08-07 09:58:02 -07:00
Ilya Lavrenov
80cbe10c59
[OpenVINO] migrate to latest dependencies versions ( #7251 )
2024-08-07 09:49:10 -07:00
Roger Wang
2385c8f374
[Doc] Mock new dependencies for documentation ( #7245 )
2024-08-07 06:43:03 +00:00
Thomas Parnell
789937af2e
[Doc] [SpecDecode] Update MLPSpeculator documentation ( #7100 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-08-05 23:29:43 +00:00
Simon Mo
4db5176d97
bump version to v0.5.4 ( #7139 )
2024-08-05 14:39:48 -07:00
Jee Jee Li
179a6a36f2
[Model] Refactor MiniCPMV ( #7020 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 08:12:41 +00:00
Yihuan Bu
654bc5ca49
Support for guided decoding for offline LLM ( #6878 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 03:12:09 +00:00
Michael Goin
b482b9a5b1
[CI/Build] Add support for Python 3.12 ( #7035 )
2024-08-02 13:51:22 -07:00
Murali Andoorveedu
fc912e0886
[Models] Support Qwen model with PP ( #6974 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-08-01 12:40:43 -07:00
Jee Jee Li
7ecee34321
[Kernel][RFC] Refactor the punica kernel based on Triton ( #5036 )
2024-07-31 17:12:24 -07:00
Alphi
2f4e108f75
[Bugfix] Clean up MiniCPM-V ( #6939 )
...
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-31 14:39:19 +00:00
Cyrus Leung
f230cc2ca6
[Bugfix] Fix broadcasting logic for `multi_modal_kwargs` ( #6836 )
2024-07-31 10:38:45 +08:00
Ilya Lavrenov
5895b24677
[OpenVINO] Updated OpenVINO requirements and build docs ( #6948 )
2024-07-30 11:33:01 -07:00
Isotr0py
7cbd9ec7a9
[Model] Initialize support for InternVL2 series models ( #6514 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00
Woosuk Kwon
fad5576c58
[TPU] Reduce compilation time & Upgrade PyTorch XLA version ( #6856 )
2024-07-27 10:28:33 -07:00
Chenggang Wu
f954d0715c
[Docs] Add RunLLM chat widget ( #6857 )
2024-07-27 09:24:46 -07:00
Cyrus Leung
1ad86acf17
[Model] Initial support for BLIP-2 ( #5920 )
...
Co-authored-by: ywang96 <ywang@roblox.com>
2024-07-27 11:53:07 +00:00
Roger Wang
ecb33a28cb
[CI/Build][Doc] Update CI and Doc for VLM example changes ( #6860 )
2024-07-27 09:54:14 +00:00
Harry Mellor
c53041ae3b
[Doc] Add missing mock import to docs `conf.py` ( #6834 )
2024-07-27 04:47:33 +00:00
omrishiv
3c3012398e
[Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron ( #6844 )
...
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-07-26 20:20:16 -07:00
Woosuk Kwon
ced36cd89b
[ROCm] Upgrade PyTorch nightly version ( #6845 )
2024-07-26 20:16:13 -07:00
Zhanghao Wu
150a1ffbfd
[Doc] Update SkyPilot doc for wrong indents and instructions for update service ( #4283 )
2024-07-26 14:39:10 -07:00
Michael Goin
281977bd6e
[Doc] Add Nemotron to supported model docs ( #6843 )
2024-07-26 17:32:44 -04:00
Li, Jiang
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation ( #6125 )
2024-07-26 13:50:10 -07:00
youkaichao
85ad7e2d01
[doc][debugging] add known issues for hangs ( #6816 )
2024-07-25 21:48:05 -07:00
Woosuk Kwon
b7215de2c5
[Docs] Publish 5th meetup slides ( #6799 )
2024-07-25 16:47:55 -07:00
youkaichao
f3ff63c3f4
[doc][distributed] improve multinode serving doc ( #6804 )
2024-07-25 15:38:32 -07:00
Kuntai Du
6a1e25b151
[Doc] Add documentations for nightly benchmarks ( #6412 )
2024-07-25 11:57:16 -07:00
Alphi
9e169a4c61
[Model] Adding support for MiniCPM-V ( #4087 )
2024-07-24 20:59:30 -07:00
Hongxia Yang
d88c458f44
[Doc][AMD][ROCm] Added tips to refer to mi300x tuning guide for mi300x users ( #6754 )
2024-07-24 14:32:57 -07:00
Woosuk Kwon
ccc4a73257
[Docs][ROCm] Detailed instructions to build from source ( #6680 )
2024-07-24 01:07:23 -07:00
dongmao zhang
87525fab92
[bitsandbytes]: support read bnb pre-quantized model ( #5753 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-23 23:45:09 +00:00
youkaichao
71950af726
[doc][distributed] fix doc argument order ( #6691 )
2024-07-23 08:55:33 -07:00
Woosuk Kwon
cb1362a889
[Docs] Announce llama3.1 support ( #6688 )
2024-07-23 08:18:15 -07:00
Roger Wang
22fa2e35cb
[VLM][Model] Support image input for Chameleon ( #6633 )
2024-07-22 23:50:48 -07:00
youkaichao
c051bfe4eb
[doc][distributed] doc for setting up multi-node environment ( #6529 )
...
[doc][distributed] add more doc for setting up multi-node environment (#6529 )
2024-07-22 21:22:09 -07:00
Cyrus Leung
739b61a348
[Frontend] Refactor prompt processing ( #4028 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
Matt Wong
06d6c5fe9f
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes ( #6543 )
2024-07-20 09:39:07 -07:00
Murali Andoorveedu
45ceb85a0c
[Docs] Update PP docs ( #6598 )
2024-07-19 16:38:21 -07:00
Simon Mo
30efe41532
[Docs] Update docs for wheel location ( #6580 )
2024-07-19 12:14:11 -07:00
milo157
a38524f338
[DOC] - Add docker image to Cerebrium Integration ( #6510 )
2024-07-17 10:22:53 -07:00
Cyrus Leung
5bf35a91e4
[Doc][CI/Build] Update docs and tests to use `vllm serve` ( #6431 )
2024-07-17 07:43:21 +00:00
Hongxia Yang
10383887e0
[ROCm] Cleanup Dockerfile and remove outdated patch ( #6482 )
2024-07-16 22:47:02 -07:00
Jiaxin Shan
94162beb9f
[Doc] Fix the lora adapter path in server startup script ( #6230 )
2024-07-16 10:11:04 -07:00
Woosuk Kwon
c467dff24f
[Hardware][TPU] Support MoE with Pallas GMM kernel ( #6457 )
2024-07-16 09:56:28 -07:00
youkaichao
9f4ccec761
[doc][misc] remind to cancel debugging environment variables ( #6481 )
...
[doc][misc] remind users to cancel debugging environment variables after debugging (#6481 )
2024-07-16 09:45:30 -07:00
Kevin H. Luu
d6f3b3d5c4
Pin sphinx-argparse version ( #6453 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-16 01:26:11 +00:00
Woosuk Kwon
3dee97b05f
[Docs] Add Google Cloud to sponsor list ( #6450 )
2024-07-15 11:58:10 -07:00
youkaichao
94b82e8c18
[doc][distributed] add suggestion for distributed inference ( #6418 )
2024-07-15 09:45:51 -07:00
youkaichao
22e79ee8f3
[doc][misc] doc update ( #6439 )
2024-07-14 23:33:25 -07:00
Robert Cohn
61e85dbad8
[Doc] xpu backend requires running setvars.sh ( #6393 )
2024-07-14 17:10:11 -07:00
Ethan Xu
dbfe254eda
[Feature] vLLM CLI ( #5090 )
...
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-07-14 15:36:43 -07:00
Yuan Tang
6ef3bf912c
Remove unnecessary trailing period in spec_decode.rst ( #6405 )
2024-07-14 07:58:09 +00:00
Isotr0py
540c0368b1
[Model] Initialize Fuyu-8B support ( #3924 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-14 05:27:14 +00:00
Saliya Ekanayake
a27f87da34
[Doc] Fix Typo in Doc ( #6392 )
...
Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>
2024-07-13 00:48:23 +00:00
Simon Mo
d719ba24c5
Build some nightly wheels by default ( #6380 )
2024-07-12 13:56:59 -07:00
youkaichao
2d23b42d92
[doc] update pipeline parallel in readme ( #6347 )
2024-07-11 11:38:40 -07:00
Jie Fu (傅杰)
439c84581a
[Doc] Update description of vLLM support for CPUs ( #6003 )
2024-07-10 21:15:29 -07:00
Cyrus Leung
8a924d2248
[Doc] Guide for adding multi-modal plugins ( #6205 )
2024-07-10 14:55:34 +08:00
Murali Andoorveedu
673dd4cae9
[Docs] Docs update for Pipeline Parallel ( #6222 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-07-09 16:24:58 -07:00
Roger Wang
6206dcb29e
[Model] Add PaliGemma ( #5189 )
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-07-07 09:25:50 +08:00
Cyrus Leung
9389380015
[Doc] Move guide for multimodal model and other improvements ( #6168 )
2024-07-06 17:18:59 +08:00
Roger Wang
175c43eca4
[Doc] Reorganize Supported Models by Type ( #6167 )
2024-07-06 05:59:36 +00:00
Simon Mo
79d406e918
[Docs] Fix readthedocs for tag build ( #6158 )
2024-07-05 12:44:40 -07:00
Cyrus Leung
ae96ef8fbd
[VLM] Calculate maximum number of multi-modal tokens by model ( #6121 )
2024-07-04 16:37:23 -07:00
youkaichao
27902d42be
[misc][doc] try to add warning for latest html ( #5979 )
2024-07-04 09:57:09 -07:00
youkaichao
966fe72141
[doc][misc] bump up py version in installation doc ( #6119 )
2024-07-03 15:52:04 -07:00
xwjiang2010
d9e98f42e4
[vlm] Remove vision language config. ( #6089 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
Michael Goin
47f0954af0
[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin ( #5975 )
2024-07-03 17:38:00 +00:00
Roger Wang
f1c78138aa
[Doc] Fix Mock Import ( #6094 )
2024-07-03 00:13:56 -07:00
Cyrus Leung
9831aec49f
[Core] Dynamic image size support for VLMs ( #5276 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-07-02 20:34:00 -07:00
Mor Zusman
9d6a8daa87
[Model] Jamba support ( #4115 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 23:11:29 +00:00
Cyrus Leung
31354e563f
[Doc] Reinstate doc dependencies ( #6061 )
2024-07-02 10:53:16 +00:00
xwjiang2010
98d6682cd1
[VLM] Remove `image_input_type` from VLM config ( #5852 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
Roger Wang
8e0817c262
[Bugfix][Doc] Fix Doc Formatting ( #6048 )
2024-07-01 15:09:11 -07:00
ning.zhang
83bdcb6ac3
add FAQ doc under 'serving' ( #5946 )
2024-07-01 14:11:36 -07:00
youkaichao
4050d646e5
[doc][misc] remove deprecated api server in doc ( #6037 )
2024-07-01 12:52:43 -04:00
Ilya Lavrenov
57f09a419c
[Hardware][Intel] OpenVINO vLLM backend ( #5379 )
2024-06-28 13:50:16 +00:00
Cyrus Leung
5cbe8d155c
[Core] Registry for processing model inputs ( #5214 )
...
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-28 12:09:56 +00:00
Woosuk Kwon
79c92c7c8a
[Model] Add Gemma 2 ( #5908 )
2024-06-27 13:33:56 -07:00
youkaichao
3fd02bda51
[doc][misc] add note for Kubernetes users ( #5916 )
2024-06-27 10:07:07 -07:00
Cyrus Leung
96354d6a29
[Model] Add base class for LoRA-supported models ( #5018 )
2024-06-27 16:03:04 +08:00
youkaichao
294104c3f9
[doc] update usage of env var to avoid conflict ( #5873 )
2024-06-26 17:57:12 -04:00
Roger Wang
3aa7b6cf66
[Misc][Doc] Add Example of using OpenAI Server with VLM ( #5832 )
2024-06-25 20:34:25 -07:00
Matt Wong
dd793d1de5
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes ( #5422 )
2024-06-25 15:56:15 -07:00
youkaichao
c18ebfdd71
[doc][distributed] add both gloo and nccl tests ( #5834 )
2024-06-25 15:10:28 -04:00
Cyrus Leung
f23871e9ee
[Doc] Add notice about breaking changes to VLMs ( #5818 )
2024-06-25 01:25:03 -07:00
Michael Goin
1744cc99ba
[Doc] Add Phi-3-medium to list of supported models ( #5788 )
2024-06-24 10:48:55 -07:00
Michael Goin
e72dc6cb35
[Doc] Add "Suggest edit" button to doc pages ( #5789 )
2024-06-24 10:26:17 -07:00
youkaichao
c246212952
[doc][faq] add warning to download models for every node ( #5783 )
2024-06-24 15:37:42 +08:00
Woosuk Kwon
8c00f9c15d
[Docs][TPU] Add installation tip for TPU ( #5761 )
2024-06-21 23:09:40 -07:00
Michael Goin
5b15bde539
[Doc] Documentation on supported hardware for quantization methods ( #5745 )
2024-06-21 12:44:29 -04:00
Roger Wang
1b2eaac316
[Bugfix][Doc] Fix Duplicate Explicit Target Name Errors ( #5703 )
2024-06-19 23:10:47 -07:00
Rafael Vasquez
e83db9e7e3
[Doc] Update docker references ( #5614 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-06-19 15:01:45 -07:00
milo157
2bd231a7b7
[Doc] Added cerebrium as Integration option ( #5553 )
2024-06-18 15:56:59 -07:00
Isotr0py
daef218b55
[Model] Initialize Phi-3-vision support ( #4986 )
2024-06-17 19:34:33 -07:00
Kunshang Ji
728c4c8a06
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend ( #3814 )
...
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-06-17 11:01:25 -07:00
youkaichao
845a3f26f9
[Doc] add debugging tips for crash and multi-node debugging ( #5581 )
2024-06-17 10:08:01 +08:00
Sanger Steel
6e2527a7cb
[Doc] Update documentation on Tensorizer ( #5471 )
2024-06-14 11:27:57 -07:00
Simon Mo
cdab68dcdb
[Docs] Add ZhenFund as a Sponsor ( #5548 )
2024-06-14 11:17:21 -07:00
Cyrus Leung
0ce7b952f8
[Doc] Update LLaVA docs ( #5437 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-13 11:22:07 -07:00
Woosuk Kwon
a65634d3ae
[Docs] Add 4th meetup slides ( #5509 )
2024-06-13 10:18:26 -07:00
Li, Jiang
80aa7e91fc
[Hardware][Intel] Optimize CPU backend and add more performance tips ( #4971 )
...
Co-authored-by: Jianan Gu <jianan.gu@intel.com>
2024-06-13 09:33:14 -07:00
Cyrus Leung
b8d4dfff9c
[Doc] Update debug docs ( #5438 )
2024-06-12 14:49:31 -07:00
Woosuk Kwon
1a8bfd92d5
[Hardware] Initial TPU integration ( #5292 )
2024-06-12 11:53:03 -07:00
youkaichao
8f89d72090
[Doc] add common case for long waiting time ( #5430 )
2024-06-11 11:12:13 -07:00
Nick Hill
99dac099ab
[Core][Doc] Default to multiprocessing for single-node distributed case ( #5230 )
...
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-06-11 11:10:41 -07:00
Cade Daniel
89ec06c33b
[Docs] [Spec decode] Fix docs error in code example ( #5427 )
2024-06-11 10:31:56 -07:00
Kuntai Du
9fde251bf0
[Doc] Add an automatic prefix caching section in vllm documentation ( #5324 )
...
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-06-11 10:24:59 -07:00
Cade Daniel
4c2ffb28ff
[Speculative decoding] Initial spec decode docs ( #5400 )
2024-06-11 10:15:40 -07:00
SangBin Cho
246598a6b1
[CI] docfix ( #5410 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-11 01:28:50 -07:00
Roger Wang
3c4cebf751
[Doc][Typo] Fixing Missing Comma ( #5403 )
2024-06-11 00:20:28 -07:00
youkaichao
d8f31f2f8b
[Doc] add debugging tips ( #5409 )
2024-06-10 23:21:43 -07:00
Michael Goin
77c87beb06
[Doc] Add documentation for FP8 W8A8 ( #5388 )
2024-06-10 18:55:12 -06:00
Woosuk Kwon
cb77ad836f
[Docs] Alphabetically sort sponsors ( #5386 )
2024-06-10 15:17:19 -05:00
Roger Wang
856c990041
[Docs] Add Docs on Limitations of VLM Support ( #5383 )
2024-06-10 09:53:50 -07:00
Cyrus Leung
6b29d6fe70
[Model] Initial support for LLaVA-NeXT ( #4199 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-10 12:47:15 +00:00
Roger Wang
7a9cb294ae
[Frontend] Add OpenAI Vision API Support ( #5237 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-06-07 11:23:32 -07:00
Simon Mo
f270a39537
[Docs] Add Sequoia as sponsors ( #5287 )
2024-06-05 18:02:56 +00:00
Jie Fu (傅杰)
87d5abef75
[Bugfix] Fix a bug caused by pip install setuptools>=49.4.0 for CPU backend ( #5249 )
2024-06-04 09:57:51 -07:00
Breno Faria
f775a07e30
[FRONTEND] OpenAI `tools` support named functions ( #5032 )
2024-06-03 18:25:29 -05:00
Cyrus Leung
7a64d24aad
[Core] Support image processor ( #4197 )
2024-06-02 22:56:41 -07:00
Nick Hill
657579113f
[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support ( #5171 )
2024-05-31 17:20:19 -07:00
Chansung Park
429d89720e
add doc about serving option on dstack ( #3074 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-30 10:11:07 -07:00
Cyrus Leung
a9bcc7afb2
[Doc] Use intersphinx and update entrypoints docs ( #5125 )
2024-05-30 09:59:23 -07:00
youkaichao
4fbcb0f27e
[Doc][Build] update after removing vllm-nccl ( #5103 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-05-29 23:51:18 +00:00
Cyrus Leung
5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines ( #4328 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Simon Mo
290f4ada2b
[Docs] Add Dropbox as sponsors ( #5089 )
2024-05-28 10:29:09 -07:00
Eric Xihui Lin
8e192ff967
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model ( #4799 )
...
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-24 22:00:52 -07:00
youkaichao
6a50f4cafa
[Doc] add ccache guide in doc ( #5012 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-23 23:21:54 +00:00
Simon Mo
e941f88584
[Docs] Add acknowledgment for sponsors ( #4925 )
2024-05-21 00:17:25 -07:00
Isotr0py
f12c3b5b3d
[Model] Add Phi-2 LoRA support ( #4886 )
2024-05-21 14:24:17 +09:00
Kante Yin
8e7fb5d43a
Support to serve vLLM on Kubernetes with LWS ( #4829 )
...
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-05-16 16:37:29 -07:00
Cyrus Leung
dc72402b57
[Bugfix][Doc] Fix CI failure in docs ( #4804 )
...
This PR fixes the CI failure introduced by #4798 .
The failure originates from having duplicate target names in reST, and is fixed by changing the ref targets to anonymous ones. For more information, see this discussion.
I have also changed the format of the links to be more distinct from each other.
2024-05-15 01:57:08 +09:00
Zhuohan Li
c579b750a0
[Doc] Add meetups to the doc ( #4798 )
2024-05-13 18:48:00 -07:00
Cyrus Leung
4bfa7e7f75
[Doc] Add API reference for offline inference ( #4710 )
2024-05-13 17:47:42 -07:00
Zhuohan Li
ac1fbf7fd2
[Doc] Shorten README by removing supported model list ( #4796 )
2024-05-13 16:23:54 -07:00
SangBin Cho
e7c46b9527
[Scheduler] Warning upon preemption and Swapping ( #4647 )
...
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-13 23:50:44 +09:00
Allen.Dou
706588a77d
[Bugfix] Fix CLI arguments in OpenAI server docs ( #4729 )
2024-05-11 00:00:56 +09:00
Simon Mo
51d4094fda
chunked-prefill-doc-syntax ( #4603 )
...
Fix the docs: https://docs.vllm.ai/en/latest/models/performance.html
Co-authored-by: sang <rkooo567@gmail.com>
2024-05-10 14:13:23 +09:00
Cyrus Leung
a3c124570a
[Bugfix] Fix CLI arguments in OpenAI server docs ( #4709 )
2024-05-09 09:53:14 -07:00
SangBin Cho
36fb68f947
[Doc] Chunked Prefill Documentation ( #4580 )
2024-05-04 00:18:00 -07:00
youkaichao
2d7bce9cd5
[Doc] add env vars to the doc ( #4572 )
2024-05-03 05:13:49 +00:00
Frαnçois
e491c7e053
[Doc] update(example model): for OpenAI compatible serving ( #4503 )
2024-05-01 10:14:16 -07:00
fuchen.ljl
ee37328da0
Unable to find Punica extension issue during source code installation ( #4494 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-05-01 00:42:09 +00:00
Prashant Gupta
b31a1fb63c
[Doc] add visualization for multi-stage dockerfile ( #4456 )
...
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-04-30 17:41:59 +00:00
SangBin Cho
a88081bf76
[CI] Disable non-lazy string operation on logging ( #4326 )
...
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
Hongxia Yang
cf29b7eda4
[ROCm][Hardware][AMD][Doc] Documentation update for ROCm ( #4376 )
...
Co-authored-by: WoosukKwon <woosuk.kwon@berkeley.edu>
2024-04-25 18:12:25 -07:00
Isotr0py
fbf152d976
[Bugfix][Model] Refactor OLMo model to support new HF format in transformers 4.40.0 ( #4324 )
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-25 09:35:56 -07:00
Caio Mendes
96e90fdeb3
[Model] Adds Phi-3 support ( #4298 )
2024-04-25 03:06:57 +00:00
youkaichao
2768884ac4
[Doc] Add note for docker user ( #4340 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-04-24 21:09:44 +00:00
Harry Mellor
34128a697e
Fix `autodoc` directives ( #4272 )
...
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-23 01:53:01 +00:00
Zhanghao Wu
ceaf4ed003
[Doc] Update the SkyPilot doc with serving and Llama-3 ( #4276 )
2024-04-22 15:34:31 -07:00