vLLM/vllm

Commit Graph

Author	SHA1	Message	Date
Isotr0py	2cbeedad09	[Docs] Document Phi-4 support (#12362 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-23 19:18:51 +00:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
Cyrus Leung	d07efb31c5	[Doc] Troubleshooting errors during model inspection (#12351 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-23 22:46:58 +08:00
youkaichao	511627445e	[doc] explain common errors around torch.compile (#12340 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-23 14:56:02 +08:00
Russell Bryant	7551a34032	[Docs] Document vulnerability disclosure process (#12326 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-23 03:44:09 +00:00
Michael Goin	01a55941f5	[Docs] Update FP8 KV Cache documentation (#12238 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-23 11:18:09 +08:00
Hongxia Yang	09ccc9c8f7	[Documentation][AMD] Add information about prebuilt ROCm vLLM docker for perf validation purpose (#12281 ) Signed-off-by: Hongxia Yang <hongxyan@amd.com>	2025-01-22 07:49:22 +08:00
Cyrus Leung	96912550c8	[Misc] Rename `MultiModalInputsV2 -> MultiModalInputs` (#12244 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-21 07:31:19 +00:00
Gregory Shtrasberg	d4b62d4641	[AMD][Build] Porting dockerfiles from the ROCm/vllm fork (#11777 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-01-21 12:22:23 +08:00
Isotr0py	83609791d2	[Model] Add Qwen2 PRM model support (#12202 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-20 14:59:46 +08:00
Harry Mellor	3ea7b94523	Move linting to `pre-commit` (#11975 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-20 14:58:01 +08:00
Roger Wang	81763c58a0	[V1] Add V1 support of Qwen2-VL (#12128 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: imkero <kerorek@outlook.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-19 19:52:13 +08:00
Isotr0py	02798ecabe	[Model] Port deepseek-vl2 processor, remove dependency (#12169 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-18 13:59:39 +08:00
Hongxia Yang	c09503ddd6	[AMD][CI/Build][Bugfix] use pytorch stale wheel (#12172 ) Signed-off-by: hongxyan <hongxyan@amd.com>	2025-01-18 11:15:53 +08:00
Yuan Tang	1475847a14	[Doc] Add instructions on using Podman when SELinux is active (#12136 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-01-17 04:45:36 +00:00
Isotr0py	62b06ba23d	[Model] Add support for deepseek-vl2-tiny model (#12068 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-16 17:14:48 +00:00
Cyrus Leung	f8ef146f03	[Doc] Add documentation for specifying model architecture (#12105 )	2025-01-16 15:53:43 +08:00
RunningLeon	97eb97b5a4	[Model]: Support internlm3 (#12037 )	2025-01-15 11:35:17 +00:00
Kyle Sayers	3f9b7ab9f5	[Doc] Update examples to remove SparseAutoModelForCausalLM (#12062 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-01-15 06:36:01 +00:00
Harry Mellor	c9d6ff530b	Explain where the engine args go when using Docker (#12041 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-14 16:05:50 +00:00
TJian	8a1f938e6f	[Doc] Update Quantization Hardware Support Documentation (#12025 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-01-14 04:37:52 +00:00
Woosuk Kwon	1a401252b5	[Docs] Add Sky Computing Lab to project intro (#12019 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-13 17:24:36 -08:00
Harry Mellor	e8c23ff989	[Doc] Organise installation documentation into categories and tabs (#11935 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-13 12:27:36 +00:00
Roger Wang	cd8249903f	[Doc][V1] Update model implementation guide for V1 support (#11998 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-01-13 11:58:54 +00:00
Akshat Tripathi	8bddb73512	[Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-01-12 13:01:52 +00:00
Isotr0py	f967e51f38	[Model] Initialize support for Deepseek-VL2 models (#11578 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-12 00:17:24 -08:00
Rafael Vasquez	43f3d9e699	[CI/Build] Add markdown linter (#11857 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2025-01-12 00:17:13 -08:00
Cyrus Leung	a991f7d508	[Doc] Basic guide for writing unit tests for new models (#11951 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-11 21:27:24 +08:00
Li, Jiang	aa1e77a19c	[Hardware][CPU] Support MOE models on x86 CPU (#11831 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-01-10 11:07:58 -05:00
Harry Mellor	482cdc494e	[Doc] Rename offline inference examples (#11927 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-10 23:50:29 +08:00
Cyrus Leung	12664ddda5	[Doc] [1/N] Initial guide for merged multi-modal processor (#11925 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-10 14:30:25 +00:00
Harry Mellor	d85c47d6ad	Replace "online inference" with "online serving" (#11923 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-10 12:05:56 +00:00
Cyrus Leung	3de2b1eafb	[Doc] Show default pooling method in a table (#11904 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-10 11:25:20 +08:00
Cyrus Leung	c3cf54dda4	[Doc][5/N] Move Community and API Reference to the bottom (#11896 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-01-10 03:10:12 +00:00
Charles Frye	36f5303578	[Docs] Add Modal to deployment frameworks (#11907 )	2025-01-09 23:26:37 +00:00
Cyrus Leung	9a228348d2	[Misc] Provide correct Pixtral-HF chat template (#11891 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-09 10:19:37 -07:00
Cyrus Leung	65097ca0af	[Doc] Add model development API Reference (#11884 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-09 09:43:40 +00:00
Guspan Tanadi	a732900efc	[Doc] Intended links Python multiprocessing library (#11878 )	2025-01-09 05:39:39 +00:00
Michael Goin	730e9592e9	[Doc] Recommend uv and python 3.12 for quickstart guide (#11849 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2025-01-09 11:37:48 +08:00
Cyrus Leung	5984499e47	[Doc] Expand Multimodal API Reference (#11852 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-08 17:14:14 +00:00
Cyrus Leung	6cd40a5bfe	[Doc][4/N] Reorganize API Reference (#11843 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-08 21:34:44 +08:00
Harry Mellor	aba8d6ee00	[Doc] Move examples into categories (#11840 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-08 13:09:53 +00:00
Wallas Henrique	cfd3219f58	[Hardware][Apple] Native support for macOS Apple Silicon (#11696 ) Signed-off-by: Wallas Santos <wallashss@ibm.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-08 16:35:49 +08:00
Simon Mo	a1b2b8606e	[Docs] Update sponsor name: 'Novita' to 'Novita AI' (#11833 )	2025-01-07 23:05:46 -08:00
youkaichao	ad9f1aa679	[doc] update wheels url (#11830 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-08 14:36:49 +08:00
Simon Mo	259abd8953	[Docs] reorganize sponsorship page (#11639 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-01-07 21:16:08 -08:00
Harry Mellor	5950f555a1	[Doc] Group examples into categories (#11782 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-08 09:20:12 +08:00
sroy745	973f5dc581	[Doc]Add documentation for using EAGLE in vLLM (#11417 ) Signed-off-by: Sourashis Roy <sroy@roblox.com>	2025-01-07 19:19:12 +00:00
Cyrus Leung	c0efe92d8b	[Doc] Add note to `gte-Qwen2` models (#11808 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-07 21:50:58 +08:00
youkaichao	d9fa1c05ad	[doc] update how pip can install nightly wheels (#11806 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-07 21:42:58 +08:00
Roger Wang	2de197bdd4	[V1] Support audio language models on V1 (#11733 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-07 19:47:36 +08:00
youkaichao	869e829b85	[doc] add doc to explain how to use uv (#11773 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-07 18:41:17 +08:00
Roger Wang	8082ad7950	[V1][Doc] Update V1 support for `LLaVa-NeXT-Video` (#11798 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-07 09:55:39 +00:00
Russell Bryant	ce1917fcf2	[Doc] Create a vulnerability management team (#9925 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-06 22:57:32 -08:00
Cyrus Leung	8ceffbf315	[Doc][3/N] Reorganize Serving section (#11766 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-07 11:20:01 +08:00
Roger Wang	91b361ae89	[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 19:58:16 +00:00
youkaichao	4ca5d40adc	[doc] explain how to add interleaving sliding window support (#11771 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-06 21:57:44 +08:00
Cyrus Leung	ee77fdb5de	[Doc][2/N] Reorganize Models and Usage sections (#11755 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 21:40:31 +08:00
Suraj Deshmukh	2a622d704a	k8s-config: Update the secret to use stringData (#11679 ) Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>	2025-01-06 08:01:22 +00:00
Cyrus Leung	402d378360	[Doc] [1/N] Reorganize Getting Started section (#11645 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 02:18:33 +00:00
Alberto Ferrer	d1d49397e7	Update bnb.md with example for OpenAI (#11718 )	2025-01-04 06:29:02 +00:00
Hust_YangXian	9c93636d84	Update tool_calling.md (#11701 )	2025-01-04 06:16:30 +00:00
Sachin Varghese	2f1e8e8f54	Update default max_num_batch_tokens for chunked prefill (#11694 )	2025-01-03 00:25:53 +00:00
Chunyang Wen	84c35c374a	According to vllm.EngineArgs, the name should be distributed_executor_backend (#11689 )	2025-01-02 18:14:16 +00:00
Cyrus Leung	365801fedd	[VLM] Add max-count checking in data parser for single image models (#11661 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-12-31 22:15:21 -08:00
Roger Wang	e7c7c5e822	[V1][VLM] V1 support for selected single-image models. (#11632 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-12-31 21:17:22 +00:00
Matthias Vogler	a2a40bcd0d	[Model][LoRA]LoRA support added for MolmoForCausalLM (#11439 ) Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-30 17:33:06 -08:00
youkaichao	b12e87f942	[platforms] enable platform plugins (#11602 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-30 20:24:45 +08:00
Cyrus Leung	32b4c63f02	[Doc] Convert list tables to MyST (#11594 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-29 15:56:22 +08:00
youkaichao	328841d002	[bugfix] interleaving sliding window for cohere2 model (#11583 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-28 16:55:42 +00:00
Cyrus Leung	d427e5cfda	[Doc] Minor documentation fixes (#11580 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-28 21:53:59 +08:00
Isotr0py	d34be24bb1	[Model] Support InternLM2 Reward models (#11571 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-28 06:14:10 +00:00
Robert Shaw	df04dffade	[V1] [4/N] API Server: ZMQ/MP Utilities (#11541 )	2024-12-28 01:45:08 +00:00
Cyrus Leung	101418096f	[VLM] Support caching in merged multi-modal processor (#11396 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-27 17:22:48 +00:00
Chen1022	5ce4627a7e	[Doc] Add xgrammar in doc (#11549 ) Signed-off-by: ccjincong <chenjincong11@gmail.com>	2024-12-27 13:05:10 +00:00
AlexHe99	d003f3ea39	Update deploying_with_k8s.md with AMD ROCm GPU example (#11465 ) Signed-off-by: Alex He <alehe@amd.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-27 10:00:04 +00:00
Robert Shaw	0c0c2015c5	Update openai_compatible_server.md (#11536 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-12-26 16:26:18 -08:00
Simon Mo	82d24f7aac	[Docs] Document Deepseek V3 support (#11535 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2024-12-26 16:21:56 -08:00
Isotr0py	b85a977822	[Doc] Add video example to openai client for multimodal (#11521 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-26 17:31:29 +00:00
Roger Wang	7492a36207	[Doc] Add `QVQ` and `QwQ` to the list of supported models (#11509 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-12-26 09:44:32 +00:00
Cyrus Leung	6ad909fdda	[Doc] Improve GitHub links (#11491 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-25 14:49:26 -08:00
Cyrus Leung	3f3e92e1f2	[Model] Automatic conversion of classification and reward models (#11469 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 18:22:22 +00:00
Cyrus Leung	9edca6bf8f	[Frontend] Online Pooling API (#11457 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 17:54:30 +08:00
Rafael Vasquez	32aa2059ad	[Docs] Convert rST to MyST (Markdown) (#11145 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-12-23 22:35:38 +00:00
Yuan Tang	2e726680b3	[Bugfix] torch nightly version in ROCm installation guide (#11423 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-23 17:20:22 +00:00
youkaichao	5d2248d81a	[doc] explain nccl requirements for rlhf (#11381 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-20 13:00:56 -08:00
omer-dayan	995f56236b	[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192 ) Signed-off-by: OmerD <omer@run.ai>	2024-12-20 16:46:24 +00:00
youkaichao	1ecc645b8f	[doc] backward compatibility for 0.6.4 (#11359 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-19 21:33:53 -08:00
youkaichao	7801f56ed7	[ci][gh200] dockerfile clean up (#11351 ) Signed-off-by: drikster80 <ed.sealing@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: drikster80 <ed.sealing@gmail.com> Co-authored-by: cenzhiyao <2523403608@qq.com>	2024-12-19 18:13:06 -08:00
Yehoshua Cohen	6c7f881541	[Model] Add JambaForSequenceClassification model (#10860 ) Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-19 22:48:06 +08:00
Travis Johnson	17ca964273	[Model] IBM Granite 3.1 (#11307 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-12-19 11:27:24 +08:00
kYLe	66d4b16724	[Frontend] Add OpenAI API support for input_audio (#11027 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-16 22:09:58 -08:00
youkaichao	35bae114a8	fix gh200 tests on main (#11246 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-16 17:22:38 -08:00
bk-TurbaAI	35ffa682b1	[Docs] hint to enable use of GPU performance counters in profiling tools for multi-node distributed serving (#11235 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-12-16 22:20:39 +00:00
Jani Monoses	bddbbcb132	[Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203 )	2024-12-16 09:56:19 +00:00
cennn	b3b1526f03	WIP: [CI/Build] simplify Dockerfile build for ARM64 / GH200 (#11212 ) Signed-off-by: drikster80 <ed.sealing@gmail.com> Co-authored-by: drikster80 <ed.sealing@gmail.com>	2024-12-16 09:20:49 +00:00
AlexHe99	da6f409246	Update deploying_with_k8s.rst (#10922 )	2024-12-15 16:33:58 -08:00
Kuntai Du	38e599d6a8	[Doc] add documentation for disaggregated prefilling (#11197 ) Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2024-12-15 13:31:16 -06:00
Jee Jee Li	15859f2357	[[Misc]Upgrade bitsandbytes to the latest version 0.45.0 (#11201 )	2024-12-15 03:03:06 +00:00
Russell Bryant	4863e5fba5	[Core] V1: Use multiprocessing by default (#11074 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-13 16:27:32 -08:00
Cyrus Leung	0920ab9131	[Doc] Reorganize online pooling APIs (#11172 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 00:22:22 +08:00
Cyrus Leung	eeec9e3390	[Frontend] Separate pooling APIs in offline inference (#11129 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-13 10:40:07 +00:00
Jani Monoses	7cd7409142	PaliGemma 2 support (#11142 )	2024-12-13 07:40:07 +00:00
Ramon Ziai	d4d5291cc2	fix(docs): typo in helm install instructions (#11141 ) Signed-off-by: Ramon Ziai <ramon.ziai@bettermarks.com>	2024-12-12 17:36:32 +00:00
Pooya Davoodi	1da8f0e1dd	[Model] Add support for embedding model GritLM (#10816 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2024-12-12 06:39:16 +00:00
Yuan Tang	24a36d6d5f	Update link to LlamaStack remote vLLM guide in serving_with_llamastack.rst (#11112 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-12 02:39:21 +00:00
bingps	fd22220687	[Doc] Installed version of llmcompressor for int8/fp8 quantization (#11103 ) Signed-off-by: Guangda Liu <bingps@users.noreply.github.com> Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>	2024-12-11 15:43:24 +00:00
Cyrus Leung	cad5c0a6ed	[Doc] Update docs to refer to pooling models (#11093 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 13:36:27 +00:00
Cyrus Leung	8f10d5e393	[Misc] Split up pooling tasks (#10820 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 01:28:00 -08:00
Mor Zusman	ffa48c9146	[Model] PP support for Mamba-like models (#10992 ) Signed-off-by: mzusman <mor.zusmann@gmail.com>	2024-12-10 21:53:37 -05:00
Maxime Fournioux	fe2e10c71b	Add example of helm chart for vllm deployment on k8s (#9199 ) Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>	2024-12-10 09:19:27 +00:00
Joe Runde	980ad394a8	[Frontend] Use request id from header (#10968 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-12-10 13:46:29 +08:00
Michael Goin	6d525288c1	[Docs] Add dedicated tool calling page to docs (#10554 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-09 20:15:34 -05:00
Roger Wang	af7c4a92e6	[Doc][V1] Add V1 support column for multimodal models (#10998 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-08 22:29:16 -08:00
Cyrus Leung	c889d5888b	[Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 17:20:49 +00:00
Cyrus Leung	39e227c7ae	[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 17:10:05 +00:00
Cyrus Leung	1c768fe537	[Doc] Explicitly state that InternVL 2.5 is supported (#10978 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 16:58:02 +00:00
Sam Stoelinga	7406274041	[Doc] add KubeAI to serving integrations (#10837 ) Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>	2024-12-06 17:03:56 +00:00
Cyrus Leung	aa39a8e175	[Doc] Create a new "Usage" section (#10827 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-05 11:19:35 +08:00
Daniele	e4c34c23de	[CI/Build] improve python-only dev setup (#9621 ) Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-12-04 21:48:13 +00:00
Kevin H. Luu	c92acb9693	[ci/build] Update vLLM postmerge ECR repo (#10887 )	2024-12-04 09:01:20 +00:00
Aaron Pham	9323a3153b	[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-12-03 15:17:00 +08:00
Russell Bryant	ef51831ee8	[Doc] Add github links for source code references (#10672 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-03 06:46:07 +00:00
Cyrus Leung	e95f275f57	[CI/Build] Update `mistral_common` version for tests and docs (#10825 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-02 10:26:10 +00:00
youkaichao	169a0ff911	[doc] add warning about comparing hf and vllm outputs (#10805 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-01 00:41:38 -08:00
Cyrus Leung	133707123e	[Model] Replace embedding models with pooling adapter (#10769 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-01 08:02:54 +08:00
wangxiyuan	7e4bbda573	[doc] format fix (#10789 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2024-11-30 11:38:40 +00:00
Isotr0py	c83919c7a6	[Model] Add Internlm2 LoRA support (#5064 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-11-28 17:29:04 +00:00
sixgod	5fc5ce0fe4	[Model] Added GLM-4 series hf format model support vllm==0.6.4 (#10561 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-11-28 14:53:31 +00:00
罗泽轩	278be671a3	[Doc] Update model in arch_overview.rst to match comment (#10701 ) Signed-off-by: spacewander <spacewanderlzx@gmail.com>	2024-11-27 23:58:39 -08:00
shunxing12345	1209261e93	[Model] Support telechat2 (#10311 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: xiangw2 <xiangw2@chinatelecom.cn> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-11-27 11:32:35 +00:00
Murali Andoorveedu	db66e018ea	[Bugfix] Fix for Spec model TP + Chunked Prefill (#10232 ) Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com> Signed-off-by: Sourashis Roy <sroy@roblox.com> Co-authored-by: Sourashis Roy <sroy@roblox.com>	2024-11-26 09:11:16 -08:00
Sage Moore	9a88f89799	custom allreduce + torch.compile (#10121 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-25 22:00:16 -08:00
Sanket Kale	a6760f6456	[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228 ) Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-11-25 18:32:39 -08:00
Shane A	9db713a1dc	[Model] Add OLMo November 2024 model (#10503 )	2024-11-25 17:26:40 -05:00
Cyrus Leung	1b583cfefa	[Doc] Fix typos in docs (#10636 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 10:15:45 -08:00
zhou fan	b1d920531f	[Model]: Add support for Aria model (#10514 ) Signed-off-by: xffxff <1247714429@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-11-25 18:10:55 +00:00
fzyzcjy	2b0879bfc2	Super tiny little typo fix (#10633 )	2024-11-25 13:08:30 +00:00
Cyrus Leung	ed46f14321	[Model] Support `is_causal` HF config field for Qwen2 model (#10621 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 09:51:20 +00:00
Cyrus Leung	a30a605d21	[Doc] Add encoder-based models to Supported Models page (#10616 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 06:34:07 +00:00
Maximilien de Bayser	214efc2c3c	Support Cross encoder models (#10400 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-24 18:56:20 -08:00
youkaichao	e4fbb14414	[doc] update the code to add models (#10603 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-11-24 11:21:40 -08:00
Michael Goin	9afa014552	Add small example to metrics.rst (#10550 )	2024-11-21 23:43:43 +00:00
Li, Jiang	63f1fde277	[Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-20 10:57:39 +00:00
wchen61	7629a9c6e5	[CI/Build] Support compilation with local cutlass path (#10423 ) (#10424 )	2024-11-19 21:35:50 -08:00
Cyrus Leung	b4be5a8adb	[Bugfix] Enforce no chunked prefill for embedding models (#10470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-20 05:12:51 +00:00
Russell Bryant	5390d6664f	[Doc] Add the start of an arch overview page (#10368 )	2024-11-19 09:52:11 +00:00
Michael Goin	74f8c2cf5f	Add openai.beta.chat.completions.parse example to structured_outputs.rst (#10433 )	2024-11-19 04:37:46 +00:00
Yan Ma	6b2d25efc7	[Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107 ) Signed-off-by: yan ma <yan.ma@intel.com>	2024-11-18 11:18:05 -07:00
ismael-dm	31894a2155	[Doc] Add documentation for Structured Outputs (#9943 ) Signed-off-by: ismael-dm <ismaeldm99@gmail.com>	2024-11-18 09:52:12 -08:00
B-201	4186be8111	[Doc] Update doc for LoRA support in GLM-4V (#10425 ) Signed-off-by: B-201 <Joy25810@foxmail.com>	2024-11-18 15:08:30 +00:00
youkaichao	755b85359b	[doc] add doc for the plugin system (#10372 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-15 21:46:27 -08:00
Cyrus Leung	32e46e000f	[Frontend] Automatic detection of chat content format from AST (#9919 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-16 13:35:40 +08:00
Michael Green	4f168f69a3	[Docs] Misc updates to TPU installation instructions (#10165 )	2024-11-15 13:26:17 -08:00
Russell Bryant	3e8d14d8a1	[Doc] Move PR template content to docs (#10159 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-15 13:20:20 -08:00
Simon Mo	c76ac49d26	[Docs] Add Nebius as sponsors (#10371 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2024-11-15 12:47:40 -08:00
Cyrus Leung	2ac6d0e75b	[Misc] Consolidate pooler config overrides (#10351 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-15 06:59:00 +00:00
Cyrus Leung	b40cf6402e	[Model] Support Qwen2 embeddings and use tags to select model tests (#10184 )	2024-11-14 20:23:09 -08:00
Woosuk Kwon	1dbae0329c	[Docs] Publish meetup slides (#10331 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-14 16:19:38 +00:00
Mike Depinet	f67ce05d0b	[Frontend] Pythonic tool parser (#9859 ) Signed-off-by: Mike Depinet <mike@fixie.ai>	2024-11-14 04:14:34 +00:00
youkaichao	504ac53d18	[misc] error early for old-style class (#10304 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-13 18:55:39 -08:00
Cyrus Leung	0b8bb86bf1	[1/N] Initial prototype for multi-modal processor (#10044 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-13 12:39:03 +00:00
B-201	d909acf9fe	[Model][LoRA]LoRA support added for idefics3 (#10281 ) Signed-off-by: B-201 <Joy25810@foxmail.com>	2024-11-13 17:25:59 +08:00
Austin Veselka	1b886aa104	[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 (#9944 ) Signed-off-by: FurtherAI <austin.veselka@lighton.ai> Co-authored-by: FurtherAI <austin.veselka@lighton.ai>	2024-11-13 08:28:13 +00:00
电脑星人	3945c82346	[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions (#10221 ) Signed-off-by: imkero <kerorek@outlook.com>	2024-11-13 07:07:22 +00:00
youkaichao	377b74fe87	Revert "[ci][build] limit cmake version" (#10271 )	2024-11-12 15:06:48 -08:00
youkaichao	18081451f9	[doc] improve debugging doc (#10270 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-12 14:43:52 -08:00
youkaichao	96ae0eaeb2	[doc] fix location of runllm widget (#10266 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-12 14:34:39 -08:00
Guillaume Calmettes	36c513a076	[BugFix] Do not raise a `ValueError` when `tool_choice` is set to the supported `none` option and `tools` are not defined. (#10000 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2024-11-12 11:13:46 +00:00
youkaichao	3a28f18b0b	[doc] explain the class hierarchy in vLLM (#10240 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-11 22:56:44 -08:00
youkaichao	d1c6799b88	[doc] update debugging guide (#10236 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-11 15:21:12 -08:00
Yuan Tang	4800339c62	Add docs on serving with Llama Stack (#10183 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2024-11-11 11:28:55 -08:00
youkaichao	f0f2e5638e	[doc] improve debugging code (#10206 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-10 17:49:40 -08:00
Shawn Du	20cf2f553c	[Misc] small fixes to function tracing file path (#9543 ) Signed-off-by: Shawn Du <shawnd200@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-10 15:21:06 -08:00
Yongzao	bfb7d61a7c	[doc] Polish the integration with huggingface doc (#10195 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-10 10:22:04 -08:00
youkaichao	9fa4bdde9d	[ci][build] limit cmake version (#10188 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-09 16:27:26 -08:00
cjackal	d88bff1b96	[Frontend] add `add_request_id` middleware (#9594 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2024-11-09 10:18:29 +00:00
youkaichao	8a4358ecb5	[doc] explaining the integration with huggingface (#10173 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-09 01:02:54 -08:00
Cyrus Leung	49d2a41a86	[Doc] Adjust RunLLM location (#10176 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-08 20:07:10 -08:00
Cyrus Leung	e0191a95d8	[0/N] Rename `MultiModalInputs` to `MultiModalKwargs` (#10040 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-09 11:31:02 +08:00
Rafael Vasquez	6b30471586	[Misc] Improve Web UI (#10090 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-11-08 09:51:04 -08:00
Russell Bryant	3a7f15a398	[Doc] Move CONTRIBUTING to docs site (#9924 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-08 05:15:12 +00:00
whyiug	40d0e7411d	[Doc] Update FAQ links in spec_decode.rst (#9662 ) Signed-off-by: whyiug <whyiug@hotmail.com>	2024-11-08 04:44:58 +00:00
litianjian	28b2877d30	Online video support for VLMs (#10020 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-07 20:25:59 +00:00
Maximilien de Bayser	ae62fd17c0	[Frontend] Tool calling parser for Granite 3.0 models (#9027 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-11-07 07:09:02 -08:00
Rafael Vasquez	d7263a1bb8	Doc: Improve benchmark documentation (#9927 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-11-06 23:50:35 -08:00
Cyrus Leung	db7db4aab9	[Misc] Consolidate ModelConfig code related to HF config (#10104 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-07 06:00:21 +00:00
youkaichao	e7b84c394d	[doc] add back Python 3.8 ABI (#10100 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-06 21:06:41 -08:00
Li, Jiang	a4b3e0c1e9	[Hardware][CPU] Update torch 2.5 (#9911 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-07 04:43:08 +00:00
Russell Bryant	098f94de42	[CI/Build] Drop Python 3.8 support (#10038 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-06 14:31:01 +00:00
Eric	406d4cc480	[Model][LoRA]LoRA support added for Qwen2VLForConditionalGeneration (#10022 ) Signed-off-by: ericperfect <ericperfectttt@gmail.com>	2024-11-06 14:13:15 +00:00
Jee Jee Li	a5bba7d234	[Model] Add Idefics3 support (#9767 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: B-201 <Joy25810@foxmail.com> Co-authored-by: B-201 <Joy25810@foxmail.com>	2024-11-06 11:41:17 +00:00
Jee Jee Li	2003cc3513	[Model][LoRA]LoRA support added for LlamaEmbeddingModel (#10071 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-11-06 09:49:19 +00:00
Konrad Zawora	a02a50e6e5	[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143 ) Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Signed-off-by: Bob Zhu <bob.zhu@intel.com> Signed-off-by: zehao-intel <zehao.huang@intel.com> Signed-off-by: Konrad Zawora <kzawora@habana.ai> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai> Co-authored-by: Michal Adamczyk <madamczyk@habana.ai> Co-authored-by: Marceli Fylcek <mfylcek@habana.ai> Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com> Co-authored-by: Vivek Goel <vgoel@habana.ai> Co-authored-by: yuwenzho <yuwen.zhou@intel.com> Co-authored-by: Dominika Olszewska <dolszewska@habana.ai> Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com> Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com> Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai> Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com> Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai> Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com> Co-authored-by: Ilia Taraban <tarabanil@gmail.com> Co-authored-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai> Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai> Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com> Co-authored-by: Sun Choi <schoi@habana.ai> Co-authored-by: Iryna Boiko <iboiko@habana.ai> Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com> Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com> Co-authored-by: Zehao Huang <zehao.huang@intel.com> Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com> Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com> Co-authored-by: Nir David <ndavid@habana.ai> Co-authored-by: Yu-Zhou <yu.zhou@intel.com> Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai> Co-authored-by: Karol Damaszke <kdamaszke@habana.ai> Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Jacek Czaja <jacek.czaja@intel.com> Co-authored-by: Jacek Czaja <jczaja@habana.ai> Co-authored-by: Yuan <yuan.zhou@outlook.com>	2024-11-06 01:09:10 -08:00
Aaron Pham	21063c11c7	[CI/Build] drop support for Python 3.8 EOL (#8464 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-11-06 07:11:55 +00:00
Richard Liu	cd34029e91	Refactor TPU requirements file and pin build dependencies (#10010 ) Signed-off-by: Richard Liu <ricliu@google.com>	2024-11-05 16:48:44 +00:00
Roger Wang	6e056bcf04	[Doc] Update VLM doc about loading from local files (#9999 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-11-04 19:47:11 +00:00
shanshan wang	54597724f4	[Model] Add support for H2OVL-Mississippi models (#9747 ) Signed-off-by: Shanshan Wang <shanshan.wang@h2o.ai> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-11-04 00:15:36 +00:00
Michael Green	1d4cfe2be1	[Doc] Updated tpu-installation.rst with more details (#9926 ) Signed-off-by: Michael Green <mikegre@google.com>	2024-11-02 10:06:45 -04:00
Nick Hill	eed92f12fc	[Docs] Update Granite 3.0 models in supported models table (#9930 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-11-02 09:02:18 +00:00
Cyrus Leung	ba0d892074	[Frontend] Use a proper chat template for VLM2Vec (#9912 )	2024-11-01 14:09:07 +00:00
Cyrus Leung	06386a64dd	[Frontend] Chat-based Embeddings API (#9759 )	2024-11-01 08:13:35 +00:00
Cyrus Leung	d3aa2a8b2f	[Doc] Update multi-input support (#9906 )	2024-11-01 07:34:49 +00:00
Yongzao	2b5bf20988	[torch.compile] Adding torch compile annotations to some models (#9876 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-01 00:25:47 -07:00
Joe Runde	031a7995f3	[Bugfix][Frontend] Reject guided decoding in multistep mode (#9892 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-11-01 01:09:46 +00:00
Jee Jee Li	5608e611c2	[Doc] Update Qwen documentation (#9869 )	2024-10-31 08:54:18 +00:00
Guillaume Calmettes	abbfb6134d	[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837 )	2024-10-30 18:15:56 -07:00
youkaichao	c2cd1a2142	[doc] update pp support (#9853 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-10-30 13:36:51 -07:00
Joe Runde	33d257735f	[Doc] link bug for multistep guided decoding (#9843 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-30 17:28:29 +00:00
Woosuk Kwon	211fe91aa8	[TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA (#9438 )	2024-10-30 09:41:38 +00:00
Yan Ma	04a3ae0aca	[Bugfix] Fix multi nodes TP+PP for XPU (#8884 ) Signed-off-by: YiSheng5 <syhm@mail.ustc.edu.cn> Signed-off-by: yan ma <yan.ma@intel.com> Co-authored-by: YiSheng5 <syhm@mail.ustc.edu.cn>	2024-10-29 21:34:45 -07:00
Will Eaton	882a1ad0de	[Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>	2024-10-29 15:07:37 -07:00
Russell Bryant	c5d7fb9ddc	[Doc] fix third-party model example (#9771 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-10-28 19:39:21 -07:00
kakao-kevin-us	6650e6a930	[Model] Add classification Task with Qwen2ForSequenceClassification (#9704 ) Signed-off-by: Kevin-Yang <ykcha9@gmail.com> Co-authored-by: Kevin-Yang <ykcha9@gmail.com>	2024-10-26 17:53:35 +00:00
Rafael Vasquez	228cfbd03f	[Doc] Improve quickstart documentation (#9256 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-10-25 14:32:10 -07:00
Cyrus Leung	b979143d5b	[Doc] Move additional tips/notes to the top (#9647 )	2024-10-24 09:43:59 +00:00
Yongzao	8a02cd045a	[torch.compile] Adding torch compile annotations to some models (#9639 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-10-24 00:54:57 -07:00
Cyrus Leung	836e8ef6ee	[Bugfix] Fix PP for ChatGLM and Molmo (#9422 )	2024-10-24 06:12:05 +00:00
Vinay R Damodaran	33bab41060	[Bugfix]: Make chat content text allow type content (#9358 ) Signed-off-by: Vinay Damodaran <vrdn@hey.com>	2024-10-24 05:05:49 +00:00
Yunfei Chu	fc6c274626	[Model] Add Qwen2-Audio model support (#9248 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-23 17:54:22 +00:00
Cyrus Leung	831540cf04	[Model] Support E5-V (#9576 )	2024-10-23 11:35:29 +08:00
Seth Kimmel	208cb34c81	[Doc]: Update tensorizer docs to include vllm[tensorizer] (#7889 ) Co-authored-by: Kaunil Dhruv <dhruv.kaunil@gmail.com>	2024-10-22 15:43:25 -07:00
Yuan	32a1ee74a0	[Hardware][Intel CPU][DOC] Update docs for CPU backend (#6212 ) Signed-off-by: Yuan Zhou <yuan.zhou@intel.com> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Gubrud, Aaron D <aaron.d.gubrud@intel.com> Co-authored-by: adgubrud <96072084+adgubrud@users.noreply.github.com>	2024-10-22 10:38:04 -07:00
Isotr0py	bb392ea2d2	[Model][VLM] Initialize support for Mono-InternVL model (#9528 )	2024-10-22 16:01:46 +00:00
Rafael Vasquez	f7db5f0fa9	[Doc] Use shell code-blocks and fix section headers (#9508 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-10-22 06:43:24 +00:00
youkaichao	d621c43df7	[doc] fix format (#9562 )	2024-10-21 13:54:57 -07:00
Dhia Eddine Rhaiem	f6b97293aa	[Model] FalconMamba Support (#9325 )	2024-10-21 12:50:16 -04:00
Michael Goin	3921a2f29e	[Model] Support Pixtral models in the HF Transformers format (#9036 )	2024-10-18 13:29:56 -06:00
Cyrus Leung	051eaf6db3	[Model] Add user-configurable task for models that support both generation and embedding (#9424 )	2024-10-18 11:31:58 -07:00
tomeras91	d2b1bf55ec	[Frontend][Feature] Add jamba tool parser (#9154 )	2024-10-18 10:27:48 +00:00
Kuntai Du	81ede99ca4	[Core] Deprecating block manager v1 and make block manager v2 default (#8704 ) Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).	2024-10-17 11:38:15 -05:00
Li, Jiang	5eda21e773	[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344 )	2024-10-17 12:21:04 -04:00
Junhao Li	5b8a1fde84	[Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft (#9396 )	2024-10-16 16:40:24 +00:00
Roger Wang	59230ef32b	[Misc] Consolidate example usage of OpenAI client for multimodal models (#9412 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-16 11:20:51 +00:00
Cyrus Leung	cee711fdbb	[Core] Rename input data types (#8688 )	2024-10-16 10:49:37 +00:00
Cyrus Leung	7abba39ee6	[Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303 )	2024-10-16 14:31:00 +08:00
Michael Goin	8e836d982a	[Doc] Fix code formatting in spec_decode.rst (#9348 )	2024-10-14 21:29:11 -07:00
Tyler Michael Smith	169b530607	[Bugfix] Clean up some cruft in mamba.py (#9343 )	2024-10-15 00:24:25 +00:00
Reza Salehi	dfe43a2071	[Model] Molmo vLLM Integration (#9016 ) Co-authored-by: sanghol <sanghol@allenai.org> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-10-14 07:56:24 -07:00
Yunmeng	2b184ddd4f	[Misc][Installation] Improve source installation script and doc (#9309 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-12 09:36:40 -07:00
Wallas Henrique	8baf85e4e9	[Doc] Compatibility matrix for mutual exclusive features (#8512 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-10-11 11:18:50 -07:00
sixgod	6cf1167c1a	[Model] Add GLM-4v support and meet vllm==0.6.2 (#9242 )	2024-10-11 17:36:13 +00:00
Tyler Michael Smith	7342a7d7f8	[Model] Support Mamba (#6484 )	2024-10-11 15:40:06 +00:00
Cyrus Leung	e808156f30	[Misc] Collect model support info in a single process per model (#9233 )	2024-10-11 11:08:11 +00:00
omrishiv	f990bab2a4	[Doc][Neuron] add note to neuron documentation about resolving triton issue (#9257 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-10-10 23:36:32 +00:00
Rafael Vasquez	055f3270d4	[Doc] Improve debugging documentation (#9204 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-10-10 10:48:51 -07:00
whyiug	04de9057ab	[Model] support input image embedding for minicpmv (#9237 )	2024-10-10 15:00:47 +00:00
youkaichao	de895f1697	[misc] improve model support check in another process (#9208 )	2024-10-09 21:58:27 -07:00
Li, Jiang	ca77dd7a44	[Hardware][CPU] Support AWQ for CPU backend (#7515 )	2024-10-09 10:28:08 -06:00
Jiangtao Hu	dc4aea677a	[Doc] Fix VLM prompt placeholder sample bug (#9170 )	2024-10-09 08:59:42 +00:00
Yuan Tang	acce7630c1	Update link to KServe deployment guide (#9173 )	2024-10-09 03:58:49 +00:00
Michael Goin	9ba0bd6aa6	Add `lm-eval` directly to requirements-test.txt (#9161 )	2024-10-08 18:22:31 -07:00
Rafael Vasquez	de24046fcd	[Doc] Improve contributing and installation documentation (#9132 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-10-08 20:22:08 +00:00
Sayak Paul	1874c6a1b0	[Doc] Update vlm.rst to include an example on videos (#9155 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-10-08 18:12:29 +00:00
TimWang	93cf74a8a7	[Doc]: Add deploying_with_k8s guide (#8451 )	2024-10-07 13:31:45 -07:00
Cyrus Leung	151ef4efd2	[Model] Support NVLM-D and fix QK Norm in InternViT (#9045 ) Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2024-10-07 11:55:12 +00:00
Cyrus Leung	b22b798471	[Model] PP support for embedding models and update docs (#9090 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-10-06 16:35:27 +08:00
Cyrus Leung	f22619fe96	[Misc] Remove user-facing error for removed VLM args (#9104 )	2024-10-06 01:33:52 -07:00
Andy Dai	5df1834895	[Bugfix] Fix order of arguments matters in config.yaml (#8960 )	2024-10-05 17:35:11 +00:00
Roger Wang	26aa325f4f	[Core][VLM] Test registration for OOT multimodal models (#8717 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:38:25 -07:00
Cyrus Leung	0e36fd4909	[Misc] Move registry to its own file (#9064 )	2024-10-04 10:01:37 +00:00
Murali Andoorveedu	0f6d7a9a34	[Models] Add remaining model PP support (#7168 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:56:58 +08:00
代君	3dbb215b38	[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405 )	2024-10-04 10:36:39 +08:00
Nick Hill	18c2e30c57	[Doc] Update Granite model docs (#9025 )	2024-10-03 02:42:24 +00:00
Sergey Shlyapnikov	f58d4fccc9	[OpenVINO] Enable GPU support for OpenVINO vLLM backend (#8192 )	2024-10-02 17:50:01 -04:00
Cyrus Leung	4f341bd4bf	[Doc] Update list of supported models (#8987 )	2024-10-02 00:35:39 +08:00
whyiug	e01ab595d8	[Model] support input embeddings for qwen2vl (#8856 )	2024-09-30 03:16:10 +00:00
youkaichao	cc276443b5	[doc] organize installation doc and expose per-commit docker (#8931 )	2024-09-28 17:48:41 -07:00
youkaichao	d86f6b2afb	[misc] fix wheel name (#8919 )	2024-09-27 22:10:44 -07:00
Cyrus Leung	3b00b9c26c	[Core] rename`PromptInputs` and `inputs` (#8876 )	2024-09-26 20:35:15 -07:00
Maximilien de Bayser	344cd2b6f4	[Feature] Add support for Llama 3.1 and 3.2 tool use (#8343 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-09-26 17:01:42 -07:00
youkaichao	70de39f6b4	[misc][installation] build from source without compilation (#8818 )	2024-09-26 13:19:04 -07:00
Roger Wang	4bb98f2190	[Misc] Update config loading for Qwen2-VL and remove Granite (#8837 )	2024-09-26 07:45:30 -07:00
Roger Wang	e2c6e0a829	[Doc] Update doc for Transformers 4.45 (#8817 )	2024-09-25 13:29:48 -07:00
Chen Zhang	770ec6024f	[Model] Add support for the multi-modal Llama 3.2 model (#8811 ) Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-25 13:29:32 -07:00
Simon Mo	4f1ba0844b	Revert "rename PromptInputs and inputs with backward compatibility (#8760 ) (#8810 )	2024-09-25 10:36:26 -07:00
Cyrus Leung	28e1299e60	rename PromptInputs and inputs with backward compatibility (#8760 )	2024-09-25 09:36:47 -07:00
Hongxia Yang	1c046447a6	[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777 )	2024-09-25 22:26:37 +08:00
Jee Jee Li	13f9f7a3d0	[[Misc]Upgrade bitsandbytes to the latest version 0.44.0 (#8768 )	2024-09-24 17:08:55 -07:00
Simon Mo	3185fb0cca	Revert "[Core] Rename `PromptInputs` to `PromptType`, and `inputs` to `prompt`" (#8750 )	2024-09-24 05:45:20 +00:00
Hongxia Yang	530821d00c	[Hardware][AMD] ROCm6.2 upgrade (#8674 )	2024-09-23 18:52:39 -07:00
Daniele	ee5f34b1c2	[CI/Build] use setuptools-scm to set __version__ (#4738 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-23 09:44:26 -07:00
Yan Ma	d23679eb99	[Bugfix] fix docker build for xpu (#8652 )	2024-09-22 22:54:18 -07:00
youkaichao	d4a2ac8302	[build] enable existing pytorch (for GH200, aarch64, nightly) (#8713 )	2024-09-22 12:47:54 -07:00
litianjian	5b59532760	[Model][VLM] Add LLaVA-Onevision model support (#8486 ) Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-22 10:51:44 -07:00
Andy Dai	4dfdf43196	[Doc] Fix typo in AMD installation guide (#8689 )	2024-09-21 00:24:12 -07:00
Cyrus Leung	0057894ef7	[Core] Rename `PromptInputs` and `inputs`(#8673 )	2024-09-20 19:00:54 -07:00
omrishiv	7c8566aa4f	[Doc] neuron documentation update (#8671 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-09-20 15:04:37 -07:00
Niklas Muennighoff	3b63de9353	[Model] Add OLMoE (#7922 )	2024-09-20 09:31:41 -07:00
Jiaxin Shan	260d40b5ea	[Core] Support Lora lineage and base model metadata management (#6315 )	2024-09-20 06:20:56 +00:00
Isotr0py	ea4647b7d7	[Doc] Add documentation for GGUF quantization (#8618 )	2024-09-19 13:15:55 -06:00
Geun, Lim	e18749ff09	[Model] Support Solar Model (#8386 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-18 11:04:00 -06:00
Alexander Matveev	7c7714d856	[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-18 13:56:58 +00:00
youkaichao	fa0c114fad	[doc] improve installation doc (#8550 ) Co-authored-by: Andy Dai <76841985+Imss27@users.noreply.github.com>	2024-09-17 16:24:06 -07:00
youkaichao	2759a43a26	[doc] update doc on testing and debugging (#8514 )	2024-09-16 12:10:23 -07:00
ywfang	8a0cf1ddc3	[Model] support minicpm3 (#8297 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-14 14:50:26 +00:00
Isotr0py	f57092c00b	[Doc] Add oneDNN installation to CPU backend documentation (#8467 )	2024-09-13 18:06:30 +00:00
Cyrus Leung	a84e598e21	[CI/Build] Reorganize models tests (#7820 )	2024-09-13 10:20:06 -07:00
youkaichao	cab69a15e4	[doc] recommend pip instead of conda (#8446 )	2024-09-12 23:52:41 -07:00
Alex Brooks	c6202daeed	[Model] Support multiple images for qwen-vl (#8247 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-12 10:10:54 -07:00
Patrick von Platen	d394787e52	Pixtral (#8377 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-11 14:41:55 -07:00
Yang Fan	3b7fea770f	[Model][VLM] Add Qwen2-VL model support (#7905 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-11 09:31:19 -07:00
Yangshen⚡Deng	6a512a00df	[model] Support for Llava-Next-Video model (#7559 ) Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-10 22:21:36 -07:00
Simon Mo	a1d874224d	Add NVIDIA Meetup slides, announce AMD meetup, and add contact info (#8319 )	2024-09-09 23:21:00 -07:00
Isotr0py	e807125936	[Model][VLM] Support multi-images inputs for InternVL2 models (#8201 )	2024-09-07 16:38:23 +08:00
Cyrus Leung	2f707fcb35	[Model] Multi-input support for LLaVA (#8238 )	2024-09-07 02:57:24 +00:00
William Lin	12dd715807	[misc] [doc] [frontend] LLM torch profiler support (#7943 )	2024-09-06 17:48:48 -07:00
Dipika Sikka	23f322297f	[Misc] Remove `SqueezeLLM` (#8220 )	2024-09-06 16:29:03 -06:00
Jiaxin Shan	db3bf7c991	[Core] Support load and unload LoRA in api server (#6566 ) Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-09-05 18:10:33 -07:00
sroy745	2febcf2777	[Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (#7962 )	2024-09-05 16:25:29 -04:00
Alex Brooks	9da25a88aa	[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-05 12:48:10 +00:00
Cyrus Leung	288a938872	[Doc] Indicate more information about supported modalities (#8181 )	2024-09-05 10:51:53 +00:00
Kyle Mistele	e02ce498be	[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649 ) Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com> Co-authored-by: Kyle Mistele <kyle@constellate.ai>	2024-09-04 13:18:13 -07:00
Woosuk Kwon	61f4a93d14	[TPU][Bugfix] Use XLA rank for persistent cache path (#8137 )	2024-09-03 18:35:33 -07:00
Wenxiang	1248e8506a	[Model] Adding support for MSFT Phi-3.5-MoE (#7729 ) Co-authored-by: Your Name <you@example.com> Co-authored-by: Zeqi Lin <zelin@microsoft.com> Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>	2024-08-30 13:42:57 -06:00
Kaunil Dhruv	058344f89a	[Frontend]-config-cli-args (#7737 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>	2024-08-30 08:21:02 -07:00
Yohan Na	dc13e99348	[MODEL] add Exaone model support (#7819 )	2024-08-29 23:34:20 -07:00
Stas Bekman	8c56e57def	[Doc] fix 404 link (#7966 )	2024-08-28 13:54:23 -07:00
Woosuk Kwon	eeffde1ac0	[TPU] Upgrade PyTorch XLA nightly (#7967 )	2024-08-28 13:10:21 -07:00
Stas Bekman	98c12cffe5	[Doc] fix the autoAWQ example (#7937 )	2024-08-28 12:12:32 +00:00
Peter Salas	fab5f53e2d	[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt (#7902 )	2024-08-28 01:53:56 +00:00
Patrick von Platen	6fc4e6e07a	[Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739 )	2024-08-27 12:40:02 +00:00
Peter Salas	57792ed469	[Doc] Fix incorrect docs from #7615 (#7788 )	2024-08-22 10:02:06 -07:00
zifeitong	df1a21131d	[Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710 )	2024-08-22 09:36:24 +08:00
Peter Salas	1ca0d4f86b	[Model] Add UltravoxModel and UltravoxConfig (#7615 )	2024-08-21 22:49:39 +00:00
William Lin	dd53c4b023	[misc] Add Torch profiler support (#7451 ) Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-08-21 15:39:26 -07:00
Cyrus Leung	baaedfdb2d	[mypy] Enable following imports for entrypoints (#7248 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Fei <dfdfcai4@gmail.com>	2024-08-20 23:28:21 -07:00
Roger Wang	4506641212	[Doc] Section for Multimodal Language Models (#7719 )	2024-08-20 23:24:01 -07:00
Ilya Lavrenov	398521ad19	[OpenVINO] Updated documentation (#7687 )	2024-08-20 07:33:56 -06:00
youkaichao	e54ebc2f8f	[doc] fix doc build error caused by msgspec (#7659 )	2024-08-19 17:50:59 -07:00
Michael Goin	d4f0f17b02	[Doc] Update quantization supported hardware table (#7595 )	2024-08-16 13:59:27 -07:00
Michael Goin	b3f4e17935	[Doc] Add docs for llmcompressor INT8 and FP8 checkpoints (#7444 )	2024-08-16 13:59:16 -07:00
Kameshwara Pavan Kumar Mantha	22b39e11f2	llama_index serving integration documentation (#6973 ) Co-authored-by: pavanmantha <pavan.mantha@thevaslabs.io>	2024-08-14 15:38:37 -07:00
Cyrus Leung	3f674a49b5	[VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126 )	2024-08-14 17:55:42 +00:00
youkaichao	199adbb7cf	[doc] update test script to include cudagraph (#7501 )	2024-08-13 21:52:58 -07:00
Cyrus Leung	dd164d72f3	[Bugfix][Docs] Update list of mock imports (#7493 )	2024-08-13 20:37:30 -07:00
Woosuk Kwon	a08df8322e	[TPU] Support multi-host inference (#7457 )	2024-08-13 16:31:20 -07:00
Peter Salas	00c3d68e45	[Frontend][Core] Add plumbing to support audio language models (#7446 )	2024-08-13 17:39:33 +00:00
Woosuk Kwon	e20233d361	Revert "[Doc] Update supported_hardware.rst (#7276 )" (#7467 )	2024-08-13 01:37:08 -07:00
jon-chuang	a046f86397	[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208 ) Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-08-12 22:47:41 +00:00
Roger Wang	e6e42e4b17	[Core][VLM] Support image embeddings as input (#6613 )	2024-08-12 16:16:06 +08:00
Simon Mo	f020a6297e	[Docs] Update readme (#7316 )	2024-08-11 17:13:37 -07:00
tomeras91	02b1988b9f	[Doc] building vLLM with VLLM_TARGET_DEVICE=empty (#7403 )	2024-08-11 14:38:17 -07:00
Woosuk Kwon	90bab18f24	[TPU] Use mark_dynamic to reduce compilation time (#7340 )	2024-08-10 18:12:22 -07:00
Simon Mo	5923532e15	Add Skywork AI as Sponsor (#7314 )	2024-08-08 13:59:57 -07:00
Jee Jee Li	757ac70a64	[Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 (#7273 )	2024-08-08 14:02:41 +00:00
Michael Goin	6d94420246	[Doc] Update supported_hardware.rst (#7276 )	2024-08-07 14:21:50 -07:00
Stas Bekman	0e12cd67a8	[Doc] add online speculative decoding example (#7243 )	2024-08-07 09:58:02 -07:00
Ilya Lavrenov	80cbe10c59	[OpenVINO] migrate to latest dependencies versions (#7251 )	2024-08-07 09:49:10 -07:00
Roger Wang	2385c8f374	[Doc] Mock new dependencies for documentation (#7245 )	2024-08-07 06:43:03 +00:00
Thomas Parnell	789937af2e	[Doc] [SpecDecode] Update MLPSpeculator documentation (#7100 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-08-05 23:29:43 +00:00
Simon Mo	4db5176d97	bump version to v0.5.4 (#7139 )	2024-08-05 14:39:48 -07:00
Jee Jee Li	179a6a36f2	[Model]Refactor MiniCPMV (#7020 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-04 08:12:41 +00:00
Yihuan Bu	654bc5ca49	Support for guided decoding for offline LLM (#6878 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-04 03:12:09 +00:00
Michael Goin	b482b9a5b1	[CI/Build] Add support for Python 3.12 (#7035 )	2024-08-02 13:51:22 -07:00
Murali Andoorveedu	fc912e0886	[Models] Support Qwen model with PP (#6974 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-08-01 12:40:43 -07:00
Jee Jee Li	7ecee34321	[Kernel][RFC] Refactor the punica kernel based on Triton (#5036 )	2024-07-31 17:12:24 -07:00
Alphi	2f4e108f75	[Bugfix] Clean up MiniCPM-V (#6939 ) Co-authored-by: hezhihui <hzh7269@modelbest.cn> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-07-31 14:39:19 +00:00
Cyrus Leung	f230cc2ca6	[Bugfix] Fix broadcasting logic for `multi_modal_kwargs` (#6836 )	2024-07-31 10:38:45 +08:00
Ilya Lavrenov	5895b24677	[OpenVINO] Updated OpenVINO requirements and build docs (#6948 )	2024-07-30 11:33:01 -07:00
Isotr0py	7cbd9ec7a9	[Model] Initialize support for InternVL2 series models (#6514 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-29 10:16:30 +00:00
Woosuk Kwon	fad5576c58	[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856 )	2024-07-27 10:28:33 -07:00
Chenggang Wu	f954d0715c	[Docs] Add RunLLM chat widget (#6857 )	2024-07-27 09:24:46 -07:00
Cyrus Leung	1ad86acf17	[Model] Initial support for BLIP-2 (#5920 ) Co-authored-by: ywang96 <ywang@roblox.com>	2024-07-27 11:53:07 +00:00
Roger Wang	ecb33a28cb	[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860 )	2024-07-27 09:54:14 +00:00
Harry Mellor	c53041ae3b	[Doc] Add missing mock import to docs `conf.py` (#6834 )	2024-07-27 04:47:33 +00:00
omrishiv	3c3012398e	[Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron (#6844 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-07-26 20:20:16 -07:00
Woosuk Kwon	ced36cd89b	[ROCm] Upgrade PyTorch nightly version (#6845 )	2024-07-26 20:16:13 -07:00
Zhanghao Wu	150a1ffbfd	[Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283 )	2024-07-26 14:39:10 -07:00
Michael Goin	281977bd6e	[Doc] Add Nemotron to supported model docs (#6843 )	2024-07-26 17:32:44 -04:00
Li, Jiang	3bbb4936dc	[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125 )	2024-07-26 13:50:10 -07:00
youkaichao	85ad7e2d01	[doc][debugging] add known issues for hangs (#6816 )	2024-07-25 21:48:05 -07:00
Woosuk Kwon	b7215de2c5	[Docs] Publish 5th meetup slides (#6799 )	2024-07-25 16:47:55 -07:00
youkaichao	f3ff63c3f4	[doc][distributed] improve multinode serving doc (#6804 )	2024-07-25 15:38:32 -07:00
Kuntai Du	6a1e25b151	[Doc] Add documentations for nightly benchmarks (#6412 )	2024-07-25 11:57:16 -07:00
Alphi	9e169a4c61	[Model] Adding support for MiniCPM-V (#4087 )	2024-07-24 20:59:30 -07:00
Hongxia Yang	d88c458f44	[Doc][AMD][ROCm]Added tips to refer to mi300x tuning guide for mi300x users (#6754 )	2024-07-24 14:32:57 -07:00
Woosuk Kwon	ccc4a73257	[Docs][ROCm] Detailed instructions to build from source (#6680 )	2024-07-24 01:07:23 -07:00
dongmao zhang	87525fab92	[bitsandbytes]: support read bnb pre-quantized model (#5753 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-07-23 23:45:09 +00:00
youkaichao	71950af726	[doc][distributed] fix doc argument order (#6691 )	2024-07-23 08:55:33 -07:00
Woosuk Kwon	cb1362a889	[Docs] Announce llama3.1 support (#6688 )	2024-07-23 08:18:15 -07:00
Roger Wang	22fa2e35cb	[VLM][Model] Support image input for Chameleon (#6633 )	2024-07-22 23:50:48 -07:00
youkaichao	c051bfe4eb	[doc][distributed] doc for setting up multi-node environment (#6529 ) [doc][distributed] add more doc for setting up multi-node environment (#6529)	2024-07-22 21:22:09 -07:00
Cyrus Leung	739b61a348	[Frontend] Refactor prompt processing (#4028 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-22 10:13:53 -07:00
Matt Wong	06d6c5fe9f	[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543 )	2024-07-20 09:39:07 -07:00
Murali Andoorveedu	45ceb85a0c	[Docs] Update PP docs (#6598 )	2024-07-19 16:38:21 -07:00
Simon Mo	30efe41532	[Docs] Update docs for wheel location (#6580 )	2024-07-19 12:14:11 -07:00
milo157	a38524f338	[DOC] - Add docker image to Cerebrium Integration (#6510 )	2024-07-17 10:22:53 -07:00
Cyrus Leung	5bf35a91e4	[Doc][CI/Build] Update docs and tests to use `vllm serve` (#6431 )	2024-07-17 07:43:21 +00:00
Hongxia Yang	10383887e0	[ROCm] Cleanup Dockerfile and remove outdated patch (#6482 )	2024-07-16 22:47:02 -07:00
Jiaxin Shan	94162beb9f	[Doc] Fix the lora adapter path in server startup script (#6230 )	2024-07-16 10:11:04 -07:00
Woosuk Kwon	c467dff24f	[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457 )	2024-07-16 09:56:28 -07:00
youkaichao	9f4ccec761	[doc][misc] remind to cancel debugging environment variables (#6481 ) [doc][misc] remind users to cancel debugging environment variables after debugging (#6481)	2024-07-16 09:45:30 -07:00
Kevin H. Luu	d6f3b3d5c4	Pin sphinx-argparse version (#6453 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-16 01:26:11 +00:00
Woosuk Kwon	3dee97b05f	[Docs] Add Google Cloud to sponsor list (#6450 )	2024-07-15 11:58:10 -07:00
youkaichao	94b82e8c18	[doc][distributed] add suggestion for distributed inference (#6418 )	2024-07-15 09:45:51 -07:00
youkaichao	22e79ee8f3	[doc][misc] doc update (#6439 )	2024-07-14 23:33:25 -07:00
Robert Cohn	61e85dbad8	[Doc] xpu backend requires running setvars.sh (#6393 )	2024-07-14 17:10:11 -07:00
Ethan Xu	dbfe254eda	[Feature] vLLM CLI (#5090 ) Co-authored-by: simon-mo <simon.mo@hey.com>	2024-07-14 15:36:43 -07:00
Yuan Tang	6ef3bf912c	Remove unnecessary trailing period in spec_decode.rst (#6405 )	2024-07-14 07:58:09 +00:00
Isotr0py	540c0368b1	[Model] Initialize Fuyu-8B support (#3924 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-14 05:27:14 +00:00
Saliya Ekanayake	a27f87da34	[Doc] Fix Typo in Doc (#6392 ) Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>	2024-07-13 00:48:23 +00:00
Simon Mo	d719ba24c5	Build some nightly wheels by default (#6380 )	2024-07-12 13:56:59 -07:00
youkaichao	2d23b42d92	[doc] update pipeline parallel in readme (#6347 )	2024-07-11 11:38:40 -07:00
Jie Fu (傅杰)	439c84581a	[Doc] Update description of vLLM support for CPUs (#6003 )	2024-07-10 21:15:29 -07:00
Cyrus Leung	8a924d2248	[Doc] Guide for adding multi-modal plugins (#6205 )	2024-07-10 14:55:34 +08:00
Murali Andoorveedu	673dd4cae9	[Docs] Docs update for Pipeline Parallel (#6222 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-07-09 16:24:58 -07:00
Roger Wang	6206dcb29e	[Model] Add PaliGemma (#5189 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-07-07 09:25:50 +08:00
Cyrus Leung	9389380015	[Doc] Move guide for multimodal model and other improvements (#6168 )	2024-07-06 17:18:59 +08:00
Roger Wang	175c43eca4	[Doc] Reorganize Supported Models by Type (#6167 )	2024-07-06 05:59:36 +00:00
Simon Mo	79d406e918	[Docs] Fix readthedocs for tag build (#6158 )	2024-07-05 12:44:40 -07:00
Cyrus Leung	ae96ef8fbd	[VLM] Calculate maximum number of multi-modal tokens by model (#6121 )	2024-07-04 16:37:23 -07:00
youkaichao	27902d42be	[misc][doc] try to add warning for latest html (#5979 )	2024-07-04 09:57:09 -07:00
youkaichao	966fe72141	[doc][misc] bump up py version in installation doc (#6119 )	2024-07-03 15:52:04 -07:00
xwjiang2010	d9e98f42e4	[vlm] Remove vision language config. (#6089 ) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-03 22:14:16 +00:00
Michael Goin	47f0954af0	[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975 )	2024-07-03 17:38:00 +00:00
Roger Wang	f1c78138aa	[Doc] Fix Mock Import (#6094 )	2024-07-03 00:13:56 -07:00
Cyrus Leung	9831aec49f	[Core] Dynamic image size support for VLMs (#5276 ) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: ywang96 <ywang@roblox.com> Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-07-02 20:34:00 -07:00
Mor Zusman	9d6a8daa87	[Model] Jamba support (#4115 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: Erez Schwartz <erezs@ai21.com> Co-authored-by: Mor Zusman <morz@ai21.com> Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com> Co-authored-by: Tomer Asida <tomera@ai21.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-02 23:11:29 +00:00
Cyrus Leung	31354e563f	[Doc] Reinstate doc dependencies (#6061 )	2024-07-02 10:53:16 +00:00
xwjiang2010	98d6682cd1	[VLM] Remove `image_input_type` from VLM config (#5852 ) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-02 07:57:09 +00:00
Roger Wang	8e0817c262	[Bugfix][Doc] Fix Doc Formatting (#6048 )	2024-07-01 15:09:11 -07:00
ning.zhang	83bdcb6ac3	add FAQ doc under 'serving' (#5946 )	2024-07-01 14:11:36 -07:00
youkaichao	4050d646e5	[doc][misc] remove deprecated api server in doc (#6037 )	2024-07-01 12:52:43 -04:00
Ilya Lavrenov	57f09a419c	[Hardware][Intel] OpenVINO vLLM backend (#5379 )	2024-06-28 13:50:16 +00:00
Cyrus Leung	5cbe8d155c	[Core] Registry for processing model inputs (#5214 ) Co-authored-by: ywang96 <ywang@roblox.com>	2024-06-28 12:09:56 +00:00
Woosuk Kwon	79c92c7c8a	[Model] Add Gemma 2 (#5908 )	2024-06-27 13:33:56 -07:00
youkaichao	3fd02bda51	[doc][misc] add note for Kubernetes users (#5916 )	2024-06-27 10:07:07 -07:00
Cyrus Leung	96354d6a29	[Model] Add base class for LoRA-supported models (#5018 )	2024-06-27 16:03:04 +08:00
youkaichao	294104c3f9	[doc] update usage of env var to avoid conflict (#5873 )	2024-06-26 17:57:12 -04:00
Roger Wang	3aa7b6cf66	[Misc][Doc] Add Example of using OpenAI Server with VLM (#5832 )	2024-06-25 20:34:25 -07:00
Matt Wong	dd793d1de5	[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422 )	2024-06-25 15:56:15 -07:00
youkaichao	c18ebfdd71	[doc][distributed] add both gloo and nccl tests (#5834 )	2024-06-25 15:10:28 -04:00
Cyrus Leung	f23871e9ee	[Doc] Add notice about breaking changes to VLMs (#5818 )	2024-06-25 01:25:03 -07:00
Michael Goin	1744cc99ba	[Doc] Add Phi-3-medium to list of supported models (#5788 )	2024-06-24 10:48:55 -07:00
Michael Goin	e72dc6cb35	[Doc] Add "Suggest edit" button to doc pages (#5789 )	2024-06-24 10:26:17 -07:00
youkaichao	c246212952	[doc][faq] add warning to download models for every nodes (#5783 )	2024-06-24 15:37:42 +08:00
Woosuk Kwon	8c00f9c15d	[Docs][TPU] Add installation tip for TPU (#5761 )	2024-06-21 23:09:40 -07:00
Michael Goin	5b15bde539	[Doc] Documentation on supported hardware for quantization methods (#5745 )	2024-06-21 12:44:29 -04:00
Roger Wang	1b2eaac316	[Bugfix][Doc] FIx Duplicate Explicit Target Name Errors (#5703 )	2024-06-19 23:10:47 -07:00
Rafael Vasquez	e83db9e7e3	[Doc] Update docker references (#5614 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-06-19 15:01:45 -07:00
milo157	2bd231a7b7	[Doc] Added cerebrium as Integration option (#5553 )	2024-06-18 15:56:59 -07:00
Isotr0py	daef218b55	[Model] Initialize Phi-3-vision support (#4986 )	2024-06-17 19:34:33 -07:00
Kunshang Ji	728c4c8a06	[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814 ) Co-authored-by: Jiang Li <jiang1.li@intel.com> Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com> Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-06-17 11:01:25 -07:00
youkaichao	845a3f26f9	[Doc] add debugging tips for crash and multi-node debugging (#5581 )	2024-06-17 10:08:01 +08:00
Sanger Steel	6e2527a7cb	[Doc] Update documentation on Tensorizer (#5471 )	2024-06-14 11:27:57 -07:00
Simon Mo	cdab68dcdb	[Docs] Add ZhenFund as a Sponsor (#5548 )	2024-06-14 11:17:21 -07:00
Cyrus Leung	0ce7b952f8	[Doc] Update LLaVA docs (#5437 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-06-13 11:22:07 -07:00
Woosuk Kwon	a65634d3ae	[Docs] Add 4th meetup slides (#5509 )	2024-06-13 10:18:26 -07:00
Li, Jiang	80aa7e91fc	[Hardware][Intel] Optimize CPU backend and add more performance tips (#4971 ) Co-authored-by: Jianan Gu <jianan.gu@intel.com>	2024-06-13 09:33:14 -07:00
Cyrus Leung	b8d4dfff9c	[Doc] Update debug docs (#5438 )	2024-06-12 14:49:31 -07:00
Woosuk Kwon	1a8bfd92d5	[Hardware] Initial TPU integration (#5292 )	2024-06-12 11:53:03 -07:00
youkaichao	8f89d72090	[Doc] add common case for long waiting time (#5430 )	2024-06-11 11:12:13 -07:00
Nick Hill	99dac099ab	[Core][Doc] Default to multiprocessing for single-node distributed case (#5230 ) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-06-11 11:10:41 -07:00
Cade Daniel	89ec06c33b	[Docs] [Spec decode] Fix docs error in code example (#5427 )	2024-06-11 10:31:56 -07:00
Kuntai Du	9fde251bf0	[Doc] Add an automatic prefix caching section in vllm documentation (#5324 ) Co-authored-by: simon-mo <simon.mo@hey.com>	2024-06-11 10:24:59 -07:00
Cade Daniel	4c2ffb28ff	[Speculative decoding] Initial spec decode docs (#5400 )	2024-06-11 10:15:40 -07:00
SangBin Cho	246598a6b1	[CI] docfix (#5410 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: ywang96 <ywang@roblox.com>	2024-06-11 01:28:50 -07:00
Roger Wang	3c4cebf751	[Doc][Typo] Fixing Missing Comma (#5403 )	2024-06-11 00:20:28 -07:00
youkaichao	d8f31f2f8b	[Doc] add debugging tips (#5409 )	2024-06-10 23:21:43 -07:00
Michael Goin	77c87beb06	[Doc] Add documentation for FP8 W8A8 (#5388 )	2024-06-10 18:55:12 -06:00
Woosuk Kwon	cb77ad836f	[Docs] Alphabetically sort sponsors (#5386 )	2024-06-10 15:17:19 -05:00
Roger Wang	856c990041	[Docs] Add Docs on Limitations of VLM Support (#5383 )	2024-06-10 09:53:50 -07:00
Cyrus Leung	6b29d6fe70	[Model] Initial support for LLaVA-NeXT (#4199 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-06-10 12:47:15 +00:00
Roger Wang	7a9cb294ae	[Frontend] Add OpenAI Vision API Support (#5237 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-06-07 11:23:32 -07:00
Simon Mo	f270a39537	[Docs] Add Sequoia as sponsors (#5287 )	2024-06-05 18:02:56 +00:00
Jie Fu (傅杰)	87d5abef75	[Bugfix] Fix a bug caused by pip install setuptools>=49.4.0 for CPU backend (#5249 )	2024-06-04 09:57:51 -07:00
Breno Faria	f775a07e30	[FRONTEND] OpenAI `tools` support named functions (#5032 )	2024-06-03 18:25:29 -05:00
Cyrus Leung	7a64d24aad	[Core] Support image processor (#4197 )	2024-06-02 22:56:41 -07:00
Nick Hill	657579113f	[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171 )	2024-05-31 17:20:19 -07:00
Chansung Park	429d89720e	add doc about serving option on dstack (#3074 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-05-30 10:11:07 -07:00
Cyrus Leung	a9bcc7afb2	[Doc] Use intersphinx and update entrypoints docs (#5125 )	2024-05-30 09:59:23 -07:00
youkaichao	4fbcb0f27e	[Doc][Build] update after removing vllm-nccl (#5103 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-05-29 23:51:18 +00:00
Cyrus Leung	5ae5ed1e60	[Core] Consolidate prompt arguments to LLM engines (#4328 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-05-28 13:29:31 -07:00
Simon Mo	290f4ada2b	[Docs] Add Dropbox as sponsors (#5089 )	2024-05-28 10:29:09 -07:00
Eric Xihui Lin	8e192ff967	[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799 ) Co-authored-by: beagleski <yunanzhang@microsoft.com> Co-authored-by: bapatra <bapatra@microsoft.com> Co-authored-by: Barun Patra <codedecde@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-05-24 22:00:52 -07:00
youkaichao	6a50f4cafa	[Doc] add ccache guide in doc (#5012 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-05-23 23:21:54 +00:00
Simon Mo	e941f88584	[Docs] Add acknowledgment for sponsors (#4925 )	2024-05-21 00:17:25 -07:00
Isotr0py	f12c3b5b3d	[Model] Add Phi-2 LoRA support (#4886 )	2024-05-21 14:24:17 +09:00
Kante Yin	8e7fb5d43a	Support to serve vLLM on Kubernetes with LWS (#4829 ) Signed-off-by: kerthcet <kerthcet@gmail.com>	2024-05-16 16:37:29 -07:00
Cyrus Leung	dc72402b57	[Bugfix][Doc] Fix CI failure in docs (#4804 ) This PR fixes the CI failure introduced by #4798. The failure originates from having duplicate target names in reST, and is fixed by changing the ref targets to anonymous ones. For more information, see this discussion. I have also changed the format of the links to be more distinct from each other.	2024-05-15 01:57:08 +09:00
Zhuohan Li	c579b750a0	[Doc] Add meetups to the doc (#4798 )	2024-05-13 18:48:00 -07:00
Cyrus Leung	4bfa7e7f75	[Doc] Add API reference for offline inference (#4710 )	2024-05-13 17:47:42 -07:00
Zhuohan Li	ac1fbf7fd2	[Doc] Shorten README by removing supported model list (#4796 )	2024-05-13 16:23:54 -07:00
SangBin Cho	e7c46b9527	[Scheduler] Warning upon preemption and Swapping (#4647 ) Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>	2024-05-13 23:50:44 +09:00
Allen.Dou	706588a77d	[Bugfix] Fix CLI arguments in OpenAI server docs (#4729 )	2024-05-11 00:00:56 +09:00
Simon Mo	51d4094fda	chunked-prefill-doc-syntax (#4603 ) Fix the docs: https://docs.vllm.ai/en/latest/models/performance.html Co-authored-by: sang <rkooo567@gmail.com>	2024-05-10 14:13:23 +09:00
Cyrus Leung	a3c124570a	[Bugfix] Fix CLI arguments in OpenAI server docs (#4709 )	2024-05-09 09:53:14 -07:00
SangBin Cho	36fb68f947	[Doc] Chunked Prefill Documentation (#4580 )	2024-05-04 00:18:00 -07:00
youkaichao	2d7bce9cd5	[Doc] add env vars to the doc (#4572 )	2024-05-03 05:13:49 +00:00
Frαnçois	e491c7e053	[Doc] update(example model): for OpenAI compatible serving (#4503 )	2024-05-01 10:14:16 -07:00
fuchen.ljl	ee37328da0	Unable to find Punica extension issue during source code installation (#4494 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-05-01 00:42:09 +00:00
Prashant Gupta	b31a1fb63c	[Doc] add visualization for multi-stage dockerfile (#4456 ) Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-04-30 17:41:59 +00:00
SangBin Cho	a88081bf76	[CI] Disable non-lazy string operation on logging (#4326 ) Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>	2024-04-26 00:16:58 -07:00
Hongxia Yang	cf29b7eda4	[ROCm][Hardware][AMD][Doc] Documentation update for ROCm (#4376 ) Co-authored-by: WoosukKwon <woosuk.kwon@berkeley.edu>	2024-04-25 18:12:25 -07:00
Isotr0py	fbf152d976	[Bugfix][Model] Refactor OLMo model to support new HF format in transformers 4.40.0 (#4324 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-04-25 09:35:56 -07:00
Caio Mendes	96e90fdeb3	[Model] Adds Phi-3 support (#4298 )	2024-04-25 03:06:57 +00:00
youkaichao	2768884ac4	[Doc] Add note for docker user (#4340 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-24 21:09:44 +00:00
Harry Mellor	34128a697e	Fix `autodoc` directives (#4272 ) Co-authored-by: Harry Mellor <hmellor@oxts.com>	2024-04-23 01:53:01 +00:00
Zhanghao Wu	ceaf4ed003	[Doc] Update the SkyPilot doc with serving and Llama-3 (#4276 )	2024-04-22 15:34:31 -07:00

1092 Commits