vLLM/vllm - vllm - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Cyrus Leung	32b4c63f02	[Doc] Convert list tables to MyST (#11594 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-29 15:56:22 +08:00
youkaichao	328841d002	[bugfix] interleaving sliding window for cohere2 model (#11583 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-28 16:55:42 +00:00
Cyrus Leung	d427e5cfda	[Doc] Minor documentation fixes (#11580 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-28 21:53:59 +08:00
Isotr0py	d34be24bb1	[Model] Support InternLM2 Reward models (#11571 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-28 06:14:10 +00:00
Cyrus Leung	101418096f	[VLM] Support caching in merged multi-modal processor (#11396 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-27 17:22:48 +00:00
Chen1022	5ce4627a7e	[Doc] Add xgrammar in doc (#11549 ) Signed-off-by: ccjincong <chenjincong11@gmail.com>	2024-12-27 13:05:10 +00:00
AlexHe99	d003f3ea39	Update deploying_with_k8s.md with AMD ROCm GPU example (#11465 ) Signed-off-by: Alex He <alehe@amd.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-27 10:00:04 +00:00
Robert Shaw	0c0c2015c5	Update openai_compatible_server.md (#11536 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-12-26 16:26:18 -08:00
Simon Mo	82d24f7aac	[Docs] Document Deepseek V3 support (#11535 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2024-12-26 16:21:56 -08:00
Isotr0py	b85a977822	[Doc] Add video example to openai client for multimodal (#11521 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-26 17:31:29 +00:00
Roger Wang	7492a36207	[Doc] Add `QVQ` and `QwQ` to the list of supported models (#11509 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-12-26 09:44:32 +00:00
Cyrus Leung	6ad909fdda	[Doc] Improve GitHub links (#11491 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-25 14:49:26 -08:00
Cyrus Leung	3f3e92e1f2	[Model] Automatic conversion of classification and reward models (#11469 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 18:22:22 +00:00
Cyrus Leung	9edca6bf8f	[Frontend] Online Pooling API (#11457 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 17:54:30 +08:00
Rafael Vasquez	32aa2059ad	[Docs] Convert rST to MyST (Markdown) (#11145 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-12-23 22:35:38 +00:00
Yuan Tang	2e726680b3	[Bugfix] torch nightly version in ROCm installation guide (#11423 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-23 17:20:22 +00:00
youkaichao	5d2248d81a	[doc] explain nccl requirements for rlhf (#11381 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-20 13:00:56 -08:00
omer-dayan	995f56236b	[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192 ) Signed-off-by: OmerD <omer@run.ai>	2024-12-20 16:46:24 +00:00
youkaichao	1ecc645b8f	[doc] backward compatibility for 0.6.4 (#11359 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-19 21:33:53 -08:00
youkaichao	7801f56ed7	[ci][gh200] dockerfile clean up (#11351 ) Signed-off-by: drikster80 <ed.sealing@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: drikster80 <ed.sealing@gmail.com> Co-authored-by: cenzhiyao <2523403608@qq.com>	2024-12-19 18:13:06 -08:00
Yehoshua Cohen	6c7f881541	[Model] Add JambaForSequenceClassification model (#10860 ) Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-19 22:48:06 +08:00
Travis Johnson	17ca964273	[Model] IBM Granite 3.1 (#11307 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-12-19 11:27:24 +08:00
kYLe	66d4b16724	[Frontend] Add OpenAI API support for input_audio (#11027 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-16 22:09:58 -08:00
youkaichao	35bae114a8	fix gh200 tests on main (#11246 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-16 17:22:38 -08:00
bk-TurbaAI	35ffa682b1	[Docs] hint to enable use of GPU performance counters in profiling tools for multi-node distributed serving (#11235 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-12-16 22:20:39 +00:00
Jani Monoses	bddbbcb132	[Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203 )	2024-12-16 09:56:19 +00:00
cennn	b3b1526f03	WIP: [CI/Build] simplify Dockerfile build for ARM64 / GH200 (#11212 ) Signed-off-by: drikster80 <ed.sealing@gmail.com> Co-authored-by: drikster80 <ed.sealing@gmail.com>	2024-12-16 09:20:49 +00:00
AlexHe99	da6f409246	Update deploying_with_k8s.rst (#10922 )	2024-12-15 16:33:58 -08:00
Kuntai Du	38e599d6a8	[Doc] add documentation for disaggregated prefilling (#11197 ) Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2024-12-15 13:31:16 -06:00
Jee Jee Li	15859f2357	[[Misc]Upgrade bitsandbytes to the latest version 0.45.0 (#11201 )	2024-12-15 03:03:06 +00:00
Russell Bryant	4863e5fba5	[Core] V1: Use multiprocessing by default (#11074 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-13 16:27:32 -08:00
Cyrus Leung	0920ab9131	[Doc] Reorganize online pooling APIs (#11172 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 00:22:22 +08:00
Cyrus Leung	eeec9e3390	[Frontend] Separate pooling APIs in offline inference (#11129 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-13 10:40:07 +00:00
Jani Monoses	7cd7409142	PaliGemma 2 support (#11142 )	2024-12-13 07:40:07 +00:00
Ramon Ziai	d4d5291cc2	fix(docs): typo in helm install instructions (#11141 ) Signed-off-by: Ramon Ziai <ramon.ziai@bettermarks.com>	2024-12-12 17:36:32 +00:00
Pooya Davoodi	1da8f0e1dd	[Model] Add support for embedding model GritLM (#10816 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2024-12-12 06:39:16 +00:00
Yuan Tang	24a36d6d5f	Update link to LlamaStack remote vLLM guide in serving_with_llamastack.rst (#11112 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-12 02:39:21 +00:00
bingps	fd22220687	[Doc] Installed version of llmcompressor for int8/fp8 quantization (#11103 ) Signed-off-by: Guangda Liu <bingps@users.noreply.github.com> Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>	2024-12-11 15:43:24 +00:00
Cyrus Leung	cad5c0a6ed	[Doc] Update docs to refer to pooling models (#11093 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 13:36:27 +00:00
Cyrus Leung	8f10d5e393	[Misc] Split up pooling tasks (#10820 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 01:28:00 -08:00
Mor Zusman	ffa48c9146	[Model] PP support for Mamba-like models (#10992 ) Signed-off-by: mzusman <mor.zusmann@gmail.com>	2024-12-10 21:53:37 -05:00
Maxime Fournioux	fe2e10c71b	Add example of helm chart for vllm deployment on k8s (#9199 ) Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>	2024-12-10 09:19:27 +00:00
Michael Goin	6d525288c1	[Docs] Add dedicated tool calling page to docs (#10554 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-09 20:15:34 -05:00
Roger Wang	af7c4a92e6	[Doc][V1] Add V1 support column for multimodal models (#10998 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-08 22:29:16 -08:00
Cyrus Leung	c889d5888b	[Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 17:20:49 +00:00
Cyrus Leung	39e227c7ae	[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 17:10:05 +00:00
Cyrus Leung	1c768fe537	[Doc] Explicitly state that InternVL 2.5 is supported (#10978 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 16:58:02 +00:00
Sam Stoelinga	7406274041	[Doc] add KubeAI to serving integrations (#10837 ) Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>	2024-12-06 17:03:56 +00:00
Cyrus Leung	aa39a8e175	[Doc] Create a new "Usage" section (#10827 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-05 11:19:35 +08:00
Daniele	e4c34c23de	[CI/Build] improve python-only dev setup (#9621 ) Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-12-04 21:48:13 +00:00
Kevin H. Luu	c92acb9693	[ci/build] Update vLLM postmerge ECR repo (#10887 )	2024-12-04 09:01:20 +00:00
Aaron Pham	9323a3153b	[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-12-03 15:17:00 +08:00
Russell Bryant	ef51831ee8	[Doc] Add github links for source code references (#10672 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-03 06:46:07 +00:00
youkaichao	169a0ff911	[doc] add warning about comparing hf and vllm outputs (#10805 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-01 00:41:38 -08:00
Cyrus Leung	133707123e	[Model] Replace embedding models with pooling adapter (#10769 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-01 08:02:54 +08:00
wangxiyuan	7e4bbda573	[doc] format fix (#10789 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2024-11-30 11:38:40 +00:00
Isotr0py	c83919c7a6	[Model] Add Internlm2 LoRA support (#5064 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-11-28 17:29:04 +00:00
sixgod	5fc5ce0fe4	[Model] Added GLM-4 series hf format model support vllm==0.6.4 (#10561 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-11-28 14:53:31 +00:00
罗泽轩	278be671a3	[Doc] Update model in arch_overview.rst to match comment (#10701 ) Signed-off-by: spacewander <spacewanderlzx@gmail.com>	2024-11-27 23:58:39 -08:00
shunxing12345	1209261e93	[Model] Support telechat2 (#10311 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: xiangw2 <xiangw2@chinatelecom.cn> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-11-27 11:32:35 +00:00
Murali Andoorveedu	db66e018ea	[Bugfix] Fix for Spec model TP + Chunked Prefill (#10232 ) Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com> Signed-off-by: Sourashis Roy <sroy@roblox.com> Co-authored-by: Sourashis Roy <sroy@roblox.com>	2024-11-26 09:11:16 -08:00
Sage Moore	9a88f89799	custom allreduce + torch.compile (#10121 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-25 22:00:16 -08:00
Sanket Kale	a6760f6456	[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228 ) Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-11-25 18:32:39 -08:00
Shane A	9db713a1dc	[Model] Add OLMo November 2024 model (#10503 )	2024-11-25 17:26:40 -05:00
Cyrus Leung	1b583cfefa	[Doc] Fix typos in docs (#10636 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 10:15:45 -08:00
zhou fan	b1d920531f	[Model]: Add support for Aria model (#10514 ) Signed-off-by: xffxff <1247714429@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-11-25 18:10:55 +00:00
fzyzcjy	2b0879bfc2	Super tiny little typo fix (#10633 )	2024-11-25 13:08:30 +00:00
Cyrus Leung	ed46f14321	[Model] Support `is_causal` HF config field for Qwen2 model (#10621 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 09:51:20 +00:00
Cyrus Leung	a30a605d21	[Doc] Add encoder-based models to Supported Models page (#10616 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 06:34:07 +00:00
Maximilien de Bayser	214efc2c3c	Support Cross encoder models (#10400 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-24 18:56:20 -08:00
youkaichao	e4fbb14414	[doc] update the code to add models (#10603 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-11-24 11:21:40 -08:00
Michael Goin	9afa014552	Add small example to metrics.rst (#10550 )	2024-11-21 23:43:43 +00:00
Li, Jiang	63f1fde277	[Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-20 10:57:39 +00:00
wchen61	7629a9c6e5	[CI/Build] Support compilation with local cutlass path (#10423 ) (#10424 )	2024-11-19 21:35:50 -08:00
Cyrus Leung	b4be5a8adb	[Bugfix] Enforce no chunked prefill for embedding models (#10470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-20 05:12:51 +00:00
Russell Bryant	5390d6664f	[Doc] Add the start of an arch overview page (#10368 )	2024-11-19 09:52:11 +00:00
Michael Goin	74f8c2cf5f	Add openai.beta.chat.completions.parse example to structured_outputs.rst (#10433 )	2024-11-19 04:37:46 +00:00
Yan Ma	6b2d25efc7	[Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107 ) Signed-off-by: yan ma <yan.ma@intel.com>	2024-11-18 11:18:05 -07:00
ismael-dm	31894a2155	[Doc] Add documentation for Structured Outputs (#9943 ) Signed-off-by: ismael-dm <ismaeldm99@gmail.com>	2024-11-18 09:52:12 -08:00
B-201	4186be8111	[Doc] Update doc for LoRA support in GLM-4V (#10425 ) Signed-off-by: B-201 <Joy25810@foxmail.com>	2024-11-18 15:08:30 +00:00
youkaichao	755b85359b	[doc] add doc for the plugin system (#10372 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-15 21:46:27 -08:00
Cyrus Leung	32e46e000f	[Frontend] Automatic detection of chat content format from AST (#9919 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-16 13:35:40 +08:00
Michael Green	4f168f69a3	[Docs] Misc updates to TPU installation instructions (#10165 )	2024-11-15 13:26:17 -08:00
Russell Bryant	3e8d14d8a1	[Doc] Move PR template content to docs (#10159 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-15 13:20:20 -08:00
Simon Mo	c76ac49d26	[Docs] Add Nebius as sponsors (#10371 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2024-11-15 12:47:40 -08:00
Cyrus Leung	2ac6d0e75b	[Misc] Consolidate pooler config overrides (#10351 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-15 06:59:00 +00:00
Cyrus Leung	b40cf6402e	[Model] Support Qwen2 embeddings and use tags to select model tests (#10184 )	2024-11-14 20:23:09 -08:00
Woosuk Kwon	1dbae0329c	[Docs] Publish meetup slides (#10331 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-14 16:19:38 +00:00
Mike Depinet	f67ce05d0b	[Frontend] Pythonic tool parser (#9859 ) Signed-off-by: Mike Depinet <mike@fixie.ai>	2024-11-14 04:14:34 +00:00
youkaichao	504ac53d18	[misc] error early for old-style class (#10304 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-13 18:55:39 -08:00
Cyrus Leung	0b8bb86bf1	[1/N] Initial prototype for multi-modal processor (#10044 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-13 12:39:03 +00:00
B-201	d909acf9fe	[Model][LoRA]LoRA support added for idefics3 (#10281 ) Signed-off-by: B-201 <Joy25810@foxmail.com>	2024-11-13 17:25:59 +08:00
Austin Veselka	1b886aa104	[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 (#9944 ) Signed-off-by: FurtherAI <austin.veselka@lighton.ai> Co-authored-by: FurtherAI <austin.veselka@lighton.ai>	2024-11-13 08:28:13 +00:00
电脑星人	3945c82346	[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions (#10221 ) Signed-off-by: imkero <kerorek@outlook.com>	2024-11-13 07:07:22 +00:00
youkaichao	377b74fe87	Revert "[ci][build] limit cmake version" (#10271 )	2024-11-12 15:06:48 -08:00
youkaichao	18081451f9	[doc] improve debugging doc (#10270 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-12 14:43:52 -08:00
youkaichao	96ae0eaeb2	[doc] fix location of runllm widget (#10266 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-12 14:34:39 -08:00
Guillaume Calmettes	36c513a076	[BugFix] Do not raise a `ValueError` when `tool_choice` is set to the supported `none` option and `tools` are not defined. (#10000 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2024-11-12 11:13:46 +00:00
youkaichao	3a28f18b0b	[doc] explain the class hierarchy in vLLM (#10240 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-11 22:56:44 -08:00
youkaichao	d1c6799b88	[doc] update debugging guide (#10236 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-11 15:21:12 -08:00

1 2 3 4 5 ...

612 Commits