vLLM/vllm - vllm

Commit Graph

Author	SHA1	Message	Date
Nicolò Lucchesi	fa82b93853	[Frontend][Docs] Transcription API streaming (#13301 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-06 10:39:35 +00:00
lkchen	5d802522a7	[V1][VLM][Pixtral-HF] Support Pixtral-HF on V1 (#14275 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-03-06 08:58:41 +00:00
kYLe	1769928079	[Model] Update Paligemma multimodal processing with PromptUpdate (#14015 ) Signed-off-by: Kyle Huang <kylhuang@nvidia.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-03-06 08:31:38 +00:00
Ce Gao	f5f7f00cd9	[Bugfix][Structured Output] Support outlines engine with reasoning outputs for DeepSeek R1 (#14114 )	2025-03-06 03:49:20 +00:00
Rui Qiao	abcc61e0af	[misc] Mention `ray list nodes` command to troubleshoot ray issues (#14318 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-03-06 02:00:36 +00:00
Simon Mo	ca2ca8de57	[Docs] Add Meta Slides (#14297 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-03-05 08:30:23 -08:00
DaividFrank	8f808cf86e	prefix_caching.md: Fixed typo (#14293 ) Signed-off-by: Daivid Savernin-Frenk <daivid.frank@TurboNext.ai>	2025-03-05 15:43:13 +00:00
Cyrus Leung	7f89a594dd	[Doc] [3/N] Refer code examples for common cases in dev multimodal processor (#14278 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-05 12:29:50 +00:00
Iacopo Poli	961644e6a8	[Doc] Update nginx guide: remove privileged from vllm container run and add target GPU ID (#14217 ) Signed-off-by: Iacopo Poli <iacopo@lighton.ai>	2025-03-05 11:44:10 +00:00
Congcong Chen	0a995d5434	[Model] New model support for Phi-4-multimodal-instruct (#14119 )	2025-03-04 20:57:01 -08:00
Mark McLoughlin	c2bd2196fc	[v1][Metrics] Add design doc (#12745 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-03-04 20:36:55 +00:00
Michael Goin	550c7ba3dc	[Docs] Update Dockerfile dependency image (#14215 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-04 20:22:11 +00:00
youkaichao	3610fb4930	[doc] add "Failed to infer device type" to faq (#14200 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-04 20:47:06 +08:00
Travis Johnson	c060b71408	[Model] Add support for GraniteMoeShared models (#13313 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-04 08:04:52 +08:00
Qubitium-ModelCloud	cd1d3c3df8	[Docs] Add GPTQModel (#14056 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-03-03 21:59:09 +00:00
Harry Mellor	98175b2816	Improve the docs for `TransformersModel` (#14147 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-03 17:03:05 +00:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
Ce Gao	bf33700ecd	[v0][structured output] Support reasoning output (#12955 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-02 14:49:42 -05:00
qux-bbb	bc6ccb9878	[Doc] Source building add clone step (#14086 ) Signed-off-by: qux-bbb <1147635419@qq.com>	2025-03-02 10:59:50 +00:00
Jee Jee Li	cc5e8f6db8	[Model] Add LoRA support for TransformersModel (#13770 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-02 09:17:34 +08:00
Kuntai Du	8994dabc22	[Documentation] Add more deployment guide for Kubernetes deployment (#13841 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2025-03-01 06:44:24 +00:00
Brayden Zhong	f64ffa8c25	[Docs] Add `pipeline_parallel_size` to optimization docs (#14059 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-03-01 05:43:54 +00:00
Brayden Zhong	2aed2c9fa7	[Doc] Fix ROCm documentation (#14041 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-02-28 16:42:07 +00:00
Harry Mellor	f58f8b5c96	Update AutoAWQ docs (#14042 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-28 15:20:29 +00:00
Cyrus Leung	1088f06242	[Doc] Move multimodal Embedding API example to Online Serving page (#14017 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-28 07:12:04 +00:00
Cyrus Leung	f1579b229d	[VLM] Generalized prompt updates for multi-modal processor (#13964 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-27 17:44:25 +00:00
王博伟	512d77d582	Update quickstart.md (#13958 )	2025-02-27 16:05:11 +00:00
Szymon Ożóg	7f0be2aa24	[Model] Deepseek GGUF support (#13167 )	2025-02-27 02:08:35 -08:00
Isotr0py	edf309ebbe	[VLM] Support multimodal inputs for Florence-2 models (#13320 )	2025-02-27 02:06:41 -08:00
Michael Goin	ca377cf1b9	Use CUDA 12.4 as default for release and nightly wheels (#12098 )	2025-02-26 19:06:37 -08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
Michael Goin	07c4353057	[Model] Support Grok1 (#13795 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-26 01:07:12 +00:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555 )	2025-02-24 17:13:52 -08:00
Nicolò Lucchesi	444b0f0f62	[Misc][Docs] Raise error when flashinfer is not installed and `VLLM_ATTENTION_BACKEND` is set (#12513 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-02-24 10:43:21 -05:00
Cyrus Leung	8354f6640c	[Doc] Dockerfile instructions for optional dependencies and dev transformers (#13699 )	2025-02-22 06:04:31 -08:00
Mark McLoughlin	2cb8c1540e	[Metrics] Add `--show-hidden-metrics-for-version` CLI arg (#13295 )	2025-02-22 00:20:45 -08:00
Yuan Tang	8c0dd3d4df	docs: Add a note on full CI run in contributing guide (#13646 )	2025-02-21 21:53:59 -08:00
Gabriel Marinho	1c3c975766	[FEATURE] Enables /score endpoint for embedding models (#12846 )	2025-02-20 22:09:47 -08:00
Kante Yin	44c33f01f3	Add llmaz as another integration (#13643 ) Signed-off-by: kerthcet <kerthcet@gmail.com>	2025-02-21 03:52:40 +00:00
Joe Runde	bfbc0b32c6	[Frontend] Add backend-specific options for guided decoding (#13505 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-20 15:07:58 -05:00
Harry Mellor	992e5c3d34	Merge similar examples in `offline_inference` into single `basic` example (#12737 )	2025-02-20 04:53:51 -08:00
Jee Jee Li	512368e34a	[Misc] Qwen2.5 VL support LoRA (#13261 )	2025-02-19 18:37:55 -08:00
Wilson Wu	01c184b8f3	Fix copyright year to auto get current year (#13561 )	2025-02-19 16:55:34 +00:00
youkaichao	ad5a35c21b	[doc] clarify multi-node serving doc (#13558 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-19 22:32:17 +08:00
youkaichao	52ce14d31f	[doc] clarify profiling is only for developers (#13554 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-19 20:55:58 +08:00
Roger Wang	fd84857f64	[Doc] Add clarification note regarding paligemma (#13511 )	2025-02-18 22:24:03 -08:00
Harry Mellor	00b69c2d27	[Misc] Remove dangling references to `--use-v2-block-manager` (#13492 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-19 03:37:26 +00:00
youkaichao	7b203b7694	[misc] fix debugging code (#13487 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-18 09:37:11 -08:00
Harry Mellor	2358ca527b	[Doc]: Improve feature tables (#13224 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-18 18:52:39 +08:00
Isotr0py	67ef8f666a	[Model] Enable quantization support for `transformers` backend (#12960 )	2025-02-17 19:52:47 -08:00
Cyrus Leung	7b623fca0b	[VLM] Check required fields before initializing field config in `DictEmbeddingItems` (#13380 )	2025-02-17 01:36:07 -08:00
yankooo	f857311d13	Fix spelling error in index.md (#13369 )	2025-02-17 06:53:20 +00:00
shangmingc	46cdd59577	[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-16 19:32:26 -08:00
凌	da833b0aee	[Docs] Change myenv to vllm. Update python_env_setup.inc.md (#13325 )	2025-02-16 16:04:21 +00:00
Roger Wang	b7d309860e	[V1] Update doc and examples for H2O-VL (#13349 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-16 10:35:54 +00:00
Cyrus Leung	367cb8ce8c	[Doc] [2/N] Add Fuyu E2E example for multimodal processor (#13331 )	2025-02-15 07:06:23 -08:00
Nicolò Lucchesi	579d7a63b2	[Bugfix][Docs] Fix offline Whisper (#13274 )	2025-02-14 21:32:37 -08:00
Nicolò Lucchesi	d84cef76eb	[Frontend] Add `/v1/audio/transcriptions` OpenAI API endpoint (#12909 )	2025-02-13 07:23:45 -08:00
Cyrus Leung	1bc3b5e71b	[VLM] Separate text-only and vision variants of the same model architecture (#13157 )	2025-02-13 06:19:15 -08:00
Cyrus Leung	c9d3ecf016	[VLM] Merged multi-modal processor for Molmo (#12966 )	2025-02-13 04:34:00 -08:00
Russell Bryant	d46d490c27	[Frontend] Move CLI code into vllm.cmd package (#12971 )	2025-02-12 23:12:21 -08:00
Cody Yu	60c68df6d1	[Build] Automatically use the wheel of the base commit with Python-only build (#13178 )	2025-02-12 23:10:28 -08:00
Harry Mellor	deb6c1c6b4	[Doc] Improve OpenVINO installation doc (#13102 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-11 18:02:46 +00:00
Farzad Abdolhosseini	08b2d845d6	[Model] Ultravox Model: Support v0.5 Release (#12912 ) Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>	2025-02-10 22:02:48 +00:00
மனோஜ்குமார் பழனிச்சாமி	2ae889052c	Fix seed parameter behavior in vLLM (#13007 ) Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>	2025-02-10 23:26:50 +08:00
Cyrus Leung	51f0b5f7f6	[Bugfix] Clean up and fix multi-modal processors (#13012 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-10 10:45:21 +00:00
Yuan Tang	243137143c	[Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-10 06:09:33 +00:00
Jee Jee Li	86222a3dab	[VLM] Merged multi-modal processor for GLM4V (#12449 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-02-08 20:32:16 +00:00
Cyrus Leung	8a69e0e20e	[CI/Build] Auto-fix Markdown files (#12941 )	2025-02-08 04:25:15 -08:00
Jun Duan	256a2d29dc	[Doc] Correct HF repository for TeleChat2 models (#12949 )	2025-02-08 01:42:15 -08:00
TJian	eaa92d4437	[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing (#12501 )	2025-02-07 08:13:43 -08:00
Jitse Klomp	afe74f7a96	[Doc] double quote cmake package in build.inc.md (#12840 )	2025-02-06 09:17:55 -08:00
Sumit Vij	d88506dda4	[Model] LoRA Support for Ultravox model (#11253 )	2025-02-05 19:54:13 -08:00
Cyrus Leung	75404d041b	[VLM] Update compatibility with transformers 4.49	2025-02-05 19:09:45 -08:00
Roger Wang	bf3b79efb8	[VLM] Qwen2.5-VL	2025-02-05 13:31:38 -08:00
Russell Bryant	9a5b1554b4	[Docs] Drop duplicate [source] links	2025-02-05 13:30:50 -08:00
Michael Goin	c53dc466b1	[Doc] Remove performance warning for auto_awq.md (#12743 )	2025-02-04 22:43:11 -08:00
Isotr0py	815079de8e	[VLM] merged multimodal processor and V1 support for idefics3 (#12660 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-02-04 20:00:51 +08:00
Cyrus Leung	d1ca7df84d	[VLM] Merged multi-modal processor for InternVL-based models (#12553 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-02-04 16:44:52 +08:00
Thomas Parnell	bb392af434	[Doc] Replace ibm-fms with ibm-ai-platform (#12709 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-02-04 07:05:04 +00:00
Arthur	a1a2aaadb9	[Model]: Add `transformers` backend support (#11330 ) # Adds support for `transformers` as a backend Following https://github.com/huggingface/transformers/pull/35235, a bunch of models should already be supported, we are ramping up support for more models. Thanks @Isotr0py for the TP support, and @hmellor for his help as well! This includes: - `trust_remote_code=True` support: any model on the hub, if it implements attention the correct way can be natively supported!! - tensor parallel support --------- Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-02-03 21:30:38 +08:00
youkaichao	e64330910b	[doc][misc] clarify VLLM_HOST_IP for multi-node inference (#12667 ) As more and more people are trying deepseek models with multi-node inference, https://github.com/vllm-project/vllm/issues/7815 becomes more frequent. Let's give clear message to users. Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-03 09:32:18 +08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Kunshang Ji	f256ebe4df	[Hardware][Intel GPU] add XPU bf16 support (#12392 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-02-02 10:17:26 +00:00
Brian Dellabetta	44bbca78d7	[Doc] int4 w4a16 example (#12585 ) Based on a request by @mgoin , with @kylesayrs we have added an example doc for int4 w4a16 quantization, following the pre-existing int8 w8a8 quantization example and the example available in [`llm-compressor`](https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_w4a16/llama3_example.py) FIX #n/a (no issue created) @kylesayrs and I have discussed a couple additional improvements for the quantization docs. We will revisit at a later date, possibly including: - A section for "choosing the correct quantization scheme/ compression technique" - Additional vision or audio calibration datasets --------- Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-31 15:38:48 -08:00
Harry Mellor	60808bd4c7	[Doc] Improve installation signposting (#12575 ) - Make device tab names more explicit - Add comprehensive list of devices to https://docs.vllm.ai/en/latest/getting_started/installation/index.html - Add `attention` blocks to the intro of all devices that don't have pre-built wheels/images --------- Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-31 15:38:35 -08:00
Cody Yu	60bcef000e	[Docs][V1] Prefix caching design (#12598 ) - Create v1 design document section in docs. - Add prefix caching design doc. @WoosukKwon @ywang96 --------- Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-01-31 12:30:46 -08:00
Cody Yu	847f883232	[Git] Automatically sign-off commits (#12595 ) It's very annoying when I forgot to add `-s` in `git commit` to sign-off, because I then need to `git rebase HEAD~1 --signoff` and `git push -f` to fix the DCO. This PR adds a hook to sign off commits automatically when `-s` is missing to solve this problem. The only change from the user side is now users have to install 2 hooks, so instead of just ``` pre-commit install ``` Now we need to ``` pre-commit install --hook-type pre-commit --hook-type commit-msg ``` Note that even if users still only install the pre-commit hook, they won't get any error in `git commit`. Just the sign-off hook won't run. cc @hmellor @youkaichao --------- Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-01-31 12:30:33 -08:00
Harry Mellor	e3f7ff65e7	Add favicon to docs (#12611 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-31 09:20:34 -08:00
Harry Mellor	a2769032ca	Set `?device={device}` when changing tab in installation guides (#12560 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-30 00:05:42 -08:00
Alphi	d93bf4da85	[Model] Refactoring of MiniCPM-V and add MiniCPM-o-2.6 support for vLLM (#12069 ) Signed-off-by: hzh <hezhihui_thu@163.com> Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> Signed-off-by: Chenguang Li <757486878@qq.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Shanshan Shen <467638484@qq.com> Signed-off-by: elijah <f1renze.142857@gmail.com> Signed-off-by: Yikun <yikunkero@gmail.com> Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Konrad Zawora <kzawora@habana.ai> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com> Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: sixgod <evethwillbeok@outlook.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com> Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com> Co-authored-by: Concurrensee <yida.wu@amd.com> Co-authored-by: Chenguang Li <757486878@qq.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Alex Brooks <alex.brooks@ibm.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Steve Luo <36296769+SunflowerAries@users.noreply.github.com> Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Konrad Zawora <kzawora@habana.ai> Co-authored-by: TJian <tunjian1996@gmail.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: maang-h <55082429+maang-h@users.noreply.github.com> Co-authored-by: Elfie Guo <164945471+elfiegg@users.noreply.github.com> Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-01-29 09:24:59 +00:00
Harry Mellor	dd6a3a02cb	[Doc] Convert docs to use colon fences (#12471 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-29 11:38:29 +08:00
Ce Gao	a7e3eba66f	[Frontend] Support reasoning content for deepseek r1 (#12473 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-01-29 11:38:08 +08:00
Jun Duan	925d2f1908	[Doc] Fix typo for x86 CPU installation (#12514 ) Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>	2025-01-28 16:37:10 +00:00
Cyrus Leung	8f58a51358	[VLM] Merged multi-modal processor and V1 support for Qwen-VL (#12504 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-28 16:25:05 +00:00
Yuan Tang	582cf78798	[DOC] Add link to vLLM blog (#12460 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-01-27 03:46:19 +00:00
Kyle Mistele	0034b09ceb	[Frontend] Rerank API (Jina- and Cohere-compatible API) (#12376 ) Signed-off-by: Kyle Mistele <kyle@mistele.com>	2025-01-26 19:58:45 -07:00
Mohit Deopujari	9a0f3bdbe5	[Hardware][Gaudi][Doc] Add missing step in setup instructions (#12382 )	2025-01-24 09:43:49 +00:00
Russell Bryant	c5cffcd0cd	[Docs] Update spec decode + structured output in compat matrix (#12373 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-24 01:15:52 +00:00
Woosuk Kwon	682b55bc07	[Docs] Add meetup slides (#12345 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-23 14:10:03 -08:00
Isotr0py	2cbeedad09	[Docs] Document Phi-4 support (#12362 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-23 19:18:51 +00:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
Cyrus Leung	d07efb31c5	[Doc] Troubleshooting errors during model inspection (#12351 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-23 22:46:58 +08:00
youkaichao	511627445e	[doc] explain common errors around torch.compile (#12340 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-23 14:56:02 +08:00
Russell Bryant	7551a34032	[Docs] Document vulnerability disclosure process (#12326 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-23 03:44:09 +00:00
Michael Goin	01a55941f5	[Docs] Update FP8 KV Cache documentation (#12238 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-23 11:18:09 +08:00
Hongxia Yang	09ccc9c8f7	[Documentation][AMD] Add information about prebuilt ROCm vLLM docker for perf validation purpose (#12281 ) Signed-off-by: Hongxia Yang <hongxyan@amd.com>	2025-01-22 07:49:22 +08:00
Cyrus Leung	96912550c8	[Misc] Rename `MultiModalInputsV2 -> MultiModalInputs` (#12244 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-21 07:31:19 +00:00
Gregory Shtrasberg	d4b62d4641	[AMD][Build] Porting dockerfiles from the ROCm/vllm fork (#11777 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-01-21 12:22:23 +08:00
Isotr0py	83609791d2	[Model] Add Qwen2 PRM model support (#12202 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-20 14:59:46 +08:00
Harry Mellor	3ea7b94523	Move linting to `pre-commit` (#11975 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-20 14:58:01 +08:00
Roger Wang	81763c58a0	[V1] Add V1 support of Qwen2-VL (#12128 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: imkero <kerorek@outlook.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-19 19:52:13 +08:00
Isotr0py	02798ecabe	[Model] Port deepseek-vl2 processor, remove dependency (#12169 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-18 13:59:39 +08:00
Hongxia Yang	c09503ddd6	[AMD][CI/Build][Bugfix] use pytorch stale wheel (#12172 ) Signed-off-by: hongxyan <hongxyan@amd.com>	2025-01-18 11:15:53 +08:00
Yuan Tang	1475847a14	[Doc] Add instructions on using Podman when SELinux is active (#12136 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-01-17 04:45:36 +00:00
Isotr0py	62b06ba23d	[Model] Add support for deepseek-vl2-tiny model (#12068 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-16 17:14:48 +00:00
Cyrus Leung	f8ef146f03	[Doc] Add documentation for specifying model architecture (#12105 )	2025-01-16 15:53:43 +08:00
RunningLeon	97eb97b5a4	[Model]: Support internlm3 (#12037 )	2025-01-15 11:35:17 +00:00
Kyle Sayers	3f9b7ab9f5	[Doc] Update examples to remove SparseAutoModelForCausalLM (#12062 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-01-15 06:36:01 +00:00
Harry Mellor	c9d6ff530b	Explain where the engine args go when using Docker (#12041 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-14 16:05:50 +00:00
TJian	8a1f938e6f	[Doc] Update Quantization Hardware Support Documentation (#12025 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-01-14 04:37:52 +00:00
Woosuk Kwon	1a401252b5	[Docs] Add Sky Computing Lab to project intro (#12019 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-13 17:24:36 -08:00
Harry Mellor	e8c23ff989	[Doc] Organise installation documentation into categories and tabs (#11935 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-13 12:27:36 +00:00
Roger Wang	cd8249903f	[Doc][V1] Update model implementation guide for V1 support (#11998 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-01-13 11:58:54 +00:00
Akshat Tripathi	8bddb73512	[Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-01-12 13:01:52 +00:00
Isotr0py	f967e51f38	[Model] Initialize support for Deepseek-VL2 models (#11578 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-12 00:17:24 -08:00
Rafael Vasquez	43f3d9e699	[CI/Build] Add markdown linter (#11857 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2025-01-12 00:17:13 -08:00
Cyrus Leung	a991f7d508	[Doc] Basic guide for writing unit tests for new models (#11951 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-11 21:27:24 +08:00
Li, Jiang	aa1e77a19c	[Hardware][CPU] Support MOE models on x86 CPU (#11831 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-01-10 11:07:58 -05:00
Harry Mellor	482cdc494e	[Doc] Rename offline inference examples (#11927 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-10 23:50:29 +08:00
Cyrus Leung	12664ddda5	[Doc] [1/N] Initial guide for merged multi-modal processor (#11925 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-10 14:30:25 +00:00
Harry Mellor	d85c47d6ad	Replace "online inference" with "online serving" (#11923 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-10 12:05:56 +00:00
Cyrus Leung	3de2b1eafb	[Doc] Show default pooling method in a table (#11904 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-10 11:25:20 +08:00
Cyrus Leung	c3cf54dda4	[Doc][5/N] Move Community and API Reference to the bottom (#11896 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-01-10 03:10:12 +00:00
Charles Frye	36f5303578	[Docs] Add Modal to deployment frameworks (#11907 )	2025-01-09 23:26:37 +00:00
Cyrus Leung	9a228348d2	[Misc] Provide correct Pixtral-HF chat template (#11891 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-09 10:19:37 -07:00
Cyrus Leung	65097ca0af	[Doc] Add model development API Reference (#11884 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-09 09:43:40 +00:00
Guspan Tanadi	a732900efc	[Doc] Intended links Python multiprocessing library (#11878 )	2025-01-09 05:39:39 +00:00
Michael Goin	730e9592e9	[Doc] Recommend uv and python 3.12 for quickstart guide (#11849 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2025-01-09 11:37:48 +08:00
Cyrus Leung	5984499e47	[Doc] Expand Multimodal API Reference (#11852 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-08 17:14:14 +00:00
Cyrus Leung	6cd40a5bfe	[Doc][4/N] Reorganize API Reference (#11843 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-08 21:34:44 +08:00
Harry Mellor	aba8d6ee00	[Doc] Move examples into categories (#11840 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-08 13:09:53 +00:00
Wallas Henrique	cfd3219f58	[Hardware][Apple] Native support for macOS Apple Silicon (#11696 ) Signed-off-by: Wallas Santos <wallashss@ibm.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-08 16:35:49 +08:00
Simon Mo	a1b2b8606e	[Docs] Update sponsor name: 'Novita' to 'Novita AI' (#11833 )	2025-01-07 23:05:46 -08:00
youkaichao	ad9f1aa679	[doc] update wheels url (#11830 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-08 14:36:49 +08:00
Simon Mo	259abd8953	[Docs] reorganize sponsorship page (#11639 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-01-07 21:16:08 -08:00
Harry Mellor	5950f555a1	[Doc] Group examples into categories (#11782 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-08 09:20:12 +08:00
sroy745	973f5dc581	[Doc]Add documentation for using EAGLE in vLLM (#11417 ) Signed-off-by: Sourashis Roy <sroy@roblox.com>	2025-01-07 19:19:12 +00:00
Cyrus Leung	c0efe92d8b	[Doc] Add note to `gte-Qwen2` models (#11808 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-07 21:50:58 +08:00
youkaichao	d9fa1c05ad	[doc] update how pip can install nightly wheels (#11806 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-07 21:42:58 +08:00
Roger Wang	2de197bdd4	[V1] Support audio language models on V1 (#11733 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-07 19:47:36 +08:00
youkaichao	869e829b85	[doc] add doc to explain how to use uv (#11773 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-07 18:41:17 +08:00
Roger Wang	8082ad7950	[V1][Doc] Update V1 support for `LLaVa-NeXT-Video` (#11798 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-07 09:55:39 +00:00
Russell Bryant	ce1917fcf2	[Doc] Create a vulnerability management team (#9925 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-06 22:57:32 -08:00
Cyrus Leung	8ceffbf315	[Doc][3/N] Reorganize Serving section (#11766 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-07 11:20:01 +08:00
Roger Wang	91b361ae89	[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 19:58:16 +00:00
youkaichao	4ca5d40adc	[doc] explain how to add interleaving sliding window support (#11771 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-06 21:57:44 +08:00
Cyrus Leung	ee77fdb5de	[Doc][2/N] Reorganize Models and Usage sections (#11755 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 21:40:31 +08:00
Suraj Deshmukh	2a622d704a	k8s-config: Update the secret to use stringData (#11679 ) Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>	2025-01-06 08:01:22 +00:00
Cyrus Leung	402d378360	[Doc] [1/N] Reorganize Getting Started section (#11645 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 02:18:33 +00:00
Alberto Ferrer	d1d49397e7	Update bnb.md with example for OpenAI (#11718 )	2025-01-04 06:29:02 +00:00
Hust_YangXian	9c93636d84	Update tool_calling.md (#11701 )	2025-01-04 06:16:30 +00:00
Sachin Varghese	2f1e8e8f54	Update default max_num_batch_tokens for chunked prefill (#11694 )	2025-01-03 00:25:53 +00:00
Chunyang Wen	84c35c374a	According to vllm.EngineArgs, the name should be distributed_executor_backend (#11689 )	2025-01-02 18:14:16 +00:00
Cyrus Leung	365801fedd	[VLM] Add max-count checking in data parser for single image models (#11661 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-12-31 22:15:21 -08:00
Roger Wang	e7c7c5e822	[V1][VLM] V1 support for selected single-image models. (#11632 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-12-31 21:17:22 +00:00
Matthias Vogler	a2a40bcd0d	[Model][LoRA]LoRA support added for MolmoForCausalLM (#11439 ) Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-30 17:33:06 -08:00
youkaichao	b12e87f942	[platforms] enable platform plugins (#11602 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-30 20:24:45 +08:00
Cyrus Leung	32b4c63f02	[Doc] Convert list tables to MyST (#11594 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-29 15:56:22 +08:00
youkaichao	328841d002	[bugfix] interleaving sliding window for cohere2 model (#11583 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-28 16:55:42 +00:00
Cyrus Leung	d427e5cfda	[Doc] Minor documentation fixes (#11580 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-28 21:53:59 +08:00
Isotr0py	d34be24bb1	[Model] Support InternLM2 Reward models (#11571 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-28 06:14:10 +00:00
Robert Shaw	df04dffade	[V1] [4/N] API Server: ZMQ/MP Utilities (#11541 )	2024-12-28 01:45:08 +00:00
Cyrus Leung	101418096f	[VLM] Support caching in merged multi-modal processor (#11396 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-27 17:22:48 +00:00
Chen1022	5ce4627a7e	[Doc] Add xgrammar in doc (#11549 ) Signed-off-by: ccjincong <chenjincong11@gmail.com>	2024-12-27 13:05:10 +00:00
AlexHe99	d003f3ea39	Update deploying_with_k8s.md with AMD ROCm GPU example (#11465 ) Signed-off-by: Alex He <alehe@amd.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-27 10:00:04 +00:00
Robert Shaw	0c0c2015c5	Update openai_compatible_server.md (#11536 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-12-26 16:26:18 -08:00
Simon Mo	82d24f7aac	[Docs] Document Deepseek V3 support (#11535 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2024-12-26 16:21:56 -08:00
Isotr0py	b85a977822	[Doc] Add video example to openai client for multimodal (#11521 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-26 17:31:29 +00:00
Roger Wang	7492a36207	[Doc] Add `QVQ` and `QwQ` to the list of supported models (#11509 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-12-26 09:44:32 +00:00
Cyrus Leung	6ad909fdda	[Doc] Improve GitHub links (#11491 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-25 14:49:26 -08:00
Cyrus Leung	3f3e92e1f2	[Model] Automatic conversion of classification and reward models (#11469 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 18:22:22 +00:00
Cyrus Leung	9edca6bf8f	[Frontend] Online Pooling API (#11457 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 17:54:30 +08:00
Rafael Vasquez	32aa2059ad	[Docs] Convert rST to MyST (Markdown) (#11145 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-12-23 22:35:38 +00:00
Yuan Tang	2e726680b3	[Bugfix] torch nightly version in ROCm installation guide (#11423 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-23 17:20:22 +00:00
youkaichao	5d2248d81a	[doc] explain nccl requirements for rlhf (#11381 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-20 13:00:56 -08:00
omer-dayan	995f56236b	[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192 ) Signed-off-by: OmerD <omer@run.ai>	2024-12-20 16:46:24 +00:00
youkaichao	1ecc645b8f	[doc] backward compatibility for 0.6.4 (#11359 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-19 21:33:53 -08:00
youkaichao	7801f56ed7	[ci][gh200] dockerfile clean up (#11351 ) Signed-off-by: drikster80 <ed.sealing@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: drikster80 <ed.sealing@gmail.com> Co-authored-by: cenzhiyao <2523403608@qq.com>	2024-12-19 18:13:06 -08:00
Yehoshua Cohen	6c7f881541	[Model] Add JambaForSequenceClassification model (#10860 ) Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-19 22:48:06 +08:00
Travis Johnson	17ca964273	[Model] IBM Granite 3.1 (#11307 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-12-19 11:27:24 +08:00
kYLe	66d4b16724	[Frontend] Add OpenAI API support for input_audio (#11027 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-16 22:09:58 -08:00
youkaichao	35bae114a8	fix gh200 tests on main (#11246 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-16 17:22:38 -08:00
bk-TurbaAI	35ffa682b1	[Docs] hint to enable use of GPU performance counters in profiling tools for multi-node distributed serving (#11235 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-12-16 22:20:39 +00:00
Jani Monoses	bddbbcb132	[Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203 )	2024-12-16 09:56:19 +00:00
cennn	b3b1526f03	WIP: [CI/Build] simplify Dockerfile build for ARM64 / GH200 (#11212 ) Signed-off-by: drikster80 <ed.sealing@gmail.com> Co-authored-by: drikster80 <ed.sealing@gmail.com>	2024-12-16 09:20:49 +00:00
AlexHe99	da6f409246	Update deploying_with_k8s.rst (#10922 )	2024-12-15 16:33:58 -08:00
Kuntai Du	38e599d6a8	[Doc] add documentation for disaggregated prefilling (#11197 ) Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2024-12-15 13:31:16 -06:00
Jee Jee Li	15859f2357	[[Misc]Upgrade bitsandbytes to the latest version 0.45.0 (#11201 )	2024-12-15 03:03:06 +00:00
Russell Bryant	4863e5fba5	[Core] V1: Use multiprocessing by default (#11074 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-13 16:27:32 -08:00
Cyrus Leung	0920ab9131	[Doc] Reorganize online pooling APIs (#11172 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 00:22:22 +08:00
Cyrus Leung	eeec9e3390	[Frontend] Separate pooling APIs in offline inference (#11129 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-13 10:40:07 +00:00
Jani Monoses	7cd7409142	PaliGemma 2 support (#11142 )	2024-12-13 07:40:07 +00:00
Ramon Ziai	d4d5291cc2	fix(docs): typo in helm install instructions (#11141 ) Signed-off-by: Ramon Ziai <ramon.ziai@bettermarks.com>	2024-12-12 17:36:32 +00:00
Pooya Davoodi	1da8f0e1dd	[Model] Add support for embedding model GritLM (#10816 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2024-12-12 06:39:16 +00:00
Yuan Tang	24a36d6d5f	Update link to LlamaStack remote vLLM guide in serving_with_llamastack.rst (#11112 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-12 02:39:21 +00:00
bingps	fd22220687	[Doc] Installed version of llmcompressor for int8/fp8 quantization (#11103 ) Signed-off-by: Guangda Liu <bingps@users.noreply.github.com> Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>	2024-12-11 15:43:24 +00:00
Cyrus Leung	cad5c0a6ed	[Doc] Update docs to refer to pooling models (#11093 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 13:36:27 +00:00
Cyrus Leung	8f10d5e393	[Misc] Split up pooling tasks (#10820 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 01:28:00 -08:00
Mor Zusman	ffa48c9146	[Model] PP support for Mamba-like models (#10992 ) Signed-off-by: mzusman <mor.zusmann@gmail.com>	2024-12-10 21:53:37 -05:00
Maxime Fournioux	fe2e10c71b	Add example of helm chart for vllm deployment on k8s (#9199 ) Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>	2024-12-10 09:19:27 +00:00
Joe Runde	980ad394a8	[Frontend] Use request id from header (#10968 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-12-10 13:46:29 +08:00
Michael Goin	6d525288c1	[Docs] Add dedicated tool calling page to docs (#10554 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-09 20:15:34 -05:00
Roger Wang	af7c4a92e6	[Doc][V1] Add V1 support column for multimodal models (#10998 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-08 22:29:16 -08:00
Cyrus Leung	c889d5888b	[Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 17:20:49 +00:00
Cyrus Leung	39e227c7ae	[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 17:10:05 +00:00
Cyrus Leung	1c768fe537	[Doc] Explicitly state that InternVL 2.5 is supported (#10978 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 16:58:02 +00:00
Sam Stoelinga	7406274041	[Doc] add KubeAI to serving integrations (#10837 ) Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>	2024-12-06 17:03:56 +00:00
Cyrus Leung	aa39a8e175	[Doc] Create a new "Usage" section (#10827 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-05 11:19:35 +08:00
Daniele	e4c34c23de	[CI/Build] improve python-only dev setup (#9621 ) Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-12-04 21:48:13 +00:00
Kevin H. Luu	c92acb9693	[ci/build] Update vLLM postmerge ECR repo (#10887 )	2024-12-04 09:01:20 +00:00
Aaron Pham	9323a3153b	[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-12-03 15:17:00 +08:00
Russell Bryant	ef51831ee8	[Doc] Add github links for source code references (#10672 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-03 06:46:07 +00:00
Cyrus Leung	e95f275f57	[CI/Build] Update `mistral_common` version for tests and docs (#10825 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-02 10:26:10 +00:00
youkaichao	169a0ff911	[doc] add warning about comparing hf and vllm outputs (#10805 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-01 00:41:38 -08:00
Cyrus Leung	133707123e	[Model] Replace embedding models with pooling adapter (#10769 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-01 08:02:54 +08:00
wangxiyuan	7e4bbda573	[doc] format fix (#10789 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2024-11-30 11:38:40 +00:00
Isotr0py	c83919c7a6	[Model] Add Internlm2 LoRA support (#5064 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-11-28 17:29:04 +00:00
sixgod	5fc5ce0fe4	[Model] Added GLM-4 series hf format model support vllm==0.6.4 (#10561 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-11-28 14:53:31 +00:00
罗泽轩	278be671a3	[Doc] Update model in arch_overview.rst to match comment (#10701 ) Signed-off-by: spacewander <spacewanderlzx@gmail.com>	2024-11-27 23:58:39 -08:00
shunxing12345	1209261e93	[Model] Support telechat2 (#10311 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: xiangw2 <xiangw2@chinatelecom.cn> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-11-27 11:32:35 +00:00
Murali Andoorveedu	db66e018ea	[Bugfix] Fix for Spec model TP + Chunked Prefill (#10232 ) Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com> Signed-off-by: Sourashis Roy <sroy@roblox.com> Co-authored-by: Sourashis Roy <sroy@roblox.com>	2024-11-26 09:11:16 -08:00
Sage Moore	9a88f89799	custom allreduce + torch.compile (#10121 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-25 22:00:16 -08:00
Sanket Kale	a6760f6456	[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228 ) Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-11-25 18:32:39 -08:00
Shane A	9db713a1dc	[Model] Add OLMo November 2024 model (#10503 )	2024-11-25 17:26:40 -05:00
Cyrus Leung	1b583cfefa	[Doc] Fix typos in docs (#10636 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 10:15:45 -08:00
zhou fan	b1d920531f	[Model]: Add support for Aria model (#10514 ) Signed-off-by: xffxff <1247714429@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-11-25 18:10:55 +00:00
fzyzcjy	2b0879bfc2	Super tiny little typo fix (#10633 )	2024-11-25 13:08:30 +00:00
Cyrus Leung	ed46f14321	[Model] Support `is_causal` HF config field for Qwen2 model (#10621 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 09:51:20 +00:00
Cyrus Leung	a30a605d21	[Doc] Add encoder-based models to Supported Models page (#10616 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 06:34:07 +00:00
Maximilien de Bayser	214efc2c3c	Support Cross encoder models (#10400 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-24 18:56:20 -08:00
youkaichao	e4fbb14414	[doc] update the code to add models (#10603 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-11-24 11:21:40 -08:00
Michael Goin	9afa014552	Add small example to metrics.rst (#10550 )	2024-11-21 23:43:43 +00:00
Li, Jiang	63f1fde277	[Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-20 10:57:39 +00:00
wchen61	7629a9c6e5	[CI/Build] Support compilation with local cutlass path (#10423 ) (#10424 )	2024-11-19 21:35:50 -08:00
Cyrus Leung	b4be5a8adb	[Bugfix] Enforce no chunked prefill for embedding models (#10470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-20 05:12:51 +00:00
Russell Bryant	5390d6664f	[Doc] Add the start of an arch overview page (#10368 )	2024-11-19 09:52:11 +00:00
Michael Goin	74f8c2cf5f	Add openai.beta.chat.completions.parse example to structured_outputs.rst (#10433 )	2024-11-19 04:37:46 +00:00
Yan Ma	6b2d25efc7	[Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107 ) Signed-off-by: yan ma <yan.ma@intel.com>	2024-11-18 11:18:05 -07:00
ismael-dm	31894a2155	[Doc] Add documentation for Structured Outputs (#9943 ) Signed-off-by: ismael-dm <ismaeldm99@gmail.com>	2024-11-18 09:52:12 -08:00
B-201	4186be8111	[Doc] Update doc for LoRA support in GLM-4V (#10425 ) Signed-off-by: B-201 <Joy25810@foxmail.com>	2024-11-18 15:08:30 +00:00
youkaichao	755b85359b	[doc] add doc for the plugin system (#10372 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-15 21:46:27 -08:00
Cyrus Leung	32e46e000f	[Frontend] Automatic detection of chat content format from AST (#9919 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-16 13:35:40 +08:00
Michael Green	4f168f69a3	[Docs] Misc updates to TPU installation instructions (#10165 )	2024-11-15 13:26:17 -08:00
Russell Bryant	3e8d14d8a1	[Doc] Move PR template content to docs (#10159 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-15 13:20:20 -08:00
Simon Mo	c76ac49d26	[Docs] Add Nebius as sponsors (#10371 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2024-11-15 12:47:40 -08:00
Cyrus Leung	2ac6d0e75b	[Misc] Consolidate pooler config overrides (#10351 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-15 06:59:00 +00:00
Cyrus Leung	b40cf6402e	[Model] Support Qwen2 embeddings and use tags to select model tests (#10184 )	2024-11-14 20:23:09 -08:00
Woosuk Kwon	1dbae0329c	[Docs] Publish meetup slides (#10331 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-14 16:19:38 +00:00
Mike Depinet	f67ce05d0b	[Frontend] Pythonic tool parser (#9859 ) Signed-off-by: Mike Depinet <mike@fixie.ai>	2024-11-14 04:14:34 +00:00
youkaichao	504ac53d18	[misc] error early for old-style class (#10304 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-13 18:55:39 -08:00
Cyrus Leung	0b8bb86bf1	[1/N] Initial prototype for multi-modal processor (#10044 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-13 12:39:03 +00:00
B-201	d909acf9fe	[Model][LoRA]LoRA support added for idefics3 (#10281 ) Signed-off-by: B-201 <Joy25810@foxmail.com>	2024-11-13 17:25:59 +08:00
Austin Veselka	1b886aa104	[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 (#9944 ) Signed-off-by: FurtherAI <austin.veselka@lighton.ai> Co-authored-by: FurtherAI <austin.veselka@lighton.ai>	2024-11-13 08:28:13 +00:00
电脑星人	3945c82346	[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions (#10221 ) Signed-off-by: imkero <kerorek@outlook.com>	2024-11-13 07:07:22 +00:00
youkaichao	377b74fe87	Revert "[ci][build] limit cmake version" (#10271 )	2024-11-12 15:06:48 -08:00
youkaichao	18081451f9	[doc] improve debugging doc (#10270 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-12 14:43:52 -08:00
youkaichao	96ae0eaeb2	[doc] fix location of runllm widget (#10266 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-12 14:34:39 -08:00
Guillaume Calmettes	36c513a076	[BugFix] Do not raise a `ValueError` when `tool_choice` is set to the supported `none` option and `tools` are not defined. (#10000 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2024-11-12 11:13:46 +00:00
youkaichao	3a28f18b0b	[doc] explain the class hierarchy in vLLM (#10240 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-11 22:56:44 -08:00
youkaichao	d1c6799b88	[doc] update debugging guide (#10236 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-11 15:21:12 -08:00
Yuan Tang	4800339c62	Add docs on serving with Llama Stack (#10183 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2024-11-11 11:28:55 -08:00
youkaichao	f0f2e5638e	[doc] improve debugging code (#10206 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-10 17:49:40 -08:00
Shawn Du	20cf2f553c	[Misc] small fixes to function tracing file path (#9543 ) Signed-off-by: Shawn Du <shawnd200@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-10 15:21:06 -08:00
Yongzao	bfb7d61a7c	[doc] Polish the integration with huggingface doc (#10195 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-10 10:22:04 -08:00
youkaichao	9fa4bdde9d	[ci][build] limit cmake version (#10188 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-09 16:27:26 -08:00
cjackal	d88bff1b96	[Frontend] add `add_request_id` middleware (#9594 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2024-11-09 10:18:29 +00:00
youkaichao	8a4358ecb5	[doc] explaining the integration with huggingface (#10173 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-09 01:02:54 -08:00
Cyrus Leung	49d2a41a86	[Doc] Adjust RunLLM location (#10176 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-08 20:07:10 -08:00
Cyrus Leung	e0191a95d8	[0/N] Rename `MultiModalInputs` to `MultiModalKwargs` (#10040 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-09 11:31:02 +08:00
Rafael Vasquez	6b30471586	[Misc] Improve Web UI (#10090 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-11-08 09:51:04 -08:00
Russell Bryant	3a7f15a398	[Doc] Move CONTRIBUTING to docs site (#9924 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-08 05:15:12 +00:00
whyiug	40d0e7411d	[Doc] Update FAQ links in spec_decode.rst (#9662 ) Signed-off-by: whyiug <whyiug@hotmail.com>	2024-11-08 04:44:58 +00:00
litianjian	28b2877d30	Online video support for VLMs (#10020 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-07 20:25:59 +00:00
Maximilien de Bayser	ae62fd17c0	[Frontend] Tool calling parser for Granite 3.0 models (#9027 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-11-07 07:09:02 -08:00
Rafael Vasquez	d7263a1bb8	Doc: Improve benchmark documentation (#9927 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-11-06 23:50:35 -08:00
Cyrus Leung	db7db4aab9	[Misc] Consolidate ModelConfig code related to HF config (#10104 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-07 06:00:21 +00:00
youkaichao	e7b84c394d	[doc] add back Python 3.8 ABI (#10100 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-06 21:06:41 -08:00
Li, Jiang	a4b3e0c1e9	[Hardware][CPU] Update torch 2.5 (#9911 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-07 04:43:08 +00:00
Russell Bryant	098f94de42	[CI/Build] Drop Python 3.8 support (#10038 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-06 14:31:01 +00:00
Eric	406d4cc480	[Model][LoRA]LoRA support added for Qwen2VLForConditionalGeneration (#10022 ) Signed-off-by: ericperfect <ericperfectttt@gmail.com>	2024-11-06 14:13:15 +00:00
Jee Jee Li	a5bba7d234	[Model] Add Idefics3 support (#9767 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: B-201 <Joy25810@foxmail.com> Co-authored-by: B-201 <Joy25810@foxmail.com>	2024-11-06 11:41:17 +00:00
Jee Jee Li	2003cc3513	[Model][LoRA]LoRA support added for LlamaEmbeddingModel (#10071 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-11-06 09:49:19 +00:00
Konrad Zawora	a02a50e6e5	[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143 ) Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Signed-off-by: Bob Zhu <bob.zhu@intel.com> Signed-off-by: zehao-intel <zehao.huang@intel.com> Signed-off-by: Konrad Zawora <kzawora@habana.ai> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai> Co-authored-by: Michal Adamczyk <madamczyk@habana.ai> Co-authored-by: Marceli Fylcek <mfylcek@habana.ai> Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com> Co-authored-by: Vivek Goel <vgoel@habana.ai> Co-authored-by: yuwenzho <yuwen.zhou@intel.com> Co-authored-by: Dominika Olszewska <dolszewska@habana.ai> Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com> Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com> Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai> Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com> Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai> Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com> Co-authored-by: Ilia Taraban <tarabanil@gmail.com> Co-authored-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai> Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai> Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com> Co-authored-by: Sun Choi <schoi@habana.ai> Co-authored-by: Iryna Boiko <iboiko@habana.ai> Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com> Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com> Co-authored-by: Zehao Huang <zehao.huang@intel.com> Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com> Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com> Co-authored-by: Nir David <ndavid@habana.ai> Co-authored-by: Yu-Zhou <yu.zhou@intel.com> Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai> Co-authored-by: Karol Damaszke <kdamaszke@habana.ai> Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Jacek Czaja <jacek.czaja@intel.com> Co-authored-by: Jacek Czaja <jczaja@habana.ai> Co-authored-by: Yuan <yuan.zhou@outlook.com>	2024-11-06 01:09:10 -08:00
Aaron Pham	21063c11c7	[CI/Build] drop support for Python 3.8 EOL (#8464 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-11-06 07:11:55 +00:00
Richard Liu	cd34029e91	Refactor TPU requirements file and pin build dependencies (#10010 ) Signed-off-by: Richard Liu <ricliu@google.com>	2024-11-05 16:48:44 +00:00
Roger Wang	6e056bcf04	[Doc] Update VLM doc about loading from local files (#9999 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-11-04 19:47:11 +00:00
shanshan wang	54597724f4	[Model] Add support for H2OVL-Mississippi models (#9747 ) Signed-off-by: Shanshan Wang <shanshan.wang@h2o.ai> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-11-04 00:15:36 +00:00
Michael Green	1d4cfe2be1	[Doc] Updated tpu-installation.rst with more details (#9926 ) Signed-off-by: Michael Green <mikegre@google.com>	2024-11-02 10:06:45 -04:00
Nick Hill	eed92f12fc	[Docs] Update Granite 3.0 models in supported models table (#9930 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-11-02 09:02:18 +00:00
Cyrus Leung	ba0d892074	[Frontend] Use a proper chat template for VLM2Vec (#9912 )	2024-11-01 14:09:07 +00:00
Cyrus Leung	06386a64dd	[Frontend] Chat-based Embeddings API (#9759 )	2024-11-01 08:13:35 +00:00
Cyrus Leung	d3aa2a8b2f	[Doc] Update multi-input support (#9906 )	2024-11-01 07:34:49 +00:00
Yongzao	2b5bf20988	[torch.compile] Adding torch compile annotations to some models (#9876 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-01 00:25:47 -07:00
Joe Runde	031a7995f3	[Bugfix][Frontend] Reject guided decoding in multistep mode (#9892 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-11-01 01:09:46 +00:00
Jee Jee Li	5608e611c2	[Doc] Update Qwen documentation (#9869 )	2024-10-31 08:54:18 +00:00
Guillaume Calmettes	abbfb6134d	[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837 )	2024-10-30 18:15:56 -07:00
youkaichao	c2cd1a2142	[doc] update pp support (#9853 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-10-30 13:36:51 -07:00
Joe Runde	33d257735f	[Doc] link bug for multistep guided decoding (#9843 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-30 17:28:29 +00:00
Woosuk Kwon	211fe91aa8	[TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA (#9438 )	2024-10-30 09:41:38 +00:00
Yan Ma	04a3ae0aca	[Bugfix] Fix multi nodes TP+PP for XPU (#8884 ) Signed-off-by: YiSheng5 <syhm@mail.ustc.edu.cn> Signed-off-by: yan ma <yan.ma@intel.com> Co-authored-by: YiSheng5 <syhm@mail.ustc.edu.cn>	2024-10-29 21:34:45 -07:00
Will Eaton	882a1ad0de	[Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>	2024-10-29 15:07:37 -07:00
Russell Bryant	c5d7fb9ddc	[Doc] fix third-party model example (#9771 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-10-28 19:39:21 -07:00
kakao-kevin-us	6650e6a930	[Model] Add classification Task with Qwen2ForSequenceClassification (#9704 ) Signed-off-by: Kevin-Yang <ykcha9@gmail.com> Co-authored-by: Kevin-Yang <ykcha9@gmail.com>	2024-10-26 17:53:35 +00:00
Rafael Vasquez	228cfbd03f	[Doc] Improve quickstart documentation (#9256 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-10-25 14:32:10 -07:00
Cyrus Leung	b979143d5b	[Doc] Move additional tips/notes to the top (#9647 )	2024-10-24 09:43:59 +00:00
Yongzao	8a02cd045a	[torch.compile] Adding torch compile annotations to some models (#9639 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-10-24 00:54:57 -07:00
Cyrus Leung	836e8ef6ee	[Bugfix] Fix PP for ChatGLM and Molmo (#9422 )	2024-10-24 06:12:05 +00:00
Vinay R Damodaran	33bab41060	[Bugfix]: Make chat content text allow type content (#9358 ) Signed-off-by: Vinay Damodaran <vrdn@hey.com>	2024-10-24 05:05:49 +00:00
Yunfei Chu	fc6c274626	[Model] Add Qwen2-Audio model support (#9248 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-23 17:54:22 +00:00
Cyrus Leung	831540cf04	[Model] Support E5-V (#9576 )	2024-10-23 11:35:29 +08:00
Seth Kimmel	208cb34c81	[Doc]: Update tensorizer docs to include vllm[tensorizer] (#7889 ) Co-authored-by: Kaunil Dhruv <dhruv.kaunil@gmail.com>	2024-10-22 15:43:25 -07:00
Yuan	32a1ee74a0	[Hardware][Intel CPU][DOC] Update docs for CPU backend (#6212 ) Signed-off-by: Yuan Zhou <yuan.zhou@intel.com> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Gubrud, Aaron D <aaron.d.gubrud@intel.com> Co-authored-by: adgubrud <96072084+adgubrud@users.noreply.github.com>	2024-10-22 10:38:04 -07:00
Isotr0py	bb392ea2d2	[Model][VLM] Initialize support for Mono-InternVL model (#9528 )	2024-10-22 16:01:46 +00:00
Rafael Vasquez	f7db5f0fa9	[Doc] Use shell code-blocks and fix section headers (#9508 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-10-22 06:43:24 +00:00
youkaichao	d621c43df7	[doc] fix format (#9562 )	2024-10-21 13:54:57 -07:00
Dhia Eddine Rhaiem	f6b97293aa	[Model] FalconMamba Support (#9325 )	2024-10-21 12:50:16 -04:00
Michael Goin	3921a2f29e	[Model] Support Pixtral models in the HF Transformers format (#9036 )	2024-10-18 13:29:56 -06:00
Cyrus Leung	051eaf6db3	[Model] Add user-configurable task for models that support both generation and embedding (#9424 )	2024-10-18 11:31:58 -07:00
tomeras91	d2b1bf55ec	[Frontend][Feature] Add jamba tool parser (#9154 )	2024-10-18 10:27:48 +00:00
Kuntai Du	81ede99ca4	[Core] Deprecating block manager v1 and make block manager v2 default (#8704 ) Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).	2024-10-17 11:38:15 -05:00
Li, Jiang	5eda21e773	[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344 )	2024-10-17 12:21:04 -04:00
Junhao Li	5b8a1fde84	[Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft (#9396 )	2024-10-16 16:40:24 +00:00
Roger Wang	59230ef32b	[Misc] Consolidate example usage of OpenAI client for multimodal models (#9412 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-16 11:20:51 +00:00
Cyrus Leung	cee711fdbb	[Core] Rename input data types (#8688 )	2024-10-16 10:49:37 +00:00
Cyrus Leung	7abba39ee6	[Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303 )	2024-10-16 14:31:00 +08:00
Michael Goin	8e836d982a	[Doc] Fix code formatting in spec_decode.rst (#9348 )	2024-10-14 21:29:11 -07:00
Tyler Michael Smith	169b530607	[Bugfix] Clean up some cruft in mamba.py (#9343 )	2024-10-15 00:24:25 +00:00
Reza Salehi	dfe43a2071	[Model] Molmo vLLM Integration (#9016 ) Co-authored-by: sanghol <sanghol@allenai.org> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-10-14 07:56:24 -07:00
Yunmeng	2b184ddd4f	[Misc][Installation] Improve source installation script and doc (#9309 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-12 09:36:40 -07:00
Wallas Henrique	8baf85e4e9	[Doc] Compatibility matrix for mutual exclusive features (#8512 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-10-11 11:18:50 -07:00
sixgod	6cf1167c1a	[Model] Add GLM-4v support and meet vllm==0.6.2 (#9242 )	2024-10-11 17:36:13 +00:00
Tyler Michael Smith	7342a7d7f8	[Model] Support Mamba (#6484 )	2024-10-11 15:40:06 +00:00
Cyrus Leung	e808156f30	[Misc] Collect model support info in a single process per model (#9233 )	2024-10-11 11:08:11 +00:00
omrishiv	f990bab2a4	[Doc][Neuron] add note to neuron documentation about resolving triton issue (#9257 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-10-10 23:36:32 +00:00
Rafael Vasquez	055f3270d4	[Doc] Improve debugging documentation (#9204 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-10-10 10:48:51 -07:00
whyiug	04de9057ab	[Model] support input image embedding for minicpmv (#9237 )	2024-10-10 15:00:47 +00:00
youkaichao	de895f1697	[misc] improve model support check in another process (#9208 )	2024-10-09 21:58:27 -07:00
Li, Jiang	ca77dd7a44	[Hardware][CPU] Support AWQ for CPU backend (#7515 )	2024-10-09 10:28:08 -06:00
Jiangtao Hu	dc4aea677a	[Doc] Fix VLM prompt placeholder sample bug (#9170 )	2024-10-09 08:59:42 +00:00
Yuan Tang	acce7630c1	Update link to KServe deployment guide (#9173 )	2024-10-09 03:58:49 +00:00
Michael Goin	9ba0bd6aa6	Add `lm-eval` directly to requirements-test.txt (#9161 )	2024-10-08 18:22:31 -07:00
Rafael Vasquez	de24046fcd	[Doc] Improve contributing and installation documentation (#9132 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-10-08 20:22:08 +00:00
Sayak Paul	1874c6a1b0	[Doc] Update vlm.rst to include an example on videos (#9155 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-10-08 18:12:29 +00:00
TimWang	93cf74a8a7	[Doc]: Add deploying_with_k8s guide (#8451 )	2024-10-07 13:31:45 -07:00
Cyrus Leung	151ef4efd2	[Model] Support NVLM-D and fix QK Norm in InternViT (#9045 ) Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2024-10-07 11:55:12 +00:00
Cyrus Leung	b22b798471	[Model] PP support for embedding models and update docs (#9090 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-10-06 16:35:27 +08:00
Cyrus Leung	f22619fe96	[Misc] Remove user-facing error for removed VLM args (#9104 )	2024-10-06 01:33:52 -07:00
Andy Dai	5df1834895	[Bugfix] Fix order of arguments matters in config.yaml (#8960 )	2024-10-05 17:35:11 +00:00
Roger Wang	26aa325f4f	[Core][VLM] Test registration for OOT multimodal models (#8717 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:38:25 -07:00
Cyrus Leung	0e36fd4909	[Misc] Move registry to its own file (#9064 )	2024-10-04 10:01:37 +00:00
Murali Andoorveedu	0f6d7a9a34	[Models] Add remaining model PP support (#7168 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:56:58 +08:00
代君	3dbb215b38	[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405 )	2024-10-04 10:36:39 +08:00
Nick Hill	18c2e30c57	[Doc] Update Granite model docs (#9025 )	2024-10-03 02:42:24 +00:00
Sergey Shlyapnikov	f58d4fccc9	[OpenVINO] Enable GPU support for OpenVINO vLLM backend (#8192 )	2024-10-02 17:50:01 -04:00
Cyrus Leung	4f341bd4bf	[Doc] Update list of supported models (#8987 )	2024-10-02 00:35:39 +08:00
whyiug	e01ab595d8	[Model] support input embeddings for qwen2vl (#8856 )	2024-09-30 03:16:10 +00:00
youkaichao	cc276443b5	[doc] organize installation doc and expose per-commit docker (#8931 )	2024-09-28 17:48:41 -07:00
youkaichao	d86f6b2afb	[misc] fix wheel name (#8919 )	2024-09-27 22:10:44 -07:00
Cyrus Leung	3b00b9c26c	[Core] rename`PromptInputs` and `inputs` (#8876 )	2024-09-26 20:35:15 -07:00
Maximilien de Bayser	344cd2b6f4	[Feature] Add support for Llama 3.1 and 3.2 tool use (#8343 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-09-26 17:01:42 -07:00
youkaichao	70de39f6b4	[misc][installation] build from source without compilation (#8818 )	2024-09-26 13:19:04 -07:00
Roger Wang	4bb98f2190	[Misc] Update config loading for Qwen2-VL and remove Granite (#8837 )	2024-09-26 07:45:30 -07:00
Roger Wang	e2c6e0a829	[Doc] Update doc for Transformers 4.45 (#8817 )	2024-09-25 13:29:48 -07:00
Chen Zhang	770ec6024f	[Model] Add support for the multi-modal Llama 3.2 model (#8811 ) Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-25 13:29:32 -07:00
Simon Mo	4f1ba0844b	Revert "rename PromptInputs and inputs with backward compatibility (#8760 ) (#8810 )	2024-09-25 10:36:26 -07:00
Cyrus Leung	28e1299e60	rename PromptInputs and inputs with backward compatibility (#8760 )	2024-09-25 09:36:47 -07:00
Hongxia Yang	1c046447a6	[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777 )	2024-09-25 22:26:37 +08:00
Jee Jee Li	13f9f7a3d0	[[Misc]Upgrade bitsandbytes to the latest version 0.44.0 (#8768 )	2024-09-24 17:08:55 -07:00
Simon Mo	3185fb0cca	Revert "[Core] Rename `PromptInputs` to `PromptType`, and `inputs` to `prompt`" (#8750 )	2024-09-24 05:45:20 +00:00
Hongxia Yang	530821d00c	[Hardware][AMD] ROCm6.2 upgrade (#8674 )	2024-09-23 18:52:39 -07:00
Daniele	ee5f34b1c2	[CI/Build] use setuptools-scm to set __version__ (#4738 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-23 09:44:26 -07:00
Yan Ma	d23679eb99	[Bugfix] fix docker build for xpu (#8652 )	2024-09-22 22:54:18 -07:00
youkaichao	d4a2ac8302	[build] enable existing pytorch (for GH200, aarch64, nightly) (#8713 )	2024-09-22 12:47:54 -07:00
litianjian	5b59532760	[Model][VLM] Add LLaVA-Onevision model support (#8486 ) Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-22 10:51:44 -07:00
Andy Dai	4dfdf43196	[Doc] Fix typo in AMD installation guide (#8689 )	2024-09-21 00:24:12 -07:00
Cyrus Leung	0057894ef7	[Core] Rename `PromptInputs` and `inputs`(#8673 )	2024-09-20 19:00:54 -07:00
omrishiv	7c8566aa4f	[Doc] neuron documentation update (#8671 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-09-20 15:04:37 -07:00
Niklas Muennighoff	3b63de9353	[Model] Add OLMoE (#7922 )	2024-09-20 09:31:41 -07:00
Jiaxin Shan	260d40b5ea	[Core] Support Lora lineage and base model metadata management (#6315 )	2024-09-20 06:20:56 +00:00
Isotr0py	ea4647b7d7	[Doc] Add documentation for GGUF quantization (#8618 )	2024-09-19 13:15:55 -06:00
Geun, Lim	e18749ff09	[Model] Support Solar Model (#8386 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-18 11:04:00 -06:00
Alexander Matveev	7c7714d856	[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-18 13:56:58 +00:00
youkaichao	fa0c114fad	[doc] improve installation doc (#8550 ) Co-authored-by: Andy Dai <76841985+Imss27@users.noreply.github.com>	2024-09-17 16:24:06 -07:00
youkaichao	2759a43a26	[doc] update doc on testing and debugging (#8514 )	2024-09-16 12:10:23 -07:00
ywfang	8a0cf1ddc3	[Model] support minicpm3 (#8297 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-14 14:50:26 +00:00
Isotr0py	f57092c00b	[Doc] Add oneDNN installation to CPU backend documentation (#8467 )	2024-09-13 18:06:30 +00:00
Cyrus Leung	a84e598e21	[CI/Build] Reorganize models tests (#7820 )	2024-09-13 10:20:06 -07:00
youkaichao	cab69a15e4	[doc] recommend pip instead of conda (#8446 )	2024-09-12 23:52:41 -07:00
Alex Brooks	c6202daeed	[Model] Support multiple images for qwen-vl (#8247 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-12 10:10:54 -07:00

1092 Commits