vLLM/vllm

Commit Graph

Author	SHA1	Message	Date
Michael Goin	ca377cf1b9	Use CUDA 12.4 as default for release and nightly wheels (#12098 )	2025-02-26 19:06:37 -08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
Michael Goin	07c4353057	[Model] Support Grok1 (#13795 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-26 01:07:12 +00:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555 )	2025-02-24 17:13:52 -08:00
Nicolò Lucchesi	444b0f0f62	[Misc][Docs] Raise error when flashinfer is not installed and `VLLM_ATTENTION_BACKEND` is set (#12513 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-02-24 10:43:21 -05:00
Cyrus Leung	8354f6640c	[Doc] Dockerfile instructions for optional dependencies and dev transformers (#13699 )	2025-02-22 06:04:31 -08:00
Mark McLoughlin	2cb8c1540e	[Metrics] Add `--show-hidden-metrics-for-version` CLI arg (#13295 )	2025-02-22 00:20:45 -08:00
Yuan Tang	8c0dd3d4df	docs: Add a note on full CI run in contributing guide (#13646 )	2025-02-21 21:53:59 -08:00
Gabriel Marinho	1c3c975766	[FEATURE] Enables /score endpoint for embedding models (#12846 )	2025-02-20 22:09:47 -08:00
Kante Yin	44c33f01f3	Add llmaz as another integration (#13643 ) Signed-off-by: kerthcet <kerthcet@gmail.com>	2025-02-21 03:52:40 +00:00
Joe Runde	bfbc0b32c6	[Frontend] Add backend-specific options for guided decoding (#13505 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-20 15:07:58 -05:00
Harry Mellor	992e5c3d34	Merge similar examples in `offline_inference` into single `basic` example (#12737 )	2025-02-20 04:53:51 -08:00
Jee Jee Li	512368e34a	[Misc] Qwen2.5 VL support LoRA (#13261 )	2025-02-19 18:37:55 -08:00
Wilson Wu	01c184b8f3	Fix copyright year to auto get current year (#13561 )	2025-02-19 16:55:34 +00:00
youkaichao	ad5a35c21b	[doc] clarify multi-node serving doc (#13558 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-19 22:32:17 +08:00
youkaichao	52ce14d31f	[doc] clarify profiling is only for developers (#13554 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-19 20:55:58 +08:00
Roger Wang	fd84857f64	[Doc] Add clarification note regarding paligemma (#13511 )	2025-02-18 22:24:03 -08:00
Harry Mellor	00b69c2d27	[Misc] Remove dangling references to `--use-v2-block-manager` (#13492 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-19 03:37:26 +00:00
youkaichao	7b203b7694	[misc] fix debugging code (#13487 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-18 09:37:11 -08:00
Harry Mellor	2358ca527b	[Doc]: Improve feature tables (#13224 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-18 18:52:39 +08:00
Isotr0py	67ef8f666a	[Model] Enable quantization support for `transformers` backend (#12960 )	2025-02-17 19:52:47 -08:00
Cyrus Leung	7b623fca0b	[VLM] Check required fields before initializing field config in `DictEmbeddingItems` (#13380 )	2025-02-17 01:36:07 -08:00
yankooo	f857311d13	Fix spelling error in index.md (#13369 )	2025-02-17 06:53:20 +00:00
shangmingc	46cdd59577	[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-16 19:32:26 -08:00
凌	da833b0aee	[Docs] Change myenv to vllm. Update python_env_setup.inc.md (#13325 )	2025-02-16 16:04:21 +00:00
Roger Wang	b7d309860e	[V1] Update doc and examples for H2O-VL (#13349 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-16 10:35:54 +00:00
Cyrus Leung	367cb8ce8c	[Doc] [2/N] Add Fuyu E2E example for multimodal processor (#13331 )	2025-02-15 07:06:23 -08:00
Nicolò Lucchesi	579d7a63b2	[Bugfix][Docs] Fix offline Whisper (#13274 )	2025-02-14 21:32:37 -08:00
Nicolò Lucchesi	d84cef76eb	[Frontend] Add `/v1/audio/transcriptions` OpenAI API endpoint (#12909 )	2025-02-13 07:23:45 -08:00
Cyrus Leung	1bc3b5e71b	[VLM] Separate text-only and vision variants of the same model architecture (#13157 )	2025-02-13 06:19:15 -08:00
Cyrus Leung	c9d3ecf016	[VLM] Merged multi-modal processor for Molmo (#12966 )	2025-02-13 04:34:00 -08:00
Russell Bryant	d46d490c27	[Frontend] Move CLI code into vllm.cmd package (#12971 )	2025-02-12 23:12:21 -08:00
Cody Yu	60c68df6d1	[Build] Automatically use the wheel of the base commit with Python-only build (#13178 )	2025-02-12 23:10:28 -08:00
Harry Mellor	deb6c1c6b4	[Doc] Improve OpenVINO installation doc (#13102 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-11 18:02:46 +00:00
Farzad Abdolhosseini	08b2d845d6	[Model] Ultravox Model: Support v0.5 Release (#12912 ) Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>	2025-02-10 22:02:48 +00:00
Cyrus Leung	51f0b5f7f6	[Bugfix] Clean up and fix multi-modal processors (#13012 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-10 10:45:21 +00:00
Yuan Tang	243137143c	[Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-10 06:09:33 +00:00
Jee Jee Li	86222a3dab	[VLM] Merged multi-modal processor for GLM4V (#12449 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-02-08 20:32:16 +00:00
Cyrus Leung	8a69e0e20e	[CI/Build] Auto-fix Markdown files (#12941 )	2025-02-08 04:25:15 -08:00
Jun Duan	256a2d29dc	[Doc] Correct HF repository for TeleChat2 models (#12949 )	2025-02-08 01:42:15 -08:00
TJian	eaa92d4437	[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing (#12501 )	2025-02-07 08:13:43 -08:00
Jitse Klomp	afe74f7a96	[Doc] double quote cmake package in build.inc.md (#12840 )	2025-02-06 09:17:55 -08:00
Sumit Vij	d88506dda4	[Model] LoRA Support for Ultravox model (#11253 )	2025-02-05 19:54:13 -08:00
Cyrus Leung	75404d041b	[VLM] Update compatibility with transformers 4.49	2025-02-05 19:09:45 -08:00
Roger Wang	bf3b79efb8	[VLM] Qwen2.5-VL	2025-02-05 13:31:38 -08:00
Russell Bryant	9a5b1554b4	[Docs] Drop duplicate [source] links	2025-02-05 13:30:50 -08:00
Michael Goin	c53dc466b1	[Doc] Remove performance warning for auto_awq.md (#12743 )	2025-02-04 22:43:11 -08:00
Isotr0py	815079de8e	[VLM] merged multimodal processor and V1 support for idefics3 (#12660 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-02-04 20:00:51 +08:00
Cyrus Leung	d1ca7df84d	[VLM] Merged multi-modal processor for InternVL-based models (#12553 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-02-04 16:44:52 +08:00
Thomas Parnell	bb392af434	[Doc] Replace ibm-fms with ibm-ai-platform (#12709 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-02-04 07:05:04 +00:00
Arthur	a1a2aaadb9	[Model]: Add `transformers` backend support (#11330 ) # Adds support for `transformers` as a backend Following https://github.com/huggingface/transformers/pull/35235, a bunch of models should already be supported, we are ramping up support for more models. Thanks @Isotr0py for the TP support, and @hmellor for his help as well! This includes: - `trust_remote_code=True` support: any model on the hub, if it implements attention the correct way can be natively supported!! - tensor parallel support --------- Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-02-03 21:30:38 +08:00
youkaichao	e64330910b	[doc][misc] clarify VLLM_HOST_IP for multi-node inference (#12667 ) As more and more people are trying deepseek models with multi-node inference, https://github.com/vllm-project/vllm/issues/7815 becomes more frequent. Let's give clear message to users. Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-03 09:32:18 +08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Kunshang Ji	f256ebe4df	[Hardware][Intel GPU] add XPU bf16 support (#12392 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-02-02 10:17:26 +00:00
Brian Dellabetta	44bbca78d7	[Doc] int4 w4a16 example (#12585 ) Based on a request by @mgoin , with @kylesayrs we have added an example doc for int4 w4a16 quantization, following the pre-existing int8 w8a8 quantization example and the example available in [`llm-compressor`](https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_w4a16/llama3_example.py) FIX #n/a (no issue created) @kylesayrs and I have discussed a couple additional improvements for the quantization docs. We will revisit at a later date, possibly including: - A section for "choosing the correct quantization scheme/ compression technique" - Additional vision or audio calibration datasets --------- Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-31 15:38:48 -08:00
Harry Mellor	60808bd4c7	[Doc] Improve installation signposting (#12575 ) - Make device tab names more explicit - Add comprehensive list of devices to https://docs.vllm.ai/en/latest/getting_started/installation/index.html - Add `attention` blocks to the intro of all devices that don't have pre-built wheels/images --------- Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-31 15:38:35 -08:00
Cody Yu	60bcef000e	[Docs][V1] Prefix caching design (#12598 ) - Create v1 design document section in docs. - Add prefix caching design doc. @WoosukKwon @ywang96 --------- Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-01-31 12:30:46 -08:00
Cody Yu	847f883232	[Git] Automatically sign-off commits (#12595 ) It's very annoying when I forgot to add `-s` in `git commit` to sign-off, because I then need to `git rebase HEAD~1 --signoff` and `git push -f` to fix the DCO. This PR adds a hook to sign off commits automatically when `-s` is missing to solve this problem. The only change from the user side is now users have to install 2 hooks, so instead of just `pre-commit install` Now we need to `pre-commit install --hook-type pre-commit --hook-type commit-msg` Note that even if users still only install the pre-commit hook, they won't get any error in `git commit`. Just the sign-off hook won't run. cc @hmellor @youkaichao --------- Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-01-31 12:30:33 -08:00
Harry Mellor	e3f7ff65e7	Add favicon to docs (#12611 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-31 09:20:34 -08:00
Harry Mellor	a2769032ca	Set `?device={device}` when changing tab in installation guides (#12560 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-30 00:05:42 -08:00
Alphi	d93bf4da85	[Model] Refactoring of MiniCPM-V and add MiniCPM-o-2.6 support for vLLM (#12069 ) Signed-off-by: hzh <hezhihui_thu@163.com> Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> Signed-off-by: Chenguang Li <757486878@qq.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Shanshan Shen <467638484@qq.com> Signed-off-by: elijah <f1renze.142857@gmail.com> Signed-off-by: Yikun <yikunkero@gmail.com> Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Konrad Zawora <kzawora@habana.ai> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com> Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: sixgod <evethwillbeok@outlook.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com> Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com> Co-authored-by: Concurrensee <yida.wu@amd.com> Co-authored-by: Chenguang Li <757486878@qq.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Alex Brooks <alex.brooks@ibm.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Steve Luo <36296769+SunflowerAries@users.noreply.github.com> Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Konrad Zawora <kzawora@habana.ai> Co-authored-by: TJian <tunjian1996@gmail.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: maang-h <55082429+maang-h@users.noreply.github.com> Co-authored-by: Elfie Guo <164945471+elfiegg@users.noreply.github.com> Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-01-29 09:24:59 +00:00
Harry Mellor	dd6a3a02cb	[Doc] Convert docs to use colon fences (#12471 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-29 11:38:29 +08:00
Ce Gao	a7e3eba66f	[Frontend] Support reasoning content for deepseek r1 (#12473 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-01-29 11:38:08 +08:00
Jun Duan	925d2f1908	[Doc] Fix typo for x86 CPU installation (#12514 ) Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>	2025-01-28 16:37:10 +00:00
Cyrus Leung	8f58a51358	[VLM] Merged multi-modal processor and V1 support for Qwen-VL (#12504 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-28 16:25:05 +00:00
Yuan Tang	582cf78798	[DOC] Add link to vLLM blog (#12460 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-01-27 03:46:19 +00:00
Kyle Mistele	0034b09ceb	[Frontend] Rerank API (Jina- and Cohere-compatible API) (#12376 ) Signed-off-by: Kyle Mistele <kyle@mistele.com>	2025-01-26 19:58:45 -07:00
Mohit Deopujari	9a0f3bdbe5	[Hardware][Gaudi][Doc] Add missing step in setup instructions (#12382 )	2025-01-24 09:43:49 +00:00
Russell Bryant	c5cffcd0cd	[Docs] Update spec decode + structured output in compat matrix (#12373 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-24 01:15:52 +00:00
Woosuk Kwon	682b55bc07	[Docs] Add meetup slides (#12345 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-23 14:10:03 -08:00
Isotr0py	2cbeedad09	[Docs] Document Phi-4 support (#12362 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-23 19:18:51 +00:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
Cyrus Leung	d07efb31c5	[Doc] Troubleshooting errors during model inspection (#12351 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-23 22:46:58 +08:00
youkaichao	511627445e	[doc] explain common errors around torch.compile (#12340 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-23 14:56:02 +08:00
Russell Bryant	7551a34032	[Docs] Document vulnerability disclosure process (#12326 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-23 03:44:09 +00:00
Michael Goin	01a55941f5	[Docs] Update FP8 KV Cache documentation (#12238 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-23 11:18:09 +08:00
Hongxia Yang	09ccc9c8f7	[Documentation][AMD] Add information about prebuilt ROCm vLLM docker for perf validation purpose (#12281 ) Signed-off-by: Hongxia Yang <hongxyan@amd.com>	2025-01-22 07:49:22 +08:00
Cyrus Leung	96912550c8	[Misc] Rename `MultiModalInputsV2 -> MultiModalInputs` (#12244 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-21 07:31:19 +00:00
Gregory Shtrasberg	d4b62d4641	[AMD][Build] Porting dockerfiles from the ROCm/vllm fork (#11777 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-01-21 12:22:23 +08:00
Isotr0py	83609791d2	[Model] Add Qwen2 PRM model support (#12202 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-20 14:59:46 +08:00
Harry Mellor	3ea7b94523	Move linting to `pre-commit` (#11975 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-20 14:58:01 +08:00
Roger Wang	81763c58a0	[V1] Add V1 support of Qwen2-VL (#12128 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: imkero <kerorek@outlook.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-19 19:52:13 +08:00
Isotr0py	02798ecabe	[Model] Port deepseek-vl2 processor, remove dependency (#12169 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-18 13:59:39 +08:00
Hongxia Yang	c09503ddd6	[AMD][CI/Build][Bugfix] use pytorch stale wheel (#12172 ) Signed-off-by: hongxyan <hongxyan@amd.com>	2025-01-18 11:15:53 +08:00
Yuan Tang	1475847a14	[Doc] Add instructions on using Podman when SELinux is active (#12136 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-01-17 04:45:36 +00:00
Isotr0py	62b06ba23d	[Model] Add support for deepseek-vl2-tiny model (#12068 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-16 17:14:48 +00:00
Cyrus Leung	f8ef146f03	[Doc] Add documentation for specifying model architecture (#12105 )	2025-01-16 15:53:43 +08:00
RunningLeon	97eb97b5a4	[Model]: Support internlm3 (#12037 )	2025-01-15 11:35:17 +00:00
Kyle Sayers	3f9b7ab9f5	[Doc] Update examples to remove SparseAutoModelForCausalLM (#12062 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-01-15 06:36:01 +00:00
Harry Mellor	c9d6ff530b	Explain where the engine args go when using Docker (#12041 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-14 16:05:50 +00:00
TJian	8a1f938e6f	[Doc] Update Quantization Hardware Support Documentation (#12025 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-01-14 04:37:52 +00:00
Woosuk Kwon	1a401252b5	[Docs] Add Sky Computing Lab to project intro (#12019 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-13 17:24:36 -08:00
Harry Mellor	e8c23ff989	[Doc] Organise installation documentation into categories and tabs (#11935 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-13 12:27:36 +00:00
Roger Wang	cd8249903f	[Doc][V1] Update model implementation guide for V1 support (#11998 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-01-13 11:58:54 +00:00
Akshat Tripathi	8bddb73512	[Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-01-12 13:01:52 +00:00
Isotr0py	f967e51f38	[Model] Initialize support for Deepseek-VL2 models (#11578 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-12 00:17:24 -08:00
Rafael Vasquez	43f3d9e699	[CI/Build] Add markdown linter (#11857 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2025-01-12 00:17:13 -08:00
Cyrus Leung	a991f7d508	[Doc] Basic guide for writing unit tests for new models (#11951 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-11 21:27:24 +08:00
Li, Jiang	aa1e77a19c	[Hardware][CPU] Support MOE models on x86 CPU (#11831 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-01-10 11:07:58 -05:00
Harry Mellor	482cdc494e	[Doc] Rename offline inference examples (#11927 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-10 23:50:29 +08:00
Cyrus Leung	12664ddda5	[Doc] [1/N] Initial guide for merged multi-modal processor (#11925 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-10 14:30:25 +00:00
Harry Mellor	d85c47d6ad	Replace "online inference" with "online serving" (#11923 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-10 12:05:56 +00:00
Cyrus Leung	3de2b1eafb	[Doc] Show default pooling method in a table (#11904 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-10 11:25:20 +08:00
Cyrus Leung	c3cf54dda4	[Doc][5/N] Move Community and API Reference to the bottom (#11896 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-01-10 03:10:12 +00:00
Charles Frye	36f5303578	[Docs] Add Modal to deployment frameworks (#11907 )	2025-01-09 23:26:37 +00:00
Cyrus Leung	9a228348d2	[Misc] Provide correct Pixtral-HF chat template (#11891 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-09 10:19:37 -07:00
Cyrus Leung	65097ca0af	[Doc] Add model development API Reference (#11884 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-09 09:43:40 +00:00
Guspan Tanadi	a732900efc	[Doc] Intended links Python multiprocessing library (#11878 )	2025-01-09 05:39:39 +00:00
Michael Goin	730e9592e9	[Doc] Recommend uv and python 3.12 for quickstart guide (#11849 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2025-01-09 11:37:48 +08:00
Cyrus Leung	5984499e47	[Doc] Expand Multimodal API Reference (#11852 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-08 17:14:14 +00:00
Cyrus Leung	6cd40a5bfe	[Doc][4/N] Reorganize API Reference (#11843 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-08 21:34:44 +08:00
Harry Mellor	aba8d6ee00	[Doc] Move examples into categories (#11840 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-08 13:09:53 +00:00
Wallas Henrique	cfd3219f58	[Hardware][Apple] Native support for macOS Apple Silicon (#11696 ) Signed-off-by: Wallas Santos <wallashss@ibm.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-08 16:35:49 +08:00
Simon Mo	a1b2b8606e	[Docs] Update sponsor name: 'Novita' to 'Novita AI' (#11833 )	2025-01-07 23:05:46 -08:00
youkaichao	ad9f1aa679	[doc] update wheels url (#11830 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-08 14:36:49 +08:00
Simon Mo	259abd8953	[Docs] reorganize sponsorship page (#11639 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-01-07 21:16:08 -08:00
Harry Mellor	5950f555a1	[Doc] Group examples into categories (#11782 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-08 09:20:12 +08:00
sroy745	973f5dc581	[Doc]Add documentation for using EAGLE in vLLM (#11417 ) Signed-off-by: Sourashis Roy <sroy@roblox.com>	2025-01-07 19:19:12 +00:00
Cyrus Leung	c0efe92d8b	[Doc] Add note to `gte-Qwen2` models (#11808 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-07 21:50:58 +08:00
youkaichao	d9fa1c05ad	[doc] update how pip can install nightly wheels (#11806 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-07 21:42:58 +08:00
Roger Wang	2de197bdd4	[V1] Support audio language models on V1 (#11733 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-07 19:47:36 +08:00
youkaichao	869e829b85	[doc] add doc to explain how to use uv (#11773 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-07 18:41:17 +08:00
Roger Wang	8082ad7950	[V1][Doc] Update V1 support for `LLaVa-NeXT-Video` (#11798 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-07 09:55:39 +00:00
Russell Bryant	ce1917fcf2	[Doc] Create a vulnerability management team (#9925 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-06 22:57:32 -08:00
Cyrus Leung	8ceffbf315	[Doc][3/N] Reorganize Serving section (#11766 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-07 11:20:01 +08:00
Roger Wang	91b361ae89	[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 19:58:16 +00:00
youkaichao	4ca5d40adc	[doc] explain how to add interleaving sliding window support (#11771 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-06 21:57:44 +08:00
Cyrus Leung	ee77fdb5de	[Doc][2/N] Reorganize Models and Usage sections (#11755 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 21:40:31 +08:00
Suraj Deshmukh	2a622d704a	k8s-config: Update the secret to use stringData (#11679 ) Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>	2025-01-06 08:01:22 +00:00
Cyrus Leung	402d378360	[Doc] [1/N] Reorganize Getting Started section (#11645 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 02:18:33 +00:00
Alberto Ferrer	d1d49397e7	Update bnb.md with example for OpenAI (#11718 )	2025-01-04 06:29:02 +00:00
Hust_YangXian	9c93636d84	Update tool_calling.md (#11701 )	2025-01-04 06:16:30 +00:00
Sachin Varghese	2f1e8e8f54	Update default max_num_batch_tokens for chunked prefill (#11694 )	2025-01-03 00:25:53 +00:00
Chunyang Wen	84c35c374a	According to vllm.EngineArgs, the name should be distributed_executor_backend (#11689 )	2025-01-02 18:14:16 +00:00
Cyrus Leung	365801fedd	[VLM] Add max-count checking in data parser for single image models (#11661 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-12-31 22:15:21 -08:00
Roger Wang	e7c7c5e822	[V1][VLM] V1 support for selected single-image models. (#11632 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-12-31 21:17:22 +00:00
Matthias Vogler	a2a40bcd0d	[Model][LoRA]LoRA support added for MolmoForCausalLM (#11439 ) Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-30 17:33:06 -08:00
youkaichao	b12e87f942	[platforms] enable platform plugins (#11602 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-30 20:24:45 +08:00
Cyrus Leung	32b4c63f02	[Doc] Convert list tables to MyST (#11594 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-29 15:56:22 +08:00
youkaichao	328841d002	[bugfix] interleaving sliding window for cohere2 model (#11583 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-28 16:55:42 +00:00
Cyrus Leung	d427e5cfda	[Doc] Minor documentation fixes (#11580 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-28 21:53:59 +08:00
Isotr0py	d34be24bb1	[Model] Support InternLM2 Reward models (#11571 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-28 06:14:10 +00:00
Cyrus Leung	101418096f	[VLM] Support caching in merged multi-modal processor (#11396 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-27 17:22:48 +00:00
Chen1022	5ce4627a7e	[Doc] Add xgrammar in doc (#11549 ) Signed-off-by: ccjincong <chenjincong11@gmail.com>	2024-12-27 13:05:10 +00:00
AlexHe99	d003f3ea39	Update deploying_with_k8s.md with AMD ROCm GPU example (#11465 ) Signed-off-by: Alex He <alehe@amd.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-27 10:00:04 +00:00
Robert Shaw	0c0c2015c5	Update openai_compatible_server.md (#11536 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-12-26 16:26:18 -08:00
Simon Mo	82d24f7aac	[Docs] Document Deepseek V3 support (#11535 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2024-12-26 16:21:56 -08:00
Isotr0py	b85a977822	[Doc] Add video example to openai client for multimodal (#11521 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-26 17:31:29 +00:00
Roger Wang	7492a36207	[Doc] Add `QVQ` and `QwQ` to the list of supported models (#11509 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-12-26 09:44:32 +00:00
Cyrus Leung	6ad909fdda	[Doc] Improve GitHub links (#11491 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-25 14:49:26 -08:00
Cyrus Leung	3f3e92e1f2	[Model] Automatic conversion of classification and reward models (#11469 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 18:22:22 +00:00
Cyrus Leung	9edca6bf8f	[Frontend] Online Pooling API (#11457 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 17:54:30 +08:00
Rafael Vasquez	32aa2059ad	[Docs] Convert rST to MyST (Markdown) (#11145 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-12-23 22:35:38 +00:00
Yuan Tang	2e726680b3	[Bugfix] torch nightly version in ROCm installation guide (#11423 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-23 17:20:22 +00:00
youkaichao	5d2248d81a	[doc] explain nccl requirements for rlhf (#11381 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-20 13:00:56 -08:00
omer-dayan	995f56236b	[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192 ) Signed-off-by: OmerD <omer@run.ai>	2024-12-20 16:46:24 +00:00
youkaichao	1ecc645b8f	[doc] backward compatibility for 0.6.4 (#11359 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-19 21:33:53 -08:00
youkaichao	7801f56ed7	[ci][gh200] dockerfile clean up (#11351 ) Signed-off-by: drikster80 <ed.sealing@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: drikster80 <ed.sealing@gmail.com> Co-authored-by: cenzhiyao <2523403608@qq.com>	2024-12-19 18:13:06 -08:00
Yehoshua Cohen	6c7f881541	[Model] Add JambaForSequenceClassification model (#10860 ) Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-19 22:48:06 +08:00
Travis Johnson	17ca964273	[Model] IBM Granite 3.1 (#11307 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-12-19 11:27:24 +08:00
kYLe	66d4b16724	[Frontend] Add OpenAI API support for input_audio (#11027 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-16 22:09:58 -08:00
youkaichao	35bae114a8	fix gh200 tests on main (#11246 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-16 17:22:38 -08:00
bk-TurbaAI	35ffa682b1	[Docs] hint to enable use of GPU performance counters in profiling tools for multi-node distributed serving (#11235 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-12-16 22:20:39 +00:00
Jani Monoses	bddbbcb132	[Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203 )	2024-12-16 09:56:19 +00:00
cennn	b3b1526f03	WIP: [CI/Build] simplify Dockerfile build for ARM64 / GH200 (#11212 ) Signed-off-by: drikster80 <ed.sealing@gmail.com> Co-authored-by: drikster80 <ed.sealing@gmail.com>	2024-12-16 09:20:49 +00:00
AlexHe99	da6f409246	Update deploying_with_k8s.rst (#10922 )	2024-12-15 16:33:58 -08:00
Kuntai Du	38e599d6a8	[Doc] add documentation for disaggregated prefilling (#11197 ) Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2024-12-15 13:31:16 -06:00
Jee Jee Li	15859f2357	[[Misc]Upgrade bitsandbytes to the latest version 0.45.0 (#11201 )	2024-12-15 03:03:06 +00:00
Russell Bryant	4863e5fba5	[Core] V1: Use multiprocessing by default (#11074 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-13 16:27:32 -08:00
Cyrus Leung	0920ab9131	[Doc] Reorganize online pooling APIs (#11172 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 00:22:22 +08:00
Cyrus Leung	eeec9e3390	[Frontend] Separate pooling APIs in offline inference (#11129 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-13 10:40:07 +00:00
Jani Monoses	7cd7409142	PaliGemma 2 support (#11142 )	2024-12-13 07:40:07 +00:00
Ramon Ziai	d4d5291cc2	fix(docs): typo in helm install instructions (#11141 ) Signed-off-by: Ramon Ziai <ramon.ziai@bettermarks.com>	2024-12-12 17:36:32 +00:00
Pooya Davoodi	1da8f0e1dd	[Model] Add support for embedding model GritLM (#10816 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2024-12-12 06:39:16 +00:00
Yuan Tang	24a36d6d5f	Update link to LlamaStack remote vLLM guide in serving_with_llamastack.rst (#11112 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-12 02:39:21 +00:00
bingps	fd22220687	[Doc] Installed version of llmcompressor for int8/fp8 quantization (#11103 ) Signed-off-by: Guangda Liu <bingps@users.noreply.github.com> Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>	2024-12-11 15:43:24 +00:00
Cyrus Leung	cad5c0a6ed	[Doc] Update docs to refer to pooling models (#11093 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 13:36:27 +00:00
Cyrus Leung	8f10d5e393	[Misc] Split up pooling tasks (#10820 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 01:28:00 -08:00
Mor Zusman	ffa48c9146	[Model] PP support for Mamba-like models (#10992 ) Signed-off-by: mzusman <mor.zusmann@gmail.com>	2024-12-10 21:53:37 -05:00
Maxime Fournioux	fe2e10c71b	Add example of helm chart for vllm deployment on k8s (#9199 ) Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>	2024-12-10 09:19:27 +00:00
Michael Goin	6d525288c1	[Docs] Add dedicated tool calling page to docs (#10554 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-09 20:15:34 -05:00
Roger Wang	af7c4a92e6	[Doc][V1] Add V1 support column for multimodal models (#10998 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-08 22:29:16 -08:00
Cyrus Leung	c889d5888b	[Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 17:20:49 +00:00
Cyrus Leung	39e227c7ae	[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 17:10:05 +00:00
Cyrus Leung	1c768fe537	[Doc] Explicitly state that InternVL 2.5 is supported (#10978 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 16:58:02 +00:00
Sam Stoelinga	7406274041	[Doc] add KubeAI to serving integrations (#10837 ) Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>	2024-12-06 17:03:56 +00:00
Cyrus Leung	aa39a8e175	[Doc] Create a new "Usage" section (#10827 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-05 11:19:35 +08:00
Daniele	e4c34c23de	[CI/Build] improve python-only dev setup (#9621 ) Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-12-04 21:48:13 +00:00
Kevin H. Luu	c92acb9693	[ci/build] Update vLLM postmerge ECR repo (#10887 )	2024-12-04 09:01:20 +00:00
Aaron Pham	9323a3153b	[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-12-03 15:17:00 +08:00
Russell Bryant	ef51831ee8	[Doc] Add github links for source code references (#10672 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-03 06:46:07 +00:00
youkaichao	169a0ff911	[doc] add warning about comparing hf and vllm outputs (#10805 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-01 00:41:38 -08:00
Cyrus Leung	133707123e	[Model] Replace embedding models with pooling adapter (#10769 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-01 08:02:54 +08:00
wangxiyuan	7e4bbda573	[doc] format fix (#10789 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2024-11-30 11:38:40 +00:00
Isotr0py	c83919c7a6	[Model] Add Internlm2 LoRA support (#5064 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-11-28 17:29:04 +00:00
sixgod	5fc5ce0fe4	[Model] Added GLM-4 series hf format model support vllm==0.6.4 (#10561 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-11-28 14:53:31 +00:00
罗泽轩	278be671a3	[Doc] Update model in arch_overview.rst to match comment (#10701 ) Signed-off-by: spacewander <spacewanderlzx@gmail.com>	2024-11-27 23:58:39 -08:00
shunxing12345	1209261e93	[Model] Support telechat2 (#10311 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: xiangw2 <xiangw2@chinatelecom.cn> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-11-27 11:32:35 +00:00
Murali Andoorveedu	db66e018ea	[Bugfix] Fix for Spec model TP + Chunked Prefill (#10232 ) Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com> Signed-off-by: Sourashis Roy <sroy@roblox.com> Co-authored-by: Sourashis Roy <sroy@roblox.com>	2024-11-26 09:11:16 -08:00
Sage Moore	9a88f89799	custom allreduce + torch.compile (#10121 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-25 22:00:16 -08:00
Sanket Kale	a6760f6456	[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228 ) Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-11-25 18:32:39 -08:00
Shane A	9db713a1dc	[Model] Add OLMo November 2024 model (#10503 )	2024-11-25 17:26:40 -05:00
Cyrus Leung	1b583cfefa	[Doc] Fix typos in docs (#10636 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 10:15:45 -08:00
zhou fan	b1d920531f	[Model]: Add support for Aria model (#10514 ) Signed-off-by: xffxff <1247714429@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-11-25 18:10:55 +00:00
fzyzcjy	2b0879bfc2	Super tiny little typo fix (#10633 )	2024-11-25 13:08:30 +00:00
Cyrus Leung	ed46f14321	[Model] Support `is_causal` HF config field for Qwen2 model (#10621 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 09:51:20 +00:00
Cyrus Leung	a30a605d21	[Doc] Add encoder-based models to Supported Models page (#10616 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 06:34:07 +00:00
Maximilien de Bayser	214efc2c3c	Support Cross encoder models (#10400 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-24 18:56:20 -08:00
youkaichao	e4fbb14414	[doc] update the code to add models (#10603 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-11-24 11:21:40 -08:00
Michael Goin	9afa014552	Add small example to metrics.rst (#10550 )	2024-11-21 23:43:43 +00:00
Li, Jiang	63f1fde277	[Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-20 10:57:39 +00:00
wchen61	7629a9c6e5	[CI/Build] Support compilation with local cutlass path (#10423 ) (#10424 )	2024-11-19 21:35:50 -08:00
Cyrus Leung	b4be5a8adb	[Bugfix] Enforce no chunked prefill for embedding models (#10470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-20 05:12:51 +00:00
Russell Bryant	5390d6664f	[Doc] Add the start of an arch overview page (#10368 )	2024-11-19 09:52:11 +00:00
Michael Goin	74f8c2cf5f	Add openai.beta.chat.completions.parse example to structured_outputs.rst (#10433 )	2024-11-19 04:37:46 +00:00
Yan Ma	6b2d25efc7	[Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107 ) Signed-off-by: yan ma <yan.ma@intel.com>	2024-11-18 11:18:05 -07:00
ismael-dm	31894a2155	[Doc] Add documentation for Structured Outputs (#9943 ) Signed-off-by: ismael-dm <ismaeldm99@gmail.com>	2024-11-18 09:52:12 -08:00
B-201	4186be8111	[Doc] Update doc for LoRA support in GLM-4V (#10425 ) Signed-off-by: B-201 <Joy25810@foxmail.com>	2024-11-18 15:08:30 +00:00
youkaichao	755b85359b	[doc] add doc for the plugin system (#10372 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-15 21:46:27 -08:00
Cyrus Leung	32e46e000f	[Frontend] Automatic detection of chat content format from AST (#9919 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-16 13:35:40 +08:00
Michael Green	4f168f69a3	[Docs] Misc updates to TPU installation instructions (#10165 )	2024-11-15 13:26:17 -08:00
Russell Bryant	3e8d14d8a1	[Doc] Move PR template content to docs (#10159 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-15 13:20:20 -08:00
Simon Mo	c76ac49d26	[Docs] Add Nebius as sponsors (#10371 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2024-11-15 12:47:40 -08:00
Cyrus Leung	2ac6d0e75b	[Misc] Consolidate pooler config overrides (#10351 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-15 06:59:00 +00:00
Cyrus Leung	b40cf6402e	[Model] Support Qwen2 embeddings and use tags to select model tests (#10184 )	2024-11-14 20:23:09 -08:00
Woosuk Kwon	1dbae0329c	[Docs] Publish meetup slides (#10331 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-14 16:19:38 +00:00
Mike Depinet	f67ce05d0b	[Frontend] Pythonic tool parser (#9859 ) Signed-off-by: Mike Depinet <mike@fixie.ai>	2024-11-14 04:14:34 +00:00
youkaichao	504ac53d18	[misc] error early for old-style class (#10304 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-13 18:55:39 -08:00
Cyrus Leung	0b8bb86bf1	[1/N] Initial prototype for multi-modal processor (#10044 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-13 12:39:03 +00:00
B-201	d909acf9fe	[Model][LoRA]LoRA support added for idefics3 (#10281 ) Signed-off-by: B-201 <Joy25810@foxmail.com>	2024-11-13 17:25:59 +08:00
Austin Veselka	1b886aa104	[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 (#9944 ) Signed-off-by: FurtherAI <austin.veselka@lighton.ai> Co-authored-by: FurtherAI <austin.veselka@lighton.ai>	2024-11-13 08:28:13 +00:00
电脑星人	3945c82346	[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions (#10221 ) Signed-off-by: imkero <kerorek@outlook.com>	2024-11-13 07:07:22 +00:00
youkaichao	377b74fe87	Revert "[ci][build] limit cmake version" (#10271 )	2024-11-12 15:06:48 -08:00
youkaichao	18081451f9	[doc] improve debugging doc (#10270 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-12 14:43:52 -08:00
youkaichao	96ae0eaeb2	[doc] fix location of runllm widget (#10266 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-12 14:34:39 -08:00
Guillaume Calmettes	36c513a076	[BugFix] Do not raise a `ValueError` when `tool_choice` is set to the supported `none` option and `tools` are not defined. (#10000 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2024-11-12 11:13:46 +00:00
youkaichao	3a28f18b0b	[doc] explain the class hierarchy in vLLM (#10240 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-11 22:56:44 -08:00
youkaichao	d1c6799b88	[doc] update debugging guide (#10236 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-11 15:21:12 -08:00
Yuan Tang	4800339c62	Add docs on serving with Llama Stack (#10183 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2024-11-11 11:28:55 -08:00
youkaichao	f0f2e5638e	[doc] improve debugging code (#10206 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-10 17:49:40 -08:00
Shawn Du	20cf2f553c	[Misc] small fixes to function tracing file path (#9543 ) Signed-off-by: Shawn Du <shawnd200@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-10 15:21:06 -08:00
Yongzao	bfb7d61a7c	[doc] Polish the integration with huggingface doc (#10195 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-10 10:22:04 -08:00
youkaichao	9fa4bdde9d	[ci][build] limit cmake version (#10188 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-09 16:27:26 -08:00
cjackal	d88bff1b96	[Frontend] add `add_request_id` middleware (#9594 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2024-11-09 10:18:29 +00:00
youkaichao	8a4358ecb5	[doc] explaining the integration with huggingface (#10173 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-09 01:02:54 -08:00
Cyrus Leung	49d2a41a86	[Doc] Adjust RunLLM location (#10176 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-08 20:07:10 -08:00
Cyrus Leung	e0191a95d8	[0/N] Rename `MultiModalInputs` to `MultiModalKwargs` (#10040 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-09 11:31:02 +08:00
Rafael Vasquez	6b30471586	[Misc] Improve Web UI (#10090 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-11-08 09:51:04 -08:00
Russell Bryant	3a7f15a398	[Doc] Move CONTRIBUTING to docs site (#9924 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-08 05:15:12 +00:00
whyiug	40d0e7411d	[Doc] Update FAQ links in spec_decode.rst (#9662 ) Signed-off-by: whyiug <whyiug@hotmail.com>	2024-11-08 04:44:58 +00:00

900 Commits