Aaron Pham
|
2fc9075b82
|
[V1] Structured Outputs + Thinking compatibility (#16577)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-14 15:45:24 -07:00 |
Jonathan Berkhahn
|
98ea35601c
|
[Lora][Frontend]Add default local directory LoRA resolver plugin. (#16855)
Signed-off-by: jberkhahn <jaberkha@us.ibm.com>
|
2025-05-12 10:39:10 -07:00 |
Xu Wenqing
|
3a5ea75129
|
[Feature] Support DeepSeekV3 Function Call (#17784)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
Signed-off-by: Xu Wenqing <xuwq1993@qq.com>
|
2025-05-12 00:45:21 -07:00 |
Reid
|
d1110f5b5a
|
[doc] update lora doc (#17936)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-11 15:56:21 +08:00 |
Michael Yao
|
89a0315f4c
|
[Doc] Update several links in reasoning_outputs.md (#17846)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-05-09 01:20:55 -07:00 |
Reid
|
7377dd0307
|
[doc] update the issue link (#17782)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-07 20:29:05 +08:00 |
Michael Yao
|
0d115460a7
|
[Docs] Use gh-file to add links to tool_calling.md (#17709)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-05-06 15:27:19 +00:00 |
Michael Goin
|
98834fefaa
|
Update nm to rht in doc links + refine fp8 doc (#17678)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-06 00:41:14 +00:00 |
Harry Mellor
|
d6484ef3c3
|
Add full API docs and improve the UX of navigating them (#17485)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-03 19:42:43 -07:00 |
Zhiyu
|
182f40ea8b
|
Add NVIDIA TensorRT Model Optimizer in vLLM documentation (#17561)
|
2025-05-02 11:36:46 -07:00 |
Reid
|
3a500cd0b6
|
[doc] miss result (#17589)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-02 07:04:49 -07:00 |
Reid
|
6d1479ca4b
|
[doc] add the print result (#17584)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-02 05:24:45 -07:00 |
Chauncey
|
98060b001d
|
[Feature][Frontend]: Deprecate --enable-reasoning (#17452)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-05-01 06:46:16 -07:00 |
NaLan ZeYu
|
1144a8efe7
|
[Bugfix] Temporarily disable gptq_bitblas on ROCm (#17411)
Signed-off-by: Yan Cangang <nalanzeyu@gmail.com>
|
2025-04-30 19:51:45 -07:00 |
Reid
|
2ac74d098e
|
[doc] add install tips (#17373)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-30 17:02:41 +00:00 |
mofanke
|
a39203f99e
|
[Bugfix] add qwen3 reasoning-parser fix content is None when disable … (#17369)
Signed-off-by: mofanke <mofanke@gmail.com>
|
2025-04-29 16:32:40 +00:00 |
Reid
|
3ad986c28b
|
[doc] update wrong model id (#17287)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-28 04:20:51 -07:00 |
Russell Bryant
|
52b4f4a8d7
|
[Docs] Update structured output doc for V1 (#17135)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-26 15:12:18 +00:00 |
Reid
|
df5c879527
|
[doc] update wrong hf model links (#17184)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-25 16:40:54 +00:00 |
Michael Yao
|
f851b84266
|
[Doc] Add two links to disagg_prefill.md (#17168)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-25 10:23:57 +00:00 |
Michael Yao
|
ef19e67d2c
|
[Doc] Add headings to improve gptqmodel.md (#17164)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-25 01:13:13 -07:00 |
Maximilien de Bayser
|
05e1fbfc52
|
Add chat template for Llama 4 models (#16428)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-04-24 20:19:36 +00:00 |
Reid
|
9c1244de57
|
[doc] update to hyperlink (#17096)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-24 00:58:08 -07:00 |
Reid
|
db2f8d915c
|
[V1] Update structured output (#16812)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-23 23:57:17 -07:00 |
Michael Yao
|
f7912cba3d
|
[Doc] Add top anchor and a note to quantization/bitblas.md (#17042)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-23 07:32:16 -07:00 |
Lei Wang
|
8d32dc603d
|
[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036)
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>
|
2025-04-22 09:01:36 +01:00 |
Justin Ho
|
490b1698a5
|
[Doc] Updated Llama section in tool calling docs to have llama 3.2 config info (#16857)
Signed-off-by: jmho <jaylenho734@gmail.com>
|
2025-04-18 23:28:53 +00:00 |
Angky William
|
fdcb850f14
|
[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546)
Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
|
2025-04-15 22:31:38 +00:00 |
Ye (Charlotte) Qi
|
16eda8c43a
|
[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Kai Wu <kaiwu@meta.com>
|
2025-04-12 06:26:17 +08:00 |
Michael Goin
|
ed37599544
|
Update supported_hardware.md for TPU INT8 (#16437)
|
2025-04-11 12:28:07 +08:00 |
Driss Guessous
|
652907b354
|
Torchao (#14231)
Signed-off-by: drisspg <drisspguessous@gmail.com>
|
2025-04-07 19:39:28 -04:00 |
yihong
|
95d63f38c0
|
doc: fix some typos in doc (#16154)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-07 05:32:06 +00:00 |
Tristan Leclercq
|
4285e423a6
|
[Misc] Auto detect bitsandbytes pre-quantized models (#16027)
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
|
2025-04-04 23:30:45 -07:00 |
Matthias Matt
|
cefb9e5a28
|
[Frontend] Implement Tool Calling with `tool_choice='required'` (#13483)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2025-04-02 07:45:45 -07:00 |
chaow-amd
|
2041c0e360
|
[Doc] Quark quantization documentation (#15861)
Signed-off-by: chaow <chaow@amd.com>
|
2025-04-01 08:32:45 -07:00 |
shangmingc
|
239b7befdd
|
[V1][Spec Decode] Remove deprecated spec decode config params (#15466)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-31 09:19:35 -07:00 |
Ce Gao
|
762b424a52
|
[Docs] Document v0 engine support in reasoning outputs (#15739)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-29 03:46:57 +00:00 |
Alex Brooks
|
1711b929b6
|
[Model] Add Reasoning Parser for Granite Models (#14202)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
|
2025-03-26 14:28:07 +00:00 |
Jee Jee Li
|
3892e58ad7
|
[Misc] Upgrade BNB version (#15183)
|
2025-03-24 05:51:42 +00:00 |
Robin
|
d6cd59f122
|
[Frontend] Support tool calling and reasoning parser (#14511)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-03-23 14:00:07 -07:00 |
shangmingc
|
50c9636d87
|
[V1][Usage] Refactor speculative decoding configuration and tests (#14434)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-22 19:28:10 -10:00 |
Jee Jee Li
|
10f55fe6c5
|
[Misc] Clean up the BitsAndBytes arguments (#15140)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-20 19:17:12 -07:00 |
Bryan Lu
|
9ed6ee92d6
|
[Bugfix] EAGLE output norm bug (#14464)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-03-15 06:50:33 +00:00 |
yasu52
|
3fb17d26c8
|
[Doc] Fix typo in documentation (#14783)
Signed-off-by: yasu52 <tsuguro4649@gmail.com>
|
2025-03-13 20:33:09 -07:00 |
Robin
|
c908a07f57
|
[Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-03-08 07:07:32 +00:00 |
Robin
|
7b6fd6e486
|
[Doc]add doc for Qwen models tool calling (#14478)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-03-08 06:58:46 +00:00 |
Yanyi Liu
|
0ddc991f5c
|
[Doc] Update reasoning with stream example to use OpenAI library (#14077)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
|
2025-03-06 13:20:37 +00:00 |
Ce Gao
|
f5f7f00cd9
|
[Bugfix][Structured Output] Support outlines engine with reasoning outputs for DeepSeek R1 (#14114)
|
2025-03-06 03:49:20 +00:00 |
Qubitium-ModelCloud
|
cd1d3c3df8
|
[Docs] Add GPTQModel (#14056)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-03-03 21:59:09 +00:00 |
Harry Mellor
|
cf069aa8aa
|
Update deprecated Python 3.8 typing (#13971)
|
2025-03-02 17:34:51 -08:00 |