Harry Mellor
3d925165f2
Add example scripts to documentation ( #4225 )
...
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-22 16:36:54 +00:00
xiaoji
7f2593b164
[Doc]: Update the doc of adding new models ( #4236 )
2024-04-21 09:57:08 -07:00
Harry Mellor
fe7d648fe5
Don't show default value for flags in `EngineArgs` ( #4223 )
...
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-21 09:15:28 -07:00
Harry Mellor
682789d402
Fix missing docs and out of sync `EngineArgs` ( #4219 )
...
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-19 20:51:33 -07:00
Simon Mo
705578ae14
[Docs] document that Meta Llama 3 is supported ( #4175 )
2024-04-18 10:55:48 -07:00
Sanger Steel
d619ae2d19
[Doc] Add better clarity for tensorizer usage ( #4090 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-15 13:28:25 -07:00
Simon Mo
aceb17cf2d
[Docs] document that mixtral 8x22b is supported ( #4073 )
2024-04-14 14:35:55 -07:00
Sanger Steel
711a000255
[Frontend] [Core] feat: Add model loading using `tensorizer` ( #3476 )
2024-04-13 17:13:01 -07:00
Michael Feil
c2b4a1bce9
[Doc] Add typing hints / mypy types cleanup ( #3816 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-11 17:17:21 -07:00
youkaichao
f3d0bf7589
[Doc][Installation] delete python setup.py develop ( #3989 )
2024-04-11 03:33:02 +00:00
Frαnçois
92cd2e2f21
[Doc] Fix getting stared to use publicly available model ( #3963 )
2024-04-10 18:05:52 +00:00
youkaichao
e35397468f
[Doc] Add doc to state our model support policy ( #3948 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-10 17:03:02 +00:00
ywfang
b4543c8f6b
[Model] add minicpm ( #3893 )
2024-04-08 18:28:36 +08:00
youkaichao
95baec828f
[Core] enable out-of-tree model register ( #3871 )
2024-04-06 17:11:41 -07:00
youkaichao
d03d64fd2e
[CI/Build] refactor dockerfile & fix pip cache
...
[CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (#3859 )
2024-04-04 21:53:16 -07:00
Sean Gallen
78107fa091
[Doc]Add asynchronous engine arguments to documentation. ( #3810 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-04 21:52:01 -07:00
Adrian Abeyta
2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) ( #3290 )
...
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
Roger Wang
3bec41f41a
[Doc] Fix vLLMEngine Doc Page ( #3791 )
2024-04-02 09:49:37 -07:00
bigPYJ1151
0e3f06fe9c
[Hardware][Intel] Add CPU inference backend ( #3634 )
...
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
2024-04-01 22:07:30 -07:00
youkaichao
9c82a1bec3
[Doc] Update installation doc ( #3746 )
...
[Doc] Update installation doc for build from source and explain the dependency on torch/cuda version (#3746 )
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-03-30 16:34:38 -07:00
yhu422
d8658c8cc1
Usage Stats Collection ( #2852 )
2024-03-28 22:16:12 -07:00
wenyujin333
d6ea427f04
[Model] Add support for Qwen2MoeModel ( #3346 )
2024-03-28 15:19:59 +00:00
Woosuk Kwon
6d9aa00fc4
[Docs] Add Command-R to supported models ( #3669 )
2024-03-27 15:20:00 -07:00
Megha Agarwal
e24336b5a7
[Model] Add support for DBRX ( #3660 )
2024-03-27 13:01:46 -07:00
Woosuk Kwon
e66b629c04
[Misc] Minor fix in KVCache type ( #3652 )
2024-03-26 23:14:06 -07:00
Jee Li
76879342a3
[Doc]add lora support ( #3649 )
2024-03-27 02:06:46 +00:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
youkaichao
42bc386129
[CI/Build] respect the common environment variable MAX_JOBS ( #3600 )
2024-03-24 17:04:00 -07:00
Lalit Pradhan
4c07dd28c0
[ 🚀 Ready to be merged] Added support for Jais models ( #3183 )
2024-03-21 09:45:24 +00:00
Jim Burtoft
63e8b28a99
[Doc] minor fix of spelling in amd-installation.rst ( #3506 )
2024-03-19 20:32:30 +00:00
Jim Burtoft
2a60c9bd17
[Doc] minor fix to neuron-installation.rst ( #3505 )
2024-03-19 13:21:35 -07:00
Simon Mo
ef65dcfa6f
[Doc] Add docs about OpenAI compatible server ( #3288 )
2024-03-18 22:05:34 -07:00
laneeee
8fa7357f2d
fix document error for value and v_vec illustration ( #3421 )
2024-03-15 16:06:09 -07:00
Sherlock Xu
b0925b3878
docs: Add BentoML deployment doc ( #3336 )
...
Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
2024-03-12 10:34:30 -07:00
Zhuohan Li
4c922709b6
Add distributed model executor abstraction ( #3191 )
2024-03-11 11:03:45 -07:00
Philipp Moritz
657061fdce
[docs] Add LoRA support information for models ( #3299 )
2024-03-11 00:54:51 -07:00
Roger Wang
99c3cfb83c
[Docs] Fix Unmocked Imports ( #3275 )
2024-03-08 09:58:01 -08:00
Jialun Lyu
27a7b070db
Add document for vllm paged attention kernel. ( #2978 )
2024-03-04 09:23:34 -08:00
Liangfu Chen
d0fae88114
[DOC] add setup document to support neuron backend ( #2777 )
2024-03-04 01:03:51 +00:00
Sage Moore
ce4f5a29fb
Add Automatic Prefix Caching ( #2762 )
...
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-03-02 00:50:01 -08:00
Yuan Tang
49d849b3ab
docs: Add tutorial on deploying vLLM model with KServe ( #2586 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-03-01 11:04:14 -08:00
Ganesh Jagadeesan
a8683102cc
multi-lora documentation fix ( #3064 )
2024-02-27 21:26:15 -08:00
Woosuk Kwon
8b430d7dea
[Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM ( #3046 )
2024-02-26 20:23:50 -08:00
张大成
48a8f4a7fd
Support Orion model ( #2539 )
...
Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-02-26 19:17:06 -08:00
Harry Mellor
ef978fe411
Port metrics from `aioprometheus` to `prometheus_client` ( #2730 )
2024-02-25 11:54:00 -08:00
Zhuohan Li
a9c8212895
[FIX] Add Gemma model to the doc ( #2966 )
2024-02-21 09:46:15 -08:00
Isotr0py
ab3a5a8259
Support OLMo models. ( #2832 )
2024-02-18 21:05:15 -08:00
jvmncs
8f36444c4f
multi-LoRA as extra models in OpenAI server ( #2775 )
...
how to serve the loras (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py )):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
--model meta-llama/Llama-2-7b-hf \
--enable-lora \
--lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
the above server will list 3 separate values if the user queries `/models`: one for the base served model, and one each for the specified lora modules. in this case sql-lora and sql-lora2 point to the same underlying lora, but this need not be the case. lora config values take the same values they do in EngineArgs
no work has been done here to scope client permissions to specific models
2024-02-17 12:00:48 -08:00
Philipp Moritz
317b29de0f
Remove Yi model definition, please use `LlamaForCausalLM` instead ( #2854 )
...
Co-authored-by: Roy <jasonailu87@gmail.com>
2024-02-13 14:22:22 -08:00
Simon Mo
f964493274
[CI] Ensure documentation build is checked in CI ( #2842 )
2024-02-12 22:53:07 -08:00
Philipp Moritz
4ca2c358b1
Add documentation section about LoRA ( #2834 )
2024-02-12 17:24:45 +01:00
Hongxia Yang
0580aab02f
[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention ( #2768 )
2024-02-10 23:14:37 -08:00
Philipp Moritz
931746bc6d
Add documentation on how to do incremental builds ( #2796 )
2024-02-07 14:42:02 -08:00
Massimiliano Pronesti
5ed704ec8c
docs: fix langchain ( #2736 )
2024-02-03 18:17:55 -08:00
Fengzhe Zhou
cd9e60c76c
Add Internlm2 ( #2666 )
2024-02-01 09:27:40 -08:00
Zhuohan Li
1af090b57d
Bump up version to v0.3.0 ( #2656 )
2024-01-31 00:07:07 -08:00
zhaoyang-star
9090bf02e7
Support FP8-E5M2 KV Cache ( #2279 )
...
Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-01-28 16:43:54 -08:00
Hongxia Yang
6b7de1a030
[ROCm] add support to ROCm 6.0 and MI300 ( #2274 )
2024-01-26 12:41:10 -08:00
Junyang Lin
2832e7b9f9
fix names and license for Qwen2 ( #2589 )
2024-01-24 22:37:51 -08:00
LastWhisper
223c19224b
Fix the syntax error in the doc of supported_models ( #2584 )
2024-01-24 11:22:51 -08:00
Erfan Al-Hossami
9c1352eb57
[Feature] Simple API token authentication and pluggable middlewares ( #1106 )
2024-01-23 15:13:00 -08:00
Junyang Lin
94b5edeb53
Add qwen2 ( #2495 )
2024-01-22 14:34:21 -08:00
Hyunsung Lee
e1957c6ebd
Add StableLM3B model ( #2372 )
2024-01-16 20:32:40 -08:00
Simon
827cbcd37c
Update quickstart.rst ( #2369 )
2024-01-12 12:56:18 -08:00
Zhuohan Li
f745847ef7
[Minor] Fix the format in quick start guide related to Model Scope ( #2425 )
2024-01-11 19:44:01 -08:00
Jiaxiang
6549aef245
[DOC] Add additional comments for LLMEngine and AsyncLLMEngine ( #1011 )
2024-01-11 19:26:49 -08:00
Zhuohan Li
fd4ea8ef5c
Use NCCL instead of ray for control-plane communication to remove serialization overhead ( #2221 )
2024-01-03 11:30:22 -08:00
Shivam Thakkar
1db83e31a2
[Docs] Update installation instructions to include CUDA 11.8 xFormers ( #2246 )
2023-12-22 23:20:02 -08:00
Ronen Schaffer
c17daa9f89
[Docs] Fix broken links ( #2222 )
2023-12-20 12:43:42 -08:00
avideci
de60a3fb93
Added DeciLM-7b and DeciLM-7b-instruct ( #2062 )
2023-12-19 02:29:33 -08:00
kliuae
1b7c791d60
[ROCm] Fixes for GPTQ on ROCm ( #2180 )
2023-12-18 10:41:04 -08:00
Suhong Moon
3ec8c25cd0
[Docs] Update documentation for gpu-memory-utilization option ( #2162 )
2023-12-17 10:51:57 -08:00
Woosuk Kwon
f8c688d746
[Minor] Add Phi 2 to supported models ( #2159 )
2023-12-17 02:54:57 -08:00
Woosuk Kwon
26c52a5ea6
[Docs] Add CUDA graph support to docs ( #2148 )
2023-12-17 01:49:20 -08:00
Woosuk Kwon
b81a6a6bb3
[Docs] Add supported quantization methods to docs ( #2135 )
2023-12-15 13:29:22 -08:00
Antoni Baum
21d93c140d
Optimize Mixtral with expert parallelism ( #2090 )
2023-12-13 23:55:07 -08:00
Woosuk Kwon
096827c284
[Docs] Add notes on ROCm-supported models ( #2087 )
2023-12-13 09:45:34 -08:00
Woosuk Kwon
6565d9e33e
Update installation instruction for vLLM + CUDA 11.8 ( #2086 )
2023-12-13 09:25:59 -08:00
TJian
f375ec8440
[ROCm] Upgrade xformers version for ROCm & update doc ( #2079 )
...
Co-authored-by: miloice <jeffaw99@hotmail.com>
2023-12-13 00:56:05 -08:00
Ikko Eltociear Ashimine
c0ce15dfb2
Update run_on_sky.rst ( #2025 )
...
sharable -> shareable
2023-12-11 10:32:58 -08:00
Woosuk Kwon
4ff0203987
Minor fixes for Mixtral ( #2015 )
2023-12-11 09:16:15 -08:00
Simon Mo
c85b80c2b6
[Docker] Add cuda arch list as build option ( #1950 )
2023-12-08 09:53:47 -08:00
TJian
6ccc0bfffb
Merge EmbeddedLLM/vllm-rocm into vLLM main ( #1836 )
...
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
2023-12-07 23:16:52 -08:00
AguirreNicolas
24f60a54f4
[Docker] Adding number of nvcc_threads during build as envar ( #1893 )
2023-12-07 11:00:32 -08:00
gottlike
42c02f5892
Fix quickstart.rst typo jinja ( #1964 )
2023-12-07 08:34:44 -08:00
Peter Götz
d940ce497e
Fix typo in adding_model.rst ( #1947 )
...
adpated -> adapted
2023-12-06 10:04:26 -08:00
Massimiliano Pronesti
c07a442854
chore(examples-docs): upgrade to OpenAI V1 ( #1785 )
2023-12-03 01:11:22 -08:00
Simon Mo
5313c2cb8b
Add Production Metrics in Prometheus format ( #1890 )
2023-12-02 16:37:44 -08:00
Simon Mo
4cefa9b49b
[Docs] Update the AWQ documentation to highlight performance issue ( #1883 )
2023-12-02 15:52:47 -08:00
Woosuk Kwon
e5452ddfd6
Normalize head weights for Baichuan 2 ( #1876 )
2023-11-30 20:03:58 -08:00
Adam Brusselback
66785cc05c
Support chat template and `echo` for chat API ( #1756 )
2023-11-30 16:43:13 -08:00
Massimiliano Pronesti
05a38612b0
docs: add instruction for langchain ( #1162 )
2023-11-30 10:57:44 -08:00
Simon Mo
0f621c2c7d
[Docs] Add information about using shared memory in docker ( #1845 )
2023-11-29 18:33:56 -08:00
Casper
a921d8be9d
[DOCS] Add engine args documentation ( #1741 )
2023-11-22 12:31:27 -08:00
Wen Sun
112627e8b2
[Docs] Fix the code block's format in deploying_with_docker page ( #1722 )
2023-11-20 01:22:39 -08:00
Simon Mo
37c1e3c218
Documentation about official docker image ( #1709 )
2023-11-19 20:56:26 -08:00
Woosuk Kwon
06e9ebebd5
Add instructions to install vLLM+cu118 ( #1717 )
2023-11-18 23:48:58 -08:00
liuyhwangyh
edb305584b
Support download models from www.modelscope.cn ( #1588 )
2023-11-17 20:38:31 -08:00
Zhuohan Li
0fc280b06c
Update the adding-model doc according to the new refactor ( #1692 )
2023-11-16 18:46:26 -08:00
Zhuohan Li
415d109527
[Fix] Update Supported Models List ( #1690 )
2023-11-16 14:47:26 -08:00
Casper
8516999495
Add Quantization and AutoAWQ to docs ( #1235 )
2023-11-04 22:43:39 -07:00
Stephen Krider
9cabcb7645
Add Dockerfile ( #1350 )
2023-10-31 12:36:47 -07:00
Zhuohan Li
9eed4d1f3e
Update README.md ( #1292 )
2023-10-08 23:15:50 -07:00
Usama Ahmed
0967102c6d
fixing typo in `tiiuae/falcon-rw-7b` model name ( #1226 )
2023-09-29 13:40:25 -07:00
Woosuk Kwon
202351d5bf
Add Mistral to supported model list ( #1221 )
2023-09-28 14:33:04 -07:00
Nick Perez
4ee52bb169
Docs: Fix broken link to openai example ( #1145 )
...
Link to `openai_client.py` is no longer valid - updated to `openai_completion_client.py`
2023-09-22 11:36:09 -07:00
Woosuk Kwon
7d7e3b78a3
Use `--ipc=host` in docker run for distributed inference ( #1125 )
2023-09-21 18:26:47 -07:00
Tanmay Verma
6f2dd6c37e
Add documentation to Triton server tutorial ( #983 )
2023-09-20 10:32:40 -07:00
Woosuk Kwon
eda1a7cad3
Announce paper release ( #1036 )
2023-09-13 17:38:13 -07:00
Woosuk Kwon
b9cecc2635
[Docs] Update installation page ( #1005 )
2023-09-10 14:23:31 -07:00
Zhuohan Li
002800f081
Align vLLM's beam search implementation with HF generate ( #857 )
2023-09-04 17:29:42 -07:00
Woosuk Kwon
55b28b1eee
[Docs] Minor fixes in supported models ( #920 )
...
* Minor fix in supported models
* Add another small fix for Aquila model
---------
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-08-31 16:28:39 -07:00
Zhuohan Li
14f9c72bfd
Update Supported Model List ( #825 )
2023-08-22 11:51:44 -07:00
Uranus
1b151ed181
Fix baichuan doc style ( #748 )
2023-08-13 20:57:31 -07:00
Zhuohan Li
f7389f4763
[Doc] Add Baichuan 13B to supported models ( #656 )
2023-08-02 16:45:12 -07:00
Zhuohan Li
1b0bd0fe8a
Add Falcon support (new) ( #592 )
2023-08-02 14:04:39 -07:00
Zhuohan Li
df5dd3c68e
Add Baichuan-7B to README ( #494 )
2023-07-25 15:25:12 -07:00
Zhuohan Li
6fc2a38b11
Add support for LLaMA-2 ( #505 )
2023-07-20 11:38:27 -07:00
Zhanghao Wu
58df2883cb
[Doc] Add doc for running vLLM on the cloud ( #426 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-16 13:37:14 -07:00
Andre Slavescu
c894836108
[Model] Add support for GPT-J ( #226 )
...
Co-authored-by: woWoosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Woosuk Kwon
ffa6d2f9f9
[Docs] Fix typo ( #346 )
2023-07-03 16:51:47 -07:00
Woosuk Kwon
404422f42e
[Model] Add support for MPT ( #334 )
2023-07-03 16:47:53 -07:00
Woosuk Kwon
e41f06702c
Add support for BLOOM ( #331 )
2023-07-03 13:12:35 -07:00
Bayang
9d27b09d12
Update README.md ( #306 )
2023-06-29 06:52:15 -07:00
Zhuohan Li
2cf1a333b6
[Doc] Documentation for distributed inference ( #261 )
2023-06-26 11:34:23 -07:00
Woosuk Kwon
665c48963b
[Docs] Add GPTBigCode to supported models ( #213 )
2023-06-22 15:05:11 -07:00
Woosuk Kwon
794e578de0
[Minor] Fix URLs ( #166 )
2023-06-19 22:57:14 -07:00
Woosuk Kwon
caddfc14c1
[Minor] Fix icons in doc ( #165 )
2023-06-19 20:35:38 -07:00
Woosuk Kwon
b7e62d3454
Fix repo & documentation URLs ( #163 )
2023-06-19 20:03:40 -07:00
Woosuk Kwon
364536acd1
[Docs] Minor fix ( #162 )
2023-06-19 19:58:23 -07:00
Zhuohan Li
0b32a987dd
Add and list supported models in README ( #161 )
2023-06-20 10:57:46 +08:00
Zhuohan Li
a255885f83
Add logo and polish readme ( #156 )
2023-06-19 16:31:13 +08:00
Woosuk Kwon
dcda03b4cb
Write README and front page of doc ( #147 )
2023-06-18 03:19:38 -07:00
Zhuohan Li
bec7b2dc26
Add quickstart guide ( #148 )
2023-06-18 01:26:12 +08:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM ( #150 )
2023-06-17 03:07:40 -07:00
Woosuk Kwon
e38074b1e6
Support FP32 ( #141 )
2023-06-07 00:40:21 -07:00
Woosuk Kwon
376725ce74
[PyPI] Packaging for PyPI distribution ( #140 )
2023-06-05 20:03:14 -07:00
Woosuk Kwon
456941cfe4
[Docs] Write the `Adding a New Model` section ( #138 )
2023-06-05 20:01:26 -07:00
Woosuk Kwon
62ec38ea41
Document supported models ( #127 )
2023-06-02 22:35:17 -07:00
Woosuk Kwon
0eda2e0953
Add .readthedocs.yaml ( #136 )
2023-06-02 22:27:44 -07:00
Woosuk Kwon
56b7f0efa4
Add a doc for installation ( #128 )
2023-05-27 01:13:06 -07:00
Woosuk Kwon
19d2899439
Add initial sphinx docs ( #120 )
2023-05-22 17:02:44 -07:00