Commit Graph

1092 Commits

Author SHA1 Message Date
Philipp Moritz 4ca2c358b1
Add documentation section about LoRA (#2834) 2024-02-12 17:24:45 +01:00
Hongxia Yang 0580aab02f
[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (#2768) 2024-02-10 23:14:37 -08:00
Philipp Moritz 931746bc6d
Add documentation on how to do incremental builds (#2796) 2024-02-07 14:42:02 -08:00
Massimiliano Pronesti 5ed704ec8c
docs: fix langchain (#2736) 2024-02-03 18:17:55 -08:00
Fengzhe Zhou cd9e60c76c
Add Internlm2 (#2666) 2024-02-01 09:27:40 -08:00
Zhuohan Li 1af090b57d
Bump up version to v0.3.0 (#2656) 2024-01-31 00:07:07 -08:00
zhaoyang-star 9090bf02e7
Support FP8-E5M2 KV Cache (#2279)
Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-01-28 16:43:54 -08:00
Hongxia Yang 6b7de1a030
[ROCm] add support to ROCm 6.0 and MI300 (#2274) 2024-01-26 12:41:10 -08:00
Junyang Lin 2832e7b9f9
fix names and license for Qwen2 (#2589) 2024-01-24 22:37:51 -08:00
LastWhisper 223c19224b
Fix the syntax error in the doc of supported_models (#2584) 2024-01-24 11:22:51 -08:00
Erfan Al-Hossami 9c1352eb57
[Feature] Simple API token authentication and pluggable middlewares (#1106) 2024-01-23 15:13:00 -08:00
Junyang Lin 94b5edeb53
Add qwen2 (#2495) 2024-01-22 14:34:21 -08:00
Hyunsung Lee e1957c6ebd
Add StableLM3B model (#2372) 2024-01-16 20:32:40 -08:00
Simon 827cbcd37c
Update quickstart.rst (#2369) 2024-01-12 12:56:18 -08:00
Zhuohan Li f745847ef7
[Minor] Fix the format in quick start guide related to Model Scope (#2425) 2024-01-11 19:44:01 -08:00
Jiaxiang 6549aef245
[DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011) 2024-01-11 19:26:49 -08:00
Zhuohan Li fd4ea8ef5c
Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
Shivam Thakkar 1db83e31a2
[Docs] Update installation instructions to include CUDA 11.8 xFormers (#2246) 2023-12-22 23:20:02 -08:00
Ronen Schaffer c17daa9f89
[Docs] Fix broken links (#2222) 2023-12-20 12:43:42 -08:00
avideci de60a3fb93
Added DeciLM-7b and DeciLM-7b-instruct (#2062) 2023-12-19 02:29:33 -08:00
kliuae 1b7c791d60
[ROCm] Fixes for GPTQ on ROCm (#2180) 2023-12-18 10:41:04 -08:00
Suhong Moon 3ec8c25cd0
[Docs] Update documentation for gpu-memory-utilization option (#2162) 2023-12-17 10:51:57 -08:00
Woosuk Kwon f8c688d746
[Minor] Add Phi 2 to supported models (#2159) 2023-12-17 02:54:57 -08:00
Woosuk Kwon 26c52a5ea6
[Docs] Add CUDA graph support to docs (#2148) 2023-12-17 01:49:20 -08:00
Woosuk Kwon b81a6a6bb3
[Docs] Add supported quantization methods to docs (#2135) 2023-12-15 13:29:22 -08:00
Antoni Baum 21d93c140d
Optimize Mixtral with expert parallelism (#2090) 2023-12-13 23:55:07 -08:00
Woosuk Kwon 096827c284
[Docs] Add notes on ROCm-supported models (#2087) 2023-12-13 09:45:34 -08:00
Woosuk Kwon 6565d9e33e
Update installation instruction for vLLM + CUDA 11.8 (#2086) 2023-12-13 09:25:59 -08:00
TJian f375ec8440
[ROCm] Upgrade xformers version for ROCm & update doc (#2079)
Co-authored-by: miloice <jeffaw99@hotmail.com>
2023-12-13 00:56:05 -08:00
Ikko Eltociear Ashimine c0ce15dfb2
Update run_on_sky.rst (#2025)
sharable -> shareable
2023-12-11 10:32:58 -08:00
Woosuk Kwon 4ff0203987
Minor fixes for Mixtral (#2015) 2023-12-11 09:16:15 -08:00
Simon Mo c85b80c2b6
[Docker] Add cuda arch list as build option (#1950) 2023-12-08 09:53:47 -08:00
TJian 6ccc0bfffb
Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
2023-12-07 23:16:52 -08:00
AguirreNicolas 24f60a54f4
[Docker] Adding number of nvcc_threads during build as envar (#1893) 2023-12-07 11:00:32 -08:00
gottlike 42c02f5892
Fix quickstart.rst typo jinja (#1964) 2023-12-07 08:34:44 -08:00
Peter Götz d940ce497e
Fix typo in adding_model.rst (#1947)
adpated -> adapted
2023-12-06 10:04:26 -08:00
Massimiliano Pronesti c07a442854
chore(examples-docs): upgrade to OpenAI V1 (#1785) 2023-12-03 01:11:22 -08:00
Simon Mo 5313c2cb8b
Add Production Metrics in Prometheus format (#1890) 2023-12-02 16:37:44 -08:00
Simon Mo 4cefa9b49b
[Docs] Update the AWQ documentation to highlight performance issue (#1883) 2023-12-02 15:52:47 -08:00
Woosuk Kwon e5452ddfd6
Normalize head weights for Baichuan 2 (#1876) 2023-11-30 20:03:58 -08:00
Adam Brusselback 66785cc05c
Support chat template and `echo` for chat API (#1756) 2023-11-30 16:43:13 -08:00
Massimiliano Pronesti 05a38612b0
docs: add instruction for langchain (#1162) 2023-11-30 10:57:44 -08:00
Simon Mo 0f621c2c7d
[Docs] Add information about using shared memory in docker (#1845) 2023-11-29 18:33:56 -08:00
Casper a921d8be9d
[DOCS] Add engine args documentation (#1741) 2023-11-22 12:31:27 -08:00
Wen Sun 112627e8b2
[Docs] Fix the code block's format in deploying_with_docker page (#1722) 2023-11-20 01:22:39 -08:00
Simon Mo 37c1e3c218
Documentation about official docker image (#1709) 2023-11-19 20:56:26 -08:00
Woosuk Kwon 06e9ebebd5
Add instructions to install vLLM+cu118 (#1717) 2023-11-18 23:48:58 -08:00
liuyhwangyh edb305584b
Support download models from www.modelscope.cn (#1588) 2023-11-17 20:38:31 -08:00
Zhuohan Li 0fc280b06c
Update the adding-model doc according to the new refactor (#1692) 2023-11-16 18:46:26 -08:00
Zhuohan Li 415d109527
[Fix] Update Supported Models List (#1690) 2023-11-16 14:47:26 -08:00
Casper 8516999495
Add Quantization and AutoAWQ to docs (#1235) 2023-11-04 22:43:39 -07:00
Stephen Krider 9cabcb7645
Add Dockerfile (#1350) 2023-10-31 12:36:47 -07:00
Zhuohan Li 9eed4d1f3e
Update README.md (#1292) 2023-10-08 23:15:50 -07:00
Usama Ahmed 0967102c6d
fixing typo in `tiiuae/falcon-rw-7b` model name (#1226) 2023-09-29 13:40:25 -07:00
Woosuk Kwon 202351d5bf
Add Mistral to supported model list (#1221) 2023-09-28 14:33:04 -07:00
Nick Perez 4ee52bb169
Docs: Fix broken link to openai example (#1145)
Link to `openai_client.py` is no longer valid - updated to `openai_completion_client.py`
2023-09-22 11:36:09 -07:00
Woosuk Kwon 7d7e3b78a3
Use `--ipc=host` in docker run for distributed inference (#1125) 2023-09-21 18:26:47 -07:00
Tanmay Verma 6f2dd6c37e
Add documentation to Triton server tutorial (#983) 2023-09-20 10:32:40 -07:00
Woosuk Kwon eda1a7cad3
Announce paper release (#1036) 2023-09-13 17:38:13 -07:00
Woosuk Kwon b9cecc2635
[Docs] Update installation page (#1005) 2023-09-10 14:23:31 -07:00
Zhuohan Li 002800f081
Align vLLM's beam search implementation with HF generate (#857) 2023-09-04 17:29:42 -07:00
Woosuk Kwon 55b28b1eee
[Docs] Minor fixes in supported models (#920)
* Minor fix in supported models

* Add another small fix for Aquila model

---------

Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-08-31 16:28:39 -07:00
Zhuohan Li 14f9c72bfd
Update Supported Model List (#825) 2023-08-22 11:51:44 -07:00
Uranus 1b151ed181
Fix baichuan doc style (#748) 2023-08-13 20:57:31 -07:00
Zhuohan Li f7389f4763
[Doc] Add Baichuan 13B to supported models (#656) 2023-08-02 16:45:12 -07:00
Zhuohan Li 1b0bd0fe8a
Add Falcon support (new) (#592) 2023-08-02 14:04:39 -07:00
Zhuohan Li df5dd3c68e
Add Baichuan-7B to README (#494) 2023-07-25 15:25:12 -07:00
Zhuohan Li 6fc2a38b11
Add support for LLaMA-2 (#505) 2023-07-20 11:38:27 -07:00
Zhanghao Wu 58df2883cb
[Doc] Add doc for running vLLM on the cloud (#426)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-16 13:37:14 -07:00
Andre Slavescu c894836108
[Model] Add support for GPT-J (#226)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Woosuk Kwon ffa6d2f9f9
[Docs] Fix typo (#346) 2023-07-03 16:51:47 -07:00
Woosuk Kwon 404422f42e
[Model] Add support for MPT (#334) 2023-07-03 16:47:53 -07:00
Woosuk Kwon e41f06702c
Add support for BLOOM (#331) 2023-07-03 13:12:35 -07:00
Bayang 9d27b09d12
Update README.md (#306) 2023-06-29 06:52:15 -07:00
Zhuohan Li 2cf1a333b6
[Doc] Documentation for distributed inference (#261) 2023-06-26 11:34:23 -07:00
Woosuk Kwon 665c48963b
[Docs] Add GPTBigCode to supported models (#213) 2023-06-22 15:05:11 -07:00
Woosuk Kwon 794e578de0
[Minor] Fix URLs (#166) 2023-06-19 22:57:14 -07:00
Woosuk Kwon caddfc14c1
[Minor] Fix icons in doc (#165) 2023-06-19 20:35:38 -07:00
Woosuk Kwon b7e62d3454
Fix repo & documentation URLs (#163) 2023-06-19 20:03:40 -07:00
Woosuk Kwon 364536acd1
[Docs] Minor fix (#162) 2023-06-19 19:58:23 -07:00
Zhuohan Li 0b32a987dd
Add and list supported models in README (#161) 2023-06-20 10:57:46 +08:00
Zhuohan Li a255885f83
Add logo and polish readme (#156) 2023-06-19 16:31:13 +08:00
Woosuk Kwon dcda03b4cb
Write README and front page of doc (#147) 2023-06-18 03:19:38 -07:00
Zhuohan Li bec7b2dc26
Add quickstart guide (#148) 2023-06-18 01:26:12 +08:00
Woosuk Kwon 0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
Woosuk Kwon e38074b1e6
Support FP32 (#141) 2023-06-07 00:40:21 -07:00
Woosuk Kwon 376725ce74
[PyPI] Packaging for PyPI distribution (#140) 2023-06-05 20:03:14 -07:00
Woosuk Kwon 456941cfe4
[Docs] Write the `Adding a New Model` section (#138) 2023-06-05 20:01:26 -07:00
Woosuk Kwon 62ec38ea41
Document supported models (#127) 2023-06-02 22:35:17 -07:00
Woosuk Kwon 0eda2e0953
Add .readthedocs.yaml (#136) 2023-06-02 22:27:44 -07:00
Woosuk Kwon 56b7f0efa4
Add a doc for installation (#128) 2023-05-27 01:13:06 -07:00
Woosuk Kwon 19d2899439
Add initial sphinx docs (#120) 2023-05-22 17:02:44 -07:00