Commit Graph

1092 Commits

Author SHA1 Message Date
Thomas Parnell 789937af2e
[Doc] [SpecDecode] Update MLPSpeculator documentation (#7100)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-08-05 23:29:43 +00:00
Simon Mo 4db5176d97
bump version to v0.5.4 (#7139) 2024-08-05 14:39:48 -07:00
Jee Jee Li 179a6a36f2
[Model]Refactor MiniCPMV (#7020)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 08:12:41 +00:00
Yihuan Bu 654bc5ca49
Support for guided decoding for offline LLM (#6878)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 03:12:09 +00:00
Michael Goin b482b9a5b1
[CI/Build] Add support for Python 3.12 (#7035) 2024-08-02 13:51:22 -07:00
Murali Andoorveedu fc912e0886
[Models] Support Qwen model with PP (#6974)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-08-01 12:40:43 -07:00
Jee Jee Li 7ecee34321
[Kernel][RFC] Refactor the punica kernel based on Triton (#5036) 2024-07-31 17:12:24 -07:00
Alphi 2f4e108f75
[Bugfix] Clean up MiniCPM-V (#6939)
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-31 14:39:19 +00:00
Cyrus Leung f230cc2ca6
[Bugfix] Fix broadcasting logic for `multi_modal_kwargs` (#6836) 2024-07-31 10:38:45 +08:00
Ilya Lavrenov 5895b24677
[OpenVINO] Updated OpenVINO requirements and build docs (#6948) 2024-07-30 11:33:01 -07:00
Isotr0py 7cbd9ec7a9
[Model] Initialize support for InternVL2 series models (#6514)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00
Woosuk Kwon fad5576c58
[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856) 2024-07-27 10:28:33 -07:00
Chenggang Wu f954d0715c
[Docs] Add RunLLM chat widget (#6857) 2024-07-27 09:24:46 -07:00
Cyrus Leung 1ad86acf17
[Model] Initial support for BLIP-2 (#5920)
Co-authored-by: ywang96 <ywang@roblox.com>
2024-07-27 11:53:07 +00:00
Roger Wang ecb33a28cb
[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860) 2024-07-27 09:54:14 +00:00
Harry Mellor c53041ae3b
[Doc] Add missing mock import to docs `conf.py` (#6834) 2024-07-27 04:47:33 +00:00
omrishiv 3c3012398e
[Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron (#6844)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-07-26 20:20:16 -07:00
Woosuk Kwon ced36cd89b
[ROCm] Upgrade PyTorch nightly version (#6845) 2024-07-26 20:16:13 -07:00
Zhanghao Wu 150a1ffbfd
[Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283) 2024-07-26 14:39:10 -07:00
Michael Goin 281977bd6e
[Doc] Add Nemotron to supported model docs (#6843) 2024-07-26 17:32:44 -04:00
Li, Jiang 3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) 2024-07-26 13:50:10 -07:00
youkaichao 85ad7e2d01
[doc][debugging] add known issues for hangs (#6816) 2024-07-25 21:48:05 -07:00
Woosuk Kwon b7215de2c5
[Docs] Publish 5th meetup slides (#6799) 2024-07-25 16:47:55 -07:00
youkaichao f3ff63c3f4
[doc][distributed] improve multinode serving doc (#6804) 2024-07-25 15:38:32 -07:00
Kuntai Du 6a1e25b151
[Doc] Add documentations for nightly benchmarks (#6412) 2024-07-25 11:57:16 -07:00
Alphi 9e169a4c61
[Model] Adding support for MiniCPM-V (#4087) 2024-07-24 20:59:30 -07:00
Hongxia Yang d88c458f44
[Doc][AMD][ROCm]Added tips to refer to mi300x tuning guide for mi300x users (#6754) 2024-07-24 14:32:57 -07:00
Woosuk Kwon ccc4a73257
[Docs][ROCm] Detailed instructions to build from source (#6680) 2024-07-24 01:07:23 -07:00
dongmao zhang 87525fab92
[bitsandbytes]: support read bnb pre-quantized model (#5753)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-23 23:45:09 +00:00
youkaichao 71950af726
[doc][distributed] fix doc argument order (#6691) 2024-07-23 08:55:33 -07:00
Woosuk Kwon cb1362a889
[Docs] Announce llama3.1 support (#6688) 2024-07-23 08:18:15 -07:00
Roger Wang 22fa2e35cb
[VLM][Model] Support image input for Chameleon (#6633) 2024-07-22 23:50:48 -07:00
youkaichao c051bfe4eb
[doc][distributed] add more doc for setting up multi-node environment (#6529)
2024-07-22 21:22:09 -07:00
Cyrus Leung 739b61a348
[Frontend] Refactor prompt processing (#4028)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
Matt Wong 06d6c5fe9f
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543) 2024-07-20 09:39:07 -07:00
Murali Andoorveedu 45ceb85a0c
[Docs] Update PP docs (#6598) 2024-07-19 16:38:21 -07:00
Simon Mo 30efe41532
[Docs] Update docs for wheel location (#6580) 2024-07-19 12:14:11 -07:00
milo157 a38524f338
[DOC] - Add docker image to Cerebrium Integration (#6510) 2024-07-17 10:22:53 -07:00
Cyrus Leung 5bf35a91e4
[Doc][CI/Build] Update docs and tests to use `vllm serve` (#6431) 2024-07-17 07:43:21 +00:00
Hongxia Yang 10383887e0
[ROCm] Cleanup Dockerfile and remove outdated patch (#6482) 2024-07-16 22:47:02 -07:00
Jiaxin Shan 94162beb9f
[Doc] Fix the lora adapter path in server startup script (#6230) 2024-07-16 10:11:04 -07:00
Woosuk Kwon c467dff24f
[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457) 2024-07-16 09:56:28 -07:00
youkaichao 9f4ccec761
[doc][misc] remind users to cancel debugging environment variables after debugging (#6481)
2024-07-16 09:45:30 -07:00
Kevin H. Luu d6f3b3d5c4
Pin sphinx-argparse version (#6453)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-16 01:26:11 +00:00
Woosuk Kwon 3dee97b05f
[Docs] Add Google Cloud to sponsor list (#6450) 2024-07-15 11:58:10 -07:00
youkaichao 94b82e8c18
[doc][distributed] add suggestion for distributed inference (#6418) 2024-07-15 09:45:51 -07:00
youkaichao 22e79ee8f3
[doc][misc] doc update (#6439) 2024-07-14 23:33:25 -07:00
Robert Cohn 61e85dbad8
[Doc] xpu backend requires running setvars.sh (#6393) 2024-07-14 17:10:11 -07:00
Ethan Xu dbfe254eda
[Feature] vLLM CLI (#5090)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-07-14 15:36:43 -07:00
Yuan Tang 6ef3bf912c
Remove unnecessary trailing period in spec_decode.rst (#6405) 2024-07-14 07:58:09 +00:00
Isotr0py 540c0368b1
[Model] Initialize Fuyu-8B support (#3924)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-14 05:27:14 +00:00
Saliya Ekanayake a27f87da34
[Doc] Fix Typo in Doc (#6392)
Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>
2024-07-13 00:48:23 +00:00
Simon Mo d719ba24c5
Build some nightly wheels by default (#6380) 2024-07-12 13:56:59 -07:00
youkaichao 2d23b42d92
[doc] update pipeline parallel in readme (#6347) 2024-07-11 11:38:40 -07:00
Jie Fu (傅杰) 439c84581a
[Doc] Update description of vLLM support for CPUs (#6003) 2024-07-10 21:15:29 -07:00
Cyrus Leung 8a924d2248
[Doc] Guide for adding multi-modal plugins (#6205) 2024-07-10 14:55:34 +08:00
Murali Andoorveedu 673dd4cae9
[Docs] Docs update for Pipeline Parallel (#6222)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-07-09 16:24:58 -07:00
Roger Wang 6206dcb29e
[Model] Add PaliGemma (#5189)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-07-07 09:25:50 +08:00
Cyrus Leung 9389380015
[Doc] Move guide for multimodal model and other improvements (#6168) 2024-07-06 17:18:59 +08:00
Roger Wang 175c43eca4
[Doc] Reorganize Supported Models by Type (#6167) 2024-07-06 05:59:36 +00:00
Simon Mo 79d406e918
[Docs] Fix readthedocs for tag build (#6158) 2024-07-05 12:44:40 -07:00
Cyrus Leung ae96ef8fbd
[VLM] Calculate maximum number of multi-modal tokens by model (#6121) 2024-07-04 16:37:23 -07:00
youkaichao 27902d42be
[misc][doc] try to add warning for latest html (#5979) 2024-07-04 09:57:09 -07:00
youkaichao 966fe72141
[doc][misc] bump up py version in installation doc (#6119) 2024-07-03 15:52:04 -07:00
xwjiang2010 d9e98f42e4
[vlm] Remove vision language config. (#6089)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
Michael Goin 47f0954af0
[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975) 2024-07-03 17:38:00 +00:00
Roger Wang f1c78138aa
[Doc] Fix Mock Import (#6094) 2024-07-03 00:13:56 -07:00
Cyrus Leung 9831aec49f
[Core] Dynamic image size support for VLMs (#5276)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-07-02 20:34:00 -07:00
Mor Zusman 9d6a8daa87
[Model] Jamba support (#4115)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 23:11:29 +00:00
Cyrus Leung 31354e563f
[Doc] Reinstate doc dependencies (#6061) 2024-07-02 10:53:16 +00:00
xwjiang2010 98d6682cd1
[VLM] Remove `image_input_type` from VLM config (#5852)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
Roger Wang 8e0817c262
[Bugfix][Doc] Fix Doc Formatting (#6048) 2024-07-01 15:09:11 -07:00
ning.zhang 83bdcb6ac3
add FAQ doc under 'serving' (#5946) 2024-07-01 14:11:36 -07:00
youkaichao 4050d646e5
[doc][misc] remove deprecated api server in doc (#6037) 2024-07-01 12:52:43 -04:00
Ilya Lavrenov 57f09a419c
[Hardware][Intel] OpenVINO vLLM backend (#5379) 2024-06-28 13:50:16 +00:00
Cyrus Leung 5cbe8d155c
[Core] Registry for processing model inputs (#5214)
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-28 12:09:56 +00:00
Woosuk Kwon 79c92c7c8a
[Model] Add Gemma 2 (#5908) 2024-06-27 13:33:56 -07:00
youkaichao 3fd02bda51
[doc][misc] add note for Kubernetes users (#5916) 2024-06-27 10:07:07 -07:00
Cyrus Leung 96354d6a29
[Model] Add base class for LoRA-supported models (#5018) 2024-06-27 16:03:04 +08:00
youkaichao 294104c3f9
[doc] update usage of env var to avoid conflict (#5873) 2024-06-26 17:57:12 -04:00
Roger Wang 3aa7b6cf66
[Misc][Doc] Add Example of using OpenAI Server with VLM (#5832) 2024-06-25 20:34:25 -07:00
Matt Wong dd793d1de5
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) 2024-06-25 15:56:15 -07:00
youkaichao c18ebfdd71
[doc][distributed] add both gloo and nccl tests (#5834) 2024-06-25 15:10:28 -04:00
Cyrus Leung f23871e9ee
[Doc] Add notice about breaking changes to VLMs (#5818) 2024-06-25 01:25:03 -07:00
Michael Goin 1744cc99ba
[Doc] Add Phi-3-medium to list of supported models (#5788) 2024-06-24 10:48:55 -07:00
Michael Goin e72dc6cb35
[Doc] Add "Suggest edit" button to doc pages (#5789) 2024-06-24 10:26:17 -07:00
youkaichao c246212952
[doc][faq] add warning to download models for every nodes (#5783) 2024-06-24 15:37:42 +08:00
Woosuk Kwon 8c00f9c15d
[Docs][TPU] Add installation tip for TPU (#5761) 2024-06-21 23:09:40 -07:00
Michael Goin 5b15bde539
[Doc] Documentation on supported hardware for quantization methods (#5745) 2024-06-21 12:44:29 -04:00
Roger Wang 1b2eaac316
[Bugfix][Doc] FIx Duplicate Explicit Target Name Errors (#5703) 2024-06-19 23:10:47 -07:00
Rafael Vasquez e83db9e7e3
[Doc] Update docker references (#5614)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-06-19 15:01:45 -07:00
milo157 2bd231a7b7
[Doc] Added cerebrium as Integration option (#5553) 2024-06-18 15:56:59 -07:00
Isotr0py daef218b55
[Model] Initialize Phi-3-vision support (#4986) 2024-06-17 19:34:33 -07:00
Kunshang Ji 728c4c8a06
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-06-17 11:01:25 -07:00
youkaichao 845a3f26f9
[Doc] add debugging tips for crash and multi-node debugging (#5581) 2024-06-17 10:08:01 +08:00
Sanger Steel 6e2527a7cb
[Doc] Update documentation on Tensorizer (#5471) 2024-06-14 11:27:57 -07:00
Simon Mo cdab68dcdb
[Docs] Add ZhenFund as a Sponsor (#5548) 2024-06-14 11:17:21 -07:00
Cyrus Leung 0ce7b952f8
[Doc] Update LLaVA docs (#5437)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-13 11:22:07 -07:00
Woosuk Kwon a65634d3ae
[Docs] Add 4th meetup slides (#5509) 2024-06-13 10:18:26 -07:00
Li, Jiang 80aa7e91fc
[Hardware][Intel] Optimize CPU backend and add more performance tips (#4971)
Co-authored-by: Jianan Gu <jianan.gu@intel.com>
2024-06-13 09:33:14 -07:00
Cyrus Leung b8d4dfff9c
[Doc] Update debug docs (#5438) 2024-06-12 14:49:31 -07:00
Woosuk Kwon 1a8bfd92d5
[Hardware] Initial TPU integration (#5292) 2024-06-12 11:53:03 -07:00
youkaichao 8f89d72090
[Doc] add common case for long waiting time (#5430) 2024-06-11 11:12:13 -07:00
Nick Hill 99dac099ab
[Core][Doc] Default to multiprocessing for single-node distributed case (#5230)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-06-11 11:10:41 -07:00
Cade Daniel 89ec06c33b
[Docs] [Spec decode] Fix docs error in code example (#5427) 2024-06-11 10:31:56 -07:00
Kuntai Du 9fde251bf0
[Doc] Add an automatic prefix caching section in vllm documentation (#5324)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-06-11 10:24:59 -07:00
Cade Daniel 4c2ffb28ff
[Speculative decoding] Initial spec decode docs (#5400) 2024-06-11 10:15:40 -07:00
SangBin Cho 246598a6b1
[CI] docfix (#5410)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-11 01:28:50 -07:00
Roger Wang 3c4cebf751
[Doc][Typo] Fixing Missing Comma (#5403) 2024-06-11 00:20:28 -07:00
youkaichao d8f31f2f8b
[Doc] add debugging tips (#5409) 2024-06-10 23:21:43 -07:00
Michael Goin 77c87beb06
[Doc] Add documentation for FP8 W8A8 (#5388) 2024-06-10 18:55:12 -06:00
Woosuk Kwon cb77ad836f
[Docs] Alphabetically sort sponsors (#5386) 2024-06-10 15:17:19 -05:00
Roger Wang 856c990041
[Docs] Add Docs on Limitations of VLM Support (#5383) 2024-06-10 09:53:50 -07:00
Cyrus Leung 6b29d6fe70
[Model] Initial support for LLaVA-NeXT (#4199)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-10 12:47:15 +00:00
Roger Wang 7a9cb294ae
[Frontend] Add OpenAI Vision API Support (#5237)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-06-07 11:23:32 -07:00
Simon Mo f270a39537
[Docs] Add Sequoia as sponsors (#5287) 2024-06-05 18:02:56 +00:00
Jie Fu (傅杰) 87d5abef75
[Bugfix] Fix a bug caused by pip install setuptools>=49.4.0 for CPU backend (#5249) 2024-06-04 09:57:51 -07:00
Breno Faria f775a07e30
[FRONTEND] OpenAI `tools` support named functions (#5032) 2024-06-03 18:25:29 -05:00
Cyrus Leung 7a64d24aad
[Core] Support image processor (#4197) 2024-06-02 22:56:41 -07:00
Nick Hill 657579113f
[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171) 2024-05-31 17:20:19 -07:00
Chansung Park 429d89720e
add doc about serving option on dstack (#3074)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-30 10:11:07 -07:00
Cyrus Leung a9bcc7afb2
[Doc] Use intersphinx and update entrypoints docs (#5125) 2024-05-30 09:59:23 -07:00
youkaichao 4fbcb0f27e
[Doc][Build] update after removing vllm-nccl (#5103)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-05-29 23:51:18 +00:00
Cyrus Leung 5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Simon Mo 290f4ada2b
[Docs] Add Dropbox as sponsors (#5089) 2024-05-28 10:29:09 -07:00
Eric Xihui Lin 8e192ff967
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-24 22:00:52 -07:00
youkaichao 6a50f4cafa
[Doc] add ccache guide in doc (#5012)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-23 23:21:54 +00:00
Simon Mo e941f88584
[Docs] Add acknowledgment for sponsors (#4925) 2024-05-21 00:17:25 -07:00
Isotr0py f12c3b5b3d
[Model] Add Phi-2 LoRA support (#4886) 2024-05-21 14:24:17 +09:00
Kante Yin 8e7fb5d43a
Support to serve vLLM on Kubernetes with LWS (#4829)
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-05-16 16:37:29 -07:00
Cyrus Leung dc72402b57
[Bugfix][Doc] Fix CI failure in docs (#4804)
This PR fixes the CI failure introduced by #4798.

The failure originates from having duplicate target names in reST, and is fixed by changing the ref targets to anonymous ones. For more information, see this discussion.

I have also changed the format of the links to be more distinct from each other.
2024-05-15 01:57:08 +09:00
Zhuohan Li c579b750a0
[Doc] Add meetups to the doc (#4798) 2024-05-13 18:48:00 -07:00
Cyrus Leung 4bfa7e7f75
[Doc] Add API reference for offline inference (#4710) 2024-05-13 17:47:42 -07:00
Zhuohan Li ac1fbf7fd2
[Doc] Shorten README by removing supported model list (#4796) 2024-05-13 16:23:54 -07:00
SangBin Cho e7c46b9527
[Scheduler] Warning upon preemption and Swapping (#4647)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-13 23:50:44 +09:00
Allen.Dou 706588a77d
[Bugfix] Fix CLI arguments in OpenAI server docs (#4729) 2024-05-11 00:00:56 +09:00
Simon Mo 51d4094fda
chunked-prefill-doc-syntax (#4603)
Fix the docs: https://docs.vllm.ai/en/latest/models/performance.html

Co-authored-by: sang <rkooo567@gmail.com>
2024-05-10 14:13:23 +09:00
Cyrus Leung a3c124570a
[Bugfix] Fix CLI arguments in OpenAI server docs (#4709) 2024-05-09 09:53:14 -07:00
SangBin Cho 36fb68f947
[Doc] Chunked Prefill Documentation (#4580) 2024-05-04 00:18:00 -07:00
youkaichao 2d7bce9cd5
[Doc] add env vars to the doc (#4572) 2024-05-03 05:13:49 +00:00
Frαnçois e491c7e053
[Doc] update(example model): for OpenAI compatible serving (#4503) 2024-05-01 10:14:16 -07:00
fuchen.ljl ee37328da0
Unable to find Punica extension issue during source code installation (#4494)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-05-01 00:42:09 +00:00
Prashant Gupta b31a1fb63c
[Doc] add visualization for multi-stage dockerfile (#4456)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-04-30 17:41:59 +00:00
SangBin Cho a88081bf76
[CI] Disable non-lazy string operation on logging (#4326)
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
Hongxia Yang cf29b7eda4
[ROCm][Hardware][AMD][Doc] Documentation update for ROCm (#4376)
Co-authored-by: WoosukKwon <woosuk.kwon@berkeley.edu>
2024-04-25 18:12:25 -07:00
Isotr0py fbf152d976
[Bugfix][Model] Refactor OLMo model to support new HF format in transformers 4.40.0 (#4324)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-25 09:35:56 -07:00
Caio Mendes 96e90fdeb3
[Model] Adds Phi-3 support (#4298) 2024-04-25 03:06:57 +00:00
youkaichao 2768884ac4
[Doc] Add note for docker user (#4340)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-04-24 21:09:44 +00:00
Harry Mellor 34128a697e
Fix `autodoc` directives (#4272)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-23 01:53:01 +00:00
Zhanghao Wu ceaf4ed003
[Doc] Update the SkyPilot doc with serving and Llama-3 (#4276) 2024-04-22 15:34:31 -07:00
Harry Mellor 3d925165f2
Add example scripts to documentation (#4225)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-22 16:36:54 +00:00
xiaoji 7f2593b164
[Doc]: Update the doc of adding new models (#4236) 2024-04-21 09:57:08 -07:00
Harry Mellor fe7d648fe5
Don't show default value for flags in `EngineArgs` (#4223)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-21 09:15:28 -07:00
Harry Mellor 682789d402
Fix missing docs and out of sync `EngineArgs` (#4219)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-19 20:51:33 -07:00
Simon Mo 705578ae14
[Docs] document that Meta Llama 3 is supported (#4175) 2024-04-18 10:55:48 -07:00
Sanger Steel d619ae2d19
[Doc] Add better clarity for tensorizer usage (#4090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-15 13:28:25 -07:00
Simon Mo aceb17cf2d
[Docs] document that mixtral 8x22b is supported (#4073) 2024-04-14 14:35:55 -07:00
Sanger Steel 711a000255
[Frontend] [Core] feat: Add model loading using `tensorizer` (#3476) 2024-04-13 17:13:01 -07:00
Michael Feil c2b4a1bce9
[Doc] Add typing hints / mypy types cleanup (#3816)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-11 17:17:21 -07:00
youkaichao f3d0bf7589
[Doc][Installation] delete python setup.py develop (#3989) 2024-04-11 03:33:02 +00:00
Frαnçois 92cd2e2f21
[Doc] Fix getting stared to use publicly available model (#3963) 2024-04-10 18:05:52 +00:00
youkaichao e35397468f
[Doc] Add doc to state our model support policy (#3948)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-10 17:03:02 +00:00
ywfang b4543c8f6b
[Model] add minicpm (#3893) 2024-04-08 18:28:36 +08:00
youkaichao 95baec828f
[Core] enable out-of-tree model register (#3871) 2024-04-06 17:11:41 -07:00
youkaichao d03d64fd2e
[CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (#3859)
2024-04-04 21:53:16 -07:00
Sean Gallen 78107fa091
[Doc]Add asynchronous engine arguments to documentation. (#3810)
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-04 21:52:01 -07:00
Adrian Abeyta 2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
Roger Wang 3bec41f41a
[Doc] Fix vLLMEngine Doc Page (#3791) 2024-04-02 09:49:37 -07:00
bigPYJ1151 0e3f06fe9c
[Hardware][Intel] Add CPU inference backend (#3634)
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
2024-04-01 22:07:30 -07:00
youkaichao 9c82a1bec3
[Doc] Update installation doc for build from source and explain the dependency on torch/cuda version (#3746)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-03-30 16:34:38 -07:00
yhu422 d8658c8cc1
Usage Stats Collection (#2852) 2024-03-28 22:16:12 -07:00
wenyujin333 d6ea427f04
[Model] Add support for Qwen2MoeModel (#3346) 2024-03-28 15:19:59 +00:00
Woosuk Kwon 6d9aa00fc4
[Docs] Add Command-R to supported models (#3669) 2024-03-27 15:20:00 -07:00
Megha Agarwal e24336b5a7
[Model] Add support for DBRX (#3660) 2024-03-27 13:01:46 -07:00
Woosuk Kwon e66b629c04
[Misc] Minor fix in KVCache type (#3652) 2024-03-26 23:14:06 -07:00
Jee Li 76879342a3
[Doc]add lora support (#3649) 2024-03-27 02:06:46 +00:00
SangBin Cho 01bfb22b41
[CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
youkaichao 42bc386129
[CI/Build] respect the common environment variable MAX_JOBS (#3600) 2024-03-24 17:04:00 -07:00
Lalit Pradhan 4c07dd28c0
[🚀 Ready to be merged] Added support for Jais models (#3183) 2024-03-21 09:45:24 +00:00
Jim Burtoft 63e8b28a99
[Doc] minor fix of spelling in amd-installation.rst (#3506) 2024-03-19 20:32:30 +00:00
Jim Burtoft 2a60c9bd17
[Doc] minor fix to neuron-installation.rst (#3505) 2024-03-19 13:21:35 -07:00
Simon Mo ef65dcfa6f
[Doc] Add docs about OpenAI compatible server (#3288) 2024-03-18 22:05:34 -07:00
laneeee 8fa7357f2d
fix document error for value and v_vec illustration (#3421) 2024-03-15 16:06:09 -07:00
Sherlock Xu b0925b3878
docs: Add BentoML deployment doc (#3336)
Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
2024-03-12 10:34:30 -07:00
Zhuohan Li 4c922709b6
Add distributed model executor abstraction (#3191) 2024-03-11 11:03:45 -07:00
Philipp Moritz 657061fdce
[docs] Add LoRA support information for models (#3299) 2024-03-11 00:54:51 -07:00
Roger Wang 99c3cfb83c
[Docs] Fix Unmocked Imports (#3275) 2024-03-08 09:58:01 -08:00
Jialun Lyu 27a7b070db
Add document for vllm paged attention kernel. (#2978) 2024-03-04 09:23:34 -08:00
Liangfu Chen d0fae88114
[DOC] add setup document to support neuron backend (#2777) 2024-03-04 01:03:51 +00:00
Sage Moore ce4f5a29fb
Add Automatic Prefix Caching (#2762)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-03-02 00:50:01 -08:00
Yuan Tang 49d849b3ab
docs: Add tutorial on deploying vLLM model with KServe (#2586)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-03-01 11:04:14 -08:00
Ganesh Jagadeesan a8683102cc
multi-lora documentation fix (#3064) 2024-02-27 21:26:15 -08:00
Woosuk Kwon 8b430d7dea
[Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM (#3046) 2024-02-26 20:23:50 -08:00
张大成 48a8f4a7fd
Support Orion model (#2539)
Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-02-26 19:17:06 -08:00
Harry Mellor ef978fe411
Port metrics from `aioprometheus` to `prometheus_client` (#2730) 2024-02-25 11:54:00 -08:00
Zhuohan Li a9c8212895
[FIX] Add Gemma model to the doc (#2966) 2024-02-21 09:46:15 -08:00
Isotr0py ab3a5a8259
Support OLMo models. (#2832) 2024-02-18 21:05:15 -08:00
jvmncs 8f36444c4f
multi-LoRA as extra models in OpenAI server (#2775)
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
 --model meta-llama/Llama-2-7b-hf \
 --enable-lora \
 --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server will list 3 separate models if the user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in `EngineArgs`.

No work has been done here to scope client permissions to specific models.
2024-02-17 12:00:48 -08:00
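As a purely illustrative sketch (not part of the commit), the model listing described in the entry above can be mimicked in plain Python: the server exposes one entry for the base model plus one per `--lora-modules` name, so a `/models` query yields three IDs. The names and path below are taken from the example command; the dictionary and variable names are hypothetical.

```python
# Hypothetical illustration of the /models listing described above.
# The server registers the base model plus one entry per --lora-modules name.
base_model = "meta-llama/Llama-2-7b-hf"

# Both adapter names point at the same underlying LoRA path ($LORA_PATH),
# though in general each name could map to a different adapter.
lora_modules = {
    "sql-lora": "~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/",
    "sql-lora2": "~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/",
}

# The listing contains the base model followed by the adapter names.
served_model_ids = [base_model] + sorted(lora_modules)
print(served_model_ids)
```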
Philipp Moritz 317b29de0f
Remove Yi model definition, please use `LlamaForCausalLM` instead (#2854)
Co-authored-by: Roy <jasonailu87@gmail.com>
2024-02-13 14:22:22 -08:00
Simon Mo f964493274
[CI] Ensure documentation build is checked in CI (#2842) 2024-02-12 22:53:07 -08:00
Philipp Moritz 4ca2c358b1
Add documentation section about LoRA (#2834) 2024-02-12 17:24:45 +01:00
Hongxia Yang 0580aab02f
[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (#2768) 2024-02-10 23:14:37 -08:00
Philipp Moritz 931746bc6d
Add documentation on how to do incremental builds (#2796) 2024-02-07 14:42:02 -08:00
Massimiliano Pronesti 5ed704ec8c
docs: fix langchain (#2736) 2024-02-03 18:17:55 -08:00
Fengzhe Zhou cd9e60c76c
Add Internlm2 (#2666) 2024-02-01 09:27:40 -08:00
Zhuohan Li 1af090b57d
Bump up version to v0.3.0 (#2656) 2024-01-31 00:07:07 -08:00
zhaoyang-star 9090bf02e7
Support FP8-E5M2 KV Cache (#2279)
Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-01-28 16:43:54 -08:00
Hongxia Yang 6b7de1a030
[ROCm] add support to ROCm 6.0 and MI300 (#2274) 2024-01-26 12:41:10 -08:00
Junyang Lin 2832e7b9f9
fix names and license for Qwen2 (#2589) 2024-01-24 22:37:51 -08:00
LastWhisper 223c19224b
Fix the syntax error in the doc of supported_models (#2584) 2024-01-24 11:22:51 -08:00
Erfan Al-Hossami 9c1352eb57
[Feature] Simple API token authentication and pluggable middlewares (#1106) 2024-01-23 15:13:00 -08:00
Junyang Lin 94b5edeb53
Add qwen2 (#2495) 2024-01-22 14:34:21 -08:00
Hyunsung Lee e1957c6ebd
Add StableLM3B model (#2372) 2024-01-16 20:32:40 -08:00
Simon 827cbcd37c
Update quickstart.rst (#2369) 2024-01-12 12:56:18 -08:00
Zhuohan Li f745847ef7
[Minor] Fix the format in quick start guide related to Model Scope (#2425) 2024-01-11 19:44:01 -08:00
Jiaxiang 6549aef245
[DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011) 2024-01-11 19:26:49 -08:00
Zhuohan Li fd4ea8ef5c
Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
Shivam Thakkar 1db83e31a2
[Docs] Update installation instructions to include CUDA 11.8 xFormers (#2246) 2023-12-22 23:20:02 -08:00
Ronen Schaffer c17daa9f89
[Docs] Fix broken links (#2222) 2023-12-20 12:43:42 -08:00
avideci de60a3fb93
Added DeciLM-7b and DeciLM-7b-instruct (#2062) 2023-12-19 02:29:33 -08:00
kliuae 1b7c791d60
[ROCm] Fixes for GPTQ on ROCm (#2180) 2023-12-18 10:41:04 -08:00
Suhong Moon 3ec8c25cd0
[Docs] Update documentation for gpu-memory-utilization option (#2162) 2023-12-17 10:51:57 -08:00
Woosuk Kwon f8c688d746
[Minor] Add Phi 2 to supported models (#2159) 2023-12-17 02:54:57 -08:00
Woosuk Kwon 26c52a5ea6
[Docs] Add CUDA graph support to docs (#2148) 2023-12-17 01:49:20 -08:00
Woosuk Kwon b81a6a6bb3
[Docs] Add supported quantization methods to docs (#2135) 2023-12-15 13:29:22 -08:00
Antoni Baum 21d93c140d
Optimize Mixtral with expert parallelism (#2090) 2023-12-13 23:55:07 -08:00
Woosuk Kwon 096827c284
[Docs] Add notes on ROCm-supported models (#2087) 2023-12-13 09:45:34 -08:00
Woosuk Kwon 6565d9e33e
Update installation instruction for vLLM + CUDA 11.8 (#2086) 2023-12-13 09:25:59 -08:00
TJian f375ec8440
[ROCm] Upgrade xformers version for ROCm & update doc (#2079)
Co-authored-by: miloice <jeffaw99@hotmail.com>
2023-12-13 00:56:05 -08:00
Ikko Eltociear Ashimine c0ce15dfb2
Update run_on_sky.rst (#2025)
sharable -> shareable
2023-12-11 10:32:58 -08:00
Woosuk Kwon 4ff0203987
Minor fixes for Mixtral (#2015) 2023-12-11 09:16:15 -08:00
Simon Mo c85b80c2b6
[Docker] Add cuda arch list as build option (#1950) 2023-12-08 09:53:47 -08:00
TJian 6ccc0bfffb
Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
2023-12-07 23:16:52 -08:00
AguirreNicolas 24f60a54f4
[Docker] Adding number of nvcc_threads during build as envar (#1893) 2023-12-07 11:00:32 -08:00
gottlike 42c02f5892
Fix quickstart.rst typo jinja (#1964) 2023-12-07 08:34:44 -08:00
Peter Götz d940ce497e
Fix typo in adding_model.rst (#1947)
adpated -> adapted
2023-12-06 10:04:26 -08:00
Massimiliano Pronesti c07a442854
chore(examples-docs): upgrade to OpenAI V1 (#1785) 2023-12-03 01:11:22 -08:00
Simon Mo 5313c2cb8b
Add Production Metrics in Prometheus format (#1890) 2023-12-02 16:37:44 -08:00
Simon Mo 4cefa9b49b
[Docs] Update the AWQ documentation to highlight performance issue (#1883) 2023-12-02 15:52:47 -08:00
Woosuk Kwon e5452ddfd6
Normalize head weights for Baichuan 2 (#1876) 2023-11-30 20:03:58 -08:00
Adam Brusselback 66785cc05c
Support chat template and `echo` for chat API (#1756) 2023-11-30 16:43:13 -08:00
Massimiliano Pronesti 05a38612b0
docs: add instruction for langchain (#1162) 2023-11-30 10:57:44 -08:00
Simon Mo 0f621c2c7d
[Docs] Add information about using shared memory in docker (#1845) 2023-11-29 18:33:56 -08:00
Casper a921d8be9d
[DOCS] Add engine args documentation (#1741) 2023-11-22 12:31:27 -08:00
Wen Sun 112627e8b2
[Docs] Fix the code block's format in deploying_with_docker page (#1722) 2023-11-20 01:22:39 -08:00
Simon Mo 37c1e3c218
Documentation about official docker image (#1709) 2023-11-19 20:56:26 -08:00
Woosuk Kwon 06e9ebebd5
Add instructions to install vLLM+cu118 (#1717) 2023-11-18 23:48:58 -08:00
liuyhwangyh edb305584b
Support download models from www.modelscope.cn (#1588) 2023-11-17 20:38:31 -08:00
Zhuohan Li 0fc280b06c
Update the adding-model doc according to the new refactor (#1692) 2023-11-16 18:46:26 -08:00
Zhuohan Li 415d109527
[Fix] Update Supported Models List (#1690) 2023-11-16 14:47:26 -08:00
Casper 8516999495
Add Quantization and AutoAWQ to docs (#1235) 2023-11-04 22:43:39 -07:00
Stephen Krider 9cabcb7645
Add Dockerfile (#1350) 2023-10-31 12:36:47 -07:00
Zhuohan Li 9eed4d1f3e
Update README.md (#1292) 2023-10-08 23:15:50 -07:00
Usama Ahmed 0967102c6d
fixing typo in `tiiuae/falcon-rw-7b` model name (#1226) 2023-09-29 13:40:25 -07:00
Woosuk Kwon 202351d5bf
Add Mistral to supported model list (#1221) 2023-09-28 14:33:04 -07:00
Nick Perez 4ee52bb169
Docs: Fix broken link to openai example (#1145)
Link to `openai_client.py` is no longer valid - updated to `openai_completion_client.py`
2023-09-22 11:36:09 -07:00
Woosuk Kwon 7d7e3b78a3
Use `--ipc=host` in docker run for distributed inference (#1125) 2023-09-21 18:26:47 -07:00
Tanmay Verma 6f2dd6c37e
Add documentation to Triton server tutorial (#983) 2023-09-20 10:32:40 -07:00
Woosuk Kwon eda1a7cad3
Announce paper release (#1036) 2023-09-13 17:38:13 -07:00
Woosuk Kwon b9cecc2635
[Docs] Update installation page (#1005) 2023-09-10 14:23:31 -07:00
Zhuohan Li 002800f081
Align vLLM's beam search implementation with HF generate (#857) 2023-09-04 17:29:42 -07:00
Woosuk Kwon 55b28b1eee
[Docs] Minor fixes in supported models (#920)
* Minor fix in supported models

* Add another small fix for Aquila model

---------

Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-08-31 16:28:39 -07:00
Zhuohan Li 14f9c72bfd
Update Supported Model List (#825) 2023-08-22 11:51:44 -07:00
Uranus 1b151ed181
Fix baichuan doc style (#748) 2023-08-13 20:57:31 -07:00
Zhuohan Li f7389f4763
[Doc] Add Baichuan 13B to supported models (#656) 2023-08-02 16:45:12 -07:00
Zhuohan Li 1b0bd0fe8a
Add Falcon support (new) (#592) 2023-08-02 14:04:39 -07:00
Zhuohan Li df5dd3c68e
Add Baichuan-7B to README (#494) 2023-07-25 15:25:12 -07:00
Zhuohan Li 6fc2a38b11
Add support for LLaMA-2 (#505) 2023-07-20 11:38:27 -07:00
Zhanghao Wu 58df2883cb
[Doc] Add doc for running vLLM on the cloud (#426)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-16 13:37:14 -07:00
Andre Slavescu c894836108
[Model] Add support for GPT-J (#226)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Woosuk Kwon ffa6d2f9f9
[Docs] Fix typo (#346) 2023-07-03 16:51:47 -07:00
Woosuk Kwon 404422f42e
[Model] Add support for MPT (#334) 2023-07-03 16:47:53 -07:00
Woosuk Kwon e41f06702c
Add support for BLOOM (#331) 2023-07-03 13:12:35 -07:00
Bayang 9d27b09d12
Update README.md (#306) 2023-06-29 06:52:15 -07:00
Zhuohan Li 2cf1a333b6
[Doc] Documentation for distributed inference (#261) 2023-06-26 11:34:23 -07:00
Woosuk Kwon 665c48963b
[Docs] Add GPTBigCode to supported models (#213) 2023-06-22 15:05:11 -07:00
Woosuk Kwon 794e578de0
[Minor] Fix URLs (#166) 2023-06-19 22:57:14 -07:00
Woosuk Kwon caddfc14c1
[Minor] Fix icons in doc (#165) 2023-06-19 20:35:38 -07:00
Woosuk Kwon b7e62d3454
Fix repo & documentation URLs (#163) 2023-06-19 20:03:40 -07:00
Woosuk Kwon 364536acd1
[Docs] Minor fix (#162) 2023-06-19 19:58:23 -07:00
Zhuohan Li 0b32a987dd
Add and list supported models in README (#161) 2023-06-20 10:57:46 +08:00
Zhuohan Li a255885f83
Add logo and polish readme (#156) 2023-06-19 16:31:13 +08:00
Woosuk Kwon dcda03b4cb
Write README and front page of doc (#147) 2023-06-18 03:19:38 -07:00
Zhuohan Li bec7b2dc26
Add quickstart guide (#148) 2023-06-18 01:26:12 +08:00
Woosuk Kwon 0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
Woosuk Kwon e38074b1e6
Support FP32 (#141) 2023-06-07 00:40:21 -07:00
Woosuk Kwon 376725ce74
[PyPI] Packaging for PyPI distribution (#140) 2023-06-05 20:03:14 -07:00
Woosuk Kwon 456941cfe4
[Docs] Write the `Adding a New Model` section (#138) 2023-06-05 20:01:26 -07:00
Woosuk Kwon 62ec38ea41
Document supported models (#127) 2023-06-02 22:35:17 -07:00
Woosuk Kwon 0eda2e0953
Add .readthedocs.yaml (#136) 2023-06-02 22:27:44 -07:00
Woosuk Kwon 56b7f0efa4
Add a doc for installation (#128) 2023-05-27 01:13:06 -07:00
Woosuk Kwon 19d2899439
Add initial sphinx docs (#120) 2023-05-22 17:02:44 -07:00