Commit Graph

176 Commits

Author SHA1 Message Date
Jie Fu (傅杰) 87d5abef75
[Bugfix] Fix a bug caused by pip install setuptools>=49.4.0 for CPU backend (#5249) 2024-06-04 09:57:51 -07:00
Breno Faria f775a07e30
[FRONTEND] OpenAI `tools` support named functions (#5032) 2024-06-03 18:25:29 -05:00
Cyrus Leung 7a64d24aad
[Core] Support image processor (#4197) 2024-06-02 22:56:41 -07:00
Nick Hill 657579113f
[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171) 2024-05-31 17:20:19 -07:00
Chansung Park 429d89720e
add doc about serving option on dstack (#3074)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-30 10:11:07 -07:00
Cyrus Leung a9bcc7afb2
[Doc] Use intersphinx and update entrypoints docs (#5125) 2024-05-30 09:59:23 -07:00
youkaichao 4fbcb0f27e
[Doc][Build] update after removing vllm-nccl (#5103)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-05-29 23:51:18 +00:00
Cyrus Leung 5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Simon Mo 290f4ada2b
[Docs] Add Dropbox as sponsors (#5089) 2024-05-28 10:29:09 -07:00
Eric Xihui Lin 8e192ff967
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-24 22:00:52 -07:00
youkaichao 6a50f4cafa
[Doc] add ccache guide in doc (#5012)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-23 23:21:54 +00:00
Simon Mo e941f88584
[Docs] Add acknowledgment for sponsors (#4925) 2024-05-21 00:17:25 -07:00
Isotr0py f12c3b5b3d
[Model] Add Phi-2 LoRA support (#4886) 2024-05-21 14:24:17 +09:00
Kante Yin 8e7fb5d43a
Support to serve vLLM on Kubernetes with LWS (#4829)
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-05-16 16:37:29 -07:00
Cyrus Leung dc72402b57
[Bugfix][Doc] Fix CI failure in docs (#4804)
This PR fixes the CI failure introduced by #4798.

The failure originates from having duplicate target names in reST, and is fixed by changing the ref targets to anonymous ones. For more information, see this discussion.

I have also changed the format of the links to be more distinct from each other.
2024-05-15 01:57:08 +09:00
Zhuohan Li c579b750a0
[Doc] Add meetups to the doc (#4798) 2024-05-13 18:48:00 -07:00
Cyrus Leung 4bfa7e7f75
[Doc] Add API reference for offline inference (#4710) 2024-05-13 17:47:42 -07:00
Zhuohan Li ac1fbf7fd2
[Doc] Shorten README by removing supported model list (#4796) 2024-05-13 16:23:54 -07:00
SangBin Cho e7c46b9527
[Scheduler] Warning upon preemption and Swapping (#4647)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-13 23:50:44 +09:00
Allen.Dou 706588a77d
[Bugfix] Fix CLI arguments in OpenAI server docs (#4729) 2024-05-11 00:00:56 +09:00
Simon Mo 51d4094fda
chunked-prefill-doc-syntax (#4603)
Fix the docs: https://docs.vllm.ai/en/latest/models/performance.html

Co-authored-by: sang <rkooo567@gmail.com>
2024-05-10 14:13:23 +09:00
Cyrus Leung a3c124570a
[Bugfix] Fix CLI arguments in OpenAI server docs (#4709) 2024-05-09 09:53:14 -07:00
SangBin Cho 36fb68f947
[Doc] Chunked Prefill Documentation (#4580) 2024-05-04 00:18:00 -07:00
youkaichao 2d7bce9cd5
[Doc] add env vars to the doc (#4572) 2024-05-03 05:13:49 +00:00
Frαnçois e491c7e053
[Doc] update(example model): for OpenAI compatible serving (#4503) 2024-05-01 10:14:16 -07:00
fuchen.ljl ee37328da0
Unable to find Punica extension issue during source code installation (#4494)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-05-01 00:42:09 +00:00
Prashant Gupta b31a1fb63c
[Doc] add visualization for multi-stage dockerfile (#4456)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-04-30 17:41:59 +00:00
SangBin Cho a88081bf76
[CI] Disable non-lazy string operation on logging (#4326)
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
Hongxia Yang cf29b7eda4
[ROCm][Hardware][AMD][Doc] Documentation update for ROCm (#4376)
Co-authored-by: WoosukKwon <woosuk.kwon@berkeley.edu>
2024-04-25 18:12:25 -07:00
Isotr0py fbf152d976
[Bugfix][Model] Refactor OLMo model to support new HF format in transformers 4.40.0 (#4324)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-25 09:35:56 -07:00
Caio Mendes 96e90fdeb3
[Model] Adds Phi-3 support (#4298) 2024-04-25 03:06:57 +00:00
youkaichao 2768884ac4
[Doc] Add note for docker user (#4340)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-04-24 21:09:44 +00:00
Harry Mellor 34128a697e
Fix `autodoc` directives (#4272)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-23 01:53:01 +00:00
Zhanghao Wu ceaf4ed003
[Doc] Update the SkyPilot doc with serving and Llama-3 (#4276) 2024-04-22 15:34:31 -07:00
Harry Mellor 3d925165f2
Add example scripts to documentation (#4225)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-22 16:36:54 +00:00
xiaoji 7f2593b164
[Doc]: Update the doc of adding new models (#4236) 2024-04-21 09:57:08 -07:00
Harry Mellor fe7d648fe5
Don't show default value for flags in `EngineArgs` (#4223)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-21 09:15:28 -07:00
Harry Mellor 682789d402
Fix missing docs and out of sync `EngineArgs` (#4219)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-19 20:51:33 -07:00
Simon Mo 705578ae14
[Docs] document that Meta Llama 3 is supported (#4175) 2024-04-18 10:55:48 -07:00
Sanger Steel d619ae2d19
[Doc] Add better clarity for tensorizer usage (#4090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-15 13:28:25 -07:00
Simon Mo aceb17cf2d
[Docs] document that mixtral 8x22b is supported (#4073) 2024-04-14 14:35:55 -07:00
Sanger Steel 711a000255
[Frontend] [Core] feat: Add model loading using `tensorizer` (#3476) 2024-04-13 17:13:01 -07:00
Michael Feil c2b4a1bce9
[Doc] Add typing hints / mypy types cleanup (#3816)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-11 17:17:21 -07:00
youkaichao f3d0bf7589
[Doc][Installation] delete python setup.py develop (#3989) 2024-04-11 03:33:02 +00:00
Frαnçois 92cd2e2f21
[Doc] Fix getting stared to use publicly available model (#3963) 2024-04-10 18:05:52 +00:00
youkaichao e35397468f
[Doc] Add doc to state our model support policy (#3948)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-10 17:03:02 +00:00
ywfang b4543c8f6b
[Model] add minicpm (#3893) 2024-04-08 18:28:36 +08:00
youkaichao 95baec828f
[Core] enable out-of-tree model register (#3871) 2024-04-06 17:11:41 -07:00
youkaichao d03d64fd2e
[CI/Build] refactor dockerfile & fix pip cache
[CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (#3859)
2024-04-04 21:53:16 -07:00
Sean Gallen 78107fa091
[Doc]Add asynchronous engine arguments to documentation. (#3810)
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-04 21:52:01 -07:00