Commit Graph

379 Commits

Author SHA1 Message Date
Nick Hill 657579113f
[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171) 2024-05-31 17:20:19 -07:00
Eric Xihui Lin 8e192ff967
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-24 22:00:52 -07:00
Isotr0py f12c3b5b3d
[Model] Add Phi-2 LoRA support (#4886) 2024-05-21 14:24:17 +09:00
Zhuohan Li ac1fbf7fd2
[Doc] Shorten README by removing supported model list (#4796) 2024-05-13 16:23:54 -07:00
SangBin Cho e7c46b9527
[Scheduler] Warning upon preemption and Swapping (#4647)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-13 23:50:44 +09:00
Simon Mo 51d4094fda
chunked-prefill-doc-syntax (#4603)
Fix the docs: https://docs.vllm.ai/en/latest/models/performance.html

Co-authored-by: sang <rkooo567@gmail.com>
2024-05-10 14:13:23 +09:00
SangBin Cho 36fb68f947
[Doc] Chunked Prefill Documentation (#4580) 2024-05-04 00:18:00 -07:00
Isotr0py fbf152d976
[Bugfix][Model] Refactor OLMo model to support new HF format in transformers 4.40.0 (#4324)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-25 09:35:56 -07:00
Caio Mendes 96e90fdeb3
[Model] Adds Phi-3 support (#4298) 2024-04-25 03:06:57 +00:00
xiaoji 7f2593b164
[Doc]: Update the doc of adding new models (#4236) 2024-04-21 09:57:08 -07:00
Harry Mellor fe7d648fe5
Don't show default value for flags in `EngineArgs` (#4223)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-21 09:15:28 -07:00
Harry Mellor 682789d402
Fix missing docs and out of sync `EngineArgs` (#4219)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-19 20:51:33 -07:00
Simon Mo 705578ae14
[Docs] document that Meta Llama 3 is supported (#4175) 2024-04-18 10:55:48 -07:00
Sanger Steel d619ae2d19
[Doc] Add better clarity for tensorizer usage (#4090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-15 13:28:25 -07:00
Simon Mo aceb17cf2d
[Docs] document that mixtral 8x22b is supported (#4073) 2024-04-14 14:35:55 -07:00
Sanger Steel 711a000255
[Frontend] [Core] feat: Add model loading using `tensorizer` (#3476) 2024-04-13 17:13:01 -07:00
youkaichao e35397468f
[Doc] Add doc to state our model support policy (#3948)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-10 17:03:02 +00:00
ywfang b4543c8f6b
[Model] add minicpm (#3893) 2024-04-08 18:28:36 +08:00
youkaichao 95baec828f
[Core] enable out-of-tree model register (#3871) 2024-04-06 17:11:41 -07:00
Sean Gallen 78107fa091
[Doc] Add asynchronous engine arguments to documentation. (#3810)
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-04 21:52:01 -07:00
wenyujin333 d6ea427f04
[Model] Add support for Qwen2MoeModel (#3346) 2024-03-28 15:19:59 +00:00
Woosuk Kwon 6d9aa00fc4
[Docs] Add Command-R to supported models (#3669) 2024-03-27 15:20:00 -07:00
Megha Agarwal e24336b5a7
[Model] Add support for DBRX (#3660) 2024-03-27 13:01:46 -07:00
Woosuk Kwon e66b629c04
[Misc] Minor fix in KVCache type (#3652) 2024-03-26 23:14:06 -07:00
Jee Li 76879342a3
[Doc] Add LoRA support (#3649) 2024-03-27 02:06:46 +00:00
Lalit Pradhan 4c07dd28c0
[🚀 Ready to be merged] Added support for Jais models (#3183) 2024-03-21 09:45:24 +00:00
Simon Mo ef65dcfa6f
[Doc] Add docs about OpenAI compatible server (#3288) 2024-03-18 22:05:34 -07:00
Philipp Moritz 657061fdce
[docs] Add LoRA support information for models (#3299) 2024-03-11 00:54:51 -07:00
Sage Moore ce4f5a29fb
Add Automatic Prefix Caching (#2762)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-03-02 00:50:01 -08:00
Ganesh Jagadeesan a8683102cc
multi-lora documentation fix (#3064) 2024-02-27 21:26:15 -08:00
Woosuk Kwon 8b430d7dea
[Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM (#3046) 2024-02-26 20:23:50 -08:00
张大成 48a8f4a7fd
Support Orion model (#2539)
Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-02-26 19:17:06 -08:00
Zhuohan Li a9c8212895
[FIX] Add Gemma model to the doc (#2966) 2024-02-21 09:46:15 -08:00
Isotr0py ab3a5a8259
Support OLMo models. (#2832) 2024-02-18 21:05:15 -08:00
jvmncs 8f36444c4f
multi-LoRA as extra models in OpenAI server (#2775)
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
 --model meta-llama/Llama-2-7b-hf \
 --enable-lora \
 --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server will list three separate models if the user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case, `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config options take the same values they do in `EngineArgs`.

No work has been done here to scope client permissions to specific models.
2024-02-17 12:00:48 -08:00
Philipp Moritz 317b29de0f
Remove Yi model definition, please use `LlamaForCausalLM` instead (#2854)
Co-authored-by: Roy <jasonailu87@gmail.com>
2024-02-13 14:22:22 -08:00
Philipp Moritz 4ca2c358b1
Add documentation section about LoRA (#2834) 2024-02-12 17:24:45 +01:00
Fengzhe Zhou cd9e60c76c
Add Internlm2 (#2666) 2024-02-01 09:27:40 -08:00
Junyang Lin 2832e7b9f9
fix names and license for Qwen2 (#2589) 2024-01-24 22:37:51 -08:00
LastWhisper 223c19224b
Fix the syntax error in the doc of supported_models (#2584) 2024-01-24 11:22:51 -08:00
Junyang Lin 94b5edeb53
Add qwen2 (#2495) 2024-01-22 14:34:21 -08:00
Hyunsung Lee e1957c6ebd
Add StableLM3B model (#2372) 2024-01-16 20:32:40 -08:00
Zhuohan Li fd4ea8ef5c
Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
Ronen Schaffer c17daa9f89
[Docs] Fix broken links (#2222) 2023-12-20 12:43:42 -08:00
avideci de60a3fb93
Added DeciLM-7b and DeciLM-7b-instruct (#2062) 2023-12-19 02:29:33 -08:00
Suhong Moon 3ec8c25cd0
[Docs] Update documentation for gpu-memory-utilization option (#2162) 2023-12-17 10:51:57 -08:00
Woosuk Kwon f8c688d746
[Minor] Add Phi 2 to supported models (#2159) 2023-12-17 02:54:57 -08:00
Antoni Baum 21d93c140d
Optimize Mixtral with expert parallelism (#2090) 2023-12-13 23:55:07 -08:00
Woosuk Kwon 096827c284
[Docs] Add notes on ROCm-supported models (#2087) 2023-12-13 09:45:34 -08:00
Woosuk Kwon 4ff0203987
Minor fixes for Mixtral (#2015) 2023-12-11 09:16:15 -08:00
Peter Götz d940ce497e
Fix typo in adding_model.rst (#1947)
adpated -> adapted
2023-12-06 10:04:26 -08:00
Woosuk Kwon e5452ddfd6
Normalize head weights for Baichuan 2 (#1876) 2023-11-30 20:03:58 -08:00
Simon Mo 0f621c2c7d
[Docs] Add information about using shared memory in docker (#1845) 2023-11-29 18:33:56 -08:00
Casper a921d8be9d
[DOCS] Add engine args documentation (#1741) 2023-11-22 12:31:27 -08:00
liuyhwangyh edb305584b
Support download models from www.modelscope.cn (#1588) 2023-11-17 20:38:31 -08:00
Zhuohan Li 0fc280b06c
Update the adding-model doc according to the new refactor (#1692) 2023-11-16 18:46:26 -08:00
Zhuohan Li 415d109527
[Fix] Update Supported Models List (#1690) 2023-11-16 14:47:26 -08:00
Usama Ahmed 0967102c6d
fixing typo in `tiiuae/falcon-rw-7b` model name (#1226) 2023-09-29 13:40:25 -07:00
Woosuk Kwon 202351d5bf
Add Mistral to supported model list (#1221) 2023-09-28 14:33:04 -07:00
Zhuohan Li 002800f081
Align vLLM's beam search implementation with HF generate (#857) 2023-09-04 17:29:42 -07:00
Woosuk Kwon 55b28b1eee
[Docs] Minor fixes in supported models (#920)
* Minor fix in supported models

* Add another small fix for Aquila model

---------

Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-08-31 16:28:39 -07:00
Zhuohan Li 14f9c72bfd
Update Supported Model List (#825) 2023-08-22 11:51:44 -07:00
Uranus 1b151ed181
Fix baichuan doc style (#748) 2023-08-13 20:57:31 -07:00
Zhuohan Li f7389f4763
[Doc] Add Baichuan 13B to supported models (#656) 2023-08-02 16:45:12 -07:00
Zhuohan Li 1b0bd0fe8a
Add Falcon support (new) (#592) 2023-08-02 14:04:39 -07:00
Zhuohan Li df5dd3c68e
Add Baichuan-7B to README (#494) 2023-07-25 15:25:12 -07:00
Zhuohan Li 6fc2a38b11
Add support for LLaMA-2 (#505) 2023-07-20 11:38:27 -07:00
Andre Slavescu c894836108
[Model] Add support for GPT-J (#226)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Woosuk Kwon ffa6d2f9f9
[Docs] Fix typo (#346) 2023-07-03 16:51:47 -07:00
Woosuk Kwon 404422f42e
[Model] Add support for MPT (#334) 2023-07-03 16:47:53 -07:00
Woosuk Kwon e41f06702c
Add support for BLOOM (#331) 2023-07-03 13:12:35 -07:00
Woosuk Kwon 665c48963b
[Docs] Add GPTBigCode to supported models (#213) 2023-06-22 15:05:11 -07:00
Woosuk Kwon 794e578de0
[Minor] Fix URLs (#166) 2023-06-19 22:57:14 -07:00
Woosuk Kwon b7e62d3454
Fix repo & documentation URLs (#163) 2023-06-19 20:03:40 -07:00
Zhuohan Li 0b32a987dd
Add and list supported models in README (#161) 2023-06-20 10:57:46 +08:00
Woosuk Kwon dcda03b4cb
Write README and front page of doc (#147) 2023-06-18 03:19:38 -07:00
Woosuk Kwon 0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
Woosuk Kwon 456941cfe4
[Docs] Write the `Adding a New Model` section (#138) 2023-06-05 20:01:26 -07:00
Woosuk Kwon 62ec38ea41
Document supported models (#127) 2023-06-02 22:35:17 -07:00