vLLM/vllm - vllm - Gitea

Commit Graph

Author	SHA1	Message	Date
wenyujin333	d6ea427f04	[Model] Add support for Qwen2MoeModel (#3346)	2024-03-28 15:19:59 +00:00
Woosuk Kwon	6d9aa00fc4	[Docs] Add Command-R to supported models (#3669)	2024-03-27 15:20:00 -07:00
Megha Agarwal	e24336b5a7	[Model] Add support for DBRX (#3660)	2024-03-27 13:01:46 -07:00
Woosuk Kwon	e66b629c04	[Misc] Minor fix in KVCache type (#3652)	2024-03-26 23:14:06 -07:00
Jee Li	76879342a3	[Doc]add lora support (#3649)	2024-03-27 02:06:46 +00:00
SangBin Cho	01bfb22b41	[CI] Try introducing isort. (#3495)	2024-03-25 07:59:47 -07:00
youkaichao	42bc386129	[CI/Build] respect the common environment variable MAX_JOBS (#3600)	2024-03-24 17:04:00 -07:00
Lalit Pradhan	4c07dd28c0	[🚀 Ready to be merged] Added support for Jais models (#3183)	2024-03-21 09:45:24 +00:00
Jim Burtoft	63e8b28a99	[Doc] minor fix of spelling in amd-installation.rst (#3506)	2024-03-19 20:32:30 +00:00
Jim Burtoft	2a60c9bd17	[Doc] minor fix to neuron-installation.rst (#3505)	2024-03-19 13:21:35 -07:00
Simon Mo	ef65dcfa6f	[Doc] Add docs about OpenAI compatible server (#3288)	2024-03-18 22:05:34 -07:00
laneeee	8fa7357f2d	fix document error for value and v_vec illustration (#3421)	2024-03-15 16:06:09 -07:00
Sherlock Xu	b0925b3878	docs: Add BentoML deployment doc (#3336) Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>	2024-03-12 10:34:30 -07:00
Zhuohan Li	4c922709b6	Add distributed model executor abstraction (#3191)	2024-03-11 11:03:45 -07:00
Philipp Moritz	657061fdce	[docs] Add LoRA support information for models (#3299)	2024-03-11 00:54:51 -07:00
Roger Wang	99c3cfb83c	[Docs] Fix Unmocked Imports (#3275)	2024-03-08 09:58:01 -08:00
Jialun Lyu	27a7b070db	Add document for vllm paged attention kernel. (#2978)	2024-03-04 09:23:34 -08:00
Liangfu Chen	d0fae88114	[DOC] add setup document to support neuron backend (#2777)	2024-03-04 01:03:51 +00:00
Sage Moore	ce4f5a29fb	Add Automatic Prefix Caching (#2762) Co-authored-by: ElizaWszola <eliza@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-03-02 00:50:01 -08:00
Yuan Tang	49d849b3ab	docs: Add tutorial on deploying vLLM model with KServe (#2586) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-03-01 11:04:14 -08:00
Ganesh Jagadeesan	a8683102cc	multi-lora documentation fix (#3064)	2024-02-27 21:26:15 -08:00
Woosuk Kwon	8b430d7dea	[Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM (#3046)	2024-02-26 20:23:50 -08:00
张大成	48a8f4a7fd	Support Orion model (#2539) Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-02-26 19:17:06 -08:00
Harry Mellor	ef978fe411	Port metrics from `aioprometheus` to `prometheus_client` (#2730)	2024-02-25 11:54:00 -08:00
Zhuohan Li	a9c8212895	[FIX] Add Gemma model to the doc (#2966)	2024-02-21 09:46:15 -08:00
Isotr0py	ab3a5a8259	Support OLMo models. (#2832)	2024-02-18 21:05:15 -08:00
jvmncs	8f36444c4f	multi-LoRA as extra models in OpenAI server (#2775) how to serve the loras (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)): ```terminal $ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/ $ python -m vllm.entrypoints.api_server \ --model meta-llama/Llama-2-7b-hf \ --enable-lora \ --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH ``` the above server will list 3 separate values if the user queries `/models`: one for the base served model, and one each for the specified lora modules. in this case sql-lora and sql-lora2 point to the same underlying lora, but this need not be the case. lora config values take the same values they do in EngineArgs no work has been done here to scope client permissions to specific models	2024-02-17 12:00:48 -08:00
Philipp Moritz	317b29de0f	Remove Yi model definition, please use `LlamaForCausalLM` instead (#2854) Co-authored-by: Roy <jasonailu87@gmail.com>	2024-02-13 14:22:22 -08:00
Simon Mo	f964493274	[CI] Ensure documentation build is checked in CI (#2842)	2024-02-12 22:53:07 -08:00
Philipp Moritz	4ca2c358b1	Add documentation section about LoRA (#2834)	2024-02-12 17:24:45 +01:00
Hongxia Yang	0580aab02f	[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (#2768)	2024-02-10 23:14:37 -08:00
Philipp Moritz	931746bc6d	Add documentation on how to do incremental builds (#2796)	2024-02-07 14:42:02 -08:00
Massimiliano Pronesti	5ed704ec8c	docs: fix langchain (#2736)	2024-02-03 18:17:55 -08:00
Fengzhe Zhou	cd9e60c76c	Add Internlm2 (#2666)	2024-02-01 09:27:40 -08:00
Zhuohan Li	1af090b57d	Bump up version to v0.3.0 (#2656)	2024-01-31 00:07:07 -08:00
zhaoyang-star	9090bf02e7	Support FP8-E5M2 KV Cache (#2279) Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-01-28 16:43:54 -08:00
Hongxia Yang	6b7de1a030	[ROCm] add support to ROCm 6.0 and MI300 (#2274)	2024-01-26 12:41:10 -08:00
Junyang Lin	2832e7b9f9	fix names and license for Qwen2 (#2589)	2024-01-24 22:37:51 -08:00
LastWhisper	223c19224b	Fix the syntax error in the doc of supported_models (#2584)	2024-01-24 11:22:51 -08:00
Erfan Al-Hossami	9c1352eb57	[Feature] Simple API token authentication and pluggable middlewares (#1106)	2024-01-23 15:13:00 -08:00
Junyang Lin	94b5edeb53	Add qwen2 (#2495)	2024-01-22 14:34:21 -08:00
Hyunsung Lee	e1957c6ebd	Add StableLM3B model (#2372)	2024-01-16 20:32:40 -08:00
Simon	827cbcd37c	Update quickstart.rst (#2369)	2024-01-12 12:56:18 -08:00
Zhuohan Li	f745847ef7	[Minor] Fix the format in quick start guide related to Model Scope (#2425)	2024-01-11 19:44:01 -08:00
Jiaxiang	6549aef245	[DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011)	2024-01-11 19:26:49 -08:00
Zhuohan Li	fd4ea8ef5c	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
Shivam Thakkar	1db83e31a2	[Docs] Update installation instructions to include CUDA 11.8 xFormers (#2246)	2023-12-22 23:20:02 -08:00
Ronen Schaffer	c17daa9f89	[Docs] Fix broken links (#2222)	2023-12-20 12:43:42 -08:00
avideci	de60a3fb93	Added DeciLM-7b and DeciLM-7b-instruct (#2062)	2023-12-19 02:29:33 -08:00
kliuae	1b7c791d60	[ROCm] Fixes for GPTQ on ROCm (#2180)	2023-12-18 10:41:04 -08:00

119 Commits