Commit Graph

7204 Commits

Author SHA1 Message Date
Zhuohan Li df5dd3c68e
Add Baichuan-7B to README (#494) 2023-07-25 15:25:12 -07:00
MoeedDar 2d867b55fa
Fixed "tensor parallel is not defined" error (#564) 2023-07-25 14:16:51 -07:00
Tao Peng d7a1c6d614
Fix paged attention testing. (#495)
Signed-off-by: Tao Peng <jiankeng.pt@alibaba-inc.com>
2023-07-24 21:01:56 -07:00
Zhuohan Li 7d5a155e4a
[Fix] Fix GPTBigCode for distributed execution (#503) 2023-07-24 18:36:33 -07:00
leegohi04517 1dde34e0f8
Fix "GPTJConfig has no attribute rotary" (#532) 2023-07-24 11:29:30 -07:00
Zhuohan Li 6fc2a38b11
Add support for LLaMA-2 (#505) 2023-07-20 11:38:27 -07:00
Antoni Baum c487a221ee
Fix bad assert in initialize_cluster if PG already exists (#526) 2023-07-19 23:17:12 -07:00
Antoni Baum 9925c17940
Ray placement group support (#397) 2023-07-19 22:49:31 -07:00
Ricardo Lu 8c4b2592fb
fix: enable trust-remote-code in api server & benchmark. (#509) 2023-07-19 17:06:15 -07:00
WRH cf21a9bd5c
support trust_remote_code in benchmark (#518) 2023-07-19 17:02:40 -07:00
Massimiliano Pronesti 16c3e295a8
fix(ray_utils): ignore re-init error (#465) 2023-07-19 17:01:19 -07:00
Song bda41c70dd
Hotfix: attn ALiBi w/o head mapping (#496)
Co-authored-by: oliveryuan <oliveryuan@basemind.com>
2023-07-18 11:31:48 -07:00
Lily Liu 453bafb96f
Merge pull request #498 from MoeedDar/main
Fixed old name reference for max_seq_len
2023-07-18 09:22:56 -07:00
MoeedDar 328d231c17
Fixed old name reference for max_seq_len 2023-07-18 16:47:59 +01:00
Lily Liu b4b195b360
fix max seq len (#489) 2023-07-17 23:20:20 -07:00
codethazine 20b0d88d16
Add support for Baichuan (#365) 2023-07-17 13:50:55 -07:00
Zhuohan Li 2bdea7ac11
[Fix] Fix the condition of max_seq_len (#477) 2023-07-17 00:33:48 -04:00
Zhanghao Wu 58df2883cb
[Doc] Add doc for running vLLM on the cloud (#426)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-16 13:37:14 -07:00
Zhangir Azerbayev 6d7d95a70a
Offload port selection to OS (#467) 2023-07-15 23:11:02 -07:00
Zhuohan Li 96853af5a8
Optimize MQA Kernel (#452) 2023-07-14 20:06:40 -04:00
Wen Sun dbed69058c
Fix the `KeyError` when loading bloom-based models (#441) 2023-07-13 21:58:09 -07:00
panda 7b6ae94059
Add vocab padding for LLaMA (support WizardLM) (#411) 2023-07-13 23:56:22 -04:00
xcnick c6dfc3cdbe
Fix handling of special tokens in decoding. (#418) 2023-07-12 11:14:56 -04:00
Keming 51be365143
fix: freeze pydantic to v1 (#429) 2023-07-12 11:10:55 -04:00
Andre Slavescu c894836108
[Model] Add support for GPT-J (#226)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Fazlul Shahriar 75beba29b5
Don't try to load training_args.bin (#373) 2023-07-08 15:26:28 -07:00
Woosuk Kwon ddfdf470ae
Add trust_remote_code arg to get_config (#405) 2023-07-08 15:24:17 -07:00
Woosuk Kwon b6fbb9a565
Sort the outputs before return (#402) 2023-07-08 14:48:18 -07:00
Lily Liu 2179e4f4c5
avoid python list copy in sequence initialization (#401) 2023-07-08 12:42:08 -07:00
codethazine a945fcc2ae
Add trust-remote-code flag to handle remote tokenizers (#364) 2023-07-07 11:04:58 -07:00
Nicolas Frenay be54f8e5c4
[Fix] Change /generate response-type to json for non-streaming (#374) 2023-07-06 18:15:17 -07:00
Ricardo Lu b396cb4998
fix: only send [DONE] once when streaming responses. (#378) 2023-07-06 18:08:40 -07:00
Woosuk Kwon 1c395b4eaa
Bump up the version (#300) 2023-07-04 21:41:53 -07:00
akxxsb 3d64cf019e
[Server] use fastchat.model.model_adapter.get_conversation_template method to get model template (#357) 2023-07-04 21:39:59 -07:00
Zhuohan Li 98fe8cb542
[Server] Add option to specify chat template for chat endpoint (#345) 2023-07-03 23:01:56 -07:00
Woosuk Kwon ffa6d2f9f9
[Docs] Fix typo (#346) 2023-07-03 16:51:47 -07:00
Woosuk Kwon 404422f42e
[Model] Add support for MPT (#334) 2023-07-03 16:47:53 -07:00
coolcloudcol 7717d0838b
Fix an endless loop issue when engine_step throws a RuntimeError (#339) 2023-07-03 15:22:28 -07:00
Zhuohan Li 42e0c1df78
[Quality] Add CI for formatting (#343) 2023-07-03 14:50:56 -07:00
Woosuk Kwon e41f06702c
Add support for BLOOM (#331) 2023-07-03 13:12:35 -07:00
Zhuohan Li d6fa1be3a8
[Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
Zhuohan Li 0ffded812a
[Fix] Better error message for batched prompts (#342) 2023-07-03 09:27:31 -07:00
Michele Catalano 0bd2a573a5
Allow sending a list of str for the prompt on the OpenAI demo endpoint /v1/completions (#323)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-03 09:17:50 -07:00
Ricardo Lu 49b26e2cec
feat: add ChatCompletion endpoint in OpenAI demo server. (#330) 2023-07-02 22:54:33 -07:00
Lily Liu dafd924c1f
Raise error for long prompt (#273) 2023-06-30 18:48:49 -07:00
Zhuohan Li 598dc4b79a
[Fix] Weight loading for GPTBigCode (#313) 2023-06-29 22:14:17 -07:00
Zhuohan Li 85de093472
[Fix] Do not pin memory when in WSL (#312) 2023-06-29 15:00:21 -07:00
Zhanghao Wu f72297562f
Add news for the vllm+skypilot example (#314) 2023-06-29 12:32:37 -07:00
Bayang 9d27b09d12
Update README.md (#306) 2023-06-29 06:52:15 -07:00
Woosuk Kwon 998d9d1509
[Tokenizer] Add tokenizer mode (#298) 2023-06-28 14:19:22 -07:00