Commit Graph

7204 Commits

Author SHA1 Message Date
Zhuohan Li df5dd3c68e
Add Baichuan-7B to README (#494) 2023-07-25 15:25:12 -07:00
MoeedDar 2d867b55fa
Fixed "tensor parallel is not defined" error (#564) 2023-07-25 14:16:51 -07:00
Tao Peng d7a1c6d614
Fix paged attention testing. (#495)
Signed-off-by: Tao Peng <jiankeng.pt@alibaba-inc.com>
2023-07-24 21:01:56 -07:00
Zhuohan Li 7d5a155e4a
[Fix] Fix GPTBigCode for distributed execution (#503) 2023-07-24 18:36:33 -07:00
leegohi04517 1dde34e0f8
Fix "GPTJConfig has no attribute rotary" (#532) 2023-07-24 11:29:30 -07:00
Zhuohan Li 6fc2a38b11
Add support for LLaMA-2 (#505) 2023-07-20 11:38:27 -07:00
Antoni Baum c487a221ee
Fix bad assert in initialize_cluster if PG already exists (#526) 2023-07-19 23:17:12 -07:00
Antoni Baum 9925c17940
Ray placement group support (#397) 2023-07-19 22:49:31 -07:00
Ricardo Lu 8c4b2592fb
fix: enable trust-remote-code in api server & benchmark. (#509) 2023-07-19 17:06:15 -07:00
WRH cf21a9bd5c
support trust_remote_code in benchmark (#518) 2023-07-19 17:02:40 -07:00
Massimiliano Pronesti 16c3e295a8
fix(ray_utils): ignore re-init error (#465) 2023-07-19 17:01:19 -07:00
Song bda41c70dd
Hotfix: attn ALiBi w/o head mapping (#496)
Co-authored-by: oliveryuan <oliveryuan@basemind.com>
2023-07-18 11:31:48 -07:00
Lily Liu 453bafb96f
Merge pull request #498 from MoeedDar/main
Fixed old name reference for max_seq_len
2023-07-18 09:22:56 -07:00
MoeedDar 328d231c17
Fixed old name reference for max_seq_len 2023-07-18 16:47:59 +01:00
Lily Liu b4b195b360
fix max seq len (#489) 2023-07-17 23:20:20 -07:00
codethazine 20b0d88d16
Add support for Baichuan (#365) 2023-07-17 13:50:55 -07:00
Zhuohan Li 2bdea7ac11
[Fix] Fix the condition of max_seq_len (#477) 2023-07-17 00:33:48 -04:00
Zhanghao Wu 58df2883cb
[Doc] Add doc for running vLLM on the cloud (#426)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-16 13:37:14 -07:00
Zhangir Azerbayev 6d7d95a70a
Offload port selection to OS (#467) 2023-07-15 23:11:02 -07:00
Zhuohan Li 96853af5a8
Optimize MQA Kernel (#452) 2023-07-14 20:06:40 -04:00
Wen Sun dbed69058c
Fix the `KeyError` when loading bloom-based models (#441) 2023-07-13 21:58:09 -07:00
panda 7b6ae94059
Add vocab padding for LLaMA (support WizardLM) (#411) 2023-07-13 23:56:22 -04:00
xcnick c6dfc3cdbe
Fix handling of special tokens in decoding. (#418) 2023-07-12 11:14:56 -04:00
Keming 51be365143
fix: freeze pydantic to v1 (#429) 2023-07-12 11:10:55 -04:00
Andre Slavescu c894836108
[Model] Add support for GPT-J (#226)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Fazlul Shahriar 75beba29b5
Don't try to load training_args.bin (#373) 2023-07-08 15:26:28 -07:00
Woosuk Kwon ddfdf470ae
Add trust_remote_code arg to get_config (#405) 2023-07-08 15:24:17 -07:00
Woosuk Kwon b6fbb9a565
Sort the outputs before return (#402) 2023-07-08 14:48:18 -07:00
Lily Liu 2179e4f4c5
avoid python list copy in sequence initialization (#401) 2023-07-08 12:42:08 -07:00
codethazine a945fcc2ae
Add trust-remote-code flag to handle remote tokenizers (#364) 2023-07-07 11:04:58 -07:00
Nicolas Frenay be54f8e5c4
[Fix] Change /generate response-type to json for non-streaming (#374) 2023-07-06 18:15:17 -07:00
Ricardo Lu b396cb4998
fix: only send [DONE] once when streaming responses. (#378) 2023-07-06 18:08:40 -07:00
Woosuk Kwon 1c395b4eaa
Bump up the version (#300) 2023-07-04 21:41:53 -07:00
akxxsb 3d64cf019e
[Server] use fastchat.model.model_adapter.get_conversation_template method to get model template (#357) 2023-07-04 21:39:59 -07:00
Zhuohan Li 98fe8cb542
[Server] Add option to specify chat template for chat endpoint (#345) 2023-07-03 23:01:56 -07:00
Woosuk Kwon ffa6d2f9f9
[Docs] Fix typo (#346) 2023-07-03 16:51:47 -07:00
Woosuk Kwon 404422f42e
[Model] Add support for MPT (#334) 2023-07-03 16:47:53 -07:00
coolcloudcol 7717d0838b
Fix an endless loop issue when engine_step throws a RuntimeError (#339) 2023-07-03 15:22:28 -07:00
Zhuohan Li 42e0c1df78
[Quality] Add CI for formatting (#343) 2023-07-03 14:50:56 -07:00
Woosuk Kwon e41f06702c
Add support for BLOOM (#331) 2023-07-03 13:12:35 -07:00
Zhuohan Li d6fa1be3a8
[Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
Zhuohan Li 0ffded812a
[Fix] Better error message for batched prompts (#342) 2023-07-03 09:27:31 -07:00
Michele Catalano 0bd2a573a5
Allow sending a list of str for the prompt on the OpenAI demo endpoint /v1/completions (#323)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-03 09:17:50 -07:00
Ricardo Lu 49b26e2cec
feat: add ChatCompletion endpoint in OpenAI demo server. (#330) 2023-07-02 22:54:33 -07:00
Lily Liu dafd924c1f
Raise error for long prompt (#273) 2023-06-30 18:48:49 -07:00
Zhuohan Li 598dc4b79a
[Fix] Weight loading for GPTBigCode (#313) 2023-06-29 22:14:17 -07:00
Zhuohan Li 85de093472
[Fix] Do not pin memory when in WSL (#312) 2023-06-29 15:00:21 -07:00
Zhanghao Wu f72297562f
Add news for the vllm+skypilot example (#314) 2023-06-29 12:32:37 -07:00
Bayang 9d27b09d12
Update README.md (#306) 2023-06-29 06:52:15 -07:00
Woosuk Kwon 998d9d1509
[Tokenizer] Add tokenizer mode (#298) 2023-06-28 14:19:22 -07:00