vLLM/vllm

Commit Graph

Author	SHA1	Message	Date
Roger Wang	29c748930e	[CI] Fix flaky entrypoint tests (#11403 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-21 21:08:44 -08:00
Yanyi Liu	5aef49806d	[Feature] Add load generation config from model (#11164 ) Signed-off-by: liuyanyi <wolfsonliu@163.com> Signed-off-by: Yanyi Liu <wolfsonliu@163.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-12-19 10:50:38 +00:00
Michael Goin	a30482f054	[CI] Expand test_guided_generate to test all backends (#11313 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-19 04:00:38 +00:00
Michael Goin	c77eb8a33c	[Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264 )	2024-12-17 16:34:06 -08:00
Joe Runde	2d1b9baa8f	[Bugfix] Fix request cancellation without polling (#11190 )	2024-12-17 12:26:32 -08:00
kYLe	66d4b16724	[Frontend] Add OpenAI API support for input_audio (#11027 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-16 22:09:58 -08:00
Michael Goin	0064f697d3	[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-17 11:39:58 +08:00
youkaichao	551603feff	[core] overhaul memory profiling and fix backward compatibility (#10511 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-16 13:32:25 -08:00
Isotr0py	d927dbcd88	[Model] Refactor Ultravox to use merged input processor (#11198 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-16 10:09:53 +00:00
Brad Hilton	9c3dadd1c9	[Frontend] Add `logits_processors` as an extra completion argument (#11150 ) Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>	2024-12-14 16:46:42 +00:00
Cyrus Leung	0920ab9131	[Doc] Reorganize online pooling APIs (#11172 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 00:22:22 +08:00
Cyrus Leung	eeec9e3390	[Frontend] Separate pooling APIs in offline inference (#11129 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-13 10:40:07 +00:00
Jiaxin Shan	85362f028c	[Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094 ) Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-12 09:25:16 +00:00
Cyrus Leung	8f10d5e393	[Misc] Split up pooling tasks (#10820 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 01:28:00 -08:00
Isotr0py	a811dd6608	[Model] merged input processor for Phi-3-Vision models (#10977 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-09 12:55:10 -08:00
Michael Goin	8d370e91cb	[Bugfix] Fallback to outlines for complex json schemas (#10899 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-05 11:14:06 +08:00
Aaron Pham	9323a3153b	[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-12-03 15:17:00 +08:00
Cyrus Leung	d2f058e76c	[Misc] Rename embedding classes to pooling (#10801 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-01 14:36:51 +08:00
tomeras91	395b1c7454	[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server (#10635 ) Signed-off-by: Tomer Asida <tomera@ai21.com>	2024-11-27 13:21:10 -08:00
youkaichao	308cc5e21e	[ci] fix slow tests (#10698 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-27 09:26:14 -08:00
youkaichao	334d64d1e8	[ci] add vllm_test_utils (#10659 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-26 00:20:04 -08:00
Chauncey	d04b13a380	[Bug]: Authorization ignored when root_path is set (#10606 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2024-11-25 16:21:41 +00:00
Maximilien de Bayser	214efc2c3c	Support Cross encoder models (#10400 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-24 18:56:20 -08:00
Varun Vinayak Shenoy	7d8ffb344f	[Bugfix] Internal Server Error when tool_choice is incorrect. (#10567 ) Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>	2024-11-22 21:13:29 -08:00
Travis Johnson	9195dbdbca	[Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use (#10164 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-11-23 10:17:38 +08:00
Chauncey	da7e702c6f	[Bug]: When apply continue_final_message for OpenAI server, the "echo":false is ignored (#10180 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2024-11-21 16:24:32 +00:00
Guillaume Calmettes	c68f7ede6a	[Bugfix]: allow extra fields in requests to openai compatible server (#10463 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2024-11-20 16:42:21 -05:00
Cyrus Leung	32e46e000f	[Frontend] Automatic detection of chat content format from AST (#9919 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-16 13:35:40 +08:00
Mike Depinet	f67ce05d0b	[Frontend] Pythonic tool parser (#9859 ) Signed-off-by: Mike Depinet <mike@fixie.ai>	2024-11-14 04:14:34 +00:00
Robert Shaw	6ace6fba2c	[V1] `AsyncLLM` Implementation (#9826 ) Signed-off-by: Nick Hill <nickhill@us.ibm.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-11-11 23:05:38 +00:00
litianjian	28b2877d30	Online video support for VLMs (#10020 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-07 20:25:59 +00:00
Joe Runde	d58268c56a	[V1] Make v1 more testable (#9888 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-11-06 11:57:35 -08:00
tomeras91	ac04a97a9f	[Frontend] Add max_tokens prometheus metric (#9881 ) Signed-off-by: Tomer Asida <tomera@ai21.com>	2024-11-04 22:53:24 +00:00
Robert Shaw	1c45f4c385	[CI] Basic Integration Test For TPU (#9968 ) Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>	2024-11-04 11:34:26 -08:00
Cyrus Leung	ba0d892074	[Frontend] Use a proper chat template for VLM2Vec (#9912 )	2024-11-01 14:09:07 +00:00
Cyrus Leung	06386a64dd	[Frontend] Chat-based Embeddings API (#9759 )	2024-11-01 08:13:35 +00:00
Joe Runde	031a7995f3	[Bugfix][Frontend] Reject guided decoding in multistep mode (#9892 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-11-01 01:09:46 +00:00
Guillaume Calmettes	abbfb6134d	[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837 )	2024-10-30 18:15:56 -07:00
Joe Runde	67bdf8e523	[Bugfix][Frontend] Guard against bad token ids (#9634 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-29 14:13:20 -07:00
Zhong Qishuai	ef7865b4f9	[Frontend] re-enable multi-modality input in the new beam search implementation (#9427 ) Signed-off-by: Qishuai Ferdinandzhong@gmail.com	2024-10-29 11:49:47 +00:00
Vinay R Damodaran	33bab41060	[Bugfix]: Make chat content text allow type content (#9358 ) Signed-off-by: Vinay Damodaran <vrdn@hey.com>	2024-10-24 05:05:49 +00:00
Alex Brooks	150b779081	[Frontend] Enable Online Multi-image Support for MLlama (#9393 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-10-23 17:28:57 +00:00
Wallas Henrique	c0292211ce	[CI/Build] Replaced some models on tests for smaller ones (#9570 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-10-22 04:52:14 +00:00
youkaichao	76a5e13270	[core] move parallel sampling out from vllm core (#9302 )	2024-10-22 00:31:44 +00:00
Chen Zhang	5b59fe0f08	[Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger (#9530 )	2024-10-20 00:05:02 +00:00
Yue Zhang	c5eea3c8ba	[Frontend] Support simpler image input format (#9478 )	2024-10-18 23:17:07 -07:00
sasha0552	337ed76671	[Bugfix] Fix offline mode when using `mistral_common` (#9457 )	2024-10-18 18:12:32 -07:00
Cody Yu	d11bf435a0	[MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py (#9510 )	2024-10-18 14:30:55 -07:00
Cyrus Leung	051eaf6db3	[Model] Add user-configurable task for models that support both generation and embedding (#9424 )	2024-10-18 11:31:58 -07:00
Joe Runde	de4008e2ab	[Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage (#9352 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-17 22:47:27 -04:00
Chang Su	ba30942240	[Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-15 15:40:43 -07:00
Nick Hill	e9d517f276	[BugFix] Fix chat API continuous usage stats (#9357 )	2024-10-14 23:19:48 -07:00
youkaichao	cbc2ef5529	[misc] hide best_of from engine (#9261 ) Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>	2024-10-10 21:30:44 -07:00
Daniele	9a94ca4a5d	[Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537 )	2024-10-08 09:38:40 -07:00
Alex Brooks	069d3bd8d0	[Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-08 14:31:26 +00:00
Brendan Wong	8c746226c9	[Frontend] API support for beam search for MQLLMEngine (#9117 )	2024-10-08 05:51:43 +00:00
Brendan Wong	168cab6bbf	[Frontend] API support for beam search (#9087 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-05 23:39:03 -07:00
Flávia Béo	0dcc8cbe5a	Adds truncate_prompt_tokens param for embeddings creation (#8999 ) Signed-off-by: Flavia Beo <flavia.beo@ibm.com>	2024-10-04 18:31:40 +00:00
Roger Wang	26aa325f4f	[Core][VLM] Test registration for OOT multimodal models (#8717 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:38:25 -07:00
Joe Runde	062c89e7c9	[Frontend][Core] Move guided decoding params into sampling params (#8252 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-01 09:34:25 +08:00
danieljannai21	6c9ba48fde	[Frontend] Added support for HF's new `continue_final_message` parameter (#8942 )	2024-09-29 17:59:47 +00:00
Cyrus Leung	3b00b9c26c	[Core] rename`PromptInputs` and `inputs` (#8876 )	2024-09-26 20:35:15 -07:00
Nick Hill	4b377d6feb	[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829 )	2024-09-26 16:46:43 -07:00
Simon Mo	4f1ba0844b	Revert "rename PromptInputs and inputs with backward compatibility (#8760 ) (#8810 )	2024-09-25 10:36:26 -07:00
Cyrus Leung	28e1299e60	rename PromptInputs and inputs with backward compatibility (#8760 )	2024-09-25 09:36:47 -07:00
Andy	2529d09b5a	[Frontend] Batch inference for llm.chat() API (#8648 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-09-24 09:44:11 -07:00
Alexander Matveev	1a2aef3e59	Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335 )	2024-09-23 15:38:04 -07:00
Jiaxin Shan	260d40b5ea	[Core] Support Lora lineage and base model metadata management (#6315 )	2024-09-20 06:20:56 +00:00
Alexander Matveev	7c7714d856	[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-18 13:56:58 +00:00
Joe Runde	f2e263b801	[Bugfix] Offline mode fix (#8376 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-09-12 11:11:57 -07:00
Pooya Davoodi	cea95dfb94	[Frontend] Create ErrorResponse instead of raising exceptions in run_batch (#8347 )	2024-09-11 05:30:11 +00:00
Jiaxin Shan	db3bf7c991	[Core] Support load and unload LoRA in api server (#6566 ) Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-09-05 18:10:33 -07:00
Cyrus Leung	855c262a6b	[Frontend] Multimodal support in offline chat (#8098 )	2024-09-04 05:22:17 +00:00
Roger Wang	5231f0898e	[Frontend][VLM] Add support for multiple multi-modal items (#8049 )	2024-08-31 16:35:53 -07:00
Nick Hill	39178c7fbc	[Tests] Disable retries and use context manager for openai client (#7565 )	2024-08-26 21:33:17 -07:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	0b769992ec	[Bugfix]: Use float32 for base64 embedding (#7855 ) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2024-08-26 03:16:38 +00:00
youkaichao	7d9ffa2ae1	[misc][core] lazy import outlines (#7831 )	2024-08-24 00:51:38 -07:00
Tyler Rockwood	d81abefd2e	[Frontend] add json_schema support from OpenAI protocol (#7654 )	2024-08-23 23:07:24 -07:00
Pooya Davoodi	8da48e4d95	[Frontend] Publish Prometheus metrics in run_batch API (#7641 )	2024-08-23 23:04:22 -07:00
Maximilien de Bayser	e25fee57c2	[BugFix] Fix server crash on empty prompt (#7746 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-08-23 13:12:44 +00:00
Joe Runde	b903e1ba7f	[Frontend] error suppression cleanup (#7786 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-08-22 21:50:21 +00:00
Joe Runde	cde9183b40	[Bug][Frontend] Improve ZMQ client robustness (#7443 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-08-22 02:18:11 +00:00
Peter Salas	1ca0d4f86b	[Model] Add UltravoxModel and UltravoxConfig (#7615 )	2024-08-21 22:49:39 +00:00
Robert Shaw	970dfdc01d	[Frontend] Improve Startup Failure UX (#7716 )	2024-08-21 19:53:01 +00:00
Robert Shaw	f7e3b0c5aa	[Bugfix][Frontend] Fix Issues Under High Load With `zeromq` Frontend (#7394 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-21 13:34:14 -04:00
Cyrus Leung	baaedfdb2d	[mypy] Enable following imports for entrypoints (#7248 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Fei <dfdfcai4@gmail.com>	2024-08-20 23:28:21 -07:00
Robert Shaw	e3b318216d	[ Bugfix ] Fix Prometheus Metrics With `zeromq` Frontend (#7279 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-18 20:19:48 +00:00
Roger Wang	bbf55c4805	[VLM] Refactor `MultiModalConfig` initialization and profiling (#7530 )	2024-08-17 13:30:55 -07:00
nunjunj	3b19e39dc5	Chat method for offline llm (#5049 ) Co-authored-by: nunjunj <ray@g-3ff9f30f2ed650001.c.vllm-405802.internal> Co-authored-by: nunjunj <ray@g-1df6075697c3f0001.c.vllm-405802.internal> Co-authored-by: nunjunj <ray@g-c5a2c23abc49e0001.c.vllm-405802.internal> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-08-15 19:41:34 -07:00
Grant Pinkert	f878c8feb0	[Feature]: Add OpenAI server prompt_logprobs support #6508 (#7453 )	2024-08-16 02:38:08 +00:00
youkaichao	16422ea76f	[misc][plugin] add plugin system implementation (#7426 )	2024-08-13 16:24:17 -07:00
youkaichao	33e5d7e6b6	[frontend] spawn engine process from api server process (#7484 )	2024-08-13 15:40:17 -07:00
Peter Salas	00c3d68e45	[Frontend][Core] Add plumbing to support audio language models (#7446 )	2024-08-13 17:39:33 +00:00
Cyrus Leung	7025b11d94	[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410 )	2024-08-13 05:33:41 +00:00
Andrew Wang	97a6be95ba	[Misc] improve logits processors logging message (#7435 )	2024-08-13 02:29:34 +00:00
Pooya Davoodi	249b88228d	[Frontend] Support embeddings in the run_batch API (#7132 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-08-09 09:48:21 -07:00
Cyrus Leung	7eb4a51c5f	[Core] Support serving encoder/decoder models (#7258 )	2024-08-09 10:39:41 +08:00
Joe Runde	21b9c49aa3	[Frontend] Kill the server on engine death (#6594 ) Signed-off-by: Joe Runde <joe@joerun.de> Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-08-08 09:47:48 -07:00
Maximilien de Bayser	fde47d3bc2	[BugFix] Fix frontend multiprocessing hang (#7217 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>	2024-08-07 18:09:36 +00:00
Cyrus Leung	66d617e343	[Frontend] Gracefully handle missing chat template and fix CI failure (#7238 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-08-07 09:12:05 +00:00
youkaichao	dfb1a15dcb	[ci][frontend] deduplicate tests (#7101 )	2024-08-05 15:59:22 -07:00
Yihuan Bu	654bc5ca49	Support for guided decoding for offline LLM (#6878 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-04 03:12:09 +00:00
Robert Shaw	ed812a73fa	[ Frontend ] Multiprocessing for OpenAI Server with `zeromq` (#6883 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-08-02 18:27:28 -07:00
youkaichao	806949514a	[ci] set timeout for test_oot_registration.py (#7082 )	2024-08-02 10:03:24 -07:00
zifeitong	3c10591ef2	[Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (#6954 )	2024-07-31 21:13:34 -07:00
Nick Hill	9f69d8245a	[Frontend] New `allowed_token_ids` decoding request parameter (#6753 )	2024-07-29 23:37:27 +00:00
Chang Su	316a41ac1d	[Bugfix] Fix encoding_format in examples/openai_embedding_client.py (#6755 )	2024-07-24 22:48:07 -07:00
Evan Z. Liu	5689e256ba	[Frontend] Represent tokens with identifiable strings (#6626 )	2024-07-25 09:51:00 +08:00
Yehoshua Cohen	58f53034ad	[Frontend] Add Usage data in each chunk for chat_serving. #6540 (#6652 )	2024-07-23 11:41:55 -07:00
Cyrus Leung	97234be0ec	[Misc] Manage HTTP connections in one place (#6600 )	2024-07-22 21:32:02 -07:00
Cyrus Leung	739b61a348	[Frontend] Refactor prompt processing (#4028 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-22 10:13:53 -07:00
Cyrus Leung	6366efc67b	[Bugfix][Frontend] Fix missing `/metrics` endpoint (#6463 )	2024-07-19 03:55:13 +00:00
Nick Hill	e2fbaee725	[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (#6227 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-07-18 15:13:30 +08:00
Cyrus Leung	5bf35a91e4	[Doc][CI/Build] Update docs and tests to use `vllm serve` (#6431 )	2024-07-17 07:43:21 +00:00
sasha0552	7a3d2a5b95	[Frontend] Support for chat completions input in the tokenize endpoint (#5923 )	2024-07-16 20:18:09 +08:00
Joe	d92b3c5cde	[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests (#6419 )	2024-07-15 18:54:15 -07:00
zifeitong	b47008b4d2	[BugFix] BatchResponseData body should be optional (#6345 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-07-15 04:06:09 +00:00
youkaichao	41708e5034	[ci] try to add multi-node tests (#6280 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-12 21:51:48 -07:00
Yihuan Bu	b039cbbce3	[Misc] add fixture to guided processor tests (#6341 )	2024-07-12 09:55:39 -07:00
jvlunteren	f1e15da6fe	[Frontend] Continuous usage stats in OpenAI completion API (#5742 )	2024-07-05 10:37:09 -07:00
xwjiang2010	d9e98f42e4	[vlm] Remove vision language config. (#6089 ) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-03 22:14:16 +00:00
SangBin Cho	d18bab3587	[CI] Fix base url doesn't strip "/" (#6087 )	2024-07-02 21:31:25 -07:00
Murali Andoorveedu	c5832d2ae9	[Core] Pipeline Parallel Support (#4412 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-02 10:58:08 -07:00
xwjiang2010	98d6682cd1	[VLM] Remove `image_input_type` from VLM config (#5852 ) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-02 07:57:09 +00:00
llmpros	c6c240aa0a	[Frontend]: Support base64 embedding (#5935 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-06-30 23:53:00 +08:00
Cyrus Leung	9d47f64eb6	[CI/Build] [3/3] Reorganize entrypoints tests (#5966 )	2024-06-30 12:58:49 +08:00
Matt Wong	9def10664e	[Bugfix][CI/Build][Hardware][AMD] Install matching torchvision to fix AMD tests (#5949 )	2024-06-29 12:47:58 -07:00
Cyrus Leung	3b752a6555	[CI/Build] [2/3] Reorganize entrypoints tests (#5904 )	2024-06-28 07:59:18 -07:00
Cyrus Leung	e9d32d077d	[CI/Build] [1/3] Reorganize entrypoints tests (#5526 )	2024-06-27 12:43:17 +00:00
sasha0552	c54269d967	[Frontend] Add tokenize/detokenize endpoints (#5054 )	2024-06-26 16:54:22 +00:00
Matt Wong	dd793d1de5	[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422 )	2024-06-25 15:56:15 -07:00
Cyrus Leung	81fbb3655f	[CI/Build] Test both text and token IDs in batched OpenAI Completions API (#5568 )	2024-06-15 07:29:42 -04:00
Cyrus Leung	0e9164b40a	[mypy] Enable type checking for test directory (#5017 )	2024-06-15 04:45:31 +00:00
Cyrus Leung	39873476f8	[CI/Build] Simplify OpenAI server setup in tests (#5100 )	2024-06-13 11:21:53 -07:00
Cyrus Leung	640052b069	[Bugfix][Frontend] Cleanup "fix chat logprobs" (#5026 )	2024-06-10 22:36:46 -07:00
maor-ps	351d5e7b82	[Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs (#5312 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-06-11 10:30:31 +08:00
Itay Etelis	774d1035e4	[Feature][Frontend]: Continued `stream_options` implementation also in CompletionRequest (#5319 )	2024-06-10 14:22:09 +00:00
Roger Wang	7a9cb294ae	[Frontend] Add OpenAI Vision API Support (#5237 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-06-07 11:23:32 -07:00
Itay Etelis	baa15a9ec3	[Feature][Frontend]: Add support for `stream_options` in `ChatCompletionRequest` (#5135 )	2024-06-07 03:29:24 +00:00
Matthew Goldey	828da0d44e	[Frontend] enable passing multiple LoRA adapters at once to generate() (#5300 )	2024-06-06 15:48:13 -05:00
Breno Faria	7b0a0dfb22	[Frontend][Core] Update Outlines Integration from `FSM` to `Guide` (#4109 ) Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Breno Faria <breno.faria@intrafind.com>	2024-06-05 16:49:12 -07:00
Toshiki Kataoka	06b2550cbb	[Bugfix] Support `prompt_logprobs==0` (#5217 )	2024-06-03 17:59:30 -07:00
Breno Faria	f775a07e30	[FRONTEND] OpenAI `tools` support named functions (#5032 )	2024-06-03 18:25:29 -05:00
Breno Faria	87d41c849d	[BUGFIX] [FRONTEND] Correct chat logprobs (#5029 ) Co-authored-by: Breno Faria <breno.faria@intrafind.com>	2024-05-30 02:52:14 -07:00
Cyrus Leung	5ae5ed1e60	[Core] Consolidate prompt arguments to LLM engines (#4328 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-05-28 13:29:31 -07:00
Alex Wu	52f8107cf2	[Frontend] Support OpenAI batch file format (#4794 ) Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>	2024-05-15 19:13:36 -04:00
Cyrus Leung	fc0d9dfc3a	[Frontend] Re-enable custom roles in Chat Completions API (#4758 )	2024-05-15 14:58:46 -07:00
Cyrus Leung	350f9e107f	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425 ) Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it only has been repeated twice so far, I will add another similar test suite in #4200 which would duplicate the code a third time) Also, I have moved the test utilities file (test_utils.py) to under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file in order to relative import tests/utils.py.	2024-05-13 23:50:09 +09:00
Chang Su	e254497b66	[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734 )	2024-05-11 11:30:37 -07:00
Cyrus Leung	f12b20decc	[Frontend] Move async logic outside of constructor (#4674 )	2024-05-08 22:48:33 -07:00
Sebastian Schoennenbeck	f8e7adda21	Fix/async chat serving (#2727 )	2024-05-03 11:04:14 -07:00
sasha0552	c47ba4aaa9	[Bugfix] Add validation for seed (#4529 )	2024-05-01 19:31:22 +00:00
Robert Caulk	c3845d82dc	Allow user to define whitespace pattern for outlines (#4305 )	2024-04-30 20:48:39 -07:00
Florian Greinacher	a494140433	[Frontend] Support complex message content for chat completions endpoint (#3467 ) Co-authored-by: Lily Liu <lilyliupku@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-04-30 16:28:46 -07:00
Cyrus Leung	8947bc3c15	[Frontend][Bugfix] Disallow extra fields in OpenAI API (#4355 )	2024-04-27 05:08:24 +00:00
nunjunj	91528575ec	[Frontend] multiple sampling params support (#3570 )	2024-04-20 00:11:57 -07:00
Ayush Rautwar	138485a82d	[Bugfix] Add fix for JSON whitespace (#4189 ) Co-authored-by: Ubuntu <ubuntu@ip-172-31-13-147.ec2.internal>	2024-04-19 20:49:22 -07:00
James Whedbee	e1bb2fd52d	[Bugfix] Support logprobs when using guided_json and other constrained decoding fields (#4149 )	2024-04-18 21:12:55 +00:00
Noam Gat	05434764cd	LM Format Enforcer Guided Decoding Support (#3868 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-16 05:54:57 +00:00
Dylan Hawk	95e7d4a97c	Fix echo/logprob OpenAI completion bug (#3441 ) Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>	2024-04-11 22:15:50 +00:00
SangBin Cho	67b4221a61	[Core][5/N] Fully working chunked prefill e2e (#3884 )	2024-04-10 17:56:48 -07:00
youkaichao	95baec828f	[Core] enable out-of-tree model register (#3871 )	2024-04-06 17:11:41 -07:00
Roy	f510395bbf	[BugFix][Frontend] Fix completion logprobs=0 error (#3731 )	2024-03-29 09:38:21 -07:00
Dylan Hawk	0b4997e05c	[Bugfix] API stream returning two stops (#3450 ) Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>	2024-03-25 10:14:34 -07:00
SangBin Cho	01bfb22b41	[CI] Try introducing isort. (#3495 )	2024-03-25 07:59:47 -07:00
Simon Mo	120157fd2a	Support arbitrary json_object in OpenAI and Context Free Grammar (#3211 )	2024-03-16 13:35:27 -07:00
Zhuohan Li	2f8844ba08	Re-enable the 80 char line width limit (#3305 )	2024-03-10 19:49:14 -07:00
Antoni Baum	22de45235c	Push logprob generation to LLMEngine (#3065 ) Co-authored-by: Avnish Narayan <avnish@anyscale.com>	2024-03-04 19:54:06 +00:00
felixzhu555	703e42ee4b	Add guided decoding for OpenAI API server (#2819 ) Co-authored-by: br3no <breno@veltefaria.de> Co-authored-by: simon-mo <simon.mo@hey.com>	2024-02-29 22:13:08 +00:00
Dylan Hawk	e0ade06d63	Support logit bias for OpenAI API (#3027 )	2024-02-27 11:51:53 +08:00
Jared Moore	70f3e8e3a1	Add LogProbs for Chat Completions in OpenAI (#2918 )	2024-02-26 10:39:34 +08:00
jvmncs	8f36444c4f	multi-LoRA as extra models in OpenAI server (#2775 ) how to serve the loras (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)): ```terminal $ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/ $ python -m vllm.entrypoints.api_server \ --model meta-llama/Llama-2-7b-hf \ --enable-lora \ --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH ``` The above server will list 3 separate values if the user queries `/models`: one for the base served model, and one each for the specified lora modules. In this case sql-lora and sql-lora2 point to the same underlying lora, but this need not be the case. Lora config values take the same values they do in EngineArgs. No work has been done here to scope client permissions to specific models.	2024-02-17 12:00:48 -08:00
Simon Mo	3a7dd7e367	Support Batch Completion in Server (#2529 )	2024-01-24 17:11:07 -08:00
Simon Mo	dd7e8f5f64	refactor complemention api for readability (#2499 )	2024-01-18 16:45:14 -08:00
FlorianJoncour	14cc317ba4	OpenAI Server refactoring (#2360 )	2024-01-16 21:33:14 -08:00

325 Commits