Commit Graph

221 Commits

Author SHA1 Message Date
Nick Hill e9d517f276
[BugFix] Fix chat API continuous usage stats (#9357) 2024-10-14 23:19:48 -07:00
youkaichao cbc2ef5529
[misc] hide best_of from engine (#9261)
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
2024-10-10 21:30:44 -07:00
Daniele 9a94ca4a5d
[Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537) 2024-10-08 09:38:40 -07:00
Alex Brooks 069d3bd8d0
[Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-08 14:31:26 +00:00
Brendan Wong 8c746226c9
[Frontend] API support for beam search for MQLLMEngine (#9117) 2024-10-08 05:51:43 +00:00
Brendan Wong 168cab6bbf
[Frontend] API support for beam search (#9087)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-10-05 23:39:03 -07:00
Flávia Béo 0dcc8cbe5a
Adds truncate_prompt_tokens param for embeddings creation (#8999)
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
2024-10-04 18:31:40 +00:00
Roger Wang 26aa325f4f
[Core][VLM] Test registration for OOT multimodal models (#8717)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:38:25 -07:00
Joe Runde 062c89e7c9
[Frontend][Core] Move guided decoding params into sampling params (#8252)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-01 09:34:25 +08:00
danieljannai21 6c9ba48fde
[Frontend] Added support for HF's new `continue_final_message` parameter (#8942) 2024-09-29 17:59:47 +00:00
Nick Hill 4b377d6feb
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) 2024-09-26 16:46:43 -07:00
Alexander Matveev 1a2aef3e59
Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335) 2024-09-23 15:38:04 -07:00
Jiaxin Shan 260d40b5ea
[Core] Support Lora lineage and base model metadata management (#6315) 2024-09-20 06:20:56 +00:00
Alexander Matveev 7c7714d856
[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
Pooya Davoodi cea95dfb94
[Frontend] Create ErrorResponse instead of raising exceptions in run_batch (#8347) 2024-09-11 05:30:11 +00:00
Jiaxin Shan db3bf7c991
[Core] Support load and unload LoRA in api server (#6566)
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-09-05 18:10:33 -07:00
Roger Wang 5231f0898e
[Frontend][VLM] Add support for multiple multi-modal items (#8049) 2024-08-31 16:35:53 -07:00
Nick Hill 39178c7fbc
[Tests] Disable retries and use context manager for openai client (#7565) 2024-08-26 21:33:17 -07:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 0b769992ec
[Bugfix]: Use float32 for base64 embedding (#7855)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2024-08-26 03:16:38 +00:00
Tyler Rockwood d81abefd2e
[Frontend] add json_schema support from OpenAI protocol (#7654) 2024-08-23 23:07:24 -07:00
Pooya Davoodi 8da48e4d95
[Frontend] Publish Prometheus metrics in run_batch API (#7641) 2024-08-23 23:04:22 -07:00
Maximilien de Bayser e25fee57c2
[BugFix] Fix server crash on empty prompt (#7746)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-08-23 13:12:44 +00:00
Joe Runde b903e1ba7f
[Frontend] error suppression cleanup (#7786)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-08-22 21:50:21 +00:00
Joe Runde cde9183b40
[Bug][Frontend] Improve ZMQ client robustness (#7443)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-08-22 02:18:11 +00:00
Peter Salas 1ca0d4f86b
[Model] Add UltravoxModel and UltravoxConfig (#7615) 2024-08-21 22:49:39 +00:00
Robert Shaw 970dfdc01d
[Frontend] Improve Startup Failure UX (#7716) 2024-08-21 19:53:01 +00:00
Robert Shaw f7e3b0c5aa
[Bugfix][Frontend] Fix Issues Under High Load With `zeromq` Frontend (#7394)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-21 13:34:14 -04:00
Cyrus Leung baaedfdb2d
[mypy] Enable following imports for entrypoints (#7248)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
2024-08-20 23:28:21 -07:00
Robert Shaw e3b318216d
[ Bugfix ] Fix Prometheus Metrics With `zeromq` Frontend (#7279)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-18 20:19:48 +00:00
Roger Wang bbf55c4805
[VLM] Refactor `MultiModalConfig` initialization and profiling (#7530) 2024-08-17 13:30:55 -07:00
Grant Pinkert f878c8feb0
[Feature]: Add OpenAI server prompt_logprobs support #6508 (#7453) 2024-08-16 02:38:08 +00:00
youkaichao 16422ea76f
[misc][plugin] add plugin system implementation (#7426) 2024-08-13 16:24:17 -07:00
youkaichao 33e5d7e6b6
[frontend] spawn engine process from api server process (#7484) 2024-08-13 15:40:17 -07:00
Peter Salas 00c3d68e45
[Frontend][Core] Add plumbing to support audio language models (#7446) 2024-08-13 17:39:33 +00:00
Cyrus Leung 7025b11d94
[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410) 2024-08-13 05:33:41 +00:00
Andrew Wang 97a6be95ba
[Misc] improve logits processors logging message (#7435) 2024-08-13 02:29:34 +00:00
Pooya Davoodi 249b88228d
[Frontend] Support embeddings in the run_batch API (#7132)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-09 09:48:21 -07:00
Cyrus Leung 7eb4a51c5f
[Core] Support serving encoder/decoder models (#7258) 2024-08-09 10:39:41 +08:00
Joe Runde 21b9c49aa3
[Frontend] Kill the server on engine death (#6594)
Signed-off-by: Joe Runde <joe@joerun.de>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-08-08 09:47:48 -07:00
Maximilien de Bayser fde47d3bc2
[BugFix] Fix frontend multiprocessing hang (#7217)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-08-07 18:09:36 +00:00
Cyrus Leung 66d617e343
[Frontend] Gracefully handle missing chat template and fix CI failure (#7238)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-08-07 09:12:05 +00:00
youkaichao dfb1a15dcb
[ci][frontend] deduplicate tests (#7101) 2024-08-05 15:59:22 -07:00
Yihuan Bu 654bc5ca49
Support for guided decoding for offline LLM (#6878)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 03:12:09 +00:00
Robert Shaw ed812a73fa
[ Frontend ] Multiprocessing for OpenAI Server with `zeromq` (#6883)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-02 18:27:28 -07:00
youkaichao 806949514a
[ci] set timeout for test_oot_registration.py (#7082) 2024-08-02 10:03:24 -07:00
zifeitong 3c10591ef2
[Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (#6954) 2024-07-31 21:13:34 -07:00
Nick Hill 9f69d8245a
[Frontend] New `allowed_token_ids` decoding request parameter (#6753) 2024-07-29 23:37:27 +00:00
Chang Su 316a41ac1d
[Bugfix] Fix encoding_format in examples/openai_embedding_client.py (#6755) 2024-07-24 22:48:07 -07:00
Evan Z. Liu 5689e256ba
[Frontend] Represent tokens with identifiable strings (#6626) 2024-07-25 09:51:00 +08:00
Yehoshua Cohen 58f53034ad
[Frontend] Add Usage data in each chunk for chat_serving. #6540 (#6652) 2024-07-23 11:41:55 -07:00
Cyrus Leung 97234be0ec
[Misc] Manage HTTP connections in one place (#6600) 2024-07-22 21:32:02 -07:00
Cyrus Leung 739b61a348
[Frontend] Refactor prompt processing (#4028)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
Cyrus Leung 6366efc67b
[Bugfix][Frontend] Fix missing `/metrics` endpoint (#6463) 2024-07-19 03:55:13 +00:00
Nick Hill e2fbaee725
[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (#6227)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-18 15:13:30 +08:00
Cyrus Leung 5bf35a91e4
[Doc][CI/Build] Update docs and tests to use `vllm serve` (#6431) 2024-07-17 07:43:21 +00:00
sasha0552 7a3d2a5b95
[Frontend] Support for chat completions input in the tokenize endpoint (#5923) 2024-07-16 20:18:09 +08:00
Joe d92b3c5cde
[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests (#6419) 2024-07-15 18:54:15 -07:00
zifeitong b47008b4d2
[BugFix] BatchResponseData body should be optional (#6345)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-15 04:06:09 +00:00
youkaichao 41708e5034
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-12 21:51:48 -07:00
Yihuan Bu b039cbbce3
[Misc] add fixture to guided processor tests (#6341) 2024-07-12 09:55:39 -07:00
jvlunteren f1e15da6fe
[Frontend] Continuous usage stats in OpenAI completion API (#5742) 2024-07-05 10:37:09 -07:00
xwjiang2010 d9e98f42e4
[vlm] Remove vision language config. (#6089)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
SangBin Cho d18bab3587
[CI] Fix base url doesn't strip "/" (#6087) 2024-07-02 21:31:25 -07:00
Murali Andoorveedu c5832d2ae9
[Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 10:58:08 -07:00
xwjiang2010 98d6682cd1
[VLM] Remove `image_input_type` from VLM config (#5852)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
llmpros c6c240aa0a
[Frontend]: Support base64 embedding (#5935)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-06-30 23:53:00 +08:00
Cyrus Leung 9d47f64eb6
[CI/Build] [3/3] Reorganize entrypoints tests (#5966) 2024-06-30 12:58:49 +08:00
Cyrus Leung 5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Chang Su e254497b66
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) 2024-05-11 11:30:37 -07:00
Cyrus Leung f12b20decc
[Frontend] Move async logic outside of constructor (#4674) 2024-05-08 22:48:33 -07:00
Sebastian Schoennenbeck f8e7adda21
Fix/async chat serving (#2727) 2024-05-03 11:04:14 -07:00