Harry Mellor
|
6dbe5b5c93
|
Remove checks for `None` for fields which should never be `None` (#17985)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-28 21:32:19 +00:00 |
Akshat Tripathi
|
643622ba46
|
[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend (#15655)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Signed-off-by: xihajun <junfan@krai.ai>
Signed-off-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk>
Signed-off-by: Jorge de Freitas <jorge@krai.ai>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: xihajun <junfan@krai.ai>
Co-authored-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk>
Co-authored-by: Jorge de Freitas <jorge@krai.ai>
|
2025-05-28 19:59:09 +00:00 |
Aaron Pham
|
a09c7ca9f2
|
[Chore][Spec Decode] Update check NoneType instead of assigning variables (#18836)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-05-28 18:57:19 +00:00 |
Mark McLoughlin
|
0e98964e94
|
[V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-28 18:54:12 +00:00 |
rongfu.leng
|
c68b5c63eb
|
[Misc] fix olmoe model layer can't load in tp gt 1 (#18828)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-05-28 17:36:21 +00:00 |
Aaron Pham
|
fced756923
|
[Chore] update ty configuration (#18839)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-05-28 08:59:11 -07:00 |
Alex Brooks
|
321331b8ae
|
[Core] Add Lora Support to Beam Search (#18346)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-05-28 08:58:24 -07:00 |
daniel-salib
|
6e4cea1cc5
|
decrement server_load on listen for disconnect (#18784)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
|
2025-05-28 22:15:12 +08:00 |
Reid
|
435fa95444
|
[Frontend] add run batch to CLI (#18804)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-28 07:08:57 -07:00 |
Harry Mellor
|
4c2b38ce9e
|
Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-28 12:46:04 +00:00 |
Mengqing Cao
|
d781930f90
|
[Platform][Dist] Make torch distributed process group extendable (#18763)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-05-28 10:52:34 +00:00 |
Lucas Wilkinson
|
ce75efeecb
|
[BugFix] FA2 MLA Accuracy Issue (#18807)
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
|
2025-05-28 08:59:39 +00:00 |
Richard Zou
|
aa42561e40
|
Fix PiecewiseCompileInterpreter (#17338)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-28 08:40:53 +00:00 |
wang.yuqi
|
de65fc8e1e
|
[CI] improve embed testing (#18747)
|
2025-05-28 00:16:35 -07:00 |
Cyrus Leung
|
0c492b7824
|
[Deprecation] Remove fallbacks for Embeddings API (#18795)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-28 15:09:04 +08:00 |
Cyrus Leung
|
0f0926b43f
|
[Deprecation] Remove unused sync methods in `async_timeout` (#18792)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-28 15:08:48 +08:00 |
Cyrus Leung
|
7f2c1a87e9
|
[Deprecation] Require overriding `get_dummy_text` and `get_dummy_mm_data` (#18796)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-28 15:08:35 +08:00 |
Rabi Mishra
|
b78f844a67
|
[Bugfix][FailingTest]Fix test_model_load_with_params.py (#18758)
Signed-off-by: rabi <ramishra@redhat.com>
|
2025-05-28 05:42:54 +00:00 |
RonaldBXu
|
5e13c07d00
|
[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (2) (#18781)
Signed-off-by: Ronald Xu <ronaldxu@amazon.com>
|
2025-05-28 05:09:14 +00:00 |
Divakar Verma
|
774c5fde30
|
[V1] fix torch profiling for V1 offline scenarios (#18445)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-05-28 04:16:30 +00:00 |
Guillaume Calmettes
|
9a21e331ff
|
[Bugfix]: correctly propagate error messages caught at the chat_templating step to the client (#18769)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2025-05-28 03:35:43 +00:00 |
wang.yuqi
|
3e9ce609bd
|
[Bugfix] Fix nomic max_model_len (#18755)
|
2025-05-27 20:29:53 -07:00 |
fxmarty-amd
|
794ae1f551
|
[rocm] Fix wrong attention log (#18764)
Signed-off-by: Felix Marty <felmarty@amd.com>
|
2025-05-27 19:45:41 -07:00 |
Lukas Geiger
|
d73a9457a5
|
[Core] Improve Tensor serialisation (#18774)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-28 09:46:21 +08:00 |
Luka Govedič
|
a3896c7f02
|
[Build] Fixes for CMake install (#18570)
|
2025-05-27 20:49:24 -04:00 |
cascade
|
51e98e4ffd
|
[Bugfix] Disable prefix caching by default for benchmark (#18771)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-28 08:18:09 +08:00 |
Michael Goin
|
e56f44d9ec
|
Support datasets in `vllm bench serve` and sync with benchmark_[serving,datasets].py (#18566)
|
2025-05-27 19:59:48 -04:00 |
Satyajith Chilappagari
|
e0cbad4e30
|
[Neuron] Support quantization on neuron (#18283)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
|
2025-05-27 22:10:33 +00:00 |
Carol Zheng
|
b48d5cca16
|
[CI/Build] [TPU] Fix TPU CI exit code (#18282)
Signed-off-by: Carol Zheng <cazheng@google.com>
|
2025-05-27 14:54:59 -07:00 |
Michael Goin
|
5873877241
|
[Bugfix] Mistral tool calling when content is list (#18729)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-27 09:05:37 -07:00 |
Cyrus Leung
|
696259ca01
|
[Core] Automatically cast multi-modal input dtype (#18756)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 23:45:48 +08:00 |
chunxiaozheng
|
6b6d496114
|
optimize get_kv_cache_torch_dtype (#18531)
Signed-off-by: idellzheng <idellzheng@tencent.com>
|
2025-05-27 13:08:44 +00:00 |
cascade
|
aaa4ac1c95
|
Disable prefix cache by default for benchmark (#18639)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-27 20:06:34 +08:00 |
Mark McLoughlin
|
06a0338015
|
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-27 09:37:06 +00:00 |
Cyrus Leung
|
4318c0559d
|
[CI/Build] Remove imports of built-in `re` (#18750)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 09:19:18 +00:00 |
Hyogeun Oh (오효근)
|
a68e293cb9
|
[Doc] Convert Sphinx directives ( `{class}`, `{meth}`, `{attr}`, ...) to MkDocs format for better documentation linking (#18663)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-05-27 01:44:20 -07:00 |
Shawn Huang
|
6881107948
|
[BUG FIX] minicpm (#18739)
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>
|
2025-05-27 01:04:49 -07:00 |
Kebe
|
e0f0ff87b8
|
[Build] fix cpu build missing libtbbmalloc.so (#18744)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-05-27 01:03:56 -07:00 |
maobaolong
|
c24b1572ac
|
Minor fix about MooncakeStoreConnector (#18721)
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
|
2025-05-27 08:02:28 +00:00 |
Calvin Chen
|
4693a3438c
|
[Doc] cleanup deprecated flag for doc (#18715)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-05-27 07:12:02 +00:00 |
Łukasz Durejko
|
bbd9a84dc5
|
[Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh (#18752)
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>
|
2025-05-27 00:10:26 -07:00 |
almersawi
|
a547aeb828
|
feat(rocm-support): support mamba2 on rocm (#18565)
Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
|
2025-05-27 00:07:53 -07:00 |
Reid
|
fc6d0c290f
|
[Misc] improve docs (#18734)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-27 07:07:01 +00:00 |
Cyrus Leung
|
753944fa9b
|
[Doc] Update reproducibility doc and example (#18741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 07:03:13 +00:00 |
Cyrus Leung
|
25a817f202
|
[Doc] Update OOT model docs (#18742)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 06:30:31 +00:00 |
vllmellm
|
d260f799a9
|
[FEAT] [ROCm] Upgrade AITER Fused MoE kernels. (#18271)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-05-26 23:14:07 -07:00 |
Lukas Geiger
|
b50602d5f0
|
[Model][Gemma3] Cast image pixel values already on CPU (#18732)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-27 05:42:54 +00:00 |
Isotr0py
|
1f1b1bc03b
|
[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-27 04:40:28 +00:00 |
Reid
|
1f88dbd2bb
|
[Misc] improve web section group title display (#18684)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-27 04:35:16 +00:00 |
Lukas Geiger
|
0eebd74842
|
[Model][Gemma3] Simplify image input validation (#18710)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-27 11:13:37 +08:00 |