Reid
|
435fa95444
|
[Frontend] add run batch to CLI (#18804)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-28 07:08:57 -07:00 |
Harry Mellor
|
4c2b38ce9e
|
Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-28 12:46:04 +00:00 |
Mengqing Cao
|
d781930f90
|
[Platform][Dist] Make torch distributed process group extendable (#18763)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-05-28 10:52:34 +00:00 |
Lucas Wilkinson
|
ce75efeecb
|
[BugFix] FA2 MLA Accuracy Issue (#18807)
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
|
2025-05-28 08:59:39 +00:00 |
Richard Zou
|
aa42561e40
|
Fix PiecewiseCompileInterpreter (#17338)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-28 08:40:53 +00:00 |
wang.yuqi
|
de65fc8e1e
|
[CI] improve embed testing (#18747)
|
2025-05-28 00:16:35 -07:00 |
Cyrus Leung
|
0c492b7824
|
[Deprecation] Remove fallbacks for Embeddings API (#18795)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-28 15:09:04 +08:00 |
Cyrus Leung
|
0f0926b43f
|
[Deprecation] Remove unused sync methods in `async_timeout` (#18792)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-28 15:08:48 +08:00 |
Cyrus Leung
|
7f2c1a87e9
|
[Deprecation] Require overriding `get_dummy_text` and `get_dummy_mm_data` (#18796)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-28 15:08:35 +08:00 |
Rabi Mishra
|
b78f844a67
|
[Bugfix][FailingTest]Fix test_model_load_with_params.py (#18758)
Signed-off-by: rabi <ramishra@redhat.com>
|
2025-05-28 05:42:54 +00:00 |
RonaldBXu
|
5e13c07d00
|
[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (2) (#18781)
Signed-off-by: Ronald Xu <ronaldxu@amazon.com>
|
2025-05-28 05:09:14 +00:00 |
Divakar Verma
|
774c5fde30
|
[V1] fix torch profiling for V1 offline scenarios (#18445)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-05-28 04:16:30 +00:00 |
Guillaume Calmettes
|
9a21e331ff
|
[Bugfix]: correctly propagate errors message caught at the chat_templating step to the client (#18769)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2025-05-28 03:35:43 +00:00 |
wang.yuqi
|
3e9ce609bd
|
[Bugfix] Fix nomic max_model_len (#18755)
|
2025-05-27 20:29:53 -07:00 |
fxmarty-amd
|
794ae1f551
|
[rocm] Fix wrong attention log (#18764)
Signed-off-by: Felix Marty <felmarty@amd.com>
|
2025-05-27 19:45:41 -07:00 |
Lukas Geiger
|
d73a9457a5
|
[Core] Improve Tensor serialisation (#18774)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-28 09:46:21 +08:00 |
Luka Govedič
|
a3896c7f02
|
[Build] Fixes for CMake install (#18570)
|
2025-05-27 20:49:24 -04:00 |
cascade
|
51e98e4ffd
|
[Bugfix] Disable prefix caching by default for benchmark (#18771)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-28 08:18:09 +08:00 |
Michael Goin
|
e56f44d9ec
|
Support datasets in `vllm bench serve` and sync with benchmark_[serving,datasets].py (#18566)
|
2025-05-27 19:59:48 -04:00 |
Satyajith Chilappagari
|
e0cbad4e30
|
[Neuron] Support quantization on neuron (#18283)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
|
2025-05-27 22:10:33 +00:00 |
Carol Zheng
|
b48d5cca16
|
[CI/Build] [TPU] Fix TPU CI exit code (#18282)
Signed-off-by: Carol Zheng <cazheng@google.com>
|
2025-05-27 14:54:59 -07:00 |
Michael Goin
|
5873877241
|
[Bugfix] Mistral tool calling when content is list (#18729)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-27 09:05:37 -07:00 |
Cyrus Leung
|
696259ca01
|
[Core] Automatically cast multi-modal input dtype (#18756)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 23:45:48 +08:00 |
chunxiaozheng
|
6b6d496114
|
optimize get_kv_cache_torch_dtype (#18531)
Signed-off-by: idellzheng <idellzheng@tencent.com>
|
2025-05-27 13:08:44 +00:00 |
cascade
|
aaa4ac1c95
|
Disable prefix cache by default for benchmark (#18639)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-27 20:06:34 +08:00 |
Mark McLoughlin
|
06a0338015
|
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-27 09:37:06 +00:00 |
Cyrus Leung
|
4318c0559d
|
[CI/Build] Remove imports of built-in `re` (#18750)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 09:19:18 +00:00 |
Hyogeun Oh (오효근)
|
a68e293cb9
|
[Doc] Convert Sphinx directives ( `{class}`, `{meth}`, `{attr}`, ...) to MkDocs format for better documentation linking (#18663)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-05-27 01:44:20 -07:00 |
Shawn Huang
|
6881107948
|
[BUG FIX] minicpm (#18739)
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>
|
2025-05-27 01:04:49 -07:00 |
Kebe
|
e0f0ff87b8
|
[Build] fix cpu build missing libtbbmalloc.so (#18744)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-05-27 01:03:56 -07:00 |
maobaolong
|
c24b1572ac
|
Minor fix about MooncakeStoreConnector (#18721)
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
|
2025-05-27 08:02:28 +00:00 |
Calvin Chen
|
4693a3438c
|
[Doc] cleanup deprecated flag for doc (#18715)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-05-27 07:12:02 +00:00 |
Łukasz Durejko
|
bbd9a84dc5
|
[Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh (#18752)
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>
|
2025-05-27 00:10:26 -07:00 |
almersawi
|
a547aeb828
|
feat(rocm-support): support mamba2 on rocm (#18565)
Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
|
2025-05-27 00:07:53 -07:00 |
Reid
|
fc6d0c290f
|
[Misc] improve docs (#18734)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-27 07:07:01 +00:00 |
Cyrus Leung
|
753944fa9b
|
[Doc] Update reproducibility doc and example (#18741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 07:03:13 +00:00 |
Cyrus Leung
|
25a817f202
|
[Doc] Update OOT model docs (#18742)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 06:30:31 +00:00 |
vllmellm
|
d260f799a9
|
[FEAT] [ROCm] Upgrade AITER Fused MoE kernels. (#18271)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-05-26 23:14:07 -07:00 |
Lukas Geiger
|
b50602d5f0
|
[Model][Gemma3] Cast image pixel values already on CPU (#18732)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-27 05:42:54 +00:00 |
Isotr0py
|
1f1b1bc03b
|
[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-27 04:40:28 +00:00 |
Reid
|
1f88dbd2bb
|
[Misc] improve web section group title display (#18684)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-27 04:35:16 +00:00 |
Lukas Geiger
|
0eebd74842
|
[Model][Gemma3] Simplify image input validation (#18710)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-27 11:13:37 +08:00 |
Harry Mellor
|
27bebcd897
|
Convert `examples` to `ruff-format` (#18400)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-26 16:57:54 +00:00 |
Lukas Geiger
|
e7523c2e03
|
[V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs (#18608)
|
2025-05-26 11:49:36 -04:00 |
Cyrus Leung
|
a869baca73
|
[Bugfix] Fix Llama GGUF initialization (#18717)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 07:49:22 -07:00 |
Cyrus Leung
|
82e2339b06
|
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 07:38:04 -07:00 |
Cyrus Leung
|
9553fdb41e
|
[Doc] Improve API docs (#18713)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 07:33:34 -07:00 |
dylan
|
243eb9199f
|
[Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM (#18701)
|
2025-05-26 07:10:56 -07:00 |
Reid
|
0665e29998
|
[Misc] add AutoGen integration (#18712)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-05-26 13:56:18 +00:00 |
Łukasz Durejko
|
e76be06550
|
[Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI (#18709)
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>
|
2025-05-26 05:26:07 -07:00 |