Russell Bryant
|
6d0df0ebeb
|
[Docs] Generate correct github links for decorated functions (#17125)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-24 10:39:43 -07:00 |
Harry Mellor
|
0fa939e2d1
|
Improve configs - `LoRAConfig` + `PromptAdapterConfig` (#16980)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 10:29:34 -07:00 |
Harry Mellor
|
0422ce109f
|
Add `:markdownhelp:` to `EngineArgs` docs so markdown docstrings render properly (#17124)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 10:28:45 -07:00 |
Eyshika Agarwal
|
47bdee409c
|
Molmo Requirements (#17026)
Signed-off-by: Eyshika Agarwal <eyshikaengineer@gmail.com>
Signed-off-by: eyshika <eyshikaengineer@gmail.com>
|
2025-04-24 10:08:37 -07:00 |
Atilla
|
49f189439d
|
existing torch installation pip command fix for docs (#17059)
|
2025-04-24 10:07:21 -07:00 |
Aaruni Aggarwal
|
5adf6f6b7f
|
Updating builkite job for IBM Power (#17111)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
|
2025-04-24 10:06:17 -07:00 |
Russell Bryant
|
4115f19958
|
[CI] Add automation for the `tool-calling` github label (#17118)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-24 09:22:00 -07:00 |
Mark McLoughlin
|
340d7b1b21
|
[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-24 08:57:40 -07:00 |
Reid
|
1bcbcbf574
|
[Misc] refactor example series - structured outputs (#17040)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-24 07:49:48 -07:00 |
Michael Goin
|
82e43b2d7e
|
Add missing rocm_skinny_gemms kernel test to CI (#17060)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-24 07:49:37 -07:00 |
wang.yuqi
|
67309a1cb5
|
[Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970)
|
2025-04-24 07:06:28 -07:00 |
Shanshan Shen
|
b724afe343
|
[V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (#16954)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-04-24 06:15:03 -07:00 |
Harry Mellor
|
21f4f1c9a4
|
Improve static type checking in `LoRAModelRunnerMixin` (#17104)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 06:14:47 -07:00 |
Isotr0py
|
b0c1f6202d
|
[Misc] Remove OLMo2 config copy (#17066)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-04-24 06:14:32 -07:00 |
Rui Qiao
|
c0dfd97519
|
[V1][PP] Optimization: continue scheduling prefill chunks (#17080)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-04-24 05:27:08 -07:00 |
Harry Mellor
|
a9138e85b1
|
Fix OOT registration test (#17099)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 04:44:12 -07:00 |
Harry Mellor
|
0a05ed57e6
|
Simplify `TokenizerGroup` (#16790)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 04:43:56 -07:00 |
Michael Goin
|
14288d1332
|
Disable enforce_eager for V1 TPU sampler and structured output tests (#17016)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-24 02:50:09 -07:00 |
Woosuk Kwon
|
b411418ff0
|
[Chore] Remove Sampler from Model Code (#17084)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-24 02:49:33 -07:00 |
omer-dayan
|
2bc0f72ae5
|
Add docs for runai_streamer_sharded (#17093)
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-04-24 01:03:21 -07:00 |
Reid
|
9c1244de57
|
[doc] update to hyperlink (#17096)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-24 00:58:08 -07:00 |
Reid
|
db2f8d915c
|
[V1] Update structured output (#16812)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-23 23:57:17 -07:00 |
张宇
|
6167c0e5d2
|
[Bugfix][Core] add seq_id_to_seq_group clearing to avoid memory leak when s… (#16472)
Signed-off-by: 开哲 <kaizhe.zy@alibaba-inc.com>
Co-authored-by: 开哲 <kaizhe.zy@alibaba-inc.com>
|
2025-04-24 11:25:37 +08:00 |
Areeb Syed
|
ed2e464653
|
Addendum Fix to support FIPS enabled machines with MD5 hashing (#17043)
Signed-off-by: sydarb <areebsyed237@gmail.com>
|
2025-04-23 19:55:00 -07:00 |
Harry Mellor
|
2c8ed8ee48
|
More informative error when using Transformers backend (#16988)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-23 19:54:03 -07:00 |
Michael Goin
|
ed50f46641
|
[Bugfix] Enable V1 usage stats (#16986)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-23 19:54:00 -07:00 |
Woosuk Kwon
|
46e678bcff
|
[Minor] Use larger batch sizes for A100/B100/B200/MI300x (#17073)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-23 19:18:59 -07:00 |
Chen Xia
|
6b2427f995
|
[Quantization]add prefix for commandA quantized model (#17017)
|
2025-04-23 17:32:40 -07:00 |
Sangyeon Cho
|
b07d741661
|
[CI/Build] workaround for CI build failure (#17070)
Signed-off-by: csy1204 <josang1204@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-04-23 16:14:18 -07:00 |
Woosuk Kwon
|
41fb013d29
|
[V1][Spec Decode] Always use argmax for sampling draft tokens (#16899)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-23 14:57:43 -07:00 |
Yong Hoon Shin
|
32d4b669d0
|
[BugFix][V1] Fix int32 token index overflow when preparing input ids (#16806)
|
2025-04-23 12:12:35 -07:00 |
Travis Johnson
|
3cde34a4a4
|
[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2025-04-23 18:34:41 +00:00 |
Harry Mellor
|
bdb3660312
|
Use `@property` and private field for `data_parallel_rank_local` (#17053)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-23 08:50:08 -07:00 |
Harry Mellor
|
f3a21e9c68
|
`CacheConfig.block_size` should always be `int` when used (#17052)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-23 08:50:05 -07:00 |
Harry Mellor
|
8e630d680e
|
Improve Transformers backend model loading QoL (#17039)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-23 07:33:51 -07:00 |
Russell Bryant
|
af869f6dff
|
[CI] Update structured-output label automation (#17055)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-23 07:33:14 -07:00 |
Harry Mellor
|
53c0fa1e25
|
Ensure that `pid` passed to `kill_process_tree` is `int` for `mypy` (#17051)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-23 07:32:26 -07:00 |
Michael Yao
|
f7912cba3d
|
[Doc] Add top anchor and a note to quantization/bitblas.md (#17042)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-23 07:32:16 -07:00 |
Michael Goin
|
6317a5174a
|
Categorize `tests/kernels/` based on kernel type (#16799)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-23 09:21:07 -04:00 |
Michael Goin
|
aa72d9a4ea
|
Mistral-format support for compressed-tensors (#16803)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-23 08:46:23 -04:00 |
Russell Bryant
|
ce17db8085
|
[CI] Run v1/test_serial_utils.py in CI (#16996)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-23 01:13:34 -07:00 |
Chauncey
|
8c87a9ad46
|
[Bugfix] Fix AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers (#16964)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-04-23 07:24:09 +00:00 |
huafeng
|
ec69124eb4
|
[Misc] Improve readability of get_open_port function. (#17024)
Signed-off-by: gitover22 <qidizou88@gmail.com>
|
2025-04-23 06:16:53 +00:00 |
Lucas Wilkinson
|
d0da99fb70
|
[BugFix] llama4 fa3 fix - RuntimeError: scheduler_metadata must have shape (metadata_size) (#16998)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-22 21:49:24 -07:00 |
Nick Hill
|
b2f195c429
|
[V1] Avoid socket errors during shutdown when requests are in in-flight (#16807)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-23 12:36:29 +08:00 |
vllmellm
|
047797ef90
|
[Bugfix] Triton FA function takes no keyword arguments (#16902)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-04-22 21:35:24 -07:00 |
Reid
|
eb8ef4224d
|
[doc] add download path tips (#17013)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-23 04:06:30 +00:00 |
Chendi.Xue
|
56a735261c
|
[INTEL-HPU][v0] Port delayed sampling to upstream (#16949)
Signed-off-by: Michal Adamczyk <michal.adamczyk@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
|
2025-04-22 20:14:11 -07:00 |
youkaichao
|
e1cf90e099
|
[misc] tune some env vars for GB200 (#16992)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-04-23 10:59:48 +08:00 |
Chauncey
|
6bc1e30ef9
|
Revert "[Misc] Add S3 environment variables for better support of MinIO." (#17021)
|
2025-04-22 19:22:29 -07:00 |