Default Branch

0e3fe896e2 · Support Llama 4 for fused_marlin_moe (#20457) · Updated 2025-07-04 15:55:10 +08:00

Branches

ec0ff9f64b · [V0 deprecation] Remove V0 CPU (#20437) · Updated 2025-07-04 11:26:30 +08:00 · vLLM · 16 behind, 12 ahead

2f2fcb31b8 · [Misc] Remove _maybe_ignore_quant_config from GLM4.1v (#20432) · Updated 2025-07-04 05:41:13 +08:00 · vLLM · 11 behind, 0 ahead · Included

7d092fc32c · revert skip-merge-desc · Updated 2025-07-04 04:30:45 +08:00 · vLLM · 14 behind, 3 ahead

a2d62f9d94 · MLA FlashInfer Ragged Prefill Support · Updated 2025-07-03 00:29:26 +08:00 · vLLM · 319 behind, 1 ahead

f8768f5244 · Remove executable flag on a few files · Updated 2025-07-02 21:58:53 +08:00 · vLLM · 42 behind, 1 ahead

8867a8b052 · Change default model to Qwen3-0.6B · Updated 2025-07-02 04:39:38 +08:00 · vLLM · 68 behind, 1 ahead

515a1915d7 · update · Updated 2025-07-02 03:55:37 +08:00 · vLLM · 67 behind, 15 ahead

8d6f411247 · fix · Updated 2025-07-02 02:24:59 +08:00 · vLLM · 67 behind, 2 ahead

17bccecb1c · add mtbench dataste · Updated 2025-06-30 13:30:12 +08:00 · vLLM · 1351 behind, 2 ahead

b801bf30d7 · iterate · Updated 2025-06-29 06:21:17 +08:00 · vLLM · 123 behind, 2 ahead

14a6efb83e · hack for topk ids · Updated 2025-06-26 04:46:41 +08:00 · vLLM · 168 behind, 1 ahead

e53382cc2e · Sage Moore fixes for full cuda graph support for DeepEP+DeepGEMM LL · Updated 2025-06-24 23:21:52 +08:00 · vLLM · 196 behind, 1 ahead

fcec8c8827 · add debug cruft · Updated 2025-06-21 04:37:37 +08:00 · vLLM · 283 behind, 12 ahead

86bfededba · [Do not merge] Cache model info · Updated 2025-06-19 13:31:33 +08:00 · vLLM · 267 behind, 1 ahead

e17250f0d2 · fix precommit · Updated 2025-06-19 12:17:43 +08:00 · vLLM · 270 behind, 1 ahead

265202c82f · Improve deep gemm logging · Updated 2025-06-12 09:01:45 +08:00 · vLLM · 393 behind, 1 ahead

b6553be1bc · [Misc] Slight improvement of the BNB (#19418) · Updated 2025-06-10 21:51:49 +08:00 · vLLM · 424 behind, 0 ahead · Included

ca15f0afe6 · ci(Mergify): configuration update · Updated 2025-06-09 15:44:44 +08:00 · vLLM · 455 behind, 1 ahead

d3b51c9bba · fix build · Updated 2025-06-09 08:38:37 +08:00 · vLLM · 700 behind, 10 ahead

9a76ef07b9 · Add pandas and datasets for benchmarks · Updated 2025-06-04 21:51:59 +08:00 · vLLM · 530 behind, 1 ahead