vllm/kernels at woosuk/async-sched - vllm

TY-AMD 96453cfa83 [BugFix][V1][ROCm] Triton MLA uses V0 backend on V1 engine (#19067) Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>		2025-07-01 16:12:19 +08:00
attention	[BugFix][V1][ROCm] Triton MLA uses V0 backend on V1 engine (#19067)	2025-07-01 16:12:19 +08:00
core	[CI] change spell checker from codespell to typos (#18711)	2025-06-11 19:57:10 -07:00
mamba	[CI] change spell checker from codespell to typos (#18711)	2025-06-11 19:57:10 -07:00
moe	[Bugfix] Fix deepep tests (#20288)	2025-07-01 15:29:08 +08:00
quantization	Enable ZP Support for Machete (#20268)	2025-07-01 07:12:20 +00:00
__init__.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
allclose_default.py	[Misc] Add SPDX-FileCopyrightText (#19100)	2025-06-03 11:20:17 -07:00
quant_utils.py	[Misc] Add SPDX-FileCopyrightText (#19100)	2025-06-03 11:20:17 -07:00
test_apply_repetition_penalties.py	[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437)	2025-06-03 21:13:01 -07:00
test_cutlass_mla_decode.py	[NVIDIA] Add Cutlass MLA backend (#17625)	2025-06-03 21:40:26 -07:00
test_flex_attention.py	Fixes IMA for TP w/ flex-attention (#19712)	2025-06-17 04:01:50 +00:00
test_fused_quant_activation.py	[Misc] Add SPDX-FileCopyrightText (#19100)	2025-06-03 11:20:17 -07:00
test_triton_flash_attention.py	[Misc] Add SPDX-FileCopyrightText (#19100)	2025-06-03 11:20:17 -07:00
utils.py	[Kernels][Bugfix] Use torch op for all kernels in FusedMoE forward. Add additional testing for cudagraphs. (#19717)	2025-06-24 23:22:58 -07:00