| Name | Last commit message | Last commit date |
| --- | --- | --- |
| attention | [Kernel] Raise verbose error and consolidate `num_heads/num_kv_heads` divisibility check (#19339) | 2025-06-15 13:43:48 +08:00 |
| core | [CI] change spell checker from codespell to typos (#18711) | 2025-06-11 19:57:10 -07:00 |
| mamba | [CI] change spell checker from codespell to typos (#18711) | 2025-06-11 19:57:10 -07:00 |
| moe | [Perf] Optimize `moe_align_block_size` CUDA kernel (#19572) | 2025-06-17 11:49:26 -07:00 |
| quantization | [Perf] Vectorize static / dynamic INT8 quant kernels (#19233) | 2025-06-12 06:51:41 -07:00 |
| __init__.py | [CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425) | 2024-05-13 23:50:09 +09:00 |
| allclose_default.py | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00 |
| quant_utils.py | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00 |
| test_apply_repetition_penalties.py | [KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437) | 2025-06-03 21:13:01 -07:00 |
| test_cutlass_mla_decode.py | [NVIDIA] Add Cutlass MLA backend (#17625) | 2025-06-03 21:40:26 -07:00 |
| test_flex_attention.py | Fixes IMA for TP w/ flex-attention (#19712) | 2025-06-17 04:01:50 +00:00 |
| test_fused_quant_activation.py | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00 |
| test_triton_flash_attention.py | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00 |
| utils.py | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00 |