vllm/kernels at 55f1a468d97fbf9387e577e901b3f290ed8aa15b - vllm

Lucas Wilkinson 4e1c6a0264 [Bugfix] fix rotary embedding test for _get_padded_tensor_shape (#18229) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>		2025-05-16 01:32:45 +00:00
attention	[Bugfix] Fix fp8 tests for triton_unified_attention for Triton 3.3 (#18013)	2025-05-15 13:26:34 +08:00
core	[Bugfix] fix rotary embedding test for _get_padded_tensor_shape (#18229)	2025-05-16 01:32:45 +00:00
mamba	[Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels (#17146)	2025-05-06 17:59:30 -07:00
moe	Modularize fused experts and integrate PPLX kernels (#15956)	2025-05-14 13:11:54 -07:00
quantization	Modularize fused experts and integrate PPLX kernels (#15956)	2025-05-14 13:11:54 -07:00
__init__.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
allclose_default.py	[Misc] Add SPDX-License-Identifier headers to python source files (#12628)	2025-02-02 11:58:18 -08:00
quant_utils.py	Add missing rocm_skinny_gemms kernel test to CI (#17060)	2025-04-24 07:49:37 -07:00
test_cutlass_mla_decode.py	[NVIDIA] Support Cutlass MLA for Blackwell GPUs (#16032)	2025-04-27 06:29:21 -07:00
test_fused_quant_activation.py	[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082)	2025-05-13 22:13:56 -07:00
test_triton_flash_attention.py	[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591)	2025-04-27 00:35:08 +00:00
utils.py	[Misc] Replace os environ to monkeypatch in test suite (#14516)	2025-03-16 20:35:57 -07:00