vllm/csrc/moe
Lucas Wilkinson 7eb4255628
[BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-17 22:13:29 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| marlin_kernels | Update `pre-commit` hooks (#12475) | 2025-01-27 17:23:08 -07:00 |
| marlin_moe_wna16 | [Kernel] moe wna16 marlin kernel (#14447) | 2025-04-14 20:05:22 -07:00 |
| marlin_moe_ops.cu | [Bugfix] Fix support for dimension like integers and ScalarType (#9299) | 2024-10-17 19:08:34 +00:00 |
| moe_align_sum_kernels.cu | Optimize moe_align_block_size for deepseek_v3 (#12850) | 2025-02-13 18:43:37 -05:00 |
| moe_ops.h | [ROCm][Bugfix] Ensure that the moe_wna16_gemm kernel is not built on ROCm platforms. (#14629) | 2025-03-12 08:00:28 -04:00 |
| moe_wna16.cu | [BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801) | 2025-04-17 22:13:29 -07:00 |
| moe_wna16_utils.h | [Kernel] moe wna16 cuda kernel (#13321) | 2025-03-10 20:12:40 -04:00 |
| topk_softmax_kernels.cu | [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) | 2024-06-09 16:23:30 -04:00 |
| torch_bindings.cpp | [Kernel] moe wna16 marlin kernel (#14447) | 2025-04-14 20:05:22 -07:00 |