| Name | Latest commit message | Commit date |
| --- | --- | --- |
| attention | [Bugfix] Fix some narrowing conversion warnings (#20141) | 2025-06-27 09:01:28 -07:00 |
| core | [Kernel] fp4 marlin kernel (#17687) | 2025-05-10 19:58:49 -07:00 |
| cpu | [CPU] Update custom ops for the CPU backend (#20255) | 2025-07-01 07:25:03 +00:00 |
| cutlass_extensions | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00 |
| mamba | [Bugfix] Fix some narrowing conversion warnings (#20141) | 2025-06-27 09:01:28 -07:00 |
| moe | remove unused variables in marlin_template.h (#20236) | 2025-07-02 00:51:52 +00:00 |
| prepare_inputs | [MISC] Remove unused variableds in C++ (#19609) | 2025-06-15 20:05:28 -07:00 |
| quantization | [Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels (#20331) | 2025-07-04 15:06:24 +08:00 |
| quickreduce | [Feature] add quick all reduce (#19744) | 2025-06-26 20:54:24 -07:00 |
| rocm | [Fix][ROCm] Remove unused variables to fix build error on GFX11/12 (#19891) | 2025-06-27 07:14:44 -07:00 |
| sparse/cutlass | [CI] change spell checker from codespell to typos (#18711) | 2025-06-11 19:57:10 -07:00 |
| activation_kernels.cu | Modularize fused experts and integrate PPLX kernels (#15956) | 2025-05-14 13:11:54 -07:00 |
| cache.h | [Attention] MLA with chunked prefill (#12639) | 2025-02-21 15:30:12 -08:00 |
| cache_kernels.cu | Allocate kv_cache with stride order (#16605) | 2025-04-25 22:03:31 -07:00 |
| cuda_compat.h | [Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927) | 2024-06-02 14:13:26 -07:00 |
| cuda_utils.h | [Attention] MLA with chunked prefill (#12639) | 2025-02-21 15:30:12 -08:00 |
| cuda_utils_kernels.cu | [NVIDIA] Support nvfp4 quantization (#12784) | 2025-02-12 19:51:51 -08:00 |
| cuda_view.cu | [V1] Fully Transparent Implementation of CPU Offloading (#15354) | 2025-03-31 20:22:34 +08:00 |
| cumem_allocator.cpp | [core] improve error handling when wake up from sleep mode (#12981) | 2025-02-10 09:38:57 +08:00 |
| custom_all_reduce.cu | [Distributed] Add custom allreduce support for ROCM (#14125) | 2025-03-31 22:49:12 -07:00 |
| custom_all_reduce.cuh | fix: spelling (#16466) | 2025-04-11 23:24:22 -07:00 |
| custom_all_reduce_test.cu | [Distributed] Add custom allreduce support for ROCM (#14125) | 2025-03-31 22:49:12 -07:00 |
| custom_quickreduce.cu | [Feature] add quick all reduce (#19744) | 2025-06-26 20:54:24 -07:00 |
| dispatch_utils.h | Modularize fused experts and integrate PPLX kernels (#15956) | 2025-05-14 13:11:54 -07:00 |
| layernorm_kernels.cu | [Kernel] Use fused rmsnorm for some models like qwen3 series (#17735) | 2025-05-06 23:10:02 -07:00 |
| layernorm_quant_kernels.cu | dynamic distpatch of fp8 kernels (#14245) | 2025-03-11 10:54:56 -04:00 |
| ops.h | [Feature] add quick all reduce (#19744) | 2025-06-26 20:54:24 -07:00 |
| permute_cols.cu | [Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701) | 2024-09-23 13:46:26 -04:00 |
| pos_encoding_kernels.cu | [Kernel] Have rotary embeddings support tensors (#18046) | 2025-05-14 15:43:55 -07:00 |
| sampler.cu | [KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437) | 2025-06-03 21:13:01 -07:00 |
| torch_bindings.cpp | [Feature] add quick all reduce (#19744) | 2025-06-26 20:54:24 -07:00 |
| type_convert.cuh | [torch.compile] Fuse RMSNorm with quant (#9138) | 2024-11-08 21:20:08 +00:00 |