Commit Graph

12 Commits

Author SHA1 Message Date
Jinzhen Lin 8c32b08a86
[Kernel] Fix awq error when n is not divisable by 128 (#13227) 2025-02-13 20:07:05 -08:00
Tyler Michael Smith 6e4852ce28
[CI/Build] Suppress divide-by-zero and missing return statement warnings (#7001) 2024-08-05 16:00:01 -04:00
Tyler Michael Smith aae6d36f7e
[Kernel] Remove unused variables in awq/gemm_kernels.cu (#6908) 2024-07-29 18:01:17 -06:00
bnellnm 5467ac3196
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) 2024-06-09 16:23:30 -04:00
Michael Goin 5f6d10c14c
[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722) 2024-05-22 07:18:41 +00:00
Rex 563836496a
Refactor 2 awq gemm kernels into m16nXk32 (#2723)
Co-authored-by: Chunan Zeng <chunanzeng@Chunans-Air.attlocal.net>
2024-02-12 11:02:17 -08:00
Casper beb89f68b4
AWQ: Up to 2.66x higher throughput (#2566) 2024-01-26 23:53:17 -08:00
Woosuk Kwon 29678cd213
Minor fix on AWQ kernel launch (#1356) 2023-10-15 21:53:56 -07:00
CHU Tianxiang 980dd4a2c4
Fix overflow in awq kernel (#1295)
Co-authored-by: 楚天翔 <tianxiang.ctx@alibaba-inc.com>
2023-10-11 00:19:53 -07:00
twaka 8285736840
workaround of AWQ for Turing GPUs (#1252) 2023-10-10 19:48:16 -07:00
Woosuk Kwon 2b1c116b5a
Add minimum capability requirement for AWQ (#1064) 2023-09-18 12:02:01 -07:00
Woosuk Kwon e3e79e9e8a
Implement AWQ quantization support for LLaMA (#1032)
Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>
2023-09-16 00:03:37 -07:00