Author | Commit | Message | Date
Harry Mellor | 40896bdf3f | `pre-commit autoupdate` (#17380); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-29 06:46:55 -07:00
Lu Fang | 051da7efe3 | Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160); Signed-off-by: Lu Fang <lufang@fb.com>; Co-authored-by: Richard Barnes <rbarnes@meta.com> | 2025-03-25 15:36:45 +08:00
Harry Mellor | 823ab79633 | Update `pre-commit` hooks (#12475); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-01-27 17:23:08 -07:00
Tyler Michael Smith | e2251109c7 | [Kernel] Remove if-else with identical branches in marlin 2:4 (#10687); Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> | 2024-11-26 22:55:32 -08:00
Lucas Wilkinson | d200972e7f | [Bugfix] Marlin 2:4 temp fix for large M dim (>256) (#10464); Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> | 2024-11-19 19:40:33 -08:00
bnellnm | eca2c5f7c0 | [Bugfix] Fix support for dimension like integers and ScalarType (#9299) | 2024-10-17 19:08:34 +00:00
Lucas Wilkinson | aeb37c2a72 | [CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845) | 2024-10-03 22:55:25 -04:00
Lucas Wilkinson | a8d604ca2a | [Misc] Disambiguate quantized types via a new ScalarType (#6396) | 2024-08-02 13:51:58 -07:00
Tyler Michael Smith | 61a97c32f6 | [Kernel] Fix marlin divide-by-zero warnings (#6904) | 2024-07-30 01:26:07 +00:00
Tyler Michael Smith | b23ce92032 | [Bugfix] Fix CUDA version check for mma warning suppression (#5642) | 2024-06-18 23:48:49 +00:00
Tyler Michael Smith | 348616ac4b | [Kernel] Suppress mma.sp warning on CUDA 12.5 and later (#5401) | 2024-06-14 10:02:00 -07:00
bnellnm | 5467ac3196 | [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) | 2024-06-09 16:23:30 -04:00
Simon Mo | e9d3aa04f6 | Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)" (#5149) | 2024-05-30 22:00:26 -07:00
Alexander Matveev | 6d21fa1cad | [Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5) (#5136) | 2024-05-30 21:02:11 -05:00
Alexander Matveev | 6066253296 | Marlin 24 prefill performance improvement (about 25% better on average) (#4983) | 2024-05-23 02:39:27 -04:00
Michael Goin | 5f6d10c14c | [CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722) | 2024-05-22 07:18:41 +00:00
Alexander Matveev | 6979ade384 | Add GPTQ Marlin 2:4 sparse structured support (#4790); Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> | 2024-05-16 12:56:15 -04:00