30 Commits

Author SHA1 Message Date
Simon Mo 02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Jinzhen Lin e73b7dfd69
[Bugfix] fix `an illegal memory access was encountered` of marlin kernel + act_order (#18245)
2025-05-16 16:02:44 -07:00
Jinzhen Lin d74e5f37bc
[Kernel] fp4 marlin kernel (#17687)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-05-10 19:58:49 -07:00
Jinzhen Lin 1d0c9d6b2d
[Kernel] some optimizations for dense marlin and moe marlin (#16850)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-05-05 09:39:30 -07:00
Harry Mellor 40896bdf3f
`pre-commit autoupdate` (#17380)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-29 06:46:55 -07:00
Jinzhen Lin d06ba4ed3f
[Kernel] moe wna16 marlin kernel (#14447)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-04-14 20:05:22 -07:00
Jinzhen Lin 2fa66ef713
[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (#15946)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-04-05 20:04:22 -07:00
Lu Fang 051da7efe3
Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
2025-03-25 15:36:45 +08:00
Jinzhen Lin 6b3cc75be0
[Kernel] allow non-contiguous input for marlin kernel (#14658)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-03-24 09:21:33 -04:00
Jinzhen Lin d0feea31c7
[Kernel] optimize performance of gptq marlin kernel when n is small (#14138)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-03-07 11:53:38 -05:00
Harry Mellor 823ab79633
Update `pre-commit` hooks (#12475)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-27 17:23:08 -07:00
wchen61 5dba257506
Resolve race conditions in Marlin kernel (#11493)
Signed-off-by: wchen61 <wchen61@foxmail.com>
2025-01-02 22:58:56 +00:00
ElizaWszola b00b33d77e
[Model][Quantization] HQQ support through Marlin kernel expansion (#9766)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
2024-11-19 13:31:12 -08:00
bnellnm eca2c5f7c0
[Bugfix] Fix support for dimension like integers and ScalarType (#9299)
2024-10-17 19:08:34 +00:00
ElizaWszola 05d686432f
[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973)
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
2024-10-04 12:34:44 -06:00
Lucas Wilkinson aeb37c2a72
[CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845)
2024-10-03 22:55:25 -04:00
bnellnm 73202dbe77
[Kernel][Misc] register ops to prevent graph breaks (#6917)
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2024-09-11 12:52:19 -07:00
Lucas Wilkinson 6aa33cb2dd
[Misc] Use scalar type to dispatch to different `gptq_marlin` kernels (#7323)
2024-08-12 14:40:13 -04:00
Tyler Michael Smith 6e4852ce28
[CI/Build] Suppress divide-by-zero and missing return statement warnings (#7001)
2024-08-05 16:00:01 -04:00
Lucas Wilkinson a8d604ca2a
[Misc] Disambiguate quantized types via a new ScalarType (#6396)
2024-08-02 13:51:58 -07:00
Tyler Michael Smith 61a97c32f6
[Kernel] Fix marlin divide-by-zero warnings (#6904)
2024-07-30 01:26:07 +00:00
Alexander Matveev 75acdaa4b6
[Kernel] Increase precision of GPTQ/AWQ Marlin kernel (#6795)
2024-07-27 17:52:33 -04:00
Alexander Matveev 396d92d5e0
[Kernel][Core] Add AWQ support to the Marlin kernel (#6612)
2024-07-21 19:41:42 -04:00
bnellnm 5467ac3196
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)
2024-06-09 16:23:30 -04:00
Michael Goin 5f6d10c14c
[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722)
2024-05-22 07:18:41 +00:00
Alexander Matveev da5a0b539d
Remove marlin warning (#4918)
2024-05-20 14:55:34 +00:00
Jinzhen Lin 99caa49106
[Kernel] add bfloat16 support for gptq marlin kernel (#4788)
2024-05-16 09:55:29 -04:00
alexm-nm e288df0632
[Bugfix] Fine-tune gptq_marlin configs to be more similar to marlin (#4626)
2024-05-08 17:14:31 -07:00
alexm-nm 7038e8b803
[Kernel] Support running GPTQ 8-bit models in Marlin (#4533)
2024-05-02 12:56:22 -04:00
Robert Shaw 73c8d677e5
[Kernel] Marlin Expansion: Support AutoGPTQ Models with Marlin (#3922)
Co-authored-by: alexm <alexm@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-04-29 09:35:34 -07:00