13 Commits

Author SHA1 Message Date
Robert Shaw b675069d74
[ Misc ] Refactor Marlin Python Utilities (#6082)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2024-07-11 15:40:11 +00:00
Robert Shaw abfe705a02
[ Misc ] Support Fp8 via `llm-compressor` (#6110)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-07-07 20:42:11 +00:00
Robert Shaw 62963d129e
[ Misc ] Clean Up `CompressedTensorsW8A8` (#6113)
2024-07-03 22:50:08 +00:00
Robert Shaw af9ad46fca
[ Misc ] Refactor w8a8 to use `process_weights_after_load` (Simplify Weight Loading) (#5940)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-06-30 23:06:27 +00:00
Dipika Sikka dd248f7675
[Misc] Update `w4a16` `compressed-tensors` support to include `w8a16` (#5794)
2024-06-25 19:23:35 +00:00
Dipika Sikka 4a30d7e3cc
[Misc] Add per channel support for static activation quantization; update w8a8 schemes to share base classes (#5650)
2024-06-19 18:06:44 -04:00
Dipika Sikka 95db455e7f
[Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization (#5542)
2024-06-18 12:45:05 -04:00
Dipika Sikka 890d8d960b
[Kernel] `compressed-tensors` marlin 24 support (#5435)
2024-06-17 12:32:48 -04:00
Dipika Sikka c2637a613b
[Kernel] `w4a16` support for `compressed-tensors` (#5385)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-06-13 10:19:56 -04:00
Dipika Sikka 5884c2b454
[Misc] Update to comply with the new `compressed-tensors` config (#5350)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-06-10 03:49:46 +00:00
youkaichao 8ea5e44a43
[CI/Test] improve robustness of test (vllm_runner) (#5357)
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner) (#5357)
2024-06-08 08:59:20 +00:00
Dipika Sikka ca3ea51bde
[Kernel] Dynamic Per-Token Activation Quantization (#5037)
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-06-07 09:36:26 -07:00
Dipika Sikka a1242324c9
[Kernel] Initial Activation Quantization Support (#4525)
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-05-23 21:29:18 +00:00