Dipika Sikka
6f2f53a82d
[Quantization] Add compressed-tensors NVFP4 MoE Support ( #19990 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
2025-06-29 22:05:40 +00:00
Dipika Sikka
6bc7b57315
[Quantization] Remove FP4 emulation; Fall-back to marlin for device < 100 ( #19563 )
2025-06-16 17:33:51 -04:00
Dipika Sikka
c123bc33f9
[Quantization] Add compressed-tensors NVFP4 support ( #18312 )
2025-06-08 09:05:55 -04:00
Dipika Sikka
94870359cd
[Quantization] Bump compressed-tensors version; update NVFP4A16 test model ( #19224 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
2025-06-06 01:21:54 -07:00
Dipika Sikka
aa49f14832
[Quantization] Skip Fp4 Test for `compressed-tensors` ( #19217 )
2025-06-05 18:21:53 +00:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText ( #19100 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Dipika Sikka
cd3edfc908
[Misc] Add compressed-tensors NVFP4A16 emulation support ( #17914 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
2025-05-11 15:58:38 +08:00
Dipika Sikka
54a66e5fee
[Misc] Update `compressed-tensors` WNA16 to support zero-points ( #14211 )
2025-04-15 07:33:51 -06:00
TJian
4965ec42d2
[FEAT] [ROCm] Add AITER int8 scaled gemm kernel ( #15433 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-03-29 03:33:56 -07:00
Robert Shaw
d4d93db2c5
[V1] V1 Enablement Oracle ( #13726 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-03-14 22:02:20 -07:00
YajieWang
6a92ff93e1
[Misc][Kernel]: Add GPTQAllSpark Quantization ( #12931 )
2025-02-28 22:30:59 -08:00
Tyler Michael Smith
09545c0a94
[Bugfix/CI] Turn test_compressed_tensors_2of4_sparse back on ( #13250 )
2025-02-13 20:19:25 -08:00
Rahul Tuli
3b2005e1db
Add: Support for Sparse24Bitmask Compressed Models
2025-02-05 13:30:43 -08:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help
ensure
we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Harry Mellor
823ab79633
Update `pre-commit` hooks ( #12475 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-27 17:23:08 -07:00
Tyler Michael Smith
aa2cd2c43d
[Bugfix] Disable w16a16 2of4 sparse CompressedTensors24 ( #12417 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-01-26 19:59:58 +08:00
Cyrus Leung
59a0192fb9
[Core] Interface for accessing model from `VllmRunner` ( #10353 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 15:00:59 +08:00
Dipika Sikka
8cef6e02dc
[Misc] add w8a8 asym models ( #11075 )
2024-12-23 13:33:20 -05:00
Tyler Michael Smith
5a9da2e6e9
[Bugfix][Build/CI] Fix sparse CUTLASS compilation on CUDA [12.0, 12.2) ( #11311 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-19 02:43:30 +00:00
Dipika Sikka
60508ffda9
[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support ( #10995 )
...
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2024-12-18 09:57:16 -05:00
Luka Govedič
bf2ddc6610
[bugfix] Fix static asymmetric quantization case ( #10334 )
...
Signed-off-by: Daniël de Kok <me@danieldk.eu>
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-11-15 09:35:11 +08:00
Michael Goin
22f8a69549
[Misc] Directly use compressed-tensors for checkpoint definitions ( #8909 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-15 15:40:25 -07:00
Luka Govedič
172d1cd276
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method ( #7271 )
2024-09-27 14:25:10 -04:00
Li, Jiang
0b952af458
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend ( #7257 )
2024-09-11 09:46:46 -07:00
Dipika Sikka
fc911880cc
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel ( #7766 )
...
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
2024-08-27 15:07:09 -07:00
Kyle Sayers
f55a9aea45
[Misc] Revert `compressed-tensors` code reuse ( #7521 )
2024-08-14 15:07:37 -07:00
Kyle Sayers
373538f973
[Misc] `compressed-tensors` code reuse ( #7277 )
2024-08-13 19:05:15 -04:00
Dipika Sikka
0f7052bc7e
[Misc] Refactor linear layer weight loading; introduce `BasevLLMParameter` and `weight_loader_v2` ( #5874 )
2024-08-07 09:17:58 -07:00
Michael Goin
fb3db61688
[CI/Build] Remove sparseml requirement from testing ( #7037 )
2024-08-01 12:00:51 -07:00
Michael Goin
9e0b558a09
[Misc] Support FP8 kv cache scales from compressed-tensors ( #6528 )
2024-07-23 04:11:50 +00:00
Robert Shaw
b675069d74
[ Misc ] Refactor Marlin Python Utilities ( #6082 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2024-07-11 15:40:11 +00:00
Robert Shaw
abfe705a02
[ Misc ] Support Fp8 via `llm-compressor` ( #6110 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-07-07 20:42:11 +00:00
Robert Shaw
62963d129e
[ Misc ] Clean Up `CompressedTensorsW8A8` ( #6113 )
2024-07-03 22:50:08 +00:00
Robert Shaw
af9ad46fca
[ Misc ] Refactor w8a8 to use `process_weights_after_load` (Simplify Weight Loading) ( #5940 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-06-30 23:06:27 +00:00
Dipika Sikka
dd248f7675
[Misc] Update `w4a16` `compressed-tensors` support to include `w8a16` ( #5794 )
2024-06-25 19:23:35 +00:00
Dipika Sikka
4a30d7e3cc
[Misc] Add per channel support for static activation quantization; update w8a8 schemes to share base classes ( #5650 )
2024-06-19 18:06:44 -04:00
Dipika Sikka
95db455e7f
[Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization ( #5542 )
2024-06-18 12:45:05 -04:00
Dipika Sikka
890d8d960b
[Kernel] `compressed-tensors` marlin 24 support ( #5435 )
2024-06-17 12:32:48 -04:00
Dipika Sikka
c2637a613b
[Kernel] `w4a16` support for `compressed-tensors` ( #5385 )
...
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-06-13 10:19:56 -04:00
Dipika Sikka
5884c2b454
[Misc] Update to comply with the new `compressed-tensors` config ( #5350 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-06-10 03:49:46 +00:00
youkaichao
8ea5e44a43
[CI/Test] improve robustness of test (vllm_runner) ( #5357 )
...
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner) (#5357 )
2024-06-08 08:59:20 +00:00
Dipika Sikka
ca3ea51bde
[Kernel] Dynamic Per-Token Activation Quantization ( #5037 )
...
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-06-07 09:36:26 -07:00
Dipika Sikka
a1242324c9
[Kernel] Initial Activation Quantization Support ( #4525 )
...
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-05-23 21:29:18 +00:00