Commit Graph

472 Commits

Author SHA1 Message Date
Robert Shaw 111815d482
[Kernel] Support Fp8 Checkpoints (Dynamic + Static) (#4332)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-04-30 21:46:12 +00:00
Robert Shaw 73c8d677e5
[Kernel] Marlin Expansion: Support AutoGPTQ Models with Marlin (#3922)
Co-authored-by: alexm <alexm@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-04-29 09:35:34 -07:00
James Fleming 2b7949c1c2
AQLM CUDA support (#3287)
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-04-23 13:59:33 -04:00
Michael Goin 53b018edcb
[Bugfix] Get available quantization methods from quantization registry (#4098) 2024-04-18 00:21:55 -07:00
SangBin Cho 67b4221a61
[Core][5/N] Fully working chunked prefill e2e (#3884) 2024-04-10 17:56:48 -07:00
youkaichao 95baec828f
[Core] enable out-of-tree model register (#3871) 2024-04-06 17:11:41 -07:00
Robert Shaw 563c1d7ec5
[CI/Build] Make Marlin Tests Green (#3753) 2024-03-30 19:18:34 -07:00
SangBin Cho 26422e477b
[Test] Make model tests run again and remove --forked from pytest (#3631)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-03-28 21:06:40 -07:00
xwjiang2010 64172a976c
[Feature] Add vision language model support. (#3042) 2024-03-25 14:16:30 -07:00
SangBin Cho 01bfb22b41
[CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
Roy ea5f14e6ff
[Bugfix][Model] Fix Qwen2 (#3554) 2024-03-22 00:18:58 +00:00
Zhuohan Li 2f8844ba08
Re-enable the 80 char line width limit (#3305) 2024-03-10 19:49:14 -07:00
Robert Shaw c0c2335ce0
Integrate Marlin Kernels for Int4 GPTQ inference (#2497)
Co-authored-by: Robert Shaw <114415538+rib-2@users.noreply.github.com>
Co-authored-by: alexm <alexm@neuralmagic.com>
2024-03-01 12:47:51 -08:00
Seonghyeon bfdcfa6a05
Support starcoder2 architecture (#3089) 2024-02-29 00:51:48 -08:00
Isotr0py ab3a5a8259
Support OLMo models. (#2832) 2024-02-18 21:05:15 -08:00
Hyunsung Lee e1957c6ebd
Add StableLM3B model (#2372) 2024-01-16 20:32:40 -08:00
avideci de60a3fb93
Added DeciLM-7b and DeciLM-7b-instruct (#2062) 2023-12-19 02:29:33 -08:00
Woosuk Kwon f8c688d746
[Minor] Add Phi 2 to supported models (#2159) 2023-12-17 02:54:57 -08:00
Woosuk Kwon f1c8520146
[BugFix] Fix input positions for long context with sliding window (#2088) 2023-12-13 12:28:13 -08:00
maximzubkov 521b35f799
Support Microsoft Phi 1.5 (#1664) 2023-11-16 14:28:39 -08:00
Woosuk Kwon 9524867701
Add Mistral 7B to `test_models` (#1366) 2023-10-16 17:49:54 -07:00
Woosuk Kwon 32b6816e55
Add tests for models (#922) 2023-09-01 11:19:43 +09:00