Commit Graph

15 Commits

Author SHA1 Message Date
Akash kaothalkar e515668edf
[Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER (#17153)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-05-07 22:35:03 -07:00
Yong Hoon Shin 98c89e16ff
Make key optional for rotary embedding (#17566)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-05-07 00:11:46 -07:00
Li, Jiang 550b2801ad
[CPU][Bugfix] Using custom allreduce for CPU backend (#15934)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-02 07:46:47 -07:00
Thien Tran 4f044b1d67
[Kernel][CPU] CPU MLA (#14744)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-25 09:34:59 +00:00
Gregory Shtrasberg e97f802b2d
[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00
Lu Fang 4068f4b5b5
[MISC] Replace c10::optional with std::optional (#11730)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-01-05 10:20:34 +09:00
Li, Jiang 5eda21e773
[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344) 2024-10-17 12:21:04 -04:00
Luka Govedič 5d73ae49d6
[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270) 2024-09-16 11:52:40 -07:00
bnellnm 73202dbe77
[Kernel][Misc] register ops to prevent graph breaks (#6917)
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2024-09-11 12:52:19 -07:00
Li, Jiang 0b952af458
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257) 2024-09-11 09:46:46 -07:00
Lucas Wilkinson a8d604ca2a
[Misc] Disambiguate quantized types via a new ScalarType (#6396) 2024-08-02 13:51:58 -07:00
Li, Jiang 3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) 2024-07-26 13:50:10 -07:00
Michael Goin 978aed5300
[Kernel][Attention] Separate `Attention.kv_scale` into `k_scale` and `v_scale` (#6081) 2024-07-16 15:31:32 -07:00
Roger Wang bd620b01fb
[Kernel][CPU] Add Quick `gelu` to CPU (#5717) 2024-06-21 06:39:40 +00:00
bnellnm 5467ac3196
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) 2024-06-09 16:23:30 -04:00