Commit Graph

414 Commits

Author SHA1 Message Date
Sage Moore 460a2b1100
[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations (#10867)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-05-01 07:59:28 -07:00
TY-AMD 06ffc7e1d3
[Misc][ROCm] Exclude `cutlass_mla_decode` for ROCm build (#17289)
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
2025-04-29 10:26:42 -07:00
Harry Mellor 40896bdf3f
`pre-commit autoupdate` (#17380)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-29 06:46:55 -07:00
Richard Barnes d6da8a8ff2
[Bugfix] Fix `numel()` downcast in fused_layernorm_dynamic_per_token_quant.cu (#17316) 2025-04-28 19:23:18 -07:00
TherLF c12df53b60
[Bugfix] Fix cutlass dispatch for fp8/int8 to properly invoke M<=16 c… (#16751)
Signed-off-by: Ther-LF <2639852836@qq.com>
2025-04-27 19:38:42 -07:00
Kaixi Hou ed7a29d9f8
[NVIDIA] Support Cutlass MLA for Blackwell GPUs (#16032)
Signed-off-by: kaixih <kaixih@nvidia.com>
2025-04-27 06:29:21 -07:00
Shu Wang 9e96f56efb
Allocate kv_cache with stride order (#16605)
Signed-off-by: shuw <shuw@nvidia.com>
2025-04-25 22:03:31 -07:00
Lu Fang c8e5be35f7
[MISC][AMD] Add unused annotation to rocm kernel file (#17097)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-04-25 20:33:35 -07:00
Charlie Fu 188b7f9b8c
[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830)
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-21 20:46:22 -07:00
Varun Sundar Rabindranath 7b8a2ab76f
[Kernel] Add expert_map support to Cutlass FP8 MOE (#16861)
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
2025-04-21 20:44:32 -07:00
Lucas Wilkinson 7eb4255628
[BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-17 22:13:29 -07:00
DefTruth e82ee40de3
[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-16 03:31:39 -07:00
Jinzhen Lin d06ba4ed3f
[Kernel] moe wna16 marlin kernel (#14447)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-04-14 20:05:22 -07:00
Tianer Zhou 4a3a518722
fix: spelling (#16466)
Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
2025-04-11 23:24:22 -07:00
DefTruth e9528f6dc6
[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-11 06:50:50 -06:00
yihong 04149cce27
[BugFix] fix some typos found by typos. (#16314)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-09 03:43:59 -07:00
rongfu.leng 4e9cf8c1dd
[Bugfix] fix gettid method is not define (#16084)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-08 19:12:44 -07:00
TY-AMD 9351f91be9
[BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm (#16247)
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
2025-04-08 05:10:26 -07:00
Jinzhen Lin 2fa66ef713
[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (#15946)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-04-05 20:04:22 -07:00
Isotr0py 230b131b54
[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-04 09:38:58 -07:00
Aleksandr Malyshev e73ff24e31
[ROCM][KERNEL] Paged attention for V1 (#15720)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
2025-04-02 19:48:00 -07:00
Li, Jiang 550b2801ad
[CPU][Bugfix] Using custom allreduce for CPU backend (#15934)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-02 07:46:47 -07:00
LukasBluebaum 90969fb39a
[Kernel] Add more dtype support for GGUF dequantization (#15879)
Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>
2025-04-02 01:58:48 -07:00
Ilya Markov b7b7676d67
[Distributed] Add custom allreduce support for ROCM (#14125)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-03-31 22:49:12 -07:00
youkaichao 555aa21905
[V1] Fully Transparent Implementation of CPU Offloading (#15354)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-31 20:22:34 +08:00
Charlie Fu e85829450d
[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)
Signed-off-by: charlifu <charlifu@amd.com>
2025-03-31 04:42:18 -07:00
ElizaWszola 9239bf718e
[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
2025-03-27 00:54:44 +00:00
Szymon Ożóg a608160027
[Kernel] Fix conflicting macro names for gguf kernels (#15456)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-03-25 13:50:49 +00:00
Thien Tran 4f044b1d67
[Kernel][CPU] CPU MLA (#14744)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-25 09:34:59 +00:00
Lu Fang 051da7efe3
Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
2025-03-25 15:36:45 +08:00
Jinzhen Lin 6b3cc75be0
[Kernel] allow non-contiguous input for marlin kernel (#14658)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-03-24 09:21:33 -04:00
Lu Fang d3ccbd6350
Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
2025-03-21 10:01:11 +08:00
Serena 64fc2193dc
[Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros (#14347) 2025-03-18 05:50:19 -07:00
Lu Fang cd0cd85102
[MISC] More AMD unused var clean up (#14926)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-17 16:40:41 +08:00
Li, Jiang a2ae496589
[CPU] Support FP8 KV cache (#14741)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-14 22:07:36 -07:00
Lu Fang 8c0d15d5c5
[Misc][Easy] Annotate unused vars in the csrc files (#14798)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-15 12:40:09 +08:00
Yajie Wang 977a16772c
[Bugfix][Kernel]: Fix AllSpark kernel compilation errors and enable for CUDA < 12.0 (#14430)
Signed-off-by: wyj371990 <wyj371990@alibaba-inc.com>
2025-03-14 09:55:14 -07:00
DefTruth 40253bab44
[Bugfix][W8A8] fixed cutlass block fp8 binding (#14796) 2025-03-14 03:32:42 -07:00
Thien Tran 27b50f1fe6
[Bugfix][Kernel][CPU] Fix num_tokens in CPU rotary embedding kernel (#14667)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-13 23:47:49 -07:00
Jeff Daily 2a602b055a
forward fix PR 14245, restore build on ROCm 6.2 (#14709)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
2025-03-13 20:40:15 -07:00
TJian 916836bbfb
[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-03-12 09:31:19 -07:00
Sage Moore 45f3f3f59e
[ROCm][Bugfix] Ensure that the moe_wna16_gemm kernel is not built on ROCm platforms. (#14629)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-03-12 08:00:28 -04:00
Pavani Majety debd6bbf09
[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-03-12 05:13:11 +00:00
Szymon Ożóg e22ee1e7a2
[Kernel] GGUF MoE kernel (#14613)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
2025-03-12 03:33:27 +00:00
Lucas Wilkinson 07b4b7a37f
[BugFix/Build] Fix sparse kernels not getting built on hopper (#14572)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-11 17:09:03 +00:00
Jeff Daily a1c8f3796c
dynamic distpatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
2025-03-11 10:54:56 -04:00
Jinzhen Lin 90e88ab756
[Kernel] moe wna16 cuda kernel (#13321)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-10 20:12:40 -04:00
Szymon Ożóg 89cdaa83e7
[Kernel] Add more dtype support for GGUF kernels (#14043)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-03-10 07:30:04 -07:00
Lucas Wilkinson 7caff01a7b
[Build/BugFix] Fix hopper 12.8 build (#14354)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-08 08:11:56 +00:00
Jinzhen Lin d0feea31c7
[Kernel] optimize performance of gptq marlin kernel when n is small (#14138)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-03-07 11:53:38 -05:00
Lucas Wilkinson e5e03c2c1b
[BugFix] Illegal Memory Access in the blockwise cutlass fp8 GEMMs (#14396) 2025-03-06 21:56:06 -08:00
Tyler Michael Smith 99b0915d3b
[Kernel] Add needs_fixed_stride_order tag to most GEMMs (#14306)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-06 14:17:09 -08:00
Dilip Gowda Bhagavan ada19210a3
Adding cpu inference with VXE ISA for s390x architecture (#12613)
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
Signed-off-by: Rishika Kedia <rishika.kedia@in.ibm.com>
Co-authored-by: Rishika Kedia <rishika.kedia@in.ibm.com>
2025-03-06 08:40:53 -08:00
kushanam f89978ad7c
add cutlass support for blackwell fp8 gemm (#13798) 2025-03-04 07:55:07 -08:00
TJian 848a6438ae
[ROCm] Faster Custom Paged Attention kernels (#12348) 2025-03-03 09:24:45 -08:00
Sheng Yao 09e56f9262
[Bugfix] Explicitly include "omp.h" for MacOS to avoid installation failure (#14051) 2025-03-02 17:35:01 -08:00
Harry Mellor cf069aa8aa
Update deprecated Python 3.8 typing (#13971) 2025-03-02 17:34:51 -08:00
YajieWang 6a92ff93e1
[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931) 2025-02-28 22:30:59 -08:00
Sage Moore 378b3ef6f8
[ROCm][V1] Update reshape_and_cache to properly work with CUDA graph padding (#13922) 2025-02-26 20:04:12 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 a31614e386
[ROCm][Quantization][Kernel] Use FP8 FNUZ when OCP flag is 0 or undefined (#13851)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-02-27 10:39:10 +08:00
Henry Tsang 094b7d9496
[Kernel][Build/CI] Bump CUTLASS to 3.8 and add initializers for cutlass epilogues (#13797) 2025-02-25 18:52:03 -08:00
Gregory Shtrasberg aabeb2688f
[ROCm][Quantization][Kernel] Using HIP FP8 header (#12593) 2025-02-25 00:39:59 -08:00
Roger Wang 82e0d601fc
[CI/Build] Fix pre-commit errors from #13571 (#13709)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-22 16:50:38 -08:00
Kaixi Hou e109e598c7
[NVIDIA] Support nvfp4 cutlass gemm (#13571) 2025-02-22 05:24:05 -08:00
Lucas Wilkinson 288cc6c234
[Attention] MLA with chunked prefill (#12639)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Patrick Horn <patrick.horn@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-21 15:30:12 -08:00
leoneo 839b27c6cc
[Kernel]Add streamK for block-quantized CUTLASS kernels (#12978) 2025-02-20 22:14:24 -08:00
Szymon Ożóg 1cdc88614a
Missing comment explaining VDR variable in GGUF kernels (#13290) 2025-02-20 22:06:54 -08:00
Kaixi Hou 27a09dc52c
[NVIDIA] Fix an issue to use current stream for the nvfp4 quant (#13632) 2025-02-20 22:01:48 -08:00
Gregory Shtrasberg 0023cd2b9d
[ROCm] MI300A compile targets deprecation (#13560) 2025-02-19 23:05:00 -08:00
Sage Moore c9f9d5b397
[Bugfix][AMD] Update torch_bindings so that scaled_fp4_quant isn't build on ROCm (#13235) 2025-02-14 20:30:42 -08:00
Jinzhen Lin 8c32b08a86
[Kernel] Fix awq error when n is not divisable by 128 (#13227) 2025-02-13 20:07:05 -08:00
Tyler Michael Smith c1e37bf71b
[Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-14 00:01:14 +00:00
Michael Goin 2344192a55
Optimize moe_align_block_size for deepseek_v3 (#12850)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-13 18:43:37 -05:00
Kaixi Hou 4fc5c23bb6
[NVIDIA] Support nvfp4 quantization (#12784) 2025-02-12 19:51:51 -08:00
Shiyan Deng f1042e86f0
[Misc] AMD Build Improvements (#12923) 2025-02-12 02:36:10 -08:00
youkaichao 59fff4a01a
[core] improve error handling when wake up from sleep mode (#12981)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-10 09:38:57 +08:00
Cyrus Leung 8a69e0e20e
[CI/Build] Auto-fix Markdown files (#12941) 2025-02-08 04:25:15 -08:00
Isotr0py 85ac82d228
[Kernel] Make rotary_embedding ops more flexible with input shape (#12777) 2025-02-06 08:46:13 -08:00
Gregory Shtrasberg 5b19b93082
[ROCm][Kernel] Using the correct warp_size value 2025-02-05 19:15:08 -08:00
Lucas Wilkinson 75e94309e8
[Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676)
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-02-04 18:22:24 -08:00
Tyler Michael Smith c11de33dad
[Bugfix][Kernel] Fix per-token/per-channel quantization for Hopper scaled mm (#12696)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-03 13:04:59 -08:00
Yang Chen 95460fc513
[Kernel] port sgl moe_align_block_size kernels (#12574)
sgl_moe_align_block_size is based on:


ded9fcd09a

moe_align_block_size is based on:


ba5112ff69

Signed-off-by: Yang Chen <yangche@fb.com>
2025-02-03 13:09:50 +08:00
Russell Bryant e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files
    
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
    also be easily used by tools to help manage license compliance.
    
The Linux Foundation runs license scans against the codebase to help
ensure
    we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Tyler Michael Smith eb5741ad42
[Kernel][Quantization] Integrate block-quantized CUTLASS kernels for DeepSeekV3 (#12587)
Integrates the block-quantized kernels introduced in
https://github.com/vllm-project/vllm/pull/11868 for use in linear
layers.

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-01-31 15:29:11 -08:00
Lucas Wilkinson cabaf4eff3
[Attention] MLA decode optimizations (#12528)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-01-30 23:49:37 -08:00
Lucas Wilkinson 9798b2fb00
[Kernel] Update `cutlass_scaled_mm` to support 2d group (blockwise) scaling (#11868) 2025-01-30 18:33:00 -08:00
Harry Mellor 823ab79633
Update `pre-commit` hooks (#12475)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-27 17:23:08 -07:00
ElizaWszola 221d388cc5
[Bugfix][Kernel] Fix moe align block issue for mixtral (#12413) 2025-01-25 01:49:28 +00:00
Gregory Shtrasberg e97f802b2d
[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00
youkaichao 68ad4e3a8d
[Core] Support fully transparent sleep mode (#11743)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-22 14:39:32 +08:00
Jinzhen Lin 1e60f87bb3
[Kernel] fix moe_align_block_size error condition (#12239)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-01-21 10:30:28 -08:00
Jinzhen Lin 750f4cabfa
[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-01-20 16:42:16 -08:00
Harry Mellor 3ea7b94523
Move linting to `pre-commit` (#11975)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-20 14:58:01 +08:00
Elfie Guo 0794e7446e
[Misc] Add multipstep chunked-prefill support for FlashInfer (#10467) 2025-01-15 12:47:49 +08:00
Jee Jee Li 42f5e7c52a
[Kernel] Support MulAndSilu (#11624)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-15 02:29:53 +00:00
Wallas Henrique cfd3219f58
[Hardware][Apple] Native support for macOS Apple Silicon (#11696)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-08 16:35:49 +08:00
Lu Fang 4068f4b5b5
[MISC] Replace c10::optional with std::optional (#11730)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-01-05 10:20:34 +09:00
wchen61 5dba257506
Resolve race conditions in Marlin kernel (#11493)
Signed-off-by: wchen61 <wchen61@foxmail.com>
2025-01-02 22:58:56 +00:00
Tyler Michael Smith 970d6d0776
[Build][Kernel] Update CUTLASS to v3.6.0 (#11607)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-30 17:22:13 +08:00
Simon Mo f49777ba62
Deepseek v3 (#11502)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>
2024-12-26 16:09:44 -08:00