vLLM/vllm

Commit Graph

Author	SHA1	Message	Date
Joonchen Liau	9e5552aa13	[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280 ) Signed-off-by: kaln27 <liaojuncheng123@foxmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-02 06:47:19 -06:00
Tyler Michael Smith	3be8d312a2	[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 (#20324 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-07-01 18:05:47 -07:00
周周周	9290de5667	remove unused variables in marlin_template.h (#20236 )	2025-07-02 00:51:52 +00:00
Li, Jiang	6cc1e7d96d	[CPU] Update custom ops for the CPU backend (#20255 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-01 07:25:03 +00:00
Richard Barnes	86debab54c	Fix `numel()` downcast in vllm/csrc/moe/moe_align_sum_kernels.cu +2 (#17082 ) Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-01 06:48:10 +00:00
Tyler Michael Smith	e8c3bd2cd1	[Bugfix] Fix some narrowing conversion warnings (#20141 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-27 09:01:28 -07:00
Hosang	94a55c7681	[Fix][ROCm] Remove unused variables to fix build error on GFX11/12 (#19891 ) Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>	2025-06-27 07:14:44 -07:00
li haoyang	0740e29b66	[Feature] add quick all reduce (#19744 ) Signed-off-by: ilmarkov <imarkov@redhat.com> Signed-off-by: Haoyang Li <Haoyang.Li@amd.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-06-26 20:54:24 -07:00
Michael Goin	44d2e6af63	[Bugfix] Build moe_data for both sm100 and sm90 (#20086 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-26 20:50:12 -07:00
Ilya Markov	2d7779f888	[Perf] SM100 FP8 GEMM Optimizations after cutlass_profiler (#20071 ) Signed-off-by: ilmarkov <imarkov@redhat.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-06-26 20:50:09 -07:00
Li, Jiang	0567c8249f	[CPU] Fix torch version in x86 CPU backend (#19258 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-06-26 03:34:47 -07:00
Wentao Ye	ffb2cd6b54	[Perf] Optimize `moe_align_block_size` CUDA kernel (#19572 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-06-17 11:49:26 -07:00
Szymon Ożóg	dec66d253b	[Kernel] GGUF MMVQ kernel for multiple input vectors (#18754 ) Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>	2025-06-16 17:33:26 +08:00
Lu Fang	c6703d1e0d	[MISC] Remove unused variableds in C++ (#19609 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-06-15 20:05:28 -07:00
Ilya Markov	e13945f9dd	[Perf] Further tunings for SM100 FP8 CUTLASS kernel (#19566 )	2025-06-14 17:25:10 -07:00
jiahanc	294fc1e2c9	[Hardware][NVIDIA][kernel] Fp4 MOE quant kernel optimization (#19500 )	2025-06-14 09:34:28 -07:00
Wentao Ye	ce9dc02c93	[Refactor] Remove unused variables in `moe_permute_unpermute_kernel.inl` (#19573 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-13 06:12:15 -07:00
Wentao Ye	b6efafd9e4	[Perf] Vectorize static / dynamic INT8 quant kernels (#19233 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-12 06:51:41 -07:00
Ning Xie	2f1c19b245	[CI] change spell checker from codespell to typos (#18711 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-11 19:57:10 -07:00
Louie Tsai	5c8d34a42c	Support no privileged mode on CPU for docker and kubernetes deployments (#19241 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com>	2025-06-11 04:11:47 -07:00
ElizaWszola	84166fee97	[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-06 18:26:11 -07:00
Chiyue Wei	61059bee40	[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110 ) Signed-off-by: Chiyue Wei <chiyuew@nvidia.com> Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>	2025-06-05 09:48:26 -07:00
Michael Goin	53a5a0ce30	[Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-04 10:46:28 -07:00
Lain	5f2cd251d2	Sm100 blockwise fp8 swap ab (#18564 )	2025-06-04 07:48:45 -07:00
Kaixi Hou	41aa578428	[NVIDIA] Add Cutlass MLA backend (#17625 )	2025-06-03 21:40:26 -07:00
Vadim Gimpelson	5d6d1adf15	[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437 )	2025-06-03 21:13:01 -07:00
Michael Goin	e31446b6c8	[Perf] Tune `scaled_fp8_quant` by increasing vectorization (#18844 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-03 13:48:25 -07:00
Varun Sundar Rabindranath	fa98d77773	[Kernel] DeepEP dispatch-combine kernel integration (#18434 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-03 12:30:02 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Charlie Fu	306d60401d	[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-31 07:40:05 -07:00
Lucas Wilkinson	ce75efeecb	[BugFix] FA2 MLA Accuracy Issue (#18807 ) Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>	2025-05-28 08:59:39 +00:00
almersawi	a547aeb828	feat(rocm-support): support mamba2 on rocm (#18565 ) Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai> Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>	2025-05-27 00:07:53 -07:00
Yuqi Zhang	d0bc2f810b	[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (#18430 ) Signed-off-by: Yuqi Zhang <yuqizhang@google.com> Co-authored-by: Yuqi Zhang <yuqizhang@google.com>	2025-05-23 01:41:37 -07:00
Tyler Michael Smith	6e588da0f4	[Build/CI] Fix CUDA 11.8 build (#17679 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-22 12:13:54 -07:00
Hosang	dd5fa7e04f	[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004 ) Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>	2025-05-21 08:35:00 -07:00
bnellnm	92247c522e	[Bug] Fix moe_sum signature (#18440 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-05-20 22:37:08 -07:00
Lucia Fang	258bf621d5	fix CUDA_check redefinition in #17918 (#18287 ) Signed-off-by: Lucia Fang <fanglu@fb.com> Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>	2025-05-19 13:42:35 -07:00
Jinzhen Lin	e73b7dfd69	[Bugfix] fix `an illegal memory access was encountered` of marlin kernel + act_order (#18245 )	2025-05-16 16:02:44 -07:00
Lain	e23564cb70	use ceil_div in cutlass block scaling shape check (#17918 )	2025-05-16 03:02:58 -07:00
omahs	a9944aabfa	fix: typos (#18151 ) Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>	2025-05-15 02:16:15 -07:00
Lucas Wilkinson	d93c976a0d	[Kernel] Have rotary embeddings support tensors (#18046 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-14 15:43:55 -07:00
bnellnm	f9c069c85e	Modularize fused experts and integrate PPLX kernels (#15956 )	2025-05-14 13:11:54 -07:00
Jinzhen Lin	d4154c35a2	[Bugfix] fix moe marlin `topk_weight` loading (#18080 ) Co-authored-by: mgoin <mgoin64@gmail.com>	2025-05-13 23:31:57 -07:00
Charlie Fu	7b2f28deba	[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-13 22:13:56 -07:00
Driss Guessous	e57e4d6e9e	Fix Broken macro for cutlass moe (#18049 ) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-05-12 23:31:06 -07:00
Arjun Kathuria	d8487ef557	[ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 (#13779 ) Signed-off-by: Arjun Kathuria <arjun.kathuria8@gmail.com>	2025-05-12 20:36:33 -07:00
Tao He	60f7624334	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844 )	2025-05-12 19:52:47 -07:00
Jinzhen Lin	d74e5f37bc	[Kernel] fp4 marlin kernel (#17687 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-05-10 19:58:49 -07:00
Pavani Majety	0c0fdae84f	[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362 )	2025-05-09 16:24:41 -07:00
Shu Wang	376786fac1	Add cutlass support for blackwell fp8 blockwise gemm (#14383 ) Signed-off-by: Shu Wang <shuw@nvidia.com>	2025-05-08 15:09:55 -07:00

425 Commits