vllm/csrc/cpu
Latest commit d0bc2f810b by Yuqi Zhang (2025-05-23 01:41:37 -07:00): [Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (#18430)
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Co-authored-by: Yuqi Zhang <yuqizhang@google.com>
activation.cpp [Kernel][CPU] Add Quick `gelu` to CPU (#5717) 2024-06-21 06:39:40 +00:00
attention.cpp Adding cpu inference with VXE ISA for s390x architecture (#12613) 2025-03-06 08:40:53 -08:00
cache.cpp [Kernel][CPU] CPU MLA (#14744) 2025-03-25 09:34:59 +00:00
cpu_types.hpp Adding cpu inference with VXE ISA for s390x architecture (#12613) 2025-03-06 08:40:53 -08:00
cpu_types_arm.hpp [Bugfix] Explicitly include "omp.h" for MacOS to avoid installation failure (#14051) 2025-03-02 17:35:01 -08:00
cpu_types_vsx.hpp [Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER (#17153) 2025-05-07 22:35:03 -07:00
cpu_types_vxe.hpp Adding cpu inference with VXE ISA for s390x architecture (#12613) 2025-03-06 08:40:53 -08:00
cpu_types_x86.hpp [Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (#18430) 2025-05-23 01:41:37 -07:00
dnnl_helper.hpp [Hardware][CPU] Update torch 2.5 (#9911) 2024-11-07 04:43:08 +00:00
layernorm.cpp [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) 2024-06-09 16:23:30 -04:00
mla_decode.cpp [Kernel][CPU] CPU MLA (#14744) 2025-03-25 09:34:59 +00:00
pos_encoding.cpp Make key optional for rotary embedding (#17566) 2025-05-07 00:11:46 -07:00
quant.cpp [Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER (#17153) 2025-05-07 22:35:03 -07:00
shm.cpp [CPU][Bugfix] Using custom allreduce for CPU backend (#15934) 2025-04-02 07:46:47 -07:00
torch_bindings.cpp [Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER (#17153) 2025-05-07 22:35:03 -07:00
utils.cpp [Bugfix] fix gettid method is not define (#16084) 2025-04-08 19:12:44 -07:00