Commit Graph

414 Commits

Author SHA1 Message Date
Woosuk Kwon b9926f7f66 Support block size 32 (#35) 2023-04-09 23:07:18 -07:00
Woosuk Kwon c267b1a02c Add query stride to multi_query_cached_kv_attention & Add kernel benchmark script (#27) 2023-04-08 13:36:09 -07:00
Woosuk Kwon 0f40557af6 Implement block copy kernel to optimize beam search (#32) 2023-04-07 17:45:07 -07:00
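In beam search, forked beams initially share the parent's KV cache blocks; a block-copy op duplicates a physical block so the beams can then diverge. A minimal NumPy sketch of what such an op computes (not the CUDA kernel; all names and shapes here are illustrative assumptions):

```python
# Sketch of a KV-cache block-copy op for beam search (illustrative only).
import numpy as np

num_blocks, block_size, num_heads, head_dim = 8, 32, 4, 16
key_cache = np.random.rand(num_blocks, block_size, num_heads, head_dim)
value_cache = np.random.rand(num_blocks, block_size, num_heads, head_dim)

def copy_blocks(key_cache, value_cache, src_to_dsts):
    """Duplicate physical cache blocks so forked beams can diverge."""
    for src, dsts in src_to_dsts.items():
        for dst in dsts:
            key_cache[dst] = key_cache[src]
            value_cache[dst] = value_cache[src]

# Block 2 was shared by a forked beam pair; give the children blocks 5 and 6.
copy_blocks(key_cache, value_cache, {2: [5, 6]})
```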
Siyuan (Ryans) Zhuang 21b3671bbc Basic attention kernel that supports cached KV + (multi-)prompts (#24) 2023-04-04 20:34:46 -07:00
Woosuk Kwon 897cb2ae28 Optimize data movement (#20) 2023-04-02 00:30:17 -07:00
Woosuk Kwon 09e9245478 Add custom kernel for RMS normalization (#16) 2023-04-01 00:51:22 +08:00
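The kernel computes RMS normalization, y = x / sqrt(mean(x²) + eps) · weight. A NumPy reference sketch of that formula (the actual kernel fuses this on the GPU):

```python
# Reference RMS normalization (what the custom kernel computes), in NumPy.
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """y = x / sqrt(mean(x^2) + eps) * weight, over the hidden dimension."""
    variance = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(variance + eps) * weight

hidden = np.random.rand(2, 8).astype(np.float32)  # [num_tokens, hidden_size]
weight = np.ones(8, dtype=np.float32)
out = rms_norm(hidden, weight)
```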
Woosuk Kwon 88c0268a18 Implement custom kernel for LLaMA rotary embedding (#14) 2023-03-30 11:04:21 -07:00
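Rotary embedding encodes position by rotating pairs of query/key dimensions through angles θ_i = pos · base^(−2i/d). A NumPy sketch of the interleaved-pair formulation (an assumption for illustration; the kernel's exact pairing and layout may differ):

```python
# Sketch of rotary position embedding (RoPE); interleaved-pair variant.
import numpy as np

def rotary_embedding(x, position, base=10000.0):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by position * base^(-2i/d)."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)
    theta = position * inv_freq                  # [d/2] rotation angles
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.rand(4, 64)                        # [num_heads, head_dim]
q_rot = rotary_embedding(q, position=5)
```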
Woosuk Kwon cfae35b861 Add miscellaneous updates (#8) 2023-03-13 13:48:38 -07:00
Woosuk Kwon 1a7eb7da61 Support beam search & parallel generation (#7) 2023-03-10 09:58:21 -08:00
Woosuk Kwon 0deacbce6e Implement `single_query_cached_kv_attention` kernel (#3) 2023-03-01 15:02:19 -08:00
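This kernel lets a single decoding token's query attend over keys/values gathered from scattered cache blocks through a per-sequence block table. A NumPy sketch of the computation (shapes, names, and the block-table layout are illustrative assumptions, not the kernel's actual interface):

```python
# Sketch: one query token attending over a paged (block-based) KV cache.
import numpy as np

def single_query_cached_kv_attention(q, key_cache, value_cache,
                                     block_table, context_len, block_size):
    # Gather this sequence's keys/values from its scattered cache blocks.
    keys, values = [], []
    for i in range(context_len):
        block = block_table[i // block_size]
        off = i % block_size
        keys.append(key_cache[block, off])       # [num_heads, head_dim]
        values.append(value_cache[block, off])
    k = np.stack(keys)                           # [context_len, heads, dim]
    v = np.stack(values)
    scale = 1.0 / np.sqrt(q.shape[-1])
    logits = np.einsum("hd,thd->ht", q, k) * scale
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.einsum("ht,thd->hd", probs, v)     # [num_heads, head_dim]

num_blocks, block_size, heads, dim = 8, 4, 2, 16
kc = np.random.rand(num_blocks, block_size, heads, dim)
vc = np.random.rand(num_blocks, block_size, heads, dim)
q = np.random.rand(heads, dim)
out = single_query_cached_kv_attention(q, kc, vc, block_table=[3, 0, 6],
                                       context_len=10, block_size=block_size)
```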
Woosuk Kwon c413c41cda Add reshape_and_cache op 2023-02-18 19:22:57 +00:00
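A `reshape_and_cache`-style op scatters each newly computed token's key/value vectors into the paged cache at its assigned slot. A NumPy sketch, assuming a flat slot index of the form block · block_size + offset (that slot encoding is an assumption for illustration):

```python
# Sketch: scatter new per-token K/V vectors into the paged KV cache.
import numpy as np

def reshape_and_cache(key, value, key_cache, value_cache,
                      slot_mapping, block_size):
    for token, slot in enumerate(slot_mapping):
        block, off = divmod(slot, block_size)    # assumed flat-slot encoding
        key_cache[block, off] = key[token]       # [num_heads, head_dim]
        value_cache[block, off] = value[token]

num_blocks, block_size, heads, dim = 8, 4, 2, 16
kc = np.zeros((num_blocks, block_size, heads, dim))
vc = np.zeros((num_blocks, block_size, heads, dim))
k_new = np.random.rand(3, heads, dim)            # 3 new tokens
v_new = np.random.rand(3, heads, dim)
reshape_and_cache(k_new, v_new, kc, vc, slot_mapping=[12, 13, 30],
                  block_size=block_size)
```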
Woosuk Kwon ffad4e1e03 cache_kernel -> cache_kernels 2023-02-16 20:05:45 +00:00
Woosuk Kwon 6d2f74efb3 Remove redundant fn 2023-02-16 09:24:42 +00:00
Woosuk Kwon 6f058c7ba8 Implement cache ops 2023-02-16 07:47:03 +00:00