Commit Graph

69 Commits

Author  SHA1  Message  Date
zhaoyang-star  9090bf02e7  Support FP8-E5M2 KV Cache (#2279)  2024-01-28 16:43:54 -08:00
  Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>

Antoni Baum  9b945daaf1  [Experimental] Add multi-LoRA support (#1804)  2024-01-23 15:26:37 -08:00
  Co-authored-by: Chen Shen <scv119@gmail.com>
  Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
  Co-authored-by: Avnish Narayan <avnish@anyscale.com>

Woosuk Kwon  37ca558103  Optimize model execution with CUDA graph (#1926)  2023-12-16 21:12:08 -08:00
  Co-authored-by: Chen Shen <scv119@gmail.com>
  Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

CHU Tianxiang  0fbfc4b81b  Add GPTQ support (#916)  2023-12-15 03:04:22 -08:00

Woosuk Kwon  5dd80d3777  Fix latency benchmark script (#2035)  2023-12-11 11:19:08 -08:00

Antoni Baum  05ff90b692  Save pytorch profiler output for latency benchmark (#1871)  2023-12-05 20:55:55 -08:00
  * Save profiler output
  * Apply feedback from code review

Woosuk Kwon  51d3cb951d  Remove max_num_seqs in latency benchmark script (#1855)  2023-11-30 00:00:32 -08:00

Woosuk Kwon  e74b1736a1  Add profile option to latency benchmark script (#1839)  2023-11-29 23:42:52 -08:00

chooper1  1f24755bf8  Support SqueezeLLM (#1326)  2023-10-21 23:14:59 -07:00
  Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
  Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Antoni Baum  acbed3ef40  Use monotonic time where appropriate (#1249)  2023-10-02 19:22:05 -07:00

kg6-sleipnir  b5a10eb0ef  Added `dtype` arg to benchmarks (#1228)  2023-09-30 21:04:03 -07:00

Woosuk Kwon  e3e79e9e8a  Implement AWQ quantization support for LLaMA (#1032)  2023-09-16 00:03:37 -07:00
  Co-authored-by: Robert Irvine <robert@seamlessml.com>
  Co-authored-by: root <rirv938@gmail.com>
  Co-authored-by: Casper <casperbh.96@gmail.com>
  Co-authored-by: julian-q <julianhquevedo@gmail.com>

Ricardo Lu  8c4b2592fb  fix: enable trust-remote-code in api server & benchmark. (#509)  2023-07-19 17:06:15 -07:00

Woosuk Kwon  4338cc4750  [Tokenizer] Add an option to specify tokenizer (#284)  2023-06-28 09:46:58 -07:00

Woosuk Kwon  0b98ba15c7  Change the name to vLLM (#150)  2023-06-17 03:07:40 -07:00

Zhuohan Li  e5464ee484  Rename servers to engines (#152)  2023-06-17 17:25:21 +08:00

Woosuk Kwon  311490a720  Add script for benchmarking serving throughput (#145)  2023-06-14 19:55:38 -07:00

Woosuk Kwon  8274ca23ac  Add docstrings for LLM (#137)  2023-06-04 12:52:41 -07:00

Woosuk Kwon  211318d44a  Add throughput benchmarking script (#133)  2023-05-28 03:20:05 -07:00