backport llama changes
This commit is contained in:
parent
9227cfd6d5
commit
d85b0ef5b5
@@ -80,4 +80,4 @@ To learn more about distributed inference using vLLM please refer to [this doc](
 
 ### Acknowledgements
 
-We would like to thank Meta for the pre-release partnership and letting us test the model. Independently from the release, we thank the following vLLM contributors for the features mentioned in this blogpost: [NeuralMagic](https://neuralmagic.com/) for FP8 quantization; [CentML](https://centml.ai/) for pipeline parallelism; [Anyscale](https://www.anyscale.com/) for the chunked prefill feature. The evaluation runs on [Lambda’s 1-Click Clusters](https://lambdalabs.com/service/gpu-cloud/1-click-clusters) with InfiniBand, and we thank Lambda Labs for the resource and the smooth cluster setup experience.
+We would like to thank Meta for the pre-release partnership and letting us test the model. Independently from the release, we thank the following vLLM contributors for the features mentioned in this blogpost: [Neural Magic](https://neuralmagic.com/) for FP8 quantization; [CentML](https://centml.ai/) for pipeline parallelism; [Anyscale](https://www.anyscale.com/) for the chunked prefill feature. The evaluation runs on [Lambda’s 1-Click Clusters](https://lambdalabs.com/service/gpu-cloud/1-click-clusters) with InfiniBand, and we thank Lambda for the resource and the smooth cluster setup experience.