mirror of https://github.com/vllm-project/vllm.git
[Doc] correct LoRA capitalization (#20135)
Signed-off-by: kyolebu <kyu@redhat.com>
parent 562308816c
commit 07b8fae219
@@ -40,7 +40,7 @@ vLLM is flexible and easy to use with:
 - OpenAI-compatible API server
 - Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs, Gaudi® accelerators and GPUs, IBM Power CPUs, TPU, and AWS Trainium and Inferentia Accelerators.
 - Prefix caching support
-- Multi-lora support
+- Multi-LoRA support
 
 For more information, check out the following:
 
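The "Multi-LoRA support" bullet corrected above refers to serving several LoRA adapters on one base model, with each request selecting its adapter. A minimal sketch using vLLM's offline `LLM` API follows; the base model and adapter path are placeholders, not values from this commit.

```python
# Sketch of multi-LoRA usage in vLLM (placeholder model and adapter path).
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora=True lets per-request adapters be attached to the base model.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

outputs = llm.generate(
    ["Write a SQL query that counts users by country."],
    SamplingParams(temperature=0.0, max_tokens=64),
    # Each request may target a different adapter: (name, int id, local path).
    lora_request=LoRARequest("sql-adapter", 1, "/path/to/sql_lora_adapter"),
)
print(outputs[0].outputs[0].text)
```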
@@ -427,7 +427,7 @@ Specified using `--task embed`.
 See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
 
 !!! note
-    `jinaai/jina-embeddings-v3` supports multiple tasks through lora, while vllm temporarily only supports text-matching tasks by merging lora weights.
+    `jinaai/jina-embeddings-v3` supports multiple tasks through LoRA, while vllm temporarily only supports text-matching tasks by merging LoRA weights.
 
 !!! note
     The second-generation GTE model (mGTE-TRM) is named `NewModel`. The name `NewModel` is too generic, you should set `--hf-overrides '{"architectures": ["GteNewModel"]}'` to specify the use of the `GteNewModel` architecture.
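The second note in this hunk describes the `--hf-overrides` workaround for the generic `NewModel` architecture name. A minimal sketch of the same override via vLLM's offline `LLM` API is below; the checkpoint ID is an assumption for illustration, not taken from the diff.

```python
# Sketch: applying the architecture override from the note above.
from vllm import LLM

llm = LLM(
    model="Alibaba-NLP/gte-multilingual-base",  # assumed mGTE-TRM checkpoint
    task="embed",
    # Equivalent to the CLI flag --hf-overrides '{"architectures": ["GteNewModel"]}'
    hf_overrides={"architectures": ["GteNewModel"]},
)
outputs = llm.embed(["vLLM is an inference engine."])
print(outputs[0].outputs.embedding[:8])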