An external provider for Llama Stack that enables inference through RamaLama.
Updated 2025-08-21 17:46:16 +08:00
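As a rough sketch of how such a provider is consumed, the snippet below uses the llama-stack-client Python SDK against a locally running Llama Stack server. The base URL, port, and model identifier are assumptions for illustration, not values taken from this project.

```python
# Hypothetical client-side usage, assuming a Llama Stack server is already
# running with the RamaLama inference provider configured and a model
# registered under the ID shown below -- both are illustrative only.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # default Llama Stack port

response = client.inference.chat_completion(
    model_id="llama3.2:3b",  # assumed model ID; use whatever the server registers
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.completion_message.content)
```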
A high-throughput and memory-efficient inference and serving engine for LLMs
Topics: llm, mlops, pytorch, cuda, inference, llama, llm-serving, llmops, model-serving, qwen, rocm, tpu, trainium, transformer, amd, xpu, deepseek, gpt, hpu, inferentia
Updated 2025-08-01 01:35:07 +08:00
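To illustrate the serving engine's Python API, here is a minimal offline-inference sketch; the model name is only an example, and any small Hugging Face model would work the same way.

```python
# Minimal offline-inference sketch with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")      # example model; weights download on first run
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```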
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Updated 2025-07-19 18:35:54 +08:00
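Since a model started with `ramalama serve` is exposed behind an OpenAI-compatible HTTP endpoint, it can be queried with any OpenAI client. The sketch below assumes a model is already being served on localhost port 8080; the URL, port, and model name are assumptions, not project defaults guaranteed here.

```python
# Assumes a model has already been started locally (e.g. `ramalama serve <model>`)
# and is reachable via an OpenAI-compatible endpoint on port 8080.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # local server, no real key needed

resp = client.chat.completions.create(
    model="smollm:135m",  # assumed model name; match whatever was served
    messages=[{"role": "user", "content": "What is RamaLama?"}],
)
print(resp.choices[0].message.content)
```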
Podman AI Lab provider for Llama Stack
Updated 2025-06-06 16:08:45 +08:00