Simplify ramalama's top-level description. Remove the duplicate
statements.
Also make sure all references to PyPI are spelled this way.
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
pip's caching behavior was causing errors when downloading huge (4.5G) torch wheels during
the rocm-ubi-rag build.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
A previous commit changed the second argument to add_rag() from the image name to the
full repo path. Update the case statement accordingly, so the "GPU" variable is set correctly.
The "cuda" directory is no longer available on download.pytorch.org. When building for cuda,
pull wheels from the "cu128" directory, which contains binaries built for CUDA 12.8.
When building rocm* images, download binaries from the "rocm6.3" directory, which are built
for ROCm 6.3.
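For reference, a sketch of the kind of pip invocation this implies (the exact
Containerfile wording may differ):
$ pip install torch --index-url https://download.pytorch.org/whl/cu128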
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Currently other data files such as shortnames.conf, man pages, and shell
completions are included in the Python wheel. Including ramalama.conf
as well means we can avoid several calls to make in the RPM spec file,
instead relying on the wheel mechanisms to put these files in place. As
long as `make docs` is run before the wheel generation, all the
necessary files are included.
Signed-off-by: Carl George <carlwgeorge@gmail.com>
Extract information directly from the CDI YAML file by making some
simplifying assumptions instead of doing a complete YAML parse.
Default to all devices known to nvidia-smi.
Fix the signature of check_nvidia().
Remove some debug logging.
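A minimal sketch of the simplified extraction, assuming CDI device entries
appear as "- name: <device>" lines in the YAML:

import re

def cdi_device_names(path="/etc/cdi/nvidia.yaml"):
    # Scan for device name entries instead of doing a full YAML parse.
    names = []
    with open(path) as f:
        for line in f:
            match = re.match(r'^\s*-\s*name:\s*"?([^"\s]+)"?\s*$', line)
            if match:
                names.append(match.group(1))
    return names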
Signed-off-by: John Wiele <jwiele@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Signed-off-by: John Wiele <jwiele@redhat.com>
Allow GPUs to be specified by UUID as well as index since the index is
not guaranteed to persist across reboots.
Crosscheck requested GPUs with nvidia-smi and CDI configuration. If
any requested GPUs lack corresponding CDI configuration, print a
message with a pointer to documentation.
If the only GPU specified in the CDI configuration is "all", as
appears to be the case on WSL2, use "all" as the default.
Add an optional encoding argument to run_cmd() to facilitate checking
the output of the command.
Add PyYAML as a dependency for parsing the CDI configuration.
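A sketch of the kind of crosscheck described, querying nvidia-smi for both
index and UUID (the run_cmd() wrapper and the CDI comparison are omitted):

import subprocess

def nvidia_gpus():
    # Both the index and the UUID are valid ways to request a GPU.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,uuid", "--format=csv,noheader"],
        check=True, capture_output=True, encoding="utf-8",
    ).stdout
    return dict(line.split(", ", 1) for line in out.strip().splitlines())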
Signed-off-by: John Wiele <jwiele@redhat.com>
By default, Konflux triggers new pipelines when a PR moves from Draft to
"Ready for Review". Because the commit SHA hasn't changed, no new builds
are performed. However, a new integration test is also triggered, and because
no builds were performed it is unable to find the URL and digest of the images,
causing the integration test to fail. Updating the "on-cel-expression" to exclude
the transition to "Ready to Review" avoids the unnecessary pipelines and the
false integration test failures.
Update the whitespace of the "on-cel-expression" in the push pipelines for consistency.
No functional change.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
When running in a krun-isolated container, we need
"/usr/libexec/virgl_render_server" to be present in the container
image to launch it before entering the microVM.
Install the virglrenderer package in addition to mesa-vulkan-drivers.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Setting RAMALAMA_IMAGE would cause some unit tests to fail. Make those
tests independent of the calling environment.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Including the source in the bats image ensures that we're always testing with the same
version of the code that was used to build the images. It also eliminates the need for
repeated checkouts of the repo and simplifies testing, avoiding additional volumes and
artifact references.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
* This removes epel9 from packit rules as epel9 does not currently
build without many additional packages added to the distro.
* This fixes a breakage in epel10 by adding mailcap as a BuildRequires.
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
Co-authored-by: Stephen Smoogen <ssmoogen@redhat.com>
Fix the "serve and stop" test by passing the correct (possibly random) port to "ramalama chat".
Fix the definition of "ramalama_runtime".
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Remove the STORAGE_DRIVER env var from the container so it doesn't force use
of the vfs driver in all cases.
Mount /dev/fuse into the container when running locally.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
accel_image() is called to set option defaults, before options are even parsed.
This can cause images to be pulled even if they will not actually be used, slowing
down testing and making the cli less responsive. Set the "dryrun" option before
the first call to accel_image() to avoid unnecessary image pulls.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Don't pull images in _get_rag() and _get_source_model() if pull == "never"
or if running with "--dryrun".
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
* Start adding rpm/ramalama.spec for Fedora
Add a ramalama.spec to sit next to python-ramalama.spec while we get
this reviewed. Change various configs so they are aware of
ramalama.spec
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
* Add needed obsoletes/provides in base rpm to start process.
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
* Try to fix CI problems with initial MR
The initial MR puts two spec files in the same directory which was
causing problems with the CI. This splits them off into different
directories which should allow for the tooling to work.
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
* Finish move of Fedora rpm package to new name.
Put changes into various files needed to allow for new RPM package
`ramalama` to build in Fedora infrastructure versus python3-ramalama.
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
* Fix problem with path names lsm5 caught
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
---------
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
Co-authored-by: Stephen Smoogen <ssmoogen@redhat.com>
- Corrected the model name under the Benchmark section; previous name was not available in Ollama's registry.
- Added instructions to switch between CPU-only mode and using all available GPUs via CUDA_VISIBLE_DEVICES.
Signed-off-by: Mario Antonio Bortoli Filho <mario@bortoli.dev>
Add the ramalama rag --format option to allow outputting
markdown and JSON as well as Qdrant databases.
This content can then be used as input to the client tool.
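Example usage (the file argument and output name are illustrative):
$ ramalama rag --format markdown ./docs/manual.pdf ./rag-output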
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Co-authored-by: Ian Eaves <ian.k.eaves@gmail.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
-rag builds were failing due to the 40G disk filling up. Run builds on
newly-available "d160" instance types which have 160G of disk space
available.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
The integration tests will be triggered after all image builds associated with a single
commit are complete. Tests are currently being run on amd64 and arm64 platforms.
Remove "bats-nocontainer" from the build-time tests, since those are covered by "bats" run
in the integration tests.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Some bats tests need the ramalama-rag image available for the current arch. Build
all the ramalama layered images on arm64 as well as amd64.
Switch to building on larger VM instance types to reduce build times and improve
developer feedback and experience.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Some tests parse the output of the ramalama cli and hard-code the location of the expected
default image. However, this output changes based on the value of the RAMALAMA_IMAGE
environment variable, and setting this variable in the calling environment can cause those
tests to fail. Unset the RAMALAMA_IMAGE environment variable in these tests to avoid false failures.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
This was recently removed:
+ if getattr(self.args, "model", False):
+ data["model"] = self.args.model
It is required.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
The split model feature was exclusive to URL models. Because of this - and the
improvements in mounting all model snapshot files - the logic has been removed
from the ModelFactory and moved into the URL model class.
Signed-off-by: Michael Engel <mengel@redhat.com>
Previously, the model, mmproj and chat template files were mounted explicitly if
present using many if-exists checks. Relying on the new ref file, all files of that
model snapshot are either mounted or used directly via their blob paths. When mounted
into a container, the files are put into MNT_DIR with the respective file names.
The split_model part has been dropped for now, but will be refactored in the next
commit.
Signed-off-by: Michael Engel <mengel@redhat.com>
Replace the use of RefFile with the new RefJSONFile in the model store. This also adds
support for ad-hoc migration from the old to the new ref file format.
This will break ramalama as is since no specific functionality for getting the explicit
(gguf) model file path has been implemented. Will be adjusted in the next commit to
fix this.
Signed-off-by: Michael Engel <mengel@redhat.com>
Added a new, simpler ref file format serialized as JSON. It also gets additional
fields such as the hash of the file that is used as the name of the blob file.
This essentially makes the snapshot directory and all symlinks obsolete, further
simplifying the storage and improving stability. It also makes the ref file the
single source for all files of a model.
Further refactoring, incl. swapping and migrating from the old to new format, will
follow in subsequent commits.
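A hypothetical sketch of the shape of such a JSON ref file (field names are
illustrative, not the actual format):

import json
from dataclasses import asdict, dataclass

@dataclass
class RefEntry:
    name: str  # file name within the model snapshot
    hash: str  # file hash, also used as the blob file name

def write_ref(path: str, entries: list[RefEntry]) -> None:
    # The ref file becomes the single source for all files of a model.
    with open(path, "w") as f:
        json.dump([asdict(e) for e in entries], f)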
Signed-off-by: Michael Engel <mengel@redhat.com>
mlx_lm.server is the only one in my path, at least on my system.
Also, it was printing output like this, which doesn't make sense:
Downloading huggingface://RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic/model.safetensors:latest ...
Trying to pull huggingface://RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic/model.safetensors:latest ...
Also remove the recommendation to install via `brew install ramalama`, since it
skips installing Apple-specific dependencies.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
This eliminates accidental image pulls when not using containers. Since these
commands are only used for container commands, there is no need for them
elsewhere.
Fixes: https://github.com/containers/ramalama/issues/1662
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Previously we were setting an explicit version of `ramalama-stack`
in the Containerfile, restricting what we used at runtime.
Moved the install to the entrypoint script and allowed use of
the RAMALAMA_STACK_VERSION env var to install a specific version
(the default with no env var installs the latest package and pulls the
YAML files from the main branch).
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Build the -llama-server, -whisper-server, and -rag layered images, which inherit from
the existing ramalama, cuda, rocm, and rocm-ubi images.
Layered images use shared Containerfiles, and customize their builds using --build-arg.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Move the Containerfiles for the entrypoint and rag images out of container_build.sh and into their
own files. This is necessary so they can be built with Konflux.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
The source code for the model store is getting bigger, so splitting it
into multiple source files under a directory helps keep it easier
to read.
Signed-off-by: Michael Engel <mengel@redhat.com>
By using the pull field in the config instance as the flag to
indicate whether pulling of the container image should be attempted
in the accel_image function, the behavior is tied to the cli options.
This also prevents a ramalama ls from seemingly blocking while the
image is downloaded (with no output).
Signed-off-by: Michael Engel <mengel@redhat.com>
Previously we would always remove this partial blob file.
Note: this assumes the blob hash equals the snapshot hash, which
is only true for repos with a single model
Signed-off-by: Oliver Walsh <owalsh@redhat.com>
When deleting a reference, count the remaining references to the
snapshot/blobs to determine if they should be deleted.
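A minimal sketch of the counting logic described (names hypothetical):

def remove_ref(refs, ref_name):
    # refs: dict of ref name -> snapshot hash. Delete the snapshot/blobs only
    # when no remaining reference still points at the same snapshot.
    snapshot = refs.pop(ref_name)
    if snapshot not in refs.values():
        delete_snapshot(snapshot)  # hypothetical helper removing snapshot + blobs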
Signed-off-by: Oliver Walsh <owalsh@redhat.com>
Container builds and tests can take a long time. We'd rather they eventually complete successfully
than fail with a timeout.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
When you run a model server within a container and only want it bound
to a certain port, the port binding should happen on the container, not
inside of the container.
Fixes: https://github.com/containers/ramalama/issues/1572
Also fix handling of the -t option; it should not be used with anything
other than the run command, and even that is not certain.
The LLAMA_PROMPT_PREFIX= environment variable should not be set within
containers as an environment variable, since we are doing chat on the
outside.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Adapt the ramalama stack and chat modules for compatibility with llama-stack by
updating host binding, argument formatting, and command invocation patterns, and
add robust attribute checks in the chat utility.
Bug Fixes:
- Add hasattr checks around optional args (pid2kill, name) in chat kills() to prevent attribute errors
Enhancements:
- Bind model server to 0.0.0.0 instead of localhost for external accessibility
- Convert port, context size, and thread count arguments to strings for consistent CLI usage
- Reformat container YAML to use JSON array and multiline args for llama-server and llama-stack commands
- Update Containerfile CMD to JSON exec form for llama-stack entrypoint
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We're currently using /usr/share/zsh/vendor-completions for zsh
completions. However, the RPM macro %{zsh_completions_dir} (which is
required by the Fedora packaging guidelines) is defined as
/usr/share/zsh/site-functions, so let's switch to that.
https://docs.fedoraproject.org/en-US/packaging-guidelines/ShellCompletions/
Signed-off-by: Carl George <carlwgeorge@gmail.com>
We're using Podman to build images, so don't futz with Docker.
Only build base images; it is not necessary to build RAG images.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Currently we are incorrectly reporting file models as
file://PATH as opposed to the correct file:///PATH.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Rename "nopull" to "pull" for improved clarity and readability. This
avoids the double-negative, making the logic more straightforward to
reason about. "pull = True" now means "pull the image", "pull = False"
means "don't pull the image."
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
When building on Fedora systems make sure we install the
mesa version from the COPR, which has the patches to force
alignment to 16K (needed for GPU acceleration on macOS, but
harmless to other systems).
We also need to add "--nobest" to "dnf update" to ensure it
doesn't get frustrated by being unable to install the mesa package
from appstream.
Signed-off-by: Sergio Lopez <slp@redhat.com>
By accessing the model store via a property, a None-check can be performed
and an instance created on the fly. In addition, this removes the need
for setting the store from the factory and removes its optional trait.
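A minimal sketch of the lazy property pattern described (class and constructor
details assumed):

class Model:
    def __init__(self, name):
        self.name = name
        self._model_store = None

    @property
    def model_store(self):
        # Perform the None-check and create the store on the fly,
        # instead of having the factory inject it.
        if self._model_store is None:
            self._model_store = ModelStore(self.name)  # hypothetical constructor
        return self._model_store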
The unit tests for ollama have been rewritten as well since functions
such as repo_pull or exists have been removed. It only tests the pull
function which mocks away http calls to external services.
Signed-off-by: Michael Engel <mengel@redhat.com>
Relates to: github.com/containers/ramalama/pull/1559
Remove Model flag for safetensor files for now in order to
allow multiple safetensor files to be downloaded for the
convert command.
Signed-off-by: Michael Engel <mengel@redhat.com>
In addition to pruning old model store code, the usage of downloading
files using the hf CLI or the modelscope CLI has been removed.
In the future, the download of multiple files - incl. safetensors - will
be done explicitly based on the metadata only by http requests.
Signed-off-by: Michael Engel <mengel@redhat.com>
Add a new "bats" container which is configured to run the bats tests.
The container supports running the standard bats test suite
(container-in-container) as well as the "--nocontainer" tests.
Add two new Makefile targets for running the bats container via podman.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Add the --dri option to disable mounting /dev/dri into the container when running "ramalama serve --api llama-stack".
Update bats test to pass "--dri off".
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
We used to have this feature; it got dropped recently by accident.
It enables things like:
`cat text_file_with_prompt.txt | ramalama run smollm:135m`
or
`cat some_doc | ramalama run smollm:135m Explain this document:`
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
None of our tests should take more than 1 hour, so time them out;
when one does, we need to figure out what is causing the issue.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Move the pipeline definitions into their own files and reference them from the PipelineRuns
that are created on pull request and push. This allows the pipelines to be used for multiple
components and dramatically reduces code duplication and maintenance burden.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
This fixes make validate to not complain about the --ctx-size option.
There is no reason to show this option in the help display, since it exists
only for users expecting vllm options.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
To be consistent with "ramalama run" experience. Inferencing
servers that have implemented model-swapping require this. In the
case of servers like llama-server that only load one model, any
value is sufficient.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
This commit adds TMT test jobs triggered via Packit that fetch an
instance with an NVIDIA GPU, specified in `plans/no-rpm.fmf`, and can be
verified in the gpu_info test result.
In addition, system tests (nocontainer), validate, and unit tests are
also triggered via TMT.
Fixes: #1054
TODO:
1. Enable bats-docker tests
2. Resolve f41 validate test failures
Signed-off-by: Lokesh Mandvekar <lsm5@fedoraproject.org>
Copy the current checkout of the ramalama repo into the containers and use that for installation.
This removes the need for an extra checkout of the ramalama repo, and is consistent with the build
process used by container_build.sh (which used a bind-mount rather than a copy).
This keeps the version of ramalama in sync with the Containerfiles, and makes testing and CI more
useful.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
ramalama chat does not use --context or --temp; these are server
settings, not client-side ones.
Also remove ramalama client command, since this is a duplicate of
ramalama chat.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
I see people showing things like:
file://srv/llm/modles/unsloth/Qwen3-235B-A22B-GGUF/UD_Q2_K_XL/Qwen3-235B-A22B-UD-Q2_K_XL-00001-of-00002.gguf:latest 1 month ago 46.42 GB
file://srv/llm/modles/unsloth/Qwen3-235B-A22B-GGUF/UD_Q2_K_XL/Qwen3-235B-A22B-UD-Q2_K_XL-00002-of-00002.gguf:latest 1 month ago 35.55 GB
file://srv/llm/modles/unsloth/Qwen3-235B-A22B-GGUF/Q8_0/Qwen3-235B-A22B-Q8_0-00001-of-00006.gguf:latest 1 week ago 46.44 GB
file://srv/llm/modles/unsloth/Qwen3-235B-A22B-GGUF/Q8_0/Qwen3-235B-A22B-Q8_0-00002-of-00006.gguf:latest 1 week ago 46.0 GB
file://srv/llm/modles/unsloth/Qwen3-235B-A22B-GGUF/Q8_0/Qwen3-235B-A22B-Q8_0-00003-of-00006.gguf:latest 1 week ago 45.93 GB
file://srv/llm/modles/unsloth/Qwen3-235B-A22B-GGUF/Q8_0/Qwen3-235B-A22B-Q8_0-00004-of-00006.gguf:latest 1 week ago 46.0 GB
file://srv/llm/modles/unsloth/Qwen3-235B-A22B-GGUF/Q8_0/Qwen3-235B-A22B-Q8_0-00005-of-00006.gguf:latest 1 week ago 46.0 GB
file://srv/llm/modles/unsloth/Qwen3-235B-A22B-GGUF/Q8_0/Qwen3-235B-A22B-Q8_0-00006-of-00006.gguf:latest 1 week ago 2.39 GB
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Fixes: https://github.com/containers/ramalama/issues/1557
Remove Model flag for safetensor files for now in order to
allow multiple safetensor files to be downloaded for the
convert command.
Signed-off-by: Michael Engel <mengel@redhat.com>
GitHub UI showed red; changing just in case, since incorrect tabs or
spaces can cause the GitHub UI to skip builds.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Currently we are ignoring the user specified image if it does not
contain a ':'
Fixes: https://github.com/containers/ramalama/issues/1525
While I was in the code base, I standardized on container-images for
Fedora to come from quay.io/fedora repo.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Think it's only meant for the
container-images/scripts/build-cli.sh
version; it's breaking podman on my bootc system, replacing
/usr/bin/podman with a broken /usr/bin/podman-remote symlink.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
For now we will just add the chat command, next PR will remove the
external chat command and just use this internal one.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Relates to: https://github.com/containers/ramalama/issues/1278
By default, ramalama ls should not display partially downloaded
AI Models. In order to enable users to view all models, the new
option --all for the ls command has been introduced.
Signed-off-by: Michael Engel <mengel@redhat.com>
Remove failing on pipe errors: since sometimes the network
can fail and break the demo, it is better to continue
after failures.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
I have found that when running with nvidia the -t (--tty) option
in podman is covering up certain errors. When we are not running
ramalama interactively, we do not need this flag set, and this
would make it easier to diagnose what is going on with users'
systems.
Don't add -i unless necessary
Server should not need to be run with --interactive or --tty.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Substitute huggingface with hf and remove :latest as it doesn't
really apply. huggingface lines are particularly lengthy, so the
saved characters are welcome. hf is a common abbreviation for huggingface.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Relates to: https://github.com/containers/ramalama/issues/1508
remove_snapshot should never fail, therefore adding the ignore_errors=True.
Before removing a snapshot with ramalama rm an existence check is made. If
the model does not exist, an error will be raised to preserve the previous
behavior of that command.
Signed-off-by: Michael Engel <mengel@redhat.com>
Trying to put this timeout to bed once and for all. There is a
chance a really large model on certain hardware could take more
than 16 seconds to load.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
The `blobfile` dependency is already included in ramalama-stack version 0.2.0;
adding it explicitly is unnecessary.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
This does nothing on systems with no GPUs, but on Vulkan-capable
systems, this would automatically offload the model to capable
accelerators.
Take this moment to claim Vulkan support in README also.
Signed-off-by: Leorize <leorize+oss@disroot.org>
Discover AMD graphics devices using AMDKFD topology instead of
enumerating the PCIe bus. This interface exposes a lot more information
about potential devices, allowing RamaLama to filter out unsupported
devices.
Currently, devices older than GFX9 are filtered, as they are no longer
supported by ROCm.
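A rough sketch of the kind of topology scan this implies (sysfs layout and
property names assumed from the KFD topology interface):

from pathlib import Path

def supported_amd_gpus():
    gpus = []
    for props_file in Path("/sys/class/kfd/kfd/topology/nodes").glob("*/properties"):
        # Properties are "key value" lines per node.
        props = dict(
            line.split(None, 1)
            for line in props_file.read_text().splitlines()
            if line.strip()
        )
        # gfx_target_version encodes the GFX generation; drop pre-GFX9
        # devices, which are no longer supported by ROCm.
        if int(props.get("gfx_target_version", "0")) >= 90000:
            gpus.append(props_file.parent.name)
    return gpus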
Signed-off-by: Leorize <leorize+oss@disroot.org>
In the llama.cpp case it doesn't make as much sense; llama-server
prints this string when it's ready to serve, like so:
main: server is listening on http://0.0.0.0:8080 - starting the main loop
This can be printed seconds or minutes too early potentially in
the llama.cpp case.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
The llama-stack API is not working without the --generate command.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
llama-server by default warms up the model with an empty run for
performance reasons. We can warm up ourselves with a real query.
Warming up was causing issues and delaying start time.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
RamaLama does not try to detect GPU if the user has already set
certain env vars. Make this list smaller.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Currently it builds correctly on s390x, but we want to enforce the
-DGGML_VXE=ON flag. We also want to disable whisper.cpp for now until we
can bring up support for it; otherwise it will be a product that none of us
have experience in.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
fix: missing s390x for ramalama
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
feat: disable whisper.cpp for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
chore: remove s390x containerfile
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Switching pyproject.toml to Python 3.10, since the
CANN and MUSE Containerfiles only have access to that
version of Python.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
An error when creating new snapshots has only been partially handled
inside the model store and the caller side had to clean up properly.
In order to simplify this, more error handling has been added when
creating new snapshots - removing the (faulty) snapshot, logging and
passing the exception upwards so that the caller can do additional
actions. This ensures that the state remains consistent.
Signed-off-by: Michael Engel <mengel@redhat.com>
Previously, the endianness check was done for each SnapshotFile and
these files might not be models, but could also be miscellaneous files such
as chat templates or other metadata. By removing only the affected file
on a mismatch error the store might get into an inconsistent state since
the cleanup depends on the error handling of the caller.
Therefore, the check for endianness has been moved one layer up and only
checks the flagged model file. In case of a mismatch an implicit removal
of the whole snapshot is triggered.
Signed-off-by: Michael Engel <mengel@redhat.com>
By moving the recently improved code to detect the endianness into
a dedicated function, its reusability is increased. Also, a specific
exception class for models not in the GGUF format has been added.
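A sketch of such a check, assuming only that GGUF files start with the magic
bytes b"GGUF" followed by a little-endian uint32 version (exception name
hypothetical):

import struct

class NotGGUFModelError(Exception):
    pass

def gguf_endianness(path):
    with open(path, "rb") as f:
        magic, version_raw = f.read(4), f.read(4)
    if magic != b"GGUF":
        raise NotGGUFModelError(path)
    version = struct.unpack("<I", version_raw)[0]
    # A byteswapped (big-endian) file decodes to an implausibly huge version.
    return "little" if version < 0x10000 else "big"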
Signed-off-by: Michael Engel <mengel@redhat.com>
Add a new option --api which allows users to specify the API server:
either llama-stack or none. With none, we just generate a service with the
serve command. With `--api llama-stack`, RamaLama will generate an API
server listening on port 8321 and an OpenAI server listening on port
8080.
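Example usage (model reference is illustrative):
$ ramalama serve --api llama-stack ollama://smollm:135m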
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Add more logging to indicate requests to http/https addresses in
debug mode. This should make it easier to find out what exactly is going
on under the hood, mainly for the pull command.
Signed-off-by: Ales Musil <amusil@redhat.com>
Add a global logger that can be used to print messages to stderr.
Replace all perror calls in debug cases with logger.debug calls,
which removes the extra argument that had to be passed, as the module
will print the error message based on the level.
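A minimal sketch of such a global stderr logger (handler format illustrative):

import logging
import sys

logger = logging.getLogger("ramalama")
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

# Callers no longer pass a debug flag; the level decides what is printed.
logger.debug("fetching %s", "https://example.com/manifest")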
Signed-off-by: Ales Musil <amusil@redhat.com>
podman 5.5 and Podman Desktop have been updated; this
should give us better performance than previous versions
on macOS.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When 'ramalama run' is used with '--network none', it was inheriting host proxy
environment variables. This caused the internal client to fail when connecting to
the internal llama-server on 127.0.0.1, as it tried to route loopback traffic through
the unreachable proxy.
This change modifies engine.py to:
- Correctly set NO_PROXY/no_proxy for localhost and 127.0.0.1.
- Explicitly unset http_proxy, https_proxy, HTTP_PROXY, and HTTPS_PROXY variables
for the container when the 'run' subcommand is invoked.
This allows the internal client to connect directly to the internal server, resolving
the connection error.
Fixes: #1414
Signed-off-by: Song Liu <soliu@redhat.com>
Re-implement without relying on ConfigParser which does not support duplicate
options.
Extend unit test coverage for this and correct the existing test data.
Signed-off-by: Oliver Walsh <owalsh@redhat.com>
A previous commit made Python 3.11 the minimum version for
ramalama, but not all references in the project
were updated to reflect this.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
The last bump unfortunately brought a bug to rocm/hip support;
bumping the version to include the fix.
[0] https://github.com/ggml-org/llama.cpp/issues/13437
Signed-off-by: Attila Fazekas <afazekas@redhat.com>
Models bigger than 70B are typically stored in multiple GGUF files
with a special naming scheme (e.g., *-00001-of-00002.gguf) that llama.cpp expects.
Signed-off-by: Attila Fazekas <afazekas@redhat.com>
The huggingface repo tag refers to the quantization and is case insensitive.
Normalize this to uppercase.
Fixes: #1421
Signed-off-by: Oliver Walsh <owalsh@redhat.com>
Add support for pulling hf repos vs individual models, replicating
the `llama.cpp -hf <model>` logic.
Add support for mmproj file in model store snapshot.
If an mmproj file is available pass it on the llama.cpp command line.
Structure classes to continue support for modelscope as ModelScopeRepository
inherits from HuggingfaceRepository.
Example usage:
$ ramalama serve huggingface://ggml-org/gemma-3-4b-it-GGUF
...
Open webui, upload a picture, ask for a description.
Fixes: #1405
Signed-off-by: Oliver Walsh <owalsh@redhat.com>
Include additional information such as:
- Possibility to generate containers through Makefile
- Possibility to generate coverage reports through Makefile
Signed-off-by: Sergio Arroutbi <sarroutb@redhat.com>
This was added when we didn't have good installation techniques
for macOS. We had pipx, which was not intuitive, and a hacked-together
shell script as an alternative. Now that we have brew and
uv integrated we don't need this code.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Added unit tests for new parsing feature of --generate option as
well as for the refactored quadlet file generation. In addition,
a system test has been added to verify the output directory of
the --generate option works as expected.
Signed-off-by: Michael Engel <mengel@redhat.com>
Instead of writing the quadlet string manually, let's use the
configparser from the standard library. A slim wrapper class
has been added as well to simplify the usage of configparser.
In addition, the generated quadlets are not directly written to
file, but instead the inifile instances are returned. This
implies that the caller needs to do the write_to_file call and
enables writing simple unit tests for the generation.
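A minimal sketch of the wrapper idea (section and keys illustrative):

import configparser

unit = configparser.ConfigParser()
unit.optionxform = str  # preserve key case, e.g. "Image" rather than "image"
unit["Container"] = {
    "Image": "quay.io/ramalama/ramalama:latest",
    "PublishPort": "8080:8080",
}
# The instance is returned to the caller, which decides when and where to
# write it out; this is what keeps the generation unit-testable.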
Signed-off-by: Michael Engel <mengel@redhat.com>
Currently the CUDA_VISIBLE_DEVICES environment variable defaults to '0'
when it's not overridden by the user. This commit updates it to include all
available GPUs detected by nvidia-smi, allowing the application to
utilize multiple GPUs by default.
Signed-off-by: Marius Cornea <mcornea@redhat.com>
get_cmd_with_wrapper() was changed in 849813f8 to accept a single string argument instead
of a list. Update cli.py to pass only the first element of the list.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
The check_nvidia function was previously overriding any user-defined
CUDA_VISIBLE_DEVICES environment variable with a default value of "0".
This change adds a check to only set CUDA_VISIBLE_DEVICES=0 when it's not
already present in the environment.
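The change amounts to the classic setdefault pattern (a sketch; the actual
code may operate on the container environment rather than os.environ):

import os

# Only default CUDA_VISIBLE_DEVICES when the user has not already set it.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")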
Signed-off-by: Marius Cornea <mcornea@redhat.com>
This is consistent with how pip installs packages in the base ramalama image.
Remove some redundant package names from docling(), they're already installed in rag().
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
"$VERSION_ID" is set to "9.5" when building from a UBI9-based image (the default). This fails
the "-ge" test. Check if "$ID" is "fedora" before assuming "$VERSION_ID" is an integer.
If python3.11 is getting installed, also install python3.11-devel explicitly.
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Docling has support for pulling html pages, and we were not pulling them
correctly.
Also support --dryrun
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
removed uv.lock
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
reverts uv-install.sh, bin/ramalama, and flat cli hierarchy
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
packit version extraction from pyproject.toml
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
pyproject.toml references license file
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
fixed completion directory location
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
fixed format and check-format. There is no longer a root .py file to check
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
newline at end of install-uv.sh
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
remove *.py from make lint flake8 command
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
added import for ModelStoreImport to main
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
attempt to consolidate main functions
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
lint
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
Make bin/ramalama executable
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
typo
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
Relates to: https://github.com/containers/ramalama/issues/1325
Follow-up of: https://github.com/containers/ramalama/pull/1350
Previously, the model_type member of the model store has been set to
the class name of the model, which mapped URL types like http or file
to url. This is now changed to use the model_type property of the
model class. It is, by default, still the inferred class name, except
in the URL class where it gets set to the URL scheme.
Signed-off-by: Michael Engel <mengel@redhat.com>
Relates to: https://github.com/containers/ramalama/issues/1325
In the list models function only the url:// prefix is present.
When passing a listed model to the factory, it cannot map this model
input correctly to the URL model class. Therefore, this gets
extended and the unit tests updated by appropriate cases.
Signed-off-by: Michael Engel <mengel@redhat.com>
Relates to: https://github.com/containers/ramalama/issues/1325
Instead of appending the (partial) identifier directly, the returned
ModelFile class is extended to indicate if the file is partially
downloaded or not.
Signed-off-by: Michael Engel <mengel@redhat.com>
Using this script to install llama.cpp and whisper.cpp bare metal
on a bootc system, the build stops executing here:
+ ln -sf /usr/bin/podman-remote /usr/bin/podman
+ python3 -m pip install /run/ramalama --prefix=/usr
ERROR: Invalid requirement: '/run/ramalama': Expected package name at the start of dependency specifier
/run/ramalama
^
Hint: It looks like a path. File '/run/ramalama' does not exist.
Error: building at STEP "RUN chmod a+rx /usr/bin/build_llama_and_whisper.sh && build_llama_and_whisper.sh "rocm"": while running runtime: exit status 1
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Was testing this and found some bugs, mainly caused by the recursive
call of cmdloop. Fixed this by avoiding recursion. Some
refactorings.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
This checked in file is an exact copy of:
curl -LsSfO https://astral.sh/uv/0.7.2/install.sh
Checking in the 0.7.2 version, because now a user can install with
access to github.com alone. Even if astral.sh is down for whatever
reason.
We may want to update uv installer from time to time.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
The llama.cpp documentation link is lost when we convert markdown to
nroff format. This change will expose the link in man pages.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Now that we've had one release with the wrapper scripts included
in the container images it should be safe to turn this on
everywhere.
Only add libexec for commands that have wrappers
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Parsing a chat template in Go-syntax to a Jinja template might raise an
exception. Since this is only a nice-to-have feature and we fall back to
the chat template specified in the backend, let's silently skip it.
Signed-off-by: Michael Engel <mengel@redhat.com>
On Ctrl-C, don't exit; instead cut the response short or print an info
message telling the user how they may exit.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
libcuda.so.1 is located at /usr/local/cuda-12.8/compat and that path
is not in any /etc/ld.so.conf.d/* files.
The workaround is to simply add the path and run ldconfig to make it
available.
Signed-off-by: Oliver Gutierrez <ogutsua@gmail.com>
Relates to: https://github.com/containers/ramalama/issues/1278
Remove the model tag including the : symbol from the file name on
migration from the old to new store. Also, rename the sanitize_hash
to sanitize_filename function.
Signed-off-by: Michael Engel <mengel@redhat.com>
Allows passing a draft model to serve, fetching it when needed.
'run' does not support passing draft_model.
You should also pass draft-related args tuned to your combination,
and do not forget to set sampling parameters like top_k
in the UI.
Signed-off-by: Attila Fazekas <afazekas@redhat.com>
Shrink the size of cli.py and model.py by moving all engine-related
functions into a new engine.py Python module.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The ref file could be not available, e.g. when running with --dryrun,
so the retrieved ref file instance is None. By checking this we
prevent ramalama from crashing.
Signed-off-by: Michael Engel <mengel@redhat.com>
The migration is run on each command to import all models
from the old store to the new one. It also removes the old
directories, and creating the old structure is prevented.
Signed-off-by: Michael Engel <mengel@redhat.com>
Relates to: https://github.com/containers/ramalama/issues/1202
Passing the chat template file to the model run or serve has recently
led to bad results. As a temporary fix, the template is not passed to the
model run.
Signed-off-by: Michael Engel <mengel@redhat.com>
temp 0 significantly reduces the sampling, producing unexpected output;
in many cases it makes the inference always produce the same
output. Small models are likely to get into a loop unless
the sampling is tuned.
Signed-off-by: Attila Fazekas <afazekas@redhat.com>
* intel-gpu is the only rag user container on f41; moving to f42
* dependencies are referenced by git url, adding the git package
* numpy compile requires gcc-c++, python3-devel
* f42 has python3-sentencepiece at the same version (no compile needed)
Fixes issues with several other rag containers too
Signed-off-by: Attila Fazekas <afazekas@redhat.com>
Enable both server and client support for rpc.
The feature is currently a PoC in llama.cpp, but can work in practice.
Required for distributed inference.
Signed-off-by: Attila Fazekas <afazekas@redhat.com>
Packit by default uses `git describe` for rpm version in copr builds.
This can often lag behind the latest release thus making it impossible
to update default distro builds with copr builds.
Ref: https://copr.fedorainfracloud.org/coprs/rhcontainerbot/podman-next/package/python-ramalama/
The latest build in there still shows version: `0.7.3` when v0.7.4 is
the latest upstream release.
This commit adds a packit action to modify the spec file which fetches
version info from setup.py.
The rpm release info is also modified such that it will update over the
latest distro package in almost all circumstances, assuming no distro
package will have a release 1001+.
Signed-off-by: Lokesh Mandvekar <lsm5@fedoraproject.org>
We have accidentally overwritten the image's release version.
If we also tag by digest, then we will not destroy the
image or manifest list. Since Podman Desktop AI Lab Recipes
relies on the image digest this makes it safer for them.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
cdb6df6877 recently added
entrypoint.sh; however, the Containerfile does not use a
path relative to container-images/ as, for example,
intel-gpu does.
container_build.sh uses container-images as its working directory.
Signed-off-by: Attila Fazekas <afazekas@redhat.com>
Currently if you run in Debug mode and attempt to cut and paste the
Podman or Docker line, the PROMPT field has a space with a > in it.
When pasted this causes issues since it is not properly quoted.
With this change the command can be successfully cut and pasted.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Relates to https://github.com/containers/podman-desktop-extension-ai-lab/issues/2630
Allow overriding the context size when running ramalama from a container.
2048 tokens (the default if not specified) is a small context window when running the inference server with
MCP tools or even for longer chat completion conversations.
Being able to provide a context window larger than 2048 is critical for those use cases.
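Example (flag spelling as elsewhere in ramalama; value illustrative):
$ ramalama serve --ctx-size 8192 ollama://smollm:135m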
Signed-off-by: Marc Nuri <marc@marcnuri.com>
intel-gpu will not currently build on Fedora 42, there are issues
in the glibc library. Should try again when Fedora 42 is released
in May.
Verification of the ramalama-cli command was broken, since ramalama
is the entrypoint.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If we are in toolbox, don't attempt to run nested containers. We
then have to rely on the user to install llama.cpp in the container
themselves. It's tempting to do an even more generic attempt to see
if we are already inside a container, so we never attempt to do
nested containers, whether toolbox, podman, docker, etc.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Related to: https://github.com/containers/ramalama/pull/1164
Copies the improvement to only list OCI containers when the
--container flag is true.
Signed-off-by: Michael Engel <mengel@redhat.com>
the CONTRIBUTING.md doc refers to several issue templates
being present in the project, but currently none exist
this commit adds templates in based on the podman project
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
'make install-requirements' currently assumes 'pipx'
is installed in your env, but this may not be the case
add an explicit install/upgrade command via pip
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Some of the images were using f41 and others f42, moving
them all to the same version. f42 is in beta now, so it is a good
time to move.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
These are the scripts I am using to build multi-arch images and push
them to the quay.io repositories.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Currently, the config options are stored in a single dict, regardless
of where they are originated, e.g., environment variables, files, or
the preset default. This prevents overriding certain options, such as
"image", with a config file.
This groups config options by origins in collections.ChainMap.
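A minimal illustration of the lookup order (values illustrative):

from collections import ChainMap

defaults = {"image": "quay.io/ramalama/ramalama"}
config_file = {"image": "quay.io/example/custom"}
env_vars = {}

config = ChainMap(env_vars, config_file, defaults)
print(config["image"])  # quay.io/example/custom: the file overrides the default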
Signed-off-by: Daiki Ueno <dueno@redhat.com>
This adds a unit test to check whether the image can be properly
overridden, with the --image command-line option or RAMALAMA_IMAGE
envvar.
Signed-off-by: Daiki Ueno <dueno@redhat.com>
Pipeline for building RamaLama images when PRs are submitted.
Based on the [docker-build-multi-platform-oci-ta](https://github.com/konflux-ci/build-definitions/tree/main/pipelines/docker-build-multi-platform-oci-ta) pipeline from [Konflux](https://konflux-ci.dev/).
params:
  - description: Source Repository URL
    name: git-url
    type: string
  - default: ""
    description: Revision of the Source Repository
    name: revision
    type: string
  - description: Fully Qualified Output Image
    name: output-image
    type: string
  - default: .
    description: Path to the source code of an application's component from where to build image.
    name: path-context
    type: string
  - default: Dockerfile
    description: Path to the Dockerfile inside the context specified by parameter path-context
    name: dockerfile
    type: string
  - default: "false"
    description: Force rebuild image
    name: rebuild
    type: string
  - default: "true"
    description: Skip checks against built image
    name: skip-checks
    type: string
  - default: "false"
    description: Execute the build with network isolation
    name: hermetic
    type: string
  - default: ""
    description: Build dependencies to be prefetched by Cachi2
    name: prefetch-input
    type: string
  - default: ""
    description: Image tag expiration time, time values could be something like 1h, 2d, 3w for hours, days, and weeks, respectively.
    name: image-expires-after
  - default: "false"
    description: Build a source image.
    name: build-source-image
    type: string
  - default: "true"
    description: Add built image into an OCI image index
    name: build-image-index
    type: string
  - default: []
    description: Array of --build-arg values ("arg=value" strings) for buildah
    name: build-args
    type: array
  - default: ""
    description: Path to a file with build arguments for buildah, see https://www.mankier.com/1/buildah-build#--build-arg-file
    name: build-args-file
    type: string
  - default: "false"
    description: Whether to enable privileged mode, should be used only with remote VMs
    name: privileged-nested
    type: string
  - default:
      - linux-c4xlarge/amd64
    description: List of platforms to build the container images on. The available set of values is determined by the configuration of the multi-platform-controller.
    name: build-platforms
    type: array
  - default: ""
    description: The parent image of the image being built.
    name: parent-image
  - default: ""
    description: The image to use for running tests.
    name: test-image
  - default: []
    description: List of environment variables (NAME=VALUE) to be set in the test environment.
    name: test-envs
    type: array
  - default:
      - echo "No tests defined"
    description: List of test commands to run after the image is built.
Based on the [docker-build-multi-platform-oci-ta](https://github.com/konflux-ci/build-definitions/tree/main/pipelines/docker-build-multi-platform-oci-ta) pipeline from [Konflux](https://konflux-ci.dev/).
params:
  - description: Source Repository URL
    name: git-url
    type: string
  - default: ""
    description: Revision of the Source Repository
    name: revision
    type: string
  - description: Fully Qualified Output Image
    name: output-image
    type: string
  - default: .
    description: Path to the source code of an application's component from where to build image.
    name: path-context
    type: string
  - default: Dockerfile
    description: Path to the Dockerfile inside the context specified by parameter path-context
    name: dockerfile
    type: string
  - default: "false"
    description: Force rebuild image
    name: rebuild
    type: string
  - default: "false"
    description: Skip checks against built image
    name: skip-checks
    type: string
  - default: "false"
    description: Execute the build with network isolation
    name: hermetic
    type: string
  - default: ""
    description: Build dependencies to be prefetched by Cachi2
    name: prefetch-input
    type: string
  - default: ""
    description: Image tag expiration time, time values could be something like 1h, 2d, 3w for hours, days, and weeks, respectively.
    name: image-expires-after
  - default: "false"
    description: Build a source image.
    name: build-source-image
    type: string
  - default: "true"
    description: Add built image into an OCI image index
    name: build-image-index
    type: string
  - default: []
    description: Array of --build-arg values ("arg=value" strings) for buildah
    name: build-args
    type: array
  - default: ""
    description: Path to a file with build arguments for buildah, see https://www.mankier.com/1/buildah-build#--build-arg-file
    name: build-args-file
    type: string
  - default: "false"
    description: Whether to enable privileged mode, should be used only with remote VMs
    name: privileged-nested
    type: string
  - default:
      - linux-c4xlarge/amd64
    description: List of platforms to build the container images on. The available set of values is determined by the configuration of the multi-platform-controller.
    name: build-platforms
    type: array
  - default: ""
    description: The parent image of the image being built.
    name: parent-image
  - default: ""
    description: The image to use for running tests.
    name: test-image
  - default: []
    description: List of environment variables (NAME=VALUE) to be set in the test environment.
    name: test-envs
    type: array
  - default:
      - echo "No tests defined"
    description: List of test commands to run after the image is built.
This pipeline is ideal for building container images from a Containerfile while maintaining trust after pipeline customization.
_Uses `buildah` to create a container image leveraging [trusted artifacts](https://konflux-ci.dev/architecture/ADR/0036-trusted-artifacts.html). It also optionally creates a source image and runs some build-time tests. Information is shared between tasks using OCI artifacts instead of PVCs. EC will pass the [`trusted_task.trusted`](https://enterprisecontract.dev/docs/ec-policies/release_policy.html#trusted_task__trusted) policy as long as all data used to build the artifact is generated from trusted tasks.
This pipeline is pushed as a Tekton bundle to [quay.io](https://quay.io/repository/konflux-ci/tekton-catalog/pipeline-docker-build-oci-ta?tab=tags)_