* This recipe uses langchain.js and langgraph.js to create an AI application that does function calling
Signed-off-by: Lucas Holmquist <lholmqui@redhat.com>
https://github.com/containers/ai-lab-recipes/pull/806 updated the
version of chromadb used with the rag recipe when run with podman
ai lab.
Update the versions of Langchain and Chromadb clients to be compatible
Signed-off-by: Michael Dawson <mdawson@devrus.com>
pin the chromadb version when using quadlet and bootc to the
same one used when run with podman ai lab. Chromadb seems to
break compatibility regularly and the client must be compatible
with the chromadb version used.
Signed-off-by: Michael Dawson <mdawson@devrus.com>
We need to share container image storage between rootless users, so that
we don't need `sudo` and we don't duplicate the `instructlab` image.
This change follows the Red Hat solution to
[create additional image store for rootless
users](https://access.redhat.com/solutions/6206192).
The `/usr/lib/containers/storage` folder can be read by anyone and new
users will inherit a default configuration via `/etc/skel` that
configures the additional storage.
The `ilab` wrapper is also modified to remove the impersonation code and
not use `sudo` anymore.
Follow-up on #766
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
We need to share container image storage between rootless users, so that
we don't need `sudo` and we don't duplicate the `instructlab` image.
This change follows the Red Hat solution to
[create additional image store for rootless users](https://access.redhat.com/solutions/6206192).
The `/usr/lib/containers/storage` folder can be read by anyone and new
users will inherit a default configuration via `/etc/skel` that
configures the additional storage.
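For illustration, a minimal sketch of the kind of per-user configuration this relies on (the file contents and keys here are an assumption, not the literal files shipped by this change):
```bash
# Hypothetical sketch: expose a read-only additional image store to rootless users.
mkdir -p /etc/skel/.config/containers
cat > /etc/skel/.config/containers/storage.conf <<'EOF'
[storage]
driver = "overlay"

[storage.options]
additionalimagestores = ["/usr/lib/containers/storage"]
EOF
```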
The `ilab` wrapper is also modified to remove the impersonation code and
not use `sudo` anymore.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Adds different steps for building required libraries, packages and dependencies for Intel Habanalabs
Signed-off-by: Enrique Belarte Luque <ebelarte@redhat.com>
Add SSL_CERT_FILE and SSL_CERT_DIR to the preserved environment variables and ensure they are passed to Podman. This change ensures that SSL certificates are correctly handled within the container environment.
Signed-off-by: Tyler Lisowski <lisowski@us.ibm.com>
When working with AI/ML recipes, it is frequent to pull versioned
software and data from Git repositories. This change adds the `git`
and `git-lfs` packages.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Added workarounds for libdnf, the hl-smi binary, and the ilab wrapper.
Also added a duplicated directory for common files to work with Konflux CI.
Signed-off-by: Enrique Belarte Luque <ebelarte@redhat.com>
This change updates the version of AMD ROCm to 6.2 in the amd-bootc
image for training. With this new version, the `rocm-smi` package is
replaced by the `amd-smi` package.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
The multi-stage build has too many stages. During the installation of
the `amdgpu-dkms` package, the modules are built and installed in
`/lib/modules/${KERNEL_VERSION}`. If the installation of the package is
done in the `driver-toolkit` image, the extra dependencies are very
limited. This change removes the `source` stage and installs the
`amdgpu-dkms` package on top of `driver-toolkit`.
The `amdgpu-dkms` package installs the modules in
`/lib/modules/${KERNEL_VERSION}/extra` and these are the only modules in
that folder. The `amdgpu-dkms-firmware` package is installed as a
dependency of `amdgpu-dkms` and it installs the firmware files in
`/lib/firmware/updates/amdgpu`. So, this change removes the in-tree
`amdgpu` modules and firmware, then copies the ones generated by DKMS in
the `builder` stage.
The change also moves the repository definitions to the `repos.d` folder
and adds the AMD public key to verify the signatures of the AMD RPMs.
The users call a wrapper script called `ilab` to hide the `instructlab`
container image and the command line options. This change copies the
file from `nvidia-bootc` and adjusts the logic. The main change is that
`/dev/kfd` and `/dev/dri` devices are passed to the container, instead
of `nvidia.com/gpu=all`. The `ilab` wrapper is copied in the `amd-bootc`
image.
The Makefile is also modified to reflect these changes.
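As a rough sketch, the device handling differs like this (the image variable and the remaining flags are assumptions, not the exact script):
```bash
# NVIDIA variant uses CDI: --device nvidia.com/gpu=all
# AMD variant passes the plain device nodes instead:
podman run --rm -it \
  --device /dev/kfd \
  --device /dev/dri \
  "${IMAGE_NAME}" ilab "$@"
```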
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
The comment is no longer relevant since we changed the way we pass
environment variables to the container.
Signed-off-by: Omer Tuchfeld <omer@tuchfeld.dev>
The use of a uid map leads to a new layer with all files chowned.
This takes several seconds due to the size of the instructlab
container (26GB). Normally this would be a one time cost where
the idmap layer is cached and reused across container creations;
however, since the container is stored on a read-only additional
image store, no caching is performed.
Address the problem by creating a derived empty container in
mutable container storage. This allows the 1k idmap layer to be
created in the same area, yet reuses the layers in additional
image store.
Signed-off-by: Jason T. Greene <jason.greene@redhat.com>
The `/dev/nvswitchctl` device is created by the NVIDIA Fabric Manager
service, so it cannot be a condition for the `nvidia-fabricmanager`
service.
Looking at the NVIDIA driver startup script for Kubernetes, the actual
check is the presence of `/proc/driver/nvidia-nvswitch/devices` and the
fact that it's not empty [1].
This change modifies the condition to
`ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices`, which
verifies that a certain path exists and is a non-empty directory.
[1] https://gitlab.com/nvidia/container-images/driver/-/blob/main/rhel9/nvidia-driver?ref_type=heads#L262-269
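A hedged sketch of the kind of patch applied to the unit (the original condition string and the unit path are assumptions; the Containerfile may phrase it differently):
```bash
# Replace the device-based condition with a check on the nvswitch proc directory.
sed -i \
  's|ConditionPathExists=/dev/nvswitchctl|ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices|' \
  /usr/lib/systemd/system/nvidia-fabricmanager.service
```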
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
The `nvidia-driver` package provides the firmware files for the given
driver version. This change removes the copy of the firmware from the
builder step and installs the `nvidia-driver` package instead. This also
allows better traceability of the files in the final image.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Intel has released the version `1.17.0-495` of their Gaudi drivers. They
are available explicitly for RHEL 9.4 with a new `9.4` folder in the RPM
repository. This change updates the arguments to use the new version
from the new repository folder.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
When building the `driver-toolkit` image, it is cumbersome to find the kernel
version that matches the future `nvidia-bootc` and `intel-bootc` images.
However, the kernel version is stored as a label on the `rhel-bootc`
images, which are exposed as the `FROM` variable in the Makefile.
This change collects the kernel version using `skopeo inspect` and `jq`.
The `DRIVER_TOOLKIT_BASE_IMAGE` variable is introduced in the Makefile
to dissociate it from the `FROM` variable that is used as the `nvidia-bootc`
and `intel-bootc` base image.
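A rough sketch of the lookup, assuming the kernel version is exposed through the `ostree.linux` label on the bootc base image (the label name is an assumption here):
```bash
# Read the kernel version label from the base image without pulling it.
KERNEL_VERSION=$(skopeo inspect "docker://${FROM}" | jq -r '.Labels["ostree.linux"]')
echo "${KERNEL_VERSION}"
```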
The user can now specify something like:
```shell
make nvidia-bootc \
FROM=quay.io/centos-bootc/centos-bootc:stream9 \
DRIVER_TOOLKIT_BASE_IMAGE=quay.io/centos/centos:stream9
```
Also, the `VERSION` variable in `/etc/os-release` is the full version, so
this change modifies the command to retrieve the `OS_VERSION_MAJOR`
value.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
During the build of the out-of-tree drivers, the base image will always
have the `kernel-core` package installed. And the `Makefile` doesn't
pass the `KERNEL_VERSION` argument to the build command. So, it's
simpler to rely on the `kernel-core` package info.
The commands to get the `KREL` and `KDIST` were not working with the RHEL
9.4 kernel. The new set of commands has been tested with `ubi9/ubi:9.4`
and `centos/centos:stream9` based driver toolkit image and they return
the correct value. For example, the values returned for the following
kernels are:
* `5.14.0-427.28.1.el9_4` (`ubi9/ubi:9.4`):
  * `KVER`: `5.14.0`
  * `KREL`: `427.28.1`
  * `KDIST`: `.el9_4`
* `5.14.0-427.el9` (`centos/centos:stream9`):
  * `KVER`: `5.14.0`
  * `KREL`: `427`
  * `KDIST`: `.el9`
The `OS_VERSION_MAJOR` argument is also not passed by the `Makefile`,
but we can get it from the `/etc/os-release` file. I'm switching to
grep+sed, because I don't want to load all the other variables.
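A sketch of the approach, matching the values listed above (the exact expressions in the Containerfile may differ slightly):
```bash
# Derive KVER/KREL/KDIST from the installed kernel-core package.
KVER=$(rpm -q --qf '%{VERSION}' kernel-core)
RELEASE=$(rpm -q --qf '%{RELEASE}' kernel-core)   # e.g. 427.28.1.el9_4 or 427.el9
KREL=$(echo "${RELEASE}" | sed 's/\.el.*$//')
KDIST=$(echo "${RELEASE}" | sed 's/^.*\(\.el[0-9_]*\)$/\1/')

# OS_VERSION_MAJOR via grep+sed, without sourcing the whole /etc/os-release.
OS_VERSION_MAJOR=$(grep '^VERSION_ID=' /etc/os-release | sed 's/VERSION_ID="\([0-9]*\).*/\1/')
```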
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
In the `nvidia-bootc` Containerfile, the condition on the existence of
`/dev/nvswitchctl` in the `nvidia-fabricmanager` unit file is not
persisted, because we don't use the `-i` option of `sed`, so the final
image still always tries to load the service. This change adds the `-i`
option to fix this.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
# Background
df8885777d
# Issue
The current error handling for multiple subuid ranges is broken due to
surprising behavior of `wc -l` which always returns `1` even when the
input is empty.
# Solution
More carefully count the number of lines in the
`CURRENT_USER_SUBUID_RANGE` variable
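For illustration, the pitfall and a safer count (variable name taken from the description above; the script's exact fix may differ):
```bash
CURRENT_USER_SUBUID_RANGE=""
echo "$CURRENT_USER_SUBUID_RANGE" | wc -l   # prints 1: echo always emits a trailing newline

# Safer: treat an empty variable as zero lines.
if [[ -z "$CURRENT_USER_SUBUID_RANGE" ]]; then
  range_count=0
else
  range_count=$(printf '%s\n' "$CURRENT_USER_SUBUID_RANGE" | wc -l)
fi
```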
# Additional changes
50fb00f26f had a small merge error, this
commit fixes that.
Signed-off-by: Omer Tuchfeld <omer@tuchfeld.dev>
We have a file that's always a duplicate of another file; until we can
get rid of this requirement, a pre-commit hook to take care of it would
be nice.
Signed-off-by: Omer Tuchfeld <omer@tuchfeld.dev>
vLLM fails with empty set values. Adjust the env passing to
only set a value if it is defined.
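A minimal sketch of the idea (the variable names are examples, not the wrapper's exact list):
```bash
# Only forward environment variables that actually have a value.
ENV_ARGS=()
for var in VLLM_LOGGING_LEVEL NCCL_DEBUG; do
  if [[ -n "${!var:-}" ]]; then
    ENV_ARGS+=("--env" "${var}=${!var}")
  fi
done
# podman run "${ENV_ARGS[@]}" ... (rest of the command elided)
```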
Signed-off-by: Jason T. Greene <jason.greene@redhat.com>
# Background
See df8885777d
# Issue
Introduced a regression [1] where it's no longer possible to run the script
as root, as the subuid map ends up being empty and this causes an error:
```
Error: invalid empty host id at UID map: [1 1]
```
# Solution
Avoid UID mapping if we're already running as root.
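A minimal sketch of the check, assuming the wrapper collects its ID-mapping flags in an array:
```bash
ID_MAP_ARGS=()
if [[ "$(id -u)" -ne 0 ]]; then
  # Only non-root invocations need the container root remapped to the caller.
  ID_MAP_ARGS+=("--uidmap" "0:$(id -u):1")
fi
# podman run "${ID_MAP_ARGS[@]}" ... (rest of the command elided)
```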
# Motivation
We want to also be able to run the script as root, for example as part
of a systemd service.
[1] RHELAI-798
Signed-off-by: Omer Tuchfeld <omer@tuchfeld.dev>
The default base image for the Driver Toolkit image is `centos:stream9`.
The original work for Driver Toolkit is in OpenShift and the base image
is `ubi9/ubi`. In both cases, the images don't have the `kernel`
package installed.
This change adds a test on the `KERNEL_VERSION` argument and exits if
it's not provided at build time. This also ensures that only the
relevant kernel is present when using `centos:stream9` or `ubi9/ubi`
as the base image. And this realigns a bit with the original Driver
Toolkit.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
- Set all indenting to 4 spaces (no tabs)
- Use POSIX style function definition in oneliner functions
- Remove unneeded exports on env variables
Signed-off-by: Javi Polo <jpolo@redhat.com>
Include ILAB_GLOBAL_CONFIG, VLLM_LOGGING_LEVEL, and NCCL_DEBUG as environment variables when starting the ilab container. Also add shared memory size of 10G to enable vllm execution. Resolves: https://github.com/containers/ai-lab-recipes/issues/721
Signed-off-by: Tyler Lisowski <lisowski@us.ibm.com>
# Background
The ilab command is wrapped by an `ilab` script which launches ilab
inside a podman container.
# Issue
Since the ilab container image is pulled during the bootc image build
process using the root user, the image is not accessible to non-root
users.
# Solution
We run the container as sudo in order to be able to access the root
container storage. But for security reasons we map root UID 0 inside the
container to the current user's UID (and all the other subuids to the
user's /etc/subuid range) so that we're effectively running the
container as the current user.
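A rough sketch of the mapping described above (the real script reads the subordinate range from /etc/subuid; the numbers here are placeholders):
```bash
CURRENT_UID="$(id -u)"
SUBUID_START=100000   # placeholder: taken from the user's /etc/subuid entry
SUBUID_SIZE=65536     # placeholder: taken from the user's /etc/subuid entry

sudo podman run \
  --uidmap "0:${CURRENT_UID}:1" \
  --uidmap "1:${SUBUID_START}:${SUBUID_SIZE}" \
  --env "HOME=$HOME" \
  "${IMAGE_NAME}" ilab "$@"
```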
# Additional changes
Changed `"--env" "HOME"` to `"--env" "HOME=$HOME"` to pass the HOME
environment variable from the current shell and not from the sudo
environment.
# Future work
In the future, we will run podman as the current user, once we figure out a
reasonable way for the current user to access the root's user container
storage
Signed-off-by: Omer Tuchfeld <omer@tuchfeld.dev>
# Background
We have an ilab wrapper script that users will use to launch the ilab
container.
Users may want to mount additional volumes into the container, as they
could possibly have e.g. large models stored in some external storage.
# Problem
Users cannot simply edit the script to add the mounts to the podman
command as it is read-only.
# Solution
Add support for an environment variable that users can set to specify
additional mounts to be added to the podman command. This will allow
users to specify additional mounts without having to modify the script.
# Implementation
The script will now check for the `ILAB_ADDITIONAL_MOUNTS` environment
variable. If it is set, the script will parse the variable as evaluated
bash code to get the mounts. The mounts will then be added to the podman
command.
Example `ILAB_ADDITIONAL_MOUNTS` usage:
```bash
ILAB_ADDITIONAL_MOUNTS="/host/path:/container/path /host/path2:/container/path2"
```
If your path contains spaces, you can use quotes:
```bash
ILAB_ADDITIONAL_MOUNTS="/host/path:/container/path '/host/path with spaces':/container/path"
```
The latter works because the script uses `eval` to parse the mounts.
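A sketch of how the eval-based parsing can be wired in (array and flag names are illustrative, not the script's exact code):
```bash
ADDITIONAL_MOUNT_ARGS=()
if [[ -n "${ILAB_ADDITIONAL_MOUNTS:-}" ]]; then
  # eval preserves quoted entries that contain spaces.
  eval "mounts=(${ILAB_ADDITIONAL_MOUNTS})"
  for mount in "${mounts[@]}"; do
    ADDITIONAL_MOUNT_ARGS+=("-v" "${mount}")
  done
fi
# podman run "${ADDITIONAL_MOUNT_ARGS[@]}" ... (rest of the command elided)
```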
Signed-off-by: Omer Tuchfeld <omer@tuchfeld.dev>
The wrapper had a mix of tabs and spaces, making it annoying to edit
Formatted with shfmt to switch to spaces
Signed-off-by: Omer Tuchfeld <omer@tuchfeld.dev>
If the wrapper script is killed, the container will be left running.
Instead of just running the command, use `exec` to replace the
wrapper script with the command, so that the command will receive
the same signals as the wrapper script and the container will be
terminated as expected.
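In sketch form (the actual command line is longer):
```bash
# Before: podman run ... "$@"
# After: replace the wrapper process so signals reach podman directly.
exec podman run --rm -it "${IMAGE_NAME}" ilab "$@"
```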
Signed-off-by: Omer Tuchfeld <omer@tuchfeld.dev>
The upgrade informer will run every couple of hours and will be triggered by a
systemd timer.
In order to start it on boot and run it once, both the service and the timer are enabled.
The auto-upgrade service is disabled in order to avoid unexpected reboots.
The service will run "bootc upgrade --check" and, in case a new version exists,
it will create a motd file with the upgrade info.
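A hypothetical sketch of the informer's check (the matched string, motd path, and wording are assumptions, not the exact script added here):
```bash
# Runs from a systemd timer every couple of hours.
if bootc upgrade --check | grep -q 'Update available'; then   # matched string is an assumption
  echo "A new image version is available; run 'bootc upgrade' to install it." \
    > /etc/motd.d/upgrade-available.motd
else
  rm -f /etc/motd.d/upgrade-available.motd
fi
```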
Signed-off-by: Igal Tsoiref <itsoiref@redhat.com>
Signed-off-by: Javi Polo <jpolo@redhat.com>
While skopeo may be part of the base image, there is no
guarantee, and as long as ilab requires it, we should
make sure it is installed.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Background
RHEL AI ships with a script in `/usr/bin` called `ilab` which
makes running `ilab` commands feel native even though they're actually
running in a podman container
Issues
The abstraction becomes leaky once you start dealing with paths.
The user thinks these are local paths, but they are actually paths inside the pod,
and if the user is doing any action with a path that's not mounted inside the pod,
files persisted to that path will not persist across ilab wrapper invocations
Examples:
1. ilab config init outputs:
Generating `/root/.config/instructlab/config.yaml`...
Initialization completed successfully, you're ready to start using `ilab`. Enjoy!
But:
ls /root/.config/instructlab/config.yaml
ls: cannot access '/root/.config/instructlab/config.yaml': Permission denied
2. User provided paths e.g.:
ilab config init --model-path...
ilab model download --model-dir=...
The path may not be mounted to the host, so the data is written to the overlay fs and is gone when the container dies
Solution
Mount the user's HOME directory and set HOME inside the container.
This seems to resolve the above issues as long as the user-provided paths
are nested under the user's HOME directory.
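In sketch form, assuming the wrapper already assembles a podman command:
```bash
podman run --rm -it \
  -v "$HOME:$HOME" \
  --env "HOME=$HOME" \
  "${IMAGE_NAME}" ilab "$@"
```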
Signed-off-by: Eran Cohen <eranco@redhat.com>
Ticket [RHELAI-442](https://issues.redhat.com/browse/RHELAI-442)
# Background
RHEL AI ships with a script in `/usr/local/bin` called `ilab` which
makes running `ilab` commands feel native even though they're actually
running in a podman container
# Issues
* The script is outdated / used several different container images for
different purposes, while it should be just using the single instructlab
image
* The volume mounts were incorrect, as instructlab now uses XDG paths
* Unnecessary directory creation for `HF_CACHE`
* Unnecessary GPU count logic
* Script has unnecessary fiddling of `ilab` parameters, essentially creating a
UX that deviates from the natural `ilab` CLI
# Solutions
* Changed script to use the single container image `IMAGE_NAME` (this
was already the case mostly, except for old references to `LVLM_NAME`
and `TRAIN_NAME` which no longer get replaced, leading to a broken `PODMAN_COMMAND_SERVE`).
Also adjusted the entrypoint to use the `ilab` executable in the pyenv.
* Will now mount the host's `~/.config` and `~/.local` into the
container's corresponding directories, for `instructlab` to use
and for its config / data to persist across invocations
* Will now mount `~/.cache` into the container's corresponding `.cache`
directory, so that the information stored in the default `HF_CACHE` is
also persisted across invocations
* Removed unnecessary GPU count logic
* Removed all parameter parsing / fiddling
# Other changes
Added secret/fake "shell" `ilab` subcommand which opens a shell in the
wrapper's container, useful for troubleshooting issues with the wrapper
itself
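A sketch of the mount layout described in the list above (the entrypoint and image variable are assumptions, not the literal script):
```bash
podman run --rm -it \
  -v "$HOME/.config:$HOME/.config" \
  -v "$HOME/.local:$HOME/.local" \
  -v "$HOME/.cache:$HOME/.cache" \
  --env "HOME=$HOME" \
  --entrypoint ilab \
  "${IMAGE_NAME}" "$@"
```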
Signed-off-by: Omer Tuchfeld <omer@tuchfeld.dev>
it matches much better.
Change the way we set the image_version_id label; in order for it to work in
Konflux we should use LABEL in the Containerfile.
Signed-off-by: Igal Tsoiref <itsoiref@redhat.com>
Of note, there was already a use of a "VENDOR" word to describe the
accelerator or provider (amd, intel, nvidia, etc.). I renamed that
in order to make room for this new use of VENDOR.
Signed-off-by: Ralph Bean <rbean@redhat.com>
Set the GitHub commit hash as the image version by default.
Add RHEL_AI_VERSION into /etc/os-release in order to use it in
insights
Signed-off-by: Igal Tsoiref <itsoiref@redhat.com>
The `nvidia-persistenced` and `nvidia-fabricmanager` services should be
started on machines with NVIDIA devices. Fabric Manager is only needed
on machines with an NVLink switch, so we patch it to start only if
/dev/nvswitchctl is present.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Upstream, this image can be pulled unauthenticated, but in other
environments a user might want to include an image that exists in some
repository that requires authentication to pull.
The person building the image needs to provide
`--secret=id=instructlab-nvidia-pull/.dockerconfigjson,src=instructlab-nvidia-pull/.dockerconfigjson`
when building the image in order to make the secret available.
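For example, a build invocation could look roughly like this (the tag and Containerfile path are placeholders):
```bash
podman build \
  --secret=id=instructlab-nvidia-pull/.dockerconfigjson,src=instructlab-nvidia-pull/.dockerconfigjson \
  -t quay.io/ai-lab/nvidia-bootc:latest \
  -f Containerfile .
```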
Signed-off-by: Ralph Bean <rbean@redhat.com>
For the InstructLab image, we use NVIDIA driver version `550.90.07` with
CUDA `12.4.1`, so this change updates the versions in the bootc image to
align the stack.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
- Fix model download container and targets
- Add prometheus model for eval
- Improve caching in instructlab container
- Add additional "models" targets for all permutations
- Introduce build chaining so that you can build everything in one step
- Small update to conform to $(MAKE) convention for submakes
Signed-off-by: Jason T. Greene <jason.greene@redhat.com>
The top level vendor targets (amd, intel, nvidia) fail with
"podman" build \
\
--file /root/ai-lab-recipes/training/model/../build/Containerfile.models \
--security-opt label=disable \
--tag "quay.io/ai-lab/-bootc-models:latest" \
-v /root/ai-lab-recipes/training/model/../build:/run/.input:ro
Error: tag quay.io/ai-lab/-bootc-models:latest: invalid reference format
make[1]: *** [Makefile:41: bootc-models] Error 125
make[1]: Leaving directory '/root/ai-lab-recipes/training/model'
make: *** [Makefile:70: bootc-models] Error 2
because VENDOR is not defined when the bootc-models target is called.
Modify the makefile to set VENDOR for each target.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
- Properly separate and order podman and bootc-image-builder arguments
- Move all the `selinux.tmp` workaround to the same layer, so bootc
install won't complain about missing files
Signed-off-by: Javi Polo <jpolo@redhat.com>
Any Gaudi update must be synchronized with all stakeholders. For now,
all packages from Kernel OOT drivers over firmware and SynapseAI to
PyTorch stack must have the same version. `habana-torch-plugin` version
`1.16.0.526` does not work with Kernel drivers `1.16.1-7`.
Signed-off-by: Christian Heimes <cheimes@redhat.com>
The NVIDIA bootc container is using multi-stage to avoid shipping build
dependencies in the final image, making it also smaller. This change
implements the same build strategy for the Intel bootc image.
The builder image is the same as for NVIDIA bootc. It is currently named
after NVIDIA, but should be renamed in a follow-up change. The benefit
is that a single builder image is maintained for all bootc images that
require out-of-tree drivers.
The number of build arguments is also reduced, since most of the
information is already present in the builder image. There is only one
kernel package per builder image and one image per architecture, so we
can retrieve the `KERNEL_VERSION` and `TARGET_ARCH` variables by
querying the RPM database. The OS information is retrieved by sourcing
the `/etc/os-release` file.
The extraction of the RPMs doesn't require storing the files, as
`rpm2cpio` supports streaming the file over HTTP(S). The number of
commands is smaller, and the downloads already happened for each build
anyway, since the download was not in a separate `RUN` statement.
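One way to stream an RPM without keeping it on disk looks roughly like this (the URL is a placeholder; the Containerfile may pass the URL to `rpm2cpio` directly):
```bash
curl -sL "https://example.com/path/to/driver-package.rpm" | rpm2cpio | cpio -idmv
```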
It is not necessary to copy the source of the drivers in `/usr/src`, since
we don't need to keep it in the final image. The Makefiles accept a
`KVERSION` variable to specify the version of the kernel and resolve its
path. The other benefit is to build as non-root.
The `.ko` files can then be copied to the final image with `COPY
--from=builder`. The change also ensures that the firmware files are
copied to the final image.
This change also adds support for `EXTRA_RPM_PACKAGES`.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
- Fix wrong script install (container lab used over wrapper [won't run on its own])
+ Restores elements that were unintentionally removed
- Fix quay tags
- Introduce "$ARCH-bootc-models" images in addition to bootc that include models
Signed-off-by: Jason T. Greene <jason.greene@redhat.com>
growfs is created by the Makefile and CI does not use it. Also, if I'm not mistaken, growfs is only used for disk image creation.
With this change, the growfs file will only be created when the Makefile is running, so CI pipelines can build the Containerfile and growfs can still be used when needed.
Signed-off-by: Enrique Belarte Luque <ebelarte@redhat.com>
Konflux CI fails when building using bootc images as base throwing this error:
`Error: Cannot create repo temporary directory "/var/cache/dnf/baseos-044cae74d71fe9ea/libdnf.1jsyRp": Permission denied`
This temporary workaround is needed for build pipeline to work on Konflux CI until libdnf fix is merged to RHEL.
References:
https://issues.redhat.com/browse/RHEL-39796
https://github.com/rpm-software-management/libdnf/pull/1665
This should be removed once the permanent fix is merged.
Signed-off-by: Enrique Belarte Luque <ebelarte@redhat.com>
Many commands that are run for SDG and training can take a lot of time,
so there is a risk to have a network disconnection during the task. With
`tmux`, users have the ability to detach the jobs from their SSH session
and let the tasks run for a very long time.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
The `lspci` command is frequently used to inspect the hardware on a
server. Adding it to the OS image would help users to troubleshoot
deployment and configuration issues.
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Some users want to use buildah instead of podman to build
their container images.
Buildah does not support --squash-all, but after examining the podman
code, --squash-all ends up just being the equivalent of `--squash --layers=false`.
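Roughly equivalent invocations (image name and Containerfile path are placeholders):
```bash
podman build --squash-all -t example/image -f Containerfile .
buildah build --squash --layers=false -t example/image -f Containerfile .
```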
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Fixed two instructions in the README.
1) The instruction to make the model pointed to detr-resnet-50 rather than the detr-resnet-101 that the instructions use.
2) The client container start had a /detection in the model address where it should not have.
added signoff
Signed-off-by: Graeme Colman <gcolman@redhat.com>
And hence mixtral download fails
Downloading model failed with the following Hugging Face Hub error: 401 Client Error. (Request ID: Root=1-6637576e-28a8c5cb049f1dbb35d46d83;86121860-3ce0-419b-aed0-4fc79c440da7)
Cannot access gated repo for url https://huggingface.co/api/models/mistralai/Mixtral-8x7B-Instruct-v0.1/tree/main?recursive=True&expand=False.
Access to model mistralai/Mixtral-8x7B-Instruct-v0.1 is restricted. You must be authenticated to access it.
Signed-off-by: Rom Freiman <rfreiman@gmail.com>
We are seeing lots of users running out of disk space.
The target should help free up wasted space, but be
careful that no builds are running when you execute the command.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
In order to run podman inside of a container we need to disable
SELinux enforcement and add CAP_SYS_ADMIN to allow mounting
of overlay file systems. This matches what we are doing in the
nvidia and amd bootc containers.
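In sketch form (the image name is a placeholder):
```bash
podman run --rm -it \
  --security-opt label=disable \
  --cap-add SYS_ADMIN \
  example/builder-image
```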
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This is a little hacky to just get this in. Would be
better if we shared this all with the recipes subdir,
but for speed I just want to get this in.
We will need to revisit the Makefile.common concept
to share more between recipes and training.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
No reason to use containers/storage for instructlab or vllm
since we are only building for embedding within a bootc image.
By storing directly in OCI, we can save many minutes and lots of
disk size.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The current source path is `../../common/usr` and it will copy to
destination path `build/usr/usr/`. I checked that the results are
different between Linux and macOS, so modify it to adapt to both
platforms.
fix: #410
Signed-off-by: Yihuang Yu <yihyu@redhat.com>
When using this on an alternate platform like aarch64, at least it will
cause the make script to fail early when the aarch64 file doesn't exist,
rather than incorrectly downloading the amd64 version. aarch64 chrome
may get packaged in the future also.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
we now have the ability to do a top level `serve` in the ilab-wrapper which will start a vllm server, and run `generate` which will connect to the ENDPOINT_URL specified
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Also create a Makefile.common in training so that the
different hardware vendors Makefiles can share options.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This extends nvidia-bootc:
- Adds the instructlab container image
- Uses a save/load approach instead of pull to avoid registry issues
- Adds an ilab wrapper script to orchestrate usage of ilab
+ In the near future this will be extended to coordinate the soon
to come multi-container mechanism for training but gets us
something working now.
- Adds ssh key setup
Signed-off-by: Jason T. Greene <jason.greene@redhat.com>
DISK_USER and DISK_GROUP are from #299, they are used for bib option
`--chown`. However, the current variable names may cause
misunderstanding and mislead people into thinking they refer to an
operating system user inside the disk, so rename them.
Also, add these 2 variables into the README.
Signed-off-by: Yihuang Yu <yihyu@redhat.com>
The `bootc-image-builder` target depends on `bootc`; even if the bootc image
exists in local storage, it still builds a new one. Since we add
`--local` to the podman command line, this commit only checks for the image in
local storage and does not check the remote registry.
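The kind of check involved, sketched as shell (target and variable names are illustrative):
```bash
podman image exists "${REGISTRY}/${IMAGE_NAME}" || make bootc
```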
Signed-off-by: Yihuang Yu <yihyu@redhat.com>
We have hardcoded fields in both the Makefile and the
Containerfiles; only hard-code them in the Containerfiles
so we are less likely to make a mistake.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The models/README.md described a bunch of download-model flags
that did not exist or were defined in different makefiles. This
PR removes the non-existing targets and adds the defined targets
into the models/Makefile, referencing them from the others.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
bootc-image-builder requests root permission, which means the generated
disk image is owned by the root user. Helpfully, bib provides the
"--chown" option to help us change the owner of the output directory.
This makes it easy for users to customize the UID:GID of the disk image and use it later.
Signed-off-by: Yihuang Yu <yihyu@redhat.com>
Update the descriptions of the recipes. They are used by the AI Lab
Podman Desktop extension and were quite repetitive and did not guide the
user much into the various domains and use cases. I tried to describe
a bit more what each recipe does.
Fixes: #297
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
In a Makefile, when we use a double "$", we should make sure the variable
is present in the environment. However, in the current design, most
variables are Makefile variables, so the shell fails to handle them.
Signed-off-by: Yihuang Yu <yihyu@redhat.com>
Our standard workflows deal with building components and pushing their images to `quay.io/ai-lab`. These components include:
- recipe applications:
  - Chatbot
  - Codegen
  - Summarizer
  - RAG
- model_servers
- models
- instructlab workflows
- training bootc workflows
For a full list of the images we build, check out our [quay organization](https://quay.io/organization/ai-lab). These standard workflows should all be run against our standard repo `containers/ai-lab-recipes` rather than the mirror repo.
## Testing frameworks
Our testing frameworks are a bit different from our standard workflows. In terms of compute, some of these jobs run on either AWS machines provisioned via terraform using secrets in the github repository or on customized github-hosted action runners, as well as on the standard ubuntu-24.04 github runners for jobs not requiring additional resources.
These workflows start by checking out the [terraform-test-environment-module](https://github.com/containers/terraform-test-environment-module) repo, as well as the code in `containers/ai-lab-recipes` at the `main` branch. Then it will provision the terraform instance, install the correct ansible playbook requirements, and run a corresponding playbook. Additional actions may also be taken depending on the testing framework in question.
Finally all of our testing framework workflows will call `terraform destroy` to remove the aws instance we have provisioned and publish the results of the workflow to slack.
IMPORTANT: If you are doing development and testing, please make sure that instances in AWS are spun down before leaving if you have access to the AWS account.
### training-e2e
The test environment is initially based on `Fedora 40`.
It bootstraps a `g5.8xlarge` AWS EC2 instance with Terraform.
Provisioning is executed with ansible. The ansible playbook is invoking bootc install and
## Contributing New Model Servers
There are a number of options out there for model servers and we want to ensure that we provide developers with a variety of vetted options for the model server that will meet their application's needs.
Deciding which model server is right for a particular use case primarily comes down to the kind of model you want to use (LLM, Object Detection, Data Classification, etc.) and the resources available (GPU, CPU, Cloud, Local).
AI Lab Recipes' default model server is [llamacpp_python](https://github.com/abetlen/llama-cpp-python), which needs models to be in a `*.GGUF` format.
However, most models available on [huggingface](https://huggingface.co/models) are not provided directly as `*.GGUF` files. More often they are provided as a set of `*.bin` or `*.safetensor` files with some additional metadata produced when the model is trained.
There are of course a number of users on huggingface who provide `*.GGUF` versions of popular models. But this introduces an unnecessary interim dependency as well as possible security or licensing concerns.
To avoid these concerns and provide users with the maximum freedom of choice for their models, we provide a tool to quickly and easily convert and quantize a model from huggingface into a `*.GGUF` format for use with our `*.GGUF` compatible model servers.

## Quantize and Convert
You can run the conversion image directly with podman in the terminal. You just need to provide it with the huggingface model name you want to download, the quantization level you want to use and whether or not you want to keep the raw files after conversion. "HF_TOKEN" is optional, it is required for private models.
You can also use the UI shown above to do the same.
## Model Storage and Use
This process writes the models into a podman volume under a `gguf/` directory and not directly back to the user's host machine (This could be changed in an upcoming update if it is required).
If a user wants to access these models to use with the llamacpp_python model server, they would simply point their model service to the correct podman volume at run time. For example:
st.session_state["Question"]="What is the Higgs Boson?"
if"Answers"notinst.session_state:
st.session_state["Answers"]={}
st.session_state["Answers"]["Right_Answer_1"]="The Higgs boson, sometimes called the Higgs particle, is an elementary particle in the Standard Model of particle physics produced by the quantum excitation of the Higgs field, one of the fields in particle physics theory"
st.session_state["Answers"]["Wrong_Answer_1"]="Alan Turing was the first person to conduct substantial research in the field that he called machine intelligence."
In some cases it will be useful for developers to update the base language model they are using (like Llama2) with some custom data of their own. In order to do this they can "finetune" the model by partially retraining it with their custom data set. There are a number of ways to do this, and they vary in complexity and computational resource requirements. Here we will continue to rely on the [llama.cpp](https://github.com/ggerganov/llama.cpp) package and do LoRA (Low-Rank Adaption) fine tuning which often requires fewer resources than other fine tuning methods.
### Use the container image
We have created a pre-built container image for running the finetuning and producing a new model on a mac. The image can be found at [quay.io/michaelclifford/finetunellm](quay.io/michaelclifford/finetunellm).
```bash
podman pull quay.io/michaelclifford/finetunellm
```
It only requires 2 things from a user to start fine tuning. The data they wish to finetune with, and the Llama based model they want to finetune (the current implementation requires a variant of the Llama model).
### Make the data accessible
This is the trickiest part of the current demo and I'm hoping to find a smoother approach moving forward. That said, there are many ways to get data into and out of pods and containers, but here we will rely on exposing a directory on our local machine as a volume for the container.
This also assumes that `<location/of/your/data/>` contains the following 2 files.
```bash
podman run --rm -it -v <location/of/your/data/>:/locallm/data/ finetunellm
```
This will run 10 iterations of LoRA finetuning and generate a new model that can be exported and used in another chat application. I'll caution that 10 iterations is likely insufficient to see a real change in the model outputs, but it serves here for demo purposes.
### Export the model
Now that we have our fine-tuned model we will want to move it out of the Podman machine and onto our local host for use by another application. Again, I'm sure there are better ways to do this long term.
Here we will rely on Podman's copy function to move the model.
If you would like to use a different model or dataset, you can replace the training data file in `data/` as well as the `.gguf` model file. However, for now llama.cpp finetuning requires a Llama variant model to be used.
To change the data and model used you can set the following environment variables when starting a new container.
* `DATA=data/data/<new-data-file>`
* `MODEL_FILE=data/<new-model-file.gguf>`
* `NEW_MODEL=<name-of-new-finetuned-model.gguf>`
```bash
podman run -it -v <location/of/your/data/>:/locallm/data/ \
The llamacpp_python model server images are based on the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) project that provides python bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp). This provides us with a python based and OpenAI API compatible model server that can run LLMs of various sizes locally across Linux, Windows or Mac.
This model server requires models to be converted from their original format, typically a set of `*.bin` or `*.safetensor` files into a single GGUF formatted file. Many models are available in GGUF format already on [huggingface.co](https://huggingface.co). You can also use the [model converter utility](../../convert_models/) available in this repo to convert models yourself.
## Image Options
We currently provide 3 options for the llamacpp_python model server:
* [Base](#base)
* [Cuda](#cuda)
* [Vulkan (experimental)](#vulkan-experimental)
### Base
The [base image](../llamacpp_python/base/Containerfile) is the standard image that works for both arm64 and amd64 environments. However, it does not include any hardware acceleration and will run with CPU only. If you use the base image, make sure that your container runtime has sufficient resources to run the desired model(s).
To build the base model service image:
```bash
make build
```
To pull the base model service image:
```bash
podman pull quay.io/ai-lab/llamacpp_python
```
### Cuda
The [Cuda image](../llamacpp_python/cuda/Containerfile) includes all the extra drivers necessary to run our model server with Nvidia GPUs. This will significantly speed up the model's response time over CPU-only deployments.
To build the Cuda variant image:
```bash
make build-cuda
```
To pull the base model service image:
```bash
podman pull quay.io/ai-lab/llamacpp_python_cuda
```
**IMPORTANT!**
To run the Cuda image with GPU acceleration, you need to install the correct [Cuda drivers](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#driver-installation) for your system along with the [Nvidia Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#). Please use the links provided to find installation instructions for your system.
Once those are installed you can use the container toolkit CLI to discover your Nvidia device(s).
Finally, you will also need to add `--device nvidia.com/gpu=all` to your `podman run` command.
### Vulkan (experimental)
The [Vulkan](https://docs.vulkan.org/guide/latest/what_is_vulkan.html) image ([amd64](../llamacpp_python/vulkan/amd64/Containerfile)/[arm64](../llamacpp_python/vulkan/arm64/Containerfile)) is experimental, but can be used for gaining partial GPU access on an M-series Mac, significantly speeding up model response time over a CPU only deployment. This image requires that your podman machine provider is "applehv" and that you use krunkit instead of vfkit. Since these tools are not currently supported by podman desktop this image will remain "experimental".
To build the Vulkan model service variant image:
| System Architecture | Command |
|---|---|
| amd64 | make build-vulkan-amd64 |
| arm64 | make build-vulkan-arm64 |
To pull the base model service image:
```bash
podman pull quay.io/ai-lab/llamacpp_python_vulkan
```
## Download Model(s)
There are many models to choose from these days, most of which can be found on [huggingface.co](https://huggingface.co). In order to use a model with the llamacpp_python model server, it must be in GGUF format. You can either download pre-converted GGUF models directly or convert them yourself with the [model converter utility](../../convert_models/) available in this repo.
A well-performing Apache-2.0 licensed model that we recommend using if you are just getting started is
`granite-7b-lab`. You can use the link below to quickly download a quantized (smaller) GGUF version of this model for use with the llamacpp_python model server.
Place all models in the [models](../../models/) directory.
You can use this snippet below to download the default model:
```bash
make download-model-granite
```
Or you can use the generic `download-models` target from the `/models` directory to download any model file from huggingface:
```bash
cd ../../models
make MODEL_NAME=<model_name> MODEL_URL=<model_url> -f Makefile download-model
# EX: make MODEL_NAME=granite-7b-lab-Q4_K_M.gguf MODEL_URL=https://huggingface.co/instructlab/granite-7b-lab-GGUF/resolve/main/granite-7b-lab-Q4_K_M.gguf -f Makefile download-model
```
To deploy the LLM server you must specify a volume mount `-v` where your models are stored on the host machine and the `MODEL_PATH` for your model of choice. The model_server is most easily deployed by calling the make command: `make -f Makefile run`. Of course, as with all our make calls, you can pass any number of the following variables: `REGISTRY`, `IMAGE_NAME`, `MODEL_NAME`, `MODEL_PATH`, and `PORT`.
To enable dynamic loading and unloading of different models present on your machine, you can start the model service with a `CONFIG_PATH` instead of a `MODEL_PATH`.
Here is an example `models_config.json` with two model options.
```json
{
```
The object_detection_python model server is a simple [FastAPI](https://fastapi.tiangolo.com/) application written specifically for use in the [object_detection recipe](../../recipes/computer_vision/object_detection/) with "DEtection TRansformer" (DETR) models. It relies on huggingface's transformer package for `AutoImageProcessor` and `AutoModelforObjectDetection` to process image data and to make inferences respectively.
Currently, the server only implements a single endpoint, `/detection`, that expects an image in bytes and returns an image with labeled bounding boxes and the probability scores of each bounding box.
## Build Model Server
To build the object_detection_python model server image from this directory:
You can download models from [huggingface.co](https://huggingface.co/) for this model server. This model server is intended to be used with "DEtection TRansformer" (DETR) models. The default model we've used and validated is [facebook/detr-resnet-101](https://huggingface.co/facebook/detr-resnet-101).
You can download a copy of this model into your `models/` directory with the make command below.
```bash
make download-model-facebook-detr-resnet-101
```
or any model with
```bash
cd ../../models/ && \
python download_hf_models.py -m <MODEL>
```
## Deploy Model Server
The model server relies on a volume mount to the localhost to access the model files. It also employs environment variables to dictate the model used and where it's served. You can start your model server using the following `make` command from the [`model_servers/object_detection_python`](../../../model_servers/object_detection_python) directory, which will be set with reasonable defaults:
The models directory stores models and provides automation around downloading models.
Want to try one of our tested models? Try one or all of the following:
```bash
make download-model-granite
make download-model-merlinite
make download-model-mistral
make download-model-mistral-code
make download-model-whisper-small
```
Want to download and run a model you don't see listed? This is supported with the `MODEL_NAME` and `MODEL_URL` params:
```bash
make download-model MODEL_URL=https://huggingface.co/TheBloke/openchat-3.5-0106-GGUF/resolve/main/openchat-3.5-0106.Q4_K_M.gguf MODEL_NAME=openchat-3.5-0106.Q4_K_M.gguf
```
The local Model Service relies on a volume mount to the localhost to access the model files.
```bash
make run
```
As stated above, by default the model service will use [`facebook/detr-resnet-101`](https://huggingface.co/facebook/detr-resnet-101). However you can use other compatible models. Simply pass the new `MODEL_NAME` and `MODEL_PATH` to the make command. Make sure the model is downloaded and exists in the [models directory](../../../models/):
```bash
# from path model_servers/object_detection_python from repo containers/ai-lab-recipes
make MODEL_NAME=facebook/detr-resnet-50 MODEL_PATH=/models/facebook/detr-resnet-101 run
```
## Build the AI Application
This could be any appropriately hosted Model Service (running locally or in the cloud).
The following Podman command can be used to run your AI Application:
```bash
podman run -p 8501:8501 -e MODEL_ENDPOINT=http://10.88.0.1:8000 object_detection_client