Nick Santos b42f3a0cb5 charts: add Kubernetes examples

- a helm chart
- static Kubernetes configs for a few common setups

I put these under ./charts so we can expose this as a Helm chart repo later if we want, but for now we'll just tell people to install it from source.

Signed-off-by: Nick Santos <nick.santos@docker.com>		2025-07-29 12:53:05 -04:00
static	charts: add Kubernetes examples	2025-07-29 12:53:05 -04:00
templates	charts: add Kubernetes examples	2025-07-29 12:53:05 -04:00
CONTRIBUTING.md	charts: add Kubernetes examples	2025-07-29 12:53:05 -04:00
Chart.yaml	charts: add Kubernetes examples	2025-07-29 12:53:05 -04:00
Makefile	charts: add Kubernetes examples	2025-07-29 12:53:05 -04:00
README.md	charts: add Kubernetes examples	2025-07-29 12:53:05 -04:00
values.yaml	charts: add Kubernetes examples	2025-07-29 12:53:05 -04:00

README.md

Docker Model Runner Kubernetes Support

Manifests for deploying Docker Model Runner on Kubernetes with ephemeral storage, GPU support, and model pre-pulling capabilities.

Quickstart

On Docker Desktop

kubectl apply -f static/docker-model-runner-desktop.yaml
kubectl wait --for=condition=Available deployment/docker-model-runner --timeout=5m
MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest

On any Kubernetes Cluster

kubectl apply -f static/docker-model-runner.yaml
kubectl wait --for=condition=Available deployment/docker-model-runner --timeout=5m
kubectl port-forward deployment/docker-model-runner 31245:12434

Then, in a separate terminal (the port-forward keeps running in the foreground):

MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest

Helm Configuration

Basic Configuration

Key configuration options in values.yaml:

# Storage configuration
storage:
  size: 100Gi
  storageClass: ""  # Set this to the storage class of your cloud provider.

# Model pre-pull configuration
modelInit:
  enabled: false
  models:
    - "ai/smollm2:latest"

# GPU configuration
gpu:
  enabled: false
  vendor: nvidia  # or amd
  count: 1

# NodePort configuration
nodePort:
  enabled: false
  port: 31245
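
As a minimal sketch of applying these options, the snippet below writes an override file and wraps the Helm call in a function. The file name `my-values.yaml`, the function name `install_model_runner`, and the release name used in the usage note are illustrative assumptions, not names defined by the chart.

```shell
# Write an override file; only the keys you want to change need to appear.
# The file name is hypothetical.
cat > my-values.yaml <<'EOF'
storage:
  size: 50Gi
nodePort:
  enabled: true
  port: 31245
EOF

# Install or upgrade the chart from source with the overrides applied.
# $1: release name (assumed), $2: path to the directory containing Chart.yaml.
install_model_runner() {
  helm upgrade --install "$1" "$2" -f my-values.yaml
}
```

From the chart directory, this could be run as `install_model_runner docker-model-runner .`. Service and deployment names may depend on the release name you choose.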

GPU Scheduling

To enable GPU scheduling:

gpu:
  enabled: true
  vendor: nvidia  # or amd
  count: 1

This adds the GPU resource for the selected vendor to the resource requests and limits:

NVIDIA: nvidia.com/gpu
AMD: amd.com/gpu

Model Pre-pulling

Configure models to pre-pull during pod initialization:

modelInit:
  enabled: true
  models:
    - "ai/smollm2:latest"
    - "ai/llama3.2:latest"
    - "ai/mistral:latest"
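
The same list can also be passed on the command line instead of editing values.yaml. The sketch below assumes Helm's `{a,b}` list syntax for `--set`; the helper name `models_set_arg` is hypothetical.

```shell
# Join model names with commas into a Helm --set expression
# for the modelInit.models list.
models_set_arg() (
  IFS=,
  printf 'modelInit.models={%s}\n' "$*"
)
```

One possible use, from the chart directory and with an assumed release name: `helm upgrade --install docker-model-runner . --set modelInit.enabled=true --set "$(models_set_arg ai/smollm2:latest ai/mistral:latest)"`.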

Usage

Testing the Installation

Once installed, set up a port-forward to access the service:

kubectl port-forward service/docker-model-runner-nodeport 31245:80

Then, in a separate terminal, test the model runner:

MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest
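
If the docker CLI is not available on the machine, the HTTP API can be probed directly through the same port-forward. This is a sketch under assumptions: the `/engines/v1` prefix is the one this README passes to Open WebUI as an OpenAI-style base URL, and the `models` path follows OpenAI API conventions rather than anything stated here. The function names are hypothetical.

```shell
# Base URL of the port-forwarded service; override via MODEL_RUNNER_HOST.
MODEL_RUNNER_HOST="${MODEL_RUNNER_HOST:-http://localhost:31245}"

# Build the URL for an endpoint under /engines/v1,
# dropping one trailing slash from the base URL if present.
engine_url() {
  printf '%s/engines/v1/%s\n' "${MODEL_RUNNER_HOST%/}" "$1"
}

# Request the model list (assumes the port-forward is active
# and that the endpoint exists).
list_models() {
  curl -fsS "$(engine_url models)"
}
```

Calling `list_models` should return a JSON document if the service is reachable; a connection error usually means the port-forward is not running.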

Using with Open WebUI

To use Docker Model Runner with Open WebUI, install the Open WebUI Helm chart:

# Add the Open WebUI Helm repository
helm repo add open-webui https://helm.openwebui.com/
helm repo update

# Install Open WebUI with auth disabled.
# To connect your auth provider instead,
# see the open-webui Helm chart.
helm upgrade --install --wait open-webui open-webui/open-webui \
  --set ollama.enabled=false \
  --set pipelines.enabled=false \
  --set 'extraEnvVars[0].name=WEBUI_AUTH' \
  --set-string 'extraEnvVars[0].value=false' \
  --set openaiBaseApiUrl="http://docker-model-runner/engines/v1"

Access Open WebUI:

kubectl port-forward service/open-webui 8080:80

Then visit http://localhost:8080 in your browser.

Values Reference

Parameter	Description	Default
`replicaCount`	Number of replicas	`1`
`image.repository`	Docker Model Runner image repository	`docker/model-runner`
`image.tag`	Docker Model Runner image tag	`latest`
`image.pullPolicy`	Image pull policy	`IfNotPresent`
`storage.size`	Ephemeral volume size	`100Gi`
`storage.storageClass`	Storage class for ephemeral volume	`""`
`modelInit.enabled`	Enable model pre-pulling	`false`
`modelInit.models`	List of models to pre-pull	`["ai/smollm2:latest"]`
`gpu.enabled`	Enable GPU support	`false`
`gpu.vendor`	GPU vendor (nvidia or amd)	`nvidia`
`gpu.count`	Number of GPUs to request	`1`
`nodePort.enabled`	Enable NodePort service	`false`
`nodePort.port`	NodePort port number	`31245`

Troubleshooting

Pod Fails to Start

Check the pod logs:

kubectl logs -f deployment/docker-model-runner

Model Pre-pull Issues

Check the init container logs:

kubectl logs -f deployment/docker-model-runner -c model-init

GPU Not Available

GPU scheduling requires GPU-capable nodes and the matching device plugin installed in the cluster:

For NVIDIA GPUs: Install the NVIDIA device plugin
For AMD GPUs: Install the AMD device plugin
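
To check whether a device plugin is working, one option is to look for the GPU resource in the node descriptions. A minimal sketch, using the resource names listed under GPU Scheduling; the function names are hypothetical.

```shell
# Map the chart's gpu.vendor value to the resource name it requests.
gpu_resource() {
  case "$1" in
    nvidia) echo "nvidia.com/gpu" ;;
    amd)    echo "amd.com/gpu" ;;
    *)      echo "unknown vendor: $1" >&2; return 1 ;;
  esac
}

# Print the lines of `kubectl describe nodes` that mention the resource.
check_gpu_nodes() {
  resource="$(gpu_resource "$1")" || return 1
  kubectl describe nodes | grep -F "$resource"
}
```

If `check_gpu_nodes nvidia` prints nothing, no node is advertising the resource, which usually points to a missing or unhealthy device plugin.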