model-runner/charts/docker-model-runner
Nick Santos b42f3a0cb5
charts: add Kubernetes examples
- a helm chart
- static Kubernetes configs for a few common setups

I put these under ./charts so we can expose
this as a Helm chart repo later if we want,
but for now we'll just tell people to install it
from source.

Signed-off-by: Nick Santos <nick.santos@docker.com>
2025-07-29 12:53:05 -04:00

Docker Model Runner Kubernetes Support

Manifests for deploying Docker Model Runner on Kubernetes with ephemeral storage, GPU support, and model pre-pulling capabilities.

Quickstart

On Docker Desktop

kubectl apply -f static/docker-model-runner-desktop.yaml
kubectl wait --for=condition=Available deployment/docker-model-runner --timeout=5m
MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest

On any Kubernetes Cluster

kubectl apply -f static/docker-model-runner.yaml
kubectl wait --for=condition=Available deployment/docker-model-runner --timeout=5m
kubectl port-forward deployment/docker-model-runner 31245:12434

Then:

MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest
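
The port-forward can also be checked with plain curl, since the runner serves an OpenAI-style API under /engines/v1 (the base path used in the Open WebUI section). The following is a minimal sketch, not part of the chart: the helper names are hypothetical, and the models and chat/completions routes are assumed to follow the usual OpenAI layout under that prefix.

```shell
# Base URL of the port-forwarded runner; defaults to the quickstart value.
MODEL_RUNNER_HOST="${MODEL_RUNNER_HOST:-http://localhost:31245}"

# engine_url ROUTE: print the full URL for a route under /engines/v1.
# The prefix comes from this README; the routes themselves are assumed
# to follow the OpenAI API layout.
engine_url() {
  printf '%s/engines/v1/%s\n' "${MODEL_RUNNER_HOST%/}" "$1"
}

# list_models: ask the runner which models it currently has.
list_models() {
  curl -fsS "$(engine_url models)"
}

# chat MODEL PROMPT: send a single user message to a model.
# PROMPT is placed into the JSON body verbatim, so keep it free of quotes.
chat() {
  curl -fsS "$(engine_url chat/completions)" \
    -H 'Content-Type: application/json' \
    -d "{\"model\": \"$1\", \"messages\": [{\"role\": \"user\", \"content\": \"$2\"}]}"
}

# Example:
#   list_models
#   chat ai/smollm2:latest "Say hello"
```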

Helm Configuration

Basic Configuration

Key configuration options in values.yaml:

# Storage configuration
storage:
  size: 100Gi
  storageClass: ""  # Set this to the storage class of your cloud provider.

# Model pre-pull configuration
modelInit:
  enabled: false
  models:
    - "ai/smollm2:latest"

# GPU configuration
gpu:
  enabled: false
  vendor: nvidia  # or amd
  count: 1

# NodePort configuration
nodePort:
  enabled: false
  port: 31245
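
The chart is currently installed from a source checkout rather than from a chart repository. As a rough sketch, the options above can be overridden at install time with standard Helm flags. The chart path and the release name below are assumptions; adjust them to your checkout, and note that resource names may depend on the release name you pick.

```shell
# Location of the chart in a local checkout and the Helm release name.
# Both values are assumptions for illustration.
CHART_DIR="${CHART_DIR:-./charts/docker-model-runner}"
RELEASE="${RELEASE:-docker-model-runner}"

# install_runner [EXTRA_HELM_ARGS...]: install the chart, or upgrade it if
# the release already exists, passing any extra arguments through to Helm.
install_runner() {
  helm upgrade --install "$RELEASE" "$CHART_DIR" --wait "$@"
}

# Examples:
#   install_runner --set storage.size=50Gi --set nodePort.enabled=true
#   install_runner -f my-values.yaml
```

Keeping overrides in a separate values file (the -f form) is usually easier to review than a long list of --set flags.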

GPU Scheduling

To enable GPU scheduling:

gpu:
  enabled: true
  vendor: nvidia  # or amd
  count: 1

This will add the appropriate resource requests/limits:

  • NVIDIA: nvidia.com/gpu
  • AMD: amd.com/gpu
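
To confirm that the resource actually landed on the workload and that some node can satisfy it, something like the following may help. It is a sketch with hypothetical helper names; it assumes the runner is the first container in the pod spec and that the deployment is named docker-model-runner as in the quickstart.

```shell
# gpu_resource_name VENDOR: map a gpu.vendor value to the extended
# resource name listed above.
gpu_resource_name() {
  case "$1" in
    nvidia) echo "nvidia.com/gpu" ;;
    amd)    echo "amd.com/gpu" ;;
    *)      echo "unknown vendor: $1" >&2; return 1 ;;
  esac
}

# show_container_resources: print the requests/limits of the first
# container in the deployment (assumed to be the runner itself).
show_container_resources() {
  kubectl get deployment docker-model-runner \
    -o jsonpath='{.spec.template.spec.containers[0].resources}'
}

# nodes_with_gpu VENDOR: list every node with its allocatable GPU count.
nodes_with_gpu() {
  res="$(gpu_resource_name "$1")" || return 1
  # Dots inside the resource name must be escaped in the column expression.
  escaped="$(printf '%s' "$res" | sed 's/\./\\./g')"
  kubectl get nodes \
    -o "custom-columns=NODE:.metadata.name,GPUS:.status.allocatable.${escaped}"
}

# Example:
#   show_container_resources
#   nodes_with_gpu nvidia
```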

Model Pre-pulling

Configure models to pre-pull during pod initialization:

modelInit:
  enabled: true
  models:
    - "ai/smollm2:latest"
    - "ai/llama3.2:latest"
    - "ai/mistral:latest"
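
The same list can be passed on the command line using Helm's {a,b} list syntax for --set. A small sketch with a hypothetical helper that builds the expression from its arguments; the release name and chart path in the example are assumptions.

```shell
# models_set_flag MODEL...: print a --set expression that assigns the given
# models to modelInit.models, using Helm's {a,b,c} list syntax.
models_set_flag() {
  list=""
  for m in "$@"; do
    # Append with a comma separator, except before the first entry.
    list="${list:+$list,}$m"
  done
  printf 'modelInit.models={%s}\n' "$list"
}

# Example (the quotes keep the shell from brace-expanding the list):
#   helm upgrade --install docker-model-runner ./charts/docker-model-runner \
#     --set modelInit.enabled=true \
#     --set "$(models_set_flag ai/smollm2:latest ai/llama3.2:latest)"
```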

Usage

Testing the Installation

Once installed, set up a port-forward to access the service:

kubectl port-forward service/docker-model-runner-nodeport 31245:80

Then test the model runner:

MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest

Using with Open WebUI

To use Docker Model Runner with Open WebUI, install the Open WebUI Helm chart:

# Add the Open WebUI Helm repository
helm repo add open-webui https://helm.openwebui.com/
helm repo update

# Install Open WebUI with auth disabled
# See the open-webui Helm chart for
# connecting to your auth provider.
helm upgrade --install --wait open-webui open-webui/open-webui \
  --set ollama.enabled=false \
  --set pipelines.enabled=false \
  --set 'extraEnvVars[0].name=WEBUI_AUTH' \
  --set-string 'extraEnvVars[0].value=false' \
  --set openaiBaseApiUrl="http://docker-model-runner/engines/v1"

Access Open WebUI:

kubectl port-forward service/open-webui 8080:80

Then visit http://localhost:8080 in your browser.

Values Reference

| Parameter | Description | Default |
|-----------|-------------|---------|
| replicaCount | Number of replicas | 1 |
| image.repository | Docker Model Runner image repository | docker/model-runner |
| image.tag | Docker Model Runner image tag | latest |
| image.pullPolicy | Image pull policy | IfNotPresent |
| storage.size | Ephemeral volume size | 100Gi |
| storage.storageClass | Storage class for ephemeral volume | "" |
| modelInit.enabled | Enable model pre-pulling | false |
| modelInit.models | List of models to pre-pull | ["ai/smollm2:latest"] |
| gpu.enabled | Enable GPU support | false |
| gpu.vendor | GPU vendor (nvidia or amd) | nvidia |
| gpu.count | Number of GPUs to request | 1 |
| nodePort.enabled | Enable NodePort service | false |
| nodePort.port | NodePort port number | 31245 |

Troubleshooting

Pod Fails to Start

Check the pod logs:

kubectl logs -f deployment/docker-model-runner
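
If the pod never leaves Pending, there are no container logs to read yet, and the scheduling events tend to be more informative (for example an unbound volume or an unsatisfiable GPU request). A minimal sketch with a hypothetical helper; the default namespace is an assumption.

```shell
# diagnose [NAMESPACE]: show the deployment's status and the most recent
# events in the namespace, oldest first.
diagnose() {
  ns="${1:-default}"
  kubectl -n "$ns" describe deployment/docker-model-runner
  kubectl -n "$ns" get events --sort-by=.lastTimestamp
}

# Example:
#   diagnose
#   diagnose my-namespace
```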

Model Pre-pull Issues

Check the init container logs:

kubectl logs -f deployment/docker-model-runner -c model-init

GPU Not Available

Your cluster must use a GPU scheduling plugin.

Ensure your cluster has GPU support and the appropriate device plugin installed: