AI Example model serving tensorflow (#563)

* Create AI example: model serving with TensorFlow

* Add ai/model-serving-tensorflow/service.yaml

* Add ai/model-serving-tensorflow/ingress.yaml

* Add ai/model-serving-tensorflow/pv.yaml

* Add ai/model-serving-tensorflow/pvc.yaml

* Create Readme.md

* Rename Readme.md to README.md

* Update README.md with structured format

* Correct link for serving in ai/model-serving-tensorflow/README.md

Co-authored-by: Janet Kuo <chiachenk@google.com>

* Fix kubectl commands in README.md

* Update README.md

* Update README.md as per review comments

* Update deployment.yaml to tensorflow/serving:2.19.0

* Remove hostname from ai/model-serving-tensorflow/ingress.yaml

---------

Co-authored-by: Janet Kuo <chiachenk@google.com>
Jayesh Mahajan 2025-06-03 20:48:38 -04:00, committed by GitHub
parent 209452cc17
commit 0598f0762a
6 changed files with 221 additions and 0 deletions

ai/model-serving-tensorflow/README.md
# TensorFlow Model Serving on Kubernetes
## Purpose / What You'll Learn
This example demonstrates how to deploy a TensorFlow model for inference using [TensorFlow Serving](https://www.tensorflow.org/serving) on Kubernetes. You'll learn how to:
- Set up TensorFlow Serving with a pre-trained model
- Use a PersistentVolume to mount your model directory
- Expose the inference endpoint using a Kubernetes `Service` and `Ingress`
- Send a sample prediction request to the model
---
## 📚 Table of Contents
- [Prerequisites](#prerequisites)
- [Quick Start / TL;DR](#quick-start--tldr)
- [Detailed Steps & Explanation](#detailed-steps--explanation)
- [Verification / Seeing it Work](#verification--seeing-it-work)
- [Configuration Customization](#configuration-customization)
- [Cleanup](#cleanup)
- [Further Reading / Next Steps](#further-reading--next-steps)
---
## ⚙️ Prerequisites
- Kubernetes cluster (tested with v1.29+)
- `kubectl` configured
- Optional: `ingress-nginx` for external access
- x86-based machine (for running TensorFlow Serving image)
- Local hostPath support (for demo) or a cloud-based PVC
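
A quick sanity check of these prerequisites (the `ingress-nginx` namespace below is an assumption; adjust to wherever your controller runs):

```bash
# Confirm cluster connectivity and version
kubectl version
kubectl get nodes

# Optional: confirm the ingress-nginx controller is running
kubectl get pods -n ingress-nginx
```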
---
## ⚡ Quick Start / TL;DR
```bash
# Apply manifests
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pv.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pvc.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/deployment.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/service.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/ingress.yaml # Optional
```
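After applying the manifests, you can block until the deployment is ready before sending requests:

```bash
# Waits for the rollout to complete; exits non-zero on failure or timeout
kubectl rollout status deployment/tf-serving
```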
---
## Detailed Steps & Explanation
### 1. PersistentVolume & PVC Setup
> ⚠️ Note: For local testing, `hostPath` is used to mount `/mnt/models/my_model`. In production, replace this with a cloud-native storage backend (e.g., AWS EBS, GCP PD, or NFS).

Model folder structure:
```
/mnt/models/my_model/
└── 1/
    ├── saved_model.pb
    └── variables/
```
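One way to populate this directory for a smoke test is to copy the `half_plus_two` test model that ships with the TensorFlow Serving repository (a sketch; assumes shell access to the node backing the `hostPath`, e.g., via `minikube ssh`):

```bash
# Fetch the TensorFlow Serving repo, which bundles small test models
git clone --depth 1 https://github.com/tensorflow/serving /tmp/serving

# Copy one model version into the layout TF Serving expects: <base_path>/<version>/
sudo mkdir -p /mnt/models/my_model
sudo cp -r /tmp/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu/00000123 \
  /mnt/models/my_model/1
```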
---
### 2. Expose the Service
- A `ClusterIP` service exposes gRPC (8500) and REST (8501).
- An optional `Ingress` exposes `/tf/v1/models/my_model:predict` to external clients.

The provided `ingress.yaml` defines no `host` rule, so it matches requests to any hostname; add a `host` field under `rules` if you want to restrict access to a specific domain.
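
If you don't have an ingress controller, `kubectl port-forward` gives you local access to the REST port instead:

```bash
# Forward local port 8501 to the service's REST port
kubectl port-forward svc/tf-serving 8501:8501

# In another terminal: model status endpoint (no /tf prefix, since the ingress rewrite is bypassed)
curl http://localhost:8501/v1/models/my_model
```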
---
## Verification / Seeing it Work
If using ingress:
```bash
curl -X POST http://<ingress-host>/tf/v1/models/my_model:predict \
  -H "Content-Type: application/json" \
  -d '{ "instances": [[1.0, 2.0, 5.0]] }'
```
Expected output:
```json
{
"predictions": [...]
}
```
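The exact values depend on the model you mounted. If you loaded the `half_plus_two` test model from the sketch above (an assumption; it computes `0.5 * x + 2` for each scalar instance), a matching request and response would be:

```bash
curl -X POST http://<ingress-host>/tf/v1/models/my_model:predict \
  -H "Content-Type: application/json" \
  -d '{ "instances": [1.0, 2.0, 5.0] }'
# => { "predictions": [2.5, 3.0, 4.5] }
```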
To verify the pod is running:
```bash
# List pods and wait for the deployment to report Available
kubectl get pods
kubectl wait --for=condition=Available deployment/tf-serving --timeout=300s

# Inspect TF Serving startup logs (should show the model version loading successfully)
kubectl logs deployment/tf-serving
```
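To confirm the model volume is mounted where TF Serving expects it, list it from inside the pod:

```bash
# Should show the version directory 1/ containing saved_model.pb and variables/
kubectl exec deploy/tf-serving -- ls -R /models/my_model
```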
---
## 🛠️ Configuration Customization
- Update `model_name` and `model_base_path` in the deployment args
- Replace the `hostPath` volume with a `PersistentVolumeClaim` bound to cloud storage
- Modify resource requests/limits for the TensorFlow Serving container (see the sketch below)
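
As a sketch, resource requests and limits can also be patched in place without editing the manifest (the values below are illustrative, not tuned):

```bash
# Set CPU/memory requests and limits on the serving container
kubectl set resources deployment/tf-serving -c tensorflow-serving \
  --requests=cpu=500m,memory=1Gi --limits=cpu=2,memory=4Gi
```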
---
## 🧹 Cleanup
```bash
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/ingress.yaml # Optional
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/service.yaml
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/deployment.yaml
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pvc.yaml
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pv.yaml
```
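Deleting the PV object does not remove the model files; data under the `hostPath` stays on the node until you remove it yourself:

```bash
# On the node (destructive; only if the demo data is no longer needed)
sudo rm -rf /mnt/models/my_model
```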
---
## Further Reading / Next Steps
- [TensorFlow Serving](https://www.tensorflow.org/tfx/serving)
- [TF Serving REST API Reference](https://www.tensorflow.org/tfx/serving/api_rest)
- [Kubernetes Ingress Controller](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/)
- [Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)

ai/model-serving-tensorflow/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving
  labels:
    app: tf-serving
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-serving
  template:
    metadata:
      labels:
        app: tf-serving
    spec:
      containers:
        - name: tensorflow-serving
          image: tensorflow/serving:2.19.0
          args:
            - "--model_name=my_model"
            - "--port=8500"
            - "--rest_api_port=8501"
            - "--model_base_path=/models/my_model"
          ports:
            - containerPort: 8500 # gRPC
            - containerPort: 8501 # REST
          volumeMounts:
            - name: model-volume
              mountPath: /models/my_model
      volumes:
        - name: model-volume
          persistentVolumeClaim:
            claimName: my-model-pvc

ai/model-serving-tensorflow/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tf-serving-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
    - http:
        paths:
          - path: /tf(/|$)(.*)
            # Regex path; ingress-nginx recommends ImplementationSpecific for regex matching
            pathType: ImplementationSpecific
            backend:
              service:
                name: tf-serving
                port:
                  number: 8501

ai/model-serving-tensorflow/pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-model-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    # For local demos only; use a cloud storage backend in production
    path: /mnt/models/my_model

ai/model-serving-tensorflow/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-model-pvc
spec:
  accessModes:
    - ReadOnlyMany
  # Empty string opts out of the default StorageClass so the claim binds to the pre-created PV
  storageClassName: ""
  resources:
    requests:
      storage: 1Gi
  volumeName: my-model-pv

ai/model-serving-tensorflow/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: tf-serving
spec:
  selector:
    app: tf-serving
  ports:
    - name: grpc
      port: 8500
      targetPort: 8500
    - name: rest
      port: 8501
      targetPort: 8501
  type: ClusterIP