AI Example model serving tensorflow (#563)

* Create AI Example model serving tensorflow
* ai/model-serving-tensorflow service.yaml
* ai/model-serving-tensorflow ingress.yaml
* ai/model-serving-tensorflow pv.yaml
* ai/model-serving-tensorflow pvc.yaml
* Create Readme.md
* Rename Readme.md to README.md
* Update with structure format for README.md
* Correct link for serving in ai/model-serving-tensorflow/README.md

  Co-authored-by: Janet Kuo <chiachenk@google.com>
* Fix kubectl README.md
* Update README.md
* Update as per comments README.md
* Update tensorflow/serving:2.19.0 deployment.yaml
* remove hostname ai/model-serving-tensorflow/ingress.yaml

---------

Co-authored-by: Janet Kuo <chiachenk@google.com>
2025-06-03 20:48:38 -04:00 · 2025-06-03 20:48:38 -04:00 · 0598f0762a
parent 209452cc17
commit 0598f0762a
6 changed files with 221 additions and 0 deletions
--- a/ai/model-serving-tensorflow/README.md
+++ b/ai/model-serving-tensorflow/README.md
@@ -0,0 +1,132 @@
 # TensorFlow Model Serving on Kubernetes
 ## Purpose / What You'll Learn
 This example demonstrates how to deploy a TensorFlow model for inference using [TensorFlow Serving](https://www.tensorflow.org/serving) on Kubernetes. You’ll learn how to:
 - Set up TensorFlow Serving with a pre-trained model
 - Use a PersistentVolume to mount your model directory
 - Expose the inference endpoint using a Kubernetes `Service` and `Ingress`
 - Send a sample prediction request to the model
 ---
 ## 📚 Table of Contents
 - [Prerequisites](#prerequisites)
 - [Quick Start / TL;DR](#quick-start--tldr)
 - [Detailed Steps & Explanation](#detailed-steps--explanation)
 - [Verification / Seeing it Work](#verification--seeing-it-work)
 - [Configuration Customization](#configuration-customization)
 - [Cleanup](#cleanup)
 - [Further Reading / Next Steps](#further-reading--next-steps)
 ---
 ## ⚙️ Prerequisites
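 - Kubernetes cluster (tested with v1.29+)
 - `kubectl` configured to access the cluster
 - Optional: `ingress-nginx` for external access
 - x86-64 (amd64) nodes to run the `tensorflow/serving` image
 - A TensorFlow SavedModel to serve (see the expected folder layout below)
 - A node that supports `hostPath` volumes (for the demo), or cloud-backed storage for the PersistentVolume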
 ---
 ## ⚡ Quick Start / TL;DR
 ```bash
 # Apply the manifests; the SavedModel must already exist on the node under /mnt/models/my_model/<version>/
 kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pv.yaml
 kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pvc.yaml
 kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/deployment.yaml
 kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/service.yaml
 kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/ingress.yaml  # Optional
 ```
 ---
 ## Detailed Steps & Explanation
 ### 1. PersistentVolume & PVC Setup
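 > ⚠️ Note: For local testing, the PersistentVolume uses `hostPath` to expose `/mnt/models/my_model` from the node. Because `hostPath` is node-local, the model files must exist on the node where the pod runs. In production, replace it with a cloud-native storage backend (e.g., AWS EBS, GCP PD, or NFS), and check that the backend supports the `ReadOnlyMany` access mode requested by `pv.yaml` and `pvc.yaml` (AWS EBS, for example, does not).

 TensorFlow Serving loads each model version from a numbered subdirectory of `--model_base_path` (`/models/my_model` in the container, backed by `/mnt/models/my_model` on the node) and by default serves the highest version. Expected folder structure on the node: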
 ```
 /mnt/models/my_model/
 └── 1/
    ├── saved_model.pb
    └── variables/
 ```
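Before applying the manifests, you can check the directory on the node. The following is a minimal Bash sketch, assuming Bash is available on the node; `check_model_layout` is a hypothetical helper, not part of this example. It reports every numeric version directory that contains `saved_model.pb` and returns a non-zero status if there is none. Usage: `check_model_layout /mnt/models/my_model`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of this example: checks that a model base
# directory has at least one numeric version subdirectory containing
# saved_model.pb, which is the layout TensorFlow Serving scans for.
check_model_layout() {
  local base="$1" dir version status=1
  if [[ ! -d "$base" ]]; then
    echo "missing base directory: $base" >&2
    return 1
  fi
  for dir in "$base"/*/; do
    version="$(basename "$dir")"
    # Non-numeric subdirectory names are not treated as model versions.
    if [[ "$version" =~ ^[0-9]+$ && -f "${dir}saved_model.pb" ]]; then
      echo "found version $version"
      status=0
    fi
  done
  return "$status"
}
```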
 ---
 ### 2. Expose the Service
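 - The `tf-serving` Service (type `ClusterIP`) exposes gRPC on port 8500 and REST on port 8501 inside the cluster.
 - The optional `Ingress` routes paths starting with `/tf` to the REST port. Its `rewrite-target: /$2` annotation strips the `/tf` prefix, so an external request to `/tf/v1/models/my_model:predict` reaches TensorFlow Serving as `/v1/models/my_model:predict`.
 - The Ingress rule sets no `host`, so it matches requests for any hostname. To restrict it to your domain, add a `host` field to the rule in `ingress.yaml`.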
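
To see how the Ingress path regex and the `rewrite-target: /$2` annotation interact, here is a minimal Bash sketch that applies the same pattern to a request path and prints the path TensorFlow Serving would receive. The `rewrite_tf_path` function is hypothetical, and it only approximates the controller: it does not reproduce every detail of how ingress-nginx matches regex paths.

```bash
#!/usr/bin/env bash
# Sketch only: mimics the Ingress path /tf(/|$)(.*) combined with the
# annotation nginx.ingress.kubernetes.io/rewrite-target: /$2
rewrite_tf_path() {
  local path="$1"
  # Same pattern as ingress.yaml, anchored at the start of the path.
  local re='^/tf(/|$)(.*)'
  if [[ "$path" =~ $re ]]; then
    # Capture group 2 is everything after "/tf/"; rewrite-target keeps only it.
    echo "/${BASH_REMATCH[2]}"
  else
    echo "no match: $path" >&2
    return 1
  fi
}

rewrite_tf_path /tf/v1/models/my_model:predict   # prints /v1/models/my_model:predict
```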
 ---
 ## Verification / Seeing it Work
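 First, confirm that the Deployment is available and check the server logs:
 ```bash
 kubectl wait --for=condition=Available deployment/tf-serving --timeout=300s
 kubectl get pods -l app=tf-serving
 kubectl logs deployment/tf-serving
 ```
 Then send a prediction request. If you applied the Ingress:
 ```bash
 curl -X POST http://<ingress-host>/tf/v1/models/my_model:predict \
  -H "Content-Type: application/json" \
  -d '{ "instances": [[1.0, 2.0, 5.0]] }'
 ```
 Without an Ingress, run `kubectl port-forward service/tf-serving 8501:8501` in a separate terminal and send the same request to `http://localhost:8501/v1/models/my_model:predict` (without the `/tf` prefix).

 Each entry in `instances` is one input example, and its shape must match your model's serving signature. The response contains one prediction per instance; the values depend on your model:
 ```json
 {
  "predictions": [...]
 }
 ```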
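The requests above target the TensorFlow Serving REST predict endpoint. The following minimal Bash sketch builds the URL, optionally pinned to a model version via the `/versions/<version>` path segment, and a row-format body with one entry in `instances` per input example. `tf_predict_url` and `tf_instances_body` are hypothetical helper names, not part of this example or of TensorFlow Serving.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of this example, for building requests to the
# TensorFlow Serving REST predict endpoint:
#   POST <base>/v1/models/<model>[/versions/<version>]:predict

# Prints the predict URL; the version segment is added only when given.
tf_predict_url() {
  local base="${1%/}" model="$2" version="${3:-}"
  local url="${base}/v1/models/${model}"
  if [[ -n "$version" ]]; then
    url+="/versions/${version}"
  fi
  echo "${url}:predict"
}

# Prints a row-format JSON body, {"instances": [...]}, with one instance per
# argument; each argument is a comma-separated list of numbers.
tf_instances_body() {
  local body='{"instances": [' sep='' row
  for row in "$@"; do
    body+="${sep}[${row//,/, }]"
    sep=', '
  done
  echo "${body}]}"
}

# Example, with a port-forward to the Service already running:
# curl -X POST "$(tf_predict_url http://localhost:8501 my_model)" \
#   -H "Content-Type: application/json" -d "$(tf_instances_body 1.0,2.0,5.0)"
```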
 ---
 ## 🛠️ Configuration Customization
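 - Change `--model_name` and `--model_base_path` in `deployment.yaml` to serve a different model. Keep `--model_base_path` equal to the volume's `mountPath`, and use the new name in the request URL (`/v1/models/<model_name>:predict`).
 - For production, replace the `hostPath` PersistentVolume in `pv.yaml` with cloud-backed storage. The Deployment already mounts the model through the `my-model-pvc` claim, so it does not need to change; if you switch to dynamic provisioning, also remove `volumeName` from `pvc.yaml`.
 - Add resource requests and limits to the `tensorflow-serving` container; `deployment.yaml` currently sets none.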
 ---
 ## 🧹 Cleanup
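 Deleting the PersistentVolume does not remove the model files under `/mnt/models/my_model` on the node, because `pv.yaml` sets the `Retain` reclaim policy.
 ```bash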
 kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/ingress.yaml  # Optional
 kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/service.yaml
 kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/deployment.yaml
 kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pvc.yaml
 kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pv.yaml
 ```
 ---
 ## Further Reading / Next Steps
 - [TensorFlow Serving](https://www.tensorflow.org/tfx/serving)
 - [TF Serving REST API Reference](https://www.tensorflow.org/tfx/serving/api_rest)
 - [Kubernetes Ingress Controller](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/)
 - [Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)
--- a/ai/model-serving-tensorflow/deployment.yaml
+++ b/ai/model-serving-tensorflow/deployment.yaml
@@ -0,0 +1,34 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: tf-serving
  labels:
    app: tf-serving
 spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-serving
  template:
    metadata:
      labels:
        app: tf-serving
    spec:
      containers:
        - name: tensorflow-serving
          image: tensorflow/serving:2.19.0
          args:
            - "--model_name=my_model"
            - "--port=8500"
            - "--rest_api_port=8501"
            - "--model_base_path=/models/my_model"
          ports:
            - containerPort: 8500  # gRPC
            - containerPort: 8501  # REST
          volumeMounts:
            - name: model-volume
              mountPath: /models/my_model
      volumes:
        - name: model-volume
          persistentVolumeClaim:
            claimName: my-model-pvc
--- a/ai/model-serving-tensorflow/ingress.yaml
+++ b/ai/model-serving-tensorflow/ingress.yaml
@@ -0,0 +1,17 @@
 apiVersion: networking.k8s.io/v1
 kind: Ingress
 metadata:
  name: tf-serving-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
 spec:
  ingressClassName: nginx  # assumes the ingress-nginx controller's default class name
  rules:
    - http:
        paths:
          - path: /tf(/|$)(.*)
            pathType: ImplementationSpecific  # regex paths are not literal Prefix paths
            backend:
              service:
                name: tf-serving
                port:
                  number: 8501
--- a/ai/model-serving-tensorflow/pv.yaml
+++ b/ai/model-serving-tensorflow/pv.yaml
@@ -0,0 +1,12 @@
 apiVersion: v1
 kind: PersistentVolume
 metadata:
  name: my-model-pv
 spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/models/my_model
--- a/ai/model-serving-tensorflow/pvc.yaml
+++ b/ai/model-serving-tensorflow/pvc.yaml
@@ -0,0 +1,11 @@
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: my-model-pvc
 spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ""  # skip the default StorageClass so the claim binds to the static PV
  volumeName: my-model-pv
--- a/ai/model-serving-tensorflow/service.yaml
+++ b/ai/model-serving-tensorflow/service.yaml
@@ -0,0 +1,15 @@
 apiVersion: v1
 kind: Service
 metadata:
  name: tf-serving
 spec:
  selector:
    app: tf-serving
  ports:
    - name: grpc
      port: 8500
      targetPort: 8500
    - name: rest
      port: 8501
      targetPort: 8501
  type: ClusterIP