+++
title = "TensorFlow Serving"
description = "Serving TensorFlow models"
weight = 51
+++

## Serving a model

We treat each deployed model as two [components](https://ksonnet.io/docs/tutorial#2-generate-and-deploy-an-app-component)
in your ksonnet app: one tf-serving-deployment, and one tf-serving-service.
You can think of the service as the model, and the deployment as a version of the model.

Generate the service (model) component:
```
ks generate tf-serving-service mnist-service
ks param set mnist-service modelName mnist            # match the model name of your deployment
ks param set mnist-service trafficRule v1:100         # optional; v1:100 is the default value
ks param set mnist-service serviceType LoadBalancer   # optional; LoadBalancer exposes an external IP
```

Generate the deployment (version) component:
```
MODEL_COMPONENT=mnist-v1
ks generate tf-serving-deployment-gcp ${MODEL_COMPONENT}
ks param set ${MODEL_COMPONENT} modelName mnist
ks param set ${MODEL_COMPONENT} versionName v1        # optional; v1 is the default value
ks param set ${MODEL_COMPONENT} modelBasePath gs://kubeflow-examples-data/mnist
ks param set ${MODEL_COMPONENT} gcpCredentialSecretName user-gcp-sa
ks param set ${MODEL_COMPONENT} injectIstio true      # optional; set this if you want to use Istio
```
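
TF Serving loads models from numbered version sub-directories under `modelBasePath` (for example `<modelBasePath>/1/saved_model.pb`), and serves the highest version it finds. As a minimal sketch of that convention, here is how you could check a local copy of a model directory before uploading it; the helper name and demo layout are illustrative, not part of the Kubeflow components:

```python
import os
import tempfile

def find_servable_versions(model_base_path):
    """Return the numeric version sub-directories that contain a
    saved_model.pb, newest first (TF Serving serves the highest version)."""
    versions = []
    for entry in os.listdir(model_base_path):
        path = os.path.join(model_base_path, entry)
        if entry.isdigit() and os.path.isfile(os.path.join(path, "saved_model.pb")):
            versions.append(int(entry))
    return sorted(versions, reverse=True)

# Demo against a throwaway layout mimicking <modelBasePath>/1/saved_model.pb
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "1"))
open(os.path.join(base, "1", "saved_model.pb"), "w").close()
os.makedirs(os.path.join(base, "notes"))  # non-numeric entries are ignored
print(find_servable_versions(base))  # -> [1]
```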

We enable TF Serving's REST API, so the deployment can serve HTTP requests. The API is the same as that of the HTTP proxy we used before.

### Pointing to the model

Depending on where your model file is located, set the correct parameters.

*Google Cloud*

Set the parameters as in the section above.

We need a service account that can access the model.
If you are using Kubeflow's click-to-deploy app, there should already be a secret, `user-gcp-sa`, in the cluster.

The model at `gs://kubeflow-examples-data/mnist` is publicly accessible. However, if your environment doesn't
have Google Cloud credentials set up, TF Serving will not be able to read the model.
See this [issue](https://github.com/kubeflow/kubeflow/issues/621) for an example.
To set up the Google Cloud credentials, either point the environment variable
`GOOGLE_APPLICATION_CREDENTIALS` to the credential file, or run `gcloud auth login`.
See the [authentication doc](https://cloud.google.com/docs/authentication/) for more detail.

*S3*

To use S3, generate a different prototype:
```
ks generate tf-serving-deployment-aws ${MODEL_COMPONENT} --name=${MODEL_NAME}
```

First you need to create a secret that will contain the access credentials. Use base64 to encode your credentials, and see the Kubernetes guide to [creating a secret manually](https://kubernetes.io/docs/concepts/configuration/secret/#creating-a-secret-manually) for details:
```
apiVersion: v1
kind: Secret
metadata:
  name: secretname
data:
  AWS_ACCESS_KEY_ID: bmljZSB0cnk6KQ==
  AWS_SECRET_ACCESS_KEY: YnV0IHlvdSBkaWRuJ3QgZ2V0IG15IHNlY3JldCE=
```
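
To produce the base64 values for the `data` fields, encode each credential without a trailing newline. The access key below is a hypothetical placeholder, not a real credential:

```shell
# Encode a credential for a Secret's data field.
# AKIAEXAMPLEKEY is a hypothetical placeholder value.
ACCESS_KEY='AKIAEXAMPLEKEY'
printf '%s' "$ACCESS_KEY" | base64
```

Alternatively, `kubectl create secret generic secretname --from-literal=AWS_ACCESS_KEY_ID=...` handles the encoding for you.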

Enable S3, set the model URL, and point to the correct secret:
```
MODEL_PATH=s3://kubeflow-models/inception
ks param set ${MODEL_COMPONENT} modelBasePath ${MODEL_PATH}
ks param set ${MODEL_COMPONENT} s3Enable true
ks param set ${MODEL_COMPONENT} s3SecretName secretname
```

Optionally, you can also override the default S3 parameters:
```
# S3 region
ks param set ${MODEL_COMPONENT} s3AwsRegion us-west-1

# Whether or not to use https for S3 connections
ks param set ${MODEL_COMPONENT} s3UseHttps true

# Whether or not to verify https certificates for S3 connections
ks param set ${MODEL_COMPONENT} s3VerifySsl true

# URL of your S3-compatible endpoint
ks param set ${MODEL_COMPONENT} s3Endpoint s3.us-west-1.amazonaws.com
```

### Using GPU

To serve a model with a GPU, first make sure your Kubernetes cluster has a GPU node. Then set an additional parameter:

```
ks param set ${MODEL_COMPONENT} numGpus 1
```

There is an [example](https://github.com/kubeflow/examples/blob/master/object_detection/tf_serving_gpu.md)
of serving an object detection model with a GPU.

### Deploying

```
export KF_ENV=default
ks apply ${KF_ENV} -c mnist-service
ks apply ${KF_ENV} -c ${MODEL_COMPONENT}
```

The `KF_ENV` environment variable represents a conceptual deployment environment
such as development, test, staging, or production, as defined by
ksonnet. For this example, we use the `default` environment.
You can read more about Kubeflow's use of ksonnet in the Kubeflow
[ksonnet component guide](/docs/components/ksonnet/).

### Sending prediction request directly

If the service type is LoadBalancer, it has its own accessible external IP.
Get the external IP with:

```
kubectl get svc mnist-service
```

Then send the request:

```
curl -X POST -d @input.json http://EXTERNAL_IP:8500/v1/models/mnist:predict
```
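
The request body follows TF Serving's REST predict format: a JSON object with an `instances` list, one entry per input example. As a sketch, here is one way to write an `input.json` for a 28x28 MNIST image; the all-zero pixels are dummy values, and the exact instance shape depends on your model's signature:

```python
import json

# Build a TF Serving REST "predict" request body.
# One instance = one 28x28 MNIST image; zeros are dummy pixel values.
image = [[0.0] * 28 for _ in range(28)]
payload = {"instances": [image]}

with open("input.json", "w") as f:
    json.dump(payload, f)

print(len(payload["instances"]))  # -> 1 (one example in this request)
```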

### Sending prediction request through ingress and IAP

If the service type is ClusterIP, you can access it through the ingress.
The endpoint is protected, and only callers with the right credentials can access it.
The steps below show how to programmatically authenticate a service account to access IAP.

1. Save the client ID that you used to
   [deploy Kubeflow](/docs/gke/deploy/) as `IAP_CLIENT_ID`.
2. Create a service account:

    ```
    gcloud iam service-accounts create --project=$PROJECT $SERVICE_ACCOUNT
    ```

3. Grant the service account access to IAP-enabled resources:

    ```
    gcloud projects add-iam-policy-binding $PROJECT \
      --role roles/iap.httpsResourceAccessor \
      --member serviceAccount:$SERVICE_ACCOUNT
    ```

4. Download the service account key:

    ```
    gcloud iam service-accounts keys create ${KEY_FILE} \
      --iam-account ${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
    ```

5. Export the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to point to the key file of the service account.

Finally, you can send the request with this Python
[script](https://github.com/kubeflow/kubeflow/blob/master/docs/gke/iap_request.py):

```
python iap_request.py https://YOUR_HOST/tfserving/models/mnist IAP_CLIENT_ID --input=YOUR_INPUT_FILE
```
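
Under the hood, such a script exchanges the service account key for an OpenID Connect token and sends it in the `Authorization: Bearer` header of each request. A minimal sketch of the request construction is below; `dummy-token` is a hypothetical placeholder, and obtaining a real token for `IAP_CLIENT_ID` requires an auth library such as `google-auth`:

```python
import json
import urllib.request

def build_iap_request(url, oidc_token, instances):
    """Build a predict request that IAP will accept: the OIDC token
    goes into the Authorization header as a Bearer token."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": "Bearer " + oidc_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )

# "dummy-token" stands in for a real OIDC token obtained for IAP_CLIENT_ID.
req = build_iap_request(
    "https://YOUR_HOST/tfserving/models/mnist:predict", "dummy-token", [[0.0] * 784]
)
print(req.get_header("Authorization"))  # -> Bearer dummy-token
```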

## Telemetry and rolling out a model using Istio

Please look at the [Istio guide](/docs/components/istio/).

## Logs and metrics with Stackdriver

See the guide to [logging and monitoring](/docs/gke/monitoring/)
for instructions on getting logs and metrics using Stackdriver.