# KServe Component

Organization: KServe

Organization Description: KServe is a highly scalable and standards-based Model Inference Platform on Kubernetes for Trusted AI.

Version information: KServe 0.12.0. Works for Kubeflow 1.9.

**Note:** To use the KServe 0.7.0 version of this component, which runs on Kubeflow 1.5, replace the URL passed to `load_component_from_url` in the usage section with the following component YAML instead:

```
https://raw.githubusercontent.com/kubeflow/pipelines/1.8.1/components/kserve/component.yaml
```

Test status: Currently manual tests

Owners information:

- Tommy Li (Tomcli) - IBM, tommy.chaoping.li@ibm.com
- Yi-Hong Wang (yhwang) - IBM, yh.wang@ibm.com

## Usage

Load the component with:

```python
import kfp.dsl as dsl
import kfp
from kfp import components

kserve_op = components.load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kserve/component.yaml')
```

### Arguments

| Argument | Default | Description |
|----------|---------|-------------|
| action | `create` | Action to execute on KServe. Available options are `create`, `update`, `apply`, and `delete`. Note: `apply` is equivalent to `update` if the resource exists and `create` if not. |
| model_name | | Name to give to the deployed model/InferenceService. |
| model_uri | | Path of the S3- or GCS-compatible directory containing the model. |
| canary_traffic_percent | `100` | Traffic split percentage between the candidate model and the last ready model. |
| namespace | | Kubernetes namespace where the KServe service is deployed. If no namespace is provided, `anonymous` will be used unless a namespace is provided in the `inferenceservice_yaml` argument. |
| framework | | Machine learning framework for model serving. Currently the supported frameworks are `tensorflow`, `pytorch`, `sklearn`, `xgboost`, `onnx`, `triton`, `pmml`, and `lightgbm`. |
| runtime_version | `latest` | Runtime version of the machine learning framework. |
| resource_requests | `{"cpu": "0.5", "memory": "512Mi"}` | CPU and memory requests for model serving. |
| resource_limits | `{"cpu": "1", "memory": "1Gi"}` | CPU and memory limits for model serving. |
| custom_model_spec | `{}` | Custom model runtime container spec in JSON. Sample spec: `{"image": "codait/max-object-detector", "port":5000, "name": "test-container"}` |
| inferenceservice_yaml | `{}` | Raw InferenceService serialized YAML for deployment. Use this if you need additional configuration for your InferenceService. |
| autoscaling_target | `0` | Autoscaling target number. If not 0, sets the `autoscaling.knative.dev/target` annotation on the InferenceService. |
| service_account | | ServiceAccount to use to run the InferenceService pod. |
| enable_istio_sidecar | `True` | Whether to enable Istio sidecar injection. |
| watch_timeout | `300` | Timeout in seconds for watching until the InferenceService becomes ready. |
| min_replicas | `-1` | Minimum number of InferenceService replicas. The default of -1 falls back to the default of 1. |
| max_replicas | `-1` | Maximum number of InferenceService replicas. |
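For reference, the resource and scaling arguments from the table can be passed alongside a framework deployment. The following is a minimal sketch, not taken from the upstream samples: the JSON-string form mirrors the defaults column and the `custom_model_spec` example below, and the specific resource values and replica counts are illustrative assumptions.

```python
# Illustrative sketch: resource and replica values are assumptions,
# shown only to demonstrate how these arguments are passed.
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers',
    framework='tensorflow',
    resource_requests='{"cpu": "1", "memory": "1Gi"}',  # JSON string, as in the defaults column
    resource_limits='{"cpu": "2", "memory": "2Gi"}',
    min_replicas='1',
    max_replicas='3',
)
```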
### Basic InferenceService Creation

The following will use the KServe component to deploy a TensorFlow model.

```python
@dsl.pipeline(
    name='KServe Pipeline',
    description='A pipeline for KServe.'
)
def kserve_pipeline():
    kserve_op(
        action='apply',
        model_name='tf-sample',
        model_uri='gs://kfserving-examples/models/tensorflow/flowers',
        framework='tensorflow',
    )

kfp.Client().create_run_from_pipeline_func(kserve_pipeline, arguments={})
```

Sample op for deploying a PyTorch model:

```python
kserve_op(
    action='apply',
    model_name='pytorch-test',
    model_uri='gs://kfserving-examples/models/torchserve/image_classifier',
    framework='pytorch'
)
```

### Canary Rollout

First, make sure an initial model is deployed with 100 percent of the traffic, for example:

```python
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers',
    framework='tensorflow',
)
```

Then deploy the candidate model, which will receive only a portion of the traffic:

```python
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers-2',
    framework='tensorflow',
    canary_traffic_percent='10'
)
```

To promote the candidate model, either set `canary_traffic_percent` to `100` or simply remove it, then re-run the pipeline:

```python
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers-2',
    framework='tensorflow'
)
```

If you instead want to roll back the candidate model, set `canary_traffic_percent` to `0`, then re-run the pipeline:

```python
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers-2',
    framework='tensorflow',
    canary_traffic_percent='0'
)
```

### Deletion

To delete a model, set the `action` to `'delete'` and pass in the InferenceService name:

```python
kserve_op(
    action='delete',
    model_name='tf-sample'
)
```

### Custom Runtime

To pass in a custom model serving runtime, use the `custom_model_spec` argument. Currently, the expected format for `custom_model_spec` is:

```json
{
    "image": "some_image",
    "port": "port_number",
    "name": "custom-container",
    "env": [{"name": "some_name", "value": "some_value"}],
    "resources": {"requests": {}, "limits": {}}
}
```

Sample deployment:

```python
container_spec = '{"image": "codait/max-object-detector", "port":5000, "name": "custom-container"}'

kserve_op(
    action='apply',
    model_name='custom-simple',
    custom_model_spec=container_spec
)
```

### Deploy using InferenceService YAML

If you need more fine-grained configuration, you can deploy using a raw InferenceService YAML:

```python
isvc_yaml = '''
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
  namespace: "anonymous"
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/iris"
'''

kserve_op(
    action='apply',
    inferenceservice_yaml=isvc_yaml
)
```
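As with the basic example above, any of these ops can be wrapped in a pipeline function and submitted with the KFP client. The following is a minimal sketch for the YAML-based deployment; the pipeline name, description, and function name are illustrative and not part of the component itself.

```python
# Sketch: wraps the YAML-based deployment shown above in a pipeline and submits it.
@dsl.pipeline(
    name='KServe InferenceService YAML Pipeline',  # illustrative name
    description='Deploys an InferenceService from raw YAML.'
)
def kserve_yaml_pipeline():
    kserve_op(
        action='apply',
        inferenceservice_yaml=isvc_yaml  # the YAML string defined above
    )

kfp.Client().create_run_from_pipeline_func(kserve_yaml_pipeline, arguments={})
```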