KServe Component
Organization: KServe
Organization Description: KServe is a highly scalable and standards-based Model Inference Platform on Kubernetes for Trusted AI
Version information: KServe 0.12.0. Works with Kubeflow 1.9
Note: To use the KServe 0.7.0 version of this component, which runs on Kubeflow 1.5, point load_component_from_url in the usage section at the following YAML instead:
https://raw.githubusercontent.com/kubeflow/pipelines/1.8.1/components/kserve/component.yaml
Test status: Currently manual tests
Owners information:
- Tommy Li (Tomcli) - IBM, tommy.chaoping.li@ibm.com
- Yi-Hong Wang (yhwang) - IBM, yh.wang@ibm.com
Usage
Load the component with:
import kfp.dsl as dsl
import kfp
from kfp import components
kserve_op = components.load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kserve/component.yaml')
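If you have the kubeflow/pipelines repository checked out, the same component can also be loaded from a local file. This is only a sketch; the relative path below is an assumption about where your checkout lives:
# Load the component from a local checkout instead of the raw GitHub URL.
# The path is an assumption; adjust it to your checkout location.
kserve_op = components.load_component_from_file('pipelines/components/kserve/component.yaml')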
Arguments
Argument | Default | Description |
---|---|---|
action | `create` | Action to execute on KServe. Available options are `create`, `update`, `apply`, and `delete`. Note: `apply` is equivalent to `update` if the resource exists and `create` if not. |
model_name | | Name to give to the deployed model/InferenceService. |
model_uri | | Path of the S3- or GCS-compatible directory containing the model. |
canary_traffic_percent | `100` | Traffic split percentage between the candidate model and the last ready model. |
namespace | | Kubernetes namespace where the KServe service is deployed. If no namespace is provided, `anonymous` is used unless a namespace is set in the inferenceservice_yaml argument. |
framework | | Machine learning framework for model serving. Currently the supported frameworks are `tensorflow`, `pytorch`, `sklearn`, `xgboost`, `onnx`, `triton`, `pmml`, and `lightgbm`. |
runtime_version | `latest` | Runtime version of the machine learning framework. |
resource_requests | `{"cpu": "0.5", "memory": "512Mi"}` | CPU and memory requests for model serving. |
resource_limits | `{"cpu": "1", "memory": "1Gi"}` | CPU and memory limits for model serving. |
custom_model_spec | `{}` | Custom model runtime container spec in JSON. Sample spec: `{"image": "codait/max-object-detector", "port":5000, "name": "test-container"}` |
inferenceservice_yaml | `{}` | Raw InferenceService serialized YAML for deployment. Use this if you need additional configuration for your InferenceService. |
autoscaling_target | `0` | Autoscaling target number. If not 0, sets the `autoscaling.knative.dev/target` annotation on the InferenceService. |
service_account | | ServiceAccount to use to run the InferenceService pod. |
enable_istio_sidecar | `True` | Whether to enable Istio sidecar injection. |
watch_timeout | `300` | Timeout in seconds for watching until the InferenceService becomes ready. |
min_replicas | `-1` | Minimum number of InferenceService replicas. The default of -1 delegates to the pod default of 1. |
max_replicas | `-1` | Maximum number of InferenceService replicas. |
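For reference, here is a sketch (not from the upstream samples) that combines several of these arguments in one call. The namespace and resource values are illustrative assumptions; dictionary-valued arguments are passed as JSON strings matching the defaults shown above, and numeric arguments are passed as strings like the other samples in this document.
# Illustrative only: combines optional arguments from the table above.
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers',
    framework='tensorflow',
    namespace='kubeflow-user-example-com',              # assumed namespace
    resource_requests='{"cpu": "1", "memory": "1Gi"}',  # JSON string, as in the defaults
    resource_limits='{"cpu": "2", "memory": "2Gi"}',
    min_replicas='1',
    max_replicas='3',
    autoscaling_target='10',                            # sets autoscaling.knative.dev/target
)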
Basic InferenceService Creation
The following will use the KServe component to deploy a TensorFlow model.
@dsl.pipeline(
    name='KServe Pipeline',
    description='A pipeline for KServe.'
)
def kserve_pipeline():
    kserve_op(
        action='apply',
        model_name='tf-sample',
        model_uri='gs://kfserving-examples/models/tensorflow/flowers',
        framework='tensorflow',
    )

kfp.Client().create_run_from_pipeline_func(kserve_pipeline, arguments={})
Sample op for deploying a PyTorch model:
kserve_op(
    action='apply',
    model_name='pytorch-test',
    model_uri='gs://kfserving-examples/models/torchserve/image_classifier',
    framework='pytorch'
)
Canary Rollout
First make sure an initial model is deployed with 100 percent of the traffic, for example:
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers',
    framework='tensorflow',
)
Then deploy the candidate model, which will receive only a portion of the traffic:
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers-2',
    framework='tensorflow',
    canary_traffic_percent='10'
)
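To verify the resulting traffic split, you can inspect the InferenceService directly with kubectl. The namespace here is an assumption; use the one your service was deployed to:
kubectl get inferenceservice tf-sample -n anonymous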
To promote the candidate model, you can either set canary_traffic_percent to 100 or simply remove it, then re-run the pipeline:
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers-2',
    framework='tensorflow'
)
If you instead want to roll back the candidate model, set canary_traffic_percent to 0, then re-run the pipeline:
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers-2',
    framework='tensorflow',
    canary_traffic_percent='0'
)
Deletion
To delete a model, simply set the action to 'delete' and pass in the InferenceService name:
kserve_op(
    action='delete',
    model_name='tf-sample'
)
Custom Runtime
To pass in a custom model serving runtime, use the custom_model_spec argument. Currently, the expected format for custom_model_spec is as follows:
{
    "image": "some_image",
    "port": "port_number",
    "name": "custom-container",
    "env": [{"name": "some_name", "value": "some_value"}],
    "resources": {"requests": {}, "limits": {}}
}
Sample deployment:
container_spec = '{ "image": "codait/max-object-detector", "port":5000, "name": "custom-container"}'
kserve_op(
    action='apply',
    model_name='custom-simple',
    custom_model_spec=container_spec
)
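Rather than hand-writing the JSON string, you can build the spec as a Python dict and serialize it with json.dumps. This is only a sketch; the env entry and resource values below are illustrative assumptions:
import json

# Illustrative custom container spec; the env var and resources are assumptions.
container_spec = json.dumps({
    "image": "codait/max-object-detector",
    "port": 5000,
    "name": "custom-container",
    "env": [{"name": "LOG_LEVEL", "value": "info"}],
    "resources": {"requests": {"cpu": "0.5"}, "limits": {"cpu": "1"}},
})

kserve_op(
    action='apply',
    model_name='custom-simple',
    custom_model_spec=container_spec
)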
Deploy using InferenceService YAML
If you need more fine-grained configuration, you can deploy using a raw InferenceService YAML instead:
isvc_yaml = '''
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
  namespace: "anonymous"
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/iris"
'''
kserve_op(
    action='apply',
    inferenceservice_yaml=isvc_yaml
)
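As a sketch of the kind of extra configuration this makes possible, you could add replica bounds and container resources directly in the YAML. The field names below follow the KServe v1beta1 API, and the replica and resource values are illustrative assumptions:
# Illustrative only: same InferenceService with assumed replica and resource settings.
isvc_yaml = '''
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
  namespace: "anonymous"
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 3
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/iris"
      resources:
        requests:
          cpu: "0.5"
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "1Gi"
'''

kserve_op(
    action='apply',
    inferenceservice_yaml=isvc_yaml
)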