# KServe Component
Organization: KServe
Organization Description: KServe is a highly scalable and standards-based Model Inference Platform on Kubernetes for Trusted AI
Version information: KServe 0.12.0. Works for Kubeflow 1.9
Note: To use the KServe 0.7.0 version of this component, which runs on Kubeflow 1.5, change the `load_component_from_url` call in the usage section to the following YAML instead (a sketch of this is shown after the usage example below):
https://raw.githubusercontent.com/kubeflow/pipelines/1.8.1/components/kserve/component.yaml
Test status: Currently manual tests
Owners information:
- Tommy Li (Tomcli) - IBM, tommy.chaoping.li@ibm.com
- Yi-Hong Wang (yhwang) - IBM, yh.wang@ibm.com
## Usage

Load the component with:

```python
import kfp.dsl as dsl
import kfp
from kfp import components

kserve_op = components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kserve/component.yaml'
)
```
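If you are on Kubeflow 1.5 and need the KServe 0.7.0 version of the component (see the note above), the only change is the URL passed to `load_component_from_url`; a minimal sketch:

```python
from kfp import components

# Load the KServe 0.7.0 component pinned to the Kubeflow Pipelines 1.8.1 release
# (see the version note at the top of this README).
kserve_op = components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/1.8.1/components/kserve/component.yaml'
)
```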
## Arguments

Argument | Default | Description
---|---|---
action | `create` | Action to execute on KServe. Available options are `create`, `update`, `apply`, and `delete`. Note: `apply` is equivalent to `update` if the resource exists and `create` if not.
model_name | | Name to give to the deployed model/InferenceService.
model_uri | | Path of the S3- or GCS-compatible directory containing the model.
canary_traffic_percent | `100` | The traffic split percentage between the candidate model and the last ready model.
namespace | | Kubernetes namespace where the KServe service is deployed. If no namespace is provided, `anonymous` will be used unless a namespace is provided in the `inferenceservice_yaml` argument.
framework | | Machine learning framework for model serving. Currently the supported frameworks are `tensorflow`, `pytorch`, `sklearn`, `xgboost`, `onnx`, `triton`, `pmml`, and `lightgbm`.
runtime_version | `latest` | Runtime version of the machine learning framework.
resource_requests | `{"cpu": "0.5", "memory": "512Mi"}` | CPU and memory requests for model serving.
resource_limits | `{"cpu": "1", "memory": "1Gi"}` | CPU and memory limits for model serving.
custom_model_spec | `{}` | Custom model runtime container spec in JSON. Sample spec: `{"image": "codait/max-object-detector", "port": 5000, "name": "test-container"}`.
inferenceservice_yaml | `{}` | Raw InferenceService serialized YAML for deployment. Use this if you need additional configuration for your InferenceService.
autoscaling_target | `0` | Autoscaling target number. If not 0, sets the `autoscaling.knative.dev/target` annotation on the InferenceService.
service_account | | ServiceAccount to use to run the InferenceService pod.
enable_istio_sidecar | `True` | Whether to enable Istio sidecar injection.
watch_timeout | `300` | Timeout in seconds for watching until the InferenceService becomes ready.
min_replicas | `-1` | Minimum number of InferenceService replicas. The default of -1 delegates to the pod default of 1.
max_replicas | `-1` | Maximum number of InferenceService replicas.
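For illustration, here is a sketch of an op call that exercises several of the optional arguments above. The model name and URI come from the TensorFlow example below; the resource and timeout values are just the documented defaults written out explicitly, the replica counts are illustrative, and everything is passed as strings to mirror the `canary_traffic_percent='10'` style used later in this README (adjust the types to match the component signature you load):

```python
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers',
    framework='tensorflow',
    namespace='anonymous',                                   # explicit target namespace
    resource_requests='{"cpu": "0.5", "memory": "512Mi"}',   # documented default
    resource_limits='{"cpu": "1", "memory": "1Gi"}',         # documented default
    min_replicas='1',                                        # illustrative value
    max_replicas='2',                                        # illustrative value
    watch_timeout='300',                                     # documented default
)
```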
## Basic InferenceService Creation
The following will use the KServe component to deploy a TensorFlow model.
```python
@dsl.pipeline(
    name='KServe Pipeline',
    description='A pipeline for KServe.'
)
def kserve_pipeline():
    kserve_op(
        action='apply',
        model_name='tf-sample',
        model_uri='gs://kfserving-examples/models/tensorflow/flowers',
        framework='tensorflow',
    )

kfp.Client().create_run_from_pipeline_func(kserve_pipeline, arguments={})
```
Sample op for deploying a PyTorch model:
```python
kserve_op(
    action='apply',
    model_name='pytorch-test',
    model_uri='gs://kfserving-examples/models/torchserve/image_classifier',
    framework='pytorch'
)
```
## Canary Rollout
Ensure you have an initial model deployed with 100 percent traffic, for example:
```python
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers',
    framework='tensorflow',
)
```
Deploy the candidate model, which will only receive a portion of the traffic:
```python
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers-2',
    framework='tensorflow',
    canary_traffic_percent='10'
)
```
To promote the candidate model, you can either set `canary_traffic_percent` to `100` or simply remove it, then re-run the pipeline:
```python
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers-2',
    framework='tensorflow'
)
```
If you instead want to roll back the candidate model, set `canary_traffic_percent` to `0`, then re-run the pipeline:
```python
kserve_op(
    action='apply',
    model_name='tf-sample',
    model_uri='gs://kfserving-examples/models/tensorflow/flowers-2',
    framework='tensorflow',
    canary_traffic_percent='0'
)
```
## Deletion
To delete a model, simply set the `action` to `'delete'` and pass in the InferenceService name:
```python
kserve_op(
    action='delete',
    model_name='tf-sample'
)
```
## Custom Runtime
To pass in a custom model serving runtime, you can use the `custom_model_spec` argument. Currently, the expected format for `custom_model_spec` is:
```json
{
    "image": "some_image",
    "port": "port_number",
    "name": "custom-container",
    "env": [{"name": "some_name", "value": "some_value"}],
    "resources": {"requests": {}, "limits": {}}
}
```
Sample deployment:
```python
container_spec = '{"image": "codait/max-object-detector", "port": 5000, "name": "custom-container"}'

kserve_op(
    action='apply',
    model_name='custom-simple',
    custom_model_spec=container_spec
)
```
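The format above also accepts the optional `env` and `resources` fields. Here is a sketch that exercises them; the environment variable, resource values, and model name are illustrative placeholders rather than settings required by the sample image:

```python
# Illustrative custom_model_spec using the optional "env" and "resources" fields
# from the format above; the variable name, values, and model name are placeholders.
container_spec = '''{
    "image": "codait/max-object-detector",
    "port": 5000,
    "name": "custom-container",
    "env": [{"name": "SOME_SETTING", "value": "some_value"}],
    "resources": {"requests": {"cpu": "0.5", "memory": "512Mi"},
                  "limits": {"cpu": "1", "memory": "1Gi"}}
}'''

kserve_op(
    action='apply',
    model_name='custom-with-env',
    custom_model_spec=container_spec
)
```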
## Deploy using InferenceService YAML
If you need more fine-grained configuration, there is the option to deploy using an InferenceService YAML file:
```python
isvc_yaml = '''
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
  namespace: "anonymous"
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/iris"
'''

kserve_op(
    action='apply',
    inferenceservice_yaml=isvc_yaml
)
```