pipelines/components/contrib/sample/keras/train_classifier
IronPan 7d6133b564
fix(components): move components to contrib/ folder (#6438)
2021-08-27 14:35:22 -07:00
..
src fix(components): move components to contrib/ folder (#6438) 2021-08-27 14:35:22 -07:00
tests fix(components): move components to contrib/ folder (#6438) 2021-08-27 14:35:22 -07:00
Dockerfile fix(components): move components to contrib/ folder (#6438) 2021-08-27 14:35:22 -07:00
README.md fix(components): move components to contrib/ folder (#6438) 2021-08-27 14:35:22 -07:00
build_image.sh fix(components): move components to contrib/ folder (#6438) 2021-08-27 14:35:22 -07:00
component.yaml fix(components): move components to contrib/ folder (#6438) 2021-08-27 14:35:22 -07:00
requirements.txt fix(components): move components to contrib/ folder (#6438) 2021-08-27 14:35:22 -07:00
run_tests.sh fix(components): move components to contrib/ folder (#6438) 2021-08-27 14:35:22 -07:00

README.md

Keras - Train classifier

Trains classifier using Keras sequential model

Inputs

Name Type Default Description
training_set_features_path GcsPath: {data_type: TSV} Local or GCS path to the training set features table.
training_set_labels_path GcsPath: {data_type: TSV} Local or GCS path to the training set labels (each label is a class index from 0 to num-classes - 1).
output_model_uri GcsPath: {data_type: Keras model} Local or GCS path specifying where to save the trained model. The model (topology + weights + optimizer state) is saved in HDF5 format and can be loaded back by calling keras.models.load_model
model_config GcsPath: {data_type: Keras model config json} JSON string containing the serialized model structure. Can be obtained by calling model.to_json() on a Keras model.
number_of_classes Integer Number of classifier classes.
number_of_epochs Integer 100 Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided.
batch_size Integer 32 Number of samples per gradient update

Outputs

Name Type Default Description
output_model_uri GcsPath: {data_type: Keras model} GCS path where the trained model has been saved. The model (topology + weights + optimizer state) is saved in HDF5 format and can be loaded back by calling keras.models.load_model

Container image

gcr.io/ml-pipeline/components/sample/keras/train_classifier

Usage:

import os
from pathlib import Path
import requests

import kfp

component_url_prefix = 'https://raw.githubusercontent.com/kubeflow/pipelines/master/components/sample/keras/train_classifier/'
test_data_url_prefix = component_url_prefix + 'tests/testdata/'

#Prepare input/output paths and data
input_data_gcs_dir = 'gs://<my bucket>/<path>/'
output_data_gcs_dir = 'gs://<my bucket>/<path>/'

#Downloading the training set (to upload to GCS later)
training_set_features_local_path = os.path.join('.', 'training_set_features.tsv')
training_set_labels_local_path = os.path.join('.', 'training_set_labels.tsv')

training_set_features_url = test_data_url_prefix + '/training_set_features.tsv'
training_set_labels_url = test_data_url_prefix + '/training_set_labels.tsv'

Path(training_set_features_local_path).write_bytes(requests.get(training_set_features_url).content)
Path(training_set_labels_local_path).write_bytes(requests.get(training_set_labels_url).content)

#Uploading the data to GCS where it can be read by the trainer
training_set_features_gcs_path = os.path.join(input_data_gcs_dir, 'training_set_features.tsv')
training_set_labels_gcs_path = os.path.join(input_data_gcs_dir, 'training_set_labels.tsv')

gfile.Copy(training_set_features_local_path, training_set_features_gcs_path)
gfile.Copy(training_set_labels_local_path, training_set_labels_gcs_path)

output_model_uri_template = os.path.join(output_data_gcs_dir, kfp.dsl.EXECUTION_ID_PLACEHOLDER, 'output_model_uri', 'data')

xor_model_config = requests.get(test_data_url_prefix + 'model_config.json').content


#Load the component
train_op = kfp.components.load_component_from_url(component_url_prefix + 'component.yaml')

#Use the component as part of the pipeline
@kfp.dsl.pipeline(name='Test keras/train_classifier', description='Pipeline to test keras/train_classifier component')
def pipeline_to_test_keras_train_classifier():
    train_task = train_op(
        training_set_features_path=training_set_features_gcs_path,
        training_set_labels_path=training_set_labels_gcs_path,
        output_model_uri=output_model_uri_template,
        model_config=xor_model_config,
        number_of_classes=2,
        number_of_epochs=10,
        batch_size=32,
    )
    #Use train_task.outputs['output_model_uri'] to obtain the reference to the trained model URI that can be a passed to other pipeline tasks (e.g. for prediction or analysis)