Introduction to Azure Databricks pipeline samples

This folder contains several Kubeflow Pipelines samples that show how to manipulate Azure Databricks resources using the Azure Databricks for Kubeflow Pipelines package.
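
For orientation, here is a minimal sketch of the general shape these samples take: a function decorated with kfp.dsl.pipeline that instantiates ops provided by the package. The module name (databricks), the op name (CreateClusterOp) and its parameters are assumptions used purely for illustration; see the individual sample files for the actual ops and arguments.

# Minimal sketch of a Databricks pipeline sample. The module name, op name and
# parameters below are illustrative assumptions; the sample files define the real ones.
import kfp.dsl as dsl
import databricks  # assumed import exposed by the kfp-azure-databricks package

@dsl.pipeline(
    name="DatabricksCluster",
    description="Illustrative pipeline that creates an Azure Databricks cluster."
)
def create_cluster_pipeline(cluster_name="test-cluster"):
    databricks.CreateClusterOp(  # hypothetical op name and arguments
        name="createcluster",
        cluster_name=cluster_name,
        spark_version="5.3.x-scala2.11",
        node_type_id="Standard_D3_v2",
        num_workers=2
    )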

Setup

  1. Create an Azure Databricks workspace
  2. Deploy the Azure Databricks Operator for Kubernetes
  3. Some samples reference the sparkpi.jar library, which is described in the Azure Databricks documentation article "Create and run a jar job". Upload it to the Databricks File System (DBFS), e.g. using the DBFS CLI.
  4. Samples that use CreateSecretScopeOp reference a secret in Kubernetes. This secret must be created before running those pipelines, for example with the command below; a minimal sketch of a pipeline that consumes it appears after this list.
kubectl create secret generic -n kubeflow mysecret --from-literal=username=alex
  5. Install the Kubeflow Pipelines SDK
  6. Install the Azure Databricks for Kubeflow Pipelines package:
pip install -e "git+https://github.com/kubeflow/pipelines#egg=kfp-azure-databricks&subdirectory=samples/contrib/azure-samples/kfp-azure-databricks" --upgrade

To uninstall the Azure Databricks for Kubeflow Pipelines package, use:

pip uninstall kfp-azure-databricks

Testing the pipelines

Install the requirements:

pip install --upgrade -r requirements.txt

Compile a pipeline with any of the following commands:

dsl-compile --py databricks_run_pipeline.py --output databricks_run_pipeline.py.tar.gz
# Or
python3 databricks_run_pipeline.py
# Or
python3 pipeline_cli.py compile databricks_run_pipeline.py
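
Running a sample directly with python3, or via pipeline_cli.py compile, is expected to reduce to the standard Kubeflow Pipelines compiler call. A rough programmatic equivalent, assuming a pipeline function named calc_pipeline (a hypothetical placeholder; use whatever function the sample actually defines), would be:

# Programmatic equivalent of the dsl-compile command above. The function name
# calc_pipeline is a hypothetical placeholder for the sample's pipeline function.
import kfp.compiler
from databricks_run_pipeline import calc_pipeline  # hypothetical import

kfp.compiler.Compiler().compile(calc_pipeline, "databricks_run_pipeline.py.tar.gz")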

Then run the compiled pipeline in Kubeflow:

python3 pipeline_cli.py run databricks_run_pipeline.py.tar.gz http://localhost:8080/pipeline '{"run_name":"test-run","parameter":"10"}'

Or compile and run a pipeline in Kubeflow with a single command:

python3 pipeline_cli.py compile_run databricks_run_pipeline.py http://localhost:8080/pipeline '{"run_name":"test-run","parameter":"10"}'
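
pipeline_cli.py wraps the Kubeflow Pipelines SDK client. A rough SDK equivalent of the run command above (the experiment name is an arbitrary choice for illustration) looks like this:

# Rough SDK equivalent of "pipeline_cli.py run" above; the experiment name is
# an arbitrary choice for illustration.
import kfp

client = kfp.Client(host="http://localhost:8080/pipeline")
experiment = client.create_experiment("databricks-samples")
client.run_pipeline(
    experiment.id,
    job_name="test-run",
    pipeline_package_path="databricks_run_pipeline.py.tar.gz",
    params={"run_name": "test-run", "parameter": "10"}
)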