# Introduction to Azure Databricks pipeline samples
This folder contains several Kubeflow Pipelines samples that show how to manipulate Databricks resources using the Azure Databricks for Kubeflow Pipelines package.
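For orientation, the sketch below shows roughly what such a pipeline looks like, modeled on `databricks_run_pipeline.py`; the op arguments and cluster values are illustrative assumptions rather than a verbatim copy of the sample:

```python
import kfp.dsl as dsl
import databricks  # the Azure Databricks for Kubeflow Pipelines package

@dsl.pipeline(
    name="DatabricksRun",
    description="Submits a one-time Spark jar run to Azure Databricks."
)
def calculate_pipeline(run_name="test-run", parameter="10"):
    # Assumed values: a new cluster spec, the sparkpi.jar library on DBFS,
    # and a Spark jar task that runs the SparkPi example.
    databricks.SubmitRunOp(
        name="submitrun",
        run_name=run_name,
        new_cluster={
            "spark_version": "5.3.x-scala2.11",
            "node_type_id": "Standard_D3_v2",
            "num_workers": 2
        },
        libraries=[{"jar": "dbfs:/docs/sparkpi.jar"}],
        spark_jar_task={
            "main_class_name": "org.apache.spark.examples.SparkPi",
            "parameters": [parameter]
        }
    )
```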
## Setup
- Create an Azure Databricks workspace
- Deploy the Azure Databricks Operator for Kubernetes
- Some samples reference the `sparkpi.jar` library, which is described in the Databricks documentation page *Create and run a jar job*. Upload the jar to the Databricks File System (DBFS), e.g. using the DBFS CLI.
- Some samples that use `CreateSecretScopeOp` reference a Kubernetes secret, which must exist before you run those pipelines; a sketch at the end of this section shows how a pipeline consumes it. For example:

  ```bash
  kubectl create secret generic -n kubeflow mysecret --from-literal=username=alex
  ```
- Install the Kubeflow Pipelines SDK
- Install the Azure Databricks for Kubeflow Pipelines package:

  ```bash
  pip install -e "git+https://github.com/kubeflow/pipelines#egg=kfp-azure-databricks&subdirectory=samples/contrib/azure-samples/kfp-azure-databricks" --upgrade
  ```
To uninstall the Azure Databricks for Kubeflow Pipelines package, use:

```bash
pip uninstall kfp-azure-databricks
```
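As noted in the setup list, pipelines that use `CreateSecretScopeOp` can pull values from the Kubernetes secret created above. Here is a minimal sketch, modeled loosely on `databricks_secretscope_pipeline.py`; the parameter and field names are assumptions and may differ from the sample:

```python
import kfp.dsl as dsl
import databricks  # the Azure Databricks for Kubeflow Pipelines package

@dsl.pipeline(
    name="DatabricksSecretScope",
    description="Creates a Databricks secret scope from a Kubernetes secret."
)
def secretscope_pipeline(scope_name="test-scope"):
    databricks.CreateSecretScopeOp(
        name="createsecretscope",
        scope_name=scope_name,
        initial_manage_principal="users",
        secrets=[
            # A plain string value
            {"key": "string-secret", "string_value": "helloworld"},
            # A value pulled from the Kubernetes secret created above
            {
                "key": "k8s-secret",
                "value_from": {
                    "secret_key_ref": {"name": "mysecret", "key": "username"}
                }
            }
        ]
    )
```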
## Testing the pipelines
Install the requirements:

```bash
pip install --upgrade -r requirements.txt
```
Compile a pipeline with any of the following commands:

```bash
dsl-compile --py databricks_run_pipeline.py --output databricks_run_pipeline.py.tar.gz
# Or
python3 databricks_run_pipeline.py
# Or
python3 pipeline_cli.py compile databricks_run_pipeline.py
```
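The `python3 databricks_run_pipeline.py` form works because each sample compiles itself when executed directly. A minimal sketch of that pattern with the kfp SDK (the pipeline body here is a hypothetical placeholder, not the sample's code):

```python
import kfp.dsl as dsl
import kfp.compiler as compiler

@dsl.pipeline(name="Sample", description="Placeholder pipeline.")
def sample_pipeline():
    # A real sample would create Databricks ops here; a trivial step stands in.
    dsl.ContainerOp(name="echo", image="alpine", command=["echo", "hello"])

if __name__ == "__main__":
    # Writes the compiled package next to the script, e.g. sample.py.tar.gz,
    # matching the .py.tar.gz file names used above.
    compiler.Compiler().compile(sample_pipeline, __file__ + ".tar.gz")
```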
Then run the compiled pipeline in Kubeflow:

```bash
python3 pipeline_cli.py run databricks_run_pipeline.py.tar.gz http://localhost:8080/pipeline '{"run_name":"test-run","parameter":"10"}'
```
Or compile and run a pipeline in Kubeflow with a single command:

```bash
python3 pipeline_cli.py compile_run databricks_run_pipeline.py http://localhost:8080/pipeline '{"run_name":"test-run","parameter":"10"}'
```
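For reference, the `run` step is roughly equivalent to the following use of the kfp client; the experiment name and argument handling are assumptions, not the exact code in `pipeline_cli.py`:

```python
import json
import kfp

# Hypothetical values matching the command above
host = "http://localhost:8080/pipeline"
package = "databricks_run_pipeline.py.tar.gz"
arguments = json.loads('{"run_name": "test-run", "parameter": "10"}')

client = kfp.Client(host=host)
experiment = client.create_experiment("databricks-samples")  # assumed name
client.run_pipeline(
    experiment_id=experiment.id,
    job_name=arguments.get("run_name", "databricks-run"),
    pipeline_package_path=package,
    params=arguments,
)
```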