+++
title = "ModelDB"
description = "ModelDB - A system to manage machine learning models"
weight = 10
toc = true
[menu]
[menu.docs]
  parent = "components"
  weight = 5
+++

## Introduction

ModelDB is an end-to-end system for managing machine learning models. It ingests models and their associated metadata as the models are trained, stores the model data in a structured format, and surfaces it through a web frontend for rich querying. ModelDB can be used with any ML environment via the ModelDB Light API. ModelDB native clients provide advanced support for spark.ml and scikit-learn.

For more information, see the [ModelDB overview](https://github.com/mitdbg/modeldb#overview).

## Deploying ModelDB

Use the following commands to deploy ModelDB.

```
ks generate modeldb modeldb
ks apply default -c modeldb
```

## Concepts

ModelDB organizes model data in a three-level hierarchy, listed here from bottom to top:

1. ExperimentRun: every execution of a script or program creates an ExperimentRun.
1. Experiment: related ExperimentRuns can be grouped into an Experiment (e.g., "running hyperparameter optimization for the Neural Network").
1. Project: all Experiments and ExperimentRuns belong to a Project (e.g., "churn prediction").

Classes:

1. Datasets takes file paths and optional metadata. Associate a tag (key) with each Dataset (value).
1. Model takes the model type, the model, and the path to the model as arguments.
1. ModelConfig takes the model type and the model config.
1. ModelMetrics takes the metric to use as its argument.

## Using ModelDB

After ModelDB is deployed and the modeldb-db, modeldb-backend and modeldb-frontend pods are running:

1. Install ModelDB

   ModelDB is now part of the verta library. verta is compatible with Python 3.5+, and the latest verta releases are available as source packages via pip. When using pip, it is generally recommended to install packages in a virtual environment to avoid modifying system state.

   - Check your Python version: `python --version`
   - Create and activate a new environment: `python -m venv .env` and then `source .env/bin/activate`
   - Install verta: `pip install verta==versionNumber`

2. Setup

   Get the host and port details of the ModelDB backend proxy.

   ```
   kubectl get service modeldb-backend-proxy --namespace kubeflow
   ```

   Configure HOST and PORT to connect to the ModelDB backend.

   ```
   from verta import ModelDBClient

   HOST = ""  # host of the modeldb-backend-proxy service
   PORT = ""  # port of the modeldb-backend-proxy service
   client = ModelDBClient(HOST, PORT)
   ```

3. Creating a project

   Begin by creating a project, then add your models as runs within the project. Each run can represent a different strategy for solving the problem.

   ```
   project = client.set_project(proj_name="My Project")  # a project is a goal
   experiment = client.set_experiment(expt_name="My Experiment")  # a strategy for the project
   run = client.set_experiment_run(run_name="First run")
   ```

4. Logging hyperparameters, metrics and models

   Use `run.log_xxx()` in your code to record metrics, hyperparameters, datasets, and so on.

   ```
   from sklearn.ensemble import GradientBoostingRegressor
   import joblib

   # Hyperparameters: candidate values for each setting
   param_grid = {
       'n_estimators': [100],
       'learning_rate': [0.1, 0.02],
       'max_depth': [6, 4],
       'max_leaf_nodes': [3, 15],
       'max_features': [1.0, 0.1],
   }

   # Train this run with the first candidate value of each hyperparameter
   hyperparameters = {name: values[0] for name, values in param_grid.items()}
   for name, value in hyperparameters.items():
       run.log_hyperparameter(name, value)

   # Metrics (X_train, y_train, X_test and y_test are assumed to be defined already)
   model = GradientBoostingRegressor(**hyperparameters)
   model.fit(X_train, y_train)
   y_pred = model.predict(X_test)
   train_score = model.score(X_train, y_train)  # R^2 score for a regressor
   test_score = model.score(X_test, y_test)
   run.log_metric("Accuracy_train", train_score)
   run.log_metric("Accuracy_test", test_score)

   # Models: save the model with either joblib or pickle, then log it
   filename_2 = "simple_model_gbr_2.joblib"
   joblib.dump(model, filename_2)
   run.log_model("model_gbr_2", filename_2)
   ```

5. View your models in the webapp

   Get the IP address of the ModelDB webapp service and open it in the browser.

   ```
   kubectl get service modeldb-webapp --namespace kubeflow
   ```

## Samples

These notebooks show how datasets, models, model configurations, and model metrics can be initialized and logged to ModelDB:

* TensorFlow [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/VertaAI/modeldb-client/blob/development/workflows/demos/tensorflow.ipynb)
* PyTorch [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/VertaAI/modeldb-client/blob/development/workflows/demos/pytorch.ipynb)
* sklearn [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/VertaAI/modeldb-client/blob/development/workflows/demos/sklearn.ipynb)
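
Beyond the notebooks, the hyperparameter grid from the logging step in Using ModelDB lists several candidate values per setting, and each combination would naturally map to its own ExperimentRun inside one Experiment. The following is a minimal sketch of that expansion, not part of ModelDB or verta: `expand_grid` is a hypothetical helper, and the commented-out loop assumes the `client` object and the `set_experiment_run` and `log_hyperparameter` calls shown in the steps above.

```python
from itertools import product

# Same grid as in the logging step: each hyperparameter maps to candidate values.
param_grid = {
    "n_estimators": [100],
    "learning_rate": [0.1, 0.02],
    "max_depth": [6, 4],
    "max_leaf_nodes": [3, 15],
    "max_features": [1.0, 0.1],
}


def expand_grid(grid):
    """Yield one {name: value} dict per combination of candidate values.

    Hypothetical helper for illustration; it is not provided by verta.
    """
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


combinations = list(expand_grid(param_grid))
print(len(combinations))  # 1 * 2 * 2 * 2 * 2 = 16 combinations

# Assumed usage with the client from the setup step (not executed here):
# for i, hyperparameters in enumerate(combinations):
#     run = client.set_experiment_run(run_name="run-{}".format(i))
#     for name, value in hyperparameters.items():
#         run.log_hyperparameter(name, value)
```

Keeping one run per combination means every logged hyperparameter is a single value rather than a list, which makes runs directly comparable when they are queried in the webapp.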