From d89e54174fb7c61d1ef91b0abc5f68950b6b8f4c Mon Sep 17 00:00:00 2001 From: Craig Osterhout <103533812+craig-osterhout@users.noreply.github.com> Date: Fri, 23 Feb 2024 10:58:06 -0800 Subject: [PATCH] add jupyter use-case guide (#19361) * add jupyter guide Signed-off-by: Craig Osterhout --- content/guides/use-case/jupyter.md | 421 +++++++++++++++++++++++++++++ data/toc.yaml | 3 + 2 files changed, 424 insertions(+) create mode 100644 content/guides/use-case/jupyter.md diff --git a/content/guides/use-case/jupyter.md b/content/guides/use-case/jupyter.md new file mode 100644 index 0000000000..fd29996843 --- /dev/null +++ b/content/guides/use-case/jupyter.md @@ -0,0 +1,421 @@ +--- +description: Run, develop, and share data science projects using JupyterLab and Docker +keywords: getting started, jupyter, notebook, python, jupyterlab, data science +title: Data science with JupyterLab +toc_max: 2 +--- + +Docker and JupyterLab are two powerful tools that can enhance your data science +workflow. In this guide, you will learn how to use them together to create and +run reproducible data science environments. This guide is based on +[Supercharging AI/ML Development with JupyterLab and +Docker](https://www.docker.com/blog/supercharging-ai-ml-development-with-jupyterlab-and-docker/). + +In this guide, you'll learn how to: + +- Run a personal Jupyter Server with JupyterLab on your local machine +- Customize your JupyterLab environment +- Share your JupyterLab notebook and environment with other data scientists + +## What is JupyterLab? + +[JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) is an open source application built around the concept of a computational notebook document. It enables sharing and executing code, data processing, visualization, and offers a range of interactive features for creating graphs. + +## Why use Docker and JupyterLab together? 
+ +By combining Docker and JupyterLab, you can benefit from the advantages of both tools, such as: + +- Containerization ensures a consistent JupyterLab environment across all + deployments, eliminating compatibility issues. +- Containerized JupyterLab simplifies sharing and collaboration by removing the + need for manual environment setup. +- Containers offer scalability for JupyterLab, supporting workload distribution + and efficient resource management with platforms like Kubernetes. + +## Prerequisites + +To follow along with this guide, you must install the latest version of [Docker Desktop](../../../get-docker.md). + +## Run and access a JupyterLab container + +In a terminal, run the following command to run your JupyterLab container. + +```console +$ docker run --rm -p 8889:8888 quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token' +``` +The following are the notable parts of the command: + +- `-p 8889:8888`: Maps port 8889 from the host to port 8888 on the container. +- `start-notebook.py --NotebookApp.token='my-token'`: Sets an access token + rather than using a random token. + +For more details, see the [Jupyter Server Options](https://jupyter-docker-stacks.readthedocs.io/en/latest/using/common.html#jupyter-server-options) and the [docker run CLI reference](/reference/cli/docker/container/run/). + +If this is the first time you are running the image, Docker will download and +run it. The amount of time it takes to download the image will vary depending on +your network connection. + +After the image downloads and runs, you can access the container. To access the +container, in a web browser navigate to +[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token). + +To stop the container, in the terminal press `ctrl`+`c`. + +To access an existing notebook on your system, you can use a +[bind mount](/storage/bind-mounts/). Open a terminal and +change directory to where your existing notebook is. 
Then, +run the following command based on your operating system. + +{{< tabs >}} +{{< tab name="Mac / Linux" >}} + +```console +$ docker run --rm -p 8889:8888 -v "$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token' +``` + +{{< /tab >}} +{{< tab name="Windows (Command Prompt)" >}} + +```console +$ docker run --rm -p 8889:8888 -v "%cd%":/home/jovyan/work quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token' +``` + +{{< /tab >}} +{{< tab name="Windows (PowerShell)" >}} + +```console +$ docker run --rm -p 8889:8888 -v "$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token' +``` + +{{< /tab >}} +{{< tab name="Windows (Git Bash)" >}} + +```console +$ docker run --rm -p 8889:8888 -v "/$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token' +``` + +{{< /tab >}} +{{< /tabs >}} + +The `-v` option tells Docker to mount your current working directory to +`/home/jovyan/work` inside the container. By default, the Jupyter image's root +directory is `/home/jovyan` and you can only access or save notebooks to that +directory in the container. + +Now you can access [localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token) and open notebooks contained in the bind mounted directory. + +To stop the container, in the terminal press `ctrl`+`c`. + +Docker also has volumes, which are the preferred mechanism for persisting +data generated by and used by Docker containers. While bind mounts are dependent +on the directory structure and OS of the host machine, volumes are completely +managed by Docker. + +## Save and access notebooks + +When you remove a container, all data in that container is deleted. To save +notebooks outside of the container, you can use a [volume](/storage/volumes/). 
### Run a JupyterLab container with a volume

To start the container with a volume, open a terminal and run the following command.

```console
$ docker run --rm -p 8889:8888 -v jupyter-data:/home/jovyan/work quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
```

The `-v` option tells Docker to create a volume named `jupyter-data` and mount it in the container at `/home/jovyan/work`.

To access the container, in a web browser navigate to
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
Notebooks can now be saved to the volume and will be accessible even after
the container is deleted.

### Save a notebook to the volume

For this example, you'll use the [Iris Dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html) example from scikit-learn.

1. Open a web browser and access your JupyterLab container at [localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).

2. In the **Launcher**, under **Notebook**, select **Python 3**.

3. In the notebook, specify the following to install the necessary packages.

   ```console
   !pip install matplotlib scikit-learn
   ```

4. Select the play button to run the code.

5. In the notebook, specify the following code.

   ```python
   from sklearn import datasets

   iris = datasets.load_iris()
   import matplotlib.pyplot as plt

   _, ax = plt.subplots()
   scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
   ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
   _ = ax.legend(
       scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
   )
   ```

6. Select the play button to run the code. You should see a scatter plot of the
   Iris dataset.

7. In the top menu, select **File** and then **Save Notebook**.

8. Specify a name in the `work` directory to save the notebook to the volume.
   For example, `work/mynotebook.ipynb`.

9. 
Select **Rename** to save the notebook.

The notebook is now saved in the volume.

In the terminal, press `ctrl`+`c` to stop the container.

Now, any time you run a Jupyter container with the volume, you'll have access to the saved notebook.

When you run a new container and run the data plot code again, you'll need to
run `!pip install matplotlib scikit-learn` and download the packages again. You
can avoid reinstalling packages every time you run a new container by creating
your own image with the packages already installed.

## Customize your JupyterLab environment

You can create your own JupyterLab environment and build it into an image using
Docker. By building your own image, you can customize your JupyterLab
environment with the packages and tools you need, and ensure that it's
consistent and reproducible across different deployments. Building your own
image also makes it easier to share your JupyterLab environment with others, or
to use it as a base for further development.

### Define your environment in a Dockerfile

In the previous Iris Dataset example from [Save a notebook to the volume](#save-a-notebook-to-the-volume), you had to install the dependencies, `matplotlib` and `scikit-learn`, every time you ran a new container. While the dependencies in that small example download and
install quickly, it may become a problem as your list of dependencies grows.
There may also be other tools, packages, or files that you always want in your
environment.

In this case, you can install the dependencies as part of the environment in the
image. Then, every time you run your container, the dependencies will already be
installed.

You can define your environment in a Dockerfile. A Dockerfile is a text file
that instructs Docker how to create an image of your JupyterLab environment. An
image contains everything you want and need when running JupyterLab, such as
files, packages, and tools.
+ +In a directory of your choice, create a new text file named `Dockerfile`. Open the `Dockerfile` in an IDE or text editor and then add the following contents. + +```dockerfile +# syntax=docker/dockerfile:1 + +FROM quay.io/jupyter/base-notebook +RUN pip install --no-cache-dir matplotlib scikit-learn +``` + +This Dockerfile uses the `quay.io/jupyter/base-notebook` image as the base, and then runs `pip` to install the dependencies. For more details about the instructions in the Dockerfile, see the [Dockerfile reference](/reference/dockerfile/). + +Before you proceed, save your changes to the `Dockerfile`. + +### Build your environment into an image + +After you have a `Dockerfile` to define your environment, you can use `docker +build` to build an image using your `Dockerfile`. + +Open a terminal, change directory to the directory where your `Dockerfile` is +located, and then run the following command. + +```console +$ docker build -t my-jupyter-image . +``` + +The command builds a Docker image from your `Dockerfile` and a context. The +`-t` option specifies the name and tag of the image, in this case +`my-jupyter-image`. The `.` indicates that the current directory is the context, +which means that the files in that directory can be used in the image creation +process. + +You can verify that the image was built by viewing the `Images` view in Docker Desktop, or by running the `docker image ls` command in a terminal. You should see an image named `my-jupyter-image`. + +## Run your image as a container + +To run your image as a container, you use the `docker run` command. In the +`docker run` command, you'll specify your own image name. + +```console +$ docker run --rm -p 8889:8888 my-jupyter-image start-notebook.py --NotebookApp.token='my-token' +``` + +To access the container, in a web browser navigate to +[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token). + +You can now use the packages without having to install them in your notebook. 
+ +1. In the **Launcher**, under **Notebook**, select **Python 3**. + +2. In the notebook, specify the following code. + + ```python + from sklearn import datasets + + iris = datasets.load_iris() + import matplotlib.pyplot as plt + + _, ax = plt.subplots() + scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target) + ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1]) + _ = ax.legend( + scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes" + ) + ``` + +3. Select the play button to run the code. You should see a scatter plot of the Iris dataset. + +In the terminal, press `ctrl`+ `c` to stop the container. + +## Use Compose to run your container + +Docker Compose is a tool for defining and running multi-container applications. +In this case, the application isn't a multi-container application, but Docker +Compose can make it easier to run by defining all the `docker run` options in a +file. + +### Create a Compose file + +To use Compose, you need a `compose.yaml` file. In the same directory as your +`Dockerfile`, create a new file named `compose.yaml`. + +Open the `compose.yaml` file in an IDE or text editor and add the following +contents. + +```yaml +services: + jupyter: + build: + context: . + ports: + - 8889:8888 + volumes: + - jupyter-data:/home/jovyan/work + command: start-notebook.py --NotebookApp.token='my-token' + +volumes: + jupyter-data: + name: jupyter-data +``` + +This Compose file specifies all the options you used in the `docker run` command. For more details about the Compose instructions, see the +[Compose file reference](../../../compose/compose-file/_index.md). + +Before you proceed, save your changes to the `compose.yaml` file. + +### Run your container using Compose + +Open a terminal, change directory to where your `compose.yaml` file is located, and then run the following command. 
```console
$ docker compose up --build
```

This command builds your image and runs it as a container using the instructions
specified in the `compose.yaml` file. The `--build` option ensures that your
image is rebuilt, which is necessary if you made changes to your `Dockerfile`.

To access the container, in a web browser navigate to
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).

In the terminal, press `ctrl`+`c` to stop the container.

## Share your work

By sharing your image and notebook, you create a portable and replicable
research environment that can be easily accessed and used by other data
scientists. This process not only facilitates collaboration but also ensures
that your work is preserved in an environment where it can be run without
compatibility issues.

To share your image and data, you'll use [Docker Hub](https://hub.docker.com/). Docker Hub is a cloud-based registry service that lets you share and distribute container images.

### Share your image

1. [Sign up](https://www.docker.com/pricing?utm_source=docker&utm_medium=webreferral&utm_campaign=docs_driven_upgrade) or sign in to [Docker Hub](https://hub.docker.com).

2. Rename your image so that Docker knows which repository to push it to. Open a
   terminal and run the following `docker tag` command. Replace `YOUR-USER-NAME`
   with your Docker ID.

   ```console
   $ docker tag my-jupyter-image YOUR-USER-NAME/my-jupyter-image
   ```

3. Run the following `docker push` command to push the image to Docker Hub.
   Replace `YOUR-USER-NAME` with your Docker ID.

   ```console
   $ docker push YOUR-USER-NAME/my-jupyter-image
   ```

4. Verify that you pushed the image to Docker Hub.
   1. Go to [Docker Hub](https://hub.docker.com).
   2. Select **Repositories**.
   3. View the **Last pushed** time for your repository.

Other users can now download and run your image using the `docker run` command. 
They need to replace `YOUR-USER-NAME` with your Docker ID.

```console
$ docker run --rm -p 8889:8888 YOUR-USER-NAME/my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
```

### Share your volume

This example uses the Docker Desktop [Volumes Backup & Share](https://hub.docker.com/extensions/docker/volumes-backup-extension) extension. Alternatively, in the CLI you can [back up the volume](/storage/volumes/#back-up-a-volume) and then [push it using the ORAS CLI](/docker-hub/oci-artifacts/#push-a-volume).

1. Install the Volumes Backup & Share extension.
   1. Open the Docker Dashboard and select **Extensions**.
   2. Search for `Volumes Backup & Share`.
   3. In the search results, select **Install** for the extension.

2. Open the **Volumes Backup & Share** extension in the Docker Dashboard.
3. Next to the **jupyter-data** volume, select the **Export volume** icon.
4. In the **Export content** window, select **Registry**.
5. In the text box under **Registry**, specify your Docker ID and a name for the
   volume. For example, `YOUR-USERNAME/jupyter-data`.
6. Select **Export**.
7. Verify that you exported the volume to Docker Hub.
   1. Go to [Docker Hub](https://hub.docker.com).
   2. Select **Repositories**.
   3. View the **Last pushed** time for your repository.

Other users can now download and import your volume. To import the volume and then run it with your image:

1. In the Volumes Backup & Share extension, select **Import into new volume**.
2. In the **Import into a new volume** window, select **Registry**.
3. In the text box under **Registry**, specify your Docker ID and the repository
   name for the volume. For example, `YOUR-USERNAME/jupyter-data`.
4. In **Volume name**, specify the name you want to give the
   volume. This example uses `jupyter-data` as the name.
5. Select **Import**.
6. In a terminal, run `docker run` to run your image with the imported volume.
   Replace `YOUR-USER-NAME` with your Docker ID.
   ```console
   $ docker run --rm -p 8889:8888 -v jupyter-data:/home/jovyan/work YOUR-USER-NAME/my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
   ```

## Summary

In this guide, you learned how to leverage Docker and JupyterLab to create
reproducible data science environments, facilitating the development and sharing
of data science projects. This included running a personal JupyterLab server,
customizing the environment with necessary tools and packages, and sharing
notebooks and environments with other data scientists.

Related information:

- [Dockerfile reference](/reference/dockerfile/)
- [Compose file reference](/compose/compose-file/)
- [Docker CLI reference](/reference/cli/docker/)
- [Jupyter Docker Stacks docs](https://jupyter-docker-stacks.readthedocs.io/en/latest/)
\ No newline at end of file
diff --git a/data/toc.yaml b/data/toc.yaml
index 6b100f4ffc..a09575ac80 100644
--- a/data/toc.yaml
+++ b/data/toc.yaml
@@ -179,6 +179,9 @@ Guides:
     title: Text summarization
   - path: /scout/guides/vex/
     title: Suppress CVEs with VEX
+  - path: /guides/use-case/jupyter/
+    title: Data science with JupyterLab
+
 - sectiontitle: Develop with Docker
   section: