From d89e54174fb7c61d1ef91b0abc5f68950b6b8f4c Mon Sep 17 00:00:00 2001 From: Craig Osterhout <103533812+craig-osterhout@users.noreply.github.com> Date: Fri, 23 Feb 2024 10:58:06 -0800 Subject: [PATCH] add jupyter use-case guide (#19361) * add jupyter guide Signed-off-by: Craig Osterhout --- content/guides/use-case/jupyter.md | 421 +++++++++++++++++++++++++++++ data/toc.yaml | 3 + 2 files changed, 424 insertions(+) create mode 100644 content/guides/use-case/jupyter.md diff --git a/content/guides/use-case/jupyter.md b/content/guides/use-case/jupyter.md new file mode 100644 index 0000000000..fd29996843 --- /dev/null +++ b/content/guides/use-case/jupyter.md @@ -0,0 +1,421 @@ +--- +description: Run, develop, and share data science projects using JupyterLab and Docker +keywords: getting started, jupyter, notebook, python, jupyterlab, data science +title: Data science with JupyterLab +toc_max: 2 +--- + +Docker and JupyterLab are two powerful tools that can enhance your data science +workflow. In this guide, you will learn how to use them together to create and +run reproducible data science environments. This guide is based on +[Supercharging AI/ML Development with JupyterLab and +Docker](https://www.docker.com/blog/supercharging-ai-ml-development-with-jupyterlab-and-docker/). + +In this guide, you'll learn how to: + +- Run a personal Jupyter Server with JupyterLab on your local machine +- Customize your JupyterLab environment +- Share your JupyterLab notebook and environment with other data scientists + +## What is JupyterLab? + +[JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) is an open source application built around the concept of a computational notebook document. It enables sharing and executing code, data processing, visualization, and offers a range of interactive features for creating graphs. + +## Why use Docker and JupyterLab together? 
+ +By combining Docker and JupyterLab, you can benefit from the advantages of both tools, such as: + +- Containerization ensures a consistent JupyterLab environment across all + deployments, eliminating compatibility issues. +- Containerized JupyterLab simplifies sharing and collaboration by removing the + need for manual environment setup. +- Containers offer scalability for JupyterLab, supporting workload distribution + and efficient resource management with platforms like Kubernetes. + +## Prerequisites + +To follow along with this guide, you must install the latest version of [Docker Desktop](../../../get-docker.md). + +## Run and access a JupyterLab container + +In a terminal, run the following command to run your JupyterLab container. + +```console +$ docker run --rm -p 8889:8888 quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token' +``` +The following are the notable parts of the command: + +- `-p 8889:8888`: Maps port 8889 from the host to port 8888 on the container. +- `start-notebook.py --NotebookApp.token='my-token'`: Sets an access token + rather than using a random token. + +For more details, see the [Jupyter Server Options](https://jupyter-docker-stacks.readthedocs.io/en/latest/using/common.html#jupyter-server-options) and the [docker run CLI reference](/reference/cli/docker/container/run/). + +If this is the first time you are running the image, Docker will download and +run it. The amount of time it takes to download the image will vary depending on +your network connection. + +After the image downloads and runs, you can access the container. To access the +container, in a web browser navigate to +[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token). + +To stop the container, in the terminal press `ctrl`+`c`. + +To access an existing notebook on your system, you can use a +[bind mount](/storage/bind-mounts/). Open a terminal and +change directory to where your existing notebook is. 
Then, +run the following command based on your operating system. + +{{< tabs >}} +{{< tab name="Mac / Linux" >}} + +```console +$ docker run --rm -p 8889:8888 -v "$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token' +``` + +{{< /tab >}} +{{< tab name="Windows (Command Prompt)" >}} + +```console +$ docker run --rm -p 8889:8888 -v "%cd%":/home/jovyan/work quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token' +``` + +{{< /tab >}} +{{< tab name="Windows (PowerShell)" >}} + +```console +$ docker run --rm -p 8889:8888 -v "$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token' +``` + +{{< /tab >}} +{{< tab name="Windows (Git Bash)" >}} + +```console +$ docker run --rm -p 8889:8888 -v "/$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token' +``` + +{{< /tab >}} +{{< /tabs >}} + +The `-v` option tells Docker to mount your current working directory to +`/home/jovyan/work` inside the container. By default, the Jupyter image's root +directory is `/home/jovyan` and you can only access or save notebooks to that +directory in the container. + +Now you can access [localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token) and open notebooks contained in the bind mounted directory. + +To stop the container, in the terminal press `ctrl`+`c`. + +Docker also has volumes, which are the preferred mechanism for persisting +data generated by and used by Docker containers. While bind mounts are dependent +on the directory structure and OS of the host machine, volumes are completely +managed by Docker. + +## Save and access notebooks + +When you remove a container, all data in that container is deleted. To save +notebooks outside of the container, you can use a [volume](/storage/volumes/). 
### Run a JupyterLab container with a volume

To start the container with a volume, open a terminal and run the following command.

```console
$ docker run --rm -p 8889:8888 -v jupyter-data:/home/jovyan/work quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
```

The `-v` option tells Docker to create a volume named `jupyter-data` and mount it in the container at `/home/jovyan/work`.

To access the container, in a web browser navigate to
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
Notebooks can now be saved to the volume and will be accessible even after
the container is deleted.

### Save a notebook to the volume

For this example, you'll use the [Iris Dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html) example from scikit-learn.

1. Open a web browser and access your JupyterLab container at [localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).

2. In the **Launcher**, under **Notebook**, select **Python 3**.

3. In the notebook, specify the following to install the necessary packages.

   ```console
   !pip install matplotlib scikit-learn
   ```

4. Select the play button to run the code.

5. In the notebook, specify the following code.

   ```python
   from sklearn import datasets

   iris = datasets.load_iris()
   import matplotlib.pyplot as plt

   _, ax = plt.subplots()
   scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
   ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
   _ = ax.legend(
       scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
   )
   ```

6. Select the play button to run the code. You should see a scatter plot of the
   Iris dataset.

7. In the top menu, select **File** and then **Save Notebook**.

8. Specify a name in the `work` directory to save the notebook to the volume.
   For example, `work/mynotebook.ipynb`.

9. 
Select **Rename** to save the notebook.

The notebook is now saved in the volume.

In the terminal, press `ctrl`+`c` to stop the container.

Now, any time you run a Jupyter container with the volume, you'll have access to the saved notebook.

When you run a new container and run the data plot code again, you'll need to
run `!pip install matplotlib scikit-learn` and download the packages again. You
can avoid reinstalling packages every time you run a new container by creating
your own image with the packages already installed.

## Customize your JupyterLab environment

You can create your own JupyterLab environment and build it into an image using
Docker. By building your own image, you can customize your JupyterLab
environment with the packages and tools you need, and ensure that it's
consistent and reproducible across different deployments. Building your own
image also makes it easier to share your JupyterLab environment with others, or
to use it as a base for further development.

### Define your environment in a Dockerfile

In the previous Iris Dataset example from [Save a notebook to the volume](#save-a-notebook-to-the-volume), you had to install the dependencies, `matplotlib` and `scikit-learn`, every time you ran a new container. While the dependencies in that small example download and
install quickly, it may become a problem as your list of dependencies grows.
There may also be other tools, packages, or files that you always want in your
environment.

In this case, you can install the dependencies as part of the environment in the
image. Then, every time you run your container, the dependencies will already be
installed.

You can define your environment in a Dockerfile. A Dockerfile is a text file
that instructs Docker how to create an image of your JupyterLab environment. An
image contains everything you want and need when running JupyterLab, such as
files, packages, and tools.
+ +In a directory of your choice, create a new text file named `Dockerfile`. Open the `Dockerfile` in an IDE or text editor and then add the following contents. + +```dockerfile +# syntax=docker/dockerfile:1 + +FROM quay.io/jupyter/base-notebook +RUN pip install --no-cache-dir matplotlib scikit-learn +``` + +This Dockerfile uses the `quay.io/jupyter/base-notebook` image as the base, and then runs `pip` to install the dependencies. For more details about the instructions in the Dockerfile, see the [Dockerfile reference](/reference/dockerfile/). + +Before you proceed, save your changes to the `Dockerfile`. + +### Build your environment into an image + +After you have a `Dockerfile` to define your environment, you can use `docker +build` to build an image using your `Dockerfile`. + +Open a terminal, change directory to the directory where your `Dockerfile` is +located, and then run the following command. + +```console +$ docker build -t my-jupyter-image . +``` + +The command builds a Docker image from your `Dockerfile` and a context. The +`-t` option specifies the name and tag of the image, in this case +`my-jupyter-image`. The `.` indicates that the current directory is the context, +which means that the files in that directory can be used in the image creation +process. + +You can verify that the image was built by viewing the `Images` view in Docker Desktop, or by running the `docker image ls` command in a terminal. You should see an image named `my-jupyter-image`. + +## Run your image as a container + +To run your image as a container, you use the `docker run` command. In the +`docker run` command, you'll specify your own image name. + +```console +$ docker run --rm -p 8889:8888 my-jupyter-image start-notebook.py --NotebookApp.token='my-token' +``` + +To access the container, in a web browser navigate to +[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token). + +You can now use the packages without having to install them in your notebook. 
+ +1. In the **Launcher**, under **Notebook**, select **Python 3**. + +2. In the notebook, specify the following code. + + ```python + from sklearn import datasets + + iris = datasets.load_iris() + import matplotlib.pyplot as plt + + _, ax = plt.subplots() + scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target) + ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1]) + _ = ax.legend( + scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes" + ) + ``` + +3. Select the play button to run the code. You should see a scatter plot of the Iris dataset. + +In the terminal, press `ctrl`+ `c` to stop the container. + +## Use Compose to run your container + +Docker Compose is a tool for defining and running multi-container applications. +In this case, the application isn't a multi-container application, but Docker +Compose can make it easier to run by defining all the `docker run` options in a +file. + +### Create a Compose file + +To use Compose, you need a `compose.yaml` file. In the same directory as your +`Dockerfile`, create a new file named `compose.yaml`. + +Open the `compose.yaml` file in an IDE or text editor and add the following +contents. + +```yaml +services: + jupyter: + build: + context: . + ports: + - 8889:8888 + volumes: + - jupyter-data:/home/jovyan/work + command: start-notebook.py --NotebookApp.token='my-token' + +volumes: + jupyter-data: + name: jupyter-data +``` + +This Compose file specifies all the options you used in the `docker run` command. For more details about the Compose instructions, see the +[Compose file reference](../../../compose/compose-file/_index.md). + +Before you proceed, save your changes to the `compose.yaml` file. + +### Run your container using Compose + +Open a terminal, change directory to where your `compose.yaml` file is located, and then run the following command. 
```console
$ docker compose up --build
```

This command builds your image and runs it as a container using the instructions
specified in the `compose.yaml` file. The `--build` option ensures that your
image is rebuilt, which is necessary if you made changes to your `Dockerfile`.

To access the container, in a web browser navigate to
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).

In the terminal, press `ctrl`+`c` to stop the container.

## Share your work

By sharing your image and notebook, you create a portable and replicable
research environment that can be easily accessed and used by other data
scientists. This process not only facilitates collaboration but also ensures
that your work is preserved in an environment where it can be run without
compatibility issues.

To share your image and data, you'll use [Docker Hub](https://hub.docker.com/). Docker Hub is a cloud-based registry service that lets you share and distribute container images.

### Share your image

1. [Sign up](https://www.docker.com/pricing?utm_source=docker&utm_medium=webreferral&utm_campaign=docs_driven_upgrade) or sign in to [Docker Hub](https://hub.docker.com).

2. Rename your image so that Docker knows which repository to push it to. Open a
   terminal and run the following `docker tag` command. Replace `YOUR-USER-NAME`
   with your Docker ID.

   ```console
   $ docker tag my-jupyter-image YOUR-USER-NAME/my-jupyter-image
   ```

3. Run the following `docker push` command to push the image to Docker Hub.
   Replace `YOUR-USER-NAME` with your Docker ID.

   ```console
   $ docker push YOUR-USER-NAME/my-jupyter-image
   ```

4. Verify that you pushed the image to Docker Hub.
   1. Go to [Docker Hub](https://hub.docker.com).
   2. Select **Repositories**.
   3. View the **Last pushed** time for your repository.

Other users can now download and run your image using the `docker run` command. 
They need to replace `YOUR-USER-NAME` with your Docker ID.

```console
$ docker run --rm -p 8889:8888 YOUR-USER-NAME/my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
```

### Share your volume

This example uses the Docker Desktop [Volumes Backup & Share](https://hub.docker.com/extensions/docker/volumes-backup-extension) extension. Alternatively, in the CLI you can [back up the volume](/storage/volumes/#back-up-a-volume) and then [push it using the ORAS CLI](/docker-hub/oci-artifacts/#push-a-volume).

1. Install the Volumes Backup & Share extension.
   1. Open the Docker Dashboard and select **Extensions**.
   2. Search for `Volumes Backup & Share`.
   3. In the search results, select **Install** for the extension.

2. Open the **Volumes Backup & Share** extension in the Docker Dashboard.
3. Next to the **jupyter-data** volume, select the **Export volume** icon.
4. In the **Export content** window, select **Registry**.
5. In the text box under **Registry**, specify your Docker ID and a name for the
   volume. For example, `YOUR-USERNAME/jupyter-data`.
6. Select **Export**.
7. Verify that you exported the volume to Docker Hub.
   1. Go to [Docker Hub](https://hub.docker.com).
   2. Select **Repositories**.
   3. View the **Last pushed** time for your repository.

Other users can now download and import your volume. To import the volume and then run it with your image:

1. In the Volumes Backup & Share extension, select **Import into new volume**.
2. In the **Import into a new volume** window, select **Registry**.
3. In the text box under **Registry**, specify your Docker ID and the repository
   name for the volume. For example, `YOUR-USERNAME/jupyter-data`.
4. In **Volume name**, specify the name you want to give the
   volume. This example uses `jupyter-data` as the name.
5. Select **Import**.
6. In a terminal, run `docker run` to run your image with the imported volume.
   Replace `YOUR-USER-NAME` with your Docker ID.
   ```console
   $ docker run --rm -p 8889:8888 -v jupyter-data:/home/jovyan/work YOUR-USER-NAME/my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
   ```

## Summary

In this guide, you learned how to leverage Docker and JupyterLab to create
reproducible data science environments, facilitating the development and sharing
of data science projects. This included running a personal JupyterLab server,
customizing the environment with necessary tools and packages, and sharing
notebooks and environments with other data scientists.

Related information:

- [Dockerfile reference](/reference/dockerfile/)
- [Compose file reference](/compose/compose-file/)
- [Docker CLI reference](/reference/cli/docker/)
- [Jupyter Docker Stacks docs](https://jupyter-docker-stacks.readthedocs.io/en/latest/)
\ No newline at end of file
diff --git a/data/toc.yaml b/data/toc.yaml
index 6b100f4ffc..a09575ac80 100644
--- a/data/toc.yaml
+++ b/data/toc.yaml
@@ -179,6 +179,9 @@ Guides:
     title: Text summarization
   - path: /scout/guides/vex/
     title: Suppress CVEs with VEX
+  - path: /guides/use-case/jupyter/
+    title: Data science with JupyterLab
+
 - sectiontitle: Develop with Docker
   section: