---
description: Run, develop, and share data science projects using JupyterLab and Docker
keywords: getting started, jupyter, notebook, python, jupyterlab, data science
title: Data science with JupyterLab
toc_max: 2
summary: |
  Use Docker to run Jupyter notebooks.
tags: [data-science]
languages: [python]
aliases:
- /guides/use-case/jupyter/
params:
  time: 20 minutes
---
Docker and JupyterLab are two powerful tools that can enhance your data science
workflow. In this guide, you will learn how to use them together to create and
run reproducible data science environments. This guide is based on
[Supercharging AI/ML Development with JupyterLab and
Docker](https://www.docker.com/blog/supercharging-ai-ml-development-with-jupyterlab-and-docker/).
In this guide, you'll learn how to:
- Run a personal Jupyter Server with JupyterLab on your local machine
- Customize your JupyterLab environment
- Share your JupyterLab notebook and environment with other data scientists
## What is JupyterLab?
[JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) is an open source application built around the concept of a computational notebook document. It lets you share and execute code, process and visualize data, and create graphs using a range of interactive features.
## Why use Docker and JupyterLab together?
By combining Docker and JupyterLab, you can benefit from the advantages of both tools, such as:
- Containerization ensures a consistent JupyterLab environment across all
deployments, eliminating compatibility issues.
- Containerized JupyterLab simplifies sharing and collaboration by removing the
need for manual environment setup.
- Containers offer scalability for JupyterLab, supporting workload distribution
and efficient resource management with platforms like Kubernetes.
## Prerequisites
To follow along with this guide, you must install the latest version of [Docker Desktop](/get-started/get-docker.md).
## Run and access a JupyterLab container
In a terminal, run the following command to run your JupyterLab container.
```console
$ docker run --rm -p 8889:8888 quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
```
The following are the notable parts of the command:
- `-p 8889:8888`: Maps port 8889 from the host to port 8888 on the container.
- `start-notebook.py --NotebookApp.token='my-token'`: Sets an access token
rather than using a random token.
For more details, see the [Jupyter Server Options](https://jupyter-docker-stacks.readthedocs.io/en/latest/using/common.html#jupyter-server-options) and the [docker run CLI reference](/reference/cli/docker/container/run/).
If this is the first time you are running the image, Docker will download and
run it. The amount of time it takes to download the image will vary depending on
your network connection.
After the image downloads and runs, you can access the container. To access the
container, in a web browser navigate to
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
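The access URL follows from two values in the `docker run` command: the published host port and the token. As a minimal sketch, the following Python helper assembles that URL. The name `lab_url` is hypothetical and not part of Jupyter or Docker, and the sketch assumes the server is reached on `localhost` over plain HTTP.

```python
from urllib.parse import urlencode, urlunsplit


def lab_url(host_port: int, token: str, host: str = "localhost") -> str:
    """Build the JupyterLab URL for a container published on host_port.

    Hypothetical helper for illustration only.
    """
    # urlencode escapes any characters in the token that are unsafe in a URL.
    query = urlencode({"token": token})
    return urlunsplit(("http", f"{host}:{host_port}", "/lab", query, ""))


# Values taken from the docker run command in this guide.
print(lab_url(8889, "my-token"))
```

If you publish a different host port with `-p` or set a different token, pass those values instead.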
To stop the container, in the terminal press `ctrl`+`c`.
To access an existing notebook on your system, you can use a
[bind mount](/storage/bind-mounts/). Open a terminal and
change directory to where your existing notebook is. Then,
run the following command based on your operating system.
{{< tabs >}}
{{< tab name="Mac / Linux" >}}
```console
$ docker run --rm -p 8889:8888 -v "$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
```
{{< /tab >}}
{{< tab name="Windows (Command Prompt)" >}}
```console
$ docker run --rm -p 8889:8888 -v "%cd%":/home/jovyan/work quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
```
{{< /tab >}}
{{< tab name="Windows (PowerShell)" >}}
```console
$ docker run --rm -p 8889:8888 -v "$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
```
{{< /tab >}}
{{< tab name="Windows (Git Bash)" >}}
```console
$ docker run --rm -p 8889:8888 -v "/$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
```
{{< /tab >}}
{{< /tabs >}}
The `-v` option tells Docker to mount your current working directory to
`/home/jovyan/work` inside the container. By default, the Jupyter image's root
directory is `/home/jovyan` and you can only access or save notebooks to that
directory in the container.
Now you can access [localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token) and open notebooks contained in the bind mounted directory.
To stop the container, in the terminal press `ctrl`+`c`.
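The commands in the previous tabs differ only in how each shell expands the current directory. One way to avoid shell-specific quoting, sketched here on the assumption that Python 3.9 or later is installed on the host, is to build the argument list in Python and hand it to `subprocess.run`. The helper name `bind_mount_run_args` is made up for this illustration.

```python
from pathlib import Path

IMAGE = "quay.io/jupyter/base-notebook"


def bind_mount_run_args(notebook_dir: Path, host_port: int = 8889,
                        token: str = "my-token") -> list[str]:
    """Return the docker run arguments that bind mount notebook_dir.

    Hypothetical helper: it only builds the argument list and does not
    start a container.
    """
    # Bind mounts need an absolute host path, so resolve the directory first.
    source = notebook_dir.resolve()
    return [
        "docker", "run", "--rm",
        "-p", f"{host_port}:8888",
        "-v", f"{source}:/home/jovyan/work",
        IMAGE,
        "start-notebook.py", f"--NotebookApp.token={token}",
    ]


args = bind_mount_run_args(Path.cwd())
print(args)
# To start the container, pass the list to subprocess.run(args, check=True).
```

Because the arguments are passed as a list rather than a single string, no shell is involved and the same script can be used on each operating system.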
Docker also has volumes, which are the preferred mechanism for persisting
data generated by and used by Docker containers. While bind mounts are dependent
on the directory structure and OS of the host machine, volumes are completely
managed by Docker.
## Save and access notebooks
When you remove a container, all data in that container is deleted. To save
notebooks outside of the container, you can use a [volume](/engine/storage/volumes/).
### Run a JupyterLab container with a volume
To start the container with a volume, open a terminal and run the following command.
```console
$ docker run --rm -p 8889:8888 -v jupyter-data:/home/jovyan/work quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
```
The `-v` option tells Docker to create a volume named `jupyter-data` and mount it in the container at `/home/jovyan/work`.
To access the container, in a web browser navigate to
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
Notebooks can now be saved to the volume and will be accessible even after
the container is deleted.
### Save a notebook to the volume
For this example, you'll use the [Iris Dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html) example from scikit-learn.
1. Open a web browser and access your JupyterLab container at [localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
2. In the **Launcher**, under **Notebook**, select **Python 3**.
3. In the notebook, specify the following to install the necessary packages.
```console
!pip install matplotlib scikit-learn
```
4. Select the play button to run the code.
5. In the notebook, specify the following code.
```python
import matplotlib.pyplot as plt
from sklearn import datasets

# Load the Iris dataset bundled with scikit-learn.
iris = datasets.load_iris()

# Plot sepal length against sepal width, colored by species.
_, ax = plt.subplots()
scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
_ = ax.legend(
    scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
)
```
6. Select the play button to run the code. You should see a scatter plot of the
Iris dataset.
7. In the top menu, select **File** and then **Save Notebook**.
8. Specify a name in the `work` directory to save the notebook to the volume.
For example, `work/mynotebook.ipynb`.
9. Select **Rename** to save the notebook.
The notebook is now saved in the volume.
In the terminal, press `ctrl`+`c` to stop the container.
Now, any time you run a Jupyter container with the volume, you'll have access to the saved notebook.
When you run a new container and run the data plot code again, you'll first
need to run `!pip install matplotlib scikit-learn` to download and install the packages again.
You can avoid reinstalling packages every time you run a new container by
creating your own image with the packages already installed.
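To see whether an environment already has the packages, a notebook cell can check for them before you run `pip`. The following is a sketch using only the Python standard library. `missing_packages` is a hypothetical helper, and note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the import names that can't be found in this environment.

    Hypothetical helper for illustration only.
    """
    # find_spec returns None when a top-level module isn't installed.
    return [name for name in import_names if find_spec(name) is None]


missing = missing_packages(["matplotlib", "sklearn"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages are available.")
```

In a fresh `base-notebook` container, this check is expected to report the packages as not installed until you run the `pip install` step.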
## Customize your JupyterLab environment
You can create your own JupyterLab environment and build it into an image using
Docker. By building your own image, you can customize your JupyterLab
environment with the packages and tools you need, and ensure that it's
consistent and reproducible across different deployments. Building your own
image also makes it easier to share your JupyterLab environment with others, or
to use it as a base for further development.
### Define your environment in a Dockerfile
In the previous Iris Dataset example from [Save a notebook to the volume](#save-a-notebook-to-the-volume), you had to install the dependencies, `matplotlib` and `scikit-learn`, every time you ran a new container. While the dependencies in that small example download and
install quickly, reinstalling them may become a problem as your list of dependencies grows.
There may also be other tools, packages, or files that you always want in your
environment.
In this case, you can install the dependencies as part of the environment in the
image. Then, every time you run your container, the dependencies will always be
installed.
You can define your environment in a Dockerfile. A Dockerfile is a text file
that instructs Docker how to create an image of your JupyterLab environment. An
image contains everything you want and need when running JupyterLab, such as
files, packages, and tools.
In a directory of your choice, create a new text file named `Dockerfile`. Open the `Dockerfile` in an IDE or text editor and then add the following contents.
```dockerfile
# syntax=docker/dockerfile:1
FROM quay.io/jupyter/base-notebook
RUN pip install --no-cache-dir matplotlib scikit-learn
```
This Dockerfile uses the `quay.io/jupyter/base-notebook` image as the base, and then runs `pip` to install the dependencies. For more details about the instructions in the Dockerfile, see the [Dockerfile reference](/reference/dockerfile/).
Before you proceed, save your changes to the `Dockerfile`.
### Build your environment into an image
After you have a `Dockerfile` to define your environment, you can use `docker
build` to build an image using your `Dockerfile`.
Open a terminal, change directory to the directory where your `Dockerfile` is
located, and then run the following command.
```console
$ docker build -t my-jupyter-image .
```
The command builds a Docker image from your `Dockerfile` and a context. The
`-t` option specifies the name and tag of the image, in this case
`my-jupyter-image`. The `.` indicates that the current directory is the context,
which means that the files in that directory can be used in the image creation
process.
You can verify that the image was built by viewing the `Images` view in Docker Desktop, or by running the `docker image ls` command in a terminal. You should see an image named `my-jupyter-image`.
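When no tag is given, as in `my-jupyter-image`, Docker uses the tag `latest`. The following simplified sketch shows how an image reference splits into a name and a tag. `split_image_reference` is a hypothetical helper, and it deliberately ignores digest references such as `name@sha256:...`.

```python
def split_image_reference(reference: str) -> tuple[str, str]:
    """Split an image reference into (name, tag).

    Simplified, hypothetical helper: digest references are not handled.
    """
    last_slash = reference.rfind("/")
    last_colon = reference.rfind(":")
    # A colon before the last slash belongs to a registry port,
    # for example localhost:5000/image, so it doesn't mark a tag.
    if last_colon > last_slash:
        return reference[:last_colon], reference[last_colon + 1:]
    return reference, "latest"


for ref in ("my-jupyter-image", "my-jupyter-image:v1", "quay.io/jupyter/base-notebook"):
    print(ref, "->", split_image_reference(ref))
```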
## Run your image as a container
To run your image as a container, you use the `docker run` command. In the
`docker run` command, you'll specify your own image name.
```console
$ docker run --rm -p 8889:8888 my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
```
To access the container, in a web browser navigate to
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
You can now use the packages without having to install them in your notebook.
1. In the **Launcher**, under **Notebook**, select **Python 3**.
2. In the notebook, specify the following code.
```python
import matplotlib.pyplot as plt
from sklearn import datasets

# Load the Iris dataset bundled with scikit-learn.
iris = datasets.load_iris()

# Plot sepal length against sepal width, colored by species.
_, ax = plt.subplots()
scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
_ = ax.legend(
    scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
)
```
3. Select the play button to run the code. You should see a scatter plot of the Iris dataset.
In the terminal, press `ctrl`+`c` to stop the container.
## Use Compose to run your container
Docker Compose is a tool for defining and running multi-container applications.
In this case, the application isn't a multi-container application, but Docker
Compose can make it easier to run by defining all the `docker run` options in a
file.
### Create a Compose file
To use Compose, you need a `compose.yaml` file. In the same directory as your
`Dockerfile`, create a new file named `compose.yaml`.
Open the `compose.yaml` file in an IDE or text editor and add the following
contents.
```yaml
services:
  jupyter:
    build:
      context: .
    ports:
      - 8889:8888
    volumes:
      - jupyter-data:/home/jovyan/work
    command: start-notebook.py --NotebookApp.token='my-token'

volumes:
  jupyter-data:
    name: jupyter-data
```
This Compose file specifies all the options you used in the `docker run` command. For more details about the Compose instructions, see the
[Compose file reference](/reference/compose-file/_index.md).
Before you proceed, save your changes to the `compose.yaml` file.
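To make the correspondence with `docker run` concrete, the following sketch rebuilds the earlier command from the service settings. It assumes the settings are already available as a Python dictionary (parsing the YAML file is left out), that ports and volumes use the short string syntax, and that the image is named `my-jupyter-image`. The helper `service_to_run_args` is hypothetical and not part of Docker Compose.

```python
import shlex

# The jupyter service from compose.yaml, written out as a dictionary.
service = {
    "ports": ["8889:8888"],
    "volumes": ["jupyter-data:/home/jovyan/work"],
    "command": "start-notebook.py --NotebookApp.token='my-token'",
}


def service_to_run_args(service: dict, image: str) -> list[str]:
    """Translate a few Compose service keys into docker run arguments.

    Hypothetical helper covering only ports, volumes, and command.
    """
    # --rm comes from this guide's docker run commands, not from the Compose file.
    args = ["docker", "run", "--rm"]
    for port in service.get("ports", []):
        args += ["-p", port]
    for volume in service.get("volumes", []):
        args += ["-v", volume]
    args.append(image)
    # Split the command string the way a POSIX shell would.
    args += shlex.split(service.get("command", ""))
    return args


print(shlex.join(service_to_run_args(service, "my-jupyter-image")))
```

The printed command matches the `docker run` command used earlier with the volume, which is why a single `docker compose up` can replace it.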
### Run your container using Compose
Open a terminal, change directory to where your `compose.yaml` file is located, and then run the following command.
```console
$ docker compose up --build
```
This command builds your image and runs it as a container using the instructions
specified in the `compose.yaml` file. The `--build` option ensures that your
image is rebuilt, which is necessary if you made changes to your `Dockerfile`.
To access the container, in a web browser navigate to
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
In the terminal, press `ctrl`+`c` to stop the container.
## Share your work
By sharing your image and notebook, you create a portable and replicable
research environment that can be easily accessed and used by other data
scientists. This process not only facilitates collaboration but also ensures
that your work is preserved in an environment where it can be run without
compatibility issues.
To share your image and data, you'll use [Docker Hub](https://hub.docker.com/). Docker Hub is a cloud-based registry service that lets you share and distribute container images.
### Share your image
1. [Sign up](https://www.docker.com/pricing?utm_source=docker&utm_medium=webreferral&utm_campaign=docs_driven_upgrade) or sign in to [Docker Hub](https://hub.docker.com).
2. Rename your image so that Docker knows which repository to push it to. Open a
terminal and run the following `docker tag` command. Replace `YOUR-USER-NAME`
with your Docker ID.
```console
$ docker tag my-jupyter-image YOUR-USER-NAME/my-jupyter-image
```
3. Run the following `docker push` command to push the image to Docker Hub.
Replace `YOUR-USER-NAME` with your Docker ID.
```console
$ docker push YOUR-USER-NAME/my-jupyter-image
```
4. Verify that you pushed the image to Docker Hub.
   1. Go to [Docker Hub](https://hub.docker.com).
   2. Select **Repositories**.
   3. View the **Last pushed** time for your repository.
Other users can now download and run your image using the `docker run` command. They need to replace `YOUR-USER-NAME` with your Docker ID.
```console
$ docker run --rm -p 8889:8888 YOUR-USER-NAME/my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
```
### Share your volume
This example uses the Docker Desktop graphical user interface. Alternatively, in the command line interface you can [back up the volume](/engine/storage/volumes/#back-up-a-volume) and then [push it using the ORAS CLI](/manuals/docker-hub/repos/manage/hub-images/oci-artifacts.md#push-a-volume).
1. Sign in to Docker Desktop.
2. In the Docker Dashboard, select **Volumes**.
3. Select the **jupyter-data** volume by selecting the name.
4. Select the **Exports** tab.
5. Select **Quick export**.
6. For **Location**, select **Registry**.
7. In the text box under **Registry**, specify your Docker ID, a name for the
volume, and a tag. For example, `YOUR-USER-NAME/jupyter-data:latest`.
8. Select **Save**.
9. Verify that you exported the volume to Docker Hub.
   1. Go to [Docker Hub](https://hub.docker.com).
   2. Select **Repositories**.
   3. View the **Last pushed** time for your repository.
Other users can now download and import your volume. To import the volume and then run it with your image:
1. Sign in to Docker Desktop.
2. In the Docker Dashboard, select **Volumes**.
3. Select **Create** to create a new volume.
4. Specify a name for the new volume. For this example, use `jupyter-data-2`.
5. Select **Create**.
6. In the list of volumes, select the **jupyter-data-2** volume by selecting the
name.
7. Select **Import**.
8. For **Location**, select **Registry**.
9. In the text box under **Registry**, specify the same name as the repository
that you exported your volume to. For example,
`YOUR-USER-NAME/jupyter-data:latest`.
10. Select **Import**.
11. In a terminal, run `docker run` to run your image with the imported volume.
Replace `YOUR-USER-NAME` with your Docker ID.
```console
$ docker run --rm -p 8889:8888 -v jupyter-data-2:/home/jovyan/work YOUR-USER-NAME/my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
```
## Summary
In this guide, you learned how to leverage Docker and JupyterLab to create
reproducible data science environments, facilitating the development and sharing
of data science projects. This included running a personal JupyterLab server,
customizing the environment with necessary tools and packages, and sharing
notebooks and environments with other data scientists.
Related information:
- [Dockerfile reference](/reference/dockerfile/)
- [Compose file reference](/reference/compose-file/)
- [Docker CLI reference](/reference/cli/docker/)
- [Jupyter Docker Stacks docs](https://jupyter-docker-stacks.readthedocs.io/en/latest/)