mirror of https://github.com/docker/docs.git
437 lines
17 KiB
Markdown
437 lines
17 KiB
Markdown
---
|
|
description: Run, develop, and share data science projects using JupyterLab and Docker
|
|
keywords: getting started, jupyter, notebook, python, jupyterlab, data science
|
|
title: Data science with JupyterLab
|
|
toc_max: 2
|
|
summary: |
|
|
Use Docker to run Jupyter notebooks.
|
|
tags: [data-science]
|
|
languages: [python]
|
|
aliases:
|
|
- /guides/use-case/jupyter/
|
|
params:
|
|
time: 20 minutes
|
|
---
|
|
|
|
Docker and JupyterLab are two powerful tools that can enhance your data science
|
|
workflow. In this guide, you will learn how to use them together to create and
|
|
run reproducible data science environments. This guide is based on
|
|
[Supercharging AI/ML Development with JupyterLab and
|
|
Docker](https://www.docker.com/blog/supercharging-ai-ml-development-with-jupyterlab-and-docker/).
|
|
|
|
In this guide, you'll learn how to:
|
|
|
|
- Run a personal Jupyter Server with JupyterLab on your local machine
|
|
- Customize your JupyterLab environment
|
|
- Share your JupyterLab notebook and environment with other data scientists
|
|
|
|
## What is JupyterLab?
|
|
|
|
[JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) is an open source application built around the concept of a computational notebook document. It enables sharing and executing code, data processing, visualization, and offers a range of interactive features for creating graphs.
|
|
|
|
## Why use Docker and JupyterLab together?
|
|
|
|
By combining Docker and JupyterLab, you can benefit from the advantages of both tools, such as:
|
|
|
|
- Containerization ensures a consistent JupyterLab environment across all
|
|
deployments, eliminating compatibility issues.
|
|
- Containerized JupyterLab simplifies sharing and collaboration by removing the
|
|
need for manual environment setup.
|
|
- Containers offer scalability for JupyterLab, supporting workload distribution
|
|
and efficient resource management with platforms like Kubernetes.
|
|
|
|
## Prerequisites
|
|
|
|
To follow along with this guide, you must install the latest version of [Docker Desktop](/get-started/get-docker.md).
|
|
|
|
## Run and access a JupyterLab container
|
|
|
|
In a terminal, run the following command to run your JupyterLab container.
|
|
|
|
```console
|
|
$ docker run --rm -p 8889:8888 quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
|
|
```
|
|
|
|
The following are the notable parts of the command:
|
|
|
|
- `-p 8889:8888`: Maps port 8889 from the host to port 8888 on the container.
|
|
- `start-notebook.py --NotebookApp.token='my-token'`: Sets an access token
|
|
rather than using a random token.
|
|
|
|
For more details, see the [Jupyter Server Options](https://jupyter-docker-stacks.readthedocs.io/en/latest/using/common.html#jupyter-server-options) and the [docker run CLI reference](/reference/cli/docker/container/run/).
|
|
|
|
If this is the first time you are running the image, Docker will download and
|
|
run it. The amount of time it takes to download the image will vary depending on
|
|
your network connection.
|
|
|
|
After the image downloads and runs, you can access the container. To access the
|
|
container, in a web browser navigate to
|
|
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
|
|
|
|
To stop the container, in the terminal press `ctrl`+`c`.
|
|
|
|
To access an existing notebook on your system, you can use a
|
|
[bind mount](/storage/bind-mounts/). Open a terminal and
|
|
change directory to where your existing notebook is. Then,
|
|
run the following command based on your operating system.
|
|
|
|
{{< tabs >}}
|
|
{{< tab name="Mac / Linux" >}}
|
|
|
|
```console
|
|
$ docker run --rm -p 8889:8888 -v "$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
|
|
```
|
|
|
|
{{< /tab >}}
|
|
{{< tab name="Windows (Command Prompt)" >}}
|
|
|
|
```console
|
|
$ docker run --rm -p 8889:8888 -v "%cd%":/home/jovyan/work quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
|
|
```
|
|
|
|
{{< /tab >}}
|
|
{{< tab name="Windows (PowerShell)" >}}
|
|
|
|
```console
|
|
$ docker run --rm -p 8889:8888 -v "$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
|
|
```
|
|
|
|
{{< /tab >}}
|
|
{{< tab name="Windows (Git Bash)" >}}
|
|
|
|
```console
|
|
$ docker run --rm -p 8889:8888 -v "/$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
|
|
```
|
|
|
|
{{< /tab >}}
|
|
{{< /tabs >}}
|
|
|
|
The `-v` option tells Docker to mount your current working directory to
|
|
`/home/jovyan/work` inside the container. By default, the Jupyter image's root
|
|
directory is `/home/jovyan` and you can only access or save notebooks to that
|
|
directory in the container.
|
|
|
|
Now you can access [localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token) and open notebooks contained in the bind mounted directory.
|
|
|
|
To stop the container, in the terminal press `ctrl`+`c`.
|
|
|
|
Docker also has volumes, which are the preferred mechanism for persisting
|
|
data generated by and used by Docker containers. While bind mounts are dependent
|
|
on the directory structure and OS of the host machine, volumes are completely
|
|
managed by Docker.
|
|
|
|
## Save and access notebooks
|
|
|
|
When you remove a container, all data in that container is deleted. To save
|
|
notebooks outside of the container, you can use a [volume](/engine/storage/volumes/).
|
|
|
|
### Run a JupyterLab container with a volume
|
|
|
|
To start the container with a volume, open a terminal and run the following command
|
|
|
|
```console
|
|
$ docker run --rm -p 8889:8888 -v jupyter-data:/home/jovyan/work quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
|
|
```
|
|
|
|
The `-v` option tells Docker to create a volume named `jupyter-data` and mount it in the container at `/home/jovyan/work`.
|
|
|
|
To access the container, in a web browser navigate to
|
|
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
|
|
Notebooks can now be saved to the volume and will accessible even when
|
|
the container is deleted.
|
|
|
|
### Save a notebook to the volume
|
|
|
|
For this example, you'll use the [Iris Dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html) example from scikit-learn.
|
|
|
|
1. Open a web browser and access your JupyterLab container at [localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
|
|
|
|
2. In the **Launcher**, under **Notebook**, select **Python 3**.
|
|
|
|
3. In the notebook, specify the following to install the necessary packages.
|
|
|
|
```console
|
|
!pip install matplotlib scikit-learn
|
|
```
|
|
|
|
4. Select the play button to run the code.
|
|
|
|
5. In the notebook, specify the following code.
|
|
|
|
```python
|
|
from sklearn import datasets
|
|
|
|
iris = datasets.load_iris()
|
|
import matplotlib.pyplot as plt
|
|
|
|
_, ax = plt.subplots()
|
|
scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
|
|
ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
|
|
_ = ax.legend(
|
|
scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
|
|
)
|
|
```
|
|
|
|
6. Select the play button to run the code. You should see a scatter plot of the
|
|
Iris dataset.
|
|
|
|
7. In the top menu, select **File** and then **Save Notebook**.
|
|
|
|
8. Specify a name in the `work` directory to save the notebook to the volume.
|
|
For example, `work/mynotebook.ipynb`.
|
|
|
|
9. Select **Rename** to save the notebook.
|
|
|
|
The notebook is now saved in the volume.
|
|
|
|
In the terminal, press `ctrl`+ `c` to stop the container.
|
|
|
|
Now, any time you run a Jupyter container with the volume, you'll have access to the saved notebook.
|
|
|
|
When you do run a new container, and then run the data plot code again, it'll
|
|
need to run `!pip install matplotlib scikit-learn` and download the packages.
|
|
You can avoid reinstalling packages every time you run a new container by
|
|
creating your own image with the packages already installed.
|
|
|
|
## Customize your JupyterLab environment
|
|
|
|
You can create your own JupyterLab environment and build it into an image using
|
|
Docker. By building your own image, you can customize your JupyterLab
|
|
environment with the packages and tools you need, and ensure that it's
|
|
consistent and reproducible across different deployments. Building your own
|
|
image also makes it easier to share your JupyterLab environment with others, or
|
|
to use it as a base for further development.
|
|
|
|
### Define your environment in a Dockerfile
|
|
|
|
In the previous Iris Dataset example from [Save a notebook to the volume](#save-a-notebook-to-the-volume), you had to install the dependencies, `matplotlib` and `scikit-learn`, every time you ran a new container. While the dependencies in that small example download and
|
|
install quickly, it may become a problem as your list of dependencies grow.
|
|
There may also be other tools, packages, or files that you always want in your
|
|
environment.
|
|
|
|
In this case, you can install the dependencies as part of the environment in the
|
|
image. Then, every time you run your container, the dependencies will always be
|
|
installed.
|
|
|
|
You can define your environment in a Dockerfile. A Dockerfile is a text file
|
|
that instructs Docker how to create an image of your JupyterLab environment. An
|
|
image contains everything you want and need when running JupyterLab, such as
|
|
files, packages, and tools.
|
|
|
|
In a directory of your choice, create a new text file named `Dockerfile`. Open the `Dockerfile` in an IDE or text editor and then add the following contents.
|
|
|
|
```dockerfile
|
|
# syntax=docker/dockerfile:1
|
|
|
|
FROM quay.io/jupyter/base-notebook
|
|
RUN pip install --no-cache-dir matplotlib scikit-learn
|
|
```
|
|
|
|
This Dockerfile uses the `quay.io/jupyter/base-notebook` image as the base, and then runs `pip` to install the dependencies. For more details about the instructions in the Dockerfile, see the [Dockerfile reference](/reference/dockerfile/).
|
|
|
|
Before you proceed, save your changes to the `Dockerfile`.
|
|
|
|
### Build your environment into an image
|
|
|
|
After you have a `Dockerfile` to define your environment, you can use `docker
|
|
build` to build an image using your `Dockerfile`.
|
|
|
|
Open a terminal, change directory to the directory where your `Dockerfile` is
|
|
located, and then run the following command.
|
|
|
|
```console
|
|
$ docker build -t my-jupyter-image .
|
|
```
|
|
|
|
The command builds a Docker image from your `Dockerfile` and a context. The
|
|
`-t` option specifies the name and tag of the image, in this case
|
|
`my-jupyter-image`. The `.` indicates that the current directory is the context,
|
|
which means that the files in that directory can be used in the image creation
|
|
process.
|
|
|
|
You can verify that the image was built by viewing the `Images` view in Docker Desktop, or by running the `docker image ls` command in a terminal. You should see an image named `my-jupyter-image`.
|
|
|
|
## Run your image as a container
|
|
|
|
To run your image as a container, you use the `docker run` command. In the
|
|
`docker run` command, you'll specify your own image name.
|
|
|
|
```console
|
|
$ docker run --rm -p 8889:8888 my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
|
|
```
|
|
|
|
To access the container, in a web browser navigate to
|
|
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
|
|
|
|
You can now use the packages without having to install them in your notebook.
|
|
|
|
1. In the **Launcher**, under **Notebook**, select **Python 3**.
|
|
|
|
2. In the notebook, specify the following code.
|
|
|
|
```python
|
|
from sklearn import datasets
|
|
|
|
iris = datasets.load_iris()
|
|
import matplotlib.pyplot as plt
|
|
|
|
_, ax = plt.subplots()
|
|
scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
|
|
ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
|
|
_ = ax.legend(
|
|
scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
|
|
)
|
|
```
|
|
|
|
3. Select the play button to run the code. You should see a scatter plot of the Iris dataset.
|
|
|
|
In the terminal, press `ctrl`+ `c` to stop the container.
|
|
|
|
## Use Compose to run your container
|
|
|
|
Docker Compose is a tool for defining and running multi-container applications.
|
|
In this case, the application isn't a multi-container application, but Docker
|
|
Compose can make it easier to run by defining all the `docker run` options in a
|
|
file.
|
|
|
|
### Create a Compose file
|
|
|
|
To use Compose, you need a `compose.yaml` file. In the same directory as your
|
|
`Dockerfile`, create a new file named `compose.yaml`.
|
|
|
|
Open the `compose.yaml` file in an IDE or text editor and add the following
|
|
contents.
|
|
|
|
```yaml
|
|
services:
|
|
jupyter:
|
|
build:
|
|
context: .
|
|
ports:
|
|
- 8889:8888
|
|
volumes:
|
|
- jupyter-data:/home/jovyan/work
|
|
command: start-notebook.py --NotebookApp.token='my-token'
|
|
|
|
volumes:
|
|
jupyter-data:
|
|
name: jupyter-data
|
|
```
|
|
|
|
This Compose file specifies all the options you used in the `docker run` command. For more details about the Compose instructions, see the
|
|
[Compose file reference](/reference/compose-file/_index.md).
|
|
|
|
Before you proceed, save your changes to the `compose.yaml` file.
|
|
|
|
### Run your container using Compose
|
|
|
|
Open a terminal, change directory to where your `compose.yaml` file is located, and then run the following command.
|
|
|
|
```console
|
|
$ docker compose up --build
|
|
```
|
|
|
|
This command builds your image and runs it as a container using the instructions
|
|
specified in the `compose.yaml` file. The `--build` option ensures that your
|
|
image is rebuilt, which is necessary if you made changes to your `Dockerfile`.
|
|
|
|
To access the container, in a web browser navigate to
|
|
[localhost:8889/lab?token=my-token](http://localhost:8889/lab?token=my-token).
|
|
|
|
In the terminal, press `ctrl`+ `c` to stop the container.
|
|
|
|
## Share your work
|
|
|
|
By sharing your image and notebook, you create a portable and replicable
|
|
research environment that can be easily accessed and used by other data
|
|
scientists. This process not only facilitates collaboration but also ensures
|
|
that your work is preserved in an environment where it can be run without
|
|
compatibility issues.
|
|
|
|
To share your image and data, you'll use [Docker Hub](https://hub.docker.com/). Docker Hub is a cloud-based registry service that lets you share and distribute container images.
|
|
|
|
### Share your image
|
|
|
|
1. [Sign up](https://www.docker.com/pricing?utm_source=docker&utm_medium=webreferral&utm_campaign=docs_driven_upgrade) or sign in to [Docker Hub](https://hub.docker.com).
|
|
|
|
2. Rename your image so that Docker knows which repository to push it to. Open a
|
|
terminal and run the following `docker tag` command. Replace `YOUR-USER-NAME`
|
|
with your Docker ID.
|
|
|
|
```console
|
|
$ docker tag my-jupyter-image YOUR-USER-NAME/my-jupyter-image
|
|
```
|
|
|
|
3. Run the following `docker push` command to push the image to Docker Hub.
|
|
Replace `YOUR-USER-NAME` with your Docker ID.
|
|
|
|
```console
|
|
$ docker push YOUR-USER-NAME/my-jupyter-image
|
|
```
|
|
|
|
4. Verify that you pushed the image to Docker Hub.
|
|
1. Go to [Docker Hub](https://hub.docker.com).
|
|
2. Select **Repositories**.
|
|
3. View the **Last pushed** time for your repository.
|
|
|
|
Other users can now download and run your image using the `docker run` command. They need to replace `YOUR-USER-NAME` with your Docker ID.
|
|
|
|
```console
|
|
$ docker run --rm -p 8889:8888 YOUR-USER-NAME/my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
|
|
```
|
|
|
|
### Share your volume
|
|
|
|
This example uses the Docker Desktop graphical user interface. Alternatively, in the command line interface you can [back up the volume](/engine/storage/volumes/#back-up-a-volume) and then [push it using the ORAS CLI](/manuals/docker-hub/repos/manage/hub-images/oci-artifacts.md#push-a-volume).
|
|
|
|
1. Sign in to Docker Desktop.
|
|
2. In the Docker Dashboard, select **Volumes**.
|
|
3. Select the **jupyter-data** volume by selecting the name.
|
|
4. Select the **Exports** tab.
|
|
5. Select **Quick export**.
|
|
6. For **Location**, select **Registry**.
|
|
7. In the text box under **Registry**, specify your Docker ID, a name for the
|
|
volume, and a tag. For example, `YOUR-USERNAME/jupyter-data:latest`.
|
|
8. Select **Save**.
|
|
9. Verify that you exported the volume to Docker Hub.
|
|
1. Go to [Docker Hub](https://hub.docker.com).
|
|
2. Select **Repositories**.
|
|
3. View the **Last pushed** time for your repository.
|
|
|
|
Other users can now download and import your volume. To import the volume and then run it with your image:
|
|
|
|
1. Sign in to Docker Desktop.
|
|
2. In the Docker Dashboard, select **Volumes**.
|
|
3. Select **Create** to create a new volume.
|
|
4. Specify a name for the new volume. For this example, use `jupyter-data-2`.
|
|
5. Select **Create**.
|
|
6. In the list of volumes, select the **jupyter-data-2** volume by selecting the
|
|
name.
|
|
7. Select **Import**.
|
|
8. For **Location**, select **Registry**.
|
|
9. In the text box under **Registry**, specify the same name as the repository
|
|
that you exported your volume to. For example,
|
|
`YOUR-USERNAME/jupyter-data:latest`.
|
|
10. Select **Import**.
|
|
11. In a terminal, run `docker run` to run your image with the imported volume.
|
|
Replace `YOUR-USER-NAME` with your Docker ID.
|
|
|
|
```console
|
|
$ docker run --rm -p 8889:8888 -v jupyter-data-2:/home/jovyan/work YOUR-USER-NAME/my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
|
|
```
|
|
|
|
## Summary
|
|
|
|
In this guide, you learned how to leverage Docker and JupyterLab to create
|
|
reproducible data science environments, facilitating the development and sharing
|
|
of data science projects. This included, running a personal JupyterLab server,
|
|
customizing the environment with necessary tools and packages, and sharing
|
|
notebooks and environments with other data scientists.
|
|
|
|
Related information:
|
|
|
|
- [Dockerfile reference](/reference/dockerfile/)
|
|
- [Compose file reference](/reference/compose-file/)
|
|
- [Docker CLI reference](reference/cli/docker/)
|
|
- [Jupyter Docker Stacks docs](https://jupyter-docker-stacks.readthedocs.io/en/latest/)
|