containerd/client
Adrian Reber 9e6beafd53
Support container restore through CRI/Kubernetes
This implements container restore as described in:

https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/#restore-checkpointed-container-standalone

For detailed step-by-step instructions, also see contrib/checkpoint/checkpoint-restore-cri-test.sh

The code changes are based on changes I made in Podman around 2018
and in CRI-O around 2020.

The history behind restoring containers via CRI/Kubernetes probably
requires some explanation. The initial proposal to bring
checkpoint/restore to Kubernetes looked at checkpointing and restoring
pods, together with the corresponding CRI changes:

https://github.com/kubernetes-sigs/cri-tools/pull/662
https://github.com/kubernetes/kubernetes/pull/97194

After discussing this topic for about two years, another approach was
implemented, as described in KEP-2008:

https://github.com/kubernetes/enhancements/issues/2008

"Forensic Container Checkpointing" allowed us to separate checkpointing
from restoring. For "Forensic Container Checkpointing" it is enough to
create a checkpoint of the container. Restoring is not necessary,
because the checkpoint archive can be analyzed without restoring the
container.

While thinking about a way to restore a container, we started, more or
less by coincidence, to look into restoring containers in Kubernetes
via Create and Start. CRI-O figures out during Create whether the
container image is a checkpoint image and, if so, takes a different
code path. This change implements the same approach in containerd.
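
A minimal sketch of that branching idea, not the code in this change.
The annotation key is an assumption borrowed from the CRI-O workflow in
the blog post linked above, and imageInfo and pickCreatePath are
hypothetical names used only for illustration:

```go
package main

import "fmt"

// checkpointAnnotation is an ASSUMED annotation key that marks an OCI
// image as a checkpoint image. The key actually checked by a runtime
// may differ.
const checkpointAnnotation = "io.kubernetes.cri-o.annotations.checkpoint.name"

// imageInfo is a hypothetical stand-in for the image metadata that is
// available during Create.
type imageInfo struct {
	Annotations map[string]string
}

// isCheckpointImage reports whether the image carries a non-empty
// checkpoint annotation.
func isCheckpointImage(img imageInfo) bool {
	return img.Annotations[checkpointAnnotation] != ""
}

// pickCreatePath returns which code path Create would take:
// "restore" for checkpoint images, "create" for everything else.
func pickCreatePath(img imageInfo) string {
	if isCheckpointImage(img) {
		return "restore"
	}
	return "create"
}

func main() {
	regular := imageInfo{}
	checkpoint := imageInfo{
		Annotations: map[string]string{checkpointAnnotation: "counter"},
	}
	fmt.Println(pickCreatePath(regular))
	fmt.Println(pickCreatePath(checkpoint))
}
```

Keeping the decision in one small predicate means the regular Create
path stays untouched for images without the annotation.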

With this change it is possible to restore the container from a
checkpoint tar archive that is created during checkpointing via CRI.

To restore a container via Kubernetes we convert the tar archive to an
OCI image as described in the kubernetes.io blog post linked above.
Using this OCI image it is possible to restore a container in
Kubernetes.

At this point I think it should be doable to restore containers in
CRI-O and containerd regardless of whether they were created by
containerd or CRI-O. The biggest difference is the container metadata,
which can be adapted during restore.

Open items:

 * It is not clear to me why restoring a container in containerd goes
   through task/Create(). But as the restore code already exists, this
   change extends the existing restore code path in task/Create() to
   also restore a container through the CRI via Create and Start.
 * Automatic image pulling. containerd does not pull images
   automatically if a container is created via the CRI. There is an
   option in crictl to pull images before starting, but that uses the
   CRI image pull interface. It is still a separate pull and create
   operation. Restoring containers from an OCI image is a bit
   different. The checkpoint OCI image does not include the base
   image, but just a reference to the image (NAME@DIGEST).
   Using crictl with pulling will pull the checkpoint image, but not
   the base image the checkpoint is based on. So during preparation
   of the checkpoint containerd will automatically pull the base
   image, but I was not able to figure out how to pull an image in a
   blocking way in containerd. So there is a for loop waiting for the
   container image to appear in the internal store. I think this can
   probably be implemented better.

Anyway, this is a first step towards container restore in Kubernetes
when using containerd.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-11 12:55:13 +01:00
client.go client: add WithExtraDialOpts option 2025-01-21 16:18:59 +01:00
client_opts.go client: add WithExtraDialOpts option 2025-01-21 16:18:59 +01:00
container.go Support container restore through CRI/Kubernetes 2025-03-11 12:55:13 +01:00
container_checkpoint_opts.go use typeurl funcs for marshalling anypb.Any 2024-07-10 22:26:27 +05:30
container_opts.go use typeurl funcs for marshalling anypb.Any 2024-07-10 22:26:27 +05:30
container_opts_unix.go update ctr run to support multiple uid/gid mappings 2024-09-10 17:06:27 +00:00
container_restore_opts.go Move protobuf package under pkg 2024-05-02 10:52:03 -07:00
containerstore.go Remove loop variable copies 2024-12-23 23:14:49 -07:00
diff.go Make api a Go sub-module 2024-05-02 11:03:00 -07:00
events.go Update errdefs to 0.3.0 2024-10-18 16:04:54 -07:00
export.go Move images to core/images 2024-01-17 09:51:26 -08:00
grpc.go Move namespaces to pkg/namespaces 2024-01-17 09:55:39 -08:00
image.go Switch to new errdefs package 2024-01-25 22:18:45 -08:00
image_store.go Remove loop variable copies 2024-12-23 23:14:49 -07:00
import.go Switch to new errdefs package 2024-01-25 22:18:45 -08:00
install.go Cleanup introspection interface 2024-03-01 23:07:42 -08:00
install_opts.go Move client to subpackage 2023-11-01 10:37:00 -07:00
lease.go Move leases to core/leases 2024-01-17 09:51:45 -08:00
namespaces.go Update errdefs to 0.3.0 2024-10-18 16:04:54 -07:00
process.go Update errdefs to 0.3.0 2024-10-18 16:04:54 -07:00
pull.go Disable the support for Schema 1 images 2024-02-15 11:11:35 +09:00
sandbox.go sandbox: add update api for controller 2024-06-14 02:31:51 +00:00
services.go cri: remove sandbox controller from client 2024-10-16 17:37:07 +08:00
signals.go Remove ParseSignal from client 2024-02-10 18:02:05 -08:00
snapshotter_opts_unix.go Update snapshotter opts to support multiple uid/gid mapping entries 2024-12-11 18:04:11 +00:00
snapshotter_opts_windows.go Move snapshots to core/snapshots 2024-01-17 09:54:09 -08:00
task.go Update errdefs to 0.3.0 2024-10-18 16:04:54 -07:00
task_opts.go client: fix tasks with PID 0 cannot be forced to delete 2024-07-08 17:24:58 +08:00
task_opts_unix.go Make api a Go sub-module 2024-05-02 11:03:00 -07:00
transfer.go Update transfer proxy to support ttrpc 2024-05-02 23:16:51 -07:00