Fix https://github.com/kubeflow/kubeflow/issues/6366
Migrating to Kubebuilder v3 leads to the following changes:
- Add .dockerignore file.
- Upgrade Go version from v1.15 to v1.17.
- Adapt Makefile.
- Add image (build + push) target to makefile.
- Upgrade EnvTest to use K8s v1.22.
- Update PROJECT template.
- Migrate CRD apiVersion from v1beta to v1.
- Add livenessProbe and readinessProbe to controller manager.
- Upgrade controller-runtime from v0.2.0 to v0.11.0.
Other changes:
- Build image using public.ecr.aws registry instead of gcr.io.
- Update README.md documentation.
- Update 3rd party licences.
- Fix notebook.spec description.
- Add 3 sample notebooks (v1, v1alpha1 and v1beta1).
Signed-off-by: Samuel Veloso <svelosol@redhat.com>
The controller should not trigger the reconcile loop when an Event is
deleted. Previously the controller would run the reconciliation loop on
any event deletion.
This commit updates it to not run the loop for ANY event.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* notebooks: Update notebook if timestamp changed
We don't want to be updating the spec of the notebook if the timestamp
hasn't changed, since this will lead to constant updates and
reconciliation loops.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* notebooks: Use a deep-copy of the notebook spec
The controller should use a deep-copy of the notebook spec when
calculating the spec for the StatefulSet. If not then we could
update the notebook object without wanting it, since the spec could have
been changed when calculating the STS spec.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* notebooks: Add prefix env var only if missing
The controller should be setting OR updating the NB_PREFIX env var.
Previously it would always blindly append it to the spec, which could
result in double entries for the same env var.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* notebooks: Handle events gracefully
The controller is not exiting the reconciliation loop after it has
re-emitted a Pod/STS Event as a Notebook Event. This results in the
controller to later on try and GET a Notebook with the name of the Event
that triggered the reconciliation loop.
The controller should exit the reconciliation function once it has
emitted the event.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* notebooks: Don't reconcile on deleted events
We don't want to trigger the reconciliation function when an event gets
deleted.
If a Notebook would be deleted then the underlying events would
be deleted as well, which results in the reconcile function to get
triggered and try to GET Events and Notebooks with the name of the
deleted event.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* notebooks: Update image's tag in make
Modify Makefile to update properly the TAG
based on the git TAG.
Signed-off-by: Athanasios Markou <athamark@arrikto.com>
Reviewed-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* notebooks: Expose last-activity
Extend the notebook-controller to:
* cull idle Notebook Servers based on their new `last-activity`
annotation
* expose the last activity of each Notebook Server as an annotation
on the metadata of the corresponding CR object
Modify notebook_controller.go to:
* update the Last Activity of each Notebook Server that has a
Running pod
* delete the Last Activity Annotation for every Notebook Server
that does not have a Running pod
Extend culler.go to:
* perform culling based on the new `last-activity` annotation and
not based on the `/api/status` endpoint.
* update the last activity of a Notebook Server, based on the
kernels' execution states.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
Reviewed-by: Athanasios Markou <athamark@arrikto.com>
* notebooks: Introduce a DEV env var
We introduce a DEV ENV var to allow admins
develop and test on their local machine their
custom Notebook Controller.
We provide information and instructions inside
the components/notebook-controller/README.md.
Signed-off-by: Athanasios Markou <athamark@arrikto.com>
Reviewed-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* notebooks: Add unit tests for last-activity
* Introduce new tests for allKernelsAreIdle()
* Extend the tests for NotebookIsIdle() and for
NotebookNeedsCulling().
Signed-off-by: Athanasios Markou <athamark@arrikto.com>
Reviewed-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* review: UpdateNotebookLastActivityAnnotation()
Ensure that UpdateNotebookLastActivityAnnotation() does not return
"true". This function should not return any value.
Signed-off-by: Athanasios Markou <athamark@arrikto.com>
In the existing version, the 'timeout: 300s' added to the notebook's virtual service would cause websockets to disconnect at the 5 minute mark, causing the Jupyter Notebook web terminal function to hang. This is described in https://github.com/kubeflow/kubeflow/issues/6124.
* Correct ContainerStatus of Notebook CR
The Notebook Controller doesn't set the State of the CR correctly. In some cases
the first container is the istio-sidecar which results in an incorrect state being
shown to the Notebook CR. This is fix now by showing the Notebook container
ContainerState to the Notebook CR ContainerState
* Changed log statement and added a comment
Implemented remarks of @yanniszark and @kimwnasptd
* Small reorganization of some if statements
* Implemented functional tests using ginkgo
The notebook controller can be tested using sigs.k8s.io/controller-runtime/pkg/envtest which comes as part of kubebuilder. With this we should be able to measurable test coverage.
* Fixed the incorrect test condition and included fix to download the envtest binaries.
Fixed the incorrect test condition and included fix to download the envtest binaries.
* Some tweaks based on review.
* Removed the check-license as it was blocking the test.
Included some of the tweaked yaml's files that were being generated.
* Allowing for an env var ADD_FSGROUP to be set to false to suppress the automatic addition of fsGroup: 100 in the pod's security context.
This addresses issue #4617.
* Adding note in README regarding ADD_FSGROUP.
This commit fixes the event filtering check, so it doesn't crash when
the Pod name doesn't contain a dash ("-").
Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
* The jupyter docker image isn't building because it now depends on code
in components/common
* To make this work we need to configure it as a multi module package
and modify go.mod to redirect to a local path.
* Ref: https://github.com/golang/go/wiki/Modules#when-should-i-use-the-replace-directive
* Replaces PR #4583
Related to #4582 - Jupyter image doesn't build.