70 Commits

Author SHA1 Message Date
Kimonas Sotirchos 1d24b75f57 notebooks: Fix endless restarts (kubeflow/kubeflow#6341)
* notebooks: Update notebook if timestamp changed

We don't want to be updating the spec of the notebook if the timestamp
hasn't changed, since this will lead to constant updates and
reconciliation loops.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* notebooks: Use a deep-copy of the notebook spec

The controller should use a deep copy of the notebook spec when
calculating the spec for the StatefulSet. Otherwise we could
unintentionally update the notebook object, since the spec could have
been changed while calculating the STS spec.
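
The aliasing problem described above can be illustrated with a minimal sketch. `NotebookSpec`, `Container` and `buildStatefulSetSpec` are simplified, hypothetical stand-ins and not the controller's actual types or functions. Assigning or passing a Go struct copies only the slice headers, so the copy still shares its backing arrays with the original unless a deep copy is made first.

```go
package main

import "fmt"

// Container and NotebookSpec are simplified, hypothetical stand-ins for the
// Kubernetes API types used by the controller.
type Container struct {
	Name string
	Env  []string
}

type NotebookSpec struct {
	Containers []Container
}

// DeepCopy returns a copy that shares no slices with the receiver.
func (s NotebookSpec) DeepCopy() NotebookSpec {
	out := NotebookSpec{Containers: make([]Container, len(s.Containers))}
	for i, c := range s.Containers {
		out.Containers[i] = Container{
			Name: c.Name,
			Env:  append([]string(nil), c.Env...),
		}
	}
	return out
}

// buildStatefulSetSpec mutates the spec it receives, as a StatefulSet
// calculation might do when it injects extra settings.
func buildStatefulSetSpec(spec NotebookSpec) NotebookSpec {
	spec.Containers[0].Env = append(spec.Containers[0].Env, "NB_PREFIX=/notebook/ns/name")
	return spec
}

func main() {
	nb := NotebookSpec{Containers: []Container{{Name: "notebook"}}}

	// Working on a deep copy leaves the original untouched.
	buildStatefulSetSpec(nb.DeepCopy())
	fmt.Println("env entries after deep copy:", len(nb.Containers[0].Env))

	// Passing the struct by value still shares the Containers slice,
	// so the original is modified as a side effect.
	buildStatefulSetSpec(nb)
	fmt.Println("env entries after shallow copy:", len(nb.Containers[0].Env))
}
```

The second call shows why a plain struct copy is not enough: the element written through `spec.Containers[0]` lives in the array that the caller's slice also points to.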

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* notebooks: Add prefix env var only if missing

The controller should be setting OR updating the NB_PREFIX env var.
Previously it would always blindly append it to the spec, which could
result in double entries for the same env var.
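
The set-or-update behaviour described above can be sketched as follows. This is a minimal illustration, not the controller's actual code: `EnvVar` is a simplified stand-in for the Kubernetes env var type, and `setEnvVar` and the example prefix values are hypothetical.

```go
package main

import "fmt"

// EnvVar is a simplified, hypothetical stand-in for the Kubernetes env var type.
type EnvVar struct {
	Name  string
	Value string
}

// setEnvVar updates the variable in place if it already exists and appends
// it otherwise, so the list never holds two entries with the same name.
func setEnvVar(env []EnvVar, name, value string) []EnvVar {
	for i := range env {
		if env[i].Name == name {
			env[i].Value = value
			return env
		}
	}
	return append(env, EnvVar{Name: name, Value: value})
}

func main() {
	env := []EnvVar{{Name: "FOO", Value: "bar"}}

	// The first call appends, the second call updates the existing entry.
	env = setEnvVar(env, "NB_PREFIX", "/notebook/ns/a")
	env = setEnvVar(env, "NB_PREFIX", "/notebook/ns/b")

	for _, e := range env {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

Calling the helper on every reconciliation is then idempotent, which a blind append is not.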

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2022-02-09 17:32:07 +00:00
Kimonas Sotirchos 9510a2b913 notebooks: Graceful handling of events (kubeflow/kubeflow#6338)
* notebooks: Handle events gracefully

The controller is not exiting the reconciliation loop after it has
re-emitted a Pod/STS Event as a Notebook Event. This causes the
controller to later try to GET a Notebook with the name of the Event
that triggered the reconciliation loop.

The controller should exit the reconciliation function once it has
emitted the event.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* notebooks: Don't reconcile on deleted events

We don't want to trigger the reconciliation function when an event gets
deleted.

If a Notebook is deleted then the underlying events are deleted as
well, which causes the reconcile function to be triggered and to try
to GET Events and Notebooks with the name of the deleted event.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2022-02-09 15:01:07 +00:00
Athanasios Markou fbf5110f01 notebooks: Extend Notebook Controller to expose idleness for Jupyter (kubeflow/kubeflow#6297)
* notebooks: Update image's tag in make

Modify the Makefile to properly update the TAG
based on the git tag.

Signed-off-by: Athanasios Markou <athamark@arrikto.com>
Reviewed-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* notebooks: Expose last-activity

Extend the notebook-controller to:
* cull idle Notebook Servers based on their new `last-activity`
  annotation
* expose the last activity of each Notebook Server as an annotation
  on the metadata of the corresponding CR object

Modify notebook_controller.go to:
* update the Last Activity of each Notebook Server that has a
  Running pod
* delete the Last Activity Annotation for every Notebook Server
  that does not have a Running pod

Extend culler.go to:
* perform culling based on the new `last-activity` annotation and
  not based on the `/api/status` endpoint.
* update the last activity of a Notebook Server, based on the
  kernels' execution states.
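
The annotation-based idleness check listed above can be sketched roughly as follows. Everything here is illustrative rather than the controller's implementation: the annotation key, the RFC 3339 timestamp format, the one-hour default, and the helper names `kernelsAllIdle`, `isIdle` and `mustParse` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// lastActivityAnnotation is an assumed annotation key, used for illustration.
const lastActivityAnnotation = "notebooks.kubeflow.org/last-activity"

// defaultIdleTime is an illustrative threshold, not the controller's default.
const defaultIdleTime = 60 * time.Minute

// Kernel holds the one field of a Jupyter kernel status used in this sketch.
type Kernel struct {
	ExecutionState string
}

// kernelsAllIdle reports whether every kernel is in the "idle" state.
// A server without kernels is treated as idle (an assumption).
func kernelsAllIdle(kernels []Kernel) bool {
	for _, k := range kernels {
		if k.ExecutionState != "idle" {
			return false
		}
	}
	return true
}

// isIdle reports whether the last-activity annotation is older than idleTime.
// A missing or unparsable annotation counts as "not idle", so that nothing
// is culled on incomplete information.
func isIdle(annotations map[string]string, now time.Time, idleTime time.Duration) bool {
	raw, ok := annotations[lastActivityAnnotation]
	if !ok {
		return false
	}
	last, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return false
	}
	return now.Sub(last) > idleTime
}

// mustParse is a small helper for the example; it panics on a bad timestamp.
func mustParse(ts string) time.Time {
	t, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	annotations := map[string]string{lastActivityAnnotation: "2022-02-07T10:00:00Z"}
	now := mustParse("2022-02-07T12:00:00Z")

	fmt.Println("kernels idle:", kernelsAllIdle([]Kernel{{"idle"}, {"busy"}}))
	fmt.Println("needs culling:", isIdle(annotations, now, defaultIdleTime))
}
```

Keeping the clock as a parameter (`now`) makes the check easy to unit test with fixed timestamps.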

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
Reviewed-by: Athanasios Markou <athamark@arrikto.com>

* notebooks: Introduce a DEV env var

We introduce a DEV env var to allow admins to
develop and test their custom Notebook Controller
on their local machine.
We provide information and instructions in
components/notebook-controller/README.md.

Signed-off-by: Athanasios Markou <athamark@arrikto.com>
Reviewed-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* notebooks: Add unit tests for last-activity

* Introduce new tests for allKernelsAreIdle()
* Extend the tests for NotebookIsIdle() and for
  NotebookNeedsCulling().

Signed-off-by: Athanasios Markou <athamark@arrikto.com>
Reviewed-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: UpdateNotebookLastActivityAnnotation()

Ensure that UpdateNotebookLastActivityAnnotation() does not return
"true". This function should not return any value.

Signed-off-by: Athanasios Markou <athamark@arrikto.com>
2022-02-07 15:19:17 +00:00
Kimonas Sotirchos 64903665dc Update images for the 1.5 rc0 release (kubeflow/kubeflow#6319)
* Update the releasing version tag

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Run automated script for updating versions

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2022-01-27 14:16:10 +00:00
Tobia De Koninck b223e29a9d fix(notebooks) make culling work with multi-user (kubeflow/kubeflow#5128) (kubeflow/kubeflow#5980) 2022-01-21 11:25:19 +00:00
Abe Sharp 5e960331fd Remove virtualservice timeout to prevent websocket disconnect (kubeflow/kubeflow#6126)
In the existing version, the 'timeout: 300s' added to the notebook's virtual service would cause websockets to disconnect at the 5 minute mark, causing the Jupyter Notebook web terminal function to hang. This is described in https://github.com/kubeflow/kubeflow/issues/6124.
2021-09-09 03:07:01 -07:00
Filinto Duran 5ae1de4dcc Correct missing predicates in controller watches. Fixes #5326 (kubeflow/kubeflow#5873)
Co-authored-by: Filinto Duran <fduran@d2iq.com>
2021-08-11 09:17:26 -07:00
DavidSpek 4842c53f7a Update manifests to use ECR and fix fieldPath in kustomization files (kubeflow/kubeflow#5765)
* Update manifests to use ECR and latest image tags

* remove duplicate value in central-dashboard kustomization.yaml
2021-03-24 07:35:45 -07:00
Yannis Zarkadas 22e4cecf56 fix notebook controller manifests (kubeflow/kubeflow#5729)
* notebook-controller: Remove manager from gitignore

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>

* notebook-controller: Add missing manifests

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
2021-03-19 16:09:17 -07:00
Yannis Zarkadas ae3b53f8d2 Notebook Controller: Consolidate manifests (kubeflow/kubeflow#5723)
* notebook-controller: Modify kubebuilder manifests

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>

* notebook-controller: Set storageVersion to v1

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>

* notebook-controller: Fix RBAC

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>

* notebook-controller: Regenerate manifests

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>

* notebook-controller: Remove unused kubebuilder manifests

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
2021-03-19 10:22:16 -07:00
DavidSpek a1e52c2b9e (Notebook-controller): Add `http-rewrite-uri` and `http-headers-request-set` annotations (kubeflow/kubeflow#5660)
Co-authored-by: Mathew Wicks <thesuperzapper@users.noreply.github.com>
2021-03-12 04:14:24 -08:00
Yannis Zarkadas 3f94f691c3 Notebook Controller: Move manifests development upstream (kubeflow/kubeflow#5608)
As part of the work of wg-manifests for 1.3
(https://github.com/kubeflow/manifests/issues/1735), we are moving manifests
development in upstream repos. This gives the application developers full
ownership of their manifests, tracked in a single place.

This commit copies the manifests for application `Notebook Controller`
from path `apps/jupyter/notebook-controller/upstream` of kubeflow/manifests to path
`components/notebook-controller/config` of the upstream repo (https://github.com/kubeflow/kubeflow).

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
2021-02-17 18:24:51 -08:00
Naveen e2555c4797 Upgrading the `go` compiler version. (kubeflow/kubeflow#5394)
Upgrade go version of the notebook-controller to 1.15, across the
Dockerfile, Makefile and README. We used the same Golang version as our Kubernetes
dependency, after @Jeffwan's suggestion.
2021-01-12 04:10:25 -08:00
gilbeckers 82c04b3be1 Correct ContainerStatus of Notebook CR (kubeflow/kubeflow#5314)
* Correct ContainerStatus of Notebook CR

The Notebook Controller doesn't set the State of the CR correctly. In some cases
the first container is the istio-sidecar, which results in an incorrect state being
shown in the Notebook CR. This is now fixed by copying the ContainerState of the
Notebook container to the ContainerState of the Notebook CR.
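
The fix can be pictured with the following sketch. The `ContainerStatus` type is a simplified stand-in for the Kubernetes one, `notebookContainerState` is a hypothetical helper, and matching the notebook container by the Notebook's name is an assumption made for illustration.

```go
package main

import "fmt"

// ContainerStatus is a simplified, hypothetical stand-in for the Kubernetes type.
type ContainerStatus struct {
	Name  string
	State string
}

// notebookContainerState returns the state of the container whose name
// matches the notebook, instead of blindly taking the first container
// (which may be an injected sidecar).
func notebookContainerState(statuses []ContainerStatus, notebookName string) (string, bool) {
	for _, cs := range statuses {
		if cs.Name == notebookName {
			return cs.State, true
		}
	}
	return "", false
}

func main() {
	statuses := []ContainerStatus{
		{Name: "istio-proxy", State: "running"},
		{Name: "my-notebook", State: "waiting"},
	}

	state, found := notebookContainerState(statuses, "my-notebook")
	fmt.Println(state, found)
}
```

Returning a `found` flag lets the caller leave the CR status unchanged when the notebook container has not been reported yet.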

* Changed log statement and added a comment
Implemented remarks of @yanniszark and @kimwnasptd

* Small reorganization of some if statements
2021-01-04 01:27:55 -08:00
Josh Risley fa5d1e7f9c Use valid commit for kubeflow/components/common module. (kubeflow/kubeflow#5309)
We use the local `../common` module to build `notebook-controller`. We
also need to specify a valid pseudo-version for `common` to support
importing the Notebook API in other modules. This is because according
to the `go.mod` docs [1]:

> exclude and replace directives only operate on the current (“main”)
> module. exclude and replace directives in modules other than the main
> module are ignored when building the main module.

If we don't replace the default "zero version" for `common` that is
generated in our require directive, then builds fail for modules
that require the Notebook API. They will encounter an "invalid
version" error for `common` at commit hash "000000000000".

[1]: https://github.com/golang/go/wiki/Modules#gomod
2020-12-01 12:24:52 -08:00
Naveen bc8df5407e Implemented functional tests using ginkgo for notebook controller (kubeflow/kubeflow#5378)
* Implemented functional tests using ginkgo

The notebook controller can be tested using sigs.k8s.io/controller-runtime/pkg/envtest, which comes as part of kubebuilder. With this we should be able to measure test coverage.

* Fixed the incorrect test condition and included fix to download the envtest binaries.

* Some tweaks based on review.

* Removed the check-license as it was blocking the test.
Included some of the tweaked YAML files that were being generated.
2020-11-11 05:57:49 -08:00
Naveen a75404d6d8 Included the instructions to contribute notebook-controller. (kubeflow/kubeflow#5383)
* Instructions to contribute.

* Update based on feedback.
2020-11-09 01:52:25 -08:00
Naveen 396ace7a83 Fixes the default leader election ID (kubeflow/kubeflow#5374)
The default leader election ID is controller-leader-election-helper, which could conflict when multiple controllers run within the same namespace. This is a required field in later versions of controller-runtime.
2020-11-02 23:22:17 -08:00
Mathew Wicks e9bbe43418 Add thesuperzapper to notebook OWNERS (kubeflow/kubeflow#5363) 2020-10-27 08:24:00 -07:00
MartinForReal 97a8be52a4 Add MartinForReal as reviewer (kubeflow/kubeflow#5241)
Add MartinForReal as reviewer
2020-09-06 19:27:41 -07:00
Kimonas Sotirchos 1a0a3986d2 Add owners for the Notebooks Controller (kubeflow/kubeflow#5240)
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2020-08-25 06:34:16 -07:00
Nihir Patel b13382b558 notebook_controller.go: make clusterDomain an option (kubeflow/kubeflow#4468) 2020-07-03 19:42:48 -07:00
Humair 8470751a58 Fix notebook controller rbac gen (kubeflow/kubeflow#5083) 2020-06-22 07:18:39 -07:00
Ali Soume'e 6942bf5f87 Remove duplicate import (kubeflow/kubeflow#5058)
"k8s.io/api/core/v1" was imported with names "corev1" and "v1"
2020-06-08 20:47:19 -07:00
Chad Roberts 25bf002c34 Adding env var to suppress automatic addition of fsGroup in notebook pod (kubeflow/kubeflow#4713) (kubeflow/kubeflow#4782)
* Allowing for an env var ADD_FSGROUP to be set to false to suppress the automatic addition of fsGroup: 100 in the pod's security context.
This addresses issue #4617.

* Adding note in README regarding ADD_FSGROUP.
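
One way to implement this kind of opt-out switch is sketched below. `parseAddFSGroup` is a hypothetical helper, and treating unset or unparsable values as "enabled" is an assumption made for the example; only the `ADD_FSGROUP` name and the `false` value come from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseAddFSGroup interprets the raw value of ADD_FSGROUP.
// The fsGroup is added by default; only a value that parses as false
// (for example "false") suppresses it. Empty or invalid values keep the
// default, which is an assumption of this sketch.
func parseAddFSGroup(raw string) bool {
	if raw == "" {
		return true
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return true
	}
	return enabled
}

func main() {
	if parseAddFSGroup(os.Getenv("ADD_FSGROUP")) {
		fmt.Println("adding fsGroup: 100 to the pod security context")
	} else {
		fmt.Println("leaving the pod security context untouched")
	}
}
```

Separating the parsing from the `os.Getenv` call keeps the decision logic testable without touching the process environment.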
2020-02-19 09:08:25 -08:00
Yannis Zarkadas e02a82fbcc notebook-controller: Fix event filtering (kubeflow/kubeflow#4777)
This commit fixes the event filtering check, so it doesn't crash when
the Pod name doesn't contain a dash ("-").
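
A defensive version of such a check might look like the sketch below. It assumes, for illustration only, that the owning StatefulSet name is recovered by stripping the ordinal suffix after the last dash of the Pod name; `statefulSetNameFromPod` is a hypothetical helper, not the controller's function.

```go
package main

import (
	"fmt"
	"strings"
)

// statefulSetNameFromPod strips the trailing "-<ordinal>" from a Pod name.
// It reports false instead of slicing out of range when the name has no
// dash, or when the dash is the first character.
func statefulSetNameFromPod(podName string) (string, bool) {
	i := strings.LastIndex(podName, "-")
	if i <= 0 {
		return "", false
	}
	return podName[:i], true
}

func main() {
	for _, name := range []string{"my-notebook-0", "standalone"} {
		sts, ok := statefulSetNameFromPod(name)
		fmt.Printf("%q -> %q (%v)\n", name, sts, ok)
	}
}
```

Checking the index before slicing is what prevents the crash: `strings.LastIndex` returns -1 when the separator is absent, and slicing with a negative bound panics.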

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
2020-02-19 08:44:25 -08:00
Zhenghui Wang e8bf7974d4 add loadtest for notebook controller (kubeflow/kubeflow#4779) 2020-02-18 21:00:25 -08:00
Jeremy Lewi 0895c4d135 Fix docker builds of notebook and tensorboard controller (kubeflow/kubeflow#4664)
* Fix docker builds of notebook and tensorboard controller

* The notebook-controllers and tensorboard-controllers now depend on
  the go package components/common

* We need to rewrite the Dockerfiles so that the context is now

  ${KUBEfLOW_REPO}/common

  * so that components/common can be included in the context and copied
    to the Dockerfile

* Create skaffold configs to make it easier to do remote builds with Kaniko

  * The skaffold configs are currently written assuming the kubeflow-ci cluster
    is used to build the images. This could be generalized in the future.

* Remove the code to build the notebook-controller with GCB; we can just
  use skaffold and kaniko to do efficient remote builds.

* Related to #4582 - Jupyter image doesn't build.

* Fix docker build rule.
2020-01-21 17:54:34 -08:00
Zhenghui Wang 89acff862c Add Notebook Controller v1 spec (kubeflow/kubeflow#4649)
* add v1 spec

* change kubeflow.org_notebooks.yaml
2020-01-13 19:43:08 -08:00
Zhenghui Wang e5410cd7c8 add source code of MPL licensed library. (kubeflow/kubeflow#4643) 2020-01-10 15:57:37 -08:00
Zhenghui Wang 4d2dc369cf Update notebook ctrler dockerfile (kubeflow/kubeflow#4641) 2020-01-09 13:56:34 -08:00
Jeremy Lewi d25a14aea2 Fix notebook controller and tensorboard controller docker image build. (kubeflow/kubeflow#4631)
* The jupyter docker image isn't building because it now depends on code
  in components/common

* To make this work we need to configure it as a multi module package
  and modify go.mod to redirect to a local path.

* Ref: https://github.com/golang/go/wiki/Modules#when-should-i-use-the-replace-directive

* Replaces PR #4583

Related to #4582 - Jupyter image doesn't build.
2020-01-07 16:25:41 -08:00
Zhenghui Wang 71918b8b64 Add licensing info for Notebook Controller (kubeflow/kubeflow#4623)
* add files for third party licensing for notebook ctlr

* lint
2020-01-06 23:20:17 -08:00
Jeremy Lewi a28e6692d6 Move the CD scripts and Tekton pipelines into kubeflow/testing (kubeflow/kubeflow#4593)
* Delete all the Tekton pipelines and scripts for continuous delivery
  of Kubeflow applications because they are moving into kubeflow/testing

* kubeflow/testing#551 is the PR moving the code into kubeflow/testing

Related to: kubeflow/testing#544 redo how we use kustomize and Tekton
            to parameterize the pipelines
2019-12-30 07:09:39 -08:00
Fernando Diaz 1ff2f7a880 Reissue pod and sts events as notebook events (kubeflow/kubeflow#4139) 2019-11-21 12:07:29 -08:00
MrXinWang d4fb94b020 Add arm64 support for controllers (kubeflow/kubeflow#4438)
Change-Id: I9f4b4871a5d02a53230abb836787f665dd8e3998
Signed-off-by: Henry Wang <henry.wang@arm.com>
Jira: ENTOS-1322
2019-10-31 19:53:23 -07:00
Quanjie Lin 1236c5e6d7 initial checkin of tensorboard controller (kubeflow/kubeflow#4312)
* initial checkin of tensorboard controller

* initial checkin of tensorboard controller

* typo

* typo

* fix typo

* support local path

* add status

* conflict

* remove binary
2019-10-29 09:12:44 -07:00
Lun-Kai Hsu 2fe3108347 fix notebook route (kubeflow/kubeflow#4402) 2019-10-24 16:01:39 -07:00
Ben Ye 2e7dc7ec06 add culling metrics (kubeflow/kubeflow#4336)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2019-10-17 21:37:57 -07:00
Ben Ye d14f6ac07f support metrics in notebook-controller (kubeflow/kubeflow#4123)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2019-10-16 00:15:40 -07:00
Kam Kasravi c1eca0937c Ci for components (kubeflow/kubeflow#4238)
* snapshot

* fixes to service-account and task

* adding admission-webhook, notebook-controller

* update to README.md

* update README.md
2019-10-15 08:31:53 -07:00
Jeremie Vallee c88e721fc7 [3945] Configurable Istio Gateway for Notebook Controller (kubeflow/kubeflow#4216) 2019-10-14 12:06:59 -07:00
Ben Ye 807843ec2a cleanup some codes in notebook controller (kubeflow/kubeflow#4098)
* cleanup some codes in notebook controller

Signed-off-by: yeya24 <yb532204897@gmail.com>

* remove ambassador in notebook controller

Signed-off-by: yeya24 <yb532204897@gmail.com>
2019-10-14 12:06:52 -07:00
Jerome Brette b5ff201a8c Migrate kustomize.go to Kustomize3 (kubeflow/kubeflow#4055)
* Migrate to kustomize3: Phase 1. Update kustomization.yaml

* Migrate to kustomize3: Phase 2: Update kustomize.go

- Update kustomize.go to match new package structure.
- Update module dependencies.

* Migrate to kustomize3: Phase 3: Implements code review

- As per request, revert kustomization.yaml back to deprecated syntax.
- As per request, revert kustomize.go to use deprecated .Bases field.
- Note: patchesStrategicMerge: will be turned into a deprecated field pretty soon.
- Rerun go mod tidy

* Migrate to kustomize3: Phase 4: Activate legacy order transformer
2019-09-20 21:21:25 -07:00
Lun-Kai Hsu 2f2938bead Notebook v1beta1 (kubeflow/kubeflow#4105)
* add v1beta1

* add storage version

* wip

* add conversion

* setup webhook

* fix

* fix manifest

* webhook wip

* no webhook
2019-09-13 07:04:29 -07:00
Lun-Kai Hsu 8cad496a13 Migrate notebook CR to kubebuilder V2 (kubeflow/kubeflow#4013)
* wip

* can build

* tested: able to control notebook

* fix
2019-09-04 17:06:22 -07:00
Kimonas Sotirchos 08f43598c2 Culling of Idle Jupyter Notebooks (kubeflow/kubeflow#3856)
* Create a culler as a package

Helper functions for culling resources. Assumes that Istio is
installed on the system and queries Prometheus to get metrics,
specifically requests/{configurable time}.

If the resource should be culled, then it should be done by setting an
annotation. This way the UIs can also show that the Resource is stopping
and also easily stop a resource by making a PATCH request.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Culling logic enhancements

Add the necessary env vars. Culling won't happen by default. To enable it,
the user needs to set ENABLE_CULLING=true.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Misc fixes in logging and comment cleanup

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Fix typo

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Add Notebooks specific culling

Query the /api/status endpoint of each Server

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Remove the generic culling logic

We need to discuss if it would make sense to have this logic as a go
library, or use knative.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Add unit tests

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Remove unused code

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Review changes #1

* rename `getEnvDef` to `getEnvDefault`
* Add a comment to describe how the STOP_ANNOTATION gets used

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Make cluster domain configurable

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2019-08-26 04:40:21 -07:00
Kam Kasravi 0b5e3bd995 add kkasravi to OWNERS (kubeflow/kubeflow#3311) 2019-06-18 16:58:32 -07:00
Gabriel Wen 70bd7acdf5 Merge branch 'master' into fix-notebook-controller 2019-06-03 14:33:05 -07:00
zabbasi daa4768f96 Add details to "conditions" in notebook status (kubeflow/kubeflow#3319)
* added details to NotebookCondition to keep track of notebook container status changes

* update notebook controller image

* fix conditions update

* small fix

* temporary changes to debug

* temporarily remove delete step from workflow for debugging

* temporarily merging kfctl-test and kfctl-go-test for debugging

* debugging

* undo the mistake

* debugging

* debugging tests

* merged kfctl-test and kfctl-go-test

* remove wait-for-kubeflow

* merged with master

* remove test delete step for debugging

* small fix

* update jupyter test component

* update condition test for jupyter component

* revert back deleting step

* revert back change in kfctl.sh

* added some temporary change to debug jupyter-test

* revert back temp changes
2019-06-03 14:19:30 -07:00