notebooks

Commit Graph

Author	SHA1	Message	Date
Suraj Kota	144fa6805f	Support Pod Defaults in Tensorboard controller (kubeflow/kubeflow#6874 ) * support poddefaults in tensorboard controller * initialize empty map	2023-01-18 15:22:21 +00:00
Oleksandr Shepotinnik	a17c966aff	tensorboard-controller: Fix tensorboard endless restarts (kubeflow/kubeflow#6722 )	2022-11-09 10:54:39 +00:00
Alex Lembiyeuski	8654406f47	Upgrade API version of `Tensorboard` CRD to `v1` (kubeflow/kubeflow#6406 ) * Migrate tensorboard-controller to Kubebuilder v3 * Fix paths inside Docker context * Remove test dependency from docker-build * Switch to kustomize 3.2.0, fix image tag * Fix namePrefix * Rename deployments, remove namespaces * Add runAsUser * Make tensorboard image and istio gateway configurable	2022-06-17 09:20:10 +00:00
DavidSpek	4b59c008b9	tensorboard-controller: fix binding issue (kubeflow/kubeflow#5925 )	2021-05-25 07:30:09 -07:00
Ilias Katsakioris	92ca8a2f84	tensorboard-controller: Fix scheduling unbound PVCs (kubeflow/kubeflow#5819 ) When the TB controller attempts to schedule a RWO PVC it checks its accessModes in the PVC status. The controller panics if the list is empty. This commit adds a check to ensure the list is not empty. Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>	2021-04-08 17:27:03 -07:00
Rui Fang	7f9c309586	Tensorboard-Controller: use updateStatus instead of update (kubeflow/kubeflow#5644 )	2021-03-10 07:46:24 -08:00
Konstantinos Andriopoulos	254c3f7bfc	Add roles for Tensorboard controller pod (kubeflow/kubeflow#5262 ) * Add Tensorboard controller permissions for managing resources The pod running the Tensorboard controller didn't have permissions to manage the deployments, services, and VirtualServices needed so that the Tensorboard servers would function properly. In order for the deployed Tensorboard controller to run properly, permissions to 'get', 'list', 'watch', 'create' and 'update' are given to the Tensorboard controller pod so that the necessary deployments, services and VirtualServices are created and managed as expected. Also, permissions to 'get', 'list', 'watch' PVCs and pods were added. * Add namespace of Tensorboard CR to VirtualService prefix In order to avoid creating 2 virtual services that have the same prefix in different namespaces, the namespace of the corresponding Tensorboard CR was added in the prefix of the generated Virtual Service. * Fix directory bug in Makefile * Add README.md	2020-08-30 06:56:20 -07:00
Konstantinos Andriopoulos	e32222032c	Tensorboard web-app: Add functionality to inform TWA frontend about the status of Tensorboard servers (kubeflow/kubeflow#5259 ) * Extend Tensorboard CRD with status.readyReplicas field The Tensorboard CRD didn't contain any information about the Tensorboard server being ready or not. So, the status of the Tensorboard resource is extended so that it contains a readyReplicas field, similar to the status.readyReplicas of the deployment of the Tensorboard server. * Extend Tensorboard controller to update status of Tensorboard CR The frontend of the Tensorboard web-app will need information about whether the Tensorboard servers are ready to connect or not. As a result, the Tensorboard controller now copies the value of the status.readyReplicas field of the Tensorboard deployment to the status.readyReplicas of the Tensorboard CR. Also, a Deployment() function was added for applying and updating Tensorboard server deployments. * Update tensorboard.status.phase of TWA backend response The frontend of the TWA will need information about the status of the Tensorboard server, so that it can inform the user about the server being ready to connect or not. As a result, the backend sets the status.phase field of the response to "ready", if tensorboard.status.readyReplicas == 1. Otherwise, the status.phase field of the response is set to "unavailable". Also, the getPVCName() function was added, which extracts the name of a given PVC object. * Add GET route for PVCs The Tensorboard web-app frontend will be using an autocomplete drop-bar to show the user the PVCs that live in a specific namespace. These PVCs could be used as log storages for the Tensorboard server. So, a PVC GET route was added to the Tensorboard web-app backend. * Add message to Tensorboard response object in TWA backend The frontend of the TWA will need to output a response message for every Tensorboard object. This response message will inform the user about the current state of the Tensorboard server. * Use status.STATUS_PHASE for backend response * Add requirements.txt to TWA backend * Use status.create_status() for backend response	2020-08-30 05:08:20 -07:00
Konstantinos Andriopoulos	1936429ea5	Tensorboard controller: Add scheduling functionality for Tensorboard servers that use RWO PVCs as log storages (kubeflow/kubeflow#5218 ) * Add indexers as custom field selectors for list requests to cache The tensorboard controller must be able to list pods that have mounted a PVC with a specific ClaimName. In order for this list request to cache to work properly, custom field selectors are added. These selectors are used to index the "pod.spec.volumes.persistentvolumeclaim.claimname" field so that unneeded pods can be filtered out. * Set pod's nodeAffinity if log files exist in a PVC In the case of using a PVC as a logdir for Tensorboard Server, if the PVC had a ReadWriteOnce access mode and was already mounted by another running pod X, then the Tensorboard Server pod would not always be scheduled on the same node as X. As a result, the Tensorboard Server pod would be blocked since multi-node access is prohibited on ReadWriteOnce volumes. In order for the Tensorboard Server pod to run successfully, nodeAffinity was added to the spec.template.spec.affinity field of the returned deployment. As a result, both X and the Tensorboard Server pod are now scheduled on the same node. Resolves kubernetes/kubernetes#26567 * Set Tensorboard Server scheduling feature to 'off' by default In the case that the Tensorboard Server used a RWO PVC (as a log storage) that was already mounted by another pod, nodeAffinity was used so that the Tensorboard Server would be scheduled (if possible) on the same node as that pod. Now, this added functionality is used only if the 'RWO_PVC_SCHEDULING' environment variable is set to "true" when running the Tensorboard controller. This scheduling functionality is disabled by default.	2020-08-26 02:58:03 -07:00
Konstantinos Andriopoulos	9ae8d1ff40	tensorboard-controller: Mount GCP secret only when accessing Google storage (kubeflow/kubeflow#5069 ) * Remove duplicate package import Package "k8s.io/api/core/v1" was imported twice with names "v1" and "corev1". * Mount GCP secret only when accessing Google storage The Tensorboard controller used to create pods (running the Tensorboard server) that would always mount user-gcp-sa secret, regardless of the logs storage being a Google cloud bucket or not. This would lead to pods never starting properly in the case of using other cloud services (or PVCs) as log storages, if the user-gcp-sa secret didn't exist on the cluster. In order for the Tensorboard server pods to run properly, user-gcp-sa secret is now mounted only when Google cloud buckets are used as log storages. Fixes kubeflow/kubeflow#5065	2020-06-18 06:46:20 -07:00
Quanjie Lin	1236c5e6d7	initial checkin of tensorboard controller (kubeflow/kubeflow#4312 ) * initial checkin of tensorboard controller * initial checkin of tensorboard controller * typo * typo * fix typo * support local path * add status * conflict * remove binary	2019-10-29 09:12:44 -07:00

11 Commits
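
The RWO scheduling behaviour described in commits 1936429ea5 and 92ca8a2f84 can be sketched roughly as below. This is an illustration only, not the controller's actual code: the `AccessMode` type, the function names, and the choice to inspect the first entry of the access-mode list are all assumptions, and a plain string type stands in for the Kubernetes API types so the example has no external dependencies. Only the `RWO_PVC_SCHEDULING` variable name, the "off by default" rule, and the empty-list guard come from the commit messages.

```go
package main

import (
	"fmt"
	"os"
)

// AccessMode is a hypothetical stand-in for the Kubernetes PVC access-mode type.
type AccessMode string

const (
	ReadWriteOnce AccessMode = "ReadWriteOnce"
	ReadWriteMany AccessMode = "ReadWriteMany"
)

// schedulingEnabled reports whether the opt-in feature flag is set.
// Per the commit message, only the value "true" enables it.
func schedulingEnabled() bool {
	return os.Getenv("RWO_PVC_SCHEDULING") == "true"
}

// isRWO reports whether the PVC status lists ReadWriteOnce first.
// The length check comes before indexing, so an unbound PVC with an
// empty access-mode list yields false instead of a panic.
func isRWO(statusModes []AccessMode) bool {
	if len(statusModes) == 0 {
		return false
	}
	return statusModes[0] == ReadWriteOnce
}

// needsNodeAffinity decides whether the Tensorboard pod should be pinned
// to the node of the pod that already mounts the PVC (assumed logic).
func needsNodeAffinity(enabled bool, statusModes []AccessMode, mountedByOtherPod bool) bool {
	return enabled && isRWO(statusModes) && mountedByOtherPod
}

func main() {
	enabled := schedulingEnabled()
	fmt.Println("feature enabled:", enabled)

	// Unbound PVC: empty status list, no affinity and no panic.
	fmt.Println(needsNodeAffinity(true, nil, true)) // false
	// Bound RWO PVC already mounted elsewhere: affinity is wanted.
	fmt.Println(needsNodeAffinity(true, []AccessMode{ReadWriteOnce}, true)) // true
	// Feature flag off: never set affinity.
	fmt.Println(needsNodeAffinity(false, []AccessMode{ReadWriteOnce}, true)) // false
}
```

Passing the flag in as a parameter, rather than reading the environment inside `needsNodeAffinity`, keeps the decision easy to test without touching process state.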