24 Commits

Author SHA1 Message Date
Suraj Kota 144fa6805f Support Pod Defaults in Tensorboard controller (kubeflow/kubeflow#6874)
* support poddefaults in tensorboard controller

* initialize empty map
2023-01-18 15:22:21 +00:00
apoger 6b3fd05ea2 Update KF manifests and gh-action workflows to use the tag=`latest` (kubeflow/kubeflow#6854)
Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

review changes

* build images with the latest tag only when a PR
  is merged to master branch

* revert changes in manifests/workflows for the
  notebook-server images

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>
2022-12-20 15:59:18 +00:00
apoger 54ab6a815e Fix workflows for publishing images only when PR is merged (kubeflow/kubeflow#6842)
* Fix docker-publish workflows

* Remove workflow that builds/pushes all images

* Remove redundant files from manifests
2022-12-15 09:51:21 +00:00
apoger be85f9f1bb tensorboard-controller: Extend tests for using images of each PR (kubeflow/kubeflow#6831)
* Introduce integration test workflow for tensorboard-controller

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* Publish Docker image only when PR is merged

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* Remove kind & manifest gh-action workflows

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* Update tag in manifests to v1.6.0

This change is required as images with v1.5.0 do not
exist in Dockerhub.

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>
2022-12-12 14:12:28 +00:00
apoger 10e0e93085 Cherry-pick commits for using DockerHub for all images (kubeflow/kubeflow#6825)
cherry-picking: #6548
* Update all images to use DockerHub
* Update releasing script for dockerhub

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
Cherry-picked-by: Apostolos Gerakaris <apoger@arrikto.com>
Co-authored-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2022-12-08 15:37:10 +00:00
apoger 46f14d4e97 Use K8s 1.25 for the tests (kubeflow/kubeflow#6751)
* kind: Introduce config file for 1.25

* Add a new KinD configuration file for testing with K8s 1.25.3
* Install kind v0.17.0 for testing with K8s 1.25.3

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* gh-actions: Use 1.25 for testing

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* testing: Install Istio 1.16 for testing

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* Test commit for enabling the tests

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* notebook-controller: Fix Makefile

Remove the test rule as a prerequisite for running docker-build

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>
2022-11-24 08:30:10 +00:00
Alex Lembiyeuski 8654406f47 Upgrade API version of `Tensorboard` CRD to `v1` (kubeflow/kubeflow#6406)
* Migrate tensorboard-controller to Kubebuilder v3

* Fix paths inside Docker context

* Remove test dependency from docker-build

* Switch to kustomize 3.2.0, fix image tag

* Fix namePrefix

* Rename deployments, remove namespaces

* Add runAsUser

* Make tensorboard image and istio gateway configurable
2022-06-17 09:20:10 +00:00
Kimonas Sotirchos a61650ee88 release: Images for the 1.5.0 tag (kubeflow/kubeflow#6398)
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2022-03-09 22:37:11 +00:00
Kimonas Sotirchos 9ba5be1c1c releasing: Create v1.5.0-rc.2 images (kubeflow/kubeflow#6394)
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2022-03-04 17:55:59 +00:00
Kimonas Sotirchos fff5155e1e releasing: Update tags for v1.5.0-rc.1 (kubeflow/kubeflow#6343)
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2022-02-10 18:57:15 +00:00
Hao Xin fcc4786a49 Fix(manifests): Upgrade rbac.authorization.k8s.io from v1beta1 to v1 (kubeflow/kubeflow#6261)
2022-02-03 16:18:16 +00:00
Kimonas Sotirchos 64903665dc Update images for the 1.5 rc0 release (kubeflow/kubeflow#6319)
* Update the releasing version tag

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Run automated script for updating versions

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2022-01-27 14:16:10 +00:00
juliusvonkohout 483cabb7e2 fix(backend): tensorboard-controller does not work because of missing permissions (kubeflow/kubeflow#6216)
2021-11-23 23:57:47 +00:00
juliusvonkohout f2df5f5b84 fix: tensorboard-controller is killed due to out of memory (kubeflow/kubeflow#6148)
* Update manager.yaml

* Update manager.yaml
2021-10-19 21:07:15 -07:00
DavidSpek 4b59c008b9 tensorboard-controller: fix binding issue (kubeflow/kubeflow#5925)
2021-05-25 07:30:09 -07:00
DavidSpek 94390858bc Specify commonLabels for tensorboard-controller (kubeflow/kubeflow#5780)
2021-03-26 03:35:46 -07:00
DavidSpek 4842c53f7a Update manifests to use ECR and fix fieldPath in kustomization files (kubeflow/kubeflow#5765)
* Update manifests to use ECR and latest image tags

* remove duplicate value in central-dashboard kustomization.yaml
2021-03-24 07:35:45 -07:00
Kimonas Sotirchos 0fe8bf5463 Tensorboards web app manifests: Don't use specific namespace in base (kubeflow/kubeflow#5753)
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2021-03-23 08:46:44 -07:00
Kimonas Sotirchos ca44b1c4ee Manifests for Tensorboard controller (kubeflow/kubeflow#5730)
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2021-03-21 14:28:17 -07:00
Rui Fang 7f9c309586 Tensorboard-Controller: use updateStatus instead of update (kubeflow/kubeflow#5644)
2021-03-10 07:46:24 -08:00
Konstantinos Andriopoulos a97b442e5b Add RWO_PVC_SCHEDULING env var to the Tensorboard Controller deployment (kubeflow/kubeflow#5266)
* Add RWO_PVC_SCHEDULING env var to Tensorboard controller deployment

The value of the 'RWO_PVC_SCHEDULING' env var is set to "false" by
default. The user will be able to change the value of the env var
manually by modifying the 'config/manager/manager.yaml' file.

* Update README.md
2020-08-31 08:12:21 -07:00
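
The env var described in the commit above could be read roughly as follows. This is a minimal sketch, not the controller's actual code: the helper name `parseRWOPVCScheduling` is hypothetical, and treating an unset or unparsable value as "false" is an assumption based on the stated default.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseRWOPVCScheduling is a hypothetical helper. It interprets the raw
// value of RWO_PVC_SCHEDULING and falls back to false (the documented
// default) when the variable is unset or cannot be parsed as a boolean.
func parseRWOPVCScheduling(raw string, isSet bool) bool {
	if !isSet {
		return false
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		// Assumption: an invalid value behaves like the default.
		return false
	}
	return enabled
}

func main() {
	raw, isSet := os.LookupEnv("RWO_PVC_SCHEDULING")
	fmt.Println("RWO PVC scheduling enabled:", parseRWOPVCScheduling(raw, isSet))
}
```

Keeping the parsing separate from the environment lookup makes the default behaviour easy to check without touching the process environment.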
Konstantinos Andriopoulos 254c3f7bfc Add roles for Tensorboard controller pod (kubeflow/kubeflow#5262)
* Add Tensorboard controller permissions for managing resources

The pod running the Tensorboard controller didn't have permissions
to manage the deployments, services, and VirtualServices needed
so that the Tensorboard servers would function properly.

In order for the deployed Tensorboard controller to run properly,
permissions to 'get', 'list', 'watch', 'create' and 'update'
are given to the Tensorboard controller pod so that the necessary
deployments, services and VirtualServices are created and managed
as expected. Also, permissions to 'get', 'list', 'watch' PVCs and
pods were added.

* Add namespace of Tensorboard CR to VirtualService prefix

In order to avoid creating 2 virtual services that have the same
prefix in different namespaces, the namespace of the corresponding
Tensorboard CR was added in the prefix of the generated Virtual
Service.

* Fix directory bug in Makefile

* Add README.md
2020-08-30 06:56:20 -07:00
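
The VirtualService prefix change in the commit above can be illustrated with a short sketch. The function name `virtualServicePrefix` and the exact path layout `/tensorboard/<namespace>/<name>/` are assumptions made for illustration; the commit only states that the namespace of the Tensorboard CR was added to the prefix.

```go
package main

import "fmt"

// virtualServicePrefix is a hypothetical helper that builds a URI prefix
// containing both the namespace and the name of a Tensorboard CR, so that
// two CRs with the same name in different namespaces get distinct prefixes.
// The "/tensorboard/" root is an assumed layout, not taken from the source.
func virtualServicePrefix(namespace, name string) string {
	return fmt.Sprintf("/tensorboard/%s/%s/", namespace, name)
}

func main() {
	// Same CR name, different namespaces: the prefixes no longer collide.
	fmt.Println(virtualServicePrefix("team-a", "mnist"))
	fmt.Println(virtualServicePrefix("team-b", "mnist"))
}
```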
Konstantinos Andriopoulos e32222032c Tensorboard web-app: Add functionality to inform TWA frontend about the status of Tensorboard servers (kubeflow/kubeflow#5259)
* Extend Tensorboard CRD with status.readyReplicas field

The Tensorboard CRD didn't contain any information about the
Tensorboard server being ready or not. So, the status of the
Tensorboard resource is extended so that it contains a
readyReplicas field, similar to the status.readyReplicas of
the deployment of the Tensorboard server.

* Extend Tensorboard controller to update status of Tensorboard CR

The frontend of the Tensorboard web-app will need information
about whether the Tensorboard servers are ready to connect or not.
As a result, the Tensorboard controller now copies the value of the
status.readyReplicas field of the Tensorboard deployment to the
status.readyReplicas of the Tensorboard CR.

Also, a Deployment() function was added for applying and updating
Tensorboard server deployments.

* Update tensorboard.status.phase of TWA backend response

The frontend of the TWA will need information about the status
of the Tensorboard server, so that it can inform the user about
the server being ready to connect or not.

As a result, the backend sets the status.phase field of the response
to "ready", if tensorboard.status.readyReplicas == 1. Otherwise, the
status.phase field of the response is set to "unavailable".

Also, the getPVCName() function was added, which extracts the name
of a given PVC object.

* Add GET route for PVCs

The Tensorboard web-app frontend will be using an autocomplete
drop-down to show the user the PVCs that live in a specific namespace.
These PVCs could be used as log storage for the Tensorboard server.

So, a PVC GET route was added to the Tensorboard web-app backend.

* Add message to Tensorboard response object in TWA backend

The frontend of the TWA will need to output a response message for
every Tensorboard object. This response message will inform the
user about the current state of the Tensorboard server.

* Use status.STATUS_PHASE for backend response

* Add requirements.txt to TWA backend

* Use status.create_status() for backend response
2020-08-30 05:08:20 -07:00
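
The status flow described in the commit above (the controller copies `status.readyReplicas` from the Deployment to the Tensorboard CR, and the backend maps it to a phase) might be sketched as below. The types and function names are hypothetical stand-ins rather than the real Kubernetes or Kubeflow types, and the "changed" return value is an added convenience. The TWA backend appears to be Python (the commit adds a requirements.txt); Go is used here only to keep the sketch in one language.

```go
package main

import "fmt"

// DeploymentStatus and TensorboardStatus are simplified, hypothetical
// stand-ins for the status sections of the Deployment and the Tensorboard CR.
type DeploymentStatus struct {
	ReadyReplicas int32
}

type TensorboardStatus struct {
	ReadyReplicas int32
}

// syncReadyReplicas copies readyReplicas from the Deployment status to the
// Tensorboard status. It returns true when the value changed, which a
// controller could use to decide whether a status update is needed.
func syncReadyReplicas(tb *TensorboardStatus, dep DeploymentStatus) bool {
	if tb.ReadyReplicas == dep.ReadyReplicas {
		return false
	}
	tb.ReadyReplicas = dep.ReadyReplicas
	return true
}

// phase mirrors the rule stated in the commit message: "ready" when
// readyReplicas == 1, otherwise "unavailable".
func phase(tb TensorboardStatus) string {
	if tb.ReadyReplicas == 1 {
		return "ready"
	}
	return "unavailable"
}

func main() {
	tb := TensorboardStatus{}
	fmt.Println(phase(tb)) // no ready replicas yet

	changed := syncReadyReplicas(&tb, DeploymentStatus{ReadyReplicas: 1})
	fmt.Println(changed, phase(tb))
}
```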
Quanjie Lin 1236c5e6d7 initial checkin of tensorboard controller (kubeflow/kubeflow#4312)
* initial checkin of tensorboard controller

* initial checkin of tensorboard controller

* typo

* typo

* fix typo

* support local path

* add status

* conflict

* remove binary
2019-10-29 09:12:44 -07:00