Create an Angular Library with common frontend code. Our CRUD web apps
should use this library to share common functionality like:
* Talking to Central Dashboard for the Namespace selection
* Making HTTP calls
* Surfacing and showing error messages and warnings
* Form utilities
* Showing a table with entries and actions
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* Add indexers as custom field selectors for list requests to cache
The tensorboard controller must be able to list pods that have
mounted a PVC with a specific ClaimName.
For this list request to the cache to work properly, custom
field selectors are added. These selectors index the
"pod.spec.volumes.persistentvolumeclaim.claimname" field so that
unneeded pods can be filtered out.
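The indexing idea above can be illustrated with a small sketch. The controller itself is written in Go and relies on the cache's own indexers; the Python below only mimics the concept on plain dicts, and the helper names `claim_names` and `build_claim_index` are hypothetical.

```python
from collections import defaultdict


def claim_names(pod):
    """Return the PVC claim names mounted by a pod given as a plain dict."""
    volumes = pod.get("spec", {}).get("volumes") or []
    names = []
    for volume in volumes:
        pvc = volume.get("persistentVolumeClaim")
        if pvc and pvc.get("claimName"):
            names.append(pvc["claimName"])
    return names


def build_claim_index(pods):
    """Map each claim name to the names of the pods that mount it."""
    index = defaultdict(list)
    for pod in pods:
        for name in claim_names(pod):
            index[name].append(pod["metadata"]["name"])
    return index


# Illustrative data only; these pods are not taken from the commit.
pods = [
    {
        "metadata": {"name": "trainer"},
        "spec": {"volumes": [
            {"name": "logs", "persistentVolumeClaim": {"claimName": "logs-pvc"}},
        ]},
    },
    {
        "metadata": {"name": "other"},
        "spec": {"volumes": [{"name": "tmp", "emptyDir": {}}]},
    },
]
index = build_claim_index(pods)
```

With such an index, looking up the pods for one ClaimName is a single dictionary access instead of a scan over every pod.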
* Set pod's nodeAffinity if log files exist in a PVC
In the case of using a PVC as a logdir for Tensorboard Server, if
the PVC had a ReadWriteOnce access mode and was already mounted by
another running pod X, then the Tensorboard Server pod would not
always be scheduled on the same node as X. As a result, the
Tensorboard Server pod would be blocked since multi-node access
is prohibited on ReadWriteOnce volumes.
In order for the Tensorboard Server pod to run successfully,
nodeAffinity was added to the spec.template.spec.affinity field
of the returned deployment.
As a result, both X and the Tensorboard
Server pod are now scheduled on the same node.
Resolves kubernetes/kubernetes#26567
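As a rough sketch of the affinity described above, the snippet below builds the value that would go under spec.template.spec.affinity, as a plain dict. It assumes the well-known `kubernetes.io/hostname` node label and the "preferred" form of node affinity; neither detail is stated in this message, and the function name `node_affinity_for` is hypothetical. The controller itself is written in Go.

```python
def node_affinity_for(node_name):
    """Build an affinity dict that prefers scheduling on the given node."""
    return {
        "nodeAffinity": {
            # Assumption: a soft preference rather than a hard requirement.
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 100,
                    "preference": {
                        "matchExpressions": [
                            {
                                # Assumption: nodes are matched by hostname label.
                                "key": "kubernetes.io/hostname",
                                "operator": "In",
                                "values": [node_name],
                            }
                        ]
                    },
                }
            ]
        }
    }


# Usage: node_name would be the node where pod X is already running.
affinity = node_affinity_for("node-a")
```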
* Set Tensorboard Server scheduling feature to 'off' by default
In the case that the Tensorboard Server used a RWO PVC (as a log
storage) that was already mounted by another pod, nodeAffinity
was used so that the Tensorboard Server would be scheduled
(if possible) on the same node as that pod.
Now, this added functionality is used only if the
'RWO_PVC_SCHEDULING' environment variable is set to "true"
when running the Tensorboard controller.
This scheduling functionality is disabled by default.
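A minimal sketch of the opt-in check, assuming the flag is compared against the exact string "true" as described above. The function name and the optional `environ` parameter are hypothetical and exist only to make the example testable; the real controller is written in Go.

```python
import os


def rwo_pvc_scheduling_enabled(environ=None):
    """Return True only when RWO_PVC_SCHEDULING is set to "true"."""
    env = os.environ if environ is None else environ
    # Unset or any other value keeps the feature off, matching the default.
    return env.get("RWO_PVC_SCHEDULING", "false") == "true"
```

Passing a dict instead of reading `os.environ` directly keeps the check easy to exercise in unit tests.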
* Create Tensorboard web-app backend
Create the code for the Tensorboard web-app backend which
includes routes for GET, POST and DELETE requests.
The backend is created with Python/Flask, so it also uses
the common code from 'kubeflow.kubeflow.crud_backend'.
* Add 'get_age(k8s_object)' function to 'crud_backend' common code
It would be useful for all web apps of the 'crud-web-apps' folder
to return age information to their frontends.
As a result, 'get_age(k8s_object)' was added to the common code,
so that all web apps can use it.
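For illustration, one possible shape of such a helper is sketched below. It assumes the object is a plain dict with an RFC 3339 `metadata.creationTimestamp` in UTC and that a coarse string such as "2h" is an acceptable result; the actual return format of the function in 'crud_backend' is not described here, and the `now` parameter is added only to make the example deterministic.

```python
from datetime import datetime, timezone


def get_age(k8s_object, now=None):
    """Return a coarse age string for an object given as a plain dict."""
    created = datetime.strptime(
        k8s_object["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    seconds = int((now - created).total_seconds())
    # Pick the largest unit that fits, rounding down.
    if seconds < 60:
        return f"{seconds}s"
    if seconds < 3600:
        return f"{seconds // 60}m"
    if seconds < 86400:
        return f"{seconds // 3600}h"
    return f"{seconds // 86400}d"
```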
Create a python module under the kubeflow.kubeflow package that will
be exposing common code and a base app that takes care of:
* Exceptions handling
* Common routes for serving static files and their cache control policy
* Authorization checks with SubjectAccessReview
* Authentication checks on the Kubeflow headers
* Common helper functions for dates, YAML parsing, etc.
* Health/liveness probes
Backends that are written with Python/Flask should use this common code
in order for us to reduce code duplication and have our backends align
with our accepted practices.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* Create a new directory in components for web apps
Since we want to also have some common code between our web apps we
should create a parent dir for any future web app we want to develop.
The code for the web apps, common or not, should be organized under this
directory.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* remove the reviewers
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* Remove duplicate package import
Package "k8s.io/api/core/v1" was imported twice with names "v1"
and "corev1".
* Mount GCP secret only when accessing Google storage
The Tensorboard controller used to create pods (running the Tensorboard
server) that would always mount user-gcp-sa secret, regardless of the
logs storage being a Google cloud bucket or not. This would lead to pods
never starting properly in the case of using other cloud services (or
PVCs) as log storages, if the user-gcp-sa secret didn't exist on the
cluster.
In order for the Tensorboard server pods to run properly, user-gcp-sa
secret is now mounted only when Google cloud buckets are used as log
storages.
Fixes kubeflow/kubeflow#5065
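The conditional mount can be sketched as follows. This assumes that Google cloud buckets are recognized by a `gs://` prefix on the logs path, which is not spelled out above; the function names and the volume name `gcp-creds` are hypothetical, and the controller itself is written in Go.

```python
def is_google_cloud_path(logs_path):
    """Assumption: Google cloud bucket paths start with the gs:// scheme."""
    return logs_path.startswith("gs://")


def build_secret_volumes(logs_path):
    """Return the secret volumes to add to the Tensorboard server pod."""
    volumes = []
    if is_google_cloud_path(logs_path):
        # Only Google storage needs the user-gcp-sa secret.
        volumes.append(
            {"name": "gcp-creds", "secret": {"secretName": "user-gcp-sa"}}
        )
    return volumes
```

For any other storage the list stays empty, so the pod no longer depends on a secret that may not exist on the cluster.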
* Allow the env var ADD_FSGROUP to be set to false to suppress the automatic addition of fsGroup: 100 to the pod's security context.
This addresses issue #4617.
* Adding note in README regarding ADD_FSGROUP.
This commit fixes the event filtering check, so it doesn't crash when
the Pod name doesn't contain a dash ("-").
Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
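A sketch of a dash-safe name check, under the assumption that the filter derives an owner name from a Pod name of the form `<owner>-<suffix>`. That naming pattern and the function name `owner_name_from_pod` are assumptions made for illustration; the actual check lives in the Go controller.

```python
def owner_name_from_pod(pod_name):
    """Return the part before the last dash, or None if there is no dash."""
    base, sep, _suffix = pod_name.rpartition("-")
    if not sep:
        # No dash in the name: report "no owner" instead of failing.
        return None
    return base
```

Handling the no-dash case explicitly avoids slicing with an invalid index, which is the kind of failure the fix above guards against.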
* Fix docker builds of notebook and tensorboard controller
* The notebook-controllers and tensorboard-controllers now depend on
the go package components/common
* We need to rewrite the Dockerfiles so that the context is now
${KUBEFLOW_REPO}/common, so that components/common can be included
in the context and copied in the Dockerfile
* Create skaffold configs to make it easier to do remote builds with Kaniko
* The skaffold configs are currently written assuming the kubeflow-ci cluster
is used to build the images. This could be generalized in the future.
* Remove the code to build the notebook-controller with GCB; we can just
use skaffold and kaniko to do efficient remote builds.
* Related to #4582 - Jupyter image doesn't build.
* Fix docker build rule.
* The jupyter docker image isn't building because it now depends on code
in components/common
* To make this work we need to configure it as a multi module package
and modify go.mod to redirect to a local path.
* Ref: https://github.com/golang/go/wiki/Modules#when-should-i-use-the-replace-directive
* Replaces PR #4583
Related to #4582 - Jupyter image doesn't build.
* Delete all the Tekton pipelines and scripts for continuous delivery
of Kubeflow applications because they are moving into kubeflow/testing
* kubeflow/testing#551 is the PR moving the code into kubeflow/testing
Related to: kubeflow/testing#544 redo how we use kustomize and Tekton
to parameterize the pipelines
* Migrate to kustomize3: Phase 1. Update kustomization.yaml
* Migrate to kustomize3: Phase 2: Update kustomize.go
- Update kustomize.go to match new package structure.
- Update module dependencies.
* Migrate to kustomize3: Phase 3: Implements code review
- As per request, revert kustomization.yaml back to deprecated syntax.
- As per request, revert kustomize.go to use deprecated .Bases field.
- Note: patchesStrategicMerge: will be turned into a deprecated field pretty soon.
- Rerun go mod tidy
* Migrate to kustomize3: Phase 4: Activate legacy order transformer
* Create a culler as a package
Helper functions for culling resources. Assumes that Istio is
installed on the system and queries Prometheus to get metrics,
specifically requests/{configurable time}.
If the resource should be culled, then it should be done by setting an
annotation. This way the UIs can also show that the Resource is stopping
and also easily stop a resource by making a PATCH request.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* Culling logic enhancements
Add the necessary env vars. Culling won't happen by default. To enable
it, the user needs to set ENABLE_CULLING=true.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* Misc fixes in logging and comment cleanup
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* Fix typo
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* Add Notebooks specific culling
Query the /api/status endpoint of each Server
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
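The Notebook-specific check above could look roughly like the sketch below. It assumes the /api/status response is JSON with a `last_activity` timestamp formatted like `2020-01-01T09:00:00.000000Z`, and it works on an already fetched response body so that no network access is needed. The function name `is_idle` and the idle threshold are hypothetical; the real culler is written in Go.

```python
import json
from datetime import datetime, timedelta, timezone


def is_idle(status_body, idle_after, now=None):
    """Decide from an /api/status body whether the server has been idle too long."""
    status = json.loads(status_body)
    # Assumption: timestamps are UTC with microseconds and a trailing "Z".
    last_activity = datetime.strptime(
        status["last_activity"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - last_activity > idle_after
```

A caller would fetch the body from each Notebook Server and stop the ones for which this returns True.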
* Remove the generic culling logic
We need to discuss if it would make sense to have this logic as a go
library, or use knative.
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* Add unit tests
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* Remove unused code
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* Review changes #1
* rename `getEnvDef` to `getEnvDefault`
* Add a comment to describe how the STOP_ANNOTATION gets used
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* Make cluster domain configurable
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
* added details to NotebookCondition to keep track of notebook container status changes
* update notebook controller image
* fix conditions update
* small fix
* temporary changes to debug
* temporarily remove delete step from workflow for debugging
* temporarily merging kfctl-test and kfctl-go-test for debugging
* debugging
* undo the mistake
* debugging
* debugging tests
* merged kfctl-test and kfctl-go-test
* remove wait-for-kubeflow
* merged with master
* remove test delete step for debugging
* small fix
* update jupyter test component
* update condition test for jupyter component
* revert back deleting step
* revert back change in kfctl.sh
* added some temporary change to debug jupyter-test
* revert back temp changes
* profile and Istio integration
* make profile manage Istio gateway
* add README.md
* make notebooks use gateway in kubeflow namespace
* gateway format to ns/name; add watch for istio ServiceRoleBinding
* Support setting auth header format via parameter
* update README
* update README
* update readme; resolve comments
* added ReadyReplicas status to notebook-controller
* fixed issues related to updating the notebook status
* fixed a problem in updating Notebook's status
* applied cr comments
* small change
* small formatting change
* Fix Python code styles based on PEP 8 and flake8
* More style fixes to Python code
* Update python code styles based on what's provided in .style.yapf
* Sync with master and update styles
* Sync with master
* More Python style fixes
* Changes per code review
* Sync with master and update the remaining files
* Add a .flake8 config file for future reference