612 Commits

Author SHA1 Message Date
Yuki Iwai fe7a35dffa
tenzen-y steps down from Katib approver role (#2561)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2025-07-28 13:27:49 +00:00
dependabot[bot] dd107108b5
Bump golang.org/x/crypto from 0.31.0 to 0.35.0 (#2543)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.31.0 to 0.35.0.
- [Commits](https://github.com/golang/crypto/compare/v0.31.0...v0.35.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.35.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-16 20:08:39 +00:00
Yuki Iwai 8e887b8719
chore: Upgrade Go version to 1.23 (#2526)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2025-07-16 19:17:38 +00:00
Andrey Velichkevich 5d70808886
feat(docs): Guide to report security vulnerabilities (#2556)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-07-15 16:29:38 +00:00
Andrey Velichkevich ba2cf7d1ec
chore(docs): Add OpenSSF Badge (#2555)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-07-13 22:06:21 +00:00
Hezhi (Helen) Xie 73b8c5c029
[GSoC] Add e2e test for `tune` api with LLM hyperparameter optimization (#2420)
* add e2e test for tune api

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* upgrade training-operator sdk

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* specify the version of training operator sdk

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix num_labels error and update the version of training operator controller

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check the version of training operator

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* debug

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check import path of HuggingFaceModelParams

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update the version of training operator sdk

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update the name of experiment

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add step of checking pod

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check the logs of pod

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add check

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check reason for imagepullbackoff

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* revert timeout limit

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* extend timeout limit

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update training operator sdk version

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check the logs of pod

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update the function of getting logs

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add the step of describing pod

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check disk space

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* change work directory

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* change work directory

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* increase timeout limit

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check the logs of controller and events

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* change work directory

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* change work directory

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* change work directory

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check the logs of kubelet

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check the logs of kubelet

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* increase cpu

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check the logs of training operator

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check the use of resources

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check the logs of container 'pytorch' and 'storage_initializer'

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix error of checking use of resources

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add other checks to find the error reason

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* set 'storage_config'

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* reduce the number of tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* Check container runtime logs

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* set the driver of minikube as docker

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* set the driver of minikube to none

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check logs of pod

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check memory usage

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* increase 'termination_grace_period_seconds' in podspec

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix annotations error

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* restart docker

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete restarting docker

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* use original docker data directory

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update installation of Katib SDK with extra requires

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* test trainer image built with cpu

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add action of free up disk space (including move docker data directory)

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete unnecessary checks and update the part of fetching pod description and logs

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete fetching pod logs

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add blank line at the end of free-up-disk-space yaml file

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update experiment name

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update test function name to be consistent with experiment name

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* move import statements inside the function

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* apply pprint for the logging output

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update experiment names

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix the sequence of arguments in 'trial_template'

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* test example in user guide

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix access token error

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix the error of setup

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix the error of setup

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* reverse back

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

---------

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
2025-06-26 14:13:16 +00:00
dependabot[bot] 5cd9592335
Bump brace-expansion in /pkg/ui/v1beta1/frontend (#2551)
Bumps  and [brace-expansion](https://github.com/juliangruber/brace-expansion). These dependencies needed to be updated together.

Updates `brace-expansion` from 2.0.1 to 2.0.2
- [Release notes](https://github.com/juliangruber/brace-expansion/releases)
- [Commits](https://github.com/juliangruber/brace-expansion/compare/v2.0.1...v2.0.2)

Updates `brace-expansion` from 1.1.11 to 2.0.2
- [Release notes](https://github.com/juliangruber/brace-expansion/releases)
- [Commits](https://github.com/juliangruber/brace-expansion/compare/v2.0.1...v2.0.2)

---
updated-dependencies:
- dependency-name: brace-expansion
  dependency-version: 2.0.2
  dependency-type: indirect
- dependency-name: brace-expansion
  dependency-version: 2.0.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-12 04:40:51 +00:00
Vikas Saxena 9421f2322b
New fixing kustomize5 warning (#2549)
* ran kustomize edit fix for katib-cert-manager

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* ran kustomize edit fix for katib-external-db

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* fixed up comments

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* fixed build error with katib-cert-manager

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* fixed up comments in katib-external-db

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* fixed warnings in katib-leader-election

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* fixed warnings in katib-openshift

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* fixed warnings in katib-standalone-postgres

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* fixed warnings in katib-with-kubeflow

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* fixed up comments

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* fixing diffs errors in the updated code

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

---------

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>
2025-05-13 04:31:20 +00:00
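The commit above reports running `kustomize edit fix` on each overlay to silence Kustomize 5 deprecation warnings. As a rough illustration of the kind of rewrite involved, here is a minimal Python sketch on a kustomization loaded as a dict. It assumes only two migrations (`bases` folded into `resources`, and `patchesStrategicMerge` file paths turned into `patches` entries with a `path` key); the real tool handles more fields, and `fix_kustomization` is a hypothetical helper, not part of Katib or Kustomize.

```python
def fix_kustomization(kustomization: dict) -> dict:
    """Return a copy of a kustomization with two deprecated fields migrated.

    Assumed subset of `kustomize edit fix` behaviour:
      * entries under `bases` are appended to `resources`
      * file paths under `patchesStrategicMerge` become `patches` items {"path": ...}
    Inline (non-file) strategic-merge patches are not handled here.
    """
    fixed = dict(kustomization)  # shallow copy so the caller's dict is untouched

    bases = fixed.pop("bases", [])
    if bases:
        fixed["resources"] = list(fixed.get("resources", [])) + list(bases)

    strategic_merge = fixed.pop("patchesStrategicMerge", [])
    if strategic_merge:
        fixed["patches"] = list(fixed.get("patches", [])) + [
            {"path": path} for path in strategic_merge
        ]
    return fixed
```

Returning a new dict rather than editing in place makes it easy to diff the old and new kustomization before writing the file back.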
Ayush Gupta 1ebd5e4453
Fix Istio sidecar injection by moving from annotations to labels (#2527)
* Fix Istio sidecar injection by moving from annotations to labels

Signed-off-by: madmecodes <ayushguptadev1@gmail.com>

* Update Istio sidecar injection from annotations to labels across the codebase
Replace annotations with labels for Istio sidecar injection according to Istio recommendations. Update conformance tests, examples, constants, composers, and utilities to use the new label-based approach consistently.

Signed-off-by: madmecodes <ayushguptadev1@gmail.com>

* fix: Update SuggestionLabels function and composer implementation for Istio label injection

Signed-off-by: madmecodes <ayushguptadev1@gmail.com>

* Fix linting issues in mpi-job-horovod.py

Signed-off-by: madmecodes <ayushguptadev1@gmail.com>

* update: function moved from annotations to labels

Signed-off-by: madmecodes <ayushguptadev1@gmail.com>

---------

Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
2025-05-09 17:52:41 +00:00
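Per the commit message, the Istio sidecar injection flag is now set as a label instead of an annotation, following Istio's recommendation. The sketch below shows that move on plain pod metadata dicts in Python. It is only an illustration under that assumption: `move_istio_inject_to_labels` is a hypothetical helper, not the code in Katib's composers or utilities.

```python
ISTIO_INJECT_KEY = "sidecar.istio.io/inject"


def move_istio_inject_to_labels(metadata: dict) -> dict:
    """Return a copy of pod metadata with the Istio injection flag stored as a label."""
    annotations = dict(metadata.get("annotations") or {})
    labels = dict(metadata.get("labels") or {})

    if ISTIO_INJECT_KEY in annotations:
        value = annotations.pop(ISTIO_INJECT_KEY)
        # An explicit label wins; otherwise carry the annotation's value over.
        labels.setdefault(ISTIO_INJECT_KEY, value)

    result = dict(metadata)
    result["labels"] = labels
    if annotations:
        result["annotations"] = annotations
    else:
        result.pop("annotations", None)  # drop the now-empty annotations map
    return result
```

For example, metadata carrying `{"sidecar.istio.io/inject": "false"}` as an annotation comes back with the same key and value under `labels` and no `annotations` entry.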
Harshvir Potpose c9513c633d
Fix PSS restricted warnings (#2528)
* fix pss warnings

Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>

* fix mysql

Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>

---------

Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
2025-04-29 16:33:02 +00:00
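The commit does not list which fields were changed, but Pod Security Standards "restricted" warnings usually concern a handful of container `securityContext` settings. A minimal Python checker for those settings might look like the sketch below. It assumes everything is set at container level (the standard also accepts `runAsNonRoot` and `seccompProfile` at pod level, which this sketch ignores), and `restricted_violations` is a hypothetical helper.

```python
def restricted_violations(security_context: dict) -> list[str]:
    """List container securityContext settings that would trip the PSS 'restricted' profile.

    Simplification: only the container-level securityContext is inspected.
    """
    problems = []
    if security_context.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation must be false")
    if security_context.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot must be true")

    caps = security_context.get("capabilities") or {}
    if "ALL" not in (caps.get("drop") or []):
        problems.append('capabilities.drop must include "ALL"')
    extra = set(caps.get("add") or []) - {"NET_BIND_SERVICE"}
    if extra:
        problems.append(f"capabilities.add may only contain NET_BIND_SERVICE, got {sorted(extra)}")

    seccomp_type = (security_context.get("seccompProfile") or {}).get("type")
    if seccomp_type not in ("RuntimeDefault", "Localhost"):
        problems.append("seccompProfile.type must be RuntimeDefault or Localhost")
    return problems
```

An empty list means the container-level settings checked here are compatible with the restricted profile.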
M!l!nd dd4acfc2ce
feat: add `CITATION.cff` file (#2547)
* feat: add `CITATION.cff` file

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

* Update CITATION.cff

Co-authored-by: Shao Wang <2690692950@qq.com>
Signed-off-by: M!l!nd <99114125+milinddethe15@users.noreply.github.com>

---------

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
Signed-off-by: M!l!nd <99114125+milinddethe15@users.noreply.github.com>
Co-authored-by: Shao Wang <2690692950@qq.com>
2025-04-18 16:47:24 +00:00
dependabot[bot] 349b571541
Bump github.com/golang-jwt/jwt/v4 from 4.5.1 to 4.5.2 (#2533)
Bumps [github.com/golang-jwt/jwt/v4](https://github.com/golang-jwt/jwt) from 4.5.1 to 4.5.2.
- [Release notes](https://github.com/golang-jwt/jwt/releases)
- [Changelog](https://github.com/golang-jwt/jwt/blob/main/VERSION_HISTORY.md)
- [Commits](https://github.com/golang-jwt/jwt/compare/v4.5.1...v4.5.2)

---
updated-dependencies:
- dependency-name: github.com/golang-jwt/jwt/v4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-15 12:36:24 +00:00
Helber Belmiro 8e965f11d8
chore(test): Removed the no longer needed trigger-rerun-test.yaml (#2540)
Signed-off-by: Helber Belmiro <helber.belmiro@gmail.com>
2025-04-09 16:41:20 +00:00
Andrey Velichkevich 6578306795
chore(docs): Add Changelog Katib v0.18.0 (#2537)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-03-29 21:48:31 +00:00
saileshd1402 54764d6aa4
Revert GHCR changes for Notebook examples (#2536)
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
2025-03-24 22:06:03 +00:00
Mahdi Khashan db4b68bf56
[feature] move manifest image references to ghcr (#2529)
* move to ghcr

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* move images to ghcr

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* manifests

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* change registry in all path

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update script

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* slight fix

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

---------

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>
Signed-off-by: Mahdi Khashan <58775404+mahdikhashan@users.noreply.github.com>
2025-03-24 17:11:50 +00:00
Mahdi Khashan 1f76bb3bbf
[feature] migrate docker images to ghcr (#2520)
* update custom action

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* define token as input

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* clean up meta job

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* change build-and-publish-imageg.yaml

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* remove secret from workflow call

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* remove docker credentials from publish* images

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* revert meta step changes

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* revert changes

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* add dockerhub as a job

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* revert secrets

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* revert docker secrets

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* revert docker secrets

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* consolidate/merge registries

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix inputs

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* revert docker path name

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

---------

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>
2025-03-18 19:12:14 +00:00
dependabot[bot] 4884253067
Bump axios from 1.7.9 to 1.8.3 in /pkg/ui/v1beta1/frontend (#2524)
Bumps [axios](https://github.com/axios/axios) from 1.7.9 to 1.8.3.
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md)
- [Commits](https://github.com/axios/axios/compare/v1.7.9...v1.8.3)

---
updated-dependencies:
- dependency-name: axios
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-12 22:35:41 +00:00
dependabot[bot] 9e430ceaf5
Bump @babel/helpers from 7.25.0 to 7.26.10 in /pkg/ui/v1beta1/frontend (#2523)
Bumps [@babel/helpers](https://github.com/babel/babel/tree/HEAD/packages/babel-helpers) from 7.25.0 to 7.26.10.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.26.10/packages/babel-helpers)

---
updated-dependencies:
- dependency-name: "@babel/helpers"
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-12 22:31:34 +00:00
Gary Miguel c18035e104
Support old-style TensorFlow events (tensorboard) (#2467)
* Support old-style TensorFlow events (tensorboard)

Fixes: https://github.com/kubeflow/katib/issues/2466
Signed-off-by: Gary Miguel <garymm@garymm.org>

* format

Signed-off-by: Gary Miguel <garymm@garymm.org>

* test

Signed-off-by: Gary Miguel <garymm@garymm.org>

* don't continue loops

Signed-off-by: Gary Miguel <garymm@garymm.org>

* format

Signed-off-by: Gary Miguel <garymm@garymm.org>

---------

Signed-off-by: Gary Miguel <garymm@garymm.org>
2025-02-15 00:59:37 +00:00
Anish Asthana 3c88967299
Add 'KEP Usage' KEP and template link (#2509)
Signed-off-by: Anish Asthana <anishasthana1@gmail.com>
2025-02-14 23:07:37 +00:00
Andrey Velichkevich 338a5c107b
Add Changelog for Katib v0.18.0-rc.0 (#2515)
* Add Changelog for Katib v0.18.0-rc.0

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add sections for GSoC projects

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update CHANGELOG.md

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2025-02-13 19:01:36 +00:00
Andrey Velichkevich 302020c29e
Bump Katib Python SDK to 0.18.0rc0 version (#2514)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-02-13 18:05:36 +00:00
Andrey Velichkevich 7b4652058d
[SDK] Support PyTorchJob as a Trial Worker (#2512)
* [SDK] Support PyTorchJob as Trial Worker

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix pod spec for Job

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Set default restart_policy to Never

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix primary_container_name for PyTorchJob

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add unit tests for PyTorchJob as Trial

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add e2e test for PyTorchJob as Trial

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Bump kubeflow-training SDK

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Deploy Training Operator with server side apply

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Decrease CPUs for E2E

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Install Training Operator for tune workflow

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix comments

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-02-13 11:10:36 +00:00
Shashank Mittal 6389cbadf1
[GSOC] `optuna` suggestion service logic update (#2446)
* unit test fixed

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* Update pkg/suggestion/v1beta1/hyperopt/base_service.py

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* comment fixed

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* initial logic update

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* added unit and e2e tests for optuna suggestion service update

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* refactored code

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* added parameter for logUniform and minor changes

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* fix

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

---------

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2025-02-10 16:18:06 +00:00
Shao Wang c2b5b52762
fix(webhook): fix validation message in experiment webhook (#2507)
Signed-off-by: Electronic-Waste <2690692950@qq.com>
2025-02-05 03:09:37 +00:00
Aydan Pirani 4d2a23073a
Set experiment names at a max of 40 characters. (#2468)
Signed-off-by: Aydan Pirani <aydanpirani@gmail.com>
2025-02-04 17:05:36 +00:00
Mahdi Khashan 3e736dc54d
[CI] optimize katib ui dockerfile (#2505)
* fix flakiness

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix flakiness 2

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix flakiness 3

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* use alpine for first stage

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* use alpine git

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* no security audit

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* force npm ci

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

---------

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>
2025-02-01 20:42:33 +00:00
Shashank Mittal bf034636fa
[GSOC] `hyperopt` suggestion service logic update (#2412)
* resolved merge conflicts

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* fix

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* DISTRIBUTION_UNKNOWN enum set to 0 in gRPC api

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* convert parameter method fix

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

validation fix

add e2e tests for hyperopt

added e2e test to workflow

* convert feasibleSpace func updated

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* renamed DISTRIBUTION_UNKNOWN to DISTRIBUTION_UNSPECIFIED

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* fix

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* added more test cases for hyperopt distributions

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* added support for NORMAL and LOG_NORMAL in hyperopt suggestion service

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* added e2e tests for NORMAL and LOG_NORMAL

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

sigma calculation fixed

fix

parse new arguments to mnist.py

* hyperopt-suggestion example update

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* updated logic for log distributions

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* updated logic for log distributions

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* e2e test fixed

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* added support for parameter distributions for Parameter type INT

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* unit test fixed

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* Update pkg/suggestion/v1beta1/hyperopt/base_service.py

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* comment fixed

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* added unit tests for INT parameter type

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* completed param unit test cases

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* handled default case for normal distributions when min or max are not specified

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* fixed validation logic for min and max

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* removed unnecessary test params

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* fixes

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* added comments

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* fix

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* set default distribution as uniform

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* line omit

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* removed empty spaces from yaml files

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

---------

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2025-01-30 21:26:52 +00:00
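The hyperopt commit adds NORMAL and LOG_NORMAL parameter distributions and mentions a sigma calculation fix, but the log does not give the formulas. The Python sketch below shows one common way to derive normal-distribution parameters from a feasible `[min, max]` range. It is an assumption, not the Katib implementation: mu is taken as the midpoint and sigma as one sixth of the range, so that about 99.7% of samples land inside the range. For LOG_NORMAL the same rule is applied in log space, matching hyperopt's `hp.lognormal(label, mu, sigma)`, which samples `exp(normal(mu, sigma))`. `normal_params` is a hypothetical helper, and it does not cover the "min or max not specified" default case mentioned in the commit.

```python
import math


def normal_params(min_value: float, max_value: float, log_scale: bool = False) -> tuple[float, float]:
    """Return (mu, sigma) so that mu +/- 3*sigma spans [min_value, max_value].

    With log_scale=True the range is mapped to [log(min), log(max)] first,
    which is the parameterisation a log-normal sampler expects.
    """
    if min_value >= max_value:
        raise ValueError("min must be smaller than max")
    if log_scale:
        if min_value <= 0:
            raise ValueError("LOG_NORMAL needs a strictly positive min")
        low, high = math.log(min_value), math.log(max_value)
    else:
        low, high = min_value, max_value
    mu = (low + high) / 2
    sigma = (high - low) / 6
    return mu, sigma
```

For instance, `normal_params(1.0, 7.0)` yields `(4.0, 1.0)`.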
Hezhi (Helen) Xie 741238d712
Install typing-extensions v4.10.0 to fix Python test error (#2504)
* update the version of typing-extensions

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update comment

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

---------

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
2025-01-30 15:58:53 +00:00
Shao Wang 28e466e1b8
[GSoC] Provide a PyTorch MNIST Example for Push-based Metrics Collection (#2437)
Signed-off-by: Electronic-Waste <2690692950@qq.com>
2025-01-29 10:23:51 +00:00
Mahdi Khashan 09523cdfad
[SDK] improve PVC creation name error (#2496)
* improve pvc name error message by failing early and clear message with correct name example

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix lint

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix lint

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* raise value error for wrong name format by reconciliation

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* revert created utils

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve test case name

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve value error message

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve code flow

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

---------

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>
2025-01-28 00:32:50 +00:00
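The SDK commit above makes PVC creation fail early with a clear message and a correct name example. PersistentVolumeClaim names must be valid Kubernetes DNS-1123 subdomains (at most 253 characters; lowercase alphanumerics, `-` or `.`; starting and ending with an alphanumeric). A minimal Python sketch of such a check follows. `check_pvc_name` is a hypothetical helper, and the exact wording of the SDK's error is not reproduced here.

```python
import re

# Kubernetes DNS-1123 subdomain: dot-separated labels of [a-z0-9-],
# each starting and ending with an alphanumeric character.
_DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def check_pvc_name(name: str) -> str:
    """Raise ValueError before any API call if `name` is not a valid PVC name."""
    if len(name) > 253 or not _DNS1123_SUBDOMAIN.fullmatch(name):
        raise ValueError(
            f"invalid PVC name {name!r}: use at most 253 lowercase alphanumeric "
            "characters, '-' or '.', starting and ending with an alphanumeric "
            "character (for example 'tune-experiment-pvc')"
        )
    return name
```

Checking locally before calling the Kubernetes API gives the user an actionable message instead of a raw API validation error.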
Du Xinmin 0133983d4a
Sort experiments by descending creation date by default in katib-ui (#2498)
* Sort experiments by descending creation date by default in katib-ui

Signed-off-by: Xinmin Du <2812493086@qq.com>

* fix: Update "renders every Experiment name into the table" test to not check order

Signed-off-by: Xinmin Du <2812493086@qq.com>

* fix: Update "renders every Experiment name into the table" test in order of startTime

Signed-off-by: Xinmin Du <2812493086@qq.com>

---------

Signed-off-by: Xinmin Du <2812493086@qq.com>
2025-01-27 23:14:51 +00:00
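The Katib UI change above sorts the experiment list by descending creation date by default. The same ordering is sketched here in Python on a list of dicts. The `creationTimestamp` field name and its RFC 3339 `...Z` format are assumptions about the data shape, not the UI's actual model, and `sort_newest_first` is a hypothetical helper.

```python
from datetime import datetime


def sort_newest_first(experiments: list[dict]) -> list[dict]:
    """Return experiments ordered from most recently created to oldest."""

    def created_at(experiment: dict) -> datetime:
        # Kubernetes-style timestamps end in "Z"; fromisoformat only accepts
        # that suffix from Python 3.11 on, so normalise it to "+00:00".
        return datetime.fromisoformat(experiment["creationTimestamp"].replace("Z", "+00:00"))

    return sorted(experiments, key=created_at, reverse=True)
```

Parsing to `datetime` rather than comparing raw strings keeps the ordering correct even if timestamps arrive with different offsets.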
Hezhi (Helen) Xie 40e1e651f2
[GSoC] Add unit tests for `tune` API (#2423)
* add unit tests for tune api

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update unit tests and fix api errors

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* test

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* test

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update unit tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* undo changes to Makefile

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete debug code

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update unit test

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update the version of training operator

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* adjust 'list_namespaced_persistent_volume_claim' to be called with keyword argument

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* create constant for namespace when check pvc creation error

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add type check for 'trainer_parameters'

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update test names

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add verification for key Experiment information & add 'kubeflow-training[huggingface]' into dependencies

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add verification for objective metric name

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete unnecessary changes

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* unify objective function

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* unify objective function

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

---------

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
2025-01-24 20:38:21 +00:00
Hezhi (Helen) Xie 2567939fc9
[SDK] Update `tune` API (#2497)
* fix tune api error

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete check for

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

---------

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
2025-01-22 15:19:27 +00:00
dependabot[bot] f46cee565b
Bump axios from 1.7.2 to 1.7.9 in /pkg/ui/v1beta1/frontend (#2486)
Bumps [axios](https://github.com/axios/axios) from 1.7.2 to 1.7.9.
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md)
- [Commits](https://github.com/axios/axios/compare/v1.7.2...v1.7.9)

---
updated-dependencies:
- dependency-name: axios
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-22 13:53:52 +00:00
dependabot[bot] d87b41f4b0
Bump express from 4.19.2 to 4.21.2 in /pkg/ui/v1beta1/frontend (#2477)
Bumps [express](https://github.com/expressjs/express) from 4.19.2 to 4.21.2.
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.2/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.19.2...4.21.2)

---
updated-dependencies:
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-22 13:14:43 +00:00
royliang aa04cf4335
Update MutatingWebhookConfiguration: Switch from objectSelector to AdmissionWebhookMatchConditions (#2241)
Signed-off-by: lianghao208 <roylizard3@gmail.com>
2025-01-22 12:34:49 +00:00
Caio Almeida 59af784f50
chore: supporting the listen-address parameter on db-manager (#2465)
Signed-off-by: Caio Almeida <caio.f.r.amd@gmail.com>
2025-01-22 00:03:41 +00:00
Tsz Lung Chung 224aa9d7a0
fix(api): resolve all api violation exceptions in katib api (#2482)
Signed-off-by: truc0 <22969604+truc0@users.noreply.github.com>
2025-01-21 14:23:11 +00:00
dependabot[bot] 93bee4dc25
Bump golang.org/x/net from 0.27.0 to 0.33.0 (#2476)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.27.0 to 0.33.0.
- [Commits](https://github.com/golang/net/compare/v0.27.0...v0.33.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-15 05:27:12 +00:00
Du Xinmin 0cab624e6e
Upgrade klog to v2 (#2470)
* Upgrade klog dependency to v2

Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com>
Signed-off-by: Xinmin Du <2812493086@qq.com>

* fix: fix conflict with k8s update

Signed-off-by: Xinmin Du <2812493086@qq.com>

---------

Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com>
Signed-off-by: Xinmin Du <2812493086@qq.com>
Signed-off-by: Du Xinmin <dux.m.in@sjtu.edu.cn>
Co-authored-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com>
2025-01-15 05:22:12 +00:00
dependabot[bot] 1412c56059
Bump golang.org/x/crypto from 0.21.0 to 0.31.0 (#2464)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.21.0 to 0.31.0.
- [Commits](https://github.com/golang/crypto/compare/v0.21.0...v0.31.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-15 01:55:22 +00:00
Du Xinmin e5482959fc
Ignore cache exporting errors in the image building workflows (#2487)
Signed-off-by: Xinmin Du <2812493086@qq.com>
2025-01-14 23:13:08 +00:00
Shao Wang 3b554aaf64
Upgrade grpcio version to v1.64.1 (#2483)
Signed-off-by: Electronic-Waste <2690692950@qq.com>
2025-01-14 20:35:42 +00:00
Shao Wang bf4a0b2c41
Upgrade Kubernetes to v1.31.3 (#2478)
* chore(ci): add k8s version 1.31.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore(Makefile): upgrade envtest version to 1.31 & setup-envtest to release-0.19.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: update k8s related package in go.mod

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: make generate.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(test): add SkipNameValidation option to test frame.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* refactor(grpc): remove deprecated grpc.Dial implementation.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(dependency): remove dependency on k8s v1.28

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: add type assertion to ptr.To

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2025-01-14 11:06:08 +00:00
Shao Wang eb8af4d502
fix(trial): use propagated gomega to improve debuggability. (#2432)
Signed-off-by: Electronic-Waste <2690692950@qq.com>
2025-01-10 18:57:44 +00:00
Shao Wang 9889b33599
Upgrade Kubernetes to v1.30.7 (#2463)
* chore: update go.mod & go mod tidy.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: replace source.Kind and EnqueueRequestForXxx with typed func call.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: update admission.Decoder in webhook.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: update Makefile.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: update codegen script.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: execute update-codegen.sh.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: update openapigen & generate new openapi definitions.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: fix typo error.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: update k8s version in CI.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(codegen): output CODEGEN_PKG.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(codegen): move shell check annotation.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(ci): change k8s version in go test to 1.30.0.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: remove toolchain declaration in go.mod

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: remove codegen dependency in openapigen.sh.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: fix bugs in recursive dir detection.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: remove a blank line.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: remove klog/v2

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore(codegen): add three dots in the comment.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(codegen): fix package dependency on k8s.io/code-generator.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore(Makefile): add go-mod-download.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2025-01-10 18:46:06 +00:00
Mahdi Khashan 9531372530
[DOCS] move llm hyperparameter optimisation design image to the proposal directory and rename it (#2472)
- remove redundant folder

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>
2025-01-08 17:43:21 +00:00
Shao Wang 336396436a
fix(ui): update None Collector with Push Collector. (#2418)
* fix(ui): update None Collector with Push Collector.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(ui): replace some remaining None MC.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-12-04 08:28:00 +00:00
Tariq Hasan 5212949244
fix: Resolve errors in e2e tests for cypress in Katib UI (#2384)
Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>
2024-12-03 14:02:59 +00:00
Shao Wang fce751a90e
doc(example): fix the broken link. (#2433)
* fix: fix the broken link.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(doc): update guidance in multi-users pipelines setup.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-12-02 13:32:58 +00:00
Shao Wang 3e3e0f8cdc
fix: remove remaining MXNet dependency. (#2456)
Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-12-02 13:23:57 +00:00
Andrey Velichkevich dc3398dbd4
Remove Dropout layer from ENAS Trial container to fix E2E tests (#2455)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-12-02 08:13:57 +00:00
dependabot[bot] 2b41ae62ab
Bump github.com/golang-jwt/jwt/v4 from 4.5.0 to 4.5.1 (#2449)
Bumps [github.com/golang-jwt/jwt/v4](https://github.com/golang-jwt/jwt) from 4.5.0 to 4.5.1.
- [Release notes](https://github.com/golang-jwt/jwt/releases)
- [Changelog](https://github.com/golang-jwt/jwt/blob/main/VERSION_HISTORY.md)
- [Commits](https://github.com/golang-jwt/jwt/compare/v4.5.0...v4.5.1)

---
updated-dependencies:
- dependency-name: github.com/golang-jwt/jwt/v4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-05 00:45:09 +00:00
Gonçalo Montalvão Marques 706a6f2190
docs: remove katib workflow (#2443)
Signed-off-by: Gonçalo Montalvão Marques <9379664+gonmmarques@users.noreply.github.com>
2024-10-15 15:14:18 +00:00
Andrey Velichkevich 0bc143ad1a
Promote @Electronic-Waste and @helenxie-bit as Katib reviewers (#2439)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-10-11 19:07:12 +00:00
Andrey Velichkevich 719ae382c1
Update README and out-of-date docs (#2438)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-10-10 19:50:10 +00:00
Shao Wang 867c40a1b0
[GSoC] Compatibility Changes in Trial Controller (#2394)
* chore: add condition branch in requeue logic.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add ReportObservationLog in katib_manager_util.go.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add ReportTrialUnavailableMetrics func.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: insert unavailable value into Katib DB.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: fix lint error.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: add nil condition judgement.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: add nil condition judgement in trial_controller_util.go

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore(trial): delete nil check of MC kind in the Trial controller.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore(trial): init MC in newFakeTrialBatchJob to avoid nil condition in trial reconcile loop.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(trial): fix lint error.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(trial): fix lint error in controller.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(trial): add integration test for Push MC.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore(trial): retry reconciliation when reporting unavailable metrics failed.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(trial): fix EXPECT order.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(trial): fix typo error.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore(trial): add errReportMetricsFailed.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* Update pkg/controller.v1beta1/trial/trial_controller.go

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Electronic-Waste <2690692950@qq.com>

* Update pkg/controller.v1beta1/trial/trial_controller_util.go

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Electronic-Waste <2690692950@qq.com>

* Update pkg/controller.v1beta1/trial/trial_controller.go

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(trial): rename errors pkg.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(trial): update the order of UT.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(trial): use different names for UTs.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(trial): separate Push MC UTs with original UTs.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(trial): fix line error with gofmt.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(trial): reserve one UT for Push MC.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(trial): fix typo error.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(trial): make some tiny changes.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(trial): move cancel func to t.Cleanup.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(trial): use the propagated gomega instance to improve debuggability.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(trial): use gofmt to reformat code.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-09-19 07:24:28 +00:00
Hezhi (Helen) Xie bc09cfd412
[SDK] Fix types error (#2424)
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
2024-09-05 16:09:15 +00:00
Hezhi (Helen) Xie e251a07cb9
[GSoC] Update `tune` API for LLM hyperparameters optimization (#2393)
* update tune api for llm hyperparameters optimization

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* resolve conflict

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix the problem of dependency

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix the format of import statement

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* adjust the blank lines

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete the trainer to reuse it in Training Operator

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update constants

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update metrics format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update the type of  and

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update the message of 'ImportError'

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add TODO of PVC creation

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update the name of pvc

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* reuse constants from Training Operator

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* keep 'parameters' and update validation

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update for test

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* reuse 'get_container_spec' and 'get_pod_template_spec' from Training Operator

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* format with black

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix Lint error

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix Lint errors

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete types

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix e2e test error

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add TODO

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* format with max line length

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* format docstring

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add helper functions

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* run test again

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* run test again

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* run test again

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix dict substitution in training_parameters

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix typo

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* resolve conflicts and add check for case of no parameters

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix flake8 error

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update isort file to black and fix typo

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* modify the set of metrics format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update tune API

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add types.TrainerResources class

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix flake8 error

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* resolve conflict

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete properties of 'TrainerResources'

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format error

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update types

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add import of 'TrainerResources' in '__init__.py' of katib

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* revert changes and rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check pvc and pv status of katib deployments

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* check pvc and pv status of katib deployments

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* recommit changes

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update minikube version when setup

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete the code that disables formatting for the tune function

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update according to Andrey's feedback

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* add helper function in utils

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* move metrics_collector_spec back & update helper functions & add return type for helper functions

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* rerun tests

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix some typos

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* simplify the definition of 'TrainerResources'

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

---------

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
2024-09-03 11:35:14 +00:00
Shao Wang a524f33830
[SDK] fix grpc related bugs in Python SDK (#2398)
* fix: fix bugs in report_metrics.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: fix bugs in tune.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: fix bugs in get_trial_metrics.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: update .gitignore and setup.py.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: update Makefile.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* feat: add report_metrics_test.py.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: fix lint error.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* feat: add UTs for get_trial_metrics.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: update post_gen.py.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* refactor: rebase to master.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): use single katib_client.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(sdk): add TODO for import rewrite.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(sdk): fix lint error with black.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(sdk): fix lint error with isort.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(sdk): reformat import in katib_client_test.py.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-08-23 11:12:58 +00:00
Ignas Baranauskas 0e2ba6efc1
Changes isort profile to black, to be fully compatible and adds 'pkg' dir for black and flake8 (#2413)
* Change the isort profile to black, and add pkg dir for black and flake8

Signed-off-by: Ignas Baranauskas <ibaranau@redhat.com>

* Fix the formatting

Signed-off-by: Ignas Baranauskas <ibaranau@redhat.com>

* Fix flake8 lint issues

Signed-off-by: Ignas Baranauskas <ibaranau@redhat.com>

---------

Signed-off-by: Ignas Baranauskas <ibaranau@redhat.com>
2024-08-22 15:33:57 +00:00
Shashank Mittal 4964d04208
[GSOC] Add validator for feasible space distribution (#2404)
* added validator for feasible space distribution

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

validation logic fixed

added unit test

added unit test for valid distribution

requested changes made

Update pkg/webhook/v1beta1/experiment/validator/validator.go

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

fmt

* fmt fix

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

---------

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
2024-08-20 17:16:56 +00:00
Tariq Hasan abd1c428c7
Introduced error constants and replaced reflect with cmp (#2289)
* introduced error constants and replaced reflect with cmp

Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>

* fix order of mock method calls

Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>

---------

Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>
2024-08-18 18:32:53 +00:00
Shashank Mittal 2f5bda2da9
[GSOC] added Unknown distribution and convertDistribution in suggestion client (#2403)
* added Unknown distribution and convertDistribution in suggestion client

added unit tests

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

* removed custom compare func

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

---------

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
2024-08-18 16:27:54 +00:00
Shao Wang 4a385f515a
[Test] Refactor `inject_webhook_test.go` according to the Developer Guide (#2401)
* test(webhook): save current work.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* refactor(test/webhook): refactor inject_webhook_test.go.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(webhook): fix lint error.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(webhook): add UT deleted by accident.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-08-16 18:46:28 +00:00
Ignas Baranauskas e9e6e0c0b1
Enhance pre-commit hooks with flake8 and black (#2407)
* Add black formater and flake8 linter to pre-commit

Also adds the flake8 config file

Signed-off-by: Ignas Baranauskas <ibaranau@redhat.com>

* Fixes black formatting

Signed-off-by: Ignas Baranauskas <ibaranau@redhat.com>

* Fixes flake8 linting errors

Signed-off-by: Ignas Baranauskas <ibaranau@redhat.com>

---------

Signed-off-by: Ignas Baranauskas <ibaranau@redhat.com>
2024-08-16 10:13:28 +00:00
dependabot[bot] 8eb0e86385
Bump github.com/docker/docker from 26.1.4+incompatible to 26.1.5+incompatible (#2405)
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 26.1.4+incompatible to 26.1.5+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](https://github.com/docker/docker/compare/v26.1.4...v26.1.5)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-09 19:55:38 +00:00
Shao Wang b6f7cfd9a7
[SDK] test: Add e2e test for tune function. (#2399)
* fix(sdk): fix error field metrics_collector in tune function.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): Add e2e tests for tune function.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): add missing field parameters.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* refactor(test/sdk): add run-e2e-tune-api.py.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): delete tune testing code in run-e2e-experiment.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): add blank lines.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): add verbose and temporarily delete e2e-experiment test.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): add namespace_labels.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): add time.sleep(5).

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): add error output.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): build random image for tune.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): delete extra debug log.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* refactor(test/sdk): create separate workflow for tune.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): change api to API.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): change the permission of scripts.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): delete exit code & comment image pulling.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): delete image pulling phase.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): refactor workflow file to use template.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): mark experiments and trial-images as not required.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): pass tune-api param to setup-minikube.sh.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): fix err in template-e2e-test.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): add debug logs.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(sdk): reorder params and delete logs.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-08-06 17:50:39 +00:00
Shashank Mittal 51b246fa1c
[GSOC] Support for various Parameter distributions in Katib (#2334)
Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

modified feasibleSpace

refactored proposal based on comments

comparison table updated

extra heading removed
2024-07-31 08:03:05 +00:00
Shashank Mittal 6a17c3e35a
[GSoC] Added `DistributionType` to Experiment API (#2377)
Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

modified feasibleSpace

Removed Categorical from Distribution
2024-07-31 04:37:05 +00:00
dependabot[bot] 9a8c9d480f
Bump github.com/docker/docker from 24.0.9+incompatible to 26.1.4+incompatible (#2400)
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 24.0.9+incompatible to 26.1.4+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](https://github.com/docker/docker/compare/v24.0.9...v26.1.4)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-30 15:47:56 +00:00
Shashank Mittal ffc005855d
added `Distribution` field to feasibleSpace in `api.proto` (#2397)
Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
2024-07-26 02:40:55 +00:00
Hezhi Xie 2c57522758
[GSoC] Create LLM Hyperparameters Optimization API Proposal (#2333)
* create llm hyperparameters tuning api proposal

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update llm hyperparameters tuning api proposal

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update proposal

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix some typos

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update the path of image and delete parameter 'resouces_per_worker' from tune api

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete objective function and adjust the design of tune API

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* Update docs/proposals/llm-hyperparameter-optimization-api.md

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* Move 'Advanced Functionalities' to 'Non-Goals', and update 'Implementation' part

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update 'pytorch_config'

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* change the name of 'pytorch_config' to 'resources_per_trial'

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* adjust format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* adjust format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* adjust format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update implementation part and the type of 'resources_per_trial'

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update the example

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update 'resources_per_trial'& add one more option for defining objective function

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix typo errors

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* delete 'WIP' tag

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update example

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update example

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* update example

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

---------

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-07-25 14:10:54 +00:00
Shao Wang a6c37e4f3a
fix: remove the dependency of `protocmp` in `google.golang.org/protobuf/testing/protocmp`. (#2391)
Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-07-24 16:03:53 +00:00
Shao Wang a8840f26f8
[GSoC] Add New Parameter in `tune` (#2369)
* chore: add metrics_collector_config in tune function.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* rebase: rebase feat/new-param-tune to master.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add metrics collector kind list in comment.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: always pass Trial name to the training container.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: delete passing env variable logics in katib_client.py

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: passing env variable KATIB_TRIAL_NAME in the webhook of pod.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: pass env variable KATIB_TRIAL_NAME only to the primary container.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add report_metrics in post_gen.py.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: change nil error to allErrs(deleted by accident).

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: fix lint error in inject_webhook.go.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: wrap env variables passing logics into mutatePodEnv.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add unit tests for mutatePodEnv.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: delete protocmp.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-07-18 17:51:57 +00:00
Alex a3dd708541
Begin enabling pre-commit hooks (#2242)
* Begin enabling pre-commit hooks

Signed-off-by: droctothorpe <mythicalsunlight@gmail.com>

* Address PR feedback

Signed-off-by: droctothorpe <mythicalsunlight@gmail.com>

---------

Signed-off-by: droctothorpe <mythicalsunlight@gmail.com>
2024-07-18 17:04:58 +00:00
jaffe 206fe1c106
Update Instructions for Argo Workflows (#2382)
Signed-off-by: jaffe-fly <flydemailbox@163.com>
2024-07-17 15:32:57 +00:00
Ikko Eltociear Ashimine 7be8b243f6
docs: update suggestion.md (#2387)
implmentation -> implementation

Signed-off-by: Ikko Ashimine <ashimine_ikko_bp@tenso.com>
2024-07-17 14:13:57 +00:00
Andrey Velichkevich 0b4e7c1780
Add command to re-run GitHub Actions tests (#2385)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-07-15 15:38:55 +00:00
Andrey Velichkevich 33f60c8ac0
Bump Katib Python SDK to 0.17.0 version (#2379)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-07-15 15:14:55 +00:00
Andrey Velichkevich da3238d310
Add Changelog for Katib v0.17.0 (#2380)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-07-15 15:09:54 +00:00
Tariq Hasan db17214cf0
Replaced hpcloud with nxadm for tail package in Go (#2375)
Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>
2024-07-10 00:13:12 +00:00
Shao Wang 154a85b740
[GSoC] New Interface `report_metrics` in Python SDK (#2371)
* chore: add report_metrics.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: modify the code according to the first review.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add validation for metrics value & rename katib_report_metrics.py to report_metrics.py.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: update import path in __init__.py.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: delete blank line.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: update RuntimeError doc string & correct spelling error & add new line.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: delete blank in the last line.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-07-05 23:41:48 +00:00
Shao Wang f06906d338
[GSoC] KEP for Project 6: Push-based Metrics Collection for Katib (#2328)
* doc: initial commit of gsoc proposal(project 6).

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* doc: complete KEP for gsoc proposal(Project 6).

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add non-goals and examples.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add .

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add compatibility changes in trial controller.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: update architecture figure.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: update format.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: update doc after the review on June 10.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add code link and remove namespace env variable.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: modify proposal after the review on June 14.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: delete WIP label.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add timeout param into report_metrics.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: metrics_collector_config spelling.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-06-28 22:54:42 +00:00
Curtis e83628bb49
Use ErrorList for experiment validator (#2329)
Signed-off-by: Kun Chang <curtis@mail.ustc.edu.cn>
2024-06-27 11:03:11 +00:00
Andrey Velichkevich 57ed828702
Add Changelog for Katib v0.17.0-rc.1 (#2370)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-06-25 16:26:13 +00:00
Vihang Mehta 7eb73b6b19
Remove default caBundle value (#2368)
Signed-off-by: Vihang Mehta <vihang@gimletlabs.ai>
2024-06-24 14:09:09 +00:00
Andrey Velichkevich 8bbac200a8
Bump Katib Python SDK to 0.17.0rc1 version (#2365)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-06-20 18:51:00 +00:00
Tariq Hasan 99ba1d58cf
Add unit test for `create_experiment` in the `katib_client` module (#2325)
* added logger for katib_client module

Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>

* added API_VERSION as a constant

Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>

* updated the KatibClient constructor to match the TrainingClient constructor

Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>

* added test for create_experiment in katib_client

Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>

---------

Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>
2024-06-20 15:34:00 +00:00
Andrey Velichkevich 5a0b7db651
Remove code generation from release script (#2363)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-06-20 10:27:00 +00:00
Andrey Velichkevich f8b8d8d484
[SDK] Fix empty list for env variables and numpy version (#2360)
* [SDK] Fix empty list for env variables

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix numpy version in tests

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-06-18 06:57:58 +00:00
Yuki Iwai 8a342460f2
Upgrade the protobuf version to >=4.21.12,<5 (#2358)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-06-17 10:51:57 +00:00
coldWater 0d190b9437
Replace gRPC code generation tool from Znly/protoc to Buf (#2344)
* Replace gRPC code generation tool from Znly/protoc to Buf

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* del build.sh

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* cleanup

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix test

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* refine

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* rm outer yaml

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

---------

Signed-off-by: forsaken628 <forsaken628@gmail.com>
2024-06-15 15:18:33 +00:00
coldWater e6bd3e7b5b
Replace the archived github.com/golang/mock with go.uber.org/mock (#2357)
* replace gomock

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* revert

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

---------

Signed-off-by: forsaken628 <forsaken628@gmail.com>
2024-06-14 13:54:09 +00:00
coldWater b02aed8ec6
Use cache-dependency-path in actions/setup-go for CI workflow (#2355)
Signed-off-by: forsaken628 <forsaken628@gmail.com>
2024-06-14 07:06:08 +00:00
coldWater 4e4ce6f731
Fix TestReconcileBatchJob (#2350)
* update

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* update

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* update

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* update

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* cleanup

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* update

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* use gomock

Signed-off-by: forsaken628 <forsaken628@gmail.com>

---------

Signed-off-by: forsaken628 <forsaken628@gmail.com>
2024-06-14 06:41:09 +00:00
Andrey Velichkevich 7959ffd548
[SDK] Explain Python version support cycle (#2354)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-06-13 08:25:08 +00:00
coldWater d69d04e77e
Migrate KatibCertGenerator to OPA CertController (#2345)
* Migrate KatibCertGenerator to OPA CertController

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* fix

Signed-off-by: forsaken628 <forsaken628@gmail.com>

* typo

Signed-off-by: forsaken628 <forsaken628@gmail.com>

---------

Signed-off-by: forsaken628 <forsaken628@gmail.com>
2024-06-12 10:10:07 +00:00
Andrey Velichkevich 2a9ffb169b
Update Slack Invitation (#2349)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-06-11 11:18:08 +00:00
Hezhi Xie 87aec69b9f
Fix apple silicon rosetta error when building images from the source code (#2342)
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
2024-06-05 11:59:03 +00:00
Yuki Iwai 55e283ea1b
Drop Python 3.7 and Support Python 3.11 in the SDK (#2337)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-05-29 13:39:15 +00:00
Jerry-yz 328bc5ca6a
fix katib use crds token pipeline trial template guide (#2330)
Signed-off-by: Jerry-yz <yz386071268@gmail.com>
2024-05-29 09:42:16 +00:00
Andrey Velichkevich 199e8a41f5
Update GitHub template to better triage Issues (#2335)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-05-29 02:04:16 +00:00
Andrey Velichkevich a1046db880
Fix Scikit-Learn Version for Skopt Tests (#2336)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-05-29 00:15:11 +00:00
Andrey Velichkevich c4c3eb5243
Add Changelog for Katib v0.17.0-rc.0 (#2319)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-05-13 15:22:18 +00:00
Mehrshad 8c9a33a2f7
Update outdated actions (#2324)
Signed-off-by: Mehrshad <code.rezaei@gmail.com>
2024-05-07 06:20:43 +00:00
Tariq Hasan 1551ca3975
Make test fields private in Go unit tests (#2316)
Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>
2024-04-30 14:38:50 +00:00
Andrey Velichkevich af900202c6
Bump Katib Python SDK to 0.17.0rc0 Version (#2318)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-04-30 14:06:50 +00:00
Andrey Velichkevich ea46a7f2b7
Support ARM64 arch for release images (#2315)
* Support ARM arch for release images

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update Developer Doc

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-04-24 22:48:44 +00:00
dependabot[bot] 2d308b72c3
Bump golang.org/x/net from 0.19.0 to 0.23.0 (#2312)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.19.0 to 0.23.0.
- [Commits](https://github.com/golang/net/compare/v0.19.0...v0.23.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-19 13:55:49 +00:00
Yuki Iwai 21320b6d57
Upgrade Go version to v1.22 (#2309)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-04-15 12:58:51 +00:00
Yuki Iwai 025ce256a4
Drop Kubernetes v1.26, and support Kubernetes v1.29 (#2308)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-04-15 10:55:51 +00:00
Yuki Iwai 1365e473c5
Drop Kubernetes v1.25, and Support Kubernetes v1.28 (#2303)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-04-11 16:14:47 +00:00
Andrey Velichkevich 086093fed7
[SDK] Fix env per Trial parameter in tune API (#2304)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2024-04-11 07:09:47 +00:00
Shao Wang 7df05c23a5
fix: clean up UTs for file metrics collector (#2285)
* chore: replace testDir with tempDir.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: expose and compare errors.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* refactor: integrate test generation func into testCases.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* refactor: update error comparing mechanism.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: address Yuki's review comments.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-04-03 06:42:22 +00:00
Yuki Iwai 9680b8c73f
Upgrade TensorFlow version to v2.16.1 (#2282)
* Upgrade TensorFlow version to v2.16.1

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Replace deprecated ImageDataGenerator with new data augmentation approach

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-04-02 20:09:22 +00:00
Yuki Iwai 8629a3ce05
CI: Enable parallel mode for the coveralls (#2297)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-04-02 14:50:22 +00:00
Bharath K Balaji 36150bc3e9
Python SDK - Generate Name functionality for creating experiments. (#2272)
* added dco

Signed-off-by: Bharath Krishna <bharathk005@gmail.com>

* updated condition

Signed-off-by: Bharath Krishna <bharathk005@gmail.com>

* added exception to catch missing name and generateName

Signed-off-by: Bharath Krishna <bharathk005@gmail.com>

* updated experiment_name in create_experiment

Signed-off-by: Bharath Krishna <bharathk005@gmail.com>

* py sdk create_exp - added type validation

Signed-off-by: Bharath Krishna <bharathk005@gmail.com>

---------

Signed-off-by: Bharath Krishna <bharathk005@gmail.com>
2024-04-02 14:18:22 +00:00
dependabot[bot] 250e9d176f
Bump golang.org/x/net from 0.10.0 to 0.17.0 (#2233)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.10.0 to 0.17.0.
- [Commits](https://github.com/golang/net/compare/v0.10.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-25 20:04:16 +00:00
dependabot[bot] 1df32f2b24
Bump google.golang.org/grpc from 1.53.0 to 1.56.3 (#2236)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.53.0 to 1.56.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.53.0...v1.56.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-25 18:47:17 +00:00
dependabot[bot] 0a5c9e5191
Bump golang.org/x/crypto from 0.1.0 to 0.17.0 (#2249)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.1.0 to 0.17.0.
- [Commits](https://github.com/golang/crypto/compare/v0.1.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-25 16:56:17 +00:00
dependabot[bot] b3e4715c33
Bump jose from 2.0.6 to 2.0.7 in /pkg/ui/v1beta1/frontend (#2275)
Bumps [jose](https://github.com/panva/jose) from 2.0.6 to 2.0.7.
- [Release notes](https://github.com/panva/jose/releases)
- [Changelog](https://github.com/panva/jose/blob/v2.0.7/CHANGELOG.md)
- [Commits](https://github.com/panva/jose/compare/v2.0.6...v2.0.7)

---
updated-dependencies:
- dependency-name: jose
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-25 15:41:18 +00:00
dependabot[bot] ec86f23311
Bump google.golang.org/protobuf from 1.30.0 to 1.33.0 (#2284)
Bumps google.golang.org/protobuf from 1.30.0 to 1.33.0.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-25 15:34:18 +00:00
dependabot[bot] 51c9350847
Bump github.com/docker/docker from 24.0.0+incompatible to 24.0.9+incompatible (#2292)
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 24.0.0+incompatible to 24.0.9+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](https://github.com/docker/docker/compare/v24.0.0...v24.0.9)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-25 15:14:17 +00:00
dependabot[bot] ae894507c9
Bump follow-redirects from 1.15.4 to 1.15.6 in /pkg/ui/v1beta1/frontend (#2287)
Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6.
- [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
- [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6)

---
updated-dependencies:
- dependency-name: follow-redirects
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-18 23:17:35 +00:00
Yuki Iwai 6f372f6808
Upgrade Python version to 3.11 (#2278)
* Upgrade Python version to 3.11

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Upgrade the numpy version to 1.25.2

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Increase resource requests for the ENAS suggestion service

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Update pytest CI

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Prepare dedicated pytest for skopt

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-03-12 19:54:11 +00:00
Shao Wang 5837b8a90e
chore: add unit testcases for files in Text format. (#2274)
* chore: add unit testcases for files in Text format.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: adjust file layout using gofmt.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: combine test for JSON and TEXT file format.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: rename file-gen functions.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* refactor: update cmd.Diff params and log outputs.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: add more valid and invalid testcases for TEXT format.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: convert testcase name to const.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore: compact dir generation & deletion operations into funcs.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: delete constants used only once.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix: fix ci error in errorcheck.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
2024-03-12 18:13:11 +00:00
Yuki Iwai 679e6fb8f8
Upgrade PyTorch version to v2.2.1 (#2279)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-03-12 00:07:10 +00:00
Chen Pin-han 61406a5397
Fix tensor devices for DARTS Trial (#2273)
* Update architect.py

72907153+sifa1024@users.noreply.github.com

Signed-off-by: Chen Pin-Han <72907153+sifa1024@users.noreply.github.com>

* Update run_trial.py

72907153+sifa1024@users.noreply.github.com

Signed-off-by: Chen Pin-Han <72907153+sifa1024@users.noreply.github.com>

* Update architect.py

72907153+sifa1024@users.noreply.github.com

Signed-off-by: Chen Pin-Han <72907153+sifa1024@users.noreply.github.com>

---------

Signed-off-by: Chen Pin-Han <72907153+sifa1024@users.noreply.github.com>
2024-03-10 03:15:40 +00:00
Curtis a2f3fcae55
Add environment variable option to set postgres ssl mode (#2266)
Signed-off-by: Kun Chang <curtis@mail.ustc.edu.cn>
2024-03-05 19:31:07 +00:00
Yuki Iwai 03a400128a
Upgrade google/go-containerregistry/pkg/authn/k8schain (#2252)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-03-05 09:45:07 +00:00
Yuki Iwai fc858d15dd
Remove MXNet examples (#2267)
* UT: Replace MXNet example with PyTorch example

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* CI: Replace MXNet examples with PyTorch examples

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-03-04 10:45:07 +00:00
Matteo Mortari 8df3c5c838
typo fix stale.yaml (#2257)
Message `close-pr-message` was likely a wrong copy-paste from stale.

This aligns `close-` messages.
2024-02-05 14:40:17 +00:00
dependabot[bot] 19268062f1
Bump axios and wait-on in /pkg/ui/v1beta1/frontend (#2254)
Bumps [axios](https://github.com/axios/axios) to 1.6.5 and updates ancestor dependency [wait-on](https://github.com/jeffbski/wait-on). These dependencies need to be updated together.


Updates `axios` from 0.27.2 to 1.6.5
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md)
- [Commits](https://github.com/axios/axios/compare/v0.27.2...v1.6.5)

Updates `wait-on` from 7.0.1 to 7.2.0
- [Release notes](https://github.com/jeffbski/wait-on/releases)
- [Commits](https://github.com/jeffbski/wait-on/compare/v7.0.1...v7.2.0)

---
updated-dependencies:
- dependency-name: axios
  dependency-type: indirect
- dependency-name: wait-on
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-09 21:47:06 +00:00
dependabot[bot] 10f17fedfb
Bump follow-redirects from 1.14.8 to 1.15.4 in /pkg/ui/v1beta1/frontend (#2253)
Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.14.8 to 1.15.4.
- [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
- [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.14.8...v1.15.4)

---
updated-dependencies:
- dependency-name: follow-redirects
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-09 20:46:33 +00:00
Sunghyuk Kay d92c168baa
DB: Add environment variable option to skip DB table creation (#2245)
* DB: Add env to skip DB creation

* DB: Rename var name & Remove new function

* Migration -> Initialization
* Remove GetBoolEnvOrDefault

* DB: Rearrange dependencies
2024-01-04 16:21:13 +00:00
Yuki Iwai bf9a1b09e9
Add Technical and style guide to the contribution guide (#2250)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2024-01-04 14:41:12 +00:00
Yuki Iwai 75ea35cc0f
Install typing-extensions v4.6.3 for Optuna (#2251)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2024-01-04 13:32:12 +00:00
Andrey Velichkevich 4617346302
Remove legacy BO code (#2246)
2023-12-06 02:46:06 +00:00
Shi Pengcheng f4c8861c81
[SDK] Add `env` & `env_from` in client tune (#2235)
* add env & env_from spec

* unify env and env_from specs
2023-11-17 09:33:08 +00:00
Andrey Velichkevich fbe7c786e9
Add Changelog for Katib v0.16.0 (#2239)
2023-11-03 03:07:52 +00:00
Andrey Velichkevich f62e40dbd3
Bump Katib Python SDK to 0.16.0 version (#2238)
2023-11-03 03:06:52 +00:00
Andrey Velichkevich 700e64e053
Fix Optuna Validation for CMA-ES (#2240)
* Fix Optuna Validation for CMA-ES

* Fix Optuna test
2023-11-02 18:48:32 +00:00
dependabot[bot] d2e311fc03
Bump debug from 4.2.0 to 4.3.4 in /pkg/ui/v1beta1/frontend (#2230)
Bumps [debug](https://github.com/debug-js/debug) from 4.2.0 to 4.3.4.
- [Release notes](https://github.com/debug-js/debug/releases)
- [Commits](https://github.com/debug-js/debug/compare/4.2.0...4.3.4)

---
updated-dependencies:
- dependency-name: debug
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-17 17:07:56 +00:00
dependabot[bot] cf7fe2e47e
Bump @babel/traverse from 7.15.4 to 7.23.2 in /pkg/ui/v1beta1/frontend (#2234)
Bumps [@babel/traverse](https://github.com/babel/babel/tree/HEAD/packages/babel-traverse) from 7.15.4 to 7.23.2.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.23.2/packages/babel-traverse)

---
updated-dependencies:
- dependency-name: "@babel/traverse"
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-17 12:44:56 +00:00
Shi Pengcheng 50a3f4110d
[SDK] Add 'algorithm_settings' in client tune (#2227)
2023-10-05 10:22:15 +00:00
Alex 520a39701b
[SDK] Raise more human-readable name conflict exception (#2199)
Co-authored-by: andreafehrman <andrea.k.fehrman@vanderbilt.edu>
Co-authored-by: harrisonfritz <harrisonmichaelfritz@gmail.com>
2023-09-07 22:21:33 +00:00
Andrey Velichkevich e3e0aa24ae
Add Katib ROADMAP 2022/2023 (#2153)
* Add Katib ROADMAP 2022/2023

* Add multi-objective optimization

* Add Scalability Improvements

* Remove Katib CRD naming
2023-08-24 22:40:54 +00:00
Andrey Velichkevich 2843a814a6
Update Ubuntu to 22.04 for E2E Tests (#2222)
* Update Ubuntu to 22.04 for E2E Tests

* Update Ubuntu for all Tests
2023-08-24 20:06:16 +00:00
Andrey Velichkevich 373f6e6d7d
Run Stale Action Every 5th Hour (#2221)
2023-08-23 15:18:46 +00:00
Andrey Velichkevich ea27fa7fee
Add Stale GitHub Action (#2220)
2023-08-21 17:15:35 +00:00
Yuki Iwai 87a0161c2c
Use the controller-runtime logger in the cert-generator (#2219)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-18 17:05:50 +00:00
Andrey Velichkevich 1f5fb48c6e
Add Changelog for Katib v0.16.0-rc.1 (#2218)
2023-08-17 00:33:38 +00:00
Andrey Velichkevich b107b2cf4e
Add Changelog for Katib v0.16.0-rc.0 (#2204)
2023-08-16 22:31:37 +00:00
Andrey Velichkevich 2f3ffc7d23
Bump Katib Python SDK to 0.16.0rc1 version (#2217)
2023-08-16 16:07:03 +00:00
dependabot[bot] 2ae992a111
Bump d3-color and @swimlane/ngx-charts in /pkg/ui/v1beta1/frontend (#2210)
Bumps [d3-color](https://github.com/d3/d3-color) to 3.1.0 and updates ancestor dependency [@swimlane/ngx-charts](https://github.com/swimlane/ngx-charts). These dependencies need to be updated together.


Updates `d3-color` from 2.0.0 to 3.1.0
- [Release notes](https://github.com/d3/d3-color/releases)
- [Commits](https://github.com/d3/d3-color/compare/v2.0.0...v3.1.0)

Updates `@swimlane/ngx-charts` from 19.2.0 to 20.4.1
- [Changelog](https://github.com/swimlane/ngx-charts/blob/master/docs/changelog.md)
- [Commits](https://github.com/swimlane/ngx-charts/compare/19.2.0...20.4.1)

---
updated-dependencies:
- dependency-name: d3-color
  dependency-type: indirect
- dependency-name: "@swimlane/ngx-charts"
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-16 04:37:03 +00:00
Yuki Iwai 29887c13a0
Upgrade Tensorflow version to v2.13.0 (#2201)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-15 23:09:03 +00:00
Yuki Iwai c33494bc8f
Start waiting for certs to be ready before sending data to the channel (#2209)
Start waiting for certs to be ready before sending data to the channel

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-15 22:17:03 +00:00
Yuki Iwai aa772b607d
Remove a katib-webhook-cert Secret from components (#2207)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-15 22:06:03 +00:00
Yuki Iwai 1b68744276
Bug: Wait for the certs to be mounted inside the container (#2198)
* Wait for the certs to be mounted inside the container

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Initialize fullServiceDomain when adding certgenerator to the manager

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Output logs every 15 seconds if the certs don't yet exist in the container

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-15 21:15:03 +00:00
Yuki Iwai 2ae3eb5adf
E2E: Add additional checks to verify if the components are ready (#2202)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-15 18:28:02 +00:00
Yuki Iwai 4dbb49f536
Skip injecting the metrics collector into the katib controller pods (#2203)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-15 18:13:03 +00:00
Andrey Velichkevich 7f0d9229fa
Bump Katib Python SDK to 0.16.0rc0 version (#2205)
2023-08-15 15:18:03 +00:00
Yuki Iwai 888bec38f4
Send empty data to the certsReady channel (#2196)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-05 14:17:53 +00:00
Alex 923d0fcca8
[SDK] Enable resource specification for trial containers (#2192)
Co-authored-by: shipengcheng1230 <shipengcheng1230@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2023-08-05 10:46:54 +00:00
Andrey Velichkevich 114485dc04
Change failurePolicy to Fail for Katib Webhooks (#2018)
2023-08-04 23:27:53 +00:00
Yuki Iwai 06740a00e9
Consolidate the katib-cert-generator to the katib-controller (#2185)
* Consolidate the katib-cert-generator to the katib-controller

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Use deployed secret instead of creating a new secret when the cert-generator saves certs on secret

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Rename secretName with webhookSecretName in the .init.certGenerator

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Fix manifests

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Remove unneeded comments

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Restore unintentionally deleted log

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Rename package cert-generator with certgenerator

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Add test cases to check if the enable is set to true when the webhookServiceName or webhookSecretName is set

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Update the developer guide

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Swap liveness probe and readiness probe

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Introduce SSA to the cert-generator

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Use the same member names between CertGenerator and KatibConfig

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Disable leader election on the cert-generator

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Drop unneeded fields from SSA patches

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-04 19:31:20 +00:00
mChowdhury-91 f074329a14
Default Resume Policy to never from UI (#2195)
2023-08-04 18:05:20 +00:00
Yuki Iwai 74cf5b8d4e
Upgrade Go version to v1.20 (#2190)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-03 11:20:19 +00:00
Yuki Iwai c731fd29d5
Replace grpc_health_probe with the built-in gRPC container probe feature (#2189)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-03 11:19:20 +00:00
Yuki Iwai c749d27c70
Allow installing arm64 binaries in the envtest (#2188)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-01 20:41:07 +00:00
Yuki Iwai e69235daa1
Implement KatibConfig API (#2176)
* Implement KatibConfig API

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Replace 'collectorKind' with 'kind'

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Replace 'metricsCollectorSidecars' with 'metricsCollectors'

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Fix a typo

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Make the init.controller.leaderElection non-pointer

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Make the init.controller.injectSecurityContext non-pointer

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Update a comment for the future works

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Update manifest for the katib-leader-election

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Fix a comment for the KatibConfig API

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Replace 'configapi' with 'configv1beta1'

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Remove debug code

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put constant for the default KatibConfig value on /pkg/apis/config/v1beta1

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Use 'sigs.k8s.io/yaml' instead of 'github.com/ghodss/yaml'

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Avoid depending on k8s.io/utils directly

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Fix a typo

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Refactor katib-config using kustomize vars

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Fix a typo

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put KatibConfig on every install

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Remove configMapGenerator from the katib-with-kubeflow

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-01 18:45:08 +00:00
Yuki Iwai f1e3f3adcd
Drop Kubernetes v1.24 and support Kubernetes v1.27 (#2182)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-01 16:32:06 +00:00
Alex b7295cb548
[SDK] Add namespace parameter to KatibClient (#2183)
* [SDK] Add namespace parameter to KatibClient

Co-authored-by: andreafehrman <andrea.k.fehrman@vanderbilt.edu>
Co-authored-by: ryanrusson <ryan.russon@gmail.com>

* Update sdk/python/v1beta1/kubeflow/katib/api/katib_client.py

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Co-authored-by: andreafehrman <andrea.k.fehrman@vanderbilt.edu>
Co-authored-by: ryanrusson <ryan.russon@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2023-08-01 15:18:08 +00:00
Yuki Iwai d67c07b7a1
Drop Kubernetes v1.23 and support Kubernetes v1.26 (#2177)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-07-31 20:15:29 +00:00
Yuki Iwai a6938481b1
Replace action to setup minikube with medyagh/setup-minikube (#2178)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-07-31 15:42:30 +00:00
Andrey Velichkevich a20bc85b94
[UI] Remove Deprecated Katib UI (#2179)
* [UI] Remove Deprecated Katib UI

* Fix UI Developer doc
2023-07-25 09:53:29 +00:00
dependabot[bot] 89bd21f710
Bump word-wrap from 1.2.3 to 1.2.4 in /pkg/new-ui/v1beta1/frontend (#2174)
Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.4.
- [Release notes](https://github.com/jonschlinkert/word-wrap/releases)
- [Commits](https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4)

---
updated-dependencies:
- dependency-name: word-wrap
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-19 11:21:24 +00:00
dependabot[bot] eb901c192e
Bump word-wrap from 1.2.3 to 1.2.4 in /pkg/ui/v1beta1/frontend (#2173)
Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.4.
- [Release notes](https://github.com/jonschlinkert/word-wrap/releases)
- [Commits](https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4)

---
updated-dependencies:
- dependency-name: word-wrap
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-19 09:56:23 +00:00
dependabot[bot] f740889569
Bump webpack from 5.74.0 to 5.88.2 in /pkg/ui/v1beta1/frontend (#2172)
Bumps [webpack](https://github.com/webpack/webpack) from 5.74.0 to 5.88.2.
- [Release notes](https://github.com/webpack/webpack/releases)
- [Commits](https://github.com/webpack/webpack/compare/v5.74.0...v5.88.2)

---
updated-dependencies:
- dependency-name: webpack
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-18 18:48:22 +00:00
dependabot[bot] 3b7c77a582
Bump golang.org/x/net from 0.5.0 to 0.7.0 (#2122)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.5.0 to 0.7.0.
- [Release notes](https://github.com/golang/net/releases)
- [Commits](https://github.com/golang/net/compare/v0.5.0...v0.7.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-18 18:46:22 +00:00
dependabot[bot] 067c119337
Bump google.golang.org/grpc from 1.47.0 to 1.53.0 (#2167)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.47.0 to 1.53.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.47.0...v1.53.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-18 15:30:26 +00:00
dependabot[bot] c5552362dc
Bump semver from 5.7.1 to 5.7.2 in /pkg/new-ui/v1beta1/frontend (#2170)
Bumps [semver](https://github.com/npm/node-semver) from 5.7.1 to 5.7.2.
- [Release notes](https://github.com/npm/node-semver/releases)
- [Changelog](https://github.com/npm/node-semver/blob/v5.7.2/CHANGELOG.md)
- [Commits](https://github.com/npm/node-semver/compare/v5.7.1...v5.7.2)

---
updated-dependencies:
- dependency-name: semver
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-18 15:12:22 +00:00
dependabot[bot] 86602b56ed
Bump semver from 6.3.0 to 6.3.1 in /pkg/ui/v1beta1/frontend (#2169)
Bumps [semver](https://github.com/npm/node-semver) from 6.3.0 to 6.3.1.
- [Release notes](https://github.com/npm/node-semver/releases)
- [Changelog](https://github.com/npm/node-semver/blob/v6.3.1/CHANGELOG.md)
- [Commits](https://github.com/npm/node-semver/compare/v6.3.0...v6.3.1)

---
updated-dependencies:
- dependency-name: semver
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-18 13:00:22 +00:00
dependabot[bot] 6bb3a3f3f3
Bump tough-cookie from 4.1.2 to 4.1.3 in /pkg/ui/v1beta1/frontend (#2168)
Bumps [tough-cookie](https://github.com/salesforce/tough-cookie) from 4.1.2 to 4.1.3.
- [Release notes](https://github.com/salesforce/tough-cookie/releases)
- [Changelog](https://github.com/salesforce/tough-cookie/blob/master/CHANGELOG.md)
- [Commits](https://github.com/salesforce/tough-cookie/compare/v4.1.2...v4.1.3)

---
updated-dependencies:
- dependency-name: tough-cookie
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-10 08:55:54 +00:00
Andrey Velichkevich ede6e7410c
[UI] Fix Trial Logs when Kubernetes Job Fails (#2164)
* [UI] Fix Trial Logs when Kubernetes Job Fails

* Return error when Pod is in the Pending state
2023-06-20 02:20:40 +00:00
Andrew Scribner 37b237f560
Remove Charmed Operators for Katib (#2161)
This PR removes the Charmed Operators for Katib, as well as the associated tests.  In the past this repo was the source of truth for these operators, but they have since been maintained [here](https://github.com/canonical/katib-operators/) and we've done a poor job of keeping the repos in sync.  This commit removes the redundancy.
2023-06-07 17:31:58 +00:00
pheianox 6e0069bc7e
Add PITS Global Data Recovery Services to the adopters list (#2160)
* Add PITS Global Data Recovery Services to the adopters list

* Apply alphabetical order in the adopters list
2023-05-26 15:44:21 +00:00
dependabot[bot] 0102f1fc1f
Bump socket.io-parser from 4.2.2 to 4.2.3 in /pkg/new-ui/v1beta1/frontend (#2158)
Bumps [socket.io-parser](https://github.com/socketio/socket.io-parser) from 4.2.2 to 4.2.3.
- [Release notes](https://github.com/socketio/socket.io-parser/releases)
- [Changelog](https://github.com/socketio/socket.io-parser/blob/main/CHANGELOG.md)
- [Commits](https://github.com/socketio/socket.io-parser/compare/4.2.2...4.2.3)

---
updated-dependencies:
- dependency-name: socket.io-parser
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-24 08:31:19 +00:00
dependabot[bot] b9dc63efb5
Bump github.com/docker/distribution from 2.8.1+incompatible to 2.8.2+incompatible (#2154)
Bumps [github.com/docker/distribution](https://github.com/docker/distribution) from 2.8.1+incompatible to 2.8.2+incompatible.
- [Release notes](https://github.com/docker/distribution/releases)
- [Commits](https://github.com/docker/distribution/compare/v2.8.1...v2.8.2)

---
updated-dependencies:
- dependency-name: github.com/docker/distribution
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-15 20:16:31 +00:00
dependabot[bot] 279f6794dc
Bump github.com/docker/docker from 20.10.16+incompatible to 20.10.24+incompatible (#2142)
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 20.10.16+incompatible to 20.10.24+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](https://github.com/docker/docker/compare/v20.10.16...v20.10.24)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-10 03:35:43 +00:00
dependabot[bot] 6351e80614
Bump engine.io and socket.io in /pkg/new-ui/v1beta1/frontend (#2152)
Bumps [engine.io](https://github.com/socketio/engine.io) and [socket.io](https://github.com/socketio/socket.io). These dependencies needed to be updated together.

Updates `engine.io` from 6.2.1 to 6.4.2
- [Release notes](https://github.com/socketio/engine.io/releases)
- [Changelog](https://github.com/socketio/engine.io/blob/main/CHANGELOG.md)
- [Commits](https://github.com/socketio/engine.io/compare/6.2.1...6.4.2)

Updates `socket.io` from 4.5.1 to 4.6.1
- [Release notes](https://github.com/socketio/socket.io/releases)
- [Changelog](https://github.com/socketio/socket.io/blob/main/CHANGELOG.md)
- [Commits](https://github.com/socketio/socket.io/compare/4.5.1...4.6.1)

---
updated-dependencies:
- dependency-name: engine.io
  dependency-type: indirect
- dependency-name: socket.io
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-05 11:33:24 +00:00
Andrey Velichkevich fcea7a36cb
SDK: Import all Kubernetes Models (#2148) 2023-04-20 16:56:40 +00:00
nagar-ajay 195ce776a9
Fix conformance docker image (#2147) 2023-04-16 18:17:19 +00:00
nagar-ajay be965ae9c2
Containerize tests for katib-conformance (#2146) 2023-04-14 12:55:16 +00:00
nagar-ajay 7a4c118410
Namespace and trial pod annotations as CLI argument (#2138)
* disable istio sidecar injection for example manifests

* add namespace as command line arg to python test script

* revert disable istio sidecar injection

* add option to pass trial pod annotations

* split command over multiple lines

* remove redundant config loading

* add resource limit to containers of random experiment's trial spec pod

* update code to support already present annotations

* raise NotImplementedError if trialSpec is different from Job

* add metrics-collector-injection to namespace under test if missing
2023-04-10 17:41:54 +00:00
Yuki Iwai 1d3ab5726f
Relax dependencies restriction for the gRPC libraries (#2140)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-04-03 16:48:01 +00:00
Andrey Velichkevich d2d9cab1ca
Add SDK Breaking Change to Changelog (#2133) 2023-03-24 13:58:22 +00:00
Andrey Velichkevich c8fe90ea0f
Add Changelog for Katib v0.15.0 (#2129) 2023-03-24 11:38:22 +00:00
Yuki Iwai af0f775079
Increase the free spaces in CI (#2131)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-03-23 14:33:22 +00:00
Andrey Velichkevich acedc82aad
Bump Katib Python SDK to 0.15.0 version (#2130) 2023-03-22 18:09:44 +00:00
Elena Zioga 2e27185f82
kwa(front): Support all namespaces (#2119)
Add support for all-namespaces in KWA.

Signed-off-by: Elena Zioga <elena@arrikto.com>
2023-02-24 12:50:25 +00:00
Andrey Velichkevich 622af87b42
Add Changelog for Katib v0.15.0-rc.1 (#2123) 2023-02-23 20:16:24 +00:00
Andrey Velichkevich cff0002e6a
Add Changelog for Katib v0.15.0-rc.0 (#2106)
* Add Changelog for Katib v0.15.0-rc.0

* Move Optuna Grid Algorithm to the Core

* Add Breaking and Major Changes
2023-02-23 15:53:24 +00:00
Orfeas Kourkakis b6afce7d89
kwa(front): Update the use of SnackBarService (#2113)
* build: Update COMMIT file

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

* kwa(front): Update the use of SnackBarService

Update the use of SnackBarService in order to pass required data via a
`config` object and provide MAT_SNACK_BAR_DEFAULT_OPTIONS.

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

---------

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>
2023-02-22 13:33:42 +00:00
Yuki Iwai 22babe4eb1
UI: Remove an unused import, EventV1beta1Api (#2116)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-02-17 22:09:36 +00:00
Andrey Velichkevich 1f3dce9032
Bump Katib Python SDK to 0.15.0rc1 version (#2121) 2023-02-15 19:50:36 +00:00
Elena Zioga 1429d61b00
[kwa-trials-logs] Create the LOGS tab of Trial's details page in KWA (#2101)
* backend: Update error message when no logs could be found

* Update the backend's error message so that it explains logs may be
  missing not only because 'retain' might not be set, but also because
  the cluster was scaled down.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* frontend: Add LOGS tab in Trial details page

In this commit:

* Create a distinct LOGS tab, which displays the trial's logs in the
  Trial details page.
* Don't show the backend's error popup for logs, but show the message
  error in the admonition.

Signed-off-by: Elena Zioga <elena@arrikto.com>

---------

Signed-off-by: Elena Zioga <elena@arrikto.com>
2023-02-14 19:39:25 +00:00
dependabot[bot] 6064c14806
Bump http-cache-semantics from 4.1.0 to 4.1.1 in /pkg/new-ui/v1beta1/frontend (#2107)
Bumps [http-cache-semantics](https://github.com/kornelski/http-cache-semantics) from 4.1.0 to 4.1.1.
- [Release notes](https://github.com/kornelski/http-cache-semantics/releases)
- [Commits](https://github.com/kornelski/http-cache-semantics/compare/v4.1.0...v4.1.1)

---
updated-dependencies:
- dependency-name: http-cache-semantics
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-02-14 15:34:26 +00:00
Yuki Iwai 099756684f
Reformat katib-operators (#2114)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-02-14 15:14:26 +00:00
Andrey Velichkevich 22b740802a
Bump Katib Python SDK to 0.15.0rc0 version (#2105) 2023-01-28 00:10:02 +00:00
Andrey Velichkevich 3b0fcd20cc
Fix Release Script for Updating SDK Version (#2104) 2023-01-27 03:40:11 +00:00
Johnu George c55414e033
Update Training operator Image in CI (#2103)
* Update training operator image in CI

* Remove deprecated GRPC var

* Remove deprecated GRPC var

* Remove deprecated GRPC var

* Support for k8s v1.25 in CI

* Revert "Support for k8s v1.25 in CI"

This reverts commit 16e6fe4b16.

* Update training operator image in Katib CI
2023-01-26 14:39:10 +00:00
Elena Zioga 7303a3a073
[kwa-actual-links-in-tables] Make links in KWA's tables actual links (#2090)
* build: Update COMMIT file

Signed-off-by: Elena Zioga <elena@arrikto.com>

* frontend: Make links actual links in experiments table

Make KWA's experiments table links actual links by using the new LinkValue
class.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* frontend: Make links actual links in trials table

Make KWA's trials table links actual links by using the new LinkValue
class.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2023-01-26 12:57:10 +00:00
Elena Zioga 026d9ede81
frontend: Rework the trial graph using ECharts in KWA (#2089)
* frontend: Rework the trial graph using echarts

Signed-off-by: Elena Zioga <elena@arrikto.com>

* frontend: Remove d3 references

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2023-01-26 12:53:10 +00:00
Orfeas Kourkakis c5923cb37f
kwa(front): Add UI tests with Cypress (#2088)
* kwa(front): Install Cypress

 - Install Cypress & npm scripts for UI tests
 - Remove Protractor files
 - Add README.md file to include UI tests instructions
 - Modify .gitignore

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

* kwa(front): Add UI tests with Cypress

Add UI tests with Cypress to check that:
 - New Experiment form page loads template without errors.
 - Index page
    * has an "Experiments" title
    * lists experiments without errors
    * renders every experiment name into the table
    * renders properly Status icon for all experiments

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

* gh-actions(kwa): Add UI tests in test-node action

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>
2023-01-26 12:52:10 +00:00
fischor ff6441b895
More container fields for SuggestionConfig (#2000)
* More container fields for SuggestionConfig

* Inline corev1.Container into SuggestionConfig

* Set default value for suggestion container name

* Append suggestion volume and port only if not present

* Deep-Copy base suggestion container

* Check for suggestion container port number as well

* Prohibit suggestion port to be set in suggestion config
2023-01-25 19:47:05 +00:00
Johnu George 55e6e34e67
Narrow down RBAC rules (#2091)
* Update training operator image in CI

* Remove deprecated GRPC var

* Remove deprecated GRPC var

* Remove deprecated GRPC var

* Support for k8s v1.25 in CI

* Revert "Support for k8s v1.25 in CI"

This reverts commit 16e6fe4b16.

* Narrow down rbac

* Narrow down rbac

* Narrow down rbac

* Narrow down rbac

* Narrow down rbac

* Narrow down rbac

* Narrow down rbac

* Update tekton and argo docs

* Update tekton and argo docs
2023-01-25 16:51:53 +00:00
Andrey Velichkevich 318f66890e
Use Never Resume Policy as Default (#2102) 2023-01-25 14:11:53 +00:00
dependabot[bot] 2a2f124629
Bump ua-parser-js from 0.7.31 to 0.7.33 in /pkg/new-ui/v1beta1/frontend (#2100)
Bumps [ua-parser-js](https://github.com/faisalman/ua-parser-js) from 0.7.31 to 0.7.33.
- [Release notes](https://github.com/faisalman/ua-parser-js/releases)
- [Changelog](https://github.com/faisalman/ua-parser-js/blob/master/changelog.md)
- [Commits](https://github.com/faisalman/ua-parser-js/compare/0.7.31...0.7.33)

---
updated-dependencies:
- dependency-name: ua-parser-js
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-24 18:56:25 +00:00
Andrey Velichkevich 35fded7bf7
[SDK] Use Katib Client without Kube Config (#2098) 2023-01-24 18:28:25 +00:00
Yuki Iwai 5f40e1269f
Upgrade Go libraries to resolve security issues (#2094)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-01-24 17:53:25 +00:00
Yuki Iwai 9fbf095f20
Run e2e with various Python versions to verify Python SDK (#2092)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-01-24 17:47:25 +00:00
apoger 0421327a08
Update manifests to enable authorization check mechanisms for katib-UI in kubeflow mode (#2041)
* Upgrade manifests to enable authorization check mechanisms for katib-UI

Changes to install-with-kubeflow manifests:

* Enable istio sidecar injection for katib-ui component

* Add AuthorizationPolicy to allow only istio-ingressgateway
  to talk to katib-ui [user traffic].

* Set APP_DISABLE_AUTH ENV var to false when in kubeflow-mode
  to enable authorization checks in UI's backend

* Extend the RBAC permissions of katib-ui so it can create SAR objects
  when in kubeflow-mode

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* UI(back): Secure /katib/fetch_trial/ route

Introduce authn/authz checks in the backend

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* review: Remove duplicate dependencies

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* review: Move patch to a separate file

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>
2023-01-24 16:25:25 +00:00
dependabot[bot] 00c24eb47a
Bump json5 from 1.0.1 to 1.0.2 in /pkg/ui/v1beta1/frontend (#2077)
Bumps [json5](https://github.com/json5/json5) from 1.0.1 to 1.0.2.
- [Release notes](https://github.com/json5/json5/releases)
- [Changelog](https://github.com/json5/json5/blob/main/CHANGELOG.md)
- [Commits](https://github.com/json5/json5/compare/v1.0.1...v1.0.2)

---
updated-dependencies:
- dependency-name: json5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-24 15:40:26 +00:00
Yuki Iwai a44aaea7f2
Add a --prefer-binary flag to 'pip install' command (#2096)
* Pin the h5py version with 3.7.0

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Add a --prefer-binary flag to 'pip install' command

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-01-24 14:25:25 +00:00
Yuki Iwai 5db8349e20
Upgrade grpc-health-probe version to fix some security issues (#2093)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-01-21 09:43:49 +00:00
Yuki Iwai 0749265a22
Upgrade PyTorch version to v1.13.0 (#2082)
* Upgrade PyTorch version to v1.13.0

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Build container images using minikube in E2E tests

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-01-17 15:26:20 +00:00
Andrey Velichkevich 6bcbd25851
[SDK] Use Katib SDK for E2E Tests (#2075)
* [SDK] Use Katib SDK for E2E tests

* Fix pvc deletion

* Add list_suggestions API

* Remove wait from edit Experiment function

* Add shell to GitHub action

* Add protobuf package to Katib SDK

* Add Experiment Timeout to 40 min

* Modify SDK Examples

* Fix example text

* Change to custom_api

* Enable verbose logging for Katib E2E

* Use expected condition arg

* Add timeout and delete options

* Modify logging to debug

* Use read API to check resource status
2023-01-16 16:40:59 +00:00
Elena Zioga ae68b77c35
frontend: Enable actions in experiment graph (#2065)
Signed-off-by: Elena Zioga <elena@arrikto.com>
2023-01-16 10:25:59 +00:00
Elena Zioga 2a6497f2b3
frontend: Show message in case of uncompleted trial instead of the graph (#2063)
* frontend: Show message in case of uncompleted trial instead of the graph

Signed-off-by: Elena Zioga <elena@arrikto.com>

* build: Update COMMIT file

Signed-off-by: Elena Zioga <elena@arrikto.com>

* frontend: Define the spinner text

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2023-01-16 10:23:59 +00:00
Elena Zioga 1a128ae7bc
frontend: Add source maps in the browser (#2043)
* Enable source maps in both development and production.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2023-01-16 10:22:59 +00:00
Yuki Iwai d76f01edbf
Upgrade Tensorflow version (#2079)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-01-11 10:59:15 +00:00
Yuki Iwai 45a474432a
Upgrade Python version to 3.10 (#2057)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-01-06 09:03:56 +00:00
dependabot[bot] aff39d80cb
Bump json5 from 1.0.1 to 1.0.2 in /pkg/new-ui/v1beta1/frontend (#2076)
Bumps [json5](https://github.com/json5/json5) from 1.0.1 to 1.0.2.
- [Release notes](https://github.com/json5/json5/releases)
- [Changelog](https://github.com/json5/json5/blob/main/CHANGELOG.md)
- [Commits](https://github.com/json5/json5/compare/v1.0.1...v1.0.2)

---
updated-dependencies:
- dependency-name: json5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-05 19:49:11 +00:00
Yuki Iwai 9270274b91
Remove Chocolate Suggestion Service (#2071)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-01-05 18:40:11 +00:00
Dejan Golubovic c9dd1b4878
Backend for getting logs of a trial (#2039)
* Backend for getting logs of a trial

* Check Write return + use PrimaryPodLabels

* Add auth + use constants for labels + cleanup

* TODO comment for using controller-runtime client for logs

* Authorization for list pods and get logs, reduce RBAC

* Use corev1 for specifying resources, edit kf install RBAC

* Check namespace and trialName from request

* Remove auth checks for listing the pods

* Use context.Background()
2022-12-24 12:57:15 +00:00
Yuki Iwai 7c509bac56
Support for grid search algorithm in Optuna Suggestion Service (#2060)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2022-12-24 06:17:15 +00:00
Yuki Iwai 1dd7251099
Pin the NumPy version with v1.23.5 in some images (#2070)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2022-12-24 03:15:15 +00:00
Yuki Iwai db72ce1987
Upgrade the actions-setup-minikube version to v2.7.2 (#2064)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-12-14 16:25:30 +00:00
Johnu George f941ec61e5
Update Owners file (#2056)
* Update training operator image in CI

* Remove deprecated GRPC var

* Remove deprecated GRPC var

* Remove deprecated GRPC var

* Support for k8s v1.25 in CI

* Revert "Support for k8s v1.25 in CI"

This reverts commit 16e6fe4b16.

* Update Owners file
2022-12-10 08:17:25 +00:00
dependabot[bot] e933482d2f
Bump express from 4.17.1 to 4.18.2 in /pkg/new-ui/v1beta1/frontend (#2053)
Bumps [express](https://github.com/expressjs/express) from 4.17.1 to 4.18.2.
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/master/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.17.1...4.18.2)

---
updated-dependencies:
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-09 04:29:11 +00:00
Andrey Velichkevich 54424f2ce6
[SDK] Get Trial Metrics from Katib DB (#2050)
* [SDK] Get Trial Metrics from Katib DB

* Always Copy gRPC API File
2022-12-09 03:56:11 +00:00
dependabot[bot] 331740cce7
Bump qs from 6.5.2 to 6.5.3 in /pkg/new-ui/v1beta1/frontend (#2052)
Bumps [qs](https://github.com/ljharb/qs) from 6.5.2 to 6.5.3.
- [Release notes](https://github.com/ljharb/qs/releases)
- [Changelog](https://github.com/ljharb/qs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/ljharb/qs/compare/v6.5.2...v6.5.3)

---
updated-dependencies:
- dependency-name: qs
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-08 16:19:11 +00:00
Andrey Velichkevich 87b7e7d5b5
Add Conformance Program Doc for AutoML and Training WG (#2048)
* Add Conformance Program Doc for AutoML and Training WG

* Address Review Comments
2022-12-08 13:35:10 +00:00
Elena Zioga 01b59a4b68
frontend: Enable sorting in KWA's main table (#2017)
* Enable the sorting functionality in KWA's main table.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-12-07 16:57:37 +00:00
dependabot[bot] 4cc9500e73
Bump decode-uri-component from 0.2.0 to 0.2.2 in /pkg/new-ui/v1beta1/frontend (#2051)
Bumps [decode-uri-component](https://github.com/SamVerschueren/decode-uri-component) from 0.2.0 to 0.2.2.
- [Release notes](https://github.com/SamVerschueren/decode-uri-component/releases)
- [Commits](https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.2)

---
updated-dependencies:
- dependency-name: decode-uri-component
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-07 14:46:37 +00:00
Andrey Velichkevich a668252777
Remove Certificate Chain from Cert Generator (#2045)
* Remove Certificate Chain from Cert Generator

* Update Cert Generator Doc
2022-12-05 15:47:09 +00:00
Shaowei Su 1e4df8dc39
[Fix] add early stopped trials in converter (#2004)
* add early stopped trials in converter

* error out early

* Update pkg/suggestion/v1beta1/internal/trial.py

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* add incomplete trial filter

* fix ut

* more fixes

* filter on es

* enrich existing tests

Co-authored-by: shaowei su <shaowei.su@airbnb.com>
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2022-12-05 13:26:08 +00:00
Elena Zioga 9fe1bd6e73
frontend: Show the successful trials in the experiment graph (#2013) (#2033)
* Show the successful trials in the experiment graph.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-12-05 10:24:09 +00:00
zhixian82 55bdcbbc3d
fix: only validate Kubernetes Job (#2025) 2022-12-03 12:04:06 +00:00
Andrey Velichkevich dc24278245
Add Trial Labels During Pod Mutation (#2047) 2022-12-02 18:11:16 +00:00
zhixian82 3cbf3ecc45
add resources to earlystopping container (#2038) 2022-12-02 14:01:17 +00:00
Elena Zioga 0d0e77f90c
frontend: Migrate from tslint to eslint in KWA (#2042)
* frontend: Remove TSLint

Remove TSLint since it's deprecated.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* frontend: Introduce ESLint

Introduce ESLint by using the following Angular command [1]:

ng add @angular-eslint/schematics

[1] https://github.com/angular-eslint/angular-eslint#quick-start-with-angular-v12-and-later

Signed-off-by: Elena Zioga <elena@arrikto.com>

* frontend: Fix linting errors

Fix linting errors.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* gh-actions: Add GH action to run a lint check

Introduce a GitHub action to run a lint check.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-12-01 13:09:44 +00:00
Elena Zioga 88e6787168
frontend: Make trials table support pagination/sorting/filtering (#2040)
* build: Update the COMMIT file

* Update the COMMIT file.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* frontend: Support paging/sorting/filtering in trials table (#1441)

* Make trials table support paging, sorting and filtering.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* frontend: Create unit tests for trials table (#1441)

* Create unit tests for trials-table component.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-12-01 11:51:45 +00:00
Yuki Iwai 6b54eb276c
Add scripts to verify generated codes and Go Modules (#1999)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-11-30 19:03:16 +00:00
dependabot[bot] bd91301d36
Bump tensorflow from 2.9.1 to 2.9.3 in /cmd/suggestion/nas/enas/v1beta1 (#2029)
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.9.1 to 2.9.3.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.9.1...v2.9.3)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-29 07:43:42 +00:00
Elena Zioga 7f4eb27099
Dedicated yaml tab for Trials (#2034)
* frontend: Create a yaml tab for Trials (#2011)

* Create a dedicated yaml tab for Trials.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* frontend: Rename components

* Rename trial-modal component to trial-details.
* Rename trial-modal-overview component to trial-overview.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-11-28 19:20:14 +00:00
Andrey Velichkevich b123dbf2b5
[Test] Reduce Katib GitHub Action Runs (#2036)
* [Test] Reduce Katib GitHub Action Runs

* Add cancel-in-progress flag

* Use single job for Charmed Katib

* Add cancel-in-progress for all actions except publish

* Bump ubuntu to 20.04 for Charmed tests
2022-11-28 14:37:15 +00:00
apoger 831e1d39bc
Add authorization mechanisms in new Katib UI backend (#1983)
* UI(back): Add authorization mechanisms in new Katib UI backend

* Introduce helper ENV vars and functions for authentication and
  authorization checks. The authz checks use SubjectAccessReview
  objects.
  * BACKEND_MODE={dev,prod}: skip authz when in dev mode
  * APP_DISABLE_AUTH={bool}: skip authz if explicitly requested

* Introduce a client-go client to construct SubjectAccessReview objects.

* Before any request proceeds to the K8s api-server:
  * check if authorization must be skipped (BACKEND_MODE, APP_DISABLE_AUTH)
  * check if a username is provided in the request header
  * query the K8s api-server with a SAR to ensure that the user has
    appropriate access privileges

* Replace the /katib/fetch_experiment/ route with /katib/fetch_namespaces_experiments.
  This route expects a namespace as a query parameter from which all experiments will be fetched.

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* UI(front): Provide a namespace as a query parameter

This is needed for the new /katib/fetch_namespaced_experiments route.

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* Update README for running locally without auth

Update the README of the web app to document that developers should set
APP_DISABLE_AUTH=true when running locally, since there is no
authentication or authorization when running locally.

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* remove duplicated variable types

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* Review fixes

* proper error handling.
* switch to Go's built-in errors package.
* set appropriate verbs when constructing SAR objects.

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* review: Use controller-runtime client to create SAR objects

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* Review fixes

* fix backend routes.
  * '/katib/fetch_namespaces' to fetch experiments in a namespace
  * 'FetchExperiments' handler

* hit the appropriate route from the frontend and provide the namespace as a
  query parameter to fetch experiments

* remove the BACKEND_MODE env var in
  favour of the more specific APP_DISABLE_AUTH

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

* Review fixes

* Add constants for CRUD actions
* Add plural for experiments and suggestions as constants
* Add GetUsername logic under IsAuthorized and handle errors properly
* Set APP_DISABLE_AUTH to true by default, since Katib currently
  doesn't support this feature in standalone mode.

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>
2022-11-28 10:29:13 +00:00
Orfeas Kourkakis 0a1cb31364
[bugfix] Fix value passing bug in New Experiment form (#2027)
* [bugfix] Fix value passing bug in New Experiment form

Add the missing logic to the New Experiment form so that it passes the
editor content of the Metrics Collector tab when Kind is set to
Custom.

* Adjust unit tests for custom yaml metrics collector
2022-11-23 13:48:43 +00:00
dependabot[bot] d97c8ae049
Bump tensorflow from 2.9.1 to 2.9.3 in /cmd/metricscollector/v1beta1/tfevent-metricscollector (#2028)
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.9.1 to 2.9.3.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.9.1...v2.9.3)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-22 17:45:42 +00:00
dependabot[bot] 2cbaf8c5fd
Bump engine.io from 6.2.0 to 6.2.1 in /pkg/new-ui/v1beta1/frontend (#2032)
Bumps [engine.io](https://github.com/socketio/engine.io) from 6.2.0 to 6.2.1.
- [Release notes](https://github.com/socketio/engine.io/releases)
- [Changelog](https://github.com/socketio/engine.io/blob/main/CHANGELOG.md)
- [Commits](https://github.com/socketio/engine.io/compare/6.2.0...6.2.1)

---
updated-dependencies:
- dependency-name: engine.io
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-22 13:37:42 +00:00
dependabot[bot] 65e41951a2
Bump tensorflow from 2.9.1 to 2.9.3 in /examples/v1beta1/trial-images/enas-cnn-cifar10 (#2031)
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.9.1 to 2.9.3.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.9.1...v2.9.3)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-22 13:36:42 +00:00
dependabot[bot] b3c3807358
Bump tensorflow from 2.9.1 to 2.9.3 in /examples/v1beta1/trial-images/tf-mnist-with-summaries (#2030)
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.9.1 to 2.9.3.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.9.1...v2.9.3)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-22 05:24:41 +00:00
dependabot[bot] 7d4d44f23a
Bump minimatch from 3.0.4 to 3.1.2 in /pkg/ui/v1beta1/frontend (#2026)
Bumps [minimatch](https://github.com/isaacs/minimatch) from 3.0.4 to 3.1.2.
- [Release notes](https://github.com/isaacs/minimatch/releases)
- [Commits](https://github.com/isaacs/minimatch/compare/v3.0.4...v3.1.2)

---
updated-dependencies:
- dependency-name: minimatch
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-21 12:07:12 +00:00
Orfeas Kourkakis ae1655c6f9
KWA: Use new Editor component (Monaco) (#2023)
* kwa(front): Add new Editor component

Import new Editor component from Kubeflow Common Library and replace
all instances of previous Ace Editor.

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

* Update the COMMIT file to a more recent Kubeflow commit

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>
2022-11-21 09:48:12 +00:00
dependabot[bot] a5ef2db7df
Bump loader-utils from 2.0.3 to 2.0.4 in /pkg/ui/v1beta1/frontend (#2015)
Bumps [loader-utils](https://github.com/webpack/loader-utils) from 2.0.3 to 2.0.4.
- [Release notes](https://github.com/webpack/loader-utils/releases)
- [Changelog](https://github.com/webpack/loader-utils/blob/v2.0.4/CHANGELOG.md)
- [Commits](https://github.com/webpack/loader-utils/compare/v2.0.3...v2.0.4)

---
updated-dependencies:
- dependency-name: loader-utils
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-17 05:57:10 +00:00
Orfeas Kourkakis 24c970b69e
kwa(build): Introduce COMMIT file for building KWA (#2014)
Introduce a COMMIT file that records the commit of Kubeflow's common code
that Katib must check out in order to be built. The file is also used in
the following places, so a developer only needs to update this one file
whenever Katib must check out a newer commit:
 - Dockerfile
 - GH actions
 - README.md

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>
2022-11-16 18:09:10 +00:00
dependabot[bot] c50e1d32d8
Bump loader-utils from 1.4.1 to 1.4.2 in /pkg/new-ui/v1beta1/frontend (#2012)
Bumps [loader-utils](https://github.com/webpack/loader-utils) from 1.4.1 to 1.4.2.
- [Release notes](https://github.com/webpack/loader-utils/releases)
- [Changelog](https://github.com/webpack/loader-utils/blob/v1.4.2/CHANGELOG.md)
- [Commits](https://github.com/webpack/loader-utils/compare/v1.4.1...v1.4.2)

---
updated-dependencies:
- dependency-name: loader-utils
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-16 18:06:11 +00:00
Dejan Golubovic cdd0b90995
Add CERN to adopters (#2010) 2022-11-14 12:51:26 +00:00
Elena Zioga 9e0e173393
frontend: Fix 500 error after detail page refresh (#1967) (#2001)
Fix 500 error when refreshing KWA's detail page by also adding the
namespace variable as a query param to the route.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-11-14 08:38:25 +00:00
Elena Zioga 0848e0303d
[kwa-kfp-component] Introduce KWA's frontend component for kfp links (#1991)
* Introduce the kfp-run component as a distinct component.
* Make the pipeline button a link.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-11-11 15:21:52 +00:00
Orfeas Kourkakis 0ee6062762
gh-actions: Extend action to run Frontend Unit tests (#1998)
* gh-actions: Extend action to run Frontend Unit tests

Extend the Frontend Test action to also run the KWA frontend unit tests.

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

* gh-actions: Exclude actions when there are only UI changes

Skip the following workflows when a PR contains changes that affect
only the frontend:
 - Charmed Katib
 - E2E Test with darts-cnn-cifar10
 - E2E Test with enas-cnn-cifar10
 - E2E Test with mxnet-mnist
 - E2E Test with pytorch-mnist
 - E2E Test with simple-pbt
 - E2E Test with tf-mnist-with-summaries
 - Go Test
 - Publish AutoML Algorithm Images
 - Publish Katib Core Images
 - Publish Trial Images
 - Python Test
 - Shellcheck

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

* gh-actions: Add action to build Katib UI image.

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>
2022-11-11 13:03:51 +00:00
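Skipping workflows for frontend-only PRs is commonly expressed in GitHub Actions with `paths-ignore`, under which a workflow does not run when all changed paths match the ignored patterns. That "skip only if every file is ignored" rule can be sketched in Go as follows. It illustrates the semantics only and is not necessarily the mechanism #1998 uses; the directory prefixes, the example file paths, and the choice to run the workflow on an empty change set are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// frontendPrefixes are ILLUSTRATIVE: they mirror the frontend directories
// that appear in this log, not the exact patterns in Katib's workflows.
var frontendPrefixes = []string{"pkg/ui/", "pkg/new-ui/"}

// isFrontendPath reports whether a changed file lives under a frontend prefix.
func isFrontendPath(path string) bool {
	for _, prefix := range frontendPrefixes {
		if strings.HasPrefix(path, prefix) {
			return true
		}
	}
	return false
}

// shouldSkip returns true only when every changed file is frontend-only.
// An empty change set runs the workflow (an assumption for this sketch).
func shouldSkip(changed []string) bool {
	if len(changed) == 0 {
		return false
	}
	for _, path := range changed {
		if !isFrontendPath(path) {
			return false
		}
	}
	return true
}

func main() {
	uiOnly := []string{"pkg/new-ui/v1beta1/frontend/src/app/app.module.ts"}
	mixed := append([]string{"pkg/controller.v1beta1/experiment/experiment_controller.go"}, uiOnly...)
	fmt.Println(shouldSkip(uiOnly)) // true
	fmt.Println(shouldSkip(mixed))  // false
}
```

A single backend file in the PR is enough to run the full E2E and Go test suites, which is the conservative behavior one wants from such a filter.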
Andrey Velichkevich b1ed847f48
Add More Katib Presentations 2022 (#2009) 2022-11-10 23:33:50 +00:00
dependabot[bot] 390dba507b
Bump socket.io-parser from 4.0.4 to 4.0.5 in /pkg/new-ui/v1beta1/frontend (#2005)
Bumps [socket.io-parser](https://github.com/socketio/socket.io-parser) from 4.0.4 to 4.0.5.
- [Release notes](https://github.com/socketio/socket.io-parser/releases)
- [Changelog](https://github.com/socketio/socket.io-parser/blob/main/CHANGELOG.md)
- [Commits](https://github.com/socketio/socket.io-parser/compare/4.0.4...4.0.5)

---
updated-dependencies:
- dependency-name: socket.io-parser
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-10 08:32:59 +00:00
Shaowei Su d8fbe6e60e
Fix main process retrieve logic for early stopping (#1988)
Co-authored-by: shaowei su <shaowei.su@airbnb.com>
2022-11-09 20:34:40 +00:00
dependabot[bot] da836bbfe6
Bump loader-utils from 1.4.0 to 1.4.1 in /pkg/new-ui/v1beta1/frontend (#2003)
Bumps [loader-utils](https://github.com/webpack/loader-utils) from 1.4.0 to 1.4.1.
- [Release notes](https://github.com/webpack/loader-utils/releases)
- [Changelog](https://github.com/webpack/loader-utils/blob/v1.4.1/CHANGELOG.md)
- [Commits](https://github.com/webpack/loader-utils/compare/v1.4.0...v1.4.1)

---
updated-dependencies:
- dependency-name: loader-utils
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-09 20:26:40 +00:00
Elena Zioga c25518ae93
UI: Rename and right align the age column (#1989)
* Rename the "Age" header to "Created at" and right-align it.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-11-07 10:25:38 +00:00
Johnu George 54b020b44e
Support for k8s v1.25 in CI (#1997)
* Support for k8s v1.25 in CI

* Revert "Support for k8s v1.25 in CI"

This reverts commit 16e6fe4b16.

* Support for k8s v1.25 in CI

* Support for k8s v1.25 in CI

* Support for k8s v1.25 in CI

* Add Readme changes
2022-11-04 10:06:17 +00:00
Yuki Iwai 68ecb1c251
[chore] Upgrade docker/metadata-action, actions/checkout, and actions/setup-python version (#1996)
* [chore] Upgrade docker/metadata-action to v4

Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>

* [chore] Upgrade actions/checkout to v3

Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>

* [chore] Upgrade actions/setup-python version to v4

Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>

Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-11-03 11:14:04 +00:00
Johnu George 4a2db414d8
Remove deprecated variable from GRPC definitions (#1994)
* Update training operator image in CI

* Remove deprecated GRPC var

* Remove deprecated GRPC var

* Remove deprecated GRPC var

* Support for k8s v1.25 in CI

* Revert "Support for k8s v1.25 in CI"

This reverts commit 16e6fe4b16.
2022-11-03 10:32:04 +00:00
Elena Zioga fadd9d826b
[kwa-show-status-first] Show the trials table's status column first (#1990)
* Move the status column to the first position of the trials table, as
  in the other tables.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-11-03 08:38:04 +00:00
Yuki Iwai 570a3e68ff
[chore] Upgrade Go version to v1.19 (#1995)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>

Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-11-03 03:02:04 +00:00
Yuki Iwai 6b55540814
Use the katib-new-ui for Charmed gh-actions (#1987)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>

Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-11-01 14:13:46 +00:00
Elena Zioga 766fef97a0
UI: Make KWA's main table responsive and add toolbar (#1982)
* UI: Make KWA's main table responsive and add toolbar

* Add a top row toolbar with the title of the app and the button to
  create a new Experiment.
* Replace the card with a responsive table that shows the items. The
  component also has a paginator.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* build: Update Dockerfile and README file

Update the Dockerfile and README file to check out the commit on the master
branch of the Kubeflow repository that includes the corresponding
changes.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-10-31 13:55:49 +00:00
dependabot[bot] aee2109752
Bump ansi-html and react-scripts in /pkg/ui/v1beta1/frontend (#1986)
Removes [ansi-html](https://github.com/Tjatse/ansi-html). It's no longer used after updating ancestor dependency [react-scripts](https://github.com/facebook/create-react-app/tree/HEAD/packages/react-scripts). These dependencies need to be updated together.


Removes `ansi-html`

Updates `react-scripts` from 3.2.0 to 5.0.1
- [Release notes](https://github.com/facebook/create-react-app/releases)
- [Changelog](https://github.com/facebook/create-react-app/blob/main/CHANGELOG-3.x.md)
- [Commits](https://github.com/facebook/create-react-app/commits/react-scripts@5.0.1/packages/react-scripts)

---
updated-dependencies:
- dependency-name: ansi-html
  dependency-type: indirect
- dependency-name: react-scripts
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-27 13:35:19 +00:00
dependabot[bot] e7f8eb1467
Bump ansi-regex in /pkg/ui/v1beta1/frontend (#1985)
Bumps [ansi-regex](https://github.com/chalk/ansi-regex), [ansi-regex](https://github.com/chalk/ansi-regex) and [ansi-regex](https://github.com/chalk/ansi-regex). These dependencies needed to be updated together.

Updates `ansi-regex` from 5.0.0 to 5.0.1
- [Release notes](https://github.com/chalk/ansi-regex/releases)
- [Commits](https://github.com/chalk/ansi-regex/compare/v5.0.0...v5.0.1)

Updates `ansi-regex` from 4.1.0 to 5.0.1
- [Release notes](https://github.com/chalk/ansi-regex/releases)
- [Commits](https://github.com/chalk/ansi-regex/compare/v5.0.0...v5.0.1)

Updates `ansi-regex` from 3.0.0 to 5.0.1
- [Release notes](https://github.com/chalk/ansi-regex/releases)
- [Commits](https://github.com/chalk/ansi-regex/compare/v5.0.0...v5.0.1)

---
updated-dependencies:
- dependency-name: ansi-regex
  dependency-type: indirect
- dependency-name: ansi-regex
  dependency-type: indirect
- dependency-name: ansi-regex
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-26 15:23:31 +00:00
dependabot[bot] 1c74d6542a
Bump got from 11.7.0 to 11.8.5 in /pkg/new-ui/v1beta1/frontend (#1903)
Bumps [got](https://github.com/sindresorhus/got) from 11.7.0 to 11.8.5.
- [Release notes](https://github.com/sindresorhus/got/releases)
- [Commits](https://github.com/sindresorhus/got/compare/v11.7.0...v11.8.5)

---
updated-dependencies:
- dependency-name: got
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-26 10:11:30 +00:00
dependabot[bot] 0253a013b1
Bump eventsource from 1.0.7 to 1.1.2 in /pkg/ui/v1beta1/frontend (#1895)
Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.0.7 to 1.1.2.
- [Release notes](https://github.com/EventSource/eventsource/releases)
- [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md)
- [Commits](https://github.com/EventSource/eventsource/compare/v1.0.7...v1.1.2)

---
updated-dependencies:
- dependency-name: eventsource
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-26 06:25:30 +00:00
dependabot[bot] 572c54e97c
Bump terser from 4.8.0 to 4.8.1 in /pkg/ui/v1beta1/frontend (#1918)
Bumps [terser](https://github.com/terser/terser) from 4.8.0 to 4.8.1.
- [Release notes](https://github.com/terser/terser/releases)
- [Changelog](https://github.com/terser/terser/blob/master/CHANGELOG.md)
- [Commits](https://github.com/terser/terser/commits)

---
updated-dependencies:
- dependency-name: terser
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 16:21:29 +00:00
dependabot[bot] 7df2035424
Bump eventsource from 1.1.0 to 1.1.1 in /pkg/new-ui/v1beta1/frontend (#1880)
Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.1.0 to 1.1.1.
- [Release notes](https://github.com/EventSource/eventsource/releases)
- [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md)
- [Commits](https://github.com/EventSource/eventsource/compare/v1.1.0...v1.1.1)

---
updated-dependencies:
- dependency-name: eventsource
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 16:04:30 +00:00
dependabot[bot] a5e73d96e6
Bump karma from 6.3.14 to 6.3.16 in /pkg/new-ui/v1beta1/frontend (#1827)
Bumps [karma](https://github.com/karma-runner/karma) from 6.3.14 to 6.3.16.
- [Release notes](https://github.com/karma-runner/karma/releases)
- [Changelog](https://github.com/karma-runner/karma/blob/master/CHANGELOG.md)
- [Commits](https://github.com/karma-runner/karma/compare/v6.3.14...v6.3.16)

---
updated-dependencies:
- dependency-name: karma
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 12:57:30 +00:00
dependabot[bot] a995b87ed8
Bump lodash-es from 4.17.11 to 4.17.21 in /pkg/new-ui/v1beta1/frontend (#1835)
Bumps [lodash-es](https://github.com/lodash/lodash) from 4.17.11 to 4.17.21.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.11...4.17.21)

---
updated-dependencies:
- dependency-name: lodash-es
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 12:56:29 +00:00
dependabot[bot] f2ecdab729
Bump async from 2.6.3 to 2.6.4 in /pkg/ui/v1beta1/frontend (#1854)
Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4.
- [Release notes](https://github.com/caolan/async/releases)
- [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md)
- [Commits](https://github.com/caolan/async/compare/v2.6.3...v2.6.4)

---
updated-dependencies:
- dependency-name: async
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 12:53:29 +00:00
dependabot[bot] 82f2a44ef2
Bump async from 2.6.3 to 2.6.4 in /pkg/new-ui/v1beta1/frontend (#1853)
Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4.
- [Release notes](https://github.com/caolan/async/releases)
- [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md)
- [Commits](https://github.com/caolan/async/compare/v2.6.3...v2.6.4)

---
updated-dependencies:
- dependency-name: async
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 12:52:30 +00:00
dependabot[bot] 3b7e37a606
Bump minimist from 1.2.5 to 1.2.6 in /pkg/ui/v1beta1/frontend (#1843)
Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 12:51:30 +00:00
dependabot[bot] abe740dc0d
Bump minimist from 1.2.5 to 1.2.6 in /pkg/new-ui/v1beta1/frontend (#1840)
Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 12:50:29 +00:00
dependabot[bot] 382129abad
Bump url-parse from 1.4.7 to 1.5.10 in /pkg/ui/v1beta1/frontend (#1826)
Bumps [url-parse](https://github.com/unshiftio/url-parse) from 1.4.7 to 1.5.10.
- [Release notes](https://github.com/unshiftio/url-parse/releases)
- [Commits](https://github.com/unshiftio/url-parse/compare/1.4.7...1.5.10)

---
updated-dependencies:
- dependency-name: url-parse
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 12:49:29 +00:00
dependabot[bot] ed447f8238
Bump url-parse from 1.5.3 to 1.5.10 in /pkg/new-ui/v1beta1/frontend (#1825)
Bumps [url-parse](https://github.com/unshiftio/url-parse) from 1.5.3 to 1.5.10.
- [Release notes](https://github.com/unshiftio/url-parse/releases)
- [Commits](https://github.com/unshiftio/url-parse/compare/1.5.3...1.5.10)

---
updated-dependencies:
- dependency-name: url-parse
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 12:46:29 +00:00
dependabot[bot] 6ad6524b28
Bump jose from 2.0.5 to 2.0.6 in /pkg/new-ui/v1beta1/frontend (#1952)
Bumps [jose](https://github.com/panva/jose) from 2.0.5 to 2.0.6.
- [Release notes](https://github.com/panva/jose/releases)
- [Changelog](https://github.com/panva/jose/blob/v2.0.6/CHANGELOG.md)
- [Commits](https://github.com/panva/jose/compare/v2.0.5...v2.0.6)

---
updated-dependencies:
- dependency-name: jose
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 12:40:29 +00:00
Elena Zioga 6de74e9c5e
UI: Fix unit tests (#1977)
* Fix Katib unit tests.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-10-25 10:07:30 +00:00
Yuki Iwai e444ea99e4
Add the documentation for simple-pbt (#1978)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>

Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-10-24 19:42:42 +00:00
Andrey Velichkevich 1a1065285f
[SDK] Fix namespace parameter in tune API (#1981) 2022-10-24 19:14:42 +00:00
Andrey Velichkevich 47b746fad9
[SDK] Remove Final Keyword from constants (#1980) 2022-10-24 19:04:43 +00:00
Orfeas Kourkakis 8cbaf850c8
UI: Format code (#1979)
Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>

Signed-off-by: Orfeas Kourkakis <orfeas@arrikto.com>
2022-10-24 14:38:43 +00:00
dependabot[bot] 42998563dc
Bump protobuf from 3.19.1 to 3.19.5 in /cmd/suggestion/pbt/v1beta1 (#1965)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.19.1 to 3.19.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/generate_changelog.py)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.19.1...v3.19.5)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-22 11:52:42 +00:00
dependabot[bot] 10c1bd740d
Bump protobuf from 3.19.1 to 3.19.5 in /cmd/suggestion/hyperband/v1beta1 (#1960)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.19.1 to 3.19.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/generate_changelog.py)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.19.1...v3.19.5)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-22 00:27:40 +00:00
dependabot[bot] 381d2c10d7
Bump protobuf from 3.19.1 to 3.19.5 in /cmd/suggestion/optuna/v1beta1 (#1964)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.19.1 to 3.19.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/generate_changelog.py)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.19.1...v3.19.5)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-21 18:03:11 +00:00
Elena Zioga 38e201747f
Recreate the Experiments Parallel Coordinates Graph (#1974)
* UI: Import echarts and ngx-echarts (#1879)

* Import echarts module and ngx-echarts directive for Echarts.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* UI: Remove trials graph component (#1879)

* Remove trials graph component.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* UI: Introduce graph's component (#1879)

* Create a new component that uses Echarts Parallel Graph.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* UI: Modify graph's wrapper component (#1879)

* Make the wrapper component use the new graph.
* Show the graph when at least one trial is completed.

Signed-off-by: Elena Zioga <elena@arrikto.com>

* UI: Parallel Graph unit test (#1879)

* Create unit test for Parallel Graph.

Signed-off-by: Elena Zioga <elena@arrikto.com>

Signed-off-by: Elena Zioga <elena@arrikto.com>
2022-10-21 09:55:10 +00:00
dependabot[bot] cc740de4da
Bump protobuf from 3.19.1 to 3.19.5 in /cmd/suggestion/hyperopt/v1beta1 (#1962)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.19.1 to 3.19.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/generate_changelog.py)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.19.1...v3.19.5)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-21 08:44:10 +00:00
dependabot[bot] 5883fd90f1
Bump protobuf from 3.19.1 to 3.19.5 in /cmd/earlystopping/medianstop/v1beta1 (#1963)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.19.1 to 3.19.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/generate_changelog.py)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.19.1...v3.19.5)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-21 01:27:11 +00:00
dependabot[bot] c270eb5e2e
Bump protobuf from 3.19.1 to 3.19.5 in /cmd/suggestion/chocolate/v1beta1 (#1959)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.19.1 to 3.19.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/generate_changelog.py)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.19.1...v3.19.5)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-20 13:35:33 +00:00
dependabot[bot] 38438d88ff
Bump protobuf from 3.19.1 to 3.19.5 in /cmd/suggestion/nas/darts/v1beta1 (#1970)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.19.1 to 3.19.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/generate_changelog.py)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.19.1...v3.19.5)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-20 11:40:32 +00:00
dependabot[bot] 918d796eb6
Bump protobuf from 3.19.1 to 3.19.5 in /cmd/suggestion/skopt/v1beta1 (#1961)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.19.1 to 3.19.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/generate_changelog.py)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.19.1...v3.19.5)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-19 18:31:00 +00:00
Jaeyeon Kim(김재연) aaa42c11c3
[feat] health check for katib-controller (#1934)
* [feat]: add health check endpoint

* remove time sleep in github action test script

* add error check

* update docs
2022-10-11 16:50:26 +00:00
Andrey Velichkevich 96ab64b14a
Create Tune API in the Katib SDK (#1951)
* Create Tune API in the Katib SDK

* Add Final to consts
Modify packages_to_install doc
Create validate objective function

* Add GPU TF Image
Change k8s version package

* Create search module

* Fix link in README

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Fix licence date

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2022-10-04 21:19:22 +00:00
Luke Ogg 09a0d6575a
Improve UI API/controller logging to ease troubleshooting (#1966)
* Add logging to ui controller HP job info

* Change case

* Remove extra logging
2022-09-30 10:35:55 +00:00
Yuki Iwai 293a78575f
Add the license to pbt (#1958)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-09-28 02:28:08 +00:00
Yuki Iwai f5e4586b74
Add the CI to build multi-platform container images (#1956)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-09-23 14:27:10 +00:00
Yuki Iwai e02eb6e849
Drop Kubernetes v1.21 and introduce Kubernetes v1.24 (#1953)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-09-17 12:59:45 +00:00
Yuki Iwai b1e912ce1f
Update the katib version in docs (#1950)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-09-09 09:07:28 +00:00
Yuki Iwai 077cf4d523
Support for arm64 in simple-pbt image (#1948)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-09-04 14:31:22 +00:00
Yuki Iwai 58a3f4b455
Support arm64 in darts-cnn-cifar10 image (#1947)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-09-03 19:16:20 +00:00
Yuki Iwai daf5b9b09a
Support for arm64 in enas-cnn-cifar10 image (#1944)
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
2022-08-31 17:37:54 +00:00
Yuki Iwai 6870587981
Support for arm64 in pytorch-mnist image (#1943) 2022-08-30 09:56:53 +00:00
Yuki Iwai 3906f8e8e0
Support for arm64 in mxnet-mnist image (#1940)
* support for arm64 in mxnet-mnist image

* fix build script
2022-08-29 18:16:53 +00:00
keisuke umezawa 733d98274c
Upgrade Optuna from v2.x.x to v3.0.0 (#1942)
* Use new distributions in Optuna v3

* Update optuna to v3
2022-08-29 10:23:53 +00:00
Yuki Iwai ca903e25e5
Add --connect-timeout flag to katib-db-manager (#1937) 2022-08-22 11:28:30 +00:00
Yuki Iwai 9c7d797ec5
Add validation webhooks for maxFailedTrialCount and parallelTrialCount (#1936)
* add validation webhooks for maxFailedTrialCount and parallelTrialCount

* [review] simplify validation logic
2022-08-22 06:15:30 +00:00
Yuki Iwai fe4d6e7803
Introduce Automatic platform ARGs (#1935)
* introduce Automatic platform ARGs
Automatic platform ARGs: https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope

* use BuildKit in actions test for charmed-katib

* use docker buildx in scripts/v1beta1/build.sh
2022-08-21 02:43:29 +00:00
Johnu George 1927e7369d
Update training operator image in CI (#1933) 2022-08-19 17:52:53 +00:00
Jaeyeon Kim(김재연) cc888afa34
Support postgres as a katib db (#1921)
* implement postgres for katib db

* fix yaml lint

* apply go mod tidy

* Update manifests/v1beta1/components/postgres/postgres.yaml

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* refactoring by reviews

- split openconnection to common packages
- add unit test for postgres db

* change to install only mysql by default

* remove useless import

* add postgres kustomization and e2e test for it

* change mysql installation files to be variable

* fix shell scripts

* fix lint

* fix image name

* set default value on github action workflow

* make postgres deployment to use pvc

* temporarily comments

* uncomment invalid experiments

* test with for loop

* sleep until controller created well

* add some comments

* Update pkg/db/v1beta1/postgres/postgres.go

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Update pkg/db/v1beta1/postgres/postgres_test.go

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* refactor by reviews

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2022-08-19 14:40:53 +00:00
Johnu George 6fa39138ee
Update CHANGELOG for v0.14.0 release (#1932) 2022-08-19 10:37:52 +00:00
Johnu George d5bc30b959
Update Katib SDK version (#1931) 2022-08-19 09:30:53 +00:00
Yuki Iwai cae12e63f8
Implement validations for darts suggestion service (#1926)
* implement validation for darts service

* Update pkg/suggestion/v1beta1/nas/darts/service.py

Co-authored-by: Jaeyeon Kim(김재연) <anencore94@gmail.com>

* Update pkg/suggestion/v1beta1/nas/darts/service.py

Co-authored-by: Jaeyeon Kim(김재연) <anencore94@gmail.com>

* [review] delete todo comment

* [review] change function name validate_algorithm_settings to validate_algorithm_spec

* [review] fix violation comments

* [review] fix condition to validate batch_size

* [review] add comment for developers

* [review] use set instead of list

Co-authored-by: Jaeyeon Kim(김재연) <anencore94@gmail.com>
2022-08-18 10:08:06 +00:00
Yuki Iwai 478e01d612
[chore] Upgrade Go version to v1.18 (#1925) 2022-08-17 04:35:50 +00:00
Yuki Iwai 8d58b0a53e
Implement validation for optuna suggestion service (#1924) 2022-08-08 11:15:53 +00:00
Jaeyeon Kim 42bc6a9d11
[hotfix]: filter by name of experiment (#1920)
Signed-off-by: Jaeyeon Kim <anencore94@gmail.com>
2022-07-30 10:03:45 +00:00
Yuki Iwai 3b37d9329e
Add the pytorch-mnist with GPU support container image (#1916) 2022-07-16 17:39:05 +00:00
Johnu George 8f182c2373
Fix push script to include new images (#1911) 2022-06-30 13:24:16 +00:00
Johnu George 7847094e2d
Updating the training operator image in CI (#1910) 2022-06-29 17:16:23 +00:00
a9p 3f2804b18e
Add PBT to experiment creation form (#1909) 2022-06-29 12:34:23 +00:00
Dejan Golubovic 2d35224926
Distinct page for each Trial in the UI (#1783) 2022-06-29 12:30:23 +00:00
Yuki Iwai cfa2d84632
Upgrade Python and Pytorch versions for some examples (#1906) 2022-06-28 11:50:34 +00:00
Rishit Dagli 9ee8fdaccb
Linting for K8s YAML files (#1901)
* Add yamllint checking

* Update yamllint command

* Revert changes to charmed

* Create new workflow for yamllint

* Create a script to verify installation and run yamllint

* Add `make yamllint`

* Update lint workflow
2022-06-22 15:50:26 +00:00
Rishit Dagli 7bf39225f7
Fixes lint warnings in YAML files (#1902)
* Fix missing document start warnings

* Fix too few spaces before comment warning
2022-06-22 14:25:25 +00:00
a9p 04ac975b70
Population based training (#1833)
* docs: update new algorithm service details

* feat: trial augmentation strategy

* feat: pbt suggestion service

* feat: PbtTemplate and associated test image

* feat: introduce annotation field to trial specifications

* feat: trial assignment changes to support annotations from suggestion

- Add new Annotation types to suggestion_types.go
- Add Annotation object and update Trial parser in trial.py

* feat: update pbt suggestion to use new Annotation api

- Suggestion uses exact match to track spawned trials
- Trials that get transmitted, but not created (or added to experiment) are added back to the respawn pool (population_size consistency)

* chore: gofmt and black run across PBT changes

* feedback: remove tf summary export, change default print unit, reduce range to be percentage compatible.

* feedback: move PBT template to example.

* feedback: changes to inject_webhook and utils.

- Rename mutateVolume to mutateMetricsCollectorVolume
- Add addContainerVolumeMount
- Add getPrimaryContainerIndex

* feedback: change suggestion mutation mount variable name and add to consts

* feedback: Add trial_names to GetSuggestionsReply and change suggestion path to <experiment>/<trial>

* feedback: removed unnecessary checks and moved to async pbt implementation

* feedback: update trial name override location and change annotations override to labels.

* feedback: add pbt to github workflow

* feedback: move labels to ParameterAssignments in GetSuggestionsReply and cleanup pbt.yaml.

* feedback: remove operator changes

* feedback: GHA updates

* feedback: new formatting changes

* feedback: add suggestion-pbt to gh-actions build-load.sh.

* fix: missing pbt->simple-pbt name changes, add simple-pbt to update-images.sh update yaml function (causing failing gha).

* feedback: add pointer to website from main readme for pbt
2022-06-21 15:35:34 +00:00
Yuki Iwai f7261de932
Change integration test system from KinD Cluster to Minikube Cluster (#1899) 2022-06-16 18:32:42 +00:00
Yuki Iwai 2c8758b26f
Allow running examples on Apple Silicon M1 and fix image build errors for arm64 (#1898) 2022-06-15 05:48:01 +00:00
Shaowei Su 170647d6c8
Update job name and service name as configurable for cert generator (#1889)
* add more flags

* rename

* add service validation

* add service read permission

Co-authored-by: shaowei su <shaowei.su@airbnb.com>
2022-06-14 18:18:32 +00:00
Yuki Iwai a75b83f8e3
Upgrade mysql version to v8.0.29 (#1897) 2022-06-14 08:08:32 +00:00
Yuki Iwai 6a21af058e
Add CyberAgent to adopters (#1894) 2022-06-10 10:29:20 +00:00
Yuki Iwai fe2ae99d5b
Upgrade tensorflow-aarch64 version to v2.9.1 (#1891) 2022-06-08 04:24:15 +00:00
Yuki Iwai ab2f59621e
Include MetricsUnavailable condition to Complete in Trial (#1877)
* include MetricsUnavailable condition to Complete in Trial

It is not easy for users to find why Trial failed when training code output incorrect format logs
since the trial-controller sets Succeeded condition with False to Trial if there are unavailable metrics in Katib DB as described in https://github.com/kubeflow/katib/issues/1343.
So we also include MetricsUnavailable condition to Complete in Trial.

* add gh-actions tasks to verify generated codes

* fix gh-actions workflow

* when the number of Failed Trials reaches maxTrialCount, experiment-controller sets Failed to Experiment status

* fix e2e test

* To avoid being set Failed in Experiment status when  and  is equal to 0, we need to add condition,
2022-06-08 04:22:15 +00:00
Yuki Iwai c9001d842f
chore: Upgrade Go libraries to resolve some security issues in the katib-controller (#1888) 2022-06-07 06:33:30 +00:00
Yuki Iwai e2378c3d9c
Fix errors when running the test on Apple Silicon M1 (#1886)
* specify the CPU architecture when running setup-envtest

* upgrade gopsutil version to v3.22.5

* fix shellcheck error
2022-06-06 07:58:29 +00:00
Yuki Iwai 72fff88143
Migrate kubeflow-katib-presubmit to GitHub Actions (#1882)
* migrate test-infra to GitHub Actions

* change python base image to python:3.9-slim

* move from minikube to kind

* separate darts container images by device type

* run e2e test with multi kubernetes version

* disable deploying katib-ui by default

* change kind kubernetes cluster version

* fix update-images.sh

* fix shellcheck

* fix script to setup katib

* split enas, darts and tf-mnist-with-summaries with trial images

* specify experiments in pytorch-mnist-e2e-test

* reduce storage size for mysql

* fix trial image name for enas and darts

* fix trial image name for file-metrics-collector-with-json-format

* change kubernetes versions

* do not run e2e test on push master branch

* remove backoffLimit field in examples
2022-06-06 04:50:29 +00:00
Yuki Iwai 5c7dce6bb2
Add semicolon when using `command` command in Makefile (#1885)
* add a semicolon when using the `command` command in Makefile to avoid the `make: command: Command not found` error.

* fix shellcheck
2022-06-04 20:43:27 +00:00
Yuki Iwai b9314c63a8
Fix `HAS_SHELLCHECK` and `HAS_SETUP_ENVTEST` in Makefile (#1884) 2022-06-04 06:32:27 +00:00
aws-kf-ci-bot 91974f1fd5
Remove presubmit tests depending on optional-test-infra (#1871)
* Deprecate Katib presubmit on optional-test-infra

This PR serves as sub-PR to deprecate katib presubmit on optional-test-infra.

* Update prow_config.yaml

Update config file
2022-06-03 20:55:26 +00:00
Yuki Iwai 90c34812ff
Upgrade the Tensorflow version to address some security issues (#1870)
* upgrade the tensorflow version to address some security issues

* fix enas example codes

* upgrade tensorflow to v2.9.1 and tensorflow-aarch64 to v2.9.0

* install protobuf (>= 3.9.2, < 3.20) for tensorflow-aarch64
2022-06-03 19:34:26 +00:00
Yuki Iwai a9d92bd4a2
Upgrade the grpc_health_probe version to v0.4.11 to resolve security vulnerability CVE-2022-27191 (#1875)
* upgrade the grpc_health_probe version to v0.4.11 to resolve security vulnerability CVE-2022-27191

* increase batch size of tfjob-mnist-with-summaries

* add primaryPodLabels to tfjob's example
2022-05-27 08:23:12 +00:00
Jeongwook Park cc29580035
additional metric names should not include objective metric name (#1874) 2022-05-25 11:28:34 +00:00
Yuki Iwai 502695abfc
Upgrade the Kubernetes Python client to 22.6.0 (#1869) 2022-05-23 09:17:01 +00:00
Yuki Iwai 779b331afe
Upgrade the kubebuilder to v3.2.0 and Kubernetes Go libraries to v1.22.2 (#1861)
* upgrade kubebuilder version from v2.3.0 to v3.2.0

* fix envtest for experiment-controller

* fix suite test

To avoid the `timeout waiting for process kube-apiserver to stop` error, we must use the `context.WithCancel`.
Ref: https://github.com/kubernetes-sigs/controller-runtime/issues/1571#issuecomment-945535598

* update Go version to v1.17 in kubeflow-katib-presubmit

To avoid the `../../../../pkg/mod/k8s.io/client-go@v0.22.2/plugin/pkg/client/auth/exec/metrics.go:21:2: package io/fs is not in GOROOT (/usr/local/go/src/io/fs)` error,
we must use Go v1.16 or later, but as described in https://github.com/kubeflow/training-operator/issues/1581,
we do not have permission to update `public.ecr.aws/j1r0q0g6/kubeflow-testing:latest` so we need to update it in this.
2022-05-22 18:33:00 +00:00
Elias Koromilas 10f674a155
Update FPGA XGBoost example (#1865)
Signed-off-by: Elias Koromilas <elias.koromilas@gmail.com>
2022-05-19 15:43:56 +00:00
Yuki Iwai d385d14512
Fix kubeflowkatib/mxnet-mnist image (#1866) 2022-05-18 18:55:27 +00:00
Daniela Plascencia ea23e715cf
operators/katib-*: pins pip and setuptools versions to avoid installation issues (#1867)
Due to pypa/setuptools_scm#713, we are experiencing errors when building
charms both locally and in the CI. This change will prevent the error
from happening until the issue is fixed.
2022-05-18 14:32:27 +00:00
Yuki Iwai 399340418a
Add shellcheck (#1857)
* add shellcheck

* fix /test/e2e/v1beta1/scripts/run-e2e-experiment.sh

* fix test/e2e/v1beta1/scripts/setup-katib.sh

* fix pkg/apis/manager/[v1beta1|health]/build.sh

* fix scripts/v1beta1/deploy.sh

* fix scripts/v1beta1/release.sh

* fix scripts/v1beta1/update-images.sh

* fix scripts/v1beta1/undeploy.sh

* fix hack/update-openapigen.sh

* update boilerplate

* fix hack/gen-python-sdk/gen-sdk.sh

* fix hack/update-codegen.sh

* fix hack/update-mockgen.sh

* use /usr/bin/env bash

* add path to update-boilerplate.sh

* fix script to update boilerplate

* fix comment
2022-05-03 01:15:00 +00:00
Yuki Iwai 9924827ee4
Bump kubeflow-katib and kfp version in notebook examples (#1849) 2022-05-02 08:55:26 +00:00
himkt e9e8eabe6d
Set upper constraint for Optuna (#1852) 2022-04-22 18:11:40 +00:00
Alexey Gorobets 7f1afbdaa9
Don't check if trial's metadata is in spec.parameters (#1848)
* do not check trial parameter in experiment parameters if it's trial's metadata

* revert unnecessary change

* add handle Labels[label] and Annotations[annotation]

* fix test description
2022-04-17 12:52:41 +00:00
Jeongwook Park bc5b82b265
Reconcile trial assignments by comparing suggestion and trials being executed (#1831)
* Reconcile trial assignments by comparing suggestion and trials being executed

* Add a unit test for ReconcileSuggestions

* Change add count
2022-04-17 12:44:41 +00:00
jarred wilson e031b5eeb4
Add prometheus scraping and grafana support to charmed katib-controller operator (#1839)
* Add prometheus scraping and grafana support to charmed operator

* Upgrade black version to 22.3.0 to fix issue with click dependency

* fix: unpin `black`, fix formatting errors

* fix: minor refactor of prometheus integration

Revert to defaults for relation names and paths, where appropriate.

* fix: apply operator linting checks only to source code

* chore: point katib-db to charm in charmhub

* fix: remove unneeded handling of prometheus relation event

* feat: Add template dashboard and alert rules

These are not working properly. When connecting to grafana, the dashboard shows up but does not populate properly with data. The data source appears wrong.

* fix: handle leader-elected events

Without this, upgrade-charm does not work.

* fix: correctly template the sample grafana dashboard

* fix: remove placeholder grafana/prometheus files

* fix: bump wait time to avoid flaky test failure

Co-authored-by: Andrew Scribner <ca.scribner+1@gmail.com>
2022-04-14 12:17:53 +00:00
Hao Xin c62bacdf11
manifests: Increase the probes seconds (#1845) 2022-04-11 11:45:25 +00:00
Elias Koromilas 2985423cef
Fix the FPGA examples documentation (#1841)
* Set `kube-system` as the suggested namespace

Signed-off-by: Elias Koromilas <elias.koromilas@gmail.com>

* Replace broken link

Signed-off-by: Elias Koromilas <elias.koromilas@gmail.com>
2022-04-06 01:42:22 +00:00
Yuki Iwai d443ed3356
Support JSON format logs in `file-metrics-collector` (#1765)
* support JSON format logs in file-metrics-collector

* review: convert fileFormat to type FileSystemFileFormat

* Update cmd/metricscollector/v1beta1/file-metricscollector/main.go

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: remove func (f FileSystemFileFormat) String()

* review: get metricRegList only when the format is TEXT

* review: change var name in a script for e2e

* review: explicitly specify the cloudml-hypertune in the Dockerfile

* review: use reflect.DeepEqual instead of go-cmp.Diff

* review: stop using 'JSON' directly in error statements

* review: install specific version cloudml-hypertune

* review: get objType in the updateStopRules function

* review: save optimalObjValue across multiple stopRules

* review: add warning messages to parseTimestamp func

* review: generate test files with go test command

* review: change api for new feature

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2022-04-05 15:20:36 +00:00
jarred wilson 36d0a57019
Upgrade Black to fix linting (#1842)
* Upgrade black version to 22.3.0 to fix issue with click dependency

* fix: unpin `black`, fix formatting errors

Co-authored-by: Andrew Scribner <ca.scribner+1@gmail.com>
2022-04-05 04:12:35 +00:00
Johnu George 9c88bbce8e
Update sdk version to 0.13.0 version (#1832) 2022-03-07 10:57:54 +00:00
Johnu George 4adf83af95
Merge pull request #1829 from johnugeorge/changelog
CHANGELOG for v0.13.0 release
2022-03-07 13:49:05 +05:30
Johnu George 350ca47d8c Update changelog for v0.13.0 release 2022-03-04 20:15:16 +05:30
Johnu George f785f4a63c Update changelog for v0.13.0 release 2022-03-04 20:04:44 +05:30
Johnu George 3597b873b5 Merge remote-tracking branch 'upstream/master' 2022-03-02 20:54:04 +05:30
Yuki Iwai 0515c1ecf3
Fix the Dockerfile for API documentation generation (#1822)
* fix the Dockerfile for API documentation generation

* regenerate API documentation
2022-02-21 07:27:50 +00:00
Andrey Velichkevich bd7cb768b4
Add more reviewers to Katib OWNERS (#1819) 2022-02-18 02:15:59 +00:00
Andrey Velichkevich 876339a51e
Add Argo Workflows and Katib presentation (#1818) 2022-02-17 18:11:35 +00:00
Andrey Velichkevich 5cb631a66e
Update Changelog for Katib v0.13.0-rc.1 release (#1815)
* Update Changelog for Katib v0.13.0-rc.1 release

* Modify name
2022-02-15 19:24:41 +00:00
Andrey Velichkevich e72611698e
Bump Katib Python SDK to 0.13.0rc1 version (#1814) 2022-02-15 19:21:41 +00:00
dependabot[bot] e3ed2c366c
Bump normalize-url from 4.5.0 to 4.5.1 in /pkg/new-ui/v1beta1/frontend (#1809)
Bumps [normalize-url](https://github.com/sindresorhus/normalize-url) from 4.5.0 to 4.5.1.
- [Release notes](https://github.com/sindresorhus/normalize-url/releases)
- [Commits](https://github.com/sindresorhus/normalize-url/commits)

---
updated-dependencies:
- dependency-name: normalize-url
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-15 16:24:41 +00:00
dependabot[bot] 09c59f1acd
Bump follow-redirects from 1.14.7 to 1.14.8 in /pkg/ui/v1beta1/frontend (#1812)
Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.14.7 to 1.14.8.
- [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
- [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.14.7...v1.14.8)

---
updated-dependencies:
- dependency-name: follow-redirects
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-15 15:24:40 +00:00
dependabot[bot] e086094911
Bump ajv from 6.12.2 to 6.12.6 in /pkg/ui/v1beta1/frontend (#1811)
Bumps [ajv](https://github.com/ajv-validator/ajv) from 6.12.2 to 6.12.6.
- [Release notes](https://github.com/ajv-validator/ajv/releases)
- [Commits](https://github.com/ajv-validator/ajv/compare/v6.12.2...v6.12.6)

---
updated-dependencies:
- dependency-name: ajv
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-15 15:23:41 +00:00
dependabot[bot] dfcd3f06db
Bump ws from 6.2.1 to 6.2.2 in /pkg/new-ui/v1beta1/frontend (#1810)
Bumps [ws](https://github.com/websockets/ws) from 6.2.1 to 6.2.2.
- [Release notes](https://github.com/websockets/ws/releases)
- [Commits](https://github.com/websockets/ws/compare/6.2.1...6.2.2)

---
updated-dependencies:
- dependency-name: ws
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-15 15:22:41 +00:00
Andrey Velichkevich 6a36763ee2
Fix default label for Training Operators (#1808)
* Fix default label for Training Operators

* Fix version comment

* Change the docs

* Change git command
2022-02-15 07:19:40 +00:00
dependabot[bot] adfba2f294
Bump follow-redirects from 1.14.7 to 1.14.8 in /pkg/new-ui/v1beta1/frontend (#1807)
Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.14.7 to 1.14.8.
- [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
- [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.14.7...v1.14.8)

---
updated-dependencies:
- dependency-name: follow-redirects
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-14 22:02:41 +00:00
dependabot[bot] ac3a70cb19
Bump tensorflow from 2.7.0 to 2.8.0 in /cmd/suggestion/nas/enas/v1beta1 (#1806)
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.7.0 to 2.8.0.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.7.0...v2.8.0)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-14 21:52:39 +00:00
Peter J De Sousa 3f2aea9c6b
Add min-juju-version element to operators prevent excess PVC creation (#1804) 2022-02-14 21:51:39 +00:00
dependabot[bot] 35ea563d80
Bump karma from 6.3.4 to 6.3.14 in /pkg/new-ui/v1beta1/frontend (#1805)
Bumps [karma](https://github.com/karma-runner/karma) from 6.3.4 to 6.3.14.
- [Release notes](https://github.com/karma-runner/karma/releases)
- [Changelog](https://github.com/karma-runner/karma/blob/master/CHANGELOG.md)
- [Commits](https://github.com/karma-runner/karma/compare/v6.3.4...v6.3.14)

---
updated-dependencies:
- dependency-name: karma
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-11 18:31:40 +00:00
Andrey Velichkevich d3c5388e56
Add Kubernetes version to AWS CI (#1758)
* Add Kubernetes version to AWS CI

* Change to 1.21

* Change to 1.19 version
2022-02-09 18:36:07 +00:00
Andrey Velichkevich 872df4d379
Update Changelog for Katib v0.13.0-rc.0 release (#1794) 2022-01-28 21:42:01 +00:00
Yuki Iwai 4376b2ca72
Update supported Python version for kubeflow-katib SDK (#1797)
* update supported Python version for kubeflow-katib SDK

* stop supporting Python2
2022-01-26 16:24:44 +00:00
Andrey Velichkevich 22fc2fed53
Bump Katib Python SDK to 0.13.0rc0 version (#1793) 2022-01-25 14:59:19 +00:00
Andrey Velichkevich 98284f32a7
Add CPU architecture to release scripts (#1791) 2022-01-25 01:53:19 +00:00
dependabot[bot] c5b89e9b81
Bump nanoid from 3.1.28 to 3.2.0 in /pkg/new-ui/v1beta1/frontend (#1788)
Bumps [nanoid](https://github.com/ai/nanoid) from 3.1.28 to 3.2.0.
- [Release notes](https://github.com/ai/nanoid/releases)
- [Changelog](https://github.com/ai/nanoid/blob/main/CHANGELOG.md)
- [Commits](https://github.com/ai/nanoid/compare/3.1.28...3.2.0)

---
updated-dependencies:
- dependency-name: nanoid
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-22 01:16:47 +00:00
dependabot[bot] 8651486963
Bump log4js from 6.3.0 to 6.4.0 in /pkg/new-ui/v1beta1/frontend (#1787)
Bumps [log4js](https://github.com/log4js-node/log4js-node) from 6.3.0 to 6.4.0.
- [Release notes](https://github.com/log4js-node/log4js-node/releases)
- [Changelog](https://github.com/log4js-node/log4js-node/blob/master/CHANGELOG.md)
- [Commits](https://github.com/log4js-node/log4js-node/compare/v6.3.0...v6.4.0)

---
updated-dependencies:
- dependency-name: log4js
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-22 00:14:47 +00:00
Yuki Iwai 7d4fa5559b
Fix a link for GRPC API documentation (#1786) 2022-01-21 16:15:20 +00:00
Yuki Iwai f5abfd0462
Bump grpc_health_probe version to v0.4.6 (#1781) 2022-01-18 14:23:55 +00:00
dependabot[bot] a163c201bf
Bump shelljs from 0.8.4 to 0.8.5 in /pkg/new-ui/v1beta1/frontend (#1780)
Bumps [shelljs](https://github.com/shelljs/shelljs) from 0.8.4 to 0.8.5.
- [Release notes](https://github.com/shelljs/shelljs/releases)
- [Changelog](https://github.com/shelljs/shelljs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/shelljs/shelljs/compare/v0.8.4...v0.8.5)

---
updated-dependencies:
- dependency-name: shelljs
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-18 14:21:54 +00:00
dependabot[bot] fa1babe3c8
Bump engine.io from 4.1.1 to 4.1.2 in /pkg/new-ui/v1beta1/frontend (#1777)
Bumps [engine.io](https://github.com/socketio/engine.io) from 4.1.1 to 4.1.2.
- [Release notes](https://github.com/socketio/engine.io/releases)
- [Changelog](https://github.com/socketio/engine.io/blob/4.1.2/CHANGELOG.md)
- [Commits](https://github.com/socketio/engine.io/compare/4.1.1...4.1.2)

---
updated-dependencies:
- dependency-name: engine.io
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-14 15:04:03 +00:00
Yuki Iwai 4c03c839f5
Bump `alpine` version to 3.15 (#1779) 2022-01-14 14:55:04 +00:00
dependabot[bot] 75a0572bd8
Bump follow-redirects from 1.5.10 to 1.14.7 in /pkg/ui/v1beta1/frontend (#1773)
Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.5.10 to 1.14.7.
- [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
- [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.5.10...v1.14.7)

---
updated-dependencies:
- dependency-name: follow-redirects
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-14 01:33:02 +00:00
dependabot[bot] 0997bc5b1c
Bump tar from 6.1.5 to 6.1.11 in /pkg/new-ui/v1beta1/frontend (#1772)
Bumps [tar](https://github.com/npm/node-tar) from 6.1.5 to 6.1.11.
- [Release notes](https://github.com/npm/node-tar/releases)
- [Changelog](https://github.com/npm/node-tar/blob/main/CHANGELOG.md)
- [Commits](https://github.com/npm/node-tar/compare/v6.1.5...v6.1.11)

---
updated-dependencies:
- dependency-name: tar
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-14 00:32:03 +00:00
dependabot[bot] 359acdb379
Bump follow-redirects from 1.13.0 to 1.14.7 in /pkg/new-ui/v1beta1/frontend (#1771)
Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.13.0 to 1.14.7.
- [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
- [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.13.0...v1.14.7)

---
updated-dependencies:
- dependency-name: follow-redirects
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-14 00:30:02 +00:00
Seongjin Kim ce5421e8da
[new-ui] Bump angular version to 12 (#1712)
* update README.md

* npm install

* version up : Angular 8 -> 9 / TS 3.5 -> 3.8

* version up: angular material 9

* remove deprecated entryComponents

* edit deprecated TestBed.get -> TestBed.inject

* Version up : @angular/core@10 /cli@10

* Version up : @angular/cdk, @angular/material  9 -> 10

* Version up : @angular/cdk-experimental 8 -> 10

* dependency version up : @fortawesome/angular-fontawesome (0.5.0 -> 0.7.0)

* dependency version up : @swimlane/ngx-charts (13.0.4 -> 16.0.0)

* ng update @angular/core@11 @angular/cli@11

* ng update @angular/material@11

* npm install @angular/cdk-experimental@11.2.13

* ng update @angular/core@12 @angular/cli@12

* npm install @fortawesome/angular-fontawesome@0.9.0

* ng update @angular/material

* npm install @angular/cdk-experimental@12.2.7

* npm install @swimlane/ngx-charts@19.0.1

* remove error line from @angular/compiler

* npm install @angular/localize@12.2.7

* run format:write

* Edit Dockerfile

* Rollback package-lock.json version to 1

* Update README.md

* remove assets/fonts by gitignore
2022-01-13 13:56:38 +00:00
dependabot[bot] c7458152b1
Bump lodash from 4.17.15 to 4.17.21 in /pkg/ui/v1beta1/frontend (#1770)
Bumps [lodash](https://github.com/lodash/lodash) from 4.17.15 to 4.17.21.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.15...4.17.21)

---
updated-dependencies:
- dependency-name: lodash
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-12 11:06:10 +00:00
dependabot[bot] a43c786637
Bump path-parse from 1.0.6 to 1.0.7 in /pkg/ui/v1beta1/frontend (#1760)
Bumps [path-parse](https://github.com/jbgutierrez/path-parse) from 1.0.6 to 1.0.7.
- [Release notes](https://github.com/jbgutierrez/path-parse/releases)
- [Commits](https://github.com/jbgutierrez/path-parse/commits/v1.0.7)

---
updated-dependencies:
- dependency-name: path-parse
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-11 22:42:10 +00:00
dependabot[bot] 91df53a7f9
Bump ws from 5.2.2 to 5.2.3 in /pkg/ui/v1beta1/frontend (#1761)
Bumps [ws](https://github.com/websockets/ws) from 5.2.2 to 5.2.3.
- [Release notes](https://github.com/websockets/ws/releases)
- [Commits](https://github.com/websockets/ws/compare/5.2.2...5.2.3)

---
updated-dependencies:
- dependency-name: ws
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-11 22:40:10 +00:00
dependabot[bot] 0cefd03e5c
Bump tmpl from 1.0.4 to 1.0.5 in /pkg/ui/v1beta1/frontend (#1762)
Bumps [tmpl](https://github.com/daaku/nodejs-tmpl) from 1.0.4 to 1.0.5.
- [Release notes](https://github.com/daaku/nodejs-tmpl/releases)
- [Commits](https://github.com/daaku/nodejs-tmpl/commits/v1.0.5)

---
updated-dependencies:
- dependency-name: tmpl
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-11 21:49:10 +00:00
Hao Xin ca79be1105
manifests: Upgrade cert-manager API from v1alpha2 to v1 (#1752) 2022-01-11 21:48:11 +00:00
Yuki Iwai 2a0b12eba7
Bump Python version to 3.9 (#1731)
* bump Python to 3.9

* modify script to build container image

* fix example for enas

* update scripts to modify image name in ci

* review: change docker build command

* review: use new tf-mnist-with-example in Ci for tfjob

* review: refactor tf-mnist-with-summaries

* review: remove Dockerfile.ppc64le for new-ui

* review: update docs related to tf-mnist-with-summaries

* TFEventMetricsCollector supports TF>=2.0 and stops supporting TF<=1.x

* review: add help command to scripts/v1beta1/build.sh

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* fix unit test for tfevent-metricscollector

* review: generate tf event files on CI

* add test command to Makefile

* update publish-trial-images

* update update-images.sh

* reduce batch size

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-12-10 12:33:49 +00:00
Andrey Velichkevich 7be1f0ad57
Use release tags for Trial images (#1757)
* Update all Trial image tags to latest

* Modify tags in Notebooks

* Rename script

* Script changes

* Few changes

* Add other images

* Modify test script

* Finish test script

* Modify release script

* Finish release script

* Few changes
2021-12-10 02:11:48 +00:00
Daniela Plascencia a2b5dae26f
katib/operators: removes unrecognized keys from metadata.yaml (#1759) 2021-12-09 22:47:49 +00:00
Andrey Velichkevich 151972406a
Add Workflow to Publish Katib Images (#1746)
* Add Workflow to Publish Katib Images

* Change docker hub

* Remove comment

* Fix path

* Use composite run
2021-12-03 18:13:58 +00:00
Andrey Velichkevich 326089d6de
Fix the default Metrics Collector regex (#1755) 2021-12-01 23:22:04 +00:00
DomFleischmann bb439fa550
Fix Status Handling in Charmed Operators (#1743)
- Handle status of operators only in set_pod_spec function.
- Centralize all the logic in set_pod_spec and
add helper checks.
- Pin black version
2021-12-01 15:51:14 +00:00
Seongjin Kim a701fd6f13
Enhance/UI feasible space (#1721)
* Add input type number

* Add non-zero validator on step

* remove non-standard css
2021-12-01 10:35:14 +00:00
Seongjin Kim 53baba87ea
[Bug Fix - new ui] : Fix bug on list type hp (#1704)
* bugfix : Fix bug on categorical hp ui

* run format:write

* remove changes on package-lock.json
2021-12-01 10:32:13 +00:00
Yuan Tang f67ad5b102
docs: Add my presentations that include Katib (#1753) 2021-11-30 17:37:30 +00:00
Andrey Velichkevich 1c7617c0b2
Update links for issue template (#1751) 2021-11-29 22:23:30 +00:00
Yuan Tang e32445da79
Add Akuity to list of adopters (#1749)
* Add Akuity to list of adopters

* Update ADOPTERS.md
2021-11-26 22:49:22 +00:00
Yuki Iwai 10051dc77a
Implement validation for early stopping (#1709)
* implement validation for early stopping

* fix some documents

* fix error messages

* implement gRPC API to verify parameters for early stopping

* review: use early_stopping as gRPC API

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: fix error description

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: remove t.Run

* review: remove condition to verify algorithmName for early stopping

* remove description about updating gRPC API docs in kubeflow website

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-11-26 15:47:54 +00:00
Yuan Tang 46207a3c10
docs: Argo -> Argo Workflows (#1741) 2021-11-23 03:55:06 +00:00
Andrey Velichkevich c3e778b673
Change namespace label for metrics collector injection (#1740)
* Change namespace label for metrics collector injection

* Fix var name
2021-11-23 02:34:05 +00:00
Andrey Velichkevich 0e1730237c
Fix Range for Int and Double values in Grid (#1732) 2021-11-12 02:22:52 +00:00
Jeongwook Park cffab076b6
fix: check if parameter references exist in experiment parameters (#1726)
* fix: check if parameter references exist in experiment parameters

* Fix validator test

* Update some comments and test descriptions

* Check trial parameter reference only when experiment parameters are not empty

* Add a test for the case where 'spec.parameters' is empty
2021-11-09 03:09:03 -08:00
Andrey Velichkevich 16e0574647
Modify gRPC API with Current Request Number (#1728)
* Modify API to current_request_number

* Changes after review

* Add request_number deprecated API

* Fix test
2021-11-04 18:03:31 -07:00
Andrey Velichkevich e72e5c8757
Allow to remove each resource in Katib config (#1729) 2021-11-04 15:24:32 -07:00
fabianvdW 594c177716
Fix same set for HyperParameters in Bayesian Optimization algorithm (#1701)
* Fix #1700

* Reformat
2021-11-02 12:42:06 -07:00
Chen WenJun bfe4527f13
fix: close mysql statement and rows resources when sql exec end (#1720)
* fix: close mysql statement and rows resources when sql exec end

* fix: close mysql statement and rows resources when sql exec end

* style: move code to other place

* style: correct the typo(Prepare)

Co-authored-by: 陈文军 <chenwenjun01@corp.netease.com>
2021-11-02 12:18:06 -07:00
Yuki Iwai f802295cb4
Support leader election for katib-controller (#1713)
* support leader election for katib-controller

* keep consistent with default flag name generated by kubebuilder

Co-authored-by: Ce Gao <ce.gao@outlook.com>

* fix developer guide

* fix e2e test

* modify directory structure for manifests

* fix e2e test

* review: turn off leader-election by default

* Update manifests/v1beta1/installs/katib-leader-election/leader-election-rbac.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update manifests/v1beta1/installs/katib-leader-election/leader-election-rbac.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update manifests/v1beta1/installs/katib-leader-election/leader-election-rbac.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update manifests/v1beta1/installs/katib-leader-election/leader-election-rbac.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Co-authored-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-11-02 12:14:06 -07:00
Andrey Velichkevich b8622d6b0c
Run Early Stopping Pipeline Example (#1727) 2021-10-29 07:28:51 -07:00
Jeongwook Park 0a38e8769e
Fix clusterrole of katib-controller to access image pull secrets (#1725) 2021-10-28 07:28:26 -07:00
Jeongwook Park 3ed331e412
Emit events when fails to reconcile all trials (#1706)
* Emit events when fails to reconcile all trials

* Return error only when fails to get trial instance
2021-10-27 19:05:26 -07:00
Andrey Velichkevich b176b048bf
Update Algorithm Service Doc for new CI script (#1724)
* Update Algorithm Service Doc for new CI script

* Fix experiments

* Remove spaces
2021-10-26 19:32:27 -07:00
Kenneth Koski 5353cb51b1
Update Charmed Katib Operators + CI to 0.12 (#1717)
* Update katib-controller operator for 0.12

* Update katib-ui operator for 0.12

* Update katib-db-manager operator for 0.12

* Update Charmed Katib CI job

Updates general dependencies and fixes label selectors
2021-10-25 16:28:42 -07:00
Midhun R Nair a528757f82
Updating Katib CI to use training operator (#1710)
* Updating Katib CI to use training operator

* Changed master to 1.4 branch in echo statement
2021-10-22 03:16:03 -07:00
alexeykaplin 5a719e9372
#1714: Missing metrics port annotation (#1715) 2021-10-22 03:14:03 -07:00
Andrew Scribner e4cde95063
Update OWNERS for charm operators (#1718)
Cycles incoming and outgoing Canonical team members for the charm operators
2021-10-22 02:27:03 -07:00
Johnu George 33f8395705 Merge remote-tracking branch 'upstream/master' 2021-10-22 12:26:05 +05:30
Andrey Velichkevich 195db29237
Add Kubeflow Pipelines Examples (#1632)
* Init commit with e2e example

* Add Early Stopping and MPI Examples

* Add MPI to README

* Modify SDK for MPI example

* Modify doc

* Update Early Stopping example

* Finish e2e example

* Modify links for KFP guide
2021-10-12 10:29:44 -07:00
Andrey Velichkevich e5d76369e3
Fix path to the NAS Trial images (#1711) 2021-10-11 14:50:43 -07:00
Jaeyeon Kim 487b012ec6
[enhance] change default metrics collect format (#1707)
* [enhance] change default metrics collect format

-  to parse scientific notation well

* Update pkg/metricscollector/v1beta1/common/const.go

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-10-11 13:12:44 -07:00
Yuki Iwai 6de09d7f13
Implement some unit tests for the katibconfig package (#1690)
* resolve conflict

* implement unit tests for GetEarlyStoppingConfigData and GetMetricsCollectorConfigData in katib-config

* fix envtest for suggestion-controller

* remove debug code

* fix invalidCollectorKind value

* refactor tests struct

* remove unnecessary empty line

* add tests for custom resource requirements

* fix variable name
2021-10-08 04:19:23 -07:00
Yuki Iwai 30e47df5a6
Fix hyperlink for build status (#1703)
* fix hyperlink for build status

* Update README.md

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-10-07 20:51:23 -07:00
Andrey Velichkevich 7f83ed76d2
Modify SDK examples for Katib 0.12 release (#1631)
* Modify SDK examples for Katib 0.12 release

* Use Katib SDK 0.12
2021-10-07 18:45:23 -07:00
Andrey Velichkevich 60baacd0fd
Add Kubeflow MXJob example (#1688)
* Add Kubeflow MXJob example

* Reduce num examples

* Update image link

* Fix FPGA doc

* Add BytePS image
2021-10-07 17:27:23 -07:00
Andrey Velichkevich 983a867073
Refactor Examples folder structure (#1691)
* Refactor Katib Examples

* Fix links

* Use Kind image
Use kubectl wait

* Update examples/v1beta1/kind-cluster/README.md

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Update examples/v1beta1/kind-cluster/README.md

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Increase timeout

* Update docs/images-location.md

Co-authored-by: Elias Koromilas <elias.koromilas@gmail.com>

* Update examples/v1beta1/README.md

Co-authored-by: Elias Koromilas <elias.koromilas@gmail.com>

* Remove json

* Add example links to training containers

* Fix link

* Update links to training-operator

* Rename Trial settings to template

* Rename Trial training containers to Trial images

* Move NAS examples to Trial images

* Add NAS links to README

* Change TARGET DIR

* Update examples/v1beta1/trial-images/mxnet-mnist/Dockerfile

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Update examples/v1beta1/trial-images/pytorch-mnist/Dockerfile

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Co-authored-by: Elias Koromilas <elias.koromilas@gmail.com>
2021-10-07 05:38:22 -07:00
Yuan Tang 2db65b2c6b
Update link to training operator (#1699) 2021-10-06 13:48:20 -07:00
Andrey Velichkevich 8d8acd8289
Add Changelog for Katib 0.12.0 release (#1695) 2021-10-06 00:37:08 -07:00
Andrey Velichkevich cc8fb500e8
Bump Katib Python SDK to 0.12.0 version (#1694) 2021-10-06 00:36:08 -07:00
Yuki Iwai 29409198ff
Fix readme in examples directory (#1687) 2021-09-29 16:46:09 -07:00
Elias Koromilas 40a2fc8f99
Update FPGA examples (#1685)
* Update FPGA Operator usage instructions

Signed-off-by: Elias Koromilas <elias.koromilas@gmail.com>

* Update FPGA example experiments

Signed-off-by: Elias Koromilas <elias.koromilas@gmail.com>

* Add FPGA experiment reviewers

Signed-off-by: Elias Koromilas <elias.koromilas@gmail.com>

* Update examples/v1beta1/fpga/OWNERS

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-09-29 12:48:09 -07:00
Andrey Velichkevich 6f0b2b1e54
Add Python test requirements file (#1684) 2021-09-28 09:49:10 -07:00
Andrey Velichkevich ad6e75aa5b
Update Go version to 1.17 (#1683)
* Use Go 1.17 version

* Fix code generator import

* Change after review

* Run lint

* Use go install for lint
2021-09-28 09:05:11 -07:00
Andrey Velichkevich 0a5e418329
Add GitHub Actions for Python unit tests (#1677)
* Add GitHub Actions for Python unit tests

* Add PythonPath

* Add health

* Add pwd

* Mock kube config

* Add Rest PATH
2021-09-27 18:57:10 -07:00
Kimonas Sotirchos 42e3cb214e
Add OWNERS file for the new ui (#1681)
* Add OWNERS file for the new ui

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Update owners file

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-09-27 18:56:09 -07:00
Yuki Iwai 877a7ce123
Add envtest to check `reconcileRBAC` (#1678)
* add envtest to check reconcileRBAC

* fix gofmt

* merge test5 to test1

* fix comment

* fix variable name

* remove comments auto-generated by gofmt 1.17

* fix import
2021-09-27 11:09:40 -07:00
Jaeyeon Kim d9059438d3
[bugfix]: absolute value and typo (#1676)
Signed-off-by: Jaeyeon Kim <anencore94@gmail.com>
Co-authored-by: Seongjin Kim <seongjinkim1123@gmail.com>

Co-authored-by: Seongjin Kim <seongjinkim1123@gmail.com>
2021-09-27 10:08:40 -07:00
Andrey Velichkevich d6f75fe237
Create Python script to run e2e Argo Workflow (#1674)
* Init changes

* Create Argo Workflow Python Script

* Move scripts to e2e/v1beta1

* Get prow env dict

* Modify context

* Fix volumeMounts name

* Change to PULL_PULL_SHA

* Change context

* Remove tmp return

* Remove ksonnet

* Move depends up

* Fix SHA variable

* Decrease parallel experiments
2021-09-27 04:48:40 -07:00
Andrey Velichkevich 23a3a8afd6
Refactor README (#1667)
* Add init changes for README

* Add Algorithms

* Add rows

* Finish README

* Add algorithms first

* Fix install link

* Add check pod command

* Change list

* Move install link

* Add supported frameworks table

* Remove header

* Change table to list

* Add components link

* Add dot
2021-09-24 04:13:37 -07:00
Yuki Iwai c22afe9fb5
Use golangci-lint as linter for Go (#1671)
* use golangci-lint as linter for Go instead of golint

* Update hack/verify-golangci-lint.sh

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* specify golangci-lint version

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* fix echo comment

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* fix echo comment

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* fix lint

* fix gofmt

* fix generated codes

* close the connection for early stopping without using defer

* do not create or delete any resources within gomega.Eventually

* add condition to return error in getKatibJob func

* remove commented-out lines

* simplify some codes

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* remove an unnecessary variable

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* fix golangci-lint error

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-09-23 23:09:37 -07:00
Yuki Iwai b0e91744a8
Change the minimal kustomize version in the developer guide (#1675) 2021-09-23 11:00:37 -07:00
dependabot[bot] a6cf571ea0
Bump axios from 0.18.1 to 0.21.2 in /pkg/ui/v1beta1/frontend (#1665)
Bumps [axios](https://github.com/axios/axios) from 0.18.1 to 0.21.2.
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/master/CHANGELOG.md)
- [Commits](https://github.com/axios/axios/compare/v0.18.1...v0.21.2)

---
updated-dependencies:
- dependency-name: axios
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-16 10:34:55 -07:00
Andrey Velichkevich 0077876cc7
Bump Katib Python SDK to 0.12.0rc1 version (#1660) 2021-09-16 09:45:55 -07:00
Andrey Velichkevich 7f05e58b23
Add Cert Generator to prow include dirs (#1669) 2021-09-16 01:58:55 -07:00
Yuki Iwai 2a11c35116
Reimplement katib-cert-generator in Go (#1662)
* add cert-generator command

* go mod tidy

* fix gofmt lint check

* fix unittest for katib-cert-generator

* remove unnecessary test code

* fix comment

* review: fix kubeClient

* review: stop to use k8s.io/utils

* review: delete containers[].securityContext

* review: change directory name for cert-generator

* review: fix const

Co-authored-by: andreyvelich <andrey.velichkevich@gmail.com>

* review: stop using k8s.io/utils

Co-authored-by: andreyvelich <andrey.velichkevich@gmail.com>

* review: delete containers[].securityContext

* review: change directory name for cert-generator

* review: fix const

Co-authored-by: andreyvelich <andrey.velichkevich@gmail.com>

* review: take webhook domain as consts

* review: keep the name testDescription and err

* review: do not try to patch webhook configuration multiple times

* review: fix some functions to generate cert

* review: add comments

Co-authored-by: andreyvelich <andrey.velichkevich@gmail.com>

* review: remove v1beta1 from admissionReviewVersions in ValidatingWebhookConfiguration and MutatingWebhookConfiguration

* fix comments

* review: remove the securityContext field

Co-authored-by: andreyvelich <andrey.velichkevich@gmail.com>
2021-09-14 19:01:32 -07:00
Andrey Velichkevich aad3a74083
Add Changelog for Katib 0.12.0-rc.1 release (#1661) 2021-09-07 21:37:06 -07:00
Andrey Velichkevich 885f9064f9
Add mockgen version to gen script (#1658)
* Add mockgen version to gen script

* Rename to update-mockgen
2021-09-07 05:09:39 -07:00
Andrey Velichkevich 98e78e98f5
Remove README toc script (#1659) 2021-09-06 19:54:38 -07:00
Jaeyeon Kim 1ca6798c3f
[bugfix]: increase test timeout (#1654) 2021-09-06 04:56:13 -07:00
Kimonas Sotirchos 6d2c8bbae8
UI: Handle missing TrialTemplates (#1652)
* Use YAML input if TrialParams are missing

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Separate TrialTemplates in two words

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-09-06 03:19:13 -07:00
dependabot[bot] 2d87caafb1
Bump tar from 4.4.15 to 4.4.19 in /pkg/new-ui/v1beta1/frontend (#1650)
Bumps [tar](https://github.com/npm/node-tar) from 4.4.15 to 4.4.19.
- [Release notes](https://github.com/npm/node-tar/releases)
- [Changelog](https://github.com/npm/node-tar/blob/main/CHANGELOG.md)
- [Commits](https://github.com/npm/node-tar/compare/v4.4.15...v4.4.19)

---
updated-dependencies:
- dependency-name: tar
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-02 07:36:31 -07:00
Andrey Velichkevich c202017d9f
Update Changelog for Katib 0.12.0-rc.0 release (#1642) 2021-08-31 03:50:03 -07:00
Jaeyeon Kim 7bf1aeba83
SDK: change list apis to return objects as default (#1630)
* SDK: change list apis to return objects as default

- change list_trials, list_experiments to return list of objects as
 a default
- also, give 'in_short' parameter for who wants only name and status
 as before

* [enh]: change return type from List[dict] to List[V1beta1Experiment]

* [enh]: deserialize dict to katib's custom class

* [docs]: refactor KatibClient docs

* change deserialize method location to utils

* remove useless import

* Add objects necessary for deserialization in swagger

* use fakeresponse rather than duplicating codes

* Update sdk/python/v1beta1/kubeflow/katib/utils/utils.py

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update sdk/python/v1beta1/kubeflow/katib/utils/utils.py

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-08-30 22:42:03 -07:00
Andrey Velichkevich 21580f6697
Add missing omitempty parameter to APIs (#1645) 2021-08-27 07:33:06 -07:00
Johnu George b4d5a562cc Merge branch 'master' of github.com:kubeflow/katib 2021-08-27 11:38:42 +05:30
Andrey Velichkevich 3fadef637a
Bump Katib Python SDK to 0.12.0rc0 version (#1640) 2021-08-26 10:43:05 -07:00
Andrey Velichkevich 6e66ae2453
Add Katib Release process guide (#1641)
* Add RELEASE doc

* Finish changelog script

* Update release doc

* Changes after review

* Modify README

* Add SDK PR link

* Modify doc

* Changelog for v0.6.0-rc.0

* Modify changelog script

* Add 0.2 changelog

* Remove TODO

* Add 0.6 release

* Add v0.9.0 release

* Add 0.10.0 release

* Add v0.10.1 release

* Add v0.11.0 release

* Add v0.11.1 release

* Fix release

* Update step
2021-08-26 09:57:05 -07:00
Seongjin Kim fa8718be9d
new-ui: Add devDependency - prettier (#1629)
* new-ui: Add devDependency - prettier

* new-ui: update prettier version on Github Action

* new-ui: Update test-node.yaml

* new-ui: Edit test-node.yaml / Makefile

* new-ui: Edit npm install prettier command on test-node workflow

* new-ui: Apply npm run format:write
2021-08-24 18:37:45 -07:00
Johnu George fe5963f8e3
Reconcile semantics for Suggestion Algorithms (#1633)
* Reuse suggestions

* Fix tests
2021-08-24 09:29:40 -07:00
Johnu George acc67db331 Fix tests 2021-08-24 21:09:40 +05:30
Johnu George ce12a891ba Reuse suggestions 2021-08-24 19:15:11 +05:30
Johnu George eea6ada974
Adding tests in multiple steps (#1634) 2021-08-23 20:31:39 -07:00
Andrey Velichkevich b91a9c86ca
Update Katib UI with Optuna Algorithm Settings (#1626)
* Update Katib UI with Optuna Algorithm Settings

* Fix Optuna tests
2021-08-18 06:18:38 -07:00
Andrey Velichkevich 04021b1509
Add Optuna Suggestion to Katib CI (#1624)
* Add Optuna Suggestion to Katib CI

* Update README
2021-08-16 23:33:42 -07:00
Andrey Velichkevich 2bd9b5e0e4
Add Multivariate TPE to Katib UI (#1625)
* Add Multivariate TPE to Katib UI

* Modify Experiment params
2021-08-16 20:59:41 -07:00
Shotaro Sano 7439a3762f
Add Optuna based suggestion service (#1613)
* Implement Optuna service and cmd

* Update pkg/suggestion/v1beta1/optuna/service.py

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update pkg/suggestion/v1beta1/optuna/service.py

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update pkg/suggestion/v1beta1/optuna/service.py

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update pkg/suggestion/v1beta1/optuna/service.py

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Merge the blocks of self.lock in OptunaService

* Remove Cython installation

* Update Python version for the Optuna suggestion service

* Add the example YAML for multivariate-tpe

* Fix the logic of handling unknown trials

* Use name and value instead of the string representation of assignment

* Turn on constant liar by default

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-08-16 07:10:06 -07:00
Andrey Velichkevich ecb4686007
Modify XGBoostJob example for the new Controller (#1623)
* Modify XGBoostJob example for the new Controller

* Modify port
2021-08-14 20:59:04 -07:00
Andrey Velichkevich 4ef26ef7a9
Modify labels for controller resources (#1621)
* Change labels for controller resources

* Fix Label in test
2021-08-14 12:16:04 -07:00
dependabot[bot] f42c6ccd7b
Bump jszip from 3.5.0 to 3.7.1 in /pkg/new-ui/v1beta1/frontend (#1619)
Bumps [jszip](https://github.com/Stuk/jszip) from 3.5.0 to 3.7.1.
- [Release notes](https://github.com/Stuk/jszip/releases)
- [Changelog](https://github.com/Stuk/jszip/blob/master/CHANGES.md)
- [Commits](https://github.com/Stuk/jszip/compare/v3.5.0...v3.7.1)

---
updated-dependencies:
- dependency-name: jszip
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-14 11:30:04 -07:00
dependabot[bot] f722725724
Bump path-parse from 1.0.6 to 1.0.7 in /pkg/new-ui/v1beta1/frontend (#1618)
Bumps [path-parse](https://github.com/jbgutierrez/path-parse) from 1.0.6 to 1.0.7.
- [Release notes](https://github.com/jbgutierrez/path-parse/releases)
- [Commits](https://github.com/jbgutierrez/path-parse/commits/v1.0.7)

---
updated-dependencies:
- dependency-name: path-parse
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-13 13:04:04 -07:00
dependabot[bot] 9bacb7b6eb
Bump tar from 4.4.13 to 4.4.15 in /pkg/new-ui/v1beta1/frontend (#1606)
Bumps [tar](https://github.com/npm/node-tar) from 4.4.13 to 4.4.15.
- [Release notes](https://github.com/npm/node-tar/releases)
- [Changelog](https://github.com/npm/node-tar/blob/main/CHANGELOG.md)
- [Commits](https://github.com/npm/node-tar/compare/v4.4.13...v4.4.15)

---
updated-dependencies:
- dependency-name: tar
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-13 09:45:54 -07:00
Andrey Velichkevich abbc9c9d34
Fix Metrics Collector error in case of non-existing Process (#1614)
* Skip error in case of empty Process

* Fix print

* Skip other errors

* Set first process as default main
2021-08-12 15:06:53 -07:00
Andrey Velichkevich 8b7a9fecd6
Add AutoML and Training WG Summit July 2021 (#1615) 2021-08-12 10:55:02 -07:00
Andrey Velichkevich 7a661cf9cb
Modify Labels for Katib Components (#1611) 2021-08-11 12:54:01 -07:00
Andrey Velichkevich be2b26d432
Validate possible operations for Grid suggestion (#1205)
* Create common function to test validate algorithm settings
Validate db exhausted for chocolate

* remove parentheses

* Use common util to test Suggestions

* Fix API name

* Fix indexing
2021-08-11 05:47:26 -07:00
Andrey Velichkevich c24d303c37
Change the default image for the new Katib UI (#1608)
* Change default image for Katib UI

* Change title for UI

* Fix image name

* Use katib-ui image for the new UI

* Remove build from CI workflow

* Add cache to Kaniko

* Add cache repo

* Add other cache repo

* Remove cache
2021-08-10 19:22:25 -07:00
Andrey Velichkevich a83828ce48
Upgrade CRDs to apiextensions.k8s.io/v1 (#1610)
* Changes for v1 version

* Add temp schema

* Use Kubebuilder 2.3.0

* Remove count 1
2021-08-10 16:31:25 -07:00
Andrey Velichkevich a3e36b377b
Remove TFJob and PyTorchJob CRDs from unit tests (#1609) 2021-08-09 19:34:28 -07:00
Andrey Velichkevich 6d54d4a920
Add Support for Argo Workflows (#1605)
* Add Support for Argo Workflows

* Few changes in README

* Add Argo to README

* Remove Argo access from Katib manifests

* Remove Tekton access from Katib manifests

* Few changes in README

* Change to Pipelines
2021-08-05 13:30:42 -07:00
Jaeyeon Kim a57745e6e1
[enh]: validate for bayesian optimization algorithm settings (#1600)
* [enh]: validate for skopt algorithm settings

* [style]: refactor with reviews

- use staticmethod rather than classmethod
- change convertAlgorithmSpec method name to a snake_case
- use .format() rather than f-string

Signed-off-by: Jaeyeon Kim <anencore94@gmail.com>
2021-08-03 13:05:41 -07:00
Andrey Velichkevich 287e868023
Add support for XGBoost Operator with LightGBM example (#1603)
* Add support for XGBoost Operator

* Specify Tag for LightGBM image
2021-08-02 16:45:11 -07:00
Abhishek Vilas Munagekar 1b71a7cc8b
fix mysql version in docker image (#1594) 2021-08-02 08:43:38 -07:00
Andrey Velichkevich ddf064a49e
Remove Kubeflow Training dependencies from Katib (#1599)
* Remove Kubeflow Training dependencies from Katib

* Add code-generator to go.mod
2021-08-01 06:43:37 -07:00
Andrey Velichkevich 44875b8250
Update Katib SDK with OpenAPI generator (#1572)
* Use openapi JAR

* Remove test from SDK

* Use k8s>=12.0 version
2021-07-29 08:35:19 -07:00
dependabot[bot] 1ed426d344
Bump ssri from 6.0.1 to 6.0.2 in /pkg/new-ui/v1beta1/frontend (#1526)
Bumps [ssri](https://github.com/npm/ssri) from 6.0.1 to 6.0.2.
- [Release notes](https://github.com/npm/ssri/releases)
- [Changelog](https://github.com/npm/ssri/blob/v6.0.2/CHANGELOG.md)
- [Commits](https://github.com/npm/ssri/compare/v6.0.1...v6.0.2)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-29 08:30:19 -07:00
dependabot[bot] 57f8393fd3
Bump elliptic from 6.5.3 to 6.5.4 in /pkg/new-ui/v1beta1/frontend (#1461)
Bumps [elliptic](https://github.com/indutny/elliptic) from 6.5.3 to 6.5.4.
- [Release notes](https://github.com/indutny/elliptic/releases)
- [Commits](https://github.com/indutny/elliptic/compare/v6.5.3...v6.5.4)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-28 14:25:19 -07:00
dependabot[bot] 828495947b
Bump lodash from 4.17.20 to 4.17.21 in /pkg/new-ui/v1beta1/frontend (#1532)
Bumps [lodash](https://github.com/lodash/lodash) from 4.17.20 to 4.17.21.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.20...4.17.21)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-28 14:23:18 -07:00
dependabot[bot] dc932d39e7
Bump underscore from 1.11.0 to 1.13.1 in /pkg/new-ui/v1beta1/frontend (#1529)
Bumps [underscore](https://github.com/jashkenas/underscore) from 1.11.0 to 1.13.1.
- [Release notes](https://github.com/jashkenas/underscore/releases)
- [Commits](https://github.com/jashkenas/underscore/compare/1.11.0...1.13.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-28 13:38:18 -07:00
dependabot[bot] 5add8f7084
Bump jose from 2.0.2 to 2.0.5 in /pkg/new-ui/v1beta1/frontend (#1518)
Bumps [jose](https://github.com/panva/jose) from 2.0.2 to 2.0.5.
- [Release notes](https://github.com/panva/jose/releases)
- [Changelog](https://github.com/panva/jose/blob/v2.0.5/CHANGELOG.md)
- [Commits](https://github.com/panva/jose/compare/v2.0.2...v2.0.5)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-28 13:35:18 -07:00
dependabot[bot] 4ffd131620
Bump hosted-git-info from 2.8.8 to 2.8.9 in /pkg/new-ui/v1beta1/frontend (#1531)
Bumps [hosted-git-info](https://github.com/npm/hosted-git-info) from 2.8.8 to 2.8.9.
- [Release notes](https://github.com/npm/hosted-git-info/releases)
- [Changelog](https://github.com/npm/hosted-git-info/blob/v2.8.9/CHANGELOG.md)
- [Commits](https://github.com/npm/hosted-git-info/compare/v2.8.8...v2.8.9)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-28 13:26:18 -07:00
dependabot[bot] 4363093cfd
Bump url-parse from 1.4.7 to 1.5.3 in /pkg/new-ui/v1beta1/frontend (#1591)
Bumps [url-parse](https://github.com/unshiftio/url-parse) from 1.4.7 to 1.5.3.
- [Release notes](https://github.com/unshiftio/url-parse/releases)
- [Commits](https://github.com/unshiftio/url-parse/compare/1.4.7...1.5.3)

---
updated-dependencies:
- dependency-name: url-parse
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-28 08:53:45 -07:00
DomFleischmann f1d565ec57
Install charmcraft 1.0.0 (#1593)
Install charmcraft 1.0.0 through pip for now
until we move to pytest-operator tests.
2021-07-28 07:40:45 -07:00
dependabot[bot] 56bc2da56a
Bump y18n from 4.0.0 to 4.0.1 in /pkg/new-ui/v1beta1/frontend (#1506)
Bumps [y18n](https://github.com/yargs/y18n) from 4.0.0 to 4.0.1.
- [Release notes](https://github.com/yargs/y18n/releases)
- [Changelog](https://github.com/yargs/y18n/blob/master/CHANGELOG.md)
- [Commits](https://github.com/yargs/y18n/commits)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-27 10:22:11 -07:00
dependabot[bot] 61e63cdf62
Bump dns-packet from 1.3.1 to 1.3.4 in /pkg/new-ui/v1beta1/frontend (#1543)
Bumps [dns-packet](https://github.com/mafintosh/dns-packet) from 1.3.1 to 1.3.4.
- [Release notes](https://github.com/mafintosh/dns-packet/releases)
- [Changelog](https://github.com/mafintosh/dns-packet/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mafintosh/dns-packet/compare/v1.3.1...v1.3.4)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-27 10:21:12 -07:00
Andrey Velichkevich 575f89f137
Fix grep in Tekton Experiment Doc (#1578) 2021-07-15 07:16:50 -07:00
Andrey Velichkevich 4ca5fb0044
Allow make pre-release tag (#1575)
* Allow make pre-release cut

* Remove else in SDK version
2021-07-12 08:55:26 -07:00
Andrey Velichkevich 3a1f2b923a
Remove Job Level Mutation from Katib Controller (#1573) 2021-07-11 23:15:26 -07:00
Kenneth Koski a90c91b196
Switch to charmcraft snap for CI (#1574)
The snap store is now the preferred place for obtaining charmcraft, as
opposed to pypi.org.
2021-07-08 16:41:23 -07:00
Andrey Velichkevich 3962af9245
Add gRPC build script to CI (#1569)
* Add gRPC build script to CI

* Fix text

* Change to i flag

* Change to -i in health
2021-07-01 22:44:10 -07:00
Andrey Velichkevich 9bf23e57ba
Add Doc checklist to PR template (#1568) 2021-07-01 19:11:10 -07:00
Andrey Velichkevich 61d70089ef
Allow empty resources for CPU and Memory in Katib config (#1564) 2021-06-29 06:36:46 -07:00
Nick Veitch 9b33615b26
fix typo in operators/README (#1557) 2021-06-16 07:53:09 -07:00
Rui Vasconcelos 08d9497b92
Adds docs on how to use within KF (#1556)
See https://github.com/canonical/bundle-kubeflow/issues/371
2021-06-15 03:13:37 -07:00
Kenneth Koski 6ef600c6d4
Switch to sdi (#1555)
* Update operator dependencies

Updates requirements.txt for each operator to use latest version from
pypi.org.

* Switch katib-ui operator to SDI interface

Switches katib-ui operator to the serialized-data-interface library,
which provides a way to declaratively define relationships.
2021-06-14 07:08:37 -07:00
Andrey Velichkevich 4bd11b385f
Disable default PV for Experiment with resume from volume (#1552)
* Remove default PV creation from resume from volume

* Modify error msg

* Remove PV check from Suggestion controller test

* Use go 1.15.13
2021-06-08 23:24:16 -07:00
Andrey Velichkevich efe8f87f03
Fix gofmt to validate Experiment name (#1545) 2021-06-01 23:17:02 -07:00
Jaeyeon Kim e3ccbcf1f7
feat: add naming regex check on validating webhook (#1541)
- check experiments' naming convention on validating webhook
2021-06-01 04:55:04 -07:00
Andrey Velichkevich 778403b2b2
Add the new Katib presentations 2021 (#1539)
* Add new presentations

* Fix name
2021-05-25 18:59:41 -07:00
Andrey Velichkevich 4dbd53440e
Add go mod tidy check to GitHub Actions (#1535)
* Add go mod tidy check to GitHub actions

* Run go mod tidy
2021-05-18 11:10:41 -07:00
Masashi Shibata 40fc342ea6
Support Sobol's Quasirandom Sequence using Goptuna. (#1523) 2021-05-07 08:23:06 -07:00
Andrey Velichkevich c385748deb
Add Katib 2021 ROADMAP (#1524)
* Add 2021 ROADMAP

* Update

Co-authored-by: Masashi Shibata <c-bata@users.noreply.github.com>

Co-authored-by: Masashi Shibata <c-bata@users.noreply.github.com>
2021-05-07 08:20:07 -07:00
Andrey Velichkevich d0d9c8d80e
Remove PV from MySQL component (#1527)
Add startupProbe to mySQL
2021-05-06 18:59:06 -07:00
Andrey Velichkevich 36ec8b339e
Remove IBM install from Katib manifest (#1525) 2021-04-30 10:19:31 -07:00
Himanshu 0767df4bda
Error messages corrected (#1522)
* Error messages corrected

#1516

* ucfirst

The first letter of both error message is changed to uppercase.

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-04-23 07:52:43 -07:00
Masashi Shibata c36ac54b89
Bump the Goptuna version up to v0.8.0 with IPOP-CMA-ES and BIPOP-CMA-ES support. (#1519)
* Upgrade to goptuna v0.8.0

* Add option for restarting CMA-ES
2021-04-23 07:02:43 -07:00
Masashi Shibata 19df714b3b
Fix a link to Kustomize manifest for new Katib UI (#1521) 2021-04-22 11:57:43 -07:00
Максим Грушин 54854c1bb8
Add kustomization overlay: katib-openshift (#1513)
* Add kustomization overlay: katib-standalone-openshift

* Rename OpenShift kustomization and remove unused RBAC resources

* Update kustomization katib-openshift to support changes in #1498

* katib-openshift: move patches to dedicated dir

* katib-openshift: clarify comments

* Update katib-openshift image tags
2021-04-22 06:13:43 -07:00
DavidSpek 520f54e8b5
fix kustomize manifests for kubeflow (#1498)
* fix kustomize manifests for kubeflow

* fix standalone and external-db manifests

* remove old namespace file

* remove PV from kubeflow manifest

* fix katib-external-db reference outside of root

* fix katib-with-kubeflow-cert-manager

* Move image tags to katib-config.yaml and remove patches

* use common namespace kustomization

* Make kubeflow-cert use kubeflow as a base

* Remove katib-cert-generator job from kubeflow-cert-generator manifests

* Move pv-patch to patches folder

* Create katib-cert-manager and make kubeflow use this as base

* Fix release and CI scripts for new layout

* Remove unnecessary cert-generator images from kustomization.yaml

* Remove unnecessary SA, CR and CRB from katib-cert-manager

* Remove commonLabel from katib-with-kubeflow

* Separate cert-generator from webhook kustomization
2021-04-12 19:33:03 -07:00
Andrey Velichkevich 5e3fb22fe2
Move tests from Travis to GitHub actions (#1514)
* Create workflow for Go

* Add GOPATH env

* Move check up

* Add env

* Add go mod download

* Add ls command

* Add path

* Change path for run

* Change GOPATH

* Add kubebuilder

* Download coveralls

* Add node test

* Remove Travis

* Add coveralls step

* Change coveralls use

* Add working dir

* Remove run
2021-04-09 21:06:04 -07:00
Andrey Velichkevich 86884ca2c2
Add cert manager install to the release script (#1511)
* Add cert manager install to release script

* Increase timeout to 50 min for e2e Experiments
2021-04-01 20:25:20 -07:00
Andrey Velichkevich 4e8c0a9462
Fix setup Katib script to work on the release branch (#1508) 2021-04-01 11:26:19 -07:00
Yannis Zarkadas 32cb42282f
manifests: Remove Application CR (#1507)
Remove Application CR as per https://github.com/kubeflow/manifests/issues/1715

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
2021-04-01 11:00:19 -07:00
Yannis Zarkadas c57502956c
Katib manifests fixes for 1.3 (#1502)
* cert-generator: Disable client-side validation

Closes https://github.com/kubeflow/katib/issues/1500

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>

* manifests: Generate valid VirtualService by default

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>

* manifests: Disable sidecar injection for all components

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>

* manifests: Remove erroneous storageClassName from PVC

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>

* manifests: Add katib-with-kubeflow-cert-manager overlay

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
2021-03-31 04:39:18 -07:00
Andrey Velichkevich acec1e2d7b
Fix release script to replace Katib image tags (#1493)
* Fix release script to replace Katib image tags

* Increase restart timeout in e2e
2021-03-21 19:10:17 -07:00
Andrey Velichkevich f83df1b845
Add Katib webhook documentation (#1486)
* Add webhook doc

* Update doc

* Changes after review
2021-03-20 08:54:17 -07:00
Andrey Velichkevich b65e4c34b4
Fix gRPC manager build script (#1492)
* Remove legacy gRPC REST

* Remove gRPC Swagger and Makefile

* Trigger CI
2021-03-19 23:49:17 -07:00
Andrey Velichkevich 5616044116
Add script to update boilerplate (#1491)
* Update boilerplate for clients

* Modify not format boilerplate

* Modify boilerplate script for go files

* Update boilerplate manually for go files

* Modify script

* Generate boilerplate for Go files

* Add script for Python files

* Generate boilerplate for the Py files

* Include shell

* Generate boilerplate for shell scripts

* Add to makefile

* Fix comments

* Change comments
2021-03-19 23:48:17 -07:00
Andrey Velichkevich 372502c366
Add Blog Posts in the README (#1487)
* Add Blog Posts in README

* Rename posts
2021-03-19 23:47:17 -07:00
Kimonas Sotirchos 86c6cfb0e8
Manifests for the new UI (#1476)
* Add small section for the new UI in README

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Run update-readme-toc

Adds a top level table-of-contents entry for the new UI as well.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Rephrase how to launch the new UI

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Add the lost newline before the TOC

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2021-03-19 05:13:16 -07:00
Kimonas Sotirchos e67b714dd5
new-ui: Show the Namespace Selector when app is deployed as standalone (#1483)
* ui(base): Use new commit of the latest common code

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* ui(standalone): Expose namespace dropdown

If the Central Dashboard is not present, then the app will show the
namespace selector and will try to fetch the namespaces. This will allow
the app to work in standalone mode, since the users will be able to
navigate between namespaces even without the central dashboard.

The next steps would be to add authorization checks to the backend to
perform SubjectAccessReviews.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
2021-03-19 05:11:16 -07:00
Andrey Velichkevich 52a9f6d323
Update Trial tag to v1beta1-45c5727 in SDK examples (#1481)
* Update Trial image tag to v1beta1-45c5727 in SDK examples

* Move SDK examples

* Check if Experiment is running after the restart in e2e
2021-03-18 20:08:16 -07:00
Andrey Velichkevich 6591528889
Update Prow with the latest folder structure (#1485) 2021-03-18 18:41:16 -07:00
Andrey Velichkevich 30197f59b4
Update documentation for SDK Katib Client (#1482) 2021-03-18 18:40:16 -07:00
Andrey Velichkevich c0aede670a
Remove legacy controller flags from the developer guide (#1480)
* Remove legacy controller flags from the guide

* Trigger CI
2021-03-18 06:37:16 -07:00
Andrey Velichkevich bda0503ccd
Add new UI image to release scripts (#1478)
* Add new ui to release scripts

* Change repo name
2021-03-17 19:49:15 -07:00
Theofilos Papapanagiotou 57febc4b7c
fix broken links to kubeflow website (#1477) 2021-03-17 08:53:16 -07:00
Andrey Velichkevich 3935da82af
Add release process script (#1473)
* Modify build script

* Modify release script

* Modify release script

* Change registry

* Leave one tag in build and push script

* Remove comments
2021-03-16 19:54:15 -07:00
Andrey Velichkevich 5ed27d703b
Update Trial images to v1beta1-45c5727 (#1470)
* Update Trial examples image tag to v1beta1-45c5727

* Modify test script
2021-03-16 00:22:15 -07:00
Andrey Velichkevich dbd723a446
Remove completed TODOs for Katib config (#1459) 2021-03-16 00:21:15 -07:00
Kimonas Sotirchos 45c57271f1
Add missing form fields to new UI (#1463)
* ui(form): Add Trial Parameters

The form will be showing a dynamic list of trial parameters that the
users will need to configure. This list is determined by the YAML
content.

The JS parses the yaml contents to find the trial parameters.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* ui(form): Refactor the algorithm component

Create a common component for the algorithm settings. We will need this
for Early Stopping, which also has algorithm settings.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* ui(form): Add support for early stopping

Add a distinct section for Early Stopping, when the Search Algorithm is
for Hyper Parameter tuning.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* ui(form): Add resume policy to form

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* ui(form): Small fixes to algo's settings

* The algorithm settings got applied in the form only after the user
  selected a different algorithm. The preselected value would not have
  the list of settings assigned once the form loads.
* Use null everywhere for the `random_state` parameter

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Use correct form group

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: Use LongRunning for resumepolicy default

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* ui(form): Use FormArray variable directly

Using "formGroup.get('trialParameters').controls" in the html results in
an error, during the build process. This is because Angular can't deduce that
the control returned from the get() method is a form array.

We will define a FormArray variable and use it directly instead of
get()ing it from the form group.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-03-12 14:21:47 -08:00
Andrey Velichkevich beccd46556
Update new algorithm service doc (#1460)
* Update new suggestion doc

* Few changes

* Changes after review

* Update manifest links

* Add link to Katib config patch

* Modify sed command
2021-03-12 13:06:46 -08:00
Andrey Velichkevich 070f1cbb77
Refactor Kustomize manifests for Katib (#1464)
* Refactor kustomize manifests

* Remove file

* Modify README

* Disable actions

* Update Trial images tag to v1beta1-c6c9172

* Fix few comments

* Remove test print

* Remove image pull policy

* Remove TODOs

* Exclude PV from Kubeflow install
Add image versions to katib-external-db install

* Rename Katib IBM install

* Create patch file for Katib config

* Fix path for MC

* Fix var name

* Change tag in patch file

* Change download mnist for mxnet example

* Change MNIST to FashionMNIST

* Remove comment from actions
2021-03-12 12:39:47 -08:00
Kenneth Koski badbbdbe39
Update Katib operator and image (#1465)
Updates Docker image to include changes from #1450, and updates
operator to latest version of operator framework.
2021-03-12 10:48:50 -08:00
Kenneth Koski 4179e47af6
Update Katib operator and image (#1465)
Updates Docker image to include changes from #1450, and updates
operator to latest version of operator framework.
2021-03-12 10:47:19 -08:00
Kenneth Koski 985dee6ff3
Update Katib operator and image (#1465)
Updates Docker image to include changes from #1450, and updates
operator to latest version of operator framework.
2021-03-12 10:45:41 -08:00
Kenneth Koski a282f3e019
Update Katib operator and image (#1465)
Updates Docker image to include changes from #1450, and updates
operator to latest version of operator framework.
2021-03-12 10:44:57 -08:00
Andrey Velichkevich c6c91729be
Add Trial images build to the CI (#1457)
* Fix image for MXNET mnist
Rename example folder for PyTorch mnist

* Add build process for Katib Trial template in the CI
Fix problems with current image build

* Change release registry to kubeflowkatib

* Few changes

* Simplify sed for mnist

* Add script to change trial images

* Modify script

* Add if for macos

* Enable PyTorch examples

* Remove test print
2021-03-09 19:02:23 -08:00
Andrey Velichkevich 7b12b131af
Add cert generator build to the CI (#1458)
* Add cert generator build to the CI

* Remove musl from cert generator image

* Remove musl

* Trigger CI
2021-03-09 19:01:23 -08:00
Kenneth Koski 081db0dd7f
Re-enable Github Actions (#1456) 2021-03-08 11:17:23 -08:00
Yannis Zarkadas ac4f525dc8
Katib: Move manifests development upstream (#1432)
* manifests: Move manifests development upstream

As part of the work of wg-manifests for 1.3
(https://github.com/kubeflow/manifests/issues/1735), we are moving manifests
development in upstream repos. This gives the application developers full
ownership of their manifests, tracked in a single place.

This commit copies the manifests for application `Katib`
from path `apps/katib/upstream` of kubeflow/manifests to path
`manifests/v1beta1` of the upstream repo (https://github.com/kubeflow/katib).

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>

* manifests: Fold base, overlays into components

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
2021-03-06 12:23:49 -08:00
Andrey Velichkevich 12e7f1e80e
Disable dynamic creation for admission hooks and update dependencies (#1450)
* Update all dependencies to the latest versions
Add cert generator for the webhooks
Add manifests for the webhooks

* Modify Dockerfile for manager

* Remove comments

* Update Dockerfiles for Go images

* Add signerName: kubernetes.io/kube-apiserver-client to csr
Update roles for controller RBAC
Changes after review

* Fix not installed CRD error

* Update scripts

* Revert operator changes

* Describe controller pod in test

* Add log line to test

* Move kubectl version

* Change csr version to v1beta1

* Remove log

* Change signerName to kubernetes.io/kubelet-serving

* Modify common name

Co-authored-by: Yuki Iwai <68272500+tenzen-y@users.noreply.github.com>

* Add env variable to init container

Co-authored-by: Yuki Iwai <68272500+tenzen-y@users.noreply.github.com>

* Get namespace from env

Co-authored-by: Yuki Iwai <68272500+tenzen-y@users.noreply.github.com>

* Remove quotes

* Remove spaces

* Run cert generator script from the Job

* Modify new ui Dockerfile

* Disable Actions on PR

* Modify setup Katib script

* Fix PODNUM

* Remove imagePullPolicy from PyTorch and TFJob examples

* Disable Pytorch examples in e2e

* Add sleep to e2e test

* Activate Actions

* Disable actions

Co-authored-by: Yuki Iwai <68272500+tenzen-y@users.noreply.github.com>
2021-03-06 09:29:49 -08:00
Kimonas Sotirchos d9b4602280
First iteration of the new Katib UI (#1427)
* Create a folder for the new-ui

We will create a `new-ui` folder under the `pkg` dir to add the new UI.
This will ensure that we won't break any existing functionality.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Initial code for the frontend

This PR introduces the new UI. We hope that this will be the last big PR
in this repo and all of the subsequent ones will be smaller bit-sized
PRs.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* backend: Expose the entire status of an experiment

We want the table in the main UI page to show more information for each
experiment. This information lives in the status of each Experiment CR,
so we expand the API to also return the entire status for each
Experiment.

In the future we will probably need to just send the entire CR to the
frontend and not parse it at all in the backend.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* backend: Return the KFP run uid

We want to return the Pipeline UID for a Trial, if such exists.

When combined with Kale, a Trial initiates a KFP run. In this case,
there is an annotation with the KFP run ID, which we can use to navigate
the user to the KFP UI for the specific run.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* backend: serve an Angular SPA

To serve an SPA the backend must return the index.html for any non-API
route. The index.html must be sent for any request to the app's page.
Then, once the javascript loads, the app will show to the user the
correct view.

In this commit we also completely remove any caching of the index.html,
for the browser to always request the latest version. This eliminates
the need to hard reload the page to view changes to the frontend code.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Dockerfile for the new Katib web app

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Extend the dockerignore for the new UI

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Update the README with build commands

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: use port 8080 instead of 80 in backend

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: use lowercase fields when fetching exps

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Add seconds to the x-axis of Trial info

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Unify the npm run build commands

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Move TypeMeta values to a common place

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Remove section for max_old_space_size in README

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: add katib prefix to docs link

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: Correct link for new UI in README

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: Remove unused 'format' npm script

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Ensure format checks work with Travis

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Remove unused space

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: Use create_experiment route

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: fix travis govet test

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Rename the Bayesian settings

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: Rename the ParametersSpec

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: Remove setting TypeMeta and ObjectMeta

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Update README for build:watch

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: Fix a typo

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: Remove unused css

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Use types from k8s.models file

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Add kfp-run column if UID is present in trials

With Kale a Trial can launch a distinct KF Pipeline. The UID of this
pipeline will be also set as an annotation to the Trial owning the
Pipeline.

In this case the UI should have one extra column for this UID.

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* Properly expose the NAS fields

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Move MetricCollector enums to global enums file

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Remove unused volume enum

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* review: Don't send empty settings

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

* review: Add parameters for TPE

Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-03-02 18:38:48 -08:00
Planck0591 d20e11b18d
add adopter (#1451)
* Update ADOPTERS.md

add adopter

* keep the list in alphabetical order

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-03-02 02:31:47 -08:00
Anna 90a99a8d64
Add katib controller flags to developers guide (#1449)
Signed-off-by: Anna Jung (VMware) <antheaj@vmware.com>
2021-02-26 09:02:16 -08:00
Adarsh fe64c18172
Enhance katib client by adding get_success_trial_details() (#1442)
* Add katibClient method to get success trials

* Remove comment

* Replace hyperparams with hyperparameters

* Add status success check
2021-02-23 06:34:02 -08:00
Andrey Velichkevich 804bed4c6a
Add Katib presentations and community information (#1446)
* Init commit

* Add presentations list
Update README with community info

* Fix

* Fix presentation link
2021-02-22 19:18:02 -08:00
Andrey Velichkevich cb6c835bf2
Verify nil objective in Experiment defaults (#1445) 2021-02-19 17:35:41 -08:00
Andrey Velichkevich 53207648a0
Migrate to Go modules (#1438) 2021-02-18 10:04:52 -08:00
DomFleischmann 5e6a4bae65
Change roles to clusterroles for operators (#1426)
This will change the katib-controller and katib-ui
roles to clusterroles.

Additionally Dominik Fleischmann is being added to
the owners of the katib operators.
2021-02-09 04:10:46 -08:00
Yao Xiao 621973712e
Migrate katib to new test-infra (#1423)
* Migrate katib to new test-infra

* Roll back test worker image

* Roll back all the changes

* Wrap integer

* Remove DDL and TTL configuration and use default

* Migrate to new aws account's configuration

* Clean up comment
2021-01-28 09:13:34 -08:00
Kenneth Koski 26bcfea9f8
Add SVG logo traced from bitmap logo (#1414) 2021-01-25 04:20:54 -08:00
zhang_jf 377d52f6ee
Invalid example url (#1417)
* Invalid example url

* Update pkg/metricscollector/v1beta1/tfevent-metricscollector/tfevent_loader.py

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2021-01-15 15:11:44 -08:00
Andrey Velichkevich 28104e63f7
Fix SDK examples for 0.10 version (#1402) 2021-01-05 17:43:51 -08:00
Kenneth Koski 7901c43225
Add Github Actions CI for charm operators (#1407)
Builds images locally with `latest` tag, and includes new bundle for
testing purposes that sets operator docker images to `latest` as well
instead of a particular revision.
2021-01-05 03:56:01 -08:00
Rui Vasconcelos 55ed4e5b5e
Add Juju install commands to operators README (#1411)
This README is hard to follow for users who are not familiar with `juju`.

Adding initial steps to get started with Juju.
2020-12-14 17:51:48 -08:00
Andrey Velichkevich 4b42a8dacc
Fix indentation in the OWNERS file (#1408) 2020-12-08 21:26:50 -08:00
Andrey Velichkevich 24ef0be33c
Bump Prettier to 2.2.0 for the Katib UI (#1409) 2020-12-08 21:24:50 -08:00
Kenneth Koski 96b81e8f6e
Add Katib Bundle for Juju (#1403)
* Add Katib Bundle for Juju

Adds Python operators for Katib, corresponding to the latest Katib manifests.

Adds an `operators` folder with an OWNERS file to hold the operators.

* Fixing code review items

* Update README.md

Update README.md

* Dedent OWNERS file

* Update README.md

Co-authored-by: Rui Vasconcelos <rui.vasconcelos.mail@gmail.com>
2020-12-03 07:25:00 -08:00
Andrey Velichkevich b2e01e568f
Remove duecredit pkg from the Suggestions (#1406) 2020-12-03 01:49:00 -08:00
Andrey Velichkevich 88204e3003
Fix Early Stopped Trials in Goptuna Suggestion (#1404) 2020-11-30 22:30:49 -08:00
Andrey Velichkevich 91e499637d
Remove v1alpha3 version (#1396)
* Remove v1alpha3 files

* Modify SDK

* Change dict() to object
2020-11-30 06:48:50 -08:00
Andrey Velichkevich 4559e16114
Update docs for Katib 0.10 (#1392)
* Modify doc for 0.10

* Modify workflow doc

* Modify README

* Change travis status to .com
2020-11-27 06:10:48 -08:00
Rui Vasconcelos 4c4a13e239
Adding to ADOPTERS.md (#1401) 2020-11-26 03:56:20 -08:00
Robbert van der Gugten 7104018c91
Feature/waitallprocesses config (#1394)
* Add waitAllProcesses to metricsCollector config

* wait_all_processes config for python metricscollector main

* Correct boolean check

* waitAllProcesses config as bool

* add omitempty to suggestion and earlystopping

* correct default config
2020-11-23 19:57:00 -08:00
Andrey Velichkevich db896d7984
Add recreate strategy to MySQL deployment (#1393) 2020-11-22 18:03:34 -08:00
Andrey Velichkevich 7d9ab7a47b
Move Adopters file (#1391)
* Move Adopters to root

* Trigger CI
2020-11-22 18:01:34 -08:00
Andrey Velichkevich 90f727b209
Add Stale config to close inactivity issues (#1390) 2020-11-22 17:59:33 -08:00
Andrey Velichkevich b9075c2a12
Remove new Trial kind doc (#1388) 2020-11-13 06:54:24 -08:00
Andrey Velichkevich 6a1531bc98
Fix compare step for the early stopping (#1386) 2020-11-11 03:21:48 -08:00
5244 changed files with 88898 additions and 1773906 deletions

@ -1,9 +1,6 @@
.git
.gitignore
docs
examples
!examples/v1alpha3/nas
!examples/v1beta1/nas
manifests
pkg/ui/*/frontend/node_modules
pkg/ui/*/frontend/build

4
.flake8 Normal file
@ -0,0 +1,4 @@
[flake8]
max-line-length = 100
# E203 is ignored to avoid conflicts with Black's formatting, as it's not PEP 8 compliant
extend-ignore = W503, E203
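The reason for ignoring `E203` (whitespace before `:`) and `W503` (line break before a binary operator) is that Black's output triggers both. Below is a minimal illustration. The function is hypothetical, and the layout is what Black is expected to produce at its default line length.

```python
def weighted_window_total(
    values: list[int], lower: int, upper: int, offset: int, weight: int
) -> int:
    # Black puts spaces around ":" when slice bounds are expressions;
    # pycodestyle would flag the space before ":" as E203.
    shifted_window_values = values[lower + offset : upper + offset]
    # Black splits an over-long expression *before* each binary operator;
    # pycodestyle would flag those line breaks as W503.
    return (
        sum(shifted_window_values) * weight
        + len(shifted_window_values) * weight
        - offset * weight * len(shifted_window_values)
    )
```

Without `extend-ignore`, flake8 and Black would disagree on these lines.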

@ -1,25 +0,0 @@
---
name: Bug report
about: Tell us about a problem you are experiencing
---
/kind bug
**What steps did you take and what happened:**
[A clear and concise description of what the bug is.]
**What did you expect to happen:**
**Anything else you would like to add:**
[Miscellaneous information that will assist in solving the issue.]
**Environment:**
- Kubeflow version (`kfctl version`):
- Minikube version (`minikube version`):
- Kubernetes version: (use `kubectl version`):
- OS (e.g. from `/etc/os-release`):

50
.github/ISSUE_TEMPLATE/bug_report.yaml vendored Normal file
@ -0,0 +1,50 @@
name: Bug Report
description: Tell us about a problem you are experiencing with Katib
labels: ["kind/bug", "lifecycle/needs-triage"]
body:
- type: markdown
attributes:
value: |
Thanks for taking the time to fill out this Katib bug report!
- type: textarea
id: problem
attributes:
label: What happened?
description: |
Please provide as much info as possible. Not doing so may result in your bug not being
addressed in a timely manner.
validations:
required: true
- type: textarea
id: expected
attributes:
label: What did you expect to happen?
validations:
required: true
- type: textarea
id: environment
attributes:
label: Environment
value: |
Kubernetes version:
```bash
$ kubectl version
```
Katib controller version:
```bash
$ kubectl get pods -n kubeflow -l katib.kubeflow.org/component=controller -o jsonpath="{.items[*].spec.containers[*].image}"
```
Katib Python SDK version:
```bash
$ pip show kubeflow-katib
```
validations:
required: true
- type: input
id: votes
attributes:
label: Impacted by this bug?
value: Give it a 👍 We prioritize the issues with most 👍

12
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
@ -0,0 +1,12 @@
blank_issues_enabled: true
contact_links:
- name: Katib Documentation
url: https://www.kubeflow.org/docs/components/katib/
about: Much help can be found in the docs
- name: Kubeflow Katib Slack Channel
url: https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels
about: Ask the Katib community on CNCF Slack
- name: Kubeflow Katib Community Meeting
url: https://bit.ly/2PWVCkV
about: Join the Kubeflow AutoML working group meeting

@ -1,14 +0,0 @@
---
name: Feature enhancement request
about: Suggest an idea for this project
---
/kind feature
**Describe the solution you'd like**
[A clear and concise description of what you want to happen.]
**Anything else you would like to add:**
[Miscellaneous information that will assist in solving the issue.]

@ -0,0 +1,28 @@
name: Feature Request
description: Suggest an idea for Katib
labels: ["kind/feature", "lifecycle/needs-triage"]
body:
- type: markdown
attributes:
value: |
Thanks for taking the time to fill out this Katib feature request!
- type: textarea
id: feature
attributes:
label: What would you like to be added?
description: |
A clear and concise description of what you want to add to Katib.
Please consider writing a Katib enhancement proposal if it is a large feature request.
validations:
required: true
- type: textarea
id: rationale
attributes:
label: Why is this needed?
validations:
required: true
- type: input
id: votes
attributes:
label: Love this feature?
value: Give it a 👍 We prioritize the features with most 👍

@ -1,25 +1,14 @@
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://git.k8s.io/community/contributors/guide/pull-requests.md#the-pull-request-submit-process and developer guide https://git.k8s.io/community/contributors/devel/development.md#development-guide
2. If you want *faster* PR reviews, read how: https://git.k8s.io/community/contributors/guide/pull-requests.md#best-practices-for-faster-reviews
3. Follow the instructions for writing a release note: https://git.k8s.io/community/contributors/guide/release-notes.md
4. If the PR is unfinished, see how to mark it: https://git.k8s.io/community/contributors/guide/pull-requests.md#marking-unfinished-pull-requests
5. If this PR changes image versions, please title this PR "Bump <image name> from x.x.x to y.y.y."
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, check our contributor guidelines https://www.kubeflow.org/docs/about/contributing
2. To know more about Katib components, check developer guide https://github.com/kubeflow/katib/blob/master/CONTRIBUTING.md
3. If you want *faster* PR reviews, check how: https://git.k8s.io/community/contributors/guide/pull-requests.md#best-practices-for-faster-reviews
-->
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Which issue(s) this PR fixes** _(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)_:
Fixes #
**Special notes for your reviewer**:
**Checklist:**
1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
**Release note**:
<!-- Write your release note:
1. Enter your extended release note in the below block. If the PR requires additional action from users switching to the new release, include the string "action required".
2. If no release note is required, just write "NONE".
-->
```release-note
```
- [ ] [Docs](https://www.kubeflow.org/docs/components/katib/) included if any changes are user facing

@ -0,0 +1,81 @@
# Reusable workflows for publishing Katib images.
name: Build and Publish Images
on:
workflow_call:
inputs:
component-name:
required: true
type: string
platforms:
required: true
type: string
dockerfile:
required: true
type: string
secrets:
DOCKERHUB_USERNAME:
required: false
DOCKERHUB_TOKEN:
required: false
jobs:
build-and-publish:
name: Build and Publish Images
runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Set Publish Condition
id: publish-condition
shell: bash
run: |
if [[ "${{ github.repository }}" == 'kubeflow/katib' && \
( "${{ github.ref }}" == 'refs/heads/master' || \
"${{ github.ref }}" =~ ^refs/heads/release- || \
"${{ github.ref }}" =~ ^refs/tags/v ) ]]; then
echo "should_publish=true" >> $GITHUB_OUTPUT
else
echo "should_publish=false" >> $GITHUB_OUTPUT
fi
- name: GHCR Login
if: steps.publish-condition.outputs.should_publish == 'true'
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: DockerHub Login
if: steps.publish-condition.outputs.should_publish == 'true'
uses: docker/login-action@v3
with:
registry: docker.io
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Publish Component ${{ inputs.component-name }}
if: steps.publish-condition.outputs.should_publish == 'true'
id: publish
uses: ./.github/workflows/template-publish-image
with:
image: |
ghcr.io/kubeflow/katib/${{ inputs.component-name }}
docker.io/kubeflowkatib/${{ inputs.component-name }}
dockerfile: ${{ inputs.dockerfile }}
platforms: ${{ inputs.platforms }}
push: true
- name: Test Build For Component ${{ inputs.component-name }}
if: steps.publish.outcome == 'skipped'
uses: ./.github/workflows/template-publish-image
with:
image: |
ghcr.io/kubeflow/katib/${{ inputs.component-name }}
docker.io/kubeflowkatib/${{ inputs.component-name }}
dockerfile: ${{ inputs.dockerfile }}
platforms: ${{ inputs.platforms }}
push: false
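The "Set Publish Condition" step pushes images only in one case: the repository is `kubeflow/katib` and the ref is `master`, a `release-*` branch, or a `v*` tag. Otherwise the publish step is skipped (its `outcome` becomes `skipped`), and the `push: false` test build runs instead.

Below is a minimal Python sketch of the same predicate. The function name is hypothetical; `repository` and `ref` stand in for `github.repository` and `github.ref`.

```python
import re


def should_publish(repository: str, ref: str) -> bool:
    # Forks and other repositories never publish.
    if repository != "kubeflow/katib":
        return False
    # Same three alternatives as the bash `[[ ... ]]` test, including its
    # `=~` prefix regexes for release branches and version tags.
    return (
        ref == "refs/heads/master"
        or re.match(r"^refs/heads/release-", ref) is not None
        or re.match(r"^refs/tags/v", ref) is not None
    )
```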

@ -0,0 +1,38 @@
name: E2E Test with darts-cnn-cifar10
on:
pull_request:
paths-ignore:
- "pkg/ui/v1beta1/frontend/**"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e:
runs-on: ubuntu-22.04
timeout-minutes: 120
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Test Env
uses: ./.github/workflows/template-setup-e2e-test
with:
kubernetes-version: ${{ matrix.kubernetes-version }}
python-version: "3.11"
- name: Run e2e test with ${{ matrix.experiments }} experiments
uses: ./.github/workflows/template-e2e-test
with:
experiments: ${{ matrix.experiments }}
# Comma Delimited
trial-images: darts-cnn-cifar10-cpu
strategy:
fail-fast: false
matrix:
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
# Comma Delimited
experiments: ["darts-cpu"]

@ -0,0 +1,38 @@
name: E2E Test with enas-cnn-cifar10
on:
pull_request:
paths-ignore:
- "pkg/ui/v1beta1/frontend/**"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e:
runs-on: ubuntu-22.04
timeout-minutes: 120
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Test Env
uses: ./.github/workflows/template-setup-e2e-test
with:
kubernetes-version: ${{ matrix.kubernetes-version }}
python-version: "3.8"
- name: Run e2e test with ${{ matrix.experiments }} experiments
uses: ./.github/workflows/template-e2e-test
with:
experiments: ${{ matrix.experiments }}
# Comma Delimited
trial-images: enas-cnn-cifar10-cpu
strategy:
fail-fast: false
matrix:
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
# Comma Delimited
experiments: ["enas-cpu"]

@ -0,0 +1,46 @@
name: E2E Test with pytorch-mnist
on:
pull_request:
paths-ignore:
- "pkg/ui/v1beta1/frontend/**"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e:
runs-on: ubuntu-22.04
timeout-minutes: 120
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Test Env
uses: ./.github/workflows/template-setup-e2e-test
with:
kubernetes-version: ${{ matrix.kubernetes-version }}
python-version: "3.10"
- name: Run e2e test with ${{ matrix.experiments }} experiments
uses: ./.github/workflows/template-e2e-test
with:
experiments: ${{ matrix.experiments }}
training-operator: true
# Comma Delimited
trial-images: pytorch-mnist-cpu
strategy:
fail-fast: false
matrix:
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
# Comma Delimited
experiments:
# suggestion-hyperopt
- "long-running-resume,from-volume-resume,median-stop"
# others
- "grid,bayesian-optimization,tpe,multivariate-tpe,cma-es,hyperband"
- "hyperopt-distribution,optuna-distribution"
- "file-metrics-collector,pytorchjob-mnist"
- "median-stop-with-json-format,file-metrics-collector-with-json-format"
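GitHub Actions creates one job per combination of matrix values. For the pytorch-mnist workflow above, that is 3 Kubernetes versions × 5 experiment groups = 15 jobs.

The sketch below expands the matrix the same way. It assumes the e2e script splits each "Comma Delimited" group on `,`, and `expand_matrix` is a hypothetical helper. The job order GitHub uses is not modelled.

```python
from itertools import product

KUBERNETES_VERSIONS = ["v1.29.2", "v1.30.7", "v1.31.3"]
EXPERIMENT_GROUPS = [
    "long-running-resume,from-volume-resume,median-stop",
    "grid,bayesian-optimization,tpe,multivariate-tpe,cma-es,hyperband",
    "hyperopt-distribution,optuna-distribution",
    "file-metrics-collector,pytorchjob-mnist",
    "median-stop-with-json-format,file-metrics-collector-with-json-format",
]


def expand_matrix(versions: list[str], groups: list[str]) -> list[dict]:
    # One job per (version, group) pair; each group lists the experiments
    # that run sequentially inside that job.
    return [
        {"kubernetes-version": version, "experiments": group.split(",")}
        for version, group in product(versions, groups)
    ]


jobs = expand_matrix(KUBERNETES_VERSIONS, EXPERIMENT_GROUPS)
```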

@ -0,0 +1,38 @@
name: E2E Test with simple-pbt
on:
pull_request:
paths-ignore:
- "pkg/ui/v1beta1/frontend/**"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e:
runs-on: ubuntu-22.04
timeout-minutes: 120
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Test Env
uses: ./.github/workflows/template-setup-e2e-test
with:
kubernetes-version: ${{ matrix.kubernetes-version }}
- name: Run e2e test with ${{ matrix.experiments }} experiments
uses: ./.github/workflows/template-e2e-test
with:
experiments: ${{ matrix.experiments }}
# Comma Delimited
trial-images: simple-pbt
strategy:
fail-fast: false
matrix:
# Detail: https://hub.docker.com/r/kindest/node
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
# Comma Delimited
experiments: ["simple-pbt"]

@ -0,0 +1,38 @@
name: E2E Test with tf-mnist-with-summaries
on:
pull_request:
paths-ignore:
- "pkg/ui/v1beta1/frontend/**"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e:
runs-on: ubuntu-22.04
timeout-minutes: 120
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Test Env
uses: ./.github/workflows/template-setup-e2e-test
with:
kubernetes-version: ${{ matrix.kubernetes-version }}
- name: Run e2e test with ${{ matrix.experiments }} experiments
uses: ./.github/workflows/template-e2e-test
with:
experiments: ${{ matrix.experiments }}
training-operator: true
# Comma Delimited
trial-images: tf-mnist-with-summaries
strategy:
fail-fast: false
matrix:
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
# Comma Delimited
experiments: ["tfjob-mnist-with-summaries"]

@ -0,0 +1,40 @@
name: E2E Test with tune API
on:
pull_request:
paths-ignore:
- "pkg/ui/v1beta1/frontend/**"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e:
runs-on: ubuntu-22.04
timeout-minutes: 120
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Test Env
uses: ./.github/workflows/template-setup-e2e-test
with:
kubernetes-version: ${{ matrix.kubernetes-version }}
- name: Install Katib SDK with extra requires
shell: bash
run: |
pip install --prefer-binary -e 'sdk/python/v1beta1[huggingface]'
- name: Run e2e test with tune API
uses: ./.github/workflows/template-e2e-test
with:
tune-api: true
training-operator: true
strategy:
fail-fast: false
matrix:
# Detail: https://hub.docker.com/r/kindest/node
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]

@ -0,0 +1,35 @@
name: E2E Test with Katib UI, random search, and postgres
on:
- pull_request
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e:
runs-on: ubuntu-22.04
timeout-minutes: 120
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Test Env
uses: ./.github/workflows/template-setup-e2e-test
with:
kubernetes-version: ${{ matrix.kubernetes-version }}
- name: Run e2e test with ${{ matrix.experiments }} experiments
uses: ./.github/workflows/template-e2e-test
with:
experiments: random
# Comma Delimited
trial-images: pytorch-mnist-cpu
katib-ui: true
database-type: postgres
strategy:
fail-fast: false
matrix:
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]

@ -0,0 +1,49 @@
name: Free-Up Disk Space
description: Remove Non-Essential Tools And Move Docker Data Directory to /mnt/docker
runs:
using: composite
steps:
# This step is a workaround to avoid the "No space left on device" error.
# ref: https://github.com/actions/runner-images/issues/2840
- name: Remove unnecessary files
shell: bash
run: |
echo "Disk usage before cleanup:"
df -hT
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf /usr/local/share/boost
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/local/share/powershell
sudo rm -rf /usr/share/swift
echo "Disk usage after cleanup:"
df -hT
- name: Prune docker images
shell: bash
run: |
docker image prune -a -f
docker system df
df -hT
- name: Move docker data directory
shell: bash
run: |
echo "Stopping docker service ..."
sudo systemctl stop docker
DOCKER_DEFAULT_ROOT_DIR=/var/lib/docker
DOCKER_ROOT_DIR=/mnt/docker
echo "Moving ${DOCKER_DEFAULT_ROOT_DIR} -> ${DOCKER_ROOT_DIR}"
sudo mv ${DOCKER_DEFAULT_ROOT_DIR} ${DOCKER_ROOT_DIR}
echo "Creating symlink ${DOCKER_DEFAULT_ROOT_DIR} -> ${DOCKER_ROOT_DIR}"
sudo ln -s ${DOCKER_ROOT_DIR} ${DOCKER_DEFAULT_ROOT_DIR}
echo "$(sudo ls -l ${DOCKER_DEFAULT_ROOT_DIR})"
echo "Starting docker service ..."
sudo systemctl daemon-reload
sudo systemctl start docker
echo "Docker service status:"
sudo systemctl --no-pager -l -o short status docker
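The "Move docker data directory" step relocates `/var/lib/docker` with `mv` and then leaves a symlink at the old path, so Docker keeps using its default root while the data lives on `/mnt`.

Below is a minimal Python sketch of that move-and-symlink pattern, rehearsed in a temporary directory. Stopping and restarting the Docker service is deliberately left out. The helper names and the `var-lib-docker`/`mnt-docker` directory names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def relocate_with_symlink(default_root: Path, new_root: Path) -> None:
    # Equivalent of `mv DEFAULT NEW` followed by `ln -s NEW DEFAULT`.
    shutil.move(str(default_root), str(new_root))
    default_root.symlink_to(new_root, target_is_directory=True)


def demo() -> tuple[bool, str]:
    # Use a throwaway directory instead of the real /var/lib/docker.
    with tempfile.TemporaryDirectory() as tmp:
        default_root = Path(tmp) / "var-lib-docker"
        new_root = Path(tmp) / "mnt-docker"
        default_root.mkdir()
        (default_root / "marker").write_text("image layers")
        relocate_with_symlink(default_root, new_root)
        # Reads through the old path now resolve to the new location.
        return default_root.is_symlink(), (default_root / "marker").read_text()
```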

@ -0,0 +1,42 @@
name: Publish AutoML Algorithm Images
on:
push:
pull_request:
paths-ignore:
- "pkg/ui/v1beta1/frontend/**"
jobs:
algorithm:
name: Publish Image
uses: ./.github/workflows/build-and-publish-images.yaml
with:
component-name: ${{ matrix.component-name }}
platforms: linux/amd64,linux/arm64
dockerfile: ${{ matrix.dockerfile }}
secrets:
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
strategy:
fail-fast: false
matrix:
include:
- component-name: suggestion-hyperopt
dockerfile: cmd/suggestion/hyperopt/v1beta1/Dockerfile
- component-name: suggestion-hyperband
dockerfile: cmd/suggestion/hyperband/v1beta1/Dockerfile
- component-name: suggestion-skopt
dockerfile: cmd/suggestion/skopt/v1beta1/Dockerfile
- component-name: suggestion-goptuna
dockerfile: cmd/suggestion/goptuna/v1beta1/Dockerfile
- component-name: suggestion-optuna
dockerfile: cmd/suggestion/optuna/v1beta1/Dockerfile
- component-name: suggestion-pbt
dockerfile: cmd/suggestion/pbt/v1beta1/Dockerfile
- component-name: suggestion-enas
dockerfile: cmd/suggestion/nas/enas/v1beta1/Dockerfile
- component-name: suggestion-darts
dockerfile: cmd/suggestion/nas/darts/v1beta1/Dockerfile
- component-name: earlystopping-medianstop
dockerfile: cmd/earlystopping/medianstop/v1beta1/Dockerfile

@ -0,0 +1,24 @@
name: Publish Katib Conformance Test Images
on:
- push
- pull_request
jobs:
core:
name: Publish Image
uses: ./.github/workflows/build-and-publish-images.yaml
with:
component-name: ${{ matrix.component-name }}
platforms: linux/amd64,linux/arm64
dockerfile: ${{ matrix.dockerfile }}
secrets:
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
strategy:
fail-fast: false
matrix:
include:
- component-name: katib-conformance
dockerfile: Dockerfile.conformance

@ -0,0 +1,32 @@
name: Publish Katib Core Images
on:
- push
- pull_request
jobs:
core:
name: Publish Image
uses: ./.github/workflows/build-and-publish-images.yaml
with:
component-name: ${{ matrix.component-name }}
platforms: linux/amd64,linux/arm64
dockerfile: ${{ matrix.dockerfile }}
secrets:
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
strategy:
fail-fast: false
matrix:
include:
- component-name: katib-controller
dockerfile: cmd/katib-controller/v1beta1/Dockerfile
- component-name: katib-db-manager
dockerfile: cmd/db-manager/v1beta1/Dockerfile
- component-name: katib-ui
dockerfile: cmd/ui/v1beta1/Dockerfile
- component-name: file-metrics-collector
dockerfile: cmd/metricscollector/v1beta1/file-metricscollector/Dockerfile
- component-name: tfevent-metrics-collector
dockerfile: cmd/metricscollector/v1beta1/tfevent-metricscollector/Dockerfile

@ -0,0 +1,48 @@
name: Publish Trial Images
on:
push:
pull_request:
paths-ignore:
- "pkg/ui/v1beta1/frontend/**"
jobs:
trial:
name: Publish Image
uses: ./.github/workflows/build-and-publish-images.yaml
with:
component-name: ${{ matrix.trial-name }}
platforms: ${{ matrix.platforms }}
dockerfile: ${{ matrix.dockerfile }}
secrets:
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
strategy:
fail-fast: false
matrix:
include:
- trial-name: pytorch-mnist-cpu
platforms: linux/amd64,linux/arm64
dockerfile: examples/v1beta1/trial-images/pytorch-mnist/Dockerfile.cpu
- trial-name: pytorch-mnist-gpu
platforms: linux/amd64
dockerfile: examples/v1beta1/trial-images/pytorch-mnist/Dockerfile.gpu
- trial-name: tf-mnist-with-summaries
platforms: linux/amd64,linux/arm64
dockerfile: examples/v1beta1/trial-images/tf-mnist-with-summaries/Dockerfile
- trial-name: enas-cnn-cifar10-gpu
platforms: linux/amd64
dockerfile: examples/v1beta1/trial-images/enas-cnn-cifar10/Dockerfile.gpu
- trial-name: enas-cnn-cifar10-cpu
platforms: linux/amd64,linux/arm64
dockerfile: examples/v1beta1/trial-images/enas-cnn-cifar10/Dockerfile.cpu
- trial-name: darts-cnn-cifar10-cpu
platforms: linux/amd64,linux/arm64
dockerfile: examples/v1beta1/trial-images/darts-cnn-cifar10/Dockerfile.cpu
- trial-name: darts-cnn-cifar10-gpu
platforms: linux/amd64
dockerfile: examples/v1beta1/trial-images/darts-cnn-cifar10/Dockerfile.gpu
- trial-name: simple-pbt
platforms: linux/amd64,linux/arm64
dockerfile: examples/v1beta1/trial-images/simple-pbt/Dockerfile

42
.github/workflows/stale.yaml vendored Normal file
@ -0,0 +1,42 @@
# This workflow warns and then closes issues and PRs that have had no activity for a specified amount of time.
#
# You can adjust the behavior by modifying this file.
# For more information, see:
# https://github.com/actions/stale
name: Mark stale issues and pull requests
on:
schedule:
- cron: "0 */5 * * *"
jobs:
stale:
runs-on: ubuntu-22.04
permissions:
issues: write
pull-requests: write
steps:
- uses: actions/stale@v5
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
days-before-stale: 90
days-before-close: 20
stale-issue-message: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
close-issue-message: >
This issue has been automatically closed because it has not had recent
activity. Please comment "/reopen" to reopen it.
stale-issue-label: lifecycle/stale
exempt-issue-labels: lifecycle/frozen
stale-pr-message: >
This pull request has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
close-pr-message: >
This pull request has been automatically closed because it has not had recent
activity. Please comment "/reopen" to reopen it.
stale-pr-label: lifecycle/stale
exempt-pr-labels: lifecycle/frozen
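With `days-before-stale: 90` and `days-before-close: 20`, an item that sees no activity is labelled `lifecycle/stale` 90 days after its last update. It is closed 20 days after that. The cron schedule `0 */5 * * *` fires at minute 0 of hours 0, 5, 10, 15 and 20 (UTC).

Below is a minimal Python sketch of that timeline. It assumes no activity in between (activity removes the stale label) and ignores the delay until the next scheduled run. The function names are hypothetical.

```python
from datetime import datetime, timedelta

DAYS_BEFORE_STALE = 90
DAYS_BEFORE_CLOSE = 20


def stale_timeline(last_activity: datetime) -> tuple[datetime, datetime]:
    # The close countdown starts when the stale label is applied.
    marked_stale = last_activity + timedelta(days=DAYS_BEFORE_STALE)
    closed = marked_stale + timedelta(days=DAYS_BEFORE_CLOSE)
    return marked_stale, closed


def cron_hours(step: int) -> list[int]:
    # Hours matched by a `*/step` hour field in a cron expression.
    return list(range(0, 24, step))
```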

@ -0,0 +1,49 @@
# Composite action for e2e tests.
name: Run E2E Test
description: Run e2e test using the minikube cluster
inputs:
experiments:
required: false
description: comma-delimited experiment names
default: ""
training-operator:
required: false
description: whether to deploy training-operator or not
default: false
trial-images:
required: false
description: comma-delimited trial image names
default: ""
katib-ui:
required: true
description: whether to deploy katib-ui or not
default: false
database-type:
required: false
description: mysql or postgres
default: mysql
tune-api:
required: true
description: whether to execute tune-api test or not
default: false
runs:
using: composite
steps:
- name: Setup Minikube Cluster
shell: bash
run: ./test/e2e/v1beta1/scripts/gh-actions/setup-minikube.sh ${{ inputs.katib-ui }} ${{ inputs.tune-api }} ${{ inputs.trial-images }} ${{ inputs.experiments }}
- name: Setup Katib
shell: bash
run: ./test/e2e/v1beta1/scripts/gh-actions/setup-katib.sh ${{ inputs.katib-ui }} ${{ inputs.training-operator }} ${{ inputs.database-type }}
- name: Run E2E Experiment
shell: bash
run: |
if "${{ inputs.tune-api }}"; then
./test/e2e/v1beta1/scripts/gh-actions/run-e2e-tune-api.sh
else
./test/e2e/v1beta1/scripts/gh-actions/run-e2e-experiment.sh ${{ inputs.experiments }}
fi

@ -0,0 +1,62 @@
# Composite action for publishing Katib images.
name: Build And Publish Container Images
description: Build MultiPlatform Supporting Container Images
inputs:
image:
required: true
description: image tag
dockerfile:
required: true
description: path for dockerfile
platforms:
required: true
description: linux/amd64 or linux/amd64,linux/arm64
push:
required: true
description: whether to push container images or not
runs:
using: composite
steps:
# This step is a workaround to avoid the "No space left on device" error.
# ref: https://github.com/actions/runner-images/issues/2840
- name: Remove unnecessary files
shell: bash
run: |
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/local/share/powershell
sudo rm -rf /usr/share/swift
echo "Disk usage after cleanup:"
df -h
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set Up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Add Docker Tags
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ inputs.image }}
tags: |
type=raw,latest
type=sha,prefix=v1beta1-
- name: Build and Push
uses: docker/build-push-action@v5
with:
context: .
file: ${{ inputs.dockerfile }}
push: ${{ inputs.push }}
tags: ${{ steps.meta.outputs.tags }}
cache-from: type=gha
cache-to: type=gha,mode=max,ignore-error=true
platforms: ${{ inputs.platforms }}
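For every image listed in `inputs.image`, the "Add Docker Tags" step is expected to produce two tags: `latest`, and `v1beta1-` followed by the short commit SHA.

Below is a minimal Python sketch of that tag list. It assumes `type=raw,latest` yields a plain `latest` tag and that `type=sha` uses docker/metadata-action's default 7-character short SHA. `image_tags` is a hypothetical helper, and the SHA in the usage is made up.

```python
def image_tags(images: list[str], commit_sha: str) -> list[str]:
    # `type=sha,prefix=v1beta1-` with the default short format (assumed 7 chars).
    short_sha = commit_sha[:7]
    tags = []
    for image in images:
        tags.append(f"{image}:latest")  # from `type=raw,latest`
        tags.append(f"{image}:v1beta1-{short_sha}")
    return tags


tags = image_tags(
    ["ghcr.io/kubeflow/katib/katib-ui", "docker.io/kubeflowkatib/katib-ui"],
    "0123456789abcdef0123456789abcdef01234567",
)
```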

@ -0,0 +1,48 @@
# Composite action to setup e2e tests.
name: Setup E2E Test
description: setup env for e2e test using the minikube cluster
inputs:
kubernetes-version:
required: true
description: kubernetes version
python-version:
required: false
description: Python version
# Latest supported version
default: "3.10"
runs:
using: composite
steps:
# This step is a Workaround to avoid the "No space left on device" error.
# ref: https://github.com/actions/runner-images/issues/2840
- name: Free-Up Disk Space
uses: ./.github/workflows/free-up-disk-space
- name: Setup kubectl
uses: azure/setup-kubectl@v4
with:
version: ${{ inputs.kubernetes-version }}
- name: Setup Minikube Cluster
uses: medyagh/setup-minikube@v0.0.18
with:
network-plugin: cni
cni: flannel
driver: none
kubernetes-version: ${{ inputs.kubernetes-version }}
minikube-version: 1.34.0
start-args: --wait-timeout=120s
- name: Setup Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: ${{ inputs.python-version }}
- name: Install Katib SDK
shell: bash
run: pip install --prefer-binary -e sdk/python/v1beta1

79
.github/workflows/test-go.yaml vendored Normal file
@ -0,0 +1,79 @@
name: Go Test
on:
pull_request:
paths-ignore:
- "pkg/ui/v1beta1/frontend/**"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
generatetests:
name: Generate And Format Test
runs-on: ubuntu-22.04
env:
GOPATH: ${{ github.workspace }}/go
defaults:
run:
working-directory: ${{ env.GOPATH }}/src/github.com/kubeflow/katib
steps:
- name: Check out code
uses: actions/checkout@v4
with:
path: ${{ env.GOPATH }}/src/github.com/kubeflow/katib
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version-file: ${{ env.GOPATH }}/src/github.com/kubeflow/katib/go.mod
cache-dependency-path: ${{ env.GOPATH }}/src/github.com/kubeflow/katib/go.sum
- name: Check Go Modules, Generated Go/Python codes, and Format
run: make check
unittests:
name: Unit Test
runs-on: ubuntu-22.04
env:
GOPATH: ${{ github.workspace }}/go
defaults:
run:
working-directory: ${{ env.GOPATH }}/src/github.com/kubeflow/katib
steps:
- name: Check out code
uses: actions/checkout@v4
with:
path: ${{ env.GOPATH }}/src/github.com/kubeflow/katib
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version-file: ${{ env.GOPATH }}/src/github.com/kubeflow/katib/go.mod
cache-dependency-path: ${{ env.GOPATH }}/src/github.com/kubeflow/katib/go.sum
- name: Run Go test
run: go mod download && make test ENVTEST_K8S_VERSION=${{ matrix.kubernetes-version }}
- name: Coveralls report
uses: shogo82148/actions-goveralls@v1
with:
path-to-profile: coverage.out
working-directory: ${{ env.GOPATH }}/src/github.com/kubeflow/katib
parallel: true
strategy:
fail-fast: false
matrix:
# Detail: `setup-envtest list`
kubernetes-version: ["1.29.3", "1.30.0", "1.31.0"]
# notifies that all test jobs are finished.
finish:
needs: unittests
runs-on: ubuntu-22.04
steps:
- uses: shogo82148/actions-goveralls@v1
with:
parallel-finished: true

30
.github/workflows/test-lint.yaml vendored Executable file
@ -0,0 +1,30 @@
name: Lint Files
on:
pull_request:
paths-ignore:
- "pkg/ui/v1beta1/frontend/**"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
lint:
name: Lint
runs-on: ubuntu-22.04
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: 3.9
- name: Check shell scripts
run: make shellcheck
- name: Run pre-commit
uses: pre-commit/action@v3.0.1

101
.github/workflows/test-node.yaml vendored Normal file
@ -0,0 +1,101 @@
name: Frontend Test
on:
pull_request:
paths:
- pkg/ui/v1beta1/frontend/**
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
test:
name: Code format and lint
runs-on: ubuntu-22.04
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version: 16.20.2
- name: Format katib code
run: |
npm install prettier --prefix ./pkg/ui/v1beta1/frontend
make prettier-check
- name: Lint katib code
run: |
cd pkg/ui/v1beta1/frontend
npm run lint-check
frontend-unit-tests:
name: Frontend Unit Tests
runs-on: ubuntu-22.04
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version: 16.20.2
- name: Fetch Kubeflow and install common code dependencies
run: |
COMMIT=$(cat pkg/ui/v1beta1/frontend/COMMIT)
cd /tmp && git clone https://github.com/kubeflow/kubeflow.git
cd kubeflow
git checkout $COMMIT
cd components/crud-web-apps/common/frontend/kubeflow-common-lib
npm i
npm run build
npm link ./dist/kubeflow
- name: Install KWA dependencies
run: |
cd pkg/ui/v1beta1/frontend
npm i
npm link kubeflow
- name: Run unit tests
run: |
cd pkg/ui/v1beta1/frontend
npm run test:prod
frontend-ui-tests:
name: UI tests with Cypress
runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup node version to 16
uses: actions/setup-node@v4
with:
node-version: 16
- name: Fetch Kubeflow and install common code dependencies
run: |
COMMIT=$(cat pkg/ui/v1beta1/frontend/COMMIT)
cd /tmp && git clone https://github.com/kubeflow/kubeflow.git
cd kubeflow
git checkout $COMMIT
cd components/crud-web-apps/common/frontend/kubeflow-common-lib
npm i
npm run build
npm link ./dist/kubeflow
- name: Install KWA dependencies
run: |
cd pkg/ui/v1beta1/frontend
npm i
npm link kubeflow
- name: Serve UI & run Cypress tests in Chrome and Firefox
run: |
cd pkg/ui/v1beta1/frontend
npm run start & npx wait-on http://localhost:4200
npm run ui-test-ci-all

47
.github/workflows/test-python.yaml vendored Normal file
@ -0,0 +1,47 @@
name: Python Test
on:
pull_request:
paths-ignore:
- "pkg/ui/v1beta1/frontend/**"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
test:
name: Test
runs-on: ubuntu-22.04
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: 3.11
- name: Run Python test
run: make pytest
# The skopt service doesn't work correctly with Python 3.11.
# So, we need to run the test with Python 3.9.
# TODO (tenzen-y): Once we stop supporting skopt, we can remove this test.
# REF: https://github.com/kubeflow/katib/issues/2280
test-skopt:
name: Test Skopt
runs-on: ubuntu-22.04
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: 3.9
- name: Run Python test
run: make pytest-skopt

13
.gitignore vendored
@ -4,6 +4,12 @@ __pycache__/
.coverage
.pytest_cache
*.egg-info
build/
*.charm
test/unit/v1beta1/metricscollector/testdata
# SDK generator JAR file
hack/gen-python-sdk/openapi-generator-cli.jar
# Project specific ignore files
*.swp
@ -16,6 +22,7 @@ bin
*.dll
*.so
*.dylib
pkg/metricscollector/v1beta1/file-metricscollector/testdata
## Test binary, build with `go test -c`
*.test
@ -68,3 +75,9 @@ $RECYCLE.BIN/
/katib-controller
/katib-db-manager
/katib-ui
## Vendor dir
vendor
# Jupyter Notebooks.
**/.ipynb_checkpoints

38
.pre-commit-config.yaml Normal file
@ -0,0 +1,38 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-yaml
        args: [--allow-multiple-documents]
      - id: check-json
  - repo: https://github.com/pycqa/isort
    rev: 5.11.5
    hooks:
      - id: isort
        name: isort
        entry: isort --profile black
  - repo: https://github.com/psf/black
    rev: 24.2.0
    hooks:
      - id: black
        files: (sdk|examples|pkg)/.*
  - repo: https://github.com/pycqa/flake8
    rev: 7.1.1
    hooks:
      - id: flake8
        files: (sdk|examples|pkg)/.*
exclude: |
  (?x)^(
    .*zz_generated.deepcopy.*|
    .*pb.go|
    pkg/apis/manager/.*pb2(?:_grpc)?.py(?:i)?|
    pkg/apis/v1beta1/openapi_generated.go|
    pkg/mock/.*|
    pkg/client/controller/.*|
    sdk/python/v1beta1/kubeflow/katib/configuration.py|
    sdk/python/v1beta1/kubeflow/katib/rest.py|
    sdk/python/v1beta1/kubeflow/katib/__init__.py|
    sdk/python/v1beta1/kubeflow/katib/exceptions.py|
    sdk/python/v1beta1/kubeflow/katib/api_client.py|
    sdk/python/v1beta1/kubeflow/katib/models/.*
  )$
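To check which paths the `exclude` pattern skips, the following Python sketch compiles it with the standard `re` module. It assumes pre-commit matches the pattern with `re.search` against repository-relative paths; the leading `(?x)` enables verbose mode, so the line breaks and indentation of the YAML block are insignificant. The helper name `is_excluded` is hypothetical.

```python
import re

# The `exclude` value from .pre-commit-config.yaml, copied verbatim.
# (?x) turns on verbose mode: whitespace and newlines in the pattern are ignored.
# Note that the unescaped dots match any character, not only a literal ".".
EXCLUDE = r"""(?x)^(
    .*zz_generated.deepcopy.*|
    .*pb.go|
    pkg/apis/manager/.*pb2(?:_grpc)?.py(?:i)?|
    pkg/apis/v1beta1/openapi_generated.go|
    pkg/mock/.*|
    pkg/client/controller/.*|
    sdk/python/v1beta1/kubeflow/katib/configuration.py|
    sdk/python/v1beta1/kubeflow/katib/rest.py|
    sdk/python/v1beta1/kubeflow/katib/__init__.py|
    sdk/python/v1beta1/kubeflow/katib/exceptions.py|
    sdk/python/v1beta1/kubeflow/katib/api_client.py|
    sdk/python/v1beta1/kubeflow/katib/models/.*
)$"""

EXCLUDE_RE = re.compile(EXCLUDE)


def is_excluded(path: str) -> bool:
    """Return True if the hooks would skip `path` (assumes re.search semantics)."""
    return EXCLUDE_RE.search(path) is not None


if __name__ == "__main__":
    for p in [
        "pkg/apis/manager/v1beta1/python/api_pb2.py",
        "sdk/python/v1beta1/kubeflow/katib/models/v1beta1_experiment.py",
        "sdk/python/v1beta1/kubeflow/katib/api/katib_client.py",
    ]:
        print(f"{p}: {'excluded' if is_excluded(p) else 'checked'}")
```

Because the pattern is anchored with `^(` and `)$`, each alternative has to cover the whole path, which is why the directory entries end in `.*`.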

@ -1,27 +0,0 @@
jobs:
  include:
    - name: "Go unit tests, gofmt, golint and coveralls"
      language: go
      go: "1.14.2"
      go_import_path: github.com/kubeflow/katib
      install:
        - curl -L -O "https://github.com/kubernetes-sigs/kubebuilder/releases/download/v1.0.7/kubebuilder_1.0.7_linux_amd64.tar.gz"
        - # extract the archive
        - tar -zxvf kubebuilder_1.0.7_linux_amd64.tar.gz
        - sudo mv kubebuilder_1.0.7_linux_amd64 /usr/local/kubebuilder
        - export PATH=$PATH:/usr/local/kubebuilder/bin
        # get coveralls.io support
        - go get github.com/mattn/goveralls
      script:
        - make check
        - make test
      after_success:
        - goveralls -coverprofile=coverage.out
    - name: "Prettier frontend check"
      language: node_js
      node_js: "12.18.1"
      install:
        - npm install --global prettier@1.19.1
      script:
        - make prettier-check-v1alpha3
        - make prettier-check

ADOPTERS.md Normal file

@ -0,0 +1,20 @@
# Adopters of Kubeflow Katib
Below are the adopters of the Katib project. If you are using Katib,
please add yourself to the following list via a pull request.
Please keep the list in alphabetical order.
| Organization | Contact | Description of Use |
|--------------------------------------------------|------------------------------------------------------|----------------------------------------------------------------------|
| [Akuity](https://akuity.io/) | [@terrytangyuan](https://github.com/terrytangyuan) | |
| [Ant Group](https://www.antgroup.com/) | [@ohmystack](https://github.com/ohmystack) | Automatic training in Ant Group internal AI Platform |
| [babylon health](https://www.babylonhealth.com/) | [@jeremievallee](https://github.com/jeremievallee) | Hyperparameter tuning for AIR internal AI Platform |
| [caicloud](https://caicloud.io/) | [@gaocegege](https://github.com/gaocegege) | Hyperparameter tuning in Caicloud Cloud-Native AI Platform |
| [canonical](https://ubuntu.com/) | [@RFMVasconcelos](https://github.com/rfmvasconcelos) | Hyperparameter tuning for customer projects in Defense and Fintech |
| [CERN](https://home.cern/) | [@d-gol](https://github.com/d-gol) | Hyperparameter tuning within the ML platform on private cloud |
| [cisco](https://cisco.com/) | [@ramdootp](https://github.com/ramdootp) | Hyperparameter tuning for conversational AI interface using Rasa |
| [cubonacci](https://www.cubonacci.com) | [@janvdvegt](https://github.com/janvdvegt) | Hyperparameter tuning within the Cubonacci machine learning platform |
| [CyberAgent](https://www.cyberagent.co.jp/en/) | [@tenzen-y](https://github.com/tenzen-y) | Experiment in CyberAgent internal ML Platform on Private Cloud |
| [fuzhi](http://www.fuzhi.ai/) | [@planck0591](https://github.com/planck0591) | Experiment and Trial in autoML Platform |
| [karrot](https://uk.karrotmarket.com/) | [@muik](https://github.com/muik) | Hyperparameter tuning in Karrot ML Platform |
| [PITS Global Data Recovery Services](https://www.pitsdatarecovery.net/) | [@pheianox](https://github.com/pheianox) | CyberAgent and ML Platform |

File diff suppressed because it is too large.

CITATION.cff Normal file

@ -0,0 +1,43 @@
cff-version: 1.2.0
message: "If you use Katib in your scientific publication, please cite it as below."
authors:
  - family-names: "George"
    given-names: "Johnu"
  - family-names: "Gao"
    given-names: "Ce"
  - family-names: "Liu"
    given-names: "Richard"
  - family-names: "Liu"
    given-names: "Hou Gang"
  - family-names: "Tang"
    given-names: "Yuan"
  - family-names: "Pydipaty"
    given-names: "Ramdoot"
  - family-names: "Saha"
    given-names: "Amit Kumar"
title: "Katib"
type: software
repository-code: "https://github.com/kubeflow/katib"
preferred-citation:
  type: misc
  title: "A Scalable and Cloud-Native Hyperparameter Tuning System"
  authors:
    - family-names: "George"
      given-names: "Johnu"
    - family-names: "Gao"
      given-names: "Ce"
    - family-names: "Liu"
      given-names: "Richard"
    - family-names: "Liu"
      given-names: "Hou Gang"
    - family-names: "Tang"
      given-names: "Yuan"
    - family-names: "Pydipaty"
      given-names: "Ramdoot"
    - family-names: "Saha"
      given-names: "Amit Kumar"
  year: 2020
  url: "https://arxiv.org/abs/2006.02085"
  identifiers:
    - type: "other"
      value: "arXiv:2006.02085"

CONTRIBUTING.md Normal file

@ -0,0 +1,167 @@
# Developer Guide
This developer guide is for people who want to contribute to the Katib project.
If you're interested in using Katib in your machine learning project,
see the following guides:
- [Getting started with Katib](https://kubeflow.org/docs/components/katib/hyperparameter/).
- [How to configure Katib Experiment](https://kubeflow.org/docs/components/katib/experiment/).
- [Katib architecture and concepts](https://www.kubeflow.org/docs/components/katib/reference/architecture/)
for hyperparameter tuning and neural architecture search.
## Requirements
- [Go](https://golang.org/) (1.22 or later)
- [Docker](https://docs.docker.com/) (24.0 or later)
- [Docker Buildx](https://docs.docker.com/build/buildx/) (0.8.0 or later)
- [Java](https://docs.oracle.com/javase/8/docs/technotes/guides/install/install_overview.html) (8 or later)
- [Python](https://www.python.org/) (3.11 or later)
- [kustomize](https://kustomize.io/) (4.0.5 or later)
- [pre-commit](https://pre-commit.com/)
## Build from source code
**Note:** to build multi-arch images, Docker Desktop should have the
[containerd image store enabled](https://docs.docker.com/desktop/containerd/#enable-the-containerd-image-store).
Build the source code as follows:
```bash
make build REGISTRY=<image-registry> TAG=<image-tag>
```
If you are using an Apple Silicon machine and encounter the "rosetta error: bss_size overflow," go to Docker Desktop -> General and uncheck "Use Rosetta for x86_64/amd64 emulation on Apple Silicon."
To use your custom images for the Katib components, modify the
[Kustomization file](https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/installs/katib-standalone/kustomization.yaml)
and the [Katib Config](https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/installs/katib-standalone/katib-config.yaml).
You can deploy Katib v1beta1 manifests into a Kubernetes cluster as follows:
```bash
make deploy
```
You can undeploy Katib v1beta1 manifests from a Kubernetes cluster as follows:
```bash
make undeploy
```
## Technical and style guide
The following guidelines apply primarily to Katib,
but other projects like [Training Operator](https://github.com/kubeflow/training-operator) might also adhere to them.
## Go Development
When coding:
- Follow [effective go](https://go.dev/doc/effective_go) guidelines.
- Run [`make check`](https://github.com/kubeflow/katib/blob/46173463027e4fd2e604e25d7075b2b31a702049/Makefile#L31) locally
to verify that your changes follow best practices before submitting PRs.
Testing:
- Use [`cmp.Diff`](https://pkg.go.dev/github.com/google/go-cmp/cmp#Diff) instead of `reflect.DeepEqual`, since it reports a readable diff when a comparison fails.
- Define test cases as maps instead of slices to avoid dependencies on the running order.
The map key should be the test case name.
## Modify controller APIs
If you want to modify Katib controller APIs, you have to regenerate the
deepcopy, clientset, listers, informers, OpenAPI, and Python SDK code for the changed APIs.
You can update the necessary files as follows:
```bash
make generate
```
## Controller Flags
Below is a list of command-line flags accepted by the Katib controller:
| Name | Type | Default | Description |
| ------------ | ------ | ------- | -------------------------------------------------------------------------------------------------------------------------------- |
| katib-config | string | "" | The katib-controller will load its initial configuration from this file. Omit this flag to use the default configuration values. |
## DB Manager Flags
Below is a list of command-line flags accepted by the Katib DB Manager:
| Name | Type | Default | Description |
| --------------- | ------------- | -------------| ------------------------------------------------------------------- |
| connect-timeout | time.Duration | 60s | Timeout before returning an error when connecting to the database |
| listen-address | string | 0.0.0.0:6789 | The network interface or IP address to receive incoming connections |
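The `connect-timeout` flag bounds how long the DB Manager keeps trying to reach the database before giving up. The DB Manager itself is written in Go; the Python sketch below only illustrates the assumed retry-until-deadline semantics. The `connect` callable, the 5-second retry interval, and the injected `clock`/`sleep` functions are hypothetical stand-ins, not Katib's actual code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_timeout(
    connect: Callable[[], T],
    timeout: float = 60.0,  # mirrors the documented 60s default of connect-timeout
    interval: float = 5.0,  # hypothetical pause between attempts
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `connect` until it succeeds; raise TimeoutError once `timeout` elapses."""
    deadline = clock() + timeout
    last_error: Optional[Exception] = None
    while True:
        try:
            return connect()
        except Exception as err:  # a real client would catch its driver's error type
            last_error = err
        # Give up if the next attempt would start after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"could not connect within {timeout}s") from last_error
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the function testable without real waiting; in production the defaults use the monotonic clock, which is unaffected by wall-clock adjustments.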
## Katib admission webhooks
Katib uses three [Kubernetes admission webhooks](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/).
1. `validator.experiment.katib.kubeflow.org` -
[Validating admission webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook)
to validate the Katib Experiment before it is created.
1. `defaulter.experiment.katib.kubeflow.org` -
[Mutating admission webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook)
to set the [default values](../pkg/apis/controller/experiments/v1beta1/experiment_defaults.go)
in the Katib Experiment before it is created.
1. `mutator.pod.katib.kubeflow.org` - Mutating admission webhook to inject the metrics
collector sidecar container into the training pod. Learn more about Katib's
metrics collector in the
[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/).
You can find the YAMLs for the Katib webhooks
[here](../manifests/v1beta1/components/webhook/webhooks.yaml).
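The defaulting webhook only fills in fields the user left out; it never overrides explicit values. As a rough illustration, the Python sketch below applies defaults to an Experiment spec represented as a plain dict. The specific values (`parallelTrialCount: 3`, `resumePolicy: Never`, a `StdOut` metrics collector) are assumptions about what `experiment_defaults.go` sets and should be checked against that file; the real webhook operates on Go structs, not dicts.

```python
import copy
from typing import Any, Dict

# Assumed defaults; verify against
# pkg/apis/controller/experiments/v1beta1/experiment_defaults.go.
ASSUMED_DEFAULTS: Dict[str, Any] = {
    "parallelTrialCount": 3,
    "resumePolicy": "Never",
    "metricsCollectorSpec": {"collector": {"kind": "StdOut"}},
}


def apply_defaults(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `spec` with missing top-level fields filled in.

    Fields the user set explicitly are never overwritten, mirroring how a
    mutating webhook should only fill gaps.
    """
    result = copy.deepcopy(spec)
    for key, value in ASSUMED_DEFAULTS.items():
        # deepcopy so callers cannot mutate the shared default objects.
        result.setdefault(key, copy.deepcopy(value))
    return result
```

Returning a copy rather than mutating the input keeps the sketch side-effect free, which makes it easy to compare the spec before and after defaulting.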
**Note:** If you are using a private Kubernetes cluster, you have to allow traffic
on `TCP:8443` by adding a firewall rule, and you have to update the control
plane CIDR source range to use the Katib webhooks.
### Katib cert generator
Katib Controller has an internal `cert-generator` to generate certificates for the webhooks.
Once Katib is deployed in the Kubernetes cluster, the `cert-generator` follows these steps:
- Generate the self-signed certificate and private key.
- Update a Kubernetes Secret with the self-signed TLS certificate and private key.
- Patch the webhooks with the `CABundle`.
Once the `cert-generator` has finished, the Katib controller starts registering controllers such as `experiment-controller` with the manager.
You can find the `cert-generator` source code [here](../pkg/certgenerator/v1beta1).
NOTE: Katib also supports [cert-manager](https://cert-manager.io/) for generating the admission webhook certificates instead of the `cert-generator`.
You can find the cert-manager-based installation [here](../manifests/v1beta1/installs/katib-cert-manager).
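The third `cert-generator` step, patching the webhooks with the `CABundle`, amounts to writing the base64-encoded PEM certificate into each webhook's `clientConfig.caBundle` field. The Python sketch below builds such an update as an RFC 6902 JSON Patch. The function name, the placeholder PEM bytes, and the use of JSON Patch (rather than however the Go `cert-generator` actually applies the update) are assumptions for illustration.

```python
import base64
import json
from typing import Dict, List


def ca_bundle_patch(ca_pem: bytes, webhook_count: int) -> List[Dict[str, str]]:
    """Build a JSON Patch that sets clientConfig.caBundle on every webhook entry."""
    # caBundle is a []byte field in the Kubernetes API, so it is base64 in JSON.
    encoded = base64.b64encode(ca_pem).decode("ascii")
    # "add" on an existing object member replaces it (RFC 6902), so this works
    # whether or not a caBundle is already present.
    return [
        {"op": "add", "path": f"/webhooks/{i}/clientConfig/caBundle", "value": encoded}
        for i in range(webhook_count)
    ]


if __name__ == "__main__":
    placeholder_pem = b"-----BEGIN CERTIFICATE-----\n(placeholder)\n-----END CERTIFICATE-----\n"
    # For example, a ValidatingWebhookConfiguration holding a single webhook.
    print(json.dumps(ca_bundle_patch(placeholder_pem, webhook_count=1), indent=2))
```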
## Implement a new algorithm and use it in Katib
Please see [new-algorithm-service.md](./new-algorithm-service.md).
## Katib UI documentation
Please see [Katib UI README](../pkg/ui/v1beta1).
## Design proposals
Please see [proposals](./proposals).
## Code Style
### pre-commit
Make sure to install [pre-commit](https://pre-commit.com/) (`pip install
pre-commit`) and run `pre-commit install` from the root of the repository at
least once before creating git commits.
The pre-commit [hooks](../.pre-commit-config.yaml) ensure code quality and
consistency. They are executed in CI. PRs that fail to comply with the hooks
will not be able to pass the corresponding CI gate. The hooks are only executed
against staged files unless you run `pre-commit run --all-files`, in which case
they'll be executed against every file in the repository.
Specific programmatically generated files listed in the `exclude` field in
[.pre-commit-config.yaml](../.pre-commit-config.yaml) are deliberately excluded
from the hooks.

Dockerfile.conformance Normal file

@ -0,0 +1,32 @@
# Copyright 2023 The Kubeflow Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Dockerfile for building the source code of conformance tests
FROM python:3.10-slim
WORKDIR /kubeflow/katib
COPY sdk/ /kubeflow/katib/sdk/
COPY examples/ /kubeflow/katib/examples/
COPY test/ /kubeflow/katib/test/
COPY pkg/ /kubeflow/katib/pkg/
COPY conformance/run.sh .
# Add test script.
RUN chmod +x run.sh
RUN pip install --prefer-binary -e sdk/python/v1beta1
ENTRYPOINT [ "./run.sh" ]

Gopkg.lock generated

File diff suppressed because it is too large.

@ -1,129 +0,0 @@
required = [
"github.com/emicklei/go-restful",
"github.com/onsi/ginkgo", # for test framework
"github.com/onsi/gomega", # for test matchers
"k8s.io/client-go/plugin/pkg/client/auth/gcp", # for development against gcp
"k8s.io/code-generator/cmd/deepcopy-gen", # for deepcopy generation
"k8s.io/code-generator/cmd/openapi-gen", # for openapi generation
"sigs.k8s.io/controller-tools/cmd/controller-gen", # for crd/rbac generation
"sigs.k8s.io/controller-runtime/pkg/client/config",
"sigs.k8s.io/controller-runtime/pkg/controller",
"sigs.k8s.io/controller-runtime/pkg/handler",
"sigs.k8s.io/controller-runtime/pkg/manager",
"sigs.k8s.io/controller-runtime/pkg/runtime/signals",
"sigs.k8s.io/controller-runtime/pkg/source",
"sigs.k8s.io/testing_frameworks/integration", # for integration testing
"k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1",
"github.com/grpc-ecosystem/grpc-gateway/protoc-gen-grpc-gateway",
"github.com/grpc-ecosystem/grpc-gateway/protoc-gen-swagger",
"github.com/golang/protobuf/protoc-gen-go",
]
[prune]
go-tests = true
unused-packages = true
non-go = true
[[constraint]]
name = "github.com/go-sql-driver/mysql"
version = "1.4.0"
[[constraint]]
name = "github.com/golang/mock"
version = "1.4.3"
[[override]]
name = "github.com/google/go-containerregistry"
# HEAD as of 2019-03-20
revision = "8d4083db9aa0d2fae6588c1acdbe6a1f5db461e3"
[[constraint]]
name = "github.com/golang/protobuf"
version = "1.2.0"
[[constraint]]
name = "github.com/spf13/viper"
version = "1.2.0"
[[constraint]]
name = "golang.org/x/net"
revision="640f4622ab692b87c2f3a94265e6f579fe38263d"
[[constraint]]
name = "gopkg.in/DATA-DOG/go-sqlmock.v1"
version = "1.3.0"
[[constraint]]
name = "sigs.k8s.io/controller-runtime"
version = "0.1.9"
[[override]]
name = "k8s.io/api"
version = "kubernetes-1.12.9"
[[override]]
name = "k8s.io/apimachinery"
version = "kubernetes-1.12.9"
[[override]]
name = "k8s.io/code-generator"
version = "kubernetes-1.12.9"
[[override]]
name = "k8s.io/client-go"
version = "kubernetes-1.12.9"
[[override]]
name = "k8s.io/apiextensions-apiserver"
version = "kubernetes-1.12.9"
[[override]]
name = "k8s.io/kubernetes"
version = "v1.13.3"
[[override]]
name = "gopkg.in/fsnotify.v1"
source = "https://github.com/fsnotify/fsnotify.git"
version="v1.4.7"
[[constraint]]
name = "github.com/kubeflow/tf-operator"
branch = "v0.7-branch"
[[constraint]]
name = "github.com/kubeflow/pytorch-operator"
branch = "v0.7-branch"
[[constraint]]
name = "github.com/awalterschulze/gographviz"
branch = "master"
[[constraint]]
name = "github.com/c-bata/goptuna"
version = "v0.5.1"
[[prune.project]]
name = "github.com/kubeflow/katib"
unused-packages = false
non-go = false
[[prune.project]]
name = "k8s.io/code-generator"
unused-packages = false
non-go = false
[[prune.project]]
name = "k8s.io/gengo"
unused-packages = false
[[constraint]]
name = "github.com/grpc-ecosystem/go-grpc-middleware"
version = "1.2.0"
[[constraint]]
name = "github.com/tidwall/gjson"
version = "1.6.0"
[[constraint]]
name = "github.com/shirou/gopsutil"
version = "2.20.7"

Makefile Normal file → Executable file

@ -1,75 +1,193 @@
HAS_DEP := $(shell command -v dep;)
HAS_LINT := $(shell command -v golint;)
HAS_LINT := $(shell command -v golangci-lint;)
HAS_YAMLLINT := $(shell command -v yamllint;)
HAS_SHELLCHECK := $(shell command -v shellcheck;)
HAS_SETUP_ENVTEST := $(shell command -v setup-envtest;)
HAS_MOCKGEN := $(shell command -v mockgen;)
COMMIT := v1beta1-$(shell git rev-parse --short=7 HEAD)
KATIB_REGISTRY := ghcr.io/kubeflow/katib
CPU_ARCH ?= linux/amd64,linux/arm64
ENVTEST_K8S_VERSION ?= 1.31
MOCKGEN_VERSION ?= $(shell grep 'go.uber.org/mock' go.mod | cut -d ' ' -f 2)
GO_VERSION=$(shell grep '^go' go.mod | cut -d ' ' -f 2)
GOPATH ?= $(shell go env GOPATH)
TEST_TENSORFLOW_EVENT_FILE_PATH ?= $(CURDIR)/test/unit/v1beta1/metricscollector/testdata/tfevent-metricscollector/logs
# Run tests
.PHONY: test
test:
go test ./pkg/... ./cmd/... -coverprofile coverage.out
test: envtest
KUBEBUILDER_ASSETS="$(shell setup-envtest use $(ENVTEST_K8S_VERSION) -p path)" go test ./pkg/... ./cmd/... -coverprofile coverage.out
depend:
ifndef HAS_DEP
curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh
envtest:
ifndef HAS_SETUP_ENVTEST
go install sigs.k8s.io/controller-runtime/tools/setup-envtest@release-0.19
$(info "setup-envtest has been installed")
endif
dep ensure -v
$(info "setup-envtest has already installed")
check: depend generate fmt vet lint
check: generated-codes go-mod fmt vet lint
fmt:
hack/verify-gofmt.sh
lint:
ifndef HAS_LINT
go get -u golang.org/x/lint/golint
echo "installing golint"
go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.64.7
$(info "golangci-lint has been installed")
endif
hack/verify-golint.sh
hack/verify-golangci-lint.sh
yamllint:
ifndef HAS_YAMLLINT
pip install --prefer-binary yamllint
$(info "yamllint has been installed")
endif
hack/verify-yamllint.sh
vet:
go vet ./pkg/... ./cmd/...
shellcheck:
ifndef HAS_SHELLCHECK
bash hack/install-shellcheck.sh
$(info "shellcheck has been installed")
endif
hack/verify-shellcheck.sh
update:
hack/update-gofmt.sh
# Deploy Katib v1alpha3 manifests into a k8s cluster
deployv1alpha3:
bash scripts/v1alpha3/deploy.sh
# Deploy Katib v1beta1 manifests into a k8s cluster
# Deploy Katib v1beta1 manifests using Kustomize into a k8s cluster.
deploy:
bash scripts/v1beta1/deploy.sh
bash scripts/v1beta1/deploy.sh $(WITH_DATABASE_TYPE)
# Undeploy Katib v1alpha3 manifests from a k8s cluster
undeployv1alpha3:
bash scripts/v1alpha3/undeploy.sh
# Undeploy Katib v1beta1 manifests from a k8s cluster
# Undeploy Katib v1beta1 manifests using Kustomize from a k8s cluster
undeploy:
bash scripts/v1beta1/undeploy.sh
# Generate deepcopy, clientset, listers, informers, open-api and python SDK for APIs.
generated-codes: generate
ifneq ($(shell bash hack/verify-generated-codes.sh '.'; echo $$?),0)
$(error 'Please run "make generate" to generate codes')
endif
go-mod: sync-go-mod
ifneq ($(shell bash hack/verify-generated-codes.sh 'go.*'; echo $$?),0)
$(error 'Please run "go mod tidy -go $(GO_VERSION)" to sync Go modules')
endif
sync-go-mod:
go mod tidy -go $(GO_VERSION)
.PHONY: go-mod-download
go-mod-download:
go mod download
CONTROLLER_GEN = $(shell pwd)/bin/controller-gen
.PHONY: controller-gen
controller-gen:
@GOBIN=$(shell pwd)/bin GO111MODULE=on go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.16.5
# Run this if you update any existing controller APIs.
generate:
ifndef GOPATH
$(error GOPATH not defined, please define GOPATH. Run "go help gopath" to learn more about GOPATH)
# 1. Generate deepcopy, clientset, listers, informers for the APIs (hack/update-codegen.sh)
# 2. Generate open-api for the APIs (hack/update-openapigen)
# 3. Generate Python SDK for Katib (hack/gen-python-sdk/gen-sdk.sh)
# 4. Generate gRPC manager APIs (pkg/apis/manager/v1beta1/build.sh and pkg/apis/manager/health/build.sh)
# 5. Generate Go mock codes
generate: go-mod-download controller-gen
ifndef HAS_MOCKGEN
go install go.uber.org/mock/mockgen@$(MOCKGEN_VERSION)
$(info "mockgen has been installed")
endif
go generate ./pkg/... ./cmd/...
hack/gen-python-sdk/gen-sdk.sh
hack/update-proto.sh
hack/update-mockgen.sh
# Build images for Katib v1alpha3 components
buildv1alpha3: depend generate
bash scripts/v1alpha3/build.sh
# Build images for Katib v1beta1 components
build: depend generate
ifeq ($(and $(REGISTRY),$(TAG)),)
$(error REGISTRY and TAG must be set. Usage make build REGISTRY=<registry> TAG=<TAG>)
# Build images for the Katib v1beta1 components.
build: generate
ifeq ($(and $(REGISTRY),$(TAG),$(CPU_ARCH)),)
$(error REGISTRY and TAG must be set. Usage: make build REGISTRY=<registry> TAG=<tag> CPU_ARCH=<cpu-architecture>)
endif
bash scripts/v1beta1/build.sh -r $(REGISTRY) -t $(TAG)
bash scripts/v1beta1/build.sh $(REGISTRY) $(TAG) $(CPU_ARCH)
# Prettier UI format check for Katib v1alpha3
prettier-check-v1alpha3:
npm run format:check --prefix pkg/ui/v1alpha3/frontend
# Build and push Katib images from the latest master commit.
push-latest: generate
bash scripts/v1beta1/build.sh $(KATIB_REGISTRY) latest $(CPU_ARCH)
bash scripts/v1beta1/build.sh $(KATIB_REGISTRY) $(COMMIT) $(CPU_ARCH)
bash scripts/v1beta1/push.sh $(KATIB_REGISTRY) latest
bash scripts/v1beta1/push.sh $(KATIB_REGISTRY) $(COMMIT)
# Prettier UI format check for Katib v1beta1
# Build and push Katib images for the given tag.
push-tag:
ifeq ($(TAG),)
$(error TAG must be set. Usage: make push-tag TAG=<release-tag>)
endif
bash scripts/v1beta1/build.sh $(KATIB_REGISTRY) $(TAG) $(CPU_ARCH)
bash scripts/v1beta1/push.sh $(KATIB_REGISTRY) $(TAG)
# Release a new version of Katib.
release:
ifeq ($(and $(BRANCH),$(TAG)),)
$(error BRANCH and TAG must be set. Usage: make release BRANCH=<branch> TAG=<tag>)
endif
bash scripts/v1beta1/release.sh $(BRANCH) $(TAG)
# Update all Katib images.
update-images:
ifeq ($(and $(OLD_PREFIX),$(NEW_PREFIX),$(TAG)),)
$(error OLD_PREFIX, NEW_PREFIX, and TAG must be set. \
Usage: make update-images OLD_PREFIX=<old-prefix> NEW_PREFIX=<new-prefix> TAG=<tag> \
For more information, check this file: scripts/v1beta1/update-images.sh)
endif
bash scripts/v1beta1/update-images.sh $(OLD_PREFIX) $(NEW_PREFIX) $(TAG)
# Prettier UI format check for Katib v1beta1.
prettier-check:
npm run format:check --prefix pkg/ui/v1beta1/frontend
# Update boilerplate for the source code.
update-boilerplate:
./hack/boilerplate/update-boilerplate.sh
prepare-pytest:
pip install --prefer-binary -r test/unit/v1beta1/requirements.txt
pip install --prefer-binary -r cmd/suggestion/hyperopt/v1beta1/requirements.txt
pip install --prefer-binary -r cmd/suggestion/optuna/v1beta1/requirements.txt
pip install --prefer-binary -r cmd/suggestion/hyperband/v1beta1/requirements.txt
pip install --prefer-binary -r cmd/suggestion/nas/enas/v1beta1/requirements.txt
pip install --prefer-binary -r cmd/suggestion/nas/darts/v1beta1/requirements.txt
pip install --prefer-binary -r cmd/suggestion/pbt/v1beta1/requirements.txt
pip install --prefer-binary -r cmd/earlystopping/medianstop/v1beta1/requirements.txt
pip install --prefer-binary -r cmd/metricscollector/v1beta1/tfevent-metricscollector/requirements.txt
# `TypeIs` was introduced in typing-extensions 4.10.0, and torch 2.6.0 requires typing-extensions>=4.10.0.
# REF: https://github.com/kubeflow/katib/pull/2504
# TODO (tenzen-y): Once we upgrade libraries depended on typing-extensions==4.5.0, we can remove this line.
pip install typing-extensions==4.10.0
prepare-pytest-testdata:
ifeq ("$(wildcard $(TEST_TENSORFLOW_EVENT_FILE_PATH))", "")
python examples/v1beta1/trial-images/tf-mnist-with-summaries/mnist.py --epochs 5 --batch-size 200 --log-path $(TEST_TENSORFLOW_EVENT_FILE_PATH)
endif
# TODO(Electronic-Waste): Remove the import rewrite when protobuf supports `python_package` option.
# REF: https://github.com/protocolbuffers/protobuf/issues/7061
pytest: prepare-pytest prepare-pytest-testdata
pytest ./test/unit/v1beta1/suggestion --ignore=./test/unit/v1beta1/suggestion/test_skopt_service.py
pytest ./test/unit/v1beta1/earlystopping
pytest ./test/unit/v1beta1/metricscollector
cp ./pkg/apis/manager/v1beta1/python/api_pb2.py ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2.py
cp ./pkg/apis/manager/v1beta1/python/api_pb2_grpc.py ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2_grpc.py
sed -i "s/api_pb2/kubeflow\.katib\.katib_api_pb2/g" ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2_grpc.py
pytest ./sdk/python/v1beta1/kubeflow/katib
rm ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2.py ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2_grpc.py
# The skopt service doesn't work appropriately with Python 3.11.
# So, we need to run the test with Python 3.9.
# TODO (tenzen-y): Once we stop to support skopt, we can remove this test.
# REF: https://github.com/kubeflow/katib/issues/2280
pytest-skopt:
pip install six
pip install --prefer-binary -r test/unit/v1beta1/requirements.txt
pip install --prefer-binary -r cmd/suggestion/skopt/v1beta1/requirements.txt
pytest ./test/unit/v1beta1/suggestion/test_skopt_service.py
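The `pytest` target works around protobuf's missing `python_package` option by copying the generated gRPC modules into the SDK and rewriting the bare `api_pb2` import with GNU `sed -i`. BSD `sed` (for example on macOS) expects a backup-suffix argument after `-i`, so the same command can fail there. The Python sketch below performs the equivalent substitution portably; the function names are hypothetical.

```python
import re
from pathlib import Path

# Same substitution as: sed -i "s/api_pb2/kubeflow\.katib\.katib_api_pb2/g" <file>
_OLD = re.compile(r"api_pb2")
_NEW = "kubeflow.katib.katib_api_pb2"


def rewrite_imports(source: str) -> str:
    """Replace every `api_pb2` occurrence with the SDK-qualified module path.

    The generated alias `api__pb2` (double underscore) does not match, so
    references such as `api__pb2.Experiment` are left alone.
    """
    return _OLD.sub(_NEW, source)


def rewrite_file(path: Path) -> None:
    """Rewrite `path` in place, like `sed -i`.

    Not idempotent: the replacement itself contains `api_pb2`, so run it
    exactly once per fresh copy, as the Makefile does after `cp`.
    """
    path.write_text(rewrite_imports(path.read_text()))
```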

OWNERS

@ -1,8 +1,10 @@
approvers:
- gaocegege
- hougangliu
- johnugeorge
- andreyvelich
- andreyvelich
- gaocegege
- johnugeorge
reviewers:
- sperlingxx
- c-bata
- anencore94
- c-bata
- Electronic-Waste
emeritus_approvers:
- tenzen-y

@ -1,3 +1,3 @@
version: "1"
version: "3"
domain: kubeflow.org
repo: github.com/kubeflow/katib

README.md

@ -1,439 +1,213 @@
# Kubeflow Katib
[![Build Status](https://github.com/kubeflow/katib/actions/workflows/test-go.yaml/badge.svg?branch=master)](https://github.com/kubeflow/katib/actions/workflows/test-go.yaml?branch=master)
[![Coverage Status](https://coveralls.io/repos/github/kubeflow/katib/badge.svg?branch=master)](https://coveralls.io/github/kubeflow/katib?branch=master)
[![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/katib)](https://goreportcard.com/report/github.com/kubeflow/katib)
[![Releases](https://img.shields.io/github/release-pre/kubeflow/katib.svg?sort=semver)](https://github.com/kubeflow/katib/releases)
[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/9941/badge)](https://www.bestpractices.dev/projects/9941)
<h1 align="center">
<img src="./docs/images/Katib_Logo.png" alt="logo" width="200">
<img src="./docs/images/logo-title.png" alt="logo" width="200">
<br>
</h1>
[![Build Status](https://travis-ci.org/kubeflow/katib.svg?branch=master)](https://travis-ci.org/kubeflow/katib)
[![Coverage Status](https://coveralls.io/repos/github/kubeflow/katib/badge.svg?branch=master)](https://coveralls.io/github/kubeflow/katib?branch=master)
[![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/katib)](https://goreportcard.com/report/github.com/kubeflow/katib)
Kubeflow Katib is a Kubernetes-native project for automated machine learning (AutoML).
Katib supports
[Hyperparameter Tuning](https://en.wikipedia.org/wiki/Hyperparameter_optimization),
[Early Stopping](https://en.wikipedia.org/wiki/Early_stopping) and
[Neural Architecture Search](https://en.wikipedia.org/wiki/Neural_architecture_search).
Katib is a Kubernetes-based system for [Hyperparameter Tuning][1] and [Neural Architecture Search][2]. Katib supports a number of ML frameworks, including TensorFlow, Apache MXNet, PyTorch, XGBoost, and others.
Katib is agnostic to machine learning (ML) frameworks.
It can tune hyperparameters of applications written in any language of the
user's choice and natively supports many ML frameworks, such as
[TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [XGBoost](https://xgboost.readthedocs.io/en/latest/), and others.
Table of Contents
=================
* [Getting Started](#getting-started)
* [Name](#name)
* [Concepts in Katib](#concepts-in-katib)
* [Experiment](#experiment)
* [Suggestion](#suggestion)
* [Trial](#trial)
* [Worker Job](#worker-job)
* [Components in Katib](#components-in-katib)
* [Web UI](#web-ui)
* [API documentation](#api-documentation)
* [Installation](#installation)
* [TF operator](#tf-operator)
* [PyTorch operator](#pytorch-operator)
* [Katib](#katib)
* [Running examples](#running-examples)
* [Cleanups](#cleanups)
* [Katib SDK](#katib-sdk)
* [Quick Start](#quick-start)
* [Who are using Katib?](#who-are-using-katib)
* [Citation](#citation)
* [CONTRIBUTING](#contributing)
Created by [gh-md-toc](https://github.com/ekalinin/github-markdown-toc)
## Getting Started
See the [getting-started
guide](https://www.kubeflow.org/docs/components/hyperparameter-tuning/hyperparameter/)
on the Kubeflow website.
## Name
Katib can perform training jobs using any Kubernetes
[Custom Resources](https://www.kubeflow.org/docs/components/katib/trial-template/)
with out-of-the-box support for [Kubeflow Training Operator](https://github.com/kubeflow/training-operator),
[Argo Workflows](https://github.com/argoproj/argo-workflows), [Tekton Pipelines](https://github.com/tektoncd/pipeline)
and many more.
Katib stands for `secretary` in Arabic.
## Concepts in Katib
## Search Algorithms
For a detailed description of the concepts in Katib, hyperparameter tuning, and
neural architecture search, see the [Kubeflow
documentation](https://www.kubeflow.org/docs/components/hyperparameter-tuning/overview/).
Katib supports several search algorithms. See the
[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/user-guides/hp-tuning/configure-algorithm/#hp-tuning-algorithms)
to learn more about each algorithm, and check
[this guide](https://www.kubeflow.org/docs/components/katib/user-guides/hp-tuning/configure-algorithm/#use-custom-algorithm-in-katib)
to implement your custom algorithm.
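As a concrete picture of the simplest of these algorithms, grid search enumerates every combination of discretized parameter values, producing one trial per combination. The Python sketch below is a toy illustration, not Katib's grid suggestion service; the `search_space` layout (a mapping from parameter name to a list of candidate values) and the `float_grid` helper are hypothetical simplifications of the Experiment's `parameters` field.

```python
import itertools
from typing import Any, Dict, Iterator, List


def float_grid(low: float, high: float, step: float) -> List[float]:
    """Discretize [low, high] with a fixed step, including `high` if it lies on the grid."""
    # The small epsilon absorbs floating-point error such as 1.9999999999999998.
    count = int((high - low) / step + 1e-9) + 1
    return [round(low + i * step, 10) for i in range(count)]


def grid_search(search_space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one parameter assignment per point of the Cartesian product."""
    names = list(search_space)
    for values in itertools.product(*(search_space[n] for n in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    space = {
        "lr": float_grid(0.01, 0.03, 0.01),
        "optimizer": ["sgd", "adam"],
    }
    for assignment in grid_search(space):
        print(assignment)  # 3 learning rates x 2 optimizers = 6 assignments
```

The number of trials grows multiplicatively with each added parameter, which is why the sampling-based algorithms in the table below are usually preferred for larger search spaces.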
Katib has the concepts of Experiment, Trial, Job and Suggestion.
<table>
<tbody>
<tr align="center">
<td>
<b>Hyperparameter Tuning</b>
</td>
<td>
<b>Neural Architecture Search</b>
</td>
<td>
<b>Early Stopping</b>
</td>
</tr>
<tr align="center">
<td>
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#random-search">Random Search</a>
</td>
<td>
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#neural-architecture-search-based-on-enas">ENAS</a>
</td>
<td>
<a href="https://www.kubeflow.org/docs/components/katib/early-stopping/#median-stopping-rule">Median Stop</a>
</td>
</tr>
<tr align="center">
<td>
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#grid-search">Grid Search</a>
</td>
<td>
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#differentiable-architecture-search-darts">DARTS</a>
</td>
<td>
</td>
</tr>
<tr align="center">
<td>
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#bayesian-optimization">Bayesian Optimization</a>
</td>
<td>
</td>
<td>
</td>
</tr>
<tr align="center">
<td>
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#tree-of-parzen-estimators-tpe">TPE</a>
</td>
<td>
</td>
<td>
</td>
</tr>
<tr align="center">
<td>
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#multivariate-tpe">Multivariate TPE</a>
</td>
<td>
</td>
<td>
</td>
</tr>
<tr align="center">
<td>
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#covariance-matrix-adaptation-evolution-strategy-cma-es">CMA-ES</a>
</td>
<td>
</td>
<td>
</td>
</tr>
<tr align="center">
<td>
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#sobols-quasirandom-sequence">Sobol's Quasirandom Sequence</a>
</td>
<td>
</td>
<td>
</td>
</tr>
<tr align="center">
<td>
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#hyperband">HyperBand</a>
</td>
<td>
</td>
<td>
</td>
</tr>
<tr align="center">
<td>
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#pbt">Population Based Training</a>
</td>
<td>
</td>
<td>
</td>
</tr>
</tbody>
</table>
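The Median Stop entry refers to the median stopping rule. Below is a minimal sketch for a maximized objective. It follows the rule as the Google Vizier paper describes it: a running trial is stopped at step s if its best value so far is worse than the median of the running averages that completed trials reported up to step s. The function names are hypothetical, and Katib's early stopping service may add further settings, such as a minimum number of completed trials.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of a non-empty slice without modifying it.
func median(values []float64) float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	n := len(sorted)
	if n%2 == 1 {
		return sorted[n/2]
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2
}

// runningAverage returns the mean of the first step values reported by a trial.
func runningAverage(history []float64, step int) float64 {
	sum := 0.0
	for _, v := range history[:step] {
		sum += v
	}
	return sum / float64(step)
}

// shouldStop applies the median stopping rule to a maximized objective: the
// current trial is stopped at step if its best value so far is lower than the
// median of the running averages of completed trials up to the same step.
func shouldStop(current []float64, completed [][]float64, step int) bool {
	if step < 1 || len(current) < step {
		return false // not enough reports from the current trial
	}
	var averages []float64
	for _, h := range completed {
		if len(h) >= step {
			averages = append(averages, runningAverage(h, step))
		}
	}
	if len(averages) == 0 {
		return false // no completed trial has reached this step yet
	}
	best := current[0]
	for _, v := range current[:step] {
		if v > best {
			best = v
		}
	}
	return best < median(averages)
}

func main() {
	completed := [][]float64{
		{0.50, 0.70, 0.80},
		{0.60, 0.75, 0.85},
		{0.40, 0.65, 0.70},
	}
	fmt.Println(shouldStop([]float64{0.30, 0.40}, completed, 2)) // true: 0.40 < median 0.60
	fmt.Println(shouldStop([]float64{0.60, 0.72}, completed, 2)) // false: 0.72 >= median 0.60
}
```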
### Experiment
To perform the above algorithms, Katib supports the following frameworks:
`Experiment` represents a single optimization run over a feasible space.
Each `Experiment` contains a configuration:
- [Goptuna](https://github.com/c-bata/goptuna)
- [Hyperopt](https://github.com/hyperopt/hyperopt)
- [Optuna](https://github.com/optuna/optuna)
- [Scikit Optimize](https://github.com/scikit-optimize/scikit-optimize)
1. Objective: What we are trying to optimize.
2. Search Space: Constraints for configurations describing the feasible space.
3. Search Algorithm: How to find the optimal configurations.
## Prerequisites
`Experiment` is defined as a CRD. See the detailed guide to [configuring and running a Katib
experiment](https://kubeflow.org/docs/components/hyperparameter-tuning/experiment/)
in the Kubeflow docs.
### Suggestion
A `Suggestion` is a proposed solution to the optimization problem: one set of hyperparameter values, or a list of parameter assignments. A `Trial` is then created to evaluate those parameter assignments.
`Suggestion` is defined as a CRD.
### Trial
A `Trial` is one iteration of the optimization process, which is one `worker job` instance with a list of parameter assignments (corresponding to a suggestion).
`Trial` is defined as a CRD.
### Worker Job
A `Worker Job` refers to a process responsible for evaluating a `Trial` and calculating its objective value.
The worker kind can be a [Kubernetes Job](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/), which is a non-distributed execution, or a [Kubeflow TFJob](https://www.kubeflow.org/docs/guides/components/tftraining/) or [Kubeflow PyTorchJob](https://www.kubeflow.org/docs/guides/components/pytorch/), which are distributed executions.
Thus, Katib supports multiple frameworks with the help of different job kinds.
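The relationship between these resources can be simulated in a few lines. The sequential sketch below makes three assumptions: the objective is maximized, a `suggest` function stands in for the Suggestion service, and an `evaluate` function stands in for the worker job. The types only echo Katib's field names and are not its API. In a real cluster, Trials run as Kubernetes resources in parallel, up to the parallel trial count.

```go
package main

import "fmt"

// Assignment maps parameter names to the values a Suggestion proposes.
type Assignment map[string]float64

// Trial pairs one Assignment with the objective value reported by its worker job.
type Trial struct {
	Name      string
	Params    Assignment
	Objective float64
}

// Experiment holds the loop settings; field names echo the Katib spec but this is not Katib's type.
type Experiment struct {
	Name               string
	Goal               float64 // maximized objective value that ends the run early
	MaxTrialCount      int
	ParallelTrialCount int
}

// run simulates Experiment -> Suggestion -> Trial -> worker job sequentially.
// suggest stands in for the Suggestion service, evaluate for a worker job.
func run(exp Experiment, suggest func(n int) []Assignment, evaluate func(Assignment) float64) ([]Trial, bool) {
	var trials []Trial
	for len(trials) < exp.MaxTrialCount {
		n := exp.ParallelTrialCount
		if remaining := exp.MaxTrialCount - len(trials); remaining < n {
			n = remaining
		}
		batch := suggest(n)
		if len(batch) == 0 {
			break // the suggestion source is exhausted
		}
		if len(batch) > n {
			batch = batch[:n] // never exceed the trial budget
		}
		for _, a := range batch {
			t := Trial{Name: fmt.Sprintf("%s-%d", exp.Name, len(trials)), Params: a}
			t.Objective = evaluate(a) // the worker job computes the objective value
			trials = append(trials, t)
			if t.Objective >= exp.Goal {
				return trials, true // objective goal reached
			}
		}
	}
	return trials, false // trial budget exhausted without reaching the goal
}

// counterSuggest is a toy Suggestion that proposes x = 1, 2, 3, ...
func counterSuggest() func(n int) []Assignment {
	next := 1.0
	return func(n int) []Assignment {
		out := make([]Assignment, n)
		for i := range out {
			out[i] = Assignment{"x": next}
			next++
		}
		return out
	}
}

func main() {
	exp := Experiment{Name: "demo", Goal: 0.99, MaxTrialCount: 12, ParallelTrialCount: 3}
	trials, reached := run(exp, counterSuggest(), func(a Assignment) float64 { return a["x"] / 5 })
	fmt.Println(len(trials), reached) // 5 true
}
```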
Currently, Katib supports the following exploration algorithms:
#### Hyperparameter Tuning
* [Random Search](https://en.wikipedia.org/wiki/Hyperparameter_optimization#Random_search)
* [Tree of Parzen Estimators (TPE)](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf)
* [Grid Search](https://en.wikipedia.org/wiki/Hyperparameter_optimization#Grid_search)
* [Hyperband](https://arxiv.org/pdf/1603.06560.pdf)
* [Bayesian Optimization](https://arxiv.org/pdf/1012.2599.pdf)
* [CMA Evolution Strategy](https://arxiv.org/abs/1604.00772)
#### Neural Architecture Search
* [Efficient Neural Architecture Search (ENAS)](https://github.com/kubeflow/katib/tree/master/pkg/suggestion/v1beta1/nas/enas)
* [Differentiable Architecture Search (DARTS)](https://github.com/kubeflow/katib/tree/master/pkg/suggestion/v1beta1/nas/darts)
## Components in Katib
Katib consists of several components, as shown below. Each component runs on Kubernetes as a Deployment.
The components communicate with each other via gRPC. The API is defined in `pkg/apis/manager/v1beta1/api.proto`
for the v1beta1 version and in `pkg/apis/manager/v1alpha3/api.proto` for the v1alpha3 version.
- Katib main components:
- katib-db-manager: gRPC API server of Katib, which is the DB interface.
- katib-mysql: Data storage backend of Katib using MySQL.
- katib-ui: User interface of Katib.
- katib-controller: Controller for Katib CRDs in Kubernetes.
## Web UI
Katib provides a Web UI.
You can visualize the general trend of the hyperparameter space and the history of each training run. You can use
[random-example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/random-example.yaml) or
[other examples](https://github.com/kubeflow/katib/blob/master/examples/v1beta1) to generate a similar UI.
![katibui](./docs/images/katib-ui.png)
## GRPC API documentation
See the [Katib v1beta1 API reference docs](https://github.com/kubeflow/katib/blob/master/pkg/apis/manager/v1beta1/gen-doc/api.md).
See the [Katib v1alpha3 API reference docs](https://www.kubeflow.org/docs/reference/katib/).
Please check [the official Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/installation/#prerequisites)
for prerequisites to install Katib.
## Installation
For a standard installation of Katib with support for all job operators,
install Kubeflow. The current official Katib version in the latest Kubeflow release is v1alpha3.
See the documentation:
Please follow [the Kubeflow Katib guide](https://www.kubeflow.org/docs/components/katib/installation/#installing-katib)
for the detailed instructions on how to install Katib.
* [Kubeflow installation
guide](https://www.kubeflow.org/docs/started/getting-started/)
* [Kubeflow hyperparameter tuning
guides](https://www.kubeflow.org/docs/components/hyperparameter-tuning/).
### Installing the Control Plane
If you install Katib with other Kubeflow components, you can't submit Katib jobs in the `kubeflow` namespace.
Alternatively, if you want to install Katib manually with TF and PyTorch operator support, follow these steps:
Create the `kubeflow` namespace:
Run the following command to install the latest stable release of Katib control plane:
```
kubectl create namespace kubeflow
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=v0.17.0"
```
Clone the Kubeflow manifests repository:
Run the following command to install the latest changes of Katib control plane:
```
git clone git@github.com:kubeflow/manifests.git
# Set MANIFESTS_DIR to the cloned folder.
export MANIFESTS_DIR=<cloned-folder>
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"
```
### TF operator
For the Katib Experiments check the [complete examples list](./examples/v1beta1).
To install the TF operator, run the following:
### Installing the Python SDK
```
cd "${MANIFESTS_DIR}/tf-training/tf-job-crds/base"
kustomize build . | kubectl apply -f -
cd "${MANIFESTS_DIR}/tf-training/tf-job-operator/base"
kustomize build . | kubectl apply -n kubeflow -f -
Katib implements [a Python SDK](https://pypi.org/project/kubeflow-katib/) to simplify creation of
hyperparameter tuning jobs for Data Scientists.
Run the following command to install the latest stable release of Katib SDK:
```sh
pip install -U kubeflow-katib
```
### PyTorch operator
## Getting Started
To install the PyTorch operator, run the following:
Please refer to [the getting started guide](https://www.kubeflow.org/docs/components/katib/getting-started/#getting-started-with-katib-python-sdk)
to quickly create your first hyperparameter tuning Experiment using the Python SDK.
```
cd "${MANIFESTS_DIR}/pytorch-job/pytorch-job-crds/base"
kustomize build . | kubectl apply -f -
cd "${MANIFESTS_DIR}/pytorch-job/pytorch-operator/base/"
kustomize build . | kubectl apply -n kubeflow -f -
```
## Community
### Katib
The following links provide information on how to get involved in the community:
Finally, you can install Katib.
- Attend [the bi-weekly AutoML and Training Working Group](https://bit.ly/2PWVCkV)
community meeting.
- Join our [`#kubeflow-katib`](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels)
Slack channel.
- Check out [who is using Katib](ADOPTERS.md) and [presentations about Katib project](docs/presentations.md).
For the v1beta1 version, run the following:
## Contributing
```
git clone git@github.com:kubeflow/katib.git
bash katib/scripts/v1beta1/deploy.sh
```
For the v1alpha3 version, run the following:
```
cd "${MANIFESTS_DIR}/katib/katib-crds/base"
kustomize build . | kubectl apply -f -
cd "${MANIFESTS_DIR}/katib/katib-controller/base"
kustomize build . | kubectl apply -f -
```
If you install Katib from the Kubeflow manifests repository and want to use Katib in a cluster that doesn't have a StorageClass for dynamic volume provisioning, you have to create a persistent volume manually to bind your persistent volume claim.
This is a sample YAML file for creating a persistent volume with local storage:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: katib-mysql
  labels:
    type: local
    app: katib
spec:
  storageClassName: katib
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/katib
```
Create this PV after deploying the Katib package.
Check if all components are running successfully:
```
kubectl get pods -n kubeflow
```
Expected output:
```
NAME                                READY   STATUS    RESTARTS   AGE
katib-controller-858d6cc48c-df9jc   1/1     Running   1          20m
katib-db-manager-7966fbdf9b-w2tn8   1/1     Running   0          20m
katib-mysql-7f8bc6956f-898f9        1/1     Running   0          20m
katib-ui-7cf9f967bf-nm72p           1/1     Running   0          20m
pytorch-operator-55f966b548-9gq9v   1/1     Running   0          20m
tf-job-operator-796b4747d8-4fh82    1/1     Running   0          21m
```
### Running examples
After deploying everything, you can run the examples to verify the installation.
The examples below are for the v1beta1 version.
This is an example for the TF operator:
```
kubectl create -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1beta1/tfjob-example.yaml
```
This is an example for the PyTorch operator:
```
kubectl create -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1beta1/pytorchjob-example.yaml
```
You can check the status of the experiment:
```yaml
$ kubectl describe experiment tfjob-example -n kubeflow
Name: tfjob-example
Namespace: kubeflow
Labels: <none>
Annotations: <none>
API Version: kubeflow.org/v1beta1
Kind: Experiment
Metadata:
Creation Timestamp: 2020-07-15T14:27:53Z
Finalizers:
update-prometheus-metrics
Generation: 1
Resource Version: 100380029
Self Link: /apis/kubeflow.org/v1beta1/namespaces/kubeflow/experiments/tfjob-example
UID: 5e3cf1f5-c6a7-11ea-90dd-42010a9a0020
Spec:
Algorithm:
Algorithm Name: random
Max Failed Trial Count: 3
Max Trial Count: 12
Metrics Collector Spec:
Collector:
Kind: TensorFlowEvent
Source:
File System Path:
Kind: Directory
Path: /train
Objective:
Goal: 0.99
Metric Strategies:
Name: accuracy_1
Value: max
Objective Metric Name: accuracy_1
Type: maximize
Parallel Trial Count: 3
Parameters:
Feasible Space:
Max: 0.05
Min: 0.01
Name: learning_rate
Parameter Type: double
Feasible Space:
Max: 200
Min: 100
Name: batch_size
Parameter Type: int
Resume Policy: LongRunning
Trial Template:
Trial Parameters:
Description: Learning rate for the training model
Name: learningRate
Reference: learning_rate
Description: Batch Size
Name: batchSize
Reference: batch_size
Trial Spec:
API Version: kubeflow.org/v1
Kind: TFJob
Spec:
Tf Replica Specs:
Worker:
Replicas: 2
Restart Policy: OnFailure
Template:
Spec:
Containers:
Command:
python
/var/tf_mnist/mnist_with_summaries.py
--log_dir=/train/metrics
--learning_rate=${trialParameters.learningRate}
--batch_size=${trialParameters.batchSize}
Image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
Image Pull Policy: Always
Name: tensorflow
Status:
Completion Time: 2020-07-15T14:30:52Z
Conditions:
Last Transition Time: 2020-07-15T14:27:53Z
Last Update Time: 2020-07-15T14:27:53Z
Message: Experiment is created
Reason: ExperimentCreated
Status: True
Type: Created
Last Transition Time: 2020-07-15T14:30:52Z
Last Update Time: 2020-07-15T14:30:52Z
Message: Experiment is running
Reason: ExperimentRunning
Status: False
Type: Running
Last Transition Time: 2020-07-15T14:30:52Z
Last Update Time: 2020-07-15T14:30:52Z
Message: Experiment has succeeded because Objective goal has reached
Reason: ExperimentGoalReached
Status: True
Type: Succeeded
Current Optimal Trial:
Best Trial Name: tfjob-example-gjxn54vl
Observation:
Metrics:
Latest: 0.966300010681
Max: 1.0
Min: 0.103260867298
Name: accuracy_1
Parameter Assignments:
Name: learning_rate
Value: 0.015945204040626416
Name: batch_size
Value: 184
Start Time: 2020-07-15T14:27:53Z
Succeeded Trial List:
tfjob-example-5jd8nnjg
tfjob-example-bgjfpd5t
tfjob-example-gjxn54vl
tfjob-example-vpdqxkch
tfjob-example-wvptx7gt
Trials: 5
Trials Succeeded: 5
Events: <none>
```
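The Trial Spec in this output refers to values through `${trialParameters.learningRate}` and `${trialParameters.batchSize}`, and each Trial Parameter's `Reference` points at a search-space parameter. Below is a sketch of that substitution, assuming string values as in `Parameter Assignments`. `render` is hypothetical and is not the Katib controller's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches ${trialParameters.<name>} references in a Trial template.
var placeholder = regexp.MustCompile(`\$\{trialParameters\.([A-Za-z0-9_]+)\}`)

// render replaces each ${trialParameters.<name>} with the value assigned to the
// search-space parameter it references. trialParams maps a template name
// (e.g. learningRate) to its Reference (e.g. learning_rate); assignments holds
// the values chosen for one trial.
func render(tmpl string, trialParams, assignments map[string]string) (string, error) {
	var renderErr error
	out := placeholder.ReplaceAllStringFunc(tmpl, func(m string) string {
		name := placeholder.FindStringSubmatch(m)[1]
		ref, ok := trialParams[name]
		if !ok {
			renderErr = fmt.Errorf("unknown trial parameter %q", name)
			return m
		}
		v, ok := assignments[ref]
		if !ok {
			renderErr = fmt.Errorf("no assignment for parameter %q", ref)
			return m
		}
		return v
	})
	return out, renderErr
}

func main() {
	trialParams := map[string]string{"learningRate": "learning_rate", "batchSize": "batch_size"}
	assignments := map[string]string{"learning_rate": "0.015945204040626416", "batch_size": "184"}
	cmd, err := render("--learning_rate=${trialParameters.learningRate} --batch_size=${trialParameters.batchSize}", trialParams, assignments)
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd)
}
```

Running it prints `--learning_rate=0.015945204040626416 --batch_size=184`, which are the assignments of the best trial above.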
When the `Succeeded` condition in the Experiment status becomes `True`, the experiment is finished.
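The same check can be done programmatically on the condition list shown above. The sketch below uses a hypothetical `Condition` struct that mirrors the `Type`, `Status` and `Reason` fields. Treating a `Failed` condition as terminal as well is an assumption not stated in this README.

```go
package main

import "fmt"

// Condition holds the fields shown under Status > Conditions in `kubectl describe experiment`.
type Condition struct {
	Type   string // e.g. Created, Running, Succeeded
	Status string // "True" or "False"
	Reason string
}

// isFinished reports whether a terminal condition (Succeeded or, by assumption, Failed)
// has Status "True", and returns that condition's Reason.
func isFinished(conds []Condition) (bool, string) {
	for _, c := range conds {
		if (c.Type == "Succeeded" || c.Type == "Failed") && c.Status == "True" {
			return true, c.Reason
		}
	}
	return false, ""
}

func main() {
	conds := []Condition{
		{Type: "Created", Status: "True", Reason: "ExperimentCreated"},
		{Type: "Running", Status: "False", Reason: "ExperimentRunning"},
		{Type: "Succeeded", Status: "True", Reason: "ExperimentGoalReached"},
	}
	done, reason := isFinished(conds)
	fmt.Println(done, reason) // true ExperimentGoalReached
}
```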
You can monitor your results in Katib UI.
Access Katib UI via Kubeflow dashboard if you have used standard installation or port-forward the `katib-ui` service if you have installed manually.
```
kubectl -n kubeflow port-forward svc/katib-ui 8080:80
```
You can access the Katib UI using this URL: `http://localhost:8080/katib/`.
### Katib SDK
Katib provides a Python SDK for the v1beta1 and v1alpha3 versions.
* See the [Katib v1beta1 SDK documentation](https://github.com/kubeflow/katib/tree/master/sdk/python/v1beta1).
* See the [Katib v1alpha3 SDK documentation](https://github.com/kubeflow/katib/tree/master/sdk/python/v1alpha3).
Run [`gen-sdk.sh`](https://github.com/kubeflow/katib/blob/master/hack/gen-python-sdk/gen-sdk.sh) to update SDK.
### Cleanups
To delete the installed TF and PyTorch operators, run `kubectl delete -f` on the respective folders.
To delete Katib for the v1beta1 version, run `bash katib/scripts/v1beta1/undeploy.sh`.
## Quick Start
Please see [Quick Start Guide](./docs/quick-start.md).
## Who is using Katib?
Please see [adopters.md](./docs/community/adopters.md).
## CONTRIBUTING
Please feel free to test the system! [developer-guide.md](./docs/developer-guide.md) is a good starting point for developers.
[1]: https://en.wikipedia.org/wiki/Hyperparameter_optimization
[2]: https://en.wikipedia.org/wiki/Neural_architecture_search
[3]: https://static.googleusercontent.com/media/research.google.com/ja//pubs/archive/bcb15507f4b52991a0783013df4222240e942381.pdf
Please refer to the [CONTRIBUTING guide](CONTRIBUTING.md).
## Citation
If you use Katib in a scientific publication, we would appreciate
citations to the following paper:
[A Scalable and Cloud-Native Hyperparameter Tuning System](https://arxiv.org/abs/2006.02085), George *et al.*, arXiv:2006.02085, 2020.
[A Scalable and Cloud-Native Hyperparameter Tuning System](https://arxiv.org/abs/2006.02085), George _et al._, arXiv:2006.02085, 2020.
Bibtex entry:


@ -1,3 +1,71 @@
# Katib 2022/2023 Roadmap
## AutoML Features
- Support advanced hyperparameter tuning algorithms:
- Population Based Training (PBT) - [#1382](https://github.com/kubeflow/katib/issues/1382)
- Tree of Parzen Estimators (TPE)
- Multivariate TPE
- Sobol's Quasirandom Sequence
- Asynchronous Successive Halving - [ASHA](https://arxiv.org/pdf/1810.05934.pdf)
- Support multi-objective optimization - [#1549](https://github.com/kubeflow/katib/issues/1549)
- Support various HP distributions (log-uniform, uniform, normal) - [#1207](https://github.com/kubeflow/katib/issues/1207)
- Support Auto Model Compression - [#460](https://github.com/kubeflow/katib/issues/460)
- Support Auto Feature Engineering - [#475](https://github.com/kubeflow/katib/issues/475)
- Improve Neural Architecture Search design
## Backend and API Enhancements
- Conformance tests for Katib - [#2044](https://github.com/kubeflow/katib/issues/2044)
- Support push-based metrics collection in Katib - [#577](https://github.com/kubeflow/katib/issues/577)
- Support PostgreSQL as a Katib DB - [#915](https://github.com/kubeflow/katib/issues/915)
- Improve Katib scalability - [#1847](https://github.com/kubeflow/katib/issues/1847)
- Promote Katib APIs to the `v1` version
- Support multiple CRD versions (`v1beta1`, `v1`) with conversion webhook
## Improve Katib User Experience
- Simplify Katib Experiment creation with Katib SDK - [#1951](https://github.com/kubeflow/katib/pull/1951)
- Fully migrate to a new Katib UI - [Project 1](https://github.com/kubeflow/katib/projects/1)
- Expose Trial logs in Katib UI - [#971](https://github.com/kubeflow/katib/issues/971)
- Enhance Katib UI visualization metrics for AutoML Experiments
- Improve Katib Config UX - [#2150](https://github.com/kubeflow/katib/issues/2150)
## Integration with Kubeflow Components
- Kubeflow Pipeline as a Katib Trial target - [#1914](https://github.com/kubeflow/katib/issues/1914)
- Improve data passing when Katib Experiment is part of Kubeflow Pipeline - [#1846](https://github.com/kubeflow/katib/issues/1846)
# History
# Katib 2021 Roadmap
## New Features
### AutoML
- Support Population Based Training [#1382](https://github.com/kubeflow/katib/issues/1382)
- Support [ASHA](https://arxiv.org/pdf/1810.05934.pdf)
- Support Auto Model Compression [#460](https://github.com/kubeflow/katib/issues/460)
- Support Auto Feature Engineering [#475](https://github.com/kubeflow/katib/issues/475)
- Various CRDs for HP, NAS and other AutoML techniques.
### UI
- Migrate to the new Katib UI [Project 1](https://github.com/kubeflow/katib/projects/1)
- Hyperparameter importances visualization with fANOVA algorithm
## Enhancements
- Finish AWS CI/CD migration
- Support various parameter distributions [#1207](https://github.com/kubeflow/katib/issues/1207)
- Finish validation for Algorithms [#1126](https://github.com/kubeflow/katib/issues/1126)
- Refactor Hyperband [#1389](https://github.com/kubeflow/katib/issues/1389)
- Support multiple CRD versions with conversion webhook
- MLMD integration with Katib Experiments
# Katib 2020 Roadmap
## New Features

SECURITY.md

@ -0,0 +1,64 @@
# Security Policy
## Supported Versions
Kubeflow Katib versions are expressed as `vX.Y.Z`, where X is the major version,
Y is the minor version, and Z is the patch version, following the
[Semantic Versioning](https://semver.org/) terminology.
The Kubeflow Katib project maintains release branches for the most recent two minor releases.
Applicable fixes, including security fixes, may be backported to those two release branches,
depending on severity and feasibility.
Users are encouraged to stay updated with the latest releases to benefit from security patches and
improvements.
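The support window above can be checked mechanically. The sketch below assumes plain `vX.Y.Z` tags and rejects pre-release suffixes. The release list in `main` is illustrative. `isSupported` is hypothetical and only answers whether a version sits on one of the two newest minor lines, not whether a particular fix was backported to it.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// minorLine identifies a release branch by its major and minor version numbers.
type minorLine struct{ major, minor int }

// parseVersion splits "vX.Y.Z" into numbers; pre-release suffixes such as "-rc.0" are rejected.
func parseVersion(v string) (minorLine, int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return minorLine{}, 0, fmt.Errorf("invalid version %q", v)
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return minorLine{}, 0, fmt.Errorf("invalid version %q: %w", v, err)
		}
		nums[i] = n
	}
	return minorLine{nums[0], nums[1]}, nums[2], nil
}

// isSupported reports whether version belongs to one of the two most recent minor lines in releases.
func isSupported(version string, releases []string) (bool, error) {
	target, _, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	seen := map[minorLine]bool{}
	var lines []minorLine
	for _, r := range releases {
		line, _, err := parseVersion(r)
		if err != nil {
			return false, err
		}
		if !seen[line] {
			seen[line] = true
			lines = append(lines, line)
		}
	}
	// Sort so the newest minor line comes first.
	sort.Slice(lines, func(i, j int) bool {
		if lines[i].major != lines[j].major {
			return lines[i].major > lines[j].major
		}
		return lines[i].minor > lines[j].minor
	})
	for i := 0; i < len(lines) && i < 2; i++ {
		if lines[i] == target {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	releases := []string{"v0.15.0", "v0.16.0", "v0.16.1", "v0.17.0"} // illustrative list
	for _, v := range []string{"v0.17.0", "v0.16.1", "v0.15.0"} {
		ok, err := isSupported(v, releases)
		fmt.Println(v, ok, err)
	}
}
```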
## Reporting a Vulnerability
We're extremely grateful for security researchers and users who report vulnerabilities to the
Kubeflow Open Source Community. All reports are thoroughly investigated by Kubeflow project owners.
You can use the following ways to report security vulnerabilities privately:
- Using the Kubeflow Katib repository [GitHub Security Advisory](https://github.com/kubeflow/katib/security/advisories/new).
- Using our private Kubeflow Steering Committee mailing list: ksc@kubeflow.org.
Please provide detailed information to help us understand and address the issue promptly.
## Disclosure Process
**Acknowledgment**: We will acknowledge receipt of your report within 10 business days.
**Assessment**: The Kubeflow project owners will investigate the reported issue to determine its
validity and severity.
**Resolution**: If the issue is confirmed, we will work on a fix and prepare a release.
**Notification**: Once a fix is available, we will notify the reporter and coordinate a public
disclosure.
**Public Disclosure**: Details of the vulnerability and the fix will be published in the project's
release notes and communicated through appropriate channels.
## Prevention Mechanisms
Kubeflow Katib employs several measures to prevent security issues:
**Code Reviews**: All code changes are reviewed by maintainers to ensure code quality and security.
**Dependency Management**: Regular updates and monitoring of dependencies (e.g. Dependabot) to
address known vulnerabilities.
**Continuous Integration**: Automated testing and security checks are integrated into the CI/CD pipeline.
**Image Scanning**: Container images are scanned for vulnerabilities.
## Communication Channels
For general questions, please join the following resources:
- Kubeflow [Slack channels](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels).
- Kubeflow discuss [mailing list](https://www.kubeflow.org/docs/about/community/#kubeflow-mailing-list).
Please **do not report** security vulnerabilities through public channels.


@ -1,26 +0,0 @@
FROM golang:alpine AS build-env
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/katib
WORKDIR /go/src/github.com/kubeflow/katib/cmd/db-manager
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
apk --update add git gcc musl-dev && \
go build -o katib-db-manager ./v1alpha3; \
else \
go build -o katib-db-manager ./v1alpha3; \
fi
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
FROM alpine:3.7
WORKDIR /app
COPY --from=build-env /bin/grpc_health_probe /bin/
COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/db-manager/katib-db-manager /app/
ENTRYPOINT ["./katib-db-manager"]
CMD ["-w", "kubernetes"]


@ -1,100 +0,0 @@
package main
import (
"context"
"flag"
"fmt"
"net"
"os"
health_pb "github.com/kubeflow/katib/pkg/apis/manager/health"
api_pb "github.com/kubeflow/katib/pkg/apis/manager/v1alpha3"
db "github.com/kubeflow/katib/pkg/db/v1alpha3"
"github.com/kubeflow/katib/pkg/db/v1alpha3/common"
"k8s.io/klog"
"google.golang.org/grpc"
"google.golang.org/grpc/reflection"
)
const (
port = "0.0.0.0:6789"
)
var dbIf common.KatibDBInterface
type server struct {
}
// Report a log of Observations for a Trial.
// The log consists of timestamps and metric values.
// Katib stores every log of metrics.
// You can see the accuracy curve or other metric logs in the UI.
func (s *server) ReportObservationLog(ctx context.Context, in *api_pb.ReportObservationLogRequest) (*api_pb.ReportObservationLogReply, error) {
err := dbIf.RegisterObservationLog(in.TrialName, in.ObservationLog)
return &api_pb.ReportObservationLogReply{}, err
}
// Get all logs of Observations for a Trial.
func (s *server) GetObservationLog(ctx context.Context, in *api_pb.GetObservationLogRequest) (*api_pb.GetObservationLogReply, error) {
ol, err := dbIf.GetObservationLog(in.TrialName, in.MetricName, in.StartTime, in.EndTime)
return &api_pb.GetObservationLogReply{
ObservationLog: ol,
}, err
}
// Delete all logs of Observations for a Trial.
func (s *server) DeleteObservationLog(ctx context.Context, in *api_pb.DeleteObservationLogRequest) (*api_pb.DeleteObservationLogReply, error) {
err := dbIf.DeleteObservationLog(in.TrialName)
return &api_pb.DeleteObservationLogReply{}, err
}
func (s *server) Check(ctx context.Context, in *health_pb.HealthCheckRequest) (*health_pb.HealthCheckResponse, error) {
resp := health_pb.HealthCheckResponse{
Status: health_pb.HealthCheckResponse_SERVING,
}
// An optional service name is accepted only if it is set to the suggested format.
if in != nil && in.Service != "" && in.Service != "grpc.health.v1.Health" {
resp.Status = health_pb.HealthCheckResponse_UNKNOWN
return &resp, fmt.Errorf("grpc.health.v1.Health can only be accepted if you specify service name.")
}
// Check if connection to katib db driver is okay since otherwise manager could not serve most of its methods.
err := dbIf.SelectOne()
if err != nil {
resp.Status = health_pb.HealthCheckResponse_NOT_SERVING
return &resp, fmt.Errorf("Failed to execute `SELECT 1` probe: %v", err)
}
return &resp, nil
}
func main() {
flag.Parse()
var err error
dbNameEnvName := common.DBNameEnvName
dbName := os.Getenv(dbNameEnvName)
if dbName == "" {
klog.Fatal("DB_NAME env is not set. Exiting")
}
dbIf, err = db.NewKatibDBInterface(dbName)
if err != nil {
klog.Fatalf("Failed to open db connection: %v", err)
}
dbIf.DBInit()
listener, err := net.Listen("tcp", port)
if err != nil {
klog.Fatalf("Failed to listen: %v", err)
}
size := 1<<31 - 1
klog.Infof("Start Katib manager: %s", port)
s := grpc.NewServer(grpc.MaxRecvMsgSize(size), grpc.MaxSendMsgSize(size))
api_pb.RegisterManagerServer(s, &server{})
health_pb.RegisterHealthServer(s, &server{})
reflection.Register(s)
if err = s.Serve(listener); err != nil {
klog.Fatalf("Failed to serve: %v", err)
}
}
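`GetObservationLog` passes `MetricName`, `StartTime` and `EndTime` through to the DB layer. Below is a minimal in-memory sketch of that filtering. It assumes that an empty metric name matches every metric, that empty bounds are unbounded, that bounds are inclusive, and that timestamps are RFC 3339 strings like those in the tests. `MetricLog` and `filterLogs` are hypothetical and do not reproduce Katib's MySQL implementation.

```go
package main

import (
	"fmt"
	"time"
)

// MetricLog is a simplified, hypothetical stand-in for api_pb.MetricLog.
type MetricLog struct {
	TimeStamp string // RFC 3339, e.g. "2019-02-03T04:05:06+09:00"
	Name      string
	Value     string
}

// filterLogs keeps logs whose metric name matches metricName (empty matches all)
// and whose timestamp lies in [startTime, endTime] (an empty bound is unbounded).
func filterLogs(logs []MetricLog, metricName, startTime, endTime string) ([]MetricLog, error) {
	var start, end time.Time
	var err error
	if startTime != "" {
		if start, err = time.Parse(time.RFC3339, startTime); err != nil {
			return nil, err
		}
	}
	if endTime != "" {
		if end, err = time.Parse(time.RFC3339, endTime); err != nil {
			return nil, err
		}
	}
	var out []MetricLog
	for _, l := range logs {
		if metricName != "" && l.Name != metricName {
			continue
		}
		ts, err := time.Parse(time.RFC3339, l.TimeStamp)
		if err != nil {
			return nil, fmt.Errorf("bad timestamp %q: %w", l.TimeStamp, err)
		}
		if (startTime != "" && ts.Before(start)) || (endTime != "" && ts.After(end)) {
			continue
		}
		out = append(out, l)
	}
	return out, nil
}

func main() {
	logs := []MetricLog{
		{TimeStamp: "2019-02-03T04:05:06+09:00", Name: "f1_score", Value: "88.95"},
		{TimeStamp: "2019-02-03T04:05:06+09:00", Name: "loss", Value: "0.5"},
	}
	inWindow, err := filterLogs(logs, "", "2019-02-03T03:05:06+09:00", "2019-02-03T05:05:06+09:00")
	if err != nil {
		panic(err)
	}
	lossOnly, _ := filterLogs(logs, "loss", "", "")
	fmt.Println(len(inWindow), len(lossOnly)) // 2 1
}
```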


@ -1,171 +0,0 @@
package main
import (
"context"
"testing"
"github.com/golang/mock/gomock"
health_pb "github.com/kubeflow/katib/pkg/apis/manager/health"
api_pb "github.com/kubeflow/katib/pkg/apis/manager/v1alpha3"
mockdb "github.com/kubeflow/katib/pkg/mock/v1alpha3/db"
)
func TestReportObservationLog(t *testing.T) {
ctrl := gomock.NewController(t)
defer ctrl.Finish()
s := &server{}
mockDB := mockdb.NewMockKatibDBInterface(ctrl)
dbIf = mockDB
req := &api_pb.ReportObservationLogRequest{
TrialName: "test1-trial1",
ObservationLog: &api_pb.ObservationLog{
MetricLogs: []*api_pb.MetricLog{
{
TimeStamp: "2019-02-03T04:05:06+09:00",
Metric: &api_pb.Metric{
Name: "f1_score",
Value: "88.95",
},
},
{
TimeStamp: "2019-02-03T04:05:06+09:00",
Metric: &api_pb.Metric{
Name: "loss",
Value: "0.5",
},
},
{
TimeStamp: "2019-02-03T04:05:06+09:00",
Metric: &api_pb.Metric{
Name: "precision",
Value: "88.7",
},
},
{
TimeStamp: "2019-02-03T04:05:06+09:00",
Metric: &api_pb.Metric{
Name: "recall",
Value: "89.2",
},
},
},
},
}
mockDB.EXPECT().RegisterObservationLog(req.TrialName, req.ObservationLog).Return(nil)
_, err := s.ReportObservationLog(context.Background(), req)
if err != nil {
t.Fatalf("ReportObservationLog Error %v", err)
}
}
func TestGetObservationLog(t *testing.T) {
ctrl := gomock.NewController(t)
defer ctrl.Finish()
s := &server{}
mockDB := mockdb.NewMockKatibDBInterface(ctrl)
dbIf = mockDB
req := &api_pb.GetObservationLogRequest{
TrialName: "test1-trial1",
StartTime: "2019-02-03T03:05:06+09:00",
EndTime: "2019-02-03T05:05:06+09:00",
}
obs := &api_pb.ObservationLog{
MetricLogs: []*api_pb.MetricLog{
{
TimeStamp: "2019-02-03T04:05:06+09:00",
Metric: &api_pb.Metric{
Name: "f1_score",
Value: "88.95",
},
},
{
TimeStamp: "2019-02-03T04:05:06+09:00",
Metric: &api_pb.Metric{
Name: "loss",
Value: "0.5",
},
},
{
TimeStamp: "2019-02-03T04:05:06+09:00",
Metric: &api_pb.Metric{
Name: "precision",
Value: "88.7",
},
},
{
TimeStamp: "2019-02-03T04:05:06+09:00",
Metric: &api_pb.Metric{
Name: "recall",
Value: "89.2",
},
},
},
}
mockDB.EXPECT().GetObservationLog(req.TrialName, req.MetricName, req.StartTime, req.EndTime).Return(obs, nil)
ret, err := s.GetObservationLog(context.Background(), req)
if err != nil {
t.Fatalf("GetObservationLog Error %v", err)
}
if len(obs.MetricLogs) != len(ret.ObservationLog.MetricLogs) {
t.Fatalf("GetObservationLog Test fail expect metrics number %d got %d", len(obs.MetricLogs), len(ret.ObservationLog.MetricLogs))
}
}
func TestDeleteObservationLog(t *testing.T) {
ctrl := gomock.NewController(t)
defer ctrl.Finish()
s := &server{}
mockDB := mockdb.NewMockKatibDBInterface(ctrl)
dbIf = mockDB
req := &api_pb.DeleteObservationLogRequest{
TrialName: "test1-trial1",
}
mockDB.EXPECT().DeleteObservationLog(req.TrialName).Return(nil)
_, err := s.DeleteObservationLog(context.Background(), req)
if err != nil {
t.Fatalf("DeleteObservationLog Error %v", err)
}
}
func TestCheck(t *testing.T) {
ctrl := gomock.NewController(t)
defer ctrl.Finish()
s := &server{}
mockDB := mockdb.NewMockKatibDBInterface(ctrl)
dbIf = mockDB
testCases := []struct {
Request *health_pb.HealthCheckRequest
ExpectedStatus health_pb.HealthCheckResponse_ServingStatus
Name string
}{
{
Request: &health_pb.HealthCheckRequest{
Service: "grpc.health.v1.Health",
},
ExpectedStatus: health_pb.HealthCheckResponse_SERVING,
Name: "Valid Request",
},
{
Request: &health_pb.HealthCheckRequest{
Service: "grpc.health.v1.1.Health",
},
ExpectedStatus: health_pb.HealthCheckResponse_UNKNOWN,
Name: "Invalid service name",
},
}
mockDB.EXPECT().SelectOne().Return(nil)
for _, tc := range testCases {
response, _ := s.Check(context.Background(), tc.Request)
if response.Status != tc.ExpectedStatus {
t.Fatalf("Case %v failed. ExpectedStatus %v, got %v", tc.Name, tc.ExpectedStatus, response.Status)
}
}
}


@ -1,26 +1,24 @@
# Build the Katib DB manager.
FROM golang:alpine AS build-env
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/katib
WORKDIR /go/src/github.com/kubeflow/katib/cmd/db-manager
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
apk --update add git gcc musl-dev && \
go build -o katib-db-manager ./v1beta1; \
else \
go build -o katib-db-manager ./v1beta1; \
fi
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
FROM alpine:3.7
ARG TARGETARCH
WORKDIR /go/src/github.com/kubeflow/katib
# Download packages.
COPY go.mod .
COPY go.sum .
RUN go mod download -x
# Copy sources.
COPY cmd/ cmd/
COPY pkg/ pkg/
# Build the binary.
RUN CGO_ENABLED=0 GOOS=linux GOARCH="${TARGETARCH}" go build -a -o katib-db-manager ./cmd/db-manager/v1beta1
# Copy the db-manager into a thin image.
FROM alpine:3.15
WORKDIR /app
COPY --from=build-env /bin/grpc_health_probe /bin/
COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/db-manager/katib-db-manager /app/
COPY --from=build-env /go/src/github.com/kubeflow/katib/katib-db-manager /app/
ENTRYPOINT ["./katib-db-manager"]
CMD ["-w", "kubernetes"]

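The updated Dockerfiles switch from `uname -m`, which reports `aarch64`, to the BuildKit `TARGETARCH` build argument, which uses Go's `arm64` spelling. Below is a small sketch of that name mapping and of the grpc_health_probe asset selection that the older Dockerfile performs in shell. The function names are hypothetical, and machine names other than those handled explicitly are returned unchanged.

```go
package main

import "fmt"

// unameToTargetArch maps `uname -m` output to the GOARCH-style names that
// BuildKit exposes as TARGETARCH; other values are returned unchanged.
func unameToTargetArch(machine string) string {
	switch machine {
	case "x86_64":
		return "amd64"
	case "aarch64":
		return "arm64"
	default:
		return machine // ppc64le is spelled the same in both conventions
	}
}

// probeURL builds the grpc_health_probe download URL chosen by the Dockerfile,
// falling back to the amd64 binary for architectures without a dedicated branch.
func probeURL(version, targetArch string) string {
	asset := "linux-amd64"
	switch targetArch {
	case "ppc64le", "arm64":
		asset = "linux-" + targetArch
	}
	return fmt.Sprintf("https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/%s/grpc_health_probe-%s", version, asset)
}

func main() {
	for _, m := range []string{"x86_64", "aarch64", "ppc64le"} {
		fmt.Println(m, "->", probeURL("v0.3.1", unameToTargetArch(m)))
	}
}
```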

@ -1,3 +1,19 @@
/*
Copyright 2022 The Kubeflow Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package main
import (
@ -6,19 +22,21 @@ import (
"fmt"
"net"
"os"
"time"
health_pb "github.com/kubeflow/katib/pkg/apis/manager/health"
api_pb "github.com/kubeflow/katib/pkg/apis/manager/v1beta1"
db "github.com/kubeflow/katib/pkg/db/v1beta1"
"github.com/kubeflow/katib/pkg/db/v1beta1/common"
"k8s.io/klog"
"k8s.io/klog/v2"
"google.golang.org/grpc"
"google.golang.org/grpc/reflection"
)
const (
port = "0.0.0.0:6789"
defaultListenAddress = "0.0.0.0:6789"
defaultConnectTimeout = time.Second * 60
)
var dbIf common.KatibDBInterface
@ -71,25 +89,30 @@ func (s *server) Check(ctx context.Context, in *health_pb.HealthCheckRequest) (*
}
func main() {
var connectTimeout time.Duration
var listenAddress string
flag.DurationVar(&connectTimeout, "connect-timeout", defaultConnectTimeout, "Timeout before calling error during database connection. (e.g. 120s)")
flag.StringVar(&listenAddress, "listen-address", defaultListenAddress, "The network interface or IP address to receive incoming connections. (e.g. 0.0.0.0:6789)")
flag.Parse()
var err error
dbNameEnvName := common.DBNameEnvName
dbName := os.Getenv(dbNameEnvName)
if dbName == "" {
klog.Fatal("DB_NAME env is not set. Exiting")
}
dbIf, err = db.NewKatibDBInterface(dbName)
dbIf, err = db.NewKatibDBInterface(dbName, connectTimeout)
if err != nil {
klog.Fatalf("Failed to open db connection: %v", err)
}
dbIf.DBInit()
listener, err := net.Listen("tcp", port)
listener, err := net.Listen("tcp", listenAddress)
if err != nil {
klog.Fatalf("Failed to listen: %v", err)
}
size := 1<<31 - 1
klog.Infof("Start Katib manager: %s", port)
klog.Infof("Start Katib manager: %s", listenAddress)
s := grpc.NewServer(grpc.MaxRecvMsgSize(size), grpc.MaxSendMsgSize(size))
api_pb.RegisterDBManagerServer(s, &server{})
health_pb.RegisterHealthServer(s, &server{})

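The manager caps both gRPC message sizes at `1<<31 - 1`. In Go, `<<` binds tighter than binary `-`, so the expression is `(1 << 31) - 1`, which is 2147483647, the largest signed 32-bit integer. Python ranks `-` above `<<`, so a Python gRPC service (such as the median-stop early-stopping service) would need explicit parentheses to express the same limit. Below is a small worked check. It is plain arithmetic. The grpcio channel-argument names in the comment are standard, but the server call shown there is only illustrative.

```python
# Go parses `1<<31 - 1` as (1 << 31) - 1 because << has higher precedence than -.
GO_MAX_MSG_SIZE = (1 << 31) - 1

# Python parses the same unparenthesized text as 1 << (31 - 1): half the intended limit.
PYTHON_UNPARENTHESIZED = 1 << 31 - 1

# Illustrative only: a Python gRPC server would pass the parenthesized value, e.g.
#   grpc.server(executor, options=[
#       ("grpc.max_receive_message_length", GO_MAX_MSG_SIZE),
#       ("grpc.max_send_message_length", GO_MAX_MSG_SIZE),
#   ])
print(GO_MAX_MSG_SIZE, PYTHON_UNPARENTHESIZED)  # prints: 2147483647 1073741824
```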
@@ -1,10 +1,26 @@
/*
Copyright 2022 The Kubeflow Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package main
import (
"context"
"testing"
"github.com/golang/mock/gomock"
"go.uber.org/mock/gomock"
health_pb "github.com/kubeflow/katib/pkg/apis/manager/health"
api_pb "github.com/kubeflow/katib/pkg/apis/manager/v1beta1"

@@ -1,22 +1,24 @@
FROM python:3.6
FROM python:3.11-slim
ARG TARGETARCH
ENV TARGET_DIR /opt/katib
ENV EARLY_STOPPING_DIR cmd/earlystopping/medianstop/v1beta1
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
apt-get -y update && \
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
pip install cython; \
RUN if [ "${TARGETARCH}" = "ppc64le" ] || [ "${TARGETARCH}" = "arm64" ]; then \
apt-get -y update && \
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*; \
fi
ADD ./pkg/ ${TARGET_DIR}/pkg/
ADD ./${EARLY_STOPPING_DIR}/ ${TARGET_DIR}/${EARLY_STOPPING_DIR}/
WORKDIR ${TARGET_DIR}/${EARLY_STOPPING_DIR}
RUN pip install --no-cache-dir -r requirements.txt
WORKDIR ${TARGET_DIR}/${EARLY_STOPPING_DIR}
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python
ENTRYPOINT ["python", "main.py"]

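The architecture check now reads `${TARGETARCH}` instead of `$(uname -m)`, which is why `aarch64` became `arm64`. BuildKit fills `TARGETARCH` (once the stage declares `ARG TARGETARCH`) using Go/Docker architecture names, while `uname -m` prints kernel machine names. The sketch below shows that correspondence. The helper names and the table are illustrative and cover only the architectures these Dockerfiles mention.

```python
# Illustrative mapping from `uname -m` output to the names BuildKit uses for
# TARGETARCH (the same spellings Go uses for GOARCH).
UNAME_TO_TARGETARCH = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "ppc64le": "ppc64le",
}


def to_targetarch(machine: str) -> str:
    """Translate a `uname -m` value into its TARGETARCH/GOARCH spelling."""
    try:
        return UNAME_TO_TARGETARCH[machine]
    except KeyError:
        raise ValueError(f"unsupported machine type: {machine!r}") from None


def needs_native_blas(targetarch: str) -> bool:
    """Mirror the Dockerfile condition that installs gfortran, OpenBLAS and LAPACK."""
    return targetarch in ("ppc64le", "arm64")
```

For example, `needs_native_blas("aarch64")` is `False`. Keeping the old `aarch64` spelling with the new variable would silently skip the BLAS install on ARM builds.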
@@ -1,9 +1,25 @@
import grpc
import time
# Copyright 2022 The Kubeflow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
from concurrent import futures
import grpc
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
from pkg.earlystopping.v1beta1.medianstop.service import MedianStopService
from concurrent import futures
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
DEFAULT_PORT = "0.0.0.0:6788"

@@ -1,4 +1,5 @@
grpcio==1.23.0
protobuf==3.9.1
grpcio>=1.64.1
protobuf>=4.21.12,<5
googleapis-common-protos==1.6.0
kubernetes==11.0.0
kubernetes==22.6.0
cython>=0.29.24

@@ -1,22 +0,0 @@
# Build the manager binary
FROM golang:alpine AS build-env
# Copy in the go src
ADD . /go/src/github.com/kubeflow/katib
WORKDIR /go/src/github.com/kubeflow/katib/cmd/katib-controller
# Build
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le go build -a -o katib-controller ./v1alpha3; \
elif [ "$(uname -m)" = "aarch64" ]; then \
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -o katib-controller ./v1alpha3; \
else \
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o katib-controller ./v1alpha3; \
fi
# Copy the controller-manager into a thin image
FROM alpine:3.7
WORKDIR /app
RUN apk update && apk add ca-certificates
COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/katib-controller/katib-controller .
USER 1000
ENTRYPOINT ["./katib-controller"]

@@ -1,123 +0,0 @@
/*
Copyright 2018 The Kubeflow Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
Katib-controller is a controller (operator) for Experiments and Trials
*/
package main
import (
"flag"
"os"
"github.com/spf13/viper"
_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
"sigs.k8s.io/controller-runtime/pkg/client/config"
"sigs.k8s.io/controller-runtime/pkg/manager"
logf "sigs.k8s.io/controller-runtime/pkg/runtime/log"
"sigs.k8s.io/controller-runtime/pkg/runtime/signals"
apis "github.com/kubeflow/katib/pkg/apis/controller"
controller "github.com/kubeflow/katib/pkg/controller.v1alpha3"
"github.com/kubeflow/katib/pkg/controller.v1alpha3/consts"
webhook "github.com/kubeflow/katib/pkg/webhook/v1alpha3"
)
func main() {
logf.SetLogger(logf.ZapLogger(false))
log := logf.Log.WithName("entrypoint")
var experimentSuggestionName string
var metricsAddr string
var webhookPort int
var certLocalFS bool
var injectSecurityContext bool
var serviceName string
var enableGRPCProbeInSuggestion bool
flag.StringVar(&experimentSuggestionName, "experiment-suggestion-name",
"default", "The implementation of suggestion interface in experiment controller (default|fake)")
flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
flag.IntVar(&webhookPort, "webhook-port", 8443, "The port number to be used for admission webhook server.")
flag.BoolVar(&certLocalFS, "cert-localfs", false, "Store the webhook cert in local file system")
flag.BoolVar(&injectSecurityContext, "webhook-inject-securitycontext", false, "Inject the securityContext of container[0] in the sidecar")
flag.StringVar(&serviceName, "webhook-service-name", "katib-controller", "The service name which will be used in webhook")
flag.BoolVar(&enableGRPCProbeInSuggestion, "enable-grpc-probe-in-suggestion", true, "enable grpc probe in suggestions")
flag.Parse()
// Set the config in viper.
viper.Set(consts.ConfigExperimentSuggestionName, experimentSuggestionName)
viper.Set(consts.ConfigCertLocalFS, certLocalFS)
viper.Set(consts.ConfigInjectSecurityContext, injectSecurityContext)
viper.Set(consts.ConfigEnableGRPCProbeInSuggestion, enableGRPCProbeInSuggestion)
log.Info("Config:",
consts.ConfigExperimentSuggestionName,
viper.GetString(consts.ConfigExperimentSuggestionName),
consts.ConfigCertLocalFS,
viper.GetBool(consts.ConfigCertLocalFS),
"webhook-port",
webhookPort,
"metrics-addr",
metricsAddr,
consts.ConfigInjectSecurityContext,
viper.GetBool(consts.ConfigInjectSecurityContext),
consts.ConfigEnableGRPCProbeInSuggestion,
viper.GetBool(consts.ConfigEnableGRPCProbeInSuggestion),
)
// Get a config to talk to the apiserver
cfg, err := config.GetConfig()
if err != nil {
log.Error(err, "Fail to get the config")
os.Exit(1)
}
// Create a new katib controller to provide shared dependencies and start components
mgr, err := manager.New(cfg, manager.Options{
MetricsBindAddress: metricsAddr,
})
if err != nil {
log.Error(err, "unable add APIs to scheme")
os.Exit(1)
}
log.Info("Registering Components.")
// Setup Scheme for all resources
if err := apis.AddToScheme(mgr.GetScheme()); err != nil {
log.Error(err, "Fail to create the manager")
os.Exit(1)
}
// Setup all Controllers
log.Info("Setting up controller")
if err := controller.AddToManager(mgr); err != nil {
log.Error(err, "unable to register controllers to the manager")
os.Exit(1)
}
log.Info("Setting up webhooks")
if err := webhook.AddToManager(mgr, int32(webhookPort), serviceName); err != nil {
log.Error(err, "unable to register webhooks to the manager")
os.Exit(1)
}
// Start the Cmd
log.Info("Starting the Cmd.")
if err := mgr.Start(signals.SetupSignalHandler()); err != nil {
log.Error(err, "unable to run the manager")
os.Exit(1)
}
}

@@ -1,22 +1,24 @@
# Build the manager binary
# Build the Katib controller.
FROM golang:alpine AS build-env
# Copy in the go src
ADD . /go/src/github.com/kubeflow/katib
ARG TARGETARCH
WORKDIR /go/src/github.com/kubeflow/katib/cmd/katib-controller
# Build
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le go build -a -o katib-controller ./v1beta1; \
elif [ "$(uname -m)" = "aarch64" ]; then \
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -o katib-controller ./v1beta1; \
else \
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o katib-controller ./v1beta1; \
fi
# Copy the controller-manager into a thin image
FROM alpine:3.7
WORKDIR /go/src/github.com/kubeflow/katib
# Download packages.
COPY go.mod .
COPY go.sum .
RUN go mod download -x
# Copy sources.
COPY cmd/ cmd/
COPY pkg/ pkg/
# Build the binary.
RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build -a -o katib-controller ./cmd/katib-controller/v1beta1
# Copy the controller-manager into a thin image.
FROM alpine:3.15
WORKDIR /app
RUN apk update && apk add ca-certificates
COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/katib-controller/katib-controller .
USER 1000
COPY --from=build-env /go/src/github.com/kubeflow/katib/katib-controller .
ENTRYPOINT ["./katib-controller"]

@@ -1,9 +1,12 @@
/*
Copyright 2018 The Kubeflow Authors
Copyright 2022 The Kubeflow Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -12,7 +15,7 @@ limitations under the License.
*/
/*
Katib-controller is a controller (operator) for Experiments and Trials
Katib-controller is a controller (operator) for Experiments and Trials
*/
package main
@@ -21,60 +24,75 @@ import (
"os"
"github.com/spf13/viper"
"k8s.io/apimachinery/pkg/runtime"
_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
"sigs.k8s.io/controller-runtime/pkg/client/config"
"sigs.k8s.io/controller-runtime/pkg/healthz"
logf "sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
"sigs.k8s.io/controller-runtime/pkg/manager"
logf "sigs.k8s.io/controller-runtime/pkg/runtime/log"
"sigs.k8s.io/controller-runtime/pkg/runtime/signals"
"sigs.k8s.io/controller-runtime/pkg/manager/signals"
metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
"sigs.k8s.io/controller-runtime/pkg/webhook"
configv1beta1 "github.com/kubeflow/katib/pkg/apis/config/v1beta1"
apis "github.com/kubeflow/katib/pkg/apis/controller"
controller "github.com/kubeflow/katib/pkg/controller.v1beta1"
cert "github.com/kubeflow/katib/pkg/certgenerator/v1beta1"
"github.com/kubeflow/katib/pkg/controller.v1beta1"
"github.com/kubeflow/katib/pkg/controller.v1beta1/consts"
trialutil "github.com/kubeflow/katib/pkg/controller.v1beta1/trial/util"
webhook "github.com/kubeflow/katib/pkg/webhook/v1beta1"
"github.com/kubeflow/katib/pkg/util/v1beta1/katibconfig"
webhookv1beta1 "github.com/kubeflow/katib/pkg/webhook/v1beta1"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
)
var (
scheme = runtime.NewScheme()
log = logf.Log.WithName("entrypoint")
)
func init() {
utilruntime.Must(apis.AddToScheme(scheme))
utilruntime.Must(configv1beta1.AddToScheme(scheme))
utilruntime.Must(clientgoscheme.AddToScheme(scheme))
}
func main() {
logf.SetLogger(logf.ZapLogger(false))
log := logf.Log.WithName("entrypoint")
var experimentSuggestionName string
var metricsAddr string
var webhookPort int
var certLocalFS bool
var injectSecurityContext bool
var serviceName string
var enableGRPCProbeInSuggestion bool
var trialResources trialutil.GvkListFlag
flag.StringVar(&experimentSuggestionName, "experiment-suggestion-name",
"default", "The implementation of suggestion interface in experiment controller (default)")
flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
flag.IntVar(&webhookPort, "webhook-port", 8443, "The port number to be used for admission webhook server.")
flag.BoolVar(&certLocalFS, "cert-localfs", false, "Store the webhook cert in local file system")
flag.BoolVar(&injectSecurityContext, "webhook-inject-securitycontext", false, "Inject the securityContext of container[0] in the sidecar")
flag.StringVar(&serviceName, "webhook-service-name", "katib-controller", "The service name which will be used in webhook")
flag.BoolVar(&enableGRPCProbeInSuggestion, "enable-grpc-probe-in-suggestion", true, "enable grpc probe in suggestions")
flag.Var(&trialResources, "trial-resources", "The list of resources that can be used as trial template, in the form: Kind.version.group (e.g. TFJob.v1.kubeflow.org)")
logf.SetLogger(zap.New())
var katibConfigFile string
flag.StringVar(&katibConfigFile, "katib-config", "",
"The katib-controller will load its initial configuration from this file. "+
"Omit this flag to use the default configuration values. ")
flag.Parse()
initConfig, err := katibconfig.GetInitConfigData(scheme, katibConfigFile)
if err != nil {
log.Error(err, "Failed to get KatibConfig")
os.Exit(1)
}
// Set the config in viper.
viper.Set(consts.ConfigExperimentSuggestionName, experimentSuggestionName)
viper.Set(consts.ConfigCertLocalFS, certLocalFS)
viper.Set(consts.ConfigInjectSecurityContext, injectSecurityContext)
viper.Set(consts.ConfigEnableGRPCProbeInSuggestion, enableGRPCProbeInSuggestion)
viper.Set(consts.ConfigTrialResources, trialResources)
viper.Set(consts.ConfigExperimentSuggestionName, initConfig.ControllerConfig.ExperimentSuggestionName)
viper.Set(consts.ConfigInjectSecurityContext, initConfig.ControllerConfig.InjectSecurityContext)
viper.Set(consts.ConfigEnableGRPCProbeInSuggestion, initConfig.ControllerConfig.EnableGRPCProbeInSuggestion)
trialGVKs, err := katibconfig.TrialResourcesToGVKs(initConfig.ControllerConfig.TrialResources)
if err != nil {
log.Error(err, "Failed to parse trialResources")
os.Exit(1)
}
viper.Set(consts.ConfigTrialResources, trialGVKs)
log.Info("Config:",
consts.ConfigExperimentSuggestionName,
viper.GetString(consts.ConfigExperimentSuggestionName),
consts.ConfigCertLocalFS,
viper.GetBool(consts.ConfigCertLocalFS),
"webhook-port",
webhookPort,
initConfig.ControllerConfig.WebhookPort,
"metrics-addr",
metricsAddr,
initConfig.ControllerConfig.MetricsAddr,
"healthz-addr",
initConfig.ControllerConfig.HealthzAddr,
consts.ConfigInjectSecurityContext,
viper.GetBool(consts.ConfigInjectSecurityContext),
consts.ConfigEnableGRPCProbeInSuggestion,
@@ -92,38 +110,76 @@ func main() {
// Create a new katib controller to provide shared dependencies and start components
mgr, err := manager.New(cfg, manager.Options{
MetricsBindAddress: metricsAddr,
Metrics: metricsserver.Options{
BindAddress: initConfig.ControllerConfig.MetricsAddr,
},
HealthProbeBindAddress: initConfig.ControllerConfig.HealthzAddr,
LeaderElection: initConfig.ControllerConfig.EnableLeaderElection,
LeaderElectionID: initConfig.ControllerConfig.LeaderElectionID,
Scheme: scheme,
})
if err != nil {
log.Error(err, "unable add APIs to scheme")
log.Error(err, "Failed to create the manager")
os.Exit(1)
}
log.Info("Registering Components.")
// Setup Scheme for all resources
if err := apis.AddToScheme(mgr.GetScheme()); err != nil {
log.Error(err, "Fail to create the manager")
os.Exit(1)
// Create a webhook server.
hookServer := webhook.NewServer(webhook.Options{
Port: *initConfig.ControllerConfig.WebhookPort,
CertDir: consts.CertDir,
})
ctx := signals.SetupSignalHandler()
certsReady := make(chan struct{})
defer close(certsReady)
// The setupControllers will register controllers to the manager
// after generated certs for the admission webhooks.
go setupControllers(mgr, certsReady, hookServer)
if initConfig.CertGeneratorConfig.Enable {
if err = cert.AddToManager(mgr, initConfig.CertGeneratorConfig, certsReady); err != nil {
log.Error(err, "Failed to set up cert-generator")
}
} else {
certsReady <- struct{}{}
}
// Setup all Controllers
log.Info("Setting up controller")
if err := controller.AddToManager(mgr); err != nil {
log.Error(err, "unable to register controllers to the manager")
log.Info("Setting up health checker.")
if err := mgr.AddReadyzCheck("readyz", hookServer.StartedChecker()); err != nil {
log.Error(err, "Unable to add readyz endpoint to the manager")
os.Exit(1)
}
log.Info("Setting up webhooks")
if err := webhook.AddToManager(mgr, int32(webhookPort), serviceName); err != nil {
log.Error(err, "unable to register webhooks to the manager")
if err = mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
log.Error(err, "Add webhook server health checker to the manager failed")
os.Exit(1)
}
// Start the Cmd
log.Info("Starting the Cmd.")
if err := mgr.Start(signals.SetupSignalHandler()); err != nil {
log.Error(err, "unable to run the manager")
log.Info("Starting the manager.")
if err = mgr.Start(ctx); err != nil {
log.Error(err, "Unable to run the manager")
os.Exit(1)
}
}
func setupControllers(mgr manager.Manager, certsReady chan struct{}, hookServer webhook.Server) {
// The certsReady blocks to register controllers until generated certs.
<-certsReady
log.Info("Certs ready")
// Setup all Controllers
log.Info("Setting up controller.")
if err := controller.AddToManager(mgr); err != nil {
log.Error(err, "Unable to register controllers to the manager")
os.Exit(1)
}
log.Info("Setting up webhooks.")
if err := webhookv1beta1.AddToManager(mgr, hookServer); err != nil {
log.Error(err, "Unable to register webhooks to the manager")
os.Exit(1)
}
}

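In the new `main`, controller and webhook registration runs in a goroutine that blocks on `certsReady`. It stays blocked until the cert generator has produced the webhook certificates, or until `main` sends on the channel directly when cert generation is disabled. The sketch below shows the same handshake, using Python's `threading.Event` as a stand-in for the unbuffered channel. `generate_certs` and the in-memory `cert_store` are hypothetical placeholders for the real cert generator.

```python
import threading
from typing import Dict, List


def setup_controllers(certs_ready: threading.Event, registered: List[str]) -> None:
    """Counterpart of setupControllers: wait for certs, then register components."""
    certs_ready.wait()  # like `<-certsReady`
    registered.append("controllers")
    registered.append("webhooks")


def generate_certs(cert_store: Dict[str, str], certs_ready: threading.Event) -> None:
    """Hypothetical cert generator: store a placeholder cert, then signal readiness."""
    cert_store["tls.crt"] = "placeholder-certificate"
    certs_ready.set()


def start(cert_generator_enabled: bool) -> List[str]:
    certs_ready = threading.Event()
    registered: List[str] = []
    cert_store: Dict[str, str] = {}

    worker = threading.Thread(target=setup_controllers, args=(certs_ready, registered))
    worker.start()

    if cert_generator_enabled:
        threading.Thread(target=generate_certs, args=(cert_store, certs_ready)).start()
    else:
        certs_ready.set()  # like `certsReady <- struct{}{}`

    worker.join(timeout=5)  # the Go manager keeps running instead of joining
    return registered
```

One difference from Go: a send on an unbuffered channel blocks until the receiver takes the value, whereas `Event.set()` returns immediately. In both versions, registration never happens if nothing signals.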
@@ -1,22 +0,0 @@
# Build the manager binary
FROM golang:alpine AS build-env
# Copy in the go src
ADD . /go/src/github.com/kubeflow/katib
WORKDIR /go/src/github.com/kubeflow/katib/cmd/metricscollector/v1alpha3/file-metricscollector/
# Build
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le go build -a -o file-metricscollector ./; \
elif [ "$(uname -m)" = "aarch64" ]; then \
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -o file-metricscollector ./; \
else \
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o file-metricscollector ./; \
fi
# Copy the controller-manager into a thin image
FROM alpine:3.7
WORKDIR /app
COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/metricscollector/v1alpha3/file-metricscollector/file-metricscollector .
ENTRYPOINT ["./file-metricscollector"]

@@ -1,128 +0,0 @@
/*
Copyright 2018 The Kubeflow Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
MetricsCollector is a default metricscollector for worker.
It will collect metrics from pod log.
You should print metrics in {{MetricsName}}={{MetricsValue}} format.
For example, the objective value name is F1 and the metrics are loss, your training code should print like below.
---
epoch 1:
batch1 loss=0.8
batch2 loss=0.6
F1=0.4
epoch 2:
batch1 loss=0.4
batch2 loss=0.2
F1=0.7
---
The metrics collector will collect all logs of metrics.
*/
package main
import (
"context"
"flag"
"os"
"path/filepath"
"strings"
"github.com/hpcloud/tail"
"google.golang.org/grpc"
"k8s.io/klog"
api "github.com/kubeflow/katib/pkg/apis/manager/v1alpha3"
"github.com/kubeflow/katib/pkg/metricscollector/v1alpha3/common"
filemc "github.com/kubeflow/katib/pkg/metricscollector/v1alpha3/file-metricscollector"
)
var (
metricsFileName = flag.String("path", "", "Metrics File Path")
trialName = flag.String("t", "", "Trial Name")
managerService = flag.String("s", "", "Katib Manager service")
metricNames = flag.String("m", "", "Metric names")
filters = flag.String("f", "", "Metric filters")
pollInterval = flag.Duration("p", common.DefaultPollInterval, "Poll interval to check if main process of worker container exit")
timeout = flag.Duration("timeout", common.DefaultTimeout, "Timeout to check if main process of worker container exit")
waitAll = flag.Bool("w", common.DefaultWaitAll, "Whether wait for all other main process of container exiting")
)
func printMetricsFile(mFile string) {
for {
_, err := os.Stat(mFile)
if err == nil {
break
} else if os.IsNotExist(err) {
continue
} else {
klog.Fatalf("could not watch metrics file: %v", err)
}
}
t, _ := tail.TailFile(mFile, tail.Config{Follow: true})
for line := range t.Lines {
klog.Info(line.Text)
}
}
func main() {
flag.Parse()
klog.Infof("Trial Name: %s", *trialName)
go printMetricsFile(*metricsFileName)
wopts := common.WaitPidsOpts{
PollInterval: *pollInterval,
Timeout: *timeout,
WaitAll: *waitAll,
CompletedMarkedDirPath: filepath.Dir(*metricsFileName),
}
if err := common.Wait(wopts); err != nil {
klog.Fatalf("Failed to wait for worker container: %v", err)
}
conn, err := grpc.Dial(*managerService, grpc.WithInsecure())
if err != nil {
klog.Fatalf("could not connect: %v", err)
}
defer conn.Close()
c := api.NewManagerClient(conn)
ctx := context.Background()
var metricList []string
if len(*metricNames) != 0 {
metricList = strings.Split(*metricNames, ";")
}
var filterList []string
if len(*filters) != 0 {
filterList = strings.Split(*filters, ";")
}
olog, err := filemc.CollectObservationLog(*metricsFileName, metricList, filterList)
if err != nil {
klog.Fatalf("Failed to collect logs: %v", err)
}
reportreq := &api.ReportObservationLogRequest{
TrialName: *trialName,
ObservationLog: olog,
}
_, err = c.ReportObservationLog(ctx, reportreq)
if err != nil {
klog.Fatalf("Failed to Report logs: %v", err)
}
klog.Infof("Metrics reported. :\n%v", olog)
}

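The collector above expects training code to print `{{MetricsName}}={{MetricsValue}}` pairs and extracts them with regular-expression filters. The sketch below applies that extraction to the example log from the comment. The `DEFAULT_FILTER` pattern is an assumption modeled on that convention, not necessarily the exact default filter Katib ships.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed filter: group 1 captures the metric name, group 2 its numeric value.
DEFAULT_FILTER = r"([\w|-]+)\s*=\s*([+-]?\d*(\.\d+)?([Ee][+-]?\d+)?)"


def collect_metrics(
    lines: Iterable[str],
    metric_names: List[str],
    filters: Optional[List[str]] = None,
) -> Dict[str, List[float]]:
    """Return every reported value for each requested metric, in log order."""
    patterns = [re.compile(f) for f in (filters or [DEFAULT_FILTER])]
    observed: Dict[str, List[float]] = {name: [] for name in metric_names}
    for line in lines:
        for pattern in patterns:
            for match in pattern.finditer(line):
                name = match.group(1).strip()
                raw_value = match.group(2).strip()
                if name not in observed:
                    continue  # not a metric we were asked to collect
                try:
                    observed[name].append(float(raw_value))
                except ValueError:
                    continue  # e.g. "loss=" with no number after it
    return observed


EXAMPLE_LOG = """\
epoch 1:
batch1 loss=0.8
batch2 loss=0.6
F1=0.4
epoch 2:
batch1 loss=0.4
batch2 loss=0.2
F1=0.7
""".splitlines()
```

With the example log, `collect_metrics(EXAMPLE_LOG, ["F1", "loss"])` returns the two `F1` values and the four `loss` values in the order they were printed. Lines such as `epoch 1:` contain no `=` and are skipped.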
@@ -1,7 +0,0 @@
FROM tensorflow/tensorflow:1.11.0
RUN pip install rfc3339 grpcio googleapis-common-protos
ADD . /usr/src/app/github.com/kubeflow/katib
WORKDIR /usr/src/app/github.com/kubeflow/katib/cmd/metricscollector/v1alpha3/tfevent-metricscollector/
RUN pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /usr/src/app/github.com/kubeflow/katib:/usr/src/app/github.com/kubeflow/katib/pkg/apis/manager/v1alpha3/python:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1alpha3/tfevent-metricscollector/:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1alpha3/common/
ENTRYPOINT ["python", "main.py"]

@@ -1,28 +0,0 @@
FROM ubuntu:18.04
RUN apt-get update \
&& apt-get -y install software-properties-common \
autoconf \
automake \
build-essential \
cmake \
pkg-config \
wget \
python-pip \
libhdf5-dev \
libhdf5-serial-dev \
hdf5-tools\
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget https://github.com/lhelontra/tensorflow-on-arm/releases/download/v1.11.0/tensorflow-1.11.0-cp27-none-linux_aarch64.whl \
&& pip install tensorflow-1.11.0-cp27-none-linux_aarch64.whl \
&& rm tensorflow-1.11.0-cp27-none-linux_aarch64.whl \
&& rm -rf .cache
RUN pip install rfc3339 grpcio googleapis-common-protos jupyter
ADD . /usr/src/app/github.com/kubeflow/katib
WORKDIR /usr/src/app/github.com/kubeflow/katib/cmd/metricscollector/v1alpha3/tfevent-metricscollector/
RUN pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /usr/src/app/github.com/kubeflow/katib:/usr/src/app/github.com/kubeflow/katib/pkg/apis/manager/v1alpha3/python:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1alpha3/tfevent-metricscollector/:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1alpha3/common/
ENTRYPOINT ["python", "main.py"]

@@ -1,7 +0,0 @@
FROM ibmcom/tensorflow-ppc64le:1.14.0-py3
RUN pip install rfc3339 grpcio googleapis-common-protos
ADD . /usr/src/app/github.com/kubeflow/katib
WORKDIR /usr/src/app/github.com/kubeflow/katib/cmd/metricscollector/v1alpha3/tfevent-metricscollector/
RUN pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /usr/src/app/github.com/kubeflow/katib:/usr/src/app/github.com/kubeflow/katib/pkg/apis/manager/v1alpha3/python:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1alpha3/tfevent-metricscollector/:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1alpha3/common/
ENTRYPOINT ["python", "main.py"]

@@ -1,54 +0,0 @@
import grpc
import argparse
import api_pb2
import api_pb2_grpc
from pns import WaitOtherMainProcesses
from tfevent_loader import MetricsCollector
from logging import getLogger, StreamHandler, INFO
timeout_in_seconds = 60
def parse_options():
parser = argparse.ArgumentParser(
description='TF-Event MetricsCollector',
add_help=True
)
parser.add_argument("-s", "--manager_server_addr",
type=str, default="katib-db-manager:6789")
parser.add_argument("-t", "--trial_name", type=str, default="")
parser.add_argument("-path", "--dir_path", type=str, default="/log")
parser.add_argument("-m", "--metric_names", type=str, default="")
parser.add_argument("-f", "--metric_filters", type=str, default="")
opt = parser.parse_args()
return opt
if __name__ == '__main__':
logger = getLogger(__name__)
handler = StreamHandler()
handler.setLevel(INFO)
logger.setLevel(INFO)
logger.addHandler(handler)
logger.propagate = False
opt = parse_options()
manager_server = opt.manager_server_addr.split(':')
if len(manager_server) != 2:
raise Exception("Invalid katib manager service address: %s" %
opt.manager_server_addr)
WaitOtherMainProcesses(completed_marked_dir=opt.dir_path)
mc = MetricsCollector(opt.metric_names.split(';'))
observation_log = mc.parse_file(opt.dir_path)
channel = grpc.beta.implementations.insecure_channel(
manager_server[0], int(manager_server[1]))
with api_pb2.beta_create_Manager_stub(channel) as client:
logger.info("In " + opt.trial_name + " " +
str(len(observation_log.metric_logs)) + " metrics will be reported.")
client.ReportObservationLog(api_pb2.ReportObservationLogRequest(
trial_name=opt.trial_name,
observation_log=observation_log
), timeout=timeout_in_seconds)

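The removed TF-Event collector split its inputs by hand. `opt.metric_names.split(';')` returns `['']` for an empty string (the Go collectors guard this case with `len(*metricNames) != 0`), and `split(':')` rejects any address that is not exactly `host:port`. The collector also used the deprecated `grpc.beta` channel API, whereas current grpcio accepts the whole `host:port` target in `grpc.insecure_channel`. The sketch below shows more defensive parsing. The helper names are hypothetical.

```python
from typing import List, Tuple


def parse_metric_names(raw: str) -> List[str]:
    """Split a ';'-separated list, dropping empty entries ("".split(";") == [""])."""
    return [name.strip() for name in raw.split(";") if name.strip()]


def parse_host_port(addr: str) -> Tuple[str, int]:
    """Split on the last ':' so the port is always the final component."""
    host, sep, port = addr.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"Invalid katib manager service address: {addr}")
    return host, int(port)


# After validation, the unchanged target string can be passed to grpc.insecure_channel(addr).
```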
@@ -1,22 +1,24 @@
# Build the manager binary
# Build the Katib file metrics collector.
FROM golang:alpine AS build-env
# Copy in the go src
ADD . /go/src/github.com/kubeflow/katib
ARG TARGETARCH
WORKDIR /go/src/github.com/kubeflow/katib/cmd/metricscollector/v1beta1/file-metricscollector/
WORKDIR /go/src/github.com/kubeflow/katib
# Build
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le go build -a -o file-metricscollector ./; \
elif [ "$(uname -m)" = "aarch64" ]; then \
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -o file-metricscollector ./; \
else \
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o file-metricscollector ./; \
fi
# Download packages.
COPY go.mod .
COPY go.sum .
RUN go mod download -x
# Copy the controller-manager into a thin image
FROM alpine:3.7
# Copy sources.
COPY cmd/ cmd/
COPY pkg/ pkg/
# Build the binary.
RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build -a -o file-metricscollector ./cmd/metricscollector/v1beta1/file-metricscollector
# Copy the file metrics collector into a thin image.
FROM alpine:3.15
WORKDIR /app
COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/metricscollector/v1beta1/file-metricscollector/file-metricscollector .
COPY --from=build-env /go/src/github.com/kubeflow/katib/file-metricscollector .
ENTRYPOINT ["./file-metricscollector"]

@@ -1,5 +1,5 @@
/*
Copyright 2018 The Kubeflow Authors
Copyright 2022 The Kubeflow Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -39,19 +39,21 @@ package main
import (
"context"
"encoding/json"
"flag"
"fmt"
"io/ioutil"
"os"
"path/filepath"
"regexp"
"strconv"
"strings"
"time"
"github.com/hpcloud/tail"
psutil "github.com/shirou/gopsutil/process"
"github.com/nxadm/tail"
psutil "github.com/shirou/gopsutil/v3/process"
"google.golang.org/grpc"
"k8s.io/klog"
"google.golang.org/grpc/credentials/insecure"
"k8s.io/klog/v2"
commonv1beta1 "github.com/kubeflow/katib/pkg/apis/controller/common/v1beta1"
api "github.com/kubeflow/katib/pkg/apis/manager/v1beta1"
@@ -102,12 +104,13 @@ var (
earlyStopServiceAddr = flag.String("s-earlystop", "", "Katib Early Stopping service endpoint")
trialName = flag.String("t", "", "Trial Name")
metricsFilePath = flag.String("path", "", "Metrics File Path")
metricsFileFormat = flag.String("format", "", "Metrics File Format")
metricNames = flag.String("m", "", "Metric names")
objectiveType = flag.String("o-type", "", "Objective type")
metricFilters = flag.String("f", "", "Metric filters")
pollInterval = flag.Duration("p", common.DefaultPollInterval, "Poll interval between running processes check")
timeout = flag.Duration("timeout", common.DefaultTimeout, "Timeout before invoke error during running processes check")
waitAll = flag.Bool("w", common.DefaultWaitAll, "Whether wait for all other main process of container exiting")
waitAllProcesses = flag.String("w", common.DefaultWaitAllProcesses, "Whether wait for all other main process of container exiting")
stopRules stopRulesFlag
isEarlyStopped = false
)
@@ -131,13 +134,17 @@ func printMetricsFile(mFile string) {
checkMetricFile(mFile)
// Print lines from metrics file.
t, _ := tail.TailFile(mFile, tail.Config{Follow: true})
t, err := tail.TailFile(mFile, tail.Config{Follow: true, ReOpen: true})
if err != nil {
klog.Errorf("Failed to open metrics file: %v", err)
}
for line := range t.Lines {
klog.Info(line.Text)
}
}
func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string) {
func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string, fileFormat commonv1beta1.FileFormat) {
// metricStartStep is the dict where key = metric name, value = start step.
// We should apply early stopping rule only if metric is reported at least "start_step" times.
@@ -148,9 +155,6 @@ func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string) {
}
}
// First metric is objective in metricNames array.
objMetric := strings.Split(*metricNames, ";")[0]
objType := commonv1beta1.ObjectiveType(*objectiveType)
// For objective metric we calculate best optimal value from the recorded metrics.
// This is workaround for Median Stop algorithm.
// TODO (andreyvelich): Think about it, maybe define latest, max or min strategy type in stop-rule as well ?
@@ -159,8 +163,10 @@ func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string) {
// Check that metric file exists.
checkMetricFile(mFile)
// Get Main proccess.
_, mainProcPid, err := common.GetMainProcesses(mFile)
// Get Main process.
// Extract the metric file dir path based on the file name.
mDirPath, _ := filepath.Split(mFile)
_, mainProcPid, err := common.GetMainProcesses(mDirPath)
if err != nil {
klog.Fatalf("GetMainProcesses failed: %v", err)
}
@@ -169,9 +175,6 @@ func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string) {
klog.Fatalf("Failed to create new Process from pid %v, error: %v", mainProcPid, err)
}
// Get list of regural expressions from filters.
metricRegList := filemc.GetFilterRegexpList(filters)
// Start watch log lines.
t, _ := tail.TailFile(mFile, tail.Config{Follow: true})
for line := range t.Lines {
@@ -179,78 +182,82 @@ func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string) {
// Print log line
klog.Info(logText)
// Check if log line contains metric from stop rules.
isRuleLine := false
for _, rule := range stopRules {
if strings.Contains(logText, rule.Name) {
isRuleLine = true
break
switch fileFormat {
case commonv1beta1.TextFormat:
// Get list of regular expressions from filters.
var metricRegList []*regexp.Regexp
metricRegList = filemc.GetFilterRegexpList(filters)
// Check if log line contains metric from stop rules.
isRuleLine := false
for _, rule := range stopRules {
if strings.Contains(logText, rule.Name) {
isRuleLine = true
break
}
}
// If the log line doesn't contain a metric from the stop rules, keep tracking the file.
if !isRuleLine {
continue
}
}
// If log line doesn't contain appropriate metric, continue track file.
if !isRuleLine {
continue
}
// If log line contains appropriate metric, find all submatches from metric filters.
for _, metricReg := range metricRegList {
matchStrings := metricReg.FindAllStringSubmatch(logText, -1)
for _, subMatchList := range matchStrings {
if len(subMatchList) < 3 {
continue
}
// Submatch must have metric name and float value
metricName := strings.TrimSpace(subMatchList[1])
metricValue, err := strconv.ParseFloat(strings.TrimSpace(subMatchList[2]), 64)
if err != nil {
klog.Fatalf("Unable to parse value %v to float for metric %v", metricValue, metricName)
}
// stopRules contains array of EarlyStoppingRules that has not been reached yet.
// After rule is reached we delete appropriate element from the array.
for idx, rule := range stopRules {
if metricName != rule.Name {
// If log line contains appropriate metric, find all submatches from metric filters.
for _, metricReg := range metricRegList {
matchStrings := metricReg.FindAllStringSubmatch(logText, -1)
for _, subMatchList := range matchStrings {
if len(subMatchList) < 3 {
continue
}
// Calculate optimalObjValue.
if metricName == objMetric {
if optimalObjValue == nil {
optimalObjValue = &metricValue
} else if objType == commonv1beta1.ObjectiveTypeMaximize && metricValue > *optimalObjValue {
optimalObjValue = &metricValue
} else if objType == commonv1beta1.ObjectiveTypeMinimize && metricValue < *optimalObjValue {
optimalObjValue = &metricValue
}
// Assign best optimal value to metric value.
metricValue = *optimalObjValue
// Submatch must have metric name and float value
metricName := strings.TrimSpace(subMatchList[1])
metricValue, err := strconv.ParseFloat(strings.TrimSpace(subMatchList[2]), 64)
if err != nil {
klog.Fatalf("Unable to parse value %v to float for metric %v", subMatchList[2], metricName)
}
// Reduce steps if appropriate metric is reported.
// Once rest steps are empty we apply early stopping rule.
if restSteps, ok := metricStartStep[metricName]; ok {
metricStartStep[metricName]--
if restSteps != 0 {
// stopRules contains the EarlyStoppingRules that have not been reached yet.
// After a rule is reached, we delete the corresponding element from the array.
for idx, rule := range stopRules {
if metricName != rule.Name {
continue
}
}
ruleValue, err := strconv.ParseFloat(rule.Value, 64)
if err != nil {
klog.Fatalf("Unable to parse value %v to float for rule metric %v", rule.Value, rule.Name)
}
// Metric value can be equal, less or greater than stop rule.
// Deleting suitable stop rule from the array.
if rule.Comparison == commonv1beta1.ComparisonTypeEqual && metricValue == ruleValue {
stopRules = deleteStopRule(stopRules, idx)
} else if rule.Comparison == commonv1beta1.ComparisonTypeLess && metricValue < ruleValue {
stopRules = deleteStopRule(stopRules, idx)
} else if rule.Comparison == commonv1beta1.ComparisonTypeGreater && metricValue > ruleValue {
stopRules = deleteStopRule(stopRules, idx)
stopRules, optimalObjValue = updateStopRules(stopRules, optimalObjValue, metricValue, metricStartStep, rule, idx)
}
}
}
case commonv1beta1.JsonFormat:
var logJsonObj map[string]interface{}
if err = json.Unmarshal([]byte(logText), &logJsonObj); err != nil {
klog.Fatalf("Failed to unmarshal logs in %v format, log: %s, error: %v", commonv1beta1.JsonFormat, logText, err)
}
// Check if log line contains metric from stop rules.
isRuleLine := false
for _, rule := range stopRules {
if _, exist := logJsonObj[rule.Name]; exist {
isRuleLine = true
break
}
}
// If the log line doesn't contain a metric from the stop rules, keep tracking the file.
if !isRuleLine {
continue
}
// stopRules contains the EarlyStoppingRules that have not been reached yet.
// After a rule is reached, we delete the corresponding element from the array.
for idx, rule := range stopRules {
value, exist := logJsonObj[rule.Name].(string)
if !exist {
continue
}
metricValue, err := strconv.ParseFloat(strings.TrimSpace(value), 64)
if err != nil {
klog.Fatalf("Unable to parse value %v to float for metric %v", value, rule.Name)
}
stopRules, optimalObjValue = updateStopRules(stopRules, optimalObjValue, metricValue, metricStartStep, rule, idx)
}
default:
klog.Fatalf("Format must be set to %v or %v", commonv1beta1.TextFormat, commonv1beta1.JsonFormat)
}
// If stopRules array is empty, Trial is early stopped.
@ -266,12 +273,12 @@ func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string) {
klog.Fatalf("Create mark file %v error: %v", markFile, err)
}
err = ioutil.WriteFile(markFile, []byte(common.TrainingEarlyStopped), 0)
err = os.WriteFile(markFile, []byte(common.TrainingEarlyStopped), 0)
if err != nil {
klog.Fatalf("Write to file %v error: %v", markFile, err)
}
// Get child proccess from main PID.
// Get child process from main PID.
childProc, err := mainProc.Children()
if err != nil {
klog.Fatalf("Get children processes for main PID: %v failed: %v", mainProcPid, err)
@ -289,9 +296,9 @@ func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string) {
}
// Report metrics to DB.
reportMetrics(filters)
reportMetrics(filters, fileFormat)
// Wait until main proccess is completed.
// Wait until main process is completed.
timeout := 60 * time.Second
endTime := time.Now().Add(timeout)
isProcRunning := true
@ -304,11 +311,10 @@ func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string) {
}
// Create connection and client for Early Stopping service.
conn, err := grpc.Dial(*earlyStopServiceAddr, grpc.WithInsecure())
conn, err := grpc.NewClient(*earlyStopServiceAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
klog.Fatalf("Could not connect to Early Stopping service, error: %v", err)
}
defer conn.Close()
c := api.NewEarlyStoppingClient(conn)
setTrialStatusReq := &api.SetTrialStatusRequest{
@ -320,13 +326,65 @@ func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string) {
if err != nil {
klog.Fatalf("Set Trial status error: %v", err)
}
conn.Close()
klog.Infof("Trial status is successfully updated")
}
}
}
func updateStopRules(
stopRules []commonv1beta1.EarlyStoppingRule,
optimalObjValue *float64,
metricValue float64,
metricStartStep map[string]int,
rule commonv1beta1.EarlyStoppingRule,
ruleIdx int,
) ([]commonv1beta1.EarlyStoppingRule, *float64) {
// The first metric in metricNames is the objective.
objMetric := strings.Split(*metricNames, ";")[0]
objType := commonv1beta1.ObjectiveType(*objectiveType)
// Calculate optimalObjValue.
if rule.Name == objMetric {
if optimalObjValue == nil {
optimalObjValue = &metricValue
} else if objType == commonv1beta1.ObjectiveTypeMaximize && metricValue > *optimalObjValue {
optimalObjValue = &metricValue
} else if objType == commonv1beta1.ObjectiveTypeMinimize && metricValue < *optimalObjValue {
optimalObjValue = &metricValue
}
// Assign best optimal value to metric value.
metricValue = *optimalObjValue
}
// Decrement the remaining start steps each time the metric is reported.
// The rule is evaluated once the metric has been reported at least "start_step" times.
if restSteps, ok := metricStartStep[rule.Name]; ok && restSteps > 0 {
metricStartStep[rule.Name]--
if restSteps > 1 {
return stopRules, optimalObjValue
}
}
ruleValue, err := strconv.ParseFloat(rule.Value, 64)
if err != nil {
klog.Fatalf("Unable to parse value %v to float for rule metric %v", rule.Value, rule.Name)
}
// The metric value can be equal to, less than, or greater than the rule value.
// Delete the reached stop rule from the array.
if rule.Comparison == commonv1beta1.ComparisonTypeEqual && metricValue == ruleValue {
return deleteStopRule(stopRules, ruleIdx), optimalObjValue
} else if rule.Comparison == commonv1beta1.ComparisonTypeLess && metricValue < ruleValue {
return deleteStopRule(stopRules, ruleIdx), optimalObjValue
} else if rule.Comparison == commonv1beta1.ComparisonTypeGreater && metricValue > ruleValue {
return deleteStopRule(stopRules, ruleIdx), optimalObjValue
}
return stopRules, optimalObjValue
}
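A standalone sketch of the per-rule update follows. It uses hypothetical simplified types (`Rule`, `Comparison`, `applyMetric`) rather than Katib's `commonv1beta1` types, passes the objective name and direction as arguments instead of reading the `metricNames` and `objectiveType` flags, and follows the "reported at least start_step times" wording of the comment in `watchMetricsFile`. Treat it as an illustration of the technique, not as the collector's implementation.

```go
package main

import "fmt"

// Comparison is a hypothetical stand-in for commonv1beta1.ComparisonType.
type Comparison string

const (
	Equal   Comparison = "equal"
	Less    Comparison = "less"
	Greater Comparison = "greater"
)

// Rule is a hypothetical, simplified stand-in for commonv1beta1.EarlyStoppingRule.
type Rule struct {
	Name       string
	Value      float64
	Comparison Comparison
}

// applyMetric evaluates the rule at idx against one reported value and returns
// the remaining (not yet reached) rules plus the best objective value so far.
func applyMetric(rules []Rule, best *float64, value float64, startStep map[string]int,
	objective string, maximize bool, idx int) ([]Rule, *float64) {
	rule := rules[idx]
	// For the objective metric, evaluate the best value seen so far.
	if rule.Name == objective {
		if best == nil || (maximize && value > *best) || (!maximize && value < *best) {
			v := value
			best = &v
		}
		value = *best
	}
	// Skip the rule until the metric has been reported at least start_step times.
	if steps, ok := startStep[rule.Name]; ok && steps > 0 {
		startStep[rule.Name] = steps - 1
		if steps > 1 {
			return rules, best
		}
	}
	reached := (rule.Comparison == Equal && value == rule.Value) ||
		(rule.Comparison == Less && value < rule.Value) ||
		(rule.Comparison == Greater && value > rule.Value)
	if reached {
		// Copy-on-delete so the caller's slice is not modified in place.
		rules = append(rules[:idx:idx], rules[idx+1:]...)
	}
	return rules, best
}

func main() {
	// "accuracy" is not the objective here, so raw values are compared.
	rules := []Rule{{Name: "accuracy", Value: 0.6, Comparison: Less}}
	startStep := map[string]int{"accuracy": 2}
	var best *float64
	for i, v := range []float64{0.7, 0.8, 0.5} {
		rules, best = applyMetric(rules, best, v, startStep, "loss", false, 0)
		fmt.Printf("report %d: value=%.1f remaining rules=%d\n", i+1, v, len(rules))
		if len(rules) == 0 {
			fmt.Println("all rules reached: the Trial would be early stopped")
			break
		}
	}
}
```

With `start_step` set to 2, the first report is skipped, the second is evaluated but does not satisfy `accuracy < 0.6`, and the third report still gets evaluated and removes the rule.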
func deleteStopRule(stopRules []commonv1beta1.EarlyStoppingRule, idx int) []commonv1beta1.EarlyStoppingRule {
if idx >= len(stopRules) {
klog.Fatalf("Index %v out of range stopRules: %v", idx, stopRules)
@ -346,17 +404,21 @@ func main() {
filters = strings.Split(*metricFilters, ";")
}
fileFormat := commonv1beta1.FileFormat(*metricsFileFormat)
// If stop rule is set we need to parse metrics during run.
if len(stopRules) != 0 {
go watchMetricsFile(*metricsFilePath, stopRules, filters)
go watchMetricsFile(*metricsFilePath, stopRules, filters, fileFormat)
} else {
go printMetricsFile(*metricsFilePath)
}
waitAll, _ := strconv.ParseBool(*waitAllProcesses)
wopts := common.WaitPidsOpts{
PollInterval: *pollInterval,
Timeout: *timeout,
WaitAll: *waitAll,
WaitAll: waitAll,
CompletedMarkedDirPath: filepath.Dir(*metricsFilePath),
}
if err := common.WaitMainProcesses(wopts); err != nil {
@ -365,13 +427,13 @@ func main() {
// If training was not early stopped, report the metrics.
if !isEarlyStopped {
reportMetrics(filters)
reportMetrics(filters, fileFormat)
}
}
func reportMetrics(filters []string) {
func reportMetrics(filters []string, fileFormat commonv1beta1.FileFormat) {
conn, err := grpc.Dial(*dbManagerServiceAddr, grpc.WithInsecure())
conn, err := grpc.NewClient(*dbManagerServiceAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
klog.Fatalf("Could not connect to DB manager service, error: %v", err)
}
@ -382,7 +444,7 @@ func reportMetrics(filters []string) {
if len(*metricNames) != 0 {
metricList = strings.Split(*metricNames, ";")
}
olog, err := filemc.CollectObservationLog(*metricsFilePath, metricList, filters)
olog, err := filemc.CollectObservationLog(*metricsFilePath, metricList, filters, fileFormat)
if err != nil {
klog.Fatalf("Failed to collect logs: %v", err)
}
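The two `fileFormat` branches can be sketched in isolation as below. The regular expression is an assumed `name=value` filter chosen for illustration (the collector receives its filters via `metricFilters`), and `parseTextLine` and `parseJSONLine` are hypothetical names. The JSON branch keeps the collector's `.(string)` type assertion. Because `encoding/json` decodes JSON numbers into `float64` when the target is `map[string]interface{}`, an unquoted value such as `0.25` is skipped rather than parsed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// metricFilter is an assumed "name=value" filter: submatch 1 is the metric name
// and submatch 2 the value, matching the len >= 3 check in the collector.
var metricFilter = regexp.MustCompile(`([\w|-]+)\s*=\s*([+-]?\d*(\.\d+)?([Ee][+-]?\d+)?)`)

// parseTextLine extracts name/value pairs from a TEXT-format log line.
func parseTextLine(line string) (map[string]float64, error) {
	out := map[string]float64{}
	for _, m := range metricFilter.FindAllStringSubmatch(line, -1) {
		if len(m) < 3 {
			continue
		}
		v, err := strconv.ParseFloat(strings.TrimSpace(m[2]), 64)
		if err != nil {
			return nil, fmt.Errorf("metric %s: cannot parse %q: %w", m[1], m[2], err)
		}
		out[strings.TrimSpace(m[1])] = v
	}
	return out, nil
}

// parseJSONLine extracts the requested metrics from a JSON-format log line.
// Like the collector, it only accepts values encoded as JSON strings.
func parseJSONLine(line string, names []string) (map[string]float64, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(line), &obj); err != nil {
		return nil, fmt.Errorf("invalid JSON log line %q: %w", line, err)
	}
	out := map[string]float64{}
	for _, name := range names {
		s, ok := obj[name].(string) // JSON numbers decode as float64 and are skipped
		if !ok {
			continue
		}
		v, err := strconv.ParseFloat(strings.TrimSpace(s), 64)
		if err != nil {
			return nil, fmt.Errorf("metric %s: cannot parse %q: %w", name, s, err)
		}
		out[name] = v
	}
	return out, nil
}

func main() {
	text, _ := parseTextLine("epoch 1: accuracy=0.91 loss=0.25")
	fmt.Println(text["accuracy"], text["loss"])
	js, _ := parseJSONLine(`{"epoch": 1, "accuracy": "0.91", "loss": 0.25}`, []string{"accuracy", "loss"})
	fmt.Println(len(js), js["accuracy"])
}
```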

@ -1,7 +1,24 @@
FROM tensorflow/tensorflow:1.11.0
RUN pip install rfc3339 grpcio googleapis-common-protos
ADD . /usr/src/app/github.com/kubeflow/katib
WORKDIR /usr/src/app/github.com/kubeflow/katib/cmd/metricscollector/v1beta1/tfevent-metricscollector/
RUN pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /usr/src/app/github.com/kubeflow/katib:/usr/src/app/github.com/kubeflow/katib/pkg/apis/manager/v1beta1/python:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1beta1/tfevent-metricscollector/:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1beta1/common/
FROM python:3.11-slim
ARG TARGETARCH
ENV TARGET_DIR /opt/katib
ENV METRICS_COLLECTOR_DIR cmd/metricscollector/v1beta1/tfevent-metricscollector
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/metricscollector/v1beta1/tfevent-metricscollector/:${TARGET_DIR}/pkg/metricscollector/v1beta1/common/
ADD ./pkg/ ${TARGET_DIR}/pkg/
ADD ./${METRICS_COLLECTOR_DIR}/ ${TARGET_DIR}/${METRICS_COLLECTOR_DIR}/
WORKDIR ${TARGET_DIR}/${METRICS_COLLECTOR_DIR}
RUN if [ "${TARGETARCH}" = "arm64" ]; then \
apt-get -y update && \
apt-get -y install gfortran libpcre3 libpcre3-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*; \
fi
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}
ENTRYPOINT ["python", "main.py"]

@ -1,28 +0,0 @@
FROM ubuntu:18.04
RUN apt-get update \
&& apt-get -y install software-properties-common \
autoconf \
automake \
build-essential \
cmake \
pkg-config \
wget \
python-pip \
libhdf5-dev \
libhdf5-serial-dev \
hdf5-tools\
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget https://github.com/lhelontra/tensorflow-on-arm/releases/download/v1.11.0/tensorflow-1.11.0-cp27-none-linux_aarch64.whl \
&& pip install tensorflow-1.11.0-cp27-none-linux_aarch64.whl \
&& rm tensorflow-1.11.0-cp27-none-linux_aarch64.whl \
&& rm -rf .cache
RUN pip install rfc3339 grpcio googleapis-common-protos jupyter
ADD . /usr/src/app/github.com/kubeflow/katib
WORKDIR /usr/src/app/github.com/kubeflow/katib/cmd/metricscollector/v1beta1/tfevent-metricscollector/
RUN pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /usr/src/app/github.com/kubeflow/katib:/usr/src/app/github.com/kubeflow/katib/pkg/apis/manager/v1beta1/python:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1beta1/tfevent-metricscollector/:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1beta1/common/
ENTRYPOINT ["python", "main.py"]

@ -1,7 +1,6 @@
FROM ibmcom/tensorflow-ppc64le:1.14.0-py3
RUN pip install rfc3339 grpcio googleapis-common-protos
FROM ibmcom/tensorflow-ppc64le:2.2.0-py3
ADD . /usr/src/app/github.com/kubeflow/katib
WORKDIR /usr/src/app/github.com/kubeflow/katib/cmd/metricscollector/v1beta1/tfevent-metricscollector/
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
ENV PYTHONPATH /usr/src/app/github.com/kubeflow/katib:/usr/src/app/github.com/kubeflow/katib/pkg/apis/manager/v1beta1/python:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1beta1/tfevent-metricscollector/:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1beta1/common/
ENTRYPOINT ["python", "main.py"]

@ -1,10 +1,26 @@
import grpc
# Copyright 2022 The Kubeflow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
from logging import INFO, StreamHandler, getLogger
import api_pb2
from pns import WaitMainProcesses
import api_pb2_grpc
import const
import grpc
from pns import WaitMainProcesses
from tfevent_loader import MetricsCollector
from logging import getLogger, StreamHandler, INFO
timeout_in_seconds = 60
@ -24,7 +40,7 @@ def parse_options():
parser.add_argument("-f", "--metric_filters", type=str, default="")
parser.add_argument("-p", "--poll_interval", type=int, default=const.DEFAULT_POLL_INTERVAL)
parser.add_argument("-timeout", "--timeout", type=int, default=const.DEFAULT_TIMEOUT)
parser.add_argument("-w", "--wait_all", type=bool, default=const.DEFAULT_WAIT_ALL)
parser.add_argument("-w", "--wait_all_processes", type=str, default=const.DEFAULT_WAIT_ALL_PROCESSES)
opt = parser.parse_args()
return opt
@ -38,27 +54,31 @@ if __name__ == '__main__':
logger.addHandler(handler)
logger.propagate = False
opt = parse_options()
wait_all_processes = opt.wait_all_processes.lower() == "true"
db_manager_server = opt.db_manager_server_addr.split(':')
if len(db_manager_server) != 2:
raise Exception("Invalid Katib DB manager service address: %s" %
opt.db_manager_server_addr)
raise Exception(
f"Invalid Katib DB manager service address: {opt.db_manager_server_addr}"
)
WaitMainProcesses(
pool_interval=opt.poll_interval,
timout=opt.timeout,
wait_all=opt.wait_all,
completed_marked_dir=opt.metrics_file_dir)
wait_all=wait_all_processes,
completed_marked_dir=opt.metrics_file_dir,
)
mc = MetricsCollector(opt.metric_names.split(';'))
mc = MetricsCollector(opt.metric_names.split(";"))
observation_log = mc.parse_file(opt.metrics_file_dir)
channel = grpc.beta.implementations.insecure_channel(
db_manager_server[0], int(db_manager_server[1]))
with api_pb2.beta_create_DBManager_stub(channel) as client:
logger.info("In " + opt.trial_name + " " +
str(len(observation_log.metric_logs)) + " metrics will be reported.")
client.ReportObservationLog(api_pb2.ReportObservationLogRequest(
trial_name=opt.trial_name,
observation_log=observation_log
), timeout=timeout_in_seconds)
with grpc.insecure_channel(opt.db_manager_server_addr) as channel:
stub = api_pb2_grpc.DBManagerStub(channel)
logger.info(
f"In {opt.trial_name} {len(observation_log.metric_logs)} metrics will be reported."
)
stub.ReportObservationLog(
api_pb2.ReportObservationLogRequest(
trial_name=opt.trial_name, observation_log=observation_log
),
timeout=timeout_in_seconds,
)
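The Python entrypoint now reads `--wait_all_processes` as a string because argparse's `type=bool` converts any non-empty string, including `"false"`, to `True`. A minimal Go sketch of the same two option checks is shown below. `parseWaitAll` and `validateDBManagerAddr` are hypothetical helpers, and `net.SplitHostPort` is not an exact equivalent of `split(':')`: it accepts bracketed IPv6 hosts, and this sketch additionally rejects an empty host.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseWaitAll mirrors the Python check opt.wait_all_processes.lower() == "true":
// only a case-insensitive "true" enables waiting for all processes.
func parseWaitAll(s string) bool {
	return strings.EqualFold(s, "true")
}

// validateDBManagerAddr is a hypothetical analog of the "host:port" check
// performed on db_manager_server_addr.
func validateDBManagerAddr(addr string) error {
	host, port, err := net.SplitHostPort(addr)
	if err != nil || host == "" || port == "" {
		return fmt.Errorf("invalid Katib DB manager service address: %s", addr)
	}
	return nil
}

func main() {
	fmt.Println(parseWaitAll("False"), parseWaitAll("TRUE"))
	// Hypothetical service addresses, for illustration only.
	fmt.Println(validateDBManagerAddr("katib-db-manager.kubeflow:6789"))
	fmt.Println(validateDBManagerAddr("katib-db-manager"))
}
```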

@ -1 +1,6 @@
psutil==5.6.6
psutil==5.9.4
rfc3339>=6.2
grpcio>=1.64.1
googleapis-common-protos==1.6.0
tensorflow==2.16.1
protobuf>=4.21.12,<5

@ -1,31 +0,0 @@
FROM python:3.6
ENV TARGET_DIR /opt/katib
ENV SUGGESTION_DIR cmd/suggestion/chocolate/v1alpha3
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
apt-get -y update && \
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
pip install cython 'numpy>=1.13.3'; \
fi
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
ADD ./pkg/ ${TARGET_DIR}/pkg/
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
RUN pip install --no-cache-dir -r requirements.txt
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1alpha3/python:${TARGET_DIR}/pkg/apis/manager/health/python
ENTRYPOINT ["python", "main.py"]

@ -1,28 +0,0 @@
import grpc
import time
from pkg.apis.manager.v1alpha3.python import api_pb2_grpc
from pkg.apis.manager.health.python import health_pb2_grpc
from pkg.suggestion.v1alpha3.chocolate.service import ChocolateService
from concurrent import futures
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
DEFAULT_PORT = "0.0.0.0:6789"
def serve():
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
service = ChocolateService()
api_pb2_grpc.add_SuggestionServicer_to_server(service, server)
health_pb2_grpc.add_HealthServicer_to_server(service, server)
server.add_insecure_port(DEFAULT_PORT)
print("Listening...")
server.start()
try:
while True:
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
serve()

@ -1,12 +0,0 @@
grpcio==1.23.0
duecredit===0.7.0
cloudpickle==0.5.6
numpy>=1.13.3
scikit-learn>=0.19.0
scipy>=0.19.1
forestci==0.3
protobuf==3.9.1
googleapis-common-protos==1.6.0
SQLAlchemy==1.3.8
git+https://github.com/AIworx-Labs/chocolate@master
ghalton>=0.6

@ -1,31 +0,0 @@
FROM python:3.6
ENV TARGET_DIR /opt/katib
ENV SUGGESTION_DIR cmd/suggestion/chocolate/v1beta1
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
apt-get -y update && \
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
pip install cython 'numpy>=1.13.3'; \
fi
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
ADD ./pkg/ ${TARGET_DIR}/pkg/
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
RUN pip install --no-cache-dir -r requirements.txt
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
ENTRYPOINT ["python", "main.py"]

@ -1,28 +0,0 @@
import grpc
import time
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
from pkg.apis.manager.health.python import health_pb2_grpc
from pkg.suggestion.v1beta1.chocolate.service import ChocolateService
from concurrent import futures
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
DEFAULT_PORT = "0.0.0.0:6789"
def serve():
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
service = ChocolateService()
api_pb2_grpc.add_SuggestionServicer_to_server(service, server)
health_pb2_grpc.add_HealthServicer_to_server(service, server)
server.add_insecure_port(DEFAULT_PORT)
print("Listening...")
server.start()
try:
while True:
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
serve()

@ -1,12 +0,0 @@
grpcio==1.23.0
duecredit===0.7.0
cloudpickle==0.5.6
numpy>=1.13.3
scikit-learn>=0.19.0
scipy>=0.19.1
forestci==0.3
protobuf==3.9.1
googleapis-common-protos==1.6.0
SQLAlchemy==1.3.8
git+https://github.com/AIworx-Labs/chocolate@master
ghalton>=0.6

@ -1,33 +0,0 @@
FROM golang:alpine AS go-build
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/katib
WORKDIR /go/src/github.com/kubeflow/katib/cmd/suggestion/goptuna
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
apk --update add gcc musl-dev && \
go build -o goptuna-suggestion ./v1alpha3; \
else \
go build -o goptuna-suggestion ./v1alpha3; \
fi
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
FROM alpine:3.7
ENV TARGET_DIR /opt/katib
WORKDIR ${TARGET_DIR}
COPY --from=go-build /bin/grpc_health_probe /bin/
COPY --from=go-build /go/src/github.com/kubeflow/katib/cmd/suggestion/goptuna/goptuna-suggestion ${TARGET_DIR}/
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}
ENTRYPOINT ["./goptuna-suggestion"]

@ -1,41 +0,0 @@
package main
import (
"context"
"net"
health_pb "github.com/kubeflow/katib/pkg/apis/manager/health"
"github.com/kubeflow/katib/pkg/apis/manager/v1alpha3"
suggestion "github.com/kubeflow/katib/pkg/suggestion/v1alpha3/goptuna"
"google.golang.org/grpc"
"k8s.io/klog"
)
const (
address = "0.0.0.0:6789"
)
type healthService struct {
}
func (s *healthService) Check(ctx context.Context, in *health_pb.HealthCheckRequest) (*health_pb.HealthCheckResponse, error) {
return &health_pb.HealthCheckResponse{
Status: health_pb.HealthCheckResponse_SERVING,
}, nil
}
func main() {
l, err := net.Listen("tcp", address)
if err != nil {
klog.Fatalf("Failed to listen: %v", err)
}
srv := grpc.NewServer()
api_v1_alpha3.RegisterSuggestionServer(srv, suggestion.NewSuggestionService())
health_pb.RegisterHealthServer(srv, &healthService{})
klog.Infof("Start Goptuna suggestion service: %s", address)
err = srv.Serve(l)
if err != nil {
klog.Fatalf("Failed to serve: %v", err)
}
}

@ -1,31 +1,30 @@
FROM golang:alpine AS go-build
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/katib
WORKDIR /go/src/github.com/kubeflow/katib/cmd/suggestion/goptuna
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
apk --update add gcc musl-dev && \
go build -o goptuna-suggestion ./v1beta1; \
else \
go build -o goptuna-suggestion ./v1beta1; \
fi
# Build the Goptuna Suggestion.
FROM golang:alpine AS build-env
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
ARG TARGETARCH
FROM alpine:3.7
WORKDIR /go/src/github.com/kubeflow/katib
# Download packages.
COPY go.mod .
COPY go.sum .
RUN go mod download -x
# Copy sources.
COPY cmd/ cmd/
COPY pkg/ pkg/
# Build the binary.
RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build -a -o goptuna-suggestion ./cmd/suggestion/goptuna/v1beta1
# Copy the Goptuna suggestion into a thin image.
FROM alpine:3.15
ENV TARGET_DIR /opt/katib
WORKDIR ${TARGET_DIR}
COPY --from=go-build /bin/grpc_health_probe /bin/
COPY --from=go-build /go/src/github.com/kubeflow/katib/cmd/suggestion/goptuna/goptuna-suggestion ${TARGET_DIR}/
COPY --from=build-env /go/src/github.com/kubeflow/katib/goptuna-suggestion ${TARGET_DIR}/
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}

@ -1,3 +1,19 @@
/*
Copyright 2022 The Kubeflow Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package main
import (
@ -8,7 +24,7 @@ import (
api_v1_beta1 "github.com/kubeflow/katib/pkg/apis/manager/v1beta1"
suggestion "github.com/kubeflow/katib/pkg/suggestion/v1beta1/goptuna"
"google.golang.org/grpc"
"k8s.io/klog"
"k8s.io/klog/v2"
)
const (

@ -1,32 +0,0 @@
FROM python:3.6
ENV TARGET_DIR /opt/katib
ENV SUGGESTION_DIR cmd/suggestion/hyperband/v1alpha3
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
apt-get -y update && \
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
pip install cython; \
fi
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
ADD ./pkg/ ${TARGET_DIR}/pkg/
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
RUN pip install --no-cache-dir -r requirements.txt
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1alpha3/python:${TARGET_DIR}/pkg/apis/manager/health/python
ENTRYPOINT ["python", "main.py"]

@ -1,29 +0,0 @@
import grpc
import time
from pkg.apis.manager.v1alpha3.python import api_pb2_grpc
from pkg.apis.manager.health.python import health_pb2_grpc
from pkg.suggestion.v1alpha3.hyperband.service import HyperbandService
from concurrent import futures
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
DEFAULT_PORT = "0.0.0.0:6789"
def serve():
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
service = HyperbandService()
api_pb2_grpc.add_SuggestionServicer_to_server(service, server)
health_pb2_grpc.add_HealthServicer_to_server(service, server)
server.add_insecure_port(DEFAULT_PORT)
print("Listening...")
server.start()
try:
while True:
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
serve()

@ -1,9 +0,0 @@
grpcio==1.23.0
duecredit===0.7.0
cloudpickle==0.5.6
numpy>=1.13.3
scikit-learn>=0.19.0
scipy>=0.19.1
forestci==0.3
protobuf==3.9.1
googleapis-common-protos==1.6.0

@ -1,32 +1,24 @@
FROM python:3.6
FROM python:3.11-slim
ARG TARGETARCH
ENV TARGET_DIR /opt/katib
ENV SUGGESTION_DIR cmd/suggestion/hyperband/v1beta1
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
RUN if [ "${TARGETARCH}" = "ppc64le" ] || [ "${TARGETARCH}" = "arm64" ]; then \
apt-get -y update && \
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
pip install cython; \
apt-get clean && \
rm -rf /var/lib/apt/lists/*; \
fi
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
ADD ./pkg/ ${TARGET_DIR}/pkg/
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
RUN pip install --no-cache-dir -r requirements.txt
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
ENTRYPOINT ["python", "main.py"]

@ -1,10 +1,26 @@
import grpc
# Copyright 2022 The Kubeflow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
from pkg.apis.manager.health.python import health_pb2_grpc
from pkg.suggestion.v1beta1.hyperband.service import HyperbandService
from concurrent import futures
import grpc
from pkg.apis.manager.health.python import health_pb2_grpc
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
from pkg.suggestion.v1beta1.hyperband.service import HyperbandService
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
DEFAULT_PORT = "0.0.0.0:6789"

@ -1,9 +1,9 @@
grpcio==1.23.0
duecredit===0.7.0
grpcio>=1.64.1
cloudpickle==0.5.6
numpy>=1.13.3
scikit-learn>=0.19.0
scipy>=0.19.1
numpy>=1.25.2
scikit-learn>=0.24.0
scipy>=1.5.4
forestci==0.3
protobuf==3.9.1
protobuf>=4.21.12,<5
googleapis-common-protos==1.6.0
cython>=0.29.24

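The requirements change above replaces exact pins such as `grpcio==1.23.0` and `protobuf==3.9.1` with ranges such as `grpcio>=1.64.1` and `protobuf>=4.21.12,<5`. The sketch below is a deliberately simplified checker for purely numeric versions. It only shows how comma-separated clauses combine: every clause must hold. It is not a PEP 440 implementation, and real tooling should use `packaging.specifiers.SpecifierSet`. It treats `===`, as in `duecredit===0.7.0`, as PEP 440 arbitrary equality, which is a plain string match. `satisfies` and its helpers are hypothetical names.

```python
from typing import Tuple

_OPERATORS = (">=", "<=", "==", "!=", ">", "<")  # longest prefixes first


def _release(text: str) -> Tuple[int, ...]:
    """Parse a purely numeric release such as '4.21.12' into (4, 21, 12)."""
    return tuple(int(part) for part in text.split("."))


def _compare(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Return -1, 0 or 1, padding the shorter release with zeros (5 == 5.0.0)."""
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return (a > b) - (a < b)


def satisfies(version: str, spec: str) -> bool:
    """True if `version` meets every comma-separated clause in `spec`."""
    for clause in (c.strip() for c in spec.split(",")):
        if clause.startswith("==="):
            # PEP 440 arbitrary equality: exact string match, no normalization.
            if version != clause[3:].strip():
                return False
            continue
        op = next((o for o in _OPERATORS if clause.startswith(o)), None)
        if op is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        result = _compare(_release(version), _release(clause[len(op):].strip()))
        ok = {
            ">=": result >= 0,
            "<=": result <= 0,
            "==": result == 0,
            "!=": result != 0,
            ">": result > 0,
            "<": result < 0,
        }[op]
        if not ok:
            return False
    return True
```

For example, `satisfies("4.25.3", ">=4.21.12,<5")` is true and `satisfies("5.26.1", ">=4.21.12,<5")` is false. The `<5` upper bound on protobuf excludes the 5.x line.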
@ -1,33 +0,0 @@
FROM python:3.6
ENV TARGET_DIR /opt/katib
ENV SUGGESTION_DIR cmd/suggestion/hyperopt/v1alpha3
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
apt-get -y update && \
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
pip install cython; \
fi
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
ADD ./pkg/ ${TARGET_DIR}/pkg/
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
RUN pip install --no-cache-dir -r requirements.txt
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1alpha3/python:${TARGET_DIR}/pkg/apis/manager/health/python
ENTRYPOINT ["python", "main.py"]

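Each of these Dockerfiles ends with `chgrp -R 0 ${TARGET_DIR} && chmod -R g+rwX ${TARGET_DIR}`. This gives the tree to the root group (GID 0) and grants that group read and write access. It is a common pattern for platforms such as OpenShift, which run containers under an arbitrary non-root UID in group 0. The Dockerfile itself does not state its motivation.

The capital `X` is the subtle part. It grants group execute only to directories and to files that already have some execute bit set. The hypothetical helper below reproduces that rule for a single entry using Python's `stat` constants.

```python
import stat


def apply_g_plus_rwX(mode: int, is_dir: bool) -> int:
    """Return the permission bits after `chmod g+rwX` on one entry.

    Group read and write are always added. Group execute is added only for
    directories or for files where user, group or other already has execute.
    """
    new_mode = mode | stat.S_IRGRP | stat.S_IWGRP
    if is_dir or mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        new_mode |= stat.S_IXGRP
    return new_mode


if __name__ == "__main__":
    for mode, is_dir in [(0o644, False), (0o755, False), (0o700, True)]:
        print(oct(mode), "->", oct(apply_g_plus_rwX(mode, is_dir)))
```

Plain data files such as `requirements.txt` therefore stay non-executable, while directories remain traversable by the group.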
@ -1,28 +0,0 @@
import grpc
import time
from pkg.apis.manager.v1alpha3.python import api_pb2_grpc
from pkg.apis.manager.health.python import health_pb2_grpc
from pkg.suggestion.v1alpha3.hyperopt.service import HyperoptService
from concurrent import futures
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
DEFAULT_PORT = "0.0.0.0:6789"
def serve():
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
service = HyperoptService()
api_pb2_grpc.add_SuggestionServicer_to_server(service, server)
health_pb2_grpc.add_HealthServicer_to_server(service, server)
server.add_insecure_port(DEFAULT_PORT)
print("Listening...")
server.start()
try:
while True:
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
serve()

@ -1,10 +0,0 @@
grpcio==1.23.0
duecredit===0.7.0
cloudpickle==0.5.6
numpy>=1.13.3
scikit-learn>=0.19.0
scipy>=0.19.1
forestci==0.3
protobuf==3.9.1
googleapis-common-protos==1.6.0
hyperopt==0.2.3

@ -1,33 +1,24 @@
FROM python:3.6
FROM python:3.11-slim
ARG TARGETARCH
ENV TARGET_DIR /opt/katib
ENV SUGGESTION_DIR cmd/suggestion/hyperopt/v1beta1
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
RUN if [ "${TARGETARCH}" = "ppc64le" ]; then \
apt-get -y update && \
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
pip install cython; \
apt-get clean && \
rm -rf /var/lib/apt/lists/*; \
fi
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
ADD ./pkg/ ${TARGET_DIR}/pkg/
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
RUN pip install --no-cache-dir -r requirements.txt
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
ENTRYPOINT ["python", "main.py"]

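The v1beta1 Dockerfile above replaces the `$(uname -m)` test with `ARG TARGETARCH`. BuildKit fills that argument with Docker platform names rather than kernel machine names, so `aarch64` becomes `arm64` and `x86_64` becomes `amd64`. The condition also narrows. The old line installs gfortran, OpenBLAS and LAPACK on both ppc64le and aarch64, while the new one matches only ppc64le. The sketch below encodes both conditions to make the difference explicit. The function names and mapping table are illustrative and not part of Katib.

```python
# Linux `uname -m` names mapped to the Docker platform names used by TARGETARCH.
UNAME_TO_TARGETARCH = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "ppc64le": "ppc64le",
}


def old_installs_native_deps(machine: str) -> bool:
    # Old: RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]
    return machine in ("ppc64le", "aarch64")


def new_installs_native_deps(targetarch: str) -> bool:
    # New: RUN if [ "${TARGETARCH}" = "ppc64le" ]
    return targetarch == "ppc64le"


def changed_architectures() -> list:
    """Return uname names whose native-dependency step differs between versions."""
    return [
        machine
        for machine, targetarch in UNAME_TO_TARGETARCH.items()
        if old_installs_native_deps(machine) != new_installs_native_deps(targetarch)
    ]


if __name__ == "__main__":
    print(changed_architectures())  # ['aarch64']
```

Only aarch64 changes. On arm64 builds, the new Dockerfile no longer installs the Fortran and BLAS toolchain, so it presumably relies on prebuilt wheels. That would be consistent with the added `--prefer-binary` flag, although the diff does not say so.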
@ -1,10 +1,26 @@
import grpc
# Copyright 2022 The Kubeflow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
from pkg.apis.manager.health.python import health_pb2_grpc
from pkg.suggestion.v1beta1.hyperopt.service import HyperoptService
from concurrent import futures
import grpc
from pkg.apis.manager.health.python import health_pb2_grpc
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
from pkg.suggestion.v1beta1.hyperopt.service import HyperoptService
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
DEFAULT_PORT = "0.0.0.0:6789"

@ -1,10 +1,10 @@
grpcio==1.23.0
duecredit===0.7.0
grpcio>=1.64.1
cloudpickle==0.5.6
numpy>=1.13.3
scikit-learn>=0.19.0
scipy>=0.19.1
numpy>=1.25.2
scikit-learn>=0.24.0
scipy>=1.5.4
forestci==0.3
protobuf==3.9.1
protobuf>=4.21.12,<5
googleapis-common-protos==1.6.0
hyperopt==0.2.3
hyperopt==0.2.5
cython>=0.29.24

@ -1,33 +0,0 @@
FROM python:3.6
ENV TARGET_DIR /opt/katib
ENV SUGGESTION_DIR cmd/suggestion/nas/darts/v1alpha3
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
apt-get -y update && \
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
pip install cython; \
fi
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
ADD ./pkg/ ${TARGET_DIR}/pkg/
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
RUN pip install --no-cache-dir -r requirements.txt
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1alpha3/python:${TARGET_DIR}/pkg/apis/manager/health/python
ENTRYPOINT ["python", "main.py"]

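The `ENV PYTHONPATH` line in each Dockerfile builds the import path from three entries. The v1alpha3 to v1beta1 migration changes only the API-version segment. `TARGET_DIR` lets `main.py` resolve package imports such as `from pkg.apis.manager.v1beta1.python import api_pb2_grpc`. The two stub directories are presumably included because protoc-generated `*_pb2_grpc.py` modules import their sibling `*_pb2` module by its bare name. The helper below is a hypothetical illustration of how the value is assembled.

```python
def katib_pythonpath(target_dir: str, api_version: str) -> str:
    """Assemble the PYTHONPATH value used by the suggestion Dockerfiles.

    Entries are joined with ':' because the images are Linux-based.
    """
    entries = [
        # Package root, so `import pkg....` resolves.
        target_dir,
        # Generated Suggestion API stubs for the given API version.
        f"{target_dir}/pkg/apis/manager/{api_version}/python",
        # Generated health-check stubs (not versioned).
        f"{target_dir}/pkg/apis/manager/health/python",
    ]
    return ":".join(entries)


if __name__ == "__main__":
    print(katib_pythonpath("/opt/katib", "v1alpha3"))
    print(katib_pythonpath("/opt/katib", "v1beta1"))
```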
@ -1,30 +0,0 @@
import grpc
from concurrent import futures
import time
from pkg.apis.manager.v1alpha3.python import api_pb2_grpc
from pkg.apis.manager.health.python import health_pb2_grpc
from pkg.suggestion.v1alpha3.nas.darts.service import DartsService
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
DEFAULT_PORT = "0.0.0.0:6789"
def serve():
print("Darts Suggestion Service")
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
service = DartsService()
api_pb2_grpc.add_SuggestionServicer_to_server(service, server)
health_pb2_grpc.add_HealthServicer_to_server(service, server)
server.add_insecure_port(DEFAULT_PORT)
print("Listening...")
server.start()
try:
while True:
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
serve()

@ -1,3 +0,0 @@
grpcio==1.23.0
protobuf==3.9.1
googleapis-common-protos==1.6.0

@ -1,33 +1,24 @@
FROM python:3.6
FROM python:3.11-slim
ARG TARGETARCH
ENV TARGET_DIR /opt/katib
ENV SUGGESTION_DIR cmd/suggestion/nas/darts/v1beta1
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
RUN if [ "${TARGETARCH}" = "ppc64le" ]; then \
apt-get -y update && \
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
pip install cython; \
apt-get clean && \
rm -rf /var/lib/apt/lists/*; \
fi
RUN GRPC_HEALTH_PROBE_VERSION=v0.3.1 && \
if [ "$(uname -m)" = "ppc64le" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
elif [ "$(uname -m)" = "aarch64" ]; then \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
else \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
fi && \
chmod +x /bin/grpc_health_probe
ADD ./pkg/ ${TARGET_DIR}/pkg/
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
RUN pip install --no-cache-dir -r requirements.txt
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
ENTRYPOINT ["python", "main.py"]

@ -1,10 +1,25 @@
import grpc
from concurrent import futures
import time
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
from pkg.apis.manager.health.python import health_pb2_grpc
from pkg.suggestion.v1beta1.nas.darts.service import DartsService
# Copyright 2022 The Kubeflow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
from concurrent import futures
import grpc
from pkg.apis.manager.health.python import health_pb2_grpc
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
from pkg.suggestion.v1beta1.nas.darts.service import DartsService
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
DEFAULT_PORT = "0.0.0.0:6789"

@ -1,3 +1,4 @@
grpcio==1.23.0
protobuf==3.9.1
grpcio>=1.64.1
protobuf>=4.21.12,<5
googleapis-common-protos==1.6.0
cython>=0.29.24

Some files were not shown because too many files have changed in this diff.