katib/cmd
Ram Lau c9528e7d4e
Adding out of the box support to TrainJob (#2560)
* Out-of-the-box support TrainJob

Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* Example for PyTorch Distributed

Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* Update examples/v1beta1/kubeflow-training-operator/trainjob-pytorch.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* Create folder for Trainer as suggested

Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* Move the example of TrainJob to the new folder

Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* Ref the primaryContainerName to that of ClusterTrainingRuntime

Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* tenzen-y steps down from Katib approver role (#2561)

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* Set Default value for TrainJob Success, Failure Condition and PrimaryPodLabels in the trial Template

Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* Enhance handling for default values of Success, Failure Condition and Pod Label

Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* Bug fix for default value condition

Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* code format by hack/update-gofmt.sh

Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* add TrainJob trial Resources to cert manager config

Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* add trainjob to controller rbac

Signed-off-by: Ram Lau <ramwt4444@gmail.com>

* Grant JobSet permission to Katib controller

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Remove create/delete RBAC for TrainJob

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix docker build with libpcre2

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Ram Lau <ramwt4444@gmail.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2025-08-23 00:01:07 +00:00
db-manager/v1beta1
earlystopping/medianstop/v1beta1
katib-controller/v1beta1
metricscollector/v1beta1 Adding out of the box support to TrainJob (#2560) 2025-08-23 00:01:07 +00:00
suggestion
ui/v1beta1