mirror of https://github.com/kubeflow/katib.git
* Out-of-the-box support TrainJob
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* Example for Pytorch Distributed
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* Update examples/v1beta1/kubeflow-training-operator/trainjob-pytorch.yaml
  Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* Create folder for Trainer as suggested
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* Move the example of trainjob to the new folder
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* Ref the primaryContainerName to that of ClusterTrainingRuntime
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* tenzen-y steps down from Katib approver role (#2561)
  Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* Set Default value for TrainJob Success, Failure Condition and PrimaryPodLabels in the trial Template
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* Enhance Handling for default value of Success, Fail Cond & Pod Label
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* Bug fix for default value condition
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* code format by hack/update-gofmt.sh
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* add TrainJob trial Resources to cert manager config
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* add trainjob to controller rbac
  Signed-off-by: Ram Lau <ramwt4444@gmail.com>
* Grant JobSet permission to Katib controller
  Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Remove create/delete RBAC for TrainJob
  Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Fix docker build with libpcre2
  Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Ram Lau <ramwt4444@gmail.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

| Name |
|---|
| db-manager/v1beta1 |
| earlystopping/medianstop/v1beta1 |
| katib-controller/v1beta1 |
| metricscollector/v1beta1 |
| suggestion |
| ui/v1beta1 |