mirror of https://github.com/kubeflow/katib.git

* add e2e test for tune api
* upgrade training-operator sdk
* specify the version of training operator sdk
* fix num_labels error and update the version of training operator controller
* check the version of training operator
* debug
* check import path of HuggingFaceModelParams
* update the version of training operator sdk
* update the name of experiment
* add step of checking pod
* check the logs of pod
* add check
* check reason for imagepullbackoff
* revert timeout limit
* fix format
* extend timeout limit
* update training operator sdk version
* check the logs of pod
* rerun tests
* update the function of getting logs
* add the step of describing pod
* check disk space
* change work directory
* change work directory
* increase timeout limit
* check the logs of controller and events
* change work directory
* change work directory
* change work directory
* check the logs of kubelet
* check the logs of kubelet
* increase cpu
* check the logs of training operator
* check the use of resources
* check the logs of container 'pytorch' and 'storage_initializer'
* fix error of checking use of resources
* add other checks to find the error reason
* set 'storage_config'
* reduce the number of tests
* Check container runtime logs
* set the driver of minikube as docker
* set the driver of minikube to none
* check logs of pod
* check memory usage
* increase 'termination_grace_period_seconds' in podspec
* fix annotations error
* restart docker
* delete restarting docker
* use original docker data directory
* update installation of Katib SDK with extra requires
* test trainer image built with cpu
* add action of free up disk space (including move docker data directory)
* delete unnecessary checks and update the part of fetching pod description and logs
* delete fetching pod logs
* add blank line at the end of free-up-disk-space yaml file
* update experiment name
* update test function name to be consistent with experiment name
* move import statements inside the function
* apply pprint for the logging output
* update experiment names
* fix format
* fix format
* fix the sequence of arguments in 'trial_template'
* test example in user guide
* fix access token error
* fix the error of setup
* fix the error of setup
* reverse back
* fix format
* fix format

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>

| | | |
|---|---|---|
| e2e/v1beta1 | ||
| unit/v1beta1 | ||
| __init__.py | ||
| conftest.py | ||