trainer/examples
Andrew Chen 074d8b84bd
Add question-answer example for v2 trainer (#2580)
* Add question-answer example

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>

* chore: remove unused lines, add TODO comment

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>

* chore: update example description

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Andrew Chen <14799876+solanyn@users.noreply.github.com>

* chore: update question-answering example

* run train job on CPU
* reduce batch size, dataset size and train epochs
* make upload to bucket optional
* add notebook to e2e-test
* set model name as trainjob argument

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>

* chore: extend e2e-run-notebook timeout

* e2e tests fail if trainjobs launched by the notebook do not finish within 3s
* extend the timeout to 5min so the test waits until the trainjob completes or the timeout expires

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>

* chore: update example to wait for trainjob running status

* revert change to e2e-run-notebook.sh

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>

---------

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>
Signed-off-by: Andrew Chen <14799876+solanyn@users.noreply.github.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-05-09 21:08:41 +00:00
deepspeed/text-summarization feat(runtimes): Support MLX Distributed Runtime with OpenMPI (#2565) 2025-03-27 04:54:22 +00:00
mlx/image-classification feat(runtimes): Support MLX Distributed Runtime with OpenMPI (#2565) 2025-03-27 04:54:22 +00:00
pytorch Add question-answer example for v2 trainer (#2580) 2025-05-09 21:08:41 +00:00
README.md Update the naming conventions for Kubeflow Trainer (#2415) 2025-02-06 13:48:30 +00:00

README.md

Kubeflow Trainer Examples

Welcome to Kubeflow Trainer examples!

The Kubeflow Trainer documentation is available on kubeflow.org.