trainer/examples/pytorch
Andrew Chen 074d8b84bd
Add question-answer example for v2 trainer (#2580)
* Add question-answer example

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>

* chore: remove unused lines, add TODO comment

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>

* chore: update example description

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Andrew Chen <14799876+solanyn@users.noreply.github.com>

* chore: update question-answering example

* run train job on CPU
* reduce batch size, dataset size and train epochs
* make upload to bucket optional
* add notebook to e2e-test
* set model name as trainjob argument

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>

* chore: extend e2e-run-notebook timeout

* e2e tests fail if trainjobs launched by the notebook do not finish within 3s
* extend the timeout to 5min so the test blocks until the trainjob completes or the timeout expires

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>

* chore: update example to wait for trainjob running status

* revert change to e2e-run-notebook.sh

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>

---------

Signed-off-by: solanyn <14799876+solanyn@users.noreply.github.com>
Signed-off-by: Andrew Chen <14799876+solanyn@users.noreply.github.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-05-09 21:08:41 +00:00
image-classification feat(sdk): Support MPI-based TrainJobs (#2545) 2025-03-20 18:08:02 +00:00
question-answering Add question-answer example for v2 trainer (#2580) 2025-05-09 21:08:41 +00:00