Commit Graph

18 Commits

Author SHA1 Message Date
Carlos Eduardo Arango Gutierrez a869150953
Bump to k8s 1.31 (#664)
* Bump to k8s 1.31

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

* Bump sigs.k8s.io/controller-runtime to v0.19.0

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

* Bump golangci-lint to v1.61.0

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

* change queue from RateLimitingInterface to TypedRateLimitingInterface[any]

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

* Update kubectl url

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

---------

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-10-15 20:25:18 +00:00
Yuki Iwai 10f9e20b89
Upgrade the k8s dependency versions to 1.30 (#657)
* Upgrade the k8s dependency versions to 1.30

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Generate code

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Update testing version

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-10-12 02:24:10 +00:00
Michał Szadkowski 1794cc0d44
Adjust the comment for managedBy (#656)
Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>
2024-10-11 09:49:10 +00:00
Michał Szadkowski c29c37ca7e
Introduce ManagedBy field in RunPolicy (#650)
* Introduce ManagedBy field to RunPolicy

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>

* Make mpi-operator a default value for ManagedBy

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>

* Add validation for ManagedBy field

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>

* Make use of ManagedBy in reconciliation process

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>

* Regenerate code after adding managedBy field

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>

* Add e2e tests

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>

* Update after code review

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>

* Update tests

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>

* Remove default value for ManagedBy

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>

* Add optional tag
Replace backoff and consistently with sleep

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>

* Create common util package for integration and e2e tests with sleep/wait constants

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>

---------

Signed-off-by: Michal Szadkowski <michal_szadkowski@epam.com>
2024-10-10 17:16:10 +00:00
Chitsing KUI a6c2da887d
run worker process in launcher pod (#612)
* run worker in launcher pod; fix DCO issue

Signed-off-by: kuizhiqing <kuizhiqing@msn.com>

* use ptr.Deref

Signed-off-by: kuizhiqing <kuizhiqing@msn.com>

* update manifest

Signed-off-by: kuizhiqing <kuizhiqing@msn.com>

* more Deref

Signed-off-by: kuizhiqing <kuizhiqing@msn.com>

* create one service for both launcher and worker

Signed-off-by: kuizhiqing <kuizhiqing@msn.com>

---------

Signed-off-by: kuizhiqing <kuizhiqing@msn.com>
2024-02-26 15:17:58 +00:00
Vanessasaurus 18250f5e69
add custom setup.py to install mpijob module (#579)
Problem: we cannot currently install the python mpijob module
Solution: add a setup.py proper that is ignored by the generator

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
Co-authored-by: vsoch <vsoch@users.noreply.github.com>
2023-07-10 14:17:55 +00:00
dragon-fly e1590ce61e
merge kubeflow/common.v1 to mpi-operator (#571)
* merge kubeflow/common.v1 to mpi-operator

Signed-off-by: lowang_bh <lhui_wang@163.com>

java gen Python SDK

Signed-off-by: lowang_bh <lhui_wang@163.com>

* update make generate and fix comment issues

Signed-off-by: lowang_bh <lhui_wang@163.com>

* Update pkg/apis/kubeflow/v2beta1/types.go

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* merge from master to solve conflict

Signed-off-by: lowang-bh <lhui_wang@163.com>

* change reference link to training-operator project

Signed-off-by: lowang-bh <lhui_wang@163.com>

---------

Signed-off-by: lowang_bh <lhui_wang@163.com>
Signed-off-by: lowang-bh <lhui_wang@163.com>
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-07-08 19:52:53 +00:00
xhejtman f8d815cdf4
Run workers first and wait for them (#484)
* Real rebase of waitforworkers option

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Fix generated API

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Fix format

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Add docs

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Fix typo

* Add tests for waitforworkers

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Add missing err test

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Fix cleanpodpolicy

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Remove debug

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Fix tests

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Rework api

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Fix generated api

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* One more fix of api

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Swagger fix

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Fix readme

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Fix readme again

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Add comments

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Add kubebuilder annotations

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

* Fix manifests

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>

---------

Signed-off-by: Lukas Hejtmanek <xhejtman@gmail.com>
2023-06-26 18:37:14 +00:00
Mateusz Kubica 21f326d1d2
MPICH support (#562)
* Add support for MPICH

* Fix CI errors

* Temporary: manual trigger

* Fix file name

* Add an empty line at the end of the file

* Fix formatting

* Revert "Temporary: manual trigger"

This reverts commit 15164a8b70.

* fix formatting

* Regenerate the mpi-operator.yaml

* Adding an empty line at the end of Dockerfiles

* Share the same entrypoint for Intel and MPICH

* share hostfile generation between Intel and MPICH

* Add validation test for MPICH

* Fix formatting

* Don't over engineer the tests - be explicit

* add non-root tests for IntelMPI and MPICH
2023-06-16 17:57:36 +00:00
Yuki Iwai 2495860427
Support the coscheduling plugin of scheduler-plugins (#538)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-03-29 02:27:12 +00:00
Yuki Iwai b302019be7
Respect SchedulingPolicy (#520)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-02-28 15:37:42 +00:00
Michał Woźniak 92e491e6e9
Support suspend semantics for MPIJob (#511)
* Implement Suspend semantics for MPIJob

# Conflicts:
#	pkg/apis/kubeflow/v2beta1/types.go
#	pkg/controller/mpi_job_controller.go
#	pkg/controller/mpi_job_controller_status.go
#	pkg/controller/mpi_job_controller_test.go
#	test/integration/mpi_job_controller_test.go

* Changes
- add unit tests for creating suspended, suspending and resuming
- use fake clock for unit tests
- do not return from the syncHandler after worker pods cleanup on
suspend; this allows the MPIJob update to continue in the same sync

# Conflicts:
#	pkg/controller/mpi_job_controller.go
2023-02-03 15:44:02 +00:00
Yuki Iwai 4c8b4fc2e4
Use local copy of JobStatus by mpi-operator (#514)
* Use local copy of JobStatus by mpi-operator

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* address comments

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-02-03 14:44:01 +00:00
Michał Woźniak 0b32af39c3
Use local copy of RunPolicy by MPI-operator (#513)
* Use local copy of RunPolicy by MPI-operator

Steps performed:
- copy the `RunPolicy` from common to `types.go`
- fix compilation errors by using the local RunPolicy definition
- run `make generate`
- run `make all`
- regenerate openapi_generated.go by `./hack/python-sdk/gen-sdk.sh` (with commented out rollback)

* Copy SchedulingPolicy and CleanPodPolicy for RunPolicy
2023-01-31 17:46:30 +00:00
Yuki Iwai 05ac6addc0
Upgrade Kubernetes dependencies (#502)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-01-26 18:13:09 +00:00
Yuki Iwai c131315192
Remove MPI Operator V1 (#492)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-01-06 13:40:56 +00:00
Gang Pu b88edad03a
Generate sdk for v2 (#434)
* Generate sdk for v2

* Refine the version parameters of sdk generator

* add example for v2beta1

* make runPolicy optional

* 1: Ignore some generated files that are not needed
2: Add gitattributes file
2021-11-30 12:44:30 +00:00
Wang Zhang 680cd4db0f
Add python sdk and auto-generate script (#357)
2021-05-13 20:20:43 -04:00