* Support gang scheduling with Yunikorn (#2107)
* Add Yunikorn scheduler and example
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
* Add test cases
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
* Add code comments
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
* Add license comment
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
* Inline mergeNodeSelector
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
* Fix initial number implementation
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
---------
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit 8fcda12657)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Update Makefile for building sparkctl (#2119)
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 4bc6e89708)
Signed-off-by: Yi Chen <github@chenyicn.net>
* fix: Add default values for namespaces to match usage descriptions (#2128)
* fix: Add default values for namespaces to match usage descriptions
Signed-off-by: pengfei4.li <pengfei4.li@ly.com>
* fix: remove incorrect cache settings
Signed-off-by: pengfei4.li <pengfei4.li@ly.com>
---------
Signed-off-by: pengfei4.li <pengfei4.li@ly.com>
Co-authored-by: pengfei4.li <pengfei4.li@ly.com>
(cherry picked from commit 52f818d535)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Fix: Spark role binding did not render properly when setting spark service account name (#2135)
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit a1a38ea2f1)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Reintroduce option webhook.enable (#2142)
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 9e88049af1)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Add default batch scheduler argument (#2143)
* Add default batch scheduler argument
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
* Add helm unit test
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
---------
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit 9cc1c02c64)
Signed-off-by: Yi Chen <github@chenyicn.net>
* fix: unable to set controller/webhook replicas to zero (#2147)
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 1afa72e7a0)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Adding support for setting spark job namespaces to all namespaces (#2123)
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit c93b0ec0e7)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Support extended kube-scheduler as batch scheduler (#2136)
* Support coscheduling with kube-scheduler plugins
Signed-off-by: Yi Chen <github@chenyicn.net>
* Add example for using kube-scheduler coscheduling
Signed-off-by: Yi Chen <github@chenyicn.net>
---------
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit e8d3de9e1a)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Run e2e tests on Kind (#2148)
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit c810ece25b)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Set schedulerName to Yunikorn (#2153)
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit 62b4ca636d)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Create role and rolebinding for controller/webhook in every spark job namespace if not watching all namespaces (#2129)
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 592b649917)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Fix: e2e test fails due to webhook not ready (#2149)
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit dee91ba66c)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Upgrade to Go 1.23.1 (#2155)
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit 10fcb8e19a)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Upgrade to Spark 3.5.2 (#2154)
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit e1b7a27062)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Bump sigs.k8s.io/scheduler-plugins from 0.29.7 to 0.29.8 (#2159)
Bumps [sigs.k8s.io/scheduler-plugins](https://github.com/kubernetes-sigs/scheduler-plugins) from 0.29.7 to 0.29.8.
- [Release notes](https://github.com/kubernetes-sigs/scheduler-plugins/releases)
- [Changelog](https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/RELEASE.md)
- [Commits](https://github.com/kubernetes-sigs/scheduler-plugins/compare/v0.29.7...v0.29.8)
---
updated-dependencies:
- dependency-name: sigs.k8s.io/scheduler-plugins
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 95d202e95c)
Signed-off-by: Yi Chen <github@chenyicn.net>
* feat: support driver and executor pod use different priority (#2146)
* feat: support driver and executor pod use different priority
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
* feat: if *app.Spec.Driver.PriorityClassName and *app.Spec.Executor.PriorityClassName are explicitly defined, they take precedence over spec.batchSchedulerOptions.priorityClassName
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
* feat: merge the logic of setPodPriorityClassName into addPriorityClassName
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
* feat: support driver and executor pod use different priority
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>
* feat: if *app.Spec.Driver.PriorityClassName and *app.Spec.Executor.PriorityClassName are explicitly defined, they take precedence over spec.batchSchedulerOptions.priorityClassName
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>
* feat: merge the logic of setPodPriorityClassName into addPriorityClassName
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>
* feat: add nil pointer check
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>
* feat: remove spec.batchSchedulerOptions.priorityClassName definition, split driver and executor pod priorityClass
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
* feat: remove spec.batchSchedulerOptions.priorityClassName definition, split driver and executor pod priorityClass
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
* feat: Optimize code to avoid null pointer exceptions
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>
* fix: remove backup crd files
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>
* fix: remove BatchSchedulerOptions.PriorityClassName test code
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
* fix: add driver and executor pod priorityClassName test code
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
---------
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>
Co-authored-by: Kevin Wu <kevin.wu@momenta.ai>
(cherry picked from commit 6ae1b2f69c)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Bump gocloud.dev from 0.37.0 to 0.39.0 (#2160)
Bumps [gocloud.dev](https://github.com/google/go-cloud) from 0.37.0 to 0.39.0.
- [Release notes](https://github.com/google/go-cloud/releases)
- [Commits](https://github.com/google/go-cloud/compare/v0.37.0...v0.39.0)
---
updated-dependencies:
- dependency-name: gocloud.dev
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit e58023b90d)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Update e2e tests (#2161)
* Add sleep buffer to ensure the webhooks are ready before running the e2e tests
Signed-off-by: Yi Chen <github@chenyicn.net>
* Remove duplicate operator image build tasks
Signed-off-by: Yi Chen <github@chenyicn.net>
* Update e2e tests
Signed-off-by: Yi Chen <github@chenyicn.net>
* Update examples
Signed-off-by: Yi Chen <github@chenyicn.net>
---------
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit e6a7805079)
Signed-off-by: Yi Chen <github@chenyicn.net>
* fix: webhook not working when setting spark job namespaces to empty (#2163)
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 7785107ec5)
Signed-off-by: Yi Chen <github@chenyicn.net>
* fix: The logger had an odd number of arguments, making it panic (#2166)
Signed-off-by: tcassaert <tcassaert@inuits.eu>
(cherry picked from commit eb48b349a1)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Upgrade to Spark 3.5.2 (#2012) (#2157)
* Upgrade to Spark 3.5.2
Signed-off-by: HyukSangCho <a01045542949@gmail.com>
* Upgrade to Spark 3.5.2
Signed-off-by: HyukSangCho <a01045542949@gmail.com>
* Upgrade to Spark 3.5.2
Signed-off-by: HyukSangCho <a01045542949@gmail.com>
* Upgrade to Spark 3.5.2
Signed-off-by: HyukSangCho <a01045542949@gmail.com>
---------
Signed-off-by: HyukSangCho <a01045542949@gmail.com>
(cherry picked from commit 9f0c08a65e)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Feature: Add pprof endpoint (#2164)
* add pprof support to the operator Controller Manager
Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>
* add pprof support to helm chart
Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>
---------
Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>
(cherry picked from commit 75b926652b)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Fix the make kind-delete-cluster target to avoid accidental kubeconfig deletion (#2172)
Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>
(cherry picked from commit cbfefd57bb)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Bump github.com/aws/aws-sdk-go-v2/config from 1.27.27 to 1.27.33 (#2174)
Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.27.27 to 1.27.33.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.27.27...config/v1.27.33)
---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit b81833246f)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Bump helm.sh/helm/v3 from 3.15.3 to 3.16.1 (#2173)
Bumps [helm.sh/helm/v3](https://github.com/helm/helm) from 3.15.3 to 3.16.1.
- [Release notes](https://github.com/helm/helm/releases)
- [Commits](https://github.com/helm/helm/compare/v3.15.3...v3.16.1)
---
updated-dependencies:
- dependency-name: helm.sh/helm/v3
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit f3f80d49b1)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Add specific error in log line when failed to create web UI service (#2170)
* Add specific error in log line when failed to create web UI service
Signed-off-by: tcassaert <tcassaert@inuits.eu>
* Update log to reflect correct resource that could not be created
Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: tcassaert <tcassaert@protonmail.com>
---------
Signed-off-by: tcassaert <tcassaert@inuits.eu>
Signed-off-by: tcassaert <tcassaert@protonmail.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit ed3226ebe7)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Account for spark.executor.pyspark.memory in Yunikorn gang scheduling (#2178)
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit a2f71c6137)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Fix: spark application does not respect time to live seconds (#2165)
* Add time to live seconds example spark application
Signed-off-by: Yi Chen <github@chenyicn.net>
* fix: spark application does not respect time to live seconds
Signed-off-by: Yi Chen <github@chenyicn.net>
---------
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit c855ee4c8b)
Signed-off-by: Yi Chen <github@chenyicn.net>
* Update release workflow and docs (#2121)
Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit bca6aa85cc)
Signed-off-by: Yi Chen <github@chenyicn.net>
---------
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
Signed-off-by: Yi Chen <github@chenyicn.net>
Signed-off-by: pengfei4.li <pengfei4.li@ly.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>
Signed-off-by: tcassaert <tcassaert@inuits.eu>
Signed-off-by: HyukSangCho <a01045542949@gmail.com>
Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>
Signed-off-by: tcassaert <tcassaert@protonmail.com>
Co-authored-by: Jacob Salway <jacob.salway@gmail.com>
Co-authored-by: Neo <56439757+snappyyouth@users.noreply.github.com>
Co-authored-by: pengfei4.li <pengfei4.li@ly.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevinz <ruoshuidba@gmail.com>
Co-authored-by: Kevin Wu <kevin.wu@momenta.ai>
Co-authored-by: tcassaert <tcassaert@protonmail.com>
Co-authored-by: ha2hi <56156892+ha2hi@users.noreply.github.com>
Co-authored-by: Sébastien Maintrot <3097030+ImpSy@users.noreply.github.com>