10 Commits

Author SHA1 Message Date
Thomas Newton d815e78c21
Robustness to driver pod taking time to create (#2315)
* Retry after driver pod not found if recent submission

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Add a test

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Make grace period configurable

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Update test

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Add an extra test with the driver pod

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Separate context to create and delete the driver pod

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Tidy

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Autoformat

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Update error message

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Add helm parameter

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Update internal/controller/sparkapplication/controller.go

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Newlines between helm tests

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

---------

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
2024-12-04 12:58:59 +00:00
Yi Chen d0daf2fd17
Support pod template for Spark 3.x applications (#2141)
* Update API definition to support pod template

Signed-off-by: Yi Chen <github@chenyicn.net>

* Mark pod template field as schemaless

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add kubebuilder marker to preserve unknown fields

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add example for using pod template

Signed-off-by: Yi Chen <github@chenyicn.net>

* Support pod template

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2024-10-24 02:23:30 +00:00
Thomas Newton 735c7fc9e5
Fix retries (#2241)
* Attempt to requeue after correct period

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Syntactically correct

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* I think correct requeueing

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Same treatment for the other retries

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Tidy

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Requeue after deleting resources

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Try to fix submission status updates

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Tidy

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Correct usage of submitSparkApplication

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Fix error logging

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Bring back ExecutionAttempts increment that I forgot about

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Log after reconcile complete

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Fix setting submission ID

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Tidy logging

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Tidy

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Tidy

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Update comment

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Start a new test

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Working "Fails submission and retries until retries are exhausted" test

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Add "Application fails and retries until retries are exhausted" test

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Tidy

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Comments

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Tidy

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Move fail configs out of the examples directory

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Fix lint

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Move TimeUntilNextRetryDue to `pkg/util/sparkapplication.go`

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Update internal/controller/sparkapplication/controller.go

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Update test/e2e/sparkapplication_test.go

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* camelCase

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* make go-fmt

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* PR comments

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

---------

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
2024-10-23 13:13:30 +00:00
Sébastien Maintrot a8b5d644b5
implement an upper bound limit to the number of tracked executors (#2181)
* implement an upper bound limit to the number of tracked executors

Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>

* add upper bound limit to the number of tracked executors to helm chart

Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>

---------

Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>
2024-10-11 05:54:10 +00:00
Yi Chen c855ee4c8b
Fix: spark application does not respect time to live seconds (#2165)
* Add time to live seconds example spark application

Signed-off-by: Yi Chen <github@chenyicn.net>

* fix: spark application does not respect time to live seconds

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2024-09-19 12:40:29 +00:00
tcassaert ed3226ebe7
Add specific error in log line when failed to create web UI service (#2170)
* Add specific error in log line when failed to create web UI service

Signed-off-by: tcassaert <tcassaert@inuits.eu>

* Update log to reflect correct resource that could not be created

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: tcassaert <tcassaert@protonmail.com>

---------

Signed-off-by: tcassaert <tcassaert@inuits.eu>
Signed-off-by: tcassaert <tcassaert@protonmail.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
2024-09-19 08:11:28 +00:00
Yi Chen e8d3de9e1a
Support extended kube-scheduler as batch scheduler (#2136)
* Support coscheduling with kube-scheduler plugins

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add example for using kube-scheduler coscheduling

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2024-09-03 03:23:13 +00:00
Jacob Salway 9cc1c02c64
Add default batch scheduler argument (#2143)
* Add default batch scheduler argument

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Add helm unit test

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

---------

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
2024-08-28 02:53:03 +00:00
Jacob Salway 8fcda12657
Support gang scheduling with Yunikorn (#2107)
* Add Yunikorn scheduler and example

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Add test cases

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Add code comments

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Add license comment

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Inline mergeNodeSelector

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Fix initial number implementation

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

---------

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
2024-08-22 04:15:57 +00:00
Yi Chen 0dc641bd1d
Use controller-runtime to reconstruct spark operator (#2072)
* Use controller-runtime to reconstruct spark operator

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update helm charts

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update examples

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2024-08-01 12:29:06 +00:00