Compare commits


105 Commits

Author SHA1 Message Date
dependabot[bot] 920772e065
Bump github.com/go-viper/mapstructure/v2 from 2.2.1 to 2.3.0 (#2572)
Bumps [github.com/go-viper/mapstructure/v2](https://github.com/go-viper/mapstructure) from 2.2.1 to 2.3.0.
- [Release notes](https://github.com/go-viper/mapstructure/releases)
- [Changelog](https://github.com/go-viper/mapstructure/blob/main/CHANGELOG.md)
- [Commits](https://github.com/go-viper/mapstructure/compare/v2.2.1...v2.3.0)

---
updated-dependencies:
- dependency-name: github.com/go-viper/mapstructure/v2
  dependency-version: 2.3.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-10 09:17:53 +00:00
Mat Schaffer 799d915a44
Include pod.Status.Message in recordExecutorEvent (#2589)
This is especially useful for ephemeral-storage exhaustion.

Signed-off-by: Mat Schaffer <Mat.Schaffer@roblox.com>
2025-07-10 02:57:52 +00:00
Yi Chen 45cb1a1277
fix: should add executor env when driver env is empty (#2586)
* should check the length of executor env when adding env to executor pods

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add unit tests for addEnvVars

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-07-10 01:47:52 +00:00
dependabot[bot] 8635b2e84b
Bump aquasecurity/trivy-action from 0.31.0 to 0.32.0 (#2585)
Bumps [aquasecurity/trivy-action](https://github.com/aquasecurity/trivy-action) from 0.31.0 to 0.32.0.
- [Release notes](https://github.com/aquasecurity/trivy-action/releases)
- [Commits](https://github.com/aquasecurity/trivy-action/compare/0.31.0...0.32.0)

---
updated-dependencies:
- dependency-name: aquasecurity/trivy-action
  dependency-version: 0.32.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-08 01:57:50 +00:00
Yi Chen 107b457296
Make logging encoder configurable (#2580)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-07-08 01:28:50 +00:00
Yi Chen fd52169b25
Add SparkConnect e2e test (#2578)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-07-08 01:05:50 +00:00
Mat Schaffer dee20c86c4
Splat recordExecutorEvent args for cleaner event messages (#2582)
Signed-off-by: Mat Schaffer <Mat.Schaffer@roblox.com>
2025-07-07 02:39:21 +00:00
Manabu McCloskey 191ac52820
upgrade to Spark 4.0.0 (#2564)
* upgrade to Spark 4.0.0

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

* correct docker address

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

* fix docker fqdn

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

---------

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-07-04 02:11:18 +00:00
Yi Chen 9773369b6d
Add support for Spark Connect (#2569)
* Add v1alpha1 version of CRD SparkConnect

Signed-off-by: Yi Chen <github@chenyicn.net>

* Generate SparkConnect CRD manifests

Signed-off-by: Yi Chen <github@chenyicn.net>

* Implement SparkConnect controller

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update Helm chart CRDs and RBAC resources

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add SparkConnect example

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add pre stop lifecycle handler

Signed-off-by: Yi Chen <github@chenyicn.net>

* Rename LabelSparkConnName to LabelSparkConnectName

Signed-off-by: Yi Chen <github@chenyicn.net>

* Rename SparkConnectStateUnready to SparkConnectStateNotReady

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add constant SparkConfigMapVolumeMountName

Signed-off-by: Yi Chen <github@chenyicn.net>

* Use server pod status message as SparkConnect condition message

Signed-off-by: Yi Chen <github@chenyicn.net>

* Define SparkConnect condition reasons

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add prefix to Hadoop properties if it does not start with 'spark.hadoop.'

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-07-02 03:21:18 +00:00
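For context on the new CRD, a minimal SparkConnect manifest might look roughly like the sketch below. The apiVersion, kind, and v1alpha1 version follow the commit notes; the spec field names and values are assumptions, not the published example from #2569.

```yaml
# Hypothetical sketch of a v1alpha1 SparkConnect resource.
# Only apiVersion/kind/version are taken from the commit notes; spec fields are assumptions.
apiVersion: sparkoperator.k8s.io/v1alpha1
kind: SparkConnect
metadata:
  name: spark-connect
  namespace: default
spec:
  sparkVersion: 4.0.0        # assumed field name
  server:                    # assumed: settings for the Spark Connect server pod
    cores: 1
    memory: 512m
```
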
Francisco Arceo 2a51f5a426
chore: Adding OpenSSF Badge (#2571)
Signed-off-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-07-01 02:40:16 +00:00
Yi Chen 9d10f2f67a
Add changelog for v2.2.1 (#2570)
* Release v2.2.1 (#2568)

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add changelog for v2.2.1

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-27 01:06:17 +00:00
dependabot[bot] 5148c4e81a
Bump github.com/go-logr/logr from 1.4.2 to 1.4.3 (#2567)
Bumps [github.com/go-logr/logr](https://github.com/go-logr/logr) from 1.4.2 to 1.4.3.
- [Release notes](https://github.com/go-logr/logr/releases)
- [Changelog](https://github.com/go-logr/logr/blob/master/CHANGELOG.md)
- [Commits](https://github.com/go-logr/logr/compare/v1.4.2...v1.4.3)

---
updated-dependencies:
- dependency-name: github.com/go-logr/logr
  dependency-version: 1.4.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-25 06:13:15 +00:00
dependabot[bot] 1b8df900a8
Bump golang.org/x/mod from 0.24.0 to 0.25.0 (#2566)
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.24.0 to 0.25.0.
- [Commits](https://github.com/golang/mod/compare/v0.24.0...v0.25.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-version: 0.25.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-25 06:12:17 +00:00
dependabot[bot] 78bb172fa1
Bump sigs.k8s.io/scheduler-plugins from 0.30.6 to 0.31.8 (#2549)
Bumps [sigs.k8s.io/scheduler-plugins](https://github.com/kubernetes-sigs/scheduler-plugins) from 0.30.6 to 0.31.8.
- [Release notes](https://github.com/kubernetes-sigs/scheduler-plugins/releases)
- [Changelog](https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/RELEASE.md)
- [Commits](https://github.com/kubernetes-sigs/scheduler-plugins/compare/v0.30.6...v0.31.8)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/scheduler-plugins
  dependency-version: 0.31.8
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-19 13:50:10 +00:00
dependabot[bot] 6a32e32432
Bump github.com/prometheus/client_golang from 1.21.1 to 1.22.0 (#2548)
Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.21.1 to 1.22.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.21.1...v1.22.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-version: 1.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-19 13:49:11 +00:00
jbhalodia-slack ca11a8f55d
Use code-generator for clientset, informers, listers (#2563)
* Use code-generator for clientset, informer, lister

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Add README in hack/ for code-generator

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Run verify-codegen.sh in tests

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* make generate

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* make build-api-docs

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* update makefile

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* update makefile

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* make go-fmt

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* make generate

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* run tests

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Remove deepcopy-gen since its conflicting with controller-gen

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Revert some changes

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Generate packages in pkg/client/

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Update year to 2025 in boilerplate.go.txt

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Update year to 2025 in types.go

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

---------

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
2025-06-19 13:37:10 +00:00
Joshua Cuellar 818e34a36a
Update golangci lint (#2560)
* Update Golangci-lint version to 2.1.6

Signed-off-by: Joshua Cuellar <joshuac.cuellar@outlook.com>

* Fix Golangci-lint code issues

Signed-off-by: Joshua Cuellar <joshuac.cuellar@outlook.com>

* Simplify struct paths

Signed-off-by: Joshua Cuellar <joshuac.cuellar@outlook.com>

* Catch returned errors in test

Signed-off-by: Joshua Cuellar <joshuac.cuellar@outlook.com>

---------

Signed-off-by: Joshua Cuellar <joshuac.cuellar@outlook.com>
2025-06-17 02:40:08 +00:00
Yi Chen d14c901d12
Get logger from context (#2551)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-12 02:04:50 +00:00
Thomas Newton 8dd8db45a3
Make default ingress tls and annotations configurable in the helm config (#2513)
* Initial attempt at passing the info through the CLI

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Working getting ingressTLS config from helm yaml into a `[]networkingv1.IngressTLS`

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Vaguely correct adding TLS options to UI ingress

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* First test passing

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Pass through argument where I had forgotten

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Implement IngressAnnotations

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Add "default" to some variable names

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Update helm and CLI parsing, including adding annotations

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Sufficient unit tests

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Tests and documentation for helm

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Minor adjustments to test strings

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* PR comments

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Fix rebase

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Avoid manually constructing the expected ingress name in tests

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Quote and rename on the helm side

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Avoid using pointers

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* More renaming to remove "default"

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Revert helm quote on json strings

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Tidy imports

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Fix tests after #2554 moved the ingress creation to be on submitted spark applications

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Re-generate helm docs

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

---------

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
2025-06-10 16:02:49 +00:00
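As a rough illustration of this feature, default TLS and annotations for UI ingresses might be set in the chart values along these lines. The key names `ingressTLS` and `ingressAnnotations` are inferred from the commit notes and are assumptions, not the chart's documented schema.

```yaml
# Hypothetical values.yaml fragment - key names are assumptions inferred from the commit notes
controller:
  uiIngress:
    enable: true
    ingressAnnotations:
      cert-manager.io/cluster-issuer: letsencrypt   # example annotation
    ingressTLS:
      - hosts:
          - "*.spark.example.com"
        secretName: spark-ui-tls
```
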
Yi Chen 718e3a004e
Customize ingress URL with Spark application ID (#2554)
* Customize ingress URL with Spark application ID

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update policy rules for ingress

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-10 09:33:48 +00:00
dependabot[bot] 84a7749680
Bump aquasecurity/trivy-action from 0.30.0 to 0.31.0 (#2557)
Bumps [aquasecurity/trivy-action](https://github.com/aquasecurity/trivy-action) from 0.30.0 to 0.31.0.
- [Release notes](https://github.com/aquasecurity/trivy-action/releases)
- [Commits](https://github.com/aquasecurity/trivy-action/compare/0.30.0...0.31.0)

---
updated-dependencies:
- dependency-name: aquasecurity/trivy-action
  dependency-version: 0.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-10 09:29:50 +00:00
Manabu McCloskey 53299362e5
add driver ingress unit tests (#2552)
* add driver ingress unit tests

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

* fix lint

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

* fix lint

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

---------

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-06-06 09:07:15 +00:00
Yi Chen 5a1932490a
Add changelog for v2.2.0 (#2547)
* Release v2.2.0 (#2546)

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add changelog for v2.2.0

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update README.md

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-03 12:12:13 +00:00
Yi Chen 08dfbc5fc9
Pass the correct LDFLAGS when building the operator image (#2541)
* Update module path in Makefile

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update .dockerignore and .gitignore

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-05-29 05:14:18 +00:00
Hossein Torabi 7b819728ff
#2525 spark metrics in depends on prometheus (#2529)
Signed-off-by: Hossein Torabi <blcksrx@pm.me>
2025-05-29 04:33:18 +00:00
Yi Chen ec83779094
Bump k8s.io dependencies to v0.32.5 (#2540)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-05-29 04:10:19 +00:00
Yi Chen 45efb0cf68
Add v2 to module path (#2515)
* Add v2 to module path

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update imports

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update API docs

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-05-28 15:46:18 +00:00
Yi Chen 61510642af
Add support for using cert manager to generate webhook certificates (#2373)
* Add support for using cert manager to generate webhook certificates

Signed-off-by: Yi Chen <github@chenyicn.net>

* update certificate provider unit tests

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add a newline at the end of file

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add examples for configuring duration and renewBefore

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-05-28 05:33:18 +00:00
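Conceptually, enabling cert-manager for webhook certificates would look something like the values sketch below. `duration` and `renewBefore` are standard cert-manager Certificate fields mentioned in the commit; the surrounding key paths are assumptions.

```yaml
# Hypothetical values.yaml fragment - key paths are assumptions
webhook:
  certManager:
    enable: true
    certificate:
      duration: 2160h      # 90 days
      renewBefore: 360h    # renew 15 days before expiry
```
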
Yi Chen 8c7a4949d0
Define SparkApplicationSubmitter interface to allow customizing submitting mechanism (#2500)
* Define SparkApplicationSubmitter interface to allow customizing submitting mechanism

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update comments

Co-authored-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
Co-authored-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-05-28 04:46:18 +00:00
dependabot[bot] 0de3502c9e
Bump manusa/actions-setup-minikube from 2.13.1 to 2.14.0 (#2523)
Bumps [manusa/actions-setup-minikube](https://github.com/manusa/actions-setup-minikube) from 2.13.1 to 2.14.0.
- [Release notes](https://github.com/manusa/actions-setup-minikube/releases)
- [Commits](https://github.com/manusa/actions-setup-minikube/compare/v2.13.1...v2.14.0)

---
updated-dependencies:
- dependency-name: manusa/actions-setup-minikube
  dependency-version: 2.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-28 02:24:18 +00:00
Vara Bonthu 8998abd881
Adding Manabu to the reviewers (#2522)
* Adding Manabu to the approvers

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>

* Adding Manabu to the reviewers

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>

---------

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>
2025-05-28 02:23:18 +00:00
dependabot[bot] 81cac03725
Bump golang.org/x/mod from 0.23.0 to 0.24.0 (#2495)
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.23.0 to 0.24.0.
- [Commits](https://github.com/golang/mod/compare/v0.23.0...v0.24.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-27 09:21:27 +00:00
dependabot[bot] 486f5a7eee
Bump github.com/spf13/cobra from 1.8.1 to 1.9.1 (#2497)
Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.8.1 to 1.9.1.
- [Release notes](https://github.com/spf13/cobra/releases)
- [Commits](https://github.com/spf13/cobra/compare/v1.8.1...v1.9.1)

---
updated-dependencies:
- dependency-name: github.com/spf13/cobra
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-27 09:19:03 +00:00
Manabu McCloskey 7536c0739f
fix volcano tests (#2533)
Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-05-27 08:26:24 +00:00
Manabu McCloskey 851668f7ca
fix and add back unit tests (#2532)
* fix and add back unit tests

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

* add more tests

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

---------

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-05-17 11:05:24 +00:00
jbhalodia-slack ca37f6b7b3
Add ShuffleTrackingEnabled to DynamicAllocation struct to allow disabling shuffle tracking (#2511)
* Add ShuffleTrackingEnabled *bool to DynamicAllocation struct to allow disabling shuffle tracking

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Run make generate

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* make manifests

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* make update-crd && make build-api-docs

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Update internal/controller/sparkapplication/submission.go

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Go fmt

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Refactor defaultExecutorSpec func

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Refactor dynamicAllocationOption func

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Add IsDynamicAllocationEnabled func

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

---------

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
2025-05-14 05:49:22 +00:00
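In a SparkApplication spec, disabling shuffle tracking with the new field would look roughly like this. The camelCase name `shuffleTrackingEnabled` is inferred from the Go field `ShuffleTrackingEnabled` and is an assumption.

```yaml
# Hypothetical SparkApplication fragment - shuffleTrackingEnabled name inferred from the Go field
spec:
  dynamicAllocation:
    enabled: true
    shuffleTrackingEnabled: false   # new optional field; omit to keep the previous always-on behavior
    minExecutors: 1
    maxExecutors: 10
```
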
Tarek Abouzeid 0cb98f79cf
Adding securityContext to spark examples (#2530)
* Adding securityContext to spark examples

Signed-off-by: Tarek Abouzeid <tarek.abouzeid91@gmail.com>

* Fix indentation and newlines

Signed-off-by: Tarek Abouzeid <tarek.abouzeid91@gmail.com>

---------

Signed-off-by: Tarek Abouzeid <tarek.abouzeid91@gmail.com>
2025-05-13 02:40:20 +00:00
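For reference, a hardened driver/executor securityContext in a SparkApplication example typically looks like the sketch below. The securityContext fields are standard Kubernetes API; their exact placement in the updated example manifests is an assumption.

```yaml
# Illustrative fragment - standard Kubernetes SecurityContext fields; placement is an assumption
spec:
  driver:
    securityContext:
      runAsUser: 185
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      seccompProfile:
        type: RuntimeDefault
  executor:
    securityContext:
      runAsUser: 185
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      seccompProfile:
        type: RuntimeDefault
```
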
Manabu McCloskey e071e5f4a6
add unit tests for driver and executor configs (#2521)
* add unit tests for driver and executor configs

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

* fix import

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

---------

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-05-06 21:30:38 +00:00
Yi Chen 85b209ec20
Remove v1beta1 API (#2516)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-05-03 00:00:04 +00:00
Yi Chen 5002c08dce
Remove clientset, informer and listers generated by code-generator (#2506)
* Remove clientset, informer and listers generated by code-generator

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update imports

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-04-27 03:20:59 +00:00
dependabot[bot] 634b0c1a9f
Bump golang.org/x/net from 0.37.0 to 0.38.0 (#2505)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.37.0 to 0.38.0.
- [Commits](https://github.com/golang/net/compare/v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.38.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-21 08:32:26 +00:00
dependabot[bot] 6e0623cc7e
Bump github.com/spf13/viper from 1.19.0 to 1.20.1 (#2496)
Bumps [github.com/spf13/viper](https://github.com/spf13/viper) from 1.19.0 to 1.20.1.
- [Release notes](https://github.com/spf13/viper/releases)
- [Commits](https://github.com/spf13/viper/compare/v1.19.0...v1.20.1)

---
updated-dependencies:
- dependency-name: github.com/spf13/viper
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-21 07:25:26 +00:00
Daniel Freitas 5a97ca4daa
Enable the override of MemoryLimit through webhook (#2478)
* Documentation and interface definition

Signed-off-by: danielrs <danielrs@ibm.com>

* addMemoryLimit and conversion methods

Signed-off-by: danielrs <danielrs@ibm.com>

* Unit tests

Signed-off-by: danielrs <danielrs@ibm.com>

* Deepcopy

Signed-off-by: danielrs <danielrs@ibm.com>

* Adjustments after make command

Signed-off-by: danielrs <danielrs@ibm.com>

* Address comments

Signed-off-by: danielrs <danielrs@ibm.com>

---------

Signed-off-by: danielrs <danielrs@ibm.com>
Co-authored-by: danielrs <danielrs@ibm.com>
2025-04-21 07:12:26 +00:00
Yi Chen 836a9186e5
Remove sparkctl (#2466)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-04-21 06:46:26 +00:00
Yi Chen 32017d2f41
Add changelog for v2.1.1 (#2504)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-04-15 10:51:22 +00:00
dependabot[bot] f38ae18abe
Bump helm.sh/helm/v3 from 3.16.2 to 3.17.3 (#2503)
Bumps [helm.sh/helm/v3](https://github.com/helm/helm) from 3.16.2 to 3.17.3.
- [Release notes](https://github.com/helm/helm/releases)
- [Commits](https://github.com/helm/helm/compare/v3.16.2...v3.17.3)

---
updated-dependencies:
- dependency-name: helm.sh/helm/v3
  dependency-version: 3.17.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-14 08:01:09 +00:00
Jacob Salway 3c4ebc7235
Upgrade Golang to 1.24.1 and golangci-lint to 1.64.8 (#2494)
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
2025-03-31 04:01:30 +00:00
Jacob Salway 50ae7a0062
Add timeZone to ScheduledSparkApplication (#2471)
* Add timeZone to ScheduledSparkApplication

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Update api/v1beta2/scheduledsparkapplication_types.go

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

---------

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
2025-03-31 02:12:30 +00:00
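The new field sits alongside `schedule` in the ScheduledSparkApplication spec. A minimal sketch (the template body is omitted for brevity):

```yaml
# Illustrative fragment - timeZone added by #2471; assumed to take an IANA time zone name
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled
spec:
  schedule: "0 2 * * *"
  timeZone: "Australia/Sydney"
  template: {}   # SparkApplication spec omitted for brevity
```
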
Vikas Saxena 7668a1c551
Changing image repo from docker.io to ghcr.io (#2483)
* modified image.registry value

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* modified registry value to ghcr.io

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* modified image registry value to ghcr.io

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* modified image registry_value to ghcr.io

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* replaced docker.io with ghcr.io

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* updated container registry credentials

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* corrected registry username and password

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* changed image_repo value to include controller

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* removed unwanted space

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* Update charts/spark-operator-chart/values.yaml

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Vikas Saxena <90456542+vikas-saxena02@users.noreply.github.com>

* Update Makefile

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Vikas Saxena <90456542+vikas-saxena02@users.noreply.github.com>

* updated charts/spark-operator-chart/README.md by running make helm-docs

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

---------

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>
Signed-off-by: Vikas Saxena <90456542+vikas-saxena02@users.noreply.github.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
2025-03-30 09:13:30 +00:00
TJ Miller d089a43836
fix: add webhook cert validity checking (#2489)
Signed-off-by: TJ Miller <millert@us.ibm.com>
2025-03-27 05:08:22 +00:00
Jacob Salway 2c0cac198c
Upgrade to Spark 3.5.5 (#2490)
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
2025-03-26 11:22:03 +00:00
Vikas Saxena 5a1fc7ba16
Deprecating sparkctl (#2484)
* Added deprecation warning in cmd/sparkctl/main.go

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* added deprecation warning in Makefile

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* added recommendation to use kubectl in Makefile

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* updated readme for sparkctl to show its being deprecated

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

* adding deprecated comment as well to main.go

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>

---------

Signed-off-by: Vikas Saxena <Vikas.Saxena.2006@gmail.com>
2025-03-25 12:52:51 +00:00
dependabot[bot] 1768eb1ede
Bump sigs.k8s.io/controller-runtime from 0.20.1 to 0.20.4 (#2486)
Bumps [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) from 0.20.1 to 0.20.4.
- [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases)
- [Changelog](https://github.com/kubernetes-sigs/controller-runtime/blob/main/RELEASE.md)
- [Commits](https://github.com/kubernetes-sigs/controller-runtime/compare/v0.20.1...v0.20.4)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/controller-runtime
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-25 09:59:50 +00:00
dependabot[bot] 791d4ab9c8
Bump github.com/prometheus/client_golang from 1.20.5 to 1.21.1 (#2487)
Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.20.5 to 1.21.1.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.20.5...v1.21.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-25 09:42:50 +00:00
dependabot[bot] c0403dd777
Bump github.com/stretchr/testify from 1.9.0 to 1.10.0 (#2488)
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](https://github.com/stretchr/testify/compare/v1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-25 09:30:51 +00:00
Shu 6f4102d3c2
Add APRA AMCOS to adopters (#2485)
Signed-off-by: Shu <57744345+shuch3ng@users.noreply.github.com>
2025-03-23 15:51:05 +00:00
dependabot[bot] 9f69b3a922
Bump sigs.k8s.io/scheduler-plugins from 0.29.8 to 0.30.6 (#2444)
Bumps [sigs.k8s.io/scheduler-plugins](https://github.com/kubernetes-sigs/scheduler-plugins) from 0.29.8 to 0.30.6.
- [Release notes](https://github.com/kubernetes-sigs/scheduler-plugins/releases)
- [Changelog](https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/RELEASE.md)
- [Commits](https://github.com/kubernetes-sigs/scheduler-plugins/compare/v0.29.8...v0.30.6)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/scheduler-plugins
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-21 06:38:26 +00:00
dependabot[bot] 544a342702
Bump github.com/aws/aws-sdk-go-v2/config from 1.28.0 to 1.29.9 (#2463)
Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.28.0 to 1.29.9.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/v1.28.0...config/v1.29.9)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-21 05:36:26 +00:00
dependabot[bot] 1676651ca0
Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.66.0 to 1.78.2 (#2473)
Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.66.0 to 1.78.2.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/service/s3/v1.66.0...service/s3/v1.78.2)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/s3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-21 05:12:27 +00:00
dependabot[bot] 9a76f6ad80
Bump k8s.io/apimachinery from 0.32.0 to 0.32.3 (#2474)
Bumps [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) from 0.32.0 to 0.32.3.
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.32.0...v0.32.3)

---
updated-dependencies:
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-21 04:48:27 +00:00
dependabot[bot] c90a93aadf
Bump github.com/containerd/containerd from 1.7.19 to 1.7.27 (#2476)
Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.19 to 1.7.27.
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.7.19...v1.7.27)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-21 03:51:27 +00:00
dependabot[bot] e70b01a087
Bump golang.org/x/net from 0.35.0 to 0.37.0 (#2472)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.35.0 to 0.37.0.
- [Commits](https://github.com/golang/net/compare/v0.35.0...v0.37.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-21 03:02:26 +00:00
dependabot[bot] 520f7a6fc8
Bump aquasecurity/trivy-action from 0.29.0 to 0.30.0 (#2475)
Bumps [aquasecurity/trivy-action](https://github.com/aquasecurity/trivy-action) from 0.29.0 to 0.30.0.
- [Release notes](https://github.com/aquasecurity/trivy-action/releases)
- [Commits](https://github.com/aquasecurity/trivy-action/compare/0.29.0...0.30.0)

---
updated-dependencies:
- dependency-name: aquasecurity/trivy-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-21 03:01:27 +00:00
Tan Qi 68e6c1d586
change env in executorSecretOption (#2467)
* change env in executorSecretOption

Signed-off-by: Qi Tan <16416018+TQJADE@users.noreply.github.com>

* Use spark.executorEnv instead

Signed-off-by: Qi Tan <16416018+TQJADE@users.noreply.github.com>

* Remove V2 and update SparkExecutorEnvTemplate

Signed-off-by: Qi Tan <16416018+TQJADE@users.noreply.github.com>

---------

Signed-off-by: Qi Tan <16416018+TQJADE@users.noreply.github.com>
2025-03-20 02:01:20 +00:00
dependabot[bot] 092e41ad58
Bump golang.org/x/net from 0.35.0 to 0.36.0 (#2470)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.35.0 to 0.36.0.
- [Commits](https://github.com/golang/net/compare/v0.35.0...v0.36.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-19 06:25:45 +00:00
pvbouwel 3128c7f157
fix: webhook fails to add lifecycle to Spark3 executor pods (#2458)
* bugfix: A lifecycle on a spark3 executor should not fail

Before this fix, if you have a Spark 3.x spec where the executor has a lifecycle, the webhook will fail to identify the correct container, as described in issue [2457](https://github.com/kubeflow/spark-operator/issues/2457).

Signed-off-by: pvbouwel <463976+pvbouwel@users.noreply.github.com>

* tests: Add coverage for spark3 executor with a lifecycle

Signed-off-by: pvbouwel <463976+pvbouwel@users.noreply.github.com>

* make go-fmt

Signed-off-by: pvbouwel <463976+pvbouwel@users.noreply.github.com>

---------

Signed-off-by: pvbouwel <463976+pvbouwel@users.noreply.github.com>
2025-03-06 14:15:15 +00:00
Manabu McCloskey 939218c85f
add support for metrics-job-start-latency-buckets flag in helm (#2450)
Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-03-04 08:21:33 +00:00
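A rough idea of how the flag might be surfaced through the chart is sketched below. The flag name comes from the commit title; the values key path is an assumption.

```yaml
# Hypothetical values.yaml fragment - key path is an assumption
controller:
  metrics:
    jobStartLatencyBuckets: "30,60,120,300,600"   # seconds, passed to -metrics-job-start-latency-buckets
```
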
Manabu McCloskey fc7c697c61
specify branch name in chart testing (#2451)
Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-03-01 01:04:29 +00:00
jbhalodia-slack 79264a4ac5
Support non-standard Spark container names (#2441)
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
2025-02-20 18:36:43 +00:00
jbhalodia-slack d10b8f5f3a
Make image optional (#2439)
* Make app.Spec.Driver.Image and app.Spec.Image optional
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Make app.Spec.Executor.Image optional
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
2025-02-20 04:03:42 +00:00
Manabu McCloskey bd197c6f8c
use cmd context in sparkctl (#2447)
Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-02-20 03:52:42 +00:00
Anish Asthana 405ae51de4
docs: Add information about KEP process (#2440)
Signed-off-by: Anish Asthana <anishasthana1@gmail.com>
2025-02-15 00:03:37 +00:00
Jacob Salway 25ca90cb07
Support Kubernetes 1.32 (#2416)
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
Signed-off-by: Jacob Salway <jacob.salway@rokt.com>
2025-02-12 12:02:29 +00:00
dependabot[bot] 8892dd4b32
Bump golang.org/x/net from 0.32.0 to 0.35.0 (#2428)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.32.0 to 0.35.0.
- [Commits](https://github.com/golang/net/compare/v0.32.0...v0.35.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-12 10:26:29 +00:00
dependabot[bot] 30c15f2db5
Bump github.com/golang/glog from 1.2.2 to 1.2.4 (#2411)
Bumps [github.com/golang/glog](https://github.com/golang/glog) from 1.2.2 to 1.2.4.
- [Release notes](https://github.com/golang/glog/releases)
- [Commits](https://github.com/golang/glog/compare/v1.2.2...v1.2.4)

---
updated-dependencies:
- dependency-name: github.com/golang/glog
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-12 10:12:29 +00:00
dependabot[bot] 53b2292025
Bump golang.org/x/mod from 0.21.0 to 0.23.0 (#2427)
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.21.0 to 0.23.0.
- [Commits](https://github.com/golang/mod/compare/v0.21.0...v0.23.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-12 10:11:29 +00:00
dependabot[bot] d73e209ea2
Bump helm/chart-testing-action from 2.6.1 to 2.7.0 (#2391)
Bumps [helm/chart-testing-action](https://github.com/helm/chart-testing-action) from 2.6.1 to 2.7.0.
- [Release notes](https://github.com/helm/chart-testing-action/releases)
- [Commits](https://github.com/helm/chart-testing-action/compare/v2.6.1...v2.7.0)

---
updated-dependencies:
- dependency-name: helm/chart-testing-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-12 10:01:29 +00:00
dependabot[bot] 46ec8e64f4
Bump manusa/actions-setup-minikube from 2.13.0 to 2.13.1 (#2390)
Bumps [manusa/actions-setup-minikube](https://github.com/manusa/actions-setup-minikube) from 2.13.0 to 2.13.1.
- [Release notes](https://github.com/manusa/actions-setup-minikube/releases)
- [Commits](https://github.com/manusa/actions-setup-minikube/compare/v2.13.0...v2.13.1)

---
updated-dependencies:
- dependency-name: manusa/actions-setup-minikube
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-12 10:00:29 +00:00
Yi Chen 54fb0b0305
Controller should only be granted event permissions in spark job namespaces (#2426)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-02-12 09:57:29 +00:00
Manabu McCloskey 2995a0a963
ensure passed context is used (#2432)
Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-02-12 08:44:29 +00:00
Yi Chen 1f2cfbcae7
Add option for disabling leader election (#2423)
* Add option for disabling leader election

Signed-off-by: Yi Chen <github@chenyicn.net>

* Remove related RBAC rules when disabling leader election

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-02-10 23:15:28 +00:00
Yi Chen ae85466a52
Add helm unittest step to integration test workflow (#2424)
* chore: add helm unittest step to integration test workflow

Signed-off-by: Yi Chen <github@chenyicn.net>

* fix: container security context unit test

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-02-09 03:27:04 +00:00
Manabu McCloskey a348b9218f
fix make deploy and install (#2412)
* fix make deploy and install

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

* fix install-crd

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

* build local image

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

---------

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-02-05 11:30:37 +00:00
Jacob Salway ad30d15bbb
Remove dependency on `k8s.io/kubernetes` (#2398)
* Remove dependency on `k8s.io/kubernetes`

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Make code compliant with Apache 2.0 license

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

---------

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
2025-01-28 15:36:51 +00:00
Manabu McCloskey 6e15770f83
add an example of using prometheus servlet (#2403)
* add an example of using prometheus servlet

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

* add k8s annotations

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>

---------

Signed-off-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
2025-01-27 13:50:50 +00:00
Tarek Abouzeid b2411033f0
Adding seccompProfile RuntimeDefault (#2397)
* Adding seccompProfile RuntimeDefault

Signed-off-by: Tarek Abouzeid <tarek.abouzeid@teliacompany.com>

* updating helm docs

Signed-off-by: Tarek Abouzeid <tarek.abouzeid@teliacompany.com>

---------

Signed-off-by: Tarek Abouzeid <tarek.abouzeid@teliacompany.com>
2025-01-21 12:58:35 +00:00
hongshaoyang e6c2337e02
Add Ninja Van to adopters (#2377)
* Add Ninja Van to adopters

Signed-off-by: hongshaoyang <hongsy2006@gmail.com>

* Fix typo

Signed-off-by: hongshaoyang <hongsy2006@gmail.com>

---------

Signed-off-by: hongshaoyang <hongsy2006@gmail.com>
2025-01-09 03:23:21 +00:00
dependabot[bot] 92deff0be9
Bump golang.org/x/crypto from 0.30.0 to 0.31.0 (#2365)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.30.0 to 0.31.0.
- [Commits](https://github.com/golang/crypto/compare/v0.30.0...v0.31.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-18 16:55:10 +00:00
dependabot[bot] 7f5b5edea5
Bump golang.org/x/net from 0.30.0 to 0.32.0 (#2350)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.30.0 to 0.32.0.
- [Commits](https://github.com/golang/net/compare/v0.30.0...v0.32.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-18 16:28:11 +00:00
Chaoran Yu 413d05eb22
Promoted jacobsalway from reviewer to approver (#2361)
Discussed offline on Slack and during community Zoom call today with fellow maintainers

Signed-off-by: Chaoran Yu <yuchaoran2011@gmail.com>
2024-12-15 14:55:08 +00:00
Yi Chen 71821733a6
Add changelog for v2.1.0 (#2355)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-12-11 09:58:04 +00:00
Yi Chen 2375a306f9
Move sparkctl to cmd directory (#2347)
* Move spark-operator

Signed-off-by: Yi Chen <github@chenyicn.net>

* Move sparkctl to cmd directory

Signed-off-by: Yi Chen <github@chenyicn.net>

* Remove unnecessary app package/directory

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2024-12-06 05:08:00 +00:00
Aakcht 5dd91c4bf2
Use NSS_WRAPPER_PASSWD instead of /etc/passwd as in spark-operator image entrypoint.sh (#2312)
Signed-off-by: Aakcht <aakcht@gmail.com>
2024-12-04 13:07:00 +00:00
Thomas Newton d815e78c21
Robustness to driver pod taking time to create (#2315)
* Retry after driver pod not found if recent submission

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Add a test

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Make grace period configurable

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Update test

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Add an extra test with the driver pod

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Separate context to create and delete the driver pod

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Tidy

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Autoformat

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Update error message

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Add helm parameter

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Update internal/controller/sparkapplication/controller.go

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Newlines between helm tests

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

---------

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
2024-12-04 12:58:59 +00:00
C. H. Afzal a261523144
The webhook-key-name command-line param isn't taking effect (#2344)
Signed-off-by: C. H. Afzal <c-h-afzal@outlook.com>
2024-12-04 09:18:01 +00:00
dependabot[bot] 40423d5501
Bump github.com/onsi/ginkgo/v2 from 2.20.2 to 2.22.0 (#2335)
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.20.2 to 2.22.0.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.20.2...v2.22.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-04 09:12:59 +00:00
dependabot[bot] 270b09e4c7
Bump aquasecurity/trivy-action from 0.28.0 to 0.29.0 (#2332)
Bumps [aquasecurity/trivy-action](https://github.com/aquasecurity/trivy-action) from 0.28.0 to 0.29.0.
- [Release notes](https://github.com/aquasecurity/trivy-action/releases)
- [Commits](https://github.com/aquasecurity/trivy-action/compare/0.28.0...0.29.0)

---
updated-dependencies:
- dependency-name: aquasecurity/trivy-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-04 09:11:00 +00:00
Jacob Salway 43c1888c9d
Truncate UI service name if over 63 characters (#2311)
* Truncate UI service name if over 63 characters

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Also truncate ingress name

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

---------

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
2024-11-18 14:17:23 +00:00
Jacob Salway 22e4fb8e48
Bump `volcano.sh/apis` to 1.10.0 (#2320)
Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
2024-11-11 11:35:16 +00:00
Cian (Keen) Gallagher 2999546dc6
Fix: should not add emptyDir sizeLimit conf on executor pods if it is nil (#2316)
Signed-off-by: Cian Gallagher <cian@ciangallagher.net>
2024-11-11 02:13:15 +00:00
Nicholas Gretzon 72107fd7b8
Allow the Controller and Webhook Containers to run with the securityContext: readOnlyRootFilesystem: true (#2282)
* create a tmp dir for the controller to write Spark artifacts to and set the controller to readOnlyRootFilesystem

Signed-off-by: Nick Gretzon <npgretz@gmail.com>

* mount a dir for the webhook container to generate its certificates in and set readOnlyRootFilesystem: true for the webhook pod

Signed-off-by: Nick Gretzon <npgretz@gmail.com>

* update the securityContext in the controller deployment test

Signed-off-by: Nick Gretzon <npgretz@gmail.com>

* update securityContext of the webhook container in the deployment_test

Signed-off-by: Nick Gretzon <npgretz@gmail.com>

* update README

Signed-off-by: Nick Gretzon <npgretz@gmail.com>

* remove -- so comments are not rendered in the README.md

Signed-off-by: Nick Gretzon <npgretz@gmail.com>

* recreate README.md after removal of comments for volumes and volumeMounts

Signed-off-by: Nick Gretzon <npgretz@gmail.com>

* make indentation for volumes and volumeMounts consistent with rest of values.yaml

Signed-off-by: Nick Gretzon <npgretz@gmail.com>

* Revert "make indentation for volumes and volumeMounts consistent with rest of values.yaml"

This reverts commit dba97fc3d9458e5addfff79d021d23b30938cbb9.

Signed-off-by: Nick Gretzon <npgretz@gmail.com>

* fix indentation in webhook and controller deployment templates for volumes and volumeMounts

Signed-off-by: Nick Gretzon <npgretz@gmail.com>

* Update charts/spark-operator-chart/values.yaml

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Nicholas Gretzon <50811947+npgretz@users.noreply.github.com>

* Update charts/spark-operator-chart/values.yaml

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Nicholas Gretzon <50811947+npgretz@users.noreply.github.com>

* Update charts/spark-operator-chart/values.yaml

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Nicholas Gretzon <50811947+npgretz@users.noreply.github.com>

* Update charts/spark-operator-chart/values.yaml

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Nicholas Gretzon <50811947+npgretz@users.noreply.github.com>

* Update charts/spark-operator-chart/templates/controller/deployment.yaml

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Nicholas Gretzon <50811947+npgretz@users.noreply.github.com>

* Update charts/spark-operator-chart/templates/controller/deployment.yaml

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Nicholas Gretzon <50811947+npgretz@users.noreply.github.com>

* Update charts/spark-operator-chart/templates/webhook/deployment.yaml

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Nicholas Gretzon <50811947+npgretz@users.noreply.github.com>

* Update charts/spark-operator-chart/templates/webhook/deployment.yaml

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Nicholas Gretzon <50811947+npgretz@users.noreply.github.com>

* add additional securityContext to the controller deployment_test.yaml

Signed-off-by: Nick Gretzon <npgretz@gmail.com>

---------

Signed-off-by: Nick Gretzon <npgretz@gmail.com>
Signed-off-by: Nicholas Gretzon <50811947+npgretz@users.noreply.github.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
2024-11-07 03:10:12 +00:00
Yi Chen 763682dfe6
Fix: should not add emptyDir sizeLimit conf if it is nil (#2305)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-11-04 11:17:15 +00:00
Yi Chen 171e429706
Fix: executor container security context does not work (#2306)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-11-04 11:03:15 +00:00
Aran Shavit 515d805b8a
Allow setting automountServiceAccountToken (#2298)
* Allow setting automountServiceAccountToken on workloads and serviceAccounts

Signed-off-by: Aran Shavit <Aranshavit@gmail.com>

* update helm docs

Signed-off-by: Aran Shavit <Aranshavit@gmail.com>

---------

Signed-off-by: Aran Shavit <Aranshavit@gmail.com>
2024-11-04 07:50:16 +00:00
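A sketch of what the new chart setting might look like follows; the key paths below are assumptions rather than the documented values schema.

```yaml
# Hypothetical values.yaml fragment - key paths are assumptions
controller:
  serviceAccount:
    automountServiceAccountToken: false
webhook:
  serviceAccount:
    automountServiceAccountToken: false
```
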
Yi Chen 85f0ed039d
Update issue and pull request templates (#2287)
* Update issue templates

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update pull request template

Signed-off-by: Yi Chen <github@chenyicn.net>

* Referring to the operator as the Spark operator

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add link to the Spark operator slack channel

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2024-10-26 00:08:31 +00:00
216 changed files with 7748 additions and 8865 deletions


@@ -1,31 +1,7 @@
.github/
.idea/
.vscode/
bin/
charts/
docs/
config/
examples/
hack/
manifest/
spark-docker/
sparkctl/
test/
vendor/
.dockerignore
.DS_Store
.gitignore
.gitlab-ci.yaml
.golangci.yaml
.pre-commit-config.yaml
ADOPTERS.md
CODE_OF_CONDUCT.md
codecov.ymal
CONTRIBUTING.md
codecov.yaml
cover.out
Dockerfile
LICENSE
OWNERS
PROJECT
README.md
test.sh
.DS_Store
*.iml


@@ -1,46 +0,0 @@
---
name: Bug report
about: Create a report to help us improve
title: '[BUG] Brief description of the issue'
labels: bug
---
## Description
Please provide a clear and concise description of the issue you are encountering, and a reproduction of your configuration.
If your request is for a new feature, please use the `Feature request` template.
- [ ] ✋ I have searched the open/closed issues and my issue is not listed.
## Reproduction Code [Required]
<!-- REQUIRED -->
Steps to reproduce the behavior:
## Expected behavior
<!-- A clear and concise description of what you expected to happen -->
## Actual behavior
<!-- A clear and concise description of what actually happened -->
### Terminal Output Screenshot(s)
<!-- Optional but helpful -->
## Environment & Versions
- Spark Operator App version:
- Helm Chart Version:
- Kubernetes Version:
- Apache Spark version:
## Additional context
<!-- Add any other context about the problem here -->

.github/ISSUE_TEMPLATE/bug_report.yaml (new file)

@@ -0,0 +1,54 @@
name: Bug Report
description: Tell us about a problem you are experiencing with the Spark operator.
labels:
  - kind/bug
  - lifecycle/needs-triage
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to fill out this Spark operator bug report!
  - type: textarea
    id: problem
    attributes:
      label: What happened?
      description: |
        Please provide a clear and concise description of the issue you are encountering, and a reproduction of your configuration.
        If your request is for a new feature, please use the `Feature request` template.
      value: |
        - [ ] ✋ I have searched the open/closed issues and my issue is not listed.
    validations:
      required: true
  - type: textarea
    id: reproduce
    attributes:
      label: Reproduction Code
      description: Steps to reproduce the behavior.
  - type: textarea
    id: expected
    attributes:
      label: Expected behavior
      description: A clear and concise description of what you expected to happen.
  - type: textarea
    id: actual
    attributes:
      label: Actual behavior
      description: A clear and concise description of what actually happened.
  - type: textarea
    id: environment
    attributes:
      label: Environment & Versions
      value: |
        - Kubernetes Version:
        - Spark Operator Version:
        - Apache Spark Version:
  - type: textarea
    id: context
    attributes:
      label: Additional context
      description: Add any other context about the problem here.
  - type: input
    id: votes
    attributes:
      label: Impacted by this bug?
      value: Give it a 👍 We prioritize the issues with most 👍

.github/ISSUE_TEMPLATE/config.yaml (new file)

@@ -0,0 +1,9 @@
blank_issues_enabled: true
contact_links:
  - name: Spark Operator Documentation
    url: https://www.kubeflow.org/docs/components/spark-operator
    about: Much help can be found in the docs
  - name: Spark Operator Slack Channel
    url: https://app.slack.com/client/T08PSQ7BQ/C074588U7EG
    about: Ask questions about the Spark Operator


@@ -1,32 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: '[FEATURE] Brief description of the feature'
labels: enhancement
---
<!--- Please keep this note for the community --->
### Community Note
* Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to help the community and maintainers prioritize this request
* Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
* If you are interested in working on this issue or have submitted a pull request, please leave a comment
<!--- Thank you for keeping this note for the community --->
#### What is the outcome that you are trying to reach?
<!-- A clear and concise description of what the problem is. -->
#### Describe the solution you would like
<!-- A clear and concise description of what you want to happen. -->
#### Describe alternatives you have considered
<!-- A clear and concise description of any alternative solutions or features you've considered. -->
#### Additional context
<!-- Add any other context or screenshots about the feature request here. -->


@ -0,0 +1,47 @@
name: Feature Request
description: Suggest an idea for the Spark operator.
labels:
- kind/feature
- lifecycle/needs-triage
body:
- type: markdown
attributes:
value: |
Thanks for taking the time to fill out this Spark operator feature request!
- type: markdown
attributes:
value: |
- Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to help the community and maintainers prioritize this request.
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
- type: textarea
id: feature
attributes:
label: What feature would you like to be added?
description: |
A clear and concise description of what you want to add to the Spark operator.
Please consider writing a Spark operator enhancement proposal if it is a large feature request.
validations:
required: true
- type: textarea
id: rationale
attributes:
label: Why is this needed?
- type: textarea
id: solution
attributes:
label: Describe the solution you would like
- type: textarea
id: alternatives
attributes:
label: Describe alternatives you have considered
- type: textarea
id: context
attributes:
label: Additional context
description: Add any other context or screenshots about the feature request here.
- type: input
id: votes
attributes:
label: Love this feature?
value: Give it a 👍. We prioritize the features with the most 👍.


@ -1,20 +0,0 @@
---
name: Question
about: I have a Question
title: '[QUESTION] Brief description of the Question'
labels: question
---
- [ ] ✋ I have searched the open/closed issues and my issue is not listed.
#### Please describe your question here
<!-- Provide as much information as possible to explain your question -->
#### Provide a link to the example/module related to the question
<!-- Please provide the link to the example related to this question from this repo -->
#### Additional context
<!-- Add any other context or screenshots about the question here -->

.github/ISSUE_TEMPLATE/question.yaml vendored Normal file

@ -0,0 +1,30 @@
name: Question
description: Ask question about the Spark operator.
labels:
- kind/question
- lifecycle/needs-triage
body:
- type: markdown
attributes:
value: |
Thanks for taking the time to fill out this question!
- type: textarea
id: feature
attributes:
label: What question do you want to ask?
description: |
A clear and concise description of what you want to ask about the Spark operator.
value: |
- [ ] ✋ I have searched the open/closed issues and my issue is not listed.
validations:
required: true
- type: textarea
id: rationale
attributes:
label: Additional context
description: Add any other context or screenshots about the question here.
- type: input
id: votes
attributes:
label: Have the same question?
value: Give it a 👍. We prioritize the questions with the most 👍.


@ -1,18 +1,23 @@
### 🛑 Important:
Please open an issue to discuss significant work before you start. We appreciate your contributions and don't want your efforts to go to waste!
For guidelines on how to contribute, please review the [CONTRIBUTING.md](CONTRIBUTING.md) document.
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, check our contributor guidelines: https://www.kubeflow.org/docs/about/contributing
2. To know more about how to develop with the Spark operator, check the developer guide: https://www.kubeflow.org/docs/components/spark-operator/developer-guide/
3. If you want *faster* PR reviews, check how: https://git.k8s.io/community/contributors/guide/pull-requests.md#best-practices-for-faster-reviews
4. Please open an issue to discuss significant work before you start. We appreciate your contributions and don't want your efforts to go to waste!
-->
## Purpose of this PR
Provide a clear and concise description of the changes. Explain the motivation behind these changes and link to relevant issues or discussions.
<!-- Provide a clear and concise description of the changes. Explain the motivation behind these changes and link to relevant issues or discussions. -->
**Proposed changes:**
- <Change 1>
- <Change 2>
- <Change 3>
## Change Category
Indicate the type of change by marking the applicable boxes:
<!-- Indicate the type of change by marking the applicable boxes. -->
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] Feature (non-breaking change which adds functionality)
@ -23,9 +28,9 @@ Indicate the type of change by marking the applicable boxes:
<!-- Provide reasoning for the changes if not already covered in the description above. -->
## Checklist
Before submitting your PR, please review the following:
<!-- Before submitting your PR, please review the following: -->
- [ ] I have conducted a self-review of my own code.
- [ ] I have updated documentation accordingly.
@ -35,4 +40,3 @@ Before submitting your PR, please review the following:
### Additional Notes
<!-- Include any additional notes or context that could be helpful for the reviewers here. -->


@ -47,6 +47,10 @@ jobs:
false
fi
- name: Verify Codegen
run: |
make verify-codegen
- name: Run go fmt check
run: |
make go-fmt
@ -110,22 +114,6 @@ jobs:
- name: Build Spark operator
run: make build-operator
build-sparkctl:
runs-on: ubuntu-latest
steps:
- name: Checkout source code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version-file: go.mod
- name: Build sparkctl
run: make build-sparkctl
build-helm-chart:
runs-on: ubuntu-latest
steps:
@ -196,7 +184,7 @@ jobs:
- name: setup minikube
if: steps.list-changed.outputs.changed == 'true'
uses: manusa/actions-setup-minikube@v2.13.1
uses: manusa/actions-setup-minikube@v2.14.0
with:
minikube version: v1.33.0
kubernetes version: v1.30.0
@ -206,8 +194,8 @@ jobs:
- name: Run chart-testing (install)
if: steps.list-changed.outputs.changed == 'true'
run: |
docker build -t docker.io/kubeflow/spark-operator:local .
minikube image load docker.io/kubeflow/spark-operator:local
docker build -t ghcr.io/kubeflow/spark-operator/controller:local .
minikube image load ghcr.io/kubeflow/spark-operator/controller:local
ct install --target-branch ${{ steps.get_branch.outputs.BRANCH }}
e2e-test:


@ -13,8 +13,8 @@ concurrency:
env:
SEMVER_PATTERN: '^v([0-9]+)\.([0-9]+)\.([0-9]+)(-rc\.([0-9]+))?$'
IMAGE_REGISTRY: docker.io
IMAGE_REPOSITORY: kubeflow/spark-operator
IMAGE_REGISTRY: ghcr.io
IMAGE_REPOSITORY: kubeflow/spark-operator/controller
jobs:
check-release:
@ -65,51 +65,9 @@ jobs:
echo "Tag '${VERSION}' does not exist."
fi
release_sparkctl:
needs:
- check-release
runs-on: ubuntu-latest
strategy:
fail-fast: true
matrix:
os:
- linux
- darwin
arch:
- amd64
- arm64
env:
GOOS: ${{ matrix.os }}
GOARCH: ${{ matrix.arch }}
steps:
- name: Checkout source code
uses: actions/checkout@v4
- name: Read version from VERSION file
run: |
VERSION=$(cat VERSION | sed "s/^v//")
echo "VERSION=${VERSION}" >> $GITHUB_ENV
- name: Build sparkctl binary
run: |
make build-sparkctl
tar -czvf sparkctl-${VERSION}-${GOOS}-${GOARCH}.tgz -C bin sparkctl
- name: Upload sparkctl binary
uses: actions/upload-artifact@v4
with:
name: sparkctl-${{ env.VERSION }}-${{ env.GOOS }}-${{ env.GOARCH }}
path: sparkctl-${{ env.VERSION }}-${{ env.GOOS }}-${{ env.GOARCH }}.tgz
if-no-files-found: error
retention-days: 1
build_images:
needs:
- release_sparkctl
- check-release
runs-on: ubuntu-latest
@ -152,8 +110,8 @@ jobs:
uses: docker/login-action@v3
with:
registry: ${{ env.IMAGE_REGISTRY }}
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push by digest
id: build
@ -214,8 +172,8 @@ jobs:
uses: docker/login-action@v3
with:
registry: ${{ env.IMAGE_REGISTRY }}
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Create manifest list and push
working-directory: /tmp/digests
@ -288,11 +246,6 @@ jobs:
helm package charts/${chart}
done
- name: Download artifacts
uses: actions/download-artifact@v4
with:
pattern: sparkctl-*
- name: Release
id: release
uses: softprops/action-gh-release@v2
@ -305,4 +258,3 @@ jobs:
draft: true
files: |
*.tgz
sparkctl-*/sparkctl-*.tgz


@ -15,7 +15,7 @@ jobs:
run: make print-IMAGE >> $GITHUB_ENV
- name: trivy scan for github security tab
uses: aquasecurity/trivy-action@0.30.0
uses: aquasecurity/trivy-action@0.32.0
with:
image-ref: '${{ env.IMAGE }}'
format: 'sarif'

.gitignore vendored

@ -1,11 +1,7 @@
bin/
vendor/
cover.out
sparkctl/sparkctl
sparkctl/sparkctl-linux-amd64
sparkctl/sparkctl-darwin-amd64
**/*.iml
# Various IDEs
.idea/
.vscode/
.vscode/
bin/
codecov.yaml
cover.out
.DS_Store
*.iml


@ -1,6 +1,9 @@
version: "2"
run:
# Timeout for analysis, e.g. 30s, 5m.
# Default: 1m
# Timeout for total work, e.g. 30s, 5m, 5m30s.
# If the value is lower or equal to 0, the timeout is disabled.
# Default: 0 (disabled)
timeout: 2m
linters:
@ -13,8 +16,6 @@ linters:
- dupword
# Tool for detection of FIXME, TODO and other comment keywords.
# - godox
# Check import statements are formatted according to the 'goimport' command.
- goimports
# Enforces consistent import aliases.
- importas
# Find code that shadows one of Go's predeclared identifiers.
@ -26,15 +27,28 @@ linters:
# Checks Go code for unused constants, variables, functions and types.
- unused
settings:
importas:
# List of aliases
alias:
- pkg: k8s.io/api/admissionregistration/v1
alias: admissionregistrationv1
- pkg: k8s.io/api/apps/v1
alias: appsv1
- pkg: k8s.io/api/batch/v1
alias: batchv1
- pkg: k8s.io/api/core/v1
alias: corev1
- pkg: k8s.io/api/extensions/v1beta1
alias: extensionsv1beta1
- pkg: k8s.io/api/networking/v1
alias: networkingv1
- pkg: k8s.io/apimachinery/pkg/apis/meta/v1
alias: metav1
- pkg: sigs.k8s.io/controller-runtime
alias: ctrl
issues:
# Which dirs to exclude: issues from them won't be reported.
# Can use regexp here: `generated.*`, regexp is applied on full path,
# including the path prefix if one is set.
# Default dirs are skipped independently of this option's value (see exclude-dirs-use-default).
# "/" will be replaced by current OS file path separator to properly work on Windows.
# Default: []
exclude-dirs:
- sparkctl
# Maximum issues count per one linter.
# Set to 0 to disable.
# Default: 50
@ -44,23 +58,8 @@ issues:
# Default: 3
max-same-issues: 3
linters-settings:
importas:
# List of aliases
alias:
- pkg: k8s.io/api/admissionregistration/v1
alias: admissionregistrationv1
- pkg: k8s.io/api/apps/v1
alias: appsv1
- pkg: k8s.io/api/batch/v1
alias: batchv1
- pkg: k8s.io/api/core/v1
alias: corev1
- pkg: k8s.io/api/extensions/v1beta1
alias: extensionsv1beta1
- pkg: k8s.io/api/networking/v1
alias: networkingv1
- pkg: k8s.io/apimachinery/pkg/apis/meta/v1
alias: metav1
- pkg: sigs.k8s.io/controller-runtime
alias: ctrl
formatters:
enable:
# Check import statements are formatted according to the 'goimport' command.
- goimports


@ -5,6 +5,7 @@ Below are the adopters of project Spark Operator. If you are using Spark Operato
| Organization | Contact (GitHub User Name) | Environment | Description of Use |
| ------------- | ------------- | ------------- | ------------- |
| [Alibaba Cloud](https://www.alibabacloud.com) | [@ChenYi015](https://github.com/ChenYi015) | Production | AI & Data Infrastructure |
| [APRA AMCOS](https://www.apraamcos.com.au/) | @shuch3ng | Production | Data Platform |
| [Beeline](https://beeline.ru) | @spestua | Evaluation | ML & Data Infrastructure |
| Bringg | @EladDolev | Production | ML & Analytics Data Platform |
| [Caicloud](https://intl.caicloud.io/) | @gaocegege | Production | Cloud-Native AI Platform |
@ -32,6 +33,7 @@ Below are the adopters of project Spark Operator. If you are using Spark Operato
| [Molex](https://www.molex.com/) | @AshishPushpSingh | Evaluation/Production | Data Platform |
| [MongoDB](https://www.mongodb.com) | @chickenpopcorn | Production | Data Infrastructure |
| Nielsen Identity Engine | @roitvt | Evaluation | Data pipelines |
| [Ninja Van](https://tech.ninjavan.co/) | @hongshaoyang | Production | Data Infrastructure |
| [PUBG](https://careers.pubg.com/#/en/) | @jacobhjkim | Production | ML & Data Infrastructure |
| [Qualytics](https://www.qualytics.co/) | @josecsotomorales | Production | Data Quality Platform |
| Riskified | @henbh | Evaluation | Analytics Data Platform |


@ -1,5 +1,171 @@
# Changelog
## [v2.2.1](https://github.com/kubeflow/spark-operator/tree/v2.2.1) (2025-06-27)
### Features
- Customize ingress URL with Spark application ID ([#2554](https://github.com/kubeflow/spark-operator/pull/2554) by [@ChenYi015](https://github.com/ChenYi015))
- Make default ingress tls and annotations configurable in the helm config ([#2513](https://github.com/kubeflow/spark-operator/pull/2513) by [@Tom-Newton](https://github.com/Tom-Newton))
- Use code-generator for clientset, informers, listers ([#2563](https://github.com/kubeflow/spark-operator/pull/2563) by [@jbhalodia-slack](https://github.com/jbhalodia-slack))
### Misc
- add driver ingress unit tests ([#2552](https://github.com/kubeflow/spark-operator/pull/2552) by [@nabuskey](https://github.com/nabuskey))
- Get logger from context ([#2551](https://github.com/kubeflow/spark-operator/pull/2551) by [@ChenYi015](https://github.com/ChenYi015))
- Update golangci lint ([#2560](https://github.com/kubeflow/spark-operator/pull/2560) by [@joshuacuellar1](https://github.com/joshuacuellar1))
### Dependencies
- Bump aquasecurity/trivy-action from 0.30.0 to 0.31.0 ([#2557](https://github.com/kubeflow/spark-operator/pull/2557) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/prometheus/client_golang from 1.21.1 to 1.22.0 ([#2548](https://github.com/kubeflow/spark-operator/pull/2548) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump sigs.k8s.io/scheduler-plugins from 0.30.6 to 0.31.8 ([#2549](https://github.com/kubeflow/spark-operator/pull/2549) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/mod from 0.24.0 to 0.25.0 ([#2566](https://github.com/kubeflow/spark-operator/pull/2566) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/go-logr/logr from 1.4.2 to 1.4.3 ([#2567](https://github.com/kubeflow/spark-operator/pull/2567) by [@dependabot[bot]](https://github.com/apps/dependabot))
## [v2.2.0](https://github.com/kubeflow/spark-operator/tree/v2.2.0) (2025-05-29)
### Features
- Upgrade to Spark 3.5.5 ([#2490](https://github.com/kubeflow/spark-operator/pull/2490) by [@jacobsalway](https://github.com/jacobsalway))
- Add timeZone to ScheduledSparkApplication ([#2471](https://github.com/kubeflow/spark-operator/pull/2471) by [@jacobsalway](https://github.com/jacobsalway))
- Enable the override of MemoryLimit through webhook ([#2478](https://github.com/kubeflow/spark-operator/pull/2478) by [@danielrsfreitas](https://github.com/danielrsfreitas))
- Add ShuffleTrackingEnabled to DynamicAllocation struct to allow disabling shuffle tracking ([#2511](https://github.com/kubeflow/spark-operator/pull/2511) by [@jbhalodia-slack](https://github.com/jbhalodia-slack))
- Define SparkApplicationSubmitter interface to allow customizing submitting mechanism ([#2500](https://github.com/kubeflow/spark-operator/pull/2500) by [@ChenYi015](https://github.com/ChenYi015))
- Add support for using cert manager to generate webhook certificates ([#2373](https://github.com/kubeflow/spark-operator/pull/2373) by [@ChenYi015](https://github.com/ChenYi015))
### Bug Fixes
- fix: add webhook cert validity checking ([#2489](https://github.com/kubeflow/spark-operator/pull/2489) by [@teejaded](https://github.com/teejaded))
- fix and add back unit tests ([#2532](https://github.com/kubeflow/spark-operator/pull/2532) by [@nabuskey](https://github.com/nabuskey))
- fix volcano tests ([#2533](https://github.com/kubeflow/spark-operator/pull/2533) by [@nabuskey](https://github.com/nabuskey))
- Add v2 to module path ([#2515](https://github.com/kubeflow/spark-operator/pull/2515) by [@ChenYi015](https://github.com/ChenYi015))
- #2525 spark metrics in depends on prometheus ([#2529](https://github.com/kubeflow/spark-operator/pull/2529) by [@blcksrx](https://github.com/blcksrx))
### Misc
- Add APRA AMCOS to adopters ([#2485](https://github.com/kubeflow/spark-operator/pull/2485) by [@shuch3ng](https://github.com/shuch3ng))
- Bump github.com/stretchr/testify from 1.9.0 to 1.10.0 ([#2488](https://github.com/kubeflow/spark-operator/pull/2488) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/prometheus/client_golang from 1.20.5 to 1.21.1 ([#2487](https://github.com/kubeflow/spark-operator/pull/2487) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump sigs.k8s.io/controller-runtime from 0.20.1 to 0.20.4 ([#2486](https://github.com/kubeflow/spark-operator/pull/2486) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Deprecating sparkctl ([#2484](https://github.com/kubeflow/spark-operator/pull/2484) by [@vikas-saxena02](https://github.com/vikas-saxena02))
- Changing image repo from docker.io to ghcr.io ([#2483](https://github.com/kubeflow/spark-operator/pull/2483) by [@vikas-saxena02](https://github.com/vikas-saxena02))
- Upgrade Golang to 1.24.1 and golangci-lint to 1.64.8 ([#2494](https://github.com/kubeflow/spark-operator/pull/2494) by [@jacobsalway](https://github.com/jacobsalway))
- Bump helm.sh/helm/v3 from 3.16.2 to 3.17.3 ([#2503](https://github.com/kubeflow/spark-operator/pull/2503) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Add changelog for v2.1.1 ([#2504](https://github.com/kubeflow/spark-operator/pull/2504) by [@ChenYi015](https://github.com/ChenYi015))
- Remove sparkctl ([#2466](https://github.com/kubeflow/spark-operator/pull/2466) by [@ChenYi015](https://github.com/ChenYi015))
- Bump github.com/spf13/viper from 1.19.0 to 1.20.1 ([#2496](https://github.com/kubeflow/spark-operator/pull/2496) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/net from 0.37.0 to 0.38.0 ([#2505](https://github.com/kubeflow/spark-operator/pull/2505) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Remove clientset, informer and listers generated by code-generator ([#2506](https://github.com/kubeflow/spark-operator/pull/2506) by [@ChenYi015](https://github.com/ChenYi015))
- Remove v1beta1 API ([#2516](https://github.com/kubeflow/spark-operator/pull/2516) by [@ChenYi015](https://github.com/ChenYi015))
- add unit tests for driver and executor configs ([#2521](https://github.com/kubeflow/spark-operator/pull/2521) by [@nabuskey](https://github.com/nabuskey))
- Adding securityContext to spark examples ([#2530](https://github.com/kubeflow/spark-operator/pull/2530) by [@tarekabouzeid](https://github.com/tarekabouzeid))
- Bump github.com/spf13/cobra from 1.8.1 to 1.9.1 ([#2497](https://github.com/kubeflow/spark-operator/pull/2497) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/mod from 0.23.0 to 0.24.0 ([#2495](https://github.com/kubeflow/spark-operator/pull/2495) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Adding Manabu to the reviewers ([#2522](https://github.com/kubeflow/spark-operator/pull/2522) by [@vara-bonthu](https://github.com/vara-bonthu))
- Bump manusa/actions-setup-minikube from 2.13.1 to 2.14.0 ([#2523](https://github.com/kubeflow/spark-operator/pull/2523) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump k8s.io dependencies to v0.32.5 ([#2540](https://github.com/kubeflow/spark-operator/pull/2540) by [@ChenYi015](https://github.com/ChenYi015))
- Pass the correct LDFLAGS when building the operator image ([#2541](https://github.com/kubeflow/spark-operator/pull/2541) by [@ChenYi015](https://github.com/ChenYi015))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/v2.1.1...v2.2.0)
## [v2.1.1](https://github.com/kubeflow/spark-operator/tree/v2.1.1) (2025-03-21)
### Features
- Adding seccompProfile RuntimeDefault ([#2397](https://github.com/kubeflow/spark-operator/pull/2397) by [@tarekabouzeid](https://github.com/tarekabouzeid))
- Add option for disabling leader election ([#2423](https://github.com/kubeflow/spark-operator/pull/2423) by [@ChenYi015](https://github.com/ChenYi015))
- Controller should only be granted event permissions in spark job namespaces ([#2426](https://github.com/kubeflow/spark-operator/pull/2426) by [@ChenYi015](https://github.com/ChenYi015))
- Make image optional ([#2439](https://github.com/kubeflow/spark-operator/pull/2439) by [@jbhalodia-slack](https://github.com/jbhalodia-slack))
- Support non-standard Spark container names ([#2441](https://github.com/kubeflow/spark-operator/pull/2441) by [@jbhalodia-slack](https://github.com/jbhalodia-slack))
- add support for metrics-job-start-latency-buckets flag in helm ([#2450](https://github.com/kubeflow/spark-operator/pull/2450) by [@nabuskey](https://github.com/nabuskey))
### Bug Fixes
- fix: webhook fail to add lifecycle to Spark3 executor pods ([#2458](https://github.com/kubeflow/spark-operator/pull/2458) by [@pvbouwel](https://github.com/pvbouwel))
- change env in executorSecretOption ([#2467](https://github.com/kubeflow/spark-operator/pull/2467) by [@TQJADE](https://github.com/TQJADE))
### Misc
- Move sparkctl to cmd directory ([#2347](https://github.com/kubeflow/spark-operator/pull/2347) by [@ChenYi015](https://github.com/ChenYi015))
- Bump golang.org/x/net from 0.30.0 to 0.32.0 ([#2350](https://github.com/kubeflow/spark-operator/pull/2350) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/crypto from 0.30.0 to 0.31.0 ([#2365](https://github.com/kubeflow/spark-operator/pull/2365) by [@dependabot[bot]](https://github.com/apps/dependabot))
- add an example of using prometheus servlet ([#2403](https://github.com/kubeflow/spark-operator/pull/2403) by [@nabuskey](https://github.com/nabuskey))
- Remove dependency on `k8s.io/kubernetes` ([#2398](https://github.com/kubeflow/spark-operator/pull/2398) by [@jacobsalway](https://github.com/jacobsalway))
- fix make deploy and install ([#2412](https://github.com/kubeflow/spark-operator/pull/2412) by [@nabuskey](https://github.com/nabuskey))
- Add helm unittest step to integration test workflow ([#2424](https://github.com/kubeflow/spark-operator/pull/2424) by [@ChenYi015](https://github.com/ChenYi015))
- ensure passed context is used ([#2432](https://github.com/kubeflow/spark-operator/pull/2432) by [@nabuskey](https://github.com/nabuskey))
- Bump manusa/actions-setup-minikube from 2.13.0 to 2.13.1 ([#2390](https://github.com/kubeflow/spark-operator/pull/2390) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump helm/chart-testing-action from 2.6.1 to 2.7.0 ([#2391](https://github.com/kubeflow/spark-operator/pull/2391) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/mod from 0.21.0 to 0.23.0 ([#2427](https://github.com/kubeflow/spark-operator/pull/2427) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/golang/glog from 1.2.2 to 1.2.4 ([#2411](https://github.com/kubeflow/spark-operator/pull/2411) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/net from 0.32.0 to 0.35.0 ([#2428](https://github.com/kubeflow/spark-operator/pull/2428) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Support Kubernetes 1.32 ([#2416](https://github.com/kubeflow/spark-operator/pull/2416) by [@jacobsalway](https://github.com/jacobsalway))
- use cmd context in sparkctl ([#2447](https://github.com/kubeflow/spark-operator/pull/2447) by [@nabuskey](https://github.com/nabuskey))
- Bump golang.org/x/net from 0.35.0 to 0.36.0 ([#2470](https://github.com/kubeflow/spark-operator/pull/2470) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump aquasecurity/trivy-action from 0.29.0 to 0.30.0 ([#2475](https://github.com/kubeflow/spark-operator/pull/2475) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/net from 0.35.0 to 0.37.0 ([#2472](https://github.com/kubeflow/spark-operator/pull/2472) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/containerd/containerd from 1.7.19 to 1.7.27 ([#2476](https://github.com/kubeflow/spark-operator/pull/2476) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump k8s.io/apimachinery from 0.32.0 to 0.32.3 ([#2474](https://github.com/kubeflow/spark-operator/pull/2474) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.66.0 to 1.78.2 ([#2473](https://github.com/kubeflow/spark-operator/pull/2473) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/aws/aws-sdk-go-v2/config from 1.28.0 to 1.29.9 ([#2463](https://github.com/kubeflow/spark-operator/pull/2463) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump sigs.k8s.io/scheduler-plugins from 0.29.8 to 0.30.6 ([#2444](https://github.com/kubeflow/spark-operator/pull/2444) by [@dependabot[bot]](https://github.com/apps/dependabot))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/v2.1.0...v2.1.1)
## [v2.1.0](https://github.com/kubeflow/spark-operator/tree/v2.1.0) (2024-12-06)
### New Features
- Upgrade to Spark 3.5.3 ([#2202](https://github.com/kubeflow/spark-operator/pull/2202) by [@jacobsalway](https://github.com/jacobsalway))
- feat: support archives param for spark-submit ([#2256](https://github.com/kubeflow/spark-operator/pull/2256) by [@kaka-zb](https://github.com/kaka-zb))
- Allow --ingress-class-name to be specified in chart ([#2278](https://github.com/kubeflow/spark-operator/pull/2278) by [@jacobsalway](https://github.com/jacobsalway))
- Update default container security context ([#2265](https://github.com/kubeflow/spark-operator/pull/2265) by [@ChenYi015](https://github.com/ChenYi015))
- Support pod template for Spark 3.x applications ([#2141](https://github.com/kubeflow/spark-operator/pull/2141) by [@ChenYi015](https://github.com/ChenYi015))
- Allow setting automountServiceAccountToken ([#2298](https://github.com/kubeflow/spark-operator/pull/2298) by [@Aranch](https://github.com/Aransh))
- Allow the Controller and Webhook Containers to run with the securityContext: readOnlyRootfilesystem: true ([#2282](https://github.com/kubeflow/spark-operator/pull/2282) by [@npgretz](https://github.com/npgretz))
- Use NSS_WRAPPER_PASSWD instead of /etc/passwd as in spark-operator image entrypoint.sh ([#2312](https://github.com/kubeflow/spark-operator/pull/2312) by [@Aakcht](https://github.com/Aakcht))
### Bug Fixes
- Minor fixes to e2e test `make` targets ([#2242](https://github.com/kubeflow/spark-operator/pull/2242) by [@Tom-Newton](https://github.com/Tom-Newton))
- Added off heap memory to calculation for YuniKorn gang scheduling ([#2209](https://github.com/kubeflow/spark-operator/pull/2209) by [@guangyu-yang-rokt](https://github.com/guangyu-yang-rokt))
- Add permissions to controller serviceaccount to list and watch ingresses ([#2246](https://github.com/kubeflow/spark-operator/pull/2246) by [@tcassaert](https://github.com/tcassaert))
- Make sure enable-ui-service flag is set to false when controller.uiService.enable is set to false ([#2261](https://github.com/kubeflow/spark-operator/pull/2261) by [@Roberdvs](https://github.com/Roberdvs))
- `omitempty` corrections ([#2255](https://github.com/kubeflow/spark-operator/pull/2255) by [@Tom-Newton](https://github.com/Tom-Newton))
- Fix retries ([#2241](https://github.com/kubeflow/spark-operator/pull/2241) by [@Tom-Newton](https://github.com/Tom-Newton))
- Fix: executor container security context does not work ([#2306](https://github.com/kubeflow/spark-operator/pull/2306) by [@ChenYi015](https://github.com/ChenYi015))
- Fix: should not add emptyDir sizeLimit conf if it is nil ([#2305](https://github.com/kubeflow/spark-operator/pull/2305) by [@ChenYi015](https://github.com/ChenYi015))
- Fix: should not add emptyDir sizeLimit conf on executor pods if it is nil ([#2316](https://github.com/kubeflow/spark-operator/pull/2316) by [@Cian911](https://github.com/Cian911))
- Truncate UI service name if over 63 characters ([#2311](https://github.com/kubeflow/spark-operator/pull/2311) by [@jacobsalway](https://github.com/jacobsalway))
- The webhook-key-name command-line param isn't taking effect ([#2344](https://github.com/kubeflow/spark-operator/pull/2344) by [@c-h-afzal](https://github.com/c-h-afzal))
- Robustness to driver pod taking time to create ([#2315](https://github.com/kubeflow/spark-operator/pull/2315) by [@Tom-Newton](https://github.com/Tom-Newton))
### Misc
- remove redundant test.sh file ([#2243](https://github.com/kubeflow/spark-operator/pull/2243) by [@ChenYi015](https://github.com/ChenYi015))
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.42 to 1.27.43 ([#2252](https://github.com/kubeflow/spark-operator/pull/2252) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump manusa/actions-setup-minikube from 2.12.0 to 2.13.0 ([#2247](https://github.com/kubeflow/spark-operator/pull/2247) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/net from 0.29.0 to 0.30.0 ([#2251](https://github.com/kubeflow/spark-operator/pull/2251) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump aquasecurity/trivy-action from 0.24.0 to 0.27.0 ([#2248](https://github.com/kubeflow/spark-operator/pull/2248) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump gocloud.dev from 0.39.0 to 0.40.0 ([#2250](https://github.com/kubeflow/spark-operator/pull/2250) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Add Quick Start guide to README ([#2259](https://github.com/kubeflow/spark-operator/pull/2259) by [@jacobsalway](https://github.com/jacobsalway))
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.63.3 to 1.65.3 ([#2249](https://github.com/kubeflow/spark-operator/pull/2249) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Add release badge to README ([#2263](https://github.com/kubeflow/spark-operator/pull/2263) by [@jacobsalway](https://github.com/jacobsalway))
- Bump helm.sh/helm/v3 from 3.16.1 to 3.16.2 ([#2275](https://github.com/kubeflow/spark-operator/pull/2275) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/prometheus/client_golang from 1.20.4 to 1.20.5 ([#2274](https://github.com/kubeflow/spark-operator/pull/2274) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump cloud.google.com/go/storage from 1.44.0 to 1.45.0 ([#2273](https://github.com/kubeflow/spark-operator/pull/2273) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Run e2e tests with Kubernetes version matrix ([#2266](https://github.com/kubeflow/spark-operator/pull/2266) by [@jacobsalway](https://github.com/jacobsalway))
- Bump aquasecurity/trivy-action from 0.27.0 to 0.28.0 ([#2270](https://github.com/kubeflow/spark-operator/pull/2270) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.65.3 to 1.66.0 ([#2271](https://github.com/kubeflow/spark-operator/pull/2271) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.43 to 1.28.0 ([#2272](https://github.com/kubeflow/spark-operator/pull/2272) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Add workflow for releasing sparkctl binary ([#2264](https://github.com/kubeflow/spark-operator/pull/2264) by [@ChenYi015](https://github.com/ChenYi015))
- Bump `volcano.sh/apis` to 1.10.0 ([#2320](https://github.com/kubeflow/spark-operator/pull/2320) by [@jacobsalway](https://github.com/jacobsalway))
- Bump aquasecurity/trivy-action from 0.28.0 to 0.29.0 ([#2332](https://github.com/kubeflow/spark-operator/pull/2332) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/onsi/ginkgo/v2 from 2.20.2 to 2.22.0 ([#2335](https://github.com/kubeflow/spark-operator/pull/2335) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Move sparkctl to cmd directory ([#2347](https://github.com/kubeflow/spark-operator/pull/2347) by [@ChenYi015](https://github.com/ChenYi015))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/a8b5d6...v2.1.0)
## [v2.0.2](https://github.com/kubeflow/spark-operator/tree/v2.0.2) (2024-10-10)
### Bug Fixes


@ -14,9 +14,9 @@
# limitations under the License.
#
ARG SPARK_IMAGE=spark:3.5.3
ARG SPARK_IMAGE=docker.io/library/spark:4.0.0
FROM golang:1.23.1 AS builder
FROM golang:1.24.1 AS builder
WORKDIR /workspace


@ -21,7 +21,7 @@ GIT_TREE_STATE := $(shell if [ -z "`git status --porcelain`" ]; then echo "clean
GIT_SHA := $(shell git rev-parse --short HEAD || echo "HEAD")
GIT_VERSION := ${VERSION}+${GIT_SHA}
REPO := github.com/kubeflow/spark-operator
MODULE_PATH := $(shell awk '/^module/{print $$2; exit}' go.mod)
SPARK_OPERATOR_GOPATH := /go/src/github.com/kubeflow/spark-operator
SPARK_OPERATOR_CHART_PATH := charts/spark-operator-chart
DEP_VERSION := `grep DEP_VERSION= Dockerfile | awk -F\" '{print $$2}'`
@ -35,8 +35,8 @@ UNAME := `uname | tr '[:upper:]' '[:lower:]'`
CONTAINER_TOOL ?= docker
# Image URL to use all building/pushing image targets
IMAGE_REGISTRY ?= docker.io
IMAGE_REPOSITORY ?= kubeflow/spark-operator
IMAGE_REGISTRY ?= ghcr.io
IMAGE_REPOSITORY ?= kubeflow/spark-operator/controller
IMAGE_TAG ?= $(VERSION)
IMAGE ?= $(IMAGE_REGISTRY)/$(IMAGE_REPOSITORY):$(IMAGE_TAG)
@ -56,15 +56,15 @@ KIND_K8S_VERSION ?= v1.32.0
ENVTEST_VERSION ?= release-0.20
# ENVTEST_K8S_VERSION refers to the version of kubebuilder assets to be downloaded by envtest binary.
ENVTEST_K8S_VERSION ?= 1.32.0
GOLANGCI_LINT_VERSION ?= v1.61.0
GOLANGCI_LINT_VERSION ?= v2.1.6
GEN_CRD_API_REFERENCE_DOCS_VERSION ?= v0.3.0
HELM_VERSION ?= v3.15.3
HELM_UNITTEST_VERSION ?= 0.5.1
HELM_DOCS_VERSION ?= v1.14.2
CODE_GENERATOR_VERSION ?= v0.33.1
## Binaries
SPARK_OPERATOR ?= $(LOCALBIN)/spark-operator
SPARKCTL ?= $(LOCALBIN)/sparkctl
KUBECTL ?= kubectl
KUSTOMIZE ?= $(LOCALBIN)/kustomize-$(KUSTOMIZE_VERSION)
CONTROLLER_GEN ?= $(LOCALBIN)/controller-gen-$(CONTROLLER_TOOLS_VERSION)
@ -119,6 +119,14 @@ generate: controller-gen ## Generate code containing DeepCopy, DeepCopyInto, and
update-crd: manifests ## Update CRD files in the Helm chart.
cp config/crd/bases/* charts/spark-operator-chart/crds/
.PHONY: verify-codegen
verify-codegen: $(LOCALBIN) ## Install code-generator commands and verify changes
$(call go-install-tool,$(LOCALBIN)/register-gen-$(CODE_GENERATOR_VERSION),k8s.io/code-generator/cmd/register-gen,$(CODE_GENERATOR_VERSION))
$(call go-install-tool,$(LOCALBIN)/client-gen-$(CODE_GENERATOR_VERSION),k8s.io/code-generator/cmd/client-gen,$(CODE_GENERATOR_VERSION))
$(call go-install-tool,$(LOCALBIN)/lister-gen-$(CODE_GENERATOR_VERSION),k8s.io/code-generator/cmd/lister-gen,$(CODE_GENERATOR_VERSION))
$(call go-install-tool,$(LOCALBIN)/informer-gen-$(CODE_GENERATOR_VERSION),k8s.io/code-generator/cmd/informer-gen,$(CODE_GENERATOR_VERSION))
./hack/verify-codegen.sh
.PHONY: go-clean
go-clean: ## Clean up caches and output.
@echo "cleaning up caches and output"
@ -164,37 +172,26 @@ e2e-test: envtest ## Run the e2e tests against a Kind k8s instance that is spun
##@ Build
override LDFLAGS += \
-X ${REPO}.version=${GIT_VERSION} \
-X ${REPO}.buildDate=${BUILD_DATE} \
-X ${REPO}.gitCommit=${GIT_COMMIT} \
-X ${REPO}.gitTreeState=${GIT_TREE_STATE} \
-X ${MODULE_PATH}.version=${GIT_VERSION} \
-X ${MODULE_PATH}.buildDate=${BUILD_DATE} \
-X ${MODULE_PATH}.gitCommit=${GIT_COMMIT} \
-X ${MODULE_PATH}.gitTreeState=${GIT_TREE_STATE} \
-extldflags "-static"
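Aside (not part of the Makefile diff): the `-X` flags above overwrite package-level string variables in the module root package at link time. A minimal Go sketch of the variables those flags point at might look like the following; the variable names come from the `-X` targets, while the package name, placeholder values, and the `BuildInfo` helper are assumptions for illustration.

```go
// Sketch of the link-time build metadata variables targeted by the -X ldflags.
// In the repository these live in the module root package (${MODULE_PATH});
// the package name, default values, and BuildInfo helper here are hypothetical.
package sparkoperator

import "fmt"

var (
	version      = "unreleased" // overwritten by -X ${MODULE_PATH}.version=${GIT_VERSION}
	buildDate    = "unknown"    // overwritten by -X ${MODULE_PATH}.buildDate=${BUILD_DATE}
	gitCommit    = "unknown"    // overwritten by -X ${MODULE_PATH}.gitCommit=${GIT_COMMIT}
	gitTreeState = "unknown"    // overwritten by -X ${MODULE_PATH}.gitTreeState=${GIT_TREE_STATE}
)

// BuildInfo returns a human-readable summary of the injected build metadata.
func BuildInfo() string {
	return fmt.Sprintf("version=%s buildDate=%s gitCommit=%s gitTreeState=%s",
		version, buildDate, gitCommit, gitTreeState)
}
```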
.PHONY: build-operator
build-operator: ## Build Spark operator.
echo "Building spark-operator binary..."
go build -o $(SPARK_OPERATOR) -ldflags '${LDFLAGS}' cmd/operator/main.go
.PHONY: build-sparkctl
build-sparkctl: ## Build sparkctl binary.
echo "Building sparkctl binary..."
CGO_ENABLED=0 go build -o $(SPARKCTL) -buildvcs=false cmd/sparkctl/main.go
.PHONY: install-sparkctl
install-sparkctl: build-sparkctl ## Install sparkctl binary.
echo "Installing sparkctl binary to /usr/local/bin..."; \
sudo cp $(SPARKCTL) /usr/local/bin
CGO_ENABLED=0 go build -o $(SPARK_OPERATOR) -ldflags '${LDFLAGS}' cmd/operator/main.go
.PHONY: clean
clean: ## Clean spark-operator and sparktcl binaries.
clean: ## Clean binaries.
rm -f $(SPARK_OPERATOR)
rm -f $(SPARKCTL)
.PHONY: build-api-docs
build-api-docs: gen-crd-api-reference-docs ## Build api documentation.
$(GEN_CRD_API_REFERENCE_DOCS) \
-config hack/api-docs/config.json \
-api-dir github.com/kubeflow/spark-operator/api/v1beta2 \
-api-dir github.com/kubeflow/spark-operator/v2/api/v1beta2 \
-template-dir hack/api-docs/template \
-out-file docs/api-docs.md
@ -311,7 +308,7 @@ $(ENVTEST): $(LOCALBIN)
.PHONY: golangci-lint
golangci-lint: $(GOLANGCI_LINT) ## Download golangci-lint locally if necessary.
$(GOLANGCI_LINT): $(LOCALBIN)
$(call go-install-tool,$(GOLANGCI_LINT),github.com/golangci/golangci-lint/cmd/golangci-lint,${GOLANGCI_LINT_VERSION})
$(call go-install-tool,$(GOLANGCI_LINT),github.com/golangci/golangci-lint/v2/cmd/golangci-lint,${GOLANGCI_LINT_VERSION})
.PHONY: gen-crd-api-reference-docs
gen-crd-api-reference-docs: $(GEN_CRD_API_REFERENCE_DOCS) ## Download gen-crd-api-reference-docs locally if necessary.

OWNERS

@ -1,9 +1,10 @@
approvers:
- andreyvelich
- ChenYi015
- jacobsalway
- mwielgus
- yuchaoran2011
- vara-bonthu
- yuchaoran2011
reviewers:
- ImpSy
- jacobsalway
- nabuskey

PROJECT

@ -13,17 +13,9 @@ resources:
namespaced: true
controller: true
domain: sparkoperator.k8s.io
kind: SparkApplication
path: github.com/kubeflow/spark-operator/api/v1beta1
version: v1beta1
- api:
crdVersion: v1
namespaced: true
controller: true
domain: sparkoperator.k8s.io
kind: ScheduledSparkApplication
path: github.com/kubeflow/spark-operator/api/v1beta1
version: v1beta1
kind: SparkConnect
path: github.com/kubeflow/spark-operator/api/v1alpha1
version: v1alpha1
- api:
crdVersion: v1
namespaced: true


@ -3,6 +3,7 @@
[![Integration Test](https://github.com/kubeflow/spark-operator/actions/workflows/integration.yaml/badge.svg)](https://github.com/kubeflow/spark-operator/actions/workflows/integration.yaml)
[![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/spark-operator)](https://goreportcard.com/report/github.com/kubeflow/spark-operator)
[![GitHub release](https://img.shields.io/github/v/release/kubeflow/spark-operator)](https://github.com/kubeflow/spark-operator/releases)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/10524/badge)](https://www.bestpractices.dev/projects/10524)
## What is Spark Operator?
@ -15,18 +16,22 @@ For a more detailed guide, please refer to the [Getting Started guide](https://w
```bash
# Add the Helm repository
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
helm repo add --force-update spark-operator https://kubeflow.github.io/spark-operator
# Install the operator into the spark-operator namespace and wait for deployments to be ready
helm install spark-operator spark-operator/spark-operator \
--namespace spark-operator --create-namespace --wait
--namespace spark-operator \
--create-namespace \
--wait
# Create an example application in the default namespace
kubectl apply -f https://raw.githubusercontent.com/kubeflow/spark-operator/refs/heads/master/examples/spark-pi.yaml
# Get the status of the application
kubectl get sparkapp spark-pi
# Delete the application
kubectl delete sparkapp spark-pi
```
## Overview
@ -43,8 +48,6 @@ The Kubernetes Operator for Apache Spark currently supports the following list o
* Supports automatic application re-submission for updated `SparkApplication` objects with updated specification.
* Supports automatic application restart with a configurable restart policy.
* Supports automatic retries of failed submissions with optional linear back-off.
* Supports mounting local Hadoop configuration as a Kubernetes ConfigMap automatically via `sparkctl`.
* Supports automatically staging local application dependencies to Google Cloud Storage (GCS) via `sparkctl`.
* Supports collecting and exporting application-level metrics and driver/executor metrics to Prometheus.
## Project Status
@ -77,6 +80,8 @@ The following table lists the most recent few versions of the operator.
| Operator Version | API Version | Kubernetes Version | Base Spark Version |
|-----------------------|-------------|--------------------|--------------------|
| `v2.2.x` | `v1beta2` | 1.16+ | `3.5.5` |
| `v2.1.x` | `v1beta2` | 1.16+ | `3.5.3` |
| `v2.0.x` | `v1beta2` | 1.16+ | `3.5.2` |
| `v1beta2-1.6.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.5.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |


@ -1 +1 @@
v2.1.1
v2.2.1


@ -0,0 +1,82 @@
/*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package v1alpha1
// DeployMode describes the type of deployment of a Spark application.
type DeployMode string
// Different types of deployments.
const (
DeployModeCluster DeployMode = "cluster"
DeployModeClient DeployMode = "client"
)
// DriverState tells the current state of a spark driver.
type DriverState string
// Different states a spark driver may have.
const (
DriverStatePending DriverState = "PENDING"
DriverStateRunning DriverState = "RUNNING"
DriverStateCompleted DriverState = "COMPLETED"
DriverStateFailed DriverState = "FAILED"
DriverStateUnknown DriverState = "UNKNOWN"
)
// ExecutorState tells the current state of an executor.
type ExecutorState string
// Different states an executor may have.
const (
ExecutorStatePending ExecutorState = "PENDING"
ExecutorStateRunning ExecutorState = "RUNNING"
ExecutorStateCompleted ExecutorState = "COMPLETED"
ExecutorStateFailed ExecutorState = "FAILED"
ExecutorStateUnknown ExecutorState = "UNKNOWN"
)
// DynamicAllocation contains configuration options for dynamic allocation.
type DynamicAllocation struct {
// Enabled controls whether dynamic allocation is enabled or not.
// +optional
Enabled bool `json:"enabled,omitempty"`
// InitialExecutors is the initial number of executors to request. If .spec.executor.instances
// is also set, the initial number of executors is set to the bigger of that and this option.
// +optional
InitialExecutors *int32 `json:"initialExecutors,omitempty"`
// MinExecutors is the lower bound for the number of executors if dynamic allocation is enabled.
// +optional
MinExecutors *int32 `json:"minExecutors,omitempty"`
// MaxExecutors is the upper bound for the number of executors if dynamic allocation is enabled.
// +optional
MaxExecutors *int32 `json:"maxExecutors,omitempty"`
// ShuffleTrackingEnabled enables shuffle file tracking for executors, which allows dynamic allocation without
// the need for an external shuffle service. This option will try to keep alive executors that are storing
// shuffle data for active jobs. If external shuffle service is enabled, set ShuffleTrackingEnabled to false.
// ShuffleTrackingEnabled is true by default if dynamicAllocation.enabled is true.
// +optional
ShuffleTrackingEnabled *bool `json:"shuffleTrackingEnabled,omitempty"`
// ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
// shuffle data if shuffle tracking is enabled (true by default if dynamic allocation is enabled).
// +optional
ShuffleTrackingTimeout *int64 `json:"shuffleTrackingTimeout,omitempty"`
}
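As a hedged aside (not part of the new file), the interplay described in the field comments above can be seen in a small Go sketch: the effective initial executor count is the larger of `InitialExecutors` and `spec.executor.instances`, and shuffle tracking defaults to on when dynamic allocation is enabled. The import path and the use of `k8s.io/utils/ptr` are assumptions made for a self-contained example.

```go
package main

import (
	"fmt"

	"k8s.io/utils/ptr"

	// Import path assumed from the module layout; adjust to the actual api package.
	"github.com/kubeflow/spark-operator/v2/api/v1alpha1"
)

func main() {
	// Hypothetical values: request 2 executors initially, scale between 1 and 10,
	// and rely on shuffle tracking instead of an external shuffle service.
	da := v1alpha1.DynamicAllocation{
		Enabled:                true,
		InitialExecutors:       ptr.To[int32](2),
		MinExecutors:           ptr.To[int32](1),
		MaxExecutors:           ptr.To[int32](10),
		ShuffleTrackingEnabled: ptr.To(true),
		ShuffleTrackingTimeout: ptr.To[int64](60000), // milliseconds
	}
	fmt.Printf("%+v\n", da)
}
```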


@ -1,7 +1,5 @@
// Code generated by k8s code-generator DO NOT EDIT.
/*
Copyright 2018 Google LLC
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@ -16,7 +14,8 @@ See the License for the specific language governing permissions and
limitations under the License.
*/
// Code generated by client-gen. DO NOT EDIT.
package v1alpha1
// Package fake has the automatically generated clients.
package fake
// SetSparkConnectDefaults sets default values for certain fields of a SparkConnect.
func SetSparkConnectDefaults(conn *SparkConnect) {
}


@ -1,5 +1,5 @@
/*
Copyright 2017 Google LLC
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@ -16,6 +16,7 @@ limitations under the License.
// +k8s:deepcopy-gen=package,register
// Package v1beta1 is the v1beta1 version of the API.
// Package v1alpha1 is the v1alpha1 version of the API.
// +groupName=sparkoperator.k8s.io
package v1beta1
// +versionName=v1alpha1
package v1alpha1


@ -1,5 +1,5 @@
/*
Copyright 2024 The Kubeflow authors.
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@ -14,10 +14,10 @@ See the License for the specific language governing permissions and
limitations under the License.
*/
// Package v1beta1 contains API Schema definitions for the v1beta1 API group
// Package v1alpha1 contains API Schema definitions for the v1alpha1 API group
// +kubebuilder:object:generate=true
// +groupName=sparkoperator.k8s.io
package v1beta1
package v1alpha1
import (
"k8s.io/apimachinery/pkg/runtime/schema"
@ -26,7 +26,7 @@ import (
var (
// GroupVersion is group version used to register these objects.
GroupVersion = schema.GroupVersion{Group: "sparkoperator.k8s.io", Version: "v1beta1"}
GroupVersion = schema.GroupVersion{Group: "sparkoperator.k8s.io", Version: "v1alpha1"}
// SchemeBuilder is used to add go types to the GroupVersionKind scheme.
SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}


@ -1,5 +1,5 @@
/*
Copyright 2024 The Kubeflow authors.
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@ -14,7 +14,7 @@ See the License for the specific language governing permissions and
limitations under the License.
*/
package v1beta1
package v1alpha1
import (
"k8s.io/apimachinery/pkg/runtime/schema"
@ -22,7 +22,7 @@ import (
const (
Group = "sparkoperator.k8s.io"
Version = "v1beta1"
Version = "v1alpha1"
)
// SchemeGroupVersion is the group version used to register these objects.


@ -0,0 +1,185 @@
/*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package v1alpha1
import (
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func init() {
SchemeBuilder.Register(&SparkConnect{}, &SparkConnectList{})
}
// +kubebuilder:object:root=true
// +kubebuilder:metadata:annotations="api-approved.kubernetes.io=https://github.com/kubeflow/spark-operator/pull/1298"
// +kubebuilder:resource:scope=Namespaced,shortName=sparkconn,singular=sparkconnect
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:JSONPath=.metadata.creationTimestamp,name=Age,type=date
// SparkConnect is the Schema for the sparkconnections API.
type SparkConnect struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata"`
Spec SparkConnectSpec `json:"spec"`
Status SparkConnectStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// SparkConnectList contains a list of SparkConnect.
type SparkConnectList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []SparkConnect `json:"items"`
}
// SparkConnectSpec defines the desired state of SparkConnect.
type SparkConnectSpec struct {
// SparkVersion is the version of Spark that the Spark Connect server uses.
SparkVersion string `json:"sparkVersion"`
// Image is the container image for the driver, executor, and init-container. Any custom container images for the
// driver, executor, or init-container take precedence over this.
// +optional
Image *string `json:"image,omitempty"`
// HadoopConf carries user-specified Hadoop configuration properties as they would use the "--conf" option
// in spark-submit. The SparkApplication controller automatically adds prefix "spark.hadoop." to Hadoop
// configuration properties.
// +optional
HadoopConf map[string]string `json:"hadoopConf,omitempty"`
// SparkConf carries user-specified Spark configuration properties as they would use the "--conf" option in
// spark-submit.
// +optional
SparkConf map[string]string `json:"sparkConf,omitempty"`
// Server is the Spark connect server specification.
Server ServerSpec `json:"server"`
// Executor is the Spark executor specification.
Executor ExecutorSpec `json:"executor"`
// DynamicAllocation configures dynamic allocation that becomes available for the Kubernetes
// scheduler backend since Spark 3.0.
// +optional
DynamicAllocation *DynamicAllocation `json:"dynamicAllocation,omitempty"`
}
// ServerSpec is specification of the Spark connect server.
type ServerSpec struct {
SparkPodSpec `json:",inline"`
}
// ExecutorSpec is specification of the executor.
type ExecutorSpec struct {
SparkPodSpec `json:",inline"`
// Instances is the number of executor instances.
// +optional
// +kubebuilder:validation:Minimum=0
Instances *int32 `json:"instances,omitempty"`
}
// SparkPodSpec defines common things that can be customized for a Spark driver or executor pod.
type SparkPodSpec struct {
// Cores maps to `spark.driver.cores` or `spark.executor.cores` for the driver and executors, respectively.
// +optional
// +kubebuilder:validation:Minimum=1
Cores *int32 `json:"cores,omitempty"`
// Memory is the amount of memory to request for the pod.
// +optional
Memory *string `json:"memory,omitempty"`
// Template is a pod template that can be used to define the driver or executor pod configurations that Spark configurations do not support.
// Spark version >= 3.0.0 is required.
// Ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template.
// +optional
// +kubebuilder:validation:Schemaless
// +kubebuilder:validation:Type:=object
// +kubebuilder:pruning:PreserveUnknownFields
Template *corev1.PodTemplateSpec `json:"template,omitempty"`
}
// SparkConnectStatus defines the observed state of SparkConnect.
type SparkConnectStatus struct {
// Represents the latest available observations of a SparkConnect's current state.
// +patchMergeKey=type
// +patchStrategy=merge
// +listType=map
// +listMapKey=type
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty" patchMergeKey:"type" patchStrategy:"merge"`
// State represents the current state of the SparkConnect.
State SparkConnectState `json:"state,omitempty"`
// Server represents the current state of the SparkConnect server.
Server SparkConnectServerStatus `json:"server,omitempty"`
// Executors represents the current state of the SparkConnect executors.
Executors map[string]int `json:"executors,omitempty"`
// StartTime is the time at which the SparkConnect controller started processing the SparkConnect.
StartTime metav1.Time `json:"startTime,omitempty"`
// LastUpdateTime is the time at which the SparkConnect controller last updated the SparkConnect.
LastUpdateTime metav1.Time `json:"lastUpdateTime,omitempty"`
}
// SparkConnectConditionType represents the condition types of the SparkConnect.
type SparkConnectConditionType string
// All possible condition types of the SparkConnect.
const (
SparkConnectConditionServerPodReady SparkConnectConditionType = "ServerPodReady"
)
// SparkConnectConditionReason represents the reason of SparkConnect conditions.
type SparkConnectConditionReason string
// All possible reasons of SparkConnect conditions.
const (
SparkConnectConditionReasonServerPodReady SparkConnectConditionReason = "ServerPodReady"
SparkConnectConditionReasonServerPodNotReady SparkConnectConditionReason = "ServerPodNotReady"
)
// SparkConnectState represents the current state of the SparkConnect.
type SparkConnectState string
// All possible states of the SparkConnect.
const (
SparkConnectStateNew SparkConnectState = ""
SparkConnectStateProvisioning SparkConnectState = "Provisioning"
SparkConnectStateReady SparkConnectState = "Ready"
SparkConnectStateNotReady SparkConnectState = "NotReady"
SparkConnectStateFailed SparkConnectState = "Failed"
)
type SparkConnectServerStatus struct {
// PodName is the name of the pod that is running the Spark Connect server.
PodName string `json:"podName,omitempty"`
// PodIP is the IP address of the pod that is running the Spark Connect server.
PodIP string `json:"podIp,omitempty"`
// ServiceName is the name of the service that is exposing the Spark Connect server.
ServiceName string `json:"serviceName,omitempty"`
}
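To show how the types above fit together, here is a hedged, minimal construction of a `SparkConnect` object in Go: `SparkVersion`, `Server`, and `Executor` are the required fields, everything else is optional. The import path, the image reference (borrowed from the Dockerfile default elsewhere in this compare), and all field values are illustrative assumptions, not part of the diff.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"

	// Import path assumed from the module layout; adjust to the actual api package.
	"github.com/kubeflow/spark-operator/v2/api/v1alpha1"
)

func main() {
	conn := v1alpha1.SparkConnect{
		ObjectMeta: metav1.ObjectMeta{Name: "spark-connect-example", Namespace: "default"},
		Spec: v1alpha1.SparkConnectSpec{
			SparkVersion: "4.0.0",
			Image:        ptr.To("docker.io/library/spark:4.0.0"), // hypothetical image
			Server: v1alpha1.ServerSpec{
				SparkPodSpec: v1alpha1.SparkPodSpec{
					Cores:  ptr.To[int32](1),
					Memory: ptr.To("512m"),
				},
			},
			Executor: v1alpha1.ExecutorSpec{
				Instances: ptr.To[int32](2),
				SparkPodSpec: v1alpha1.SparkPodSpec{
					Cores:  ptr.To[int32](1),
					Memory: ptr.To("512m"),
				},
			},
		},
	}
	fmt.Println("constructed SparkConnect:", conn.Name)
}
```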


@ -0,0 +1,281 @@
//go:build !ignore_autogenerated
/*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// Code generated by controller-gen. DO NOT EDIT.
package v1alpha1
import (
"k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
runtime "k8s.io/apimachinery/pkg/runtime"
)
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *DynamicAllocation) DeepCopyInto(out *DynamicAllocation) {
*out = *in
if in.InitialExecutors != nil {
in, out := &in.InitialExecutors, &out.InitialExecutors
*out = new(int32)
**out = **in
}
if in.MinExecutors != nil {
in, out := &in.MinExecutors, &out.MinExecutors
*out = new(int32)
**out = **in
}
if in.MaxExecutors != nil {
in, out := &in.MaxExecutors, &out.MaxExecutors
*out = new(int32)
**out = **in
}
if in.ShuffleTrackingEnabled != nil {
in, out := &in.ShuffleTrackingEnabled, &out.ShuffleTrackingEnabled
*out = new(bool)
**out = **in
}
if in.ShuffleTrackingTimeout != nil {
in, out := &in.ShuffleTrackingTimeout, &out.ShuffleTrackingTimeout
*out = new(int64)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new DynamicAllocation.
func (in *DynamicAllocation) DeepCopy() *DynamicAllocation {
if in == nil {
return nil
}
out := new(DynamicAllocation)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ExecutorSpec) DeepCopyInto(out *ExecutorSpec) {
*out = *in
in.SparkPodSpec.DeepCopyInto(&out.SparkPodSpec)
if in.Instances != nil {
in, out := &in.Instances, &out.Instances
*out = new(int32)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ExecutorSpec.
func (in *ExecutorSpec) DeepCopy() *ExecutorSpec {
if in == nil {
return nil
}
out := new(ExecutorSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ServerSpec) DeepCopyInto(out *ServerSpec) {
*out = *in
in.SparkPodSpec.DeepCopyInto(&out.SparkPodSpec)
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ServerSpec.
func (in *ServerSpec) DeepCopy() *ServerSpec {
if in == nil {
return nil
}
out := new(ServerSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkConnect) DeepCopyInto(out *SparkConnect) {
*out = *in
out.TypeMeta = in.TypeMeta
in.ObjectMeta.DeepCopyInto(&out.ObjectMeta)
in.Spec.DeepCopyInto(&out.Spec)
in.Status.DeepCopyInto(&out.Status)
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkConnect.
func (in *SparkConnect) DeepCopy() *SparkConnect {
if in == nil {
return nil
}
out := new(SparkConnect)
in.DeepCopyInto(out)
return out
}
// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
func (in *SparkConnect) DeepCopyObject() runtime.Object {
if c := in.DeepCopy(); c != nil {
return c
}
return nil
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkConnectList) DeepCopyInto(out *SparkConnectList) {
*out = *in
out.TypeMeta = in.TypeMeta
in.ListMeta.DeepCopyInto(&out.ListMeta)
if in.Items != nil {
in, out := &in.Items, &out.Items
*out = make([]SparkConnect, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkConnectList.
func (in *SparkConnectList) DeepCopy() *SparkConnectList {
if in == nil {
return nil
}
out := new(SparkConnectList)
in.DeepCopyInto(out)
return out
}
// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
func (in *SparkConnectList) DeepCopyObject() runtime.Object {
if c := in.DeepCopy(); c != nil {
return c
}
return nil
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkConnectServerStatus) DeepCopyInto(out *SparkConnectServerStatus) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkConnectServerStatus.
func (in *SparkConnectServerStatus) DeepCopy() *SparkConnectServerStatus {
if in == nil {
return nil
}
out := new(SparkConnectServerStatus)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkConnectSpec) DeepCopyInto(out *SparkConnectSpec) {
*out = *in
if in.Image != nil {
in, out := &in.Image, &out.Image
*out = new(string)
**out = **in
}
if in.HadoopConf != nil {
in, out := &in.HadoopConf, &out.HadoopConf
*out = make(map[string]string, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
if in.SparkConf != nil {
in, out := &in.SparkConf, &out.SparkConf
*out = make(map[string]string, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
in.Server.DeepCopyInto(&out.Server)
in.Executor.DeepCopyInto(&out.Executor)
if in.DynamicAllocation != nil {
in, out := &in.DynamicAllocation, &out.DynamicAllocation
*out = new(DynamicAllocation)
(*in).DeepCopyInto(*out)
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkConnectSpec.
func (in *SparkConnectSpec) DeepCopy() *SparkConnectSpec {
if in == nil {
return nil
}
out := new(SparkConnectSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkConnectStatus) DeepCopyInto(out *SparkConnectStatus) {
*out = *in
if in.Conditions != nil {
in, out := &in.Conditions, &out.Conditions
*out = make([]metav1.Condition, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
out.Server = in.Server
if in.Executors != nil {
in, out := &in.Executors, &out.Executors
*out = make(map[string]int, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
in.StartTime.DeepCopyInto(&out.StartTime)
in.LastUpdateTime.DeepCopyInto(&out.LastUpdateTime)
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkConnectStatus.
func (in *SparkConnectStatus) DeepCopy() *SparkConnectStatus {
if in == nil {
return nil
}
out := new(SparkConnectStatus)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkPodSpec) DeepCopyInto(out *SparkPodSpec) {
*out = *in
if in.Cores != nil {
in, out := &in.Cores, &out.Cores
*out = new(int32)
**out = **in
}
if in.Memory != nil {
in, out := &in.Memory, &out.Memory
*out = new(string)
**out = **in
}
if in.Template != nil {
in, out := &in.Template, &out.Template
*out = new(v1.PodTemplateSpec)
(*in).DeepCopyInto(*out)
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkPodSpec.
func (in *SparkPodSpec) DeepCopy() *SparkPodSpec {
if in == nil {
return nil
}
out := new(SparkPodSpec)
in.DeepCopyInto(out)
return out
}
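
The generated DeepCopy helpers above exist because the spec and status types contain pointers and maps; a short sketch (assuming the same hypothetical import path) of why a controller copies before mutating:

package main

import (
    "fmt"

    v1alpha1 "github.com/kubeflow/spark-operator/api/v1alpha1" // assumed import path
)

func main() {
    orig := &v1alpha1.SparkConnectSpec{
        SparkConf: map[string]string{"spark.ui.enabled": "true"},
    }

    // DeepCopy clones nested maps and pointers, so edits to the copy never
    // leak back into the original (for example, an informer-cached object).
    cp := orig.DeepCopy()
    cp.SparkConf["spark.ui.enabled"] = "false"

    fmt.Println(orig.SparkConf["spark.ui.enabled"]) // still "true"
}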

View File

@ -1,74 +0,0 @@
/*
Copyright 2017 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package v1beta1
// SetSparkApplicationDefaults sets default values for certain fields of a SparkApplication.
func SetSparkApplicationDefaults(app *SparkApplication) {
if app == nil {
return
}
if app.Spec.Mode == "" {
app.Spec.Mode = ClusterMode
}
if app.Spec.RestartPolicy.Type == "" {
app.Spec.RestartPolicy.Type = Never
}
if app.Spec.RestartPolicy.Type != Never {
// Default to 5 sec if the RestartPolicy is OnFailure or Always and these values aren't specified.
if app.Spec.RestartPolicy.OnFailureRetryInterval == nil {
app.Spec.RestartPolicy.OnFailureRetryInterval = new(int64)
*app.Spec.RestartPolicy.OnFailureRetryInterval = 5
}
if app.Spec.RestartPolicy.OnSubmissionFailureRetryInterval == nil {
app.Spec.RestartPolicy.OnSubmissionFailureRetryInterval = new(int64)
*app.Spec.RestartPolicy.OnSubmissionFailureRetryInterval = 5
}
}
setDriverSpecDefaults(&app.Spec.Driver, app.Spec.SparkConf)
setExecutorSpecDefaults(&app.Spec.Executor, app.Spec.SparkConf)
}
func setDriverSpecDefaults(spec *DriverSpec, sparkConf map[string]string) {
if _, exists := sparkConf["spark.driver.cores"]; !exists && spec.Cores == nil {
spec.Cores = new(float32)
*spec.Cores = 1
}
if _, exists := sparkConf["spark.driver.memory"]; !exists && spec.Memory == nil {
spec.Memory = new(string)
*spec.Memory = "1g"
}
}
func setExecutorSpecDefaults(spec *ExecutorSpec, sparkConf map[string]string) {
if _, exists := sparkConf["spark.executor.cores"]; !exists && spec.Cores == nil {
spec.Cores = new(float32)
*spec.Cores = 1
}
if _, exists := sparkConf["spark.executor.memory"]; !exists && spec.Memory == nil {
spec.Memory = new(string)
*spec.Memory = "1g"
}
if _, exists := sparkConf["spark.executor.instances"]; !exists && spec.Instances == nil {
spec.Instances = new(int32)
*spec.Instances = 1
}
}
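
A small usage sketch of the defaulting logic above, assuming the v1beta1 package is importable at the commented path; the expected values follow directly from SetSparkApplicationDefaults.

package main

import (
    "fmt"

    v1beta1 "github.com/kubeflow/spark-operator/api/v1beta1" // assumed import path
)

func main() {
    app := &v1beta1.SparkApplication{}

    // With neither sparkConf nor the spec setting them, this fills in cluster mode,
    // a Never restart policy, 1 core and "1g" of memory for driver and executor,
    // and a single executor instance.
    v1beta1.SetSparkApplicationDefaults(app)

    fmt.Println(app.Spec.Mode)                // "cluster"
    fmt.Println(*app.Spec.Driver.Cores)       // 1
    fmt.Println(*app.Spec.Executor.Memory)    // "1g"
    fmt.Println(*app.Spec.Executor.Instances) // 1
}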

View File

@ -1,104 +0,0 @@
/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package v1beta1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
// +kubebuilder:skip
func init() {
SchemeBuilder.Register(&ScheduledSparkApplication{}, &ScheduledSparkApplicationList{})
}
// ScheduledSparkApplicationSpec defines the desired state of ScheduledSparkApplication
type ScheduledSparkApplicationSpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
// Important: Run "make generate" to regenerate code after modifying this file
// Schedule is a cron schedule on which the application should run.
Schedule string `json:"schedule"`
// Template is a template from which SparkApplication instances can be created.
Template SparkApplicationSpec `json:"template"`
// Suspend is a flag telling the controller to suspend subsequent runs of the application if set to true.
// Optional.
// Defaults to false.
Suspend *bool `json:"suspend,omitempty"`
// ConcurrencyPolicy is the policy governing concurrent SparkApplication runs.
ConcurrencyPolicy ConcurrencyPolicy `json:"concurrencyPolicy,omitempty"`
// SuccessfulRunHistoryLimit is the number of past successful runs of the application to keep.
// Optional.
// Defaults to 1.
SuccessfulRunHistoryLimit *int32 `json:"successfulRunHistoryLimit,omitempty"`
// FailedRunHistoryLimit is the number of past failed runs of the application to keep.
// Optional.
// Defaults to 1.
FailedRunHistoryLimit *int32 `json:"failedRunHistoryLimit,omitempty"`
}
// ScheduledSparkApplicationStatus defines the observed state of ScheduledSparkApplication
type ScheduledSparkApplicationStatus struct {
// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "make generate" to regenerate code after modifying this file
// LastRun is the time when the last run of the application started.
LastRun metav1.Time `json:"lastRun,omitempty"`
// NextRun is the time when the next run of the application will start.
NextRun metav1.Time `json:"nextRun,omitempty"`
// LastRunName is the name of the SparkApplication for the most recent run of the application.
LastRunName string `json:"lastRunName,omitempty"`
// PastSuccessfulRunNames keeps the names of SparkApplications for past successful runs.
PastSuccessfulRunNames []string `json:"pastSuccessfulRunNames,omitempty"`
// PastFailedRunNames keeps the names of SparkApplications for past failed runs.
PastFailedRunNames []string `json:"pastFailedRunNames,omitempty"`
// ScheduleState is the current scheduling state of the application.
ScheduleState ScheduleState `json:"scheduleState,omitempty"`
// Reason tells why the ScheduledSparkApplication is in the particular ScheduleState.
Reason string `json:"reason,omitempty"`
}
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// ScheduledSparkApplication is the Schema for the scheduledsparkapplications API
type ScheduledSparkApplication struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec ScheduledSparkApplicationSpec `json:"spec,omitempty"`
Status ScheduledSparkApplicationStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// ScheduledSparkApplicationList contains a list of ScheduledSparkApplication
type ScheduledSparkApplicationList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []ScheduledSparkApplication `json:"items"`
}
type ScheduleState string
const (
FailedValidationState ScheduleState = "FailedValidation"
ScheduledState ScheduleState = "Scheduled"
)
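
A minimal sketch of constructing a ScheduledSparkApplication from the types above (import path assumed); the cron expression and names are illustrative.

package main

import (
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    v1beta1 "github.com/kubeflow/spark-operator/api/v1beta1" // assumed import path
)

func main() {
    keep := int32(3)

    scheduled := &v1beta1.ScheduledSparkApplication{
        ObjectMeta: metav1.ObjectMeta{Name: "nightly-report"},
        Spec: v1beta1.ScheduledSparkApplicationSpec{
            // Run at 02:00 every day; skip a run if the previous one is still going.
            Schedule:                  "0 2 * * *",
            ConcurrencyPolicy:         v1beta1.ConcurrencyForbid,
            SuccessfulRunHistoryLimit: &keep,
            Template:                  v1beta1.SparkApplicationSpec{Type: v1beta1.ScalaApplicationType},
        },
    }

    fmt.Println(scheduled.Spec.Schedule)
}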

View File

@ -1,497 +0,0 @@
/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// +kubebuilder:skip
package v1beta1
import (
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
// +kubebuilder:skip
func init() {
SchemeBuilder.Register(&SparkApplication{}, &SparkApplicationList{})
}
// SparkApplicationSpec defines the desired state of SparkApplication
type SparkApplicationSpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
// Important: Run "make generate" to regenerate code after modifying this file
// Type tells the type of the Spark application.
Type SparkApplicationType `json:"type"`
// SparkVersion is the version of Spark the application uses.
SparkVersion string `json:"sparkVersion"`
// Mode is the deployment mode of the Spark application.
Mode DeployMode `json:"mode,omitempty"`
// Image is the container image for the driver, executor, and init-container. Any custom container images for the
// driver, executor, or init-container take precedence over this.
// Optional.
Image *string `json:"image,omitempty"`
// InitContainerImage is the image of the init-container to use. Overrides Spec.Image if set.
// Optional.
InitContainerImage *string `json:"initContainerImage,omitempty"`
// ImagePullPolicy is the image pull policy for the driver, executor, and init-container.
// Optional.
ImagePullPolicy *string `json:"imagePullPolicy,omitempty"`
// ImagePullSecrets is the list of image-pull secrets.
// Optional.
ImagePullSecrets []string `json:"imagePullSecrets,omitempty"`
// MainClass is the fully-qualified main class of the Spark application.
// This only applies to Java/Scala Spark applications.
// Optional.
MainClass *string `json:"mainClass,omitempty"`
// MainFile is the path to a bundled JAR, Python, or R file of the application.
// Optional.
MainApplicationFile *string `json:"mainApplicationFile"`
// Arguments is a list of arguments to be passed to the application.
// Optional.
Arguments []string `json:"arguments,omitempty"`
// SparkConf carries user-specified Spark configuration properties as they would use the "--conf" option in
// spark-submit.
// Optional.
SparkConf map[string]string `json:"sparkConf,omitempty"`
// HadoopConf carries user-specified Hadoop configuration properties as they would use the "--conf" option
// in spark-submit. The SparkApplication controller automatically adds prefix "spark.hadoop." to Hadoop
// configuration properties.
// Optional.
HadoopConf map[string]string `json:"hadoopConf,omitempty"`
// SparkConfigMap carries the name of the ConfigMap containing Spark configuration files such as log4j.properties.
// The controller will add environment variable SPARK_CONF_DIR to the path where the ConfigMap is mounted to.
// Optional.
SparkConfigMap *string `json:"sparkConfigMap,omitempty"`
// HadoopConfigMap carries the name of the ConfigMap containing Hadoop configuration files such as core-site.xml.
// The controller will add environment variable HADOOP_CONF_DIR to the path where the ConfigMap is mounted to.
// Optional.
HadoopConfigMap *string `json:"hadoopConfigMap,omitempty"`
// Volumes is the list of Kubernetes volumes that can be mounted by the driver and/or executors.
// Optional.
Volumes []corev1.Volume `json:"volumes,omitempty"`
// Driver is the driver specification.
Driver DriverSpec `json:"driver"`
// Executor is the executor specification.
Executor ExecutorSpec `json:"executor"`
// Deps captures all possible types of dependencies of a Spark application.
Deps Dependencies `json:"deps"`
// RestartPolicy defines the policy on if and in which conditions the controller should restart an application.
RestartPolicy RestartPolicy `json:"restartPolicy,omitempty"`
// NodeSelector is the Kubernetes node selector to be added to the driver and executor pods.
// This field is mutually exclusive with nodeSelector at podSpec level (driver or executor).
// This field will be deprecated in future versions (at SparkApplicationSpec level).
// Optional.
NodeSelector map[string]string `json:"nodeSelector,omitempty"`
// FailureRetries is the number of times to retry a failed application before giving up.
// This is best effort and actual retry attempts can be >= the value specified.
// Optional.
FailureRetries *int32 `json:"failureRetries,omitempty"`
// RetryInterval is the unit of intervals in seconds between submission retries.
// Optional.
RetryInterval *int64 `json:"retryInterval,omitempty"`
// PythonVersion sets the major Python version of the Docker
// image used to run the driver and executor containers. It can be either 2 or 3; the default is 2.
// Optional.
PythonVersion *string `json:"pythonVersion,omitempty"`
// MemoryOverheadFactor sets the factor used to allocate additional non-JVM memory.
// For JVM-based jobs this value will default to 0.10, for non-JVM jobs 0.40. Value of this field will
// be overridden by `Spec.Driver.MemoryOverhead` and `Spec.Executor.MemoryOverhead` if they are set.
// Optional.
MemoryOverheadFactor *string `json:"memoryOverheadFactor,omitempty"`
// Monitoring configures how monitoring is handled.
// Optional.
Monitoring *MonitoringSpec `json:"monitoring,omitempty"`
// BatchScheduler configures which batch scheduler will be used for scheduling
// Optional.
BatchScheduler *string `json:"batchScheduler,omitempty"`
}
// SparkApplicationStatus defines the observed state of SparkApplication
type SparkApplicationStatus struct {
// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "make generate" to regenerate code after modifying this file
// SparkApplicationID is set by the Spark distribution (via the spark.app.id config) on the driver and executor pods.
SparkApplicationID string `json:"sparkApplicationId,omitempty"`
// SubmissionID is a unique ID of the current submission of the application.
SubmissionID string `json:"submissionID,omitempty"`
// LastSubmissionAttemptTime is the time for the last application submission attempt.
LastSubmissionAttemptTime metav1.Time `json:"lastSubmissionAttemptTime,omitempty"`
// TerminationTime is the time when the application terminates, if it does.
TerminationTime metav1.Time `json:"terminationTime,omitempty"`
// DriverInfo has information about the driver.
DriverInfo DriverInfo `json:"driverInfo"`
// AppState tells the overall application state.
AppState ApplicationState `json:"applicationState,omitempty"`
// ExecutorState records the state of executors by executor Pod names.
ExecutorState map[string]ExecutorState `json:"executorState,omitempty"`
// ExecutionAttempts is the total number of attempts to run a submitted application to completion.
// Incremented upon each attempted run of the application and reset upon invalidation.
ExecutionAttempts int32 `json:"executionAttempts,omitempty"`
// SubmissionAttempts is the total number of attempts to submit an application to run.
// Incremented upon each attempted submission of the application and reset upon invalidation and rerun.
SubmissionAttempts int32 `json:"submissionAttempts,omitempty"`
}
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// SparkApplication is the Schema for the sparkapplications API
type SparkApplication struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec SparkApplicationSpec `json:"spec,omitempty"`
Status SparkApplicationStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// SparkApplicationList contains a list of SparkApplication
type SparkApplicationList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []SparkApplication `json:"items"`
}
// SparkApplicationType describes the type of a Spark application.
type SparkApplicationType string
// Different types of Spark applications.
const (
JavaApplicationType SparkApplicationType = "Java"
ScalaApplicationType SparkApplicationType = "Scala"
PythonApplicationType SparkApplicationType = "Python"
RApplicationType SparkApplicationType = "R"
)
// DeployMode describes the type of deployment of a Spark application.
type DeployMode string
// Different types of deployments.
const (
ClusterMode DeployMode = "cluster"
ClientMode DeployMode = "client"
InClusterClientMode DeployMode = "in-cluster-client"
)
// RestartPolicy is the policy of if and in which conditions the controller should restart a terminated application.
// This completely defines actions to be taken on any kind of Failures during an application run.
type RestartPolicy struct {
Type RestartPolicyType `json:"type,omitempty"`
// OnSubmissionFailureRetries and OnFailureRetries are the number of times to retry a failed application before giving up in the respective case.
// This is best effort and actual retry attempts can be >= the value specified due to caching.
// These are required if RestartPolicy is OnFailure.
OnSubmissionFailureRetries *int32 `json:"onSubmissionFailureRetries,omitempty"`
OnFailureRetries *int32 `json:"onFailureRetries,omitempty"`
// Interval to wait between successive retries of a failed application.
OnSubmissionFailureRetryInterval *int64 `json:"onSubmissionFailureRetryInterval,omitempty"`
OnFailureRetryInterval *int64 `json:"onFailureRetryInterval,omitempty"`
}
type RestartPolicyType string
const (
Never RestartPolicyType = "Never"
OnFailure RestartPolicyType = "OnFailure"
Always RestartPolicyType = "Always"
)
type ConcurrencyPolicy string
const (
// ConcurrencyAllow allows SparkApplications to run concurrently.
ConcurrencyAllow ConcurrencyPolicy = "Allow"
// ConcurrencyForbid forbids concurrent runs of SparkApplications, skipping the next run if the previous
// one hasn't finished yet.
ConcurrencyForbid ConcurrencyPolicy = "Forbid"
// ConcurrencyReplace kills the currently running SparkApplication instance and replaces it with a new one.
ConcurrencyReplace ConcurrencyPolicy = "Replace"
)
// ApplicationStateType represents the type of the current state of an application.
type ApplicationStateType string
// Different states an application may have.
const (
NewState ApplicationStateType = ""
SubmittedState ApplicationStateType = "SUBMITTED"
RunningState ApplicationStateType = "RUNNING"
CompletedState ApplicationStateType = "COMPLETED"
FailedState ApplicationStateType = "FAILED"
FailedSubmissionState ApplicationStateType = "SUBMISSION_FAILED"
PendingRerunState ApplicationStateType = "PENDING_RERUN"
InvalidatingState ApplicationStateType = "INVALIDATING"
SucceedingState ApplicationStateType = "SUCCEEDING"
FailingState ApplicationStateType = "FAILING"
UnknownState ApplicationStateType = "UNKNOWN"
)
// ApplicationState tells the current state of the application and an error message in case of failures.
type ApplicationState struct {
State ApplicationStateType `json:"state"`
ErrorMessage string `json:"errorMessage,omitempty"`
}
// ExecutorState tells the current state of an executor.
type ExecutorState string
// Different states an executor may have.
const (
ExecutorPendingState ExecutorState = "PENDING"
ExecutorRunningState ExecutorState = "RUNNING"
ExecutorCompletedState ExecutorState = "COMPLETED"
ExecutorFailedState ExecutorState = "FAILED"
ExecutorUnknownState ExecutorState = "UNKNOWN"
)
// Dependencies specifies all possible types of dependencies of a Spark application.
type Dependencies struct {
// Jars is a list of JAR files the Spark application depends on.
// Optional.
Jars []string `json:"jars,omitempty"`
// Files is a list of files the Spark application depends on.
// Optional.
Files []string `json:"files,omitempty"`
// PyFiles is a list of Python files the Spark application depends on.
// Optional.
PyFiles []string `json:"pyFiles,omitempty"`
// JarsDownloadDir is the location to download jars to in the driver and executors.
JarsDownloadDir *string `json:"jarsDownloadDir,omitempty"`
// FilesDownloadDir is the location to download files to in the driver and executors.
FilesDownloadDir *string `json:"filesDownloadDir,omitempty"`
// DownloadTimeout specifies the timeout in seconds before aborting the attempt to download
// and unpack dependencies from remote locations into the driver and executor pods.
DownloadTimeout *int32 `json:"downloadTimeout,omitempty"`
// MaxSimultaneousDownloads specifies the maximum number of remote dependencies to download
// simultaneously in a driver or executor pod.
MaxSimultaneousDownloads *int32 `json:"maxSimultaneousDownloads,omitempty"`
}
// SparkPodSpec defines common things that can be customized for a Spark driver or executor pod.
// TODO: investigate if we should use v1.PodSpec and limit what can be set instead.
type SparkPodSpec struct {
// Cores is the number of CPU cores to request for the pod.
// Optional.
Cores *float32 `json:"cores,omitempty"`
// CoreLimit specifies a hard limit on CPU cores for the pod.
// Optional
CoreLimit *string `json:"coreLimit,omitempty"`
// Memory is the amount of memory to request for the pod.
// Optional.
Memory *string `json:"memory,omitempty"`
// MemoryOverhead is the amount of off-heap memory to allocate in cluster mode, in MiB unless otherwise specified.
// Optional.
MemoryOverhead *string `json:"memoryOverhead,omitempty"`
// GPU specifies GPU requirement for the pod.
// Optional.
GPU *GPUSpec `json:"gpu,omitempty"`
// Image is the container image to use. Overrides Spec.Image if set.
// Optional.
Image *string `json:"image,omitempty"`
// ConfigMaps carries information of other ConfigMaps to add to the pod.
// Optional.
ConfigMaps []NamePath `json:"configMaps,omitempty"`
// Secrets carries information of secrets to add to the pod.
// Optional.
Secrets []SecretInfo `json:"secrets,omitempty"`
// EnvVars carries the environment variables to add to the pod.
// Optional.
EnvVars map[string]string `json:"envVars,omitempty"`
// EnvSecretKeyRefs holds a mapping from environment variable names to SecretKeyRefs.
// Optional.
EnvSecretKeyRefs map[string]NameKey `json:"envSecretKeyRefs,omitempty"`
// Labels are the Kubernetes labels to be added to the pod.
// Optional.
Labels map[string]string `json:"labels,omitempty"`
// Annotations are the Kubernetes annotations to be added to the pod.
// Optional.
Annotations map[string]string `json:"annotations,omitempty"`
// VolumeMounts specifies the volumes listed in ".spec.volumes" to mount into the main container's filesystem.
// Optional.
VolumeMounts []corev1.VolumeMount `json:"volumeMounts,omitempty"`
// Affinity specifies the affinity/anti-affinity settings for the pod.
// Optional.
Affinity *corev1.Affinity `json:"affinity,omitempty"`
// Tolerations specifies the tolerations listed in ".spec.tolerations" to be applied to the pod.
// Optional.
Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
// SecurityContext specifies the PodSecurityContext to apply.
// Optional.
SecurityContext *corev1.PodSecurityContext `json:"securityContext,omitempty"`
// SchedulerName specifies the scheduler that will be used for scheduling
// Optional.
SchedulerName *string `json:"schedulerName,omitempty"`
// Sidecars is a list of sidecar containers that run along side the main Spark container.
// Optional.
Sidecars []corev1.Container `json:"sidecars,omitempty"`
// HostNetwork indicates whether to request host networking for the pod or not.
// Optional.
HostNetwork *bool `json:"hostNetwork,omitempty"`
// NodeSelector is the Kubernetes node selector to be added to the driver and executor pods.
// This field is mutually exclusive with nodeSelector at SparkApplication level (which will be deprecated).
// Optional.
NodeSelector map[string]string `json:"nodeSelector,omitempty"`
// DNSConfig specifies the DNS settings for the pod, following the Kubernetes specifications.
// Optional.
DNSConfig *corev1.PodDNSConfig `json:"dnsConfig,omitempty"`
}
// DriverSpec is specification of the driver.
type DriverSpec struct {
SparkPodSpec `json:",inline"`
// PodName is the name of the driver pod that the user creates. This is used for the
// in-cluster client mode in which the user creates a client pod where the driver of
// the user application runs. It's an error to set this field if Mode is not
// in-cluster-client.
// Optional.
PodName *string `json:"podName,omitempty"`
// ServiceAccount is the name of the Kubernetes service account used by the driver pod
// when requesting executor pods from the API server.
ServiceAccount *string `json:"serviceAccount,omitempty"`
// JavaOptions is a string of extra JVM options to pass to the driver. For instance,
// GC settings or other logging.
JavaOptions *string `json:"javaOptions,omitempty"`
}
// ExecutorSpec is specification of the executor.
type ExecutorSpec struct {
SparkPodSpec `json:",inline"`
// Instances is the number of executor instances.
// Optional.
Instances *int32 `json:"instances,omitempty"`
// CoreRequest is the physical CPU core request for the executors.
// Optional.
CoreRequest *string `json:"coreRequest,omitempty"`
// JavaOptions is a string of extra JVM options to pass to the executors. For instance,
// GC settings or other logging.
JavaOptions *string `json:"javaOptions,omitempty"`
}
// NamePath is a pair of a name and a path to which the named objects should be mounted.
type NamePath struct {
Name string `json:"name"`
Path string `json:"path"`
}
// SecretType tells the type of a secret.
type SecretType string
// An enumeration of secret types supported.
const (
// GCPServiceAccountSecret is for secrets from a GCP service account JSON key file that needs
// the environment variable GOOGLE_APPLICATION_CREDENTIALS.
GCPServiceAccountSecret SecretType = "GCPServiceAccount"
// HadoopDelegationTokenSecret is for secrets from a Hadoop delegation token that needs the
// environment variable HADOOP_TOKEN_FILE_LOCATION.
HadoopDelegationTokenSecret SecretType = "HadoopDelegationToken"
// GenericType is for secrets that need no special handling.
GenericType SecretType = "Generic"
)
// DriverInfo captures information about the driver.
type DriverInfo struct {
WebUIServiceName string `json:"webUIServiceName,omitempty"`
// Details of the web UI created via a ClusterIP service, accessible from within the cluster.
WebUIPort int32 `json:"webUIPort,omitempty"`
WebUIAddress string `json:"webUIAddress,omitempty"`
// Ingress Details if an ingress for the UI was created.
WebUIIngressName string `json:"webUIIngressName,omitempty"`
WebUIIngressAddress string `json:"webUIIngressAddress,omitempty"`
PodName string `json:"podName,omitempty"`
}
// SecretInfo captures information of a secret.
type SecretInfo struct {
Name string `json:"name"`
Path string `json:"path"`
Type SecretType `json:"secretType"`
}
// NameKey represents the name and key of a SecretKeyRef.
type NameKey struct {
Name string `json:"name"`
Key string `json:"key"`
}
// MonitoringSpec defines the monitoring specification.
type MonitoringSpec struct {
// ExposeDriverMetrics specifies whether to expose metrics on the driver.
ExposeDriverMetrics bool `json:"exposeDriverMetrics"`
// ExposeExecutorMetrics specifies whether to expose metrics on the executors.
ExposeExecutorMetrics bool `json:"exposeExecutorMetrics"`
// MetricsProperties is the content of a custom metrics.properties for configuring the Spark metric system.
// Optional.
// If not specified, the content in spark-docker/conf/metrics.properties will be used.
MetricsProperties *string `json:"metricsProperties,omitempty"`
// Prometheus is for configuring the Prometheus JMX exporter.
// Optional.
Prometheus *PrometheusSpec `json:"prometheus,omitempty"`
}
// PrometheusSpec defines the Prometheus specification when Prometheus is to be used for
// collecting and exposing metrics.
type PrometheusSpec struct {
// JmxExporterJar is the path to the Prometheus JMX exporter jar in the container.
JmxExporterJar string `json:"jmxExporterJar"`
// Port is the port of the HTTP server run by the Prometheus JMX exporter.
// Optional.
// If not specified, 8090 will be used as the default.
Port *int32 `json:"port"`
// ConfigFile is the path to the custom Prometheus configuration file provided in the Spark image.
// ConfigFile takes precedence over Configuration, which is shown below.
ConfigFile *string `json:"configFile,omitempty"`
// Configuration is the content of the Prometheus configuration needed by the Prometheus JMX exporter.
// Optional.
// If not specified, the content in spark-docker/conf/prometheus.yaml will be used.
// Configuration has no effect if ConfigFile is set.
Configuration *string `json:"configuration,omitempty"`
}
type GPUSpec struct {
// Name is the GPU resource name, such as nvidia.com/gpu or amd.com/gpu.
Name string `json:"name"`
// Quantity is the number of GPUs to request for driver or executor.
Quantity int64 `json:"quantity"`
}
// PrometheusMonitoringEnabled returns whether Prometheus monitoring is enabled.
func (s *SparkApplication) PrometheusMonitoringEnabled() bool {
return s.Spec.Monitoring != nil && s.Spec.Monitoring.Prometheus != nil
}
// HasPrometheusConfigFile returns if Prometheus monitoring uses a configuration file in the container.
func (s *SparkApplication) HasPrometheusConfigFile() bool {
return s.PrometheusMonitoringEnabled() &&
s.Spec.Monitoring.Prometheus.ConfigFile != nil &&
*s.Spec.Monitoring.Prometheus.ConfigFile != ""
}
// ExposeDriverMetrics returns if driver metrics should be exposed.
func (s *SparkApplication) ExposeDriverMetrics() bool {
return s.Spec.Monitoring != nil && s.Spec.Monitoring.ExposeDriverMetrics
}
// ExposeExecutorMetrics returns if executor metrics should be exposed.
func (s *SparkApplication) ExposeExecutorMetrics() bool {
return s.Spec.Monitoring != nil && s.Spec.Monitoring.ExposeExecutorMetrics
}
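
A short sketch exercising the monitoring helper methods defined above, assuming the same hypothetical import path; the exporter jar path is illustrative.

package main

import (
    "fmt"

    v1beta1 "github.com/kubeflow/spark-operator/api/v1beta1" // assumed import path
)

func main() {
    jar := "/prometheus/jmx_prometheus_javaagent.jar" // illustrative path

    app := &v1beta1.SparkApplication{
        Spec: v1beta1.SparkApplicationSpec{
            Monitoring: &v1beta1.MonitoringSpec{
                ExposeDriverMetrics: true,
                Prometheus:          &v1beta1.PrometheusSpec{JmxExporterJar: jar},
            },
        },
    }

    fmt.Println(app.PrometheusMonitoringEnabled()) // true: Monitoring and Prometheus are both set
    fmt.Println(app.HasPrometheusConfigFile())     // false: no ConfigFile was provided
    fmt.Println(app.ExposeDriverMetrics())         // true
}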

View File

@ -1,778 +0,0 @@
//go:build !ignore_autogenerated
/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// Code generated by controller-gen. DO NOT EDIT.
package v1beta1
import (
"k8s.io/api/core/v1"
runtime "k8s.io/apimachinery/pkg/runtime"
)
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ApplicationState) DeepCopyInto(out *ApplicationState) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ApplicationState.
func (in *ApplicationState) DeepCopy() *ApplicationState {
if in == nil {
return nil
}
out := new(ApplicationState)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *Dependencies) DeepCopyInto(out *Dependencies) {
*out = *in
if in.Jars != nil {
in, out := &in.Jars, &out.Jars
*out = make([]string, len(*in))
copy(*out, *in)
}
if in.Files != nil {
in, out := &in.Files, &out.Files
*out = make([]string, len(*in))
copy(*out, *in)
}
if in.PyFiles != nil {
in, out := &in.PyFiles, &out.PyFiles
*out = make([]string, len(*in))
copy(*out, *in)
}
if in.JarsDownloadDir != nil {
in, out := &in.JarsDownloadDir, &out.JarsDownloadDir
*out = new(string)
**out = **in
}
if in.FilesDownloadDir != nil {
in, out := &in.FilesDownloadDir, &out.FilesDownloadDir
*out = new(string)
**out = **in
}
if in.DownloadTimeout != nil {
in, out := &in.DownloadTimeout, &out.DownloadTimeout
*out = new(int32)
**out = **in
}
if in.MaxSimultaneousDownloads != nil {
in, out := &in.MaxSimultaneousDownloads, &out.MaxSimultaneousDownloads
*out = new(int32)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new Dependencies.
func (in *Dependencies) DeepCopy() *Dependencies {
if in == nil {
return nil
}
out := new(Dependencies)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *DriverInfo) DeepCopyInto(out *DriverInfo) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new DriverInfo.
func (in *DriverInfo) DeepCopy() *DriverInfo {
if in == nil {
return nil
}
out := new(DriverInfo)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *DriverSpec) DeepCopyInto(out *DriverSpec) {
*out = *in
in.SparkPodSpec.DeepCopyInto(&out.SparkPodSpec)
if in.PodName != nil {
in, out := &in.PodName, &out.PodName
*out = new(string)
**out = **in
}
if in.ServiceAccount != nil {
in, out := &in.ServiceAccount, &out.ServiceAccount
*out = new(string)
**out = **in
}
if in.JavaOptions != nil {
in, out := &in.JavaOptions, &out.JavaOptions
*out = new(string)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new DriverSpec.
func (in *DriverSpec) DeepCopy() *DriverSpec {
if in == nil {
return nil
}
out := new(DriverSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ExecutorSpec) DeepCopyInto(out *ExecutorSpec) {
*out = *in
in.SparkPodSpec.DeepCopyInto(&out.SparkPodSpec)
if in.Instances != nil {
in, out := &in.Instances, &out.Instances
*out = new(int32)
**out = **in
}
if in.CoreRequest != nil {
in, out := &in.CoreRequest, &out.CoreRequest
*out = new(string)
**out = **in
}
if in.JavaOptions != nil {
in, out := &in.JavaOptions, &out.JavaOptions
*out = new(string)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ExecutorSpec.
func (in *ExecutorSpec) DeepCopy() *ExecutorSpec {
if in == nil {
return nil
}
out := new(ExecutorSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *GPUSpec) DeepCopyInto(out *GPUSpec) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new GPUSpec.
func (in *GPUSpec) DeepCopy() *GPUSpec {
if in == nil {
return nil
}
out := new(GPUSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *MonitoringSpec) DeepCopyInto(out *MonitoringSpec) {
*out = *in
if in.MetricsProperties != nil {
in, out := &in.MetricsProperties, &out.MetricsProperties
*out = new(string)
**out = **in
}
if in.Prometheus != nil {
in, out := &in.Prometheus, &out.Prometheus
*out = new(PrometheusSpec)
(*in).DeepCopyInto(*out)
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new MonitoringSpec.
func (in *MonitoringSpec) DeepCopy() *MonitoringSpec {
if in == nil {
return nil
}
out := new(MonitoringSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *NameKey) DeepCopyInto(out *NameKey) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new NameKey.
func (in *NameKey) DeepCopy() *NameKey {
if in == nil {
return nil
}
out := new(NameKey)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *NamePath) DeepCopyInto(out *NamePath) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new NamePath.
func (in *NamePath) DeepCopy() *NamePath {
if in == nil {
return nil
}
out := new(NamePath)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *PrometheusSpec) DeepCopyInto(out *PrometheusSpec) {
*out = *in
if in.Port != nil {
in, out := &in.Port, &out.Port
*out = new(int32)
**out = **in
}
if in.ConfigFile != nil {
in, out := &in.ConfigFile, &out.ConfigFile
*out = new(string)
**out = **in
}
if in.Configuration != nil {
in, out := &in.Configuration, &out.Configuration
*out = new(string)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new PrometheusSpec.
func (in *PrometheusSpec) DeepCopy() *PrometheusSpec {
if in == nil {
return nil
}
out := new(PrometheusSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *RestartPolicy) DeepCopyInto(out *RestartPolicy) {
*out = *in
if in.OnSubmissionFailureRetries != nil {
in, out := &in.OnSubmissionFailureRetries, &out.OnSubmissionFailureRetries
*out = new(int32)
**out = **in
}
if in.OnFailureRetries != nil {
in, out := &in.OnFailureRetries, &out.OnFailureRetries
*out = new(int32)
**out = **in
}
if in.OnSubmissionFailureRetryInterval != nil {
in, out := &in.OnSubmissionFailureRetryInterval, &out.OnSubmissionFailureRetryInterval
*out = new(int64)
**out = **in
}
if in.OnFailureRetryInterval != nil {
in, out := &in.OnFailureRetryInterval, &out.OnFailureRetryInterval
*out = new(int64)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new RestartPolicy.
func (in *RestartPolicy) DeepCopy() *RestartPolicy {
if in == nil {
return nil
}
out := new(RestartPolicy)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ScheduledSparkApplication) DeepCopyInto(out *ScheduledSparkApplication) {
*out = *in
out.TypeMeta = in.TypeMeta
in.ObjectMeta.DeepCopyInto(&out.ObjectMeta)
in.Spec.DeepCopyInto(&out.Spec)
in.Status.DeepCopyInto(&out.Status)
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ScheduledSparkApplication.
func (in *ScheduledSparkApplication) DeepCopy() *ScheduledSparkApplication {
if in == nil {
return nil
}
out := new(ScheduledSparkApplication)
in.DeepCopyInto(out)
return out
}
// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
func (in *ScheduledSparkApplication) DeepCopyObject() runtime.Object {
if c := in.DeepCopy(); c != nil {
return c
}
return nil
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ScheduledSparkApplicationList) DeepCopyInto(out *ScheduledSparkApplicationList) {
*out = *in
out.TypeMeta = in.TypeMeta
in.ListMeta.DeepCopyInto(&out.ListMeta)
if in.Items != nil {
in, out := &in.Items, &out.Items
*out = make([]ScheduledSparkApplication, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ScheduledSparkApplicationList.
func (in *ScheduledSparkApplicationList) DeepCopy() *ScheduledSparkApplicationList {
if in == nil {
return nil
}
out := new(ScheduledSparkApplicationList)
in.DeepCopyInto(out)
return out
}
// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
func (in *ScheduledSparkApplicationList) DeepCopyObject() runtime.Object {
if c := in.DeepCopy(); c != nil {
return c
}
return nil
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ScheduledSparkApplicationSpec) DeepCopyInto(out *ScheduledSparkApplicationSpec) {
*out = *in
in.Template.DeepCopyInto(&out.Template)
if in.Suspend != nil {
in, out := &in.Suspend, &out.Suspend
*out = new(bool)
**out = **in
}
if in.SuccessfulRunHistoryLimit != nil {
in, out := &in.SuccessfulRunHistoryLimit, &out.SuccessfulRunHistoryLimit
*out = new(int32)
**out = **in
}
if in.FailedRunHistoryLimit != nil {
in, out := &in.FailedRunHistoryLimit, &out.FailedRunHistoryLimit
*out = new(int32)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ScheduledSparkApplicationSpec.
func (in *ScheduledSparkApplicationSpec) DeepCopy() *ScheduledSparkApplicationSpec {
if in == nil {
return nil
}
out := new(ScheduledSparkApplicationSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ScheduledSparkApplicationStatus) DeepCopyInto(out *ScheduledSparkApplicationStatus) {
*out = *in
in.LastRun.DeepCopyInto(&out.LastRun)
in.NextRun.DeepCopyInto(&out.NextRun)
if in.PastSuccessfulRunNames != nil {
in, out := &in.PastSuccessfulRunNames, &out.PastSuccessfulRunNames
*out = make([]string, len(*in))
copy(*out, *in)
}
if in.PastFailedRunNames != nil {
in, out := &in.PastFailedRunNames, &out.PastFailedRunNames
*out = make([]string, len(*in))
copy(*out, *in)
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ScheduledSparkApplicationStatus.
func (in *ScheduledSparkApplicationStatus) DeepCopy() *ScheduledSparkApplicationStatus {
if in == nil {
return nil
}
out := new(ScheduledSparkApplicationStatus)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SecretInfo) DeepCopyInto(out *SecretInfo) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SecretInfo.
func (in *SecretInfo) DeepCopy() *SecretInfo {
if in == nil {
return nil
}
out := new(SecretInfo)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkApplication) DeepCopyInto(out *SparkApplication) {
*out = *in
out.TypeMeta = in.TypeMeta
in.ObjectMeta.DeepCopyInto(&out.ObjectMeta)
in.Spec.DeepCopyInto(&out.Spec)
in.Status.DeepCopyInto(&out.Status)
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkApplication.
func (in *SparkApplication) DeepCopy() *SparkApplication {
if in == nil {
return nil
}
out := new(SparkApplication)
in.DeepCopyInto(out)
return out
}
// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
func (in *SparkApplication) DeepCopyObject() runtime.Object {
if c := in.DeepCopy(); c != nil {
return c
}
return nil
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkApplicationList) DeepCopyInto(out *SparkApplicationList) {
*out = *in
out.TypeMeta = in.TypeMeta
in.ListMeta.DeepCopyInto(&out.ListMeta)
if in.Items != nil {
in, out := &in.Items, &out.Items
*out = make([]SparkApplication, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkApplicationList.
func (in *SparkApplicationList) DeepCopy() *SparkApplicationList {
if in == nil {
return nil
}
out := new(SparkApplicationList)
in.DeepCopyInto(out)
return out
}
// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
func (in *SparkApplicationList) DeepCopyObject() runtime.Object {
if c := in.DeepCopy(); c != nil {
return c
}
return nil
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkApplicationSpec) DeepCopyInto(out *SparkApplicationSpec) {
*out = *in
if in.Image != nil {
in, out := &in.Image, &out.Image
*out = new(string)
**out = **in
}
if in.InitContainerImage != nil {
in, out := &in.InitContainerImage, &out.InitContainerImage
*out = new(string)
**out = **in
}
if in.ImagePullPolicy != nil {
in, out := &in.ImagePullPolicy, &out.ImagePullPolicy
*out = new(string)
**out = **in
}
if in.ImagePullSecrets != nil {
in, out := &in.ImagePullSecrets, &out.ImagePullSecrets
*out = make([]string, len(*in))
copy(*out, *in)
}
if in.MainClass != nil {
in, out := &in.MainClass, &out.MainClass
*out = new(string)
**out = **in
}
if in.MainApplicationFile != nil {
in, out := &in.MainApplicationFile, &out.MainApplicationFile
*out = new(string)
**out = **in
}
if in.Arguments != nil {
in, out := &in.Arguments, &out.Arguments
*out = make([]string, len(*in))
copy(*out, *in)
}
if in.SparkConf != nil {
in, out := &in.SparkConf, &out.SparkConf
*out = make(map[string]string, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
if in.HadoopConf != nil {
in, out := &in.HadoopConf, &out.HadoopConf
*out = make(map[string]string, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
if in.SparkConfigMap != nil {
in, out := &in.SparkConfigMap, &out.SparkConfigMap
*out = new(string)
**out = **in
}
if in.HadoopConfigMap != nil {
in, out := &in.HadoopConfigMap, &out.HadoopConfigMap
*out = new(string)
**out = **in
}
if in.Volumes != nil {
in, out := &in.Volumes, &out.Volumes
*out = make([]v1.Volume, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
in.Driver.DeepCopyInto(&out.Driver)
in.Executor.DeepCopyInto(&out.Executor)
in.Deps.DeepCopyInto(&out.Deps)
in.RestartPolicy.DeepCopyInto(&out.RestartPolicy)
if in.NodeSelector != nil {
in, out := &in.NodeSelector, &out.NodeSelector
*out = make(map[string]string, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
if in.FailureRetries != nil {
in, out := &in.FailureRetries, &out.FailureRetries
*out = new(int32)
**out = **in
}
if in.RetryInterval != nil {
in, out := &in.RetryInterval, &out.RetryInterval
*out = new(int64)
**out = **in
}
if in.PythonVersion != nil {
in, out := &in.PythonVersion, &out.PythonVersion
*out = new(string)
**out = **in
}
if in.MemoryOverheadFactor != nil {
in, out := &in.MemoryOverheadFactor, &out.MemoryOverheadFactor
*out = new(string)
**out = **in
}
if in.Monitoring != nil {
in, out := &in.Monitoring, &out.Monitoring
*out = new(MonitoringSpec)
(*in).DeepCopyInto(*out)
}
if in.BatchScheduler != nil {
in, out := &in.BatchScheduler, &out.BatchScheduler
*out = new(string)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkApplicationSpec.
func (in *SparkApplicationSpec) DeepCopy() *SparkApplicationSpec {
if in == nil {
return nil
}
out := new(SparkApplicationSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkApplicationStatus) DeepCopyInto(out *SparkApplicationStatus) {
*out = *in
in.LastSubmissionAttemptTime.DeepCopyInto(&out.LastSubmissionAttemptTime)
in.TerminationTime.DeepCopyInto(&out.TerminationTime)
out.DriverInfo = in.DriverInfo
out.AppState = in.AppState
if in.ExecutorState != nil {
in, out := &in.ExecutorState, &out.ExecutorState
*out = make(map[string]ExecutorState, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkApplicationStatus.
func (in *SparkApplicationStatus) DeepCopy() *SparkApplicationStatus {
if in == nil {
return nil
}
out := new(SparkApplicationStatus)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkPodSpec) DeepCopyInto(out *SparkPodSpec) {
*out = *in
if in.Cores != nil {
in, out := &in.Cores, &out.Cores
*out = new(float32)
**out = **in
}
if in.CoreLimit != nil {
in, out := &in.CoreLimit, &out.CoreLimit
*out = new(string)
**out = **in
}
if in.Memory != nil {
in, out := &in.Memory, &out.Memory
*out = new(string)
**out = **in
}
if in.MemoryOverhead != nil {
in, out := &in.MemoryOverhead, &out.MemoryOverhead
*out = new(string)
**out = **in
}
if in.GPU != nil {
in, out := &in.GPU, &out.GPU
*out = new(GPUSpec)
**out = **in
}
if in.Image != nil {
in, out := &in.Image, &out.Image
*out = new(string)
**out = **in
}
if in.ConfigMaps != nil {
in, out := &in.ConfigMaps, &out.ConfigMaps
*out = make([]NamePath, len(*in))
copy(*out, *in)
}
if in.Secrets != nil {
in, out := &in.Secrets, &out.Secrets
*out = make([]SecretInfo, len(*in))
copy(*out, *in)
}
if in.EnvVars != nil {
in, out := &in.EnvVars, &out.EnvVars
*out = make(map[string]string, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
if in.EnvSecretKeyRefs != nil {
in, out := &in.EnvSecretKeyRefs, &out.EnvSecretKeyRefs
*out = make(map[string]NameKey, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
if in.Labels != nil {
in, out := &in.Labels, &out.Labels
*out = make(map[string]string, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
if in.Annotations != nil {
in, out := &in.Annotations, &out.Annotations
*out = make(map[string]string, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
if in.VolumeMounts != nil {
in, out := &in.VolumeMounts, &out.VolumeMounts
*out = make([]v1.VolumeMount, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
if in.Affinity != nil {
in, out := &in.Affinity, &out.Affinity
*out = new(v1.Affinity)
(*in).DeepCopyInto(*out)
}
if in.Tolerations != nil {
in, out := &in.Tolerations, &out.Tolerations
*out = make([]v1.Toleration, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
if in.SecurityContext != nil {
in, out := &in.SecurityContext, &out.SecurityContext
*out = new(v1.PodSecurityContext)
(*in).DeepCopyInto(*out)
}
if in.SchedulerName != nil {
in, out := &in.SchedulerName, &out.SchedulerName
*out = new(string)
**out = **in
}
if in.Sidecars != nil {
in, out := &in.Sidecars, &out.Sidecars
*out = make([]v1.Container, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
if in.HostNetwork != nil {
in, out := &in.HostNetwork, &out.HostNetwork
*out = new(bool)
**out = **in
}
if in.NodeSelector != nil {
in, out := &in.NodeSelector, &out.NodeSelector
*out = make(map[string]string, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
if in.DNSConfig != nil {
in, out := &in.DNSConfig, &out.DNSConfig
*out = new(v1.PodDNSConfig)
(*in).DeepCopyInto(*out)
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkPodSpec.
func (in *SparkPodSpec) DeepCopy() *SparkPodSpec {
if in == nil {
return nil
}
out := new(SparkPodSpec)
in.DeepCopyInto(out)
return out
}
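
Because DeepCopyObject returns a runtime.Object, these types can be cloned generically before mutation; a sketch with a hypothetical helper (cloneForMutation is not part of the operator, and the import path is assumed).

package main

import (
    "fmt"

    "k8s.io/apimachinery/pkg/runtime"

    v1beta1 "github.com/kubeflow/spark-operator/api/v1beta1" // assumed import path
)

// cloneForMutation is a hypothetical helper: anything handed around as a
// runtime.Object (informer caches, work queues) is cloned through
// DeepCopyObject before being mutated.
func cloneForMutation(obj runtime.Object) runtime.Object {
    return obj.DeepCopyObject()
}

func main() {
    app := &v1beta1.SparkApplication{}
    app.Status.ExecutorState = map[string]v1beta1.ExecutorState{"exec-1": v1beta1.ExecutorRunningState}

    cp := cloneForMutation(app).(*v1beta1.SparkApplication)
    cp.Status.ExecutorState["exec-1"] = v1beta1.ExecutorFailedState

    fmt.Println(app.Status.ExecutorState["exec-1"]) // still "RUNNING": the map was deep-copied
}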

View File

@ -34,6 +34,12 @@ type ScheduledSparkApplicationSpec struct {
// Schedule is a cron schedule on which the application should run.
Schedule string `json:"schedule"`
// TimeZone is the time zone in which the cron schedule will be interpreted.
// This value is passed to time.LoadLocation, so it must be either "Local", "UTC",
// or a valid IANA location name e.g. "America/New_York".
// +optional
// Defaults to "Local".
TimeZone string `json:"timeZone,omitempty"`
// Template is a template from which SparkApplication instances can be created.
Template SparkApplicationSpec `json:"template"`
// Suspend is a flag telling the controller to suspend subsequent runs of the application if set to true.
@ -80,10 +86,13 @@ type ScheduledSparkApplicationStatus struct {
// +kubebuilder:resource:scope=Namespaced,shortName=scheduledsparkapp,singular=scheduledsparkapplication
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:JSONPath=.spec.schedule,name=Schedule,type=string
// +kubebuilder:printcolumn:JSONPath=.spec.timeZone,name=TimeZone,type=string
// +kubebuilder:printcolumn:JSONPath=.spec.suspend,name=Suspend,type=string
// +kubebuilder:printcolumn:JSONPath=.status.lastRun,name=Last Run,type=date
// +kubebuilder:printcolumn:JSONPath=.status.lastRunName,name=Last Run Name,type=string
// +kubebuilder:printcolumn:JSONPath=.metadata.creationTimestamp,name=Age,type=date
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +genclient
// ScheduledSparkApplication is the Schema for the scheduledsparkapplications API.
type ScheduledSparkApplication struct {
@ -95,6 +104,7 @@ type ScheduledSparkApplication struct {
}
// +kubebuilder:object:root=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// ScheduledSparkApplicationList contains a list of ScheduledSparkApplication.
type ScheduledSparkApplicationList struct {
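The new `timeZone` field is handed to Go's `time.LoadLocation`, so a cron schedule can be pinned to a specific IANA zone instead of the controller's local clock. A minimal, illustrative sketch (the name and the template body are placeholders, not part of this change):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-nightly            # hypothetical name
spec:
  schedule: "0 2 * * *"             # run at 02:00
  timeZone: "America/New_York"      # passed to time.LoadLocation; defaults to "Local"
  template:
    # ... a regular SparkApplicationSpec (type, mode, image, mainApplicationFile, ...) ...
    sparkVersion: "4.0.0"
```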

View File

@ -182,6 +182,8 @@ type SparkApplicationStatus struct {
// +kubebuilder:printcolumn:JSONPath=.status.lastSubmissionAttemptTime,name=Start,type=string
// +kubebuilder:printcolumn:JSONPath=.status.terminationTime,name=Finish,type=string
// +kubebuilder:printcolumn:JSONPath=.metadata.creationTimestamp,name=Age,type=date
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +genclient
// SparkApplication is the Schema for the sparkapplications API
type SparkApplication struct {
@ -193,6 +195,7 @@ type SparkApplication struct {
}
// +kubebuilder:object:root=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// SparkApplicationList contains a list of SparkApplication
type SparkApplicationList struct {
@ -427,6 +430,9 @@ type SparkPodSpec struct {
// Memory is the amount of memory to request for the pod.
// +optional
Memory *string `json:"memory,omitempty"`
// MemoryLimit overrides the memory limit of the pod.
// +optional
MemoryLimit *string `json:"memoryLimit,omitempty"`
// MemoryOverhead is the amount of off-heap memory to allocate in cluster mode, in MiB unless otherwise specified.
// +optional
MemoryOverhead *string `json:"memoryOverhead,omitempty"`
@ -700,6 +706,12 @@ type DynamicAllocation struct {
// MaxExecutors is the upper bound for the number of executors if dynamic allocation is enabled.
// +optional
MaxExecutors *int32 `json:"maxExecutors,omitempty"`
// ShuffleTrackingEnabled enables shuffle file tracking for executors, which allows dynamic allocation without
// the need for an external shuffle service. This option will try to keep alive executors that are storing
// shuffle data for active jobs. If external shuffle service is enabled, set ShuffleTrackingEnabled to false.
// ShuffleTrackingEnabled is true by default if dynamicAllocation.enabled is true.
// +optional
ShuffleTrackingEnabled *bool `json:"shuffleTrackingEnabled,omitempty"`
// ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
// shuffle data if shuffle tracking is enabled (true by default if dynamic allocation is enabled).
// +optional
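Together with the new `memoryLimit` field on `SparkPodSpec` above, these options can be set directly in a `SparkApplication` spec. A hedged sketch with illustrative values:

```yaml
spec:
  dynamicAllocation:
    enabled: true
    minExecutors: 1
    maxExecutors: 10
    # Defaults to true when dynamic allocation is enabled; set to false
    # only if an external shuffle service is configured.
    shuffleTrackingEnabled: false
  executor:
    memory: "4g"        # amount of memory to request
    memoryLimit: "6g"   # overrides the pod memory limit
```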

View File

@ -1,5 +1,5 @@
/*
Copyright 2017 Google LLC
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@ -14,14 +14,10 @@ See the License for the specific language governing permissions and
limitations under the License.
*/
package main
/*
This file is needed for kubernetes/code-generator/kube_codegen.sh script used in hack/update-codegen.sh.
*/
import (
_ "k8s.io/client-go/plugin/pkg/client/auth"
package v1beta2
"github.com/kubeflow/spark-operator/cmd/sparkctl/app"
)
func main() {
app.Execute()
}
//+genclient

View File

@ -1,7 +1,7 @@
//go:build !ignore_autogenerated
/*
Copyright 2024 The Kubeflow authors.
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@ -279,6 +279,11 @@ func (in *DynamicAllocation) DeepCopyInto(out *DynamicAllocation) {
*out = new(int32)
**out = **in
}
if in.ShuffleTrackingEnabled != nil {
in, out := &in.ShuffleTrackingEnabled, &out.ShuffleTrackingEnabled
*out = new(bool)
**out = **in
}
if in.ShuffleTrackingTimeout != nil {
in, out := &in.ShuffleTrackingTimeout, &out.ShuffleTrackingTimeout
*out = new(int64)
@ -896,6 +901,11 @@ func (in *SparkPodSpec) DeepCopyInto(out *SparkPodSpec) {
*out = new(string)
**out = **in
}
if in.MemoryLimit != nil {
in, out := &in.MemoryLimit, &out.MemoryLimit
*out = new(string)
**out = **in
}
if in.MemoryOverhead != nil {
in, out := &in.MemoryOverhead, &out.MemoryOverhead
*out = new(string)

View File

@ -20,9 +20,9 @@ name: spark-operator
description: A Helm chart for Spark on Kubernetes operator.
version: 2.1.1
version: 2.2.1
appVersion: 2.1.1
appVersion: 2.2.1
keywords:
- apache spark

View File

@ -1,6 +1,6 @@
# spark-operator
![Version: 2.1.1](https://img.shields.io/badge/Version-2.1.1-informational?style=flat-square) ![AppVersion: 2.1.1](https://img.shields.io/badge/AppVersion-2.1.1-informational?style=flat-square)
![Version: 2.2.1](https://img.shields.io/badge/Version-2.2.1-informational?style=flat-square) ![AppVersion: 2.2.1](https://img.shields.io/badge/AppVersion-2.2.1-informational?style=flat-square)
A Helm chart for Spark on Kubernetes operator.
@ -78,8 +78,8 @@ See [helm uninstall](https://helm.sh/docs/helm/helm_uninstall) for command docum
| nameOverride | string | `""` | String to partially override release name. |
| fullnameOverride | string | `""` | String to fully override release name. |
| commonLabels | object | `{}` | Common labels to add to the resources. |
| image.registry | string | `"docker.io"` | Image registry. |
| image.repository | string | `"kubeflow/spark-operator"` | Image repository. |
| image.registry | string | `"ghcr.io"` | Image registry. |
| image.repository | string | `"kubeflow/spark-operator/controller"` | Image repository. |
| image.tag | string | If not set, the chart appVersion will be used. | Image tag. |
| image.pullPolicy | string | `"IfNotPresent"` | Image pull policy. |
| image.pullSecrets | list | `[]` | Image pull secrets for private image registry. |
@ -87,12 +87,15 @@ See [helm uninstall](https://helm.sh/docs/helm/helm_uninstall) for command docum
| controller.leaderElection.enable | bool | `true` | Specifies whether to enable leader election for controller. |
| controller.workers | int | `10` | Reconcile concurrency, higher values might increase memory usage. |
| controller.logLevel | string | `"info"` | Configure the verbosity of logging, can be one of `debug`, `info`, `error`. |
| controller.logEncoder | string | `"console"` | Configure the encoder of logging, can be one of `console` or `json`. |
| controller.driverPodCreationGracePeriod | string | `"10s"` | Grace period after a successful spark-submit when driver pod not found errors will be retried. Useful if the driver pod can take some time to be created. |
| controller.maxTrackedExecutorPerApp | int | `1000` | Specifies the maximum number of Executor pods that can be tracked by the controller per SparkApplication. |
| controller.uiService.enable | bool | `true` | Specifies whether to create service for Spark web UI. |
| controller.uiIngress.enable | bool | `false` | Specifies whether to create ingress for Spark web UI. `controller.uiService.enable` must be `true` to enable ingress. |
| controller.uiIngress.urlFormat | string | `""` | Ingress URL format. Required if `controller.uiIngress.enable` is true. |
| controller.uiIngress.ingressClassName | string | `""` | Optionally set the ingressClassName. |
| controller.uiIngress.tls | list | `[]` | Optionally set default TLS configuration for the Spark UI's ingress. `ingressTLS` in the SparkApplication spec overrides this. |
| controller.uiIngress.annotations | object | `{}` | Optionally set default ingress annotations for the Spark UI's ingress. `ingressAnnotations` in the SparkApplication spec overrides this. |
| controller.batchScheduler.enable | bool | `false` | Specifies whether to enable batch scheduler for spark jobs scheduling. If enabled, users can specify batch scheduler name in spark application. |
| controller.batchScheduler.kubeSchedulerNames | list | `[]` | Specifies a list of kube-scheduler names for scheduling Spark pods. |
| controller.batchScheduler.default | string | `""` | Default batch scheduler to be used if not specified by the user. If specified, this value must be either "volcano" or "yunikorn". Specifying any other value will cause the controller to error on startup. |
@ -130,6 +133,7 @@ See [helm uninstall](https://helm.sh/docs/helm/helm_uninstall) for command docum
| webhook.replicas | int | `1` | Number of replicas of webhook server. |
| webhook.leaderElection.enable | bool | `true` | Specifies whether to enable leader election for webhook. |
| webhook.logLevel | string | `"info"` | Configure the verbosity of logging, can be one of `debug`, `info`, `error`. |
| webhook.logEncoder | string | `"console"` | Configure the encoder of logging, can be one of `console` or `json`. |
| webhook.port | int | `9443` | Specifies webhook port. |
| webhook.portName | string | `"webhook"` | Specifies webhook service port name. |
| webhook.failurePolicy | string | `"Fail"` | Specifies how unrecognized errors are handled. Available options are `Ignore` or `Fail`. |
@ -175,6 +179,10 @@ See [helm uninstall](https://helm.sh/docs/helm/helm_uninstall) for command docum
| prometheus.podMonitor.labels | object | `{}` | Pod monitor labels |
| prometheus.podMonitor.jobLabel | string | `"spark-operator-podmonitor"` | The label to use to retrieve the job name from |
| prometheus.podMonitor.podMetricsEndpoint | object | `{"interval":"5s","scheme":"http"}` | Prometheus metrics endpoint properties. `metrics.portName` will be used as a port |
| certManager.enable | bool | `false` | Specifies whether to use [cert-manager](https://cert-manager.io) to generate certificate for webhook. `webhook.enable` must be set to `true` to enable cert-manager. |
| certManager.issuerRef | object | A self-signed issuer will be created and used if not specified. | The reference to the issuer. |
| certManager.duration | string | `2160h` (90 days) will be used if not specified. | The duration of the certificate validity (e.g. `2160h`). See [cert-manager.io/v1.Certificate](https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.Certificate). |
| certManager.renewBefore | string | 1/3 of issued certificates lifetime. | The duration before the certificate expiration to renew the certificate (e.g. `720h`). See [cert-manager.io/v1.Certificate](https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.Certificate). |
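For reference, a hedged `values.yaml` override exercising the new chart options documented above (the hostname, secret name, and annotation value are placeholders):

```yaml
controller:
  logEncoder: json
  uiIngress:
    enable: true
    urlFormat: ""                    # must be set to your ingress URL format when enable is true
    tls:
      - hosts:
          - "*.example.com"
        secretName: spark-ui-tls     # placeholder secret
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt
webhook:
  logEncoder: json
```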
## Maintainers

View File

@ -21,6 +21,9 @@ spec:
- jsonPath: .spec.schedule
name: Schedule
type: string
- jsonPath: .spec.timeZone
name: TimeZone
type: string
- jsonPath: .spec.suspend
name: Suspend
type: string
@ -3118,6 +3121,10 @@ spec:
description: Memory is the amount of memory to request for
the pod.
type: string
memoryLimit:
description: MemoryLimit overrides the memory limit of the
pod.
type: string
memoryOverhead:
description: MemoryOverhead is the amount of off-heap memory
to allocate in cluster mode, in MiB unless otherwise specified.
@ -5289,6 +5296,13 @@ spec:
of executors if dynamic allocation is enabled.
format: int32
type: integer
shuffleTrackingEnabled:
description: |-
ShuffleTrackingEnabled enables shuffle file tracking for executors, which allows dynamic allocation without
the need for an external shuffle service. This option will try to keep alive executors that are storing
shuffle data for active jobs. If external shuffle service is enabled, set ShuffleTrackingEnabled to false.
ShuffleTrackingEnabled is true by default if dynamicAllocation.enabled is true.
type: boolean
shuffleTrackingTimeout:
description: |-
ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
@ -8242,6 +8256,10 @@ spec:
description: Memory is the amount of memory to request for
the pod.
type: string
memoryLimit:
description: MemoryLimit overrides the memory limit of the
pod.
type: string
memoryOverhead:
description: MemoryOverhead is the amount of off-heap memory
to allocate in cluster mode, in MiB unless otherwise specified.
@ -12384,6 +12402,13 @@ spec:
- sparkVersion
- type
type: object
timeZone:
description: |-
TimeZone is the time zone in which the cron schedule will be interpreted.
This value is passed to time.LoadLocation, so it must be either "Local", "UTC",
or a valid IANA location name e.g. "America/New_York".
Defaults to "Local".
type: string
required:
- schedule
- template

View File

@ -3071,6 +3071,9 @@ spec:
description: Memory is the amount of memory to request for the
pod.
type: string
memoryLimit:
description: MemoryLimit overrides the memory limit of the pod.
type: string
memoryOverhead:
description: MemoryOverhead is the amount of off-heap memory to
allocate in cluster mode, in MiB unless otherwise specified.
@ -5235,6 +5238,13 @@ spec:
executors if dynamic allocation is enabled.
format: int32
type: integer
shuffleTrackingEnabled:
description: |-
ShuffleTrackingEnabled enables shuffle file tracking for executors, which allows dynamic allocation without
the need for an external shuffle service. This option will try to keep alive executors that are storing
shuffle data for active jobs. If external shuffle service is enabled, set ShuffleTrackingEnabled to false.
ShuffleTrackingEnabled is true by default if dynamicAllocation.enabled is true.
type: boolean
shuffleTrackingTimeout:
description: |-
ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
@ -8172,6 +8182,9 @@ spec:
description: Memory is the amount of memory to request for the
pod.
type: string
memoryLimit:
description: MemoryLimit overrides the memory limit of the pod.
type: string
memoryOverhead:
description: MemoryOverhead is the amount of off-heap memory to
allocate in cluster mode, in MiB unless otherwise specified.

View File

@ -0,0 +1,272 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
api-approved.kubernetes.io: https://github.com/kubeflow/spark-operator/pull/1298
controller-gen.kubebuilder.io/version: v0.17.1
name: sparkconnects.sparkoperator.k8s.io
spec:
group: sparkoperator.k8s.io
names:
kind: SparkConnect
listKind: SparkConnectList
plural: sparkconnects
shortNames:
- sparkconn
singular: sparkconnect
scope: Namespaced
versions:
- additionalPrinterColumns:
- jsonPath: .metadata.creationTimestamp
name: Age
type: date
name: v1alpha1
schema:
openAPIV3Schema:
description: SparkConnect is the Schema for the sparkconnects API.
properties:
apiVersion:
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
spec:
description: SparkConnectSpec defines the desired state of SparkConnect.
properties:
dynamicAllocation:
description: |-
DynamicAllocation configures dynamic allocation that becomes available for the Kubernetes
scheduler backend since Spark 3.0.
properties:
enabled:
description: Enabled controls whether dynamic allocation is enabled
or not.
type: boolean
initialExecutors:
description: |-
InitialExecutors is the initial number of executors to request. If .spec.executor.instances
is also set, the initial number of executors is set to the bigger of that and this option.
format: int32
type: integer
maxExecutors:
description: MaxExecutors is the upper bound for the number of
executors if dynamic allocation is enabled.
format: int32
type: integer
minExecutors:
description: MinExecutors is the lower bound for the number of
executors if dynamic allocation is enabled.
format: int32
type: integer
shuffleTrackingEnabled:
description: |-
ShuffleTrackingEnabled enables shuffle file tracking for executors, which allows dynamic allocation without
the need for an external shuffle service. This option will try to keep alive executors that are storing
shuffle data for active jobs. If external shuffle service is enabled, set ShuffleTrackingEnabled to false.
ShuffleTrackingEnabled is true by default if dynamicAllocation.enabled is true.
type: boolean
shuffleTrackingTimeout:
description: |-
ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
shuffle data if shuffle tracking is enabled (true by default if dynamic allocation is enabled).
format: int64
type: integer
type: object
executor:
description: Executor is the Spark executor specification.
properties:
cores:
description: Cores maps to `spark.driver.cores` or `spark.executor.cores`
for the driver and executors, respectively.
format: int32
minimum: 1
type: integer
instances:
description: Instances is the number of executor instances.
format: int32
minimum: 0
type: integer
memory:
description: Memory is the amount of memory to request for the
pod.
type: string
template:
description: |-
Template is a pod template that can be used to define the driver or executor pod configurations that Spark configurations do not support.
Spark version >= 3.0.0 is required.
Ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template.
type: object
x-kubernetes-preserve-unknown-fields: true
type: object
hadoopConf:
additionalProperties:
type: string
description: |-
HadoopConf carries user-specified Hadoop configuration properties as they would use the "--conf" option
in spark-submit. The SparkApplication controller automatically adds prefix "spark.hadoop." to Hadoop
configuration properties.
type: object
image:
description: |-
Image is the container image for the driver, executor, and init-container. Any custom container images for the
driver, executor, or init-container takes precedence over this.
type: string
server:
description: Server is the Spark connect server specification.
properties:
cores:
description: Cores maps to `spark.driver.cores` or `spark.executor.cores`
for the driver and executors, respectively.
format: int32
minimum: 1
type: integer
memory:
description: Memory is the amount of memory to request for the
pod.
type: string
template:
description: |-
Template is a pod template that can be used to define the driver or executor pod configurations that Spark configurations do not support.
Spark version >= 3.0.0 is required.
Ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template.
type: object
x-kubernetes-preserve-unknown-fields: true
type: object
sparkConf:
additionalProperties:
type: string
description: |-
SparkConf carries user-specified Spark configuration properties as they would use the "--conf" option in
spark-submit.
type: object
sparkVersion:
description: SparkVersion is the version of Spark that the Spark Connect
server uses.
type: string
required:
- executor
- server
- sparkVersion
type: object
status:
description: SparkConnectStatus defines the observed state of SparkConnect.
properties:
conditions:
description: Represents the latest available observations of a SparkConnect's
current state.
items:
description: Condition contains details for one aspect of the current
state of this API Resource.
properties:
lastTransitionTime:
description: |-
lastTransitionTime is the last time the condition transitioned from one status to another.
This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
format: date-time
type: string
message:
description: |-
message is a human readable message indicating details about the transition.
This may be an empty string.
maxLength: 32768
type: string
observedGeneration:
description: |-
observedGeneration represents the .metadata.generation that the condition was set based upon.
For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
with respect to the current state of the instance.
format: int64
minimum: 0
type: integer
reason:
description: |-
reason contains a programmatic identifier indicating the reason for the condition's last transition.
Producers of specific condition types may define expected values and meanings for this field,
and whether the values are considered a guaranteed API.
The value should be a CamelCase string.
This field may not be empty.
maxLength: 1024
minLength: 1
pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
type: string
status:
description: status of the condition, one of True, False, Unknown.
enum:
- "True"
- "False"
- Unknown
type: string
type:
description: type of condition in CamelCase or in foo.example.com/CamelCase.
maxLength: 316
pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
type: string
required:
- lastTransitionTime
- message
- reason
- status
- type
type: object
type: array
x-kubernetes-list-map-keys:
- type
x-kubernetes-list-type: map
executors:
additionalProperties:
type: integer
description: Executors represents the current state of the SparkConnect
executors.
type: object
lastUpdateTime:
description: LastUpdateTime is the time at which the SparkConnect
controller last updated the SparkConnect.
format: date-time
type: string
server:
description: Server represents the current state of the SparkConnect
server.
properties:
podIp:
description: PodIP is the IP address of the pod that is running
the Spark Connect server.
type: string
podName:
description: PodName is the name of the pod that is running the
Spark Connect server.
type: string
serviceName:
description: ServiceName is the name of the service that is exposing
the Spark Connect server.
type: string
type: object
startTime:
description: StartTime is the time at which the SparkConnect controller
started processing the SparkConnect.
format: date-time
type: string
state:
description: State represents the current state of the SparkConnect.
type: string
type: object
required:
- metadata
- spec
type: object
served: true
storage: true
subresources:
status: {}
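A minimal `SparkConnect` manifest consistent with this schema (only `server`, `executor`, and `sparkVersion` are required; the name, image, and sizes below are illustrative):

```yaml
apiVersion: sparkoperator.k8s.io/v1alpha1
kind: SparkConnect
metadata:
  name: spark-connect-example       # hypothetical name
spec:
  sparkVersion: "4.0.0"
  image: spark:4.0.0                # placeholder image
  server:
    cores: 1
    memory: 512m
  executor:
    instances: 2
    cores: 1
    memory: 512m
```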

View File

@ -74,5 +74,5 @@ app.kubernetes.io/instance: {{ .Release.Name }}
Spark Operator image
*/}}
{{- define "spark-operator.image" -}}
{{ printf "%s/%s:%s" .Values.image.registry .Values.image.repository (.Values.image.tag | default .Chart.AppVersion) }}
{{ printf "%s/%s:%s" .Values.image.registry .Values.image.repository (.Values.image.tag | default .Chart.AppVersion | toString) }}
{{- end -}}

View File

@ -1,5 +1,5 @@
/*
Copyright 2018 Google LLC
{{- /*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@ -12,28 +12,18 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
*/ -}}
package app
{{/*
Create the name of the webhook certificate issuer.
*/}}
{{- define "spark-operator.certManager.issuer.name" -}}
{{ include "spark-operator.name" . }}-self-signed-issuer
{{- end -}}
import (
"time"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/duration"
)
func getSinceTime(timestamp metav1.Time) string {
if timestamp.IsZero() {
return "N.A."
}
return duration.ShortHumanDuration(time.Since(timestamp.Time))
}
func formatNotAvailable(info string) string {
if info == "" {
return "N.A."
}
return info
}
{{/*
Create the name of the certificate to be used by webhook.
*/}}
{{- define "spark-operator.certManager.certificate.name" -}}
{{ include "spark-operator.name" . }}-certificate
{{- end -}}

View File

@ -0,0 +1,56 @@
{{- /*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/ -}}
{{- if .Values.webhook.enable }}
{{- if .Values.certManager.enable }}
{{- if not (.Capabilities.APIVersions.Has "cert-manager.io/v1/Certificate") }}
{{- fail "The cluster does not support the required API version `cert-manager.io/v1` for `Certificate`." }}
{{- end }}
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: {{ include "spark-operator.certManager.certificate.name" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
spec:
secretName: {{ include "spark-operator.webhook.secretName" . }}
issuerRef:
{{- if not .Values.certManager.issuerRef }}
group: cert-manager.io
kind: Issuer
name: {{ include "spark-operator.certManager.issuer.name" . }}
{{- else }}
{{- toYaml .Values.certManager.issuerRef | nindent 4 }}
{{- end }}
commonName: {{ include "spark-operator.webhook.serviceName" . }}.{{ .Release.Namespace }}.svc
dnsNames:
- {{ include "spark-operator.webhook.serviceName" . }}.{{ .Release.Namespace }}.svc
- {{ include "spark-operator.webhook.serviceName" . }}.{{ .Release.Namespace }}.svc.cluster.local
subject:
organizationalUnits:
- spark-operator
usages:
- server auth
- client auth
{{- with .Values.certManager.duration }}
duration: {{ . }}
{{- end }}
{{- with .Values.certManager.renewBefore }}
renewBefore: {{ . }}
{{- end }}
{{- end }}
{{- end }}
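To point the Certificate at an existing issuer instead of the generated self-signed one, a hedged chart values snippet (the ClusterIssuer name is a placeholder) might look like:

```yaml
webhook:
  enable: true
certManager:
  enable: true
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: selfsigned      # placeholder; any existing (Cluster)Issuer
  duration: 2160h         # optional; 90 days if unset
  renewBefore: 720h       # optional; defaults to 1/3 of the certificate lifetime
```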

View File

@ -0,0 +1,34 @@
{{- /*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/ -}}
{{- if .Values.webhook.enable }}
{{- if .Values.certManager.enable }}
{{- if not .Values.certManager.issuerRef }}
{{- if not (.Capabilities.APIVersions.Has "cert-manager.io/v1/Issuer") }}
{{- fail "The cluster does not support the required API version `cert-manager.io/v1` for `Issuer`." }}
{{- end }}
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: {{ include "spark-operator.certManager.issuer.name" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
spec:
selfSigned: {}
{{- end }}
{{- end }}
{{- end }}

View File

@ -172,15 +172,17 @@ Create the role policy rules for the controller in every Spark job namespace
- ingresses
verbs:
- get
- create
- delete
- list
- watch
- create
- update
- delete
- apiGroups:
- sparkoperator.k8s.io
resources:
- sparkapplications
- scheduledsparkapplications
- sparkconnects
verbs:
- get
- list
@ -196,6 +198,7 @@ Create the role policy rules for the controller in every Spark job namespace
- sparkapplications/finalizers
- scheduledsparkapplications/status
- scheduledsparkapplications/finalizers
- sparkconnects/status
verbs:
- get
- update

View File

@ -56,6 +56,9 @@ spec:
{{- with .Values.controller.logLevel }}
- --zap-log-level={{ . }}
{{- end }}
{{- with .Values.controller.logEncoder }}
- --zap-encoder={{ . }}
{{- end }}
{{- with .Values.spark.jobNamespaces }}
{{- if has "" . }}
- --namespaces=""
@ -72,6 +75,12 @@ spec:
{{- with .Values.controller.uiIngress.ingressClassName }}
- --ingress-class-name={{ . }}
{{- end }}
{{- with .Values.controller.uiIngress.tls }}
- --ingress-tls={{ . | toJson }}
{{- end }}
{{- with .Values.controller.uiIngress.annotations }}
- --ingress-annotations={{ . | toJson }}
{{- end }}
{{- end }}
{{- if .Values.controller.batchScheduler.enable }}
- --enable-batch-scheduler=true

View File

@ -83,7 +83,6 @@ Create the name of the secret to be used by webhook
{{ include "spark-operator.webhook.name" . }}-certs
{{- end -}}
{{/*
Create the name of the service to be used by webhook
*/}}

View File

@ -50,6 +50,9 @@ spec:
{{- with .Values.webhook.logLevel }}
- --zap-log-level={{ . }}
{{- end }}
{{- with .Values.webhook.logEncoder }}
- --zap-encoder={{ . }}
{{- end }}
{{- with .Values.spark.jobNamespaces }}
{{- if has "" . }}
- --namespaces=""
@ -67,6 +70,9 @@ spec:
{{- with .Values.webhook.resourceQuotaEnforcement.enable }}
- --enable-resource-quota-enforcement=true
{{- end }}
{{- if .Values.certManager.enable }}
- --enable-cert-manager=true
{{- end }}
{{- if .Values.prometheus.metrics.enable }}
- --enable-metrics=true
- --metrics-bind-address=:{{ .Values.prometheus.metrics.port }}

View File

@ -21,6 +21,10 @@ metadata:
name: {{ include "spark-operator.webhook.name" . }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
{{- if .Values.certManager.enable }}
annotations:
cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "spark-operator.certManager.certificate.name" . }}
{{- end }}
webhooks:
- name: mutate--v1-pod.sparkoperator.k8s.io
admissionReviewVersions: ["v1"]

View File

@ -21,6 +21,10 @@ metadata:
name: {{ include "spark-operator.webhook.name" . }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
{{- if .Values.certManager.enable }}
annotations:
cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "spark-operator.certManager.certificate.name" . }}
{{- end }}
webhooks:
- name: validate-sparkoperator-k8s-io-v1beta2-sparkapplication.sparkoperator.k8s.io
admissionReviewVersions: ["v1"]

View File

@ -0,0 +1,134 @@
#
# Copyright 2025 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test CertManager Certificate
templates:
- certmanager/certificate.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not create Certificate if `webhook.enable` is `false`
capabilities:
apiVersions:
- cert-manager.io/v1/Certificate
set:
webhook:
enable: false
certManager:
enable: true
asserts:
- hasDocuments:
count: 0
- it: Should not create Certificate if `certManager.enable` is `false`
capabilities:
apiVersions:
- cert-manager.io/v1/Certificate
set:
webhook:
enable: true
certManager:
enable: false
asserts:
- hasDocuments:
count: 0
- it: Should create Certificate if `webhook.enable` is `true` and `certManager.enable` is `true`
capabilities:
apiVersions:
- cert-manager.io/v1/Certificate
set:
webhook:
enable: true
certManager:
enable: true
asserts:
- containsDocument:
apiVersion: cert-manager.io/v1
kind: Certificate
name: spark-operator-certificate
namespace: spark-operator
- it: Should fail if the cluster does not support `cert-manager.io/v1/Certificate`
set:
webhook:
enable: true
certManager:
enable: true
asserts:
- failedTemplate:
errorMessage: "The cluster does not support the required API version `cert-manager.io/v1` for `Certificate`."
- it: Should use self signed issuer if `certManager.issuerRef` is not set
capabilities:
apiVersions:
- cert-manager.io/v1/Certificate
set:
webhook:
enable: true
certManager:
enable: true
issuerRef:
group: cert-manager.io
kind: Issuer
name: test-issuer
asserts:
- equal:
path: spec.issuerRef
value:
group: cert-manager.io
kind: Issuer
name: test-issuer
- it: Should use the specified issuer if `certManager.issuerRef` is set
capabilities:
apiVersions:
- cert-manager.io/v1/Certificate
set:
webhook:
enable: true
certManager:
enable: true
issuerRef:
group: cert-manager.io
kind: Issuer
name: test-issuer
asserts:
- equal:
path: spec.issuerRef
value:
group: cert-manager.io
kind: Issuer
name: test-issuer
- it: Should use the specified duration if `certManager.duration` is set
capabilities:
apiVersions:
- cert-manager.io/v1/Certificate
set:
webhook:
enable: true
certManager:
enable: true
duration: 8760h
asserts:
- equal:
path: spec.duration
value: 8760h

View File

@ -0,0 +1,95 @@
#
# Copyright 2025 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test CertManager Issuer
templates:
- certmanager/issuer.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not create Issuer if `webhook.enable` is `false`
capabilities:
apiVersions:
- cert-manager.io/v1/Issuer
set:
webhook:
enable: false
certManager:
enable: true
asserts:
- hasDocuments:
count: 0
- it: Should not create Issuer if `certManager.enable` is `false`
capabilities:
apiVersions:
- cert-manager.io/v1/Issuer
set:
webhook:
enable: true
certManager:
enable: false
asserts:
- hasDocuments:
count: 0
- it: Should not create Issuer if `certManager.issuerRef` is set
capabilities:
apiVersions:
- cert-manager.io/v1/Issuer
set:
webhook:
enable: true
certManager:
enable: true
issuerRef:
group: cert-manager.io
kind: Issuer
name: test-issuer
asserts:
- hasDocuments:
count: 0
- it: Should fail if the cluster does not support `cert-manager.io/v1/Issuer`
set:
webhook:
enable: true
certManager:
enable: true
asserts:
- failedTemplate:
errorMessage: "The cluster does not support the required API version `cert-manager.io/v1` for `Issuer`."
- it: Should create Issuer if `webhook.enable` is `true` and `certManager.enable` is `true`
capabilities:
apiVersions:
- cert-manager.io/v1/Issuer
set:
webhook:
enable: true
certManager:
enable: true
issuerRef: null
asserts:
- containsDocument:
apiVersion: cert-manager.io/v1
kind: Issuer
name: spark-operator-self-signed-issuer
namespace: spark-operator

View File

@ -184,6 +184,37 @@ tests:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --ingress-class-name=nginx
- it: Should contain `--ingress-tls` arg if `controller.uiIngress.enable` is set to `true` and `controller.uiIngress.tls` is set
set:
controller:
uiService:
enable: true
uiIngress:
enable: true
tls:
- hosts:
- "*.test.com"
secretName: test-secret
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: '--ingress-tls=[{"hosts":["*.test.com"],"secretName":"test-secret"}]'
- it: Should contain `--ingress-annotations` arg if `controller.uiIngress.enable` is set to `true` and `controller.uiIngress.annotations` is set
set:
controller:
uiService:
enable: true
uiIngress:
enable: true
annotations:
cert-manager.io/cluster-issuer: "letsencrypt"
kubernetes.io/ingress.class: nginx
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: '--ingress-annotations={"cert-manager.io/cluster-issuer":"letsencrypt","kubernetes.io/ingress.class":"nginx"}'
- it: Should contain `--enable-batch-scheduler` arg if `controller.batchScheduler.enable` is `true`
set:
controller:
@ -246,7 +277,7 @@ tests:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --leader-election-lock-namespace=spark-operator
- it: Should disable leader election if `controller.leaderElection.enable` is set to `false`
set:
controller:

View File

@ -29,9 +29,9 @@ commonLabels: {}
image:
# -- Image registry.
registry: docker.io
registry: ghcr.io
# -- Image repository.
repository: kubeflow/spark-operator
repository: kubeflow/spark-operator/controller
# -- Image tag.
# @default -- If not set, the chart appVersion will be used.
tag: ""
@ -55,6 +55,9 @@ controller:
# -- Configure the verbosity of logging, can be one of `debug`, `info`, `error`.
logLevel: info
# -- Configure the encoder of logging, can be one of `console` or `json`.
logEncoder: console
# -- Grace period after a successful spark-submit when driver pod not found errors will be retried. Useful if the driver pod can take some time to be created.
driverPodCreationGracePeriod: 10s
@ -74,6 +77,15 @@ controller:
urlFormat: ""
# -- Optionally set the ingressClassName.
ingressClassName: ""
# -- Optionally set default TLS configuration for the Spark UI's ingress. `ingressTLS` in the SparkApplication spec overrides this.
tls: []
# - hosts:
# - "*.example.com"
# secretName: "example-secret"
# -- Optionally set default ingress annotations for the Spark UI's ingress. `ingressAnnotations` in the SparkApplication spec overrides this.
annotations: {}
# key1: value1
# key2: value2
batchScheduler:
# -- Specifies whether to enable batch scheduler for spark jobs scheduling.
@ -231,6 +243,9 @@ webhook:
# -- Configure the verbosity of logging, can be one of `debug`, `info`, `error`.
logLevel: info
# -- Configure the encoder of logging, can be one of `console` or `json`.
logEncoder: console
# -- Specifies webhook port.
port: 9443
@ -407,3 +422,22 @@ prometheus:
podMetricsEndpoint:
scheme: http
interval: 5s
certManager:
# -- Specifies whether to use [cert-manager](https://cert-manager.io) to generate certificate for webhook.
# `webhook.enable` must be set to `true` to enable cert-manager.
enable: false
# -- The reference to the issuer.
# @default -- A self-signed issuer will be created and used if not specified.
issuerRef: {}
# group: cert-manager.io
# kind: ClusterIssuer
# name: selfsigned
# -- The duration of the certificate validity (e.g. `2160h`).
# See [cert-manager.io/v1.Certificate](https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.Certificate).
# @default -- `2160h` (90 days) will be used if not specified.
duration: ""
# -- The duration before the certificate expiration to renew the certificate (e.g. `720h`).
# See [cert-manager.io/v1.Certificate](https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.Certificate).
# @default -- 1/3 of issued certificates lifetime.
renewBefore: ""

View File

@ -18,7 +18,9 @@ package controller
import (
"crypto/tls"
"encoding/json"
"flag"
"fmt"
"os"
"slices"
"time"
@ -33,6 +35,7 @@ import (
"go.uber.org/zap/zapcore"
"golang.org/x/time/rate"
corev1 "k8s.io/api/core/v1"
networkingv1 "k8s.io/api/networking/v1"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/runtime"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
@ -49,18 +52,19 @@ import (
ctrlwebhook "sigs.k8s.io/controller-runtime/pkg/webhook"
schedulingv1alpha1 "sigs.k8s.io/scheduler-plugins/apis/scheduling/v1alpha1"
sparkoperator "github.com/kubeflow/spark-operator"
"github.com/kubeflow/spark-operator/api/v1beta1"
"github.com/kubeflow/spark-operator/api/v1beta2"
"github.com/kubeflow/spark-operator/internal/controller/scheduledsparkapplication"
"github.com/kubeflow/spark-operator/internal/controller/sparkapplication"
"github.com/kubeflow/spark-operator/internal/metrics"
"github.com/kubeflow/spark-operator/internal/scheduler"
"github.com/kubeflow/spark-operator/internal/scheduler/kubescheduler"
"github.com/kubeflow/spark-operator/internal/scheduler/volcano"
"github.com/kubeflow/spark-operator/internal/scheduler/yunikorn"
"github.com/kubeflow/spark-operator/pkg/common"
"github.com/kubeflow/spark-operator/pkg/util"
sparkoperator "github.com/kubeflow/spark-operator/v2"
"github.com/kubeflow/spark-operator/v2/api/v1alpha1"
"github.com/kubeflow/spark-operator/v2/api/v1beta2"
"github.com/kubeflow/spark-operator/v2/internal/controller/scheduledsparkapplication"
"github.com/kubeflow/spark-operator/v2/internal/controller/sparkapplication"
"github.com/kubeflow/spark-operator/v2/internal/controller/sparkconnect"
"github.com/kubeflow/spark-operator/v2/internal/metrics"
"github.com/kubeflow/spark-operator/v2/internal/scheduler"
"github.com/kubeflow/spark-operator/v2/internal/scheduler/kubescheduler"
"github.com/kubeflow/spark-operator/v2/internal/scheduler/volcano"
"github.com/kubeflow/spark-operator/v2/internal/scheduler/yunikorn"
"github.com/kubeflow/spark-operator/v2/pkg/common"
"github.com/kubeflow/spark-operator/v2/pkg/util"
// +kubebuilder:scaffold:imports
)
@ -88,9 +92,11 @@ var (
defaultBatchScheduler string
// Spark web UI service and ingress
enableUIService bool
ingressClassName string
ingressURLFormat string
enableUIService bool
ingressClassName string
ingressURLFormat string
ingressTLS []networkingv1.IngressTLS
ingressAnnotations map[string]string
// Leader election
enableLeaderElection bool
@ -122,17 +128,31 @@ func init() {
utilruntime.Must(clientgoscheme.AddToScheme(scheme))
utilruntime.Must(schedulingv1alpha1.AddToScheme(scheme))
utilruntime.Must(v1beta1.AddToScheme(scheme))
utilruntime.Must(v1alpha1.AddToScheme(scheme))
utilruntime.Must(v1beta2.AddToScheme(scheme))
// +kubebuilder:scaffold:scheme
}
func NewStartCommand() *cobra.Command {
var ingressTLSstring string
var ingressAnnotationsString string
var command = &cobra.Command{
Use: "start",
Short: "Start controller and webhook",
PreRun: func(_ *cobra.Command, args []string) {
PreRunE: func(_ *cobra.Command, args []string) error {
development = viper.GetBool("development")
if ingressTLSstring != "" {
if err := json.Unmarshal([]byte(ingressTLSstring), &ingressTLS); err != nil {
return fmt.Errorf("failed parsing ingress-tls JSON string from CLI: %v", err)
}
}
if ingressAnnotationsString != "" {
if err := json.Unmarshal([]byte(ingressAnnotationsString), &ingressAnnotations); err != nil {
return fmt.Errorf("failed parsing ingress-annotations JSON string from CLI: %v", err)
}
}
return nil
},
Run: func(_ *cobra.Command, args []string) {
sparkoperator.PrintVersion(false)
@ -156,6 +176,8 @@ func NewStartCommand() *cobra.Command {
command.Flags().BoolVar(&enableUIService, "enable-ui-service", true, "Enable Spark Web UI service.")
command.Flags().StringVar(&ingressClassName, "ingress-class-name", "", "Set ingressClassName for ingress resources created.")
command.Flags().StringVar(&ingressURLFormat, "ingress-url-format", "", "Ingress URL format.")
command.Flags().StringVar(&ingressTLSstring, "ingress-tls", "", "JSON format string for the default TLS config on the Spark UI ingresses. e.g. '[{\"hosts\":[\"*.example.com\"],\"secretName\":\"example-secret\"}]'. `ingressTLS` in the SparkApplication spec will override this value.")
command.Flags().StringVar(&ingressAnnotationsString, "ingress-annotations", "", "JSON format string for the default ingress annotations for the Spark UI ingresses. e.g. '[{\"cert-manager.io/cluster-issuer\": \"letsencrypt\"}]'. `ingressAnnotations` in the SparkApplication spec will override this value.")
command.Flags().BoolVar(&enableLeaderElection, "leader-election", false, "Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
@ -264,6 +286,8 @@ func start() {
}
}
sparkSubmitter := &sparkapplication.SparkSubmitter{}
// Setup controller for SparkApplication.
if err = sparkapplication.NewReconciler(
mgr,
@ -271,6 +295,7 @@ func start() {
mgr.GetClient(),
mgr.GetEventRecorderFor("spark-application-controller"),
registry,
sparkSubmitter,
newSparkApplicationReconcilerOptions(),
).SetupWithManager(mgr, newControllerOptions()); err != nil {
logger.Error(err, "Failed to create controller", "controller", "SparkApplication")
@ -289,6 +314,18 @@ func start() {
os.Exit(1)
}
// Setup controller for SparkConnect.
if err = sparkconnect.NewReconciler(
mgr,
mgr.GetScheme(),
mgr.GetClient(),
mgr.GetEventRecorderFor("SparkConnect"),
newSparkConnectReconcilerOptions(),
).SetupWithManager(mgr, newControllerOptions()); err != nil {
logger.Error(err, "Failed to create controller", "controller", "SparkConnect")
os.Exit(1)
}
// +kubebuilder:scaffold:builder
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
@ -314,19 +351,12 @@ func setupLog() {
logzap.UseFlagOptions(&zapOptions),
func(o *logzap.Options) {
o.Development = development
}, func(o *logzap.Options) {
o.ZapOpts = append(o.ZapOpts, zap.AddCaller())
}, func(o *logzap.Options) {
var config zapcore.EncoderConfig
if !development {
config = zap.NewProductionEncoderConfig()
} else {
config = zap.NewDevelopmentEncoderConfig()
}
config.EncodeLevel = zapcore.CapitalColorLevelEncoder
config.EncodeTime = zapcore.ISO8601TimeEncoder
config.EncodeCaller = zapcore.ShortCallerEncoder
o.Encoder = zapcore.NewConsoleEncoder(config)
o.EncoderConfigOptions = append(o.EncoderConfigOptions, func(config *zapcore.EncoderConfig) {
config.EncodeLevel = zapcore.CapitalLevelEncoder
config.EncodeTime = zapcore.ISO8601TimeEncoder
config.EncodeCaller = zapcore.ShortCallerEncoder
})
}),
)
}
@ -368,10 +398,12 @@ func newCacheOptions() cache.Options {
common.LabelLaunchedBySparkOperator: "true",
}),
},
&corev1.ConfigMap{}: {},
&corev1.PersistentVolumeClaim{}: {},
&corev1.Service{}: {},
&v1beta2.SparkApplication{}: {},
&corev1.ConfigMap{}: {},
&corev1.PersistentVolumeClaim{}: {},
&corev1.Service{}: {},
&v1beta2.SparkApplication{}: {},
&v1beta2.ScheduledSparkApplication{}: {},
&v1alpha1.SparkConnect{}: {},
},
}
@ -402,6 +434,8 @@ func newSparkApplicationReconcilerOptions() sparkapplication.Options {
EnableUIService: enableUIService,
IngressClassName: ingressClassName,
IngressURLFormat: ingressURLFormat,
IngressTLS: ingressTLS,
IngressAnnotations: ingressAnnotations,
DefaultBatchScheduler: defaultBatchScheduler,
DriverPodCreationGracePeriod: driverPodCreationGracePeriod,
SparkApplicationMetrics: sparkApplicationMetrics,
@ -420,3 +454,10 @@ func newScheduledSparkApplicationReconcilerOptions() scheduledsparkapplication.O
}
return options
}
func newSparkConnectReconcilerOptions() sparkconnect.Options {
options := sparkconnect.Options{
Namespaces: namespaces,
}
return options
}

View File

@ -22,9 +22,9 @@ import (
"github.com/spf13/cobra"
"github.com/kubeflow/spark-operator/cmd/operator/controller"
"github.com/kubeflow/spark-operator/cmd/operator/version"
"github.com/kubeflow/spark-operator/cmd/operator/webhook"
"github.com/kubeflow/spark-operator/v2/cmd/operator/controller"
"github.com/kubeflow/spark-operator/v2/cmd/operator/version"
"github.com/kubeflow/spark-operator/v2/cmd/operator/webhook"
)
func NewCommand() *cobra.Command {

View File

@ -19,7 +19,7 @@ package version
import (
"github.com/spf13/cobra"
sparkoperator "github.com/kubeflow/spark-operator"
sparkoperator "github.com/kubeflow/spark-operator/v2"
)
var (

View File

@ -48,15 +48,14 @@ import (
metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
ctrlwebhook "sigs.k8s.io/controller-runtime/pkg/webhook"
sparkoperator "github.com/kubeflow/spark-operator"
"github.com/kubeflow/spark-operator/api/v1beta1"
"github.com/kubeflow/spark-operator/api/v1beta2"
"github.com/kubeflow/spark-operator/internal/controller/mutatingwebhookconfiguration"
"github.com/kubeflow/spark-operator/internal/controller/validatingwebhookconfiguration"
"github.com/kubeflow/spark-operator/internal/webhook"
"github.com/kubeflow/spark-operator/pkg/certificate"
"github.com/kubeflow/spark-operator/pkg/common"
"github.com/kubeflow/spark-operator/pkg/util"
sparkoperator "github.com/kubeflow/spark-operator/v2"
"github.com/kubeflow/spark-operator/v2/api/v1beta2"
"github.com/kubeflow/spark-operator/v2/internal/controller/mutatingwebhookconfiguration"
"github.com/kubeflow/spark-operator/v2/internal/controller/validatingwebhookconfiguration"
"github.com/kubeflow/spark-operator/v2/internal/webhook"
"github.com/kubeflow/spark-operator/v2/pkg/certificate"
"github.com/kubeflow/spark-operator/v2/pkg/common"
"github.com/kubeflow/spark-operator/v2/pkg/util"
// +kubebuilder:scaffold:imports
)
@ -86,6 +85,9 @@ var (
webhookServiceName string
webhookServiceNamespace string
// Cert Manager
enableCertManager bool
// Leader election
enableLeaderElection bool
leaderElectionLockName string
@ -111,7 +113,6 @@ var (
func init() {
utilruntime.Must(clientgoscheme.AddToScheme(scheme))
utilruntime.Must(v1beta1.AddToScheme(scheme))
utilruntime.Must(v1beta2.AddToScheme(scheme))
// +kubebuilder:scaffold:scheme
}
@ -129,11 +130,13 @@ func NewStartCommand() *cobra.Command {
},
}
// Controller
command.Flags().IntVar(&controllerThreads, "controller-threads", 10, "Number of worker threads used by the SparkApplication controller.")
command.Flags().StringSliceVar(&namespaces, "namespaces", []string{}, "The Kubernetes namespace to manage. Will manage custom resource objects of the managed CRD types for the whole cluster if unset or contains empty string.")
command.Flags().StringVar(&labelSelectorFilter, "label-selector-filter", "", "A comma-separated list of key=value, or key labels to filter resources during watch and list based on the specified labels.")
command.Flags().DurationVar(&cacheSyncTimeout, "cache-sync-timeout", 30*time.Second, "Informer cache sync timeout.")
// Webhook
command.Flags().StringVar(&webhookCertDir, "webhook-cert-dir", "/etc/k8s-webhook-server/serving-certs", "The directory that contains the webhook server key and certificate. "+
"When running as nonRoot, you must create and own this directory before running this command.")
command.Flags().StringVar(&webhookCertName, "webhook-cert-name", "tls.crt", "The file name of webhook server certificate.")
@ -147,6 +150,10 @@ func NewStartCommand() *cobra.Command {
command.Flags().StringVar(&webhookServiceNamespace, "webhook-svc-namespace", "spark-webhook", "The namespace of the Service for the webhook server.")
command.Flags().BoolVar(&enableResourceQuotaEnforcement, "enable-resource-quota-enforcement", false, "Whether to enable ResourceQuota enforcement for SparkApplication resources. Requires the webhook to be enabled.")
// Cert Manager
command.Flags().BoolVar(&enableCertManager, "enable-cert-manager", false, "Enable cert-manager to manage the webhook server's TLS certificate.")
// Leader election
command.Flags().BoolVar(&enableLeaderElection, "leader-election", false, "Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
command.Flags().StringVar(&leaderElectionLockName, "leader-election-lock-name", "spark-operator-lock", "Name of the ConfigMap for leader election.")
@ -155,6 +162,7 @@ func NewStartCommand() *cobra.Command {
command.Flags().DurationVar(&leaderElectionRenewDeadline, "leader-election-renew-deadline", 14*time.Second, "Leader election renew deadline.")
command.Flags().DurationVar(&leaderElectionRetryPeriod, "leader-election-retry-period", 4*time.Second, "Leader election retry period.")
// Prometheus metrics
command.Flags().BoolVar(&enableMetrics, "enable-metrics", false, "Enable metrics.")
command.Flags().StringVar(&metricsBindAddress, "metrics-bind-address", "0", "The address the metric endpoint binds to. "+
"Use the port :8080. If not set, it will be 0 in order to disable the metrics server")
@ -232,6 +240,7 @@ func start() {
client,
webhookServiceName,
webhookServiceNamespace,
enableCertManager,
)
if err := wait.ExponentialBackoff(
@ -242,7 +251,6 @@ func start() {
Jitter: 0.1,
},
func() (bool, error) {
logger.Info("Syncing webhook secret", "name", webhookSecretName, "namespace", webhookSecretNamespace)
if err := certProvider.SyncSecret(context.TODO(), webhookSecretName, webhookSecretNamespace); err != nil {
if errors.IsAlreadyExists(err) || errors.IsConflict(err) {
return false, nil
@ -262,22 +270,24 @@ func start() {
os.Exit(1)
}
if err := mutatingwebhookconfiguration.NewReconciler(
mgr.GetClient(),
certProvider,
mutatingWebhookName,
).SetupWithManager(mgr, controller.Options{}); err != nil {
logger.Error(err, "Failed to create controller", "controller", "MutatingWebhookConfiguration")
os.Exit(1)
}
if !enableCertManager {
if err := mutatingwebhookconfiguration.NewReconciler(
mgr.GetClient(),
certProvider,
mutatingWebhookName,
).SetupWithManager(mgr, controller.Options{}); err != nil {
logger.Error(err, "Failed to create controller", "controller", "MutatingWebhookConfiguration")
os.Exit(1)
}
if err := validatingwebhookconfiguration.NewReconciler(
mgr.GetClient(),
certProvider,
validatingWebhookName,
).SetupWithManager(mgr, controller.Options{}); err != nil {
logger.Error(err, "Failed to create controller", "controller", "ValidatingWebhookConfiguration")
os.Exit(1)
if err := validatingwebhookconfiguration.NewReconciler(
mgr.GetClient(),
certProvider,
validatingWebhookName,
).SetupWithManager(mgr, controller.Options{}); err != nil {
logger.Error(err, "Failed to create controller", "controller", "ValidatingWebhookConfiguration")
os.Exit(1)
}
}
if err := ctrl.NewWebhookManagedBy(mgr).

View File

@ -1,231 +0,0 @@
# sparkctl
`sparkctl` is a command-line tool of the Spark Operator for creating, listing, checking status of, getting logs of, and deleting `SparkApplication`s. It can also do port forwarding from a local port to the Spark web UI port for accessing the Spark web UI on the driver. Each function is implemented as a sub-command of `sparkctl`.
To build the `sparkctl` binary, run the following command in the root directory of the project:
```bash
make build-sparkctl
```
Then the `sparkctl` binary can be found in the `bin` directory:
```bash
$ bin/sparkctl --help
sparkctl is the command-line tool for working with the Spark Operator. It supports creating, deleting and
checking status of SparkApplication objects. It also supports fetching application logs.
Usage:
sparkctl [command]
Available Commands:
completion Generate the autocompletion script for the specified shell
create Create a SparkApplication object
delete Delete a SparkApplication object
event Shows SparkApplication events
forward Start to forward a local port to the remote port of the driver UI
help Help about any command
list List SparkApplication objects
log log is a sub-command of sparkctl that fetches logs of a Spark application.
status Check status of a SparkApplication
Flags:
-h, --help help for sparkctl
-k, --kubeconfig string The path to the local Kubernetes configuration file (default "/Users/chenyi/.kube/config")
-n, --namespace string The namespace in which the SparkApplication is to be created (default "default")
Use "sparkctl [command] --help" for more information about a command.
```
## Flags
The following global flags are available for all the sub commands:
* `--namespace`: the Kubernetes namespace of the `SparkApplication`(s). Defaults to `default`.
* `--kubeconfig`: the path to the file storing configuration for accessing the Kubernetes API server. Defaults to
`$HOME/.kube/config`
## Available Commands
### Create
`create` is a sub-command of `sparkctl` for creating a `SparkApplication` object. There are two ways to create a `SparkApplication` object. One is parsing a given YAML file and creating the `SparkApplication` object in the namespace specified by `--namespace`. In this way, `create` parses the YAML file and sends the parsed `SparkApplication` object to the Kubernetes API server. Usage of this way looks like the following:
Usage:
```bash
sparkctl create <path to YAML file>
```
The other way is creating a `SparkApplication` object from a named `ScheduledSparkApplication` to manually force a run of the `ScheduledSparkApplication`. Usage of this way looks like the following:
Usage:
```bash
sparkctl create <name of the SparkApplication> --from <name of the ScheduledSparkApplication>
```
The `create` command also supports shipping local Hadoop configuration files into the driver and executor pods. Specifically, it detects local Hadoop configuration files located at the path specified by the
environment variable `HADOOP_CONF_DIR`, creates a Kubernetes `ConfigMap` from the files, and adds the `ConfigMap` to the `SparkApplication` object so that it gets mounted into the driver and executor pods by the operator. The environment variable `HADOOP_CONF_DIR` is also set in the driver and executor containers.
#### Staging local dependencies
The `create` command also supports staging local application dependencies; currently uploading to a Google Cloud Storage (GCS) or S3 bucket is supported. The way it works is as follows. It checks whether there are any local dependencies in `spec.mainApplicationFile`, `spec.deps.jars`, `spec.deps.files`, etc. in the parsed `SparkApplication` object. If so, it tries to upload the local dependencies to the remote location specified by `--upload-to`. The command fails if local dependencies are used but `--upload-to` is not specified. By default, a local file that already exists remotely (i.e., a file with the same name exists at the upload path) will be ignored. If the remote file should be overridden instead, the `--override` flag should be specified.
##### Uploading to GCS
For uploading to GCS, the value should be in the form of `gs://<bucket>`. The bucket must exist, and uploading fails otherwise. The local dependencies will be uploaded to the path
`spark-app-dependencies/<SparkApplication namespace>/<SparkApplication name>` in the given bucket. It replaces the file path of each local dependency with the URI of the remote copy in the parsed `SparkApplication` object if uploading is successful.
Note that uploading to GCS requires a GCP service account with the necessary IAM permissions to use the GCP project specified by the service account JSON key file (`serviceusage.services.use`) and to create GCS objects (`storage.objects.create`).
The service account JSON key file must be locally available and be pointed to by the environment variable
`GOOGLE_APPLICATION_CREDENTIALS`. For more information on IAM authentication, please check
[Getting Started with Authentication](https://cloud.google.com/docs/authentication/getting-started).
Usage:
```bash
export GOOGLE_APPLICATION_CREDENTIALS="[PATH]/[FILE_NAME].json"
sparkctl create <path to YAML file> --upload-to gs://<bucket>
```
By default, the uploaded dependencies are not made publicly accessible and are referenced using URIs of the form `gs://bucket/path/to/file`. To download the dependencies from GCS, a custom-built Spark init-container with the [GCS connector](https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage) installed and the necessary Hadoop configuration properties specified is needed. An example Dockerfile of such an init-container can be found [here](https://gist.github.com/liyinan926/f9e81f7b54d94c05171a663345eb58bf).
If you want to make uploaded dependencies publicly available so they can be downloaded by the built-in init-container, simply add `--public` to the `create` command, as the following example shows:
```bash
sparkctl create <path to YAML file> --upload-to gs://<bucket> --public
```
Publicly available files are referenced through URIs of the form `https://storage.googleapis.com/bucket/path/to/file`.
##### Uploading to S3
For uploading to S3, the value should be in the form of `s3://<bucket>`. The bucket must exist, and uploading fails otherwise. The local dependencies will be uploaded to the path
`spark-app-dependencies/<SparkApplication namespace>/<SparkApplication name>` in the given bucket. It replaces the file path of each local dependency with the URI of the remote copy in the parsed `SparkApplication` object if uploading is successful.
Note that uploading to S3 with the [AWS SDK](https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html) requires credentials to be specified. For GCP, the S3 Interoperability credentials can be retrieved as described [here](https://cloud.google.com/storage/docs/migrating#keys).
The SDK uses the default credential provider chain to find AWS credentials: the first provider in the chain that returns credentials without an error is used. The default provider chain looks for credentials in the following order:
* Environment variables:
```
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
```
* Shared credentials file (`.aws/credentials`)
For more information about AWS SDK authentication, please check [Specifying Credentials](https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html#specifying-credentials).
Usage:
```bash
export AWS_ACCESS_KEY_ID=[KEY]
export AWS_SECRET_ACCESS_KEY=[SECRET]
sparkctl create <path to YAML file> --upload-to s3://<bucket>
```
By default, the uploaded dependencies are not made publicly accessible and are referenced using URIs in the form of `s3a://bucket/path/to/file`. To download the dependencies from S3, a custom-built Spark Docker image is needed in which the required jars for the `S3A Connector` (`hadoop-aws-2.7.6.jar` and `aws-java-sdk-1.7.6.jar` for a Spark build with the Hadoop 2.7 profile, or `hadoop-aws-3.1.0.jar` and `aws-java-sdk-bundle-1.11.271.jar` for Hadoop 3.1) are available on the classpath, and `spark-defaults.conf` needs to set the AWS keys and the S3A filesystem class (you can also use `spec.hadoopConf` in the SparkApplication YAML):
```properties
spark.hadoop.fs.s3a.endpoint https://storage.googleapis.com
spark.hadoop.fs.s3a.access.key [KEY]
spark.hadoop.fs.s3a.secret.key [SECRET]
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
```
NOTE: In Spark 2.3, init-containers are used for downloading remote application dependencies. In later versions, init-containers have been removed.
It is recommended to use Apache Spark 2.4 for staging local dependencies with `s3`, which currently requires building a custom Docker image from the Spark master branch. Additionally, since Spark 2.4.0 there are two available build profiles, Hadoop 2.7 and Hadoop 3.1. For using Spark with the `S3A Connector`, the Hadoop 3.1 profile is recommended, as it allows using a newer version of `aws-java-sdk-bundle`.
If you want to use a custom S3 endpoint or region, add `--upload-to-endpoint` and `--upload-to-region`:
```bash
sparkctl create <path to YAML file> --upload-to-endpoint https://<endpoint-url> --upload-to-region <endpoint-region> --upload-to s3://<bucket>
```
If you want to force path-style URLs for S3 objects, add `--s3-force-path-style`:
```bash
sparkctl create <path to YAML file> --s3-force-path-style
```
If you want to make uploaded dependencies publicly available, add `--public` to the `create` command, as the following example shows:
```bash
sparkctl create <path to YAML file> --upload-to s3://<bucket> --public
```
Publicly available files are referenced through URIs in the default form `https://<endpoint-url>/bucket/path/to/file`.
### List
`list` is a sub command of `sparkctl` for listing `SparkApplication` objects in the namespace specified by
`--namespace`.
Usage:
```bash
sparkctl list
```
### Status
`status` is a sub command of `sparkctl` for checking and printing the status of a `SparkApplication` in the namespace specified by `--namespace`.
Usage:
```bash
sparkctl status <SparkApplication name>
```
### Event
`event` is a sub command of `sparkctl` for listing `SparkApplication` events in the namespace
specified by `--namespace`.
The `event` command also supports streaming the events with the `--follow` or `-f` flag.
The command displays events since the last creation of the `SparkApplication` with the given name, and continues to stream events even if its `ResourceVersion` changes.
Usage:
```bash
sparkctl event <SparkApplication name> [-f]
```
### Log
`log` is a sub command of `sparkctl` for fetching the logs of a pod of the `SparkApplication` with the given name in the namespace specified by `--namespace`. By default, the command fetches the logs of the driver pod. To fetch the logs of an executor pod instead, use the flag `--executor` or `-e` to specify the ID of the executor whose logs should be fetched.
The `log` command also supports streaming the driver or executor logs with the `--follow` or `-f` flag. It works in the same way as `kubectl logs -f`, i.e., it streams logs until no more logs are available.
Usage:
```bash
sparkctl log <SparkApplication name> [-e <executor ID, e.g., 1>] [-f]
```
### Delete
`delete` is a sub command of `sparkctl` for deleting a `SparkApplication` with the given name in the namespace specified by `--namespace`.
Usage:
```bash
sparkctl delete <SparkApplication name>
```
### Forward
`forward` is a sub command of `sparkctl` for forwarding a local port to the Spark web UI port on the driver. It allows the Spark web UI served in the driver pod to be accessed locally. By default, it forwards from local port `4040` to remote port `4040`, which is the default Spark web UI port. Users can specify a different local port and remote port using the flags `--local-port` and `--remote-port`, respectively.
Usage:
```bash
sparkctl forward <SparkApplication name> [--local-port <local port>] [--remote-port <remote port>]
```
Once port forwarding starts, users can open `127.0.0.1:<local port>` or `localhost:<local port>` in a browser to access the Spark web UI. Forwarding continues until it is interrupted or the driver pod terminates.
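For example, for a hypothetical application named `spark-pi`, forwarding local port `8080` to the driver UI and checking it from another terminal could look like the following:
```bash
# Hypothetical application name and local port.
sparkctl forward spark-pi --local-port 8080 --remote-port 4040
# In another terminal (or a browser), access the forwarded Spark web UI:
curl http://127.0.0.1:8080
```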

View File

@ -1,77 +0,0 @@
/*
Copyright 2017 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"context"
"os"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientset "k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
"k8s.io/client-go/tools/clientcmd"
"github.com/kubeflow/spark-operator/api/v1beta2"
crdclientset "github.com/kubeflow/spark-operator/pkg/client/clientset/versioned"
)
func buildConfig(kubeConfig string) (*rest.Config, error) {
// Check if the kubeconfig file exists
if _, err := os.Stat(kubeConfig); os.IsNotExist(err) {
// Try InClusterConfig for sparkctl running in a pod
config, err := rest.InClusterConfig()
if err != nil {
return nil, err
}
return config, nil
}
return clientcmd.BuildConfigFromFlags("", kubeConfig)
}
func getKubeClient() (clientset.Interface, error) {
config, err := buildConfig(KubeConfig)
if err != nil {
return nil, err
}
return getKubeClientForConfig(config)
}
func getKubeClientForConfig(config *rest.Config) (clientset.Interface, error) {
return clientset.NewForConfig(config)
}
func getSparkApplicationClient() (crdclientset.Interface, error) {
config, err := buildConfig(KubeConfig)
if err != nil {
return nil, err
}
return getSparkApplicationClientForConfig(config)
}
func getSparkApplicationClientForConfig(config *rest.Config) (crdclientset.Interface, error) {
return crdclientset.NewForConfig(config)
}
func getSparkApplication(name string, crdClientset crdclientset.Interface) (*v1beta2.SparkApplication, error) {
app, err := crdClientset.SparkoperatorV1beta2().SparkApplications(Namespace).Get(context.TODO(), name, metav1.GetOptions{})
if err != nil {
return nil, err
}
return app, nil
}

View File

@ -1,502 +0,0 @@
/*
Copyright 2017 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"context"
"fmt"
"net/url"
"os"
"path/filepath"
"reflect"
"unicode/utf8"
"github.com/spf13/cobra"
"gocloud.dev/blob"
"gocloud.dev/gcerrors"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/yaml"
clientset "k8s.io/client-go/kubernetes"
"github.com/kubeflow/spark-operator/api/v1beta2"
crdclientset "github.com/kubeflow/spark-operator/pkg/client/clientset/versioned"
)
const bufferSize = 1024
var DeleteIfExists bool
var LogsEnabled bool
var RootPath string
var UploadToPath string
var UploadToEndpoint string
var UploadToRegion string
var Public bool
var S3ForcePathStyle bool
var Override bool
var From string
var createCmd = &cobra.Command{
Use: "create <yaml file>",
Short: "Create a SparkApplication object",
Long: `Create a SparkApplication from a given YAML file storing the application specification.`,
Run: func(cmd *cobra.Command, args []string) {
ctx := cmd.Context()
if From != "" && len(args) != 1 {
fmt.Fprintln(os.Stderr, "must specify the name of a ScheduledSparkApplication")
return
}
if len(args) != 1 {
fmt.Fprintln(os.Stderr, "must specify a YAML file of a SparkApplication")
return
}
kubeClient, err := getKubeClient()
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get Kubernetes client: %v\n", err)
return
}
crdClient, err := getSparkApplicationClient()
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get SparkApplication client: %v\n", err)
return
}
if From != "" {
if err := createFromScheduledSparkApplication(ctx, args[0], kubeClient, crdClient); err != nil {
fmt.Fprintf(os.Stderr, "%v\n", err)
}
} else {
if err := createFromYaml(ctx, args[0], kubeClient, crdClient); err != nil {
fmt.Fprintf(os.Stderr, "%v\n", err)
}
}
},
}
func init() {
createCmd.Flags().BoolVarP(&DeleteIfExists, "delete", "d", false,
"delete the SparkApplication if already exists")
createCmd.Flags().BoolVarP(&LogsEnabled, "logs", "l", false,
"watch the SparkApplication logs")
createCmd.Flags().StringVarP(&UploadToPath, "upload-to", "u", "",
"the name of the bucket where local application dependencies are to be uploaded")
createCmd.Flags().StringVarP(&RootPath, "upload-prefix", "p", "",
"the prefix to use for the dependency uploads")
createCmd.Flags().StringVarP(&UploadToRegion, "upload-to-region", "r", "",
"the GCS or S3 storage region for the bucket")
createCmd.Flags().StringVarP(&UploadToEndpoint, "upload-to-endpoint", "e",
"https://storage.googleapis.com", "the GCS or S3 storage api endpoint url")
createCmd.Flags().BoolVarP(&Public, "public", "c", false,
"whether to make uploaded files publicly available")
createCmd.Flags().BoolVar(&S3ForcePathStyle, "s3-force-path-style", false,
"whether to force path style URLs for S3 objects")
createCmd.Flags().BoolVarP(&Override, "override", "o", false,
"whether to override remote files with the same names")
createCmd.Flags().StringVarP(&From, "from", "f", "",
"the name of ScheduledSparkApplication from which a forced SparkApplication run is created")
}
func createFromYaml(ctx context.Context, yamlFile string, kubeClient clientset.Interface, crdClient crdclientset.Interface) error {
app, err := loadFromYAML(yamlFile)
if err != nil {
return fmt.Errorf("failed to read a SparkApplication from %s: %v", yamlFile, err)
}
if err := createSparkApplication(ctx, app, kubeClient, crdClient); err != nil {
return fmt.Errorf("failed to create SparkApplication %s: %v", app.Name, err)
}
return nil
}
func createFromScheduledSparkApplication(ctx context.Context, name string, kubeClient clientset.Interface, crdClient crdclientset.Interface) error {
sapp, err := crdClient.SparkoperatorV1beta2().ScheduledSparkApplications(Namespace).Get(context.TODO(), From, metav1.GetOptions{})
if err != nil {
return fmt.Errorf("failed to get ScheduledSparkApplication %s: %v", From, err)
}
app := &v1beta2.SparkApplication{
ObjectMeta: metav1.ObjectMeta{
Namespace: Namespace,
Name: name,
OwnerReferences: []metav1.OwnerReference{
{
APIVersion: v1beta2.SchemeGroupVersion.String(),
Kind: reflect.TypeOf(v1beta2.ScheduledSparkApplication{}).Name(),
Name: sapp.Name,
UID: sapp.UID,
},
},
},
Spec: *sapp.Spec.Template.DeepCopy(),
}
if err := createSparkApplication(ctx, app, kubeClient, crdClient); err != nil {
return fmt.Errorf("failed to create SparkApplication %s: %v", app.Name, err)
}
return nil
}
func createSparkApplication(ctx context.Context, app *v1beta2.SparkApplication, kubeClient clientset.Interface, crdClient crdclientset.Interface) error {
if DeleteIfExists {
if err := deleteSparkApplication(app.Name, crdClient); err != nil {
return err
}
}
v1beta2.SetSparkApplicationDefaults(app)
if err := validateSpec(app.Spec); err != nil {
return err
}
if err := handleLocalDependencies(app); err != nil {
return err
}
if hadoopConfDir := os.Getenv("HADOOP_CONF_DIR"); hadoopConfDir != "" {
fmt.Println("creating a ConfigMap for Hadoop configuration files in HADOOP_CONF_DIR")
if err := handleHadoopConfiguration(app, hadoopConfDir, kubeClient); err != nil {
return err
}
}
if _, err := crdClient.SparkoperatorV1beta2().SparkApplications(Namespace).Create(
context.TODO(),
app,
metav1.CreateOptions{},
); err != nil {
return err
}
fmt.Printf("SparkApplication \"%s\" created\n", app.Name)
if LogsEnabled {
if err := doLog(ctx, app.Name, true, kubeClient, crdClient); err != nil {
return nil
}
}
return nil
}
func loadFromYAML(yamlFile string) (*v1beta2.SparkApplication, error) {
file, err := os.Open(yamlFile)
if err != nil {
return nil, err
}
defer file.Close()
decoder := yaml.NewYAMLOrJSONDecoder(file, bufferSize)
app := &v1beta2.SparkApplication{}
err = decoder.Decode(app)
if err != nil {
return nil, err
}
return app, nil
}
func validateSpec(spec v1beta2.SparkApplicationSpec) error {
if spec.Image == nil && (spec.Driver.Image == nil || spec.Executor.Image == nil) {
return fmt.Errorf("'spec.driver.image' and 'spec.executor.image' cannot be empty when 'spec.image' " +
"is not set")
}
return nil
}
func handleLocalDependencies(app *v1beta2.SparkApplication) error {
if app.Spec.MainApplicationFile != nil {
isMainAppFileLocal, err := isLocalFile(*app.Spec.MainApplicationFile)
if err != nil {
return err
}
if isMainAppFileLocal {
uploadedMainFile, err := uploadLocalDependencies(app, []string{*app.Spec.MainApplicationFile})
if err != nil {
return fmt.Errorf("failed to upload local main application file: %v", err)
}
app.Spec.MainApplicationFile = &uploadedMainFile[0]
}
}
localJars, err := filterLocalFiles(app.Spec.Deps.Jars)
if err != nil {
return fmt.Errorf("failed to filter local jars: %v", err)
}
if len(localJars) > 0 {
uploadedJars, err := uploadLocalDependencies(app, localJars)
if err != nil {
return fmt.Errorf("failed to upload local jars: %v", err)
}
app.Spec.Deps.Jars = uploadedJars
}
localFiles, err := filterLocalFiles(app.Spec.Deps.Files)
if err != nil {
return fmt.Errorf("failed to filter local files: %v", err)
}
if len(localFiles) > 0 {
uploadedFiles, err := uploadLocalDependencies(app, localFiles)
if err != nil {
return fmt.Errorf("failed to upload local files: %v", err)
}
app.Spec.Deps.Files = uploadedFiles
}
localPyFiles, err := filterLocalFiles(app.Spec.Deps.PyFiles)
if err != nil {
return fmt.Errorf("failed to filter local pyfiles: %v", err)
}
if len(localPyFiles) > 0 {
uploadedPyFiles, err := uploadLocalDependencies(app, localPyFiles)
if err != nil {
return fmt.Errorf("failed to upload local pyfiles: %v", err)
}
app.Spec.Deps.PyFiles = uploadedPyFiles
}
return nil
}
func filterLocalFiles(files []string) ([]string, error) {
var localFiles []string
for _, file := range files {
if isLocal, err := isLocalFile(file); err != nil {
return nil, err
} else if isLocal {
localFiles = append(localFiles, file)
}
}
return localFiles, nil
}
func isLocalFile(file string) (bool, error) {
fileURL, err := url.Parse(file)
if err != nil {
return false, err
}
if fileURL.Scheme == "file" || fileURL.Scheme == "" {
return true, nil
}
return false, nil
}
type blobHandler interface {
// TODO: With go-cloud supporting setting ACLs, remove implementations of interface
setPublicACL(ctx context.Context, bucket string, filePath string) error
}
type uploadHandler struct {
blob blobHandler
blobUploadBucket string
blobEndpoint string
hdpScheme string
ctx context.Context
b *blob.Bucket
}
func (uh uploadHandler) uploadToBucket(uploadPath, localFilePath string) (string, error) {
fileName := filepath.Base(localFilePath)
uploadFilePath := filepath.Join(uploadPath, fileName)
// Check if exists by trying to fetch metadata
reader, err := uh.b.NewRangeReader(uh.ctx, uploadFilePath, 0, 0, nil)
if err == nil {
reader.Close()
}
if (gcerrors.Code(err) == gcerrors.NotFound) || (err == nil && Override) {
fmt.Printf("uploading local file: %s\n", fileName)
// Prepare the file for upload.
data, err := os.ReadFile(localFilePath)
if err != nil {
return "", fmt.Errorf("failed to read file: %s", err)
}
// Open Bucket
w, err := uh.b.NewWriter(uh.ctx, uploadFilePath, nil)
if err != nil {
return "", fmt.Errorf("failed to obtain bucket writer: %s", err)
}
// Write data to bucket and close bucket writer
_, writeErr := w.Write(data)
if err := w.Close(); err != nil {
return "", fmt.Errorf("failed to close bucket writer: %s", err)
}
// Check if write has been successful
if writeErr != nil {
return "", fmt.Errorf("failed to write to bucket: %s", err)
}
// Set public ACL if needed
if Public {
err := uh.blob.setPublicACL(uh.ctx, uh.blobUploadBucket, uploadFilePath)
if err != nil {
return "", err
}
endpointURL, err := url.Parse(uh.blobEndpoint)
if err != nil {
return "", err
}
// Public needs full bucket endpoint
return fmt.Sprintf("%s://%s/%s/%s",
endpointURL.Scheme,
endpointURL.Host,
uh.blobUploadBucket,
uploadFilePath), nil
}
} else if err == nil {
fmt.Printf("not uploading file %s as it already exists remotely\n", fileName)
} else {
return "", err
}
// Return path to file with proper hadoop-connector scheme
return fmt.Sprintf("%s://%s/%s", uh.hdpScheme, uh.blobUploadBucket, uploadFilePath), nil
}
func uploadLocalDependencies(app *v1beta2.SparkApplication, files []string) ([]string, error) {
if UploadToPath == "" {
return nil, fmt.Errorf(
"unable to upload local dependencies: no upload location specified via --upload-to")
}
uploadLocationURL, err := url.Parse(UploadToPath)
if err != nil {
return nil, err
}
uploadBucket := uploadLocationURL.Host
var uh *uploadHandler
ctx := context.Background()
switch uploadLocationURL.Scheme {
case "gs":
uh, err = newGCSBlob(ctx, uploadBucket, UploadToEndpoint, UploadToRegion)
case "s3":
uh, err = newS3Blob(ctx, uploadBucket, UploadToEndpoint, UploadToRegion, S3ForcePathStyle)
default:
return nil, fmt.Errorf("unsupported upload location URL scheme: %s", uploadLocationURL.Scheme)
}
// Check if bucket has been successfully setup
if err != nil {
return nil, err
}
var uploadedFilePaths []string
uploadPath := filepath.Join(RootPath, app.Namespace, app.Name)
for _, localFilePath := range files {
uploadFilePath, err := uh.uploadToBucket(uploadPath, localFilePath)
if err != nil {
return nil, err
}
uploadedFilePaths = append(uploadedFilePaths, uploadFilePath)
}
return uploadedFilePaths, nil
}
func handleHadoopConfiguration(
app *v1beta2.SparkApplication,
hadoopConfDir string,
kubeClientset clientset.Interface) error {
configMap, err := buildHadoopConfigMap(app.Name, hadoopConfDir)
if err != nil {
return fmt.Errorf("failed to create a ConfigMap for Hadoop configuration files in %s: %v",
hadoopConfDir, err)
}
err = kubeClientset.CoreV1().ConfigMaps(Namespace).Delete(context.TODO(), configMap.Name, metav1.DeleteOptions{})
if err != nil && !errors.IsNotFound(err) {
return fmt.Errorf("failed to delete existing ConfigMap %s: %v", configMap.Name, err)
}
if configMap, err = kubeClientset.CoreV1().ConfigMaps(Namespace).Create(context.TODO(), configMap, metav1.CreateOptions{}); err != nil {
return fmt.Errorf("failed to create ConfigMap %s: %v", configMap.Name, err)
}
app.Spec.HadoopConfigMap = &configMap.Name
return nil
}
func buildHadoopConfigMap(appName string, hadoopConfDir string) (*corev1.ConfigMap, error) {
info, err := os.Stat(hadoopConfDir)
if err != nil {
return nil, err
}
if !info.IsDir() {
return nil, fmt.Errorf("%s is not a directory", hadoopConfDir)
}
files, err := os.ReadDir(hadoopConfDir)
if err != nil {
return nil, err
}
if len(files) == 0 {
return nil, fmt.Errorf("no Hadoop configuration file found in %s", hadoopConfDir)
}
hadoopStringConfigFiles := make(map[string]string)
hadoopBinaryConfigFiles := make(map[string][]byte)
for _, file := range files {
if file.IsDir() {
continue
}
content, err := os.ReadFile(filepath.Join(hadoopConfDir, file.Name()))
if err != nil {
return nil, err
}
if utf8.Valid(content) {
hadoopStringConfigFiles[file.Name()] = string(content)
} else {
hadoopBinaryConfigFiles[file.Name()] = content
}
}
configMap := &corev1.ConfigMap{
ObjectMeta: metav1.ObjectMeta{
Name: appName + "-hadoop-config",
Namespace: Namespace,
},
Data: hadoopStringConfigFiles,
BinaryData: hadoopBinaryConfigFiles,
}
return configMap, nil
}

View File

@ -1,182 +0,0 @@
/*
Copyright 2017 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"strings"
"testing"
"github.com/stretchr/testify/assert"
"github.com/kubeflow/spark-operator/api/v1beta2"
)
func TestIsLocalFile(t *testing.T) {
type testcase struct {
file string
isLocal bool
}
testFn := func(test testcase, t *testing.T) {
isLocal, err := isLocalFile(test.file)
if err != nil {
t.Fatal(err)
}
assert.Equal(t, test.isLocal, isLocal, "%s: expected %v got %v", test.file, test.isLocal, isLocal)
}
testcases := []testcase{
{file: "/path/to/file", isLocal: true},
{file: "file:///path/to/file", isLocal: true},
{file: "local:///path/to/file", isLocal: false},
{file: "http://localhost/path/to/file", isLocal: false},
}
for _, test := range testcases {
testFn(test, t)
}
}
func TestFilterLocalFiles(t *testing.T) {
files := []string{
"path/to/file",
"/path/to/file",
"file:///file/to/path",
"http://localhost/path/to/file",
"hdfs://localhost/path/to/file",
"gs://bucket/path/to/file",
}
expected := []string{
"path/to/file",
"/path/to/file",
"file:///file/to/path",
}
actual, err := filterLocalFiles(files)
if err != nil {
t.Fatal(err)
}
assert.Equal(t, expected, actual)
}
func TestValidateSpec(t *testing.T) {
type testcase struct {
name string
spec v1beta2.SparkApplicationSpec
expectsValidationError bool
}
testFn := func(test testcase, t *testing.T) {
err := validateSpec(test.spec)
if test.expectsValidationError {
assert.Error(t, err, "%s: expected error got nothing", test.name)
} else {
assert.NoError(t, err, "%s: did not expect error got %v", test.name, err)
}
}
image := "spark"
remoteMainAppFile := "https://localhost/path/to/main/app/file"
containerLocalMainAppFile := "local:///path/to/main/app/file"
testcases := []testcase{
{
name: "application with spec.image set",
spec: v1beta2.SparkApplicationSpec{
Image: &image,
},
expectsValidationError: false,
},
{
name: "application with no spec.image and spec.driver.image",
spec: v1beta2.SparkApplicationSpec{
Executor: v1beta2.ExecutorSpec{
SparkPodSpec: v1beta2.SparkPodSpec{
Image: &image,
},
},
},
expectsValidationError: true,
},
{
name: "application with no spec.image and spec.executor.image",
spec: v1beta2.SparkApplicationSpec{
Driver: v1beta2.DriverSpec{
SparkPodSpec: v1beta2.SparkPodSpec{
Image: &image,
},
},
},
expectsValidationError: true,
},
{
name: "application with no spec.image but spec.driver.image and spec.executor.image",
spec: v1beta2.SparkApplicationSpec{
MainApplicationFile: &containerLocalMainAppFile,
Driver: v1beta2.DriverSpec{
SparkPodSpec: v1beta2.SparkPodSpec{
Image: &image,
},
},
Executor: v1beta2.ExecutorSpec{
SparkPodSpec: v1beta2.SparkPodSpec{
Image: &image,
},
},
},
expectsValidationError: false,
},
{
name: "application with remote main file and spec.image",
spec: v1beta2.SparkApplicationSpec{
Image: &image,
MainApplicationFile: &remoteMainAppFile,
},
expectsValidationError: false,
},
}
for _, test := range testcases {
testFn(test, t)
}
}
func TestLoadFromYAML(t *testing.T) {
app, err := loadFromYAML("testdata/test-app.yaml")
if err != nil {
t.Fatal(err)
}
assert.Equal(t, "example", app.Name)
assert.Equal(t, "org.examples.SparkExample", *app.Spec.MainClass)
assert.Equal(t, "local:///path/to/example.jar", *app.Spec.MainApplicationFile)
assert.Equal(t, "spark", *app.Spec.Driver.Image)
assert.Equal(t, "spark", *app.Spec.Executor.Image)
assert.Equal(t, 1, int(*app.Spec.Executor.Instances))
}
func TestHandleHadoopConfiguration(t *testing.T) {
configMap, err := buildHadoopConfigMap("test", "testdata/hadoop-conf")
if err != nil {
t.Fatal(err)
}
assert.Equal(t, "test-hadoop-config", configMap.Name)
assert.Len(t, configMap.BinaryData, 1)
assert.Len(t, configMap.Data, 1)
assert.True(t, strings.Contains(configMap.Data["core-site.xml"], "fs.gs.impl"))
}

View File

@ -1,64 +0,0 @@
/*
Copyright 2017 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"context"
"fmt"
"os"
"github.com/spf13/cobra"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
crdclientset "github.com/kubeflow/spark-operator/pkg/client/clientset/versioned"
)
var deleteCmd = &cobra.Command{
Use: "delete <name>",
Short: "Delete a SparkApplication object",
Long: `Delete a SparkApplication object with a given name`,
Run: func(_ *cobra.Command, args []string) {
if len(args) != 1 {
fmt.Fprintln(os.Stderr, "must specify a SparkApplication name")
return
}
crdClientset, err := getSparkApplicationClient()
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get SparkApplication client: %v\n", err)
return
}
if err := doDelete(args[0], crdClientset); err != nil {
fmt.Fprintf(os.Stderr, "failed to delete SparkApplication %s: %v\n", args[0], err)
}
},
}
func doDelete(name string, crdClientset crdclientset.Interface) error {
if err := deleteSparkApplication(name, crdClientset); err != nil {
return err
}
fmt.Printf("SparkApplication \"%s\" deleted\n", name)
return nil
}
func deleteSparkApplication(name string, crdClientset crdclientset.Interface) error {
return crdClientset.SparkoperatorV1beta2().SparkApplications(Namespace).Delete(context.TODO(), name, metav1.DeleteOptions{})
}

View File

@ -1,180 +0,0 @@
/*
Copyright 2018 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"context"
"fmt"
"os"
"strings"
"time"
"github.com/olekukonko/tablewriter"
"github.com/spf13/cobra"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/watch"
"k8s.io/client-go/kubernetes"
clientWatch "k8s.io/client-go/tools/watch"
crdclientset "github.com/kubeflow/spark-operator/pkg/client/clientset/versioned"
"github.com/kubeflow/spark-operator/pkg/util"
)
var FollowEvents bool
var eventCommand = &cobra.Command{
Use: "event <name>",
Short: "Shows SparkApplication events",
Long: `Shows events associated with SparkApplication of a given name`,
Run: func(cmd *cobra.Command, args []string) {
if len(args) != 1 {
fmt.Fprintln(os.Stderr, "must specify a SparkApplication name")
return
}
crdClientset, err := getSparkApplicationClient()
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get SparkApplication client: %v\n", err)
return
}
kubeClientset, err := getKubeClient()
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get KubeClient: %v\n", err)
return
}
if err := doShowEvents(args[0], crdClientset, kubeClientset); err != nil {
fmt.Fprintf(os.Stderr, "failed to check events of SparkApplication %s: %v\n", args[0], err)
}
},
}
func init() {
eventCommand.Flags().BoolVarP(&FollowEvents, "follow", "f", false,
"whether to stream the events for the specified SparkApplication name")
}
func doShowEvents(name string, crdClientset crdclientset.Interface, kubeClientset kubernetes.Interface) error {
app, err := getSparkApplication(name, crdClientset)
if err != nil {
return fmt.Errorf("failed to get SparkApplication %s: %v", name, err)
}
app.Kind = "SparkApplication"
eventsInterface := kubeClientset.CoreV1().Events(Namespace)
if FollowEvents {
// watch for all events for this specific SparkApplication name
selector := eventsInterface.GetFieldSelector(&app.Name, &app.Namespace, &app.Kind, nil)
options := metav1.ListOptions{FieldSelector: selector.String(), Watch: true}
events, err := eventsInterface.Watch(context.TODO(), options)
if err != nil {
return err
}
if err := streamEvents(events, app.CreationTimestamp.Unix()); err != nil {
return err
}
} else {
// print only events for current SparkApplication UID
stringUID := string(app.UID)
selector := eventsInterface.GetFieldSelector(&app.Name, &app.Namespace, &app.Kind, &stringUID)
options := metav1.ListOptions{FieldSelector: selector.String()}
events, err := eventsInterface.List(context.TODO(), options)
if err != nil {
return err
}
if err := printEvents(events); err != nil {
return err
}
}
return nil
}
func prepareNewTable() *tablewriter.Table {
table := tablewriter.NewWriter(os.Stdout)
table.SetColMinWidth(0, 10)
table.SetColMinWidth(1, 6)
table.SetColMinWidth(2, 50)
return table
}
func prepareEventsHeader(table *tablewriter.Table) *tablewriter.Table {
table.SetBorders(tablewriter.Border{Left: true, Top: true, Right: true, Bottom: true})
table.SetHeader([]string{"Type", "Age", "Message"})
table.SetHeaderLine(true)
return table
}
func printEvents(events *corev1.EventList) error {
// Render all event rows
table := prepareNewTable()
table = prepareEventsHeader(table)
for _, event := range events.Items {
table.Append([]string{
event.Type,
getSinceTime(event.LastTimestamp),
strings.TrimSpace(event.Message),
})
}
table.Render()
return nil
}
func streamEvents(events watch.Interface, streamSince int64) error {
// Render just the table header, without an additional header line, as we stream
table := prepareNewTable()
table = prepareEventsHeader(table)
table.SetHeaderLine(false)
table.Render()
// Set 10 minutes inactivity timeout
watchExpire := 10 * time.Minute
intr := util.NewInterruptHandler(nil, events.Stop)
return intr.Run(func() error {
// Start rendering contents of the table without table header as it is already printed
table = prepareNewTable()
table.SetBorders(tablewriter.Border{Left: true, Top: false, Right: true, Bottom: false})
ctx := context.TODO()
ctx, cancel := context.WithTimeout(ctx, watchExpire)
defer cancel()
_, err := clientWatch.UntilWithoutRetry(ctx, events, func(ev watch.Event) (bool, error) {
if event, isEvent := ev.Object.(*corev1.Event); isEvent {
// Ensure to display events which are newer than last creation time of SparkApplication
// for this specific application name
if streamSince <= event.CreationTimestamp.Unix() {
// Render each row separately
table.ClearRows()
table.Append([]string{
event.Type,
getSinceTime(event.LastTimestamp),
strings.TrimSpace(event.Message),
})
table.Render()
}
} else {
fmt.Printf("info: %v", ev.Object)
}
return false, nil
})
return err
})
}

View File

@ -1,174 +0,0 @@
/*
Copyright 2017 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"context"
"fmt"
"net/http"
"net/url"
"os"
"os/signal"
"time"
"github.com/spf13/cobra"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientset "k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
"k8s.io/client-go/tools/portforward"
"k8s.io/client-go/transport/spdy"
crdclientset "github.com/kubeflow/spark-operator/pkg/client/clientset/versioned"
)
var LocalPort int32
var RemotePort int32
var forwardCmd = &cobra.Command{
Use: "forward [--local-port <local port>] [--remote-port <remote port>]",
Short: "Start to forward a local port to the remote port of the driver UI",
Long: `Start to forward a local port to the remote port of the driver UI so the UI can be accessed locally.`,
Run: func(cmd *cobra.Command, args []string) {
if len(args) != 1 {
fmt.Fprintln(os.Stderr, "must specify a SparkApplication name")
return
}
config, err := buildConfig(KubeConfig)
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get kubeconfig: %v\n", err)
return
}
crdClientset, err := getSparkApplicationClientForConfig(config)
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get SparkApplication client: %v\n", err)
return
}
kubeClientset, err := getKubeClientForConfig(config)
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get REST client: %v\n", err)
return
}
restClient := kubeClientset.CoreV1().RESTClient()
driverPodURL, driverPodName, err := getDriverPodURLAndName(args[0], restClient, crdClientset)
if err != nil {
fmt.Fprintf(os.Stderr,
"failed to get an API server URL of the driver pod of SparkApplication %s: %v\n",
args[0], err)
return
}
stopCh := make(chan struct{}, 1)
readyCh := make(chan struct{})
forwarder, err := newPortForwarder(config, driverPodURL, stopCh, readyCh)
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get a port forwarder: %v\n", err)
return
}
fmt.Printf("forwarding from %d -> %d\n", LocalPort, RemotePort)
if err = runPortForward(driverPodName, stopCh, forwarder, kubeClientset); err != nil {
fmt.Fprintf(os.Stderr, "failed to run port forwarding: %v\n", err)
}
},
}
func init() {
forwardCmd.Flags().Int32VarP(&LocalPort, "local-port", "l", 4040,
"local port to forward from")
forwardCmd.Flags().Int32VarP(&RemotePort, "remote-port", "r", 4040,
"remote port to forward to")
}
func newPortForwarder(
config *rest.Config,
url *url.URL,
stopCh chan struct{},
readyCh chan struct{}) (*portforward.PortForwarder, error) {
transport, upgrader, err := spdy.RoundTripperFor(config)
if err != nil {
return nil, err
}
dialer := spdy.NewDialer(upgrader, &http.Client{Transport: transport}, "POST", url)
ports := []string{fmt.Sprintf("%d:%d", LocalPort, RemotePort)}
fw, err := portforward.New(dialer, ports, stopCh, readyCh, nil, os.Stderr)
if err != nil {
return nil, err
}
return fw, nil
}
func getDriverPodURLAndName(
name string,
restClient rest.Interface,
crdClientset crdclientset.Interface) (*url.URL, string, error) {
app, err := getSparkApplication(name, crdClientset)
if err != nil {
return nil, "", fmt.Errorf("failed to get SparkApplication %s: %v", name, err)
}
if app.Status.DriverInfo.PodName != "" {
request := restClient.Post().
Resource("pods").
Namespace(Namespace).
Name(app.Status.DriverInfo.PodName).
SubResource("portforward")
return request.URL(), app.Status.DriverInfo.PodName, nil
}
return nil, "", fmt.Errorf("driver pod name of SparkApplication %s is not available yet", name)
}
func runPortForward(
driverPodName string,
stopCh chan struct{},
forwarder *portforward.PortForwarder,
kubeClientset clientset.Interface) error {
signals := make(chan os.Signal, 1)
signal.Notify(signals, os.Interrupt)
defer signal.Stop(signals)
go func() {
defer close(stopCh)
for {
pod, err := kubeClientset.CoreV1().Pods(Namespace).Get(context.TODO(), driverPodName, metav1.GetOptions{})
if err != nil {
break
}
if pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed {
break
}
time.Sleep(1 * time.Second)
}
fmt.Println("stopping forwarding as the driver pod has terminated")
}()
go func() {
<-signals
close(stopCh)
}()
return forwarder.ForwardPorts()
}

View File

@ -1,80 +0,0 @@
/*
Copyright 2017 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"fmt"
"cloud.google.com/go/storage"
"gocloud.dev/blob/gcsblob"
"gocloud.dev/gcp"
"golang.org/x/net/context"
)
type blobGCS struct {
projectID string
endpoint string
region string
}
func (blob blobGCS) setPublicACL(
ctx context.Context,
bucket string,
filePath string) error {
client, err := storage.NewClient(ctx)
if err != nil {
return err
}
defer client.Close()
handle := client.Bucket(bucket).UserProject(blob.projectID)
if err = handle.Object(filePath).ACL().Set(ctx, storage.AllUsers, storage.RoleReader); err != nil {
return fmt.Errorf("failed to set ACL on GCS object %s: %v", filePath, err)
}
return nil
}
func newGCSBlob(
ctx context.Context,
bucket string,
endpoint string,
region string) (*uploadHandler, error) {
creds, err := gcp.DefaultCredentials(ctx)
if err != nil {
return nil, err
}
projectID, err := gcp.DefaultProjectID(creds)
if err != nil {
return nil, err
}
c, err := gcp.NewHTTPClient(gcp.DefaultTransport(), gcp.CredentialsTokenSource(creds))
if err != nil {
return nil, err
}
b, err := gcsblob.OpenBucket(ctx, c, bucket, nil)
return &uploadHandler{
blob: blobGCS{endpoint: endpoint, region: region, projectID: string(projectID)},
ctx: ctx,
b: b,
blobUploadBucket: bucket,
blobEndpoint: endpoint,
hdpScheme: "gs",
}, err
}

View File

@ -1,67 +0,0 @@
/*
Copyright 2017 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"context"
"fmt"
"os"
"github.com/olekukonko/tablewriter"
"github.com/spf13/cobra"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
crdclientset "github.com/kubeflow/spark-operator/pkg/client/clientset/versioned"
)
var listCmd = &cobra.Command{
Use: "list",
Short: "List SparkApplication objects",
Long: `List SparkApplication objects in a given namespace.`,
Run: func(_ *cobra.Command, args []string) {
crdClientset, err := getSparkApplicationClient()
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get SparkApplication client: %v\n", err)
return
}
if err = doList(crdClientset); err != nil {
fmt.Fprintf(os.Stderr, "failed to list SparkApplications: %v\n", err)
}
},
}
func doList(crdClientset crdclientset.Interface) error {
apps, err := crdClientset.SparkoperatorV1beta2().SparkApplications(Namespace).List(context.TODO(), metav1.ListOptions{})
if err != nil {
return err
}
table := tablewriter.NewWriter(os.Stdout)
table.SetHeader([]string{"Name", "State", "Submission Age", "Termination Age"})
for _, app := range apps.Items {
table.Append([]string{
app.Name,
string(app.Status.AppState.State),
getSinceTime(app.Status.LastSubmissionAttemptTime),
getSinceTime(app.Status.TerminationTime),
})
}
table.Render()
return nil
}

View File

@ -1,166 +0,0 @@
/*
Copyright 2017 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"context"
"fmt"
"io"
"os"
"time"
"github.com/spf13/cobra"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientset "k8s.io/client-go/kubernetes"
crdclientset "github.com/kubeflow/spark-operator/pkg/client/clientset/versioned"
)
var ExecutorID int32
var FollowLogs bool
var logCommand = &cobra.Command{
Use: "log <name>",
Short: "log is a sub-command of sparkctl that fetches logs of a Spark application.",
Long: ``,
Run: func(cmd *cobra.Command, args []string) {
ctx := cmd.Context()
if len(args) != 1 {
fmt.Fprintln(os.Stderr, "must specify a SparkApplication name")
return
}
kubeClientset, err := getKubeClient()
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get Kubernetes client: %v\n", err)
return
}
crdClientset, err := getSparkApplicationClient()
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get SparkApplication client: %v\n", err)
return
}
if err := doLog(ctx, args[0], FollowLogs, kubeClientset, crdClientset); err != nil {
fmt.Fprintf(os.Stderr, "failed to get driver logs of SparkApplication %s: %v\n", args[0], err)
}
},
}
func init() {
logCommand.Flags().Int32VarP(&ExecutorID, "executor", "e", -1,
"id of the executor to fetch logs for")
logCommand.Flags().BoolVarP(&FollowLogs, "follow", "f", false, "whether to stream the logs")
}
func doLog(
ctx context.Context,
name string,
followLogs bool,
kubeClient clientset.Interface,
crdClient crdclientset.Interface) error {
timeout := 30 * time.Second
podNameChannel := getPodNameChannel(ctx, name, crdClient)
var podName string
select {
case podName = <-podNameChannel:
case <-time.After(timeout):
return fmt.Errorf("not found pod name")
}
waitLogsChannel := waitForLogsFromPodChannel(ctx, podName, kubeClient, crdClient)
select {
case <-waitLogsChannel:
case <-time.After(timeout):
return fmt.Errorf("timeout to fetch logs from pod \"%s\"", podName)
}
if followLogs {
return streamLogs(ctx, os.Stdout, kubeClient, podName)
}
return printLogs(ctx, os.Stdout, kubeClient, podName)
}
func getPodNameChannel(
ctx context.Context,
sparkApplicationName string,
crdClient crdclientset.Interface) chan string {
channel := make(chan string, 1)
go func() {
for {
app, _ := crdClient.SparkoperatorV1beta2().SparkApplications(Namespace).Get(
ctx,
sparkApplicationName,
metav1.GetOptions{})
if app.Status.DriverInfo.PodName != "" {
channel <- app.Status.DriverInfo.PodName
break
}
}
}()
return channel
}
func waitForLogsFromPodChannel(
ctx context.Context,
podName string,
kubeClient clientset.Interface,
_ crdclientset.Interface) chan bool {
channel := make(chan bool, 1)
go func() {
for {
_, err := kubeClient.CoreV1().Pods(Namespace).GetLogs(podName, &corev1.PodLogOptions{}).Do(ctx).Raw()
if err == nil {
channel <- true
break
}
}
}()
return channel
}
// printLogs is a one time operation that prints the fetched logs of the given pod.
func printLogs(ctx context.Context, out io.Writer, kubeClientset clientset.Interface, podName string) error {
rawLogs, err := kubeClientset.CoreV1().Pods(Namespace).GetLogs(podName, &corev1.PodLogOptions{}).Do(ctx).Raw()
if err != nil {
return err
}
fmt.Fprintln(out, string(rawLogs))
return nil
}
// streamLogs streams the logs of the given pod until there are no more logs available.
func streamLogs(ctx context.Context, out io.Writer, kubeClientset clientset.Interface, podName string) error {
request := kubeClientset.CoreV1().Pods(Namespace).GetLogs(podName, &corev1.PodLogOptions{Follow: true})
reader, err := request.Stream(ctx)
if err != nil {
return err
}
defer reader.Close()
if _, err := io.Copy(out, reader); err != nil {
return err
}
return nil
}

View File

@ -1,61 +0,0 @@
/*
Copyright 2017 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"context"
"fmt"
"os"
"github.com/spf13/cobra"
)
func getKubeConfigPath() string {
var kubeConfigEnv = os.Getenv("KUBECONFIG")
if len(kubeConfigEnv) == 0 {
return os.Getenv("HOME") + "/.kube/config"
}
return kubeConfigEnv
}
var defaultKubeConfig = getKubeConfigPath()
var Namespace string
var KubeConfig string
var rootCmd = &cobra.Command{
Use: "sparkctl",
Short: "sparkctl is the command-line tool for working with the Spark Operator",
Long: `sparkctl is the command-line tool for working with the Spark Operator. It supports creating, deleting and
checking status of SparkApplication objects. It also supports fetching application logs.`,
}
func init() {
rootCmd.PersistentFlags().StringVarP(&Namespace, "namespace", "n", "default",
"The namespace in which the SparkApplication is to be created")
rootCmd.PersistentFlags().StringVarP(&KubeConfig, "kubeconfig", "k", defaultKubeConfig,
"The path to the local Kubernetes configuration file")
rootCmd.AddCommand(createCmd, deleteCmd, eventCommand, statusCmd, logCommand, listCmd, forwardCmd)
}
func Execute() {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
if err := rootCmd.ExecuteContext(ctx); err != nil {
fmt.Fprintf(os.Stderr, "%v", err)
}
}

View File

@ -1,85 +0,0 @@
/*
Copyright 2018 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"context"
"fmt"
"github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/config"
"github.com/aws/aws-sdk-go-v2/service/s3"
"github.com/aws/aws-sdk-go-v2/service/s3/types"
"gocloud.dev/blob/s3blob"
)
type blobS3 struct {
client *s3.Client
}
func (blob blobS3) setPublicACL(
ctx context.Context,
bucket string,
filePath string) error {
acl := types.ObjectCannedACLPublicRead
if _, err := blob.client.PutObjectAcl(ctx, &s3.PutObjectAclInput{Bucket: &bucket, Key: &filePath, ACL: acl}); err != nil {
return fmt.Errorf("failed to set ACL on S3 object %s: %v", filePath, err)
}
return nil
}
func newS3Blob(
ctx context.Context,
bucket string,
endpoint string,
region string,
usePathStyle bool) (*uploadHandler, error) {
// AWS SDK does require specifying regions, thus set it to default S3 region
if region == "" {
region = "us-east1"
}
endpointResolver := aws.EndpointResolverWithOptionsFunc(func(service, region string, _ ...interface{}) (aws.Endpoint, error) {
if service == s3.ServiceID && endpoint != "" {
return aws.Endpoint{
PartitionID: "aws",
URL: endpoint,
SigningRegion: region,
}, nil
}
return aws.Endpoint{}, &aws.EndpointNotFoundError{}
})
conf, err := config.LoadDefaultConfig(
ctx, config.WithRegion(region),
config.WithEndpointResolverWithOptions(endpointResolver),
)
if err != nil {
return nil, err
}
client := s3.NewFromConfig(conf, func(o *s3.Options) {
o.UsePathStyle = usePathStyle
})
b, err := s3blob.OpenBucketV2(ctx, client, bucket, nil)
return &uploadHandler{
blob: blobS3{client: client},
ctx: ctx,
b: b,
blobUploadBucket: bucket,
blobEndpoint: endpoint,
hdpScheme: "s3a",
}, err
}

View File

@ -1,91 +0,0 @@
/*
Copyright 2017 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"fmt"
"os"
"github.com/olekukonko/tablewriter"
"github.com/spf13/cobra"
"github.com/kubeflow/spark-operator/api/v1beta2"
crdclientset "github.com/kubeflow/spark-operator/pkg/client/clientset/versioned"
)
var statusCmd = &cobra.Command{
Use: "status <name>",
Short: "Check status of a SparkApplication",
Long: `Check status of a SparkApplication with a given name`,
Run: func(_ *cobra.Command, args []string) {
if len(args) != 1 {
fmt.Fprintln(os.Stderr, "must specify a SparkApplication name")
return
}
crdClientset, err := getSparkApplicationClient()
if err != nil {
fmt.Fprintf(os.Stderr, "failed to get SparkApplication client: %v\n", err)
return
}
if err := doStatus(args[0], crdClientset); err != nil {
fmt.Fprintf(os.Stderr, "failed to check status of SparkApplication %s: %v\n", args[0], err)
}
},
}
func doStatus(name string, crdClientset crdclientset.Interface) error {
app, err := getSparkApplication(name, crdClientset)
if err != nil {
return fmt.Errorf("failed to get SparkApplication %s: %v", name, err)
}
printStatus(app)
return nil
}
func printStatus(app *v1beta2.SparkApplication) {
fmt.Println("application state:")
table := tablewriter.NewWriter(os.Stdout)
table.SetHeader([]string{"State", "Submission Age", "Completion Age", "Driver Pod", "Driver UI", "SubmissionAttempts", "ExecutionAttempts"})
table.Append([]string{
string(app.Status.AppState.State),
getSinceTime(app.Status.LastSubmissionAttemptTime),
getSinceTime(app.Status.TerminationTime),
formatNotAvailable(app.Status.DriverInfo.PodName),
formatNotAvailable(app.Status.DriverInfo.WebUIAddress),
fmt.Sprintf("%v", app.Status.SubmissionAttempts),
fmt.Sprintf("%v", app.Status.ExecutionAttempts),
})
table.Render()
if len(app.Status.ExecutorState) > 0 {
fmt.Println("executor state:")
table := tablewriter.NewWriter(os.Stdout)
table.SetHeader([]string{"Executor Pod", "State"})
for executorPod, state := range app.Status.ExecutorState {
table.Append([]string{executorPod, string(state)})
}
table.Render()
}
if app.Status.AppState.ErrorMessage != "" {
fmt.Printf("\napplication error message: %s\n", app.Status.AppState.ErrorMessage)
}
}

View File

@ -1 +0,0 @@
<EFBFBD>  

View File

@ -1,8 +0,0 @@
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
</configuration>

View File

@ -1,31 +0,0 @@
#
# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: example
namespace: default
spec:
type: Scala
mode: cluster
mainClass: org.examples.SparkExample
mainApplicationFile: "local:///path/to/example.jar"
driver:
image: "spark"
executor:
image: "spark"
instances: 1

View File

@ -21,6 +21,9 @@ spec:
- jsonPath: .spec.schedule
name: Schedule
type: string
- jsonPath: .spec.timeZone
name: TimeZone
type: string
- jsonPath: .spec.suspend
name: Suspend
type: string
@ -3118,6 +3121,10 @@ spec:
description: Memory is the amount of memory to request for
the pod.
type: string
memoryLimit:
description: MemoryLimit overrides the memory limit of the
pod.
type: string
memoryOverhead:
description: MemoryOverhead is the amount of off-heap memory
to allocate in cluster mode, in MiB unless otherwise specified.
@ -5289,6 +5296,13 @@ spec:
of executors if dynamic allocation is enabled.
format: int32
type: integer
shuffleTrackingEnabled:
description: |-
ShuffleTrackingEnabled enables shuffle file tracking for executors, which allows dynamic allocation without
the need for an external shuffle service. This option will try to keep alive executors that are storing
shuffle data for active jobs. If external shuffle service is enabled, set ShuffleTrackingEnabled to false.
ShuffleTrackingEnabled is true by default if dynamicAllocation.enabled is true.
type: boolean
shuffleTrackingTimeout:
description: |-
ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
@ -8242,6 +8256,10 @@ spec:
description: Memory is the amount of memory to request for
the pod.
type: string
memoryLimit:
description: MemoryLimit overrides the memory limit of the
pod.
type: string
memoryOverhead:
description: MemoryOverhead is the amount of off-heap memory
to allocate in cluster mode, in MiB unless otherwise specified.
@ -12384,6 +12402,13 @@ spec:
- sparkVersion
- type
type: object
timeZone:
description: |-
TimeZone is the time zone in which the cron schedule will be interpreted.
This value is passed to time.LoadLocation, so it must be either "Local", "UTC",
or a valid IANA location name, e.g. "America/New_York".
Defaults to "Local".
type: string
required:
- schedule
- template
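
The new timeZone field sits next to schedule in a ScheduledSparkApplication spec. A minimal, illustrative manifest (the name, schedule, and resource values below are hypothetical; the field names come from the CRD hunk above):

apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-nightly        # hypothetical name
spec:
  schedule: "0 2 * * *"         # cron schedule, interpreted in the given time zone
  timeZone: America/New_York    # "Local", "UTC", or an IANA location name; defaults to "Local"
  template:
    type: Scala
    mode: cluster
    image: docker.io/library/spark:4.0.0
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
    sparkVersion: 4.0.0
    restartPolicy:
      type: Never
    driver:
      cores: 1
      memory: 512m
      serviceAccount: spark-operator-spark
    executor:
      instances: 1
      cores: 1
      memory: 512m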

View File

@ -3071,6 +3071,9 @@ spec:
description: Memory is the amount of memory to request for the
pod.
type: string
memoryLimit:
description: MemoryLimit overrides the memory limit of the pod.
type: string
memoryOverhead:
description: MemoryOverhead is the amount of off-heap memory to
allocate in cluster mode, in MiB unless otherwise specified.
@ -5235,6 +5238,13 @@ spec:
executors if dynamic allocation is enabled.
format: int32
type: integer
shuffleTrackingEnabled:
description: |-
ShuffleTrackingEnabled enables shuffle file tracking for executors, which allows dynamic allocation without
the need for an external shuffle service. This option will try to keep alive executors that are storing
shuffle data for active jobs. If external shuffle service is enabled, set ShuffleTrackingEnabled to false.
ShuffleTrackingEnabled is true by default if dynamicAllocation.enabled is true.
type: boolean
shuffleTrackingTimeout:
description: |-
ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
@ -8172,6 +8182,9 @@ spec:
description: Memory is the amount of memory to request for the
pod.
type: string
memoryLimit:
description: MemoryLimit overrides the memory limit of the pod.
type: string
memoryOverhead:
description: MemoryOverhead is the amount of off-heap memory to
allocate in cluster mode, in MiB unless otherwise specified.
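
The memoryLimit and shuffleTrackingEnabled additions above are plain spec fields. A hedged sketch of how they could be combined in a SparkApplication (the name and sizes are illustrative; the fields themselves come from the CRD hunks above):

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-dynamic        # hypothetical name
spec:
  type: Scala
  mode: cluster
  image: docker.io/library/spark:4.0.0
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: 4.0.0
  driver:
    cores: 1
    memory: 512m
    memoryLimit: 1g             # overrides the memory limit of the pod (new field)
    serviceAccount: spark-operator-spark
  executor:
    cores: 1
    instances: 1
    memory: 512m
    memoryLimit: 1g
  dynamicAllocation:
    enabled: true
    initialExecutors: 1
    minExecutors: 1
    maxExecutors: 5
    shuffleTrackingEnabled: true     # true by default when dynamicAllocation.enabled is true
    shuffleTrackingTimeout: 60000    # milliseconds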

View File

@ -0,0 +1,272 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
api-approved.kubernetes.io: https://github.com/kubeflow/spark-operator/pull/1298
controller-gen.kubebuilder.io/version: v0.17.1
name: sparkconnects.sparkoperator.k8s.io
spec:
group: sparkoperator.k8s.io
names:
kind: SparkConnect
listKind: SparkConnectList
plural: sparkconnects
shortNames:
- sparkconn
singular: sparkconnect
scope: Namespaced
versions:
- additionalPrinterColumns:
- jsonPath: .metadata.creationTimestamp
name: Age
type: date
name: v1alpha1
schema:
openAPIV3Schema:
description: SparkConnect is the Schema for the sparkconnects API.
properties:
apiVersion:
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
spec:
description: SparkConnectSpec defines the desired state of SparkConnect.
properties:
dynamicAllocation:
description: |-
DynamicAllocation configures dynamic allocation, which is available for the Kubernetes
scheduler backend since Spark 3.0.
properties:
enabled:
description: Enabled controls whether dynamic allocation is enabled
or not.
type: boolean
initialExecutors:
description: |-
InitialExecutors is the initial number of executors to request. If .spec.executor.instances
is also set, the initial number of executors is set to the bigger of that and this option.
format: int32
type: integer
maxExecutors:
description: MaxExecutors is the upper bound for the number of
executors if dynamic allocation is enabled.
format: int32
type: integer
minExecutors:
description: MinExecutors is the lower bound for the number of
executors if dynamic allocation is enabled.
format: int32
type: integer
shuffleTrackingEnabled:
description: |-
ShuffleTrackingEnabled enables shuffle file tracking for executors, which allows dynamic allocation without
the need for an external shuffle service. This option will try to keep alive executors that are storing
shuffle data for active jobs. If external shuffle service is enabled, set ShuffleTrackingEnabled to false.
ShuffleTrackingEnabled is true by default if dynamicAllocation.enabled is true.
type: boolean
shuffleTrackingTimeout:
description: |-
ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
shuffle data if shuffle tracking is enabled (true by default if dynamic allocation is enabled).
format: int64
type: integer
type: object
executor:
description: Executor is the Spark executor specification.
properties:
cores:
description: Cores maps to `spark.driver.cores` or `spark.executor.cores`
for the driver and executors, respectively.
format: int32
minimum: 1
type: integer
instances:
description: Instances is the number of executor instances.
format: int32
minimum: 0
type: integer
memory:
description: Memory is the amount of memory to request for the
pod.
type: string
template:
description: |-
Template is a pod template that can be used to define the driver or executor pod configurations that Spark configurations do not support.
Spark version >= 3.0.0 is required.
Ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template.
type: object
x-kubernetes-preserve-unknown-fields: true
type: object
hadoopConf:
additionalProperties:
type: string
description: |-
HadoopConf carries user-specified Hadoop configuration properties as they would be set with the "--conf" option
in spark-submit. The SparkApplication controller automatically adds the prefix "spark.hadoop." to Hadoop
configuration properties.
type: object
image:
description: |-
Image is the container image for the driver, executor, and init-container. Any custom container images for the
driver, executor, or init-container take precedence over this.
type: string
server:
description: Server is the Spark connect server specification.
properties:
cores:
description: Cores maps to `spark.driver.cores` or `spark.executor.cores`
for the driver and executors, respectively.
format: int32
minimum: 1
type: integer
memory:
description: Memory is the amount of memory to request for the
pod.
type: string
template:
description: |-
Template is a pod template that can be used to define the driver or executor pod configurations that Spark configurations do not support.
Spark version >= 3.0.0 is required.
Ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template.
type: object
x-kubernetes-preserve-unknown-fields: true
type: object
sparkConf:
additionalProperties:
type: string
description: |-
SparkConf carries user-specified Spark configuration properties as they would be set with the "--conf" option in
spark-submit.
type: object
sparkVersion:
description: SparkVersion is the version of Spark that the Spark Connect
server uses.
type: string
required:
- executor
- server
- sparkVersion
type: object
status:
description: SparkConnectStatus defines the observed state of SparkConnect.
properties:
conditions:
description: Represents the latest available observations of a SparkConnect's
current state.
items:
description: Condition contains details for one aspect of the current
state of this API Resource.
properties:
lastTransitionTime:
description: |-
lastTransitionTime is the last time the condition transitioned from one status to another.
This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
format: date-time
type: string
message:
description: |-
message is a human readable message indicating details about the transition.
This may be an empty string.
maxLength: 32768
type: string
observedGeneration:
description: |-
observedGeneration represents the .metadata.generation that the condition was set based upon.
For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
with respect to the current state of the instance.
format: int64
minimum: 0
type: integer
reason:
description: |-
reason contains a programmatic identifier indicating the reason for the condition's last transition.
Producers of specific condition types may define expected values and meanings for this field,
and whether the values are considered a guaranteed API.
The value should be a CamelCase string.
This field may not be empty.
maxLength: 1024
minLength: 1
pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
type: string
status:
description: status of the condition, one of True, False, Unknown.
enum:
- "True"
- "False"
- Unknown
type: string
type:
description: type of condition in CamelCase or in foo.example.com/CamelCase.
maxLength: 316
pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
type: string
required:
- lastTransitionTime
- message
- reason
- status
- type
type: object
type: array
x-kubernetes-list-map-keys:
- type
x-kubernetes-list-type: map
executors:
additionalProperties:
type: integer
description: Executors represents the current state of the SparkConnect
executors.
type: object
lastUpdateTime:
description: LastUpdateTime is the time at which the SparkConnect
controller last updated the SparkConnect.
format: date-time
type: string
server:
description: Server represents the current state of the SparkConnect
server.
properties:
podIp:
description: PodIP is the IP address of the pod that is running
the Spark Connect server.
type: string
podName:
description: PodName is the name of the pod that is running the
Spark Connect server.
type: string
serviceName:
description: ServiceName is the name of the service that is exposing
the Spark Connect server.
type: string
type: object
startTime:
description: StartTime is the time at which the SparkConnect controller
started processing the SparkConnect.
format: date-time
type: string
state:
description: State represents the current state of the SparkConnect.
type: string
type: object
required:
- metadata
- spec
type: object
served: true
storage: true
subresources:
status: {}
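
Per the schema above, only server, executor, and sparkVersion are required. A minimal, illustrative SparkConnect manifest (the name, image tag, and sizes are hypothetical) could therefore be as small as:

apiVersion: sparkoperator.k8s.io/v1alpha1
kind: SparkConnect
metadata:
  name: spark-connect-minimal   # hypothetical name
spec:
  sparkVersion: 4.0.0
  image: spark:4.0.0            # shared image for the server and executors
  server:
    cores: 1
    memory: 1g
  executor:
    instances: 2
    cores: 1
    memory: 512m

The full example added later in this diff additionally customizes the server and executor pod templates and security contexts.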

View File

@ -46,6 +46,9 @@ rules:
- create
- delete
- get
- list
- update
- watch
- apiGroups:
- apiextensions.k8s.io
resources:
@ -62,7 +65,6 @@ rules:
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
@ -70,6 +72,7 @@ rules:
resources:
- scheduledsparkapplications
- sparkapplications
- sparkconnects
verbs:
- create
- delete
@ -90,6 +93,7 @@ rules:
resources:
- scheduledsparkapplications/status
- sparkapplications/status
- sparkconnects/status
verbs:
- get
- patch

View File

@ -1,7 +1,5 @@
## Append samples of your project ##
resources:
- v1beta1_sparkapplication.yaml
- v1beta1_scheduledsparkapplication.yaml
- v1beta2_sparkapplication.yaml
- v1beta2_scheduledsparkapplication.yaml
# +kubebuilder:scaffold:manifestskustomizesamples

View File

@ -1,9 +0,0 @@
apiVersion: sparkoperator.k8s.io/v1beta1
kind: ScheduledSparkApplication
metadata:
labels:
app.kubernetes.io/name: spark-operator
app.kubernetes.io/managed-by: kustomize
name: scheduledsparkapplication-sample
spec:
# TODO(user): Add fields here

View File

@ -1,23 +0,0 @@
apiVersion: sparkoperator.k8s.io/v1beta1
kind: SparkApplication
metadata:
labels:
app.kubernetes.io/name: spark-operator
app.kubernetes.io/managed-by: kustomize
name: sparkapplication-sample
spec:
type: Scala
mode: cluster
image: spark:3.5.3
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
sparkVersion: 3.5.3
driver:
labels:
version: 3.5.3
serviceAccount: spark-operator-spark
executor:
labels:
version: 3.5.3
instances: 1

View File

@ -11,23 +11,23 @@ spec:
template:
type: Scala
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
sparkVersion: 3.5.3
sparkVersion: 4.0.0
restartPolicy:
type: Never
driver:
labels:
version: 3.5.3
version: 4.0.0
cores: 1
coreLimit: 1200m
memory: 512m
serviceAccount: spark-operator-spark
executor:
labels:
version: 3.5.3
version: 4.0.0
instances: 1
cores: 1
coreLimit: 1200m

View File

@ -8,16 +8,16 @@ metadata:
spec:
type: Scala
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
sparkVersion: 3.5.3
sparkVersion: 4.0.0
driver:
labels:
version: 3.5.3
version: 4.0.0
serviceAccount: spark-operator-spark
executor:
labels:
version: 3.5.3
version: 4.0.0
instances: 1

File diff suppressed because it is too large

View File

@ -21,11 +21,11 @@ metadata:
spec:
type: Scala
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
sparkVersion: 3.5.3
sparkVersion: 4.0.0
restartPolicy:
type: Never
volumes:
@ -39,6 +39,16 @@ spec:
- name: config-vol
mountPath: /opt/spark/config
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
instances: 1
cores: 1
@ -46,3 +56,13 @@ spec:
volumeMounts:
- name: config-vol
mountPath: /opt/spark/config
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault

View File

@ -21,11 +21,11 @@ metadata:
spec:
type: Scala
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
sparkVersion: 3.5.3
sparkVersion: 4.0.0
restartPolicy:
type: Never
driver:
@ -33,8 +33,28 @@ spec:
coreLimit: 800m
memory: 512m
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
instances: 1
coreRequest: "1200m"
coreLimit: 1500m
memory: 512m
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault

View File

@ -21,19 +21,39 @@ metadata:
spec:
type: Scala
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
sparkVersion: 3.5.3
sparkVersion: 4.0.0
driver:
cores: 1
memory: 512m
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
instances: 1
cores: 1
memory: 512m
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
dynamicAllocation:
enabled: true
initialExecutors: 2

View File

@ -21,17 +21,37 @@ metadata:
spec:
type: Scala
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
sparkVersion: 3.5.3
sparkVersion: 4.0.0
driver:
cores: 1
memory: 512m
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
instances: 2
cores: 1
memory: 512m
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
batchScheduler: kube-scheduler

View File

@ -57,8 +57,8 @@ metadata:
spec:
type: Scala
mode: cluster
sparkVersion: 3.5.3
image: spark:3.5.3
sparkVersion: 4.0.0
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
mainClass: org.apache.spark.examples.SparkPi
@ -68,10 +68,20 @@ spec:
template:
metadata:
labels:
spark.apache.org/version: 3.5.3
spark.apache.org/version: 4.0.0
annotations:
spark.apache.org/version: 3.5.3
spark.apache.org/version: 4.0.0
spec:
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
containers:
- name: spark-kubernetes-driver
env:
@ -133,10 +143,20 @@ spec:
template:
metadata:
labels:
spark.apache.org/version: 3.5.3
spark.apache.org/version: 4.0.0
annotations:
spark.apache.org/version: 3.5.3
spark.apache.org/version: 4.0.0
spec:
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
containers:
- name: spark-kubernetes-executor
env:

View File

@ -21,10 +21,10 @@ spec:
type: Python
pythonVersion: "3"
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
sparkVersion: 3.5.3
sparkVersion: 4.0.0
sparkConf:
# Expose Spark metrics for Prometheus
"spark.kubernetes.driver.annotation.prometheus.io/scrape": "true"
@ -42,7 +42,27 @@ spec:
cores: 1
memory: 512m
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
instances: 1
cores: 1
memory: 512m
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault

View File

@ -22,27 +22,47 @@ metadata:
spec:
type: Scala
mode: cluster
image: {IMAGE_REGISTRY}/{IMAGE_REPOSITORY}/spark:3.5.3-gcs-prometheus
image: {IMAGE_REGISTRY}/{IMAGE_REPOSITORY}/docker.io/library/spark:4.0.0-gcs-prometheus
imagePullPolicy: Always
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
arguments:
- "100000"
sparkVersion: 3.5.3
sparkVersion: 4.0.0
restartPolicy:
type: Never
driver:
cores: 1
memory: 512m
labels:
version: 3.5.3
version: 4.0.0
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
cores: 1
instances: 1
memory: 512m
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
labels:
version: 3.5.3
version: 4.0.0
monitoring:
exposeDriverMetrics: true
exposeExecutorMetrics: true

View File

@ -22,15 +22,35 @@ spec:
type: Python
pythonVersion: "3"
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
sparkVersion: 3.5.3
sparkVersion: 4.0.0
driver:
cores: 1
memory: 512m
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
instances: 1
cores: 1
memory: 512m
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault

View File

@ -25,18 +25,38 @@ spec:
template:
type: Scala
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
sparkVersion: 3.5.3
sparkVersion: 4.0.0
restartPolicy:
type: Never
driver:
cores: 1
memory: 512m
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
instances: 1
cores: 1
memory: 512m
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault

View File

@ -21,17 +21,37 @@ metadata:
spec:
type: Scala
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
sparkVersion: 3.5.3
sparkVersion: 4.0.0
timeToLiveSeconds: 30
driver:
cores: 1
memory: 512m
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
instances: 1
cores: 1
memory: 512m
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault

View File

@ -21,17 +21,37 @@ metadata:
spec:
type: Scala
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
sparkVersion: 3.5.3
sparkVersion: 4.0.0
driver:
cores: 1
memory: 512m
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
instances: 2
cores: 1
memory: 512m
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
batchScheduler: volcano

View File

@ -21,19 +21,39 @@ metadata:
spec:
type: Scala
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
sparkVersion: 3.5.3
sparkVersion: 4.0.0
driver:
cores: 1
memory: 512m
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
instances: 2
cores: 1
memory: 512m
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
batchScheduler: yunikorn
batchSchedulerOptions:
queue: root.default

View File

@ -21,22 +21,42 @@ metadata:
spec:
type: Scala
mode: cluster
image: spark:3.5.3
image: docker.io/library/spark:4.0.0
imagePullPolicy: IfNotPresent
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
arguments:
- "5000"
sparkVersion: 3.5.3
sparkVersion: 4.0.0
driver:
labels:
version: 3.5.3
version: 4.0.0
cores: 1
memory: 512m
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
labels:
version: 3.5.3
version: 4.0.0
instances: 1
cores: 1
memory: 512m
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault

View File

@ -0,0 +1,81 @@
#
# Copyright 2025 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: sparkoperator.k8s.io/v1alpha1
kind: SparkConnect
metadata:
name: spark-connect
namespace: default
spec:
sparkVersion: 4.0.0
server:
template:
metadata:
labels:
key1: value1
key2: value2
annotations:
key3: value3
key4: value4
spec:
containers:
- name: spark-kubernetes-driver
image: spark:4.0.0
imagePullPolicy: Always
resources:
requests:
cpu: 1
memory: 1Gi
limits:
cpu: 1
memory: 1Gi
serviceAccount: spark-operator-spark
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault
executor:
instances: 2
cores: 1
memory: 512m
template:
metadata:
labels:
key1: value1
key2: value2
annotations:
key3: value3
key4: value4
spec:
containers:
- name: spark-kubernetes-executor
image: spark:4.0.0
imagePullPolicy: Always
securityContext:
capabilities:
drop:
- ALL
runAsGroup: 185
runAsUser: 185
runAsNonRoot: true
allowPrivilegeEscalation: false
seccompProfile:
type: RuntimeDefault

Some files were not shown because too many files have changed in this diff